Change Log

When                  What
November 15th, 2015   Donated by Gordon Fraser


Studies that use the data (in any form) are required to include the following reference:

@inproceedings{Fraser:2012:SEE:2337223.2337245,
 author = {Fraser, Gordon and Arcuri, Andrea},
 title = {Sound Empirical Evidence in Software Testing},
 booktitle = {Proceedings of the 34th International Conference on Software Engineering},
 series = {ICSE '12},
 year = {2012},
 isbn = {978-1-4673-1067-3},
 location = {Zurich, Switzerland},
 pages = {178--188},
 numpages = {11},
 url = {https://dl.acm.org/citation.cfm?id=2337223.2337245},
 acmid = {2337245},
 publisher = {IEEE Press},
 address = {Piscataway, NJ, USA},
}

About the Data

Attribute Information

test case generation; unit testing; search-based software engineering; benchmark

Paper Abstract

Several promising techniques have been proposed to automate different tasks in software testing, such as test data generation for object-oriented software. However, reported studies in the literature only show the feasibility of the proposed techniques, because the choice of the employed artifacts in the case studies (e.g., software applications) is usually done in a non-systematic way. The chosen case study might be biased, and so it might not be a valid representative of the addressed type of software (e.g., internet applications and embedded systems). The common trend seems to be to accept this fact and get over it by simply discussing it in a threats to validity section. In this paper, we evaluate search-based software testing (in particular the EvoSuite tool) when applied to test data generation for open source projects. To achieve sound empirical results, we randomly selected 100 Java projects from SourceForge, which is the most popular open source repository (more than 300,000 projects with more than two million registered users). The resulting case study not only is very large (8,784 public classes for a total of 291,639 bytecode level branches), but more importantly it is statistically sound and representative for open source projects. Results show that while high coverage on commonly used types of classes is achievable, in practice environmental dependencies prohibit such high coverage, which clearly points out essential future research directions. To support this future research, our SF100 case study can serve as a much needed corpus of classes for test generation.