Title: Using Search-Based Test Generation to Discover Real Faults in Guava
Testing costs can be reduced through automated unit test generation. An important benchmark for such tools is their ability to detect real faults. Fault databases, such as Defects4J, assist in this task. The Guava project - a collection of Java libraries from Google - offers an opportunity to expand such databases with additional complex faults. We have identified 11 faults in the Guava project, added them to Defects4J, and assessed the ability of the EvoSuite framework to detect these faults. Ultimately, EvoSuite was able to detect three faults. Analysis of the remaining faults offers lessons in how to improve generation tools. We offer these faults to the community to assist future benchmarking efforts.
Award ID(s):
1657299
NSF-PAR ID:
10047325
Journal Name:
International Symposium on Search Based Software Engineering
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. An important benchmark for test generation tools is their ability to detect real faults. We have identified 16 real faults in Gson—a Java library for manipulating JSON data—and added them to the Defects4J fault database. Tests generated using the EvoSuite framework are able to detect seven faults. Analysis of the remaining faults offers lessons in how to improve generation. We offer these faults to the community to assist future research. 
  2. A number of criteria have been proposed to judge test suite adequacy. While search-based test generation has improved greatly at criteria coverage, the produced suites are still often ineffective at detecting faults. Efficacy may be limited by the single-minded application of one criterion at a time when generating suites - a sharp contrast to human testers, who simultaneously explore multiple testing strategies. We hypothesize that automated generation can be improved by selecting and simultaneously exploring multiple criteria. To address this hypothesis, we have generated multi-criteria test suites, measuring efficacy against the Defects4J fault database. We have found that multi-criteria suites can be up to 31.15% more effective at detecting complex, real-world faults than suites generated to satisfy a single criterion and 70.17% more effective than the default combination of all eight criteria. Given a fixed search budget, we recommend pairing a criterion focused on structural exploration - such as Branch Coverage - with targeted supplemental strategies aimed at the type of faults expected from the system under test. Our findings offer lessons to consider when selecting such combinations. 
  3. The Northeast Cyberteam Program is a collaborative effort across Maine, New Hampshire, Vermont, and Massachusetts that seeks to assist researchers at small and medium-sized institutions in the region with making use of cyberinfrastructure, while simultaneously building the next generation of research computing facilitators. Recognizing that research computing facilitators are frequently in short supply, the program also places intentional emphasis on capturing and disseminating best practices in an effort to enable opportunities to leverage and build on existing solutions whenever practical. The program combines direct assistance to computationally intensive research projects; experiential learning opportunities that pair experienced mentors with students interested in research computing facilitation; sharing of resources and knowledge across large and small institutions; and tools that enable efficient oversight and possible replication of these ideas in other regions. Each project involves a researcher seeking to better utilize cyberinfrastructure in research, a student facilitator, and a mentor with relevant domain expertise. These individuals may be at the same institution or at separate institutions. The student works with the researcher and the mentor to become a bridge between the infrastructure and the research domain. Through this model, students receive training and opportunities that otherwise would not be available, research projects get taken to a higher level, and the effectiveness of the mentor is multiplied. Providing tools to enable self-service learning is a key concept in our strategy to develop facilitators through experiential learning, recognizing that one of the most fundamental skills of successful facilitators is their ability to quickly learn enough about new domains and applications to be able to draw parallels with their existing knowledge and help to solve the problem at hand.
    The Cyberteam Portal is used to access the self-service learning resources developed to provide just-in-time information delivery to participants as they embark on projects in unfamiliar domains, and also serves as a receptacle for best practices, tools, and techniques developed during a project. Tools include Ask.CI, an interactive site for questions and answers; a learning resources repository used to collect online training modules vetted by Cyberteam projects that provide starting points for subsequent projects or independent activities; and a GitHub repository. The Northeast Cyberteam was created with funding from the National Science Foundation, but has developed strategies for sustainable operations.
  4. Summary

    Search-based unit test generation, if effective at fault detection, can lower the cost of testing. Such techniques rely on fitness functions to guide the search. Ultimately, such functions represent test goals that approximate, but do not ensure, fault detection. The need to rely on approximations leads to two questions: can fitness functions produce effective tests and, if so, which should be used to generate tests? To answer these questions, we have assessed the fault-detection capabilities of unit test suites generated to satisfy eight white-box fitness functions on 597 real faults from the Defects4J database. Our analysis has found that the strongest indicators of effectiveness are a high level of code coverage over the targeted class and high satisfaction of a criterion's obligations. Consequently, the branch coverage fitness function is the most effective. Our findings indicate that fitness functions that thoroughly explore system structure should be used as primary generation objectives, supported by secondary fitness functions that explore orthogonal, supporting scenarios. Our results also provide further evidence that future approaches to test generation should focus on attaining higher coverage of private code and better initialization and manipulation of class dependencies.
  5. Learning-based fault localization has been intensively studied recently. Prior studies have shown that traditional Learning-to-Rank techniques can help precisely diagnose fault locations using various dimensions of fault-diagnosis features, such as suspiciousness values computed by various off-the-shelf fault localization techniques. However, with the increasing dimensions of features considered by advanced fault localization techniques, it can be quite challenging for the traditional Learning-to-Rank algorithms to automatically identify effective existing/latent features. In this work, we propose DeepFL, a deep learning approach to automatically learn the most effective existing/latent features for precise fault localization. Although the approach is general, in this work, we collect various suspiciousness-value-based, fault-proneness-based and textual-similarity-based features from the fault localization, defect prediction and information retrieval areas, respectively. DeepFL has been studied on 395 real bugs from the widely used Defects4J benchmark. The experimental results show DeepFL can significantly outperform state-of-the-art TraPT/FLUCCS (e.g., localizing 50+ more faults within Top-1). We also investigate the impacts of deep model configurations (e.g., loss functions and epoch settings) and features. Furthermore, DeepFL is also surprisingly effective for cross-project prediction. 