

Title: Multifaceted test suite generation using primary and supporting fitness functions
Dozens of criteria have been proposed to judge testing adequacy. Such criteria are important, as they guide automated generation efforts. Yet, the current use of such criteria in automated generation contrasts with how such criteria are used by humans. For a human, coverage is part of a multifaceted combination of testing strategies. In automated generation, coverage is typically the goal, and a single fitness function is applied at one time. We propose that the key to improving the fault detection efficacy of search-based test generation approaches lies in a targeted, multifaceted approach pairing primary fitness functions that effectively explore the structure of the class under test with lightweight supporting fitness functions that target particular scenarios likely to trigger an observable failure. This report summarizes our findings to date, details the hypothesis of primary and supporting fitness functions, and identifies outstanding research challenges related to multifaceted test suite generation. We hope to inspire new advances in search-based test generation that could benefit our software-powered society.
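
To make the pairing concrete, the following minimal sketch shows one way a primary fitness function could be combined with lightly weighted supporting functions. Every type and name here (TestSuite, FitnessFunction, CombinedFitness, the weights) is an illustrative assumption, not the interface of any particular generation tool.

```java
// A minimal sketch, assuming hypothetical types; nothing here is the
// API of any particular generator.
import java.util.Map;

interface TestSuite { }

interface FitnessFunction {
    /** Lower is better; 0.0 means the goal is fully satisfied. */
    double evaluate(TestSuite suite);
}

final class CombinedFitness implements FitnessFunction {
    private final FitnessFunction primary;                  // e.g., a branch coverage distance
    private final Map<FitnessFunction, Double> supporting;  // lightweight, failure-oriented goals

    CombinedFitness(FitnessFunction primary, Map<FitnessFunction, Double> supporting) {
        this.primary = primary;
        this.supporting = supporting;
    }

    @Override
    public double evaluate(TestSuite suite) {
        // The primary function dominates the score; small weights keep the
        // supporting functions from distracting the structural search.
        double score = primary.evaluate(suite);
        for (Map.Entry<FitnessFunction, Double> entry : supporting.entrySet()) {
            score += entry.getValue() * entry.getKey().evaluate(suite);
        }
        return score;
    }
}
```

Keeping the supporting weights small preserves the primary function's dominant role in driving structural exploration, while still nudging the search toward failure-prone scenarios.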
Award ID(s):
1657299
NSF-PAR ID:
10082004
Author(s) / Creator(s):
Date Published:
Journal Name:
SBST '18 Proceedings of the 11th International Workshop on Search-Based Software Testing
Page Range / eLocation ID:
2 to 5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Search‐based unit test generation, if effective at fault detection, can lower the cost of testing. Such techniques rely on fitness functions to guide the search. Ultimately, such functions represent test goals that approximate—but do not ensure—fault detection. The need to rely on approximations leads to two questions—can fitness functions produce effective tests and, if so, which should be used to generate tests? To answer these questions, we have assessed the fault‐detection capabilities of unit test suites generated to satisfy eight white‐box fitness functions on 597 real faults from the Defects4J database. Our analysis has found that the strongest indicators of effectiveness are a high level of code coverage over the targeted class and high satisfaction of a criterion's obligations. Consequently, the branch coverage fitness function is the most effective. Our findings indicate that fitness functions that thoroughly explore system structure should be used as primary generation objectives—supported by secondary fitness functions that explore orthogonal, supporting scenarios. Our results also provide further evidence that future approaches to test generation should focus on attaining higher coverage of private code and better initialization and manipulation of class dependencies.
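
For readers unfamiliar with the branch coverage fitness function named above, one standard formulation from the search-based testing literature combines an approach level (how many control-dependent branches away execution diverged from the target) with a normalized branch distance. The sketch below illustrates that textbook formula; it is not necessarily the exact function evaluated in this study.

```java
// A sketch of the classic branch-coverage fitness from the search-based
// testing literature: approach level plus normalized branch distance.
// The normalization d / (d + 1) maps any non-negative distance into [0, 1),
// so the branch distance can never outweigh one level of approach.
final class BranchFitness {
    static double fitness(int approachLevel, double branchDistance) {
        return approachLevel + normalize(branchDistance);
    }

    static double normalize(double d) {
        return d / (d + 1.0);
    }

    public static void main(String[] args) {
        // Example: execution diverged two control dependencies above the
        // target branch, and the predicate there missed by 7
        // (e.g., needed x == 10, got x = 3).
        System.out.println(fitness(2, 7.0)); // prints 2.875
    }
}
```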

     
  2. A number of criteria have been proposed to judge test suite adequacy. While search-based test generation has improved greatly at criteria coverage, the produced suites are still often ineffective at detecting faults. Efficacy may be limited by the single-minded application of one criterion at a time when generating suites - a sharp contrast to human testers, who simultaneously explore multiple testing strategies. We hypothesize that automated generation can be improved by selecting and simultaneously exploring multiple criteria. To address this hypothesis, we have generated multi-criteria test suites, measuring efficacy against the Defects4J fault database. We have found that multi-criteria suites can be up to 31.15% more effective at detecting complex, real-world faults than suites generated to satisfy a single criterion and 70.17% more effective than the default combination of all eight criteria. Given a fixed search budget, we recommend pairing a criterion focused on structural exploration - such as Branch Coverage - with targeted supplemental strategies aimed at the type of faults expected from the system under test. Our findings offer lessons to consider when selecting such combinations. 
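
As an illustration of the recommended pairing, the hypothetical selector below anchors generation in Branch Coverage and adds supplemental criteria based on the faults expected from the system under test. The enum values and selection logic are assumptions made for exposition, not any generator's actual configuration API.

```java
// A hypothetical sketch of criterion selection; the enum and the
// selection rules are illustrative only.
import java.util.EnumSet;
import java.util.Set;

enum Criterion { BRANCH, LINE, EXCEPTION, OUTPUT, METHOD, WEAK_MUTATION }

final class CriteriaSelector {
    static Set<Criterion> select(boolean expectsCrashes, boolean expectsWrongOutput) {
        // Always anchor generation in structural exploration.
        Set<Criterion> chosen = EnumSet.of(Criterion.BRANCH);
        if (expectsCrashes) {
            chosen.add(Criterion.EXCEPTION); // target observable failures
        }
        if (expectsWrongOutput) {
            chosen.add(Criterion.OUTPUT);    // diversify observed return values
        }
        return chosen;
    }
}
```

The design point mirrors the finding above: given a fixed search budget, a small, targeted combination can outperform both a single criterion and an undifferentiated combination of all eight.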
  3. Summary

    Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing number of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
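
To give a sense of what such an infrastructure automates, the following deliberately generic sketch shows the shape of a benchmarking loop over multiple generators. Every interface and method name in it is hypothetical and does not reflect JUGE's actual design.

```java
// A generic benchmarking-loop sketch; all types here are hypothetical
// placeholders, not JUGE's interfaces.
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;

interface TestGenerator {
    /** Generate JUnit tests for one class under a fixed time budget. */
    Path generateTests(String classUnderTest, Path classpath, Duration budget);
}

final class BenchmarkHarness {
    void run(List<TestGenerator> generators, List<String> benchmarkClasses,
             Path classpath, Duration budget) {
        for (TestGenerator generator : generators) {
            for (String cut : benchmarkClasses) {
                Path tests = generator.generateTests(cut, classpath, budget);
                // A real harness would then compile and execute the tests,
                // and record coverage and mutation scores for comparison.
                System.out.printf("%s -> %s%n", cut, tests);
            }
        }
    }
}
```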

     
  4. While adequacy criteria offer an end-point for testing, they do not mandate how targets are covered. Branch Coverage may be attained through direct calls to methods or through indirect calls between methods. Automated generation is biased towards the rapid gains offered by indirect coverage. Therefore, even with the same end-goal, humans and automation produce very different tests. Direct coverage may yield tests that are more understandable, and that detect faults missed by traditional approaches. However, the added burden for the generation framework may result in lower coverage, and faults that emerge through method interactions may be missed. To compare the two approaches, we have generated test suites for both, judging efficacy against real faults. We have found that requiring direct coverage results in lower achieved coverage and a lower likelihood of fault detection. However, both forms of Branch Coverage cover code and detect faults that the other does not. By isolating methods, Direct Branch Coverage is less constrained in the choice of input. However, traditional Branch Coverage is able to leverage method interactions to discover faults. Ultimately, both are situationally applicable within the context of a broader testing strategy.
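
A small example clarifies the distinction. In the hypothetical sketch below (using JUnit 5 for illustration), the branch inside abs can be covered directly, by a test that calls abs itself, or indirectly, by a test that reaches it through distance.

```java
// Hypothetical example of direct versus indirect branch coverage.
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

final class MathUtil {
    static int abs(int x) {
        return (x < 0) ? -x : x;   // the branch under test
    }

    static int distance(int a, int b) {
        return abs(a - b);         // indirect route to the same branch
    }
}

class MathUtilTest {
    @Test
    void coversBranchDirectly() {
        assertEquals(5, MathUtil.abs(-5));     // exercises the branch directly
    }

    @Test
    void coversBranchIndirectly() {
        assertEquals(5, MathUtil.distance(2, 7)); // abs reached via distance
    }
}
```

The direct test is free to pick any input to abs, while the indirect test is constrained by distance's behaviour but can expose faults in how the two methods interact.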
  5. Background: Home health aides (HHAs) provide necessary hands-on care to older adults and those with chronic conditions in their homes. Despite their integral role, HHAs experience numerous challenges in their work, including their ability to communicate with other health care professionals about patient care while caring for patients and access to educational resources. Although technological interventions have the potential to address these challenges, little is known about the technological landscape and existing technology-based interventions designed for and used by this workforce.
     Objective: We conducted a scoping review of the scientific literature to identify existing studies that have described, designed, deployed, or tested technology-based tools and apps intended for use by HHAs to care for patients at home. To complement our literature review, we conducted a landscape analysis of existing mobile apps intended for HHAs providing in-home care.
     Methods: We searched the following databases from their inception to October 2020: Ovid MEDLINE, Ovid Embase, Cochrane Library, and CINAHL (EBSCO). A total of 3 researchers screened the yield using prespecified inclusion and exclusion criteria. In addition, 4 researchers independently reviewed these articles, and a fifth researcher arbitrated when needed. Among studies that met the inclusion criteria, data were extracted and summarized narratively. An analysis of mobile health apps designed for HHAs was performed using a predefined set of terms to search the Google Play and Apple App stores. Overall, 2 researchers independently screened the resulting apps, and those that met the inclusion criteria were categorized according to their intended purpose and functionality.
     Results: Of the 8643 studies retrieved, 182 (2.11%) underwent full-text review, and 4.9% (9/182) met our inclusion criteria. Approximately half (4/9, 44%) of the studies were descriptive in nature, proposing technology-based systems (eg, web portals and dashboards) or prototypes without a technical or user-based evaluation of the technology. In most (7/9, 78%) papers, HHAs were just one of several users and not the sole or primary intended users of the technology. Our review of mobile apps yielded 166 Android and iOS apps, of which 48 (29%) met the inclusion criteria. These apps provided HHAs with one or more of the following functions: electronic visit verification (29/48, 60%), clocking in and out (23/48, 48%), documentation (22/48, 46%), task checklists (19/48, 40%), communication between HHA and agency (14/48, 29%), patient information (6/48, 13%), resources (5/48, 10%), and communication between HHA and patients (4/48, 8%). Of the 48 apps, 25 (52%) performed monitoring functions, 4 (8%) performed supporting functions, and 19 (40%) performed both.
     Conclusions: A limited number of studies and mobile apps have been designed to support HHAs in their work. Further research and rigorous evaluation of technology-based tools are needed to assess their impact on the work HHAs perform in patients' homes.