skip to main content


Title: Resurgence of Regression Test Selection for C++
Regression testing - running available tests after each project change - is widely practiced in industry. Despite its widespread use and importance, regression testing is a costly activity. Regression test selection (RTS) optimizes regression testing by selecting only tests affected by project changes. RTS has been extensively studied and several tools have been deployed in large projects. However, work on RTS over the last decade has mostly focused on languages with abstract computing machines(e.g., JVM). Meanwhile development practices (e.g., frequency of commits, testing frameworks, compilers) in C++ projects have dramatically changed and the way we should design and implement RTS tools and the benefits of those tools is unknown. We present a design and implementation of an RTS technique, dubbed RTS++, that targets projects written in C++, which compile to LLVM IR and use the Google Test testing framework. RTS++ uses static analysis of a function call graph to select tests. RTS++ integrates with many existing build systems, including AutoMake, CMake, and Make. We evaluated RTS++ on 11 large open-source projects, totaling 3,811,916 lines of code. To the best of our knowledge, this is the largest evaluation of an RTS technique for C++. We measured the benefits of RTS++compared to running all available tests (i.e., retest-all). Our results show that RTS++ reduces the number of executed tests and end-to-end testing time by 88% and 61% on average.  more » « less
Award ID(s):
1703637
NSF-PAR ID:
10099901
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
12th IEEE Conference on Software Testing, Validation, and Verification (ICST 2019)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Regression testing - rerunning tests at each code version to detect newly-broken functionality - is important and widely practiced. But regression testing is costly due to the large number of tests and high frequency of code changes. Regression test selection (RTS) optimizes regression testing by rerunning only a subset of tests that can be affected by code changes. Researchers showed that RTS based on dynamic and static program analysis can save substantial testing time for (medium-sized) open-source projects. Simultaneously, practitioners showed that RTS based on machine learning (ML) is lightweight and works well on very large software repositories, e.g., in Facebook’s monorepository. We combine analysis-based RTS and ML-based RTS by using ML-based RTS to choose a subset of tests selected by analysis-based RTS. To do so, we first design several novel ML-based RTS techniques that leverage mutation analysis to obtain a training set for learning the impact of code changes on test outcomes. Then, we empirically evaluate, using 10 projects, the benefits of combining various ML models with analysis-based RTS. We also compare combining the techniques with using each technique individually. Combining ML-based RTS with two analysis-based RTS techniques - Ekstazi and STARTS - selects 25.34% and 21.44% fewer tests. 
    more » « less
  2. Regression testing is an important but expensive activity in software development. Among various types of tests, web service tests are usually one of the most expensive (due to network communications) but widely adopted types of tests in commercial software development. Regression test selection (RTS) aims to reduce the number of tests which need to be retested by only running tests that are affected by code changes. Although a large number of RTS techniques have been proposed in the past few decades, these techniques have not been adopted on large-scale web service testing. This is because most existing RTS techniques either require direct code dependency between tests and code under test or cannot be applied on large scale systems with enough efficiency. In this paper, we present a novel RTS technique, TestSage, that performs RTS for web service tests on large scale commercial software. With a small overhead, TestSage is able to collect fine grained (function level) dependency between test and service under test that do not directly depend on each other. TestSage has also been successfully applied to large complex systems with over a million functions. We conducted experiments of TestSage on a large scale backend service at Google. Experimental results show that TestSage reduces 34% of testing time when running all AEC (Analysis, Execution and Collection) phases, 50% of testing time while running without collection phase. TestSage has been integrated with internal testing framework at Google and runs day-to-day at the company. 
    more » « less
  3. Regression testing---rerunning tests on each code version to detect newly-broken functionality---is important and widely practiced. But, regression testing is costly due to the large number of tests and the high frequency of code changes. Regression test selection (RTS) optimizes regression testing by only rerunning a subset of tests that can be affected by changes. Researchers showed that RTS based on program analysis can save substantial testing time for (medium-sized) open-source projects. Practitioners also showed that RTS based on machine learning (ML) works well on very large code repositories, e.g., in Facebook's monorepository. We combine analysis-based RTS and ML-based RTS by using the latter to choose a subset of tests selected by the former. We first train several novel ML models to learn the impact of code changes on test outcomes using a training dataset that we obtain via mutation analysis. Then, we evaluate the benefits of combining ML models with analysis-based RTS on 10 projects, compared with using each technique alone. Combining ML-based RTS with two analysis-based RTS techniques-Ekstazi and STARTS-selects 25.34% and 21.44% fewer tests, respectively. 
    more » « less
  4. Regression test selection (RTS) speeds up regression testing by only re-running tests that might be affected by code changes. Ideal RTS safely selects all affected tests and precisely selects only affected tests. But, aiming for this ideal is often slower than re-running all tests. So, recent RTS techniques use program analysis to trade precision for speed, i.e., lower regression testing time, or even use machine learning to trade safety for speed. We seek to make recent analysis-based RTS techniques more precise, to further speed up regression testing. Independent studies suggest that these techniques reached a “performance wall” in the speed-ups that they provide. We manually inspect code changes to discover those that do not require re-running tests that are only affected by such changes. We categorize 29 kinds of changes that we find from five projects into 13 findings, 11 of which are semantics-modifying. We enhance two RTS techniques—Ekstazi and STARTS—to reason about our findings. Using 1,150 versions of 23 projects, we evaluate the impact on safety and precision of leveraging such changes. We also evaluate if our findings from a few projects can speed up regression testing in other projects. The results show that our enhancements are effective and they can generalize. On average, they result in selecting 41.7% and 31.8% fewer tests, and take 33.7% and 28.7% less time than Ekstazi and STARTS, respectively, with no loss in safety. 
    more » « less
  5. Regression test selection (RTS) speeds up regression testing by only re-running tests that might be affected by code changes. Ideal RTS safely selects all affected tests and precisely selects only affected tests. But, aiming for this ideal is often slower than re-running all tests. So, recent RTS techniques use program analysis to trade precision for speed, i.e., lower regression testing time, or even use machine learning to trade safety for speed. We seek to make recent analysis-based RTS techniques more precise, to further speed up regression testing. Independent studies suggest that these techniques reached a “performance wall” in the speed-ups that they provide. We manually inspect code changes to discover those that do not require re-running tests that are only affected by such changes. We categorize 29 kinds of changes that we found from five projects into 13 findings, 11 of which are semantics-modifying. We enhance two RTS techniques—Ekstazi and STARTS—to reason about our findings. Using 1,150 versions of 23 projects, we evaluate the impact on safety and precision of leveraging such changes. We also evaluate if our findings from a few projects can speed up regression testing in other projects. The results show that our enhancements are effective and they can generalize. On average, they result in selecting 41.7% and 31.8% fewer tests, and take 33.7% and 28.7% less time than Ekstazi and STARTS, respectively, with no loss in safety. 
    more » « less