Title: Efficient dependency detection for safe Java test acceleration
Slow builds remain a plague for software developers. The frequency with which code can be built (compiled, tested, and packaged) directly impacts developer productivity: longer build times mean a longer wait before determining whether a change to the application was successful. We have discovered that for some languages, such as Java, the majority of build time is spent running tests, and that dependencies between individual tests are complicated to discover, making many existing test acceleration techniques unsound to deploy in practice. Without knowing which tests depend on others, we cannot safely parallelize test execution, nor can we perform incremental testing (i.e., execute only a subset of an application's tests for each build). Previous techniques for detecting these dependencies did not scale to large test suites: given a test suite that normally ran in two hours, the best-case scenario for the previous tool would have taken over 422 CPU days to find dependencies between all test methods (and would not soundly find all dependencies); on the same project, the exhaustive technique (to find all dependencies) would have taken over 1e300 years. We present a novel approach to detecting all dependencies between test cases in large projects that can enable safe exploitation of parallelism and test selection with a modest analysis cost.
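To make the notion of a test dependency concrete, here is a minimal Java sketch. It is not the approach presented in the paper: it is a naive reordering check, and the class `OrderDependencyDemo`, the shared `cache` field, and both "tests" are hypothetical names invented for illustration. Two tests share a static field, and running them in both orders shows that one test's outcome depends on whether the other ran first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

public class OrderDependencyDemo {
    // Hypothetical shared state: a static field that outlives a single test.
    static final List<String> cache = new ArrayList<>();

    // "Test" A writes to the shared state.
    static boolean testAdd() {
        cache.add("x");
        return cache.contains("x");
    }

    // "Test" B implicitly assumes nothing has written to the cache yet.
    static boolean testStartsEmpty() {
        return cache.isEmpty();
    }

    // Runs the named tests in the given order from a clean state
    // and records whether each one passed.
    static Map<String, Boolean> runInOrder(List<String> order) {
        cache.clear();
        Map<String, BooleanSupplier> tests = new LinkedHashMap<>();
        tests.put("testAdd", OrderDependencyDemo::testAdd);
        tests.put("testStartsEmpty", OrderDependencyDemo::testStartsEmpty);
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String name : order) {
            results.put(name, tests.get(name).getAsBoolean());
        }
        return results;
    }

    // Naive check: a test is order-dependent if its outcome differs
    // between the two possible orders of this two-test suite.
    static boolean isOrderDependent(String name) {
        boolean first = runInOrder(List.of("testAdd", "testStartsEmpty")).get(name);
        boolean second = runInOrder(List.of("testStartsEmpty", "testAdd")).get(name);
        return first != second;
    }

    public static void main(String[] args) {
        System.out.println("testStartsEmpty order-dependent: "
                + isOrderDependent("testStartsEmpty"));
        System.out.println("testAdd order-dependent: "
                + isOrderDependent("testAdd"));
    }
}
```

With two tests there are only two orders to try. Checking every order of a suite with n tests would require n! runs, which gives a sense of why exhaustive reordering does not scale to large suites and why running such tests in parallel or in isolation can silently change their results.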
Award ID(s):
1302269 1161079
NSF-PAR ID:
10112169
Journal Name:
10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE)
Page Range / eLocation ID:
770 to 781
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit-testing of libraries versus system-testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one best suited to their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large-scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search-based, random-based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing number of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
  2. Enterprise software updates depend on the interaction between user and developer organizations. This interaction becomes especially complex when a single developer organization writes software that services hundreds of different user organizations. Miscommunication during patching and deployment efforts leads to insecure or malfunctioning software installations. While developers oversee the code, the update process starts and ends outside their control. Since developer test suites may fail to capture buggy behavior, finding and fixing these bugs starts with user-generated bug reports and third-party disclosures. The process ends when the fixed code is deployed in production. Any friction between user and developer results in a delay in patching critical bugs. Two common causes of friction are a failure to replicate the user-specific circumstances that cause buggy behavior and incompatible software releases that break critical functionality. Existing test generation techniques are insufficient: they fail to test candidate patches for post-deployment bugs and to test whether the new release adversely affects customer workloads. With existing test generation and deployment techniques, users cannot choose (or validate) compatible portions of new versions while retaining their previous version's functionality. We present two new technologies to alleviate this friction. First, Test Generation for Ad Hoc Circumstances transforms buggy executions into test cases. Second, Binary Patch Decomposition allows users to select the compatible pieces of update releases. By sharing specific context around buggy behavior, developers can create specific test cases that demonstrate whether their fixes are appropriate. When fixes are distributed with this extra context, users can incorporate only updates that guarantee compatibility between buggy and fixed versions. We use change analysis in combination with binary rewriting to transform the old executable and buggy execution into a test case including the developer's prospective changes, which lets us generate and run targeted tests for the candidate patch. We also provide analogous support to users, to selectively validate and patch their production environments with only the desired bug fixes from new version releases. This paper presents a new patching workflow that allows developers to validate prospective patches and users to select which updates they would like to apply, along with two new technologies that make it possible. We demonstrate that our technique constructs test cases more effectively and more efficiently than traditional test case generation on a collection of real-world bugs, and provides the ability for flexible updates in real-world scenarios.
  3. Unit testing focuses on verifying the functions of individual units of a software system. It is challenging due to the high interdependencies among software units. Developers address this by mocking, that is, replacing the dependency with a “fake” object. Despite the existence of powerful, dedicated mocking frameworks, developers often turn to a “hand-rolled” approach: inheritance. That is, they create a subclass of the dependent class and mock its behavior through method overriding. However, this requires tedious implementation and compromises the design quality of unit tests. This work contributes a fully automated refactoring framework to identify and replace the usage of inheritance with Mockito, a well-received mocking framework. Our approach is built upon the empirical experience from five open-source projects that use inheritance for mocking. We evaluate our approach on nine other projects. Results show that our framework is efficient, generally applicable to new datasets, mostly preserves test case behaviors in detecting defects (in the form of mutants), and decouples test code from production code. The qualitative evaluation by experienced developers suggests that the auto-refactoring solutions generated by our framework improve the quality of the unit test cases in various aspects, such as making test conditions more explicit and improving the cohesion, readability, understandability, and maintainability of test cases. Finally, we submit 23 pull requests containing our refactoring solutions to the open-source projects. Of these, 9 requests are accepted/merged and 6 are rejected; the remaining requests are pending (5 requests), have unexpected exceptions (2 requests), or are undecided (1 request). In particular, among the 21 open-source developers involved in the reviewing process, 81% give positive votes. This indicates that our refactoring solutions are quite well received by the open-source projects and developers.
  4. Unit testing focuses on verifying the functions of individual units of a software system. It is challenging due to the high interdependencies among software units. Developers address this by mocking, that is, replacing the dependency with a "faked" object. Despite the existence of powerful, dedicated mocking frameworks, developers often turn to a "hand-rolled" approach: inheritance. That is, they create a subclass of the dependent class and mock its behavior through method overriding. However, this requires tedious implementation and compromises the design quality of unit tests. This work contributes a fully automated refactoring framework to identify and replace the usage of inheritance with Mockito, a well-received mocking framework. Our approach is built upon the empirical experience from five open-source projects that use inheritance for mocking. We evaluate our approach on four other projects. Results show that our framework is efficient, generally applicable to new datasets, mostly preserves test case behaviors in detecting defects (in the form of mutants), and decouples test code from production code. The qualitative evaluation by experienced developers suggests that the auto-refactoring solutions generated by our framework improve the quality of the unit test cases in various aspects, such as making test conditions more explicit and improving the cohesion, readability, understandability, and maintainability of test cases.
  5.
    The Amundsen Sea sector of Antarctica has long been considered the most vulnerable part of the West Antarctic Ice Sheet (WAIS) because of the great water depth at the grounding line and the absence of substantial ice shelves. Glaciers in this configuration are thought to be susceptible to rapid or runaway retreat. Ice flowing into the Amundsen Sea Embayment is undergoing the most rapid changes of any sector of the Antarctic Ice Sheet outside the Antarctic Peninsula, including changes caused by substantial grounding-line retreat over recent decades, as observed from satellite data. Recent models suggest that a threshold leading to the collapse of WAIS in this sector may have been already crossed and that much of the ice sheet could be lost even under relatively moderate greenhouse gas emission scenarios. Drill cores from the Amundsen Sea provide tests of several key questions about controls on ice sheet stability. The cores offer a direct record of glacial history offshore from a drainage basin that receives ice exclusively from the WAIS, which allows clear comparisons between the WAIS history and low-latitude climate records. Today, warm Circumpolar Deep Water (CDW) is impinging onto the Amundsen Sea shelf and causing melting of the underside of the WAIS in most places. Reconstructions of past CDW intrusions can assess the ties between warm water upwelling and large-scale changes in past grounding-line positions. Carrying out these reconstructions offshore from the drainage basin that currently has the most substantial negative mass balance of ice anywhere in Antarctica is thus of prime interest to future predictions. The scientific objectives for this expedition are built on hypotheses about WAIS dynamics and related paleoenvironmental and paleoclimatic conditions. The main objectives are 1. To test the hypothesis that WAIS collapses occurred during the Neogene and Quaternary and, if so, when and under which environmental conditions; 2. 
To obtain ice-proximal records of ice sheet dynamics in the Amundsen Sea that correlate with global records of ice-volume changes and proxy records for atmospheric and ocean temperatures; 3. To study the stability of a marine-based WAIS margin and how warm deep-water incursions control its position on the shelf; 4. To find evidence for earliest major grounded WAIS advances onto the middle and outer shelf; 5. To test the hypothesis that the first major WAIS growth was related to the uplift of the Marie Byrd Land dome. International Ocean Discovery Program (IODP) Expedition 379 completed two very successful drill sites on the continental rise of the Amundsen Sea. Site U1532 is located on a large sediment drift, now called Resolution Drift, and penetrated to 794 m with 90% recovery. We collected almost-continuous cores from the Pleistocene through the Pliocene and into the late Miocene. At Site U1533, we drilled 383 m (70% recovery) into the more condensed sequence at the lower flank of the same sediment drift. The cores of both sites contain unique records that will enable study of the cyclicity of ice sheet advance and retreat processes as well as bottom-water circulation and water mass changes. In particular, Site U1532 revealed a sequence of Pliocene sediments with an excellent paleomagnetic record for high-resolution climate change studies of the previously sparsely sampled Pacific sector of the West Antarctic margin. Despite the drilling success at these sites, the overall expedition experienced three unexpected difficulties that affected many of the scientific objectives: 1. The extensive sea ice on the continental shelf prevented us from drilling any of the proposed shelf sites. 2. The drill sites on the continental rise were in the path of numerous icebergs of various sizes that frequently forced us to pause drilling or leave the hole entirely as they approached the ship. The overall downtime caused by approaching icebergs was 50% of our time spent on site. 
3. An unfortunate injury to a member of the ship's crew cut the expedition short by one week. Recovery of core on the continental rise at Sites U1532 and U1533 cannot be used to precisely indicate the position of ice or retreat of the ice sheet on the shelf. However, these sediments contained in the cores offer a range of clues about past WAIS extent and retreat. At Sites U1532 and U1533, coarse-grained sediments interpreted to be ice-rafted debris (IRD) were identified throughout all recovered time periods. A dominant feature of the cores is recorded by lithofacies cyclicity, which is interpreted to represent relatively warmer periods variably characterized by higher microfossil abundance, greater bioturbation, and higher counts of IRD alternating with colder periods characterized by dominantly gray laminated terrigenous muds. Initial comparison of these cycles to published records from the region suggests that the units interpreted as records of warmer time intervals in the core tie to interglacial periods and the units interpreted as deposits of colder periods tie to glacial periods. The cores from the two drill sites recovered sediments of purely terrigenous origin intercalated or mixed with pelagic or hemipelagic deposits. In particular, Site U1533, which is located near a deep-sea channel originating from the continental slope, contains graded sands and gravel transported downslope from the shelf to the abyssal plain. The channel is likely the path of such sediments transported downslope by turbidity currents or other sediment-gravity flows. The association of lithologic facies at both sites predominantly reflects the interplay of downslope and contouritic sediment supply with occasional input of more pelagic sediment. 
Despite the lack of cores from the shelf, our records from the continental rise reveal the timing of glacial advances across the shelf and thus the existence of a continent-wide ice sheet in West Antarctica at least during longer time periods since the late Miocene. Cores from both sites contain abundant coarse-grained sediments and clasts of plutonic origin transported either by downslope processes or by ice rafting. If detailed provenance studies confirm our preliminary assessment that the origin of these samples is from the plutonic bedrock of Marie Byrd Land, their thermochronological record will potentially reveal timing and rates of denudation and erosion linked to crustal uplift. The chronostratigraphy of both sites enables the generation of a seismic sequence stratigraphy not only for the Amundsen Sea rise but also for the western Amundsen Sea along the Marie Byrd Land margin through a connecting network of seismic lines. 
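The hand-rolled, inheritance-based mocking described in items 3 and 4 above can be sketched in plain Java as follows. This is only an illustration under assumed names: `PaymentGateway`, `OrderService`, and `AlwaysApprovingGateway` are hypothetical and do not come from the listed papers or their studied projects. The test double is a subclass that overrides the dependency's method, which is the pattern those papers propose replacing with a mocking framework.

```java
public class InheritanceMockDemo {
    // Hypothetical production dependency that a unit test must not really call.
    static class PaymentGateway {
        boolean charge(String account, int cents) {
            throw new IllegalStateException("would contact a real payment service");
        }
    }

    // Hypothetical unit under test; it receives its dependency via the constructor.
    static class OrderService {
        private final PaymentGateway gateway;

        OrderService(PaymentGateway gateway) {
            this.gateway = gateway;
        }

        String placeOrder(String account, int cents) {
            return gateway.charge(account, cents) ? "CONFIRMED" : "DECLINED";
        }
    }

    // Hand-rolled mock: subclass the dependency and override its behavior.
    static class AlwaysApprovingGateway extends PaymentGateway {
        int calls = 0; // records interactions so the test can check them

        @Override
        boolean charge(String account, int cents) {
            calls++;
            return true;
        }
    }

    public static void main(String[] args) {
        AlwaysApprovingGateway fake = new AlwaysApprovingGateway();
        OrderService service = new OrderService(fake);
        System.out.println(service.placeOrder("acct-1", 500));
        System.out.println("charge() calls: " + fake.calls);
    }
}
```

With Mockito, the subclass would typically be replaced by `mock(PaymentGateway.class)`, stubbed with `when(...).thenReturn(true)` and checked with `verify(...)`, so the test no longer carries its own subclass of production code.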