                            Root Cause Localization for Unreproducible Builds via Causality Analysis Over System Call Tracing
                        
                    
    
            Localizing the root causes of unreproducible builds during software maintenance is an important yet challenging task, primarily because build processes expose limited runtime traces and build environments are highly diverse. To address these challenges, we propose RepTrace, a framework that uses the uniform interface of system call tracing to monitor executed build commands in diverse build environments, and that identifies the root causes of unreproducible builds by analyzing the system call traces of those commands. Specifically, from the collected system call traces, RepTrace performs causality analysis to build a dependency graph that starts from a build artifact that is inconsistent across two builds. The graph uses two types of dependencies: read/write dependencies among processes and parent/child process dependencies. RepTrace then searches the graph to find the processes that cause the inconsistencies. To address the challenges of massive noisy dependencies and uncertain parent/child dependencies, RepTrace includes two novel techniques: (1) using differential analysis on multiple builds to reduce the search space of read/write dependencies, and (2) computing the similarity of runtime values to filter out noisy parent/child process dependencies. The evaluation of RepTrace over a set of real-world software packages shows that RepTrace effectively finds not only the root cause commands responsible for the unreproducible builds, but also the files to patch to fix the unreproducibility. Among its Top-10 identified commands and files, RepTrace achieves accuracy rates of 90.00% and 90.56%, respectively, in identifying the root causes.
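
The backward search described in the abstract can be illustrated with a minimal sketch. This is not RepTrace's implementation: the `(pid, op, path)` trace format, the function name `candidate_processes`, and the `differing_files` parameter are assumptions made for illustration, and the sketch ignores event ordering and the runtime-value similarity filter.

```python
from collections import deque


def candidate_processes(events, parents, artifact, differing_files=None):
    """Walk backwards from an inconsistent artifact to candidate processes.

    events: list of (pid, op, path), op is "read" or "write" (hypothetical format)
    parents: dict mapping a child pid to its parent pid
    differing_files: optional set of paths that differ across two builds;
        when given, reads of any other file are pruned (a stand-in for
        the differential analysis mentioned in the abstract)
    """
    writers = {}  # path -> pids that wrote it
    reads = {}    # pid -> paths it read
    for pid, op, path in events:
        if op == "write":
            writers.setdefault(path, set()).add(pid)
        elif op == "read":
            reads.setdefault(pid, set()).add(path)

    seen_files = {artifact}
    seen_pids = set()
    queue = deque([artifact])
    while queue:
        path = queue.popleft()
        for pid in writers.get(path, ()):
            # Read/write dependency: pid wrote this file.
            # Parent/child dependency: also visit every ancestor of pid.
            current = pid
            while current is not None and current not in seen_pids:
                seen_pids.add(current)
                for src in reads.get(current, ()):
                    if differing_files is not None and src not in differing_files:
                        continue
                    if src not in seen_files:
                        seen_files.add(src)
                        queue.append(src)
                current = parents.get(current)
    return seen_pids


# Hypothetical trace: pid 2 compiles a.c, pid 3 links app, pid 4 is unrelated.
trace = [
    (2, "read", "a.c"), (2, "write", "a.o"),
    (3, "read", "a.o"), (3, "write", "app"),
    (4, "read", "date.txt"), (4, "write", "unrelated"),
]
process_tree = {2: 1, 3: 1, 4: 1}
suspects = candidate_processes(trace, process_tree, "app")
```

In this toy trace the unrelated process 4 is never reached, because nothing on the path back from `app` depends on a file it wrote.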
- Award ID(s): 1816615
- PAR ID: 10190934
- Date Published:
- Journal Name: 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Build systems are responsible for building software correctly and quickly. Unfortunately, traditional build tools like make are correct and fast only when developers precisely enumerate dependencies for every incremental build step. Forward build systems improve correctness over traditional build tools by discovering dependencies automatically, but existing forward build tools have two fundamental flaws. First, they are incorrect; existing forward build tools miss dependencies because their models of system state are incomplete. Second, they rely on users to manually specify incremental build steps, increasing the programmer burden for fast builds. This paper introduces Riker, a forward build system that guarantees fast, correct builds. Riker builds are easy to specify; in many cases a single command such as gcc *.c suffices. From these simple specifications, Riker automatically discovers fast incremental rebuild opportunities. Riker models the entire POSIX filesystem: not just files, but directories, pipes, and so on. This model guarantees that every dependency is checked on every build so every output is correct. We use Riker to build 14 open source packages including LLVM and memcached. Riker incurs a median overhead of 8.8% on the initial full build. On average, Riker's incremental builds realize 94% of make's incremental speedup with no manual effort and no risk of errors.
- 
            Continuous Integration (CI) allows developers to check whether their code can build successfully and pass tests across various system environments with every commit. To use a CI platform, a developer must provide configuration files within a code repository to specify build conditions. Incorrect configuration settings lead to CI build failures, which can take hours to run, wasting valuable developer time and delaying product release dates. Debugging CI configurations is a slow and error-prone process. The only way to check the correctness of CI configurations is to push a commit and wait for the build result. We present VeriCI, the first system for localizing CI configuration errors at the code level. VeriCI runs as a static analysis tool, before the developer sends the build request to the CI server. Our key insight is that the commit history and the corresponding build histories available in CI environments can be used both for build error prediction and build error localization. We leverage the build history as a labeled dataset to automatically derive customized rules describing correct CI configurations, using supervised machine learning techniques. To more accurately identify root causes, we train a neural network that filters out constraints that are less likely to be connected to the root cause of a build failure. We evaluate VeriCI on real-world data from GitHub, where it achieves 91% accuracy in predicting a build failure and correctly identifies the root cause in 75% of cases. We also conducted a between-subjects user study with 20 software developers, showing that VeriCI significantly helps users identify and fix errors in CI.
- 
            In recent years, we have been enhancing and updating gem5’s GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the software and libraries needed to run the GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box, without having to install the appropriate ROCm software and libraries themselves. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, in an effort to provide sufficient coverage, the current testing support for GPU tests requires significant runtime for both the nightly and weekly regression tests. Currently most of these regression tests exercise the GPU SE mode support, since GPU FS mode support is still nascent. Unfortunately, much of this time is spent parsing input files to create arrays and other data structures that the GPU subsequently computes on. Although SE mode does not simulate the system calls needed to read these input files, this parsing still represents a significant overhead that increases runtime and prevents other tests (potentially providing additional coverage) from being run in the same timeframe. To address this, in this work we exploit the fact that SE mode does not model system calls to speed up the GPU regression tests. Specifically, we redesign the input reading phase of these GPU tests to create and use mmap’d files for their input arrays (which SE mode completes all at once) instead of reading in the files entry by entry. In doing so, we see runtime reductions of at least 29%.
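
The difference between reading an input array entry by entry and mapping the whole file at once can be sketched as follows. This is only an illustration of the access pattern, not the gem5 test code: the file layout (raw little-endian 32-bit floats) and both function names are assumptions, and Python's buffered I/O means the first function does not issue one system call per entry the way an unbuffered reader would.

```python
import mmap
import struct


def read_entry_by_entry(path):
    """Baseline: one small read and one decode per array element."""
    values = []
    with open(path, "rb") as f:
        while chunk := f.read(4):
            values.append(struct.unpack("<f", chunk)[0])
    return values


def read_with_mmap(path):
    """Map the whole (non-empty) file once, then decode all floats together."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = len(mapped) // 4
            return list(struct.unpack(f"<{count}f", mapped[:count * 4]))
```

Both functions return the same list for the same file; the second replaces the per-entry loop with a single mapping, which mirrors the redesign of the input reading phase that the abstract describes.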