Localization of the root causes for unreproducible builds during software maintenance is an important yet challenging task, primarily due to limited runtime traces from build processes and high diversity of build environments. To address these challenges, in this paper, we propose RepTrace, a framework that leverages the uniform interfaces of system call tracing for monitoring executed build commands in diverse build environments and identifies the root causes for unreproducible builds by analyzing the system call traces of the executed build commands. Specifically, from the collected system call traces, RepTrace performs causality analysis to build a dependency graph starting from an inconsistent build artifact (across two builds) via two types of dependencies: read/write dependencies among processes and parent/child process dependencies, and searches the graph to find the processes that result in the inconsistencies. To address the challenges of massive noisy dependencies and uncertain parent/child dependencies, RepTrace includes two novel techniques: (1) using differential analysis on multiple builds to reduce the search space of read/write dependencies, and (2) computing similarity of the runtime values to filter out noisy parent/child process dependencies. The evaluation results of RepTrace over a set of real-world software packages show that RepTrace effectively finds not only the root cause commands responsible for the unreproducible builds, but also the files to patch for addressing the unreproducible issues. Among its Top-10 identified commands and files, RepTrace achieves high accuracy rate of 90.00% and 90.56% in identifying the root causes, respectively.
more »
« less
Root cause analysis as a tool for forging shared vision and partnership: Lessons for strengthening the process.
In this Rapid Community Report - Process Reflection, the STEM PUSH Network (Pathways for Underrepresented Students to HigherEd), an NSF INCLUDES Alliance, describes a root cause analysis process used to build the conceptual foundation of the improvement network and establish a shared vision and clear roles for the partnership. Four layers of reflection, including internal evaluation, external evaluation, advisory council review, and an NSF reverse site visit, surfaced the need for and strategies to strengthen equity and youth agency in the root cause analysis process.
more »
« less
- Award ID(s):
- 1930990
- PAR ID:
- 10330899
- Date Published:
- Journal Name:
- NSF INCLUDES Rapid Community Reports
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Misconfiguration is a known and increasingly serious problem in enterprise systems due to frequent code updates and retuning of the configuration parameters. Diagnosing complex, residual misconfiguration problems that lead to inaccessible services or failed transactions often starts with either a user complaint or observation by administrators, followed by a largely manual process of deciding what tests to run and how to proceed with further testing based on the test results. The goal of this paper is to automate this process and thereby make root-cause analysis of accessibility related misconfigurations much speedier and much more effective. We explore a domain-knowledge-driven methodology, called ConfExp using a network emulator that runs real enterprise networking protocols. Thus, by using commonly used tests, we show that the root-cause can be determined in all cases where discriminative tests exist. The methodology also highlights areas where more discriminative tests are needed to pinpoint the precise configuration variables at fault.more » « less
-
The Open ROADM ecosystem enables greater flexibility in deploying optical networks through centralized SDN management and intelligent service orchestration. However, since the maintenance signaling is yet to be standardized and implemented within each device, it is difficult to identify the root cause of issues and manage the network effectively when faults propagate in the network. To tackle this problem, this paper presents a proof-of-concept implementation of an alarm correlation and ticketing system for the multi-vendor Open ROADM ecosystem. The proposed system uses a graph-based method to identify the root cause of alarms and generate tickets for network operations teams. The results of our laboratory tests demonstrate the effectiveness of the proposed system in managing alarms in the multi-vendor Open ROADM ecosystem.more » « less
-
Continuous Integration (CI) allows developers to check whether their code can build successfully and pass tests across various system environments with every commit. To use a CI platform, a developer must provide configuration files within a code repository to specify build conditions. Incorrect configuration settings lead to CI build failures, which can take hours to run, wasting valuable developer time and delaying product release dates. Debugging CI configurations is a slow and error-prone process. The only way to check the correctness of CI configurations is to push a commit and wait for the build result. We present VeriCI, the first system for localizing CI configuration errors at the code level. VeriCI runs as a static analysis tool, before the developer sends the build request to the CI server. Our key insight is that the commit history and the corresponding build histories available in CI environments can be used both for build error prediction and build error localization. We leverage the build history as a labeled dataset to automatically derive customized rules describing correct CI configurations, using supervised machine learning techniques. To more accurately identify root causes, we train a neural network that filters out constraints that are less likely to be connected to the root cause of build failure. We evaluate VeriCI on real world data from GitHub and achieve 91% accuracy of predicting a build failure and correctly identify the root cause in 75% of cases. We also conducted a between-subjects user study with 20 software developers, showing that VeriCI significantly helps users in identifying and fixing errors in CI.more » « less
-
null (Ed.)Software-defined networking (SDN) has emerged as a flexible network architecture for central and programmatic control. Although SDN can improve network security oversight and policy enforcement, ensuring the security of SDN from sophisticated attacks is an ongoing challenge for practitioners. Existing network forensics tools attempt to identify and track such attacks, but holistic causal reasoning across control and data planes remains challenging. We present PicoSDN, a provenance-informed causal observer for SDN attack analysis. PicoSDN leverages fine-grained data and execution partitioning techniques, as well as a unified control and data plane model, to allow practitioners to efficiently determine root causes of attacks and to make informed decisions on mitigating them. We implement PicoSDN on the popular ONOS SDN controller. Our evaluation across several attack case studies shows that PicoSDN is practical for the identification, analysis, and mitigation of SDN attacks.more » « less
An official website of the United States government

