Abstract Many critical codebases are written in C, and most of them use preprocessor directives to encode variability, effectively encoding software product lines. These preprocessor directives, however, challenge any static code analysis. SPLlift, a previously presented approach for analyzing software product lines, is limited to Java programs that use a rather simple feature encoding and to analysis problems with a finite and ideally small domain. Other approaches that allow the analysis of real-world C software product lines use special-purpose analyses, preventing the reuse of existing analysis infrastructures and ignoring the progress made by the static analysis community. This work presents VarAlyzer , a novel static analysis approach for software product lines. VarAlyzer first transforms preprocessor constructs to plain C while preserving their variability and semantics. It then solves any given distributive analysis problem on transformed product lines in a variability-aware manner. VarAlyzer ’s analysis results are annotated with feature constraints that encode in which configurations each result holds. Our experiments with 95 compilation units of OpenSSL show that applying VarAlyzer enables one to conduct inter-procedural, flow-, field- and context-sensitive data-flow analyses on entire product lines for the first time, outperforming the product-based approach for highly-configurable systems.
SugarC: scalable desugaring of real-world preprocessor usage into pure C
Variability-aware analysis is critical for ensuring the quality of configurable C software. An important step toward the development of variability-aware analysis at scale is to transform real-world C software that uses both C and preprocessor into pure C code, by replacing the preprocessor's compile-time variability with C's runtime-variability. In this work, we design and implement a desugaring tool, SugarC, that transforms away real-world preprocessor usage. SugarC augments C's formal grammar specification with translation rules, performs simultaneous type checking during desugaring, and introduces numerous optimizations to address challenges that appear in real-world preprocessor usage. The experiments on DesugarBench, a benchmark consisting of 108 manually-created programs, show that SugarC supports many more language features than two existing desugaring tools. When applied on three real-world configurable C software, SugarC desugared 774 out of 813 files in the three programs, taking at most ten minutes in the worst case and less than two minutes for 95% of the C files.
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- ICSE '22: Proceedings of the 44th International Conference on Software Engineering
- Page Range or eLocation-ID:
- 2056 to 2067
- Sponsoring Org:
- National Science Foundation
More Like this
Toward detection and characterization of variability bugs in configurable C software: an empirical studyVariability in C software is a useful tool, but critical bugs that only exist in certain configurations are easily missed by conventional debugging techniques. Even with a small number of features, the configuration space of configurable software is too large to analyze exhaustively. Variability-aware static analysis for bug detection is being developed, but remains at too early a stage to be fully usable in real-world C programs. In this work, we present a methodology of finding variability bugs by combining variability-oblivious bug detectors, static analysis of build processes, and dynamic feature interaction inference. We further present an empirical study in which we test our methodology on two highly configurable C programs. We found our methodology to be effective, finding 88 true bugs between the two programs, of which 64 were variability bugs.
Many critical software systems developed in C utilize compile-time configurability. The many possible configurations of this software make bug detection through static analysis difficult. While variability-aware static analyses have been developed, there remains a gap between those and state-of-the-art static bug detection tools. In order to collect data on how such tools may perform and to develop real-world benchmarks, we present a way to leverage configuration sampling, off-the-shelf “variability-oblivious” bug detectors, and automatic feature identification techniques to simulate a variability-aware analysis. We instantiate our approach using four popular static analysis tools on three highly configurable, real-world C projects, obtaining 36,061 warnings, 80% of which are variability warnings. We analyze the warnings we collect from these experiments, finding that most results are variability warnings of a variety of kinds such as NULL dereference. We then manually investigate these warnings to produce a benchmark of 77 confirmed true bugs (52 of which are variability bugs) useful for future development of variability-aware analyses.
Obeid, Iyad Selesnick (Ed.)Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy . As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is a great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency . However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours . Some commercial tools recently claim to reach such performance levels, including the Olympic Brainz Monitor  and Persyst 14 . In this abstract, we describe our efforts to transform a high-performance offline seizure detection system  into a low latency real-time or online seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online versus offline system is that an online system should always be causal and has minimum latency which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing . The channel-based long short term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficients (LFCC)  features from each EEGmore »
Highly-configurable systems written in C form our most critical computing infrastructure. The preprocessor is integral to C, because conditional compilation enables such systems to produce efficient object code. However, the preprocessor makes code harder to reason about for both humans and tools. Previous approaches to this challenge developed new program analyses for unpreprocessed source code or developed new languages and constructs to replace the preprocessor. But having specialpurpose analyses means maintaining a new toolchain, while new languages face adoption challenges and do not help with existing software. We propose the best of worlds: eliminate the preprocessor but preserve its benefits. Our design replaces preprocessor usage with C itself, augmented with syntax-preserving, backwards-compatible dependent types. We discuss automated conditional compilation to replicate preprocessor performance. Our approach opens new directions for research into new compiler optimizations, dependent types for configurable software, and automated translation away from preprocessor use.