skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Low-Level Reachability Analysis Based on Formal Logic
Reachability is an important problem in program analysis. Automatically being able to show that – and how – a certain state is reachable, can be used to detect bugs and vulnerabilities. Various research has focused on formalizing a program logic that connects preconditions to post-conditions in the context of reachability analysis, e.g., must+, Lisbon Triples, and Outcome Logic. Outcome Logic and its variants can be seen as an adaptation of Hoare Logic and Incorrectness Logic. In this paper, we aim to study 1.) how such a formal reachability logic can be used for automated precondition generation, and 2.) how it can be used to reason over low-level assembly code. Automated precondition generation for reachability logic enables us to find inputs that provably trigger an assertion (i.e., a post-condition). Motivation for focusing on low-level code is that low-level code accurately describes actual program behavior, can be targeted in cases where source code is unavailable, and allows reasoning over low-level properties like return pointer integrity. An implementation has been developed, and the entire system is proven to be sound and complete (the latter only in the absence of unresolved indirections) in the Isabelle/HOL theorem prover. Initial results are obtained on litmus tests and case studies. The results expose limitations: traversal may not terminate, and more scalability would require a compositional approach. However, the results show as well that precondition generation based on low-level reachability logic allows exposing bugs in low-level code.  more » « less
Award ID(s):
2234257
PAR ID:
10500431
Author(s) / Creator(s):
; ; ;
Editor(s):
Prevosto, Virgile; Seceleanu, Cristina
Publisher / Repository:
Springer, Cham
Date Published:
Journal Name:
Lecture notes in computer science
ISSN:
1611-3349
ISBN:
978-3-031-38828-6
Format(s):
Medium: X
Location:
Tento, Italy
Sponsoring Org:
National Science Foundation
More Like this
  1. Automated Program Repair (APR) is one of the most recent advances in automated debugging, and can directly fix buggy programs with minimal human intervention. Although various advanced APR techniques (including search-based or semantic-based ones) have been proposed, they mainly work at the source-code level and it is not clear how bytecode-level APR performs in practice. Also, empirical studies of the existing techniques on bugs beyond what has been reported in the original papers are rather limited. In this paper, we implement the first practical bytecode-level APR technique, PraPR, and present the first extensive study on fixing real-world bugs (e.g., Defects4J bugs) using JVM bytecode mutation. The experimental results show that surprisingly even PraPR with only the basic traditional mutators can produce genuine fixes for 17 bugs; with simple additional commonly used APR mutators, PraPR is able to produce genuine fixes for 43 bugs, significantly outperforming state-of-the-art APR, while being over 10X faster. Furthermore, we performed an extensive study of PraPR and other recent APR tools on a large number of additional real-world bugs, and demonstrated the overfitting problem of recent advanced APR tools for the first time. Lastly, PraPR has also successfully fixed bugs for other JVM languages (e.g., for the popular Kotlin language), indicating PraPR can greatly complement existing source-code-level APR. 
    more » « less
  2. Static analysis tools have demonstrated effectiveness at finding bugs in real world code. Such tools are increasingly widely adopted to improve software quality in practice. Automated Program Repair (APR) has the potential to further cut down on the cost of improving software quality. However, there is a disconnect between these effective bug-finding tools and APR. Recent advances in APR rely on test cases, making them inapplicable to newly discovered bugs or bugs difficult to test for deterministically (like memory leaks). Additionally, the quality of patches generated to satisfy a test suite is a key challenge. We address these challenges by adapting advances in practical static analysis and verification techniques to enable a new technique that finds and then accurately fixes real bugs without test cases. We present a new automated program repair technique using Separation Logic. At a high-level, our technique reasons over semantic effects of existing program fragments to fix faults related to general pointer safety properties: resource leaks, memory leaks, and null dereferences. The procedure automatically translates identified fragments into source-level patches, and verifies patch correctness with respect to reported faults. In this work we conduct the largest study of automatically fixing undiscovered bugs in real-world code to date. We demonstrate our approach by correctly fixing 55 bugs, including 11 previously undiscovered bugs, in 11 real-world projects. 
    more » « less
  3. null (Ed.)
    Dynamic code, i.e., code that is created or modified at runtime, is ubiquitous in today’s world. The behavior of dynamic code can depend on the logic of the dynamic code generator in subtle and non-obvious ways, e.g., JIT compiler bugs can lead to exploitable vulnerabilities in the resulting JIT-compiled code. Existing approaches to program analysis do not provide adequate support for reasoning about such behavioral relationships. This paper takes a first step in addressing this problem by describing a program representation and a new notion of dependency that allows us to reason about dependency and information flow relationships between the dynamic code generator and the generated dynamic code. Experimental results show that analyses based on these concepts are able to capture properties of dynamic code that cannot be identified using traditional program analyses. 
    more » « less
  4. Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences over the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on black- and grey-box fuzzing, which generates test programs without sufficient understanding of internal compiler behaviors. As such, they often fail to construct test programs to exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally inapplicable to the giant codebase of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and even have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization, with a spotlight on detecting deep logic bugs in the emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are also used as feedback to further enhance the test generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs to exercise deep optimizations requiring intricate conditions, practicing up to 8 times more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found in total 101 bugs for the compilers under test, with 92 confirmed as previously unknown and 70 already fixed. Notably, WhiteFox has been recently acknowledged by the PyTorch team, and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted for compilers in different domains, such as LLVM, where WhiteFox has already found multiple bugs. 
    more » « less
  5. null (Ed.)
    Generate-and-validate (G&V) automated program repair (APR) techniques have been extensively studied during the past decade. Meanwhile, such techniques can be extremely time-consuming due to the manipulation of program code to fabricate a large number of patches and also the repeated test executions on patches to identify potential fixes. PraPR, a recent G&V APR technique, reduces such costs by modifying program code directly at the level of compiled JVM bytecode with on-the-fly patch validation, which directly allows multiple bytecode patches to be tested within the same JVM process. However, PraPR is limited due to its unique bytecode-repair design, and is basically unsound/imprecise as it assumes that patch executions do not change global JVM state and affect later patch executions on the same JVM process. In this paper, we propose a unified patch validation framework, named UniAPR, to perform the first empirical study of on-the-fly patch validation for state-of-the-art source-code-level APR techniques widely studied in the literature; furthermore, UniAPR addresses the imprecise patch validation issue by resetting the JVM global state via runtime bytecode transformation. We have implemented UniAPR as a publicly available fully automated Maven Plugin. Our study demonstrates for the first time that on-the-fly patch validation can often speed up state-of-the-art source-code-level APR by over an order of magnitude, enabling all existing APR techniques to explore a larger search space to fix more bugs in the near future. Furthermore, our study shows the first empirical evidence that vanilla on-the-fly patch validation can be imprecise/unsound, while UniAPR with JVM reset is able to mitigate such issues with negligible overhead. 
    more » « less