-
Optimizing compilers, such as LLVM, generate debug information in machine code to aid debugging. This information is particularly important when debugging optimized code, as modern software is often compiled with optimization enabled. However, properly updating debug information to reflect code transformations during optimization is a complex task that often relies on manual effort. This complexity makes the process prone to errors, which can lead to incorrect or lost debug information. Finding and fixing potential debug information update errors is vital to maintaining the accuracy and reliability of the overall debugging process. To our knowledge, no existing techniques can rectify debug information update errors in LLVM. While black-box testing approaches can find such bugs, they can neither pinpoint the root causes nor suggest fixes. To fill the gap, we propose the first technique to robustify debug information updates in LLVM. In particular, our robustification approach can find and fix incorrect debug location updates. Central to our approach is the observation that the debug locations in the original and optimized programs must satisfy a conformance relation. The relation ensures that LLVM optimizations do not introduce extraneous debug location information on the control-flow paths of the optimized programs. We introduce control-flow conformance analysis, a novel approach that determines the reference updates ensuring the conformance relation by observing the execution of LLVM optimization passes and analyzing the debug locations in the control-flow graphs of programs under optimization. The determined reference updates are then used to check developer-written updates in LLVM. When discrepancies arise, the reference updates serve as the update skeletons to guide the fixing. We realized our approach as a tool named MetaLoc, which determines proper debug location updates for LLVM optimizations. More importantly, with MetaLoc, we have reported and patched 46 previously unknown update errors in LLVM. All the patches, along with 22 new regression tests, have been merged into the LLVM codebase, effectively improving the accuracy and reliability of debug information in all programs optimized by LLVM. Furthermore, our approach uncovered and led to corrections in two issues within LLVM's official documentation on debug information updates.
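The conformance relation can be pictured with a small sketch. The toy C++ below (hypothetical Inst type with line-number locations; the actual tool analyzes LLVM IR and its control-flow graphs) checks one simplified consequence of the relation: an optimization pass may drop debug locations, but must not attach any location that was absent from the original code.

```cpp
#include <iostream>
#include <optional>
#include <set>
#include <vector>

// An instruction carries an optional debug location (here just a line number).
struct Inst {
    std::optional<int> debug_line;
};

// Collect every debug location attached to a sequence of instructions.
std::set<int> debugLocations(const std::vector<Inst>& insts) {
    std::set<int> locs;
    for (const auto& i : insts)
        if (i.debug_line) locs.insert(*i.debug_line);
    return locs;
}

// Simplified conformance check: the optimized code's debug locations must be
// a subset of the original's, i.e., no extraneous locations are introduced.
bool conforms(const std::vector<Inst>& original,
              const std::vector<Inst>& optimized) {
    std::set<int> before = debugLocations(original);
    for (int loc : debugLocations(optimized))
        if (!before.count(loc)) return false;  // extraneous location: update bug
    return true;
}

int main() {
    std::vector<Inst> original  = {{10}, {11}, {12}};
    std::vector<Inst> optimized = {{10}, {12}};          // locations dropped: fine
    std::vector<Inst> buggy     = {{10}, {42}};          // location 42 invented: bug
    std::cout << conforms(original, optimized) << '\n';  // prints 1
    std::cout << conforms(original, buggy) << '\n';      // prints 0
}
```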
-
C++ templates are a powerful feature for generic programming and compile-time computations, but C++ compilers often emit overly verbose template error messages. Even short error messages often involve unnecessary and confusing implementation details, which are difficult for developers to read and understand. To address this problem, C++20 introduced constraints and concepts, which impose requirements on template parameters. The new features can define clearer interfaces for templates and can improve compiler diagnostics. However, manually specifying template constraints can still be non-trivial, which becomes even more challenging when working with legacy C++ projects or with frequent code changes. This paper bridges the gap and proposes an automatic approach to synthesizing constraints for C++ function templates. We utilize a lightweight static analysis to analyze the usage patterns within the template body and summarize them into constraints for each type parameter of the template. The analysis is inter-procedural and uses disjunctions of constraints to model function overloading. We have implemented our approach based on the Clang frontend and evaluated it on two C++ libraries chosen separately from two popular library sets: algorithm from the Standard Template Library (STL) and special functions from the Boost library, both of which extensively use templates. Our tool can process over 110k lines of C++ code in less than 1.5 seconds and synthesize non-trivial constraints for 30%-40% of the function templates. The constraints synthesized for algorithm align well with the standard documentation, and on average, the synthesized constraints can reduce error message lengths by 56.6% for algorithm and 63.8% for special functions.
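To make this concrete, here is a hand-written example of the style of constraint such an approach might synthesize (names are hypothetical, not the tool's actual output). The template body uses ==, !=, ++, and *a < *b on its iterator parameter, and summarizing those usage patterns yields one concept per type parameter.

```cpp
#include <concepts>
#include <iostream>
#include <vector>

// Concept summarizing the usage patterns found in the body of my_min_element.
template <typename It>
concept LessThanComparableIterator = requires(It a, It b) {
    { a == b } -> std::convertible_to<bool>;
    { a != b } -> std::convertible_to<bool>;
    ++a;
    { *a < *b } -> std::convertible_to<bool>;
};

// With the constraint in place, a misuse (e.g., iterators over a type lacking
// operator<) is rejected at the interface instead of deep inside the body.
template <LessThanComparableIterator It>
It my_min_element(It first, It last) {
    if (first == last) return last;
    It smallest = first;
    for (++first; first != last; ++first)
        if (*first < *smallest) smallest = first;
    return smallest;
}

int main() {
    std::vector<int> v{3, 1, 4, 1, 5};
    std::cout << *my_min_element(v.begin(), v.end()) << '\n';  // prints 1
}
```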
-
SMT solvers are foundational tools for reasoning about constraints in practical problems both within and outside program analysis. Faster SMT solving improves the performance of practical tools and expands the set of tractable problems. Existing approaches to improving solver performance either focus on general algorithms applied below the level of individual theories, or focus on optimizations within a single theory. Unbounded constraints in which the number of possible variable values is infinite, such as real numbers and integers, pose a particularly difficult challenge for solvers. Bounded constraints in which the set of possible values is finite, such as bitvectors and floating-point numbers, on the other hand, are decidable and have been the subject of extensive performance improvement efforts. This paper introduces a theory arbitrage: we transform unbounded constraints, which are often expensive to solve, into bounded constraints, which are typically cheaper to solve. By converting unbounded problems into bounded ones, theory arbitrage takes advantage of better performance on bounded constraints and unlocks optimization techniques that only apply to bounded theories. The transformation is achieved by harnessing a novel abstract interpretation strategy to infer bounds. The bounded transformed constraint is then an underapproximation of the semantics of the unbounded original. We realize our method for the theories of integers and real numbers with a practical tool (STAUB). Our evaluation demonstrates that theory arbitrage alone can speed up individual constraints by orders of magnitude and achieve up to a 1.4× speedup on average across nonlinear integer benchmarks. Furthermore, it enables the use of the recent compiler optimization-based technique SLOT for unbounded SMT theories, unlocking a further speedup of up to 3×. Finally, we incorporate STAUB into a practical termination proving tool and observe an overall 9% improvement in performance.
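A minimal sketch of the arbitrage idea, using the Z3 C++ API: the fixed 8-bit width below is an assumption standing in for the tool's inferred bounds, and a real transformation must also rule out bitvector wrap-around in intermediate terms for the underapproximation to be sound.

```cpp
// Build with: g++ -std=c++17 arbitrage.cpp -lz3
#include <iostream>
#include <z3++.h>

int main() {
    z3::context c;

    // Unbounded original: a nonlinear constraint over mathematical integers.
    z3::expr x = c.int_const("x");
    z3::expr y = c.int_const("y");
    z3::solver unbounded(c);
    unbounded.add(x * x * x + y == c.int_val(10));
    unbounded.add(x > 0 && y > 0);

    // Bounded version: the same constraint over 8-bit bitvectors, assuming
    // bound inference showed the solutions of interest fit in [-128, 127].
    // The overloaded operators use signed bitvector semantics.
    z3::expr xb = c.bv_const("x", 8);
    z3::expr yb = c.bv_const("y", 8);
    z3::solver bounded(c);
    bounded.add(xb * xb * xb + yb == c.bv_val(10, 8));
    bounded.add(xb > 0 && yb > 0);

    // A model of the bounded problem (absent overflow) is also a model of the
    // unbounded one, so the cheaper problem is tried first.
    if (bounded.check() == z3::sat)
        std::cout << "bounded model:\n" << bounded.get_model() << '\n';
    else if (unbounded.check() == z3::sat)
        std::cout << "unbounded model:\n" << unbounded.get_model() << '\n';
}
```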
-
Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determine whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective---reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded---by merging the two nodes with all the edges joining them removed---without affecting the CFL-reachability result. By studying the characteristics of recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in the RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF removes 60.96% of the nodes and 42.67% of the edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF removes 38.93% of the nodes and 35.61% of the edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%.
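The folding step itself is simple once foldability is known. The sketch below (with a toy triviality predicate standing in for the RSM-based folding principle) merges the endpoints of trivial edges with a union-find structure and rewrites the surviving edges over the merged nodes.

```cpp
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

struct Edge { int src, dst; std::string label; };

// Standard disjoint-set union with path compression.
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void merge(int a, int b) { parent[find(a)] = find(b); }
};

// Stand-in for the folding principle: here an edge is "trivial" if it carries
// an epsilon-like label that cannot affect any reachable path.
bool isTrivial(const Edge& e) { return e.label == "eps"; }

int main() {
    int n = 4;
    std::vector<Edge> edges = {
        {0, 1, "eps"}, {1, 2, "("}, {2, 3, ")"},
    };

    // Fold: merge the endpoints of every trivial edge, then drop those edges.
    DSU dsu(n);
    for (const Edge& e : edges)
        if (isTrivial(e)) dsu.merge(e.src, e.dst);

    std::vector<Edge> folded;
    for (const Edge& e : edges)
        if (!isTrivial(e))
            folded.push_back({dsu.find(e.src), dsu.find(e.dst), e.label});

    for (const Edge& e : folded)  // smaller graph, same CFL-reachability result
        std::cout << e.src << " -" << e.label << "-> " << e.dst << '\n';
}
```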
-
Dyck-reachability is a fundamental formulation for program analysis, which has been widely used to capture properly-matched-parenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyck-reachability is a relaxation of Dyck-reachability on bidirected graphs, where each edge u → v labeled by an open parenthesis "(_i" is accompanied by an inverse edge v → u labeled by the corresponding close parenthesis ")_i", and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyck-reachability formulation. Bidirected Dyck-reachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyck-reachability algorithm computes all-pairs reachability information in O(m) time. This paper focuses on the dynamic version of bidirected Dyck-reachability. In particular, we consider the problem of maintaining all-pairs Dyck-reachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyck-reachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O(n^2) update time to achieve O(1) query time. All-pairs Dyck-reachability is a generalization of transitive closure. Despite extensive research on incremental computation, there has been no algorithmic development on dynamic graph algorithms for program analysis with worst-case guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck-reachability on bidirected graphs. Our dynamic algorithm can handle each graph update (i.e., edge insertion and deletion) in O(n·α(n)) time and supports any all-pairs reachability query in O(1) time, where α(n) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a context-sensitive data-dependence analysis for Java. We compare our dynamic algorithm against a straightforward approach based on the O(m)-time optimal bidirected Dyck-reachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders-of-magnitude speedups over both approaches.
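The flavor of the underlying merging technique fits in a few lines. The toy C++ sketch below (a naive fixpoint, with none of the inverse-edge bookkeeping or amortization needed for the stated bounds) merges any two nodes reached from a common node via edges with the same open-parenthesis label; all-pairs reachability then reduces to equivalence-class queries.

```cpp
#include <iostream>
#include <map>
#include <numeric>
#include <tuple>
#include <utility>
#include <vector>

struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool merge(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

int main() {
    // Edges u -"(_i"-> v stored as (u, i, v); the inverse ")_i" edges are
    // implicit in the bidirected setting.
    int n = 5;
    std::vector<std::tuple<int, int, int>> open_edges = {
        {0, 1, 1}, {0, 1, 2},   // node 0 reaches 1 and 2 via "(_1": merge 1, 2
        {1, 2, 3}, {2, 2, 4},   // once 1 and 2 are merged, "(_2" reaches 3 and 4
    };

    DSU dsu(n);
    bool changed = true;
    while (changed) {  // naive fixpoint; the real algorithm is worklist-based
        changed = false;
        std::map<std::pair<int, int>, int> seen;  // (class of u, label) -> class of v
        for (auto [u, i, v] : open_edges) {
            auto key = std::make_pair(dsu.find(u), i);
            auto [it, fresh] = seen.emplace(key, dsu.find(v));
            if (!fresh && dsu.merge(it->second, dsu.find(v))) changed = true;
        }
    }
    // Same equivalence class <=> mutually Dyck-reachable; O(1)-time queries.
    std::cout << (dsu.find(1) == dsu.find(2)) << '\n';  // prints 1
    std::cout << (dsu.find(3) == dsu.find(4)) << '\n';  // prints 1
}
```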