

Search for: All records

Creators/Authors contains: "Zhang, Qirun"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Optimizing compilers, such as LLVM, generate debug information in machine code to aid debugging. This information is particularly important when debugging optimized code, as modern software is often compiled with optimization enabled. However, properly updating debug information to reflect code transformations during optimization is a complex task that often relies on manual effort. This complexity makes the process prone to errors, which can lead to incorrect or lost debug information. Finding and fixing potential debug information update errors is vital to maintaining the accuracy and reliability of the overall debugging process. To our knowledge, no existing techniques can rectify debug information update errors in LLVM. While black-box testing approaches can find such bugs, they can neither pinpoint the root causes nor suggest fixes. To fill the gap, we propose the first technique to robustify debug information updates in LLVM. In particular, our robustification approach can find and fix incorrect debug location updates. Central to our approach is the observation that the debug locations in the original and optimized programs must satisfy a conformance relation. The relation ensures that LLVM optimizations do not introduce extraneous debug location information on the control-flow paths of the optimized programs. We introduce control-flow conformance analysis, a novel approach that determines the reference updates ensuring the conformance relation by observing the execution of LLVM optimization passes and analyzing the debug locations in the control-flow graphs of programs under optimization. The determined reference updates are then used to check developer-written updates in LLVM. When discrepancies arise, the reference updates serve as the update skeletons to guide the fixing. We realized our approach as a tool named MetaLoc, which determines proper debug location updates for LLVM optimizations. More importantly, with MetaLoc, we have reported and patched 46 previously unknown update errors in LLVM. All the patches, along with 22 new regression tests, have been merged into the LLVM codebase, effectively improving the accuracy and reliability of debug information in all programs optimized by LLVM. Furthermore, our approach uncovered and led to corrections in two issues within LLVM’s official documentation on debug information updates.
    Free, publicly-accessible full text available June 10, 2026
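    To make the conformance relation described above concrete, here is a minimal sketch (all names are hypothetical; this is neither the MetaLoc tool nor the LLVM API, and it simplifies the paper's path-based relation to per-block matching): a block of the optimized CFG may only carry debug locations that the matching original block already carried, and any extraneous location is flagged, the reference update being to drop or merge it.

        // Hypothetical sketch of a control-flow conformance check; this is
        // not the actual MetaLoc tool or the LLVM API, and it simplifies the
        // paper's path-based relation to per-block matching.
        #include <cstdio>
        #include <map>
        #include <set>
        #include <string>
        #include <utility>
        #include <vector>

        using Block = std::string;
        using CFG = std::map<Block, std::set<int>>; // block -> debug lines

        // Conformance: an optimized block must not carry a debug location
        // that the matching original block lacked; such extraneous locations
        // make single-stepping in a debugger jump to misleading lines.
        std::vector<std::pair<Block, int>>
        findExtraneousLocations(const CFG &original, const CFG &optimized) {
          std::vector<std::pair<Block, int>> violations;
          for (const auto &[block, lines] : optimized) {
            auto it = original.find(block);
            for (int line : lines)
              if (it == original.end() || !it->second.count(line))
                violations.push_back({block, line}); // reference update: drop it
          }
          return violations;
        }

        int main() {
          CFG original  = {{"entry", {10, 11}}, {"loop", {12, 13}}};
          // Suppose a pass hoisted the instruction from line 12 into "entry"
          // but kept its old debug location instead of dropping or merging it.
          CFG optimized = {{"entry", {10, 11, 12}}, {"loop", {13}}};
          for (const auto &[b, l] : findExtraneousLocations(original, optimized))
            std::printf("extraneous location: line %d in block %s\n", l, b.c_str());
        }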
  2. C++ templates are a powerful feature for generic programming and compile-time computations, but C++ compilers often emit overly verbose template error messages. Even short error messages often involve unnecessary and confusing implementation details, which are difficult for developers to read and understand. To address this problem, C++20 introduced constraints and concepts, which impose requirements on template parameters. The new features can define clearer interfaces for templates and can improve compiler diagnostics. However, manually specifying template constraints can still be non-trivial, which becomes even more challenging when working with legacy C++ projects or with frequent code changes. This paper bridges the gap and proposes an automatic approach to synthesizing constraints for C++ function templates. We utilize a lightweight static analysis to analyze the usage patterns within the template body and summarize them into constraints for each type parameter of the template. The analysis is inter-procedural and uses disjunctions of constraints to model function overloading. We have implemented our approach based on the Clang frontend and evaluated it on two C++ libraries chosen separately from two popular library sets: algorithm from the Standard Template Library (STL) and special functions from the Boost library, both of which extensively use templates. Our tool can process over 110k lines of C++ code in less than 1.5 seconds and synthesize non-trivial constraints for 30%-40% of the function templates. The constraints synthesized for algorithm align well with the standard documentation, and on average, the synthesized constraints can reduce error message lengths by 56.6% for algorithm and 63.8% for special functions. 
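    The following before/after pair is illustrative (assumed for exposition, not the tool's actual output): the template body uses operator< and copies values of type T, so a usage-pattern analysis could summarize exactly those requirements into a C++20 requires-clause.

        // Illustrative input/output pair, not the tool's actual emission.
        #include <concepts>

        // Unconstrained original: a bad call produces a verbose
        // instantiation error deep inside the template body.
        template <typename T>
        T maxOf(T a, T b) { return a < b ? b : a; }

        // A constraint the analysis could synthesize: the body compares
        // values with operator< and copies T, so exactly those requirements
        // appear in the requires-clause, and a bad call now fails with a
        // short "constraints not satisfied" diagnostic at the call site.
        template <typename T>
          requires std::copy_constructible<T> &&
                   requires(T a, T b) { { a < b } -> std::convertible_to<bool>; }
        T maxOfConstrained(T a, T b) { return a < b ? b : a; }

        struct NoLess {}; // no operator<

        int main() {
          maxOfConstrained(1, 2);                  // OK
          // maxOfConstrained(NoLess{}, NoLess{}); // constraint failure
        }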
  3. SMT solvers are foundational tools for reasoning about constraints in practical problems both within and outside program analysis. Faster SMT solving improves the performance of practical tools and expands the set of tractable problems. Existing approaches to improving solver performance either focus on general algorithms applied below the level of individual theories, or focus on optimizations within a single theory. Unbounded constraints in which the number of possible variable values is infinite, such as real numbers and integers, pose a particularly difficult challenge for solvers. Bounded constraints in which the set of possible values is finite, such as bitvectors and floating-point numbers, on the other hand, are decidable and have been the subject of extensive performance improvement efforts. This paper introduces a theory arbitrage: we transform unbounded constraints, which are often expensive to solve, into bounded constraints, which are typically cheaper to solve. By converting unbounded problems into bounded ones, theory arbitrage takes advantage of better performance on bounded constraints and unlocks optimization techniques that only apply to bounded theories. The transformation is achieved by harnessing a novel abstract interpretation strategy to infer bounds. The bounded transformed constraint is then an underapproximation of the semantics of the unbounded original. We realize our method for the theories of integers and real numbers with a practical tool (STAUB). Our evaluation demonstrates that theory arbitrage alone can speed up individual constraints by orders of magnitude and achieve up to a 1.4× speedup on average across nonlinear integer benchmarks. Furthermore, it enables the use of the recent compiler optimization-based technique SLOT for unbounded SMT theories, unlocking a further speedup of up to 3×. Finally, we incorporate STAUB into a practical termination proving tool and observe an overall 9% improvement in performance. 
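    A minimal sketch of the bounding step using the Z3 C++ API follows (this is not STAUB itself; the bound 0 <= x, y <= 15 is assumed to come from the abstract interpretation step, and the 16-bit width is chosen so no intermediate value can overflow):

        // Sketch of the bounding step with the Z3 C++ API; not STAUB itself.
        #include <z3++.h>
        #include <iostream>

        int main() {
          z3::context c;

          // Unbounded original: a nonlinear constraint over mathematical
          // integers, which is expensive (and in general undecidable).
          z3::expr x = c.int_const("x");
          z3::expr y = c.int_const("y");
          z3::expr unbounded = x * x + y * y == 25 && x > 0 && y > 0;
          (void)unbounded; // shown for contrast; we solve the bounded version

          // Assume abstract interpretation inferred 0 <= x, y <= 15. A
          // 16-bit encoding then leaves no room for overflow (15*15*2 < 2^16),
          // so any model found below maps directly back to a model of the
          // original: an underapproximation. UNSAT here is inconclusive.
          z3::expr bx = c.bv_const("bx", 16);
          z3::expr by = c.bv_const("by", 16);
          z3::solver s(c);
          s.add(bx * bx + by * by == c.bv_val(25, 16));
          s.add(z3::ugt(bx, 0) && z3::ugt(by, 0));
          s.add(z3::ule(bx, 15) && z3::ule(by, 15));

          if (s.check() == z3::sat)
            std::cout << s.get_model() << "\n"; // e.g., bx = 3, by = 4
        }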
  4. Context-free language reachability (CFL-reachability) is a prominent model for formulating program analysis problems. Almost all CFL-reachability algorithms are based on the Reps-Horwitz-Sagiv (RHS) tabulation. In essence, the RHS tabulation, based on normalized context-free grammars, is similar to the CYK algorithm for CFL-parsing. Consider a normalized rule S ::= A B and a CFL-reachability problem instance of computing S-edges in the input graph. The RHS tabulation obtains all summary edges (i.e., S-, A-, and B-edges) based on the grammar rules. However, many A- and B-edges are wasted because only a subset of those edges eventually contributes to generating S-edges in the input graph. This paper proposes a new tabulation strategy for speeding up CFL-reachability by eliminating wasted and unnecessary summary edges. We particularly focus on recursive nonterminals. Our key technical insight is that the wasted edge generations and insertions caused by recursive nonterminals can be avoided by modifying the parse trees either statically (by transforming the grammar) or dynamically (using a specialized online CFL-reachability solver). For example, if a recursive nonterminal B, generated by a rule B ::= B X, appears on the right-hand side of a rule S ::= A B, we can make S recursive (by introducing a new rule S ::= S X) and eliminate the original recursive rule (B ::= B X). Due to the rule S ::= S X, the shapes of the parse trees associated with the left-hand-side nonterminal S become more skewed. Thus, we name our approach skewed tabulation for CFL-reachability. Skewed tabulation can significantly improve the scalability of CFL-reachability by reducing wasted and unnecessary summary edges. We have implemented skewed tabulation and applied the corresponding CFL-reachability algorithm to an alias analysis, a value-flow analysis, and a taint analysis. Our extensive evaluation based on SPEC 2017 benchmarks yields promising results. For the three client analyses, CFL-reachability based on skewed tabulation can achieve 3.34×, 1.13× and 2.05× speedup over the state-of-the-art RHS-tabulation-based CFL-reachability solver and consume 60.05%, 20.38% and 63.06% less memory, respectively. Furthermore, the cost of grammar transformation for skewed tabulation is negligible, typically taking less than one second. 
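    A tiny worklist solver (hypothetical, with the unit rules A ::= a, B ::= b, X ::= x applied up front; not the paper's implementation) illustrates the transformation: running it with the skewed rules S ::= A B and S ::= S X derives the same S-edges as the original grammar, but never materializes the intermediate B-edges that B ::= B X would create.

        // Minimal RHS-style worklist CFL-reachability solver (illustrative).
        // Original grammar: S ::= A B,  B ::= B X,  B ::= b
        // Skewed grammar:   S ::= A B,  S ::= S X,  B ::= b
        #include <cstdio>
        #include <set>
        #include <tuple>
        #include <vector>

        using Edge = std::tuple<char, int, int>; // (label, src, dst)

        std::set<Edge> solve(std::set<Edge> edges,
                             const std::vector<std::tuple<char, char, char>> &rules) {
          std::vector<Edge> worklist(edges.begin(), edges.end());
          while (!worklist.empty()) {
            auto [lbl, u, v] = worklist.back();
            worklist.pop_back();
            for (auto [lhs, r1, r2] : rules) {
              // lhs ::= r1 r2: extend the popped edge on either side.
              for (const auto &[l2, u2, v2] : std::set<Edge>(edges)) {
                Edge e{0, 0, 0};
                if (lbl == r1 && l2 == r2 && v == u2) e = {lhs, u, v2};
                else if (l2 == r1 && lbl == r2 && v2 == u) e = {lhs, u2, v};
                else continue;
                if (edges.insert(e).second) worklist.push_back(e);
              }
            }
          }
          return edges;
        }

        int main() {
          // Toy graph: a-edge 0->1, b-edge 1->2, x-edges 2->3 and 3->4.
          std::set<Edge> g = {{'a', 0, 1}, {'b', 1, 2}, {'x', 2, 3}, {'x', 3, 4}};
          std::set<Edge> init; // apply the unit rules A ::= a, B ::= b, X ::= x
          for (auto [l, u, v] : g)
            init.insert({l == 'a' ? 'A' : l == 'b' ? 'B' : 'X', u, v});
          auto result = solve(init, {{'S', 'A', 'B'}, {'S', 'S', 'X'}});
          for (auto [l, u, v] : result)
            if (l == 'S') std::printf("S: %d -> %d\n", u, v); // 0->2, 0->3, 0->4
        }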
Hermenegildo, Manuel; Morales, José (Eds.)
    Context-free language reachability (CFL-reachability) is an important program analysis framework, but the exact analysis problems can be intractable or undecidable, and CFL-reachability over-approximates such problems. For the same problem, there could be many over-approximations based on different CFLs \(C_1,\ldots ,C_n\). Suppose the reachability result of each \(C_i\) produces a set \(P_i\) of reachable vertex pairs. Is it possible to achieve better precision than the straightforward intersection \(\bigcap _{i=1}^n P_i\)? This paper gives an affirmative answer: although CFLs are not closed under intersection, in CFL-reachability we can “intersect” graphs. Specifically, we propose mutual refinement to combine different CFL-reachability-based over-approximations. Our key insight is that the standard CFL-reachability algorithm can be slightly modified to trace the edges that contribute to the reachability results of \(C_1\); \(C_2\)-reachability then only needs to consider the contributing edges of \(C_1\), which can, in turn, trace the edges that contribute to \(C_2\)-reachability, and so on. We prove that there exists a unique optimal refinement result (a fixpoint). Experimental results show that mutual refinement can achieve better precision than the straightforward intersection with reasonable extra cost.
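    Schematically, the refinement loop looks as follows (the interfaces are hypothetical stand-ins: each solver is assumed to run CFL-reachability for one \(C_i\) on the current graph and return only the subgraph of contributing edges):

        // Schematic of the mutual-refinement loop; the solver interfaces are
        // hypothetical stand-ins for CFL-reachability solvers instrumented
        // to report which edges contributed to their reachability results.
        #include <cstdio>
        #include <functional>
        #include <set>
        #include <tuple>
        #include <vector>

        using Edge = std::tuple<char, int, int>; // (label, src, dst)
        using Graph = std::set<Edge>;
        using Solver = std::function<Graph(const Graph &)>;

        Graph mutualRefine(Graph g, const std::vector<Solver> &solvers) {
          bool changed = true;
          while (changed) { // terminates: g only ever shrinks
            changed = false;
            for (const Solver &solve : solvers) {
              Graph contributing = solve(g); // always a subset of g
              if (contributing.size() < g.size()) {
                g = std::move(contributing); // "intersect" the graphs
                changed = true;
              }
            }
          }
          return g; // the unique optimal fixpoint
        }

        int main() {
          Graph g = {{'a', 0, 1}, {'x', 1, 2}};
          // Dummy solver: pretend edges labeled 'x' never contribute.
          Solver dropX = [](const Graph &in) {
            Graph out;
            for (const auto &e : in)
              if (std::get<0>(e) != 'x') out.insert(e);
            return out;
          };
          std::printf("%zu edges remain\n", mutualRefine(g, {dropX}).size());
        }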
  6. Many problems in programming language theory and formal methods are undecidable, so they cannot be solved precisely. Practical techniques for dealing with undecidable problems are often based on decidable approximations. Undecidability implies that those approximations are always imprecise. Typically, practitioners use heuristics and ad hoc reasoning to identify imprecision issues and improve approximations, but there is a lack of computability-theoretic foundations about whether those efforts can succeed. This paper shows a surprising interplay between undecidability and decidable approximations: there exists a class of undecidable problems, such that it is computable to transform any decidable approximation to a witness input demonstrating its imprecision. We call those undecidable problems “witnessable problems.” For example, if a program property P is witnessable, then there exists a computable function f_P, such that f_P takes as input the code of any program analyzer targeting P and produces an input program w on which the program analyzer is imprecise. An even more surprising fact is that the class of witnessable problems includes almost all undecidable problems in programming language theory and formal methods. Specifically, we prove the diagonal halting problem K is witnessable, and the class of witnessable problems is closed under complements and many-one reductions. In particular, all “non-trivial semantic properties of programs” mentioned in Rice’s theorem are witnessable. We also explicitly construct a problem in the non-witnessable (and undecidable) class and show that both classes have cardinality \(2^{\aleph_0}\). Our results offer a new perspective on the understanding of undecidability: for witnessable problems, although it is impossible to solve them precisely, it is always possible to improve any decidable approximation to make it closer to the precise solution. This fact formally demonstrates that research efforts on such approximations are promising and shows there exist universal ways to identify precision issues of program analyzers, program verifiers, SMT solvers, etc., because their essences are decidable approximations of witnessable problems.
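    In standard computability-theoretic notation, the central notion can be paraphrased as follows (a paraphrase under assumed notation, not the paper's exact statement), writing \(\varphi_e\) for the e-th partial computable function and \(\triangle\) for symmetric difference, so a total \(\varphi_e\) decides the approximating set \(A_e = \{x : \varphi_e(x) = 1\}\):

        % Paraphrase, not quoted from the paper: one computable f_P turns the
        % index e of any decidable approximation of P into an input on which
        % that approximation disagrees with P.
        \[
          P \text{ is witnessable} \iff
          \exists\, \text{computable } f_P \;\; \forall e \;
          \bigl( \varphi_e \text{ total} \Rightarrow
                 f_P(e) \in \{\, x : \varphi_e(x) = 1 \,\} \,\triangle\, P \bigr)
        \]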