skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Zhang, Qirun"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. SMT solvers are foundational tools for reasoning about constraints in practical problems both within and outside program analysis. Faster SMT solving improves the performance of practical tools and expands the set of tractable problems. Existing approaches to improving solver performance either focus on general algorithms applied below the level of individual theories, or focus on optimizations within a single theory. Unbounded constraints in which the number of possible variable values is infinite, such as real numbers and integers, pose a particularly difficult challenge for solvers. Bounded constraints in which the set of possible values is finite, such as bitvectors and floating-point numbers, on the other hand, are decidable and have been the subject of extensive performance improvement efforts. This paper introduces a theory arbitrage: we transform unbounded constraints, which are often expensive to solve, into bounded constraints, which are typically cheaper to solve. By converting unbounded problems into bounded ones, theory arbitrage takes advantage of better performance on bounded constraints and unlocks optimization techniques that only apply to bounded theories. The transformation is achieved by harnessing a novel abstract interpretation strategy to infer bounds. The bounded transformed constraint is then an underapproximation of the semantics of the unbounded original. We realize our method for the theories of integers and real numbers with a practical tool (STAUB). Our evaluation demonstrates that theory arbitrage alone can speed up individual constraints by orders of magnitude and achieve up to a 1.4× speedup on average across nonlinear integer benchmarks. Furthermore, it enables the use of the recent compiler optimization-based technique SLOT for unbounded SMT theories, unlocking a further speedup of up to 3×. Finally, we incorporate STAUB into a practical termination proving tool and observe an overall 9% improvement in performance. 
    more » « less
    Free, publicly-accessible full text available June 20, 2025
  2. Hermenegildo, Manuel; Morales, José (Ed.)
    Context-free language reachability is an important program analysis framework, but the exact analysis problems can be intractable or undecidable, where CFL-reachability approximates such problems. For the same problem, there could be many over-approximations based on different CFLs \(C_1,\ldots ,C_n\). Suppose the reachability result of each \(C_i\) produces a set \(P_i\) of reachable vertex pairs. Is it possible to achieve better precision than the straightforward intersection \(\bigcap _{i=1}^n P_i\)? This paper gives an affirmative answer: although CFLs are not closed under intersections, in CFL-reachability we can “intersect” graphs. Specifically, we propose mutual refinement to combine different CFL-reachability-based over-approximations. Our key insight is that the standard CFL-reachability algorithm can be slightly modified to trace the edges that contribute to the reachability results of \(C_1\), and \(C_2\)-reachability only need to consider contributing edges of \(C_1\), which can, in turn, trace the edges that contribute to \(C_2\)-reachability, etc. We prove that there exists a unique optimal refinement result (fix-point). Experimental results show that mutual refinement can achieve better precision than the straightforward intersection with reasonable extra cost. 
    more » « less
  3. Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determines whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective---reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded---by merging the two nodes with all the edges joining them removed---without affecting the CFL-reachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%. 
    more » « less
  4. Many problems in programming language theory and formal methods are undecidable, so they cannot be solved precisely. Practical techniques for dealing with undecidable problems are often based on decidable approximations. Undecidability implies that those approximations are always imprecise. Typically, practitioners use heuristics andad hocreasoning to identify imprecision issues and improve approximations, but there is a lack of computability-theoretic foundations about whether those efforts can succeed. This paper shows a surprising interplay between undecidability and decidable approximations: there exists a class of undecidable problems, such that it is computable to transform any decidable approximation to a witness input demonstrating its imprecision. We call those undecidable problemswitnessable problems. For example, if a program propertyPis witnessable, then there exists a computable functionfP, such thatfPtakes as input the code of any program analyzer targetingPand produces an input programwon which the program analyzer is imprecise. An even more surprising fact is that the class of witnessable problems includes almost all undecidable problems in programming language theory and formal methods. Specifically, we prove the diagonal halting problemKis witnessable, and the class of witnessable problems is closed under complements and many-one reductions. In particular, all “non-trivial semantic properties of programs” mentioned in Rice’s theorem are witnessable. We also explicitly construct a problem in the non-witnessable (and undecidable) class and show that both classes have cardinality 20. Our results offer a new perspective on the understanding of undecidability: for witnessable problems, although it is impossible to solve them precisely, it is always possible to improve any decidable approximation to make it closer to the precise solution. This fact formally demonstrates that research efforts on such approximations are promising and shows there exist universal ways to identify precision issues of program analyzers, program verifiers, SMT solvers, etc., because their essences are decidable approximations of witnessable problems. 
    more » « less
  5. An interleaved-Dyck (InterDyck) language consists of the interleaving of two or more Dyck languages, where each Dyck language represents a set of strings of balanced parentheses.InterDyck-reachability is a fundamental framework for program analyzers that simultaneously track multiple properly-matched pairs of actions such as call/return, lock/unlock, or write-data/read-data.Existing InterDyck-reachability algorithms are based on the well-known tabulation technique. This paper presents a new perspective on solving InterDyck-reachability. Our key observation is that for the single-source-single-target InterDyck-reachability variant, it is feasible to summarize all paths from the source node to the target node based onpath expressions. Therefore, InterDyck-reachability becomes an InterDyck-path-recognition problem over path expressions. Instead of computing summary edges as in traditional tabulation algorithms, this new perspective enables us to express InterDyck-reachability as aparenthesis-countingproblem, which can be naturally formulated via integer linear programming (ILP). We implemented our ILP-based algorithm and performed extensive evaluations based on two client analyses (a reachability analysis for concurrent programs and a taint analysis). In particular, we evaluated our algorithm against two types of algorithms: (1) the general all-pairs InterDyck-reachability algorithms based on linear conjunctive language (LCL) reachability and synchronized pushdown system (SPDS) reachability, and (2) two domain-specific algorithms for both client analyses. The experimental results are encouraging. Our algorithm achieves 1.42×, 28.24×, and 11.76× speedup for the concurrency-analysis benchmarks compared to all-pair LCL-reachability, SPDS-reachability, and domain-specific tools, respectively; 1.2×, 69.9×, and 0.98× speedup for the taint-analysis benchmarks. Moreover, the algorithm also provides precision improvements, particularly for taint analysis, where it achieves 4.55%, 11.1%, and 6.8% improvement, respectively. 
    more » « less
  6. Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability ( InterDyck -reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck -reachability framework should model multiple Dyck languages simultaneously . Unfortunately, precise InterDyck -reachability is undecidable. Any practical solution must over-approximate the exact answer. In the literature, a lot of work has been proposed to over-approximate the InterDyck -reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck -reachability: we aim at simplifying the underlying input graph G . Our key insight is based on the observation that if an edge is not contributing to any InterDyck -paths, we can safely eliminate it from G . Our technique is orthogonal to the InterDyck -reachability formulation and can serve as a pre-processing step with any over-approximating approach for InterDyck -reachability. We have applied our graph simplification algorithm to pre-processing the graphs from a recent InterDyck -reachability-based taint analysis for Android. Our evaluation of three popular InterDyck -reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three InterDyck -reachability algorithms, sometimes dramatically. 
    more » « less
  7. Dyck-reachability is a fundamental formulation for program analysis, which has been widely used to capture properly-matched-parenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyck-reachability is a relaxation of Dyck-reachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyck-reachability formulation. Bidirected Dyck-reachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyck-reachability algorithm computes all-pairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyck-reachability. In particular, we consider the problem of maintaining all-pairs Dyck-reachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyck-reachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. All-pairs Dyck-reachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worst-case guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any all-pairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a context-sensitive data-dependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )-time optimal bidirected Dyck-reachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches. 
    more » « less