 Award ID(s):
 1917924
 NSFPAR ID:
 10345187
 Date Published:
 Journal Name:
 ACM Transactions on Programming Languages and Systems
 Volume:
 44
 Issue:
 2
 ISSN:
 01640925
 Page Range / eLocation ID:
 1 to 28
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Dyckreachability is a fundamental formulation for program analysis, which has been widely used to capture properlymatchedparenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyckreachability is a relaxation of Dyckreachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyckreachability formulation. Bidirected Dyckreachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyckreachability algorithm computes allpairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyckreachability. In particular, we consider the problem of maintaining allpairs Dyckreachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyckreachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. Allpairs Dyckreachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worstcase guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any allpairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a contextsensitive datadependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )time optimal bidirected Dyckreachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches.more » « less

null (Ed.)Many program analyses need to reason about pairs of matching actions, such as call/return, lock/unlock, or setfield/getfield. The family of Dyck languages { D k }, where D k has k kinds of parenthesis pairs, can be used to model matching actions as balanced parentheses. Consequently, many programanalysis problems can be formulated as Dyckreachability problems on edgelabeled digraphs. Interleaved Dyckreachability (InterDyckreachability), denoted by D k ⊙ D k reachability, is a natural extension of Dyckreachability that allows one to formulate programanalysis problems that involve multiple kinds of matchingaction pairs. Unfortunately, the general InterDyckreachability problem is undecidable. In this paper, we study variants of InterDyckreachability on bidirected graphs , where for each edge ⟨ p , q ⟩ labeled by an open parenthesis ”( a ”, there is an edge ⟨ q , p ⟩ labeled by the corresponding close parenthesis ”) a ”, and vice versa . Languagereachability on a bidirected graph has proven to be useful both (1) in its own right, as a way to formalize many programanalysis problems, such as pointer analysis, and (2) as a relaxation method that uses a fast algorithm to overapproximate languagereachability on a directed graph. However, unlike its directed counterpart, the complexity of bidirected InterDyckreachability still remains open. We establish the first decidable variant (i.e., D 1 ⊙ D 1 reachability) of bidirected InterDyckreachability. In D 1 ⊙ D 1 reachability, each of the two Dyck languages is restricted to have only a single kind of parenthesis pair. In particular, we show that the bidirected D 1 ⊙ D 1 problem is in PTIME. We also show that when one extends each Dyck language to involve k different kinds of parentheses (i.e., D k ⊙ D k reachability with k ≥ 2), the problem is NPhard (and therefore much harder). We have implemented the polynomialtime algorithm for bidirected D 1 ⊙ D 1 reachability. D k ⊙ D k reachability provides a new overapproximation method for bidirected D k ⊙ D k reachability in the sense that D k ⊙ D k reachability can first be relaxed to bidirected D 1 ⊙ D 1 reachability, and then the resulting bidirected D 1 ⊙ D 1 reachability problem is solved precisely. We compare this D 1 ⊙ D 1 reachabilitybased approach against another known overapproximating D k ⊙ D k reachability algorithm. Surprisingly, we found that the overapproximation approach based on bidirected D 1 ⊙ D 1 reachability computes more precise solutions, even though the D 1 ⊙ D 1 formalism is inherently less expressive than the D k ⊙ D k formalism.more » « less

An interleavedDyck (InterDyck) language consists of the interleaving of two or more Dyck languages, where each Dyck language represents a set of strings of balanced parentheses.InterDyckreachability is a fundamental framework for program analyzers that simultaneously track multiple properlymatched pairs of actions such as call/return, lock/unlock, or writedata/readdata.Existing InterDyckreachability algorithms are based on the wellknown tabulation technique.
This paper presents a new perspective on solving InterDyckreachability. Our key observation is that for the singlesourcesingletarget InterDyckreachability variant, it is feasible to summarize all paths from the source node to the target node based on
path expressions . Therefore, InterDyckreachability becomes an InterDyckpathrecognition problem over path expressions. Instead of computing summary edges as in traditional tabulation algorithms, this new perspective enables us to express InterDyckreachability as aparenthesiscounting problem, which can be naturally formulated via integer linear programming (ILP).We implemented our ILPbased algorithm and performed extensive evaluations based on two client analyses (a reachability analysis for concurrent programs and a taint analysis). In particular, we evaluated our algorithm against two types of algorithms: (1) the general allpairs InterDyckreachability algorithms based on linear conjunctive language (LCL) reachability and synchronized pushdown system (SPDS) reachability, and (2) two domainspecific algorithms for both client analyses. The experimental results are encouraging. Our algorithm achieves 1.42×, 28.24×, and 11.76× speedup for the concurrencyanalysis benchmarks compared to allpair LCLreachability, SPDSreachability, and domainspecific tools, respectively; 1.2×, 69.9×, and 0.98× speedup for the taintanalysis benchmarks. Moreover, the algorithm also provides precision improvements, particularly for taint analysis, where it achieves 4.55%, 11.1%, and 6.8% improvement, respectively.

Contextfree language reachability (CFLreachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFLreachability problems, which determines whether specific sourcesink pairs in an edgelabeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFLreachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFLreachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFLreachability from a more practical perspectivereducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFLreachability. We observe that two nodes joined by trivial edges can be foldedby merging the two nodes with all the edges joining them removedwithout affecting the CFLreachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFLreachability problem). In particular, given a CFLreachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and valueflow analysis) show that GF significantly accelerates RSM/CFLreachability by reducing the input graph size. On average, for valueflow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%.more » « less

Hermenegildo, Manuel ; Morales, José (Ed.)Contextfree language reachability is an important program analysis framework, but the exact analysis problems can be intractable or undecidable, where CFLreachability approximates such problems. For the same problem, there could be many overapproximations based on different CFLs \(C_1,\ldots ,C_n\). Suppose the reachability result of each \(C_i\) produces a set \(P_i\) of reachable vertex pairs. Is it possible to achieve better precision than the straightforward intersection \(\bigcap _{i=1}^n P_i\)? This paper gives an affirmative answer: although CFLs are not closed under intersections, in CFLreachability we can “intersect” graphs. Specifically, we propose mutual refinement to combine different CFLreachabilitybased overapproximations. Our key insight is that the standard CFLreachability algorithm can be slightly modified to trace the edges that contribute to the reachability results of \(C_1\), and \(C_2\)reachability only need to consider contributing edges of \(C_1\), which can, in turn, trace the edges that contribute to \(C_2\)reachability, etc. We prove that there exists a unique optimal refinement result (fixpoint). Experimental results show that mutual refinement can achieve better precision than the straightforward intersection with reasonable extra cost.more » « less