An interleavedDyck (InterDyck) language consists of the interleaving of two or more Dyck languages, where each Dyck language represents a set of strings of balanced parentheses.InterDyckreachability is a fundamental framework for program analyzers that simultaneously track multiple properlymatched pairs of actions such as call/return, lock/unlock, or writedata/readdata.Existing InterDyckreachability algorithms are based on the wellknown tabulation technique.
This paper presents a new perspective on solving InterDyckreachability. Our key observation is that for the singlesourcesingletarget InterDyckreachability variant, it is feasible to summarize all paths from the source node to the target node based on
We implemented our ILPbased algorithm and performed extensive evaluations based on two client analyses (a reachability analysis for concurrent programs and a taint analysis). In particular, we evaluated our algorithm against two types of algorithms: (1) the general allpairs InterDyckreachability algorithms based on linear conjunctive language (LCL) reachability and synchronized pushdown system (SPDS) reachability, and (2) two domainspecific algorithms for both client analyses. The experimental results are encouraging. Our algorithm achieves 1.42×, 28.24×, and 11.76× speedup for the concurrencyanalysis benchmarks compared to allpair LCLreachability, SPDSreachability, and domainspecific tools, respectively; 1.2×, 69.9×, and 0.98× speedup for the taintanalysis benchmarks. Moreover, the algorithm also provides precision improvements, particularly for taint analysis, where it achieves 4.55%, 11.1%, and 6.8% improvement, respectively.
more » « less Award ID(s):
 1917924
 NSFPAR ID:
 10472621
 Publisher / Repository:
 ACM
 Date Published:
 Journal Name:
 Proceedings of the ACM on Programming Languages
 Volume:
 7
 Issue:
 POPL
 ISSN:
 24751421
 Page Range / eLocation ID:
 1003 to 1026
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Dyckreachability is a fundamental formulation for program analysis, which has been widely used to capture properlymatchedparenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyckreachability is a relaxation of Dyckreachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyckreachability formulation. Bidirected Dyckreachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyckreachability algorithm computes allpairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyckreachability. In particular, we consider the problem of maintaining allpairs Dyckreachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyckreachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. Allpairs Dyckreachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worstcase guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any allpairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a contextsensitive datadependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )time optimal bidirected Dyckreachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches.more » « less

Many programanalysis problems can be formulated as graphreachability problems. Interleaved Dyck language reachability ( InterDyck reachability) is a fundamental framework to express a wide variety of programanalysis problems over edgelabeled graphs. The InterDyck language represents an intersection of multiple matchedparenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve contextsensitivity, and other Dyck languages to model data dependencies, such as fieldsensitivity and pointer references/dereferences. In the ideal case, an InterDyck reachability framework should model multiple Dyck languages simultaneously . Unfortunately, precise InterDyck reachability is undecidable. Any practical solution must overapproximate the exact answer. In the literature, a lot of work has been proposed to overapproximate the InterDyck reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck reachability: we aim at simplifying the underlying input graph G . Our key insight is based on the observation that if an edge is not contributing to any InterDyck paths, we can safely eliminate it from G . Our technique is orthogonal to the InterDyck reachability formulation and can serve as a preprocessing step with any overapproximating approach for InterDyck reachability. We have applied our graph simplification algorithm to preprocessing the graphs from a recent InterDyck reachabilitybased taint analysis for Android. Our evaluation of three popular InterDyck reachability algorithms yields promising results. In particular, our graphsimplification method improves both the scalability and precision of all three InterDyck reachability algorithms, sometimes dramatically.more » « less

null (Ed.)Many program analyses need to reason about pairs of matching actions, such as call/return, lock/unlock, or setfield/getfield. The family of Dyck languages { D k }, where D k has k kinds of parenthesis pairs, can be used to model matching actions as balanced parentheses. Consequently, many programanalysis problems can be formulated as Dyckreachability problems on edgelabeled digraphs. Interleaved Dyckreachability (InterDyckreachability), denoted by D k ⊙ D k reachability, is a natural extension of Dyckreachability that allows one to formulate programanalysis problems that involve multiple kinds of matchingaction pairs. Unfortunately, the general InterDyckreachability problem is undecidable. In this paper, we study variants of InterDyckreachability on bidirected graphs , where for each edge ⟨ p , q ⟩ labeled by an open parenthesis ”( a ”, there is an edge ⟨ q , p ⟩ labeled by the corresponding close parenthesis ”) a ”, and vice versa . Languagereachability on a bidirected graph has proven to be useful both (1) in its own right, as a way to formalize many programanalysis problems, such as pointer analysis, and (2) as a relaxation method that uses a fast algorithm to overapproximate languagereachability on a directed graph. However, unlike its directed counterpart, the complexity of bidirected InterDyckreachability still remains open. We establish the first decidable variant (i.e., D 1 ⊙ D 1 reachability) of bidirected InterDyckreachability. In D 1 ⊙ D 1 reachability, each of the two Dyck languages is restricted to have only a single kind of parenthesis pair. In particular, we show that the bidirected D 1 ⊙ D 1 problem is in PTIME. We also show that when one extends each Dyck language to involve k different kinds of parentheses (i.e., D k ⊙ D k reachability with k ≥ 2), the problem is NPhard (and therefore much harder). We have implemented the polynomialtime algorithm for bidirected D 1 ⊙ D 1 reachability. D k ⊙ D k reachability provides a new overapproximation method for bidirected D k ⊙ D k reachability in the sense that D k ⊙ D k reachability can first be relaxed to bidirected D 1 ⊙ D 1 reachability, and then the resulting bidirected D 1 ⊙ D 1 reachability problem is solved precisely. We compare this D 1 ⊙ D 1 reachabilitybased approach against another known overapproximating D k ⊙ D k reachability algorithm. Surprisingly, we found that the overapproximation approach based on bidirected D 1 ⊙ D 1 reachability computes more precise solutions, even though the D 1 ⊙ D 1 formalism is inherently less expressive than the D k ⊙ D k formalism.more » « less

Contextfree language reachability (CFLreachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFLreachability problems, which determines whether specific sourcesink pairs in an edgelabeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFLreachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFLreachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFLreachability from a more practical perspectivereducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFLreachability. We observe that two nodes joined by trivial edges can be foldedby merging the two nodes with all the edges joining them removedwithout affecting the CFLreachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFLreachability problem). In particular, given a CFLreachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and valueflow analysis) show that GF significantly accelerates RSM/CFLreachability by reducing the input graph size. On average, for valueflow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%.more » « less

Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This article focuses on the fivecycle, which is an important special case of fivevertex subgraph counting and one of the most difficult to count efficiently. We design two new parallel fivecycle counting algorithms and prove that they are work efficient and achieve polylogarithmic span. Both algorithms are based on computing low outdegree orientations, which enables the efficient computation of directed twopaths and threepaths, and the algorithms differ in the ways in which they use this orientation to eliminate doublecounting. Additionally, we present new parallel algorithms for obtaining unbiased estimates of fivecycle counts using graph sparsification. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of realworld graphs using a 36core machine with twoway hyperthreading show that our best exact parallel algorithm achieves 10–46× selfrelative speedup, outperforms our serial benchmarks by 10–32×, and outperforms the previous stateoftheart serial algorithm by up to 818×. Our best approximate algorithm, for a reasonable probability parameter, achieves up to 20× selfrelative speedup and is able to approximate fivecycle counts 9–189× faster than our best exact algorithm, with between 0.52% and 11.77% error.more » « less