skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Recursive State Machine Guided Graph Folding for Context-Free Language Reachability
Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determines whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective---reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded---by merging the two nodes with all the edges joining them removed---without affecting the CFL-reachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%.  more » « less
Award ID(s):
2114627 1917924
PAR ID:
10602297
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Association for Computing Machinery (ACM)
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
7
Issue:
PLDI
ISSN:
2475-1421
Format(s):
Medium: X Size: p. 318-342
Size(s):
p. 318-342
Sponsoring Org:
National Science Foundation
More Like this
  1. Context-free language reachability (CFL-reachability) is a prominent model for formulating program analysis problems. Almost all CFL-reachability algorithms are based on the Reps-Horwitz-Sagiv (RHS) tabulation. In essence, the RHS tabulation, based on normalized context-free grammars, is similar to the CYK algorithm for CFL-parsing. Consider a normalized rule S ::= A B and a CFL-reachability problem instance of computing S-edges in the input graph. The RHS tabulation obtains all summary edges (i.e., S-, A-, and B-edges) based on the grammar rules. However, many A- and B-edges are wasted because only a subset of those edges eventually contributes to generating S-edges in the input graph. This paper proposes a new tabulation strategy for speeding up CFL-reachability by eliminating wasted and unnecessary summary edges. We particularly focus on recursive nonterminals. Our key technical insight is that the wasted edge generations and insertions caused by recursive nonterminals can be avoided by modifying the parse trees either statically (by transforming the grammar) or dynamically (using a specialized online CFL-reachability solver). For example, if a recursive nonterminal B, generated by a rule B ::= B X, appears on the right-hand side of a rule S ::= A B, we can make S recursive (by introducing a new rule S ::= S X) and eliminate the original recursive rule (B ::= B X). Due to the rule S ::= S X, the shapes of the parse trees associated with the left-hand-side nonterminal S become more skewed. Thus, we name our approach skewed tabulation for CFL-reachability. Skewed tabulation can significantly improve the scalability of CFL-reachability by reducing wasted and unnecessary summary edges. We have implemented skewed tabulation and applied the corresponding CFL-reachability algorithm to an alias analysis, a value-flow analysis, and a taint analysis. Our extensive evaluation based on SPEC 2017 benchmarks yields promising results. For the three client analyses, CFL-reachability based on skewed tabulation can achieve 3.34×, 1.13× and 2.05× speedup over the state-of-the-art RHS-tabulation-based CFL-reachability solver and consume 60.05%, 20.38% and 63.06% less memory, respectively. Furthermore, the cost of grammar transformation for skewed tabulation is negligible, typically taking less than one second. 
    more » « less
  2. Hermenegildo, Manuel; Morales, José (Ed.)
    Context-free language reachability is an important program analysis framework, but the exact analysis problems can be intractable or undecidable, where CFL-reachability approximates such problems. For the same problem, there could be many over-approximations based on different CFLs \(C_1,\ldots ,C_n\). Suppose the reachability result of each \(C_i\) produces a set \(P_i\) of reachable vertex pairs. Is it possible to achieve better precision than the straightforward intersection \(\bigcap _{i=1}^n P_i\)? This paper gives an affirmative answer: although CFLs are not closed under intersections, in CFL-reachability we can “intersect” graphs. Specifically, we propose mutual refinement to combine different CFL-reachability-based over-approximations. Our key insight is that the standard CFL-reachability algorithm can be slightly modified to trace the edges that contribute to the reachability results of \(C_1\), and \(C_2\)-reachability only need to consider contributing edges of \(C_1\), which can, in turn, trace the edges that contribute to \(C_2\)-reachability, etc. We prove that there exists a unique optimal refinement result (fix-point). Experimental results show that mutual refinement can achieve better precision than the straightforward intersection with reasonable extra cost. 
    more » « less
  3. Dyck-reachability is a fundamental formulation for program analysis, which has been widely used to capture properly-matched-parenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyck-reachability is a relaxation of Dyck-reachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyck-reachability formulation. Bidirected Dyck-reachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyck-reachability algorithm computes all-pairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyck-reachability. In particular, we consider the problem of maintaining all-pairs Dyck-reachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyck-reachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. All-pairs Dyck-reachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worst-case guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any all-pairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a context-sensitive data-dependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )-time optimal bidirected Dyck-reachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches. 
    more » « less
  4. Gomes, L.; Lorenz, R. (Ed.)
    Efficient manipulation of binary or multi-valueddecision diagrams (BDDs or MDDs) is critical in symbolic verification tools. Despite the applicability of MDDs to real-world tasks such as discovering the reachable states of a model, their large demands on hardware resources, especially memory, limit algorithmic scalability. In this paper, we focus on memory-constrained algorithms that employ a novel O(m log n)-time under-approximation technique for MDDs, where m and n are the number of MDD edges and nodes, respectively. The effectiveness of our approach is demonstrated experimentally by a reduction in peak memory usage for the symbolic reachability computation of a set of Petri nets. 
    more » « less
  5. The transitive closure of a graph is a new graph where every vertex is directly connected to all vertices to which it had a path in the original graph. Transitive closures are useful for reachability and relationship querying. Finding the transitive closure can be computationally expensive and requires a large memory footprint as the output is typically larger than the input. Some of the original research on transitive closures assumed that graphs were dense and used dense adjacency matrices. We have since learned that many real-world networks are extremely sparse, and the existing methods do not scale. In this work, we introduce a new algorithm called Anti-section Transitive Closure (ATC) for finding the transitive closure of a graph. We present a new parallel edges operation – anti-sections – for finding new edges to reachable vertices. ATC scales to massively multithreaded systems such as NVIDIA’s GPU with tens of thousands of threads. We show that the anti-section operation shares some traits with the triangle counting intersection operation in graph analysis. Lastly, we view the transitive closure problem as a dynamic graph problem requiring edge insertions. By doing this, our memory footprint is smaller. We also show a method for creating the batches in parallel using two different techniques: dual-round and hash. Using these techniques and the Hornet dynamic graph data structure, we show our new algorithm on an NVIDIA Titan V GPU. We compare with other packages such as NetworkX, SEI-GBTL, SuiteSparse, and cuSparse. 
    more » « less