skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Mutual Refinements of Context-Free Language Reachability
Context-free language reachability is an important program analysis framework, but the exact analysis problems can be intractable or undecidable, where CFL-reachability approximates such problems. For the same problem, there could be many over-approximations based on different CFLs \(C_1,\ldots ,C_n\). Suppose the reachability result of each \(C_i\) produces a set \(P_i\) of reachable vertex pairs. Is it possible to achieve better precision than the straightforward intersection \(\bigcap _{i=1}^n P_i\)? This paper gives an affirmative answer: although CFLs are not closed under intersections, in CFL-reachability we can “intersect” graphs. Specifically, we propose mutual refinement to combine different CFL-reachability-based over-approximations. Our key insight is that the standard CFL-reachability algorithm can be slightly modified to trace the edges that contribute to the reachability results of \(C_1\), and \(C_2\)-reachability only need to consider contributing edges of \(C_1\), which can, in turn, trace the edges that contribute to \(C_2\)-reachability, etc. We prove that there exists a unique optimal refinement result (fix-point). Experimental results show that mutual refinement can achieve better precision than the straightforward intersection with reasonable extra cost.  more » « less
Award ID(s):
2237440
PAR ID:
10507107
Author(s) / Creator(s):
;
Editor(s):
Hermenegildo, Manuel; Morales, José
Publisher / Repository:
Springer
Date Published:
Journal Name:
International Static Analysis Symposium
Volume:
14284
ISBN:
978-3-031-44244-5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determines whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective---reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded---by merging the two nodes with all the edges joining them removed---without affecting the CFL-reachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%. 
    more » « less
  2. Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability ( InterDyck -reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck -reachability framework should model multiple Dyck languages simultaneously . Unfortunately, precise InterDyck -reachability is undecidable. Any practical solution must over-approximate the exact answer. In the literature, a lot of work has been proposed to over-approximate the InterDyck -reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck -reachability: we aim at simplifying the underlying input graph G . Our key insight is based on the observation that if an edge is not contributing to any InterDyck -paths, we can safely eliminate it from G . Our technique is orthogonal to the InterDyck -reachability formulation and can serve as a pre-processing step with any over-approximating approach for InterDyck -reachability. We have applied our graph simplification algorithm to pre-processing the graphs from a recent InterDyck -reachability-based taint analysis for Android. Our evaluation of three popular InterDyck -reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three InterDyck -reachability algorithms, sometimes dramatically. 
    more » « less
  3. Let \begin{document}$$ f_c(z) = z^2+c $$\end{document} for \begin{document}$$ c \in {\mathbb C} $$\end{document}. We show there exists a uniform upper bound on the number of points in \begin{document}$$ {\mathbb P}^1( {\mathbb C}) $$\end{document} that can be preperiodic for both \begin{document}$$ f_{c_1} $$\end{document} and \begin{document}$$ f_{c_2} $$\end{document}, for any pair \begin{document}$$ c_1\not = c_2 $$\end{document} in \begin{document}$$ {\mathbb C} $$\end{document}. The proof combines arithmetic ingredients with complex-analytic: we estimate an adelic energy pairing when the parameters lie in \begin{document}$$ \overline{\mathbb{Q}} $$\end{document}, building on the quantitative arithmetic equidistribution theorem of Favre and Rivera-Letelier, and we use distortion theorems in complex analysis to control the size of the intersection of distinct Julia sets. The proofs are effective, and we provide explicit constants for each of the results. 
    more » « less
  4. Dyck-reachability is a fundamental formulation for program analysis, which has been widely used to capture properly-matched-parenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyck-reachability is a relaxation of Dyck-reachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyck-reachability formulation. Bidirected Dyck-reachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyck-reachability algorithm computes all-pairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyck-reachability. In particular, we consider the problem of maintaining all-pairs Dyck-reachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyck-reachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. All-pairs Dyck-reachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worst-case guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any all-pairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a context-sensitive data-dependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )-time optimal bidirected Dyck-reachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches. 
    more » « less
  5. Abstract CSS-T codes were recently introduced as quantum error-correcting codes that respect a transversal gate. A CSS-T code depends on a CSS-T pair, which is a pair of binary codes$$(C_1, C_2)$$ ( C 1 , C 2 ) such that$$C_1$$ C 1 contains$$C_2$$ C 2 ,$$C_2$$ C 2 is even, and the shortening of the dual of$$C_1$$ C 1 with respect to the support of each codeword of$$C_2$$ C 2 is self-dual. In this paper, we give new conditions to guarantee that a pair of binary codes$$(C_1, C_2)$$ ( C 1 , C 2 ) is a CSS-T pair. We define the poset of CSS-T pairs and determine the minimal and maximal elements of the poset. We provide a propagation rule for nondegenerate CSS-T codes. We apply some main results to Reed–Muller, cyclic and extended cyclic codes. We characterize CSS-T pairs of cyclic codes in terms of the defining cyclotomic cosets. We find cyclic and extended cyclic codes to obtain quantum codes with better parameters than those in the literature. 
    more » « less