skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: egg: Fast and extensible equality saturation
An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use case. Equality saturation workloads exhibit distinct characteristics and often require ad-hoc e-graph extensions to incorporate transformations beyond purely syntactic rewrites. This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation. A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice. A general mechanism called e-class analyses integrates domain-specific analyses into the e-graph, reducing the need for ad hoc manipulation. We implemented these techniques in a new open-source library called egg. Our case studies on three previously published applications of equality saturation highlight how egg's performance and flexibility enable state-of-the-art results across diverse domains.  more » « less
Award ID(s):
1749570
PAR ID:
10603123
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Association for Computing Machinery (ACM)
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
5
Issue:
POPL
ISSN:
2475-1421
Format(s):
Medium: X Size: p. 1-29
Size(s):
p. 1-29
Sponsoring Org:
National Science Foundation
More Like this
  1. The problem of automatically proving the equality of terms over recursive functions and inductive data types is challenging, as such proofs often require auxiliary lemmas which must themselves be proven. Previous attempts at lemma discovery compromise on either efficiency or efficacy.Goal-directedapproaches are fast but limited in expressiveness, as they can only discover auxiliary lemmas which entail their goals.Theory explorationapproaches are expressive but inefficient, as they exhaustively enumerate candidate lemmas. We introducee-graph guided lemma discovery, a new approach to finding equational proofs that makes theory exploration goal-directed. We accomplish this by using e-graphs and equality saturation to efficiently construct and compactly represent the space ofallgoal-oriented proofs. This allows us to explore only those auxiliary lemmasguaranteedto help make progress on some of these proofs. We implemented our method in a new prover called CCLemma and compared it with three state-of-the-art provers across a variety of benchmarks. CCLemma performs consistently well on two standard benchmarks and additionally solves 50% more problems than the next best tool on a new challenging set. 
    more » « less
  2. Roy, Sudeepa; Kara, Ahmet (Ed.)
    Equality saturation is an emerging technique for program and query optimization developed in the programming language community. It performs term rewriting over an E-graph, a data structure that compactly represents a program space. Despite its popularity, the theory of equality saturation lags behind the practice. In this paper, we define a fixpoint semantics of equality saturation based on tree automata and uncover deep connections between equality saturation and the chase. We characterize the class of chase sequences that correspond to equality saturation. We study the complexities of terminations of equality saturation in three cases: single-instance, all-term-instance, and all-E-graph-instance. Finally, we define a syntactic criterion based on acyclicity that implies equality saturation termination. 
    more » « less
  3. Library learningcompresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input. We proposelibrary learning modulo theory(LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novele-graph anti-unificationtechnique to find common patterns in the corpus more directly and efficiently. We implemented LLMT in a tool named babble. Our evaluation shows that babble achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that babble learns reusable functions on inputs previously out of reach for library learning. 
    more » « less
  4. Rewrite rules are critical in equality saturation, an increasingly popular technique in optimizing compilers, synthesizers, and verifiers. Unfortunately, developing high-quality rulesets is difficult and error-prone. Recent work on automatically inferring rewrite rules does not scale to large terms or grammars, and existing rule inference tools are monolithic and opaque. Equality saturation users therefore struggle to guide inference and incrementally construct rulesets. As a result, most users still manually develop and maintain rulesets. This paper proposes Enumo, a new domain-specific language for programmable theory exploration. Enumo provides a small set of core operators that enable users to strategically guide rule inference and incrementally build rulesets. Short Enumo programs easily replicate results from state-of-the-art tools, but Enumo programs can also scale to infer deeper rules from larger grammars than prior approaches. Its composable operators even facilitate developing new strategies for ruleset inference. We introduce a new fast-forwarding strategy that does not require evaluating terms in the target language, and can thus support domains that were out of scope for prior work. We evaluate Enumo and fast-forwarding across a variety of domains. Compared to state-of-the-art techniques, enumo can synthesize better rulesets over a diverse set of domains, in some cases matching the effects of manually-developed rulesets in systems driven by equality saturation. 
    more » « less
  5. Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Developing rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and frustrate debugging. This paper explores how equality saturation, a promising technique that uses e-graphs toapplyrewrite rules, can also be used toinferrewrite rules. E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently shrinks both sets, leading to faster synthesis of smaller, more general rulesets. We prototyped these strategies in a tool dubbed Ruler. Compared to a similar tool built on CVC4, Ruler synthesizes 5.8× smaller rulesets 25× faster without compromising on proving power. In an end-to-end case study, we show Ruler-synthesized rules which perform as well as those crafted by domain experts, and addressed a longstanding issue in a popular open source tool. 
    more » « less