NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

CCLemma: E-Graph Guided Lemma Discovery for Inductive Equational Proofs

https://doi.org/10.1145/3674653

Kurashige, Cole; Ji, Ruyi; Giridharan, Aditya; Barbone, Mark; Noor, Daniel; Itzhaky, Shachar; Jhala, Ranjit; Polikarpova, Nadia (August 2024, Proceedings of the ACM on Programming Languages)

The problem of automatically proving the equality of terms over recursive functions and inductive data types is challenging, as such proofs often require auxiliary lemmas which must themselves be proven. Previous attempts at lemma discovery compromise on either efficiency or efficacy.Goal-directedapproaches are fast but limited in expressiveness, as they can only discover auxiliary lemmas which entail their goals.Theory explorationapproaches are expressive but inefficient, as they exhaustively enumerate candidate lemmas. We introducee-graph guided lemma discovery, a new approach to finding equational proofs that makes theory exploration goal-directed. We accomplish this by using e-graphs and equality saturation to efficiently construct and compactly represent the space ofallgoal-oriented proofs. This allows us to explore only those auxiliary lemmasguaranteedto help make progress on some of these proofs. We implemented our method in a new prover called CCLemma and compared it with three state-of-the-art provers across a variety of benchmarks. CCLemma performs consistently well on two standard benchmarks and additionally solves 50% more problems than the next best tool on a new challenging set.
more » « less
Full Text Available
Superfusion: Eliminating Intermediate Data Structures via Inductive Synthesis

https://doi.org/10.1145/3656415

Ji, Ruyi; Zhao, Yuwei; Polikarpova, Nadia; Xiong, Yingfei; Hu, Zhenjiang (June 2024, Proceedings of the ACM on Programming Languages)

Intermediate data structures are a common cause of inefficiency in functional programming.Fusionattempts to eliminate intermediate data structures by combining adjacent data traversals into one; existing fusion techniques, however, are based on predefined rewrite rules and hence are limited in expressiveness. In this work we explore a different approach to eliminating intermediate data structures, based on inductive program synthesis. We dub this approachsuperfusion(by analogy withsuperoptimization, which uses inductive synthesis for program optimization). Starting from a reference program annotated with data structures to be eliminated, superfusion first generates asketchwhere program fragments operating on those data structures are replaced with holes; it then fills the holes with constant-time expressions such that the resulting program is equivalent to the reference. The main technical challenge here is scalability because optimized programs are often complex, making the search space intractably large for naive enumeration. To address this challenge, our key insight is to first synthesize aghost functionthat describes the relationship between the original intermediate data structure and its compressed version; this function, although not used in the final program, serves to decompose the joint sketch filling problem into independent simpler problems for each hole. We implement superfusion in a tool calledSuFuand evaluate it on a dataset of 290 tasks collected from prior work on deductive fusion and program restructuring. The results show that SuFu solves 264 out of 290 tasks, exceeding the capabilities of rewriting-based fusion systems and achieving comparable performance with specialized approaches to program restructuring on their respective domains.
more » « less
babble: Learning Better Abstractions with E-Graphs and Anti-unification

https://doi.org/10.1145/3571207

Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (January 2023, Proceedings of the ACM on Programming Languages)

Library learningcompresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input. We proposelibrary learning modulo theory(LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novele-graph anti-unificationtechnique to find common patterns in the corpus more directly and efficiently. We implemented LLMT in a tool named babble. Our evaluation shows that babble achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that babble learns reusable functions on inputs previously out of reach for library learning.
more » « less
Searching entangled program spaces

https://doi.org/10.1145/3547622

Koppel, James; Guo, Zheng; de_Vries, Edsko; Solar-Lezama, Armando; Polikarpova, Nadia (August 2022, Proceedings of the ACM on Programming Languages)

Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures—version-space algebras, finite tree automata, or e-graphs—to compactly represent such spaces. At their core, all these data structures exploit independence of subterms; as a result, they cannot efficiently represent more complex program spaces, where the choices of subterms are entangled. We introduceequality-constrained tree automata(ECTAs), a new data structure, designed to compactly represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, ecta. Using the ecta library, we construct Hectare, a type-driven program synthesizer for Haskell. Hectare significantly outperforms a state-of-the-art synthesizer Hoogle+—providing an average speedup of 8×—despite its implementation being an order of magnitude smaller.
more » « less
Type-Directed Program Synthesis for RESTful APIs

https://doi.org/10.1145/3519939.3523450

Guo, Zheng; Cao, David; Tjong, Davin; Yang, Jean; Schlesinger, Cole; Polikarpova, Nadia (January 2022, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation)

With the rise of software-as-a-service and microservice architectures, RESTful APIs are now ubiquitous in mobile and web applications. A service can have tens or hundreds of API methods, making it a challenge for programmers to find the right combination of methods to solve their task. We present APIphany, a component-based synthesizer for programs that compose calls to RESTful APIs. The main innovation behind APIphany is the use of precise semantic types, both to specify user intent and to direct the search. APIphany contributes three novel mechanisms to overcome challenges in adapting component-based synthesis to the REST domain: (1) a type inference algorithm for augmenting REST specifications with semantic types; (2) an efficient synthesis technique for “wrangling” semi-structured data, which is commonly required in working with RESTful APIs; and (3) a new form of simulated execution to avoid executing APIs calls during synthesis. We evaluate APIphany on three real-world APIs and 32 tasks extracted from GitHub repositories and StackOverflow. In our experiments, APIphany found correct solutions to 29 tasks, with 23 of them reported among top ten synthesis results.
more » « less
Full Text Available
Storm: Refinement Types for Secure Web Applications

Lehmann, Nico; Kunkel, Rose; Brown, Jordan; Yang, Jean; Vazou, Niki; Polikarpova, Nadia; Stefan, Deian; Jhala, Ranjit (January 2021, USENIX Symposium on Operating Systems Design and Implementation)

We present STORM, a web framework that allows developers to build MVC applications with compile-time enforcement of centrally specified data-dependent security policies. STORM ensures security using a Security Typed ORM that refines the (type) abstractions of each layer of the MVC API with logical assertions that describe the data produced and consumed by the underlying operation and the users allowed access to that data. To evaluate the security guarantees of STORM, we build a formally verified reference implementation using the Labeled IO (LIO) IFC framework. We present case studies and end-to- end applications that show how STORM lets developers specify diverse policies while centralizing the trusted code to under 1% of the application, and statically enforces security with modest type annotation overhead, and no run-time cost.
more » « less
Full Text Available
Digging for fold: synthesis-aided API discovery for Haskell

https://doi.org/10.1145/3428273

James, Michael B.; Guo, Zheng; Wang, Ziteng; Doshi, Shivani; Peleg, Hila; Jhala, Ranjit; Polikarpova, Nadia (November 2020, Proceedings of the ACM on Programming Languages)
null (Ed.)
Full Text Available
Liquid information flow control

https://doi.org/10.1145/3408987

Polikarpova, Nadia; Stefan, Deian; Yang, Jean; Itzhaky, Shachar; Hance, Travis; Solar-Lezama, Armando (August 2020, Proceedings of the ACM on Programming Languages)

We present Lifty, a domain-specific language for data-centric applications that manipulate sensitive data. A Lifty programmer annotates the sources of sensitive data with declarative security policies, and the language statically and automatically verifies that the application handles the data according to the policies. Moreover, if verification fails, Lifty suggests a provably correct repair, thereby easing the programmer burden of implementing policy enforcing code throughout the application. The main insight behind Lifty is to encode information flow control using liquid types, an expressive yet decidable type system. Liquid types enable fully automatic checking of complex, data dependent policies, and power our repair mechanism via type-driven error localization and patch synthesis. Our experience using Lifty to implement three case studies from the literature shows that (1) the Lifty policy language is sufficiently expressive to specify many real-world policies, (2) the Lifty type checker is able to verify secure programs and find leaks in insecure programs quickly, and (3) even if the programmer leaves out all policy enforcing code, the Lifty repair engine is able to patch all leaks automatically within a reasonable time.
more » « less
Program synthesis by type-guided abstraction refinement

https://doi.org/10.1145/3371080

Guo, Zheng; James, Michael; Justo, David; Zhou, Jiaxiao; Wang, Ziteng; Jhala, Ranjit; Polikarpova, Nadia (December 2019, Proceedings of the ACM on Programming Languages)

We consider the problem of type-directed component-based synthesis where, given a set of (typed) components and a querytype, the goal is to synthesize atermthat inhabits the query. Classical approaches based on proof search in intuitionistic logics do not scale up to the standard libraries of modern languages, which span hundreds or thousands of components. Recent graph reachability based methods proposed for Java do scale, but only apply to monomorphic data and components: polymorphic data and components infinitely explode the size of the graph that must be searched, rendering synthesis intractable. We introducetype-guided abstraction refinement(TYGAR), a new approach for scalable type-directed synthesis over polymorphic datatypes and components. Our key insight is that we can overcome the explosion by building a graph overabstract typeswhich represent a potentially unbounded set of concrete types. We show how to use graph reachability to search for candidate terms over abstract types, and introduce a new algorithm that usesproofs of untypeabilityof ill-typed candidates to iterativelyrefinethe abstraction until a well-typed result is found. We have implemented TYGAR in H+, a tool that takes as input a set of Haskell libraries and a query type, and returns a Haskell term that uses functions from the provided libraries to implement the query type. Our support for polymorphism allows H+ to work with higher-order functions and type classes, and enables more precise queries due to parametricity. We have evaluated H+ on 44 queries using a set of popular Haskell libraries with a total of 291 components. H+ returns an interesting solution within the first five results for 32 out of 44 queries. Our results show that TYGAR allows H+ to rapidly return well-typed terms, with the median time to first solution of just 1.4 seconds. Moreover, we observe that gains from iterative refinement over exhaustive enumeration are more pronounced on harder queries.
more » « less

Search for: All records