Lazy evaluation is a powerful tool for functional programmers. It enables the concise expression of on-demand computation and a form of compositionality not available under other evaluation strategies. However, the stateful nature of lazy evaluation makes it hard to analyze a program's computational cost, either informally or formally. In this work, we present a novel and simple framework for formally reasoning about lazy computation costs based on a recent model of lazy evaluation: clairvoyant call-by-value. The key feature of our framework is its simplicity, as expressed by our definition of the clairvoyance monad. This monad is both simple to define (around 20 lines of Coq) and simple to reason about. We show that this monad can be effectively used to mechanically reason about the computational cost of lazy functional programs written in Coq.
more »
« less
Story of Your Lazy Function’s Life: A Bidirectional Demand Semantics for Mechanized Cost Analysis of Lazy Programs
Lazy evaluation is a powerful tool that enables better compositionality and potentially better performance in functional programming, but it is challenging to analyze its computation cost. Existing works either require manually annotating sharing, or rely on separation logic to reason about heaps of mutable cells. In this paper, we propose a bidirectional demand semantics that allows for extrinsic reasoning about the computation cost of lazy programs without relying on special program logics. To show the effectiveness of our approach, we apply the demand semantics to a variety of case studies including insertion sort, selection sort, Okasaki's banker's queue, and the implicit queue. We formally prove that the banker's queue and the implicit queue are both amortized and persistent using the Rocq Prover (formerly known as Coq). We also propose the reverse physicist's method, a novel variant of the classical physicist's method, which enables mechanized, modular and compositional reasoning about amortization and persistence with the demand semantics.
more »
« less
- Award ID(s):
- 2006535
- PAR ID:
- 10539211
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 8
- Issue:
- ICFP
- ISSN:
- 2475-1421
- Page Range / eLocation ID:
- 30 to 63
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Deterministic multithreading (DMT) fundamentally requires total, deterministic ordering of synchronization operations on each synchronization variable, i.e. a partial ordering over all synchronization operations. In practice, prior DMT systems totally order all synchronization operations, regardless of synchronization variable; the result is severe performance degradation for highly concurrent applications using fine-grained synchronization. Motivated by this class of programs, we propose lazy determinism as a way to go beyond this total order bottleneck. Lazy determinism executes synchronization operations speculatively, and enforces determinism by subsequently validating the resulting order of operations. If an ordering violation is detected, part of the computation is restarted. By enforcing only the partial ordering required to guarantee determinism, lazy determinism increases the available parallelism during deterministic execution. We implement LazyDet via a pure-software runtime system accelerated by custom Linux kernel support. Our experiments with hash table benchmarks from Synchrobench show roughly an order of magnitude improvement in the performance of lock-based data structures compared to the state of the art in eager determinism. For benchmarks from PARSEC-2, SPLASH-2, and Phoenix, we demonstrate runtime improvements of up to 2× on the programs that challenge deterministic execution environments the most.more » « less
-
Many data processing systems allow SQL queries that calluser-defined functions (UDFs)written in conventional programming languages. While such SQL extensions provide convenience and flexibility to users, queries involving UDFs are not as efficient as their pure SQL counterparts that invoke SQL’s highly-optimized built-in functions. Motivated by this problem, we propose a new technique for translating SQL queries with UDFs to pure SQL expressions. Unlike prior work in this space, our method is not based on syntactic rewrite rules and can handle a much more general class of UDFs. At a high-level, our method is based on counterexample-guided inductive synthesis (CEGIS) but employs a novel compositional strategy that decomposes the synthesis task into simpler sub-problems. However, because there is no universal decomposition strategy that works for all UDFs, we propose a novellazy inductive synthesisapproach that generates a sequence of decompositions that correspond to increasingly harder inductive synthesis problems. Because most realistic UDF-to-SQL translation tasks are amenable to a fine-grained decomposition strategy, our lazy inductive synthesis method scales significantly better than traditional CEGIS. We have implemented our proposed technique in a tool called CLIS for optimizing Spark SQL programs containing Scala UDFs. To evaluate CLIS, we manually study 100 randomly selected UDFs and find that 63 of them can be expressed in pure SQL. Our evaluation on these 63 UDFs shows that CLIS can automatically synthesize equivalent SQL expressions in 92% of the cases and that it can solve 2.4× more benchmarks compared to a baseline that does not use our compositional approach. We also show that CLIS yields an average speed-up of 3.5× for individual UDFs and 1.3× to 3.1× in terms of end-to-end application performance.more » « less
-
Blanchette, Jasmin; Kovács, Laura; Pattinson, Dirk (Ed.)Dynamic arrays, also referred to as vectors, are fundamental data structures used in many programs. Modeling their semantics efficiently is crucial when reasoning about such programs. The theory of arrays is widely supported but is not ideal, because the number of elements is fixed (determined by its index sort) and cannot be adjusted, which is a problem, given that the length of vectors often plays an important role when reasoning about vector programs. In this paper, we propose reasoning about vectors using a theory of sequences. We introduce the theory, propose a basic calculus adapted from one for the theory of strings, and extend it to efficiently handle common vector operations. We prove that our calculus is sound and show how to construct a model when it terminates with a saturated configuration. Finally, we describe an implementation of the calculus in cvc5 and demonstrate its efficacy by evaluating it on verification conditions for smart contracts and benchmarks derived from existing array benchmarks.more » « less
-
Blanchette, Jasmin; Kovacs, Laura; Pattinson, Dirk (Ed.)Dynamic arrays, also referred to as vectors, are fundamental data structures used in many programs. Modeling their semantics efficiently is crucial when reasoning about such programs. The theory of arrays is widely supported but is not ideal, because the number of elements is fixed (determined by its index sort) and cannot be adjusted, which is a problem, given that the length of vectors often plays an important role when reasoning about vector programs. In this paper, we propose reasoning about vectors using a theory of sequences. We introduce the theory, propose a basic calculus adapted from one for the theory of strings, and extend it to efficiently handle common vector operations. We prove that our calculus is sound and show how to construct a model when it terminates with a saturated configuration. Finally, we describe an implementation of the calculus in cvc5 and demonstrate its efficacy by evaluating it on verification conditions for smart contracts and benchmarks derived from existing array benchmarks.more » « less
An official website of the United States government

