Title: Analyzing binding extent in 3CPS
To date, the most effective approach to compiling strict, higher-order functional languages (such as OCaml, Scheme, and SML) has been to use whole-program techniques to convert the program to a first-order monomorphic representation that can be optimized using traditional compilation techniques. This approach, popularized by MLton, has limitations, however. We are interested in exploring a different approach to compiling such languages, one that preserves the higher-order and polymorphic character of the program throughout optimization. To enable such an approach, we need effective analyses that both provide precise information about higher-order programs and scale to larger units of compilation. This paper describes one such analysis, which determines the extent of variable bindings. We classify the extent of a variable as register (only one binding instance can be live at any time), stack (the lifetimes of binding instances obey a LIFO order), or heap (binding lifetimes are arbitrary). These extents naturally connect variables to the machine resources required to represent them. We believe that precise information about binding extents will enable efficient management of environments, which is a key problem in the efficient compilation of higher-order programs. At the core of the paper is 3CPS, a factored CPS-based intermediate representation (IR) that statically marks variables to indicate their binding extent. We formally specify the management of this binding structure by means of a small-step operational semantics and define a static analysis that determines the extents of the variables in a program. We evaluate our analysis using a standard suite of SML benchmark programs. Our implementation achieves a surprisingly high yield and exhibits scalable performance. While this paper uses a CPS-based IR, the algorithm and results are easily transferable to other λ-calculus IRs, such as ANF.
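The three extent classes are defined in terms of the lifetimes of binding instances. As a rough illustration only, the Python sketch below checks those definitions against a single, hypothetical execution trace of bind/dead events. It is not the paper's analysis: that analysis is static, works on 3CPS programs, and must account for every possible execution rather than one observed run. The trace format and the names `Extent`, `classify_extents`, and `EXAMPLE_TRACE` are assumptions made for this example.

```python
from enum import Enum


class Extent(Enum):
    """Extent classes, ordered from most to least restrictive."""
    REGISTER = 1  # at most one binding instance is live at any time
    STACK = 2     # binding instances die in LIFO order
    HEAP = 3      # no ordering discipline on lifetimes


def classify_extents(trace):
    """Classify each variable from a trace of (event, var, instance) tuples.

    `event` is "bind" when a new binding instance of `var` becomes live and
    "dead" when that instance's lifetime ends. The trace is assumed to be
    well formed: every "dead" refers to a currently live instance.
    """
    live = {}    # var -> live instance ids, oldest first
    extent = {}  # var -> least restrictive extent observed so far
    for event, var, inst in trace:
        stack = live.setdefault(var, [])
        extent.setdefault(var, Extent.REGISTER)
        if event == "bind":
            if stack and extent[var] is Extent.REGISTER:
                # Two instances are live at once: register extent is ruled out.
                extent[var] = Extent.STACK
            stack.append(inst)
        elif event == "dead":
            if stack and stack[-1] == inst:
                stack.pop()  # youngest instance dies first: LIFO holds
            else:
                # An older instance dies while a younger one is still live.
                extent[var] = Extent.HEAP
                stack.remove(inst)
        else:
            raise ValueError(f"unknown event {event!r}")
    return extent


# Hypothetical trace: x's instances never overlap, k's are nested,
# and f's first instance dies before its second.
EXAMPLE_TRACE = [
    ("bind", "x", 1), ("dead", "x", 1), ("bind", "x", 2), ("dead", "x", 2),
    ("bind", "k", 1), ("bind", "k", 2), ("dead", "k", 2), ("dead", "k", 1),
    ("bind", "f", 1), ("bind", "f", 2), ("dead", "f", 1), ("dead", "f", 2),
]

if __name__ == "__main__":
    for var, ext in classify_extents(EXAMPLE_TRACE).items():
        print(var, ext.name)  # prints: x REGISTER, k STACK, f HEAP
```

Keeping a per-variable stack of live instances makes both checks cheap: a bind while the stack is non-empty rules out register extent, and a death that is not at the top of the stack rules out stack extent. A static analysis has to establish the same properties for all executions of a program, which is the job of the analysis the abstract describes.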
Award ID(s): 2212538, 2212537
NSF-PAR ID: 10415464
Author(s) / Creator(s):
Date Published:
Journal Name: Proceedings of the ACM on Programming Languages
Volume: 6
Issue: ICFP
ISSN: 2475-1421
Page Range / eLocation ID: 650 to 678
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We describe the design of 3CPS, a compiler intermediate representation (IR) we have developed for use in compiling call-by-value functional languages such as SML, OCaml, Scheme, and Lisp. The language is a low-level form designed in tandem with a matching suite of static analyses. It reflects our belief that the core task of an optimising compiler for a functional language is to reason about the environment structure of the program. Our IR is distinguished by the presence of extent annotations, added to all variables (and verified by static analysis). These annotations are defined in terms of the semantics of the IR, but they directly tell the compiler what machine resources are needed to implement the environment structure of each annotated variable. 
  2. Graph-based intermediate representations (IRs) are widely used for powerful compiler optimizations, either interprocedurally in pure functional languages, or intraprocedurally in imperative languages. Yet so far, no suitable graph IR exists for aggressive global optimizations in languages with both effects and higher-order functions: aliasing and indirect control transfers make it difficult to maintain sufficiently granular dependency information for optimizations to be effective. To close this long-standing gap, we propose a novel typed graph IR combining a notion of reachability types with an expressive effect system to compute precise and granular effect dependencies at an affordable cost while supporting local reasoning and separate compilation. Our high-level graph IR imposes lexical structure to represent structured control flow and nesting, enabling aggressive and yet inexpensive code motion and other optimizations for impure higher-order programs. We formalize the new graph IR based on a λ-calculus with a reachability type-and-effect system along with a specification of various optimizations. We present performance case studies for tensor loop fusion, CUDA kernel fusion, symbolic execution of LLVM IR, and SQL query compilation in the Scala LMS compiler framework using the new graph IR. We observe significant speedups of up to 21x.
  3. Compositional compiler verification is a difficult problem that focuses on separate compilation of program components with possibly different verified compilers. Logical relations are widely used in proving correctness of program transformations in higher-order languages; however, they do not scale to compositional verification of multi-pass compilers due to their lack of transitivity. The only known technique applicable to compositional verification of multi-pass compilers for higher-order languages is parametric inter-language simulations (PILS), which is, however, significantly more complicated than traditional proof techniques for compiler correctness. In this paper, we present a novel verification framework for lightweight compositional compiler correctness. We demonstrate that by imposing the additional restriction that program components are compiled by pipelines that go through the same sequence of intermediate representations, logical relation proofs can be transitively composed in order to derive an end-to-end compositional specification for multi-pass compiler pipelines. Unlike traditional logical-relation frameworks, our framework supports divergence preservation, even when transformations reduce the number of program steps. We achieve this by parameterizing our logical relations with a pair of relational invariants. We apply this technique to verify a multi-pass, optimizing middle-end pipeline for CertiCoq, a compiler from Gallina (Coq’s specification language) to C. The pipeline optimizes and closure-converts an untyped functional intermediate language (ANF or CPS) to a subset of that language without nested functions, which can be easily code-generated to low-level languages. Notably, our pipeline performs more complex closure-allocation optimizations than the state of the art in verified compilation.
Using our novel verification framework, we prove an end-to-end theorem for our pipeline that covers both termination and divergence and applies to whole-program and separate compilation, even when different modules are compiled with different optimizations. Our results are mechanized in the Coq proof assistant. 
  4. Garbage collection (GC) support for unmanaged languages can reduce the programming burden of reasoning about the liveness of dynamic objects. It also avoids temporal memory-safety violations and memory leaks. Sound GC for weakly typed languages such as C/C++, however, remains an unsolved problem. Current value-based GC solutions examine the values of memory locations to discover pointers and the objects they point to. The approach is inherently unsound in the presence of arbitrary type casts and pointer manipulations, which are legal in C/C++. Such language features are regularly used, especially in low-level systems code.

    In this paper, we propose Dynamic Pointer Provenance Tracking to realize sound GC. We observe that pointers cannot be created out of thin air; they must have provenance to at least one valid allocation. Therefore, by tracking pointer provenance from the source (e.g., malloc) through both explicit data flow and implicit control flow, our GC has sound and precise information to compute the set of all reachable objects at any program state. We discuss several static analysis optimizations that can be employed during compilation, aided by profiling, to significantly reduce the overhead of dynamic provenance tracking from nearly 8× to 16% for well-behaved programs that adhere to the C standard. Pointer-provenance-based sound GC invocation is also 13% faster and reclaims 6% more memory on average than an unsound value-based GC.
  5. The type-theoretic notions of existential abstraction, subtyping, subsumption, and intersection have useful analogues in separation-logic proofs of imperative programs. We have implemented these as an enhancement of the verified software toolchain (VST). VST is an impredicative concurrent separation logic for the C language, implemented in the Coq proof assistant, and proved sound in Coq. For machine-checked functional-correctness verification of software at scale, VST embeds its expressive program logic in dependently typed higher-order logic (CiC). Specifications and proofs in the program logic can leverage the expressiveness of CiC, so users can overcome the abstraction gaps that stand in the way of top-to-bottom verification: gaps between source code verification, compilation, and domain-specific reasoning, and between different analysis techniques or formalisms. Until now, VST has supported the specification of a program as a flat collection of function specifications (in higher-order separation logic): one proves that each function correctly implements its specification, assuming the specifications of the functions it calls. But what if a function has more than one specification? In this work, we exploit type-theoretic concepts to structure specification interfaces for C code. This brings modularity principles of modern software engineering to concrete program verification. Previous work used representation predicates to enable data abstraction in separation logic. We go further, introducing function-specification subsumption and intersection specifications to organize the multiple specifications that a function is typically associated with. As in type theory, if 𝜙 is a subtype of 𝜓, that is 𝜙<:𝜓, then 𝑥:𝜙 implies 𝑥:𝜓, meaning that any function satisfying specification 𝜙 can be used wherever a function satisfying 𝜓 is demanded.
Subsumption incorporates separation-logic framing and parameter adaptation, as well as step-indexing and specifications constructed via mixed-variance functors (needed for C’s function pointers). 