Verified compilation of open modules (i.e., modules whose functionality depends on other modules) provides a foundation for end-to-end verification of the modular programs ubiquitous in contemporary software. However, despite decades of intensive investigation, the proposed approaches remain difficult to use in practice because they rely on assumptions about the internal workings of compilers that prevent external users from applying the verification results. We propose an approach to verified compositional compilation without such assumptions, in the setting of compiling heterogeneous modules written in first-order languages supporting global memory and pointers. Our approach is based on the memory model of CompCert and a new discovery that a Kripke relation equipped with a notion of memory protection can serve as a uniform and composable semantic interface for compiler passes. By absorbing the rely-guarantee conditions on memory evolution for all compiler passes into this Kripke memory relation, and by piggybacking the requirements of compiler optimizations onto it, we obtain compositional correctness theorems for realistic optimizing compilers, stated as refinements that directly relate the native semantics of open modules and that are oblivious to the intermediate compilation steps. Such direct refinements support all the compositionality and adequacy properties essential for verified compilation of open modules. We have applied this approach to the full compilation chain of CompCert with its Clight source language, and demonstrated through end-to-end verification of non-trivial heterogeneous modules that may freely invoke each other (e.g., mutually recursively) that our compiler correctness theorem is open to composition and intuitive to use, with reduced verification complexity.
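To make the central abstraction concrete, the following OCaml sketch illustrates the shape of a Kripke memory relation: a set of worlds, an accessibility relation capturing rely-guarantee evolution, and a world-indexed relation on memories with protected regions. All names here are invented for illustration; the paper's actual Coq definitions are considerably richer.

```ocaml
(* A minimal sketch of a Kripke memory relation, assuming a toy
   memory type; hypothetical names throughout. *)

type mem = (int * int, int) Hashtbl.t    (* (block, offset) -> value *)

module type KRIPKE_MEM_REL = sig
  type world                                   (* Kripke worlds *)
  val accessible : world -> world -> bool      (* rely/guarantee world evolution *)
  val match_mem : world -> mem -> mem -> bool  (* memories related at a world *)
  val protected : world -> int * int -> bool   (* regions the context must not touch *)
end

(* Vertical composition of two such interfaces: worlds are paired and
   evolution/protection combine pointwise.  In the real development the
   intermediate memory is existentially quantified; this sketch
   approximates it by threading the source memory through. *)
module Compose (A : KRIPKE_MEM_REL) (B : KRIPKE_MEM_REL) :
  KRIPKE_MEM_REL with type world = A.world * B.world = struct
  type world = A.world * B.world
  let accessible (a, b) (a', b') = A.accessible a a' && B.accessible b b'
  let match_mem (a, b) src tgt =
    A.match_mem a src src && B.match_mem b src tgt
  let protected (a, b) loc = A.protected a loc || B.protected b loc
end
```

The point of the composition functor is that per-pass relations of this shape can be collapsed pairwise, which is what ultimately yields a single direct refinement between source and target semantics.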
An abstract stack based approach to verified compositional compilation to machine code
A key ingredient contributing to the success of CompCert, the state-of-the-art verified compiler for C, is its block-based memory model, which is used uniformly for all of its languages and their verified compilation. However, CompCert's memory model lacks an explicit notion of stack. Its target assembly language represents the runtime stack as an unbounded list of memory blocks, making further compilation of CompCert assembly into more realistic machine code difficult since it is not possible to merge these blocks into a finite and contiguous stack. Furthermore, various notions of verified compositional compilation rely on some mechanism for protecting private stack data while allowing modification of public stack-allocated data, a mechanism that is lacking in the original CompCert. These problems have been investigated but not fully addressed before, in the sense that some advanced optimization passes that significantly change the ways stack blocks are (de-)allocated, such as tailcall recognition and inlining, are often omitted. We propose a lightweight and complete solution to the above problems. It is based on the enrichment of CompCert's memory model with an abstract stack that keeps track of the history of stack frames to bound the stack consumption and that enforces a uniform stack access policy by assigning fine-grained permissions to stack memory. Using this enriched memory model for all the languages of CompCert, we are able to reprove the correctness of the full compilation chain of CompCert, including all the optimization passes. In the end, we get Stack-Aware CompCert, a complete extension of CompCert that enforces the finiteness of the stack and fine-grained stack permissions. Based on Stack-Aware CompCert, we develop CompCertMC, the first extension of CompCert that compiles into a low-level language with flat memory spaces. Based on CompCertMC, we develop Stack-Aware CompCertX, a complete extension of CompCert that supports a notion of compositional compilation that we call contextual compilation by exploiting the uniform stack access policy provided by the abstract stack.
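The following OCaml sketch models the two ingredients the abstract stack provides: a history of frames whose total size can be bounded (making the stack finite) and per-frame permissions enforcing a uniform access policy. The names, the bound, and the permission lattice are all assumptions made for illustration, not the paper's definitions.

```ocaml
(* A toy model of an abstract stack: a history of frames, each with a
   size and a permission map, so consumption can be bounded and every
   access checked.  All names and constants here are hypothetical. *)

type perm = Freeable | Writable | Readable | NoPerm

type frame = {
  frame_size : int;          (* bytes consumed by this frame *)
  perms : int -> perm;       (* offset -> permission within the frame *)
}

type astack = frame list     (* innermost frame first *)

let stack_limit = 8 * 1024 * 1024   (* assumed fixed stack bound *)

(* Total consumption recorded by the abstract stack. *)
let consumption (s : astack) =
  List.fold_left (fun acc f -> acc + f.frame_size) 0 s

(* Pushing a frame fails if it would exceed the finite stack,
   which is what lets the stack later be laid out contiguously. *)
let push_frame (f : frame) (s : astack) : astack option =
  if consumption s + f.frame_size <= stack_limit then Some (f :: s)
  else None

(* The uniform access policy: only suitably permitted offsets of the
   top frame may be written by the currently executing function. *)
let can_write (s : astack) (ofs : int) =
  match s with
  | top :: _ ->
      (match top.perms ofs with Freeable | Writable -> true | _ -> false)
  | [] -> false
```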
- PAR ID: 10601961
- Publisher / Repository: Association for Computing Machinery (ACM)
- Journal Name: Proceedings of the ACM on Programming Languages
- Volume: 3
- Issue: POPL
- ISSN: 2475-1421
- Pages: 1-30
- Sponsoring Org: National Science Foundation
More Like this
Memory models play an important role in verified compilation of imperative programming languages. A representative one is the block-based memory model of CompCert---the state-of-the-art verified C compiler. Despite its success, the abstraction over memory space provided by CompCert's memory model is still primitive and inflexible. In essence, it uses a fixed representation for identifying memory blocks in a global memory space and uses a globally shared state for distinguishing between used and unused blocks. Therefore, any reasoning about memory must work uniformly for the global memory; it is impossible to individually reason about different sub-regions of memory (i.e., the stack and global definitions). This not only incurs unnecessary complexity in compiler verification, but also poses significant difficulty for supporting verified compilation of open or concurrent programs which need to work with contextual memory, as manifested in many previous extensions of CompCert. To remove the above limitations, we propose an enhancement to the block-based memory model based on nominal techniques; we call it the nominal memory model. By adopting the key concepts of nominal techniques such as atomic names and supports to model the memory space, we are able to 1) generalize the representation of memory blocks to any types satisfying the properties of atomic names and 2) remove the global constraints for managing memory blocks, enabling flexible memory structures for open and concurrent programs. To demonstrate the effectiveness of the nominal memory model, we develop a series of extensions of CompCert based on it. These extensions show that the nominal memory model 1) supports a general framework for verified compilation of C programs, 2) enables intuitive reasoning about compiler transformations on partial memory, and 3) enables modular reasoning about programs working with contextual memory. We also demonstrate that these extensions require limited changes to the original CompCert, making the verification techniques based on the nominal memory model easy to adopt.
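The nominal generalization can be pictured as follows: block identifiers become any type of atomic names, and freshness is relative to a finite support rather than to one global "next block" counter. This OCaml sketch uses invented names and is only an approximation of the paper's Coq development.

```ocaml
(* A sketch of the nominal memory model idea (hypothetical names):
   blocks are atomic names, fresh relative to a finite support. *)

module type ATOM = sig
  type t
  val equal : t -> t -> bool
  (* The defining property of atomic names: given any finite support
     (set of used names), produce a name outside it. *)
  val fresh : t list -> t
end

(* The original CompCert choice is recovered as one instance:
   integers with "max + 1" freshness. *)
module IntAtom : ATOM with type t = int = struct
  type t = int
  let equal = Int.equal
  let fresh used = 1 + List.fold_left max 0 used
end

(* A memory parameterized by the block type; each sub-region (stack,
   globals, ...) can now manage its own support independently instead
   of sharing one global allocation state. *)
module Mem (A : ATOM) = struct
  type block = A.t
  type t = { support : block list; contents : block -> int -> int option }
  let alloc (m : t) : block * t =
    let b = A.fresh m.support in
    (b, { m with support = b :: m.support })
end
```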
Compositional compiler verification is a difficult problem that focuses on separate compilation of program components with possibly different verified compilers. Logical relations are widely used in proving correctness of program transformations in higher-order languages; however, they do not scale to compositional verification of multi-pass compilers due to their lack of transitivity. The only known technique to apply to compositional verification of multi-pass compilers for higher-order languages is parametric inter-language simulations (PILS), which is however significantly more complicated than traditional proof techniques for compiler correctness. In this paper, we present a novel verification framework for lightweight compositional compiler correctness. We demonstrate that by imposing the additional restriction that program components are compiled by pipelines that go through the same sequence of intermediate representations, logical relation proofs can be transitively composed in order to derive an end-to-end compositional specification for multi-pass compiler pipelines. Unlike traditional logical-relation frameworks, our framework supports divergence preservation—even when transformations reduce the number of program steps. We achieve this by parameterizing our logical relations with a pair of relational invariants. We apply this technique to verify a multi-pass, optimizing middle-end pipeline for CertiCoq, a compiler from Gallina (Coq’s specification language) to C. The pipeline optimizes and closure-converts an untyped functional intermediate language (ANF or CPS) to a subset of that language without nested functions, which can be easily code-generated to low-level languages. Notably, our pipeline performs more complex closure-allocation optimizations than the state of the art in verified compilation. Using our novel verification framework, we prove an end-to-end theorem for our pipeline that covers both termination and divergence and applies to whole-program and separate compilation, even when different modules are compiled with different optimizations. Our results are mechanized in the Coq proof assistant.
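The transitivity restriction can be sketched as follows: once both passes relate terms of the same intermediate representation, their logical relations are ordinary binary relations on that IR and compose through an explicit intermediate program. This OCaml sketch is a loose illustration with invented names; in particular, it takes the intermediate witness as a function because OCaml cannot express the existential quantification used in the actual proofs.

```ocaml
(* A toy shared IR: both passes consume and produce terms of [ir],
   which is what makes their relations composable. *)
type ir = Lit of int | Add of ir * ir

(* A logical relation parameterized by a pair of relational invariants,
   echoing the paper's framework (names are hypothetical). *)
type invariant = ir -> ir -> bool
type logrel = { pre : invariant; post : invariant;
                related : ir -> ir -> bool }

(* Transitive composition: source and target are related when some
   intermediate program mediates both passes.  The witness stands in
   for the existentially quantified intermediate program. *)
let compose (r1 : logrel) (r2 : logrel) ~(witness : ir -> ir) : logrel =
  { pre = (fun s t -> r1.pre s (witness s) && r2.pre (witness s) t);
    post = (fun s t -> r1.post s (witness s) && r2.post (witness s) t);
    related = (fun s t ->
      r1.related s (witness s) && r2.related (witness s) t) }
```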
Microcontrollers are the heart of embedded systems. Due to cost and power constraints, they do not have memory management units (MMUs) or even memory protection units (MPUs). As a result, embedded software faces two related challenges both concerned with the stack. First, in a multi-tasking environment, physical memory used by the stack is usually statically allocated per task. Second, a stack overflow is difficult to detect for lower-end microcontrollers without an MPU. In this work, we argue that segmented stacks, a notion investigated and subsequently dismissed for systems with virtual memory, can solve both challenges for embedded software. We show that many problems with segmented stacks vanish on embedded systems and present novel solutions to the rest. Importantly, we show that segmented stacks, combined with Rust, can guarantee memory safety without MMU or MPU. Moreover, segmented stacks allow memory to be dynamically allocated to per-task stacks and can improve memory efficiency when combined with proper scheduling.
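The bookkeeping behind segmented stacks is simple enough to sketch: the stack is a list of fixed-size segments drawn on demand from a shared pool, so per-task memory need not be reserved statically and overflow becomes an explicit, checkable event rather than silent corruption. The OCaml model below is an illustration of that idea only (the paper's implementation is in Rust on real microcontrollers); the sizes and names are assumptions.

```ocaml
(* A model of segmented-stack bookkeeping, hypothetical names. *)

type segment = { capacity : int; mutable used : int }

type seg_stack = {
  mutable segments : segment list;   (* current segment first *)
  mutable free_pool : int;           (* bytes left for new segments *)
  seg_size : int;                    (* fixed size of each segment *)
}

exception Stack_overflow_detected

(* Function prologue: ensure [need] bytes are available, growing into
   a fresh segment when the current one is exhausted.  Overflow is
   detected in software, with no MPU required. *)
let ensure (s : seg_stack) (need : int) =
  match s.segments with
  | seg :: _ when seg.used + need <= seg.capacity ->
      seg.used <- seg.used + need
  | _ ->
      if s.free_pool < s.seg_size || need > s.seg_size then
        raise Stack_overflow_detected
      else begin
        s.free_pool <- s.free_pool - s.seg_size;
        s.segments <-
          { capacity = s.seg_size; used = need } :: s.segments
      end
```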
In recent years, the MLIR framework has had explosive growth due to the need for extensible deep learning compilers for hardware accelerators. Such examples include Triton, CIRCT, and ONNX-MLIR. MLIR compilers introduce significant complexities in localizing bugs or inefficiencies because of their layered optimization and transformation process with compilation passes. While existing delta debugging techniques can be used to identify a minimum subset of IR code that reproduces a given bug symptom, their naive application to MLIR is time-consuming because real-world MLIR compilers usually involve a large number of compilation passes. Compiler developers must identify a minimized set of relevant compilation passes to reduce the footprint of MLIR compiler code to be inspected for a bug fix. We propose DuoReduce, a dual-dimensional reduction approach for MLIR bug localization. DuoReduce leverages three key ideas in tandem to design an efficient MLIR delta debugger. First, DuoReduce reduces compiler passes that are irrelevant to the bug by identifying ordering dependencies among the different compilation passes. Second, DuoReduce uses MLIR-semantics-aware transformations to expedite IR code reduction. Finally, DuoReduce leverages cross-dependence between the IR code dimension and the compilation pass dimension by accounting for which IR code segments are related to which compilation passes to reduce unused passes. Experiments with three large-scale MLIR compiler projects find that DuoReduce outperforms syntax-aware reducers such as Perses and Vulcan in terms of IR code reduction by 31.6% and 21.5% respectively. If one uses these reducers by enumerating all possible compilation passes (on average 18 passes), it could take up to 145 hours. By identifying ordering dependencies among compilation passes, DuoReduce reduces this time to 9.5 minutes. By identifying which compilation passes are unused for compiling reduced IR code, DuoReduce reduces the number of passes by 14.6%. This translates to not needing to examine 281 lines of MLIR compiler code on average to fix the bugs. DuoReduce has the potential to significantly reduce debugging effort in MLIR compilers, which serves as the foundation for the current landscape of machine learning and hardware accelerators.
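The dual-dimensional idea can be expressed as an alternating fixed-point loop: shrink the pass list (by removal only, so ordering dependencies stay intact), then shrink the IR under the reduced pipeline, and repeat until neither dimension changes. The OCaml schematic below is a generic illustration of that loop, not DuoReduce's implementation; `reproduces` and `reduce_ir` are assumed interfaces standing in for running the compiler and for the semantics-aware IR reducer.

```ocaml
(* Schematic dual-dimensional reduction.  [reproduces passes ir] is
   assumed to run the pipeline and report whether the bug symptom
   still appears; [reduce_ir] is an assumed IR-dimension reducer. *)

let rec minimize_passes reproduces passes ir =
  (* Drop one pass at a time, never reordering, so ordering
     dependencies among the remaining passes are preserved. *)
  let try_drop kept p =
    let candidate = List.filter (fun q -> q <> p) kept in
    if reproduces candidate ir then candidate else kept
  in
  let reduced = List.fold_left try_drop passes passes in
  if List.length reduced < List.length passes
  then minimize_passes reproduces reduced ir
  else reduced

let dual_reduce reproduces ~reduce_ir passes ir =
  (* Alternate between dimensions until a fixed point: fewer passes
     enable more IR reduction, and smaller IR leaves passes unused. *)
  let rec loop passes ir =
    let passes' = minimize_passes reproduces passes ir in
    let ir' = reduce_ir (fun ir -> reproduces passes' ir) ir in
    if passes' = passes && ir' = ir then (passes, ir)
    else loop passes' ir'
  in
  loop passes ir
```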