<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Compositional optimizations for CertiCoq</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>08/22/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10322111</idno>
					<idno type="doi">10.1145/3473591</idno>
					<title level='j'>Proceedings of the ACM on Programming Languages</title>
<idno>2475-1421</idno>
<biblScope unit="volume">5</biblScope>
<biblScope unit="issue">ICFP</biblScope>					

					<author>Zoe Paraskevopoulou</author><author>John M. Li</author><author>Andrew W. Appel</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Compositional compiler verification is a difficult problem that focuses on separate compilation of program components with possibly different verified compilers. Logical relations are widely used in proving correctness of program transformations in higher-order languages; however, they do not scale to compositional verification of multi-pass compilers due to their lack of transitivity. The only known technique to apply to compositional verification of multi-pass compilers for higher-order languages is parametric inter-language simulations (PILS), which is however significantly more complicated than traditional proof techniques for compiler correctness. In this paper, we present a novel verification framework for lightweight compositional compiler correctness. We demonstrate that by imposing the additional restriction that program components are compiled by pipelines that go through the same sequence of intermediate representations, logical relation proofs can be transitively composed in order to derive an end-to-end compositional specification for multi-pass compiler pipelines. Unlike traditional logical-relation frameworks, our framework supports divergence preservation—even when transformations reduce the number of program steps. We achieve this by parameterizing our logical relations with a pair of relational invariants. We apply this technique to verify a multi-pass, optimizing middle-end pipeline for CertiCoq, a compiler from Gallina (Coq’s specification language) to C. The pipeline optimizes and closure-converts an untyped functional intermediate language (ANF or CPS) to a subset of that language without nested functions, which can be easily code-generated to low-level languages. Notably, our pipeline performs more complex closure-allocation optimizations than the state of the art in verified compilation.
Using our novel verification framework, we prove an end-to-end theorem for our pipeline that covers both termination and divergence and applies to whole-program and separate compilation, even when different modules are compiled with different optimizations. Our results are mechanized in the Coq proof assistant.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The state of the art in compiler verification is devising proof techniques that support not only whole-program compilation but also separate compilation of program components with the same or different compilers. The specification of a compiler is typically expressed as a relation between a source and a target program, R(e src , e trg ). For R to be a meaningful compiler specification, it must be an adequate relation: it must imply that the target program refines the behavior of the source program. A second challenge is preserving divergence; we address it by parameterizing our logical relation with a relational invariant, a binary relation that relates the fuel consumption of the source and target programs. We then show that for an appropriate class of relational invariants the logical relation implies that divergence is preserved.</p><p>A similar approach is used in the verified CakeML compiler. CakeML's forward simulations establish that the fuel consumption of the source never exceeds that of the target; this implies divergence preservation. But this condition is not satisfied by transformations that reduce the number of program steps. Therefore, CakeML introduces (in each intermediate language) a Tick instruction that has no computational meaning other than consuming fuel. Transformations insert Tick instructions as needed to keep the fuel consumption of the target equal to that of the source program. This approach requires a Tick-removal pass that must be proved correct with a backwards simulation. In our framework, we show that a more general relation between fuel consumptions suffices to show divergence preservation and hence we do not need to add a Tick instruction to our language.</p><p>We apply this framework to verify the multi-pass optimizing middle-end pipeline of CertiCoq <ref type="bibr">[Anand et al. 2017]</ref>. CertiCoq is a compiler from Gallina (Coq's specification language) to machine language. 
Its front end uses MetaCoq to reify Coq functions and erase types with a proved-sound transformation <ref type="bibr">[Sozeau et al. 2019]</ref>. Constructor applications are then converted to &#951;-long form. The program is CPS-converted <ref type="bibr">[Paraskevopoulou and Grover 2021]</ref> or ANF-converted into the intermediate language &#955; ANF , a higher-order, functional intermediate representation in A-normal form <ref type="bibr">[Flanagan et al. 1993]</ref>. CertiCoq's optimizer and closure-converter translates to &#955; C ANF , a closureless, flat-scope subset of &#955; ANF . Lastly, there is a proved-correct translation <ref type="bibr">[Savary B&#233;langer et al. 2019]</ref> to CompCert Clight <ref type="bibr">[Leroy 2009a]</ref>. Any C compiler can be used as the back end, but only CompCert can yield an end-to-end correctness guarantee.</p><p>Our &#955; ANF pipeline follows a compilation-by-transformation approach <ref type="bibr">[Kelsey and Hudak 1989;</ref><ref type="bibr">Fradet and Le M&#233;tayer 1991;</ref><ref type="bibr">Jones and Santos 1995]</ref>: through the interaction of small and modular transformations it optimizes the code and gradually compiles away language features that cannot be mapped directly to the target language. The pipeline comprises 7 distinct transformations: shrink reduction <ref type="bibr">[Appel and Jim 1997a;</ref><ref type="bibr">Benton et al. 2005]</ref>, inlining, uncurrying, lambda lifting, closure conversion, hoisting, and dead-parameter elimination. Similar transformations have been implemented and described in other compilers <ref type="bibr">[Adams et al. 1986;</ref><ref type="bibr">Steele 1978;</ref><ref type="bibr">Appel 1992;</ref><ref type="bibr">Jones 1996;</ref><ref type="bibr">Kennedy 2007;</ref><ref type="bibr">Leroy et al. 2020]</ref>, where it is observed that simplifications performed by one phase can &#8220;cascade&#8221;, enabling more simplifications by the same or other phases. 
Through the interaction of these passes our pipeline achieves efficient closure allocation strategies, avoiding useless heap allocation of function closures. To our knowledge, CertiCoq's closure allocation strategies are the most advanced among formally verified compilers. Our preliminary experimental results show that CertiCoq with our &#955; ANF pipeline runs at native-code speed and that the performance of the code is improved by our closure allocation optimizations. Our verification framework is particularly suitable for the design of the pipeline. New transformations can be easily added (and proved correct with the same machinery) and old transformations can be reordered without any effect on the compositional top-level theorem. Our separate compilation theorem applies to programs compiled with different versions of the pipeline.</p><p>Contributions. In summary, we present:</p><p>&#8226; A novel lightweight technique for compositional compiler correctness. The technique is compatible with standard logical relations and supports compositional verification of different pipelines that go through the same representation changes.</p><p>&#8226; A novel method for proving divergence preservation using logical relations, compatible with optimizations that reduce the number of program steps and with compositional compiler correctness.</p><p>&#8226; &#955; ANF , an optimizing, modular, general-purpose, extensible compilation pipeline for pure, higher-order functional languages, verified with the above techniques. Our pipeline implements more sophisticated closure strategies than previous verified compilers. Our results are fully mechanized in the Coq proof assistant.</p><p>We begin with a language-generic overview of our novel compositional verification approach and our technique for showing divergence preservation.</p></div>
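The idea of an adequate compiler specification that composes across passes can be illustrated with a toy model. The sketch below is our own illustration, not CertiCoq code: a miniature arithmetic language, two hypothetical passes (`fold` and `strength`), and a specification `spec` that holds of each pass and, because it is transitive, of the composed pipeline.

```python
# A toy model of a compiler specification R(e_src, e_trg): R holds when
# the target preserves the source's observable result.  Two passes are
# each verified against R; since R is transitive, the composed pipeline
# satisfies it too.  All names here are illustrative, not CertiCoq's.

def ev(e):
    """Evaluate a tiny arithmetic AST: ('num', n) | ('add'/'mul', a, b)."""
    tag = e[0]
    if tag == "num":
        return e[1]
    a, b = ev(e[1]), ev(e[2])
    return a + b if tag == "add" else a * b

def fold(e):
    """Pass 1: constant folding."""
    if e[0] == "num":
        return e
    a, b = fold(e[1]), fold(e[2])
    if a[0] == "num" and b[0] == "num":
        return ("num", ev((e[0], a, b)))
    return (e[0], a, b)

def strength(e):
    """Pass 2: rewrite e * 2 into e + e."""
    if e[0] == "num":
        return e
    a, b = strength(e[1]), strength(e[2])
    if e[0] == "mul" and b == ("num", 2):
        return ("add", a, a)
    return (e[0], a, b)

def spec(e_src, e_trg):
    """Adequacy, toy version: the target refines the source's result."""
    return ev(e_src) == ev(e_trg)

def pipeline(e):
    return strength(fold(e))
```

Each pass satisfies `spec(e, pass(e))` on its own, so the end-to-end claim `spec(e, pipeline(e))` follows by transitivity of `spec`, mirroring how per-pass logical-relation proofs are composed into a pipeline specification.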
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">OVERVIEW OF THE VERIFICATION APPROACH</head><head n="2.1">Compositional Reasoning</head><p>Step-indexed logical relations <ref type="bibr">[Appel and McAllester 2001;</ref><ref type="bibr">Ahmed 2006]</ref> are a useful tool for proving a variety of properties about lambda calculus and related languages. Binary logical relations are commonly used to show program equivalence and refinement <ref type="bibr">[Benton and Hur 2009;</ref><ref type="bibr">Perconti and Ahmed 2014;</ref><ref type="bibr">Owens et al. 2016, 2017;</ref><ref type="bibr">Timany et al. 2017]</ref>, making them a useful tool for compiler verification. A particularly desirable property of binary logical relations in compiler verification <ref type="bibr">[Benton and Hur 2009;</ref><ref type="bibr">Hur and Dreyer 2011]</ref> is horizontal compositionality: two pairs of language expressions that are independently shown to inhabit the logical relation will still inhabit the logical relation when they are pairwise composed with the same language construct. For example, if for a logical relation R and two pairs of expressions we have that R(e 1 , e 2 ) and R(e &#8242; 1 , e &#8242; 2 ), then we can show that applying e 1 to e &#8242; 1 and e 2 to e &#8242; 2 yields related expressions, i.e., R(e 1 e &#8242; 1 , e 2 e &#8242; 2 ). This is established by the so-called compatibility lemmas of the logical relation. Since linking is typically defined using language constructs, logical relations are compatible with linking. This is a crucial property for compositional compiler correctness: two pairs of logically related programs of arbitrary provenance can be linked together to obtain two logically related programs. For example, assume that compilers comp 1 and comp 2 are independently verified to satisfy a logical relation R: &#8704; e, R(e, comp i (e)), with i &#8712; {1, 2}. 
Then, for any programs e lib and e client , we can derive that R([e lib ]e client , [comp 1 (e lib )]comp 2 (e client )), where [e lib ]e client denotes linking the library program e lib with the client program e client .</p><p>Unfortunately, cross-language logical relations<ref type="foot">foot_1</ref> are not transitively (or vertically) composable. That means that if R 1-2 , R 2-3 , and R 1-3 are logical relations between languages L 1 and L 2 , L 2 and L 3 , and L 1 and L 3 , one cannot derive that &#8704; e 1 e 2 e 3 , R 1-2 (e 1 , e 2 ) &#8658; R 2-3 (e 2 , e 3 ) &#8658; R 1-3 (e 1 , e 3 ). Therefore, logical relations cannot be used to prove compositional specifications for multi-pass compilers. To understand this, consider the definition of logical relations for function values: two functions are logically related if they map logically related values to logically related results. To show transitivity for functions f 1 , f 2 , and f 3 with R 1-2 (f 1 , f 2 ) and R 2-3 (f 2 , f 3 ), one would have to show that given arguments v 1 and v 3 such that R 1-3 (v 1 , v 3 ), R 1-3 relates the expressions f 1 v 1 and f 3 v 3 . But the proof is stuck: there is no way to use the hypotheses, since there is no intermediate value v 2 with R 1-2 (v 1 , v 2 ) and R 2-3 (v 2 , v 3 ) at which to instantiate them.</p><p>To address this limitation, <ref type="bibr">Hur et al. [2012]</ref> develop parametric bisimulations (PBs), a new relation that combines ideas from Kripke logical relations and bisimulations to derive a relation that is both horizontally and vertically composable. Later, <ref type="bibr">Neis et al. [2015]</ref> develop parametric inter-language simulations (PILS) that generalize PBs to a cross-language setting. Roughly, PBs and PILS work by parameterizing the relation with an unknown, arbitrary relation, the &#8220;assumed equivalence&#8221;, and drawing function arguments from this relation. But applying related functions to &#8220;assumed equivalent&#8221; arguments is not guaranteed to produce the same observable behavior since nothing is known about the assumed equivalence. 
The solution is to define a local term equivalence relation that asserts that two terms have the same observable behavior, excluding what happens in calls related by assumed equivalence. These definitions are more convoluted than standard logical relations used in compiler correctness (e.g., the one in <ref type="bibr">[Owens et al. 2017]</ref>). Even though they avoid the usual step-indexing that is needed to make logical relations well-founded, they need to be coinductively defined and hence proofs require coinductive reasoning. As those authors say in their papers, the metatheory of PBs and PILS (including the proof of transitivity) is significantly complicated. For example, a very subtle point in the transitivity proof is decomposing the assumed equivalence between e 1 and e 3 to derive assumed equivalences between e 1 and e 2 and between e 2 and e 3 . Furthermore, in their original form, PBs and PILS do not admit &#951;-conversion (which many transformations perform) and further adjustments are needed to make the rule admissible <ref type="bibr">[Hur et al. 2014]</ref>.</p><p>In this work, we show that if we are willing to restrict our compositional specification to apply only to compilers that encompass the same sequence of intermediate representations, standard logical relations can be used to derive a compositional specification for multi-pass compilers. We observe that adequacy and horizontal compositionality are closed under relation composition, under the assumption that program refinement is transitively composable (which is true in our setting and in many other settings as well <ref type="bibr">[Leroy 2009b;</ref><ref type="bibr">Chlipala 2010;</ref><ref type="bibr">Tan et al. 2016]</ref>). This follows immediately from the definitions of horizontal compositionality, adequacy, and relation composition.</p><p>Therefore, we can obtain a top-level compositional relation by composing the logical relations that are used to verify each pass. 
For example, we might have compilers that go through languages L 1 , L 2 and L 3 with two passes that are verified with logical relations R 1-2 and R 2-3 . We define a top-level relation R(e 1 , e 3 ) def = &#8707; e 2 , R 1-2 (e 1 , e 2 ) &#8743; R 2-3 (e 2 , e 3 ).</p><p>Then, it is easy to show that R is adequate and compatible with linking.</p><p>The above top-level relation will work for any compilers that have two passes, one between L 1 and L 2 , and one between L 2 and L 3 , that satisfy R 1-2 and R 2-3 respectively. However, compilers often perform one or more same-language transformations on the same intermediate representation. We may want to verify compilers that include an arbitrary number of same-language transformations on language L 2 that satisfy a same-language logical relation R 2-2 . To that end, we define the following relation:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>R + 2-2 (e, e &#8242; ) def = &#8707; n &#8805; 1, &#8707; e 0 . . . e n , e 0 = e &#8743; e n = e &#8242; &#8743; &#8704; i &lt; n, R 2-2 (e i , e i+1 )</p><p>Because untyped, same-language logical relations, like R 2-2 , are reflexive,<ref type="foot">foot_2</ref> R + 2-2 is satisfied by zero or more L 2 transformations. Therefore, the top-level relation R(e 1 , e 3 ) def = &#8707; e 2 e &#8242; 2 , R 1-2 (e 1 , e 2 ) &#8743; R + 2-2 (e 2 , e &#8242; 2 ) &#8743; R 2-3 (e &#8242; 2 , e 3 )</p><p>is satisfied by compilers that encompass a pass from L 1 to L 2 that satisfies R 1-2 , (possibly) an arbitrary number of L 2 passes that satisfy R 2-2 , and a pass from L 2 to L 3 that satisfies R 2-3 . As we will explain, the proof of horizontal compositionality for R + 2-2 makes use of reflexivity. We apply this approach to prove correct our multi-pass &#955; ANF pipeline. In our pipeline, only one representation change happens (by our closure conversion transformation) and the rest of the passes are same-language transformations before and after closure conversion. In our setting, we only need two logical relations: a cross-representation logical relation (for closure conversion, which changes the representation of function values), and a same-language, reflexive logical relation (for all other same-language transformations). The benefit of this approach is that each pass can be proved correct with a standard logical relation that has well-understood metatheory. Then the compositionality framework can be built on top of the logical relations without any changes to the logical-relations machinery. We found that defining the top-level compositional compiler relations and proving adequacy and compatibility with linking was mostly straightforward. The definitions of the top-level relations and the proofs of adequacy and compatibility (including the top-level behavioral refinement) comprise 319 lines of code and 490 lines of proof.</p></div>
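The composition of per-pass relations described above can be modeled concretely over a finite universe of terms. The following sketch is purely illustrative (relations as sets of pairs; the terms and relation contents are invented): it shows relation composition with an existential intermediate witness, and a transitive closure that, together with reflexivity, absorbs zero or more same-language passes.

```python
# A finite toy model of the relation composition used for the top-level
# compiler specification: R(e1, e3) holds when some intermediate e2
# exists with R12(e1, e2) and R23(e2, e3).  Relations are sets of
# pairs; all names and elements are illustrative only.

def compose(r1, r2):
    """Relation composition: a witness must exist in the middle."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def closure(r):
    """Transitive closure by iterating composition to a fixed point.
    Reflexivity of the same-language relation lets the closure cover
    zero as well as many passes."""
    while True:
        r2 = r | compose(r, r)
        if r2 == r:
            return r
        r = r2

# L2 terms a, b, c with two same-language passes a -> b -> c,
# plus reflexivity on each term.
r22 = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b"), ("b", "c")}
r12 = {("src", "a")}   # the L1-to-L2 pass
r23 = {("c", "trg")}   # the L2-to-L3 pass

# Top-level relation: R12 composed with R22+ composed with R23.
top = compose(r12, compose(closure(r22), r23))
```

Here `("src", "trg")` lands in `top` even though two L2 passes intervene, mirroring how the paper's top-level relation covers pipelines with any number of verified same-language transformations.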
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Divergence Preservation</head><p>Correct compilers are expected to preserve divergent behaviors: if the source program infinite-loops, then the target program should not return a result. Another contribution of our verification framework is a general technique to show that divergence is preserved using a forward simulation between big-step, fuel-based semantics. This is a delicate verification problem, especially when transformations reduce the number of program steps. Existing techniques apply to simulations that use small-step or big-step semantics.</p><p>Showing divergence preservation with a forward simulation between small-step semantics requires reasoning about stuttering steps <ref type="bibr">[Leroy 2009b;</ref><ref type="bibr">Barthe et al. 2019]</ref>. The problem is that a small-step simulation for a step-reducing transformation states that whenever the source takes a step (e src &#8594; e &#8242; src ), the target may take zero or more steps (e trg &#8594; * e &#8242; trg ). This, however, does not exclude the possibility of the source program taking an infinite reduction sequence while the target program terminates to a value. This is called the stuttering problem. The standard solution is to define a well-founded measure on program states and show that it is strictly decreasing when a source step does not correspond to a target step. However, small-step semantics complicate semantic preservation proofs for transformations that introduce administrative redexes <ref type="bibr">[Plotkin 1975;</ref><ref type="bibr">Dargaye and Leroy 2007]</ref> (our closure conversion, uncurrying, and lambda lifting all introduce administrative redexes). Therefore, big-step semantics are often favored in semantics preservation proofs. But big-step semantics do not usually capture diverging behaviors. 
One solution is to do a separate proof using a big-step coinductive judgment that captures divergent behaviors <ref type="bibr">[Leroy and Grall 2009]</ref>. But this tends to be rather complicated, as it requires formalizing two judgments and carrying out a separate semantics preservation proof for each of them. To avoid the complications of both small-step semantics and coinductive big-step semantics, the CakeML compiler uses a fuel-based big-step semantics <ref type="bibr">[Owens et al. 2016, 2017]</ref> that, just like ours, can capture divergent behaviors. But to show divergence preservation, CakeML proofs require that the compiled program consumes at least as much fuel as the source program. Since this is not true for all transformations, CakeML uses a special instruction, Tick, in all of its intermediate representations that decreases the amount of available fuel. Transformations introduce a Tick instruction whenever steps are compiled away to keep the fuel consumption of the source upper bounded by the fuel consumption of the target program. The instruction is propagated through all intermediate languages and it is erased before code generation. It complicates correctness proofs by requiring an extra Tick-removal pass that has to be proved correct with a backwards simulation <ref type="bibr">[Owens et al. 2017, Section 7]</ref>.</p><p>In our work, we take a more general approach: we weaken the relation c src &#8804; c trg that CakeML enforces on the fuel consumption of the source and target programs to a relation c src &#8804; f (c trg ), where f can be any strictly monotonic function. 
Therefore, we allow the target to consume less fuel than the source program and we do not need to add Tick instructions to our intermediate representations.</p><p>First, we define a fuel-based big-step evaluation judgment, written e &#8659; c r , that asserts that with some fuel value c, the evaluation of an expression e yields a result r . Each evaluation rule consumes a unit of fuel. The result can be either a value or an out-of-time exception (OOT) if there is not enough fuel to evaluate the expression. We say that a program diverges if for all fuel values the evaluation of the program times out: e &#8657; def = &#8704; c, e &#8659; c OOT. We formally prove (for our &#955; ANF semantics in section 4) that in order to show preservation of divergence, it is enough to show that when the source program times out with some fuel c src , then so does the target program with some fuel c trg , such that c src &#8804; f (c trg ) for some strictly monotonic function f . In order to prove this more general bound, which is typically different for each transformation, we use a logical relation that is parameterized by relational invariants on the fuel consumption of the two programs. The compatibility lemmas of the logical relation are adjusted so that we can compositionally prove that the invariants hold.</p><p>More precisely, we define a step-indexed logical relation that is parameterized by two relational invariants, Q L the local invariant and Q G the global invariant: E k (e 1 , e 2 ) {Q L ; Q G } def = &#8704; c 1 &lt; k, &#8704; r 1 , e 1 &#8659; c 1 r 1 &#8658; &#8707; c 2 r 2 , e 2 &#8659; c 2 r 2 &#8743; Q L (c 1 , c 2 ) &#8743; R (r 1 , r 2 ) {Q G }.</p><p>The logical relation asserts, as usual, that if expression e 1 evaluates to a result r 1 with any fuel value c 1 that is less than the step-index, then the target expression e 2 evaluates, with some fuel value c 2 , to some result r 2 , such that the two results are related with a result relation. Crucially, it enforces that the fuel values of the two executions are related with the local invariant Q L . The result relation relates results that are either both out-of-time exceptions or both related values. 
It is parameterized by the global invariant Q G . This is more subtle; we defer its explanation to section 5 where we give a concrete definition of the result relation.</p><p>By showing that a transformation, trans, inhabits the relation for an appropriate choice of local invariant, we show that divergence is preserved. That is, we must prove &#8704; e, E k (e, trans(e)) {Q L ; Q G } for some Q L that implies that when Q L (c 1 , c 2 ), then c 1 &#8804; f (c 2 ) for some strictly monotonic f . We found that such an upper bound was quite hard to find for the inlining transformation (but quite easy to establish with our logical relation once found). It required us to tweak our semantics to keep track of the total number of function calls that are executed by the program in addition to the fuel consumption.</p><p>This approach is inspired by that of <ref type="bibr">Paraskevopoulou and Appel [2019]</ref> who show (among other things) that a closure conversion transformation preserves diverging behaviors. In particular, they use a logical relation parameterized by pre- and postconditions, which are analogous to our relational invariants, to impose a lower bound on the steps of the closure-converted program. However, they do not show how this technique can be applied to transformations that reduce the number of program steps.</p><p>In the next section, we describe the intermediate representation and the multi-pass pipeline to which we apply these verification techniques. In section 5, we set up our verification framework that incorporates these ideas.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">THE &#955; ANF PIPELINE</head><p>Our &#955; ANF pipeline efficiently compiles away first-class functions and transforms the code to a representation that can be readily compiled to a first-order, low-level language. The pipeline will create a known and an escaping instance for each function <ref type="bibr">[Appel 1992]</ref>. 
Known instances are used only in statically known function calls, whereas escaping instances are used when a function escapes to a different scope, either through function return or parameter passing. Known instances can be compiled more efficiently using specialized calling conventions. Known curried functions can be uncurried by compiling them to functions that receive all of their arguments simultaneously. In addition, closure records of known functions can be eliminated, and their environments can be allocated either in the heap or in registers. The latter is achieved with a lambda-lifting transformation that transforms functions to receive their free variables as parameters. In this section we formally define the &#955; ANF intermediate representation and we give an overview of the &#955; ANF pipeline.</p></div>
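The fuel-based evaluation judgment and divergence criterion of section 2.2 can be sketched concretely. The mini language below and all its names are our own illustration, not the &#955; ANF development: every evaluation rule spends one unit of fuel, running out of fuel returns an out-of-time result, and divergence means timing out at every fuel value (checked here only up to a bound).

```python
# A concrete sketch of a fuel-based big-step semantics: every
# evaluation rule consumes one unit of fuel, and exhausting the fuel
# yields an out-of-time result (OOT).  Illustrative mini language:
# ('lit', n) | ('inc', e) | ('loop',).

OOT = "OOT"  # out-of-time exception

def ev(fuel, e):
    """Evaluate e with the given fuel; the judgment e down-arrow^c r."""
    if fuel == 0:
        return OOT
    if e[0] == "lit":
        return e[1]
    if e[0] == "inc":
        r = ev(fuel - 1, e[1])
        return OOT if r == OOT else r + 1
    return ev(fuel - 1, ("loop",))  # 'loop' spins forever

def diverges_upto(bound, e):
    """Divergence checked up to a bound: the real definition
    quantifies over all fuel values."""
    return all(ev(c, e) == OOT for c in range(1, bound + 1))
```

A terminating program times out on small fuel but produces a value once the fuel exceeds its cost, while `('loop',)` times out at every fuel value; a step-reducing transformation then only needs the source timeout fuel to be bounded by a strictly monotonic function of the target's, as in the paper's weakened invariant.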
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">The &#955; ANF Intermediate Language</head><p>The &#955; ANF language (fig. <ref type="figure">1</ref>) is a lambda-calculus in A-normal form <ref type="bibr">[Flanagan et al. 1993]</ref>, extended with constructors, projections, case analysis, and (mutually) recursive function definitions. There are two kinds of function calls: let-bound function application, where control returns to the caller, and tail calls. The patterns in case-analysis statements discriminate only on the constructor and do not bind any arguments; arguments of constructors must be projected explicitly in the body of each case clause. We use an overbar to indicate repetition, so fun f x = e 1 in e 2 defines a list of zero or more mutually recursive functions, each of which may have zero or more parameters. We use metanotation for operations on such lists: [] for the empty list, x :: x for consing an element at the beginning of a list, x ++ y for list concatenation, and |x| for list length. The input &#955; ANF program may have nested functions with free variables. The output program is in a closure-converted subset &#955; C ANF . In this restricted subset, programs have the form fun f x = e f in e, where all the functions are closed, and neither e nor the function bodies e f contain function definitions. There are only two levels of scope in the program: the scope that contains the global definitions of the mutually defined functions and the local scope of each function body. In &#955; C ANF , function values can be represented as bare function pointers and do not require closures. In addition, all intermediate results are explicitly evaluated and stored in variables. Therefore, &#955; C ANF is straightforward to code-generate to low-level languages such as C or assembly.</p></div>
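To make the shape of such an intermediate representation concrete, the following sketch renders a &#955; ANF-like grammar as tagged tuples and computes free variables, the quantity that closure conversion must eliminate. The encoding and all constructor names are ours and only approximate fig. 1.

```python
# An illustrative rendering of a lambda-ANF-like grammar as tagged
# tuples, with a free-variable computation.  This approximates the
# paper's grammar; tags and field layouts are our own invention.

def fv(e, bound=frozenset()):
    """Free variables of an expression under a set of bound names."""
    tag = e[0]
    if tag == "constr":          # ('constr', x, ctor, ys, k): let x = C(ys) in k
        _, x, _, ys, k = e
        return (set(ys) - bound) | fv(k, bound | {x})
    if tag == "proj":            # ('proj', x, i, y, k): let x = pi_i y in k
        _, x, _, y, k = e
        return ({y} - bound) | fv(k, bound | {x})
    if tag == "case":            # ('case', y, [(ctor, k), ...])
        _, y, arms = e
        out = {y} - bound
        for _, k in arms:
            out |= fv(k, bound)
        return out
    if tag == "fun":             # ('fun', [(f, xs, body), ...], k)
        _, defs, k = e
        names = {f for f, _, _ in defs}
        out = fv(k, bound | names)
        for _, xs, body in defs:
            out |= fv(body, bound | names | set(xs))
        return out
    if tag == "applet":          # ('applet', x, f, ys, k): let x = f(ys) in k
        _, x, f, ys, k = e
        return ({f, *ys} - bound) | fv(k, bound | {x})
    # ('apptail', f, ys): a tail call f(ys)
    _, f, ys = e
    return {f, *ys} - bound
```

For a nested function whose body mentions an outer variable, `fv` reports that variable as free; the closure-converted subset is exactly the fragment where `fv` of every function body (minus its parameters and the top-level function names) is empty.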
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">&#955; ANF Transformations</head><p>The &#955; ANF pipeline consists of 13 passes: Shrink-Uncurry-Inline-Shrink-&#955;Lift-Inline-Shrink-ClosureConvert-Hoist-Inline-Shrink-DPE-Shrink, built from the following 7 transformations.</p><p>Shrinking is a transformation that performs shrink reductions <ref type="bibr">[Appel and Jim 1997b]</ref>: case and projection folding (i.e., static evaluation of case statements and projections of values that are statically known), dead code elimination, and inlining of functions that are called exactly once. We use the proved-correct shrink-reducer of Savary B&#233;langer and <ref type="bibr">Appel [2017]</ref> which operates on the CPS subset of our &#955; ANF . We extended the implementation and the proof to apply to the full &#955; ANF language and we adapted the proof to our verification framework. The previous proof of the shrink reducer showed semantic preservation only for terminating programs.</p><p>Inlining inlines nonrecursive function calls picked based on a heuristic or marked for inlining by a different pass (uncurrying and lambda lifting). Our current heuristic is to inline small-bodied functions. Inlining let-bound function calls suffers from the usual problems of inlining in A-normal form languages <ref type="bibr">[Kennedy 2007</ref>]. When inlining a let-bound function call, our inliner will perform a renormalization step so that the program is in A-normal form. Inlining let-bound function calls whose body is a case expression requires introducing a join point <ref type="bibr">[Maurer et al. 2017]</ref>. Our current inliner will not inline such functions, but this is not a fundamental limitation of our approach.</p><p>Uncurrying. After ANF conversion, all functions are unary (or 2-ary if CPS conversion is used, which adds an additional continuation argument). 
Multi-argument functions in Coq are curried: they receive an argument and return a new function until all of the arguments are consumed, causing a closure allocation for each intermediate function return. Our uncurrying transformation converts calls to statically known curried functions into efficient multi-argument calls. It does so by detecting curried functions based on their syntactic structure and introducing a new multi-argument function. The approach is described by <ref type="bibr">Appel [1992]</ref>.</p><p>Closure conversion and hoisting convert the input program to a program with flat function structure. Closure conversion eliminates free variables in functions by introducing explicit closures. Then a separate pass hoists all functions into a top-level block of mutually recursive closed function definitions. After this pass, all variable references are either references to function parameters and local variables or references to function names defined in the top-level function block.</p><p>Our closure-conversion transformation treats all functions uniformly. It closure-converts every function by installing a new environment parameter and creating a heap-allocated closure. The rest of the &#955; ANF transformations eliminate redundant closures: lambda lifting creates known and escaping instances and (selectively) converts known functions to receive their free variables as parameters; shrink-reduction statically evaluates projections from statically known closure pairs; and dead parameter elimination removes useless environment parameters.</p><p>Lambda Lifting <ref type="bibr">[Hughes 1982;</ref><ref type="bibr">Johnsson 1985]</ref> runs before closure conversion and is key to the implementation of our efficient closure allocation strategies. This transformation creates closed functions by passing free variables as extra arguments so that they do not have to be passed through a closure environment. 
Effectively, this stores known-function closure environments in registers (instead of allocating them in the heap). Lambda lifting typically lifts closed function definitions to the top level (hence its name); but we defer that step to our hoisting transformation. Our lambda lifting can be expressed as the congruent, transitive closure of the following local rewrite step:</p><p>fun f x = e 1 in e 2 &#8605; fun f &#8242; (x ++ fv) = (fun f x = f &#8242; (x ++ fv) in e 1 ) in fun f x = f &#8242; (x ++ fv) in e 2 , where fv &#8838; fv(fun f x = e 1 ).</p><p>It defines a known copy f &#8242; of the function f that receives its free variables as parameters. The function f is then redefined in terms of f &#8242; in both the local scope of the function and after the function definition. Known calls to f inside e 1 or e 2 will be inlined to call f &#8242; . When the function escapes, f &#8242; is called through the escaping wrapper f . Recursive calls will always call the known function instance f &#8242; . The known and the unknown instances are not mutually defined, because mutually defined functions share their closure environment; if they were, the closure environment of the known instance could not be eliminated.</p><p>Our lambda lifting transformation is selective: not every known function instance will receive its free variables as parameters. As observed in other compilers <ref type="bibr">[Santos 1995;</ref><ref type="bibr">Jones 1996]</ref> <ref type="bibr">[Leroy et al. 2020, Chapter 21]</ref>, lambda lifting every function indiscriminately may result in worse performance. We devise a new set of criteria for lambda lifting that, according to our preliminary experimental results, avoid worsening performance on the set of benchmarks we consider. Furthermore, selective lambda lifting achieves considerable performance improvement (6%-7%) in some cases. 
The criteria are as follows:</p><p>(1) The total number of a function's arguments after lambda lifting should not exceed the number of available registers in the machine, in order to avoid register spilling.</p><p>(2) When a nontail call is performed, parameters and locals in (caller-save) registers must be pushed onto the stack before the call and popped after it. If a free variable is used only late in a function body, after such calls, it would be fetched late from the closure environment and no such pushing and popping would have occurred. But when such a free variable becomes a parameter (live across several calls), this generates more memory traffic than fetching its value from a heap-allocated closure environment. So we lambda-lift free variables only if they are live across at most one function call. To our knowledge, we are the first to make this observation about the performance of lambda lifting. As we demonstrate in our evaluation section, lambda lifting without taking this criterion into account does not yield as large a performance improvement.</p><p>(3) When a lambda-lifted function is called, the extra arguments can bloat the size of the calling function's closure if these variables are not part of the caller's local variables or closure environment. Our lambda lifting will (optionally) not inline calls to escaping function wrappers if that would increase the size of the caller's closure environment. Note that this heuristic is off by default: often there is a cascading effect and the closure of the caller can be avoided too, so this strategy can miss opportunities for optimization. We believe these are the most refined lambda lifting strategies implemented by any compiler. Flambda's lambda lifter is not selective, and GHC's Core-to-Core (selective) lambda lifter <ref type="bibr">[Santos 1995;</ref><ref type="bibr">Jones 1996;</ref><ref type="bibr">Graf and Jones 2019]</ref> only considers strategy (3). 
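As a concrete illustration, the local lambda-lifting rewrite and its selectivity can be sketched on a toy ANF-like AST in Python. The tuple encoding, the function `lambda_lift`, and the `_known` naming are our own illustrative choices, not CertiCoq's mechanized code; the optional `fv` argument models the selective choice of which free variables to lift:

```python
def free_vars(term, bound=frozenset()):
    """Free variables of a toy ANF term.
    Terms: ("var", x), ("app", f, args), ("fun", f, params, body, cont)."""
    tag = term[0]
    if tag == "var":
        return {term[1]} - bound
    if tag == "app":
        _, f, args = term
        return ({f} | set(args)) - bound
    if tag == "fun":
        _, f, params, body, cont = term
        return (free_vars(body, bound | {f} | frozenset(params))
                | free_vars(cont, bound | {f}))
    raise ValueError(f"unknown term: {term!r}")

def lambda_lift(term, fv=None):
    """The local rewrite: fun f(x) = e1 in e2 becomes
    fun f'(x ++ fv) = (fun f(x) = f'(x ++ fv) in e1) in
      fun f(x) = f'(x ++ fv) in e2,
    where fv is a chosen subset of the function's free variables."""
    if term[0] != "fun":
        return term
    _, f, params, body, cont = term
    if fv is None:  # non-selective default: lift all free variables
        fv = sorted(free_vars(body, frozenset([f]) | frozenset(params)))
    known = f + "_known"
    def wrap(inner):  # redefine f as an escaping wrapper that calls f'
        return ("fun", f, params, ("app", known, params + fv), inner)
    return ("fun", known, params + fv, wrap(body), wrap(cont))
```

The wrapper is installed both inside the known copy's body and in the continuation, matching the prose description: known calls can then be inlined to call the known instance directly, while escaping uses go through the wrapper.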
We also considered inlining the known function call inside the escaping wrapper (as OCaml's Flambda <ref type="bibr">[Leroy et al. 2020, Chapter 21</ref>] optimizing pipeline does) but we did not observe a performance improvement, probably because the back end implements tail calls to known functions as efficient jumps with arguments, or even by fall-through.</p><p>Dead parameter elimination removes unused arguments from functions. It performs liveness analysis around (mutually) recursive functions to find which arguments are truly needed by the computation, and it marks the other arguments as unused. Some of the unused arguments will be useless closure environments installed by closure conversion: if (before closure conversion) a function is closed and has only known calls, then no closure environment is needed. Rather than adding this special case to our closure-conversion algorithm, we just build the closure anyway, and then clean it up with DPE and shrink-reduction.</p></div>
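A minimal sketch of the liveness analysis behind dead parameter elimination, assuming a summarized view of each function (its parameters, the variables it uses directly, and its calls); the representation and the name `dead_params` are ours, not the paper's:

```python
def dead_params(funs):
    """funs: {name: (params, direct_uses, calls)} where calls is a list of
    (callee, arg_vars). Returns {name: list of dead parameter positions}.
    A parameter is live iff it is used directly, passed to a live parameter
    position of a known function, or passed to an unknown function."""
    live = {f: set() for f in funs}  # start from "all parameters dead"
    changed = True
    while changed:
        changed = False
        for f, (params, direct_uses, calls) in funs.items():
            needed = set(direct_uses)
            for callee, args in calls:
                if callee in funs:
                    for j in live[callee]:  # only live positions matter
                        if j < len(args):
                            needed.add(args[j])
                else:
                    needed |= set(args)     # unknown call: all args needed
            for i, p in enumerate(params):
                if p in needed and i not in live[f]:
                    live[f].add(i)
                    changed = True
    return {f: [i for i in range(len(funs[f][0])) if i not in live[f]]
            for f in funs}
```

Growing the live set to a fixed point correctly keeps a parameter dead when it is only passed around a (mutually) recursive cycle without ever being used, which is exactly the case of the useless environment parameters installed by closure conversion.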
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Compilation by Example</head><p>We illustrate the compilation of a program through the &#955; ANF pipeline with an example. We deviate slightly from the syntax of &#955; ANF in order to keep the code shorter and more readable. In the following program, the function interleave traverses a list interleaving the free variables x and y among its elements. Then the function is applied to list l1 and it is also applied to each element of the list (of lists) l using the map function.</p><p>In the above program, uncurrying has already been done, and the known function map (assuming that it is defined earlier in the program) is applied to both of its arguments simultaneously. Next, lambda lifting is applied. It creates two copies of interleave. The known copy, interleave_known, receives its free variables as arguments and is a closed function. The escaping copy, interleave, is defined both inside the function body and after the function definition and will be used at escaping positions. Our inliner will inline statically known interleave calls. The shrinker will eliminate the dead code (e.g., the definition of interleave inside interleave_known). After these transformations, we have:</p><p>Closure conversion runs next, after which all function applications will fetch the code and the environment from the closure pair. We use the constructor C cc for closure pairs and the constructor C f for the closure environment of the function interleave. The projections from statically known closure-pairs will be statically evaluated by shrinking, and the constructed closure will be dead code, and therefore deleted. The redundant environment argument of known functions interleave_known and map_known will be deleted by dead parameter elimination. The only function that will get a heap-allocated environment and closure pair is the escaping interleave function. The final code is shown below.</p></div>
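To make the closure-conversion step of the example concrete, here is a hedged Python sketch of what the escaping interleave becomes: a C_cc pair of a code pointer and a C_f environment record holding x and y. The constructor names C_cc and C_f come from the text; the concrete interleaving behavior and all other names are our own guesses for illustration:

```python
def make_interleave_closure(x, y):
    """Closure-converted escaping function: the free variables x and y are
    moved into a heap-allocated environment record, and the code takes the
    environment as an explicit extra parameter."""
    env = ("C_f", x, y)                   # environment record for interleave
    def interleave_code(env, lst):        # code pointer: env passed explicitly
        _, x, y = env                     # fetch free variables from the env
        out = []
        for v in lst:                     # one plausible reading of the example
            out += [x, v, y]
        return out
    return ("C_cc", interleave_code, env)  # the constructed closure pair

def apply_closure(clos, *args):
    """A call through an escaping closure projects the code and the
    environment out of the C_cc pair."""
    _, code, env = clos
    return code(env, *args)
```

For known calls, shrinking statically evaluates these projections and dead parameter elimination removes the environment argument, so only the escaping instance pays for the heap-allocated pair.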
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">SEMANTICS AND COMPILER CORRECTNESS</head><p>Our goal is to prove that our &#955; ANF pipeline compiles programs correctly, that is, the target program preserves the source program's observable behaviors, such as termination and divergence. We will prove compiler correctness both for closed, whole programs and for open programs, compiled separately and linked at the target level. A program obtained by linking modules at the target level should preserve the behavior of the program obtained by linking the modules at the source level. We are interested not only in compiling programs separately, but also in compiling them with different optimizations enabled. Not every optimization will be beneficial for every program, so we might turn optimizations off. We might also improve some &#955; ANF optimizations (e.g., implement more efficient closure environment representations <ref type="bibr">[Shao and Appel 2000]</ref>) or add new &#955; ANF optimizations. Then we should be able to link newly compiled modules with previously compiled modules.</p><p>For simplicity, we omit mutually defined functions from the formal definitions in the rest of the paper. Our implementation and mechanization fully support mutually recursive functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Semantics</head><p>To prove correctness of a compiler we formalize the semantics of the source language (&#955; ANF ) and target language (also &#955; ANF ). 4 The semantics of &#955; ANF is a fuel-based, big-step semantics. It relates a term with a final result under a fuel value; the final result will be an out-of-time exception if there is insufficient fuel to carry out a computation. Using this fuel-based definition we can semantically characterize nonterminating terms and hence state and prove that divergence is preserved through our compiler.</p><p>We first define &#955; ANF values, evaluation environments, and results (fig. <ref type="figure">2</ref>). A &#955; ANF value is either a constructed value or a closure value consisting of an environment and a function definition. An environment is a partial map from variables to values. The final result returned by the semantics is either a value (wrapped in a Res constructor) or an out-of-time exception (OOT). Our semantic definition is indexed with a fuel and a trace value. The fuel keeps track of the length of the derivation tree of the evaluation. It is used to define diverging executions and show that diverging behaviors are preserved. The trace value counts the number of application steps (let-bound or tail) in the derivation tree. We found that this is necessary in order to express an invariant for the inlining transformation in ANF form. In section 6, we give an upper bound on the fuel consumption of a program before inlining as a function of both the fuel consumption of the inlined program and the trace of the input program. Our mechanized semantic definitions are parameterized by two abstract commutative monoids that can be instantiated with different fuel and trace models. But since in this paper we use one concrete fuel and trace instantiation, we present only that instantiation.</p><p>We now have all the tools we need to define the semantics. 
The judgment (&#963; , e) &#8659; c t r asserts that the configuration consisting of environment &#963; and term e evaluates with fuel c to result r , incurring a trace t. Recall that a result is either a value or an out-of-time exception. The semantic judgment is mutually defined with an auxiliary judgment on configurations. Conceptually, &#8659; is responsible for the fuel and trace profiling, whereas the auxiliary judgment evaluates the outermost constructor of the term.</p><p>4 Our target program is in the &#955; C ANF subset. Any &#955; C ANF program can be interpreted using &#955; ANF semantics or using a simpler &#955; C ANF semantics, which does not use closure values, and the observable result will be the same. We move to closure-less semantics during our code generation proof, so we don't need &#955; C ANF semantics for the results covered in this paper.</p><p>Each evaluation step first goes through the rule Step, which does the bookkeeping of trace and fuel resources and invokes the auxiliary judgment to evaluate the outermost constructor of the term. The fuel needed to evaluate the outermost constructor of a term e is always one unit (given by fuel(e) = 1). The trace is one unit only when the expression is an application and zero otherwise (given by trace(e)): trace( f y) = 1, trace(let x = f y in e) = 1, and trace(e) = 0 if e is not a function call. The rules of the auxiliary judgment are straightforward. Observe that the semantics of &#955; ANF does not permit partial application of functions. All functions must be fully applied to their arguments. This trivially holds after ANF translation since functions are curried and all functions and applications are unary. Multi-argument functions are introduced by the &#955; ANF pipeline and are never partially applied. The rule OOT throws an out-of-time exception whenever there is not enough fuel to evaluate the outermost constructor of the term. 
In this case, an empty trace is recorded.</p><p>Using this semantics, we can define what it means for a term to diverge. A term diverges, written (&#963; , e) &#8657;, if for all fuel values, the computation runs out of time.</p><p>When e is closed and evaluates in the empty environment we will write e &#8659; c t r and e &#8657;.</p></div>
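The fuel-and-trace semantics can be modeled by a small definitional interpreter, shown here for a tail-call-only fragment. This is our own simplification (fuel is threaded as a remaining budget, each outermost constructor costs one unit, and each application that starts adds one unit of trace), not the mechanized Coq definition:

```python
def evaluate(env, e, fuel):
    """Big-step evaluator: returns (result, trace) where result is
    ("Res", v) or "OOT"; fuel bounds the size of the derivation."""
    if fuel < 1:                 # rule OOT: not enough fuel, empty trace
        return "OOT", 0
    fuel -= 1                    # rule Step: outermost constructor costs one
    tag = e[0]
    if tag == "ret":             # return the value of a variable
        return ("Res", env[e[1]]), 0
    if tag == "fun":             # fun f(params) = body in cont
        _, f, params, body, cont = e
        clo = ("Clo", dict(env), f, params, body)
        return evaluate({**env, f: clo}, cont, fuel)
    if tag == "app":             # tail call: one unit of trace
        _, f, args = e
        _, cenv, fname, params, body = env[f]
        callee_env = {**cenv, fname: env[f],  # rebind f for recursion
                      **dict(zip(params, (env[a] for a in args)))}
        result, trace = evaluate(callee_env, body, fuel)
        return result, trace + 1
    raise ValueError(f"unknown term: {e!r}")
```

Running the interpreter on the looping program fun f x = f x in f f yields OOT for every fuel value, which matches the definition of divergence above; and if a program times out at some fuel, it times out at every smaller fuel, as fuel monotonicity requires.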
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Correctness Specification</head><p>With the semantics in hand, we can state the top-level correctness specification for the &#955; ANF pipeline. </p><p>We say that a compiler comp is correct for whole programs if for all terms e, e &#8839; B comp(e). The value refinement relation &#8839; V specifies when a target value refines a source value:</p><p>The value refinement relates constructed values that have the same outermost constructor and pairwise related values. It relates &#955; ANF closure values with explicitly constructed closure values (with constructor tag C cc ), but it does not specify anything about the behavior of the function parts of the closure. Why? We use behavioral refinement to relate closed, top-level programs. When we run a whole, top-level program we can observe only the first-order components of the result. First-order results are constructed values that can be inspected with appropriate knowledge of the memory representation. In contrast, to observe a function value, a program must be linked with another program that will apply the result to some arguments. We address this next.</p><p>First, we need to define what linking programs means in the &#955; ANF language. Our linking operation links a client program e client that has a free variable x with a library program e lib that computes the value of the variable x. It is defined below.</p><p>Intuitively, we can think of the linking operator as a closing substitution that substitutes the variable x in the client program with the term e lib . But the syntactic restrictions of A-normal form mean that in &#955; ANF we cannot simply substitute an identifier for an expression or write let x = e lib in e client . Therefore we use a zero-arity function to wrap the computation e lib and we bind to the variable x the result of applying this function to an empty list of arguments. 
The linking operator can be generalized to multiple external references.</p><p>Using behavioral refinement and the linking operator we can define what it means for separate compilation to be correct. Two programs e and e &#8242; that are separately compiled with compilers comp 1 and comp 2 may be safely linked if we have:</p><p>The specification for separate compilation asserts that the whole target program obtained by linking the two compiled programs refines the behavior of the whole source program obtained by linking the two programs at the source level.</p><p>When the same compiler is used to compile both programs (i.e., comp 1 = comp 2 ) and compilation commutes with linking, it is trivial to prove behavioral refinement for linking. However, neither is true for the &#955; ANF pipeline. We want to be able to use different optimizations when compiling programs separately. Even if we were willing to restrict ourselves to the same pipeline, the commutation requirement does not hold, for multiple reasons (e.g., alpha-conversion, possible inlining of the zero-arity linking function f ). So there is no easy recipe for showing correctness of separate compilation. The novel compositional compiler correctness technique that we introduce in section 5 allows us to show behavioral refinement for separate compilation.</p></div>
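The linking operator, and its generalization to multiple external references, can be sketched as plain AST construction on the toy term encoding used earlier; the fresh-name scheme `_lib_...` is an assumption of ours (the real transformation must pick genuinely fresh names):

```python
def link(x, e_lib, e_client, fresh="_lib_fun"):
    """Build: fun fresh() = e_lib in let x = fresh() in e_client.
    A zero-arity function wraps the library computation, and the result of
    applying it to an empty argument list is bound to the free variable x."""
    return ("fun", fresh, [], e_lib,
            ("letapp", x, fresh, [], e_client))

def link_many(bindings, e_client):
    """Generalization to multiple external references: bindings is a list
    of (free variable, library term) pairs, linked outermost-first."""
    term = e_client
    for x, e_lib in reversed(bindings):
        term = link(x, e_lib, term, fresh=f"_lib_{x}")
    return term
```

This makes the A-normal-form restriction visible: the library term never appears as a let-bound expression, only as the body of a zero-arity function.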
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Divergence Preservation</head><p>We now formalize the statement that we outlined in section 2.2: in order to show that divergence is preserved it suffices to show that the fuel consumption of the source program is upper bounded by a strictly monotonic function of the fuel consumption of the target program.</p><p>First, we need to prove a fuel-monotonicity property for our semantics: if a program times out for some fuel value c, then it times out for all smaller fuel values.</p><p>Lemma 4.1 (Fuel monotonicity). Assume that (&#963; , e) &#8659; c t OOT. Then, for any c &#8242; &#8804; c, there exists t &#8242; &#8804; t such that (&#963; , e) &#8659; c &#8242; t &#8242; OOT.</p><p>Using the above lemma, we can prove the desired divergence preservation lemma.</p><p>Lemma 4.2 (Divergence preservation). Let f be a function N &#8594; N such that f (x ) &#8804; f (y) &#8658; x &#8804; y. Assume that for two configurations (&#963; 1 , e 1 ) and (&#963; 2 , e 2 ) we know that if (&#963; 1 , e 1 ) &#8659; c 1 t 1 OOT then there exist c 2 and t 2 such that (&#963; 2 , e 2 ) &#8659; c 2 t 2 OOT and c 1 &#8804; f (c 2 ). Then if (&#963; 1 , e 1 ) &#8657; we have that (&#963; 2 , e 2 ) &#8657;.</p><p>Proof. Let c be a fuel value. We must show that there exists t such that (&#963; 2 , e 2 ) &#8659; c t OOT. From the hypothesis that e 1 is a diverging program, we know that (&#963; 1 , e 1 ) &#8659; f (c ) t 1 OOT for some t 1 . Therefore, we can derive that (&#963; 2 , e 2 ) &#8659; c 2 t 2 OOT for some c 2 and t 2 such that f (c) &#8804; f (c 2 ). But from the hypothesis about f we have that c &#8804; c 2 . From the fuel monotonicity property of our semantics we obtain t &#8804; t 2 such that (&#963; 2 , e 2 ) &#8659; c t OOT. &#9633;</p><p>In the next section we will set up a logical relation that lets us impose arbitrary binary relations on the fuel consumption of two programs. If such a relation implies the required bound of lemma 4.2, then we can establish divergence preservation. 
For transformations that do not reduce fuel, we take f to be the identity function. To express an upper bound on the fuel consumption of the program before inlining in terms of the fuel consumption of the program after inlining, we will use the number of function applications performed during the source program's execution, which is captured by the trace value of the semantics of &#955; ANF .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">COMPOSITIONAL PROOF FRAMEWORK</head><p>To prove correctness of &#955; ANF transformations, we use untyped, step-indexed logical relations. Although logical relations are more commonly indexed by types, untyped logical relations also appear in the literature <ref type="bibr">[Acar et al. 2008;</ref><ref type="bibr">Paraskevopoulou and Appel 2019;</ref><ref type="bibr">Owens et al. 2017;</ref><ref type="bibr">Georges et al. 2021</ref>]. We set up two different logical relations: a reflexive relation to prove correct transformations that don't change the representation of function values, and a nonreflexive relation to prove correct closure conversion, which does change the representation of functions. Then, as we outlined in section 2, we compose these relations to derive a top-level relation that is both adequate and compatible with linking. Ultimately, that will allow us to derive a separate compilation theorem for our pipeline that asserts that we can safely link programs compiled through the &#955; ANF pipeline, perhaps using different &#955; ANF optimizations.</p><p>The expression relation is E k ((&#963; 1 , e 1 ), (&#963; 2 , e 2 )) {Q L ; Q G }. The first argument k is the usual step index that is needed for the well-foundedness of the definition. The next two arguments are the configurations (pairs of environments and expressions) that are being related. The last two arguments are two relational invariants: a local invariant and a global invariant. This is the only nonstandard feature of our logical relations. These invariants are binary relations over pairs of a fuel and a trace value. The relational invariants allow us to establish the bound that we need to show divergence preservation (lemma 4.2). This bound will typically be different for each &#955; ANF transformation. 
The local invariant is imposed on the fuel and trace values of the executions of the current configurations while the global invariant is imposed on the future execution of functions in the results through the value relation. Local and global invariants need to be separate because global invariants are used to relate only whole-function executions, while local invariants relate programs at arbitrary points during execution. Hence, they might need to be instantiated with a different relation; we give an example of this later in this section.</p><p>The result relation R relates two results that are either out-of-time exceptions or related values. The value relation V relates &#955; ANF values. Two constructed values are related if they are constructed with the same constructor and their constructor arguments have the same lengths and are pairwise related (written V k (v 1 , v 2 ) {Q }). Two closure values are related if for any two lists of parameters related at some strictly smaller step index i, the two configurations that consist of the body of the function part of the (corresponding) closure and the environment part of the closure (extended with appropriate bindings) are related. The closure environments are extended with bindings that map the formal parameters to the actual parameters and the function name to its closure value, which is needed for recursive functions. The value relation is indexed only with a global invariant that relates the fuel and trace consumption of related function bodies. When the expression relation is invoked from the value relation for closures, the global invariant is used to instantiate both the local and the global invariant of the expression relation.</p><p>The environment relation C lifts the value relation to a subdomain S of two environments and it is defined using an auxiliary variable relation X . 
In particular, for any variable in the set S, we assert that if it is defined in the domain of the first environment then it is also defined in the domain of the second environment and their values are related with the value relation. The environment relation is used to assert that the environment parts of two configurations are logically related. It is convenient to restrict this relation to a variable subset S that is relevant for a program's execution (which is typically the set of free variables of a term).</p><p>The closure-conversion relation is shown in fig. <ref type="figure">5</ref>. Only the closure case of the value relation V CC is different from the reflexive relation; therefore, we show only this case. The other cases of V CC and the rest of the definitions (E CC , C CC , and R CC ) are defined as before. The V CC relation for closures relates a &#955; ANF closure value with a constructed closure value. The first component of the constructed value is a closure value (i.e., the code component of the closure) and the second component is an environment (which is also a constructed value). It might seem odd that the code component of the constructed closure is itself a (primitive, not constructed) closure. However, after closure conversion, functions do not immediately become closed. They might still contain free variables that are references to other (known, not escaping) functions. Functions become closed when they are all moved to a mutually defined function block after the hoisting transformation. We formally prove that after hoisting all functions are at the top level and are closed. During the code-generation proof, the compiler moves to a semantics with bare functions instead of closure values. Technically speaking, we could transition to a representation without closures within this framework by using a third logical relation that moves to a closure-less semantics and function representation.</p><p>Correctness. 
We show correctness of transformations by showing that under any logically related environments the source and target terms are related for an appropriate choice of local and global invariants. In particular, we define a top-level logical relation: </p><p>Notice that we require that the free variables of the source term must be defined in the source environment; this is required by the proofs of certain transformations (lambda lifting and closure conversion). In both transformations, the target term accesses free variables earlier than the source term, and we need this additional assumption so that target terms do not get stuck earlier than source terms.</p><p>Local vs. global invariants. Typically, when we consider top-level programs, the local and the global invariants are the same. But during a proof, a local invariant may become (locally) different from the global invariant as we consider intermediate states of the execution of the source and target programs. We illustrate this with an example. Let = k be a relation that relates two pairs (c 1 , t 1 ) and (c 2 , t 2 ) of a fuel and trace value if and only if c 1 + k = c 2 (we ignore the trace value in this example for simplicity). The logical relation (as we will formally state later on) is reflexive. Hence, we can establish that:</p><p>Unsurprisingly, the fuel consumption is the same for the left and right configurations since it is exactly the same program. Now assume that we wish to relate the configuration (&#963; [f &#8594; &#10216;&#963; , fun f x = e &#8242; &#10217;], e) with the configuration (&#963; , fun f x = e &#8242; in e). These configurations are clearly related: they compute exactly the same result. The only difference is that in the first configuration, we have already performed one execution step. 
The global invariant should still be = 0 since the results are identical and therefore any two related functions in the results will still have the same fuel consumption. However, the local invariant is not the same anymore. If the first configuration consumes fuel c then the second configuration consumes fuel c + 1, since it has to execute one more step. Therefore we have:</p><p>The above two programs are related with a different local and global invariant. Therefore, keeping the local and the global invariants separate gives us the flexibility to relate programs at points of execution where the global invariant does not hold.</p><p>Compatibility Lemmas. As usual, we formally prove compatibility rules for the logical relation, asserting that the logical relation is closed under language constructs. These lemmas are used to reason compositionally about program relatedness. Because our relation is indexed with local and global invariants, to prove the compatibility lemmas we need to assume that the invariants satisfy some compatibility properties too. These help us establish the invariants compositionally, by asserting that the invariants are preserved when a step of computation happens. As an example, we consider the compatibility rule for constructors. The lemma states that two constructors are related if their arguments are pairwise related in the environments, and their continuations are related in the environments extended with related bindings.</p><p>Lemma 5.1 (Compatibility (constructor)). Assume that Q L (0, 0) (0, 0) and that</p><p>The above lemma requires that the local invariant holds for zero fuel and trace values. This is useful for establishing the local invariant when both configurations time out. It also assumes that the local invariant is preserved when adding the fuel needed to evaluate a constructor to two related fuel values. This is needed to establish the local invariant. Assume that c 1 and t 1 (resp. 
c 2 and t 2 ) are the fuel and trace of expression e 1 (resp. e 2 ). From the hypothesis that e 1 and e 2 are related, we know that Q L (c 1 , t 1 ) (c 2 , t 2 ). Using the compatibility assumption for the local invariant, we derive that Q L (c 1 + 1, t 1 ) (c 2 + 1, t 2 ), which relates the fuel and trace values of let x 1 = C(y 1 ) in e 1 and let x 2 = C(y 2 ) in e 2 .</p><p>Similarly, we need compatibility rules for resource invariants to show the rest of the compatibility lemmas of the logical relation. We define the predicate Compat Q G Q L to mean that the local invariant Q L and global invariant Q G satisfy these compatibility rules. It asserts that:</p><p>(1) Q L (0, 0) (0, 0). This is required when both configurations time out.</p><p>(2) Q L (1, 0) (1, 0). This is required when both configurations return. Recall from the definition of the semantics that a return does not incur any trace.</p><p>(3) This is required for the inductive constructors of the language that are not function applications, as illustrated in lemma 5.1.</p><p>(4) This is required for the compatibility lemma of let-bound applications. The first premise holds for the fuel and trace values of the function bodies, which are related by the global invariant. The second premise holds for the fuel and trace values of the expressions that are evaluated after the function calls (i.e., the expressions that the calls are let-bound in). The result adds the fuel and trace values of the function calls and their continuations and adds a unit to both the fuel and the trace values (since applications require a unit of fuel and a unit of trace).</p><p>(<ref type="formula">5</ref>) This is required for the compatibility lemma of tail-call applications.</p><p>Reflexivity. The E relation is reflexive. We take advantage of this property later on, in order to prove a linking theorem for pipelines that use an arbitrary number of same-representation transformations. 
Reflexivity is proved using the compatibility lemmas for the logical relation; hence, it assumes that the invariants satisfy the compatibility rules of the previous paragraph.</p></div>
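The role of the Compat rules can be illustrated by checking them, on a finite sample of fuel and trace values, for concrete invariants in Python; `compat` below covers rules (1), (2), and the constructor-step rule only, and is our own illustrative approximation of the full definition:

```python
def compat(Q, fuel_samples=range(5), trace_samples=range(5)):
    """Finite-sample check of three Compat rules for an invariant
    Q(c1, t1, c2, t2) over fuel and trace values:
    (1) Q(0,0)(0,0); (2) Q(1,0)(1,0);
    (3) Q is preserved when one unit of constructor fuel is added
        to both sides."""
    if not Q(0, 0, 0, 0):          # both configurations time out
        return False
    if not Q(1, 0, 1, 0):          # both configurations return (no trace)
        return False
    for c1 in fuel_samples:
        for c2 in fuel_samples:
            for t1 in trace_samples:
                for t2 in trace_samples:
                    if Q(c1, t1, c2, t2) and not Q(c1 + 1, t1, c2 + 1, t2):
                        return False
    return True
```

For example, the equality invariant c1 = c2 and the inequality invariant c1 &#8804; c2 pass these rules, while c1 + 1 = c2 fails rule (1), matching the earlier discussion of local invariants that hold only at intermediate points of execution.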
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 5.2 (Reflexivity).</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Assume that Compat</head><p>Transitivity. The E relation can be transitively composed:</p><p>This lemma requires that the global invariant is semi-idempotent. This is necessary in order to transitively compose the value relation when the results are closures (using the induction hypothesis; the induction is on the step index). We use this property in the correctness proof of the uncurrying transformation, which is expressed as the transitive closure of a rewrite step. For the simple invariant of this transformation the semi-idempotency requirement holds. However, the relation E is not transitive in the general case. We cannot compose the proofs of transformations that have different global invariants or have global invariants that are not idempotent.</p><p>Adequacy and Compatibility with Linking are two important properties of the logical relations. A relation is adequate if it implies the behavioral refinement that we wish to establish. To prove adequacy we need to establish divergence preservation. Therefore, we need to know that the local invariant implies the upper bound of lemma 4.2. This is captured by the following definition.</p><p>Definition 5.4 (Invariant, upper bound). UpperBound Q holds iff there exists a function f such that &#8704; x y, f (x ) &#8804; f (y) &#8658; x &#8804; y and for all fuel values c 1 , c 2 and trace values</p><p>Using the UpperBound predicate we can state and prove adequacy for the logical relations. The logical relation E CC implies the behavioral refinement of the previous section. This is stated below (&#8709; is the empty environment).</p><p>Lemma 5.5 (Adequacy). Assume UpperBound Q L . 
Then,</p><p>The logical relation E is also adequate, but with respect to a different behavioral refinement that relates closure values to primitive closure values (not constructed closures).</p><p>Using adequacy, we can derive whole-program correctness for our compiler by showing each transformation correct with respect to the logical relation and then transitively composing behavioral refinement (which is transitive). But this approach does not support compositional correctness.</p><p>The logical relations are also compatible with linking. This means that linking related modules with related modules yields related modules, and it is captured by the following lemma. It holds for both relations, but we state it only for E.</p><p>Lemma 5.6 (Linking compatibility). Assume that Compat Q G Q L . The following holds.</p><p>In the literature, this property is also referred to as horizontal compositionality <ref type="bibr">[Neis et al. 2015;</ref><ref type="bibr">Song et al. 2019]</ref>. By combining linking compatibility with adequacy, we can derive a corollary: if a transformation inhabits the logical relation then it also satisfies the behavioral refinement for separate compilation that we defined in the previous section.</p><p>Unfortunately, logical relations are not (in the general case) transitively composable. Therefore, there is no obvious way to specify the whole pipeline with a relation that is both adequate and compatible with linking.</p></div>
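The observation exploited in the next subsection, that adequacy survives composition, can be sketched numerically: if each relation in a chain carries an UpperBound function, their composition is bounded by the composed function, which again reflects the order as definition 5.4 requires. The helper names below are ours, for illustration only:

```python
def upper_bound_of_composition(f1, f2):
    """If source fuel <= f1(intermediate fuel) and intermediate fuel <=
    f2(target fuel), then source fuel <= f1(f2(target fuel)); this chaining
    is valid because the UpperBound side condition forces f1 to be monotone
    nondecreasing."""
    return lambda c: f1(f2(c))

def reflects_order(f, samples=range(20)):
    """Finite-sample check of the UpperBound side condition:
    f(x) <= f(y) implies x <= y."""
    return all(x <= y
               for x in samples for y in samples
               if f(x) <= f(y))
```

For instance, with f1(c) = 2c + 1 and f2(c) = c + 3, the composed bound 2(c + 3) + 1 still reflects the order, so divergence preservation survives chaining the two relations, whereas a constant bound would fail the side condition.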
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Compositionality</head><p>To show the desired behavioral refinement for separate compilation we need to prove that the entire pipeline inhabits a relation that is adequate and compatible with linking. Since logical relations are not transitively composable, we cannot prove that the end-to-end pipeline inhabits a logical relation. However, we make a crucial observation: adequacy and compatibility with linking are closed under relation composition. We can devise a new relation that is adequate and compatible with linking by composing individual logical relations. Since each transformation is proved correct with respect to a logical relation, the entire pipeline can be easily shown to inhabit the composition of these logical relations. Therefore, we can show that it satisfies the correctness specifications for both whole-program and separate compilation.</p><p>We begin by defining the relation E + (fig. <ref type="figure">6</ref>), the transitive closure of the reflexive logical relation (strengthened with some additional requirements). At the base case (rule Step), we require that two expressions are related with the top-level logical relation E. The local and global invariants are existentially quantified and do not appear in the top-level definition. We only require that they satisfy the compatibility rules (required for proving linking compatibility) and that the local invariant implies the desired upper bound (required for adequacy). Lastly, we require that the names of the free variables are preserved, which is required to show linking compatibility.</p><p>Lemma 5.7 (Linking compatibility of E + ). Assume that closed(e lib 1 ) and that fv(e 1 ) &#8838; {x }. Then: </p><p>Fig. <ref type="figure">6</ref>. 
The E + relation.</p><p>Lemma 5.7 implies that any program that has been transformed with a series of transformations that satisfy the E relation can be linked with a program that has been transformed with a (possibly different) series of transformations that satisfy E. All transformations of the &#955; ANF pipeline other than closure conversion satisfy the E relation. Because E is reflexive, the identity transformation also satisfies the E relation. For example, a program that has been inlined and shrink-reduced can be safely linked with a program that had its dead parameters removed.</p><p>The proof of this property makes use of reflexivity. In the two premises, the number of transitivity steps of E + need not be the same. If the first premise has m transitivity steps and the second n transitivity steps, the result will have m + n transitivity steps. We explain this with an example.</p><p>Assume that we translate the client program e 1 to a program e 2 with two transformations and the library program e lib 3 to e lib 4 with only one transformation. We want to link e 2 with e lib 4 . For the client, we obtain a chain of E steps relating e 1 to e 2 , and for the library a single E step relating e lib 3 to e lib 4 ; after linking, the combined chain ends with E (([x &#8594; e lib 3 ]e 2 ), ([x &#8594; e lib 4 ]e 2 )) {Q 3 L ; Q 3 G }. Each one of these relations is proved by lemma 5.6, where one of the premises is obtained by reflexivity.</p><p>SepCompCert <ref type="bibr">[Kang et al. 2016</ref>] uses the fact that compilation commutes with linking to derive a separate compilation theorem for CompCert. This technique also relies on reflexivity. By modifying syntactic simulations of optional transformations to be reflexive relations (so that they are inhabited by the identity transformation too), it allows separately compiled programs to use different optional optimizations. That is, a program may use an optional optimization or the identity transformation instead. Unlike ours, this technique is not compositional. The linking theorem is not stated with respect to a general relation but depends on the particular pipeline. 
Hence, it has to be reproved when new transformations are added or when transformations are reordered.</p><p>Next, we define the E + CC relation. We can easily derive a separate compilation theorem for any pipeline that is in the E + CC relation; pipelines that are in the E + CC relation need not use the same transformations. All transformations that satisfy E are optional; we can safely link modules regardless of whether these transformations are enabled during compilation. The only restriction is that the pipeline include a closure-conversion transformation that satisfies E CC , which is quite general. It would permit, for example, sophisticated closure representations such as Shao and Appel's safe-for-space hybrid flat/linked closures <ref type="bibr">[Shao and Appel 1994]</ref>. This is another crucial difference from SepCompCert's theorem: SepCompCert only supports separate compilation with different optional transformations, whereas our stronger compositional theorem supports separate compilation of programs with entirely different transformations, as long as they satisfy the same logical relations.</p><p>Extension to cross-language setting. Even though we stay in the same &#955; ANF language, our closure-conversion logical relation behaves as a cross-language logical relation. This technique supports separately compiling programs through pipelines that go through n &#8805; 0 representation changes surrounded by n + 1 chunks of same-representation passes, with each chunk comprising an arbitrary number of passes. A top-level compositional relation is obtained by composing the adjacent cross-language relations, adding an intermediate reflexive relation for the IRs where same-language transformations happen. In our application n = 1 (closure conversion), but the principle generalizes to any n.</p><p>The rest of CertiCoq's pipeline is proved correct (or is in the process of being proved correct) with syntactic simulations. 
We can still derive an end-to-end theorem for separate compilation (with arbitrary &#955; ANF optimizations), but it will not be compositional. The passes that are proved correct with a simulation will have to be the same in separately compiled programs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">COMPOSITIONAL CORRECTNESS</head><p>In this section, we establish the end-to-end correctness properties of our pipeline. First, we discuss the individual correctness proof of each transformation. Then, we show how to compose these correctness proofs to derive an end-to-end compositional compiler theorem for the whole pipeline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Correctness of &#955; ANF Transformations</head><p>We first discuss the resource invariants that we use in the correctness proof of each transformation. For now, we ignore fresh variable generation in order to simplify the type of transformations and the correctness statements. We make this explicit later in this section. Transformations that do not decrease the program steps satisfy a very simple invariant stating that the target steps are always greater than or equal to the source steps. We define &#8804; to be the invariant that maps two pairs of fuel and trace (c 1 , t 1 ) and (c 2 , t 2 ) to c 1 &#8804; c 2 .</p><p>Closure conversion. The conclusion of the top-level theorem of the closure conversion transformation (denoted cc) states: E CC (e, cc(e)) {&#8804;; &#8804;}.</p><p>Lambda lifting, uncurrying, hoisting, DPE. These transformations satisfy the invariant &#8804;, and therefore they satisfy E (e, trans(e)) {&#8804;; &#8804;}, where trans is any of these transformations.</p><p>Inlining. The specification of inlining is trickier, as this transformation can decrease the program steps. This is where we make use of the trace kept by our semantics. Recall that the trace keeps track of how many function applications are executed by a program. First, observe that for each function that is inlined the target program performs one or two steps fewer than the source. One step is removed because the target does not perform the function call. A second step might also be removed when we inline a let-bound call of a function that ends in a return, because the return statement is removed in the inlined code. For example, consider the program let z = f x y in e where f is defined (earlier in the program) to be fun f x y = let w = Constr(x, y) in ret(w ). After inlining f the program becomes let z = Constr(x, y) in e. 
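The accounting in this example can be sketched with a toy fuel-and-trace evaluator. This is a hypothetical simplification for illustration (not CertiCoq's actual &#955; ANF semantics): each let binding and each return consumes one unit of fuel, and the trace counts function applications.

```python
# Toy ANF evaluator with fuel and trace accounting (hypothetical cost
# model: every let and ret step costs 1 fuel; the trace counts calls).

def eval_anf(env, funs, e, fuel=0, trace=0):
    """Big-step evaluation; returns (value, fuel_used, trace)."""
    tag = e[0]
    if tag == "ret":                        # ret(x)
        return env[e[1]], fuel + 1, trace
    if tag == "let_constr":                 # let x = Constr(ys) in body
        _, x, ys, body = e
        env2 = dict(env)
        env2[x] = ("constr", tuple(env[y] for y in ys))
        return eval_anf(env2, funs, body, fuel + 1, trace)
    if tag == "let_call":                   # let x = f ys in body
        _, x, f, ys, body = e
        params, fbody = funs[f]
        fenv = dict(zip(params, (env[y] for y in ys)))
        v, fuel, trace = eval_anf(fenv, funs, fbody, fuel + 1, trace + 1)
        env2 = dict(env)
        env2[x] = v
        return eval_anf(env2, funs, body, fuel, trace)
    raise ValueError(tag)

# fun f x y = let w = Constr(x, y) in ret(w)
funs = {"f": (("x", "y"), ("let_constr", "w", ("x", "y"), ("ret", "w")))}
env = {"a": 1, "b": 2}
before = ("let_call", "z", "f", ("a", "b"), ("ret", "z"))  # let z = f a b in ret z
after = ("let_constr", "z", ("a", "b"), ("ret", "z"))      # after inlining f

v1, c1, t1 = eval_anf(env, funs, before)
v2, c2, t2 = eval_anf(env, {}, after)
print(v1 == v2, (c1, t1), (c2, t2))  # True (4, 1) (2, 0)
```

Here the continuation e is just ret(z), so c = 1: the source consumes 3 + c = 4 units of fuel and the inlined program 1 + c = 2, while performing one fewer call.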
The program before inlining consumes 3 + c units of fuel, where c is the fuel consumption of e, whereas the program after inlining consumes 1 + c units of fuel.</p><p>Let c 1 and t 1 be the fuel and trace (i.e., the total number of function applications) of the source and c 2 and t 2 the fuel and trace of the target. Let G be the maximum number of calls that are inlined inside the body of a source function and L the total number of calls that are inlined in the top-level expression of the source program (without considering inlined calls in nested function definitions). The total number of inlined calls when the program runs is therefore upper-bounded by G * t 1 + L, where t 1 is the total number of function calls executed by the source program, captured by the trace value. For each one of these inlined calls, the target might perform at most two evaluation steps fewer than the source program. So we have the bound c 1 &#8804; c 2 + 2 * G * t 1 + 2 * L. Using similar reasoning, we also have that t 1 &#8804; t 2 + G * t 2 + L for the total number of calls performed by the source program. Because the total number of calls that a program performs is less than or equal to the total number of steps, we have that t 2 &#8804; c 2 . Using these three inequalities, we can derive a bound c 1 &#8804; A * c 2 + B for some A and B that depend on G and L: substituting, t 1 &#8804; (1 + G) * c 2 + L, hence c 1 &#8804; (1 + 2 * G + 2 * G * G) * c 2 + 2 * L * (G + 1). This upper bound satisfies the requirements for divergence preservation of lemma 4.2. We define the invariant Q inline G L to capture this bound.</p><p>For inlining (inline) we show: E (e, inline(e)) {Q inline G G; Q inline G G} where G is the maximum number of inlined calls in individual function bodies and the top-level expression (the transformation keeps track of this). Because the logical relation is monotonic in the local (but not the global) invariant, we derive E (e, inline(e)) {Q top inline G G; Q inline G G}. Shrinking. The shrinking transformation performs inlining as well as other static reductions and satisfies the same bound as inlining. 
We express shrinking (shrink) as the transitive closure of a transformation that performs one shrinking step (shrink_one). For the one-step shrinking, we prove E (e, shrink_one(e)) {Q inline 1 1; Q inline 1 1}. By local invariant monotonicity, we derive E (e, shrink_one(e)) {Q top inline 1 1; Q inline 1 1}. Observe that G = L = 1 since only one reduction is performed. Then, for the shrinking transformation we obtain E + (e, shrink(e)).</p></div>
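The structure of shrink as the transitive closure of shrink_one can be mirrored by iterating a one-step rewriter to a fixpoint. The sketch below is hypothetical: alias inlining (let x = y in e ~> e[x := y]) stands in for the actual shrink reductions, but the shape is the same, with each call to shrink_one performing at most one reduction (one E step) and the fixpoint loop composing the steps transitively.

```python
def subst(e, x, y):
    """Replace every occurrence of variable x by y."""
    if e == x:
        return y
    if isinstance(e, tuple):
        return tuple(subst(s, x, y) for s in e)
    return e

def shrink_one(e):
    """Perform at most one reduction; return (e', changed)."""
    if isinstance(e, tuple) and e[0] == "let_alias":
        _, x, y, body = e
        return subst(body, x, y), True      # one shrink step
    if isinstance(e, tuple):                # otherwise recurse left-to-right
        for i, sub in enumerate(e):
            if isinstance(sub, tuple):
                sub2, changed = shrink_one(sub)
                if changed:
                    return e[:i] + (sub2,) + e[i + 1:], True
    return e, False

def shrink(e):
    """Transitive closure: iterate shrink_one until a fixpoint."""
    changed = True
    while changed:
        e, changed = shrink_one(e)
    return e

e = ("let_alias", "x", "a", ("let_alias", "z", "x", ("ret", "z")))
print(shrink(e))  # ('ret', 'a'), reached after two shrink_one steps
```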
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Composition and End-to-End Correctness</head><p>So far, we presented &#955; ANF transformations as total programs, but in reality they are partial: they may fail to produce an output for some inputs. Their type is anf_trans def = exp &#8594; comp_data &#8594; error exp * comp_data where exp is the type of &#955; ANF terms and comp_data the type of the compilation state. The sum type error has two variants: programs can successfully return a term (written Ret e), or can return an error message (written Err s). Our transformations receive a compilation state comp_data as an argument that is threaded through the computation. They use it to generate fresh names for binders.</p><p>We show that whenever some input term e is well-scoped (written well_scoped(e)), meaning that e has unique bindings and its free variables are disjoint from its bound variables, then the transformation successfully produces an output term that is also well-scoped and semantically related to the input term with either E + or E + CC . We assume that the next available free variable is strictly greater than the identifiers (free or bound) that are used in the input term (variables are represented as binary natural numbers). This implies that the freshly generated free variables are disjoint from the identifiers of the program, which is required for correctness and for preserving well-scopedness. The full top-level correctness specification of &#955; ANF transformations is shown below. Similarly, we define correct_cc, a predicate over &#955; ANF transformations that has the same shape as correct, with the difference that it uses the E + CC relation. &#955; ANF transformations can be sequentially composed to derive complex &#955; ANF transformations. Their correctness specifications can be composed as well. 
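The anf_trans type and the sequential composition of partial transformations can be mimicked directly. In this sketch the pass bodies and the integer comp_data are made-up stand-ins (in CertiCoq, comp_data carries fresh-name state); what it shows is the shape of composition: thread the state through, and short-circuit as soon as a pass returns Err.

```python
# A pass maps (term, comp_data) to (("ret", term') | ("err", msg), comp_data').

def compose(t1, t2):
    """Sequential composition of two partial passes, threading comp_data."""
    def t(e, st):
        r, st = t1(e, st)
        if r[0] == "err":      # short-circuit: the second pass never runs
            return r, st
        return t2(r[1], st)
    return t

# Hypothetical passes; comp_data is just a fresh-name counter here.
def rename_pass(e, st):
    return ("ret", f"{e}_{st}"), st + 1    # always succeeds, consumes a name

def fail_pass(e, st):
    return ("err", "no output"), st        # always fails

pipeline = compose(rename_pass, rename_pass)
print(pipeline("prog", 0))                         # (('ret', 'prog_0_1'), 2)
print(compose(fail_pass, rename_pass)("prog", 0))  # (('err', 'no output'), 0)
```

Correctness specifications compose in the same way: if each pass maps well-scoped inputs to related, well-scoped outputs, so does their composition.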
We have that:</p><p>We use these facts to derive end-to-end correctness.</p><p>The &#955; ANF pipeline, comp_anf, is a &#955; ANF transformation that takes an additional argument that determines which optional optimizations are enabled. We prove that whatever the choice of optional transformations is, the &#955; ANF pipeline is correct w.r.t. E + CC . Theorem 6.2 (Correctness of the &#955; ANF pipeline). &#8704; opt, correct_cc (comp_anf opt ).</p><p>Using this top-level theorem for the &#955; ANF pipeline, we can prove correctness of whole-program compilation and correctness for separate compilation as easy corollaries.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">EVALUATION</head><p>Does selective lambda lifting (which pays attention to variable liveness) achieve greater performance improvements than traditional non-selective lambda lifting? Does CertiCoq with the optimizing &#955; ANF pipeline succeed in generating native-speed code that is comparable to standard extraction with optimizing OCaml native-code compilation?</p><p>Since there is no standard benchmark suite for Gallina, we devised four benchmarks to estimate answers to these questions: sha: Secure Hash Algorithm 2 (SHA-256) computing the cryptographic hash of a 484-character string, as in <ref type="bibr">Appel [2015]</ref>. color: A formally verified implementation of the Kempe/Chaitin graph-coloring algorithm <ref type="bibr">[Chaitin et al. 1981]</ref> from <ref type="bibr">Appel [2020]</ref>, coloring a graph with 156 nodes and 1168 edges. binom: A verified binomial queue implementation <ref type="bibr">[Vuillemin 1978</ref>] from <ref type="bibr">Appel [2020]</ref>, constructing two priority queues with 1000 elements each, merging them, and finding the max. vs-easy: VeriStar <ref type="bibr">[Stewart et al. 2012</ref>], a formally verified decision procedure for a decidable fragment of separation logic, based on a state-of-the-art paramodulation algorithm [Navarro P&#233;rez and Rybalchenko 2011], on an easy entailment. 
vs-hard: VeriStar deciding the validity of a harder entailment.</p><p>We then compared three versions of lambda lifting (never, selective, always). For overall performance, we compared CertiCoq+llvm (clang 10.0.1 with omit-frame-pointer), CertiCoq+CompCert (3.8), and Coq-extraction+ocamlopt (4.07.1). The experiments ran on a 2.5 GHz Intel Core i7.</p><p>Figure <ref type="figure">8</ref> demonstrates that (on these benchmarks) selective lambda lifting speeds up some programs and slows down none, which is not true of traditional nonselective lambda lifting.</p><p>CertiCoq's performance is not yet competitive with OCaml native code, as shown in Figure <ref type="figure">9</ref>. We suspect that this is mainly because CertiCoq does not have a native code generator specialized for functional languages: it goes through C's calling convention<ref type="foot">foot_4</ref> for all control flow, including loops. (&#955; C ANF would be suitable for direct generation of efficient machine language with the addition of a register allocator and, of course, a native code generator for each target machine.) Adding more optimizations in &#955; ANF would probably further improve performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">RELATED WORK</head><p>We compare our work with proved-correct compilers and frameworks for compositional compiler correctness.</p><p>CakeML <ref type="bibr">[Kumar et al. 2014;</ref><ref type="bibr">Tan et al. 2016</ref>] is an end-to-end verified compiler for a substantial subset of the ML language. Most closely related to our work is the ClosLang pipeline of CakeML <ref type="bibr">[Owens et al. 2017;</ref><ref type="bibr">Tan et al. 2019]</ref>. Similar to the &#955; ANF pipeline, ClosLang introduces calls to multi-argument functions and optimizes calls to known functions. CakeML uncurries both statically known and escaping functions. This is possible because the ClosLang semantics allows partial application (via mismatch semantics). Like &#955; ANF , CakeML eliminates closures of statically known functions, but unlike &#955; ANF , CakeML does not implement specialized closure allocation strategies for known functions with free variables. The ClosLang pipeline also performs two passes to track the flow of known function calls and annotate statically known functions and function calls. In &#955; ANF , we do not need any additional flow analysis for our closure strategies. The ClosLang intermediate representation needs to distinguish between ML-style and C-style calls, which is not needed in our setting.</p><p>ClosLang's verification framework uses untyped step-indexed logical relations. But unlike our framework, it does not address compositional compiler correctness. CakeML's theorem only applies to whole-program compilation. Divergence preservation is handled by requiring that two related programs have identical fuel consumption, which requires adding a Tick instruction to all intermediate representations of CakeML. With our work we show how this can be avoided.</p><p>Pilsner <ref type="bibr">[Neis et al. 2015</ref>] is a multi-pass CPS compiler for an ML subset. 
Its compositional compiler correctness theorem is stronger than &#955; ANF 's in that it applies to modules compiled from the same source, regardless of how they are compiled. Our technique works only on pipelines with the same series of intermediate languages. Pilsner is verified using parametric inter-language simulations (PILS), a novel relational model that is adequate, compatible with linking, and transitive. But the metatheory of PILS, including its transitivity proof, is quite complex and required a lot of proof effort <ref type="bibr">[Neis et al. 2015, Section 4</ref>]. We could not have used PILS to verify the &#955; ANF pipeline because it does not support the eta-conversion rule used by our transformations (e.g., uncurrying). A solution has been suggested <ref type="bibr">[Hur et al. 2014</ref>], but has not been incorporated into PILS. Pilsner performs minimal optimizations; it does not uncurry functions or eliminate closures of known functions.</p><p>OEuf <ref type="bibr">[Mullen et al. 2018</ref>] is a prototype compiler from Gallina (Coq's specification language) to Clight. OEuf supports an impoverished subset of Coq: no user-defined inductive types (just predefined bool, list, etc.), no pattern matching, no recursive functions, no dependently typed programs. CertiCoq supports full Gallina (except coinductive types), representable in &#955; ANF . OEuf does not optimize the code; the authors state that it is a non-goal.</p><p>SepCompCert <ref type="bibr">[Kang et al. 2016</ref>] extends CompCert's <ref type="bibr">[Leroy 2009b</ref>] proof to separate compilation. Its theorem supports separate compilation of programs with different optional optimizations. As we explained in section 5, it is verified using a lightweight technique that is not compositional; the theorem about separate compilation is not derived from a general relation. Therefore, it has limitations that are not present in our work. 
First, SepCompCert requires modifications to the statements and proofs of transformations. Second, in SepCompCert the linking theorem depends on the pipeline and must be reproved for each variation of the pipeline. We prove it as a corollary of our compositional relation. Lastly, the linking theorem of SepCompCert applies only to optional transformations that can be replaced with the identity transformation. Our framework can be used to link programs that are compiled with different non-optional transformations, such as two different implementations of closure conversion that may use different environment representations.</p><p>Compositional Compiler Correctness is a challenging problem that has been addressed in different contexts. There are many extensions of CompCert to compositional correctness: CompCompCert <ref type="bibr">[Stewart et al. 2015]</ref>, CompCertX <ref type="bibr">[Gu et al. 2015;</ref><ref type="bibr">Wang et al. 2019]</ref>, and CompCertM <ref type="bibr">[Song et al. 2019]</ref>. CompCompCert's and CompCertM's approaches are based on interaction semantics, which assumes the same memory representation in the source and target languages; this is not true for functional languages, as the memory representation of closures changes between the source and target. It is therefore unlikely that these techniques would apply to CertiCoq (<ref type="bibr">[Song et al. 2019, Section 8.1]</ref>). CompCertM uses closed simulations and assumes that unknown functions in the target are verified against deep specifications; it is not clear whether these generalize to higher-order languages. <ref type="bibr">Benton and Hur [Benton and Hur 2009]</ref> and <ref type="bibr">Hur and Dreyer [Hur and Dreyer 2011]</ref> set up step-indexed logical relations between high- and low-level languages to study compositional equivalence of programs in these languages. 
But it is not clear how these frameworks could be used for multi-pass compilers, because the logical relations are not transitive. <ref type="bibr">Perconti and Ahmed [Perconti and Ahmed 2014</ref>] use multi-language semantics <ref type="bibr">[Matthews and Findler 2007]</ref> to support linking of source programs with arbitrary target code, which is outside the scope of this paper.</p><p>Logical Relations do not typically handle divergence preservation. Paraskevopoulou and Appel <ref type="bibr">[Paraskevopoulou and Appel 2019</ref>] use a logical relation to prove their closure conversion transformation correct and safe for space. They parameterize their logical relation with relational pre- and postconditions that are similar to our relational invariants, and they use them to show both space safety and divergence preservation. However, they do not prove divergence preservation for transformations that reduce the amount of fuel, which is a significantly more challenging problem. Their framework does not address compositional compiler correctness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">CONCLUSION</head><p>We presented a novel verification framework that supports lightweight compositional compiler correctness. In particular, we show how logical relations, which generally do not support vertical compositionality and cannot be used to show compositional compiler correctness for multi-pass pipelines, can be composed to derive a compositional compiler correctness theorem for multi-pass pipelines. The only restriction that we impose is that the pipelines should go through the same changes in representation. Furthermore, we show how this framework can be used to show divergence preservation for transformations, overcoming some complications of previous approaches.</p><p>We applied this framework to prove correct an optimizing multi-pass pipeline on an A-normal form intermediate representation for the CertiCoq compiler. Our optimizing pipeline compiles purely functional programs to flat-scope programs without closures that can be easily code-generated. We expect that future compiler verification efforts will benefit from the ideas of our verification framework and the design of our pipeline.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Proc. ACM Program. Lang., Vol. 5, No. ICFP, Article 86. Publication date: August 2021.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Same-language logical relations may or may not be transitive. Typed same-language logical relations can be made transitive by imposing additional requirements about typing <ref type="bibr">[Ahmed 2006</ref>]. Some untyped logical relations are transitive <ref type="bibr">[Owens et al. 2017]</ref>; others, like the ones we use in this paper, are not.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>Typed same-language logical relations are also reflexive, but only for well-typed programs; this property is usually called the fundamental property of the logical relation <ref type="bibr">[Ahmed 2006</ref>]. In principle, we could apply our technique to typed logical relations, but that would require proving that transformations preserve typing.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4"><p>We use clang with standard C calling conventions. Savary B&#233;langer et al. [2019] report that performance improves significantly when using clang with llvm's "GHC calling convention", which has no callee-save registers and passes more arguments in registers than the standard convention. CompCert does not support nonstandard calling conventions.</p></note>
		</body>
		</text>
</TEI>
