We present an enumerative program synthesis framework called component-based refactoring that can refactor “direct” style code that does not use library components into equivalent “combinator” style code that does use library components. This framework introduces a sound but incomplete technique to check the equivalence of direct code and combinator code called equivalence by canonicalization that does not rely on input-output examples or logical specifications. Moreover, our approach can repurpose existing compiler optimizations, leveraging decades of research from the programming languages community. We instantiated our new synthesis framework in two contexts: (i) higher-order functional combinators such as map and filter in the statically typed functional programming language Elm and (ii) high-performance numerical computing combinators provided by the NumPy library for Python. We implemented both instantiations in a tool called Cobbler and evaluated it on thousands of real programs to test the performance of the component-based refactoring framework in terms of execution time and output quality. Our work offers evidence that synthesis-backed refactoring can apply across a range of domains without specification beyond the input program.
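To make the "direct" versus "combinator" distinction concrete, here is a minimal sketch in plain Python. It is not output from Cobbler and does not use its algorithm; the function names are hypothetical, and Python's built-in `map` and `filter` stand in for the Elm combinators named in the abstract.

```python
def squares_of_evens_direct(xs):
    """'Direct' style: an explicit loop with no library combinators."""
    out = []
    for x in xs:
        if x % 2 == 0:
            out.append(x * x)
    return out


def squares_of_evens_combinator(xs):
    """'Combinator' style: the same computation expressed with filter and map."""
    return list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, xs)))


# Hypothetical sample input, used only as a sanity check.
sample = [1, 2, 3, 4, 5, 6]
result_direct = squares_of_evens_direct(sample)
result_combinator = squares_of_evens_combinator(sample)
```

Comparing the two results on a sample is only a spot check. The abstract's equivalence-by-canonicalization technique is described as establishing equivalence without relying on input-output examples.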
A HAT Trick: Automatically Verifying Representation Invariants using Symbolic Finite Automata
Functional programs typically interact with stateful libraries that hide state behind typed abstractions. One particularly important class of applications comprises data structure implementations that rely on such libraries to provide a level of efficiency and scalability that may be otherwise difficult to achieve. However, because the specifications of the methods provided by these libraries are necessarily general and rarely specialized to the needs of any specific client, any required application-level invariants must often be expressed in terms of additional constraints on the (often) opaque state maintained by the library. In this paper, we consider the specification and verification of such representation invariants using symbolic finite automata (SFA). We show that SFAs can be used to succinctly and precisely capture fine-grained temporal and data-dependent histories of interactions between functional clients and stateful libraries. To facilitate modular and compositional reasoning, we integrate SFAs into a refinement type system to qualify stateful computations resulting from such interactions. The particular instantiation we consider, Hoare Automata Types (HATs), allows us to both specify and automatically type-check the representation invariants of a datatype, even when its implementation depends on stateful library methods that operate over hidden state. We also develop a new bidirectional type checking algorithm that implements an efficient subtyping inclusion check over HATs, enabling their translation into a form amenable to SMT-based automated verification. We present extensive experimental results on an implementation of this algorithm that demonstrate the feasibility of type-checking complex and sophisticated HAT-specified OCaml data structure implementations layered on top of stateful library APIs.
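As a rough illustration of how an automaton over interaction histories can express a representation invariant, the sketch below checks a trace of library calls against a small automaton whose transitions are guarded by predicates on events. It is a dynamic check written in Python under assumed names (`make_unique_put_automaton`, `accepts`, and the `put`/`get` event vocabulary are all hypothetical); it is not the HAT type system, which verifies such properties statically.

```python
from typing import Callable, List, Sequence, Tuple

Event = Tuple[str, str]  # (operation name, key), e.g. ("put", "a")
Transition = Tuple[str, Callable[[Event], bool], str]  # (source, guard, target)


def make_unique_put_automaton(key: str) -> List[Transition]:
    """Automaton for the invariant: `key` is never put more than once."""

    def is_put(event: Event) -> bool:
        return event == ("put", key)

    def is_other(event: Event) -> bool:
        return event != ("put", key)

    return [
        ("fresh", is_put, "stored"),
        ("fresh", is_other, "fresh"),
        ("stored", is_other, "stored"),
        ("stored", is_put, "violated"),
        ("violated", lambda event: True, "violated"),
    ]


def accepts(transitions: List[Transition], trace: Sequence[Event],
            start: str = "fresh", rejecting: Tuple[str, ...] = ("violated",)) -> bool:
    """Run the automaton over the trace; accept if it ends outside a rejecting state."""
    state = start
    for event in trace:
        for source, guard, target in transitions:
            if source == state and guard(event):
                state = target
                break
        else:
            return False  # no transition was enabled for this event
    return state not in rejecting


# Hypothetical traces of client/library interactions.
good_trace = [("put", "a"), ("get", "a"), ("put", "b")]
bad_trace = [("put", "a"), ("get", "a"), ("put", "a")]
automaton = make_unique_put_automaton("a")
```

Guards are predicates rather than fixed symbols, which is the feature that lets a symbolic automaton range over an unbounded alphabet of events such as calls with arbitrary keys.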
- Award ID(s):
- 2321680
- PAR ID:
- 10612782
- Publisher / Repository:
- Association for Computing Machinery (ACM)
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 8
- Issue:
- PLDI
- ISSN:
- 2475-1421
- Format(s):
- Medium: X Size: p. 1387-1411
- Size(s):
- p. 1387-1411
- Sponsoring Org:
- National Science Foundation
More Like this
-
Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also present stsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic search even for previously unparsable queries. We show empirically that stsearch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings. CCS Concepts: • Software and its engineering → Formal language definitions; Software maintenance tools; • Information systems → Query representation; • Theory of computation → Tree languages.
-
This paper presents a program analysis method that generates program summaries involving polynomial arithmetic. Our approach builds on prior techniques that use solvable polynomial maps for summarizing loops. These techniques are able to generate all polynomial invariants for a restricted class of programs, but cannot be applied to programs outside of this class, for instance programs with nested loops, conditional branching, or unstructured control flow. Approaches for applying these prior methods to general programs are currently lacking. This paper bridges that gap. Instead of restricting the kinds of programs we can handle, our method abstracts every loop into a model that can be solved with prior techniques, bringing to bear prior work on solvable polynomial maps to general programs. While no method can generate all polynomial invariants for arbitrary programs, our method establishes its merit through a monotonicity result. We have implemented our techniques and tested them on a suite of benchmarks from the literature. Our experiments indicate our techniques show promise on challenging verification tasks requiring non-linear reasoning.
-
Producing efficient array code is crucial in high-performance domains like image processing and machine learning. It requires the ability to control factors like compute intensity and locality by reordering computations into different stages and granularities with respect to where they are stored. However, traditional pure, functional tensor languages struggle to do so. In a previous publication, we introduced ATL as a pure, functional tensor language capable of systematically decoupling compute and storage order via a set of high-level combinators known as reshape operators. Reshape operators are a unique functional-programming construct since they manipulate storage location in the generated code by modifying the indices that appear on the left-hand sides of storage expressions. We present a formal correctness proof for an implementation of the compilation algorithm, marking the first verification of a lowering algorithm targeting imperative loop nests from a source functional language that enables separate control of compute and storage ordering. One of the core difficulties of this proof was properly formulating the complex invariants to ensure that these storage-index remappings were well-formed. Notably, this exercise revealed a soundness bug in the original published compilation algorithm regarding the truncation reshape operators. Our fix is a new type system that captures safety conditions that were previously implicit and enables us to prove compiler correctness for well-typed source programs. We evaluate this type system and compiler implementation on a range of common programs and optimizations, including but not limited to those previously studied, to demonstrate performance comparable to established compilers like Halide.
-
ABSTRACT Objective: Neighborhood perceptions are associated with physical and mental health outcomes; however, the biological associates of this relationship remain to be fully understood. Here, we evaluate the relationship between neighborhood perceptions and amygdala activity and connectivity with salience network (i.e., insula, anterior cingulate, thalamus) nodes. Methods: Forty-eight older adults (mean age = 68 [7] years, 52% female, 47% non-Hispanic Black, 2% Hispanic) without dementia or depression completed the Perceptions of Neighborhood Environment Scale. Lower scores indicated less favorable perceptions of aesthetic quality, walking environment, availability of healthy food, safety, violence (i.e., more perceived violence), social cohesion, and participation in activities with neighbors. Participants separately underwent resting-state functional magnetic resonance imaging. Results: Less favorable perceived safety (β = −0.33, pFDR = .04) and participation in activities with neighbors (β = −0.35, pFDR = .02) were associated with higher left amygdala activity, independent of covariates including psychosocial factors. Less favorable safety perceptions were also associated with enhanced left amygdala functional connectivity with the bilateral insular cortices and the left anterior insula (β = −0.34, pFDR = .04). Less favorable perceived social cohesion was associated with enhanced left amygdala functional connectivity with the right thalamus (β = −0.42, pFDR = .04), and less favorable perceptions about healthy food availability were associated with enhanced left amygdala functional connectivity with the bilateral anterior insula (right: β = −0.39, pFDR = .04; left: β = −0.42, pFDR = .02) and anterior cingulate gyrus (β = −0.37, pFDR = .04).
Conclusions: Taken together, our findings document relationships between select neighborhood perceptions and amygdala activity as well as connectivity with salience network nodes; if confirmed, targeted community-level interventions and existing community strengths may promote brain-behavior relationships.