Functional programs typically interact with stateful libraries that hide state behind typed abstractions. One particularly important class of applications is data structure implementations that rely on such libraries to provide a level of efficiency and scalability that may otherwise be difficult to achieve. However, because the specifications of the methods provided by these libraries are necessarily general and rarely specialized to the needs of any specific client, any required application-level invariants must often be expressed in terms of additional constraints on the (often opaque) state maintained by the library. In this paper, we consider the specification and verification of such representation invariants using symbolic finite automata (SFAs). We show that SFAs can be used to succinctly and precisely capture fine-grained temporal and data-dependent histories of interactions between functional clients and stateful libraries. To facilitate modular and compositional reasoning, we integrate SFAs into a refinement type system to qualify stateful computations resulting from such interactions. The particular instantiation we consider, Hoare Automata Types (HATs), allows us to both specify and automatically type-check the representation invariants of a datatype, even when its implementation depends on stateful library methods that operate over hidden state. We also develop a new bidirectional type-checking algorithm that implements an efficient subtyping inclusion check over HATs, enabling their translation into a form amenable to SMT-based automated verification. We present extensive experimental results on an implementation of this algorithm, demonstrating the feasibility of type-checking complex and sophisticated HAT-specified OCaml data structure implementations layered on top of stateful library APIs.
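To make the flavor of such specifications concrete, here is a small OCaml sketch (our illustration, not code or syntax from the paper): a membership invariant for a set built over a stateful key-value library, phrased over the history of `put`/`remove` interactions. This is the kind of temporal, data-dependent property an SFA-based HAT would encode symbolically and discharge statically; the `event` type and `mem_after` function below are hypothetical.

```ocaml
(* Illustrative sketch only: a representation invariant phrased over the
   history of calls into a stateful library, in the spirit of SFA-based
   specifications. All names here are hypothetical. *)

type event =
  | Put of string            (* key inserted via the library *)
  | Remove of string         (* key deleted via the library *)

(* The "automaton": key [k] is a member iff the last Put/Remove event
   mentioning [k] is a Put. A HAT would check this statically rather
   than by replaying traces at run time, as we do here for intuition. *)
let mem_after (trace : event list) (k : string) : bool =
  List.fold_left
    (fun acc ev ->
       match ev with
       | Put k' when k' = k -> true
       | Remove k' when k' = k -> false
       | _ -> acc)
    false trace

let () =
  let trace = [ Put "a"; Put "b"; Remove "a" ] in
  assert (not (mem_after trace "a"));
  assert (mem_after trace "b")
```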
Data-driven abductive inference of library specifications
Programmers often leverage data structure libraries that provide useful and reusable abstractions. Modular verification of programs that make use of these libraries naturally relies on specifications that capture important properties about how the library expects these data structures to be accessed and manipulated. However, these specifications are often missing or incomplete, making it hard for clients to be confident they are using the library safely. When library source code is also unavailable, as is often the case, the challenge of inferring meaningful specifications is further exacerbated. In this paper, we present a novel data-driven abductive inference mechanism that infers specifications for library methods sufficient to enable verification of the library's clients. Our technique combines a data-driven, learning-based framework that postulates candidate specifications with SMT-provided counterexamples that refine these candidates, taking special care to prevent generating specifications that overfit to sampled tests. The resulting specifications form a minimal set of requirements on the behavior of library implementations that ensures the safety of a particular client program. Our solution thus provides a new multi-abduction procedure for precise specification inference of data structure libraries, guided by client-side verification tasks. Experimental results on a wide range of realistic OCaml data structure programs demonstrate the effectiveness of the approach.
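The data-driven step can be caricatured in a few lines of OCaml (a sketch under our own simplifying assumptions, not the paper's multi-abduction procedure): postulate the strongest conjunction of a fixed feature vocabulary consistent with the safe calls observed so far, and treat a verifier counterexample as evidence that the candidate, or the vocabulary itself, must be refined. The `sample` type and feature set are invented for illustration.

```ocaml
(* Sketch of data-driven specification inference: the candidate spec is
   the strongest conjunction of known "features" consistent with the
   safe samples; SMT counterexamples drive further refinement.
   Illustrative only. *)

type sample = { arg : int list }   (* an observed argument to a library call *)

let features : (string * (sample -> bool)) list = [
  "non_empty", (fun s -> s.arg <> []);
  "sorted",    (fun s -> List.sort compare s.arg = s.arg);
  "positive",  (fun s -> List.for_all (fun x -> x > 0) s.arg);
]

(* Candidate precondition: every feature that holds on all safe samples. *)
let postulate (safe : sample list) =
  List.filter (fun (_, p) -> List.for_all p safe) features

let holds spec s = List.for_all (fun (_, p) -> p s) spec

(* A verifier counterexample satisfies the candidate yet breaks the
   client: the candidate is too weak, so the vocabulary must grow. *)
let refine spec (cex : sample) =
  if holds spec cex then Error "feature vocabulary too coarse"
  else Ok spec

let () =
  let safe = [ { arg = [1; 2; 3] }; { arg = [2; 5] } ] in
  List.iter (fun (name, _) -> print_endline name) (postulate safe)
```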
- Award ID(s): 1755880
- PAR ID: 10601887
- Publisher / Repository: Association for Computing Machinery (ACM)
- Journal Name: Proceedings of the ACM on Programming Languages
- Volume: 5
- Issue: OOPSLA
- ISSN: 2475-1421
- Size: p. 1-29
- Sponsoring Org: National Science Foundation
More Like this
- A key challenge in program synthesis is synthesizing programs that use libraries, which most real-world software does. The current state of the art is to model libraries with mock library implementations that perform the same function in a simpler way. However, mocks may still be large and complex, and must include many implementation details, both of which could limit synthesis performance. To address this problem, we introduce JLibSketch, a Java program synthesis tool that allows library behavior to be described with algebraic specifications, which are rewrite rules for sequences of method calls, e.g., encryption followed by decryption (with the same key) is the identity. JLibSketch implements rewrite rules by compiling JLibSketch problems into problems for the Sketch program synthesis tool. More specifically, after compilation, library calls are represented by abstract data types (ADTs), and rewrite rules manipulate those ADTs. We formalize compilation and prove it sound and complete if the rewrite rules are ordered and non-unifiable. We evaluated JLibSketch by using it to synthesize nine programs that use libraries from three domains: data structures, cryptography, and file systems. We found that algebraic specifications are, on average, about half the size of mocks. We also found that algebraic specifications perform better than mocks on seven of the nine programs, sometimes significantly so, and perform equally well on the last two programs. Thus, we believe that JLibSketch takes an important step toward synthesis of programs that use libraries.
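To picture what an algebraic specification buys, consider this OCaml sketch (our illustration; JLibSketch targets Java via the Sketch synthesizer, not this ADT encoding): method-call sequences are modeled as terms, and a rewrite rule such as the encryption/decryption identity becomes a normalization pass over those terms.

```ocaml
(* Library calls modeled as terms; an algebraic spec is a rewrite rule
   over such terms. Illustrative only. *)
type term =
  | Msg of string
  | Encrypt of term * string   (* payload, key *)
  | Decrypt of term * string

(* Rule: decrypt (encrypt m k) k  ==>  m *)
let rec normalize (t : term) : term =
  match t with
  | Decrypt (inner, k) ->
      (match normalize inner with
       | Encrypt (m, k') when k = k' -> m
       | inner' -> Decrypt (inner', k))
  | Encrypt (m, k) -> Encrypt (normalize m, k)
  | Msg _ -> t

let () =
  assert (normalize (Decrypt (Encrypt (Msg "hi", "k1"), "k1")) = Msg "hi")
```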
- Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone processes. We introduce Melt, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information sufficient to mine API migration rules. By leveraging code examples mined from the library source and code examples automatically generated from the pull requests, we infer transformation rules in Comby, a language for structural code search and replace. Since rules inferred from single code examples may be too specific, we propose a generalization procedure to make the rules more applicable to client projects. Melt rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to integrate seamlessly into the library workflow, removing the need to wait for client code migrations. We evaluated Melt on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9×. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixed some failing test cases, demonstrating Melt's effectiveness in real-world scenarios.
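A toy, single-hole version of such a syntax-driven rule, sketched in OCaml (a simplification of the idea, not Melt's implementation or Comby's template engine; `old_api`/`new_api` are made-up names): a rule is a template pair, and applying it transplants whatever the hole matched.

```ocaml
(* A toy structural rewrite rule: match "old_api(<hole>)" and emit
   "new_api(<hole>)". Comby supports far richer templates; this only
   illustrates the shape of a migration rule. *)
type rule = { match_prefix : string; match_suffix : string;
              rewrite_prefix : string; rewrite_suffix : string }

let migrate_call = {
  match_prefix = "old_api(";   match_suffix = ")";
  rewrite_prefix = "new_api("; rewrite_suffix = ")";
}

(* Apply the rule to one source snippet if it matches. *)
let apply (r : rule) (line : string) : string option =
  let plen = String.length r.match_prefix
  and slen = String.length r.match_suffix
  and len = String.length line in
  if len >= plen + slen
     && String.sub line 0 plen = r.match_prefix
     && String.sub line (len - slen) slen = r.match_suffix
  then
    let hole = String.sub line plen (len - plen - slen) in
    Some (r.rewrite_prefix ^ hole ^ r.rewrite_suffix)
  else None

let () =
  match apply migrate_call "old_api(x, y)" with
  | Some s -> print_endline s   (* prints: new_api(x, y) *)
  | None -> ()
```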
- Probabilistic programs often trade accuracy for efficiency, and thus may, with a small probability, return an incorrect result. It is important to obtain precise bounds for the probability of these errors, but existing verification approaches have limitations that lead to error probability bounds that are excessively coarse, or only apply to first-order programs. In this paper we present Eris, a higher-order separation logic for proving error probability bounds for probabilistic programs written in an expressive higher-order language. Our key novelty is the introduction of error credits, a separation logic resource that tracks an upper bound on the probability that a program returns an erroneous result. By representing error bounds as a resource, we recover the benefits of separation logic, including compositionality, modularity, and dependency between errors and program terms, allowing for more precise specifications. Moreover, we enable novel reasoning principles such as expectation-preserving error composition, amortized error reasoning, and error induction. We illustrate the advantages of our approach by proving amortized error bounds on a range of examples, including collision probabilities in hash functions, which allow us to write more modular specifications for data structures that use them as clients. We also use our logic to prove correctness and almost-sure termination of rejection sampling algorithms. All of our results have been mechanized in the Coq proof assistant using the Iris separation logic framework and the Coquelicot real analysis library.
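For intuition about the programs such bounds target, here is a plain OCaml sketch (ours, unrelated to Eris's Coq mechanization) of a bounded rejection sampler: each round rejects with probability 3/8, so the probability the sampler gives up after k rounds is (3/8)^k, exactly the kind of quantity an error-credit proof would track.

```ocaml
(* A bounded rejection sampler: draw a 3-bit value, accept if < 5.
   Each round rejects with probability 3/8, so the chance this
   function returns None is (3/8)^rounds -- the bound an Eris-style
   proof would establish via error credits. Illustrative sketch. *)
let rec sample_below_5 ~(rounds : int) : int option =
  if rounds = 0 then None                (* error case: out of attempts *)
  else
    let v = Random.int 8 in
    if v < 5 then Some v                 (* accepted: uniform over 0..4 *)
    else sample_below_5 ~rounds:(rounds - 1)

let () =
  Random.self_init ();
  match sample_below_5 ~rounds:10 with   (* error prob. (3/8)^10 < 1e-4 *)
  | Some v -> Printf.printf "sampled %d\n" v
  | None -> print_endline "sampler gave up"
```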
- Memcached is a widely used key-value store. It is structured as a multithreaded user-level server, accessed over socket connections by a potentially distributed collection of clients. Because socket communication is so much more expensive than a single operation on a K-V store, much of the client library is devoted to batching of requests. Batching is not always feasible, however, and the cost of communication seems particularly unfortunate when, as is often the case, clients are co-located on a single machine with the server and have access to the same physical memory. Fortunately, recent work on _protected libraries_ has shown that it is possible, on current Intel processors, to amplify access rights quickly when calling into a specially configured user-level library. Library instances in separate processes can then share data safely, even in the face of independent process failures. We have used protected libraries to implement a new version of memcached in which client threads execute the code of the server themselves, without the need to send messages. Compared to the original, our new version is both significantly simpler, containing 24% less code, and dramatically faster, with an 11–56× reduction in latency and a roughly 2× increase in throughput.
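The performance argument reduces to replacing one socket round trip per operation with an ordinary (rights-amplified) function call. The OCaml sketch below caricatures the two call paths; both signatures are hypothetical and neither corresponds to the paper's C implementation or to any real memcached API.

```ocaml
(* Caricature of the two memcached access paths. Hypothetical types. *)

(* Classic path: serialize a request, cross the kernel via a socket,
   block for the reply -- one round trip per un-batched operation. *)
let get_over_socket (send : string -> unit) (recv : unit -> string)
    (key : string) : string option =
  send ("get " ^ key ^ "\r\n");        (* syscall + copy *)
  match recv () with                   (* wait for the reply *)
  | "END" -> None
  | v -> Some v

(* Protected-library path: the client thread runs the server's own
   lookup code against shared memory after amplifying its access
   rights -- no message, no context switch, just a function call. *)
let get_in_process (store : (string, string) Hashtbl.t)
    (key : string) : string option =
  Hashtbl.find_opt store key

let () =
  let store = Hashtbl.create 16 in
  Hashtbl.replace store "user:1" "ada";
  match get_in_process store "user:1" with
  | Some v -> print_endline v
  | None -> print_endline "miss"
```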