skip to main content


Title: Automatically Generating Precise Oracles from Structured Natural Language Specifications
Software specifications often use natural language to describe the desired behavior, but such specifications are difficult to verify automatically. We present Swami, an automated technique that extracts test oracles and generates executable tests from structured natural language specifications. Swami focuses on exceptional behavior and boundary conditions that often cause field failures but that developers often fail to manually write tests for. Evaluated on the official JavaScript specification (ECMA-262), 98.4% of the tests Swami generated were precise to the specification. Using Swami to augment developer-written test suites improved coverage and identified 1 previously unknown defect and 15 missing JavaScript features in Rhino, 1 previously unknown defect in Node.js, and 18 semantic ambiguities in the ECMA-262 specification.  more » « less
Award ID(s):
1763423 1453474 1513055
NSF-PAR ID:
10108231
Author(s) / Creator(s):
;
Date Published:
Journal Name:
41st International Conference on Software Engineering (ICSE)
Page Range / eLocation ID:
188-199
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Programmers often leverage data structure libraries that provide useful and reusable abstractions. Modular verification of programs that make use of these libraries naturally rely on specifications that capture important properties about how the library expects these data structures to be accessed and manipulated. However, these specifications are often missing or incomplete, making it hard for clients to be confident they are using the library safely. When library source code is also unavailable, as is often the case, the challenge to infer meaningful specifications is further exacerbated. In this paper, we present a novel data-driven abductive inference mechanism that infers specifications for library methods sufficient to enable verification of the library's clients. Our technique combines a data-driven learning-based framework to postulate candidate specifications, along with SMT-provided counterexamples to refine these candidates, taking special care to prevent generating specifications that overfit to sampled tests. The resulting specifications form a minimal set of requirements on the behavior of library implementations that ensures safety of a particular client program. Our solution thus provides a new multi-abduction procedure for precise specification inference of data structure libraries guided by client-side verification tasks. Experimental results on a wide range of realistic OCaml data structure programs demonstrate the effectiveness of the approach. 
    more » « less
  2. Sridharan, Manu (Ed.)
    JavaScript is a single-threaded programming language, so asynchronous programming is practiced out of necessity to ensure that applications remain responsive in the presence of user input or interactions with file systems and networks. However, many JavaScript applications execute in environments that do exhibit concurrency by, e.g., interacting with multiple or concurrent servers, or by using file systems managed by operating systems that support concurrent I/O. In this paper, we demonstrate that JavaScript programmers often schedule asynchronous I/O operations suboptimally, and that reordering such operations may yield significant performance benefits. Concretely, we define a static side-effect analysis that can be used to determine how asynchronous I/O operations can be refactored so that asynchronous I/O-related requests are made as early as possible, and so that the results of these requests are awaited as late as possible. While our static analysis is potentially unsound, we have not encountered any situations where it suggested reorderings that change program behavior. We evaluate the refactoring on 20 applications that perform file- or network-related I/O. For these applications, we observe average speedups ranging between 0.99% and 53.6% for the tests that execute refactored code (8.1% on average). 
    more » « less
  3. Katsaros, Panagiotis ; Nenzi, Laura (Ed.)
    We present eMOP, a tool for incremental runtime verification (RV) of test executions during software evolution. We previously used RV to find hundreds of bugs in open-source projects by monitoring passing tests against formal specifications of Java APIs. We also proposed evolution-aware techniques to reduce RV’s runtime overhead and human time to inspect specification violations. eMOP brings these benefits to developers in a tool that seamlessly integrates with the Maven build system. We describe eMOP’s design, implementation, and usage. We evaluate eMOP on 676 versions of 21 projects, including those from our earlier prototypes' evaluation. eMOP is up to 8.4x faster and shows up to 31.3x fewer violations, compared to running RV from scratch after each code change. eMOP also does not miss new violations in our evaluation, and it is open-sourced at https://github.com/SoftEngResearch/emop. 
    more » « less
  4. Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings. 
    more » « less
  5. Highly-configurable software underpins much of our computing infrastructure. It enables extensive reuse, but opens the door to broken configuration specifications. The configuration specification language, Kconfig, is designed to prevent invalid configurations of the Linux kernel from being built. However, the astronomical size of the configuration space for Linux makes finding specification bugs difficult by hand or with random testing. In this paper, we introduce a software model checking framework for building Kconfig static analysis tools. We develop a formal semantics of the Kconfig language and implement the semantics in a symbolic evaluator called kclause that models Kconfig behavior as logical formulas. We then design and implement a bug finder, called kismet, that takes kclause models and leverages automated theorem proving to find unmet dependency bugs. kismet is evaluated for its precision, performance, and impact on kernel development for a recent version of Linux, which has over 140,000 lines of Kconfig across 28 architecture-specific specifications. Our evaluation finds 781 bugs (151 when considering sharing among Kconfig specifications) with 100% precision, spending between 37 and 90 minutes for each Kconfig specification, although it misses some bugs due to underapproximation. Compared to random testing, kismet finds substantially more true positive bugs in a fraction of the time. 
    more » « less