Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Model-finders, such as SAT/SMT-solvers and Alloy, are used widely both directly and embedded in domain-specific tools. They support both conventional verification and, unlike other verification tools, property-free exploration. To do this effectively, they must produce output that helps users with these tasks. Unfortunately, the output of model-finders has seen relatively little rigorous human-factors study. Conventionally, these tools tend to show one satisfying instance at a time. Drawing inspiration from the cognitive science literature, we investigate two aspects of model-finder output: how many instances to show at once, and whether all instances must actually satisfy the input constraints. Using both controlled studies and open-ended talk-alouds, we show that there is benefit to showing negative instances in certain settings; the impact of multiple instances is less clear. Our work is a first step in a theoretically grounded approach to understanding how users engage cognitively with model-finder output, and how those tools might better support users in doing so.more » « less
-
Modern-day formal methods tools are more than just a core solver: they also need convenient languages, useful editors, usable visualizations, and often also scriptability. These are required to attract a community of users, to put ideas to work in practice, and to conduct evaluations of the formalisms and core technical ideas. Off-the-shelf solvers address one of these issues but not the others. How can full prototype environments be obtained quickly? We have built Forge, a system for prototyping such environments. In this paper, we present a case-study to assess the utility of Forge. Concretely, we use Forge to build a basic protocol analyzer, inspired by the Cryptographic Protocol Shape Analyzer (CPSA). We show that we can obtain editing, basic visualization, and scriptability at no extra cost beyond embedding in Forge, and a modern, domain-specific visualization for relatively little extra effort.more » « less
-
Gibbons, Jeremy (Ed.)CONTEXT The success of QuickCheck has led to the development of property-based testing (PBT) libraries for many languages and the process is getting increasing attention. However, unlike regular testing, PBT is not widespread in collegiate curricula. Furthermore, the value of PBT is not limited to software testing. The growing use of formal methods in, and the growth of software synthesis, all create demand for techniques to train students and developers in the art of specification writing. We posit that PBT forms a strong bridge between testing and the act of specification: it’s a form of testing where the tester is actually writing abstract specifications. INQUIRY Even well-informed technologists mention the difficulty of finding good motivating examples for its use. We take steps to fill this lacuna. APPROACH & KNOWLEDGE We find that the use of “relational” problems—those for which an input may admit multiple valid outputs—easily motivates the use of PBT. We also notice that such problems are readily available in the computer science pantheon of problems (e.g., many graph and sorting algorithms). We have been using these for some years now to teach PBT in collegiate courses. GROUNDING In this paper, we describe the problems we use and report on students’ completion of them. We believe the problems overcome some of the motivation issues described above. We also show that students can do quite well at PBT for these problems, suggesting that the topic is well within their reach. In the process, we introduce a simple method to evaluate the accuracy of their specifications, and use it to characterize their common mistakes. IMPORTANCE Based on our findings, we believe that relational problems are an underutilized motivating example for PBT. We hope this paper initiates a catalog of such problems for educators (and developers) to use, and also provides a concrete (though by no means exclusive) method to analyze the quality of PBT.more » « less
-
Flawed problem comprehension leads students to produce flawed implementations. However, testing alone is inadequate for checking comprehension: if a student develops both their tests and implementation with the same misunderstanding, running their tests against their implementation will not reveal the issue. As a solution, some pedagogies encourage the creation of input-output examples independent of testing-but seldom provide students with any mechanism to check that their examples are correct and thorough. We propose a mechanism that provides students with instant feedback on their examples, independent of their implementation progress. We assess the impact of such an interface on an introductory programming course and find several positive impacts, some more neutral outcomes, and no identified negative effects.more » « less
-
Numerous devices, from network switches and servers to industrial control systems, can be unreliable if they are not configured properly. Even if a device’s implementation has been proven correct, it must still be configured to meet the specific functional and security requirements of its stakeholders. However, manual configuration remains labor intensive and error-prone even for experts. Automated configuration synthesis presents a promising way forward. Unfortunately, as we show, existing counterexample-guided algorithms can perform poorly if the system model allows configuration changes during execution. Yet disallowing such changes can hide significant problems, such as privilege escalation. We present a new synthesis algorithm that exploits structure inherent in state-machine models where the system configuration changes. We implement it using the Kodkod relational model finder, and show that it favorably solves a number of configuration-synthesis tasks.more » « less
-
Model-finding tools like the Alloy Analyzer produce concrete examples of how a declarative specification can be satisfied. These formal tools are useful in a wide range of domains: software design, security, networking, and more. By producing concrete examples, they assist in exploring system behavior and can help find surprising faults. Specifications usually have many potential candidate solutions, but model- finders tend to leave the choice of which examples to present entirely to the underlying solver. This paper closes that gap by exploring notions of coverage for the model-finding domain, yielding a novel, rigorous metric for output quality. These ideas are realized in the tool CompoSAT, which interposes itself between Alloy’s constraint-solving and presentation stages to produce ensembles of examples that maximize coverage. We show that high-coverage ensembles like CompoSAT produces are useful for, among other things, detecting overconstraint—a particularly insidious form of specification error. We detail the underlying theory and implementation of CompoSAT and evaluate it on numerous specifications.more » « less
-
Instructors routinely use automated assessment methods to evalu- ate the semantic qualities of student implementations and, some- times, test suites. In this work, we distill a variety of automated assessment methods in the literature down to a pair of assessment models. We identify pathological assessment outcomes in each model that point to underlying methodological flaws. These the- oretical flaws broadly threaten the validity of the techniques, and we actually observe them in multiple assignments of an introduc- tory programming course. We propose adjustments that remedy these flaws and then demonstrate, on these same assignments, that our interventions improve the accuracy of assessment. We believe that with these adjustments, instructors can greatly improve the accuracy of automated assessment.more » « less
-
Model-finders such as SAT-solvers are attractive for produc- ing concrete models, either as sample instances or as counterexamples when properties fail. However, the generated model is arbitrary. To ad- dress this, several research efforts have proposed principled forms of output from model-finders. These include minimal and maximal models, unsat cores, and proof-based provenance of facts. While these methods enjoy elegant mathematical foundations, they have not been subjected to rigorous evaluation on users to assess their utility. This paper presents user studies of these three forms of output performed on advanced students. We find that most of the output forms fail to be effective, and in some cases even actively mislead users. To make such studies feasible to run frequently and at scale, we also show how we can pose such studies on the crowdsourcing site Mechanical Turk.more » « less
An official website of the United States government

Full Text Available