

Title: Efficient Synthesis with Probabilistic Constraints
We consider the problem of synthesizing a program given a probabilistic specification of its desired behavior. Specifically, we study the recent paradigm of distribution-guided inductive synthesis (digits), which iteratively calls a synthesizer on finite sample sets from a given distribution. We make theoretical and algorithmic contributions: (i) We prove the surprising result that digits only requires a polynomial number of synthesizer calls in the size of the sample set, despite its ostensibly exponential behavior. (ii) We present a property-directed version of digits that further reduces the number of synthesizer calls, drastically improving synthesis performance on a range of benchmarks.
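To make the paradigm concrete, here is a minimal sketch of the digits loop in Python. All names (`synthesize`, `digits`, `spec_ok`) are illustrative, not the paper's actual API; the toy synthesizer learns a threshold classifier over integer samples. Note that this naive version enumerates all 2^n labelings of the sample set, which is exactly the "ostensibly exponential" behavior the abstract refers to; the paper's result is that an incremental exploration of labelings needs only polynomially many synthesizer calls.

```python
import itertools

def synthesize(examples):
    # Toy oracle: find a threshold program (x >= t) consistent with the
    # labeled examples, if one exists; otherwise return None.
    lo = max((x for x, y in examples if not y), default=float("-inf"))
    hi = min((x for x, y in examples if y), default=float("inf"))
    if lo < hi:
        return lambda x, t=hi: x >= t
    return None

def digits(sample, spec_ok):
    # Naive digits loop: enumerate candidate labelings of the finite
    # sample set, call the synthesizer on each, and keep a program that
    # satisfies the probabilistic check spec_ok on the sample.
    best = None
    for labels in itertools.product([False, True], repeat=len(sample)):
        prog = synthesize(list(zip(sample, labels)))
        if prog is not None and spec_ok(prog, sample):
            best = prog
    return best
```

For example, `digits([1, 2, 3, 4], lambda p, s: sum(p(x) for x in s) / len(s) >= 0.5)` returns a program accepting at least half of the sample.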
Award ID(s): 1704117
NSF-PAR ID: 10158490
Journal Name: International Conference on Computer Aided Verification
Sponsoring Org: National Science Foundation
More Like this
  1. With the rise of software-as-a-service and microservice architectures, RESTful APIs are now ubiquitous in mobile and web applications. A service can have tens or hundreds of API methods, making it a challenge for programmers to find the right combination of methods to solve their task. We present APIphany, a component-based synthesizer for programs that compose calls to RESTful APIs. The main innovation behind APIphany is the use of precise semantic types, both to specify user intent and to direct the search. APIphany contributes three novel mechanisms to overcome challenges in adapting component-based synthesis to the REST domain: (1) a type inference algorithm for augmenting REST specifications with semantic types; (2) an efficient synthesis technique for “wrangling” semi-structured data, which is commonly required in working with RESTful APIs; and (3) a new form of simulated execution to avoid executing API calls during synthesis. We evaluate APIphany on three real-world APIs and 32 tasks extracted from GitHub repositories and StackOverflow. In our experiments, APIphany found correct solutions to 29 tasks, with 23 of them reported among the top ten synthesis results.
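The core idea of composing API methods whose types chain together can be sketched as a search over types. The code below is an illustrative simplification, not APIphany's actual algorithm: components are (name, input type, output type) triples, and a breadth-first search finds the shortest composition from a source type to a goal type.

```python
from collections import deque

def synthesize_chain(components, source, goal):
    # components: list of (name, in_type, out_type) triples.
    # BFS over types, so the first composition found is the shortest.
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        typ, path = queue.popleft()
        if typ == goal:
            return path
        for name, t_in, t_out in components:
            if t_in == typ and t_out not in seen:
                seen.add(t_out)
                queue.append((t_out, path + [name]))
    return None  # no composition reaches the goal type
```

With hypothetical components `("getUser", "Email", "User")` and `("getRepos", "User", "List[Repo]")`, the search composes `getUser` then `getRepos` to go from `Email` to `List[Repo]`.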
  2. Extensive research in well-studied animal models underscores the importance of commensal gastrointestinal (gut) microbes to animal physiology. Gut microbes have been shown to impact dietary digestion, mediate infection, and even modify behavior and cognition. Given the large physiological and pathophysiological contributions microbes provide their host, it is reasonable to assume that the vertebrate gut microbiome may also impact the fitness, health, and ecology of wildlife. In accordance with this expectation, an increasing number of investigations have considered the role of the gut microbiome in wildlife ecology, health, and conservation. To help promote the development of this nascent field, we need to remove the technical barriers that impede wildlife microbiome research. The present review discusses the 16S rRNA gene microbiome research landscape, clarifying best practices in microbiome data generation and analysis, with particular emphasis on unique situations that arise during wildlife investigations. Special consideration is given to topics relevant to wildlife microbiome research, from sample collection to molecular techniques for data generation to data analysis strategies. Our hope is that this article not only calls for greater integration of microbiome analyses into wildlife ecology and health studies but also provides researchers with the technical framework needed to successfully conduct such investigations.
  3. We propose a new conflict-driven program synthesis technique that is capable of learning from past mistakes. Given a spurious program that violates the desired specification, our synthesis algorithm identifies the root cause of the conflict and learns new lemmas that can prevent similar mistakes in the future. Specifically, we introduce the notion of equivalence modulo conflict and show how this idea can be used to learn useful lemmas that allow the synthesizer to prune large parts of the search space. We have implemented a general-purpose CDCL-style program synthesizer called Neo and evaluate it in two different application domains, namely data wrangling in R and functional programming over lists. Our experiments demonstrate the substantial benefits of conflict-driven learning and show that Neo outperforms two state-of-the-art synthesis tools, Morpheus and DeepCoder, that target these respective domains.
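The flavor of conflict-driven learning can be shown with a deliberately crude sketch (not Neo's actual algorithm, and a stand-in for the full notion of equivalence modulo conflict): when a candidate fails, blame one component in one slot, record that (slot, component) pair as a learned lemma, and skip every future candidate that shares it.

```python
import itertools

def cdcl_synthesize(components, slots, check):
    # check(candidate) returns (True, None) on success, or
    # (False, bad_slot) blaming one slot for the failure.
    learned = set()  # learned lemmas: (slot_index, component) pairs
    for cand in itertools.product(components, repeat=slots):
        if any((i, c) in learned for i, c in enumerate(cand)):
            continue  # pruned without calling check: a past conflict rules it out
        ok, bad_slot = check(cand)
        if ok:
            return cand
        learned.add((bad_slot, cand[bad_slot]))  # learn from the conflict
    return None
```

Each lemma prunes a whole family of candidates at once, which is the essential benefit the abstract describes, though the real system derives far stronger lemmas from the root cause of a conflict.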
  4. Program sketching is a program synthesis paradigm in which the programmer provides a partial program with holes and assertions. The goal of the synthesizer is to automatically find integer values for the holes so that the resulting program satisfies the assertions. The most popular sketching tool, Sketch, can efficiently solve complex program sketches but uses an integer encoding that often performs poorly if the sketched program manipulates large integer values. In this article, we propose a new solving technique that allows Sketch to handle large integer values while retaining its integer encoding. Our technique uses a result from number theory, the Chinese Remainder Theorem, to rewrite program sketches to only track the remainders of certain variable values with respect to several prime numbers. We prove that our transformation is sound and that the encodings of the resulting programs are exponentially more succinct than existing Sketch encodings. We evaluate our technique on a variety of benchmarks manipulating large integer values. Our technique provides speedups over existing Sketch solvers and can solve benchmarks that those solvers cannot handle.
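A worked example of the underlying Chinese Remainder Theorem trick (illustrative only, not Sketch's actual encoding): a large integer is tracked solely through its remainders modulo a few pairwise-coprime primes, arithmetic is done component-wise on the small residues, and the original value is recoverable as long as it stays below the product of the primes.

```python
from math import prod

PRIMES = [101, 103, 107]  # pairwise coprime moduli; values must stay < 101*103*107

def encode(x):
    # Represent x by its residues modulo each prime.
    return [x % p for p in PRIMES]

def add(a, b):
    # Component-wise addition on residues.
    return [(x + y) % p for x, y, p in zip(a, b, PRIMES)]

def mul(a, b):
    # Component-wise multiplication on residues.
    return [(x * y) % p for x, y, p in zip(a, b, PRIMES)]

def decode(res):
    # Reconstruct x < prod(PRIMES) from its residues via the CRT formula.
    M = prod(PRIMES)
    x = 0
    for r, p in zip(res, PRIMES):
        Mp = M // p
        x += r * Mp * pow(Mp, -1, p)  # pow(..., -1, p) is the modular inverse
    return x % M
```

For instance, `decode(mul(encode(1234), encode(567)))` recovers `1234 * 567` even though every intermediate residue stays well under each small prime; the payoff in the paper's setting is that the solver never has to represent the large value directly.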
  5. Translating information between the domains of systematics and conservation requires novel information management designs. Such designs should improve interactions across the trading zone between the domains, herein understood as the model according to which knowledge and uncertainty are productively translated in both directions (cf. Collins et al. 2019). Two commonly held attitudes stand in the way of designing a well-functioning systematics-to-conservation trading zone. On one side, there are calls to unify the knowledge signal produced by systematics, underpinned by the argument that such unification is a necessary precondition for conservation policy to be reliably expressed and enacted (e.g., Garnett et al. 2020). As a matter of legal scholarship, the argument for systematic unity by legislative necessity is principally false (Weiss 2003, MacNeil 2009, Chromá 2011), but perhaps effective enough as a strategy to win over audiences unsure about robust law-making practices in light of variable and uncertain knowledge. On the other side, there is an attitude that conservation cannot ever restrict the academic freedom of systematics as a scientific discipline (e.g., Raposo et al. 2017). This otherwise sound argument misses the mark in the context of designing a productive trading zone with conservation. The central interactional challenge is not whether the systematic knowledge can vary at a given time and/or evolve over time, but whether these signal dynamics are tractable in ways that actors can translate into robust maxims for conservation. Redesigning the trading zone should rest on the (historically validated) projection that systematics will continue to attract generations of inspired, productive researchers and broad-based societal support, frequently leading to protracted conflicts and dramatic shifts in how practitioners in the field organize and identify organismal lineages subject to conservation.
This confident outlook for systematics' future, in turn, should refocus the challenge of designing the trading zone as one of building better information services to model the concurrent conflicts and longer-term evolution of systematic knowledge. It would seem unreasonable to expect the International Union for Conservation of Nature (IUCN) Red List Index to develop better data science models for the dynamics of systematic knowledge (cf. Hoffmann et al. 2011) than are operational in the most reputable information systems designed and used by domain experts (Burgin et al. 2018). The reasonable challenge from conservation to systematics is not to stop being a science but to be a better data science. In this paper, we will review advances in biodiversity data science in relation to representing and reasoning over changes in systematic knowledge with computational logic, i.e., modeling systematic intelligence (Franz et al. 2016). We stress-test this approach with a use case where rapid systematic signal change and high stakes for conservation action intersect, i.e., the Malagasy mouse lemurs (Microcebus É. Geoffroy, 1834 sec. Schüßler et al. 2020), where the number of recognized species-level concepts has risen from 2 to 25 in the span of 38 years (1982–2020). As much as scientifically defensible, we extend our modeling approach to the level of individual published occurrence records, where the inability to do so sometimes reflects substandard practice but more importantly reveals systemic inadequacies in biodiversity data science or informational modeling.
In the absence of shared, sound theoretical foundations to assess taxonomic congruence or incongruence across treatments, and in the absence of biodiversity data platforms capable of propagating logic-enabled, scalable occurrence-to-concept identification events to produce alternative and succeeding distribution maps, there is no robust way to provide a knowledge signal from systematics to conservation that is both consistent in its syntax and accurate in its semantics, in the sense of accurately reflecting the variation and uncertainty that exists across multiple systematic perspectives. Translating this diagnosis into new designs for the trading zone is only one "half" of the solution, i.e., a technical advancement that then would need to be socially endorsed and incentivized by systematic and conservation communities motivated to elevate their collaborative interactions and trade robustly in inherently variable and uncertain information.