Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.
more »
« less
Mergeable replicated data types
Programming geo-replicated distributed systems is challenging given the complexity of reasoning about different evolving states on different replicas. Existing approaches to this problem impose significant burden on application developers to consider the effect of how operations performed on one replica are witnessed and applied on others. To alleviate these challenges, we present a fundamentally different approach to programming in the presence of replicated state. Our insight is based on the use of invertible relational specifications of an inductively-defined data type as a mechanism to capture salient aspects of the data type relevant to how its different instances can be safely merged in a replicated environment. Importantly, because these specifications only address a data type's (static) structural properties, their formulation does not require exposing low-level system-level details concerning asynchrony, replication, visibility, etc. As a consequence, our framework enables the correct-by-construction synthesis of rich merge functions over arbitrarily complex (i.e., composable) data types. We show that the use of a rich relational specification language allows us to extract sufficient conditions to automatically derive merge functions that have meaningful non-trivial convergence properties. We incorporate these ideas in a tool called Quark, and demonstrate its utility via a detailed evaluation study on real-world benchmarks.
more »
« less
- Award ID(s):
- 1717741
- PAR ID:
- 10603145
- Publisher / Repository:
- Association for Computing Machinery (ACM)
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 3
- Issue:
- OOPSLA
- ISSN:
- 2475-1421
- Format(s):
- Medium: X Size: p. 1-29
- Size(s):
- p. 1-29
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The work of Fuzz has pioneered the use of functional programming languages where types allow reasoning about the sensitivity of programs. Fuzz and subsequent work (e.g., DFuzz and Duet) use advanced technical devices like linear types, modal types, and partial evaluation. These features usually require the design of a new programming language from scratch—a significant task on its own! While these features are part of the classical toolbox of programming languages, they are often unfamiliar to non-experts in this field. Fortunately, recent studies (e.g., Solo) have shown that linear and complex types in general, are not strictly needed for the task of determining programs’ sensitivity since this can be achieved by annotating base types with static sensitivity information. In this work, we take a different approach. We propose to enrich base types with information about the metric relation between values, and we present the novel idea of applyingparametricityto derive direct proofs for the sensitivity of functions. A direct consequence of our result is thatcalculating and provingthe sensitivity of functions is reduced to simply type-checking in a programming language with support for polymorphism and type-level naturals. We formalize our main result in a calculus, prove its soundness, and implement a software library in the programming language Haskell–where we reason about the sensitivity of canonical examples. We show that the simplicity of our approach allows us to exploit the type inference of the host language to support a limited form of sensitivity inference. Furthermore, we extend the language with a privacy monad to showcase how our library can be used in practical scenarios such as the implementation of differentially private programs, where the privacy guarantees depend on the sensitivity of user-defined functions. Our library, called Spar, is implemented in less than 500 lines of code.more » « less
-
Calzavara, Stefano; Naumann, David (Ed.)Availability is crucial to the security of distributed systems, but guaranteeing availability is hard, especially when participants in the system may act maliciously. Quorum replication protocols provide both integrity and availability: data and computation is replicated at multiple independent hosts, and a quorum of these hosts must agree on the output of all operations applied to the data. Unfortunately, these protocols have high overhead and can be difficult to calibrate for a specific application’s needs. Ideally, developers could use high-level abstractions for consensus and replication to write fault-tolerant code that is secure by construction. This paper presents Flow-Limited Authorization for Quorum Replication (FLAQR), a core calculus for building distributed applications with heterogeneous quorum replication protocols while enforcing end-to-end information security. Our type system ensures that well-typed FLAQR programs cannot fail (experience an unrecoverable error) in ways that violate their type-level specifications. We present noninterference theorems that characterize FLAQR’s confidentiality, integrity, and availability in the presence of consensus, replication, and failures, as well as a liveness theorem for the class of majority quorum protocols under a bounded number of faults. Additionally, we present an extension to FLAQR that supports secret sharing as a form of declassification and prove it preserves integrity and availability security properties.more » « less
-
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)TopKAT is the algebraic theory of Kleene algebra with tests (KAT) extended with a top element. Compared to KAT, one pleasant feature of TopKAT is that, in relational models, the top element allows us to express the domain and codomain of a relation. This enables several applications in program logics, such as proving under-approximate specifications or reachability properties of imperative programs. However, while TopKAT inherits many pleasant features of KATs, such as having a decidable equational theory, it is incomplete with respect to relational models. In other words, there are properties that hold true of all relational TopKATs but cannot be proved with the axioms of TopKAT. This issue is potentially worrisome for program-logic applications, in which relational models play a key role. In this paper, we further investigate the completeness properties of TopKAT with respect to relational models. We show that TopKAT is complete with respect to (co)domain comparison of KAT terms, but incomplete when comparing the (co)domain of arbitrary TopKAT terms. Since the encoding of under-approximate specifications in TopKAT hinges on this type of formula, the aforementioned incompleteness results have a limited impact when using TopKAT to reason about such specifications.more » « less
-
We consider the formulation of a symbolic execution (SE) procedure for functional programs that interact with effectful, opaque libraries. Our procedure allows specifications of libraries and abstract data type (ADT) methods that are expressed inLinear Temporal Logic over Finite Traces(LTLf), interpreting them assymbolic finite automata(SFAs) to enable intelligent specification-guided path exploration in this setting. We apply our technique to facilitate the falsification of complex data structure safety properties in terms of effectful operations made by ADT methods on underlying opaque representation type(s). Specifications naturally characterize admissible traces of temporally-ordered events that ADT methods (and the library methods they depend upon) are allowed to perform. We show how to use these specifications to construct feasible symbolic input states for the corresponding methods, as well as how to encode safety properties in terms of this formalism. More importantly, we incorporate the notion ofsymbolic derivatives, a mechanism that allows the SE procedure to intelligently underapproximate the set of precondition states it needs to explore, based on the automata structures latent in the provided specifications and the safety property that is to be falsified. Intuitively, derivatives enable symbolic execution to exploit temporal constraints defined by trace-based specifications to quickly prune unproductive paths and discover feasible error states. Experimental results on a wide-range of challenging ADT implementations demonstrate the effectiveness of our approach.more » « less
An official website of the United States government
