-
Fioretto, Ferdinando; Van Hentenryck, Pascal (Ed.)
Free, publicly-accessible full text available July 23, 2026
-
Session types provide a formal type system for defining and verifying communication protocols between message-passing processes. To analyze randomized systems, recent works have extended session types with probabilistic type constructors. Unfortunately, all the proposed extensions support only constant probabilities, which limits their applicability to real-world systems. Our work addresses this limitation by introducing probabilistic refinement session types, which enable symbolic reasoning about concurrent probabilistic systems in a core calculus we call PReST. The type system is carefully designed to be a conservative extension of refinement session types and supports both probabilistic and regular choice type operators. We also implement PReST in a prototype, which we use to validate probabilistic concurrent programs. The added expressive power leads to significant challenges, both in the metatheory and in the implementation of PReST, particularly in type checking: it requires reconstructing intermediate types for channels when type checking probabilistic branching expressions. The theory handles this by semantically quantifying refinement variables in the probabilistic typing rules, a deviation from standard refinement type systems. The implementation relies on a bidirectional type checker that uses an SMT solver to reconstruct the intermediate types, minimizing annotation overhead and increasing usability. To guarantee that probabilistic processes are almost-surely terminating, we integrate cost analysis into our type system to obtain upper bounds on expected recursion depth. We evaluate PReST on a wide variety of benchmarks from four categories: (i) randomized distributed protocols such as Itai and Rodeh's leader election and bounded retransmission, (ii) parametric Markov chains such as random walks, (iii) probabilistic analysis of concurrent data structures such as queues, and (iv) distributions obtained by composing uniform distributions using operators like max and sum. Our experiments show that the PReST type checker scales to large programs with sophisticated probability distributions.
Free, publicly-accessible full text available June 10, 2026
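The interplay of probabilistic choice, recursion, and expected recursion depth can be made concrete with a small sketch. The following Haskell fragment is a minimal illustration under our own naming (`Proto`, `PChoice`, and `expectedDepth` are assumptions, not PReST syntax): a process that, with probability p, sends another message and recurses, and whose expected number of sends is the kind of quantity PReST's cost analysis bounds statically.

```haskell
-- Illustrative only: not PReST syntax. A protocol is either a send, a
-- close, or a probabilistic choice between two continuations.
data Proto
  = Send String Proto           -- send a label, then continue
  | End                         -- close the channel
  | PChoice Double Proto Proto  -- with probability p go left, else right

-- A geometric process: with probability p send again and recurse,
-- otherwise stop. Truncated at depth n to keep the sketch finite.
retry :: Double -> Int -> Proto
retry _ 0 = End
retry p n = PChoice p (Send "again" (retry p (n - 1))) End

-- Expected number of sends: the kind of quantity a cost analysis must
-- bound to establish almost-sure termination.
expectedDepth :: Proto -> Double
expectedDepth End             = 0
expectedDepth (Send _ k)      = 1 + expectedDepth k
expectedDepth (PChoice p l r) = p * expectedDepth l + (1 - p) * expectedDepth r

main :: IO ()
main = print (expectedDepth (retry 0.5 30))  -- approaches p/(1-p) = 1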
-
Endrullis, Jörg; Schmitz, Sylvain (Ed.)
Kleene Algebra with Tests (KAT) provides a framework for algebraic equational reasoning about imperative programs. The recent variant Guarded KAT (GKAT) allows reasoning about non-probabilistic properties of probabilistic programs. Here we introduce an extension of this framework called approximate GKAT (aGKAT), which equips GKAT with a partially ordered monoid (the real numbers), making it possible to express that a (deterministic) property is satisfied except with probability up to a certain bound. This allows proofs about probabilistic programs based on the union bound, a technique from basic probability theory, to be represented in equational reasoning "à la KAT". We show how a propositional variant of approximate Hoare Logic (aHL), a program logic for the union bound, can be soundly encoded in our system aGKAT. We then illustrate the use of aGKAT with an example of accuracy analysis from the field of differential privacy.
Free, publicly-accessible full text available January 1, 2026
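As a rough sketch of the logic being encoded (notation approximated from the aHL literature, not quoted from this paper): aHL judgments carry a failure probability, and sequential composition adds those probabilities, which is precisely the union bound.

```latex
% Union bound: \Pr[A \cup B] \le \Pr[A] + \Pr[B].
% An aHL judgment \vdash_\beta \{P\}\, c\, \{Q\} reads: from precondition
% P, after running c, postcondition Q fails with probability at most
% \beta. Sequencing accumulates failure probabilities additively:
\frac{\vdash_{\beta_1} \{P\}\; c_1 \;\{R\} \qquad
      \vdash_{\beta_2} \{R\}\; c_2 \;\{Q\}}
     {\vdash_{\beta_1 + \beta_2} \{P\}\; c_1 ; c_2 \;\{Q\}}
```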
-
This work is devoted to formal reasoning about relational properties of probabilistic imperative programs. Relational properties are properties that relate the executions of two programs (possibly the same one) on two initial memories. We aim to extend the algebraic approach of Kleene Algebras with Tests (KAT) to relational properties of probabilistic programs. To that end, we consider the approach of Guarded Kleene Algebras with Tests (GKAT), which can be used to represent probabilistic programs, and define a relational version of it, called Bi-guarded Kleene Algebras with Tests (BiGKAT), together with a semantics. We show that the setting of BiGKAT is expressive enough to encode a finitary version of probabilistic Relational Hoare Logic (pRHL) without the While rule, a program logic introduced in the literature for the verification of relational properties of probabilistic programs. We also discuss the additional expressivity brought by BiGKAT.
Free, publicly-accessible full text available January 1, 2026
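For orientation, the shape of the judgments being encoded (notation approximated from the pRHL literature, not quoted from this paper):

```latex
% A pRHL judgment relates two programs and two initial memories:
\vdash c_1 \sim c_2 : \Phi \Longrightarrow \Psi
% Read: for any initial memories m_1, m_2 with (m_1, m_2) \models \Phi,
% the output distributions of c_1 on m_1 and c_2 on m_2 are related by a
% probabilistic lifting of \Psi. BiGKAT replays such derivations as
% equational reasoning over pairs of guarded terms.
```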
-
Endrullis, Jörg; Schmitz, Sylvain (Ed.)
We prove that the equational theory of Kleene algebra with commutativity conditions on primitives (or atomic terms) is undecidable, thereby settling a longstanding open question in the theory of Kleene algebra. While this question has also been recently solved independently by Kuznetsov, our results hold even for weaker theories that do not support the induction axioms of Kleene algebra.
Free, publicly-accessible full text available January 1, 2026
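To make the setting concrete (our own example, not drawn from the paper): the problem is to decide entailments from commutativity hypotheses, such as the classic derivable equation below.

```latex
% The problem shown undecidable: given commutativity hypotheses on
% primitive letters, decide whether an equation between regular terms
% follows in Kleene algebra.
a b = b a \;\vdash_{\mathrm{KA}}\; s = t \;?
% A classic example of such an entailment (derivable using the induction
% axioms; the paper shows undecidability persists even without them):
a b = b a \;\vdash_{\mathrm{KA}}\; (a + b)^* = a^* b^*
```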
-
The work on Fuzz pioneered the use of functional programming languages whose types allow reasoning about the sensitivity of programs. Fuzz and subsequent work (e.g., DFuzz and Duet) use advanced technical devices like linear types, modal types, and partial evaluation. These features usually require the design of a new programming language from scratch, a significant task on its own! While these features are part of the classical toolbox of programming languages, they are often unfamiliar to non-experts in this field. Fortunately, recent studies (e.g., Solo) have shown that linear and other complex types are not strictly needed to determine programs' sensitivity, since this can be achieved by annotating base types with static sensitivity information. In this work, we take a different approach. We propose to enrich base types with information about the metric relation between values, and we present the novel idea of applying parametricity to derive direct proofs for the sensitivity of functions. A direct consequence of our result is that calculating and proving the sensitivity of functions reduces to simple type checking in a programming language with support for polymorphism and type-level naturals. We formalize our main result in a calculus, prove its soundness, and implement a software library in the programming language Haskell, where we reason about the sensitivity of canonical examples. We show that the simplicity of our approach allows us to exploit the type inference of the host language to support a limited form of sensitivity inference. Furthermore, we extend the language with a privacy monad to showcase how our library can be used in practical scenarios such as the implementation of differentially private programs, where the privacy guarantees depend on the sensitivity of user-defined functions. Our library, called Spar, is implemented in less than 500 lines of code.
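A minimal sketch of the idea, not Spar's actual API (all names below are our assumptions): annotate a base numeric type with a type-level natural recording a sensitivity bound, so that type checking a function is already a proof of its sensitivity.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits

-- A value of type `SDouble s` moves by at most s * d when its input
-- moves by d: the index is a static sensitivity bound.
newtype SDouble (s :: Nat) = SDouble Double

-- Adding an s-sensitive and a t-sensitive value is (s + t)-sensitive.
sAdd :: SDouble s -> SDouble t -> SDouble (s + t)
sAdd (SDouble x) (SDouble y) = SDouble (x + y)

-- Doubling a value multiplies its sensitivity bound by 2.
sDouble :: SDouble s -> SDouble (2 * s)
sDouble (SDouble x) = SDouble (2 * x)

-- Type checking is the sensitivity proof: 2 * 1 + 1 = 3, so GHC accepts
-- the claim that `f` is 3-sensitive (and would reject, say, SDouble 2).
f :: SDouble 1 -> SDouble 3
f x = sAdd (sDouble x) x

main :: IO ()
main = let SDouble y = f (SDouble 10) in print y  -- prints 30.0
```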
-
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)
TopKAT is the algebraic theory of Kleene algebra with tests (KAT) extended with a top element. Compared to KAT, one pleasant feature of TopKAT is that, in relational models, the top element allows us to express the domain and codomain of a relation. This enables several applications in program logics, such as proving under-approximate specifications or reachability properties of imperative programs. However, while TopKAT inherits many pleasant features of KAT, such as a decidable equational theory, it is incomplete with respect to relational models. In other words, there are properties that hold in all relational TopKATs but cannot be proved from the axioms of TopKAT. This issue is potentially worrisome for program-logic applications, in which relational models play a key role. In this paper, we further investigate the completeness properties of TopKAT with respect to relational models. We show that TopKAT is complete with respect to (co)domain comparison of KAT terms, but incomplete when comparing the (co)domains of arbitrary TopKAT terms. Since the encoding of under-approximate specifications in TopKAT hinges on the former type of formula, the aforementioned incompleteness results have a limited impact when using TopKAT to reason about such specifications.
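The relational reading of the top element, which underlies both the applications and the completeness question, is easy to state (standard relational algebra, in our own phrasing):

```latex
% In a relational model over a set X, \top is the full relation
% X \times X, so \top s = X \times \mathrm{codom}(s). Hence codomain
% comparison of terms s, t is expressible as a TopKAT inequality:
\mathrm{codom}(s) \subseteq \mathrm{codom}(t) \iff \top s \le \top t
% (symmetrically, s \top = \mathrm{dom}(s) \times X handles domains).
% Completeness for this class of formulas, when s and t are KAT terms,
% is what keeps the under-approximation encoding usable.
```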
-
Data analyses are usually designed to identify some property of the population from which the data are drawn, generalizing beyond the specific data sample. For this reason, data analyses are often designed in a way that guarantees that they produce a low generalization error. That is, they are designed so that the result of a data analysis run on a data sample does not differ too much from the result one would achieve by running the analysis over the entire population. An adaptive data analysis can be seen as a process composed of multiple queries interrogating some data, where the choice of which query to run next may rely on the results of previous queries. The generalization error of each individual query/analysis can be controlled by using an array of well-established statistical techniques. However, when queries are arbitrarily composed, the individual errors can propagate through the chain of queries and lead to a high generalization error. To address this issue, data analysts have designed several techniques that not only guarantee bounds on the generalization errors of single queries, but also guarantee bounds on the generalization error of the composed analyses. The choice of which of these techniques to use often depends on the chain of queries that an adaptive data analysis can generate. In this work, we consider adaptive data analyses implemented as while-like programs, and we design a program analysis which can help identify which technique to use to control their generalization error. More specifically, we formalize the intuitive notion of adaptivity as a quantitative property of programs. We do this because the adaptivity level of a data analysis is a key measure for choosing the right technique. Based on this definition, we design a program analysis for soundly approximating this quantity. The program analysis generates a representation of the data analysis as a weighted dependency graph, where the weight is an upper bound on the number of times each variable can be reached, and uses a path search strategy to guarantee an upper bound on the adaptivity. We implement our program analysis and show that it can help to analyze the adaptivity of several concrete data analyses with different adaptivity structures.
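A toy version of the final step may help (names and representation are our own, not the paper's): given a weighted dependency graph, with one weight per variable bounding how often it can be reached, bound adaptivity by the weight of the heaviest dependency path.

```haskell
import qualified Data.Map as M

-- Illustrative sketch: variables are nodes, `weight` bounds how often
-- each variable may be reached, `deps` records "may depend on" edges,
-- and the adaptivity bound is the heaviest dependency path. Assumes an
-- acyclic graph for simplicity; the paper's analysis accounts for loops
-- through the weights themselves.

type Var = String

adaptivity :: M.Map Var Int -> M.Map Var [Var] -> Int
adaptivity weight deps = maximum (0 : map pathWeight (M.keys weight))
  where
    pathWeight v =
      M.findWithDefault 0 v weight
        + maximum (0 : map pathWeight (M.findWithDefault [] v deps))

main :: IO ()
main = print (adaptivity w d)  -- a two-round analysis: prints 2
  where
    w = M.fromList [("x", 1), ("y", 1)]  -- each query runs once
    d = M.fromList [("y", ["x"])]        -- y's query depends on x's result
```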
-
Motivated by privacy concerns in long-term longitudinal studies in medical and social science research, we study the problem of continually releasing differentially private synthetic data from longitudinal data collections. We introduce a model where, in every time step, each individual reports a new data element, and the goal of the synthesizer is to incrementally update a synthetic dataset in a consistent way to capture a rich class of statistical properties. We give continual synthetic data generation algorithms that preserve two basic types of queries: fixed time window queries and cumulative time queries. We show nearly tight upper bounds on the error rates of these algorithms and demonstrate their empirical performance on realistically sized datasets from the U.S. Census Bureau's Survey of Income and Program Participation.
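As a point of contrast with the synthesizer described above, the naive per-query baseline is easy to state (a sketch under our own parameter names; the paper's algorithms instead maintain a whole synthetic dataset and obtain much better error): answer each fixed time window count with fresh Laplace noise.

```haskell
import System.Random (randomRIO)

-- Sample from Laplace(0, b) by inverting the CDF of a uniform draw
-- (the measure-zero edge case u = +-0.5 is ignored in this sketch).
laplace :: Double -> IO Double
laplace b = do
  u <- randomRIO (-0.5, 0.5)
  pure (negate b * signum u * log (1 - 2 * abs u))

-- Noisy count over the window of the last w steps ending at step t.
-- Each release is eps-DP for that window in isolation (sensitivity 1);
-- composing many releases is exactly what the paper's model improves on.
noisyWindow :: Double -> Int -> [Double] -> Int -> IO Double
noisyWindow eps w counts t = do
  let window = take w (drop (t - w + 1) counts)  -- steps t-w+1 .. t
  noise <- laplace (1 / eps)
  pure (sum window + noise)

main :: IO ()
main = noisyWindow 0.5 3 [4, 7, 2, 9, 5] 4 >>= print
```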