

Search for: All records

Creators/Authors contains: "Garg, Deepak"

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Data analyses are usually designed to identify some property of the population from which the data are drawn, generalizing beyond the specific data sample. For this reason, data analyses are often designed in a way that guarantees that they produce a low generalization error. That is, they are designed so that the result of a data analysis run on a data sample does not differ too much from the result one would achieve by running the analysis over the entire population. An adaptive data analysis can be seen as a process composed of multiple queries interrogating some data, where the choice of which query to run next may rely on the results of previous queries. The generalization error of each individual query/analysis can be controlled by using an array of well-established statistical techniques. However, when queries are arbitrarily composed, the different errors can propagate through the chain of queries and lead to a high overall generalization error. To address this issue, data analysts have designed several techniques that not only guarantee bounds on the generalization errors of single queries, but also guarantee bounds on the generalization error of the composed analyses. The choice of which of these techniques to use often depends on the chain of queries that an adaptive data analysis can generate. In this work, we consider adaptive data analyses implemented as while-like programs, and we design a program analysis which can help with identifying which technique to use to control their generalization errors. More specifically, we formalize the intuitive notion of adaptivity as a quantitative property of programs. We do this because the adaptivity level of a data analysis is a key measure for choosing the right technique. Based on this definition, we design a program analysis for soundly approximating this quantity. The program analysis generates a representation of the data analysis as a weighted dependency graph, where the weight is an upper bound on the number of times each variable can be reached, and uses a path search strategy to guarantee an upper bound on the adaptivity. We implement our program analysis and show that it can help to analyze the adaptivity of several concrete data analyses with different adaptivity structures.
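To make the adaptivity notion concrete, here is a minimal Python sketch. It is my own illustration under assumed names, not the paper's program analysis or its implementation: a two-round adaptive analysis in which the second query is chosen based on the first query's answer, plus a naive adaptivity estimate computed as the longest weighted path in a tiny query-dependency graph.

```python
# Toy sketch (illustrative assumption, not the paper's analysis): an adaptive
# data analysis in which the choice of the second query depends on the result
# of the first, and a naive adaptivity bound computed as the longest weighted
# path in a dependency graph whose edges mean "this query was chosen based on
# the result of that one".
import random

def adaptive_analysis(sample):
    # Query 1: empirical mean of the sample.
    q1 = sum(sample) / len(sample)
    # Query 2 is *chosen* from q1's answer -- this dependency is what makes
    # the analysis adaptive (two rounds of adaptivity in this toy example).
    if q1 > 0:
        q2 = sum(x * x for x in sample) / len(sample)  # second moment
    else:
        q2 = max(sample)                               # maximum
    return q1, q2

def adaptivity_upper_bound(edges, start, weights=None):
    """Longest weighted path from `start` in a DAG of query dependencies.

    edges:   dict mapping a query to the queries that depend on its result
    weights: dict mapping a query to how many times it can be reached
             (defaults to 1, i.e. straight-line code without loops)
    """
    weights = weights or {}
    def longest(node):
        w = weights.get(node, 1)
        succs = edges.get(node, [])
        return w + (max(longest(s) for s in succs) if succs else 0)
    return longest(start)

# Dependency graph of the toy analysis above: q2 depends on q1.
print(adaptivity_upper_bound({"q1": ["q2"], "q2": []}, "q1"))      # -> 2
print(adaptive_analysis([random.gauss(0, 1) for _ in range(100)]))
```

In the paper's setting, the dependency graph and its weights are derived soundly from a while-like program rather than hard-coded, and the weights bound how many times each variable can be reached; the sketch only conveys the shape of the idea.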
  2. Software sandboxing or software-based fault isolation (SFI) is a lightweight approach to building secure systems out of untrusted components. Mozilla, for example, uses SFI to harden the Firefox browser by sandboxing third-party libraries, and companies like Fastly and Cloudflare use SFI to safely co-locate untrusted tenants on their edge clouds. While there have been significant efforts to optimize and verify SFI enforcement, context switching in SFI systems remains largely unexplored: almost all SFI systems use heavyweight transitions that are not only error-prone but incur significant performance overhead from saving, clearing, and restoring registers when context switching. We identify a set of zero-cost conditions that characterize when sandboxed code has sufficient structure to guarantee security via lightweight zero-cost transitions (simple function calls). We modify the Lucet Wasm compiler and its runtime to use zero-cost transitions, eliminating the undue performance tax on systems that rely on Lucet for sandboxing (e.g., we speed up image and font rendering in Firefox by up to 29.7% and 10%, respectively). To remove the Lucet compiler and its correct implementation of the Wasm specification from the trusted computing base, we (1) develop a static binary verifier, VeriZero, which (in seconds) checks that binaries produced by Lucet satisfy our zero-cost conditions, and (2) prove the soundness of VeriZero by developing a logical relation that captures when a compiled Wasm function is semantically well-behaved with respect to our zero-cost conditions. Finally, we show that our model is useful beyond Wasm by describing a new, purpose-built SFI system, SegmentZero32, that uses x86 segmentation and LLVM with mostly off-the-shelf passes to enforce our zero-cost conditions; our prototype performs on par with the state-of-the-art Native Client SFI system.
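The difference between heavyweight and zero-cost transitions can be mocked up in a few lines. The Python model below is an assumption made for illustration (it is not VeriZero, Lucet, or real register-level code): a heavyweight call saves, clears, and restores a register file around every sandbox entry, a zero-cost call is a plain function call, and a toy check approximates one simplified zero-cost condition, namely that sandboxed code leaves callee-saved registers as it found them.

```python
# Toy model (illustrative assumption, not VeriZero or the Lucet runtime) of
# heavyweight vs. zero-cost sandbox transitions, plus a dynamic stand-in for
# checking one simplified zero-cost condition on sandboxed code.
CALLEE_SAVED = ("rbx", "rbp", "r12", "r13", "r14", "r15")

def heavyweight_call(regs, sandboxed_fn):
    saved = dict(regs)               # save the full register file
    for r in regs:                   # clear everything before entering the sandbox
        regs[r] = 0
    result = sandboxed_fn(regs)
    regs.clear()
    regs.update(saved)               # restore on exit
    return result

def zero_cost_call(regs, sandboxed_fn):
    return sandboxed_fn(regs)        # just a function call

def respects_callee_saved(sandboxed_fn, probe_regs):
    """Run the function on a probe register file and verify that the
    callee-saved registers are preserved (a dynamic stand-in for a
    static check on the compiled binary)."""
    before = {r: probe_regs[r] for r in CALLEE_SAVED}
    sandboxed_fn(probe_regs)
    return all(probe_regs[r] == before[r] for r in CALLEE_SAVED)

def well_behaved(regs):
    regs["rax"] = 42                 # scratch register: allowed
    return regs["rax"]

def misbehaved(regs):
    regs["rbx"] = 7                  # clobbers a callee-saved register
    return 0

regs = {r: i for i, r in enumerate(("rax", "rcx") + CALLEE_SAVED)}
print(heavyweight_call(dict(regs), well_behaved))        # 42, with save/clear/restore
print(zero_cost_call(dict(regs), well_behaved))          # 42, plain call
print(respects_callee_saved(well_behaved, dict(regs)))   # True  -> zero-cost is safe
print(respects_callee_saved(misbehaved, dict(regs)))     # False -> keep the heavyweight path
```

The actual conditions are checked statically on Lucet-produced binaries; the toy check above is only meant to convey why sufficiently structured sandboxed code can make the save/clear/restore dance unnecessary.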
  3. This paper presents λ-amor, a new type-theoretic framework for amortized cost analysis of higher-order functional programs and shows that existing type systems for cost analysis can be embedded in it. λ-amor introduces a new modal type for representing potentials – costs that have been accounted for, but not yet incurred, which are central to amortized analysis. Additionally, λ-amor relies on standard type-theoretic concepts like affineness, refinement types and an indexed cost monad. λ-amor is proved sound using a rather simple logical relation. We embed two existing type systems for cost analysis in λ-amor showing that, despite its simplicity, λ-amor can simulate cost analysis for different evaluation strategies (call-by-name and call-by-value), in different styles (effect-based and coeffect-based), and with or without amortization. One of the embeddings also implies that λ-amor is relatively complete for all terminating PCF programs. 
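As a reminder of what a potential buys in amortized analysis, here is a standard textbook example written as a Python sketch. It is an assumed illustration with nothing of λ-amor's syntax or types in it: a two-list functional queue in which every enqueue deposits one unit of potential that later pays for reversing the front list, so the amortized cost of every operation is constant even though an individual dequeue can be linear.

```python
# Toy illustration (assumed example, not λ-amor itself) of "potential":
# cost that has been accounted for in advance but not yet incurred.
def enqueue(queue, x):
    front, back = queue
    # actual cost 1 (a cons), plus 1 unit of potential deposited on the new element
    return (front, [x] + back), 1, 1

def dequeue(queue):
    front, back = queue
    if not front:
        front = list(reversed(back))         # the expensive step
        back = []
        actual = len(front) + 1              # reversal plus taking the head
        released = len(front)                # stored potential pays for the reversal
    else:
        actual, released = 1, 0
    return (front[1:], back), front[0], actual, released

q, total_actual, total_amortized = ([], []), 0, 0
for i in range(5):
    q, actual, deposited = enqueue(q, i)
    total_actual += actual
    total_amortized += actual + deposited    # amortized cost of an enqueue = 2
for _ in range(5):
    q, _, actual, released = dequeue(q)
    total_actual += actual
    total_amortized += actual - released     # amortized cost of a dequeue = 1

print(total_actual, total_amortized)         # the amortized total is always >= the actual total
```

Roughly, the modal type in λ-amor internalizes the bookkeeping that the explicit deposited/released counters perform here.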
  4. Relational cost analysis aims at formally establishing bounds on the difference in the evaluation costs of two programs. As a particular case, one can also use relational cost analysis to establish bounds on the difference in the evaluation cost of the same program on two different inputs. One way to perform relational cost analysis is to use a relational type-and-effect system that supports reasoning about relations between two executions of two programs. Building on this basic idea, we present a type-and-effect system, called ARel, for reasoning about the relative cost of array-manipulating, higher-order functional-imperative programs. The key ingredient of our approach is a new lightweight type refinement discipline that we use to track relations (differences) between two mutable arrays. This discipline combined with Hoare-style triples built into the types allows us to express and establish precise relative costs of several interesting programs which imperatively update their data. We have implemented ARel using ideas from bidirectional type checking. 
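The flavor of bound ARel targets can be shown with a toy instrumented program (my own example in Python, not code written in ARel): a loop that imperatively zeroes the negative entries of an array, with cost counting the writes. If two input arrays differ in at most k positions, the costs of the two runs differ by at most k.

```python
# Illustrative sketch (assumed example, not ARel): relational cost analysis
# bounds the *difference* in cost between two runs of the same program.
def clamp_negatives(arr):
    cost = 0
    for i in range(len(arr)):
        if arr[i] < 0:
            arr[i] = 0        # imperative array update: costs 1
            cost += 1
    return cost

def relative_cost(a1, a2):
    return abs(clamp_negatives(list(a1)) - clamp_negatives(list(a2)))

a = [3, -1, 4, -1, 5]
b = [3, -1, 4,  2, 5]                       # differs from a in exactly one position
k = sum(x != y for x, y in zip(a, b))
print(relative_cost(a, b), "<=", k)         # 1 <= 1
```

In ARel such a bound would be established statically from a type refinement relating the two arrays, rather than by running the program twice as this sketch does.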
  5. Adversarial computations are a widely studied class of computations where resource-bounded probabilistic adversaries have access to oracles, i.e., probabilistic procedures with private state. These computations arise routinely in several domains, including security, privacy and machine learning. In this paper, we develop program logics for reasoning about adversarial computations in a higher-order setting. Our logics are built on top of a simply typed λ-calculus extended with a graded monad for probabilities and state. The grading is used to model and restrict the memory footprint and the cost (in terms of oracle calls) of computations. Under this view, an adversary is a higher-order expression that expects as arguments the code of its oracles. We develop unary program logics for reasoning about error probabilities and expected values, and a relational logic for reasoning about coupling-based properties. All logics feature rules for adversarial computations, and yield guarantees that are valid for all adversaries that satisfy a fixed resource policy. We prove the soundness of the logics in the category of quasi-Borel spaces, using a general notion of graded predicate liftings, and we use logical relations over graded predicate liftings to establish the soundness of proof rules for adversaries. We illustrate the use of our logics with simple examples.
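The setup these logics reason about can be mocked up in ordinary code. The Python sketch below uses names and an interface invented for this example, not the paper's calculus: an oracle is a probabilistic procedure with private state, an adversary is a higher-order function that receives the oracle as an argument, and a call budget stands in for the resource policy that the grading tracks.

```python
# Toy rendering (assumed names, not the paper's graded monad) of an adversary
# interacting with a stateful probabilistic oracle under a call budget.
import random

class ResourceExhausted(Exception):
    pass

def make_oracle(max_calls):
    secret = random.getrandbits(32)      # private state of the oracle
    calls = 0
    def oracle(x):
        nonlocal calls
        if calls >= max_calls:           # the resource policy: a bound on oracle calls
            raise ResourceExhausted("oracle call budget exceeded")
        calls += 1
        return (x ^ secret) & 1          # a partial view of the secret
    return oracle

def adversary(oracle):
    # An arbitrary adversary: it receives the oracle as an argument and may
    # use any strategy, as long as it stays within the call budget.
    answers = [oracle(i) for i in range(8)]
    return sum(answers) >= 4             # guess a property of the secret

oracle = make_oracle(max_calls=10)
print(adversary(oracle))                 # 8 calls, within the budget of 10
```

Any adversary that stays within the budget is admissible, which is the sense in which the logics' guarantees hold for all adversaries satisfying a fixed resource policy.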