
Title: Duet: an expressive higher-order language and linear type system for statically enforcing differential privacy
During the past decade, differential privacy has become the gold standard for protecting the privacy of individuals. However, verifying that a particular program provides differential privacy often remains a manual task for an expert in the field. Language-based techniques have been proposed for fully automating proofs of differential privacy via type system design; however, these results have lagged behind advances in differentially private algorithms, leaving a noticeable gap between the programs that can be automatically verified and those that achieve state-of-the-art privacy bounds. We propose Duet, an expressive higher-order language, linear type system, and tool for automatically verifying differential privacy of general-purpose higher-order programs. In addition to general-purpose programming, Duet supports encoding machine learning algorithms such as stochastic gradient descent, as well as common auxiliary data analysis tasks such as clipping, normalization, and hyperparameter tuning, each of which is particularly challenging to encode in a statically verified differential privacy framework. We present a core design of the Duet language and linear type system, and complete key proofs about privacy for well-typed programs. We then show how to extend Duet to support realistic machine learning applications and recent variants of differential privacy that improve accuracy for many practical differentially private algorithms. Finally, we implement several differentially private machine learning algorithms in Duet that have never before been automatically verified by a language-based tool, and we present experimental results demonstrating the benefits of Duet's language design in terms of the accuracy of trained machine learning models.
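The stochastic gradient descent example mentioned in the abstract is concrete enough to sketch. The plain-Python version below shows the noisy-gradient-descent-with-clipping pattern that a Duet program would encode and a linear type system would check; it is not Duet syntax, and the logistic loss, naive budget split across iterations, and Gaussian-mechanism calibration are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dp_gradient_descent(X, y, epsilon, delta, iterations=100, clip=1.0, lr=0.1):
    """Noisy gradient descent for logistic regression (labels in {-1, +1})
    with per-example gradient clipping. The per-iteration budget uses naive
    sequential composition; a real analysis (and the advanced-variants
    extension the abstract mentions) would give tighter bounds."""
    n, d = X.shape
    theta = np.zeros(d)
    eps_i, delta_i = epsilon / iterations, delta / iterations
    # Replacing one record moves the clipped-gradient mean by at most
    # 2*clip/n in L2 norm; calibrate Gaussian noise to that sensitivity.
    sigma = (2 * clip / n) * np.sqrt(2 * np.log(1.25 / delta_i)) / eps_i
    for _ in range(iterations):
        z = y * (X @ theta)
        grads = (-y / (1 + np.exp(z)))[:, None] * X    # per-example gradients
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)  # clip L2 norm to <= clip
        theta -= lr * (grads.mean(axis=0) + np.random.normal(0, sigma, d))
    return theta
```

The point of a system like Duet is that the clipping bound, sensitivity arithmetic, and composition across iterations in a program like this are checked by the type system rather than by hand.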
Authors:
Joseph P. Near; David Darais; Chike Abuah; Tim Stevens; Pranav Gaddamadugu; Lun Wang; Neel Somani; Mu Zhang; Nikhil Sharma; Alex Shan; Dawn Song
Award ID(s):
1901278
Publication Date:
2019
NSF-PAR ID:
10219925
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
3
Issue:
OOPSLA
Page Range or eLocation-ID:
1 to 30
ISSN:
2475-1421
Sponsoring Org:
National Science Foundation
More Like This
1. Differential privacy offers a formal privacy guarantee for individuals, but many deployments of differentially private systems require a trusted third party (the data curator). We propose DuetSGX, a system that uses secure hardware (Intel's SGX) to eliminate the need for a trusted data curator. Data owners submit encrypted data that can be decrypted only within a secure enclave running the DuetSGX system, ensuring that sensitive data is never available to the data curator. Analysts submit queries written in the Duet language, which is specifically designed for verifying that programs satisfy differential privacy; DuetSGX uses the Duet typechecker to verify that each query satisfies differential privacy before running it. DuetSGX therefore provides the benefits of local and central differential privacy simultaneously: noise is added only to final results, and there is no trusted third party. We have implemented a proof-of-concept version of DuetSGX and released it as open source.
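To make the data flow concrete, here is a runnable mock of the workflow just described. Every name in it (MockEnclave, answer, the budget field) is a hypothetical stand-in, not the real DuetSGX API; in particular, the real system runs the Duet typechecker to verify each query's privacy cost, whereas this mock merely checks a claimed cost against a budget.

```python
import numpy as np

class MockEnclave:
    """Hypothetical stand-in for an SGX enclave, illustrating only the
    data flow from the DuetSGX abstract: data is usable solely "inside"
    the enclave, and only noised results ever leave it."""
    def __init__(self, records, budget):
        self._records = records   # decrypted only inside the enclave
        self._budget = budget     # total privacy budget (epsilon)

    def answer(self, query, epsilon):
        # The real system typechecks the Duet query to *verify* its cost;
        # here we just enforce the claimed cost against the budget.
        if epsilon > self._budget:
            raise PermissionError("query exceeds remaining privacy budget")
        self._budget -= epsilon
        true_answer = query(self._records)
        # Noise is added to the final result inside the enclave, so the
        # untrusted host only ever sees a differentially private output.
        return true_answer + np.random.laplace(scale=1.0 / epsilon)

# Usage: a sensitivity-1 counting query answered with epsilon = 0.1.
enclave = MockEnclave(records=[23, 45, 31, 50], budget=1.0)
print(enclave.answer(lambda rs: sum(r > 30 for r in rs), epsilon=0.1))
```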
2. Recent work on formal verification of differential privacy shows a trend toward usability and expressiveness: generating correctness proofs of sophisticated algorithms while minimizing the annotation burden on programmers. Combining the two sometimes requires substantial changes to program logics: one recent paper verifies Report Noisy Max automatically, but it relies on a complex verification system with customized program logics and verifiers. In this paper, we propose a new proof technique, called shadow execution, and embed it into a language called ShadowDP. ShadowDP uses shadow execution to generate proofs of differential privacy with very few programmer annotations and without relying on customized logics and verifiers. In addition to verifying Report Noisy Max, we show that it can verify a new variant of Sparse Vector that reports the gap between some noisy query answers and the noisy threshold. Moreover, ShadowDP reduces the complexity of verification: for all of the algorithms we have evaluated, type checking and verification together take at most 3 seconds, while prior work takes minutes on the same algorithms.
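Report Noisy Max, one of the algorithms verified here, is short enough to state directly. A standard plain-Python version follows; the Laplace scale of 2/ε is the textbook calibration for sensitivity-1 queries (Dwork and Roth, The Algorithmic Foundations of Differential Privacy), not a detail taken from the ShadowDP paper.

```python
import numpy as np

def report_noisy_max(counts, epsilon):
    """Return the index of the noisily largest count. Adding independent
    Laplace(2/epsilon) noise to each sensitivity-1 count and releasing
    only the argmax index satisfies epsilon-differential privacy."""
    noisy = [c + np.random.laplace(scale=2.0 / epsilon) for c in counts]
    return int(np.argmax(noisy))

# Usage: pick the most common option under epsilon = 0.5.
print(report_noisy_max([120, 85, 130, 97], epsilon=0.5))
```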
3. Personal information and other types of private data are valuable for both data owners and institutions interested in providing targeted and customized services that require analyzing such data. In this context, privacy is sometimes seen as a commodity: institutions (data buyers) pay individuals (data sellers) in exchange for private data. In this study, we examine the problem of designing such data contracts, through which a buyer aims to minimize its payment to the sellers for a desired level of data quality, while the sellers aim to obtain adequate compensation for giving up a certain amount of privacy. Specifically, we use the concept of differential privacy and examine a model of linear and nonlinear queries on private data. We show that conventional algorithms that introduce differential privacy via zero-mean noise fall short for such transactions, as they do not give the contract designer sufficient freedom to negotiate between the competing interests of the buyer and the sellers. Instead, we propose a biased differentially private algorithm that allows us to customize the privacy-accuracy tradeoff for each individual. We use a contract design approach to find the optimal contracts when this biased algorithm provides privacy, and show that under this combination the buyer can achieve the same level of accuracy with a lower payment than with the unbiased algorithms, while incurring a lower privacy loss for the sellers.
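The paper's exact mechanism is not reproduced here, but the core observation is easy to illustrate: shifting the Laplace mechanism by a deterministic bias leaves the ε-DP guarantee intact, since the output densities for neighboring datasets are translated together, while adding a bias² term to the mean squared error. That extra knob is what the contract designer can trade against payment. A minimal sketch, with the bias left as a free parameter:

```python
import numpy as np

def biased_laplace(true_value, sensitivity, epsilon, bias):
    """Laplace mechanism with a deterministic output bias. The privacy
    proof is unchanged by the shift (both neighboring output densities
    translate by the same amount), so the mechanism is still epsilon-DP.
    Accuracy changes, though: since Laplace(scale) has variance
    2*scale**2, the usual bias-variance split gives
        MSE = bias**2 + 2 * (sensitivity / epsilon)**2.
    How to choose `bias` per individual is the contract-design question
    the paper studies; here it is just a parameter."""
    scale = sensitivity / epsilon
    return true_value + bias + np.random.laplace(scale=scale)

# Usage: a sensitivity-1 query at epsilon = 0.5 with a small bias.
print(biased_laplace(true_value=42.0, sensitivity=1.0, epsilon=0.5, bias=0.3))
```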
4. Economics and social science research often requires analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study regression algorithms that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with side information about the dataset. Motivated by the Opportunity Atlas, a high-profile, small-area analysis tool in economics research, we perform a thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records, a particularly challenging regime for differential privacy. In contrast, prior work on differentially private linear regression focused on multivariate linear regression on large datasets or asymptotic analysis. Through a range of experiments, we identify key factors that affect the relative performance of the algorithms. We find that algorithms based on robust estimators, in particular the median-based estimator of Theil and Sen, perform best on small datasets (e.g., hundreds of datapoints), while algorithms based on Ordinary Least Squares or Gradient Descent perform better on large datasets. However, we also discuss regimes in which this general finding does not hold. Notably, the differentially private analogues of Theil–Sen (one of which was suggested in a theoretical work of Dwork and Lei) have not been studied in any prior experimental work on differentially private linear regression.
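A DP analogue of Theil–Sen replaces the exact median of pairwise slopes with a differentially private median. The sketch below uses the exponential mechanism over a fixed slope grid, and pairs points in a random matching so that each record touches exactly one slope; the range bounds, grid size, and matching are illustrative choices, not the exact algorithms evaluated in the paper.

```python
import numpy as np

def dp_median(values, lo, hi, epsilon, grid=200):
    """Exponential-mechanism median over a fixed candidate grid in [lo, hi].
    Utility of candidate m is -|#{v < m} - n/2|, which changes by at most 1
    when one input value changes, so sampling a candidate with weight
    exp(epsilon * utility / 2) satisfies epsilon-DP."""
    candidates = np.linspace(lo, hi, grid)
    v = np.asarray(values, dtype=float)
    utility = np.array([-abs((v < m).sum() - len(v) / 2) for m in candidates])
    weights = np.exp(epsilon * utility / 2.0)
    return float(np.random.choice(candidates, p=weights / weights.sum()))

def dp_theil_sen_slope(x, y, epsilon, slope_range=(-10.0, 10.0)):
    """Theil-Sen-style slope with a DP median. Points are paired off in a
    random matching (assuming distinct x values) so each record contributes
    to exactly one slope; changing one record then changes one input to
    dp_median, and the estimator inherits its epsilon-DP guarantee."""
    idx = np.random.permutation(len(x))
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in zip(idx[0::2], idx[1::2])]
    return dp_median(slopes, *slope_range, epsilon)

# Usage: recover the slope of a noisy line y = 2x + 1 from 50 points.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
y = 2 * x + 1 + rng.normal(0.0, 0.1, 50)
print(dp_theil_sen_slope(x, y, epsilon=1.0))
```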
5. Developers of machine learning applications often apply post-training neural network optimizations, such as quantization and pruning, that approximate a neural network to speed up inference and reduce energy consumption while maintaining high accuracy and robustness. Despite a recent surge in techniques for the robustness verification of neural networks, a major limitation of almost all state-of-the-art approaches is that verification must be rerun from scratch every time the network is even slightly modified. Running precise end-to-end verification from scratch for every new network is expensive and impractical in many scenarios that use or compare multiple approximate network versions and need the robustness of all of them verified efficiently. We present FANC, the first general technique for transferring proofs between a given network and its multiple approximate versions without compromising verifier precision. To reuse the proofs obtained when verifying the original network, FANC generates a set of templates, connected symbolic shapes at intermediate layers of the original network, that capture the proof of the property to be verified. We present novel algorithms for generating and transforming templates that generalize to a broad range of approximate networks and reduce verification cost. We present a comprehensive evaluation demonstrating the effectiveness of our approach. We consider a diverse set of networks obtained by applying popular approximation techniques such as quantization and pruning to fully-connected and convolutional architectures, and verify their robustness against different adversarial attacks such as adversarial patches, L0 perturbations, rotation, and brightening. Our results indicate that FANC can speed up verification with the state-of-the-art verifier DeepZ by up to 4.1x.
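FANC's proof templates are too involved for a short sketch, but the underlying task, certifying robustness by propagating bounds through a network's layers, can be illustrated with the simplest such method, interval bound propagation. DeepZ itself uses the tighter zonotope abstract domain; the tiny random network below is purely illustrative.

```python
import numpy as np

def interval_bound_propagation(weights, biases, lower, upper):
    """Propagate an input box [lower, upper] through a fully-connected
    ReLU network and return output bounds. For an affine map W x + b with
    x in a box, positive weights pull from the same bound and negative
    weights from the opposite one (standard interval arithmetic)."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        lower, upper = Wp @ lower + Wn @ upper + b, Wp @ upper + Wn @ lower + b
        if i < len(weights) - 1:  # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0), np.maximum(upper, 0)
    return lower, upper

def certified_robust(weights, biases, x, eps, label):
    """Certify that every input within L-infinity distance eps of x keeps
    classification `label`: the label's logit lower bound must exceed every
    other logit's upper bound. This is the kind of property verifiers like
    DeepZ check, with a more precise domain than intervals."""
    lo, up = interval_bound_propagation(weights, biases, x - eps, x + eps)
    return all(lo[label] > up[j] for j in range(len(lo)) if j != label)

# Usage with a tiny random 2-layer network (illustrative only).
rng = np.random.default_rng(1)
Ws = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
bs = [np.zeros(8), np.zeros(3)]
x = rng.normal(size=4)
logits = Ws[1] @ np.maximum(Ws[0] @ x + bs[0], 0) + bs[1]
print(certified_robust(Ws, bs, x, eps=0.01, label=int(np.argmax(logits))))
```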