NSF PAR Search | NSF Public Access Repository

Statistical Data Privacy: A Song of Privacy and Utility

https://doi.org/10.1146/annurev-statistics-033121-112921

Slavković, Aleksandra; Seeman, Jeremy (March 2023, Annual Review of Statistics and Its Application)

To quantify trade-offs between increasing demand for open data sharing and concerns about sensitive information disclosure, statistical data privacy (SDP) methodology analyzes data release mechanisms that sanitize outputs based on confidential data. Two dominant frameworks exist: statistical disclosure control (SDC) and the more recent differential privacy (DP). Despite framing differences, both SDC and DP share the same statistical problems at their core. For inference problems, either we may design optimal release mechanisms and associated estimators that satisfy bounds on disclosure risk measures, or we may adjust existing sanitized output to create new statistically valid and optimal estimators. Regardless of design or adjustment, in evaluating risk and utility, valid statistical inferences from mechanism outputs require uncertainty quantification that accounts for the effect of the sanitization mechanism that introduces bias and/or variance. In this review, we discuss the statistical foundations common to both SDC and DP, highlight major developments in SDP, and present exciting open research problems in private inference.
Full Text Available
Shape and structure preserving differential privacy.

C. Soto, K. Bharath (October 2022, Advances in Neural Information Processing Systems)

It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism to produce sanitized differentially private estimates from such data is intimately linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, utility of the Laplace mechanism on a positively curved manifold, such as Kendall’s 2D shape space, is significantly influenced by the curvature. Focusing on the problem of sanitizing the Fréchet mean of a sample of points on a manifold, we exploit the characterization of the mean as the minimizer of an objective function comprised of the sum of squared distances and develop a K-norm gradient mechanism on Riemannian manifolds that favors values that produce gradients close to the zero of the objective function. For the case of positively curved manifolds, we describe how using the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism, and demonstrate this numerically on a dataset of shapes of corpus callosa. Further illustrations of the mechanism’s utility on a sphere and the manifold of symmetric positive definite matrices are also presented.
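As a reading aid, the two objects named in this abstract can be written out schematically. The notation is an assumption introduced here, not taken from the paper: $\mathcal{M}$ is the manifold, $\rho$ its geodesic distance, $x_1, \dots, x_n$ the sample, and $\sigma > 0$ a scale parameter that would be calibrated to the privacy budget and the sensitivity of the gradient. The exact calibration and normalization used by the authors are not reproduced.

```latex
% Frechet mean: minimizer of the sum of squared geodesic distances
F(m) = \sum_{i=1}^{n} \rho(m, x_i)^2,
\qquad
\hat{\mu} = \operatorname*{arg\,min}_{m \in \mathcal{M}} F(m)

% Schematic K-norm gradient density: mass concentrates where the
% gradient of the objective is close to zero (sigma is an assumed scale)
f(m) \;\propto\; \exp\!\left( -\frac{1}{\sigma} \, \bigl\| \nabla F(m) \bigr\|_{m} \right)
```

Here $\|\cdot\|_{m}$ stands for a norm on the tangent space at $m$. Because the density depends on the data only through the gradient, the sensitivity to control is that of $\nabla F$ rather than that of the mean itself.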
Full Text Available
A Latent Class Modeling Approach for Differentially Private Synthetic Data for Contingency Tables

https://doi.org/10.29012/jpc.768

Nixon, Michelle; Barrientos, Andres; Reiter, Jerome; Slavkovic, Aleksandra (July 2022, Journal of Privacy and Confidentiality)

We present an approach to construct differentially private synthetic data for contingency tables. The algorithm achieves privacy by adding noise to selected summary counts, e.g., two-way margins of the contingency table, via the Geometric mechanism. We posit an underlying latent class model for the counts, estimate the parameters of the model based on the noisy counts, and generate synthetic data using the estimated model. This approach allows the agency to create multiple imputations of synthetic data with no additional privacy loss, thereby facilitating estimation of uncertainty in downstream analyses. We illustrate the approach using a subset of the 2016 American Community Survey Public Use Microdata Sets.
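The noise-addition step described in this abstract can be illustrated with a minimal sketch of the two-sided Geometric mechanism applied to a vector of counts. This is not the authors' implementation: the function name `geometric_mechanism`, the default `sensitivity=1`, and the example counts are assumptions made for illustration, and the latent class modeling and synthesis steps are not shown.

```python
import numpy as np


def geometric_mechanism(counts, epsilon, sensitivity=1, rng=None):
    """Add two-sided geometric noise to integer counts (illustrative sketch).

    `sensitivity` is an assumed L1 sensitivity of the count query; the right
    value depends on which margins are released and how neighbors are defined.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    # Success probability p = 1 - alpha, where alpha = exp(-epsilon / sensitivity).
    p = 1.0 - np.exp(-epsilon / sensitivity)
    # The difference of two i.i.d. geometric draws is two-sided geometric:
    # P(noise = k) is proportional to alpha ** |k|.
    g1 = rng.geometric(p, size=counts.shape)
    g2 = rng.geometric(p, size=counts.shape)
    return counts + (g1 - g2)


# Hypothetical margin counts, for illustration only.
noisy_counts = geometric_mechanism([120, 45, 8, 0], epsilon=1.0)
print(noisy_counts)
```

The noisy counts can be negative; in the approach described above they feed a model fit rather than being published directly, so no clamping is applied in this sketch.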
Full Text Available
Representation of Chromosome Conformations Using a Shape Alphabet Across Modeling Methods

https://doi.org/10.1109/BIBM52615.2021.9669716

Soto, Carlos; Dalgarno, Audrey; Bryner, Darshan; McLaughlin, Benjamin; Neretti, Nicola; Srivastava, Anuj (December 2021, 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Despite enormous structural variability exhibited in 3D chromosomal conformations at a global scale, there is a significant commonality of structures visible at smaller, local levels. We hypothesize that chromosomal conformations are representable as concatenations of a handful of prototypical shapelets, termed shape letters. This is akin to expressing complicated sentences in a language using only a small set of letters. Our goal is to organize the vast variability of 3D chromosomal conformation by constructing a set of predominant shape letters, termed a shape alphabet, using statistical shape analysis of curvelets taken from training conformations. This paper utilizes conformations generated from Integrative Genome Modeling to develop a shape alphabet as follows: it first segments 3D conformations into curvelets according to their Topologically Associated Domains. It then clusters these segments, estimates mean shapes, and refines and reorders these shapes into a Chromosome Shape Alphabet. The paper demonstrates effectiveness of this construction by successfully representing independent test conformations taken from IGM and other methods such as SIMBA3D, both symbolically and structurally, using the constructed alphabet.
Full Text Available
Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study

https://doi.org/10.2196/21459

Her, Qoua; Kent, Thomas; Samizo, Yuji; Slavkovic, Aleksandra; Vilk, Yury; Toh, Sengwee (January 2021, JMIR Medical Informatics)
Background: In clinical research, important variables may be collected from multiple data sources. Physical pooling of patient-level data from multiple sources often raises several challenges, including proper protection of patient privacy and proprietary interests. We previously developed an SAS-based package to perform distributed regression (a suite of privacy-protecting methods that perform multivariable-adjusted regression analysis using only summary-level information) with horizontally partitioned data, a setting where distinct cohorts of patients are available from different data sources. We integrated the package with PopMedNet, an open-source file transfer software, to facilitate secure file transfer between the analysis center and the data-contributing sites. The feasibility of using PopMedNet to facilitate distributed regression analysis (DRA) with vertically partitioned data, a setting where the data attributes from a cohort of patients are available from different data sources, was unknown.
Objective: The objective of the study was to describe the feasibility of using PopMedNet and enhancements to PopMedNet to facilitate automatable vertical DRA (vDRA) in real-world settings.
Methods: We gathered the statistical and informatic requirements of using PopMedNet to facilitate automatable vDRA. We enhanced PopMedNet based on these requirements to improve its technical capability to support vDRA.
Results: PopMedNet can enable automatable vDRA. We identified and implemented two enhancements to PopMedNet that improved its technical capability to perform automatable vDRA in real-world settings. The first was the ability to simultaneously upload and download multiple files, and the second was the ability to directly transfer summary-level information between the data-contributing sites without a third-party analysis center.
Conclusions: PopMedNet can be used to facilitate automatable vDRA to protect patient privacy and support clinical research in real-world settings.
Full Text Available
Exact Privacy Guarantees for Markov Chain Implementations of the Exponential Mechanism with Artificial Atoms

J. Seeman, M. Reimherr (January 2021, Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021)
Marc'Aurelio Ranzato, Alina Beygelzimer (Ed.)
Implementations of the exponential mechanism in differential privacy often require sampling from intractable distributions. When approximate procedures like Markov chain Monte Carlo (MCMC) are used, the end result incurs costs to both privacy and accuracy. Existing work has examined these effects asymptotically, but implementable finite sample results are needed in practice so that users can specify privacy budgets in advance and implement samplers with exact privacy guarantees. In this paper, we use tools from ergodic theory and perfect simulation to design exact finite runtime sampling algorithms for the exponential mechanism by introducing an intermediate modified target distribution using artificial atoms. We propose an additional modification of this sampling algorithm that maintains its ε-DP guarantee and has improved runtime at the cost of some utility. We then compare these methods in scenarios where we can explicitly calculate a δ cost (as in (ε, δ)-DP) incurred when using standard MCMC techniques. Much as there is a well known trade-off between privacy and utility, we demonstrate that there is also a trade-off between privacy guarantees and runtime.
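For orientation, the exponential mechanism itself is easy to sample exactly when the output set is small and finite. The sketch below shows only that textbook case. It is not the paper's method, which targets intractable distributions where such direct sampling is unavailable; the function name, the candidate set, and the utility function are assumptions made for illustration.

```python
import numpy as np


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity)). Finite-set sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([utility(c) for c in candidates], dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    # Subtract the maximum before exponentiating for numerical stability.
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx], probs


# Hypothetical example: privately pick a value close to 1 from {0, 1, 2}.
choice, probs = exponential_mechanism(
    [0, 1, 2], lambda c: -abs(c - 1), epsilon=2.0, sensitivity=1.0
)
print(choice, probs)
```

When the candidate set is continuous or too large to enumerate, the normalizing sum is unavailable and approximate samplers such as MCMC are typically used instead, which is the setting the abstract addresses.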
Full Text Available
Formal Privacy for Modern Nonparametric Statistics

https://doi.org/10.1080/09332480.2020.1847959

Awan, Jordan; Reimherr, Matthew; Slavković, Aleksandra (October 2020, CHANCE)
Full Text Available
Structure and Sensitivity in Differential Privacy: Comparing K-Norm Mechanisms

https://doi.org/10.1080/01621459.2020.1773831

Awan, Jordan and (July 2020, Journal of the American Statistical Association)

Full Text Available
KNG: The K-Norm Gradient Mechanism

Reimherr, Matthew and (January 2019, Advances in neural information processing systems)

Full Text Available
Elliptical Perturbations for Differential Privacy

Reimherr, Matthew and (January 2019, Advances in neural information processing systems)

Full Text Available
