Search for: All records

Creators/Authors contains: "Gong, Ruobin"

  1. We propose new differential privacy solutions for when external invariants and integer constraints are simultaneously enforced on the data product. These requirements arise in real-world applications of private data curation, including the public release of the 2020 U.S. Decennial Census. They pose a great challenge to the production of provably private data products with adequate statistical usability. We propose integer subspace differential privacy to rigorously articulate the privacy guarantee when data products maintain both the invariants and integer characteristics, and demonstrate the composition and post-processing properties of our proposal. To address the challenge of sampling from a potentially highly restricted discrete space, we devise a pair of unbiased additive mechanisms, the generalized Laplace and the generalized Gaussian mechanisms, by solving the Diophantine equations as defined by the constraints. The proposed mechanisms have good accuracy, with errors exhibiting sub-exponential and sub-Gaussian tail probabilities, respectively. To implement our proposal, we design an MCMC algorithm and supply empirical convergence assessment using estimated upper bounds on the total variation distance via L-lag coupling. We demonstrate the efficacy of our proposal with applications to a synthetic problem with intersecting invariants, a sensitive contingency table with known margins, and the 2010 Census county-level demonstration data with mandated fixed state population totals.
  2. Steed et al. (1) illustrate the crucial impact that the quality of official statistical data products may exert on the accuracy, stability, and equity of policy decisions on which they are based. The authors remind us that data, however responsibly curated, can be fallible. With this comment, we underscore the importance of conducting principled quality assessment of official statistical data products. We observe that the quality assessment procedure employed by Steed et al. needs improvement, due to (i) the inadmissibility of the estimator used, and (ii) the inconsistent probability model it induces on the joint space of the estimator and the observed data. We discuss the design of alternative statistical methods to conduct principled quality assessments for official statistical data products, showcasing two simulation-based methods for admissible minimax shrinkage estimation via multilevel empirical Bayesian modeling. For policymakers and stakeholders to accurately gauge the context-specific usability of data, the assessment should take into account both uncertainty sources inherent to the data and the downstream use cases, such as policy decisions based on those data products.
  3. This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a “sense” because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 10 is March 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates. 
  4. This paper discusses how two classes of approximate computation algorithms can be adapted, in a modular fashion, to achieve exact statistical inference from differentially private data products. Considered are approximate Bayesian computation for Bayesian inference, and Monte Carlo Expectation-Maximization for likelihood inference. Up to Monte Carlo error, inference from these algorithms is exact with respect to the joint specification of both the analyst's original data model, and the curator's differential privacy mechanism. Highlighted is a duality between approximate computation on exact data, and exact computation on approximate data, which can be leveraged by a well-designed computational procedure for statistical inference. 
  5. Koyejo, Sanmi ; Mohamed, Shakir (Ed.)
    Differentially private mechanisms protect privacy by introducing additional randomness into the data. Restricting access to only the privatized data makes it challenging to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typically intractable. For Bayesian analysis, this results in a posterior distribution that is doubly intractable, rendering traditional MCMC techniques inapplicable. We propose an MCMC framework to perform Bayesian inference from the privatized data, which is applicable to a wide range of statistical models and privacy mechanisms. Our MCMC algorithm augments the model parameters with the unobserved confidential data, and alternately updates each one conditional on the other. For the potentially challenging step of updating the confidential data, we propose a generic approach that exploits the privacy guarantee of the mechanism to ensure efficiency. In particular, we give results on the computational complexity, acceptance rate, and mixing properties of our MCMC. We illustrate the efficacy and applicability of our methods on a naïve-Bayes log-linear model as well as on a linear regression model.
  6. Many data applications have certain invariant constraints due to practical needs. Data curators who employ differential privacy need to respect such constraints on the sanitized data product as a primary utility requirement. Invariants challenge the formulation, implementation, and interpretation of privacy guarantees. We propose subspace differential privacy to honestly characterize the dependence of the sanitized output on confidential aspects of the data. We discuss two design frameworks that convert well-known differentially private mechanisms, such as the Gaussian and the Laplace mechanisms, to subspace differentially private ones that respect the invariants specified by the curator. For linear queries, we discuss the design of near-optimal mechanisms that minimize the mean squared error. Subspace differentially private mechanisms remove the need for post-processing due to invariants, preserve transparency and statistical intelligibility of the output, and can be suitable for distributed implementation. We showcase the proposed mechanisms on the 2020 Census Disclosure Avoidance demonstration data, and a spatio-temporal dataset of mobile access point connections on a large university campus.
  7. Koyejo, S. ; Mohamed, S. ; Agarwal, A. ; Belgrave, D. ; Cho, K. ; Oh, A. (Ed.)
    Differentially private mechanisms protect privacy by introducing additional randomness into the data. Restricting access to only the privatized data makes it challenging to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typically intractable. For Bayesian analysis, this results in a posterior distribution that is doubly intractable, rendering traditional MCMC techniques inapplicable. We propose an MCMC framework to perform Bayesian inference from the privatized data, which is applicable to a wide range of statistical models and privacy mechanisms. Our MCMC algorithm augments the model parameters with the unobserved confidential data, and alternately updates each one conditional on the other. For the potentially challenging step of updating the confidential data, we propose a generic approach that exploits the privacy guarantee of the mechanism to ensure efficiency. We give results on the computational complexity, acceptance rate, and mixing properties of our MCMC. We illustrate the efficacy and applicability of our methods on a naive-Bayes log-linear model and on a linear regression model. 
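
The alternating update described in the abstract above (model parameters given the confidential data, then the confidential data given the parameters and the privatized output) can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and does not come from the paper: a Bernoulli model with a uniform prior on `theta`, a privatized sum released through the Laplace mechanism with scale `1/epsilon`, and the hypothetical function names `laplace_logpdf` and `gibbs_private_bernoulli`. Each confidential record is proposed from the data model and accepted with a Metropolis-Hastings ratio that reduces to a ratio of privacy-mechanism densities.

```python
import math
import random


def laplace_logpdf(y, loc, scale):
    """Log density of a Laplace(loc, scale) distribution evaluated at y."""
    return -abs(y - loc) / scale - math.log(2.0 * scale)


def gibbs_private_bernoulli(s_dp, n, epsilon, n_iter=2000, seed=0):
    """Sample theta given only a Laplace-noised sum s_dp of n Bernoulli records.

    Assumed setup (illustrative only): x_i ~ Bernoulli(theta),
    theta ~ Uniform(0, 1), s_dp = sum(x) + Laplace(1 / epsilon).
    """
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # the sum query changes by at most 1 per record
    theta = 0.5
    # Initialise the latent confidential records from the model.
    x = [1 if rng.random() < theta else 0 for _ in range(n)]
    total = sum(x)
    draws = []
    for _ in range(n_iter):
        # Step 1: theta | x is a conjugate Beta update under the uniform prior.
        theta = rng.betavariate(1 + total, 1 + n - total)
        # Step 2: update each record x_i | theta, x_{-i}, s_dp.
        for i in range(n):
            proposal = 1 if rng.random() < theta else 0  # propose from the model
            new_total = total - x[i] + proposal
            # The model terms cancel, leaving only the mechanism density ratio.
            log_ratio = (laplace_logpdf(s_dp, new_total, scale)
                         - laplace_logpdf(s_dp, total, scale))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x[i] = proposal
                total = new_total
        draws.append(theta)
    return draws


if __name__ == "__main__":
    # Hypothetical privatized sum of 100 binary records with epsilon = 1.
    samples = gibbs_private_bernoulli(s_dp=62.3, n=100, epsilon=1.0)
    kept = samples[500:]  # discard an arbitrary burn-in period
    print(sum(kept) / len(kept))
```

Because changing one record moves the sum by at most 1, the density ratio in this sketch is never smaller than `exp(-epsilon)`, which gives a concrete instance of how a privacy guarantee can bound the acceptance probability from below.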