skip to main content


Title: Some examples of privacy-preserving sharing of COVID-19 pandemic data with statistical utility evaluation
Abstract Background

A considerable amount of various types of data have been collected during the COVID-19 pandemic, the analysis and understanding of which have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, the data collected during the pandemic will continue to be rich sources for further studying and understanding the impacts of the pandemic on various aspects of our society. On the other hand, naïve release and sharing of the information can be associated with serious privacy concerns.

Methods

We use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate the publication and sharing of granular information and individual-level pandemic data in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches in real-life data. All the approaches employed in the study are straightforward to apply.

Results

The empirical studies in all three data cases suggest that privacy-preserving results based on the differentially privately sanitized data can be similar to the original results at a reasonably small privacy loss ($$\epsilon \approx 1$$ϵ1). Statistical inferences based on sanitized data using the multiple synthesis technique also appear valid, with nominal coverage of 95% confidence intervals when there is no noticeable bias in point estimation. When$$\epsilon <1$$ϵ<1 and the sample size is not large enough, some privacy-preserving results are subject to bias, partially due to the bounding applied to sanitized data as a post-processing step to satisfy practical data constraints.

Conclusions

Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance the statistical utility of released information during this process.

 
more » « less
NSF-PAR ID:
10414561
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
BMC Medical Research Methodology
Volume:
23
Issue:
1
ISSN:
1471-2288
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We study the distribution over measurement outcomes of noisy random quantum circuits in the regime of low fidelity, which corresponds to the setting where the computation experiences at least one gate-level error with probability close to one. We model noise by adding a pair of weak, unital, single-qubit noise channels after each two-qubit gate, and we show that for typical random circuit instances, correlations between the noisy output distribution$$p_{\text {noisy}}$$pnoisyand the corresponding noiseless output distribution$$p_{\text {ideal}}$$pidealshrink exponentially with the expected number of gate-level errors. Specifically, the linear cross-entropy benchmarkFthat measures this correlation behaves as$$F=\text {exp}(-2s\epsilon \pm O(s\epsilon ^2))$$F=exp(-2sϵ±O(sϵ2)), where$$\epsilon $$ϵis the probability of error per circuit location andsis the number of two-qubit gates. Furthermore, if the noise is incoherent—for example, depolarizing or dephasing noise—the total variation distance between the noisy output distribution$$p_{\text {noisy}}$$pnoisyand the uniform distribution$$p_{\text {unif}}$$punifdecays at precisely the same rate. Consequently, the noisy output distribution can be approximated as$$p_{\text {noisy}}\approx Fp_{\text {ideal}}+ (1-F)p_{\text {unif}}$$pnoisyFpideal+(1-F)punif. In other words, although at least one local error occurs with probability$$1-F$$1-F, the errors are scrambled by the random quantum circuit and can be treated as global white noise, contributing completely uniform output. Importantly, we upper bound the average total variation error in this approximation by$$O(F\epsilon \sqrt{s})$$O(Fϵs). Thus, the “white-noise approximation” is meaningful when$$\epsilon \sqrt{s} \ll 1$$ϵs1, a quadratically weaker condition than the$$\epsilon s\ll 1$$ϵs1requirement to maintain high fidelity. The bound applies if the circuit size satisfies$$s \ge \Omega (n\log (n))$$sΩ(nlog(n)), which corresponds to onlylogarithmic depthcircuits, and if, additionally, the inverse error rate satisfies$$\epsilon ^{-1} \ge {\tilde{\Omega }}(n)$$ϵ-1Ω~(n), which is needed to ensure errors are scrambled faster thanFdecays. The white-noise approximation is useful for salvaging the signal from a noisy quantum computation; for example, it was an underlying assumption in complexity-theoretic arguments that noisy random quantum circuits cannot be efficiently sampled classically, even when the fidelity is low. Our method is based on a map from second-moment quantities in random quantum circuits to expectation values of certain stochastic processes for which we compute upper and lower bounds.

     
    more » « less
  2. Abstract

    We address a system of equations modeling a compressible fluid interacting with an elastic body in dimension three. We prove the local existence and uniqueness of a strong solution when the initial velocity belongs to the space$$H^{2+\epsilon }$$H2+ϵand the initial structure velocity is in$$H^{1.5+\epsilon }$$H1.5+ϵ, where$$\epsilon \in (0,1/2)$$ϵ(0,1/2).

     
    more » « less
  3. Abstract

    We consider the problem of covering multiple submodular constraints. Given a finite ground setN, a weight function$$w: N \rightarrow \mathbb {R}_+$$w:NR+,rmonotone submodular functions$$f_1,f_2,\ldots ,f_r$$f1,f2,,froverNand requirements$$k_1,k_2,\ldots ,k_r$$k1,k2,,krthe goal is to find a minimum weight subset$$S \subseteq N$$SNsuch that$$f_i(S) \ge k_i$$fi(S)kifor$$1 \le i \le r$$1ir. We refer to this problem asMulti-Submod-Coverand it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR.arxiv:abs1808.03260Har-Peled and Jones 2018) who were motivated by an application in geometry. Even with$$r=1$$r=1Multi-Submod-Covergeneralizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced toSubmod-SC. A simple greedy algorithm gives an$$O(\log (kr))$$O(log(kr))approximation where$$k = \sum _i k_i$$k=ikiand this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm forMulti-Submod-Coverthat covers each constraint to within a factor of$$(1-1/e-\varepsilon )$$(1-1/e-ε)while incurring an approximation of$$O(\frac{1}{\epsilon }\log r)$$O(1ϵlogr)in the cost. Second, we consider the special case when each$$f_i$$fiis a obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraints Bera et al. (Theoret Comput Sci 555:2–8 Bera et al. 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.

     
    more » « less
  4. Abstract

    We study two-qubit circuits over the Clifford+CS gate set, which consists of the Clifford gates together with the controlled-phase gate CS = diag(1, 1, 1, i). The Clifford+CS gate set is universal for quantum computation and its elements can be implemented fault-tolerantly in most error-correcting schemes through magic state distillation. Since non-Clifford gates are typically more expensive to perform in a fault-tolerant manner, it is often desirable to construct circuits that use few CS gates. In the present paper, we introduce an efficient and optimal synthesis algorithm for two-qubit Clifford+CS operators. Our algorithm inputs a Clifford+CS operatorUand outputs a Clifford+CS circuit forU, which uses the least possible number of CS gates. Because the algorithm is deterministic, the circuit it associates to a Clifford+CS operator can be viewed as a normal form for that operator. We give an explicit description of these normal forms and use this description to derive a worst-case lower bound of$$5{{\rm{log}}}_{2}(\frac{1}{\epsilon })+O(1)$$5log2(1ϵ)+O(1)on the number of CS gates required toϵ-approximate elements of SU(4). Our work leverages a wide variety of mathematical tools that may find further applications in the study of fault-tolerant quantum circuits.

     
    more » « less
  5. Letffbe analytic on[0,1][0,1]with|f(k)(1/2)|⩽<#comment/>Aα<#comment/>kk!|f^{(k)}(1/2)|\leqslant A\alpha ^kk!for some constantsAAandα<#comment/>>2\alpha >2and allk⩾<#comment/>1k\geqslant 1. We show that the median estimate ofμ<#comment/>=∫<#comment/>01f(x)dx\mu =\int _0^1f(x)\,\mathrm {d} xunder random linear scrambling withn=2mn=2^mpoints converges at the rateO(n−<#comment/>clog⁡<#comment/>(n))O(n^{-c\log (n)})for anyc>3log⁡<#comment/>(2)/π<#comment/>2≈<#comment/>0.21c> 3\log (2)/\pi ^2\approx 0.21. We also get a super-polynomial convergence rate for the sample median of2k−<#comment/>12k-1random linearly scrambled estimates, whenk/mk/mis bounded away from zero. Whenffhas app’th derivative that satisfies aλ<#comment/>\lambda-Hölder condition then the median of means has errorO(n−<#comment/>(p+λ<#comment/>)+ϵ<#comment/>)O( n^{-(p+\lambda )+\epsilon })for anyϵ<#comment/>>0\epsilon >0, ifk→<#comment/>∞<#comment/>k\to \inftyasm→<#comment/>∞<#comment/>m\to \infty. The proof techniques use methods from analytic combinatorics that have not previously been applied to quasi-Monte Carlo methods, most notably an asymptotic expression from Hardy and Ramanujan on the number of partitions of a natural number.

     
    more » « less