skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, June 11 until 2:00 AM ET on Friday, June 12 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Vadhan, Salil"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Srinivasan, Srikanth (Ed.)
    Given a sequence of samples x_1, … , x_k promised to be drawn from one of two distributions X₀, X₁, a well-studied problem in statistics is to decide which distribution the samples are from. Information theoretically, the maximum advantage in distinguishing the two distributions given k samples is captured by the total variation distance between X₀^{⊗k} and X₁^{⊗k}. However, when we restrict our attention to efficient distinguishers (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish X₀^{⊗k} and X₁^{⊗k} is more involved and less understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of X₀ and X₁ to bounds on the information-theoretic indistinguishability of some specific, related variables X̃₀ and X̃₁. As a consequence, we prove a new, tight characterization of the number of samples k needed to efficiently distinguish X₀^{⊗k} and X₁^{⊗k} with constant advantage as k = Θ(d_H^{-2}(X̃₀, X̃₁)), which is the inverse of the squared Hellinger distance d_H between two distributions X̃₀ and X̃₁ that are computationally indistinguishable from X₀ and X₁. Likewise, our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions. At the heart of our work is the use of the Multicalibration Theorem (Hébert-Johnson, Kim, Reingold, Rothblum 2018) in a way inspired by recent work of Casacuberta, Dwork, and Vadhan (STOC 2024). Multicalibration allows us to relate the computational indistinguishability of X₀, X₁ to the statistical indistinguishability of X̃₀, X̃₁ (for lower bounds on k) and construct explicit circuits to distinguish between X̃₀, X̃₁ and consequently X₀, X₁ (for upper bounds on k). 
    more » « less
  2. Fioretto, Ferdinando; Van_Hentenryck, Pascal (Ed.)
  3. In this paper, we introduce a new, spectral notion of approximation between directed graphs, which we call singular value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs [ST04], standard approximation for directed graphs [CKP + 17], and unit-circle (UC) approximation for directed graphs [AKM + 20]. Further, SV approximation enjoys several useful properties not possessed by previous notions of approximation, e.g., it is preserved under products of randomwalk matrices and bounded matrices. We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as ℓ-step random walks on such graphs, for any ℓ≤poly(n). Combined with the Eulerian scaling algorithms of [CKK + 18], given an arbitrary (not necessarily Eulerian) directed graph and a set S of vertices, we can approximate the stationary probability mass of the (S,Sc) cut in an ℓ-step random walk to within a multiplicative error of 1/polylog(n) and an additive error of 1/poly(n) in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation. 
    more » « less
  4. We give new upper and lower bounds on the power of several restricted classes of arbitrary-order read-once branching programs (ROBPs) and standard-order ROBPs (SOBPs) that have received significant attention in the literature on pseudorandomness for space-bounded computation. - Regular SOBPs of length n and width ⌊w(n+1)/2⌋ can exactly simulate general SOBPs of length n and width w, and moreover an n/2-o(n) blow-up in width is necessary for such a simulation. Our result extends and simplifies prior average-case simulations (Reingold, Trevisan, and Vadhan (STOC 2006), Bogdanov, Hoza, Prakriya, and Pyne (CCC 2022)), in particular implying that weighted pseudorandom generators (Braverman, Cohen, and Garg (SICOMP 2020)) for regular SOBPs of width poly(n) or larger automatically extend to general SOBPs. Furthermore, our simulation also extends to general (even read-many) oblivious branching programs. - There exist natural functions computable by regular SOBPs of constant width that are average-case hard for permutation SOBPs of exponential width. Indeed, we show that Inner-Product mod 2 is average-case hard for arbitrary-order permutation ROBPs of exponential width. - There exist functions computable by constant-width arbitrary-order permutation ROBPs that are worst-case hard for exponential-width SOBPs. - Read-twice permutation branching programs of subexponential width can simulate polynomial-width arbitrary-order ROBPs. 
    more » « less
  5. f-DP has recently been proposed as a generalization of differential privacy allowing a lossless analysis of composition, post-processing, and privacy amplification via subsampling. In the setting of f-DP, we propose the concept of a canonical noise distribution (CND), the first mechanism designed for an arbitrary f-DP guarantee. The notion of CND captures whether an additive privacy mechanism perfectly matches the privacy guarantee of a given f. We prove that a CND always exists, and give a construction that produces a CND for any f. We show that private hypothesis tests are intimately related to CNDs, allowing for the release of private p-values at no additional privacy cost as well as the construction of uniformly most powerful (UMP) tests for binary data, within the general f-DP framework. We apply our techniques to the problem of difference of proportions testing, and construct a UMP unbiased (UMPU) "semi-private" test which upper bounds the performance of any f-DP test. Using this as a benchmark we propose a private test, based on the inversion of characteristic functions, which allows for optimal inference for the two population parameters and is nearly as powerful as the semi-private UMPU. When specialized to the case of (ϵ,0)-DP, we show empirically that our proposed test is more powerful than any (ϵ/sqrt(2))-DP test and has more accurate type I errors than the classic normal approximation test. 
    more » « less
  6. Abstract Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study regression algorithms that satisfy differential privacy , a constraint which guarantees that an algorithm’s output reveals little about any individual input data record, even to an attacker with side information about the dataset. Motivated by the Opportunity Atlas , a high-profile, small-area analysis tool in economics research, we perform a thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records—a particularly challenging regime for differential privacy. In contrast, prior work on differentially private linear regression focused on multivariate linear regression on large datasets or asymptotic analysis. Through a range of experiments, we identify key factors that affect the relative performance of the algorithms. We find that algorithms based on robust estimators—in particular, the median-based estimator of Theil and Sen—perform best on small datasets (e.g., hundreds of datapoints), while algorithms based on Ordinary Least Squares or Gradient Descent perform better for large datasets. However, we also discuss regimes in which this general finding does not hold. Notably, the differentially private analogues of Theil–Sen (one of which was suggested in a theoretical work of Dwork and Lei) have not been studied in any prior experimental work on differentially private linear regression. 
    more » « less