Minimal Non-Uniform Sampling For Multi-Dimensional Period Identification
This paper addresses a fundamental question in the context of multi-dimensional periodicity. Namely, to distinguish between two N-dimensional periodic patterns, what is the least number of (possibly non-contiguous) samples that need to be observed? This question was only recently addressed for one-dimensional signals. This paper generalizes those results to N-dimensional signals. It will be shown that the optimal sampling pattern takes the form of sparse and uniformly separated bunches. Apart from new theoretical insights, this paper's results may provide the foundation for fast N-dimensional period-recognition algorithms that use a minimal number of samples.
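To make the problem concrete, here is a small, hedged illustration in one dimension (not the paper's sampling construction or algorithm): given the values of a periodic pattern at a handful of scattered, possibly non-contiguous indices, a brute-force search can already pin down the period when the sample set is informative enough. The function name `smallest_consistent_period` and the example data are hypothetical.

```python
# Illustration only: brute-force period identification from sparse,
# non-contiguous samples of a 1-D periodic pattern. This is NOT the
# paper's minimal sampling pattern or algorithm; it only shows that a
# period can be identified without observing a full contiguous period.

def smallest_consistent_period(samples, max_period):
    """samples: dict mapping sample index -> observed value.
    Returns the smallest P <= max_period such that all observed indices
    congruent mod P carry equal values, or None if no such P exists."""
    for P in range(1, max_period + 1):
        residue_values = {}
        consistent = True
        for idx, val in samples.items():
            r = idx % P
            if r in residue_values and residue_values[r] != val:
                consistent = False
                break
            residue_values[r] = val
        if consistent:
            return P
    return None

if __name__ == "__main__":
    # A period-6 pattern observed only at a few "bunched" indices.
    pattern = [3, 1, 4, 1, 5, 9]
    sample_indices = [0, 1, 6, 7, 13, 20, 21]   # sparse bunches, not a full period
    samples = {i: pattern[i % 6] for i in sample_indices}
    print(smallest_consistent_period(samples, max_period=10))   # -> 6
```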
- Award ID(s): 1712633
- PAR ID: 10094555
- Date Published:
- Journal Name: Asilomar Conf. Sig., Sys., and Computers
- Page Range / eLocation ID: 2104 to 2108
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Given data drawn from an unknown distribution, D, to what extent is it possible to "amplify" this dataset and faithfully output an even larger set of samples that appear to have been drawn from D? We formalize this question as follows: an (n, m) amplification procedure takes as input n independent draws from an unknown distribution D, and outputs a set of m > n "samples" which must be indistinguishable from m samples drawn i.i.d. from D. We consider this sample amplification problem in two fundamental settings: the case where D is an arbitrary discrete distribution supported on k elements, and the case where D is a d-dimensional Gaussian with unknown mean and fixed covariance matrix. Perhaps surprisingly, we show that a valid amplification procedure exists for both of these settings, even in the regime where the size of the input dataset, n, is significantly less than what would be necessary to learn the distribution D to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we describe potential applications of sample amplification, and formalize a number of curious directions for future research.
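As a minimal sketch of the (n, m) interface described above, the most naive baseline simply returns the n observed draws together with m − n bootstrap resamples from the empirical distribution. This is not the paper's procedure and carries no indistinguishability guarantee; the helper `naive_amplify` is hypothetical and only makes the input/output contract concrete.

```python
# Minimal sketch of the (n, m) amplification interface for a discrete
# distribution: keep the n observed draws and append m - n bootstrap
# resamples from the empirical distribution. This naive baseline only
# illustrates the input/output contract; it is not the paper's
# amplification procedure and proves nothing about indistinguishability.
import random

def naive_amplify(samples, m, rng=None):
    """samples: list of n observed draws; returns a list of m 'samples'."""
    rng = rng or random.Random()
    n = len(samples)
    if m < n:
        raise ValueError("m must be at least n")
    extra = [rng.choice(samples) for _ in range(m - n)]
    amplified = samples + extra
    rng.shuffle(amplified)          # hide which entries were resampled
    return amplified

if __name__ == "__main__":
    observed = [random.randint(1, 5) for _ in range(20)]   # n = 20 draws over k = 5 symbols
    print(naive_amplify(observed, m=30, rng=random.Random(0)))
```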
In this paper, we consider an infinite-dimensional phase retrieval problem: reconstructing real-valued signals living in a shift-invariant space from their phaseless samples taken either on the whole line or on a discrete set with finite sampling density. We characterize all phase-retrievable signals in a real-valued shift-invariant space via their nonseparability. For nonseparable signals generated by a function with support length L, we show that they can be well approximated, up to a sign, from their noisy phaseless samples taken on a discrete set with sampling density 2L − 1. We also propose an algorithm with linear computational complexity to reconstruct nonseparable signals in a shift-invariant space from their phaseless samples corrupted by bounded noise.
Sampling in shift-invariant spaces is a realistic model for signals with smooth spectrum. In this paper, we consider phaseless sampling and reconstruction of real-valued signals in a high-dimensional shift-invariant space, both from their magnitude measurements on the whole Euclidean space and from their phaseless samples taken on a discrete set with finite sampling density. The determination of a signal in a shift-invariant space, up to a sign, by its magnitude measurements on the whole Euclidean space has been shown in the literature to be equivalent to its nonseparability. We introduce an undirected graph associated with the signal and use connectivity of the graph to characterize the signal's nonseparability. Under the local complement property assumption on a shift-invariant space, we find a discrete set with finite sampling density such that nonseparable signals in the space can be reconstructed in a stable way from their phaseless samples taken on that set. We also propose a reconstruction algorithm that provides an approximation to the original signal when only its noisy phaseless samples are available. Finally, numerical simulations demonstrate the robustness of the proposed algorithm in reconstructing box-spline signals from their noisy phaseless samples.
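The graph-connectivity characterization mentioned above can be illustrated, in one dimension and under assumptions, as follows: take the shifts with nonzero coefficients as vertices, join two shifts whenever the supports of the corresponding shifted generators overlap, and test connectivity. The exact vertex and edge definitions in the paper may differ; the function `is_connected` and the support-overlap rule are assumptions made for this sketch only.

```python
# Hedged 1-D illustration of a graph-connectivity separability test.
# Vertices: shifts k with nonzero coefficient c(k). Edges: shifts k, k'
# with |k - k'| < L, i.e. the shifted generators' supports overlap.
# Whether this exact graph matches the paper's construction is an
# assumption; the connectivity check itself is standard.

def is_connected(active_shifts, support_length):
    """active_shifts: shifts k with c(k) != 0; support_length: length L
    of the generator's support, so shifts interact only when |k - k'| < L."""
    shifts = sorted(active_shifts)
    if not shifts:
        return False
    # For sorted 1-D shifts it suffices to check consecutive gaps:
    # a gap >= L splits the graph into two components.
    return all(b - a < support_length for a, b in zip(shifts, shifts[1:]))

if __name__ == "__main__":
    # Generator with support length L = 3 (e.g. a quadratic B-spline).
    print(is_connected([0, 1, 2, 4], support_length=3))   # True: every gap < 3
    print(is_connected([0, 1, 5, 6], support_length=3))   # False: a gap of 4 splits the signal
```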
We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given n samples from a (sub-)Gaussian distribution with unknown mean μ and covariance Σ, our (ε, δ)-differentially private estimator produces μ̃ such that ∥μ − μ̃∥_Σ ≤ α as long as n ≳ d/α² + d√(log(1/δ))/(αε) + d·log(1/δ)/ε. The Mahalanobis error metric ∥μ − μ̂∥_Σ measures the distance between μ̂ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time Õ(nd^(ω−1) + nd/ε), where ω < 2.38 is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned into a private covariance estimator for unrestricted subgaussian distributions. With n ≳ d^(3/2) samples, our estimate is accurate in spectral norm. This is the first such algorithm using n = o(d²) samples, answering an open question posed by Alabi et al. (2022). With n ≳ d² samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.
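For contrast, a standard non-covariance-aware baseline is the clip-and-noise Gaussian mechanism: clip each sample to a ball of radius R, average, and add Gaussian noise calibrated to (ε, δ). This is not the covariance-aware, Mahalanobis-accurate estimator described above; the radius R, the function `private_mean`, and the noise calibration below are the usual textbook choices, included only as a rough sketch.

```python
# A standard clip-and-noise Gaussian-mechanism mean estimator, shown only
# as a simple baseline. It is NOT the paper's covariance-aware estimator:
# it ignores Σ, assumes a known clipping radius R, and its error is in
# plain Euclidean (not Mahalanobis) norm. Noise is calibrated with the
# classical bound sigma >= sensitivity * sqrt(2 ln(1.25/delta)) / epsilon.
import numpy as np

def private_mean(X, R, epsilon, delta, rng=None):
    """X: (n, d) array of samples; R: clipping radius (assumed known)."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.minimum(1.0, R / np.maximum(norms, 1e-12))
    clipped = X * scale                          # project each row into the R-ball
    sensitivity = 2.0 * R / n                    # replace-one sensitivity of the clipped mean
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(scale=sigma, size=d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(loc=1.0, scale=1.0, size=(10_000, 5))
    print(private_mean(X, R=6.0, epsilon=1.0, delta=1e-6, rng=rng))
```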