-
Abstract: The protection of private information is of vital importance in data-driven research, business and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, which have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction: synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper, we focus on the NP-hard challenge of developing a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees, and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question of how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy and synthetic data.
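The probabilistic question can be made concrete through the law of total covariance, Cov(X) = Cov(E[X|G]) + E[Cov(X|G)]: the covariance loss is the positive semidefinite difference Cov(X) - Cov(E[X|G]). Below is a minimal NumPy sketch (hypothetical, not the paper's construction; a uniformly random partition stands in for a careful microaggregation):

```python
# Hypothetical illustration (not the paper's algorithm): measuring the
# covariance lost when each data point is replaced by its group mean,
# i.e. by the conditional expectation E[X | group].
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 2000, 5, 50           # samples, dimension, number of groups
X = rng.standard_normal((n, d))

# Crude stand-in for microaggregation: a uniformly random partition. The paper
# constructs partitions whose covariance loss is nearly optimal; a random one
# destroys almost all covariance (group means of i.i.d. data are near zero).
groups = rng.integers(0, k, size=n)
X_cond = np.empty_like(X)
for g in range(k):
    mask = groups == g
    if mask.any():
        X_cond[mask] = X[mask].mean(axis=0)

# Law of total covariance: Cov(X) = Cov(E[X|G]) + E[Cov(X|G)], so the loss
# Cov(X) - Cov(E[X|G]) is positive semidefinite.
loss = np.cov(X, rowvar=False) - np.cov(X_cond, rowvar=False)
print("operator norm of the covariance loss:", np.linalg.norm(loss, 2))
```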
-
Motivated by biological considerations, we study sparse neural maps from an input layer to a target layer with sparse activity, and specifically the problem of storing K input-target associations (x, y), or memories, when the target vectors y are sparse. We mathematically prove that K undergoes a phase transition and that in general, and somewhat paradoxically, sparsity in the target layer increases the storage capacity of the map. The target vectors can be chosen arbitrarily, including at random, and the memories can be both encoded and decoded by networks trained using local learning rules, including the simple Hebb rule. These results are robust under a variety of statistical assumptions on the data. The proofs rely on elegant properties of random polytopes and sub-gaussian random vectors. Open problems and connections to capacity theories and polynomial threshold maps are discussed.
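A minimal sketch of the storage setting (the concrete choices here, dense +/-1 inputs, k-sparse 0/1 targets, one-shot Hebb rule W = sum_mu y_mu x_mu^T, and recall by keeping the k most activated target units, are assumptions for illustration, not the paper's exact model):

```python
# Hypothetical demo: storing sparse input-target associations with the Hebb rule.
import numpy as np

rng = np.random.default_rng(1)

n, m, k, K = 200, 200, 5, 40                  # input dim, target dim, sparsity, #memories
X = rng.choice([-1.0, 1.0], size=(K, n))      # dense input patterns x
Y = np.zeros((K, m))                          # k-sparse target patterns y
for mu in range(K):
    Y[mu, rng.choice(m, size=k, replace=False)] = 1.0

# Hebb rule: W = sum_mu y_mu x_mu^T  (local, one-shot learning).
W = Y.T @ X

# Recall: keep the k most activated target units for each stored input.
scores = X @ W.T                              # shape (K, m)
Y_hat = np.zeros_like(Y)
for mu in range(K):
    Y_hat[mu, np.argsort(scores[mu])[-k:]] = 1.0

accuracy = (Y_hat == Y).all(axis=1).mean()
print(f"fraction of perfectly recalled memories: {accuracy:.2f}")
```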
-
We prove the Marchenko–Pastur law for the eigenvalues of p x p sample covariance matrices in two new situations where the data does not have independent coordinates. In the first scenario, the block-independent model, the p coordinates of the data are partitioned into blocks in such a way that the entries in different blocks are independent, but the entries from the same block may be dependent. In the second scenario, the random tensor model, the data is the homogeneous random tensor of order d, i.e. the coordinates of the data are all (n choose d) different products of d variables chosen from a set of n independent random variables. We show that the Marchenko–Pastur law holds for the block-independent model as long as the size of the largest block is o(p), and for the random tensor model as long as d = o(n^{1/3}). Our main technical tools are new concentration inequalities for quadratic forms in random variables with block-independent coordinates, and for random tensors.
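A hypothetical numerical check of the block-independent case (not from the paper; the within-block dependence via a shared random magnitude is an assumption chosen so that coordinates stay uncorrelated with unit variance):

```python
# Hypothetical simulation: eigenvalues of a sample covariance matrix built from
# block-independent data should fill the Marchenko-Pastur support when the
# largest block is small relative to p.
import numpy as np

rng = np.random.default_rng(2)

p, n, block = 600, 2400, 10     # dimension, samples, block size (small vs p)
gamma = p / n

# Within each block of `block` coordinates, entries share one random magnitude
# (dependence) but carry independent signs (so they stay uncorrelated, var 1).
signs = rng.choice([-1.0, 1.0], size=(n, p))
mags = np.sqrt(rng.exponential(1.0, size=(n, p // block)))   # E[mag^2] = 1
X = signs * np.repeat(mags, block, axis=1)

S = X.T @ X / n                 # p x p sample covariance matrix
evals = np.linalg.eigvalsh(S)

# Marchenko-Pastur support edges for unit-variance coordinates.
edge_lo, edge_hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
inside = np.mean((evals > edge_lo - 0.1) & (evals < edge_hi + 0.1))
print(f"fraction of eigenvalues inside the MP support: {inside:.3f}")
```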