Source Identification for Mixtures of Product Distributions
We give an algorithm for source identification of a mixture of k product distributions on n bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using 2^O(k^2) n^O(k) arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O’Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018).
Authors:
; ; ;
Editors:
;
Award ID(s):
Publication Date:
NSF-PAR ID:
10295567
Journal Name:
Proceedings of Thirty Fourth Conference on Learning Theory
Volume:
134
Page Range or eLocation-ID:
2193
National Science Foundation
##### More Like this
1. Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper two new SA algorithms are introduced: 1) PolSA, an extension of Polyak's momentum technique with a specially designed matrix momentum, and 2) NeSA, which can either be regarded as a variant of Nesterov's acceleration method, or a simplification of PolSA. The rates of convergence of SA algorithms is well understood. Under special conditions, the mean square error of the parameter estimates is bounded by σ 2 /n+o(1/n), where σ 2 ≥ 0 is an identifiable constant. If these conditions fail, the rate is typically sub-linear. There are two well known SA algorithms that ensure a linear rate, with minimal value of variance, σ 2 : the Ruppert-Polyak averaging technique, and the stochastic Newton-Raphson (SNR) algorithm. It is demonstrated here that under mild technical assumptions, the PolSA algorithm also achieves this optimality criteria. This result is established via novel coupling arguments: It is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal variance (but computationally more expensive) SNR algorithm, at a rate O(1/n 2more »
2. (Ed.)
The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do notmore »
3. Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visible-to-infrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this paper, spectral variation is considered from a physics-based approach and incorporated into an end-to-end spectral unmixing algorithm via differentiable programming. The dispersion model is introduced to simulate realistic spectral variation, and an efficient method to fit the parameters is presented. Then, this dispersion model is utilized as a generative model within an analysis-by-synthesis spectral unmixing algorithm. Further, a technique for inverse rendering using a convolutional neural network to predict parameters of the generative model is introduced to enhance performance and speed when training data is available. Results achieve state-of-the-art on both infrared and visible-to-near-infrared (VNIR) datasets, and show promise for the synergy between physicsbased models and deep learning in hyperspectral unmixing in the future.
4. Abstract

In a Merlin–Arthur proof system, the proof verifier (Arthur) accepts valid proofs (from Merlin) with probability 1, and rejects invalid proofs with probability arbitrarily close to 1. The running time of such a system is defined to be the length of Merlin’s proof plus the running time of Arthur. We provide new Merlin–Arthur proof systems for some key problems in fine-grained complexity. In several cases our proof systems have optimal running time. Our main results include:

Certifying that a list ofnintegers has no 3-SUM solution can be done in Merlin–Arthur time$$\tilde{O}(n)$$$\stackrel{~}{O}\left(n\right)$. Previously, Carmosino et al. [ITCS 2016] showed that the problem has a nondeterministic algorithm running in$$\tilde{O}(n^{1.5})$$$\stackrel{~}{O}\left({n}^{1.5}\right)$time (that is, there is a proof system with proofs of length$$\tilde{O}(n^{1.5})$$$\stackrel{~}{O}\left({n}^{1.5}\right)$and a deterministic verifier running in$$\tilde{O}(n^{1.5})$$$\stackrel{~}{O}\left({n}^{1.5}\right)$time).

Counting the number ofk-cliques with total edge weight equal to zero in ann-node graph can be done in Merlin–Arthur time$${\tilde{O}}(n^{\lceil k/2\rceil })$$$\stackrel{~}{O}\left({n}^{⌈k/2⌉}\right)$(where$$k\ge 3$$$k\ge 3$). For oddk, this bound can be further improved for sparse graphs: for example, counting the number of zero-weight triangles in anm-edge graph can be done in Merlin–Arthur time$${\tilde{O}}(m)$$$\stackrel{~}{O}\left(m\right)$. Previous Merlin–Arthur protocols by Williams [CCC’16] and Björklund and Kaski [PODC’16] could only countk-cliques in unweighted graphs, and had worse running times for smallk.

Computing the All-Pairsmore »

Certifying that ann-variablek-CNF is unsatisfiable can be done in Merlin–Arthur time$$2^{n/2 - n/O(k)}$$${2}^{n/2-n/O\left(k\right)}$. We also observe an algebrization barrier for the previous$$2^{n/2}\cdot \textrm{poly}(n)$$${2}^{n/2}·\text{poly}\left(n\right)$-time Merlin–Arthur protocol of R. Williams [CCC’16] for$$\#$$$#$SAT: in particular, his protocol algebrizes, and we observe there is no algebrizing protocol fork-UNSAT running in$$2^{n/2}/n^{\omega (1)}$$${2}^{n/2}/{n}^{\omega \left(1\right)}$time. Therefore we have to exploit non-algebrizing properties to obtain our new protocol.

Certifying a Quantified Boolean Formula is true can be done in Merlin–Arthur time$$2^{4n/5}\cdot \textrm{poly}(n)$$${2}^{4n/5}·\text{poly}\left(n\right)$. Previously, the only nontrivial result known along these lines was an Arthur Merlin–Arthur protocol (where Merlin’s proof depends on some of Arthur’s coins) running in$$2^{2n/3}\cdot \textrm{poly}(n)$$${2}^{2n/3}·\text{poly}\left(n\right)$time.

Due to the centrality of these problems in fine-grained complexity, our results have consequences for many other problems of interest. For example, our work implies that certifying there is no Subset Sum solution tonintegers can be done in Merlin–Arthur time$$2^{n/3}\cdot \textrm{poly}(n)$$${2}^{n/3}·\text{poly}\left(n\right)$, improving on the previous best protocol by Nederlof [IPL 2017] which took$$2^{0.49991n}\cdot \textrm{poly}(n)$$${2}^{0.49991n}·\text{poly}\left(n\right)$time.

5. Abstract We show that a very simple pseudorandom generator fools intersections of k linear threshold functions (LTFs) and arbitrary functions of k LTFs over n-dimensional Gaussian space. The two analyses of our PRG (for intersections versus arbitrary functions of LTFs) are quite different from each other and from previous analyses of PRGs for functions of halfspaces. Our analysis for arbitrary functions of LTFs establishes bounds on the Wasserstein distance between Gaussian random vectors with similar covariance matrices, and combines these bounds with a conversion from Wasserstein distance to "union-of-orthants" distance from [Xi Chen et al., 2014]. Our analysis for intersections of LTFs uses extensions of the classical Sudakov-Fernique type inequalities, which give bounds on the difference between the expectations of the maxima of two Gaussian random vectors with similar covariance matrices. For all values of k, our generator has seed length O(log n) + poly(k) for arbitrary functions of k LTFs and O(log n) + poly(log k) for intersections of k LTFs. The best previous result, due to [Gopalan et al., 2010], only gave such PRGs for arbitrary functions of k LTFs when k=O(log log n) and for intersections of k LTFs when k=O((log n)/(log log n)). Thus our PRGmore »