 Editors:
 Belkin, M; Kpotufe, S
 Award ID(s):
 1909972
 Publication Date:
 NSFPAR ID:
 10295567
 Journal Name:
 Proceedings of Thirty Fourth Conference on Learning Theory
 Volume:
 134
 Page Range or eLocationID:
 2193
 Sponsoring Org:
 National Science Foundation
More Like this

Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper two new SA algorithms are introduced: 1) PolSA, an extension of Polyak's momentum technique with a specially designed matrix momentum, and 2) NeSA, which can either be regarded as a variant of Nesterov's acceleration method, or a simplification of PolSA. The rates of convergence of SA algorithms is well understood. Under special conditions, the mean square error of the parameter estimates is bounded by σ 2 /n+o(1/n), where σ 2 ≥ 0 is an identifiable constant. If these conditions fail, the rate is typically sublinear. There are two well known SA algorithms that ensure a linear rate, with minimal value of variance, σ 2 : the RuppertPolyak averaging technique, and the stochastic NewtonRaphson (SNR) algorithm. It is demonstrated here that under mild technical assumptions, the PolSA algorithm also achieves this optimality criteria. This result is established via novel coupling arguments: It is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal variance (but computationally more expensive) SNR algorithm, at a rate O(1/n 2more »

Obeid, I. (Ed.)The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of highresolution images from scanned pathology samples [1], as part of its National Science Foundationfunded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The longterm goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely highdimensional models (e.g., billions of parameters). Other fields that do notmore »

Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visibletoinfrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this paper, spectral variation is considered from a physicsbased approach and incorporated into an endtoend spectral unmixing algorithm via differentiable programming. The dispersion model is introduced to simulate realistic spectral variation, and an efficient method to fit the parameters is presented. Then, this dispersion model is utilized as a generative model within an analysisbysynthesis spectral unmixing algorithm. Further, a technique for inverse rendering using a convolutional neural network to predict parameters of the generative model is introduced to enhance performance and speed when training data is available. Results achieve stateoftheart on both infrared and visibletonearinfrared (VNIR) datasets, and show promise for the synergy between physicsbased models and deep learning in hyperspectral unmixing in the future.

Abstract In a Merlin–Arthur proof system, the proof verifier (Arthur) accepts valid proofs (from Merlin) with probability 1, and rejects invalid proofs with probability arbitrarily close to 1. The running time of such a system is defined to be the length of Merlin’s proof plus the running time of Arthur. We provide new Merlin–Arthur proof systems for some key problems in finegrained complexity. In several cases our proof systems have optimal running time. Our main results include:
Certifying that a list of
n integers has no 3SUM solution can be done in Merlin–Arthur time . Previously, Carmosino et al. [ITCS 2016] showed that the problem has a nondeterministic algorithm running in$$\tilde{O}(n)$$ $\stackrel{~}{O}\left(n\right)$ time (that is, there is a proof system with proofs of length$$\tilde{O}(n^{1.5})$$ $\stackrel{~}{O}\left({n}^{1.5}\right)$ and a deterministic verifier running in$$\tilde{O}(n^{1.5})$$ $\stackrel{~}{O}\left({n}^{1.5}\right)$ time).$$\tilde{O}(n^{1.5})$$ $\stackrel{~}{O}\left({n}^{1.5}\right)$Counting the number of
k cliques with total edge weight equal to zero in ann node graph can be done in Merlin–Arthur time (where$${\tilde{O}}(n^{\lceil k/2\rceil })$$ $\stackrel{~}{O}\left({n}^{\lceil k/2\rceil}\right)$ ). For odd$$k\ge 3$$ $k\ge 3$k , this bound can be further improved for sparse graphs: for example, counting the number of zeroweight triangles in anm edge graph can be done in Merlin–Arthur time . Previous Merlin–Arthur protocols by Williams [CCC’16] and Björklund and Kaski [PODC’16] could only count$${\tilde{O}}(m)$$ $\stackrel{~}{O}\left(m\right)$k cliques in unweighted graphs, and had worse running times for smallk .Computing the AllPairsmore »
Certifying that an
n variablek CNF is unsatisfiable can be done in Merlin–Arthur time . We also observe an algebrization barrier for the previous$$2^{n/2  n/O(k)}$$ ${2}^{n/2n/O\left(k\right)}$ time Merlin–Arthur protocol of R. Williams [CCC’16] for$$2^{n/2}\cdot \textrm{poly}(n)$$ ${2}^{n/2}\xb7\text{poly}\left(n\right)$ SAT: in particular, his protocol algebrizes, and we observe there is no algebrizing protocol for$$\#$$ $\#$k UNSAT running in time. Therefore we have to exploit nonalgebrizing properties to obtain our new protocol.$$2^{n/2}/n^{\omega (1)}$$ ${2}^{n/2}/{n}^{\omega \left(1\right)}$ Due to the centrality of these problems in finegrained complexity, our results have consequences for many other problems of interest. For example, our work implies that certifying there is no Subset Sum solution toCertifying a Quantified Boolean Formula is true can be done in Merlin–Arthur time
. Previously, the only nontrivial result known along these lines was an Arthur Merlin–Arthur protocol (where Merlin’s proof depends on some of Arthur’s coins) running in$$2^{4n/5}\cdot \textrm{poly}(n)$$ ${2}^{4n/5}\xb7\text{poly}\left(n\right)$ time.$$2^{2n/3}\cdot \textrm{poly}(n)$$ ${2}^{2n/3}\xb7\text{poly}\left(n\right)$n integers can be done in Merlin–Arthur time , improving on the previous best protocol by Nederlof [IPL 2017] which took$$2^{n/3}\cdot \textrm{poly}(n)$$ ${2}^{n/3}\xb7\text{poly}\left(n\right)$ time.$$2^{0.49991n}\cdot \textrm{poly}(n)$$ ${2}^{0.49991n}\xb7\text{poly}\left(n\right)$ 
Abstract We show that a very simple pseudorandom generator fools intersections of k linear threshold functions (LTFs) and arbitrary functions of k LTFs over ndimensional Gaussian space. The two analyses of our PRG (for intersections versus arbitrary functions of LTFs) are quite different from each other and from previous analyses of PRGs for functions of halfspaces. Our analysis for arbitrary functions of LTFs establishes bounds on the Wasserstein distance between Gaussian random vectors with similar covariance matrices, and combines these bounds with a conversion from Wasserstein distance to "unionoforthants" distance from [Xi Chen et al., 2014]. Our analysis for intersections of LTFs uses extensions of the classical SudakovFernique type inequalities, which give bounds on the difference between the expectations of the maxima of two Gaussian random vectors with similar covariance matrices. For all values of k, our generator has seed length O(log n) + poly(k) for arbitrary functions of k LTFs and O(log n) + poly(log k) for intersections of k LTFs. The best previous result, due to [Gopalan et al., 2010], only gave such PRGs for arbitrary functions of k LTFs when k=O(log log n) and for intersections of k LTFs when k=O((log n)/(log log n)). Thus our PRGmore »