We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information-theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Many aspects of this problem, both in terms of learning and property testing/estimation, and on both the algorithmic and information-theoretic sides, remain open. Our results apply to the setting where B has a low-rank structure. For this setting, we propose an efficient (and practically viable) algorithm that accurately recovers the underlying M by M matrix using O(M) samples (where we assume the rank is a constant).
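As a rough illustration of why low rank helps, the sketch below (a simple baseline, not the paper's algorithm) samples multinomial counts from a rank-r probability matrix and compares the raw empirical frequencies against their rank-r SVD truncation; the matrix size, rank, and sample count are arbitrary choices for the demo.

```python
import numpy as np

# Hypothetical sketch: recover a low-rank probability matrix B (M x M)
# from multinomial counts by truncating the SVD of the empirical
# frequency matrix. This is a naive baseline, not the paper's method.

rng = np.random.default_rng(0)
M, r, n_samples = 50, 2, 200_000

# Build a rank-r probability matrix B and normalize it to sum to 1.
U = rng.random((M, r))
V = rng.random((M, r))
B = U @ V.T
B /= B.sum()

# Draw multinomial counts and form the empirical distribution.
counts = rng.multinomial(n_samples, B.ravel()).reshape(M, M)
B_hat = counts / n_samples

# Rank-r truncation of the noisy empirical matrix.
u, s, vt = np.linalg.svd(B_hat, full_matrices=False)
B_lowrank = (u[:, :r] * s[:r]) @ vt[:r]

err_raw = np.abs(B_hat - B).sum()
err_lr = np.abs(B_lowrank - B).sum()
print(err_raw, err_lr)
```

Projecting onto the rank-r subspace typically discards much of the sampling noise, which is the structural prior the paper's algorithm exploits far more carefully in the sparse-data regime.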
GLAD: Learning sparse graph recovery
Recovering sparse conditional independence graphs from data is a fundamental
problem in machine learning with wide applications. A popular formulation of
the problem is L1-regularized maximum likelihood estimation. Many convex
optimization algorithms have been designed to solve this formulation to recover the
graph structure. Recently, there has been a surge of interest in learning algorithms directly
from data; in this case, the goal is to learn a map from the empirical covariance to the sparse
precision matrix. However, this is a challenging task, since the symmetric
positive definiteness (SPD) and sparsity of the matrix are not easy to enforce in
learned algorithms, and a direct mapping from data to precision matrix may contain
many parameters. We propose a deep learning architecture, GLAD, which uses an
Alternating Minimization (AM) algorithm as our model inductive bias, and learns
the model parameters via supervised learning. We show that GLAD learns a very
compact and effective model for recovering sparse graphs from data.
 Award ID(s):
 1841351
 Publication Date:
 NSF-PAR ID:
 10334056
 Journal Name:
 International Conference on Learning Representations (ICLR)
 Sponsoring Org:
 National Science Foundation
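For context, the classical optimization problem that GLAD's learned solver targets can be run with an off-the-shelf graphical lasso implementation. The sketch below uses scikit-learn's `GraphicalLasso` on data from a banded sparse precision matrix; the dimension, band strength, and penalty are illustrative choices, and this shows only the formulation being replaced, not GLAD's unrolled alternating-minimization architecture.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Classical baseline for the problem GLAD learns: L1-regularized
# precision-matrix estimation (graphical lasso).

rng = np.random.default_rng(1)

# Sparse ground-truth precision matrix: tridiagonal SPD band.
d = 8
theta = np.eye(d) + 0.4 * (np.eye(d, k=1) + np.eye(d, k=-1))
cov = np.linalg.inv(theta)

# Sample data and fit the L1-penalized estimator.
X = rng.multivariate_normal(np.zeros(d), cov, size=5000)
model = GraphicalLasso(alpha=0.05).fit(X)
theta_hat = model.precision_

# Entries far from the band should be shrunk toward zero.
print(abs(theta_hat[0, -1]))
```

GLAD's contribution is to unroll an alternating-minimization solver for this objective into a deep architecture whose iterates stay SPD by construction, and to learn its (few) parameters by supervised training.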
More Like this


Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural prior they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N log N) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points, the first time a structured approach has done so, with 4X faster inference speed.
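The structural prior mentioned above, that a fast transform factors into O(log N) sparse "butterfly" stages, can be checked directly. This sketch implements the radix-2 Cooley-Tukey recursion by hand and verifies it against the dense DFT matrix-vector product (N = 16 is an arbitrary power-of-two choice):

```python
import numpy as np

# The N-point DFT factors into O(log N) sparse butterfly stages, so the
# dense O(N^2) matrix-vector product collapses to O(N log N).

def dft_matrix(N):
    # Dense DFT matrix: W[j, k] = exp(-2*pi*i*j*k / N).
    w = np.exp(-2j * np.pi / N)
    j, k = np.meshgrid(np.arange(N), np.arange(N))
    return w ** (j * k)

def fft_recursive(x):
    # One Cooley-Tukey butterfly level per recursion (N a power of 2).
    N = len(x)
    if N == 1:
        return x.astype(complex)
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.default_rng(2).random(16)
assert np.allclose(fft_recursive(x), dft_matrix(16) @ x)
print("FFT matches dense DFT to machine precision")
```

The paper's learned parameterization searches over such sparse-product factorizations automatically rather than hard-coding the butterfly pattern.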

Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability (InterDyck reachability) is a fundamental framework for expressing a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck reachability framework should model multiple Dyck languages simultaneously. Unfortunately, precise InterDyck reachability is undecidable; any practical solution must over-approximate the exact answer. In the literature, much work has been proposed to over-approximate the InterDyck-reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck reachability: we aim to simplify the underlying input graph G. Our key insight is based on the observation that if an edge does not contribute to any InterDyck path, it can safely be eliminated from G. Our technique is orthogonal to the InterDyck-reachability formulation and can serve as a preprocessing step with any over-approximating approach for InterDyck reachability.
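To make the matched-parenthesis idea concrete, the toy sketch below computes single-Dyck reachability on a tiny edge-labeled graph by a naive fixpoint; the graph, labels, and encoding are hypothetical illustrations, not the article's InterDyck algorithm or its simplification procedure.

```python
from itertools import product

# Toy single-Dyck reachability: an edge labeled "of" (open f) must be
# matched by a later "cf" (close f) for a path to count as balanced.
# Edge set: (src, label, dst); "og" is opened but never closed.
edges = {(0, "of", 1), (1, "cf", 2), (1, "og", 3)}

def dyck_pairs(edges):
    # D(u, v) holds iff there is a balanced (Dyck) path u ->* v.
    nodes = {n for (u, _, v) in edges for n in (u, v)}
    D = {(n, n) for n in nodes}  # the empty path is balanced
    changed = True
    while changed:
        changed = False
        # Rule 1: open-X, balanced, close-X  =>  balanced.
        for (u, l1, a), (b, l2, v) in product(edges, edges):
            if l1[0] == "o" and l2 == "c" + l1[1] and (a, b) in D:
                if (u, v) not in D:
                    D.add((u, v)); changed = True
        # Rule 2: balanced ; balanced => balanced (transitivity).
        for (a, b), (c, d) in product(list(D), list(D)):
            if b == c and (a, d) not in D:
                D.add((a, d)); changed = True
    return D

D = dyck_pairs(edges)
print((0, 2) in D)  # matched open/close pair
print((0, 3) in D)  # dangling open: not balanced
```

In this toy graph the edge `(1, "og", 3)` lies on no balanced path, which mirrors the article's insight: such non-contributing edges can be deleted without changing any InterDyck-reachability answer.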

Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable Model (L-GPLVM), a variant of the Gaussian Process Latent Variable Model, for learning compact representations of such data. L-GPLVM overcomes a key limitation of the Dynamic Gaussian Process Latent Variable Model and its variants, which rely on the assumption that the data are fully observed over all of the sampled time points. We describe an effective approach to learning the parameters of L-GPLVM from sparse observations, by coupling the dynamical model with a Multitask Gaussian Process model for sampling the missing observations at each step of the gradient-based optimization of the variational lower bound. We further show the advantage of the Sparse Process Convolution framework in learning the latent representation of sparsely and irregularly sampled longitudinal data with minimal computational overhead relative to a standard Latent Variable Model. Experiments with synthetic data, as well as variants of MOCAP data with varying degrees of sparsity of observations, show that L-GPLVM substantially and consistently outperforms state-of-the-art alternatives in recovering the missing observations.
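The missing-data coupling in L-GPLVM is far richer than plain interpolation, but its basic ingredient is a GP conditioned on irregularly spaced time points. This hypothetical sketch fits scikit-learn's `GaussianProcessRegressor` to a sparsely subsampled smooth signal and predicts the unobserved points; the signal, sampling rate, and kernel settings are illustrative choices only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# GP interpolation of an irregularly sampled 1-D signal.
rng = np.random.default_rng(3)
t_all = np.linspace(0, 10, 200)
f = np.sin(t_all)

# Irregular subsample: keep roughly 20% of the time points.
mask = rng.random(200) < 0.2
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
gp.fit(t_all[mask, None], f[mask])

# Predict the full trajectory, including the unobserved time points.
f_hat = gp.predict(t_all[:, None])
rmse = np.sqrt(np.mean((f_hat - f) ** 2))
print(float(rmse))
```

L-GPLVM extends this idea by placing the GP over latent trajectories and coupling it with a multitask GP so that different variables, observed at different time points, share statistical strength.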

Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks, as well as a component in Gated Recurrent Units (GRUs) and attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters has remained a long-standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of nonlinearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing their echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We validate our algorithm empirically.
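For readers unfamiliar with the model being estimated, this minimal sketch implements a MoE forward pass with a softmax gate over linear experts; the dimensions and parameters are arbitrary, and this illustrates only the model class, not the paper's moment-based estimator.

```python
import numpy as np

# Minimal Mixture-of-Experts forward pass: a softmax gate chooses how
# to mix k linear experts, per input.
rng = np.random.default_rng(4)
d, k, n = 3, 2, 5

W_gate = rng.normal(size=(k, d))    # gating parameters
W_expert = rng.normal(size=(k, d))  # expert parameters

def moe_predict(X):
    logits = X @ W_gate.T
    gates = np.exp(logits - logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)  # softmax gate weights
    expert_out = X @ W_expert.T                # each expert's output
    return (gates * expert_out).sum(axis=1)    # gated mixture

X = rng.normal(size=(n, d))
y = moe_predict(X)
print(y.shape)
```

The coupling between `W_gate` and `W_expert` in this output is exactly what makes joint EM and gradient descent prone to local optima; the paper's cross-moment construction decouples the two, recovering the experts first and leaving only a well-behaved EM problem for the gate.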