skip to main content

Title: Iterative Collaborative Filtering for Sparse Matrix Estimation
We consider sparse matrix estimation where the goal is to estimate an n-by-n matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the popularly used collaborative filtering algorithm for the sparse regime. Specifically, we propose a novel iterative variant of the algorithm, adapted to handle the setting of sparse observations. We establish that as long as the number of entries observed at random scale logarithmically larger than linear in n, the estimation error with respect to the entry-wise max norm decays to zero as n goes to infinity, assuming the underlying matrix of interest has constant rank r. Our result is robust to model misspecification in that if the underlying matrix is approximately rank r, then the estimation error decays to the approximation error with respect to the [Formula: see text]-norm. In the process, we establish the algorithm’s ability to handle arbitrary bounded noise in the observations.
Authors:
; ; ;
Award ID(s):
1948256 1955997
Publication Date:
NSF-PAR ID:
10332704
Journal Name:
Operations Research
ISSN:
0030-364X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation based method-coupled with a novel, non-standard matrix estimation error metric-to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.« less
  2. We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation based method—coupled with a novel, non-standard matrix estimation error metric—to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.« less
  3. We give two new quantum algorithms for solving semidefinite programs (SDPs) providing quantum speed-ups. We consider SDP instances with m constraint matrices, each of dimension n, rank at most r, and sparsity s. The first algorithm assumes an input model where one is given access to an oracle to the entries of the matrices at unit cost. We show that it has run time O~(s^2 (sqrt{m} epsilon^{-10} + sqrt{n} epsilon^{-12})), with epsilon the error of the solution. This gives an optimal dependence in terms of m, n and quadratic improvement over previous quantum algorithms (when m ~~ n). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is O~(sqrt{m}+poly(r))*poly(log m,log n,B,epsilon^{-1}), with B an upper bound on the trace-norm of all input matrices. In particular the complexity depends only polylogarithmically in n and polynomially in r. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: Given m measurements and a supply of copies of an unknown state rho with rank at most r, we show we can find in time sqrt{m}*poly(logmore »m,log n,r,epsilon^{-1}) a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as rho on the m measurements, up to error epsilon. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a poly-logarithmic dependence on its dimension, which is based on ideas developed in quantum principal component analysis. We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. [Harrow et al., 2017]. We believe both techniques might be of independent interest.« less
  4. null (Ed.)
    Abstract One of the classical approaches for estimating the frequencies and damping factors in a spectrally sparse signal is the MUltiple SIgnal Classification (MUSIC) algorithm, which exploits the low-rank structure of an autocorrelation matrix. Low-rank matrices have also received considerable attention recently in the context of optimization algorithms with partial observations, and nuclear norm minimization (NNM) has been widely used as a popular heuristic of rank minimization for low-rank matrix recovery problems. On the other hand, it has been shown that NNM can be viewed as a special case of atomic norm minimization (ANM), which has achieved great success in solving line spectrum estimation problems. However, as far as we know, the general ANM (not NNM) considered in many existing works can only handle frequency estimation in undamped sinusoids. In this work, we aim to fill this gap and deal with damped spectrally sparse signal recovery problems. In particular, inspired by the dual analysis used in ANM, we offer a novel optimization-based perspective on the classical MUSIC algorithm and propose an algorithm for spectral estimation that involves searching for the peaks of the dual polynomial corresponding to a certain NNM problem, and we show that this algorithm is in factmore »equivalent to MUSIC itself. Building on this connection, we also extend the classical MUSIC algorithm to the missing data case. We provide exact recovery guarantees for our proposed algorithms and quantify how the sample complexity depends on the true spectral parameters. In particular, we provide a parameter-specific recovery bound for low-rank matrix recovery of jointly sparse signals rather than use certain incoherence properties as in existing literature. Simulation results also indicate that the proposed algorithms significantly outperform some relevant existing methods (e.g., ANM) in frequency estimation of damped exponentials.« less
  5. We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Many aspects of this problem---both in terms of learning and property testing/estimation and on both the algorithmic and information theoretic sides---remain open. Our results apply to the setting where B has a low rank structure. For this setting, we propose an efficient (and practically viable) algorithm that accurately recovers the underlying M by M matrix using O(M) samples} (where we assume the rank is a constant).more »This linear sample complexity is optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Omega(M) samples. Additionally, we provide an even stronger lower bound showing that distinguishing whether a sequence of observations were drawn from the uniform distribution over M observations versus being generated by a well-conditioned Hidden Markov Model with two hidden states requires Omega(M) observations, while our positive results for recovering B immediately imply that Omega(M) observations suffice to learn such an HMM. This lower bound precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear sample estimators for quantities such as the entropy rate of HMMs.« less