Lowrank matrix recovery is a fundamental problem in machine learning with numerous applications. In practice, the problem can be solved by convex optimization namely nuclear norm minimization, or by nonconvex optimization as it is wellknown that for lowrank matrix problems like matrix sensing and matrix completion, all local optima of the natural nonconvex objectives are also globally optimal under certain ideal assumptions.
In this paper, we study new approaches for matrix sensing in a semirandom model where an adversary can add any number of arbitrary sensing matrices. More precisely, the problem is to recover a lowrank matrix $X^\star$ from linear measurements $b_i = \langle A_i, X^\star \rangle$, where an unknown subset of the sensing matrices satisfies the Restricted Isometry Property (RIP) and the rest of the $A_i$'s are chosen adversarially.
It is known that in the semirandom model, existing nonconvex objectives can have bad local optima. To fix this, we present a descentstyle algorithm that provably recovers the groundtruth matrix $X^\star$. For the closelyrelated problem of semirandom matrix completion, prior work [CG18] showed that all bad local optima can be eliminated by reweighting the input data. However, the analogous approach for matrix sensing requires reweighting a set of matrices to satisfy RIP, which is a condition that is NPhard to check. Instead, we build on the framework proposed in [KLL$^+$23] for semirandom sparse linear regression, where the algorithm in each iteration reweights the input based on the current solution, and then takes a weighted gradient step that is guaranteed to work well locally. Our analysis crucially exploits the connection between sparsity in vector problems and lowrankness in matrix problems, which may have other applications in obtaining robust algorithms for sparse and lowrank problems.
more »
« less
Fundamental limits for rankone matrix estimation with groupwise heteroskedasticity
Lowrank matrix recovery problems involving highdimensional and heterogeneous data appear in applications throughout statistics and machine learning. The contribution of this paper is to establish the fundamental limits of recovery for a broad class of these problems. In particular, we study the problem of estimating a rankone matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum meansquared error in estimating both the matrix and underlying factors. These results are based on a novel reduction from the lowrank matrix tensor product model (with homogeneous noise) to a rankone model with heteroskedastic noise. As an application of our main result, we show that show recently proposed methods based on applying principal component analysis (PCA) to weighted combinations of the data are optimal in some settings but suboptimal in others. We also provide numerical results comparing our asymptotic formulas with the performance of methods based weighted PCA, gradient descent, and approximate message passing.
more »
« less
 Award ID(s):
 1750362
 NSFPAR ID:
 10413790
 Date Published:
 Journal Name:
 International Conference on Artificial Intelligence and Statistics
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


null (Ed.)Abstract One of the classical approaches for estimating the frequencies and damping factors in a spectrally sparse signal is the MUltiple SIgnal Classification (MUSIC) algorithm, which exploits the lowrank structure of an autocorrelation matrix. Lowrank matrices have also received considerable attention recently in the context of optimization algorithms with partial observations, and nuclear norm minimization (NNM) has been widely used as a popular heuristic of rank minimization for lowrank matrix recovery problems. On the other hand, it has been shown that NNM can be viewed as a special case of atomic norm minimization (ANM), which has achieved great success in solving line spectrum estimation problems. However, as far as we know, the general ANM (not NNM) considered in many existing works can only handle frequency estimation in undamped sinusoids. In this work, we aim to fill this gap and deal with damped spectrally sparse signal recovery problems. In particular, inspired by the dual analysis used in ANM, we offer a novel optimizationbased perspective on the classical MUSIC algorithm and propose an algorithm for spectral estimation that involves searching for the peaks of the dual polynomial corresponding to a certain NNM problem, and we show that this algorithm is in fact equivalent to MUSIC itself. Building on this connection, we also extend the classical MUSIC algorithm to the missing data case. We provide exact recovery guarantees for our proposed algorithms and quantify how the sample complexity depends on the true spectral parameters. In particular, we provide a parameterspecific recovery bound for lowrank matrix recovery of jointly sparse signals rather than use certain incoherence properties as in existing literature. Simulation results also indicate that the proposed algorithms significantly outperform some relevant existing methods (e.g., ANM) in frequency estimation of damped exponentials.more » « less

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions of the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the samplewise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the lowrank structure of the data. This is done without distributional assumptions of the lowrank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons are made against other PCA methods established in the literature like PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH.more » « less

Summary Motivated by the problem of estimating bacterial growth rates for genome assemblies from shotgun metagenomic data, we consider the permuted monotone matrix model $Y=\Theta\Pi+Z$ where $Y\in \mathbb{R}^{n\times p}$ is observed, $\Theta\in \mathbb{R}^{n\times p}$ is an unknown approximately rankone signal matrix with monotone rows, $\Pi \in \mathbb{R}^{p\times p}$ is an unknown permutation matrix, and $Z\in \mathbb{R}^{n\times p}$ is the noise matrix. In this article we study estimation of the extreme values associated with the signal matrix $\Theta$, including its first and last columns and their difference. Treating these estimation problems as compound decision problems, minimax rateoptimal estimators are constructed using the spectral columnsorting method. Numerical experiments on simulated and synthetic microbiome metagenomic data are conducted, demonstrating the superiority of the proposed methods over existing alternatives. The methods are illustrated by comparing the growth rates of gut bacteria in inflammatory bowel disease patients and control subjects.more » « less

There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a blackbox adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the whitebox adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1heavy hitters problem that outperforms the optimal deterministic MisraGries algorithm on long streams. If the whitebox adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any twoplayer deterministic communication lower bound to a lower bound for randomized algorithms robust to a whitebox adversary. In particular, our results show that for all p ≥ 0, there exists a constant Cp > 1 such that any Cpapproximation algorithm for Fp moment estimation in insertiononly streams with a whitebox adversary requires Ω(n) space for a universe of size n. Similarly, there is a constant C > 1 such that any Capproximation algorithm in an insertiononly stream for matrix rank requires Ω(n) space with a whitebox adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of Ω(log n) bits for the fundamental problem of deterministic approximate counting in a stream of 0’s and 1’s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the whitebox space complexity of a streaming algorithmmore » « less