Principal component analysis (PCA) is a key dimensionality reduction tool useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with such mixed datasets are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions on the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions on the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using all of the data with such a method versus retaining only the good data, and comparisons against other PCA methods established in the literature, such as PCA, Robust PCA (RPCA), and HePPCAT. Code is available at https://github.com/javiersc1/ALPCAH.
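The benefit of accounting for sample-wise noise variances can be seen in a few lines of NumPy. The sketch below is an illustration, not the ALPCAH algorithm itself: it compares vanilla PCA against inverse-variance weighted PCA that is handed the true noise variances (the oracle WPCA baseline mentioned above). All dimensions, variance levels, and helper names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 3                        # ambient dimension, subspace rank
n_good, n_bad = 200, 200            # samples in each noise group
U = np.linalg.qr(rng.standard_normal((d, k)))[0]   # ground-truth basis

# Low-rank samples observed under two very different noise levels.
coeffs = rng.standard_normal((k, n_good + n_bad))
v = np.concatenate([np.full(n_good, 0.1), np.full(n_bad, 4.0)])  # variances
Y = U @ coeffs + rng.standard_normal((d, n_good + n_bad)) * np.sqrt(v)

def subspace_basis(A, k):
    """Top-k left singular vectors of A."""
    return np.linalg.svd(A, full_matrices=False)[0][:, :k]

def subspace_err(Uhat):
    """Sine of the largest principal angle between span(U) and span(Uhat)."""
    s = np.linalg.svd(U.T @ Uhat, compute_uv=False)
    return np.sqrt(max(0.0, 1.0 - s.min() ** 2))

U_pca = subspace_basis(Y, k)                 # vanilla PCA ignores noise levels
U_wpca = subspace_basis(Y / np.sqrt(v), k)   # inverse-variance weighted columns

print(subspace_err(U_pca), subspace_err(U_wpca))
```

Downweighting the noisy samples rather than discarding them lets the high-noise group still contribute to the subspace estimate, which is the "use all of the data" point made in the abstract.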
Fundamental limits for rank-one matrix estimation with groupwise heteroskedasticity
Low-rank matrix recovery problems involving high-dimensional and heterogeneous data appear in applications throughout statistics and machine learning. The contribution of this paper is to establish the fundamental limits of recovery for a broad class of these problems. In particular, we study the problem of estimating a rank-one matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum mean-squared error in estimating both the matrix and the underlying factors. These results are based on a novel reduction from the low-rank matrix tensor product model (with homogeneous noise) to a rank-one model with heteroskedastic noise. As an application of our main result, we show that recently proposed methods based on applying principal component analysis (PCA) to weighted combinations of the data are optimal in some settings but suboptimal in others. We also provide numerical results comparing our asymptotic formulas with the performance of methods based on weighted PCA, gradient descent, and approximate message passing.
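The groupwise observation model is easy to simulate. The toy sketch below (illustrative sizes and noise levels; not the paper's estimator or its asymptotic formulas) estimates the right factor of a rank-one matrix whose two row-blocks are observed under different noise levels, with and without the kind of inverse-noise row weighting the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(n); v /= np.linalg.norm(v)
theta = 4.0                                   # signal strength
X = theta * np.outer(u, v)                    # rank-one signal

# Two row-blocks observed under very different noise levels.
sig = np.concatenate([np.full(n // 2, 0.3), np.full(n // 2, 3.0)])
Y = X + rng.standard_normal((n, n)) * sig[:, None] / np.sqrt(n)

# Unweighted estimate: leading right singular vector of Y.
v_plain = np.linalg.svd(Y)[2][0]

# Weighted estimate: whiten each row by its noise level first; the right
# factor of the whitened signal is unchanged, so no un-scaling is needed.
v_wt = np.linalg.svd(Y / sig[:, None])[2][0]

print(abs(v_plain @ v), abs(v_wt @ v))
```

With this disparity in noise levels, the unweighted SVD sits near the detection threshold while the weighted version recovers the factor accurately, matching the intuition that weighted combinations can be much better (though, per the paper, not always optimal).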
 Award ID(s):
 1750362
 NSF-PAR ID:
 10413790
 Date Published:
 Journal Name:
 International Conference on Artificial Intelligence and Statistics
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Low-rank matrix recovery is a fundamental problem in machine learning with numerous applications. In practice, the problem can be solved by convex optimization, namely nuclear norm minimization, or by nonconvex optimization, as it is well-known that for low-rank matrix problems like matrix sensing and matrix completion, all local optima of the natural nonconvex objectives are also globally optimal under certain ideal assumptions. In this paper, we study new approaches for matrix sensing in a semi-random model where an adversary can add any number of arbitrary sensing matrices. More precisely, the problem is to recover a low-rank matrix $X^\star$ from linear measurements $b_i = \langle A_i, X^\star \rangle$, where an unknown subset of the sensing matrices satisfies the Restricted Isometry Property (RIP) and the rest of the $A_i$'s are chosen adversarially. It is known that in the semi-random model, existing nonconvex objectives can have bad local optima. To fix this, we present a descent-style algorithm that provably recovers the ground-truth matrix $X^\star$. For the closely related problem of semi-random matrix completion, prior work [CG18] showed that all bad local optima can be eliminated by reweighting the input data. However, the analogous approach for matrix sensing requires reweighting a set of matrices to satisfy RIP, which is a condition that is NP-hard to check. Instead, we build on the framework proposed in [KLL$^+$23] for semi-random sparse linear regression, where the algorithm in each iteration reweights the input based on the current solution, and then takes a weighted gradient step that is guaranteed to work well locally. Our analysis crucially exploits the connection between sparsity in vector problems and low-rankness in matrix problems, which may have other applications in obtaining robust algorithms for sparse and low-rank problems.
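For intuition about the benign side of this problem, when all sensing matrices are RIP-like Gaussians and no adversarial matrices are added, the sketch below recovers a low-rank matrix from measurements $b_i = \langle A_i, X^\star \rangle$ by projected gradient descent (iterative hard thresholding). This is not the paper's reweighting algorithm, which is precisely what is needed once adversarial $A_i$'s break this benign picture; sizes and the step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, m = 12, 2, 140                # 140 measurements < 144 matrix entries
Xstar = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
A = rng.standard_normal((m, d, d)) / np.sqrt(m)   # Gaussian A_i: RIP w.h.p.
b = np.einsum('mij,ij->m', A, Xstar)              # b_i = <A_i, X*>

X = np.zeros((d, d))
for _ in range(500):
    resid = np.einsum('mij,ij->m', A, X) - b      # current measurement error
    G = np.einsum('m,mij->ij', resid, A)          # gradient of 0.5*||A(X)-b||^2
    U, s, Vt = np.linalg.svd(X - 0.5 * G)
    X = (U[:, :r] * s[:r]) @ Vt[:r]               # project back to rank r

print(np.linalg.norm(X - Xstar) / np.linalg.norm(Xstar))
```

The rank-$r$ projection after each gradient step is what lets fewer than $d^2$ measurements suffice here; the RIP of the Gaussian ensemble is doing the heavy lifting, which is exactly the property an adversary can destroy in the semi-random model.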

Recently, there has been significant progress in understanding the convergence and generalization properties of gradient-based methods for training overparameterized learning models. However, many aspects, including the role of small random initialization and how the various parameters of the model are coupled during gradient-based updates to facilitate good generalization, remain largely mysterious. A series of recent papers have begun to study this role for nonconvex formulations of symmetric Positive Semi-Definite (PSD) matrix sensing problems, which involve reconstructing a low-rank PSD matrix from a few linear measurements. The underlying symmetry/PSDness is crucial to existing convergence and generalization guarantees for this problem. In this paper, we study a general overparameterized low-rank matrix sensing problem where one wishes to reconstruct an asymmetric rectangular low-rank matrix from a few linear measurements. We prove that an overparameterized model trained via factorized gradient descent converges to the low-rank matrix generating the measurements. We show that in this setting, factorized gradient descent enjoys two implicit properties: (1) coupling of the trajectory of gradient descent, where the factors are coupled in various ways throughout the gradient update trajectory, and (2) an algorithmic regularization property, where the iterates show a propensity towards low-rank models despite the overparameterized nature of the factorized model. These two implicit properties in turn allow us to show that the gradient descent trajectory from small random initialization moves towards solutions that are both globally optimal and generalize well.
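A minimal sketch of the setting described above, assuming illustrative sizes: factorized gradient descent on an asymmetric matrix sensing problem, starting from a small random initialization and with factor rank k larger than the true rank r. This is a simulation of the phenomenon, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, r, k, m = 10, 8, 2, 4, 160   # true rank r, overparameterized rank k > r
Ul = np.linalg.qr(rng.standard_normal((n1, r)))[0]
Vr = np.linalg.qr(rng.standard_normal((n2, r)))[0]
Xstar = Ul @ np.diag([1.0, 0.5]) @ Vr.T            # well-conditioned target
A = rng.standard_normal((m, n1, n2)) / np.sqrt(m)  # Gaussian measurements
b = np.einsum('mij,ij->m', A, Xstar)

alpha, eta = 1e-3, 0.1               # small random init scale, step size
U = alpha * rng.standard_normal((n1, k))
V = alpha * rng.standard_normal((n2, k))
for _ in range(3000):
    R = np.einsum('mij,ij->m', A, U @ V.T) - b
    G = np.einsum('m,mij->ij', R, A)             # gradient w.r.t. the product UV^T
    U, V = U - eta * G @ V, V - eta * G.T @ U    # simultaneous factor updates

print(np.linalg.norm(U @ V.T - Xstar) / np.linalg.norm(Xstar))
```

Despite the factor rank being twice the true rank, the iterates from small initialization converge to the rank-r target rather than an overfit solution, the algorithmic regularization the abstract describes.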

We study the problem of estimating a large, low-rank matrix corrupted by additive noise of unknown covariance, assuming one has access to additional side information in the form of noise-only measurements. We study the Whiten-Shrink-reColour (WSC) workflow, where a ‘noise covariance whitening’ transformation is applied to the observations, followed by appropriate singular value shrinkage and a ‘noise covariance recolouring’ transformation. We show that under the mean square error loss, a unique, asymptotically optimal shrinkage nonlinearity exists for the WSC denoising workflow, and calculate it in closed form. To this end, we calculate the asymptotic eigenvector rotation of the random spiked F-matrix ensemble, a result which may be of independent interest. With sufficiently many pure-noise measurements, our optimally tuned WSC denoising workflow outperforms, in mean square error, matrix denoising algorithms based on optimal singular value shrinkage that do not make similar use of noise-only side information; numerical experiments show that our procedure's relative performance is particularly strong in challenging statistical settings with high dimensionality and a large degree of heteroscedasticity.
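The whiten-shrink-recolour pipeline can be sketched in a simplified setting: a diagonal (per-row) noise covariance stands in for a general one, and plain rank-r truncation stands in for the paper's optimal shrinkage nonlinearity. Everything below is an illustrative simplification with made-up sizes.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, n_noise, r = 30, 300, 600, 2
row_sd = np.linspace(0.2, 3.0, p)[:, None]        # heteroscedastic noise levels

U0 = np.linalg.qr(rng.standard_normal((p, r)))[0]
X = U0 @ (2.0 * rng.standard_normal((r, n)))       # low-rank signal
Y = X + rng.standard_normal((p, n)) * row_sd       # noisy observations
N = rng.standard_normal((p, n_noise)) * row_sd     # noise-only side information

def truncate(A, r):
    """Rank-r SVD truncation (stand-in for optimal shrinkage)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# Whiten with the noise-only covariance estimate, shrink, then recolour.
sd_hat = N.std(axis=1, keepdims=True)              # per-row noise sd estimate
X_wsc = sd_hat * truncate(Y / sd_hat, r)
X_plain = truncate(Y, r)

err = lambda Xh: np.linalg.norm(Xh - X) / np.linalg.norm(X)
print(err(X_plain), err(X_wsc))
```

Whitening keeps the high-noise rows from dragging the estimated singular subspace toward themselves, which is why the noise-only side information pays off most when heteroscedasticity is strong.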

Summary: Motivated by the problem of estimating bacterial growth rates for genome assemblies from shotgun metagenomic data, we consider the permuted monotone matrix model $Y=\Theta\Pi+Z$, where $Y\in \mathbb{R}^{n\times p}$ is observed, $\Theta\in \mathbb{R}^{n\times p}$ is an unknown approximately rank-one signal matrix with monotone rows, $\Pi \in \mathbb{R}^{p\times p}$ is an unknown permutation matrix, and $Z\in \mathbb{R}^{n\times p}$ is the noise matrix. In this article we study estimation of the extreme values associated with the signal matrix $\Theta$, including its first and last columns and their difference. Treating these estimation problems as compound decision problems, minimax rate-optimal estimators are constructed using the spectral column-sorting method. Numerical experiments on simulated and synthetic microbiome metagenomic data are conducted, demonstrating the superiority of the proposed methods over existing alternatives. The methods are illustrated by comparing the growth rates of gut bacteria in inflammatory bowel disease patients and control subjects.
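A small simulation of the permuted monotone matrix model and the spectral column-sorting idea, ordering the columns by the leading right singular vector; dimensions, the monotone profile, and the noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 100
profile = np.linspace(0.0, 1.0, p)            # common monotone trend
loads = 1.0 + rng.random(n)                   # positive row scalings
Theta = np.outer(loads, profile)              # rank-one, every row monotone
perm = rng.permutation(p)
Y = Theta[:, perm] + 0.1 * rng.standard_normal((n, p))   # Y = Theta @ Pi + Z

# Spectral column-sorting: the leading right singular vector of Y tracks the
# permuted monotone profile (up to a global sign flip).
v = np.linalg.svd(Y)[2][0]
order = np.argsort(v)     # recovered column order; extremes of `order` give
                          # the first/last columns of Theta, up to reversal

# Rank correlation between the recovered order and the true permuted profile.
true_vals = profile[perm]
ranks = lambda x: np.argsort(np.argsort(x))
rho = np.corrcoef(ranks(v), ranks(true_vals))[0, 1]
print(abs(rho))
```

Once the columns are sorted, the extreme columns of $\Theta$ (and hence the growth-rate summaries the paper targets) can be read off from the ends of the recovered ordering.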