Title: Approximate message passing for orthogonally invariant ensembles: multivariate non-linearities and spectral initialization
Abstract: We study a class of Approximate Message Passing (AMP) algorithms for symmetric and rectangular spiked random matrix models with orthogonally invariant noise. The AMP iterates have fixed dimension $$K \geq 1$$, a multivariate non-linearity is applied in each AMP iteration, and the algorithm is spectrally initialized with $$K$$ super-critical sample eigenvectors. We derive the forms of the Onsager debiasing coefficients and the corresponding AMP state evolution, which depend on the free cumulants of the noise spectral distribution. This extends previous results for such models with $$K=1$$ and an independent initialization. Applying this approach to Bayesian principal components analysis, we introduce a Bayes-OAMP algorithm that uses as its non-linearity the posterior mean conditional on all preceding AMP iterates. We describe a practical implementation of this algorithm, in which all debiasing and state evolution parameters are estimated from the observed data, and we illustrate the accuracy and stability of this approach in simulations.
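The debiasing in this abstract depends on the free cumulants of the noise spectrum; in the Gaussian (GOE) special case all free cumulants beyond the second vanish, and the Onsager correction reduces to the single classical term. As rough orientation only, and not the paper's Bayes-OAMP, here is a minimal rank-one ($$K=1$$) AMP sketch for a symmetric spiked model under that Gaussian assumption; all function names are illustrative.

```python
import numpy as np

def amp_spiked_symmetric(Y, u_init, eta, eta_prime, n_iter=20):
    """Rank-one AMP for Y = (s/n) x x^T + W with GOE noise W:
        u_t     = eta(f_t)
        f_{t+1} = Y u_t - b_t u_{t-1},   b_t = mean(eta'(f_t)).
    For general orthogonally invariant W, b_t must be replaced by a
    longer-memory correction weighted by the free cumulants."""
    u_prev = u_init
    f = Y @ u_init                       # first iterate, no Onsager term yet
    for _ in range(n_iter):
        u = eta(f)
        b = eta_prime(f).mean()          # Onsager debiasing coefficient
        f, u_prev = Y @ u - b * u_prev, u
    return eta(f)

# Example: Rademacher spike, spectral initialization from the top eigenvector
rng = np.random.default_rng(0)
n, s = 2000, 3.0
x = rng.choice([-1.0, 1.0], size=n)
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2 * n)                  # GOE scaling, entry var 1/n
Y = (s / n) * np.outer(x, x) + W
u0 = np.linalg.eigh(Y)[1][:, -1] * np.sqrt(n)   # super-critical eigenvector
x_hat = amp_spiked_symmetric(Y, u0, np.tanh,
                             lambda f: 1.0 - np.tanh(f) ** 2)
```

Here tanh stands in for a posterior-mean non-linearity under a $$\pm 1$$ prior; the paper's multivariate non-linearities act on all $$K$$ iterates jointly.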
Award ID(s): 2142476
PAR ID: 10541832
Publisher / Repository: Oxford University Press
Journal Name: Information and Inference: A Journal of the IMA
Volume: 13
Issue: 3
ISSN: 2049-8772
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an empirical Bayes PCA (EB-PCA) method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer–Wolfowitz non-parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an approximate message passing (AMP) algorithm. In theoretical 'spiked' models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for the analysis of gene expression data obtained by single-cell RNA-seq.
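The Kiefer–Wolfowitz estimator named in this abstract admits a simple grid-based EM approximation: fit non-parametric prior weights over fixed atoms, then denoise by posterior mean. The sketch below assumes unit-variance Gaussian observation noise and is a stripped-down illustration, not the EB-PCA pipeline; all names are ours.

```python
import numpy as np
from scipy.stats import norm

def npmle_em(y, grid, n_iter=500):
    """Grid approximation to the Kiefer-Wolfowitz NPMLE: fit mixing
    weights w over fixed atoms by EM, assuming y_i = theta_i + N(0, 1)."""
    L = norm.pdf(y[:, None] - grid[None, :])     # likelihood of each atom
    w = np.full(grid.size, 1.0 / grid.size)
    for _ in range(n_iter):
        post = L * w
        post /= post.sum(axis=1, keepdims=True)  # posterior over atoms
        w = post.mean(axis=0)                    # EM update of the weights
    return w

def posterior_mean(y, grid, w):
    """Empirical Bayes denoising E[theta | y] under the fitted prior."""
    post = norm.pdf(y[:, None] - grid[None, :]) * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid

# Example: a two-point prior recovered from noisy observations
rng = np.random.default_rng(1)
theta = rng.choice([-2.0, 2.0], size=5000)
y = theta + rng.standard_normal(5000)
grid = np.linspace(-4.0, 4.0, 81)
w = npmle_em(y, grid)
theta_hat = posterior_mean(y, grid, w)
```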
  2. Abstract: This paper studies a statistical learning model in which the model coefficients have a pre-determined non-overlapping group sparsity structure. We consider a combination of a loss function and a regularizer to recover the desired group sparsity patterns, a formulation that encompasses many existing works. We analyze directional stationary solutions of the proposed formulation, obtaining a sufficient condition for a directional stationary solution to achieve optimality and establishing a bound on the distance from the solution to a reference point. We develop an efficient algorithm based on the alternating direction method of multipliers (ADMM), showing that the iterates converge to a directional stationary solution under certain conditions. In the numerical experiments, we implement the algorithm for generalized linear models with convex and nonconvex group regularizers to evaluate the model performance on various data types, noise levels, and sparsity settings.
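For a concrete convex instance of the template in this abstract, the sketch below applies ADMM to the group lasso $$\tfrac{1}{2}\|Ax - b\|^2 + \lambda \sum_g \|x_g\|_2$$ with non-overlapping groups: the x-update is a cached ridge solve and the z-update is blockwise soft-thresholding. The paper's setting (general losses, possibly nonconvex group regularizers) is broader; this is only an illustration.

```python
import numpy as np

def group_lasso_admm(A, b, groups, lam, rho=1.0, n_iter=300):
    """ADMM for 0.5*||A x - b||^2 + lam * sum_g ||x_g||_2 over
    non-overlapping index groups (a convex special case)."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)                               # scaled dual variable
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))  # cached ridge factor
    Atb = A.T @ b
    for _ in range(n_iter):
        x = M @ (Atb + rho * (z - u))             # x-update: ridge solve
        v = x + u
        z = np.zeros(n)
        for g in groups:                          # z-update: group shrinkage
            nrm = np.linalg.norm(v[g])
            if nrm > lam / rho:
                z[g] = (1.0 - lam / (rho * nrm)) * v[g]
        u += x - z                                # dual ascent step
    return z

# Example: 10 groups of 5 coefficients, only two groups active
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 50))
groups = [np.arange(5 * k, 5 * k + 5) for k in range(10)]
x_true = np.zeros(50)
x_true[groups[0]] = 1.0
x_true[groups[3]] = -2.0
b = A @ x_true + 0.1 * rng.standard_normal(200)
x_hat = group_lasso_admm(A, b, groups, lam=5.0)
```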
  3. In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems and almost entirely ignores the complex noise models that exist in practice. In this work, we demonstrate the restrictiveness of these assumptions using three canonical models in machine learning. We then develop novel theory to address this shortcoming in two ways. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary non-convexity and noise models. Second, under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model that generalizes current assumptions in the literature, we show that the objective function cannot diverge even if the iterates do. As a consequence of these results, SGD can be applied to a broader range of stochastic optimization problems with confidence about its global convergence behavior and stability.
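As a point of reference for the setting this abstract analyzes, here is a bare-bones SGD loop on a bounded non-convex objective with heavy-tailed gradient noise. It illustrates the iteration, not the paper's theory; the objective and noise model are our own choices.

```python
import numpy as np

def sgd(grad, x0, n_iter=10_000, seed=0):
    """Plain SGD: x_{t+1} = x_t - alpha_t * (grad(x_t) + noise_t),
    with a 1/sqrt(t) step size and heavy-tailed Student-t noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for t in range(n_iter):
        g = grad(x) + rng.standard_t(df=3, size=x.shape)  # noisy gradient
        x -= (0.1 / np.sqrt(t + 1)) * g
    return x

# Bounded, non-convex test objective f(x) = sum(x_i^2 / (1 + x_i^2))
grad_f = lambda x: 2.0 * x / (1.0 + x ** 2) ** 2
x_hat = sgd(grad_f, x0=np.full(5, 3.0))
```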
  4. Abstract: We evaluate the performance of the Lyman α forest weak gravitational lensing estimator of Metcalf et al. on forest data from hydrodynamic simulations and ray-traced simulated lensing potentials. We compare the results to those obtained from the Gaussian random field simulated Lyα forest data and lensing potentials used in previous work. We find that the estimator is able to reconstruct the lensing potentials from the more realistic data, and we investigate the dependence on spectrum signal-to-noise. The non-linearity and non-Gaussianity in this forest data, arising from gravitational instability and hydrodynamics, cause a reduction in signal-to-noise by a factor of ∼2.7 for noise-free data and a factor of ∼1.5 for spectra with signal-to-noise of order unity (comparable to current observational data). Compared to Gaussian field lensing potentials, using ray-traced potentials from N-body simulations incurs a further signal-to-noise reduction of a factor of ∼1.3 at all noise levels. The non-linearity in the forest data also increases the bias in the reconstructed potentials by 5-25 per cent, and the ray-traced lensing potentials further increase the bias by 20-30 per cent. We demonstrate methods for mitigating these issues, including Gaussianization and bias correction, which could be used in real observations.
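Of the two mitigation steps named at the end of this abstract, Gaussianization is the more mechanical; one common form is a rank transform that maps values onto standard normal quantiles. A generic sketch under that assumption follows (this is not necessarily the authors' procedure).

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussianize(flux):
    """Rank-based Gaussianization: replace each value by the standard
    normal quantile of its empirical rank, forcing Gaussian one-point
    statistics while preserving the rank ordering of pixels."""
    u = rankdata(flux) / (flux.size + 1)   # uniform scores in (0, 1)
    return norm.ppf(u)

# Example: a skewed mock "flux" field becomes marginally Gaussian
rng = np.random.default_rng(3)
flux = np.exp(rng.standard_normal(4096))   # lognormal, strongly skewed
flux_g = gaussianize(flux)
```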
  5. We study stochastic approximation procedures for approximately solving a $$d$$-dimensional linear fixed point equation based on observing a trajectory of length $$n$$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $$t_{\mathrm{mix}} \tfrac{d}{n}$$ on the squared error of the last iterate of a standard scheme, where $$t_{\mathrm{mix}}$$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $$(d, t_{\mathrm{mix}})$$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise—covering the TD($$\lambda$$) family of algorithms for all $$\lambda \in [0, 1)$$—and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $$\lambda$$ when running the TD($$\lambda$$) algorithm). 
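To make the TD($$\lambda$$) family and the averaging in the preceding abstract concrete, the sketch below runs linear TD($$\lambda$$) with eligibility traces along one trajectory and maintains a Polyak-Ruppert running average of the iterates, the kind of averaged sequence the abstract's instance-dependent bound concerns. Features, rewards, and step size are placeholders, not the paper's construction.

```python
import numpy as np

def td_lambda(phi, rewards, lam=0.5, gamma=0.95, step=0.05):
    """Linear TD(lambda) on a single trajectory.
    phi[t]     : feature vector of state s_t (shape (T+1, d))
    rewards[t] : reward on the transition t -> t+1 (length T)
    Returns the last iterate and its Polyak-Ruppert average."""
    d = phi.shape[1]
    theta = np.zeros(d)
    theta_bar = np.zeros(d)                 # running average of iterates
    e = np.zeros(d)                         # eligibility trace
    for t in range(len(rewards)):
        e = gamma * lam * e + phi[t]
        delta = rewards[t] + gamma * phi[t + 1] @ theta - phi[t] @ theta
        theta = theta + step * delta * e    # TD(lambda) update
        theta_bar += (theta - theta_bar) / (t + 1)
    return theta, theta_bar

# Example on a purely synthetic trajectory with random features
rng = np.random.default_rng(4)
T, d = 5000, 8
phi = rng.standard_normal((T + 1, d)) / np.sqrt(d)
rewards = rng.standard_normal(T)
theta_last, theta_avg = td_lambda(phi, rewards)
```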