Abstract When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer–Wolfowitz non-parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. In theoretical ‘spiked’ models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single-cell RNA-seq.
more »
« less
Approximate message passing from random initialization with applications to Z 2 synchronization
This paper is concerned with the problem of reconstructing an unknown rank-one matrix with prior structural information from noisy observations. While computing the Bayes optimal estimator is intractable in general due to the requirement of computing high-dimensional integrations/summations, Approximate Message Passing (AMP) emerges as an efficient first-order method to approximate the Bayes optimal estimator. However, the theoretical underpinnings of AMP remain largely unavailable when it starts from random initialization, a scheme of critical practical utility. Focusing on a prototypical model called Z 2 synchronization, we characterize the finite-sample dynamics of AMP from random initialization, uncovering its rapid global convergence. Our theory—which is nonasymptotic in nature—in this model unveils the non-necessity of a careful initialization for the success of AMP.
more »
« less
- PAR ID:
- 10434808
- Date Published:
- Journal Name:
- Proceedings of the National Academy of Sciences
- Volume:
- 120
- Issue:
- 31
- ISSN:
- 0027-8424
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract We study a class of Approximate Message Passing (AMP) algorithms for symmetric and rectangular spiked random matrix models with orthogonally invariant noise. The AMP iterates have fixed dimension $$K \geq 1$$, a multivariate non-linearity is applied in each AMP iteration, and the algorithm is spectrally initialized with $$K$$ super-critical sample eigenvectors. We derive the forms of the Onsager debiasing coefficients and corresponding AMP state evolution, which depend on the free cumulants of the noise spectral distribution. This extends previous results for such models with $K=1$ and an independent initialization. Applying this approach to Bayesian principal components analysis, we introduce a Bayes-OAMP algorithm that uses as its non-linearity the posterior mean conditional on all preceding AMP iterates. We describe a practical implementation of this algorithm, where all debiasing and state evolution parameters are estimated from the observed data, and we illustrate the accuracy and stability of this approach in simulations.more » « less
-
Abstract We propose a new procedure for inference on optimal treatment regimes in the model‐free setting, which does not require to specify an outcome regression model. Existing model‐free estimators for optimal treatment regimes are usually not suitable for the purpose of inference, because they either have nonstandard asymptotic distributions or do not necessarily guarantee consistent estimation of the parameter indexing the Bayes rule due to the use of surrogate loss. We first study a smoothed robust estimator that directly targets the parameter corresponding to the Bayes decision rule for optimal treatment regimes estimation. This estimator is shown to have an asymptotic normal distribution. Furthermore, we verify that a resampling procedure provides asymptotically accurate inference for both the parameter indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to calculate the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods.more » « less
-
We study the convergence properties of the Expectation-Maximization algorithm in the Naive Bayes model. We show that EM can get stuck in regions of slow convergence, even when the features are binary and i.i.d. conditioning on the class label, and even under random (i.e. non worst-case) initialization. In turn, we show that EM can be bootstrapped in a pre-training step that computes a good initialization. From this initialization we show theoretically and experimentally that EM converges exponentially fast to the true model parameters. Our bootstrapping method amounts to running the EM algorithm on appropriately centered iterates of small magnitude, which as we show corresponds to effectively performing power iteration on the covariance matrix of the mixture model, although power iteration is performed under the hood by EM itself. As such, we call our bootstrapping approach “power EM.” Specifically for the case of two binary features, we show global exponentially fast convergence of EM, even without bootstrapping. Finally, as the Naive Bayes model is quite expressive, we show as corollaries of our convergence results that the EM algorithm globally converges to the true model parameters for mixtures of two Gaussians, recovering recent results of Xu et al.’2016 and Daskalakis et al. 2017.more » « less
-
In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. [19]. Essentially, to start from the pretrained model as initialization for finetuning, one can either initialize B to zero and A to random (default initialization in PEFT package), or vice-versa. In both cases, the product BA is equal to zero at initialization, which makes finetuning starts from the pretrained model. These two initialization schemes are seemingly sim- ilar. They should in-principle yield the same performance and share the same optimal learning rate. We demonstrate that this is an incorrect intuition and that the first scheme (initializing B to zero and A to random) on average yields better performance compared to the other scheme. Our theoretical analysis shows that the reason behind this might be that the first initialization allows the use of larger learning rates (without causing output instability) compared to the second initial- ization, resulting in more efficient learning of the first scheme. We validate our results with extensive experiments on LLMs.more » « less
An official website of the United States government

