-
We develop a unified approach to bounding the largest and smallest singular values of an inhomogeneous random rectangular matrix, based on the non-backtracking operator and the Ihara–Bass formula for general random Hermitian matrices with a bipartite block structure. We obtain probabilistic upper (respectively, lower) bounds for the largest (respectively, smallest) singular values of a large rectangular random matrix X. These bounds are given in terms of the maximal and minimal 2-norms of the rows and columns of the variance profile of X. The proofs involve finding probabilistic upper bounds on the spectral radius of an associated non-backtracking matrix B. The two-sided bounds can be applied to the centered adjacency matrix of sparse inhomogeneous Erdős–Rényi bipartite graphs for a wide range of sparsity, down to criticality. In particular, for Erdős–Rényi bipartite graphs G(n, m, p) with p = ω(log n)/n and m/n → y ∈ (0, 1), our sharp bounds imply that there are no outliers outside the support of the Marčenko–Pastur law almost surely. This result extends the Bai–Yin theorem to sparse rectangular random matrices.
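As a quick numerical plausibility check (a minimal sketch of my own, not the authors' code; the sparsity constant 5, the aspect ratio y = 1/2, and the seed are arbitrary choices), one can sample a sparse bipartite Erdős–Rényi graph, center its adjacency matrix, and compare the extreme singular values with the Marčenko–Pastur edges √(np(1−p))(1 ± √y):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 1000                        # m/n -> y = 0.5
p = 5 * np.log(n) / n                    # sparse regime, p = omega(log n)/n

A = (rng.random((n, m)) < p).astype(float)  # bipartite biadjacency matrix
X = A - p                                   # centered; entry variance p(1 - p)

s = np.linalg.svd(X, compute_uv=False)
scale = np.sqrt(n * p * (1 - p))
y = m / n
print("largest  singular value:", s[0],  "  MP upper edge:", scale * (1 + np.sqrt(y)))
print("smallest singular value:", s[-1], "  MP lower edge:", scale * (1 - np.sqrt(y)))
```

At finite n the extreme singular values fluctuate around the edges; the no-outliers statement is the almost-sure limit as n → ∞.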
-
Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; Levine, S. (Eds.) We consider the problem of learning a single-index target function f∗ : R^d → R under spiked covariance data: f∗(x) = σ∗(⟨x, μ⟩/√(1+θ)), x ∼ N(0, I_d + θμμ⊤), θ ≍ d^β for β ∈ [0, 1), where the link function σ∗ : R → R is a degree-p polynomial with information exponent k (defined as the lowest degree in the Hermite expansion of σ∗), and the target depends on the projection of the input x onto the spike (signal) direction μ ∈ R^d. In the proportional asymptotic limit where the number of training examples n and the dimensionality d jointly diverge, n, d → ∞ with n/d → ψ ∈ (0, ∞), we ask the following question: how large should the spike magnitude θ be for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn f∗? We show that for kernel ridge regression, β ≥ 1 − 1/p is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, β > 1 − 1/k suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data. Moreover, since k ≤ p by definition, neural networks can adapt to such structure more effectively.
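A hedged illustration of the setup (my own construction, not the paper's code; the link σ∗ = He₂, so that p = k = 2, the Gaussian kernel bandwidth, and the ridge penalty are all assumptions made for the sketch): generate the spiked-covariance single-index data and fit kernel ridge regression.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test = 200, 400, 200     # proportional regime, n ~ psi * d
beta = 0.6
theta = d ** beta                      # spike magnitude theta ~ d^beta

mu = rng.standard_normal(d)
mu /= np.linalg.norm(mu)               # unit spike direction

def sample(n):
    # x ~ N(0, I_d + theta * mu mu^T), built from isotropic z ~ N(0, I_d)
    z = rng.standard_normal((n, d))
    x = z + (np.sqrt(1 + theta) - 1) * np.outer(z @ mu, mu)
    proj = (x @ mu) / np.sqrt(1 + theta)  # rescaled projection onto the spike
    return x, proj**2 - 1                 # sigma_star = He_2, second Hermite poly

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

# kernel ridge regression with a Gaussian kernel (bandwidth sqrt(d) heuristic)
def gauss_kernel(A, B, h):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * h**2))

h, lam = np.sqrt(d), 1e-3
K = gauss_kernel(Xtr, Xtr, h)
alpha = np.linalg.solve(K + lam * np.eye(n_train), ytr)
pred = gauss_kernel(Xte, Xtr, h) @ alpha
print("test MSE:", np.mean((pred - yte) ** 2), " null MSE:", np.mean(yte ** 2))
```

Here β = 0.6 sits above the kernel threshold 1 − 1/p = 0.5, so per the result above the spike is (asymptotically) strong enough for kernel methods to beat the null predictor; shrinking β below 0.5 should erase that gap.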
-
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Eds.) We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis gives high-probability non-asymptotic concentration results for the training, cross-validation, and generalization errors of RFRR around the corresponding quantities for a kernel ridge regression (KRR). This KRR is defined by an expected kernel generated by the nonlinear random feature map. We then approximate the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of both RFRR and KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of RFRR for a nonlinear student-teacher model.
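A minimal sketch of the central comparison (my own toy experiment, not the paper's setup verbatim; the ReLU activation, the tanh teacher, and all sizes are arbitrary choices): on nearly orthogonal unit-norm inputs, the RFRR Gram matrix with width N ≫ n concentrates around its expected kernel, which for ReLU features is the order-1 arc-cosine kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, N, lam = 500, 100, 20000, 1e-2   # high dimension => nearly orthogonal inputs

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # deterministic-like unit vectors
beta_star = rng.standard_normal(d) / np.sqrt(d)
y = np.tanh(X @ beta_star * np.sqrt(d))         # a nonlinear teacher

# RFRR: random Gaussian first layer W; ridge on the features, written in
# kernel form so it is directly comparable with KRR below
W = rng.standard_normal((N, d))
Phi = np.maximum(X @ W.T, 0.0) / np.sqrt(N)     # ReLU random features
K_rf = Phi @ Phi.T
alpha_rf = np.linalg.solve(K_rf + lam * np.eye(n), y)

# KRR with the expected kernel E_w[relu(w.x) relu(w.x')]: for unit vectors
# this is the order-1 arc-cosine kernel (sin t + (pi - t) cos t) / (2 pi)
G = np.clip(X @ X.T, -1.0, 1.0)
t = np.arccos(G)
K_exp = (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi)
alpha_krr = np.linalg.solve(K_exp + lam * np.eye(n), y)

# the two Gram matrices, and hence the two fitted predictors, should be close
print("||K_rf - K_exp||_op:", np.linalg.norm(K_rf - K_exp, 2))
print("in-sample prediction gap:", np.linalg.norm(K_rf @ alpha_rf - K_exp @ alpha_krr))
```

Increasing N tightens the concentration of K_rf around K_exp, which is the mechanism behind the RFRR-to-KRR reduction described in the abstract.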