It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. Our main results characterize a hierarchical property, the merged-staircase property, that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new "dimension-free" dynamics approximation result that applies to functions defined on a latent low-dimensional space, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.
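For intuition, here is a small self-contained sketch of the merged-staircase property for functions written as sums of monomials over subsets of coordinates: the monomials must admit an ordering in which each one introduces at most one coordinate not already seen. The brute-force checker below and its name `satisfies_msp` are my own illustration, not code from the paper.

```python
from itertools import permutations

def satisfies_msp(monomials):
    """Return True if the monomials (sets of coordinate indices) can be
    ordered so that each set introduces at most one new coordinate."""
    for order in permutations(monomials):
        seen, ok = set(), True
        for S in order:
            if len(set(S) - seen) > 1:
                ok = False
                break
            seen |= set(S)
        if ok:
            return True
    return False

# x1 + x1*x2 + x1*x2*x3: each monomial adds one new coordinate.
print(satisfies_msp([{1}, {1, 2}, {1, 2, 3}]))  # True
# An isolated degree-2 monomial x1*x2 has no staircase leading into it.
print(satisfies_msp([{1, 2}]))                  # False
```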
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
We focus on the task of learning a single index model with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning the hidden direction is governed by the information exponent $k$ of the link function. Ben Arous et al. showed that $d^{k-1}$ samples suffice for learning and that this is tight for online SGD. However, the CSQ lower bound for gradient-based methods only shows that $d^{k/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns the hidden direction with $d^{k/2}$ samples, matching the lower bound. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.
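To make the smoothing idea concrete, here is a minimal sketch (my own illustration, not the paper's exact algorithm): before each online SGD step on the sphere, average gradients over random perturbations of the current direction, which boosts the weak signal when the information exponent is large. The link function, smoothing radius, and step size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam, lr, n_smooth = 32, 2.0, 0.05, 8

w_star = np.zeros(d); w_star[0] = 1.0       # hidden direction
link = lambda t: t**3 - 3*t                 # He_3 link: information exponent k = 3

def grad_sample(w, x, y):
    """Gradient in w of the squared loss 0.5*(link(<w,x>) - y)**2."""
    t = w @ x
    return (link(t) - y) * (3*t**2 - 3) * x

w = rng.standard_normal(d); w /= np.linalg.norm(w)
for step in range(20000):
    x = rng.standard_normal(d)              # fresh Gaussian sample: online SGD
    y = link(w_star @ x)
    # Smoothed gradient: average over random perturbations of w
    # (ignoring the normalization Jacobian for simplicity).
    g = np.zeros(d)
    for _ in range(n_smooth):
        v = w + lam * rng.standard_normal(d)
        v /= np.linalg.norm(v)
        g += grad_sample(v, x, y) / n_smooth
    g -= (g @ w) * w                        # project onto the tangent space
    w -= lr * g
    w /= np.linalg.norm(w)

print("alignment |<w, w*>| =", abs(w @ w_star))
```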
- Award ID(s): 1845171
- PAR ID: 10530894
- Publisher / Repository: NeurIPS 2023
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Few neural architectures lend themselves to provable learning with gradient-based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well understood, whereby the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning. (A toy sketch of the power-sum feature map appears after this list.)
- Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples, due to their generalization performance and intrinsic computational advantage. However, the fact that the stochastic gradient is a biased estimator of the full gradient with correlated samples has left SGD's behavior in correlated settings poorly understood and has hindered its use in such cases. In this paper, we focus on the Gaussian process (GP) and take a step towards breaking this barrier by proving that minibatch SGD converges to a critical point of the full loss function and recovers model hyperparameters with rate $O(1/K)$ up to a statistical error term depending on the minibatch size. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD generalizes better than state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs. (A sketch of the minibatch update appears after this list.)
- The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to a ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our numerical experiments, we observe a similar acceleration for ill-conditioned matrix completion under the 1-bit cross-entropy loss, as well as pairwise losses such as the Bayesian Personalized Ranking (BPR) loss. (A toy sketch of the preconditioned update appears after this list.)
- Supra-permafrost submarine groundwater discharge (SGD) in the Arctic is potentially important for coastal biogeochemistry and will likely increase over the coming decades owing to climate change. Despite this, land-to-ocean material fluxes via SGD in Arctic environments have seldom been quantified. This study used radium (Ra) isotopes to quantify SGD fluxes to an Arctic coastal lagoon (Simpson Lagoon, Alaska) during five sampling periods between 2021 and 2023. Using a Ra mass balance model, we found that the SGD water flux was substantial and dependent on environmental conditions. No measurable SGD was detected during the spring sampling period (June 2022), when the lagoon was partially ice-covered. During ice-free periods, the main driver of SGD in this location is wind-driven lagoon water level changes, not tides, which control surface water recirculation through sediments along the lagoon boundary. A combination of wind strength and direction led to low SGD fluxes in July 2022, with an SGD flux of (6 ± 3) × 10⁶ m³ d⁻¹; moderate fluxes in August 2021 and July 2023, with an average flux of (17 ± 9) × 10⁶ m³ d⁻¹; and high fluxes in October 2022, at (79 ± 16) × 10⁶ m³ d⁻¹. This work demonstrates how soil and environmental conditions in the Arctic impact Ra mobilization, laying a foundation for future SGD studies in the Arctic and shedding light on the major processes driving Ra fluxes in this important environment. (A toy mass-balance calculation appears after this list.)
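The symmetric single-index item above represents the planted direction in the feature space of power-sum polynomials. A minimal sketch of that feature map (the helper name is my own, not from the paper):

```python
import numpy as np

def power_sum_features(x, max_deg):
    """Power-sum features p_k(x) = sum_i x_i**k for k = 1..max_deg.
    These are symmetric (permutation-invariant) in the coordinates of x."""
    return np.array([np.sum(x**k) for k in range(1, max_deg + 1)])

x = np.array([0.5, -1.0, 2.0])
print(power_sum_features(x, 3))  # [p_1, p_2, p_3] = [1.5, 5.25, 7.125]
```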
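For the Gaussian process item, a hedged sketch of minibatch SGD on the negative log marginal likelihood, using only the standard GP gradient identity; the RBF kernel, fixed noise level, and step size are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def nll_grad_log_ls(X, y, lengthscale, variance, noise):
    """d/d(log lengthscale) of the GP negative log marginal likelihood,
    via 0.5 * tr((K^-1 - a a^T) dK) with a = K^-1 y."""
    n = len(y)
    K = rbf_kernel(X, lengthscale, variance) + noise * np.eye(n)
    Kinv = np.linalg.inv(K)
    a = Kinv @ y
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    dK = rbf_kernel(X, lengthscale, variance) * d2 / lengthscale**2
    return 0.5 * np.trace((Kinv - np.outer(a, a)) @ dK)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)  # correlated samples

log_ls, lr, batch = 1.0, 0.01, 64                      # step size not tuned
for step in range(300):
    idx = rng.choice(len(y), size=batch, replace=False)  # draw a minibatch
    log_ls -= lr * nll_grad_log_ls(X[idx], y[idx], np.exp(log_ls), 1.0, 0.1)
print("learned lengthscale:", np.exp(log_ls))
```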
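For the matrix completion item, a toy sketch of the preconditioning idea: scale each stochastic gradient by the small r-by-r matrix $(U^\top U)^{-1}$ so the update is insensitive to the conditioning of the factor. The dimensions, step size, and refresh schedule below are illustrative guesses, not the paper's algorithm verbatim:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 100, 3
U_true = rng.standard_normal((d, r)) @ np.diag([3.0, 1.0, 0.3])  # ill-conditioned factor
M = U_true @ U_true.T                                            # symmetric ground truth

U = 0.1 * rng.standard_normal((d, r))
lr = 0.01                                    # step size not tuned
P = np.linalg.inv(U.T @ U)                   # r x r preconditioner, cheap since r << d
for step in range(100000):
    i, j = rng.integers(d), rng.integers(d)  # one observed entry per step
    res = U[i] @ U[j] - M[i, j]
    gi, gj = res * U[j], res * U[i]
    U[i] -= lr * gi @ P                      # gradient scaled by (U^T U)^{-1}
    U[j] -= lr * gj @ P
    if step % 1000 == 0:
        P = np.linalg.inv(U.T @ U)           # refresh the preconditioner occasionally
print("RMSE:", np.sqrt(np.mean((U @ U.T - M)**2)))
```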
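For the Arctic lagoon item, a generic Ra mass balance reduces at steady state to simple bookkeeping: SGD must resupply the excess Ra that the lagoon exports. Every number below is a hypothetical placeholder chosen for illustration, not a value from the study:

```python
# Hypothetical inputs (illustrative only; not values from the study).
lagoon_volume  = 5e8     # m^3
ra_lagoon      = 150.0   # dpm/m^3, Ra activity in lagoon water
ra_ocean       = 50.0    # dpm/m^3, offshore endmember activity
ra_groundwater = 2000.0  # dpm/m^3, SGD endmember activity
residence_time = 10.0    # days

# Steady state: the excess Ra exported per day must be resupplied by SGD.
excess_export = (ra_lagoon - ra_ocean) * lagoon_volume / residence_time  # dpm/day
sgd_flux = excess_export / ra_groundwater                                # m^3/day
print(f"SGD water flux ~ {sgd_flux:.2e} m^3/day")
```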