Search for: All records

Creators/Authors contains: "Gunasekar, Suriya"

« Prev Next »

Total Resources

9

Resource Type
Conference Paper

7

Conference Proceeding

0

Dataset

0

Journal Article

2

Workshop Report

0

Availability
Full Text / Resource Available

9

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

Gunasekar, Suriya ; Woodworth, Blake ; Srebro, Nathan ( January 2021 , Proceedings of Machine Learning Research)
null (Ed.)
We present a direct (primal only) derivation of Mirror Descent as a “partial” discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential function. We contrast this discretization to Natural Gradient Descent, which is obtained by a “full” forward Euler discretization. This view helps shed light on the relationship between the methods and allows generalizing Mirror Descent to any Riemannian geometry in Rd, even when the metric tensor is not a Hessian, and thus there is no “dual.”
more » « less
Full Text Available
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

Moroshko, Edward ; Gunasekar, Suriya ; Woodworth, Blake ; Lee, Jason D ; Srebro, Nathan ; Soudry, Daniel ( July 2020 , Advances in neural information processing systems)
null (Ed.)
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This is the simplest model displaying a transition between "kernel" and non-kernel ("rich" or "active") regimes. We show how the transition is controlled by the relationship between the initialization scale and how accurately we minimize the training loss. Our results indicate that some limit behaviors of gradient descent only kick in at ridiculous training accuracies (well beyond 10−100). Moreover, the implicit bias at reasonable initialization scales and training accuracies is more complex and not captured by these limits.
more » « less
Full Text Available
Kernel and Rich Regimes in Overparametrized Models

Woodworth, Blake ; Gunasekar, Suriya ; Lee, Jason D ; Moroshko, Edward ; Savarese, Pedro ; Golan, Itay ; Soudry, Daniel ; Srebro, Nathan ( February 2020 , Conference on Learning Theory (COLT))

A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e. when during training the network behaves as a kernelized linear predictor, and thus, training with gradient descent has the effect of finding the corresponding minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized networks can induce rich implicit biases that are not RKHS norms. Building on an observation by \citet{chizat2018note}, we show how the \textbf{\textit{scale of the initialization}} controls the transition between the “kernel” (aka lazy) and “rich” (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a family of simple depth-D linear networks that exhibit an interesting and meaningful transition between the kernel and rich regimes, and highlight an interesting role for the \emph{width} of the models. We further demonstrate this transition empirically for matrix factorization and multilayer non-linear networks.
more » « less
Full Text Available
Kernel and Rich Regimes in Overparametrized Models

Woodworth, Blake ; Gunasekar, Suriya ; Lee, Jason D. ; Moroshko, Edward ; Savarese, Pedro ; Golan, Itay ; Soudry, Daniel ; Srebro, Nati ( January 2020 , COLT 2020)

Full Text Available
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

Nacson, MS ; Gunasekar, Suriya ; Lee, Jason ; Srebro, Nathan ; Soudry, Daniel ( January 2019 , In International Conference on Machine Learning)

Full Text Available
Convergence of Gradient Descent on Separable Data

Nacson, MS ; Lee, Jason ; Gunasekar, Suriya ; Savarese, Pedro ; Srebro, Nathan ; Soudry, Daniel ( January 2019 , The 22nd International Conference on Artificial Intelligence and Statistics)

Full Text Available
Characterizing Implicit Bias in Terms of Optimization Geometry

Gunasekar, Suriya ; Lee, Jason ; Soudry, Daniel ; Srebro, Nathan ( January 2018 , In International Conference on Machine Learning)

Full Text Available
Implicit bias of gradient descent on linear convolutional networks

Gunasekar, Suriya ; Lee, Jason ; Soudry, Daniel ; Srebro, Nathan ( January 2018 , In Advances in Neural Information Processing Systems)

Full Text Available
Implicit Regularization in Matrix Factorization

Gunasekar, Suriya ; Woodworth, Blake ; Bhojanapalli, Srinadh ; Neyshabur, Behnam ; Srebro, Nathan ( May 2017 , arXiv.org)

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix X with gradient descent on a factorization of X. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
more » « less
Full Text Available