Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK) [27]. Under the squared loss, an infinite-width NN trained by gradient descent with an infinitesimally small learning rate is equivalent to kernel regression with the NTK [4]. However, this equivalence is currently known only for ridge regression [6]; the equivalence between NNs and other kernel machines (KMs), e.g. the support vector machine (SVM), remains unknown. In this work, we therefore propose to establish the equivalence between NNs and SVMs, and specifically between the infinite-width NN trained with the soft-margin loss and the standard soft-margin SVM with the NTK trained by subgradient descent. Our main theoretical results include establishing the equivalence between NNs and a broad family of L2-regularized KMs with finite-width bounds, which cannot be handled by prior work, and showing that every finite-width NN trained with such regularized loss functions is approximately a KM. Furthermore, we demonstrate that our theory enables three practical applications: (i) a non-vacuous generalization bound for the NN via the corresponding KM; (ii) a nontrivial robustness certificate for the infinite-width NN (whereas existing robustness verification methods would provide vacuous bounds); and (iii) intrinsically more robust infinite-width NNs than those obtained from previous kernel regression.
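As a concrete illustration of the NN-to-kernel correspondence described above, the following is a minimal sketch in JAX that forms the empirical NTK of a small network from parameter Jacobians and uses it for kernel ridge regression. The network, width, data, and ridge term are placeholder assumptions for illustration, not quantities from the paper.

```python
# Minimal, illustrative sketch (not from the paper): a wide network trained by
# gradient descent with a small learning rate behaves like kernel (ridge)
# regression with its NTK. All names and hyperparameters are placeholders.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_mlp(key, d_in, width):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (d_in, width)) / jnp.sqrt(d_in),
        "W2": jax.random.normal(k2, (width, 1)) / jnp.sqrt(width),
    }

def mlp(params, x):
    # Scalar output of a one-hidden-layer tanh network.
    return (jnp.tanh(x @ params["W1"]) @ params["W2"]).squeeze(-1)

def empirical_ntk(params, x1, x2):
    # NTK(x, x') = <df(x)/dtheta, df(x')/dtheta>, built from parameter Jacobians.
    flat, unravel = ravel_pytree(params)
    f = lambda p, x: mlp(unravel(p), x)
    j1 = jax.jacobian(f)(flat, x1)   # shape (n1, n_params)
    j2 = jax.jacobian(f)(flat, x2)   # shape (n2, n_params)
    return j1 @ j2.T

key, k_data, k_test = jax.random.split(jax.random.PRNGKey(0), 3)
x_train = jax.random.normal(k_data, (20, 3))
y_train = jnp.sin(x_train[:, 0])
params = init_mlp(key, d_in=3, width=512)

# Kernel ridge regression with the empirical NTK at initialization.
lam = 1e-3
K = empirical_ntk(params, x_train, x_train)
alpha = jnp.linalg.solve(K + lam * jnp.eye(len(x_train)), y_train)
x_test = jax.random.normal(k_test, (5, 3))
ntk_predictions = empirical_ntk(params, x_test, x_train) @ alpha
print(ntk_predictions)
```

In the infinite-width regime the empirical NTK concentrates around a fixed kernel, which is what makes this kernel-regression view of training possible.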
Metric flows with neural networks
Abstract
We develop a general theory of flows in the space of Riemannian metrics induced by neural network (NN) gradient descent. This is motivated in part by recent advances in approximating Calabi–Yau metrics with NNs and is enabled by recent advances in understanding flows in the space of NNs. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel (NTK), a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman’s formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We demonstrate that such fixed kernel regimes lead to poor learning of numerical Calabi–Yau metrics, as is expected since the associated NNs do not learn features. Conversely, we demonstrate that well-learned numerical metrics at finite width exhibit an evolving metric-NTK, associated with feature learning. Our theory of NN metric flows therefore explains why NNs are better at learning Calabi–Yau metrics than fixed kernel methods, such as the Ricci flow.
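To make the abstract's statement concrete, the display below is a schematic rendering of the kind of NTK-governed flow it describes, not the paper's exact equations; the indices, measure, and normalizations are illustrative. If the metric components are outputs of a network with parameters $\theta$ trained by gradient flow on a loss functional $\mathcal{L}[g]$, the chain rule gives a flow driven by a metric NTK:

```latex
% Schematic NTK-governed metric flow (illustrative form only; the paper's
% conventions, indices, and normalizations may differ).
\[
  \partial_t\, g_{ij}(x)
    \;=\; -\,\eta \int \mathrm{d}y \;
      \Theta_{ij\,kl}(x, y; t)\,
      \frac{\delta \mathcal{L}[g]}{\delta g_{kl}(y)},
  \qquad
  \Theta_{ij\,kl}(x, y; t)
    \;=\; \sum_{a}
      \frac{\partial g_{ij}(x)}{\partial \theta_a}\,
      \frac{\partial g_{kl}(y)}{\partial \theta_a}.
\]
```

In a fixed-kernel (infinite-width) regime $\Theta$ remains at its initialization value, whereas at finite width it evolves during training, which is the feature-learning behaviour the abstract associates with well-learned Calabi–Yau metrics.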
- PAR ID: 10555692
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: Machine Learning: Science and Technology
- Volume: 5
- Issue: 4
- ISSN: 2632-2153
- Page Range / eLocation ID: 045020
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We study the collapsing of Calabi–Yau metrics and of Kähler–Ricci flows on fiber spaces where the base is smooth. We identify the collapsed Gromov–Hausdorff limit of the Kähler–Ricci flow when the divisorial part of the discriminant locus has simple normal crossings. In either setting, we also obtain an explicit bound for the real codimension-2 Hausdorff measure of the Cheeger–Colding singular set and identify a sufficient condition from birational geometry to understand the metric behavior of the limiting metric on the base.
- We study the degenerations of asymptotically conical Ricci-flat Kähler metrics as the Kähler class degenerates to a semi-positive class. We show that under appropriate assumptions, the Ricci-flat Kähler metrics converge to an incomplete smooth Ricci-flat Kähler metric away from a compact subvariety. As a consequence, we construct singular Calabi–Yau metrics with asymptotically conical behaviour at infinity on certain quasi-projective varieties, and we show that the metric geometry of these singular metrics is homeomorphic to the topology of the singular variety. Finally, we apply our results to study several classes of examples of geometric transitions between Calabi–Yau manifolds.
- We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training (a schematic form of these kernel order parameters is sketched after this list). These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tangent kernel (NTK), and consequently, output predictions. We show that the field theory derivation recovers the recursive stochastic process of infinite-width feature learning networks obtained by Yang and Hu with tensor programs. For deep linear networks, these kernels satisfy a set of algebraic matrix equations. For nonlinear networks, we provide an alternating sampling procedure to self-consistently solve for the kernel order parameters. We provide comparisons of the self-consistent solution to various approximation schemes including the static NTK approximation, gradient independence assumption, and leading order perturbation theory, showing that each of these approximations can break down in regimes where general self-consistent solutions still provide an accurate description. Lastly, we provide experiments in more realistic settings which demonstrate that the loss and kernel dynamics of convolutional neural networks at fixed feature learning strength are preserved across different widths on an image classification task.
- A Riemannian cone $(C, g_C)$ is by definition a warped product $C = \mathbb{R}^+ \times L$ with metric $g_C = dr^2 \oplus r^2 g_L$, where $(L, g_L)$ is a compact Riemannian manifold without boundary. We say that $C$ is a Calabi–Yau cone if $g_C$ is a Ricci-flat Kähler metric and if $C$ admits a $g_C$-parallel holomorphic volume form; this is equivalent to the cross-section $(L, g_L)$ being a Sasaki–Einstein manifold. In this paper, we give a complete classification of all smooth complete Calabi–Yau manifolds asymptotic to some given Calabi–Yau cone at a polynomial rate at infinity. As a special case, this includes a proof of Kronheimer's classification of ALE hyper-Kähler 4-manifolds without twistor theory.
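The kernel order parameters referred to in the feature-learning item above can be written schematically as follows; this display is an illustrative rendering with simplified normalization and indexing, not the cited paper's exact definitions.

```latex
% Schematic kernel order parameters for a width-N network (illustrative only).
\[
  \Phi^{\ell}(x, x'; t, s)
    \;=\; \frac{1}{N}\, \phi\!\bigl(h^{\ell}(x; t)\bigr) \cdot \phi\!\bigl(h^{\ell}(x'; s)\bigr),
  \qquad
  G^{\ell}(x, x'; t, s)
    \;=\; \frac{1}{N}\, g^{\ell}(x; t) \cdot g^{\ell}(x'; s),
\]
\[
  K_{\mathrm{NTK}}(x, x'; t)
    \;\approx\; \sum_{\ell} G^{\ell+1}(x, x'; t, t)\, \Phi^{\ell}(x, x'; t, t),
\]
```

where $h^{\ell}$ are preactivations, $\phi$ is the nonlinearity, and $g^{\ell}$ are backpropagated gradient signals. In fixed-kernel (static NTK) regimes these kernels are constant in time, while feature learning corresponds to their evolution during training.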