Bounds on the smallest eigenvalue of the neural tangent kernel (NTK) are a key ingredient in the analysis of neural network optimization and memorization. How- ever, existing results require distributional assumptions on the data and are limited to a high-dimensional setting, where the input dimension d0 scales at least log- arithmically in the number of samples n. In this work we remove both of these requirements and instead provide bounds in terms of a measure of distance between data points: notably these bounds hold with high probability even when d0 is held constant versus n. We prove our results through a novel application of the hemisphere transform.
more »
« less
Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
Bounds on the smallest eigenvalue of the neural tangent kernel (NTK) are a key ingredient in the analysis of neural network optimization and memorization. How- ever, existing results require distributional assumptions on the data and are limited to a high-dimensional setting, where the input dimension d0 scales at least log- arithmically in the number of samples n. In this work we remove both of these requirements and instead provide bounds in terms of a measure of distance between data points: notably these bounds hold with high probability even when d0 is held constant versus n. We prove our results through a novel application of the hemisphere transform.
more »
« less
- PAR ID:
- 10591835
- Publisher / Repository:
- 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
A<sc>bstract</sc> The D0-brane/Banks-Fischler-Shenker-Susskind matrix theory is a strongly coupled quantum system with an interesting gravity dual. We develop a scheme to derive bootstrap bounds on simple correlators in the matrix theory at infiniteNat zero energy by imposing the supercharge equations of motion. By exploiting SO(9) symmetry, we are able to consider single-trace operators involving words of length up to 9 using very modest computational resources. We interpret our initial results as strong evidence that the bootstrap method can efficiently access physics in the strongly coupled, infiniteNregime.more » « less
-
Resonant two-photon ionization (R2PI) spectroscopy has been used to measure the bond dissociation energies (BDEs) of the diatomic transition metal nitrides ScN, TiN, YN, MoN, RuN, RhN, HfN, OsN, and IrN. Of these, the BDEs of only TiN and HfN had been previously measured. Due to the many ways electrons can be distributed among the d orbitals, these molecules possess an extremely high density of electronic states near the ground separated atom limit. Spin–orbit and nonadiabatic interactions couple these states quite effectively, so that the molecules readily find a path to dissociation when excited above the ground separated atom limit. The result is a sharp drop in ion signal in the R2PI spectrum when the molecule is excited above this limit, allowing the BDE to be readily measured. Using this method, the values D0(ScN) = 3.905(29) eV, D0(TiN) = 5.000(19) eV, D0(YN) = 4.125(24) eV, D0(MoN) = 5.220(4) eV, D0(RuN) = 4.905(3) eV, D0(RhN) = 3.659(32) eV, D0(HfN) = 5.374(4) eV, D0(OsN) = 5.732(3) eV, and D0(IrN) = 5.115(4) eV are obtained. To support the experimental findings, ab initio coupled-cluster calculations extrapolated to the complete basis set limit (CBS) were performed. With a semiempirical correction for spin–orbit effects, these coupled-cluster single double triple-CBS calculations give a mean absolute deviation from the experimental BDE values of 0.20 eV. A discussion of the periodic trends, summaries of previous work, and comparisons to isoelectronic species is also provided.more » « less
-
Megow, Nicole; Smith, Adam (Ed.)We study the mixing time of the single-site update Markov chain, known as the Glauber dynamics, for generating a random independent set of a tree. Our focus is obtaining optimal convergence results for arbitrary trees. We consider the more general problem of sampling from the Gibbs distribution in the hard-core model where independent sets are weighted by a parameter λ > 0; the special case λ = 1 corresponds to the uniform distribution over all independent sets. Previous work of Martinelli, Sinclair and Weitz (2004) obtained optimal mixing time bounds for the complete Δ-regular tree for all λ. However, Restrepo et al. (2014) showed that for sufficiently large λ there are bounded-degree trees where optimal mixing does not hold. Recent work of Eppstein and Frishberg (2022) proved a polynomial mixing time bound for the Glauber dynamics for arbitrary trees, and more generally for graphs of bounded tree-width. We establish an optimal bound on the relaxation time (i.e., inverse spectral gap) of O(n) for the Glauber dynamics for unweighted independent sets on arbitrary trees. Moreover, for λ ≤ .44 we prove an optimal mixing time bound of O(n log n). We stress that our results hold for arbitrary trees and there is no dependence on the maximum degree Δ. Interestingly, our results extend (far) beyond the uniqueness threshold which is on the order λ = O(1/Δ). Our proof approach is inspired by recent work on spectral independence. In fact, we prove that spectral independence holds with a constant independent of the maximum degree for any tree, but this does not imply mixing for general trees as the optimal mixing results of Chen, Liu, and Vigoda (2021) only apply for bounded degree graphs. We instead utilize the combinatorial nature of independent sets to directly prove approximate tensorization of variance/entropy via a non-trivial inductive proof.more » « less
-
We consider the problem of sampling and approximately counting an arbitrary given motif H in a graph G, where access to G is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of H into a collection of odd cycles and stars, denoted D^*(H) = {O_{k₁},...,O_{k_q}, S_{p₁},...,S_{p_𝓁}}. These algorithms were shown to be optimal for the case where H is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling arbitrary motifs which, up to poly(log n) factors, is always at least as good, and for most graphs G is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the p-th moment of the degree distribution. Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs H with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.more » « less
An official website of the United States government

