We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form where is a degree polynomial and is a degree polynomial. This function class generalizes the single-index model, which corresponds to , and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree polynomials , a three-layer neural network trained via layerwise gradient descent on the square loss learns the target up to vanishing test error in samples and polynomial time. This is a strict improvement over kernel methods, which require samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of being a quadratic. When is indeed a quadratic, we achieve the information-theoretically optimal sample complexity , which is an improvement over prior work (Nichani et al., 2023) requiring a sample size of . Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature with samples. This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
more »
« less
This content will become publicly available on July 30, 2026
A solution to the degree-d twisted rabbit problem
We solve generalizations of Hubbard's twisted rabbit problem for analogues of the rabbit polynomial of degree $$d\geq 2$$. The twisted rabbit problem asks: when a certain quadratic polynomial, called the Douady Rabbit polynomial, is twisted by a cyclic subgroup of a mapping class group, to which polynomial is the resulting map equivalent (as a function of the power of the generator)? The solution to the original quadratic twisted rabbit problem, given by Bartholdi--Nekrashevych, depended on the 4-adic expansion of the power of the mapping class by which we twist. In this paper, we provide a solution to a degree-$$d$$ generalization that depends on the $d^2$-adic expansion of the power of the mapping class element by which we twist.
more »
« less
- Award ID(s):
- 2108572
- PAR ID:
- 10621442
- Publisher / Repository:
- Cambridge
- Date Published:
- Journal Name:
- Ergodic theory dynamical systems
- ISSN:
- 0143-3857
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Consider a system of m polynomial equations {pi(x)=bi}i≤m of degree D≥2 in n-dimensional variable x∈ℝn such that each coefficient of every pi and bis are chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest m -- the algorithmic threshold -- for which efficient algorithms can find refutations (i.e. certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing and certifying pseudo-randomness of Goldreich's candidate generators and generalizations. We show that for every d∈ℕ, the (n+m)O(d)-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever m≥O(n)⋅(nd)D−1. We prove a lower bound in the restricted low-degree polynomial model of computation which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all d. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-4 sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at m≳O˜(n)⋅n(1−δ)(D−1) for 2nδ-time algorithms for all δ.more » « less
-
We examine the Kauffman bracket expansion of the generalized crossing $$\Delta_n$$, a half-twist on $$n$$ parallel strands, as an element of the Temperley-Lieb algebra with coefficients in $$\mathbb{Z}[A,A^{-1}]$$. In particular, we determine the minimum and maximum degrees of all possible coefficients appearing in this expansion. Our main theorem shows that the maximum such degree is quadratic in $$n$$, while the minimum such degree is linear. We also include an appendix with explicit expansions for $$n$$ at most six.more » « less
-
Statistical and Computational Complexities of BFGS Quasi-Newton Method for Generalized Linear ModelsThe gradient descent (GD) method has been used widely to solve parameter estimation in generalized linear models (GLMs), a generalization of linear models when the link function can be non-linear. In GLMs with a polynomial link function, it has been shown that in the high signal-to-noise ratio (SNR) regime, due to the problem's strong convexity and smoothness, GD converges linearly and reaches the final desired accuracy in a logarithmic number of iterations. In contrast, in the low SNR setting, where the problem becomes locally convex, GD converges at a slower rate and requires a polynomial number of iterations to reach the desired accuracy. Even though Newton's method can be used to resolve the flat curvature of the loss functions in the low SNR case, its computational cost is prohibitive in high-dimensional settings as it is $$\mathcal{O}(d^3)$$, where $$d$$ the is the problem dimension. To address the shortcomings of GD and Newton's method, we propose the use of the BFGS quasi-Newton method to solve parameter estimation of the GLMs, which has a per iteration cost of $$\mathcal{O}(d^2)$$. When the SNR is low, for GLMs with a polynomial link function of degree $$p$$, we demonstrate that the iterates of BFGS converge linearly to the optimal solution of the population least-square loss function, and the contraction coefficient of the BFGS algorithm is comparable to that of Newton's method. Moreover, the contraction factor of the linear rate is independent of problem parameters and only depends on the degree of the link function $$p$$. Also, for the empirical loss with $$n$$ samples, we prove that in the low SNR setting of GLMs with a polynomial link function of degree $$p$$, the iterates of BFGS reach a final statistical radius of $$\mathcal{O}((d/n)^{\frac{1}{2p+2}})$$ after at most $$\log(n/d)$$ iterations. This complexity is significantly less than the number required for GD, which scales polynomially with $(n/d)$.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)We consider hypergraph network design problems where the goal is to construct a hypergraph that satisfies certain connectivity requirements. For graph network design problems where the goal is to construct a graph that satisfies certain connectivity requirements, the number of edges in every feasible solution is at most quadratic in the number of vertices. In contrast, for hypergraph network design problems, we might have feasible solutions in which the number of hyperedges is exponential in the number of vertices. This presents an additional technical challenge in hypergraph network design problems compared to graph network design problems: in order to solve the problem in polynomial time, we first need to show that there exists a feasible solution in which the number of hyperedges is polynomial in the input size. The central theme of this work is to overcome this additional technical challenge for certain hypergraph network design problems. We show that these hypergraph network design problems admit solutions in which the number of hyperedges is polynomial in the number of vertices and moreover, can be solved in strongly polynomial time. Our work improves on the previous fastest pseudo-polynomial run-time for these problems. As applications of our results, we derive the first strongly polynomial time algorithms for (i) degree-specified hypergraph connectivity augmentation using hyperedges and (ii) degree-specified hypergraph node-to-area connectivity augmentation using hyperedges.more » « less
An official website of the United States government
