skip to main content

This content will become publicly available on January 1, 2023

Title: Gardner formula for Ising perceptron models at small densities
We consider the Ising perceptron model with N spins and M = N*alpha patterns, with a general activation function U that is bounded above. For U bounded away from zero, or U a one-sided threshold function, it was shown by Talagrand (2000, 2011) that for small densities alpha, the free energy of the model converges in the large-N limit to the replica symmetric formula conjectured in the physics literature (Krauth–Mezard 1989, see also Gardner–Derrida 1988). We give a new proof of this result, which covers the more general class of all functions U that are bounded above and satisfy a certain variance bound. The proof uses the (first and second) moment method conditional on the approximate message passing iterates of the model. In order to deduce our main theorem, we also prove a new concentration result for the perceptron model in the case where U is not bounded away from zero.
Authors:
; ; ;
Award ID(s):
2031883
Publication Date:
NSF-PAR ID:
10344272
Journal Name:
Conference on Learning Theory
Sponsoring Org:
National Science Foundation
More Like this
  1. Extremal combinatorics often deals with problems of maximizing a specific quantity related to substructures in large discrete structures. The first question of this kind that comes to one's mind is perhaps determining the maximum possible number of induced subgraphs isomorphic to a fixed graph $H$ in an $n$-vertex graph. The asymptotic behavior of this number is captured by the limit of the ratio of the maximum number of induced subgraphs isomorphic to $H$ and the number of all subgraphs with the same number vertices as $H$; this quantity is known as the _inducibility_ of $H$. More generally, one can define the inducibility of a family of graphs in the analogous way.Among all graphs with $k$ vertices, the only two graphs with inducibility equal to one are the empty graph and the complete graph. However, how large can the inducibility of other graphs with $k$ vertices be? Fix $k$, consider a graph with $n$ vertices join each pair of vertices independently by an edge with probability $\binom{k}{2}^{-1}$. The expected number of $k$-vertex induced subgraphs with exactly one edge is $e^{-1}+o(1)$. So, the inducibility of large graphs with a single edge is at least $e^{-1}+o(1)$. This article establishes that this bound ismore »the best possible in the following stronger form, which proves a conjecture of Alon, Hefetz, Krivelevich and Tyomkyn: the inducibility of the family of $k$-vertex graphs with exactly $l$ edges where $0« less
  2. Real-time decision making in IoT applications relies upon space-efficient evaluation of queries over streaming data. To model the uncertainty in the classification of data being processed, we consider the model of probabilistic strings --- sequences of discrete probability distributions over a finite set of events, and initiate the study of space complexity of streaming computation for different classes of queries over such probabilistic strings. We first consider the problem of computing the probability that a word, sampled from the distribution defined by the probabilistic string read so far, is accepted by a given deterministic finite automaton. We show that this regular pattern matching problem can be solved using space that is only poly-logarithmic in the string length (and polynomial in the size of the DFA) if we are allowed a multiplicative approximation error. Then we show how to generalize this result to quantitative queries specified by additive cost register automata --- these are automata that map strings to numerical values using finite control and registers that get updated using linear transformations. Finally, we consider the case when updates in such an automaton involve tests, and in particular, when there is a counter variable that can be either incremented or decrementedmore »but decrements only apply when the counter value is non-zero. In this case, the desired answer depends on the probability distribution over the set of possible counter values that can range from 0 to n for a string of length n. Under a mild assumption, namely probabilities of the individual events are bounded away from 0 and 1, we show that there is an algorithm that can compute all n entries of this probability distribution vector to within additive 1/poly(n) error using space that is only Õ(n). In establishing these results, we introduce several new technical ideas that may prove useful for designing space-efficient algorithms for other query models over probabilistic strings.« less
  3. We establish a relation between the "large r" asymptotics of the Turaev-Viro invariants $TV_r $and the Gromov norm of 3-manifolds. We show that for any orientable, compact 3-manifold $M$, with (possibly empty) toroidal boundary, $log|TVr(M)|$ is bounded above by a function linear in $r$ and whose slope is a positive universal constant times the Gromov norm of $M$. The proof combines TQFT techniques, geometric decomposition theory of 3-manifolds and analytical estimates of $6j$-symbols. We obtain topological criteria that can be used to check whether the growth is actually exponential; that is one has $log|TVr(M)|\geq B r$, for some $B>0$. We use these criteria to construct infinite families of hyperbolic 3-manifolds whose $SO(3)$- Turaev-Viro invariants grow exponentially. These constructions are essential for the results of article [3] where we make progress on a conjecture of Andersen, Masbaum and Ueno about the geometric properties of surface mapping class groups detected by the quantum representations. We also study the behavior of the Turaev-Viro invariants under cutting and gluing of 3-manifolds along tori. In particular, we show that, like the Gromov norm, the values of the invariants do not increase under Dehn filling and we give applications of this result on the question ofmore »the extent to which relations between the invariants TVr and hyperbolic volume are preserved under Dehn filling. Finally we give constructions of 3-manifolds, both with zero and non-zero Gromov norm, for which the Turaev-Viro invariants determine the Gromov norm.« less
  4. We investigate the approximability of the following optimization problem. The input is an n× n matrix A=(Aij) with real entries and an origin-symmetric convex body K⊂ ℝn that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑i=1n∑j=1n Aij xixj=⟨ x,Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these special cases using case-specific reasoning, here we develop a general methodology for treatment of the approximability and inapproximability aspects of these questions. The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polytime constant-approximability necessitates that K has type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle. However, the link that we establish between optimization and geometry of Banach spaces allows usmore »to devise a generic algorithmic approach to the above problem. We associate to each convex body a new (higher dimensional) auxiliary set that is not convex, but is approximately convex when K has a bounded type-2 constant. If our auxiliary set has an approximate separation oracle, then we design an approximation algorithm for the original quadratic optimization problem, using an approximate version of the ellipsoid method. Even though our hardness result implies that such an oracle does not exist in general, this new question can be solved in specific cases of interest by implementing a range of classical tools from functional analysis, most notably the deep factorization theory of linear operators. Beyond encompassing the scenarios in the literature for which constant-factor approximation algorithms were found, our generic framework implies that that for convex sets with bounded type-2 constant, constant factor approximability is preserved under the following basic operations: (a) Subspaces, (b) Quotients, (c) Minkowski Sums, (d) Complex Interpolation. This yields a rich family of new examples where constant factor approximations are possible, which were beyond the reach of previous methods. We also show (under commonly used complexity assumptions) that for symmetric norms and unitarily invariant matrix norms the type-2 constant nearly characterizes the approximability of quadratic maximization.« less
  5. Arikan's exciting discovery of polar codes has provided an altogether new way to efficiently achieve Shannon capacity. Given a (constant-sized) invertible matrix M, a family of polar codes can be associated with this matrix and its ability to approach capacity follows from the polarization of an associated [0,1]-bounded martingale, namely its convergence in the limit to either 0 or 1 with probability 1. Arikan showed appropriate polarization of the martingale associated with the matrix G2=(1011) to get capacity achieving codes. His analysis was later extended to all matrices M which satisfy an obvious necessary condition for polarization. While Arikan's theorem does not guarantee that the codes achieve capacity at small blocklengths, it turns out that a "strong" analysis of the polarization of the underlying martingale would lead to such constructions. Indeed for the martingale associated with G2 such a strong polarization was shown in two independent works ([Guruswami and Xia, IEEE IT '15] and [Hassani et al., IEEE IT '14]), thereby resolving a major theoretical challenge associated with the efficient attainment of Shannon capacity. In this work we extend the result above to cover martingales associated with all matrices that satisfy the necessary condition for (weak) polarization. In addition tomore »being vastly more general, our proofs of strong polarization are (in our view) also much simpler and modular. Key to our proof is a notion of local polarization that only depends on the evolution of the martingale in a single time step. Our result shows strong polarization over all prime fields and leads to efficient capacity-achieving source codes for compressing arbitrary i.i.d. sources, and capacity-achieving channel codes for arbitrary symmetric memoryless channels.« less