Title: For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability
In this paper, we study kernel ridge-less regression, including the case of interpolating solutions. We prove that maximizing the leave-one-out ([Formula: see text]) stability minimizes the expected error. We also prove that the minimum norm solution, to which gradient algorithms are known to converge, is the most stable solution. More precisely, we show that the minimum norm interpolating solution minimizes a bound on [Formula: see text] stability, which in turn is controlled by the smallest singular value, and hence the condition number, of the empirical kernel matrix. These quantities can be characterized in the asymptotic regime where both the dimension ([Formula: see text]) and cardinality ([Formula: see text]) of the data go to infinity (with [Formula: see text] as [Formula: see text]). Our results suggest that the property of [Formula: see text] stability of the learning algorithm with respect to perturbations of the training set may provide a more general framework than the classical theory of Empirical Risk Minimization (ERM). While ERM was developed to deal with the classical regime in which the architecture of the learning network is fixed and [Formula: see text], the modern regime focuses on interpolating regressors and overparameterized models, when both [Formula: see text] and [Formula: see text] go to infinity. Since the stability framework is known to be equivalent to the classical theory in the classical regime, our results here suggest that it may be interesting to extend it beyond kernel regression to other overparameterized algorithms such as deep networks.
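As a rough illustration of the quantities the abstract refers to, the sketch below (not the authors' code) builds a Gaussian kernel matrix on synthetic data, computes the minimum norm interpolating coefficients with the pseudoinverse, and reports the smallest singular value and condition number of the empirical kernel matrix. The kernel choice, the bandwidth `sigma`, the data sizes, and all function names are assumptions made only for this example.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def min_norm_interpolant(X, y, sigma=1.0):
    """Return the kernel matrix K and coefficients c = K^+ y (ridgeless regression)."""
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.pinv(K) @ y  # pseudoinverse picks the minimum norm solution of K c = y
    return K, c


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

K, c = min_norm_interpolant(X, y)

# Spectrum of the empirical kernel matrix (singular values in descending order).
singular_values = np.linalg.svd(K, compute_uv=False)
sigma_min = singular_values[-1]
cond = singular_values[0] / sigma_min

# Squared RKHS norm of the interpolant f(x) = sum_i c_i k(x, x_i).
rkhs_norm_sq = float(c @ K @ c)

# Largest training residual; close to zero when the solution interpolates.
train_residual = float(np.max(np.abs(K @ c - y)))


def predict(X_new):
    """Evaluate the interpolant at new points."""
    return gaussian_kernel(X_new, X) @ c


print(f"smallest singular value: {sigma_min:.3e}")
print(f"condition number:        {cond:.3e}")
print(f"squared RKHS norm:       {rkhs_norm_sq:.3e}")
print(f"max training residual:   {train_residual:.3e}")
```

Shrinking `sigma_min` (for example by placing two training points very close together) inflates the pseudoinverse and therefore the coefficients, which is one informal way to see why the conditioning of the kernel matrix matters for the sensitivity of the solution to changes in the training set.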
Award ID(s): 2134108
PAR ID: 10565457
Author(s) / Creator(s):
Publisher / Repository: World Scientific Publishing Company
Date Published:
Journal Name: Analysis and Applications
Volume: 21
Issue: 01
ISSN: 0219-5305
Page Range / eLocation ID: 193 to 215
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We consider the minimum norm interpolation problem in the [Formula: see text] space, aiming at constructing a sparse interpolation solution. The original problem is reformulated in the pre-dual space, thereby inducing a norm in a related finite-dimensional Euclidean space. The dual problem is then transformed into a linear programming problem, which can be solved by existing methods. With that done, the original interpolation problem is reduced by solving an elementary finite-dimensional linear algebra equation. A specific example is presented to illustrate the proposed method, in which a sparse solution in the [Formula: see text] space is compared to the dense solution in the [Formula: see text] space. This example shows that a solution of the minimum norm interpolation problem in the [Formula: see text] space is indeed sparse, while that of the minimum norm interpolation problem in the [Formula: see text] space is not.
  2. This work aims to prove a Hardy-type inequality and a trace theorem for a class of function spaces on smooth domains with a nonlocal character. Functions in these spaces are allowed to be as rough as an [Formula: see text]-function inside the domain of definition but as smooth as a [Formula: see text]-function near the boundary. This feature is captured by a norm that is characterized by a nonlocal interaction kernel defined heterogeneously with a special localization feature on the boundary. Thus, the trace theorem we obtain here can be viewed as an improvement and refinement of the classical trace theorem for fractional Sobolev spaces [Formula: see text]. Similarly, the Hardy-type inequalities we establish for functions that vanish on the boundary show that functions in this generalized space have the same decay rate to the boundary as functions in the smaller space [Formula: see text]. The results we prove extend existing results shown in the Hilbert space setting with p = 2. A Poincaré-type inequality we establish for the function space under consideration together with the new trace theorem allows formulating and proving well-posedness of a nonlinear nonlocal variational problem with conventional local boundary condition. 
  3. Let [Formula: see text] be a group acting properly and by isometries on a metric space [Formula: see text]; it follows that the quotient or orbit space [Formula: see text] is also a metric space. We study the Vietoris–Rips and Čech complexes of [Formula: see text]. Whereas (co)homology theories for metric spaces let the scale parameter of a Vietoris–Rips or Čech complex go to zero, and whereas geometric group theory requires the scale parameter to be sufficiently large, we instead consider intermediate scale parameters (neither tending to zero nor to infinity). As a particular case, we study the Vietoris–Rips and Čech thickenings of projective spaces at the first scale parameter where the homotopy type changes.
  4. A homology class [Formula: see text] of a complex flag variety [Formula: see text] is called a line degree if the moduli space [Formula: see text] of 0-pointed stable maps to X of degree d is also a flag variety [Formula: see text]. We prove a quantum equals classical formula stating that any n-pointed (equivariant, [Formula: see text]-theoretic, genus zero) Gromov–Witten invariant of line degree on X is equal to a classical intersection number computed on the flag variety [Formula: see text]. We also prove an n-pointed analogue of the Peterson comparison formula stating that these invariants coincide with Gromov–Witten invariants of the variety of complete flags [Formula: see text]. Our formulas make it straightforward to compute the big quantum [Formula: see text]-theory ring [Formula: see text] modulo the ideal [Formula: see text] generated by degrees d larger than line degrees. 
  5. For generalized Korteweg–de Vries (KdV) models with polynomial nonlinearity, we establish a local smoothing property in [Formula: see text] for [Formula: see text]. This smoothing effect persists globally, provided that the [Formula: see text] norm does not blow up in finite time. More specifically, we show that a translate of the nonlinear part of the solution gains [Formula: see text] derivatives for [Formula: see text]. Following a new simple method, which is of independent interest, we establish that, for [Formula: see text], the [Formula: see text] norm of a solution grows at most by [Formula: see text] if the [Formula: see text] norm is a priori controlled.