
Search results: all records where Creators/Authors contains "Kamath, Pritish"


  1. We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, including one using the tangent kernel, can achieve a non-trivial advantage over random guessing.
  2. We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture “natural” invariances, at least when used in its practical “linear” form, and even on very simple problems that directly follow the motivating examples for IRM. This can lead to worse generalization on new environments, even when compared to unconstrained ERM. The issue stems from a significant gap between the linear variant (as in their concrete method IRMv1) and the full non-linear IRM formulation. Additionally, even when capturing the “right” invariances, we show that it is possible for IRM to learn a sub-optimal predictor, due to the loss function not being invariant across environments. The issues arise even when measuring invariance on the population distributions, but are exacerbated by the fact that IRM is extremely fragile to sampling.
  3. We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather than exactly represent, a given hypothesis class. We show that such notions are not only sufficient for learning using linear predictors or a kernel, but, unlike the exact variants, are also necessary. Thus they are better suited for discussing limitations of linear or kernel methods.
  4. We study the search problem class PPA_q, defined as a modulo-q analog of the well-known polynomial parity argument class PPA introduced by Papadimitriou (JCSS 1994). Our first result shows that this class can be characterized in terms of PPA_p for prime p. Our main result is to establish that an explicit version of a search problem associated with the Chevalley-Warning theorem is complete for PPA_p for prime p. This problem is natural in that it does not explicitly involve circuits as part of the input. It is the first such complete problem for PPA_p when p ≥ 3. Finally, we discuss connections between the Chevalley-Warning theorem and the well-studied short integer solution problem, and survey the structural properties of PPA_q.
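The fourth abstract refers to a search problem built on the Chevalley-Warning theorem: if polynomials f_1, ..., f_k over F_p (p prime) in n variables have total degrees summing to less than n, then the number of their common zeros in F_p^n is divisible by p. Any system whose polynomials have no constant term vanishes at the all-zero vector, so divisibility by p forces at least one other common zero to exist; finding it is the kind of total search problem the abstract describes. The brute-force Python sketch below illustrates only this counting argument, assuming tiny p and n and polynomials given as Python callables. The function names are hypothetical, and this is not the paper's explicit problem or its reduction.

```python
from itertools import product

# Brute-force illustration of the Chevalley-Warning theorem over F_p (p prime).
# Polynomials are plain Python callables on integer tuples; their values are
# reduced mod p inside common_zeros. All names are hypothetical, for illustration.

def common_zeros(polys, n, p):
    """All x in F_p^n at which every polynomial in polys vanishes mod p."""
    return [x for x in product(range(p), repeat=n)
            if all(f(x) % p == 0 for f in polys)]

def chevalley_warning_count(polys, degrees, n, p):
    """Count common zeros after checking the hypothesis sum(degrees) < n.

    The caller supplies the total degrees; they are not derived from polys.
    """
    if sum(degrees) >= n:
        raise ValueError("Chevalley-Warning needs the degree sum to be < n")
    return len(common_zeros(polys, n, p))

def find_nontrivial_zero(polys, n, p):
    """Return a common zero other than the all-zero vector, or None.

    Exhaustive search over all p**n points, not an efficient algorithm.
    """
    zero = (0,) * n
    for x in common_zeros(polys, n, p):
        if x != zero:
            return x
    return None

# System A over F_3: one quadratic form in 3 variables (degree 2 < 3).
polys_a = [lambda x: x[0] ** 2 + x[1] ** 2 + x[2] ** 2]
count_a = chevalley_warning_count(polys_a, [2], 3, 3)
print(count_a, count_a % 3)                 # 9 0
print(find_nontrivial_zero(polys_a, 3, 3))  # (1, 1, 1)

# System B over F_2: x0*x1 + x2 in 3 variables (degree 2 < 3).
polys_b = [lambda x: x[0] * x[1] + x[2]]
count_b = chevalley_warning_count(polys_b, [2], 3, 2)
print(count_b, count_b % 2)                 # 4 0
print(find_nontrivial_zero(polys_b, 3, 2))  # (0, 1, 0)
```

Exhaustive enumeration takes p^n steps, so it only shows why a second solution must exist; it says nothing about the complexity results stated in the abstract.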