skip to main content


Title: Tighter PAC-Bayes Bounds Through Coin-Betting
We consider the problem of estimating the mean of a sequence of random elements f (θ, X_1) , . . . , f (θ, X_n) where f is a fixed scalar function, S = (X_1, . . . , X_n) are independent random variables, and θ is a possibly S-dependent parameter. An example of such a problem would be to estimate the generalization error of a neural network trained on n examples where f is a loss function. Classically, this problem is approached through concentration inequalities holding uniformly over compact parameter sets of functions f , for example as in Rademacher or VC type analysis. However, in many problems, such inequalities often yield numerically vacuous estimates. Recently, the PAC-Bayes framework has been proposed as a better alternative for this class of problems for its ability to often give numerically non-vacuous bounds. In this paper, we show that we can do even better: we show how to refine the proof strategy of the PAC-Bayes bounds and achieve even tighter guarantees. Our approach is based on the coin-betting framework that derives the numerically tightest known time-uniform concentration inequalities from the regret guarantees of online gambling algorithms. In particular, we derive the first PAC-Bayes concentration inequality based on the coin-betting approach that holds simultaneously for all sample sizes. We demonstrate its tightness showing that by relaxing it we obtain a number of previous results in a closed form including Bernoulli-KL and empirical Bernstein inequalities. Finally, we propose an efficient algorithm to numerically calculate confidence sequences from our bound, which often generates nonvacuous confidence bounds even with one sample, unlike the state-of-the-art PAC-Bayes bounds.  more » « less
Award ID(s):
2046096 2022446
PAR ID:
10489296
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Proceedings of Machine Learning Research
Date Published:
Journal Name:
Conference on Learning Theory
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study verifiable outsourcing of computation in a model where the verifier has black-box access to the function being computed. We introduce the problem of oracle-aided batch verification of computation (OBVC) for a function class $\mathcal{F}$. This allows a verifier to efficiently verify the correctness of any $f \in \mathcal{F}$ evaluated on a batch of $n$ instances $x_1, \ldots, x_n$, while only making $\lambda$ calls to an oracle for $f$ (along with $O(n \lambda)$ calls to low-complexity helper oracles), for security parameter $\lambda$. We obtain the following positive and negative results: - We build OBVC protocols for the class of all functions that admit {\em random-self-reductions}. Some of our protocols rely on homomorphic encryption schemes. - We show that there cannot exist OBVC schemes for the class of all functions mapping $\lambda$-bit inputs to $\lambda$-bit outputs, for any $n = \mathsf{poly}(\lambda)$. 
    more » « less
  2. Abstract

    Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown non-decreasing regression function $f$ from independent pairs $(x_i, y_i)$ where ${\mathbb{E}}[y_i]=f(x_i), i=1, \ldots n$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unordered sets $\{x_1, \ldots , x_n\}$ and $\{y_1, \ldots , y_n\}$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moments conditions on $y_i$ and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.

     
    more » « less
  3. We develop new techniques for proving lower bounds on the least singular value of random matrices with limited randomness. The matrices we consider have entries that are given by polynomials of a few underlying base random variables. This setting captures a core technical challenge for obtaining smoothed analysis guarantees in many algorithmic settings. Least singular value bounds often involve showing strong anti-concentration inequalities that are intricate and much less understood compared to concentration (or large deviation) bounds. First, we introduce a general technique for proving anti-concentration that uses well-conditionedness properties of the Jacobian of a polynomial map, and show how to combine this with a hierarchical net argument to prove least singular value bounds. Our second tool is a new statement about least singular values to reason about higher-order lifts of smoothed matrices and the action of linear operators on them. Apart from getting simpler proofs of existing smoothed analysis results, we use these tools to now handle more general families of random matrices. This allows us to produce smoothed analysis guarantees in several previously open settings. These new settings include smoothed analysis guarantees for power sum decompositions and certifying robust entanglement of subspaces, where prior work could only establish least singular value bounds for fully random instances or only show non-robust genericity guarantees. 
    more » « less
  4. Abstract

    This paper investigates robust versions of the general empirical risk minimization algorithm, one of the core techniques underlying modern statistical methods. Success of the empirical risk minimization is based on the fact that for a ‘well-behaved’ stochastic process $\left \{ f(X), \ f\in \mathscr F\right \}$ indexed by a class of functions $f\in \mathscr F$, averages $\frac{1}{N}\sum _{j=1}^N f(X_j)$ evaluated over a sample $X_1,\ldots ,X_N$ of i.i.d. copies of $X$ provide good approximation to the expectations $\mathbb E f(X)$, uniformly over large classes $f\in \mathscr F$. However, this might no longer be true if the marginal distributions of the process are heavy tailed or if the sample contains outliers. We propose a version of empirical risk minimization based on the idea of replacing sample averages by robust proxies of the expectations and obtain high-confidence bounds for the excess risk of resulting estimators. In particular, we show that the excess risk of robust estimators can converge to $0$ at fast rates with respect to the sample size $N$, referring to the rates faster than $N^{-1/2}$. We discuss implications of the main results to the linear and logistic regression problems and evaluate the numerical performance of proposed methods on simulated and real data.

     
    more » « less
  5. The noise sensitivity of a Boolean function f: {0,1}^n - > {0,1} is one of its fundamental properties. For noise parameter delta, the noise sensitivity is denoted as NS_{delta}[f]. This quantity is defined as follows: First, pick x = (x_1,...,x_n) uniformly at random from {0,1}^n, then pick z by flipping each x_i independently with probability delta. NS_{delta}[f] is defined to equal Pr [f(x) != f(z)]. Much of the existing literature on noise sensitivity explores the following two directions: (1) Showing that functions with low noise-sensitivity are structured in certain ways. (2) Mathematically showing that certain classes of functions have low noise sensitivity. Combined, these two research directions show that certain classes of functions have low noise sensitivity and therefore have useful structure. The fundamental importance of noise sensitivity, together with this wealth of structural results, motivates the algorithmic question of approximating NS_{delta}[f] given an oracle access to the function f. We show that the standard sampling approach is essentially optimal for general Boolean functions. Therefore, we focus on estimating the noise sensitivity of monotone functions, which form an important subclass of Boolean functions, since many functions of interest are either monotone or can be simply transformed into a monotone function (for example the class of unate functions consists of all the functions that can be made monotone by reorienting some of their coordinates [O'Donnell, 2014]). Specifically, we study the algorithmic problem of approximating NS_{delta}[f] for monotone f, given the promise that NS_{delta}[f] >= 1/n^{C} for constant C, and for delta in the range 1/n <= delta <= 1/2. For such f and delta, we give a randomized algorithm performing O((min(1,sqrt{n} delta log^{1.5} n))/(NS_{delta}[f]) poly (1/epsilon)) queries and approximating NS_{delta}[f] to within a multiplicative factor of (1 +/- epsilon). Given the same constraints on f and delta, we also prove a lower bound of Omega((min(1,sqrt{n} delta))/(NS_{delta}[f] * n^{xi})) on the query complexity of any algorithm that approximates NS_{delta}[f] to within any constant factor, where xi can be any positive constant. Thus, our algorithm's query complexity is close to optimal in terms of its dependence on n. We introduce a novel descending-ascending view of noise sensitivity, and use it as a central tool for the analysis of our algorithm. To prove lower bounds on query complexity, we develop a technique that reduces computational questions about query complexity to combinatorial questions about the existence of "thin" functions with certain properties. The existence of such "thin" functions is proved using the probabilistic method. These techniques also yield new lower bounds on the query complexity of approximating other fundamental properties of Boolean functions: the total influence and the bias. 
    more » « less