skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Faster margin maximization rates for generic and adversarially robust optimization methods
Abstract First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective that has multiple global optima. This phenomenon, known asimplicit bias, plays a critical role in understanding the generalization capabilities of optimization algorithms. Recent research has revealed that in separable binary classification tasks gradient-descent-based methods exhibit an implicit bias for the$$\ell _2$$ 2 -maximal margin classifier. Similarly, generic optimization methods, such as mirror descent and steepest descent, have been shown to converge to maximal margin classifiers defined by alternative geometries. While gradient-descent-based algorithms provably achievefastimplicit bias rates, corresponding rates in the literature for generic optimization methods are relatively slow. To address this limitation, we present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms. Our primary technique involves transforming a generic optimization algorithm into an online optimization dynamic that solves a regularized bilinear game, providing a unified framework for analyzing the implicit bias of various optimization methods. Our accelerated rates are derived by leveraging the regret bounds of online learning algorithms within this game framework. We then show the flexibility of this framework by analyzing the implicit bias inadversarial training, and again obtain significantly improved convergence rates.  more » « less
Award ID(s):
2239151 2212182
PAR ID:
10641843
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Mathematical Programming
ISSN:
0025-5610
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We propose a new observable for the measurement of the forward–backward asymmetry$$(A_{FB})$$ ( A FB ) in Drell–Yan lepton production. At hadron colliders, the$$A_{FB}$$ A FB distribution is sensitive to both the electroweak (EW) fundamental parameter$$\sin ^{2} \theta _{W}$$ sin 2 θ W , the weak mixing angle, and the parton distribution functions (PDFs). Hence, the determination of$$\sin ^{2} \theta _{W}$$ sin 2 θ W and the updating of PDFs by directly using the same$$A_{FB}$$ A FB spectrum are strongly correlated. This correlation would introduce large bias or uncertainty into both precise measurements of EW and PDF sectors. In this article, we show that the sensitivity of$$A_{FB}$$ A FB on$$\sin ^{2} \theta _{W}$$ sin 2 θ W is dominated by its average value around theZpole region, while the shape (or gradient) of the$$A_{FB}$$ A FB spectrum is insensitive to$$\sin ^{2} \theta _{W}$$ sin 2 θ W and contains important information on the PDF modeling. Accordingly, a new observable related to the gradient of the spectrum is introduced, and demonstrated to be able to significantly reduce the potential bias on the determination of$$\sin ^{2} \theta _{W}$$ sin 2 θ W when updating the PDFs using the same$$A_{FB}$$ A FB data. 
    more » « less
  2. Abstract We studyinexactfixed-point proximity algorithms for solving a class of sparse regularization problems involving the$$\ell _0$$ 0 norm. Specifically, the$$\ell _0$$ 0 model has an objective function that is the sum of a convex fidelity term and a Moreau envelope of the$$\ell _0$$ 0 norm regularization term. Such an$$\ell _0$$ 0 model is non-convex. Existing exact algorithms for solving the problems require the availability of closed-form formulas for the proximity operator of convex functions involved in the objective function. When such formulas are not available, numerical computation of the proximity operator becomes inevitable. This leads to inexact iteration algorithms. We investigate in this paper how the numerical error for every step of the iteration should be controlled to ensure global convergence of the inexact algorithms. We establish a theoretical result that guarantees the sequence generated by the proposed inexact algorithm converges to a local minimizer of the optimization problem. We implement the proposed algorithms for three applications of practical importance in machine learning and image science, which include regression, classification, and image deblurring. The numerical results demonstrate the convergence of the proposed algorithm and confirm that local minimizers of the$$\ell _0$$ 0 models found by the proposed inexact algorithm outperform global minimizers of the corresponding$$\ell _1$$ 1 models, in terms of approximation accuracy and sparsity of the solutions. 
    more » « less
  3. Abstract We prove that the Hilbert scheme ofkpoints on$${\mathbb {C}}^2$$ C 2 ($$\hbox {Hilb}^k[{\mathbb {C}}^2]$$ Hilb k [ C 2 ] ) is self-dual under three-dimensional mirror symmetry using methods of geometry and integrability. Namely, we demonstrate that the corresponding quantum equivariant K-theory is invariant upon interchanging its Kähler and equivariant parameters as well as inverting the weight of the$${\mathbb {C}}^\times _\hbar $$ C ħ × -action. First, we find a two-parameter family$$X_{k,l}$$ X k , l of self-mirror quiver varieties of type A and study their quantum K-theory algebras. The desired quantum K-theory of$$\hbox {Hilb}^k[{\mathbb {C}}^2]$$ Hilb k [ C 2 ] is obtained via direct limit$$l\longrightarrow \infty $$ l and by imposing certain periodic boundary conditions on the quiver data. Throughout the proof, we employ the quantum/classical (q-Langlands) correspondence between XXZ Bethe Ansatz equations and spaces of twisted$$\hbar $$ ħ -opers. In the end, we propose the 3d mirror dual for the moduli spaces of torsion-free rank-Nsheaves on$${\mathbb {P}}^2$$ P 2 with the help of a different (three-parametric) family of type A quiver varieties with known mirror dual. 
    more » « less
  4. Abstract We introduce partitioned matching games as a suitable model for international kidney exchange programmes, where in each round the total number of available kidney transplants needs to be distributed amongst the participating countries in a “fair” way. A partitioned matching game (N, v) is defined on a graph$$G=(V,E)$$ G = ( V , E ) with an edge weightingwand a partition$$V=V_1 \cup \dots \cup V_n$$ V = V 1 V n . The player set is$$N = \{ 1, \dots , n\}$$ N = { 1 , , n } , and player$$p \in N$$ p N owns the vertices in$$V_p$$ V p . The valuev(S) of a coalition $$S \subseteq N$$ S N is the maximum weight of a matching in the subgraph ofGinduced by the vertices owned by the players in S. If$$|V_p|=1$$ | V p | = 1 for all$$p\in N$$ p N , then we obtain the classical matching game. Let$$c=\max \{|V_p| \; |\; 1\le p\le n\}$$ c = max { | V p | | 1 p n } be the width of (N, v). We prove that checking core non-emptiness is polynomial-time solvable if$$c\le 2$$ c 2 but co--hard if$$c\le 3$$ c 3 . We do this via pinpointing a relationship with the known class ofb-matching games and completing the complexity classification on testing core non-emptiness forb-matching games. With respect to our application, we prove a number of complexity results on choosing, out of possibly many optimal solutions, one that leads to a kidney transplant distribution that is as close as possible to some prescribed fair distribution. 
    more » « less
  5. Abstract LetXbe ann-element point set in thek-dimensional unit cube$$[0,1]^k$$ [ 0 , 1 ] k where$$k \ge 2$$ k 2 . According to an old result of Bollobás and Meir (Oper Res Lett 11:19–21, 1992) , there exists a cycle (tour)$$x_1, x_2, \ldots , x_n$$ x 1 , x 2 , , x n through thenpoints, such that$$\left( \sum _{i=1}^n |x_i - x_{i+1}|^k \right) ^{1/k} \le c_k$$ i = 1 n | x i - x i + 1 | k 1 / k c k , where$$|x-y|$$ | x - y | is the Euclidean distance betweenxandy, and$$c_k$$ c k is an absolute constant that depends only onk, where$$x_{n+1} \equiv x_1$$ x n + 1 x 1 . From the other direction, for every$$k \ge 2$$ k 2 and$$n \ge 2$$ n 2 , there existnpoints in$$[0,1]^k$$ [ 0 , 1 ] k , such that their shortest tour satisfies$$\left( \sum _{i=1}^n |x_i - x_{i+1}|^k \right) ^{1/k} = 2^{1/k} \cdot \sqrt{k}$$ i = 1 n | x i - x i + 1 | k 1 / k = 2 1 / k · k . For the plane, the best constant is$$c_2=2$$ c 2 = 2 and this is the only exact value known. Bollobás and Meir showed that one can take$$c_k = 9 \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}$$ c k = 9 2 3 1 / k · k for every$$k \ge 3$$ k 3 and conjectured that the best constant is$$c_k = 2^{1/k} \cdot \sqrt{k}$$ c k = 2 1 / k · k , for every$$k \ge 2$$ k 2 . Here we significantly improve the upper bound and show that one can take$$c_k = 3 \sqrt{5} \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}$$ c k = 3 5 2 3 1 / k · k or$$c_k = 2.91 \sqrt{k} \ (1+o_k(1))$$ c k = 2.91 k ( 1 + o k ( 1 ) ) . Our bounds are constructive. We also show that$$c_3 \ge 2^{7/6}$$ c 3 2 7 / 6 , which disproves the conjecture for$$k=3$$ k = 3 . Connections to matching problems, power assignment problems, related problems, including algorithms, are discussed in this context. A slightly revised version of the Bollobás–Meir conjecture is proposed. 
    more » « less