Abstract: The Stein variational gradient descent (SVGD) algorithm is a deterministic particle method for sampling. However, a mean-field analysis reveals that the gradient flow corresponding to the SVGD algorithm (i.e., the Stein Variational Gradient Flow) only provides a constant-order approximation to the Wasserstein gradient flow corresponding to KL-divergence minimization. In this work, we propose the Regularized Stein Variational Gradient Flow, which interpolates between the Stein Variational Gradient Flow and the Wasserstein gradient flow. We establish various theoretical properties of the Regularized Stein Variational Gradient Flow (and its time discretization), including convergence to equilibrium, existence and uniqueness of weak solutions, and stability of the solutions. We provide preliminary numerical evidence of the improved performance offered by the regularization.
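For orientation, here is a minimal sketch of the standard (unregularized) SVGD particle update that the abstract refers to, in the form introduced by Liu and Wang. The RBF kernel, bandwidth, and step size are illustrative assumptions, and the regularized flow proposed in the paper is not reproduced here.

```python
# Minimal sketch of the standard SVGD particle update, not the
# Regularized Stein Variational Gradient Flow from the paper.
import numpy as np

def svgd_step(X, grad_log_p, eps=0.1, h=1.0):
    """One SVGD update for n particles X of shape (n, d)."""
    diff = X[:, None, :] - X[None, :, :]            # diff[i, j] = x_i - x_j, (n, n, d)
    K = np.exp(-(diff ** 2).sum(-1) / (2.0 * h))    # RBF kernel k(x_i, x_j), (n, n)
    gradK = -diff / h * K[:, :, None]               # grad_{x_i} k(x_i, x_j), (n, n, d)
    # phi(x_j) = (1/n) sum_i [ k(x_i, x_j) grad log p(x_i) + grad_{x_i} k(x_i, x_j) ]
    phi = (K[:, :, None] * grad_log_p(X)[:, None, :] + gradK).mean(axis=0)
    return X + eps * phi
```

For a standard Gaussian target, for instance, one would pass `grad_log_p = lambda X: -X` and iterate `svgd_step` until the particle cloud stabilizes.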
The geometry of adversarial training in binary classification
Abstract: We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems in which the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $$L^1+\text{(nonlocal)}\operatorname{TV}$$, a form frequently studied in image analysis and graph-based learning. This reformulation reveals a rich geometric structure, which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense) and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization provides a novel, directly interpretable statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.
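Schematically, and with notation assumed here purely for illustration ($A$ is the decision region, $\varepsilon$ the adversarial budget, $\rho$ the data distribution), the equivalence pairs an adversarial risk with a perimeter-regularized risk:

$$\min_{A}\;\mathbb{E}_{(x,y)\sim\rho}\!\left[\sup_{\|x'-x\|\le\varepsilon}\mathbf{1}\{\mathbf{1}_{A}(x')\neq y\}\right]\;=\;\min_{A}\;R(A;\rho)\;+\;\varepsilon\,\mathrm{Per}_{\varepsilon}(A;\rho),$$

where $R(A;\rho)$ denotes the unperturbed 0–1 risk and $\mathrm{Per}_{\varepsilon}$ a nonlocal, distribution-weighted perimeter functional; the precise definitions are those of the paper and are not reproduced here.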
- Award ID(s):
- 2005797
- PAR ID:
- 10482562
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Information and Inference: A Journal of the IMA
- Volume:
- 12
- Issue:
- 2
- ISSN:
- 2049-8772
- Page Range / eLocation ID:
- 921 to 968
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract: We study the existence and uniqueness of solutions to the vector-field Peierls–Nabarro (PN) model for curved dislocations in a transversely isotropic medium. Under suitable assumptions on the misfit potential on the slip plane, we reduce the 3D PN model to a nonlocal scalar Ginzburg–Landau equation. For a particular range of elastic coefficients, the nonlocal scalar equation with an explicit nonlocal positive kernel is derived. We prove that any stable steady solution has a one-dimensional profile. As a result, we obtain that solutions to the scalar equation, as well as to the original 3D system, are characterized as a one-parameter family of straight dislocations. This paper generalizes results previously obtained in the fully isotropic case to an anisotropic setting.
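Schematically, and only to fix ideas (the precise kernel and constants depend on the elastic coefficients and are not reproduced here), the reduced scalar equation on the slip plane takes the form

$$\mathcal{L}u + W'(u) = 0 \quad\text{in } \mathbb{R}^2, \qquad \mathcal{L}u(x) = \mathrm{p.v.}\int_{\mathbb{R}^2}\big(u(y)-u(x)\big)\,K(x-y)\,dy,$$

with $K$ a positive kernel (comparable to the half-Laplacian kernel in the isotropic case) and $W$ the misfit potential.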
Abstract: We propose a sparse deep ReLU network (SDRN) estimator of the regression function, obtained from regularized empirical risk minimization with a Lipschitz loss function. Our framework can be applied to a variety of regression and classification problems. We establish novel nonasymptotic excess risk bounds for the SDRN estimator when the regression function belongs to a Sobolev space with mixed derivatives. We obtain a new, nearly optimal risk rate, in the sense that the SDRN estimator can achieve nearly the same optimal minimax convergence rate as one-dimensional nonparametric regression, with the dimension entering only through a logarithmic factor, when the feature dimension is fixed. The estimator has a slightly slower rate when the dimension grows with the sample size. We show that the depth of the SDRN estimator grows logarithmically in the sample size, and that the total number of nodes and weights grows polynomially in the sample size, to achieve the nearly optimal risk rate. The proposed SDRN can go deeper with fewer parameters to estimate the regression function well and to overcome the overfitting encountered by conventional feedforward neural networks.
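As a concrete, illustrative sketch of regularized empirical risk minimization with a Lipschitz loss (here the absolute-error loss) and an $\ell_1$ penalty standing in for sparsity: the depth/width choices and the sparsity mechanism below are assumptions for illustration, not the SDRN construction itself.

```python
# Illustrative sketch only: a small deep ReLU network trained by regularized
# empirical risk minimization with a Lipschitz (absolute-error) loss and an
# l1 weight penalty encouraging sparsity. Not the paper's exact estimator.
import torch
import torch.nn as nn

def make_relu_net(in_dim: int, width: int, depth: int) -> nn.Sequential:
    layers = [nn.Linear(in_dim, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def train(net, X, y, lam=1e-3, lr=1e-2, steps=500):
    """X: (n, d) float tensor; y: (n,) float tensor of responses."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = net(X).squeeze(-1)
        # Lipschitz loss (L1) plus l1 penalty on all weights for sparsity.
        loss = (pred - y).abs().mean()
        loss = loss + lam * sum(p.abs().sum() for p in net.parameters())
        loss.backward()
        opt.step()
    return net
```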
Abstract: Adversarial training is a min-max optimization problem that is designed to construct robust classifiers against adversarial perturbations of data. We study three models of adversarial training in the multiclass agnostic-classifier setting. We prove the existence of Borel measurable robust classifiers in each model and provide a unified perspective on the adversarial training problem, expanding the connections with optimal transport initiated by the authors in their previous work [21]. In addition, we develop new connections between adversarial training in the multiclass setting and total variation regularization. As a corollary of our results, we provide an alternative proof of the existence of Borel measurable solutions to the agnostic adversarial training problem in the binary classification setting.
Abstract: We study a model for adversarial classification based on distributionally robust chance constraints. We show that under Wasserstein ambiguity, the model aims to minimize the conditional value-at-risk of the distance to misclassification, and we explore links to previously proposed adversarial classification models and to maximum-margin classifiers. We also provide a reformulation of the distributionally robust model for linear classification and show that it is equivalent to minimizing a regularized ramp-loss objective. Numerical experiments show that, despite the nonconvexity of this formulation, standard descent methods appear to converge to the global minimizer for this problem. Inspired by this observation, we show that, for a certain class of distributions, the only stationary point of the regularized ramp-loss minimization problem is the global minimizer.
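A minimal sketch of what minimizing a regularized ramp-loss objective for a linear classifier by plain gradient descent might look like; the regularization weight, step size, and margin scaling below are assumptions, not the paper's exact reformulation.

```python
# Illustrative only: regularized ramp-loss minimization for a linear
# classifier via gradient descent. Not the paper's exact reformulation.
import torch

def ramp(z: torch.Tensor) -> torch.Tensor:
    # Ramp loss: 1 for z <= 0, decreasing linearly on (0, 1), 0 for z >= 1.
    return torch.clamp(1.0 - z, min=0.0, max=1.0)

def fit_linear(X, y, lam=0.1, lr=0.05, steps=2000):
    """X: (n, d) float tensor; y: (n,) tensor with entries in {-1, +1}."""
    n, d = X.shape
    w = torch.zeros(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        margins = y * (X @ w + b)                      # signed margins
        loss = ramp(margins).mean() + lam * w.norm()   # nonconvex objective
        loss.backward()
        opt.step()
    return w.detach(), b.detach()
```

The ramp loss makes this objective nonconvex, which is exactly the setting in which the abstract reports that standard descent methods nonetheless appear to reach the global minimizer.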