
Title: Sharpness-Aware Minimization with Dynamic Reweighting
Deep neural networks are often overparameterized and may not generalize easily. Adversarial training has proven effective at improving generalization by regularizing the change in loss under adversarially chosen perturbations. The recently proposed sharpness-aware minimization (SAM) algorithm conducts adversarial weight perturbation, encouraging the model to converge to a flat minimum. SAM finds a single adversarial weight perturbation shared by each batch. Although per-instance adversarial weight perturbations are stronger adversaries and can potentially lead to better generalization performance, their computational cost is very high, which makes it impractical to use per-instance perturbations efficiently in SAM. In this paper, we tackle this efficiency bottleneck and propose sharpness-aware minimization with dynamic reweighting (delta-SAM). Our theoretical analysis shows that the stronger, per-instance adversarial weight perturbations can be approached using reweighted per-batch weight perturbations. delta-SAM dynamically reweights the perturbation within each batch according to theoretically principled weighting factors, serving as a good approximation to per-instance perturbation. Experiments on various natural language understanding tasks demonstrate the effectiveness of delta-SAM.
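The per-batch perturbation that the abstract attributes to SAM can be sketched as follows. This is a minimal illustration of the standard (unweighted) SAM update on a toy least-squares problem, not the delta-SAM reweighting scheme, whose weighting factors are not given in this record. The helper names `loss_and_grad` and `sam_step`, the learning rate `lr`, and the perturbation radius `rho` are assumptions chosen for the example.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return loss, grad

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One per-batch SAM update (hypothetical helper; plain SAM, not delta-SAM)."""
    # 1. Gradient of the batch loss at the current weights.
    _, g = loss_and_grad(w, X, y)
    # 2. A single adversarial weight perturbation for the whole batch:
    #    step of length rho along the normalized gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 3. Gradient evaluated at the perturbed weights.
    _, g_adv = loss_and_grad(w + eps, X, y)
    # 4. Descend from the original weights using the perturbed gradient.
    return w - lr * g_adv

# Toy data: a noiseless linear regression problem (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w = sam_step(w, X, y)
final_loss, _ = loss_and_grad(w, X, y)
```

Each update needs two gradient evaluations per batch. A per-instance variant would need a separate perturbation and a separate perturbed gradient for every example in the batch, which is the cost the abstract describes as the efficiency bottleneck.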
Journal Name:
Findings of the Association for Computational Linguistics: EMNLP 2022
Page Range / eLocation ID:
5686 to 5699
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work studies the sensitivity of neural networks to weight perturbations, first in the context of a newly developed threat model that perturbs the neural network parameters. We propose an efficient approach to compute a certified robustness bound on weight perturbations, within which neural networks will not produce the erroneous outputs desired by the adversary. In addition, we identify a useful connection between our certification method and the problem of weight quantization, a popular model compression technique in deep neural networks (DNNs) and a ‘must-try’ step in the design of DNN inference engines on resource-constrained computing platforms, such as mobile devices, FPGAs, and ASICs. Specifically, we study the problem of weight quantization (weight perturbations in the non-adversarial setting) through the lens of certified robustness, and we demonstrate significant improvements in the generalization ability of quantized networks through our robustness-aware quantization scheme.
  2. We present sharpness-aware minimization (SAM) for fluid dynamics, which can efficiently learn the plausible dynamics of liquid splashes. Due to its ability to achieve robust and generalizing solutions, SAM efficiently converges to a parameter set that predicts plausible dynamics of elusive liquid splashes. Our training scheme requires 6 times fewer epochs to converge and 4 times less wall-clock time. Our results show that the sharpness of the loss function is closely connected to the plausibility of fluid dynamics and suggest further applicability of SAM to machine-learning-based fluid simulation.
  3. Neural-network quantum molecular dynamics (NNQMD) simulations based on machine learning are revolutionizing atomistic simulations of materials by providing quantum-mechanical accuracy at orders-of-magnitude higher speed, as illustrated by the ACM Gordon Bell Prize (2020) and finalist (2021). A state-of-the-art (SOTA) NNQMD model founded on group theory, featuring rotational equivariance and local descriptors, has provided much higher accuracy and speed than those models and is thus named Allegro (meaning fast). On massively parallel supercomputers, however, it suffers from a fidelity-scaling problem, where a growing number of unphysical predictions of interatomic forces prohibits simulations involving larger numbers of atoms for longer times. Here, we solve this problem by combining the Allegro model with sharpness-aware minimization (SAM) to enhance the robustness of the model through improved smoothness of the loss landscape. The resulting Allegro-Legato (meaning fast and “smooth”) model was shown to lengthen the time-to-failure t_failure without sacrificing computational speed or accuracy. Specifically, Allegro-Legato exhibits a much weaker dependence of time-to-failure on the problem size, t_failure ∝ N^−0.14 (N is the number of atoms), compared to the SOTA Allegro model (t_failure ∝ N^−0.29), i.e., systematically delayed time-to-failure, thus allowing much larger and longer NNQMD simulations without failure. The model also exhibits excellent computational scalability and GPU acceleration on the Polaris supercomputer at the Argonne Leadership Computing Facility. Such scalable, accurate, fast and robust NNQMD models will likely find broad applications in NNQMD simulations on emerging exaflop/s computers, with a specific example of accounting for nuclear quantum effects in the dynamics of ammonia to lay a foundation for green ammonia technology for sustainability.
  4. Adversarially robust classification seeks a classifier that is insensitive to adversarial perturbations of test patterns. This problem is often formulated via a minimax objective, where the target loss is the worst-case value of the 0-1 loss subject to a bound on the size of perturbation. Recent work has proposed convex surrogates for the adversarial 0-1 loss, in an effort to make optimization more tractable. In this work, we consider the question of which surrogate losses are calibrated with respect to the adversarial 0-1 loss, meaning that minimization of the former implies minimization of the latter. We show that no convex surrogate loss is calibrated with respect to the adversarial 0-1 loss when restricted to the class of linear models. We further introduce a class of nonconvex losses and offer necessary and sufficient conditions for losses in this class to be calibrated. Keywords: surrogate loss, classification calibration, adversarial robustness
  5. NK22 proposed a new method to reconstruct the temperature perturbation map (as functions of time and disc radius) of active galactic nuclei (AGN) accretion discs using multiwavelength photometric light curves. We apply their technique to 100 quasars at z = 0.5–2 from the Sloan Digital Sky Survey Reverberation Mapping project, using multi-epoch spectroscopy that covers rest-frame UV-optical continuum emission from the quasar and probes days-to-months time-scales. Consistent with NK22 for low-redshift AGNs, we find that the dominant pattern of disc temperature perturbations is either slow inward/outward moving waves with typical amplitudes δT/T_0 ∼ 10 per cent traveling at ∼0.01–0.1c, with a typical radial frequency of ∼0.5 dex in log R, or incoherent perturbations. In almost none of the cases do we find clear evidence for coherent, fast outgoing temperature perturbations at the speed of light, reminiscent of the lamppost model; but such lamppost signals may be present in some quasars for limited periods of the monitoring data. Using simulated data, we demonstrate that high-fidelity temperature perturbation maps can be recovered with high-quality monitoring spectroscopy, with limited impact from seasonal gaps in the data. On the other hand, reasonable temperature perturbation maps can be reconstructed with high-cadence photometric light curves from the Vera C. Rubin Observatory Legacy Survey of Space and Time. Our findings, together with NK22, suggest that internal disc processes are the main driver for temperature fluctuations in AGN accretion discs over days-to-months time-scales.