We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm (AGNES) provably achieves acceleration for smooth convex and strongly convex minimization tasks with noisy gradient estimates if the noise intensity is proportional to the magnitude of the gradient at every point. Nesterov's method converges at an accelerated rate if the constant of proportionality is below 1, while AGNES accommodates any signal-to-noise ratio. The noise model is motivated by applications in overparametrized machine learning. AGNES requires only two parameters in convex and three in strongly convex minimization tasks, improving on existing methods. We further provide clear geometric interpretations and heuristics for the choice of parameters.
more »
« less
A Robust Accelerated Optimization Algorithm for Strongly Convex Functions
This work proposes an accelerated first-order algorithm we call the Robust Momentum Method for optimizing smooth strongly convex functions. The algorithm has a single scalar parameter that can be tuned to trade off robustness to gradient noise versus worst-case convergence rate. At one extreme, the algorithm is faster than Nesterov's Fast Gradient Method by a constant factor but more fragile to noise. At the other extreme, the algorithm reduces to the Gradient Method and is very robust to noise. The algorithm design technique is inspired by methods from classical control theory and the resulting algorithm has a simple analytical form. Algorithm performance is verified on a series of numerical simulations in both noise-free and relative gradient noise cases.
more »
« less
- PAR ID:
- 10080865
- Date Published:
- Journal Name:
- American Control Conference
- Page Range / eLocation ID:
- 1376 - 1381
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Adversarial training (AT) is a widely recognized defense mechanism to gain the robustness of deep neural networks against adversarial attacks. It is built on min-max optimization (MMO), where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the conventional MMO method makes AT hard to scale. Thus, FAST-AT (Wong et al., 2020) and other recent algorithms attempt to simplify MMO by replacing its maximization step with the single gradient sign-based attack generation step. Although easy to implement, FAST-AT lacks theoretical guarantees, and its empirical performance is unsatisfactory due to the issue of robust catastrophic overfitting when training with strong adversaries. In this paper, we advance FAST-AT from the fresh perspective of bi-level optimization (BLO). We first show that the commonly used FAST-AT is equivalent to using a stochastic gradient algorithm to solve a linearized BLO problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm performance. Inspired by BLO, we design and analyze a new set of robust training algorithms termed Fast Bilevel AT (FAST-BAT), which effectively defends sign-based projected gradient descent (PGD) attacks without using any gradient sign method or explicit robust regularization. In practice, we show our method yields substantial robustness improvements over baselines across multiple models and datasetsmore » « less
-
In this paper, we study the robustness property of policy optimization (particularly Gauss–Newton gradient descent algorithm which is equivalent to the policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov’s direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions of the upperbound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems’ fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.more » « less
-
Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs. We introduce a novel parametric transition kernel and solve the inner loop robust policy via a gradient-based method. Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties.more » « less
-
Abstract The purpose of the current study was to introduce a Deep learning‐based Accelerated and Noise‐Suppressed Estimation (DANSE) method for reconstructing quantitative maps of biological tissue cellular‐specific,R2t*, and hemodynamic‐specific,R2’, metrics of quantitative gradient‐recalled echo (qGRE) MRI. The DANSE method adapts a supervised learning paradigm to train a convolutional neural network for robust estimation ofR2t*andR2’maps with significantly reduced sensitivity to noise and the adverse effects of macroscopic (B0) magnetic field inhomogeneities directly from the gradient‐recalled echo (GRE) magnitude images. TheR2t*andR2’maps for training were generated by means of a voxel‐by‐voxel fitting of a previously developed biophysical quantitative qGRE model accounting for tissue, hemodynamic, and B0‐inhomogeneities contributions to multigradient‐echo GRE signal using a nonlinear least squares (NLLS) algorithm. We show that the DANSE model efficiently estimates the aforementioned qGRE maps and preserves all the features of the NLLS approach with significant improvements including noise suppression and computation speed (from many hours to seconds). The noise‐suppression feature of DANSE is especially prominent for data with low signal‐to‐noise ratio (SNR ~ 50–100), where DANSE‐generatedR2t*andR2’maps had up to three times smaller errors than that of the NLLS method. The DANSE method enables fast reconstruction of qGRE maps with significantly reduced sensitivity to noise and magnetic field inhomogeneities. The DANSE method does not require any information about field inhomogeneities during application. It exploits spatial and gradient echo time‐dependent patterns in the GRE data and previously gained knowledge from the biophysical model, thus producing high quality qGRE maps, even in environments with high noise levels. These features along with fast computational speed can lead to broad qGRE clinical and research applications.more » « less
An official website of the United States government

