This content will become publicly available on December 1, 2026

Title: A dynamic fractional generalized deterministic annealing for rapid convergence in deep learning optimization
Optimization is central to classical and modern machine learning. This paper introduces Dynamic Fractional Generalized Deterministic Annealing (DF-GDA), a physics-inspired algorithm that boosts stability and speeds convergence across a wide range of models, especially deep networks. Unlike traditional methods such as Stochastic Gradient Descent, which may converge slowly or become trapped in local minima, DF-GDA employs an adaptive, temperature-controlled schedule that balances global exploration with precise refinement. Its dynamic fractional-parameter update selectively optimizes model components, improving computational efficiency. The method excels on high-dimensional tasks, including image classification, and also strengthens simpler classical models by reducing local-minimum risk and increasing robustness to noisy data. Extensive experiments on sixteen large, interdisciplinary datasets spanning image classification, natural language processing, healthcare, and biology show that DF-GDA consistently outperforms both state-of-the-art and traditional optimizers in convergence speed and accuracy, offering a powerful alternative for large-scale, complex problems across diverse scientific and industrial settings.
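The full text is embargoed until the date above, so only the abstract's description is available: a temperature-controlled annealing schedule plus a dynamic fractional-parameter update that touches a subset of the model at each step. Below is a minimal sketch of how those two ingredients could combine; the function name, cooling rule, and every hyperparameter are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def df_gda_sketch(loss_grad, theta, n_steps=1000, t0=1.0, t_min=1e-3,
                  frac=0.3, lr=0.01, seed=0):
    """Illustrative temperature-controlled optimizer with fractional updates.

    loss_grad(theta) must return the gradient of the loss at theta.
    All hyperparameters (t0, t_min, frac, lr, cooling factor) are
    placeholder choices, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    temp = t0
    for _ in range(n_steps):
        # "Dynamic fractional" update: touch only a random subset of
        # coordinates each step instead of the full parameter vector.
        mask = rng.random(theta.shape) < frac
        grad = loss_grad(theta)
        # Temperature-scaled perturbation: a high temperature favors global
        # exploration; as it cools, the step reduces to pure refinement.
        noise = rng.normal(scale=np.sqrt(temp), size=theta.shape)
        theta = theta - mask * lr * (grad + noise)
        temp = max(t_min, 0.995 * temp)  # geometric cooling schedule
    return theta

# Example: minimize ||theta||^2, whose gradient is 2*theta.
theta_star = df_gda_sketch(lambda th: 2.0 * th, np.ones(10))
```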
Award ID(s):
2000487
PAR ID:
10653839
Editor(s):
Guo, Ronghua
Publisher / Repository:
Springer Nature
Journal Name:
npj Artificial Intelligence
Edition / Version:
1
Volume:
1
Issue:
1
ISSN:
3005-1460
Page Range / eLocation ID:
1-19
Subject(s) / Keyword(s):
Machine learning, image classification
Format(s):
Medium: X; Other: pdfa
Size(s):
2.5MB
Sponsoring Org:
National Science Foundation
More Like this
  1.
    A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involves training overparameterized models, where the number of parameters exceeds a certain threshold. Indeed, most successful GANs used in practice are trained with overparameterized generator and discriminator networks, in both depth and width. A large body of work in supervised learning has shown the importance of model overparameterization for the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting, and GANs in particular, involves non-convex concave mini-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs, both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a 1-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first global-convergence result for GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators obeying certain assumptions (including deeper generators and random-feature discriminators). It utilizes and builds upon a novel connection with the convergence analysis of linear time-varying dynamical systems, which may have broader implications for understanding the convergence behavior of GDA for non-convex concave problems involving overparameterized models. We also empirically study the role of model overparameterization in GANs through several large-scale experiments on the CIFAR-10 and Celeb-A datasets. Our experiments show that overparameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that overparameterization leads to faster and more stable convergence of GDA across the board.
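    For context, the GDA dynamics analyzed above are just simultaneous gradient steps of opposite sign for the two players. A minimal sketch on a toy objective (the quadratic below is an illustrative choice, not the paper's GAN formulation):

```python
def gda(grad_x, grad_y, x, y, lr=0.05, steps=500):
    """Simultaneous Gradient Descent/Ascent on min_x max_y f(x, y)."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)  # both evaluated at the same iterate
        x = x - lr * gx  # descent step for the minimizing player
        y = y + lr * gy  # ascent step for the maximizing player
    return x, y

# Toy objective f(x, y) = y*x**2 - 0.5*y**2: concave in y, and not
# convex in x once y goes negative.
x, y = gda(lambda x, y: 2.0 * x * y,  # df/dx
           lambda x, y: x**2 - y,     # df/dy
           x=1.0, y=1.0)
```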
  2. Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, most existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems, while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant factor. We further adapt the theory of integral quadratic constraints (IQC) to show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and that optimism helps only for simultaneous algorithms.
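    The Sim/Alt distinction comes down to where the ascent gradient is evaluated. A small self-contained comparison on an SCSC quadratic (coefficients and learning rate are illustrative choices, not from the paper) makes the rate gap visible:

```python
def sim_gda_step(x, y, gx, gy, lr):
    """Simultaneous updates: both gradients use the current iterate."""
    return x - lr * gx(x, y), y + lr * gy(x, y)

def alt_gda_step(x, y, gx, gy, lr):
    """Alternating updates: the ascent step sees the freshly updated x."""
    x_new = x - lr * gx(x, y)
    return x_new, y + lr * gy(x_new, y)

# SCSC toy problem f(x, y) = 0.5*x**2 + 4*x*y - 0.5*y**2, saddle at (0, 0).
gx = lambda x, y: x + 4.0 * y
gy = lambda x, y: 4.0 * x - y
x_s = y_s = x_a = y_a = 1.0
for _ in range(200):
    x_s, y_s = sim_gda_step(x_s, y_s, gx, gy, lr=0.1)
    x_a, y_a = alt_gda_step(x_a, y_a, gx, gy, lr=0.1)
# Per-step contraction here is ~0.985 for Sim-GDA vs ~0.90 for Alt-GDA,
# so the alternating iterate ends up orders of magnitude closer to (0, 0).
print(abs(x_s) + abs(y_s), abs(x_a) + abs(y_a))
```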
  3. In recent years, federated minimax optimization has attracted growing interest due to its extensive applications in machine learning tasks. While Smoothed Alternating Gradient Descent Ascent (Smoothed-AGDA) has proved successful in centralized nonconvex minimax optimization, whether and how smoothing techniques help in a federated setting remains unexplored. In this paper, we propose a new algorithm, Federated Stochastic Smoothed Gradient Descent Ascent (FESS-GDA), which applies the smoothing technique to federated minimax optimization. We prove that FESS-GDA can be applied uniformly to several classes of federated minimax problems and establish new or improved convergence results for these settings. We showcase the practical efficiency of FESS-GDA on the federated learning tasks of training generative adversarial networks (GANs) and fair classification.
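    The exact FESS-GDA updates are specified in the paper; the sketch below only illustrates the general shape a federated smoothed-GDA loop can take, with a proximal term anchoring the primal update, local steps per client, and server-side averaging. The class, function names, and all hyperparameters are assumptions for illustration.

```python
import numpy as np

class QuadClient:
    """Toy client whose local loss is 0.5*a*x^2 + b*x*y - 0.5*y^2."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def grad_x(self, x, y):
        return self.a * x + self.b * y
    def grad_y(self, x, y):
        return self.b * x - y

def smoothed_fed_gda(clients, x, y, z, rounds=50, local_steps=5,
                     lr_x=0.05, lr_y=0.05, p=1.0, beta=0.5):
    """Hedged sketch of a federated smoothed-GDA outer loop.

    The proximal term p*(x - z) regularizes the primal update around an
    anchor z, mirroring the smoothing idea of centralized Smoothed-AGDA;
    the actual FESS-GDA updates live in the paper, not here.
    """
    for _ in range(rounds):
        xs, ys = [], []
        for c in clients:
            cx, cy = x.copy(), y.copy()
            for _ in range(local_steps):
                cx = cx - lr_x * (c.grad_x(cx, cy) + p * (cx - z))  # smoothed primal step
                cy = cy + lr_y * c.grad_y(cx, cy)                   # dual ascent step
            xs.append(cx)
            ys.append(cy)
        x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)  # server averaging
        z = z + beta * (x - z)                           # anchor update
    return x, y

clients = [QuadClient(1.0, 2.0), QuadClient(2.0, 1.0)]
x, y = smoothed_fed_gda(clients, np.array([1.0]), np.array([1.0]), np.array([0.0]))
```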
  4. In [Antil et al., Inverse Probl. 35 (2019) 084003] we introduced a new notion of optimal control and source identification (inverse) problems in which the control/source may lie outside the domain where the fractional elliptic PDE is fulfilled. The current work extends that work to the parabolic case, for which several new mathematical tools have been developed. We tackle the Dirichlet, Neumann, and Robin cases. The need for these novel optimal control concepts stems from the fact that classical PDE models only allow placing the control/source on the boundary or in the interior where the PDE is satisfied; the nonlocal behavior of the fractional operator, by contrast, allows placing the control/source in the exterior. We introduce the notions of weak and very-weak solutions to the fractional parabolic Dirichlet problem, and we show how to approximate the fractional parabolic Dirichlet solutions by fractional parabolic Robin solutions, with convergence rates. A complete analysis of the Dirichlet and Robin optimal control problems is provided. Numerical examples confirm our theoretical findings and further illustrate the potential benefits of nonlocal models over local ones.
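    The structural point is that the fractional operator is nonlocal, so Dirichlet data live on the whole exterior of the domain rather than on its boundary, which is exactly what permits exterior placement of the control. A schematic form of the parabolic exterior Dirichlet control problem (the cost J and admissible set U_ad are placeholders, not the paper's exact choices):

```latex
% Schematic exterior-control problem for the fractional heat equation;
% J and U_ad are placeholders, and the fractional exponent satisfies 0 < s < 1.
\begin{align*}
  \min_{u \in U_{\mathrm{ad}}} \; J(z, u) \quad \text{subject to} \quad
  & \partial_t z + (-\Delta)^s z = 0 && \text{in } \Omega \times (0, T), \\
  & z = u && \text{in } (\mathbb{R}^n \setminus \Omega) \times (0, T), \\
  & z(\cdot, 0) = z_0 && \text{in } \Omega.
\end{align*}
```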
  5. Quantum federated learning (QFL) can facilitate collaborative learning across multiple clients using quantum machine learning (QML) models while preserving data privacy. Although recent advances in QFL span tasks such as classification and leverage several data types, no prior work has focused on developing a QFL framework that uses temporal data to approximate functions useful for analyzing the performance of distributed quantum sensing networks. In this paper, a novel QFL framework is proposed that is the first to integrate quantum long short-term memory (QLSTM) models with temporal data. The proposed federated QLSTM (FedQLSTM) framework is used for function approximation, and three key use cases are presented: Bessel function approximation, sinusoidal delayed quantum feedback control function approximation, and Struve function approximation. Simulation results confirm that, for all considered use cases, the proposed FedQLSTM framework achieves faster convergence under one local training epoch, minimizing overall computation and saving 25-33% of the communication rounds needed until convergence, compared with an FL framework using classical LSTM models.
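    The federation itself follows the familiar aggregate-and-broadcast pattern. A hedged classical stand-in for the outer loop is sketched below, with the QLSTM local model abstracted behind a gradient oracle; all names, sizes, and rates are illustrative assumptions:

```python
import numpy as np

class ToyClient:
    """Stand-in for a client's local model: gradient of 0.5*||A w - b||^2.
    In the paper the local model is a QLSTM; here it is abstracted
    behind a plain gradient oracle."""
    def __init__(self, A, b):
        self.A, self.b = A, b
    def grad(self, w):
        return self.A.T @ (self.A @ w - self.b)

def fedavg_rounds(clients, w, rounds=100, local_epochs=1, lr=0.02):
    """One-local-epoch FedAvg-style outer loop (illustrative rates/sizes)."""
    for _ in range(rounds):
        local = []
        for c in clients:
            cw = w.copy()
            for _ in range(local_epochs):
                cw = cw - lr * c.grad(cw)  # local training step
            local.append(cw)
        w = np.mean(local, axis=0)  # server aggregates and broadcasts
    return w

rng = np.random.default_rng(0)
clients = [ToyClient(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(4)]
w = fedavg_rounds(clients, np.zeros(3))
```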