This content will become publicly available on December 1, 2026
Quantum sequel of neural network training
Abstract Training of neural networks (NNs) has emerged as a major consumer of both computational and energy resources. Quantum computers have been proposed as a route to facilitate training, but no experimental evidence has been presented so far. Here we demonstrate that quantum annealing platforms, such as D-Wave, can enable fast and efficient training of classical NNs, which are then deployable on conventional hardware. From a physics perspective, NN training can be viewed as a dynamical phase transition: the system evolves from an initial spin glass state to a highly ordered, trained state. This process involves eliminating numerous undesired minima in its energy landscape. The advantage of annealing devices is their ability to rapidly find multiple deep states. We found that this quantum training achieves superior performance scaling compared to classical backpropagation methods, with a clearly higher scaling exponent (1.01 vs. 0.78). It may be further increased up to a factor of 2 with a fully coherent quantum platform using a variant of the Grover algorithm. Furthermore, we argue that even a modestly sized annealer can be beneficial to train a deep NN by being applied sequentially to a few layers at a time.
- Award ID(s): 2338819
- PAR ID: 10651320
- Publisher / Repository: Nature
- Date Published:
- Journal Name: Communications Physics
- Volume: 8
- Issue: 1
- ISSN: 2399-3650
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Neural networks (NNs) have been extremely successful across many tasks in machine learning. Quantization of NN weights has become an important topic due to its impact on their energy efficiency, inference time and deployment on hardware. Although post-training quantization is well-studied, training optimal quantized NNs involves combinatorial non-convex optimization problems which appear intractable. In this work, we introduce a convex optimization strategy to train quantized NNs with polynomial activations. Our method leverages hidden convexity in two-layer neural networks from the recent literature, semidefinite lifting, and Grothendieck’s identity. Surprisingly, we show that certain quantized NN problems can be solved to global optimality provably in polynomial time in all relevant parameters via tight semidefinite relaxations. We present numerical examples to illustrate the effectiveness of our method.
- Estimating Full Longwave and Shortwave Radiative Transfer with Neural Networks of Varying Complexity
Abstract Radiative transfer (RT) is a crucial but computationally expensive process in numerical weather/climate prediction. We develop neural networks (NN) to emulate a common RT parameterization called the Rapid Radiative Transfer Model (RRTM), with the goal of creating a faster parameterization for the Global Forecast System (GFS) v16. In previous work we emulated a highly simplified version of the shortwave RRTM only, excluding many predictor variables, driven by Rapid Refresh forecasts interpolated to a consistent height grid, using only 30 sites in the Northern Hemisphere. In this work we emulate the full shortwave and longwave RRTM, with all predictor variables, driven by GFSv16 forecasts on the native pressure–sigma grid, using data from around the globe. We experiment with NNs of widely varying complexity, including the U-net++ and U-net3+ architectures and deeply supervised training, designed to ensure realistic and accurate structure in gridded predictions. We evaluate the optimal shortwave NN and optimal longwave NN in great detail, as a function of geographic location, cloud regime, and other weather types. Both NNs produce extremely reliable heating rates and fluxes. The shortwave NN has an overall RMSE/MAE/bias of 0.14/0.08/−0.002 K day⁻¹ for heating rate and 6.3/4.3/−0.1 W m⁻² for net flux. Analogous numbers for the longwave NN are 0.22/0.12/−0.0006 K day⁻¹ and 1.07/0.76/+0.01 W m⁻². Both NNs perform well in nearly all situations, and the shortwave (longwave) NN is 7510 (90) times faster than the RRTM. Both will soon be tested online in the GFSv16.
Significance Statement: Radiative transfer is an important process for weather and climate. Accurate radiative transfer models exist, such as the RRTM, but these models are computationally slow. We develop neural networks (NNs), a type of machine learning model that is often computationally fast after training, to mimic the RRTM. We wish to accelerate the RRTM by orders of magnitude without sacrificing much accuracy. We drive both the NNs and RRTM with data from the GFSv16, an operational weather model, using locations around the globe during all seasons. We show that the NNs are highly accurate and much faster than the RRTM, which suggests that the NNs could be used to solve radiative transfer inside the GFSv16.
- Abstract Quantum annealing is a powerful alternative model of quantum computing, which can succeed in the presence of environmental noise even without error correction. However, despite great effort, no conclusive demonstration of a quantum speedup (relative to state of the art classical algorithms) has been shown for these systems, and rigorous theoretical proofs of a quantum advantage (such as the adiabatic formulation of Grover’s search problem) generally rely on exponential precision in at least some aspects of the system, an unphysical resource guaranteed to be scrambled by experimental uncertainties and random noise. In this work, we propose a new variant of quantum annealing, called RFQA, which can maintain a scalable quantum speedup in the face of noise and modest control precision. Specifically, we consider a modification of flux qubit-based quantum annealing which includes low-frequency oscillations in the directions of the transverse field terms as the system evolves. We show that this method produces a quantum speedup for finding ground states in the Grover problem and quantum random energy model, and thus should be widely applicable to other hard optimization problems which can be formulated as quantum spin glasses. Further, we explore three realistic noise channels and show that the speedup from RFQA is resilient to 1/f-like local potential fluctuations and local heating from interaction with a sufficiently low temperature bath. Another noise channel, bath-assisted quantum cooling transitions, actually accelerates the algorithm and may outweigh the negative effects of the others. We also detail how RFQA may be implemented experimentally with current technology.
- Abstract There are different strategies for training neural networks (NNs) as subgrid‐scale parameterizations. Here, we use a 1D model of the quasi‐biennial oscillation (QBO) and gravity wave (GW) parameterizations as testbeds. A 12‐layer convolutional NN that predicts GW forcings for given wind profiles, when trained offline in a big‐data regime (100‐year), produces realistic QBOs once coupled to the 1D model. In contrast, offline training of this NN in a small‐data regime (18‐month) yields unrealistic QBOs. However, online re‐training of just two layers of this NN using ensemble Kalman inversion and only time‐averaged QBO statistics leads to parameterizations that yield realistic QBOs. Fourier analysis of these three NNs' kernels suggests why/how re‐training works and reveals that these NNs primarily learn low‐pass, high‐pass, and a combination of band‐pass filters, potentially related to the local and non‐local dynamics in GW propagation and dissipation. These findings/strategies generally apply to data‐driven parameterizations of other climate processes.
