skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on December 1, 2025

Title: Towards provably efficient quantum algorithms for large-scale machine-learning models
Abstract Large machine learning models are revolutionary technologies of artificial intelligence whose bottlenecks include huge computational expenses, power, and time used both in the pre-training and fine-tuning process. In this work, we show that fault-tolerant quantum computing could possibly provide provably efficient resolutions for generic (stochastic) gradient descent algorithms, scaling as$${{{{{{{\mathcal{O}}}}}}}}({T}^{2}\times {{{{{{{\rm{polylog}}}}}}}}(n))$$ O ( T 2 × polylog ( n ) ) , wherenis the size of the models andTis the number of iterations in the training, as long as the models are both sufficiently dissipative and sparse, with small learning rates. Based on earlier efficient quantum algorithms for dissipative differential equations, we find and prove that similar algorithms work for (stochastic) gradient descent, the primary algorithm for machine learning. In practice, we benchmark instances of large machine learning models from 7 million to 103 million parameters. We find that, in the context of sparse training, a quantum enhancement is possible at the early stage of learning after model pruning, motivating a sparse parameter download and re-upload scheme. Our work shows solidly that fault-tolerant quantum algorithms could potentially contribute to most state-of-the-art, large-scale machine-learning problems.  more » « less
Award ID(s):
2016245 1936118 2137642 1818914
PAR ID:
10506504
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Nature Portfolio
Date Published:
Journal Name:
Nature Communications
Volume:
15
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract In this paper, we study multistage stochastic mixed-integer nonlinear programs (MS-MINLP). This general class of problems encompasses, as important special cases, multistage stochastic convex optimization withnon-Lipschitzianvalue functions and multistage stochastic mixed-integer linear optimization. We develop stochastic dual dynamic programming (SDDP) type algorithms with nested decomposition, deterministic sampling, and stochastic sampling. The key ingredient is a new type of cuts based on generalized conjugacy. Several interesting classes of MS-MINLP are identified, where the new algorithms are guaranteed to obtain the global optimum without the assumption of complete recourse. This significantly generalizes the classic SDDP algorithms. We also characterize the iteration complexity of the proposed algorithms. In particular, for a$$(T+1)$$ ( T + 1 ) -stage stochastic MINLP satisfyingL-exact Lipschitz regularization withd-dimensional state spaces, to obtain an$$\varepsilon $$ ε -optimal root node solution, we prove that the number of iterations of the proposed deterministic sampling algorithm is upper bounded by$${\mathcal {O}}((\frac{2LT}{\varepsilon })^d)$$ O ( ( 2 L T ε ) d ) , and is lower bounded by$${\mathcal {O}}((\frac{LT}{4\varepsilon })^d)$$ O ( ( LT 4 ε ) d ) for the general case or by$${\mathcal {O}}((\frac{LT}{8\varepsilon })^{d/2-1})$$ O ( ( LT 8 ε ) d / 2 - 1 ) for the convex case. This shows that the obtained complexity bounds are rather sharp. It also reveals that the iteration complexity dependspolynomiallyon the number of stages. We further show that the iteration complexity dependslinearlyonT, if all the state spaces are finite sets, or if we seek a$$(T\varepsilon )$$ ( T ε ) -optimal solution when the state spaces are infinite sets, i.e. allowing the optimality gap to scale withT. To the best of our knowledge, this is the first work that reports global optimization algorithms as well as iteration complexity results for solving such a large class of multistage stochastic programs. The iteration complexity study resolves a conjecture by the late Prof. Shabbir Ahmed in the general setting of multistage stochastic mixed-integer optimization. 
    more » « less
  2. Abstract We continue the program of proving circuit lower bounds via circuit satisfiability algorithms. So far, this program has yielded several concrete results, proving that functions in$$\mathsf {Quasi}\text {-}\mathsf {NP} = \mathsf {NTIME}[n^{(\log n)^{O(1)}}]$$ Quasi - NP = NTIME [ n ( log n ) O ( 1 ) ] and other complexity classes do not have small circuits (in the worst case and/or on average) from various circuit classes$$\mathcal { C}$$ C , by showing that$$\mathcal { C}$$ C admits non-trivial satisfiability and/or#SAT algorithms which beat exhaustive search by a minor amount. In this paper, we present a new strong lower bound consequence of having a non-trivial#SAT algorithm for a circuit class$${\mathcal C}$$ C . Say that a symmetric Boolean functionf(x1,…,xn) issparseif it outputs 1 onO(1) values of$${\sum }_{i} x_{i}$$ i x i . We show that for every sparsef, and for all “typical”$$\mathcal { C}$$ C , faster#SAT algorithms for$$\mathcal { C}$$ C circuits imply lower bounds against the circuit class$$f \circ \mathcal { C}$$ f C , which may bestrongerthan$$\mathcal { C}$$ C itself. In particular:#SAT algorithms fornk-size$$\mathcal { C}$$ C -circuits running in 2n/nktime (for allk) implyNEXPdoes not have$$(f \circ \mathcal { C})$$ ( f C ) -circuits of polynomial size.#SAT algorithms for$$2^{n^{{\varepsilon }}}$$ 2 n ε -size$$\mathcal { C}$$ C -circuits running in$$2^{n-n^{{\varepsilon }}}$$ 2 n n ε time (for someε> 0) implyQuasi-NPdoes not have$$(f \circ \mathcal { C})$$ ( f C ) -circuits of polynomial size. Applying#SAT algorithms from the literature, one immediate corollary of our results is thatQuasi-NPdoes not haveEMAJ∘ACC0∘THRcircuits of polynomial size, whereEMAJis the “exact majority” function, improving previous lower bounds againstACC0[Williams JACM’14] andACC0∘THR[Williams STOC’14], [Murray-Williams STOC’18]. This is the first nontrivial lower bound against such a circuit class. 
    more » « less
  3. Abstract We prove that$${{\,\textrm{poly}\,}}(t) \cdot n^{1/D}$$ poly ( t ) · n 1 / D -depth local random quantum circuits with two qudit nearest-neighbor gates on aD-dimensional lattice withnqudits are approximatet-designs in various measures. These include the “monomial” measure, meaning that the monomials of a random circuit from this family have expectation close to the value that would result from the Haar measure. Previously, the best bound was$${{\,\textrm{poly}\,}}(t)\cdot n$$ poly ( t ) · n due to Brandão–Harrow–Horodecki (Commun Math Phys 346(2):397–434, 2016) for$$D=1$$ D = 1 . We also improve the “scrambling” and “decoupling” bounds for spatially local random circuits due to Brown and Fawzi (Scrambling speed of random quantum circuits, 2012). One consequence of our result is that assuming the polynomial hierarchy ($${{\,\mathrm{\textsf{PH}}\,}}$$ PH ) is infinite and that certain counting problems are$$\#{\textsf{P}}$$ # P -hard “on average”, sampling within total variation distance from these circuits is hard for classical computers. Previously, exact sampling from the outputs of even constant-depth quantum circuits was known to be hard for classical computers under these assumptions. However the standard strategy for extending this hardness result to approximate sampling requires the quantum circuits to have a property called “anti-concentration”, meaning roughly that the output has near-maximal entropy. Unitary 2-designs have the desired anti-concentration property. Our result improves the required depth for this level of anti-concentration from linear depth to a sub-linear value, depending on the geometry of the interactions. This is relevant to a recent experiment by the Google Quantum AI group to perform such a sampling task with 53 qubits on a two-dimensional lattice (Arute in Nature 574(7779):505–510, 2019; Boixo et al. in Nate Phys 14(6):595–600, 2018) (and related experiments by USTC), and confirms their conjecture that$$O(\sqrt{n})$$ O ( n ) depth suffices for anti-concentration. The proof is based on a previous construction oft-designs by Brandão et al. (2016), an analysis of how approximate designs behave under composition, and an extension of the quasi-orthogonality of permutation operators developed by Brandão et al. (2016). Different versions of the approximate design condition correspond to different norms, and part of our contribution is to introduce the norm corresponding to anti-concentration and to establish equivalence between these various norms for low-depth circuits. For random circuits with long-range gates, we use different methods to show that anti-concentration happens at circuit size$$O(n\ln ^2 n)$$ O ( n ln 2 n ) corresponding to depth$$O(\ln ^3 n)$$ O ( ln 3 n ) . We also show a lower bound of$$\Omega (n \ln n)$$ Ω ( n ln n ) for the size of such circuit in this case. We also prove that anti-concentration is possible in depth$$O(\ln n \ln \ln n)$$ O ( ln n ln ln n ) (size$$O(n \ln n \ln \ln n)$$ O ( n ln n ln ln n ) ) using a different model. 
    more » « less
  4. Abstract Stochastic networks for the clock were identified by ensemble methods using genetic algorithms that captured the amplitude and period variation in single cell oscillators ofNeurosporacrassa. The genetic algorithms were at least an order of magnitude faster than ensemble methods using parallel tempering and appeared to provide a globally optimum solution from a random start in the initial guess of model parameters (i.e., rate constants and initial counts of molecules in a cell). The resulting goodness of fit$${x}^{2}$$ x 2 was roughly halved versus solutions produced by ensemble methods using parallel tempering, and the resulting$${x}^{2}$$ x 2 per data point was only$${\chi }^{2}/n$$ χ 2 / n = 2,708.05/953 = 2.84. The fitted model ensemble was robust to variation in proxies for “cell size”. The fitted neutral models without cellular communication between single cells isolated by microfluidics provided evidence for onlyoneStochastic Resonance at one common level of stochastic intracellular noise across days from 6 to 36 h of light/dark (L/D) or in a D/D experiment. When the light-driven phase synchronization was strong as measured by the Kuramoto (K), there was degradation in the single cell oscillations away from the stochastic resonance. The rate constants for the stochastic clock network are consistent with those determined on a macroscopic scale of 107cells. 
    more » « less
  5. Abstract The free multiplicative Brownian motion$$b_{t}$$ b t is the large-Nlimit of the Brownian motion on$$\mathsf {GL}(N;\mathbb {C}),$$ GL ( N ; C ) , in the sense of$$*$$ -distributions. The natural candidate for the large-Nlimit of the empirical distribution of eigenvalues is thus the Brown measure of$$b_{t}$$ b t . In previous work, the second and third authors showed that this Brown measure is supported in the closure of a region$$\Sigma _{t}$$ Σ t that appeared in the work of Biane. In the present paper, we compute the Brown measure completely. It has a continuous density$$W_{t}$$ W t on$$\overline{\Sigma }_{t},$$ Σ ¯ t , which is strictly positive and real analytic on$$\Sigma _{t}$$ Σ t . This density has a simple form in polar coordinates:$$\begin{aligned} W_{t}(r,\theta )=\frac{1}{r^{2}}w_{t}(\theta ), \end{aligned}$$ W t ( r , θ ) = 1 r 2 w t ( θ ) , where$$w_{t}$$ w t is an analytic function determined by the geometry of the region$$\Sigma _{t}$$ Σ t . We show also that the spectral measure of free unitary Brownian motion$$u_{t}$$ u t is a “shadow” of the Brown measure of$$b_{t}$$ b t , precisely mirroring the relationship between the circular and semicircular laws. We develop several new methods, based on stochastic differential equations and PDE, to prove these results. 
    more » « less