Abstract We define laziness to describe a large suppression of variational parameter updates for neural networks, classical or quantum. In the quantum case, the suppression is exponential in the number of qubits for randomized variational quantum circuits. We discuss the difference between laziness and barren plateau in quantum machine learning created by quantum physicists in McClean et al (2018 Nat. Commun. 9 1–6) for the flatness of the loss function landscape during gradient descent. We address a novel theoretical understanding of those two phenomena in light of the theory of neural tangent kernels. For noiseless quantum circuits, without the measurement noise, the loss function landscape is complicated in the overparametrized regime with a large number of trainable variational angles. Instead, around a random starting point in optimization, there are large numbers of local minima that are good enough and could minimize the mean square loss function, where we still have quantum laziness, but we do not have barren plateaus. However, the complicated landscape is not visible within a limited number of iterations, and low precision in quantum control and quantum sensing. Moreover, we look at the effect of noise during optimization by assuming intuitive noise models, and show that variational quantum algorithms are noise-resilient in the overparametrization regime. Our work precisely reformulates the quantum barren plateau statement towards a precision statement and justifies the statement in certain noise models, injects new hope toward near-term variational quantum algorithms, and provides theoretical connections toward classical machine learning. Our paper provides conceptual perspectives about quantum barren plateaus, together with discussions about the gradient descent dynamics in Liu et al (2023 Phys. Rev. Lett. 130 150601).
Equivalence of cost concentration and gradient vanishing for quantum circuits: an elementary proof in the Riemannian formulation
Abstract The optimization of quantum circuits can be hampered by a decay of average gradient amplitudes with increasing system size. When the decay is exponential, this is called the barren plateau problem. Considering explicit circuit parametrizations (in terms of rotation angles), it has been shown in Arrasmith et al (2022 Quantum Sci. Technol. 7 045015) that barren plateaus are equivalent to an exponential decay of the variance of cost-function differences. We show that the issue is particularly simple in the (parametrization-free) Riemannian formulation of such optimization problems and obtain a tighter bound for the cost-function variance. An elementary derivation shows that the single-gate variance of the cost function is strictly equal to half the variance of the Riemannian single-gate gradient, where we sample variable gates according to the uniform Haar measure. The total variances of the cost function and its gradient are then both bounded from above by the sum of single-gate variances and, conversely, bound single-gate variances from above. So, decays of gradients and cost-function variations go hand in hand, and barren plateau problems cannot be resolved by avoiding gradient-based in favor of gradient-free optimization methods.
- Award ID(s):
- 2120757
- PAR ID:
- 10592835
- Publisher / Repository:
- Quantum Science and Technology
- Date Published:
- Journal Name:
- Quantum Science and Technology
- Volume:
- 9
- Issue:
- 4
- ISSN:
- 2058-9565
- Page Range / eLocation ID:
- 045039
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Bosonic variational quantum circuits (VQCs) are crucial for information processing in microwave cavities, trapped ions, and optical systems, widely applicable in quantum communication, sensing and error correction. The trainability of such VQCs is less understood, hindered by the lack of theoretical tools such as t-design due to the infinite dimension of the continuous-variable systems involved. We overcome this difficulty to reveal an energy-dependent barren plateau in such VQCs. The variance of the gradient decays as $1/E^{M\nu}$, exponential in the number of modes M but polynomial in the (per-mode) circuit energy E. The exponent ν = 1 for shallow circuits and ν = 2 for deep circuits. We prove these results for state preparation of general Gaussian states and number states. We also provide numerical evidence demonstrating that the results extend to general state preparation tasks. As circuit energy is a controllable parameter, we provide a strategy to mitigate the barren plateau in bosonic continuous-variable VQCs.
-
Abstract Convergence analysis of accelerated first-order methods for convex optimization problems is developed from the point of view of ordinary differential equation solvers. A new dynamical system, called Nesterov accelerated gradient (NAG) flow, is derived from the connection between the acceleration mechanism and A-stability of ODE solvers, and the exponential decay of a tailored Lyapunov function along the solution trajectory is proved. Numerical discretizations of NAG flow are then considered and convergence rates are established via a discrete Lyapunov function. The proposed differential equation solver approach can not only cover existing accelerated methods, such as FISTA, Güler’s proximal algorithm and Nesterov’s accelerated gradient method, but also produce new algorithms for composite convex optimization that possess accelerated convergence rates. Both the convex and the strongly convex cases are handled in a unified way in our approach.
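As a point of reference for the accelerated methods named in this abstract, the following is a minimal sketch of the classical Nesterov/FISTA-style iteration on a smooth convex quadratic. It is not the NAG flow discretization proposed by the authors; the function names, the test problem, and the constant step size 1/L are assumptions chosen for illustration.

```python
import numpy as np

def nesterov_agd(grad, x0, step, n_iters):
    """Classical accelerated gradient iteration (FISTA momentum schedule).

    grad: gradient of a smooth convex function; step: assumed to be 1/L,
    with L the Lipschitz constant of the gradient.
    """
    x = np.asarray(x0, dtype=float).copy()
    y = x.copy()
    t = 1.0
    for _ in range(n_iters):
        x_next = y - step * grad(y)                       # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))  # momentum schedule
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # extrapolation
        x, t = x_next, t_next
    return x

# Hypothetical test problem: f(x) = 0.5 * sum(lam * x**2), minimized at x = 0.
lam = np.linspace(0.01, 1.0, 50)   # eigenvalues, so L = 1
f = lambda x: 0.5 * np.sum(lam * x**2)
grad_f = lambda x: lam * x

x0 = np.ones(50)
n_iters = 100
x_acc = nesterov_agd(grad_f, x0, step=1.0, n_iters=n_iters)
print(f(x0), f(x_acc))
```

For this schedule the known guarantee is f(x_k) - f* <= 2 L ||x_0 - x*||^2 / (k + 1)^2, which is the O(1/k^2) accelerated rate that the abstract refers to for the convex case.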
-
We consider stochastic zeroth-order optimization over Riemannian submanifolds embedded in Euclidean space, where the task is to solve Riemannian optimization problems with only noisy objective function evaluations. Toward this, our main contribution is to propose estimators of the Riemannian gradient and Hessian from noisy objective function evaluations, based on a Riemannian version of the Gaussian smoothing technique. The proposed estimators overcome the difficulty of nonlinearity of the manifold constraint and issues that arise in using Euclidean Gaussian smoothing techniques when the function is defined only over the manifold. We use the proposed estimators to solve Riemannian optimization problems in the following settings for the objective function: (i) stochastic and gradient-Lipschitz (in both nonconvex and geodesic convex settings), (ii) sum of gradient-Lipschitz and nonsmooth functions, and (iii) Hessian-Lipschitz. For these settings, we analyze the oracle complexity of our algorithms to obtain appropriately defined notions of ϵ-stationary point or ϵ-approximate local minimizer. Notably, our complexities are independent of the dimension of the ambient Euclidean space and depend only on the intrinsic dimension of the manifold under consideration. We demonstrate the applicability of our algorithms by simulation results and real-world applications on black-box stiffness control for robotics and black-box attacks on neural networks.
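To make the idea of a Riemannian Gaussian-smoothing gradient estimator concrete, here is a minimal sketch on the unit sphere. It is an illustration under stated assumptions, not the authors' exact estimator: the manifold (unit sphere), the normalization retraction, the noiseless linear test function, and all names (`zo_riemannian_grad_sphere`, `mu`, `n_samples`) are hypothetical choices.

```python
import numpy as np

def zo_riemannian_grad_sphere(f, x, mu=1e-4, n_samples=20000, seed=0):
    """Zeroth-order estimate of the Riemannian gradient of f at x on the unit sphere.

    Gaussian directions are projected onto the tangent space at x, and f is
    evaluated only at points on the sphere (via a normalization retraction).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, x.shape[0]))
    u = z - np.outer(z @ x, x)                          # tangent-space projection of each sample
    pts = x + mu * u
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # retract back onto the sphere
    diffs = np.array([f(p) for p in pts]) - f(x)        # function-value differences only
    return (diffs[:, None] * u).mean(axis=0) / mu

# Hypothetical test function: f(p) = c . p, whose Riemannian gradient on the
# sphere is the tangent projection of c.
c = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
x = np.ones(5) / np.sqrt(5.0)
f = lambda p: float(c @ p)

est = zo_riemannian_grad_sphere(f, x)
true_grad = c - (c @ x) * x
print(np.linalg.norm(est - true_grad) / np.linalg.norm(true_grad))
```

Because the sampled directions live in the tangent space, the variance of the estimate scales with the manifold dimension (here 4) rather than the ambient dimension, which is the intuition behind the dimension-independence claim in the abstract.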
-
Understanding the computational power of noisy intermediate-scale quantum (NISQ) devices is of both fundamental and practical importance to quantum information science. Here, we address the question of whether error-uncorrected noisy quantum computers can provide computational advantage over classical computers. Specifically, we study noisy random circuit sampling in one dimension (or 1D noisy RCS) as a simple model for exploring the effects of noise on the computational power of a noisy quantum device. In particular, we simulate the real-time dynamics of 1D noisy random quantum circuits via matrix product operators (MPOs) and characterize the computational power of the 1D noisy quantum system by using a metric we call MPO entanglement entropy. The latter metric is chosen because it determines the cost of classical MPO simulation. We numerically demonstrate that for the two-qubit gate error rates we considered, there exists a characteristic system size above which adding more qubits does not bring about an exponential growth of the cost of classical MPO simulation of 1D noisy systems. Specifically, we show that above the characteristic system size, there is an optimal circuit depth, independent of the system size, where the MPO entanglement entropy is maximized. Most importantly, the maximum achievable MPO entanglement entropy is bounded by a constant that depends only on the gate error rate, not on the system size. We also provide a heuristic analysis to get the scaling of the maximum achievable MPO entanglement entropy as a function of the gate error rate. The obtained scaling suggests that although the cost of MPO simulation does not increase exponentially in the system size above a certain characteristic system size, it does increase exponentially as the gate error rate decreases, possibly making classical simulation practically not feasible even with state-of-the-art supercomputers.

