Variational quantum algorithms rely on the optimization of parameterized quantum circuits in noisy settings. The back-propagation procedure commonly used in classical machine learning is not directly applicable in this setting because quantum states collapse upon measurement. Gradient estimation therefore constitutes a significant overhead in gradient-based optimization of such quantum circuits. This paper introduces a random coordinate descent algorithm as a practical and easy-to-implement alternative to the full gradient descent algorithm. The algorithm requires only one partial derivative at each iteration. Motivated by the behavior of measurement noise in the practical optimization of parameterized quantum circuits, this paper presents an optimization problem setting that is amenable to analysis. Under this setting, the random coordinate descent algorithm exhibits the same level of stochastic stability as the full gradient approach, making it equally resilient to noise. The complexity of the random coordinate descent method is generally no worse than that of gradient descent and can be much better for various quantum optimization problems with anisotropic Lipschitz constants. Theoretical analysis and extensive numerical experiments validate our findings.
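The update described in the abstract, one randomly chosen partial derivative per iteration, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the quadratic objective, the coefficients `a`, the Gaussian noise model, the step size, and all function names are assumptions chosen to mimic a noisy objective with anisotropic curvature.

```python
import numpy as np


def random_coordinate_descent(partial_grad, theta0, step, n_iters, rng):
    """Update one uniformly sampled coordinate per iteration.

    partial_grad(theta, i) returns a (possibly noisy) estimate of the
    i-th partial derivative at theta. Hypothetical interface.
    """
    theta = np.array(theta0, dtype=float)
    dim = theta.size
    for _ in range(n_iters):
        i = rng.integers(dim)                      # sample a coordinate uniformly
        theta[i] -= step * partial_grad(theta, i)  # one partial derivative per step
    return theta


# Toy objective (assumption): f(theta) = 0.5 * sum(a * theta**2), whose
# per-coordinate curvatures a[i] differ by orders of magnitude.
a = np.array([1.0, 0.1, 0.01])
sigma = 0.01                                       # std of the additive noise (assumption)
rng = np.random.default_rng(0)


def objective(theta):
    return 0.5 * float(np.sum(a * theta**2))


def noisy_partial(theta, i):
    # Exact partial derivative plus Gaussian noise standing in for measurement noise.
    return a[i] * theta[i] + sigma * rng.normal()


theta_start = np.ones(3)
theta_final = random_coordinate_descent(
    noisy_partial, theta_start, step=0.5, n_iters=3000, rng=rng
)
print(objective(theta_start), objective(theta_final))
```

Each iteration costs a single partial-derivative estimate, whereas a full gradient step would need one estimate per parameter; that difference is the overhead the abstract refers to.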
- Award ID(s): 1723048
- PAR ID: 10293520
- Date Published:
- Journal Name: Computational Optimization and Applications
- Volume: 79
- Issue: 2
- ISSN: 1573-2894
- Page Range / eLocation ID: 339-368
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Published by the American Physical Society 2024 -
The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which can allow unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem-specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
-
In this paper, we analyze several methods for approximating gradients of noisy functions using only function values. These methods include finite differences, linear interpolation, Gaussian smoothing, and smoothing on a sphere. The methods differ in the number of functions sampled, the choice of the sample points, and the way in which the gradient approximations are derived. For each method, we derive bounds on the number of samples and the sampling radius which guarantee favorable convergence properties for a line search or fixed step size descent method. To this end, we use the results in Berahas et al. (Global convergence rate analysis of a generic line search algorithm with noise, arXiv:1910.04055, 2019) and show how each method can satisfy the sufficient conditions, possibly only with some sufficiently large probability at each iteration, as happens to be the case with Gaussian smoothing and smoothing on a sphere. Finally, we present numerical results evaluating the quality of the gradient approximations as well as their performance in conjunction with a line search derivative-free optimization algorithm.
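Two of the function-value-only estimators named in this abstract, forward finite differences and Gaussian smoothing, can be sketched as below. This is a generic textbook-style illustration, not the paper's code: the function names, the noise-free quadratic test function, and the values of `h`, `sigma`, and `n_samples` are assumptions.

```python
import numpy as np


def forward_difference_gradient(f, x, h):
    """Forward finite differences: dim + 1 function evaluations."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h                                # perturb one coordinate at a time
        grad[i] = (f(x + step) - fx) / h
    return grad


def gaussian_smoothing_gradient(f, x, sigma, n_samples, rng):
    """Average of directional differences along Gaussian directions."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    directions = rng.standard_normal((n_samples, x.size))
    diffs = np.array([f(x + sigma * u) - fx for u in directions])
    return (diffs[:, None] * directions).mean(axis=0) / sigma


def quadratic(x):
    # Toy objective (assumption); its exact gradient at x is x itself.
    return 0.5 * float(np.dot(x, x))


rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])
fd_grad = forward_difference_gradient(quadratic, x0, h=1e-4)
gs_grad = gaussian_smoothing_gradient(quadratic, x0, sigma=1e-2, n_samples=20000, rng=rng)
print(fd_grad, gs_grad)
```

The finite-difference estimate is deterministic for a noise-free function, while the Gaussian smoothing estimate is random and only accurate with high probability for a given sample count, which matches the abstract's remark about conditions holding "with some sufficiently large probability".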
-
We have developed a differentiable programming framework for truncated hierarchical B-splines (THB-splines), which can be used for several applications in geometry modeling, such as surface fitting and deformable image registration, and can be easily integrated with geometric deep learning frameworks. Differentiable programming is a novel paradigm that enables an algorithm to be differentiated via automatic differentiation, i.e., using automatic differentiation to compute the derivatives of its outputs with respect to its inputs or parameters. Differentiable programming has been used extensively in machine learning for obtaining gradients required in optimization algorithms such as stochastic gradient descent (SGD). While incorporating differentiable programming with traditional functions is straightforward, it is challenging when the functions are complex, such as splines. In this work, we extend the differentiable programming paradigm to THB-splines. THB-splines offer an efficient approach for complex surface fitting by utilizing a hierarchical tensor structure of B-splines, enabling local adaptive refinement. However, this approach brings challenges, such as a larger computational overhead and the non-trivial implementation of automatic differentiation and parallel evaluation algorithms. We use custom kernel functions for GPU acceleration in forward and backward evaluation that are necessary for differentiable programming of THB-splines. Our approach not only improves computational efficiency but also significantly enhances the speed of surface evaluation compared to previous methods. Our differentiable THB-splines framework facilitates faster and more accurate surface modeling with local refinement, with several applications in CAD and isogeometric analysis.
-
Supervised matrix factorization (SMF) is a classical machine learning method that performs low-dimensional feature extraction and classification at the same time. Training an SMF model involves solving a non-convex and factor-wise constrained optimization problem with at least three blocks of parameters. Due to the high non-convexity and constraints, theoretical understanding of the optimization landscape of SMF has been limited. In this paper, we provide an extensive local landscape analysis for SMF and derive several theoretical and practical applications. Analyzing diagonal blocks of the Hessian naturally leads to a block coordinate descent (BCD) algorithm with adaptive step sizes. We provide global convergence and iteration complexity guarantees for this algorithm. Full Hessian analysis gives minimum L2-regularization to guarantee local strong convexity and robustness of parameters. We establish a local estimation guarantee under a statistical SMF model. We also propose a novel GPU-friendly neural implementation of the BCD algorithm and validate our theoretical findings through numerical experiments. Our work contributes to a deeper understanding of SMF optimization, offering insights into the optimization landscape and providing practical solutions to enhance its performance.