Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels
A quantum neural network (QNN) is a parameterized mapping efficiently implementable on near-term Noisy Intermediate-Scale Quantum (NISQ) computers. It can be used for supervised learning when combined with classical gradient-based optimizers. Despite the existing empirical and theoretical investigations, the convergence of QNN training is not fully understood. Inspired by the success of neural tangent kernels (NTKs) in probing the dynamics of classical neural networks, a recent line of work proposes to study overparameterized QNNs by examining a quantum version of tangent kernels. In this work, we study the dynamics of QNNs and show that, contrary to popular belief, it is qualitatively different from that of any kernel regression: due to the unitarity of quantum operations, there is a non-negligible deviation from the tangent kernel regression derived at the random initialization. As a result of the deviation, we prove at-most-sublinear convergence for QNNs with Pauli measurements, which is beyond the explanatory power of any kernel regression dynamics. We then present the actual dynamics of QNNs in the limit of overparameterization. The new dynamics capture the change of convergence rate during training, and imply that the range of measurements is crucial to fast QNN convergence.
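As a concrete illustration of the tangent-kernel viewpoint, here is a minimal sketch, in plain Python/NumPy, of the empirical tangent kernel of a toy single-qubit QNN at random initialization. The circuit layout, data encoding, and Pauli-Z readout are illustrative assumptions, not the paper's construction; the kernel entries are inner products of parameter gradients, estimated here by finite differences.

# A minimal sketch (assumed toy circuit, not the paper's construction):
# the empirical tangent kernel K_ij = <grad f(x_i), grad f(x_j)> of a
# single-qubit QNN at random initialization, via finite differences.
import numpy as np

rng = np.random.default_rng(0)

def ry(t):  # single-qubit Y-rotation
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def rz(t):  # single-qubit Z-rotation
    return np.array([[np.exp(-1j * t / 2), 0],
                     [0, np.exp(1j * t / 2)]])

Z = np.diag([1.0, -1.0])  # Pauli-Z observable

def qnn(x, theta):
    """f(x; theta) = <psi|Z|psi> for a layered RY/RZ circuit on input x."""
    state = ry(x) @ np.array([1.0, 0.0])       # encode the scalar input
    for a, b in theta.reshape(-1, 2):          # trainable layers
        state = ry(a) @ (rz(b) @ state)
    return float(np.real(np.conj(state) @ Z @ state))

def tangent_kernel(xs, theta, eps=1e-6):
    """Gram matrix of parameter gradients, by central finite differences."""
    grads = []
    for x in xs:
        g = np.zeros_like(theta)
        for k in range(theta.size):
            d = np.zeros_like(theta)
            d[k] = eps
            g[k] = (qnn(x, theta + d) - qnn(x, theta - d)) / (2 * eps)
        grads.append(g)
    J = np.stack(grads)
    return J @ J.T

xs = np.linspace(0.0, np.pi, 4)
theta0 = rng.uniform(0, 2 * np.pi, size=8)      # 4 layers, 2 params each
print(np.round(tangent_kernel(xs, theta0), 4))  # kernel at initialization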
Award ID(s): 1942837
NSF-PAR ID: 10425244
Journal Name: Proceedings of the 40th International Conference on Machine Learning
Sponsoring Org: National Science Foundation
More Like this


Quantum Neural Networks (QNNs), or the so-called variational quantum circuits, are important quantum applications both because of their similar promises as classical neural networks and because of the feasibility of their implementation on near-term intermediate-scale noisy quantum (NISQ) machines. However, the training task of QNNs is challenging and much less understood. We conduct a quantitative investigation on the landscape of loss functions of QNNs and identify a class of simple yet extremely hard QNN instances for training. Specifically, we show that for typical underparameterized QNNs, there exists a dataset that induces a loss function with the number of spurious local minima depending exponentially on the number of parameters. Moreover, we show the optimality of our construction by providing an almost matching upper bound on such dependence. While local minima in classical neural networks are due to nonlinear activations, in quantum neural networks local minima appear as a result of the quantum interference phenomenon. Finally, we empirically confirm that our constructions can indeed be hard instances in practice with typical gradient-based optimizers, which demonstrates the practical value of our findings.
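To make the landscape claim concrete, here is a minimal sketch that scans the loss of a one-parameter, single-qubit QNN over a grid and counts interior local minima. The dataset and cosine-shaped readout are toy assumptions, not the paper's hard-instance construction, but they show how interfering trigonometric terms produce multiple minima.

# A minimal sketch (toy data, not the paper's construction): the MSE
# loss of a one-parameter QNN with Pauli-Z readout is a trigonometric
# polynomial in theta, which can have spurious local minima.
import numpy as np

xs = np.array([0.0, 1.3, 2.1, 4.0])      # assumed toy inputs (angles)
ys = np.array([1.0, -0.5, 0.8, -0.9])    # assumed toy labels

def loss(theta):
    # One RY(theta) after an RY(x) encoding gives <Z> = cos(theta + x).
    preds = np.cos(theta + xs)
    return np.mean((preds - ys) ** 2)

grid = np.linspace(0, 2 * np.pi, 2000)
vals = np.array([loss(t) for t in grid])
# Interior local minima: grid points lower than both neighbors.
mins = [grid[i] for i in range(1, len(grid) - 1)
        if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]
print(f"{len(mins)} local minima at theta ≈ {np.round(mins, 3)}")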

We propose a circuit-level backdoor attack, QTrojan, against Quantum Neural Networks (QNNs) in this paper. QTrojan is implemented by a few quantum gates inserted into the variational quantum circuit of the victim QNN. QTrojan is much stealthier than a prior data-poisoning-based backdoor attack (DPBA), since it does not embed any trigger in the inputs of the victim QNN or require access to the original training datasets. Compared to a DPBA, QTrojan improves the clean-data accuracy by 21% and the attack success rate by 19.9%.
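The following sketch only illustrates the general idea of circuit-level gate insertion; QTrojan's actual gate placement and trigger mechanism are specified in the paper. Here a hypothetical extra RX gate spliced into a toy victim circuit is inert when its angle is zero and flips the measured output when armed.

# A minimal sketch (illustrative only, not QTrojan itself): an inserted
# RX gate leaves a toy victim circuit unchanged at angle 0 and changes
# its measured expectation value when "armed".
import numpy as np

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def rx(t):
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

Z = np.diag([1.0, -1.0])  # Pauli-Z observable

def circuit(x, theta, trojan_angle=0.0):
    """Victim circuit with an optional inserted RX gate."""
    state = ry(x) @ np.array([1.0, 0.0])    # input encoding
    state = rx(trojan_angle) @ state        # inserted gate (identity at 0)
    state = ry(theta) @ state               # trained layer
    return float(np.real(np.conj(state) @ Z @ state))

x, theta = 0.4, 1.1
print("clean:", round(circuit(x, theta), 4))
print("armed:", round(circuit(x, theta, trojan_angle=np.pi), 4))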

While overparameterization is widely believed to be crucial for the success of optimization for neural networks, most existing theories on overparameterization do not fully explain the reason: they either work in the Neural Tangent Kernel regime, where neurons don't move much, or require an enormous number of neurons. In practice, when the data is generated using a teacher neural network, even mildly overparameterized neural networks can achieve 0 loss and recover the directions of teacher neurons. In this paper we develop a local convergence theory for mildly overparameterized two-layer neural networks. We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an overparameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least as large as the number of teacher neurons, and our convergence rate is independent of the number of student neurons. A key component of our analysis is a new characterization of the local optimization landscape: we show the gradient satisfies a special case of the Łojasiewicz property, which is different from the local strong convexity or PL conditions used in previous work.
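A minimal sketch of the teacher-student setting described above, under assumed dimensions, widths, and learning rate (not the paper's exact model): gradient descent on a mildly overparameterized two-layer ReLU student drives the loss down, and student neurons align with teacher directions.

# A minimal sketch (assumed setup): fit a 6-neuron ReLU student to data
# generated by a 3-neuron ReLU teacher, with fixed unit output weights.
import numpy as np

rng = np.random.default_rng(1)
d, m_teacher, m_student, n = 5, 3, 6, 512

W_t = rng.normal(size=(m_teacher, d))
W_t /= np.linalg.norm(W_t, axis=1, keepdims=True)  # unit teacher neurons

X = rng.normal(size=(n, d))
y = np.maximum(X @ W_t.T, 0.0).sum(axis=1)         # teacher: sum of ReLUs

W = rng.normal(size=(m_student, d)) * 0.5          # student initialization
lr = 0.05
for step in range(3000):
    pre = X @ W.T                                  # (n, m_student)
    pred = np.maximum(pre, 0.0).sum(axis=1)
    err = pred - y
    grad = ((pre > 0) * err[:, None]).T @ X / n    # dL/dW, L = mean(err^2)/2
    W -= lr * grad
    if step % 1000 == 0:
        print(f"step {step}: loss {np.mean(err ** 2) / 2:.5f}")

# Cosine similarity of each student neuron to its closest teacher neuron.
cos = (W / np.linalg.norm(W, axis=1, keepdims=True)) @ W_t.T
print("max alignment per student neuron:", np.round(cos.max(axis=1), 3))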

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs), structured networks with MLP modules, have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific nonlinearities in the architecture or features. Our theoretical analysis builds on a connection of overparameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.
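The linear-extrapolation behavior of ReLU MLPs is easy to observe numerically. Below is a minimal sketch with an assumed architecture and training budget: a small ReLU MLP fit to a quadratic on [-1, 1] has near-vanishing second differences far outside the training range, i.e., near-linear tails.

# A minimal sketch (assumed setup): train a one-hidden-layer ReLU MLP on
# y = x^2 over [-1, 1], then check that it is near-linear far away.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(256, 1))
y = X[:, 0] ** 2                                   # nonlinear target

h = 64
W1 = rng.normal(size=(1, h)); b1 = np.zeros(h)
W2 = rng.normal(size=(h, 1)) * 0.1; b2 = np.zeros(1)

lr = 0.1
for _ in range(5000):
    a = np.maximum(X @ W1 + b1, 0.0)               # hidden activations
    pred = (a @ W2 + b2)[:, 0]
    err = (pred - y) / len(X)                      # dL/dpred, L = mean sq / 2
    gW2 = a.T @ err[:, None]; gb2 = err.sum(keepdims=True)
    da = err[:, None] @ W2.T * (a > 0)
    gW1 = X.T @ da; gb1 = da.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

def f(x):
    return (np.maximum(x[:, None] @ W1 + b1, 0.0) @ W2 + b2)[:, 0]

# Second differences vanish far from the training support -> linear tails.
for lo, hi in [(-1, 1), (5, 7), (-7, -5)]:
    xs = np.linspace(lo, hi, 5)
    second_diff = np.abs(np.diff(f(xs), n=2)).max()
    print(f"[{lo}, {hi}]: max |second difference| = {second_diff:.4f}")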