skip to main content


Title: Efficient 2D tensor network simulation of quantum systems
Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States (PEPS) are well-suited for key classes of physical systems and quantum circuits. However, direct contraction of PEPS networks has exponential cost, while approximate algorithms require computations with large tensors. We propose new scalable algorithms and software abstractions for PEPS-based methods, accelerating the bottleneck operation of contraction and refactorization of a tensor subnetwork. We employ randomized SVD with an implicit matrix to reduce cost and memory footprint asymptotically. Further, we develop a distributed-memory PEPS library and study accuracy and efficiency of alternative algorithms for PEPS contraction and evolution on the Stampede2 supercomputer. We also simulate a popular near-term quantum algorithm, the Variational Quantum Eigensolver (VQE), and benchmark Imaginary Time Evolution (ITE), which compute ground states of Hamiltonians.  more » « less
Award ID(s):
1839204
NSF-PAR ID:
10298175
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Contracting tensor networks is often computationally demanding. Well-designed contraction sequences can dramatically reduce the contraction cost. We explore the performance of simulated annealing and genetic algorithms, two common discrete optimization techniques, to this ordering problem. We benchmark their performance as well as that of the commonly-used greedy search on physically relevant tensor networks. Where computationally feasible, we also compare them with the optimal contraction sequence obtained by an exhaustive search. Furthermore, we present a systematic comparison with state-of-the-art tree decomposition and graph partitioning algorithms in the context of random regular graph tensor networks. We find that the algorithms we consider consistently outperform a greedy search given equal computational resources, with an advantage that scales with tensor network size. We compare the obtained contraction sequences and identify signs of highly non-local optimization, with the more sophisticated algorithms sacrificing run-time early in the contraction for better overall performance.

     
    more » « less
  2. Tensor contractions are ubiquitous in computational chemistry andphysics, where tensors generally represent states or operators andcontractions express the algebra of these quantities. In this context,the states and operators often preserve physical conservation laws,which are manifested as group symmetries in the tensors. These groupsymmetries imply that each tensor has block sparsity and can be storedin a reduced form. For nontrivial contractions, the memory footprint andcost are lowered, respectively, by a linear and a quadratic factor inthe number of symmetry sectors. State-of-the-art tensor contractionsoftware libraries exploit this opportunity by iterating over blocks orusing general block-sparse tensor representations. Both approachesentail overhead in performance and code complexity. With intuition aidedby tensor diagrams, we present a technique, irreducible representationalignment, which enables efficient handling of Abelian group symmetriesvia only dense tensors, by using contraction-specific reduced forms.This technique yields a general algorithm for arbitrary group symmetriccontractions, which we implement in Python and apply to a variety ofrepresentative contractions from quantum chemistry and tensor networkmethods. As a consequence of relying on only dense tensor contractions,we can easily make use of efficient batched matrix multiplication viaIntel’s MKL and distributed tensor contraction via the Cyclops library,achieving good efficiency and parallel scalability on up to 4096 KnightsLanding cores of a supercomputer.

     
    more » « less
  3. null (Ed.)
    Due to the unreliability and limited capacity of existing quantum computer prototypes, quantum circuit simulation continues to be a vital tool for validating next generation quantum computers and for studying variational quantum algorithms, which are among the leading candidates for useful quantum computation. Existing quantum circuit simulators do not address the common traits of variational algorithms, namely: 1) their ability to work with noisy qubits and operations, 2) their repeated execution of the same circuits but with different parameters, and 3) the fact that they sample from circuit final wavefunctions to drive a classical optimization routine. We present a quantum circuit simulation toolchain based on logical abstractions targeted for simulating variational algorithms. Our proposed toolchain encodes quantum amplitudes and noise probabilities in a probabilistic graphical model, and it compiles the circuits to logical formulas that support efficient repeated simulation of and sampling from quantum circuits for different parameters. Compared to state-of-the-art state vector and density matrix quantum circuit simulators, our simulation approach offers greater performance when sampling from noisy circuits with at least eight to 20 qubits and with around 12 operations on each qubit, making the approach ideal for simulating near-term variational quantum algorithms. And for simulating noise-free shallow quantum circuits with 32 qubits, our simulation approach offers a 66X reduction in sampling cost versus quantum circuit simulation techniques based on tensor network contraction. 
    more » « less
  4. Quantum many-body systems involving bosonic modes or gauge fields have infinite-dimensional local Hilbert spaces which must be truncated to perform simulations of real-time dynamics on classical or quantum computers. To analyze the truncation error, we develop methods for bounding the rate of growth of local quantum numbers such as the occupation number of a mode at a lattice site, or the electric field at a lattice link. Our approach applies to various models of bosons interacting with spins or fermions, and also to both abelian and non-abelian gauge theories. We show that if states in these models are truncated by imposing an upper limit Λ on each local quantum number, and if the initial state has low local quantum numbers, then an error at most ϵ can be achieved by choosing Λ to scale polylogarithmically with ϵ − 1 , an exponential improvement over previous bounds based on energy conservation. For the Hubbard-Holstein model, we numerically compute a bound on Λ that achieves accuracy ϵ , obtaining significantly improved estimates in various parameter regimes. We also establish a criterion for truncating the Hamiltonian with a provable guarantee on the accuracy of time evolution. Building on that result, we formulate quantum algorithms for dynamical simulation of lattice gauge theories and of models with bosonic modes; the gate complexity depends almost linearly on spacetime volume in the former case, and almost quadratically on time in the latter case. We establish a lower bound showing that there are systems involving bosons for which this quadratic scaling with time cannot be improved. By applying our result on the truncation error in time evolution, we also prove that spectrally isolated energy eigenstates can be approximated with accuracy ϵ by truncating local quantum numbers at Λ = polylog ( ϵ − 1 ) . 
    more » « less
  5. Calculation of many-body correlation functions is one of the critical kernels utilized in many scientific computing areas, especially in Lattice Quantum Chromodynamics (Lattice QCD). It is formalized as a sum of a large number of contraction terms each of which can be represented by a graph consisting of vertices describing quarks inside a hadron node and edges designating quark propagations at specific time intervals. Due to its computation- and memory-intensive nature, real-world physics systems (e.g., multi-meson or multi-baryon systems) explored by Lattice QCD prefer to leverage multi-GPUs. Different from general graph processing, many-body correlation function calculations show two specific features: a large number of computation-/data-intensive kernels and frequently repeated appearances of original and intermediate data. The former results in expensive memory operations such as tensor movements and evictions. The latter offers data reuse opportunities to mitigate the data-intensive nature of many-body correlation function calculations. However, existing graph-based multi-GPU schedulers cannot capture these data-centric features, thus resulting in a sub-optimal performance for many-body correlation function calculations. To address this issue, this paper presents a multi-GPU scheduling framework, MICCO, to accelerate contractions for correlation functions particularly by taking the data dimension (e.g., data reuse and data eviction) into account. This work first performs a comprehensive study on the interplay of data reuse and load balance, and designs two new concepts: local reuse pattern and reuse bound to study the opportunity of achieving the optimal trade-off between them. Based on this study, MICCO proposes a heuristic scheduling algorithm and a machine-learning-based regression model to generate the optimal setting of reuse bounds. Specifically, MICCO is integrated into a real-world Lattice QCD system, Redstar, for the first time running on multiple GPUs. The evaluation demonstrates MICCO outperforms other state-of-art works, achieving up to 2.25× speedup in synthesized datasets, and 1.49× speedup in real-world correlation functions. 
    more » « less