<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Enhancing variational Monte Carlo simulations using a programmable quantum simulator</title></titleStmt>
			<publicationStmt>
				<publisher>Physical Review A</publisher>
				<date>03/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10512797</idno>
					<idno type="doi">10.1103/PhysRevA.109.032410</idno>
					<title level='j'>Physical Review A</title>
<idno>2469-9926</idno>
<biblScope unit="volume">109</biblScope>
<biblScope unit="issue">3</biblScope>					

					<author>M Schuyler Moss</author><author>Sepehr Ebadi</author><author>Tout T Wang</author><author>Giulia Semeghini</author><author>Annabelle Bohrdt</author><author>Mikhail D Lukin</author><author>Roger G Melko</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Programmable quantum simulators based on Rydberg atom arrays are a fast-emerging quantum platform, bringing together long coherence times, high-fidelity operations, and large numbers of interacting qubits deterministically arranged in flexible geometries. Today's Rydberg array devices are demonstrating their utility as quantum simulators for studying phases and phase transitions in quantum matter. In this paper, we show that unprocessed and imperfect experimental projective measurement data can be used to enhance in silico simulations of quantum matter, by improving the performance of variational Monte Carlo simulations. As an example, we focus on data spanning the disordered-to-checkerboard transition in a 16 &#215; 16 square-lattice array [S. <ref type="bibr">Ebadi et al.</ref>, Nature (London) 595, 227 (2021)] and employ the data-enhanced variational Monte Carlo algorithm to train powerful autoregressive wave-function ans&#228;tze based on recurrent neural networks (RNNs). We observe universal improvements in the convergence times of our simulations with this hybrid training scheme. Notably, we also find that pretraining with experimental data enables relatively simple RNN ans&#228;tze to accurately capture phases of matter that are not learned with a purely variational training approach. Our work highlights the promise of hybrid quantum-classical approaches for large-scale simulation of quantum many-body systems, combining autoregressive language models with experimental data from existing quantum devices. DOI: 10.1103/PhysRevA.109.032410</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>A central challenge for the current generation of quantum computers and simulators is determining whether their outputs can provide value for problems of practical interest. This is particularly urgent for devices that have sufficiently large numbers of qubits, long coherence times, and high connectivity, such that their outputs cannot be simulated exactly using classical methods <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref>. Programmable quantum devices made of arrays of Rydberg atoms are no exception, with rapid scaling progress being achieved by several groups in academia and industry <ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref><ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">[13]</ref>. The large number of qubits in the latest experimental Rydberg arrays have allowed these systems to flourish as quantum simulators of quantum matter, with numerous impressive demonstrations preparing exotic phases and studying their phase transitions <ref type="bibr">[9,</ref><ref type="bibr">[14]</ref><ref type="bibr">[15]</ref><ref type="bibr">[16]</ref> and dynamics <ref type="bibr">[6,</ref><ref type="bibr">17]</ref>. It is intriguing to inquire if such devices can be used to accelerate solutions to practical simulation problems despite existing limitations to system size, simulation time, decoherence, and data acquisition rate.</p><p>Even for today's best quantum computing platforms, determining properties of the quantum state is fundamentally hampered by the intractability of traditional tomography. Namely, since the state space grows exponentially, a full * msmoss@uwaterloo.ca tomographic reconstruction will always be data limited, making it impossible for any platform with a large number of qubits. One useful strategy that has recently emerged for extracting value from a limited amount of quantum measurement data is generative modeling <ref type="bibr">[18]</ref>. Generative neural networks have many desirable properties that make them especially suitable as ansatz quantum states for learning tasks in quantum many-body physics. For example, they are flexible architectures that can be readily extended to represent higher-dimensional systems, they can be trained with efficient heuristics, and improvements can be systematically explored through hyperparameter tuning. Models such as restricted Boltzmann machines (RBMs) <ref type="bibr">[19,</ref><ref type="bibr">20]</ref> and recurrent neural networks (RNNs) <ref type="bibr">[21]</ref><ref type="bibr">[22]</ref><ref type="bibr">[23]</ref> have demonstrated successful wave-function reconstruction, although the quality of the reconstructed state is still highly dependent on the number of measurement samples available and the quality of the data. One recent RBM reconstruction of the ground state of a chain of only N = 8 Rydberg atoms used O( <ref type="formula">10</ref>3 ) samples, requiring some degree of error mitigation applied to the experimental data <ref type="bibr">[24]</ref>. As Rydberg array sizes continue to grow, the amount of data needed for an accurate reconstruction is also expected to grow, although a study of the sample complexity required for accurate wave-function reconstruction has not been performed for Rydberg array data to date. As an example, information from O(10 3 ) measurements would not be sufficient to reconstruct the full quantum state for system sizes of N 8 at least in the described manner; however, these data can still provide value in the training of generative models.</p><p>In this work, we demonstrate how limited and imperfect experimental data from a Rydberg atom array can be used without preprocessing or error mitigation to help in the variational training of generative models, specifically recurrent neural networks (RNNs), to accurately capture ground-state wave functions at many points across a phase transition. We use a hybrid training scheme introduced as data-enhanced variational Monte Carlo simulations <ref type="bibr">[22]</ref> (or, neural error mitigation <ref type="bibr">[25]</ref>), that is both data driven and Hamiltonian driven (i.e., variational). Figure <ref type="figure">1</ref> shows a schematic outlining this training procedure. We find pretraining our RNN ans&#228;tze with the experimental data results in a robust improvement in the time to solution for the subsequent variational optimization. In addition to these speedups in convergence time, in some cases, data enhancement improves the accuracy of the quantum state learned by the model. Once trained, an RNN wave function acts as a generative model, generating new wave-function configurations with tractable autoregressive sampling, allowing for the calculation of many observables of interest, including those that cannot be calculated directly from projective measurement data. We provide results for estimators of the energy, the order parameter characterizing the phase transition, and an off-diagonal operator, confirming that our trained models capture characteristic features of the target ground states. Our results indicate that generative autoregressive models designed for use as parametrized wave functions, such as RNNs, are well suited to take advantage of raw unprocessed measurement data from today's quantum devices, enabling a potentially powerful hybrid approach for quantum simulators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PROGRAMMABLE RYDBERG ATOM ARRAYS</head><p>As a concrete example, we focus on experimental work by Ebadi et al. <ref type="bibr">[15]</ref>, where two-dimensional (2D) defect-free arrays containing hundreds of 87 Rb atoms were studied. These arrays are assembled into an L &#215; L square lattice using a combination of static and movable optical tweezers. The qubits in the array are encoded in the ground |g and the highly excited Rydberg state |r , and the coherent evolution of the array is governed by the Rydberg Hamiltonian which is parametrized by the Rabi frequency (t ), detuning &#948;(t ), and long-range interactions V i j :</p><p>where N = L 2 is the number of qubits, &#963; x i = |g i r| i + |r i g| i , ni = |r i r| i , and V i j = R 6  b /|r ir j | 6 . The Rydberg blockade mechanism <ref type="bibr">[26]</ref>, which penalizes the simultaneous excitation of atoms within a blockade radius R b &#8801; (V 0 / ) 1/6 , can effectively be controlled by adjusting the lattice spacing a of the atom array. Thus, by tuning the various parameters and programming the geometry of the array, a rich phase diagram of quantum states can be accessed using these devices.</p><p>After evolution under this Hamiltonian, the state of each atomic qubit is read out with a projective measurement where atoms in |g are imaged, while atoms in |r are ejected from the system. This measurement procedure yields a set of binary data &#963; = {n i } N i=1 , where n i = 0, 1, that represents one projective measurement of the many-body quantum state, which we refer to as a "shot." Because the Hamiltonian describing this system can be made stoquastic in the Rydberg occupation basis, projective measurements taken in this basis provide complete information about the positive, real-valued amplitudes of the ground-state wave function.</p><p>Making use of this protocol, Ebadi et al. realized and explored a number of phases and phase transitions detected on a square 16 &#215; 16 Rydberg atom array <ref type="bibr">[15]</ref>. Here, we study the most well understood of these, the disordered and Z 2 checkerboard phases, including the quantum phase transition between them that lies in the (2 + 1)-dimensional [(2 + 1)D] Wilson-Fisher (WF) universality class. For a given R b /a, a state of interest is prepared by linearly increasing &#948; at a given sweep rate s and at fixed , from an initial large negative value with all atoms in |g , and stopping at the desired value of &#948;. We consider the slowest sweep rate, s = 15 MHz/&#956;s, for which we have the largest number of measurements (1000 per &#948;) and the state preparation is most adiabatic. Each time a state is prepared, a single projective measurement is captured, requiring repeated state preparations and measurements for each final detuning value. This raw experimental measurement data obtained for a range of detuning values that spans the disordered-to-checkerboard transition is used in all numerical experiments herein. For a more detailed list of the experimental parameters, see Appendix E.</p><p>For these experiments, measurements are of high quality, with readout fidelities of 99% for both |g and |r . However, it is worth emphasizing that the states prepared on these devices are low-energy states and not necessarily the many-body ground states. As a result, measurement shots in the data set, especially those in the ordered phase, show energy fluctuations such as domain walls separating regions of opposite ordering, which are artifacts of nonadiabatic state preparation, and single-particle errors due to readout and quantum fluctuations (see the shot in Fig. <ref type="figure">1</ref>). Nevertheless, the measurement data are capable of providing estimators of correlation functions and susceptibilities accurate enough to extract values for the critical exponents at the WF critical point that are in close agreement with known universal results <ref type="bibr">[15,</ref><ref type="bibr">27]</ref>. A detailed investigation into how the imperfections in the data manifest in the training of RNN wave functions is discussed in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. RECURRENT NEURAL NETWORK WAVE FUNCTIONS</head><p>In this work, we employ RNNs to encode ground-state wave functions <ref type="bibr">[28]</ref>. RNNs are universal function approximators, meaning they can be made expressive enough to encode any arbitrarily complex function by increasing the number of tunable parameters (or weights W) in the network <ref type="bibr">[29]</ref>. Additionally, RNNs belong to a class of generative models known as autoregressive neural networks. Instead of directly encoding a target joint distribution p(x), these models learn conditional probability distributions over the N individual variables p(x i |x i-1 , x i-2 . . . x 1 ). The chain rule of probabilities allows the overall joint distribution to be decomposed into a sequence of such conditionals,</p><p>This autoregressive property gives rise to a number of desirable features. For instance, the overall normalization of the joint probability distribution is guaranteed by the decomposition into normalized conditionals. This normalization means the joint distribution can directly produce independent samples, bypassing the need for Markov chain Monte Carlo sampling. The ability to efficiently obtain independent samples is particularly crucial during variational optimization. The elementary building block of an RNN is a recurrent cell, denoted by the green boxes in Fig. <ref type="figure">2</ref>. These cells represent a nonlinear transformation which involve the tunable parameters of the model W and act on the vectors of hidden units h, which are the main conduit through which FIG. <ref type="figure">2</ref>. (a) A 1D RNN wave function, where each RNN cell (green boxes) takes the hidden vector h i-1 and vector containing the state of the previous qubit &#963; i-1 and performs a nonlinear transformation involving the tunable parameters W, producing a new hidden vector h i and the probability over the current qubit conditioned on all previously sampled qubits p W (&#963; i |&#963; &lt;i ), from which the state of the current qubit &#963; i can be sampled. The sampling path (red arrows) exactly follows the path in which hidden vectors and state vectors are passed (black arrows). (b) A 2D RNN wave function, where each RNN cell (green boxes) performs a nonlinear transformation on the hidden vectors and state vectors from nearest-neighbor RNN cells, which correspond to neighboring qubits. The transformation produces the new hidden vector and conditional probability. Note that while the information is passed in two directions (black arrows), the sampling path is one dimensional (red arrows) as required by the autoregressive property.</p><p>information is passed in the model. In the one-dimensional case, this transformation maps an incoming vector of hidden units h i-1 , with dimension N h , and an incoming vector &#963; i-1 , which encodes the state of the previous variable, to an output vector of hidden units h i , also with dimension N h . From the new hidden vector, which has been updated to include information about the newly sampled variable via the nonlinear transformation, one can obtain a probability distribution over possible states for the present variable, conditioned on all previously sampled variables p(&#963; i |&#963; i&lt; ). This distribution can be sampled for the state of the present variable &#963; i , which is then passed on to the next recurrent cell. This process is naturally represented as a sequence, where &#963; i-1 is sampled first and used to calculate p(&#963; i |&#963; i&lt; ), from which &#963; i can be sampled.</p><p>The sequence ordering, which is indicated by the red arrows in Fig. <ref type="figure">2</ref>, ensures that the autoregressive property is obeyed. Figure <ref type="figure">2</ref> also demonstrates how RNNs can be made multidimensional by allowing the recurrent cell to take multiple inputs, which facilitates information passing in more than one direction. It is important, however, that the sampling path (red arrows) remains strictly one dimensional, so that the autoregressive property is not violated. This extension of the RNN is especially beneficial for data or systems with multidimensional structure, as demonstrated by our results. We choose a gated recurrent unit (GRU) as our recurrent cell <ref type="bibr">[30]</ref> for the one-dimensional (1D) RNN wave function or the higher-dimension tensorized-GRU <ref type="bibr">[31]</ref> for the 2D RNN wave function.</p><p>In the context of using RNNs as wave functions, the target distribution is the probability density function given by the squared amplitudes of the ground-state wave function, which we refer to as the Born distribution. Applying the chain rule of probabilities, the Born distribution can be exactly decomposed into conditional probability distributions over individual qubits,</p><p>Each recurrent cell outputs a a hidden vector which is used to compute a conditional probability over a single qubit conditioned on the previously sampled qubits. Therefore, in the case of positive real-valued wave functions, such as the ground state of Eq. ( <ref type="formula">1</ref>), the distribution encoded in the RNN represents the full quantum state (&#963;) = &#963;| = &#8730; p RNN (&#963;; W ). Further modifications can be used to reconstruct wave functions with complex amplitudes <ref type="bibr">[28]</ref>. Additionally, modifications can be made such that each recurrent cell corresponds to a "unit cell" of qubits, rather than a single qubit, enabling the study of non-Bravais lattices <ref type="bibr">[32]</ref>.</p><p>These models are trained by optimizing the parameters in this network W such that a specified loss function is minimized. More discussion on the loss function(s) and optimization procedures follow.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. THE DATA-ENHANCED VARIATIONAL MONTE CARLO ALGORITHM</head><p>We follow the recently introduced training procedure <ref type="bibr">[25]</ref> which we refer to as the data-enhanced variational Monte Carlo algorithm <ref type="bibr">[22]</ref>. This hybrid approach consists of two distinct phases of the training (see Fig. <ref type="figure">1</ref>). First, our RNN wave function is trained in a data-driven setting, where we make use of projective measurements of the experimental Rydberg atom array. The remainder of the training is carried out in a Hamiltonian-driven setting, where we employ variational optimization common to machine-learning-based variational Monte Carlo simulations. The only difference between these two settings is the loss function governing the optimization.</p><p>During data-driven training, we train our RNN wave function in an unsupervised manner to learn the probability distribution over spin configurations that is contained in the data set D. We use the Kullback-Leibler divergence as our loss function,</p><p>where we have introduced the entropy of the data set S D = -{&#963;} p D (&#963;)ln p D (&#963;) and approximated p D with the empirical distribution, which reduces the sum over all configurations {&#963;} to the sum over the data in D. It is clear from Eq. ( <ref type="formula">4</ref>) that the minimum of L DKL occurs when the two probability distributions, p D (&#963;) and p RNN (&#963;; W ), are equal. This is achieved in practice by determining the parameters W of our RNN wave function that minimize this loss, which we approximate with Eq. ( <ref type="formula">5</ref>). With the availability of large (informationally complete) amounts of data, the empirical distribution is a good approximation to the Born distribution and the KL loss could be employed on its own to reliably reconstruct a quantum state of interest <ref type="bibr">[18]</ref>. We focus on the scenario where one has access to only a limited amount of data and full tomography or even data-driven quantum state reconstruction is out of reach. Thus, we treat the data-driven training as a type of pretraining which improves the performance of the subsequent variational optimization.</p><p>In the Hamiltonian-driven setting, the same RNN wave function can be trained to represent the ground state of the quantum system without making use of data, and instead using the Hamiltonian that is presumed to describe the system. Here, we use standard variational optimization in which the expectation value of the energy is minimized <ref type="bibr">[33]</ref>. This optimization technique can be used to target ground states because of the variational principle, which guarantees that the energy will not converge to a value below the true ground state of the Hamiltonian. We define the new loss function for variational optimization as</p><p>where we introduce the local energy</p><p>Here, we approximate the ground-state energy of our RNN wave function H RNN by averaging the local energy H loc evaluated on N s samples drawn from p RNN (&#963;; W ). This quantity is treated as our loss function and is minimized during Hamiltonian-driven training. A more detailed discussion of this approximation of H RNN and the local energy can be found in Appendix B. A recent work <ref type="bibr">[34]</ref> also explored incorporating independent samples, obtained using an independent quantum simulation of the target system, to enhance variational Monte Carlo. We emphasize that we employ two distinct loss functions to carry out the two phases of training. Alternatively, the method proposed by Montanaro et al. <ref type="bibr">[34]</ref> uses the Hamiltonian-driven loss function throughout the training, and uses the independent quantum samples to estimate the local energies in the initial stage of training. Interestingly, the results of the two methods are in strong agreement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. RESULTS</head><p>In this section, we demonstrate the success of the dataenhanced variational Monte Carlo algorithm as a method for training both one-dimensional (1D) and two-dimensional (2D) RNN wave functions. In what follows, we benchmark our results against data provided by quantum Monte Carlo (QMC) simulations <ref type="bibr">[35]</ref>. For Rydberg atom arrays described by the Hamiltonian (1), QMC is free of the sign problem, and can therefore provide ground-state observables that are exact to within statistical errors. Additionally, QMC can provide unbiased (but autocorrelated) samples of ground-state configurations in the Rydberg occupation basis, which we use to train RNN generative models in Appendix A.</p><p>In our main results, we consider the experimental data for the disordered-to-checkerboard phase transition from Ref. <ref type="bibr">[15]</ref> for a 16 &#215; 16 array of 87 Rb atoms. The experimental parameter governing this phase transition is the detuning &#948;; we report our results in terms of the dimensionless ratio &#948;/ . We consider 31 values of &#948;/ ranging from roughly -0.4 to 3.2 with the phase transition occurring at &#948; c / = 1.12(4) <ref type="bibr">[15]</ref>. In all of the following, we fix N h = 2L for the 1D RNN, which was previously shown to equip these models with sufficient expressive capabilities <ref type="bibr">[22]</ref>. For the 2D RNN, we set N h = L, keeping the total number of parameters in these two types of ans&#228;tze roughly equal. We initialize our RNN wave function using the methods introduced by Glorot et al. <ref type="bibr">[36]</ref>. Then, using the Adam optimizer <ref type="bibr">[37]</ref>, which is an extension to stochastic gradient descent with an adaptive learning rate, we optimize the tunable parameters W according to the specified loss function. Additional information about the experimental parameters and hyperparameters of our RNN wave functions can be found in Appendix E.</p><p>As discussed above and illustrated in Fig. <ref type="figure">1</ref>, training proceeds through two different phases: data driven then Hamiltonian driven, the difference being the loss function which is minimized. This hybrid training scheme involves simply switching between loss functions after a specified number of data-driven training steps given by t trans . Here, we fixed t trans = 1000 for our hybrid trained 1D RNN wave functions and t trans = 100 for our hybrid trained 2D RNN wave functions based on observations that after this many data-driven training steps, the respective models had learned from the measurement data without overfitting to it. Note, in this work we only explore one phase of data-driven training, followed by one phase of Hamiltonian-driven training. Other transition schedules are possible; however, we observe this order of training to give the best results, consistent with past studies using perfect synthetic data <ref type="bibr">[22]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Improving variational Monte Carlo simulations with data enhancement</head><p>To illustrate the behavior of our RNN wave functions during the training process, Fig. <ref type="figure">3</ref> shows the training curves for 1D and 2D RNN wave functions optimized to capture the ground state of our Rydberg system deep in the checkerboard phase (&#948;/ = 3.173). We show the results of data-enhanced variational Monte Carlo (VMC) simulations with a transition time t trans , and compare it to a purely Hamiltonian-driven approach (t trans = 0), as well as a purely data-driven approach (t trans = &#8734;). These training curves epitomize the success of data-enhanced VMC simulations for both the 1D and 2D RNN wave functions. For the 1D RNN, variational optimization struggles and the model is unable to capture the properties of the target ground state. The use of the experimental data to pretrain the model significantly improves the overall performance, as seen through the converging ground-state energy estimate H RNN . For the 2D RNN, the models trained purely variationally and with data-enhanced VMC simulations reach comparable and accurate ground-state energies, but the data enhancement provides a speedup in the overall time to convergence, as seen in the inset of Fig. <ref type="figure">3</ref>. We also point out the significant gap between the ground-state energy obtained from our QMC simulations and the energies achieved by both the 1D and 2D RNN wave functions trained with data alone. In Appendix A we show that this gap is at least partly due to both the limited and the imperfect nature of the experimental data. Figure <ref type="figure">3</ref> illustrates these training dynamics for the case where we are learning a ground state in the ordered checkerboard phase, but we observe that these trends also hold more broadly across the phase transition. Figures <ref type="figure">4(a</ref> Figure <ref type="figure">4</ref>(a) shows that, in some cases, the energies achieved by the 1D RNN are improved by over an order of magnitude when data-enhanced VMC simulations are used as the training procedure, particularly in the ordered checkerboard regime. A similar effect was not observed in Ref. <ref type="bibr">[22]</ref> because the method was tested at only one point in the phase diagram, close to a phase transition. The results here are consistent, in that the performance of variational optimization and data-enhanced VMC simulations are comparable in the disordered phase and around the phase transition, with data enhancement still providing a speedup in convergence time in these regimes. In addition to the accuracy improvements seen in the ground-state energy estimates in the ordered regime [Fig. <ref type="figure">4(a)</ref>], we find that the use of data allows the 1D RNN wave function to capture the 2D ordering of the checkerboard phase, which it is unable to do when trained using the Hamiltonian alone. This might be understood since the data set, which is a set of projective measurement outcomes that are exemplary of the ground state or states close to it, contains a more explicit representation of the correlations between atoms in the array. Using these data in the earliest epochs of training optimizes the model to directly reflect this information. This early-phase training seems to be crucial in moving the gradient descent algorithm into a smaller subspace that can more easily be optimized by later-phase variational training. More details and discussion can be found in Appendix D. Importantly, the variance of the variational energy, a measure that indicates how close a variational state is to an eigenvalue of the Hamiltonian, follows the same trend seen in Fig. <ref type="figure">4(a)</ref>. In the ordered phase, variances from simulations where our hybrid training method was employed are an order of magnitude smaller than when a traditional VMC approach is used. This indicates that the improved variational energies correspond to the model reaching a better variational state closer to the ground state of the Hamiltonian.</p><p>Figure <ref type="figure">4</ref>(b) shows that for the 2D RNN wave functions, the data enhancement consistently provides a significant speedup in time to convergence for varying points across the phase transition. For any t trans &gt; 0 the time to convergence is reduced, with the most significant speedup occurring when t trans is equal to the point at which the minimum energy is achieved during data-driven training. While these results show that t trans could be optimized for each trained model, they also motivate our choice for a uniform transition time since appreciable speedups are observed for all transition times.</p><p>It should be noted that t conv is rescaled by t VMC , which masks the absolute speedup in convergence time. As an example, at t trans = 80 the values of t conv /t VMC = 87/541, 186/1498, 267/1667, 179/883 for the four values of &#948;/ in increasing order. Both t conv and t VMC vary for the different points across the phase transition, with the longest convergence times occurring for the values of &#948;/ close to the phase transition. While the relative speedups are comparable, the absolute speedups show that the data offer the largest gain in the vicinity of the phase transition, where it is expected in the thermodynamic limit that the correlation length will diverge and the energy gap will vanish.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Calculating observables from trained RNN wave functions</head><p>In the previous section, we focused on demonstrating how the data-enhanced VMC algorithm is a successful training protocol that enables both the 1D and 2D RNN wave functions to more efficiently learn quantum ground states. Here, we aim to further demonstrate that trained RNN wave functions are accurate representations of ground states of the Rydberg array across the quantum phase transition of interest. We do so by using our trained RNN wave functions to calculate physically relevant ground-state observables across the transition.</p><p>We define first a simple order parameter for the checkerboard phase,</p><p>where A and B refer to the two sublattices of the bipartite square lattice, and we define the Pauli operator &#963; z i = 2n i -1. This is equivalent to the staggered magnetization in a spin- 1   2   system, and we use that nomenclature henceforth. In the phase described by this order parameter, the Fourier transform of experimental measurement outcomes of the Rydberg occupation would result in a sharp peak at wave vector k = (&#960;, &#960; ) <ref type="bibr">[15]</ref>.</p><p>Because Eq. ( <ref type="formula">10</ref>) is diagonal in the occupation basis, this quantity can be easily calculated directly from samples or projective measurements. For example, the staggered magnetization can be calculated from samples drawn from a trained RNN wave function. The ensemble average then defines the estimator</p><p>which can provide an important quantity to verify the RNN wave-function accuracy past the standard loss function or the energy.</p><p>While the estimation of M s RNN does allow us to verify how well our models have captured the phase across the phase transition, the staggered magnetization can also be calculated directly from experimental measurement data or easily tracked in most world-line QMC simulations. On the contrary, estimators for off-diagonal operators are harder to obtain experimentally. Because our trained RNN wave functions provide explicit access to the learned approximation of the ground-state wave function, we have a representation of the associated Born distribution that can then be probed easily. This allows for an efficient estimation of off-diagonal observables.</p><p>In order to demonstrate how the trained RNN wave functions can be used to estimate off-diagonal observables, we consider the spatially averaged expectation value of &#963; x , which can be defined as</p><p>where (&#963; i ) j = &#963; j -&#948; i, j , meaning &#963; i is equivalent to &#963; except that the ith spin is flipped. The inner sum is the spatial average and the outer sum is the ensemble average used to estimate the expectation value. Having an explicit representation of our approximation of the Born distribution is essential for calculating the ground-state wave-function amplitude for specific configurations (in this case, e.g., &#963; i ). For more details on estimating off-diagonal operators from trained RNN wave functions, see Appendix C. Figure <ref type="figure">5</ref> displays these two observables, estimated for all values of &#948;/ for which we have experimental data. The upper panel shows the staggered magnetization as a function of &#948;/ . As expected, the 2D RNN wave functions, trained purely variationally and with data-enhanced VMC simulations, are in close agreement with the staggered magnetization estimators obtained through QMC simulations, which we take to be the ground truth. On the other hand, the hybrid-trained 1D RNN wave functions show much better performance than those trained with standard VMC simulations. This performance is consistent with the behavior seen in Fig. <ref type="figure">4</ref>(a) in that there is more error in the vicinity of the phase transition, but better agreement with QMC simulations in the highly ordered phase. Estimates of M s obtained from the raw experimental data set are also shown as a means of demonstrating that while neither the experimental data nor the 1D RNN trained with standard VMC simulations are able to accurately capture the staggered magnetization across the phase transition, combining the two leads to significantly higher accuracy.</p><p>The lower panel shows the spatially averaged expectation value of &#963;x . These results again demonstrate how the 1D RNN wave functions trained with only VMC simulations are unable to capture the transition to the ordered phase. Interestingly, the hybrid trained 1D RNN is in close agreement with the 2D RNN, even in the region around the phase transition. These results show how even suboptimal models (in this case the 1D RNN attempting to capture 2D ordering patterns) can be drastically improved by pretraining with raw experimental data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSIONS AND OUTLOOK</head><p>In this paper we use raw experimental data from a 256-qubit Rydberg atom array to pretrain state-of-the-art autoregressive recurrent neural network (RNN) wave-function ans&#228;tze. The data contain shots of projective measurements taken in the Rydberg occupation basis across the disorderedto-checkerboard phase transition, obtained in Ref. <ref type="bibr">[15]</ref>. We demonstrate that subsequent variational training of the wave function, guided by Eq. ( <ref type="formula">1</ref>), evolves it towards the ground state of the presumptive Rydberg Hamiltonian despite the limited size and imperfections of the data set used for pretraining.</p><p>RNNs, first developed for applications in natural language processing, are traditionally one dimensional (1D), and have been demonstrated to successfully support dataenhanced variational Monte Carlo (VMC) simulations in the past <ref type="bibr">[22,</ref><ref type="bibr">28]</ref>. For our two-dimensional (2D) 16 &#215; 16 Rydberg array, we find that 1D RNNs make poor variational wave functions when trained with a Hamiltonian-driven strategy alone, with deteriorating performance as the 2D checkerboard order of the system becomes more pronounced. However, pretraining with a data-driven strategy, using raw experimental shots, results in a significantly improved variational ground state. We quantify this by estimating the energy and other relevant observables across the disordered-to-checkerboard transition. In particular, we calculate both the checkerboard order parameter (the qubit staggered &#963; z magnetization) and the spatially averaged expectation value of &#963; x , which is inaccessible from the raw experimental data in the occupation basis. These quantities all reflect vast qualitative improvement, particularly in the checkerboard phase, when obtained with our data-driven, then Hamiltonian-driven, hybrid optimization.</p><p>We further find that the use of the experimental data quite generally provides a speedup in the convergence time of subsequent variational optimization in a variety of RNN wave functions. For the 1D RNN, this includes the aforementioned qualitative improvement in observables, suggesting the experimental data help drive the RNN parameters towards a fundamentally different optimum during training. In this context, the pretraining of our ans&#228;tze with experimental data can be viewed as a means for better initializing the parameters of the wave function prior to the subsequent variational optimization. From this perspective, our results are consistent with the previous body of work in the machine-learning literature that shows that the initialization of neural networks has tremendous effects on convergence <ref type="bibr">[36,</ref><ref type="bibr">38]</ref>, which presently motivates significant research efforts in determining optimal methods for initialization <ref type="bibr">[39]</ref>. With this intuition, it is natural to question whether there are other successful "physics-informed" initialization schemes, for example, initializing parameters to a nearby noninteracting ground state, or an area-law entangled state. However, mapping such states to the RNN ansatz can itself be a nontrivial task due to the need for a one-to-one correspondence between the variational parameters of the two ans&#228;tze. Indeed, recent work has explored such a mapping between matrix product states and RNNs <ref type="bibr">[40]</ref>. In many cases, however, it may prove to be most efficient to use a priori physics information to produce artificial data sets that capture important features of the desired state (e.g., by obtaining samples from simple, computationally inexpensive simulations), which can then be used for very early-phase data-driven training before experimental data are subsequently used.</p><p>We also explored state-of-the-art 2D RNN wave functions which are more suited to the 2D Rydberg array geometry, and found that their Hamiltonian-driven optimization was highly effective, especially when compared to the variationally trained 1D RNN wave functions. Despite this, we are able to demonstrate that pretraining with experimental data also provides clear value, in that the time to convergence of the variational optimization is significantly improved. Specifically, pretraining on just 1000 experimental shots reduced the time to convergence by up to O( <ref type="formula">10</ref>3 ) training iterations, an improvement that can translate into a significant reduction of computer time for an RNN model of this size. While 2D RNN wave functions converge in fewer training steps than their 1D counterparts, Rydberg atom arrays continue to scale towards thousands of atoms and the size of the models used to characterize them must be scaled in response. In this limit, even 2D RNNs may require long convergence times using Hamiltonian-driven training alone. Here, pretraining with experimental data (even limited, imperfect data) could appreciably improve the variational ground state, and lessen the task of designing clever ans&#228;tze for the large Rydberg arrays of the near future.</p><p>Furthermore, since these models can be trained efficiently to represent quantum ground states in various phases with excellent accuracy, we suggest that they could be useful tools for experimentalists in need of quick characterization as systems are scaled in size or used to explore new regimes of quantum matter. While data-enhanced VMC simulations cannot replace the tomographic reconstruction of quantum states, this method provides an alternative that is rooted in the experiment without requiring prohibitive amounts of data that will grow with increasing system size.</p><p>Most importantly, our results pave the way for hybrid quantum-classical simulation of more complex systems, where variational optimization becomes challenging and quantum data might be essential for capturing the target quantum state. Indeed, RNNs have been demonstrated as powerful wave-function ans&#228;tze for learning ground states with more exotic structures such as nontrivial signs <ref type="bibr">[31]</ref> and topological order <ref type="bibr">[32]</ref>. It is conceivable that there are many instances where our hybrid strategy will be essential for RNNs and other autoregressive models to accurately capture these and other complex correlations in quantum matter. Particularly promising examples include systems with sign or phase structures, such as frustrated Hamiltonians as could be realized with XXZ spin interactions <ref type="bibr">[41]</ref>, and systems undergoing dynamical evolution. In these cases, measurements in multiple bases used for early-epoch training could significantly help subsequent variational optimization learn the wave function's complex phase, a task that is believed to be more difficult than learning amplitudes alone <ref type="bibr">[42,</ref><ref type="bibr">43]</ref>. In addition, in experiments where the state preparation protocol might lead to metastable states, a subsequent variational optimization could be used to give information about the distance to the target ground state. Ultimately, this hybrid procedure for error mitigation could be developed into a more formal error-correcting protocol <ref type="bibr">[25,</ref><ref type="bibr">44,</ref><ref type="bibr">45]</ref>. These results hint at the promise of data-enhanced variational Monte Carlo simulations for leveraging data from today's quantum devices and contributing to their path towards quantum advantage in the future.</p><p>Our code is available at <ref type="bibr">[46]</ref>. for his invaluable support with the quantum Monte Carlo simulations. We would also like to thank H. Levine, A. Keesling, G. <ref type="bibr">Semeghini</ref> In this Appendix we aim to answer three main questions concerning the experimental data set we consider and its use in training our RNN wave functions. First, how do the imperfections in the experimental data set manifest in the training of our RNN wave functions? Second, how does having a limited amount of measurement data hinder the training of our RNN wave function? And finally, does this fully explain why our RNN wave functions are unable to accurately capture the ground states with data alone?</p><p>The main "imperfections" in the projective measurement data we consider are the fluctuations which are present because the experimentally prepared states are low-energy states. In the experiments, perfect ground states are not prepared due coherent perturbations, such as parameter miscalibration and nonadiabatic effects due to finite preparation time, as well as due to incoherent errors. While these fluctuations could be mitigated by careful experimental fine tuning and potentially data postprocessing, we aim to show that even with such imperfections, the experimental data are valuable when used in our hybrid training approach. Deep in the ordered phase, the energy fluctuations in the considered data often appear as domain walls that separate regions of the array with opposite antiferromagnetic ordering (see the experimental shot shown in Fig. <ref type="figure">1</ref>). Having regions of opposite ordering is destructive in the calculation of the staggered magnetization, Eq. ( <ref type="formula">10</ref>). This explains why, in Fig. <ref type="figure">5</ref>, the average values of the staggered magnetization estimated directly from the experimental data are not in close agreement with the values obtained from our simulations. Figure <ref type="figure">6</ref> instead compares the entire distribution of staggered magnetization values obtained from the experimental data to the values obtained from our QMC simulations. From this, it is clear that some of the projective measurements captured the exact many-body ground state, or a state very close to it. Having established that there are imperfections present in the experimental data set, it is not yet clear how those imperfections manifest in the training of a model. In order to explore their effects, we compare models trained on 1000 experimental samples to models trained on 1000 perfect, uncorrelated samples obtained from our QMC simulations. In Fig. <ref type="figure">7</ref>, we compare both 1D and 2D RNN wave functions trained variationally and using our hybrid approach at four points across the phase transition.</p><p>The first thing to observe in Fig. <ref type="figure">7</ref> is that the imperfections in the experimental data have a clear effect on the training outcome when only data-driven training is employed. This is evidenced by the fact that the striped orange bars are lower than the plain orange bars for all values of delta and both types of RNN wave functions. When the hybrid approach is used to train our models, however, the energy estimates are more accurate in all cases, as seen by comparing the green bars to the corresponding orange bars. Furthermore, after the subsequent variational optimization, the effects of the imperfections in the data are less evident. When the hybrid approach was employed, the models pretrained with the experimental data performed as well as those pretrained on the perfect QMC samples in almost all cases. These results can be observed by comparing the striped green bars to the plain green bars. This is related to the idea that VMC can be seen as means for smoothing out errors and noise in data obtained from quantum simulations <ref type="bibr">[25]</ref>. Importantly, this result emphasizes the value of the data, despite the imperfections, and suggests that data with even more nonadiabatic energy excitations (i.e., data from experiments using faster sweep rates) could still provide value when used for data-enhanced VMC simulations.</p><p>We note that in some cases, it appears that the use of experimental samples with the hybrid training approach performs better than when the perfect QMC samples are used. Based FIG. <ref type="figure">7</ref>. The absolute error between the best estimated groundstate energy achieved by trained RNN wave functions H RNN and the ground-state energy given by quantum Monte Carlo simulations H QMC for four values of &#948;/ that span the phase transition. We compare models trained with only the experimental data to models trained with perfect samples obtained from quantum Monte Carlo simulations (orange and striped orange, respectively). We also compare models trained with the hybrid data-driven, then Hamiltonian-driven, method when the experimental data and the perfect samples are used for data-driven pretraining (green and striped green, respectively). Our experimental data contain 1000 snapshots of projective measurements and we equivalently use 1000 uncorrelated QMC samples as our "perfect" samples. The error bars represent the standard error of each result, which was obtained by running every simulation with multiple seeds. on the fact that, in most cases, the hybrid training simulations using the two types of data perform comparably, we believe that if additional simulations were performed and averaged to obtain the above results, this would no longer be the case. It is worth mentioning that in the machine-learning literature, it is well known that adding small amounts of noise to training data can prevent overfitting and thus improve a model's ability to generalize <ref type="bibr">[47]</ref><ref type="bibr">[48]</ref><ref type="bibr">[49]</ref>. From that perspective, it is possible that the imperfections in the experimental measurements have this effect and therefore improve the subsequent variational training; however, it is difficult to draw strong conclusions about this based on the results above.</p><p>To understand how a limited data set impacts the training of an RNN wave function, we compare models trained on 1000 perfect samples obtained from our QMC simulations to models trained on 10 000 perfect samples. We also show the results of the models trained on 1000 experimental samples for reference. In Fig. <ref type="figure">8</ref>, we again compare both 1D and 2D RNN wave functions for four values of delta across the phase transition, this time only examining the purely data-driven setting.</p><p>Clearly, access to a larger data set systematically improves the accuracy of the trained models for both the 1D and 2D RNN wave functions and across the phase transition. This is reflected by the comparison of the brown bars to the orange bars. In the case of the 2D RNN wave function, the 10-fold increase in the number of samples results in an accuracy improvement of an order of magnitude (or more). Therefore, the fact that our experimental data set is limited in size does play FIG. <ref type="figure">8</ref>. The absolute error between the best estimated groundstate energy achieved by trained RNN wave functions H RNN and the ground-state energy given by quantum Monte Carlo simulations H QMC for four values of &#948;/ that span the phase transition. We compare models trained with only the experimental data to models trained with 10 3 perfect samples obtained from quantum Monte Carlo simulations (orange and striped orange, respectively). We compare these models to models trained with 10 4 perfect samples (brown) to show that additional data can improve the energy error after data-driven pretraining. This corresponds to a better initialization for the subsequent variational optimization. The error bars represent the standard error of each result, which was obtained by running every simulation with multiple seeds.</p><p>a role in the data-driven training of our models. This is understandable, as these measurement shots are Born distributed, and having more shots means that the empirical distribution in the data set is a better approximation of the true underlying Born distribution.</p><p>From these experiments, it is evident that the imperfect and limited nature of the considered experimental data set does surface during the training (or pretraining) of our RNN wave functions. This partially explains why, in Fig. <ref type="figure">3</ref>, neither the 1D nor the 2D RNN wave functions trained on data alone are able to reach energies close to the energy obtained from our QMC simulations. Our hybrid training method, however, is not hindered by the limited or imperfect nature of the data because after the pretraining, the presumed Hamiltonian is introduced and optimization is continued variationally. This underscores the idea that unprocessed quantum data from today's devices can provide immense value through this type of hybrid quantum-classical simulation.</p><p>There is a final possible explanation for the gap in accuracy between energies obtained from RNNs trained on experimental data alone and the energies obtained from our QMC simulations: the assumption of the incorrect Hamiltonian governing the experiment. It is conceivable that either the form of the Hamiltonian is incorrect or the values of the parameters in the Hamiltonian are incorrect. In this case, variational training with a presumed Hamiltonian will guide the RNN wave function to the ground state of that Hamiltonian, potentially imposing an incorrect bias. Unfortunately, exploring this possibility is beyond the scope of this work, as it would require much more data than we have access to. We leave such explorations for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX B: DERIVING H loc FOR VARIATIONAL OPTIMIZATION</head><p>During the variational optimization of our RNN wave functions, the tunable parameters of the model W are optimized by minimizing the expectation value of the energy, which is given by</p><p>Recall that our RNN wave functions encode a probability distribution p RNN (&#963;; W ) that yields an approximation to the ground-state wave function W (&#963;) = &#963;| W = &#8730; p RNN (&#963;; W ). Inserting the identity in the numerator and the denominator, and multiplying by 1, the above expectation value can be equivalently written as</p><p>as a normalized probability P(&#963; ). Here we also define the local energy</p><p>Approximating P(&#963; ) with the frequency with which a given configuration &#963; is sampled from the probability distribution encoded in our RNN wave function p RNN (&#963;; W ), we can define the expectation value of the energy as</p><p>We minimize this quantity during Hamiltonian-driven training. Because this expectation value is estimated at each Hamiltonian-driven training step and relies on samples drawn from our model, it is essential that these samples can be taken efficiently. Fortunately, due to the autoregressive nature of RNN wave functions, any number of independent samples can be drawn directly from the learned conditional probabilities in one pass of the RNN, which is an advantage over models that require Markov chain Monte Carlo sampling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX C: EXPECTATION VALUES OF OFF-DIAGONAL OPERATORS</head><p>In this work, we estimate the expectation values of an offdiagonal operator &#212; using our trained RNN wave functions. As done for the estimation of the expectation value of the energy, we start with the following form of the expectation value:</p><p>Following the same steps outlined in Appendix B, this expectation value can be equivalently written as</p><p>Instead of defining &#212;loc , as done for the energy calculation, we will consider what happens when the off-diagonal operator &#212; operates on the bra &#963;|. If our purely off-diagonal operator &#212; maps a given configuration &#963; to some other configuration &#963; multiplied by a constant C O , that is &#212;|&#963; = C O |&#963; , we can rewrite our expectation value of &#212; as</p><p>, where we have used the fact that W (&#963;) = &#963;| W . As done previously, we can estimate this expectation value with a sample average</p><p>,</p><p>where we have approximated P(&#963; ) with the frequency with which &#963; is sampled from the RNN wave function.</p><p>To make this calculation more concrete, consider the off-diagonal operator &#212; = &#963; x i . In this case, C O = 1, and a configuration &#963; is mapped to a different configuration &#963; i , where (&#963; i ) j = &#948; i, j -&#963; j (which is to say &#963; i is identical to &#963; with only the ith spin flipped). By this definition, the spatially averaged expectation value of &#963; x i can then be written as</p><p>.</p><p>There are N terms in the sum over &#963; i , where N is the number of spins, as there are N-many ways to flip one spin in each sample. It is worth noting that, in addition to efficient sampling, direct access to the wave-function amplitudes of specific samples [e.g., W (&#963; i )] is essential for estimating off-diagonal operators. Autoregressive generative models provide explicit access to the learned approximation of the Born distribution, allowing these amplitudes to be directly and efficiently calculated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX D: WHAT IS LEARNED FROM MEASUREMENT DATA</head><p>In an attempt to understand why data enhancement provides a dramatic improvement in performance for our 1D RNN wave functions, we tracked the checkerboard order parameter throughout the training of our models. Based on the observation that the hybrid-trained 1D RNNs provide accurate estimates for the order parameter deep in the checkerboard phase (&#948;/ &gt; 2.5), we hypothesized that it was easier for the model to learn about this two-dimensional ordering from the shots of projective measurements than from the Hamiltonian. This hypothesis is supported by Fig. <ref type="figure">9</ref>, which shows how estimates of this order parameter evolve over the course of the training for a ground state deep in the checkerboard phase (i.e., highly ordered).</p><p>When trained purely variationally, the 1D RNN is clearly unable to learn the two-dimensional ordering of the system. The estimated order parameter remains close to zero for the entirety of the training. The inability of the model to correctly learn the ordering of the phase is further exemplified by the bottom configuration displayed on the left side of Fig. <ref type="figure">9</ref>, which is a sample drawn from the model at the end of its purely variational optimization. When trained using only the experimental measurement data, the 1D RNN is able to capture some nonzero ordering, but the value learned by the model is limited by the data. Even the 2D RNN wave function, when trained on data alone, is not able to produce an estimate for M s that is notably better than the value of the order parameter calculated directly from the raw experimental data. More concretely, both orange curves are limited by the black dotted line. When trained using the hybrid data-driven, then Hamiltonian-driven, method, however, the 1D RNN is able to accurately capture the value of the order parameter corresponding to this ground state. In fact, it is in close agreement with the value obtained from our quantum Monte Carlo simulations after &#8764;1400 training steps. The top configuration displayed on the left side of Fig. <ref type="figure">9</ref> is a sample drawn from this hybrid-trained model and clearly contains much stronger ordering. Interestingly, for the 1D RNN, only combining these two types of training allows this model to capture the ordered phase. The 2D RNN wave functions are able to capture the ordering of this phase, likely due to their ability to pass information in more than one direction, but the data enhancement still provides a speedup in the time to convergence. In this case, the pretraining again seems to provide more information about the order of the phase, as seen by comparing the values of the order parameter early in the training for the 2D RNN trained purely variationally and using the hybrid method. These dynamics can be understood by realizing that our data enhancement provides the model with exemplary configurations of the target ground state or states close to the ground state early on in the training and optimizes the model to produce samples like the ones provided. This serves to better initialize the model and, because the model is able to produce samples similar to the data used during pretraining, the state space that must be traversed during the subsequent variational training is reduced. This is only the case when the data are a good representation of the ground state being targeted during the Hamiltonian-driven training.</p><p>We have thus demonstrated how certain features of a phase or, more specifically, a ground state in that phase can be more easily learned from measurement data than directly from the Hamiltonian during variational optimization. This result is foundational in our belief that this hybrid training approach will prove useful for learning ground states in exotic phases of matter. For instance, it has been shown that the nontrivial sign structure of some ground states is difficult to learn <ref type="bibr">[42,</ref><ref type="bibr">43]</ref>. Perhaps even a limited amount of data in a few bases would provide valuable information in learning such a ground state. </p></div></body>
		</text>
</TEI>
