<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Noise and the Frontier of Quantum Supremacy</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>02/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10338950</idno>
					<idno type="doi">10.1109/FOCS52979.2021.00127</idno>
					<title level='j'>2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Adam Bouland</author><author>Bill Fefferman</author><author>Zeph Landau</author><author>Yunchao Liu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Noise is the defining feature of the NISQ era, but it remains unclear if noisy quantum devices are capable of quantum speedups. Quantum supremacy experiments have been a major step forward, but gaps remain between the theory behind these experiments and their actual implementations. In this work we initiate the study of the complexity of quantum random circuit sampling experiments with realistic amounts of noise.Actual quantum supremacy experiments have high levels of uncorrected noise and exponentially decaying fidelities. It is natural to ask if there is any signal of exponential complexity in these highly noisy devices. Surprisingly, we show that it remains hard to compute the output probabilities of noisy random quantum circuits without error correction. More formally, so long as the noise rate of the device is below the error detection threshold, we show it is #P-hard to compute the output probabilities of random circuits with a constant rate of noise per gate. This hardness persists even though these probabilities are exponentially close to uniform. Therefore the small deviations away from uniformity are hard to compute, formalizing an important intuition behind Google's supremacy claim.Interestingly these hardness results also have implications for the complexity of experiments in a low-noise setting. The issue here is that prior hardness results for computing output probabilities of random circuits are not robust enough to imprecision to connect with the Stockmeyer argument for hardness of sampling from circuits with constant fidelity. We exponentially improve the robustness of prior results to imprecision, both in the cases of Random Circuit Sampling and BosonSampling. In the latter case we bring the proven hardness within a constant factor in the exponent of the robustness required for hardness of sampling for the first time. We then show that our results are in tension with one another -the high-noise result implies the low-noise result is essentially optimal, even with generalizations of our techniques.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The last two years have seen the first claimed demonstrations of "quantum supremacy" [AAB + 19, ZWD + 20]-the experimental demonstration of an exponential quantum computational speedup. Quantum supremacy is both a necessary step on the path to building useful quantum computers, and also marks the first experimental challenge to the Extended Church-Turing thesis, a bedrock of theoretical computer science. Achieving quantum supremacy requires finding a computational task which is experimentally feasible and has strong complexity-theoretic evidence for classical intractability. Two prominent supremacy proposals have been BosonSampling <ref type="bibr">[AA11]</ref> and Random Circuit Sampling [BIS + 18], both of which involve sampling from the output distribution of a randomly chosen quantum experiment<ref type="foot">foot_0</ref> .</p><p>Despite having advantages in being near-term implementable, it is a priori unclear why these random quantum experiments should be difficult to classically simulate. While we know of many examples of quantum algorithms that attain exponential speedups over classical computation, they all rely on highly structured circuits (such as quantum Fourier transforms) which are far from typical. Why then should we expect a generic quantum experiment to realize a large computational advantage?</p><p>Aaronson and Arkhipov <ref type="bibr">[AA11]</ref> (in the setting of BosonSampling), and Bouland et al.</p><p>[BFNV19] and Movassagh <ref type="bibr">[Mov20]</ref> (in the setting of Random Circuit Sampling) gave initial evidence that these random experiments might be difficult to simulate classically<ref type="foot">foot_1</ref> . These works essentially show that it is #P-hard to exactly compute the output probabilities of ideal, noiseless, quantum experiments on average<ref type="foot">foot_2</ref> . To do this, they establish a worst-to-average-case reduction for computing the output probabilities of random experiments. Since it is a well-established fact that computing the output probability of a worst-case quantum experiment is #P-hard (see e.g., <ref type="bibr">[TD02]</ref>), these results prove that computing the output probabilities of a random quantum experiment is #P-hard as well.</p><p>These arguments have two primary limitations which keep them from directly matching the constraints of noisy quantum sampling experiments. First, they are sensitive to additive imprecision, which prevents them from addressing the hardness of classically mimicking low noise experiments. For example, in the case of random circuit sampling experiments with m gates (where we assume m is greater than the number of qubits, n) the prior works are only able to prove hardness of computing an estimate to the output probability that achieves an additive error at most 2 -O(m 3 ) <ref type="bibr">[BFNV19,</ref><ref type="bibr">Mov20]</ref>. In contrast, standard reductions <ref type="bibr">[AA11,</ref><ref type="bibr">Sto85]</ref> require hardness to additive imprecision O(2 -n ) in order to show that the more natural sampling task -which is what a lownoise<ref type="foot">foot_3</ref> random quantum circuit experiment solves "natively" -cannot be reproduced classically. A similarly large gap exists between the known (e -O(n 4 ) ) vs. conjectured (O(e -n log n )) robustness of hardness results for BosonSampling. Proving a hardness result which is robust even to this small amount of noise remains open for all quantum supremacy proposals.</p><p>The second and more fundamental limitation of these arguments is that they say nothing about the complexity of high noise experiments. The prior average-case hardness arguments of <ref type="bibr">[AA11,</ref><ref type="bibr">BMS16,</ref><ref type="bibr">BFNV19,</ref><ref type="bibr">Mov20]</ref> all explore the complexity of mimicking low-noise experiments performed with a high fidelity that stays constant as the system size grows. This would require improved control over the noise (e.g. error correction) to achieve asymptotically. But in the NISQ era we do not have the quantum resources to implement error correction and so experiments are highly noisy. For example, Google's recent quantum supremacy experiment estimated that their fidelity was merely &#8764; 0.2% (i.e., the experiment was &#8764; 99.8% noise) [AAB + 19] and that their fidelity will further decrease as they scale their system. Therefore their quantum supremacy claim hinges on whether or not random circuit sampling is intractable in the high noise regime, in which there is only a small signal of the correct experiment in a sea of noise, and this signal diminishes with system size. In some cases noise can render the simulation of certain quantum circuits substantially easier (e.g. <ref type="bibr">[BMS17,</ref><ref type="bibr">ZSW20]</ref>). For this reason, it is critical to examine if these hardness of computation results are robust to noise models that are relevant to near-term quantum experiments. For example, can anything be said about the hardness of computing the output probabilities of such circuits in the high noise regime, without error correction?</p><p>In this work we make progress on both these limitations. First, we substantially improve the robustness of prior hardness results. For example, in the case of RCS, we show that it is #P-hard to compute the output probabilities of (noiseless) random circuits to additive imprecision 2 -O(m log m) . In the regime of constant depth circuits where m = O(n), this robustness nearly reaches that needed to establish the classical hardness of sampling in the low noise regime, though a small gap remains.</p><p>Theorem 1 (Simplified). For any constant &#951; &lt; 1/4, it is #P-hard to compute the output probability of random quantum circuits C up to additive error 2 -O(m log m) , with probability at least 1 -&#951; over the choice of C with m gates.</p><p>We note that Kondo, Mori and Movassagh <ref type="bibr">[KMM21]</ref> recently claimed the same robustness as our Theorem 1 and arrived at it independently.</p><p>We also give an analogous result for BosonSampling, showing that it is hard to compute the output probabilities of (noiseless) random n-photon m = n 2 mode linear optical networks to additive imprecision e -6n log n-O(n) (see Corollary 1). This nearly matches the desired robustness for BosonSampling (O(e -n log n )) up to a constant factor in the exponent. To our knowledge, this is the first time that a quantum supremacy proposal exhibits a proven robustness of hardness which is polynomially related to the conjectured robustness in absolute terms.</p><p>Second, we show that these hardness results remain true even in the high noise regime -i.e. in the presence of a constant rate of noise, without error correction. For example, consider the case of random circuit sampling, and suppose one fixes a noise model, such as i.i.d. depolarizing noise at rate &#947; = &#920;(1) after each 2-qubit gate. We show that it is also #P-hard to compute the output probabilities of the noisy circuits<ref type="foot">foot_4</ref> on average to additive imprecision 2 -O(m log m) . This holds so long as the noise rate &#947; is below the error detection threshold.</p><p>Theorem 2 (Simplified). The same average-case hardness result as in Theorem 1 applies to noisy random quantum circuits, assuming the noise is gate-independent and can be error-detected.</p><p>This second hardness result is even more interesting when we consider additional structure present in the uncorrected noise setting: the output probabilities of random circuits converge exponentially quickly to the uniform distribution. More specifically, it is widely conjectured that the output probabilities of such noisy random quantum circuits are 2 -O(m) close to uniform [BIS + 18]. In fact we rigorously prove this convergence in a toy model in which errors are interspersed with global Haar-random unitaries, though only slower exponential convergences have been shown for random circuit sampling <ref type="bibr">[ABOIN96,</ref><ref type="bibr">GD18]</ref>. Therefore the output probabilities of the noisy circuits are very close to 2 -n , up to some exponentially suppressed deviations.</p><p>Previous work on average-case hardness. The core idea of prior average case hardness proofs, such as that of Lipton <ref type="bibr">[Lip89]</ref>, is to find an algebraic structure in the problem and use it as an error correcting code. Specifically, for many #P-hard problems, the hard function can be written as a low degree polynomial of the input values. One can think of this low degree polynomial as a Reed-Solomon code, which allows one to infer the (possibly incorrect) value of a worst-case instance from the (typically correct) values of many average-case instances. In order to do so one must find a way of relating the value of a worst case instance to the value of several average-case instances in a way which preserves the low degree polynomial structure. For example, in the case of permanent of matrices over finite fields, given an arbitrary matrix A &#8712; F n&#215;n q , if one picks a random matrix B and lets A(&#952;) = A + &#952;B for &#952; &#8712; F q , one obtains a family of related instances {A(&#952;)} &#952;&#8712;Fq which are each individually random, but correlated in a way which preserves this low-degree polynomial structure -namely, Per(A(&#952;)) is a low degree polynomial in &#952;. This allows one to infer the value of the worst-case instance Per(A(0)) from many values at &#952; = 0 by polynomial interpolation.</p><p>For random quantum circuits similar ideas can be made to work, but creating these correlated families of instances is not straightforward, due to the fact that the algebraic structure of quantum circuits is very different than that of permanents. There are currently two different ways to establish this polynomial structure <ref type="bibr">[BFNV19,</ref><ref type="bibr">Mov20]</ref>. The original approach of <ref type="bibr">[BFNV19]</ref> accomplished this by taking a family of correlated quantum circuits without a low degree polynomial structure, and imposing one by truncating a particular Taylor series. They used this structure to show the output probabilities of "truncated" circuits are hard to compute on average. While at first glance this average-case result might appear unrelated to the average-case hardness of random circuits -as the "input type" to this problem is different than the random circuit itself, due to the truncations -surprisingly Bouland et al. showed this average-case result is related to the original conjecture, in that robust hardness with respect to this truncated Taylor distribution is equivalent to robust hardness with respect to random circuits (see SI of <ref type="bibr">[BFNV19]</ref>). Interestingly, while Bouland et al.'s result provided the first evidence in support of the approximate average-case supremacy conjecture for random circuits, their work left open whether or not the true output probabilities of these circuits are hard to compute on average, as pointed out in the SI of <ref type="bibr">[BFNV19]</ref> and discussed in more detail by [Mov20, NPD + 20]. More recently, <ref type="bibr">Movassagh [Mov20]</ref> found a clever workaround to avoid the truncation issue. By simply choosing a different path between the worst and average case instances -known as the "Cayley path" -he showed it is hard to compute the exact output probabilities of random circuits, using a similar flow of argument to <ref type="bibr">[BFNV19]</ref>. As Movassagh's proof techniques are simpler, and the differences between these approaches could matter in high noise setting (see later discussion in Section 4.1 and Appendix A), we use Movassagh's interpolation method throughout this work when discussing random circuit sampling results.</p><p>More generally, the strategy of establishing average-case hardness via polynomial interpolation is well studied in the context of problems over finite fields. For example, using the Berlekamp-Welch algorithm <ref type="bibr">[WB86]</ref> and list decoding techniques, it has been shown that it is hard to compute the permanent of random matrices in F n&#215;n q with constant [GS92] or even inverse polynomial <ref type="bibr">[CPS99]</ref> probability, so long as the field size q is sufficiently large (see also <ref type="bibr">[Gur06]</ref> for an overview). However, in the context of quantum supremacy, the relevant polynomials are all over the fields C or R. For example, the real-valued output probabilities of quantum circuits are low degree polynomials in the complex gate entries of the input circuit. This complicates the picture for proving average-case hardness, because first, one needs to consider that some of the values might be incorrect, and second on these correct values, there may be some imprecision in the value, i.e. the correct values may only be within &#177;&#948; of the true real value<ref type="foot">foot_6</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">The bottleneck in prior works</head><p>Existing average-case hardness results over the reals, such as those in [AA11, <ref type="bibr">BFNV19,</ref><ref type="bibr">Mov20]</ref>, can tolerate only a tiny amount (&#948; = 2 -poly(n) ) of such imprecision. This is for two reasons. First, many of the sophisticated tools used to show strong average-case hardness over finite fields, such as Berlekamp-Welch and list decoding techniques, do not immediately port over to the real case. Therefore one is forced to use straightforward Lagrangian interpolation/extrapolation techniques in the average case argument<ref type="foot">foot_7</ref> , following the original arguments of Lipton <ref type="bibr">[Lip89]</ref>. Second, it is well known that polynomial interpolation/extrapolation over the reals is exponentially sensitive to noise. This is essentially because there exist low degree polynomials (like Chebyshev polynomials) which are very close to one another in some interval but diverge rapidly outside of that interval. In fact very small errors in the value of average-case instances can blow up exponentially when extrapolated to the value of the worst-case instance.</p><p>Because of these two limitations, the strongest existing average-case hardness results for computing the output probabilities of random quantum circuits can only tolerate imprecision of order 2 -O(m 3 ) in the output probabilities of n-qubit m-gate quantum circuits. More precisely, Movassagh <ref type="bibr">[Mov20]</ref>, building on prior arguments of Aaronson and Arkhipov <ref type="bibr">[AA11]</ref> and Bouland et al.</p><p>[BFNV19]<ref type="foot">foot_8</ref> , showed the following:</p><p>Theorem ([Mov20], Simplified). It is #P-hard to compute the output probability of random quantum circuits C up to additive error 2 -O(m 3 ) , with probability &#8805; 1 -1 poly(n) over the choice of C with n qubits and m gates.</p><p>The origin of this O(m 3 ) factor in the exponent is simple to understand. A well-known result due to Paturi states that if two degree-d polynomials are &#948;-close to one another in the interval [0, &#8710;], then their values at 1 can differ by at most &#948;&#8226;2 O(d&#8710; -1 ) <ref type="bibr">[Pat92]</ref>, which has an exponential dependence in the relative distance 1/&#8710; and can be saturated by Chebyshev polynomials in many parameter regimes. The degree of the relevant polynomial in the context of random quantum circuits is O(m). The relative distance of interpolation 1/&#8710; in existing arguments is O(m 2 ). Therefore, the Paturi argument shows that polynomial extrapolation over this interval incurs a 2 O(m 3 ) blowup in the imprecision when going from the average-case instance to the worst-case instance, which means that a precision of at least 2 -O(m 3 ) is required in the average-case output value to infer the worst-case value (which is bounded in [0, 1]) to any nontrivial level of precision.</p><p>Let us be slightly more precise as to why the relative distance of interpolation is O(m 2 ). In these average case arguments, given an arbitrary (worst-case) circuit C, one defines a (randomly chosen) family of circuits C(&#952;) (&#952; &#8712; [0, 1]) which essentially<ref type="foot">foot_9</ref> has the following properties:</p><p>&#8226; C(1) = C, that is, the worst-case circuit corresponds to &#952; = 1.</p><p>&#8226; C(0) is random, and C(&#952;) is close to randomly distributed when &#952; is small. More specifically, C(&#952;) is O(m&#952;) close to random in total variation distance for small values of &#952;.</p><p>&#8226; The probability that the circuit C(&#952;) outputs the all-zero outcome 0 n , denoted as p 0 (C(&#952;)), is a degree d = O(m) polynomial in &#952;.</p><p>The basic idea of the average-case hardness arguments is to perform standard Lagrangian interpolation. Consider circuits C(&#952;) for many small values of &#952; &gt; 0 that are less than some maximum value &#952; max = &#8710; -say d + 1 of them. One then uses the algorithm which works in the average case to compute p 0 (C(&#952;)) for these circuits. If the algorithm can correctly compute p 0 (C(&#952;)) within &#177;&#948; imprecision on all of them, then these points uniquely define a polynomial q(&#952;) by Lagrangian interpolation which is close to the polynomial p 0 (C(&#952;)) at these evaluated points. By the Paturi error bound, the polynomial q is &#948; &#8226; 2 O(m&#8710; -1 ) close to p 0 (C(&#952;)) in the interval [0, 1]<ref type="foot">foot_10</ref> . Therefore one simply uses q(1) as an estimate for the worst case output probability p 0 (C(1)) = p 0 (C), which is within additive error &#948; &#8226; 2 O(m&#8710; -1 ) . So long as &#948; 2 -O(m&#8710; -1 ) , this will be a good approximation to the desired value.</p><p>The reason that existing arguments set &#8710; = O(1/m 2 ) -leading to a net imprecision tolerance of &#948; = 2 -O(m 3 ) -is that Lagrangian interpolation requires all of the O(m) evaluated points to be correct (that is, up to imprecision &#948;). So each evaluation point needs to be correct with probability at least 1-O(1/m), so that all points are correct with high probability by a union bound<ref type="foot">foot_11</ref> . However, if one has an algorithm which works on random instances, it will only work with probability &#8805; 1 -O(m&#8710;) on each evaluated points, because these points are not actually distributed randomly, but merely O(m&#8710;) close to random, as given in the second property defined above. Therefore one must set &#8710; = O(1/m 2 ) for this argument, to ensure that all of the evaluated points are correct with high probability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Going beyond the imprecision bottleneck</head><p>One can easily see that the value of &#8710;, as well as the degree of the base polynomial d = O(m), are the controlling factors of the error in this argument. A natural question is whether or not either of these can be improved. Increasing the value of &#8710; in the interpolation argument would cause some of the evaluated points to be incorrect, which means standard Lagrangian interpolation cannot be used. In the finite field case, it is well-known how to handle a constant fraction of incorrect points via the Berlekamp-Welch algorithm <ref type="bibr">[WB86,</ref><ref type="bibr">GS92]</ref>. It is natural to ask if this algorithm, or a variant of it, could be made to work over the Reals. Critically, in order for such an algorithm to be relevant to our context, we would need the algorithm itself to not blow up errors too badly -or else the gains we make from choosing closer evaluation points could be swamped by the error blowup of the Real version of Berlekamp-Welch.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">First step: robust Berlekamp-Welch over the Reals</head><p>Our first result is to fill in this gap, and show that a variant of Berlekamp-Welch argument still holds over R or C, without too much blowup in error. That is, suppose there exists a degree-d polynomial p(x) over the reals, as well as poly(d) evaluation points {x i } which are equally spaced over the interval [0, 1] and claimed values of the polynomial y i at those points, of which only 1 -&#951; fraction are correct within additive error &#177;&#948;, where &#951; &lt; 1 4 is a constant. Then our results show that any other polynomial q(x) which also &#948;-agrees with (possibly different) 1 -&#951; fraction of these evaluation points can diverge from p(x) by at most &#948; &#8226; 2 O(d) at all points in the interval.</p><p>Fortunately this error blowup happens to be less than that incurred by our interpolation argument, so is irrelevant to the asymptotics of our result. This statement is relatively easy to prove if all of the points the polynomials agree on are "bunched" together on one side of the interval, but extending this to the case where the correct/incorrect points are interspersed with one another requires a very careful argument. In particular our argument inductively shows that one can slowly "bunch" the correct and incorrect points together one by one without loss of generality, without blowing up the error. From there one can then appeal to the bunched case which follows from more standard arguments combining well-known statements about the maximality of Chebyshev polynomial blowups with Markov's inequality for the maximum derivative of polynomials (see Section 5 for details).</p><p>The astute reader will notice that this argument is information theoretic in nature -that is, if one were to be handed a polynomial which agrees with a large constant fraction of the points, then that polynomial can't diverge "too badly" from the true polynomial, but it provides no way of finding such a polynomial (which is part of the result of the Berlekamp-Welch algorithm). Interestingly, in the context of proving average-case #P hardness, this is all that we need! That is to say, we only need a form of Berlekamp-Welch error correction with an NP decoder. This is because for the purposes of the complexity-theoretic arguments relevant to quantum supremacy, it suffices to prove that computing the output probabilities of random circuits lies outside of the polynomial hierarchy 13 . Therefore using an NP decoder in our argument simply means we are showing these probabilities are #P hard to compute under BPP NP reductions, which is still outside of PH, unless PH collapses to a finite level. Whether an algorithmic version of this result holds over the Reals remains unclear, and is left as an intriguing open problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.2.2</head><p>The next barrier at 2 -O(m 2 ) precision With this in hand, we have now shown that it is #P-hard (under BPP NP reductions) to compute the output probabilities of random circuits to additive imprecision 14 2 -O(m 2 ) . We did so by decreasing the relative distance 1/&#8710; we needed to interpolate in our argument from O(m 2 ) to O(m) in relative terms, by allowing a constant fraction of the interpolation points to be incorrect. This proof proceeded by generalizing techniques developed for finite fields to the Reals, but with significant modifications to accommodate the imprecision of real values.</p><p>We now wish to push this result closer to achieving hardness to 2 -O(m) imprecision. Once again the degree of our polynomial is O(m), so unless we find a different polynomial to work with, the only way forward seems to be to increase the interpolation distance further -namely to set &#8710; = O(1). A natural question is if more sophisticated techniques developed for finite fields, such as list decoding, might help us to do this, by allowing us to handle a larger fraction of incorrect points, and hence handle a larger interpolation distance. Unfortunately these sort of ideas will not help us here by more than constant factors. The issue is that the distribution of circuits C(&#952;) are only O(m&#952;) close to random in total variation distance. Therefore, we cannot guarantee we can correctly evaluate any points beyond &#8710; = &#920;(1/m) with any substantial probability 15 . In fact even proving a variant of list decoding over the Reals would only improve our result by a factor of two in the constant prefactor in the exponent. As a result, new ideas will be required to tackle this barrier which are not derived from analogues in the finite field case.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Second step: improved robustness of interpolation over the Reals</head><p>Therefore, our second step is to show improved robustness for real-valued polynomial interpolation. We will first show this using a novel error suppression technique based on variable rescaling which is specific to the Reals. We will then show that this technique proves that in prior robustness results, the Paturi bound was "overly pessimistic" as it was being applied in a parameter regime in which it is not tight! Therefore one can accurately interpolate degree d polynomials to 2 -O(d log &#8710; -1 ) additive error, and moreover significantly simplify prior proofs of robustness. We provide a pedagogical explanation of this proof below.</p><p>To see this, we first note that a variable rescaling trick already shows improved robustness using the standard Paturi bound. The basic idea is to consider "rescaling" the variables in our problem by replacing our interpolation variable &#952; with the variable x k for some large integer k &#8712; N. We then perform our polynomial interpolation algorithm over the variable x instead of the variable &#952;.</p><p>13 This is because the whole point of these arguments is that the output probabilities of classical sampling algorithms can be approximated in the polynomial hierarchy -see Appendix A for details.</p><p>14 We note that Movassagh <ref type="bibr">[Mov20]</ref> independently claimed a similar result, but was missing the robust Berlekamp-Welch theorem necessary for it to be rigorously proven (as also noted in <ref type="bibr">[ODMZ20]</ref>). Our robust Berlekamp-Welch argument (derived independently from Movassagh's work) completes his proof of this claim, but with the same weakening to hardness under BPP NP reductions. 15 For example, if one considers setting &#8710; = &#920;(1), the total variation distance between the distribution of C(&#952;) and random becomes 1 -1/exp, and none of the evaluated points will necessarily be correct with non-negligible probability.</p><p>That is, we still consider using the average-case algorithm to compute the output probabilities of many circuits C(&#952;) for many values of &#952; &lt; &#8710; = O(1/m). But when running the robust Berlekamp-Welch argument, instead of asking the NP oracle for the degree O(m) polynomial in &#952; which &#948;-approximates the correct values on 1 -&#951; fraction of the evaluating points, and using that to extrapolate our value at &#952; = 1, we instead ask for the same, degree O(mk) polynomial in the variable x, and use that polynomial in x to extrapolate our value at x = 1.</p><p>At first glance this sounds strange -how could simple renaming variables improve the robustness of interpolation? And why would we be using higher degree polynomials, when they increase the error of extrapolation in general? What is going on here is that we are performing a continuous transformation of the interval [0, 1] over which we are interpolating. That is, consider the map &#934; : [0, 1] &#8594; [0, 1] defined as &#934;(x) = x k . This transformation is bijective, continuous, and preserves the endpoints of the interval. Critically, this transformation "stretches out" the part of the interval near 0, and compresses it near 1. Therefore, for a fixed value of &#8710; = O(1/m), the corresponding value of x max is substantially increased -it becomes 1/m 1/k . In other words, the interpolation distance is much shorter for this new variable, which decreases the blowup in error from interpolation. Of course we pay a price for performing this transformation, as the degree of our polynomial in x is a factor of k larger than the degree of the polynomial in &#952;. However this blowup due to an increased polynomial degree is less important than the savings from decreased extrapolation distance, and in net this results in a substantially reduced error in our extrapolation.</p><p>To see this more concretely, recall that the extrapolation error of a degree d polynomial interpolated over distance &#8710; is &#948; &#8226; 2 O(d&#8710; -1 ) <ref type="bibr">[Pat92]</ref>. Our robust Berlekamp-Welch based algorithm performs interpolation with d = O(m) and 1/&#8710; = O(m). After applying our variable rescaling, the degree becomes O(mk), while &#8710; becomes 1/m 1/k . Therefore after this transformation this extrapolation error becomes only &#948; &#8226; 2 k&#8226;O(m)&#8226;m 1/k . By choosing k = log m this becomes merely a blowup of 2 O(m log m) . Crucially the error &#948; in the output probability of the quantum circuit is not affected by this rescaling of the input variables of the polynomial (namely &#952;) -and therefore the value of &#948; is the same in the rescaled polynomial. This therefore allows us to show our main result:</p><p>Theorem 1 (Restated, simplified). For any constant &#951; &lt; 1/4, it is #P-hard under a BPP NP reduction to compute the output probability of random quantum circuits C up to additive error 2 -O(m log m) , with probability at least 1 -&#951; over the choice of C with m gates.</p><p>It also shows an analogous hardness result in the case of BosonSampling, as we will describe in Section 4 (see Corollary 1).</p><p>A deeper examination of the above results reveals that this argument proves the Paturi bound itself it not tight in the parameter regime in which it was being applied to quantum supremacy arguments. The standard justification of the Paturi bound -that the error of a degree d polynomial interpolated over distance 1/&#8710; blows up by 2 O(d&#8710; -1 ) -is that Chebyshev polynomials saturate the bound. This is true in the regime of &#8710; = &#920;(1) -essentially because Chebyshev polynomials have coefficients of order 2 O(d) , and the individual monomials happen to "cancel out" in the unit interval, but this carefully orchestrated cancellation quickly falls apart as one leaves the unit interval, and the value of the polynomial approaches 2 O(d) . This behavior is what the Paturi bound tightly captures.</p><p>However our variable rescaling trick illustrates that in the regime of asymptotically decreasing &#8710;, this bound cannot possibly be tight, because nothing in our argument invalidated Chebyshev polynomials <ref type="foot">16</ref> . In other words, our variable rescaling argument proved that Chebyshev polynomials cannot blow up faster than 2 O(d log d) when extrapolated to distance 1/&#8710; = O(d). In retrospect this has a much simpler proof, and it follows because Chebyshev polynomials are after all only polynomials, and not exponential functions! That is, while Chebyshev polynomials appear to grow exponentially as they "leap out" of the small interval, this exponential growth cannot be sustained over longer ranges -because the function can only grow polynomially quickly in the large distance limit. To put this more formally, if one has a degree-d polynomial p(x) = d i=0 c i x i , then the value of the polynomial at &#8710; -1 must obey p(&#8710; -1 ) &#8804; i |c i |&#8710; -d . So if &#8710; -1 = d c for some constant c &gt; 0, and we know c i = 2 O(d) , we immediately get that Chebyshev polynomials -and hence our interpolation error -is bounded by</p><p>. That is to say, over longer extrapolation distances, Chebyshev polynomials (and in fact any polynomials with bounded coefficients, which is guaranteed in our settings) only have a 2 O(d log d) blowup.</p><p>Armed with this insight, we provide a significantly simpler proof of robustness of polynomial interpolation to 2 -O(d log d) additive imprecision starting from first principles. This proof is both simpler than our variable rescaling trick, and results in better constants in the exponent. Moreover, it substantially simplifies prior robustness of interpolation results, as once it is combined with our robust Berlekamp-Welch arguments, it removes the need to invoke heavy-duty techniques of Rahmanov <ref type="bibr">[Rak07]</ref> in the technical portion of the result and instead only requires applying the much simpler Markov's inequality<ref type="foot">foot_13</ref> . Instead our proof is self-contained and based on fundamental facts about real-valued polynomials.</p><p>In the end our final result for random circuit sampling with m 2-qubit gates shows robustness to additive imprecision &#948; = e -16m log m-O(m) . For an n-photon m = O(n c )-mode BosonSampling experiment, our result shows robustness to additive imprecision &#948; = e -(c+4)n log n-O(n) . In contrast the prior leading constants in the exponent of our results for RCS and BosonSampling were 147 and (c + 70), respectively, when using the variable rescaling error suppression technique (see Section 5.1 for a detailed discussion).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Extending these arguments to the high-noise regime</head><p>Our second result is to extend these average-case hardness arguments to the high noise regime. That is, suppose we fix a noise model, such as i.i.d. depolarizing noise at rate &#947;. We show that it is also #P-hard (under BPP NP reductions) to estimate the output probabilities of noisy random circuits under this noise model to additive error 2 -O(m log m) as well. This holds so long as the noise rate &#947; is below the error detection (not correction!) threshold. More generally, our result holds for any stochastic noise model which admits error detection, and which is gate-independent.</p><p>There are two key ideas which enable this strengthening of our result. The first is to note that our worst to average-case reduction techniques apply essentially immediately to the noisy case! This is because for stochastic noise models like i.i.d. depolarizing noise, the output probability of the noisy circuit is a convex combination of the output probabilities of many different noiseless circuits. To be more precise, recall that depolarizing noise with Pauli noise rate &#947; after a two-qubit gate is equivalent to applying one of the 15 different nontrivial Pauli operators (such as I &#8855; X or Y &#8855; Z) with probability &#947;/15 after that gate, or else applying the identity operator with probability 1 -&#947;. Therefore, applying depolarizing noise after every gate of an m-gate circuit is equivalent to taking the original circuit C, and inserting into the circuit one of the 16 m different "noise patterns" &#958; that could have occurred in the circuit, with appropriately weighted probabilities -call these circuits C &#958; . The output probability of the noisy circuit is the appropriately weighted convex combination of the output probabilities of C &#958; . In other words, if p 0 (C, N ) is the probability that the noisy circuit outputs 0 n under a noise model N , then we have that</p><p>where T N is the distribution over error patterns arising from the particular noise model N . This is true in general for stochastic noise models.</p><p>The key point to note is that our average-case hardness arguments are invariant under taking convex combinations of circuits. This is because the key fact we use in our reduction is that, for some appropriately defined family of circuits C(&#952;), p 0 (C(&#952;)) is a low degree polynomial in &#952;. The set of low degree polynomials is convex. Therefore, if one simply defines C &#958; (&#952;) to be the same circuits C(&#952;) with the noise pattern &#958; inserted into the circuit, then we have that</p><p>is also a low degree polynomial in &#952;. As a result the exact same worst-to-average case reduction arguments work in the noisy setting (see Section 4.2 and 4.3 for more details). Interestingly, for this to work there is no need to know the noise rate &#947; or noise model to apply this argumentbecause the low degree polynomial structure is preserved by any noise model that is stochastic and gate-independent.</p><p>We have therefore established that average-case noisy output probabilities are as hard to estimate as worst-case ones. The second step required is to establish that these noisy worst-case probabilities are indeed #P-hard to compute<ref type="foot">foot_14</ref> . To do so we follow the observations first developed by Fujii <ref type="bibr">[Fuj16]</ref>. In particular, Fujii observed that no polynomial time algorithm can exactly<ref type="foot">foot_15</ref> sample from the output distribution of noisy quantum circuits in the worst case unless PH collapses, so long as the noise rate &#947; is below the error detection threshold. The basic idea is that quantum error detection is highly efficient. With polynomial overhead, one can create a quantum error detecting code which is exponentially sensitive to errors. That is, if no error is detected, then one is certain with probability 1 -1/exp that no error indeed occurred. Using this, Fujii showed that one can post-select the "no error detected" event so that the post-selected distribution is exponentially close to the ideal output distribution. By standard post-selection arguments along the lines of <ref type="bibr">[Aar05,</ref><ref type="bibr">BJS11]</ref> this suffices to show worst-case hardness. A modification of this argument can be used to show that computing these output probabilities is essentially #P-hard. This establishes the worst-case hardness of noisy circuits, which allows us to show the following theorem:</p><p>Theorem 2 (Restated, simplified). The same average-case hardness result as in Theorem 1 applies to noisy random quantum circuits, assuming the noise is gate-independent and can be error-detected.</p><p>Note that there is one restriction which applies to this theorem which is not present in the statement of Theorem 1. Namely, this theorem assumes that the family of random circuits considered is deep enough to encode a quantum error detecting code which works against the chosen noise model and noise rate. This is because the worst-case circuits used in the argument of Theorem 2 are error detecting circuits, while constant depth circuits suffice for Theorem 1. Therefore our result essentially says that the output probability of noisy random circuits are as hard to compute as the largest error detected circuit one could have fit into that particular random circuit architecture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Discussion</head><p>In this work we have sought to resolve a fundamental disconnect between the theoretical basis of quantum supremacy -based on very low noise random quantum circuits [TD02, BJS11, AA11, BMS16, BFNV19, Mov20] -and the highly noisy experimental implementations performed so far [AAB + 19, ZWD + 20]. Namely, the initial theoretical attempts at proving "quantum supremacy" for sampling from random quantum circuits sought to do so assuming one can maintain a large constant fidelity as the system size grows -which would require error correction to achieve scalably. Even establishing this is classically intractable remains a major theoretical challenge, due to the sensitivity of the average-case hardness of computing results to small amounts of imprecision -a limitation that we made substantial progress on in our first result (Theorem 1) but did not resolve. However the initial experimental attempts at demonstrating supremacy showed a fidelity that decreases exponentially with system size, because quantum error correction is not being performed. As these sampling experiments become exponentially close to uniform asymptotically <ref type="bibr">[ABOIN96,</ref><ref type="bibr">GD18]</ref>, one cannot argue that these experiments are hard to reproduce in an approximate sense.</p><p>Consequently, if there is any hope of finding a hard quantum signal present in these nearterm experiments, it must be present in the exponentially diminishing deviations from the uniform distribution. A natural question is whether or not these small deviations from uniformity are hard to reproduce classically. Our second main result (Theorem 2) provides evidence in this direction, by extending the prior hardness of exact computation results for random quantum circuits to the noisy setting. That is, we prove it is hard to exactly compute the output probability of a noisy random quantum circuit, to finer precision than the rate of convergence to uniform. This can be interpreted as proving the existence of a quantum signal that survives the noise, thereby formalizing an intuition that underlies the optimism that the recent Google experiment is solving a hard problem.</p><p>A natural hope is that one might be able to use these results to show that one cannot exactly reproduce the results of the Google experiment in a sampling sense. That is, can one argue that no classical algorithm could sample from the output distribution of noisy random quantum circuits nearly exactly? Unfortunately we know we cannot do this using current techniques <ref type="bibr">[AA11]</ref>. The reason is that these techniques reduce hardness of exact (or approximate) sampling to the hardness of approximately computing output probabilities using Stockmeyer's algorithm <ref type="bibr">[Sto85]</ref>, and the amount of imprecision required to be tolerated (O(2 -n )) is intrinsic in the argument, even in the exact sampling case<ref type="foot">foot_16</ref> . As previously discussed, the output probabilities of noisy random circuits cannot be hard to compute to this degree of imprecision, due to the fact the noisy distributions are exponentially close to uniform. While <ref type="bibr">[AC17,</ref><ref type="bibr">AG19]</ref> took steps to try to address the simulation of such noisy experiments via a different angle, their approach assumes the fidelity remains a small constant as the system size scales, and requires making non-standard assumptions. Therefore, we believe fundamentally new ideas will be required -beyond the Stockmeyer approach -if one wishes to address the hardness of exactly simulating highly noisy circuits with decreasing fidelities.</p><p>Our main results also offer an interesting new message about the low-noise setting: any potential means for proving the hardness of sampling in the low noise setting cannot be extended to hold in the high noise setting. Our barrier holds in the setting of random circuits with superconstant depth 21 . This offers a significant barrier for improving the status quo of quantum supremacy results: the only chance of resolving the hardness of sampling problem (using the standard and only approach we have -via a reduction to approximate computing using Stockmeyer's theorem) is by inventing new techniques that are explicitly not resilient to uncorrected noise! This immediately rules out any "convex method" -i.e. any method where the underlying error correcting code structure is invariant under convex combinations -including polynomial interpolation, which we prove in Theorem 2 is noise resilient.</p><p>Our results are complementary, but distinct from past barrier results, starting with that of Aaronson and Arkhipov <ref type="bibr">[AA11]</ref>, who first realized that polynomial interpolation cannot close this gap if the ensemble of circuits anticoncentrates (See Appendix B for more details). While there are subtle technical differences between our barriers 22 , they arise for similar reasons -namely the set of low degree polynomials is convex. Later, Aaronson and Chen [AC17] proved an oracle separation relative to which approximate quantum sampling is classically easy, but the PH remains infinite. While this rules out any relativizing technique for resolving the hardness of approximate sampling, their result is very sensitive to the distinction between approximate and exact sampling -indeed, worst-case hardness of exact sampling results do exist and relativize <ref type="bibr">[BJS11]</ref>! Therefore it is unclear how to extend their barrier result to our setting in which we are interested in the hardness of exact sampling from the output distribution of high-noise and/or ideal random quantum circuits.</p><p>Most recently, a barrier result for hardness of sampling was shown by Napp et al. [NPD + 20] who give an approximation algorithm for computing the output probabilities of very shallow (i.e., fixed constant) depth 2D random quantum circuits. This is interesting, because one can use the methods of Bouland et al. and Movassagh to prove the average case hardness of exactly computing the output probabilities of the same ensemble of circuits. The implication is that any technique that establishes hardness of computing output probabilities of random quantum circuits to within additive error O(2 -n ) must not work for very shallow depth quantum circuits. This is a limitation, since current techniques that use polynomial interpolation are not sensitive to depth. This result is nicely complementary to our barrier, and in fact we can use it to conclude that any technique that extends the hardness of computation results to hardness of sampling results must be sensitive to both depth and noise.</p><p>Interestingly the noise barrier identified in this paper, and the depth barrier identified by Napp et al. [NPD + 20], rely on specific properties of random quantum circuits. We leave it as an open problem if analogous barriers apply to the BosonSampling case 23 . Of course it is already known that polynomial interpolation will not prove hardness of sampling for BosonSampling <ref type="bibr">[AA11]</ref> -in the case of m = n 2 modes this barrier shows polynomial interpolation cannot prove robust hardness beyond O(e -3n log n ) additive error. This leaves open the possibility of improving the exponent in our robustness result for BosonSampling from 6 to 3 using current techniques, but not beyond. However, the fact that our results for BosonSampling are closer to achieving the desired robustness -due to structural differences in the algebraic properties of BosonSampling vs. RCS -indicates that BosonSampling could have certain advantages for proving "quantum supremacy" in the low-noise regime going forward.</p><p>with d &gt; 1/&#947;0, where &#947;0 is the error detection threshold, any noise-agnostic proof technique cannot prove sufficient robustness to prove hardness of sampling in the low noise setting. Note the depth d must be sufficiently large to perform error detection as well. 22 In particular our barrier does not require anticoncentration, but instead requires a distribution over random circuits which converges to uniformity under depolarizing noise. Both properties are related to the scrambling nature of random circuits, but it is not a priori clear they are identical. 23 We note this might be possible using the techniques of e.g. [GPRS19, QBQGP20, ONFJ21] for simulating noisy BosonSampling experiments, though there are many details to be worked out.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Average-case hardness</head><p>In this section we develop the average-case hardness of computing the output probability of random quantum circuits. We first establish improved results in the noiseless setting, and then show that our techniques can be generalized to the noisy case. An important ingredient of our proof is a robust polynomial interpolation technique, which is independently developed in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Worst-to-average-case reduction</head><p>Our first result is to establish a worst-to-average-case reduction for computing the output probability of random quantum circuits in the noiseless setting. That is, if there exists an algorithm that can approximate the output probability of most random quantum circuits over an architecture, then there exists an algorithm that can approximate the output probability of every quantum circuit over the same architecture. Our average-case hardness result tolerates a constant failure probability (i.e. it is hard to compute the output probability for a large constant fraction of random circuits), and is robust up to a small additive error 2 -O(m log m) for random circuits with m gates. Our result is therefore an improvement over previous results <ref type="bibr">[BFNV19,</ref><ref type="bibr">Mov20]</ref> in both aspects.</p><p>The main idea of the worst-to-average-case reduction works as follows. Consider a circuit architecture over which computing the output probability up to an exponentially small additive error is #P-hard in the worst case. Then, to prove average-case hardness, it suffices to construct a reduction that belongs to some finite level of the Polynomial Hierarchy. That is, given an arbitrary circuit C 0 , our goal is to compute p 0 (C 0 ) using a procedure in the Polynomial Hierarchy, with access to an oracle O that can compute p 0 (C) for a large fraction of random circuits C. For a circuit architecture A, let H A be the distribution over circuits such that each gate is independently drawn from the Haar measure. Let {G i } i=1,...,m be the quantum gates in C 0 . We create a new circuit C 1 by applying a "one-time pad" to C 0 , where each gate G i is replaced by</p><p>where {H i } is independently drawn from the Haar measure over the unitary group and has the same dimension as G i . By the invariance of Haar measure, C 1 is distributed the same as H A . Therefore {H i } can be understood as the "random seed" used for the one-time pad. By definition, O can compute an accurate approximation of p 0 (C 1 ) with high probability. However, this number alone does not contain any information on our desired quantity p 0 (C 0 ). The main insight that allows us to correlate average-case solutions to the worst-case quantity, which was originally developed to show the average-case hardness for permanents [Lip89, GLR + 91], is to create many random instances {C i } by the following procedure: first sample a "random seed" {H i }, then apply small and different perturbations to the random seed, and then apply each perturbed random seed to C 0 . As a result, while each random instance is marginally distributed approximately according to H A , they are close to each other and are correlated in a way that reveals the worst-case quantity p 0 (C 0 ).</p><p>More specifically, suppose there is a way of perturbing the circuit <ref type="figure"/>and<ref type="figure">C</ref>(1) = C 0 . Moreover, suppose p 0 (C(&#952;)) for different values of &#952; are correlated in a way such that p 0 (C(1)) can be inferred from p 0 (C(&#952;)) for small values of &#952;. Then, to compute p 0 (C 0 ), it suffices to query O with C(&#952; i ) for many small &#952; i s.</p><p>One way to develop such a procedure is by perturbing the random seeds {H i } used in the construction of C 1 , which we define as follows.</p><p>Definition 1 (&#952;-perturbed random circuit distribution). Suppose there is a transform on unitary matrices H &#8594; H(&#952;) parameterized by a real number &#952; &#8712; [0, 1], that satisfies H(0) = H is unchanged, and H(1) = I is the identity matrix. For any circuit C 0 with gates {G i } i=1,...,m and a random seed {H i } i=1,...,m , the circuit C(&#952;) is defined by replacing G i with</p><p>Denote the distribution induced by C(&#952;) as H A,&#952; . Note that H A,0 = H A and H A,1 = 1 C 0 .</p><p>As discussed above, in order for such a perturbation to be useful for the worst-to-average-case reduction, we require that the following (informal) properties are satisfied.</p><p>Properties of the perturbed circuit distribution (informal).</p><p>1. When &#952; is small, H A,&#952; is close to H A in total variation distance. Therefore, when O is given an input C(&#952;) &#8764; H A,&#952; from the perturbed distribution, it is guaranteed to return a correct approximation of p 0 (C(&#952;)) with high success probability.</p><p>2. An approximation of p 0 (C(1)) can be inferred from approximations of p 0 (C(&#952;)) for small values of &#952; with a procedure in the Polynomial Hierarchy.</p><p>Finding such a good perturbation method is non-trivial. Two proposals were developed in previous work in the context of noiseless circuits. In <ref type="bibr">[BFNV19]</ref>, they considered the truncated Taylor series of He -&#952; log H , which is given by</p><p>and showed that p 0 (C(&#952;)) is a degree O(mK) polynomial in &#952;. Property 2 is then satisfied by using polynomial interpolation techniques. In this approach, H(&#952;) as defined by Eq. ( <ref type="formula">5</ref>) is not unitary, and the resulting average-case hardness is for a circuit family that has a small deviation from H A given by the truncation error of the Taylor series. While this truncation error does not affect the applicability of this approach to address hardness of approximate sampling, it becomes evident when applying to noisy circuits, as the requirement for accuracy is much higher (see appendix A for a detailed discussion). Later, Movassagh <ref type="bibr">[Mov20]</ref> developed a new perturbation method, called Cayley transform, that is able to stay within the unitary path.</p><p>Definition 2 (Cayley transform <ref type="bibr">[Mov20]</ref>). The Cayley transform of a unitary matrix H parameterized by &#945; &#8712; [0, 1] is a unitary matrix defined as</p><p>where A/B means A &#8226; B -1 . Consider the diagonalization of a unitary matrix H = j e i&#981; j |&#968; j &#968; j |, the following is an equivalent form of the Cayley transform,</p><p>Note that H(0) = H and H(1) = I.</p><p>Note that we are using slightly different notations from <ref type="bibr">[Mov20]</ref>. A self-consistent presentation of the properties of the Cayley transform is given in Appendix E.</p><p>While not having an evident polynomial structure as in the construction of <ref type="bibr">[BFNV19]</ref>, in [Mov20] they applied Cayley transform to the gates {H i } parameterized by &#945; = &#952;, and showed that the resulting p 0 (C(&#952;)) is a degree (O(m), O(m)) rational function in &#952;. A similar polynomial interpolation technique can be applied to infer p 0 (C(1)) from p 0 (C(&#952;)) with small values of &#952;.</p><p>In particular <ref type="bibr">[Mov20]</ref>, building on <ref type="bibr">[BFNV19]</ref>, obtained the following average-case hardness results (also see Appendix A).</p><p>Theorem 3 ([BFNV19, Mov20]). Let A be a circuit architecture so that computing p 0 (C) to within additive error 2 -O(m) is #P-hard in the worst case. The following results hold:</p><p>1. It is #P-hard to compute p 0 (C) exactly on input C &#8764; H A , with success probability at least 1 -&#951; over the choice of C, for any constant &#951; &lt; 1 4 .</p><p>2. It is #P-hard to compute p 0 (C) up to an additive imprecision 2 -O(m 3 ) on input C &#8764; H A , with success probability at least 1 -O(1)/m over the choice of C.</p><p>We improve these results by proving an average-case hardness that tolerates a larger additive imprecision 2 -O(m log m) , while only requiring a constant success probability over the choice of random circuits. These improvements follows from two new techniques: the first is a robust polynomial interpolation argument which tolerates a constant failure probability, and the second is an improved error bound for long distance polynomial extrapolation. Both results are developed in Section 5 and are used as black boxes here.</p><p>To establish our reduction, we first consider the &#952;-perturbed random circuit distribution instantiated by the Cayley transform.</p><p>Definition 3 (&#952;-Cayley perturbed random circuit distribution). For any circuit C 0 with gates {G i } i=1,...,m and a random seed {H i } i=1,...,m , the circuit C(&#952;) is defined by replacing G i with</p><p>where H i (&#952;) denotes the Cayley transform of H i parameterized by &#952;. Denote the distribution induced by C(&#952;) as H A,&#952; . Note that H A,0 = H A and H A,1 = 1 C 0 .</p><p>As previously shown by <ref type="bibr">[Mov20]</ref>, the output probability p 0 (C(&#952;)) is a low-degree rational function in &#952;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 1 ([Mov20]</head><p>). For any circuit C 0 , let C(&#952;) be a circuit from the &#952;-Cayley perturbed random circuit distribution as in Definition 3. Then p 0 (C(&#952;)) is a degree (O(m), O(m)) rational function in &#952;.</p><p>Proof. Here we give a self-consistent proof as the details are useful for our later developments.</p><p>First, notice that the output amplitude can be written as the Feynman path integral,</p><p>where we have y 0 = y m = 0 n , H j (&#952;)G j is a local gate and I else denotes identity on all the other qubits. Let the diagonalization of H j be H j = l e i&#981; jl |&#968; jl &#968; jl |. Then, consider an individual term in the above sum,</p><p>Then Eq. ( <ref type="formula">10</ref>) is a degree (O(m), O(m)) rational function in &#952; with Q 0 (&#952;) being the denominator. Furthermore, notice that Q 0 (&#952;) does not depend on the Feynman path {y j }, and each term in the sum in Eq. ( <ref type="formula">9</ref>) shares the same denominator Q 0 (&#952;). Therefore, 0 n |C(&#952;)|0 n is a degree (O(m), O(m)) rational function in &#952;, and so is</p><p>is then the denominator of p 0 (C(&#952;)). Finally, note that if C 0 only consists of 2-qubit gates, then p 0 (C(&#952;)) is a degree (8m, 8m) rational function in &#952;.</p><p>Lemma 1 formally supports Property 2 of the &#952;-Cayley perturbed random circuit distribution. Intuitively, this low-degree rational function structure allows us to apply polynomial interpolation techniques to obtain an approximation to the worst-case quantity, even though the worst-case point (&#952; = 1) is far from the average-case data points (0 &lt; &#952; 1). Details of this interpolation technique are presented in the proof of our main result (Theorem 4) and in Section 5. Before presenting the main result, it remains to establish Property 1 of the &#952;-Cayley perturbed random circuit distribution, which is given in the following lemma.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 2 ([Mov20]</head><p>). Let H A,&#952; be the &#952;-Cayley perturbed random circuit distribution as in Definition 3, and H A be the distribution of Haar random circuits over A. Then we have</p><p>where D TV (&#8226;, &#8226;) denotes the total variation distance between probability distributions and m is the number of gates in A.</p><p>Proof. This total variation distance can be bounded by considering the distribution of each individual gates. In <ref type="bibr">[Mov20]</ref>, it was shown that the total variation distance between the Cayley transformed random unitary H(&#952;) and Haar random unitary is O(&#952;) (also see Lemma 12 in Appendix E). Therefore by additivity of the total variation distance we have D TV (H A,&#952; , H A ) = O(m&#952;).</p><p>Having established Property 1 and 2, we are now ready to state and prove our first result on the average-case hardness of random circuits.</p><p>Theorem 4. Let A be a circuit architecture so that computing p 0 (C) to within additive error 2 -O(m) is #P-hard in the worst case. Then the following problem is #P-hard under a BPP NP reduction: for any constant &#951; &lt; 1 4 , on input a random circuit C &#8764; H A with m gates, compute the output probability p 0 (C) up to additive error &#948; = exp (-O(m log m)), with probability at least 1 -&#951; over the choice of C.</p><p>Remark 1. Another way of stating Theorem 4 is the following: suppose there exists an algorithm that belongs to some finite level of PH that, on input a random circuit C &#8764; H A with m gates, computes the output probability p 0 (C) up to additive error &#948; = exp (-O(m log m)), with probability at least 1 -&#951; (&#951; &lt; 1 4 ) over the choice of C as well as the randomness of the algorithm. Then PH collapses to a finite level.</p><p>Proof. Let O be an algorithm that correctly approximates p 0 (C) of a random circuit C &#8764; H A up to additive error &#948;, with success probability at least 1 -&#951; over the choice of C. In the following, we show that there exists a BPP NP O procedure that on input any circuit C 0 , computes p 0 (C 0 ) up to additive error &#948; = &#948; exp (O(m log m)), with success probability at least 2 3 . The theorem statement then follows from the worst-case hardness of computing p 0 (C 0 ) over A.</p><p>Consider any circuit C 0 with m gates over the architecture A. Create a new circuit C 1 as follows: for each gate G i (i = 1, . . . , m) in C 0 , we replace G i with H i G i , where {H i } is independently drawn from the Haar measure over the unitary group and has the same dimension as G i . By the invariance of Haar measure, C 1 is distributed the same as H A .</p><p>Fix the random unitary gates {H i }. Next, we apply the &#952;-Cayley perturbation on C 0 using the random seed {H i } as in Definition 3 to get the perturbed circuit C(&#952;). By definition, we have</p><p>By Lemma 1, p 0 (C(&#952;)) is a degree (O(m), O(m)) rational function in &#952;. Let P (&#952;), Q(&#952;) be the numerator and denominator of p 0 (C(&#952;)), respectively, then p 0 (C(&#952;)) = P (&#952;) Q(&#952;) . Note that from the proof of Lemma 1, we have that</p><p>The goal is to recover p 0 (C(1)) = P (1) Q(1) from values of p 0 (C(&#952;)) for small &#952;. To do this, first note that Q(&#952;) is a known polynomial whose value can be computed in time O(m) for any &#952;. Therefore, the problem can be reduced to a polynomial interpolation for P (&#952;).</p><p>To analyze the error for the polynomial interpolation, it is useful to establish bounds for Q(&#952;). We can write Q(&#952;) as</p><p>where it is easy to see that Q(&#952;) &#8805; 1. Next, call the set of Haar random gates {H j } j=1,...,m &#946;-good, if all eigenvalues &#981; jl of all gates lie in the range [-&#960; + &#946;, &#960; -&#946;]. By Lemma 13, this happens with probability at least 1 -O(m&#946;). Suppose we choose &#946; = O(m -1 ) such that {H j } is &#946;-good with high constant probability. Then conditioned on {H j } being &#946;-good, we have</p><p>We also define the above upper bound of Q(&#952;) as K, where K = exp (O(m log m)).</p><p>Next we apply the algorithm O on input C(&#952;). Notice that by definition, O works on inputs from the distribution H A , while C(&#952;) is distributed according to H A,&#952; as in Definition 3. Therefore, the success probability of O depends on the distance between the two distributions, Pr </p><p>By a simple union bound, the above equation holds with failure probability at most</p><p>for a suitable choice of constants in the O-notation for &#8710; = O(m -1 ) and &#946; = O(m -1 ), such that &#951; &lt; 1 4 . To compute P (1), we apply the algorithm O to a set of circuits {C(&#952; i )}, where &#952; i (i = 1, . . . , O(m 2 )) is a set of equally spaced points in the interval [0, &#8710;], and different perturbed circuits C(&#952; i ) shares the same random seed {H j }. By Eq. ( <ref type="formula">18</ref>), we obtain a set of points {(&#952; i , y i )} such that Pr [|y i -</p><p>The problem is then reduced to a polynomial interpolation for P (&#952;), a degree O(m) polynomial, with the noisy data {(&#952; i , y i )}. Using our robust Berlekamp-Welch theorem which we develop in the next section (see Theorem 7), on input {(&#952; i , y i )} we can compute a number p &#8776; P (1) with access to an NP oracle, where the error can be bounded by</p><p>Finally, notice that</p><p>as Q(1) = 1, therefore by choosing &#948; = exp (-O(m log m)) with a sufficiently large constant, we can compute the worst-case output probability p 0 (C 0 ) up to additive error exp (-O(m log m)). The overall procedure is in BPP NP O . If O is an algorithm that belongs to some finite level of PH, then this procedure also belongs to a finite level of PH, and therefore by the worst-case #P hardness the PH collapses to a finite level.</p><p>In the above proof we used a result developed in Section 5 as a black box, which we refer to as robust Berlekamp-Welch algorithm. This algorithm allows us to do polynomial interpolation on noisy data, while tolerating a constant fraction of data points that can be arbitrarily wrong (see Section 2 for a detailed discussion). Similar to the Learning with Errors problem for solving noisy linear equations, the polynomial interpolation problem with both noise and corruption seems unlikely to be solved in polynomial time. However, recall that given the #P hardness of computing p 0 (C) in the worst case, in our worst-to-average-case reduction we are allowed to use any finite level of PH. We show how to do such a polynomial interpolation in Section 5 having access to a NP oracle. As a result, we are able to tolerate a constant failure probability in our average-case hardness.</p><p>Furthermore, our proof techniques can also be naturally applied to BosonSampling, where a fundamental question is to prove robust average-case hardness results for computing permanents of matrices with i.i.d. Gaussian entries <ref type="bibr">[AA11,</ref><ref type="bibr">AA14]</ref>. Consider a BosonSampling experiment with n photons and m = n c (c &gt; 2) output modes. Then assuming no collision, the output probability corresponding to a output pattern S = (s 1 , . . . , s m ) (</p><p>where A is a m &#215; n submatrix of a m &#215; m Haar random unitary matrix, and A S is the n &#215; n submatrix of A whose rows are selected according to S. When c is a large enough constant, A S is distributed close in total variation distance to matrices with i.i.d. complex Gaussian entries of mean 0 and variance 1 m . We then obtain the following result using similar proof techniques:</p><p>Corollary 1. For any constant &#951; &lt; 1 4 , it is #P-hard under a BPP NP reduction to compute the output probability of a n-photon m = n c -mode BosonSampling experiment up to additive error &#948; = e -(c+4)n log n-O(n) , with success probability 1 -&#951;.</p><p>Remark 2. As shown in <ref type="bibr">[AA11]</ref>, to prove the hardness of approximate sampling for BosonSampling experiments, it suffices to improve the constant in our result from c + 4 to c -1. This is because on average the output probability is roughly</p><p>However, the barrier result shown by <ref type="bibr">[AA11]</ref> implies that this improvement cannot be obtained using similar techniques based on polynomial interpolation. See Appendix B for a detailed discussion, where we also show that this barrier result does not rule out improving the constant to c + 1.</p><p>Proof. Let G n&#215;n denote the distribution over n &#215; n matrices where each entry is independently distributed according to CN (0, 1), the standard complex Gaussian distribution. Suppose c is a large enough constant<ref type="foot">foot_18</ref> , then an algorithm for approximating the output probability of a BosonSampling experiment can be used to approximate</p><p>with a small loss in the success probability.</p><p>Let O be an algorithm that, on input a random matrix X &#8764; G n&#215;n , correctly approximates p X up to additive error &#948;, with success probability at least 1 -&#951; over the choice of X. In the following, we show that there exists a BPP NP O procedure that on input any 0/1 matrix X 0 , computes |Per(X 0 )| 2 up to additive error &#948; = &#948; exp ((c + 4)n log n + O(n)), with success probability at least 2 3 . The theorem statement then follows from the worst-case #P hardness of computing Per(X 0 ).</p><p>Let X 1 &#8764; G n&#215;n and define</p><p>Then |Per(X(&#952;))| 2 is a degree d = 2n polynomial in &#952;. By a result of <ref type="bibr">[AA11]</ref>, the total variation distance between the distribution of X(&#952;) and G n&#215;n is O(n 2 &#952;). This can be shown by calculating the distance for one entry, which is O(&#952;), and then multiply by n 2 as the entries are independently distributed.</p><p>Consider O(n 2 ) uniformly spaced points {&#952; i } in [0, &#8710;] with &#8710; = O(n -2 ). For a suitable choice of constants, we can guarantee that for each data point &#952; i ,</p><p>for some constant &#951; &lt; 1 4 . The procedure then follows by sending the points {(&#952; i , O(X(&#952; i )))} to the robust Berlekamp-Welch algorithm (Theorem 7) to obtain the desired approximation to the worst-case quantity |Per(X 0 )| 2 = |Per(X(1))| 2 . we have a BPP NP O procedure for computing |Per(X 0 )| 2 up to additive error</p><p>which concludes the proof.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Noisy circuits and error detection</head><p>Next, we generalize our average-case hardness result to noisy random quantum circuits. We first define the computational problem of approximating the output probability of noisy quantum circuits, and show that worst-case hardness can be established via error detection. Then in Section 4.3 we show that our previous reduction can be directly applied to noisy circuits, therefore obtaining average-case hardness.</p><p>For simplicity, here we consider noise models that are local and stochastic. Namely, for a circuit C with m gates, there are r = O(m) independent noise channels, where each noise channel N k (k = 1, . . . , r) acts on a constant number of neighboring qubits and has the following form,</p><p>where I is the identity channel, E k is an arbitrary CPTP map, and &#947; is the noise rate. We assume that the noise model N = {N k } k=1,...,r is fixed a priori and does not depend on the choice of gates.</p><p>In principle, our hardness proof can be applied to more general noise models, as long as they are gate-independent and feasible for error correction/detection. In particular, the choice of a fixed noise rate for each noise channel is not essential and is for simplicity, as otherwise we can define &#947; as the maximum of all noise rates. Our noise model (28) can be understood as follows: for each k = 1, . . . , r, with probability 1 -&#947;, do nothing; with probability &#947;, apply E k . The overall probability of having no error is at least (1 -&#947;) r . A simulation of the noisy circuit needs to take expectation over these stochastic paths. Equivalently, we can think of N k as quantum channels that deterministically acts on input density matrices. Simulating the output probability of a noisy circuit is therefore equivalent to computing the diagonal elements of the output density matrix of the noisy circuit. To show the computational hardness of simulating noisy circuits, one straightforward idea is to apply fault-tolerant quantum error correction. Suppose A is an architecture over which faulttolerant error correction is possible. For example, A can consist of nearest neighbor two qubit gates over a 1D or 2D grid. Let C 0 be a quantum circuit whose output probability is #P-hard to approximate within an exponentially small additive error. Then by fault-tolerance threshold theorems <ref type="bibr">[Kit97,</ref><ref type="bibr">ABO08]</ref>, when the noise rate of N is below a constant threshold, we can construct a quantum circuit C over A with a polynomial overhead in the size of C 0 , such that even with the presence of noise, the output distribution of C is exponentially close to the output distribution of C 0 in total variation distance. The computational hardness of simulating C with the noise model N then follows from the hardness of simulating C 0 .</p><p>An improvement of the above argument developed by Fujii <ref type="bibr">[Fuj16]</ref> is to identify the fact that fault-tolerant error detection suffices. This has the benefit of having a smaller overhead in implementing fault-tolerance, and more importantly, a higher noise threshold [Rei06, AGP07]. Fujii's result, which establishes the hardness of sampling from the output distribution of noisy circuits in the worst case, follows from a similar post-selection argument by Bremner, Jozsa and Shepherd <ref type="bibr">[BJS11]</ref>: following Aaronson's PostBQP = PP result <ref type="bibr">[Aar05]</ref>, to prove that a family of circuits are hard to sample from, it suffices to show that they are universal for quantum computation under post-selection. To apply this argument, note that applying standard threshold theorems for an error detecting code, we have that when post-selecting the "no error detected" event, the output distribution of a noisy circuit that implements error detection is exponentially close to the underlying logical circuit in total variation distance, which means that the error-detected noisy circuits are universal for BQP under post-selection. Therefore, when noise is below the error detection threshold, the output distribution of noisy circuits are hard to sample from in the worst case.</p><p>Different from Fujii's result, here we consider the worst-case hardness of computing the output probability of noisy circuits. We show that here error detection also suffices: when the noise rate is below the error detection threshold, computing the output probability up to an exponentially small additive error is hard for the Polynomial Hierarchy.</p><p>Theorem 5. Fix a noise model N , let &#947; th be the corresponding fault-tolerant error detection threshold, and suppose the noise rate of N satisfies &#947; &lt; &#947; th . Suppose there exists a PH procedure (that belongs to some finite level of PH) that on input C with m gates, computes p 0 (C, N ) up to additive error &#948; = exp(-O(m log m)). Then PH collapses to a finite level.</p><p>The proof follows from a similar application of the threshold theorem, and is presented in Appendix C.</p><p>Proof. The proof follows a similar argument as in Lemma 1. Consider a noise model N = {N j }. We prove the lemma statement in a special case where r = m and each gate H j (&#952;)G j is followed by a noise channel N j , which can be directly generalized to the general case where the noise channels are arbitrarily placed.</p><p>As each noise channel N j only acts on a constant number of qubits, it can be written as N j (&#961;) = l j E jl j &#961;E &#8224; jl j where the summation is over constant number of terms, and {E jl j } l j are the noise operators of N j satisfying l j E &#8224; jl j E jl j = I. Let U j be the unitary channel of the gate H j (&#952;)G j . Then, we have</p><p>Here for simplicity we omit the identity operators acting on qubits outside the support of the gates. The summation index l 1 , . . . , l m can be understood as a noise trajectory, where the noise operators are stochastically applied. Note that for each term in the above summation, the same analysis for p 0 (C(&#952;)) can be applied (cf. Lemma 1), which shows that</p><p>Note that this denominator does not depend on the noise trajectory. Therefore, each term in the summation shares the same denominator, and the sum p 0 (C(&#952;), N ) is also a degree (O(m), O(m)) rational function in &#952; with denominator Q(&#952;).</p><p>Having established worst-case hardness of noisy circuits (Theorem 5) and the low-degree rational function property (Lemma 3) for the reduction, the proof of average-case hardness then follows from a similar argument as in Theorem 4. Theorem 6. Fix a noise model N . Let A be a circuit architecture that can implement universal fault-tolerant error detection, and suppose the noise rate &#947; &lt; &#947; th is smaller than the error detection threshold. Suppose there exists a PH procedure (that belongs to some finite level of PH) that, on input a random circuit C &#8764; H A with m gates, computes the output probability p 0 (C, N ) up to additive error &#948; = exp (-O(m log m)), with probability at least 1 -&#951; (&#951; &lt; 1 4 ) over the choice of C as well as the randomness of the algorithm. Then PH collapses to a finite level.</p><p>Proof. Let O be an algorithm that correctly approximates p 0 (C, N ) of a random circuit C &#8764; H A up to additive error &#948;, with success probability at least 1 -&#951; over the choice of C. In the following, we show that there exists a BPP NP O procedure that on input any circuit C 0 , computes p 0 (C 0 , N ) up to additive error &#948; = &#948; exp (O(m log m)), with success probability at least 2 3 . The theorem statement then follows from Theorem 5, the worst-case hardness result.</p><p>Following the same reduction technique as in Theorem 4, we sample a set of random unitary gates {H i } and apply the &#952;-Cayley perturbation on C 0 using the random seed {H i } as in Definition 3 to get the perturbed circuit C(&#952;). By definition, we have</p><p>By Lemma 3, p 0 (C(&#952;), N ) is a degree (O(m), O(m)) rational function in &#952;. Let P (&#952;), Q(&#952;) be the numerator and denominator of p 0 (C(&#952;), N ), respectively, then p 0 (C(&#952;), N ) = P (&#952;) Q(&#952;) . Note that from the proof of Lemma 3, we have that</p><p>The goal is to recover p 0 (C 0 , N ) from values of p 0 (C(&#952;), N ) for small &#952;. To do this, first note that Q(&#952;) is a known polynomial whose value can be computed in time O(m) for any &#952;. Therefore, the problem can be reduced to a polynomial interpolation for P (&#952;), as</p><p>The rest of the proof is therefore the same as in Theorem 4.</p><p>An important distinction between the noisy random circuit model and the ideal model brings interesting implications to our hardness result. In noisy random circuits, it was shown <ref type="bibr">[ABOIN96,</ref><ref type="bibr">GD18]</ref> that the output distribution converges to the uniform distribution very quickly as circuit depth increases. Therefore, trivially outputs 1 2 n is already a good approximation algorithm for p 0 (C, N ), while this is not the case for ideal random circuits, whose output probability has the Porter-Thomas distribution which is far from uniform in total variation distance. In other words, the amount of additive imprecision that we can tolerate in the hardness proof has an upper bound : it cannot be larger than the distance between the noisy output distribution and the uniform distribution.</p><p>This behavior can be understood as follows: the decoherence caused by noise leaks quantum information to the environment (here we consider incoherent noise models), and the amount of information that remains in the system decreases as circuit depth increases, which eventually goes to 0 and corresponds to the maximally mixed state. It was first shown by Aharonov et al. <ref type="bibr">[ABOIN96]</ref> that the output distribution of any circuit with i.i.d. depolarizing noise converges to the uniform distribution with rate e -&#947;d , where &#947; is the noise rate on each qubit at a time step, and d is circuit depth. Gao and Duan [GD18] proved a similar convergence speed for random circuits with general i.i.d. Pauli noise. However, it was conjectured [BIS + 18] that the actual convergence speed for noisy random circuits is much faster, with rate e -&#947;nd (or e -&#947;O(m) ), which was indeed observed in experiments [AAB + 19]. While the proof of this behavior for the local random circuit model (Fig. <ref type="figure">1a</ref>) is out of the scope for this paper, we conjecture that it is true at least for some parameter region of (&#947;, n, d). Furthermore, we provide a rigorous analysis for the toy model with global random gates (Fig. <ref type="figure">1b</ref>), which we present in Appendix D.</p><p>Lemma 4. The output distribution of the global noisy random circuit model (Fig. <ref type="figure">1b</ref>) converges to the uniform distribution with speed e -&#8486;(&#947;nd) . More specifically, let p e denote the output distribution of the noisy circuit with i.i.d. depolarizing noise with noise rate &#947;. Then</p><p>where c, d 0 &gt; 0 are universal constants, and the expectation is over the global random circuit ensemble.</p><p>To apply this result, we can use a simple Markov's inequality which suggests that the output distribution of most noisy quantum circuits satisfy 2 n x p 2 e (x) -1 = e -&#8486;(&#947;nd) . Then, for each of these circuits, the total variation distance between the noisy distribution and the uniform distribution can be bounded by</p><p>via Cauchy-Schwarz inequality. An immediate observation is that this convergence to uniformity behavior implies that our hardness result in the noisy case is essentially optimal. Indeed, for a family of noisy random circuits where the distance to uniform is bounded by e -O(m) , our hardness result says that it is still hard to compute the output probability to within e -O(m log m) additive imprecision, while it is trivial to achieve e -O(m) additive imprecision. The key point here is that our proof is noise and architecture agnostic, as long as they enables error detection. Therefore, it is impossible to improve our hardness result to have better robustness, as long as the agnostic property remains.</p><p>The same limitation also applies to the noiseless case. In our proof, the noisy hardness result is a direct generalization of the noiseless case due to the linearity and convex structure, which means that our proof for the noiseless case cannot be improved either, as long as it can still be generalized to the noisy case. Therefore, any attempt to address the hardness of approximate sampling via a reduction to computing output probabilities (which requires a robustness of 2 -n /poly(n)) in the ideal case cannot be natually adapted to noisy circuits. In this sense our result can be viewed as a complement of [NPD + 20], which shows that the proof technique needs to be sensitive to depth because of an efficient simulation algorithm for shallow random circuits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Robust Berlekamp-Welch</head><p>We prove the following theorem for robust polynomial interpolation. All polynomials considered here have real coefficients, and our results can be generalized to the complex case, as for polynomials with complex coefficients we can deal with the real and imaginary parts separately. Below our theorem is stated with success probability 2 3 , which can be amplified to 1 -1/exp(d) by taking the median of poly(d) independent experiments.</p><p>Theorem 7 (Robust Berlekamp-Welch). For a degree d polynomial P (x), suppose there is a set of data points D = {(x i , y i )} such that |D| = 100d 2 and {x i } is equally spaced in the interval [0, &#8710;] (&#8710; &lt; 1). Furthermore, assume that each point (x i , y i ) satisfies</p><p>where &#951; &lt; 1 4 is a constant, then there exists a P NP algorithm which, takes D as input, returns a number p such that |p -P (1)| &#8804; &#948;e d log &#8710; -1 +O(d) , with success probability at least 2 3 .</p><p>Remark 3. Note that our result can be improved to using only O(d) data points, by applying a result of Rakhmanov <ref type="bibr">[Rak07]</ref>. However, the number of data points we use does not affect our main results, as long as it is polynomial in d. Here we give a simpler proof using O(d 2 ) data points.</p><p>Proof. For simplicity, our proof is presented by specifying &#951; = 1 300 , which can be naturally generalized to any &#951; &lt; 1 4 . Say a point (x i , y i ) is correct if |y i -P (x i )| &#8804; &#948;. By Eq. ( <ref type="formula">34</ref>), the expected number of wrong points is less than 1 300 fraction. By Markov's inequality, with probability at least 2 3 , 0.99 fraction of points in D are correct. In the following we show that conditioned on 0.99 fraction of points being correct (denote the set of correct points as F ), there exists a deterministic P NP algorithm that computes a number p that satisfies |p -P (1)| &#8804; &#948;e d log &#8710; -1 +O(d) . This implies the statement of the theorem. Consider the following computational problem:</p><p>Problem 1. Given 100d 2 points D, decide if there exists at least 0.99 fraction of them (denoted as F ) and a degree</p><p>Problem 1 is in NP because a certificate (F , Q) can be efficiently verified by checking each point in F .</p><p>When giving D in the statement of the theorem as input, this problem has a satisfying certificate (F, P ). Having access to a NP oracle, as this problem is guaranteed to have a solution, we can find a certificate (F , Q) by performing binary search in the certificate space, which may be different from (F, P ). However, in the following we show that any solution will satisfy the requirement of the theorem, and the algorithm simply outputs Q(1).</p><p>Let R(x) = P (x) -Q(x). As |F &#8745; F | &#8805; 0.98|D|, there are at least 0.98 fraction of points in D that satisfies</p><p>Recall that D is a set of equally spaced points in [0, &#8710;]. Lemma 6 says that in such condition R(x) can be uniformly bounded in the interval [0, &#8710;]:</p><p>Finally, as R is uniformly bounded in [0, &#8710;], we can use Lemma 8 to bound the error at x = 1,</p><p>which concludes the proof.</p><p>The above proof needs several additional results which we develop in the following. First, recall the following result of Paturi, which gives a bound of how fast a polynomial can grow when it is uniformly bounded in a small interval.</p><p>Lemma 5 <ref type="bibr">(Paturi [Pat92]</ref>). Let P be a degree d polynomial which satisfies</p><p>Proof. This is implied by Fact 2 and Corollary 2 of <ref type="bibr">[Pat92]</ref>.</p><p>The first missing step in the proof of Theorem 7, which is in Eq. (36), is to show the following: if a degree d polynomial is bounded by &#948; for a constant fraction of a set of uniformly spaced points in [0, &#8710;], then the maximum value of the polynomial in the interval [0, &#8710;] is at most &#948;e O(d) . We show this in the following lemma, which is the main technical part of the full proof. Proof. Denote A as good points at which P is bounded, and denote B -A as bad points. We first prove the bound under a special case, where the good points are exactly the first 0.98 fraction. More specifically, for the set of uniformly spaced points</p><p>where &#948; = 1 100d 2 , suppose the polynomial is bounded at the first 0.98 fraction of points,</p><p>When a polynomial is bounded at consecutive uniformly spaced points in an interval, it can also be uniformly bounded at all points in that interval. Using Lemma 7 we get </p><p>where r k can have real and imaginary parts. We construct a new polynomial Q(x) = L d k=1 (x-r k ) according to the following rule: According to property 2, we proceed by proving an upper bound for M . Let B = {0, &#948;, 2&#948;, . . . , (50d 2 -1)&#948;} &#8838; [0, 1 2 ]. Among these points, at least 0.96 fraction of them are good for Q(x), denoted by A . More specifically,</p><p>If A is a consecutive set starting from 0 (i.e. A = {0, &#948;, 2&#948;, . . . , (|A | -1)&#948;}), then we can obtain the desired bound by repeating the argument at the beginning of this proof. When this is not the case, there exists y &#8712; B such that y is bad and y + &#948; is good. We proceed by converting Q(x) to a new polynomial to eliminate this kind of points, while maintaining the good properties of Q.</p><p>Let y be the maximum element in B such that y is bad and y + &#948; is good (assuming such y exists). By definition, all good points larger than y is consecutive, denoted as</p><p>for some T &#8805; 1. We perform the following operation on</p><p>We prove the following properties for R. From property 5-6 we conclude that R(x) has the same number of good points as Q(x) (if some point outside of (A &#8745; [0, y]) &#8746; {y, y + &#948;, . . . , y + (T -1)&#948;} is good for R, we still label it as bad). As y is good for R, we also have max{z &#8712; B : z + &#948; &#8712; A } &lt; y. Finally we show that R has a larger maximum.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Let</head><p>Next, we repeat the above process until {z &#8712; B : z + &#948; &#8712; A } is empty, and denote the resulting polynomial as R, for which property 5-7 still holds. As R is bounded by 1 at {0, &#948;, 2&#948;, . . . , (48d 2 -1)&#948;}, we obtain</p><p>by repeating the argument at the beginning of the proof, which concludes the full proof.</p><p>A missing ingredient in the above proof is to show that when a polynomial in bounded on a set of O(d 2 ) equally spaced points in an interval, then it can be uniformly bounded on that interval. As discussed in Remark 3, the number of points needed can be improved to O(d) by using a powerful result of Rakhmanov <ref type="bibr">[Rak07]</ref>.</p><p>Lemma 7. Let P be a degree d polynomial which satisfies |P (x)| &#8804; c for equally spaced points x &#8712; {0, a N , 2a N , . . . , a} in the interval [0, a], where d 2 N &#8804; 1 -&#949;. Then P is uniformly bounded as</p><p>Proof. To prove a uniform bound for P (x), we use a Markov's inequality <ref type="bibr">[Lor05]</ref> to bound the maximum derivative,</p><p>which gives M &#8804; c &#949; .</p><p>We are now ready to fill in the final missing part in the proof of Theorem 7 which is in Eq. (37). A first idea is to directly apply Paturi's Lemma 5 which gives the final error bound &#948;e O(d&#8710; -1 ) . However, this is clearly over for degree d polynomials, the error growth should not be much bigger than (1/&#8710;) d when &#8710; is very small. Based on this intuition, our following result gives an improvement for the error bound of long distance polynomial extrapolation. (50)</p><p>Then we have</p><p>Lemma 9 (Lemma 4.1 in <ref type="bibr">[She13]</ref>). Let P (x) = d i=0 a i x i be a polynomial. Then</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Alternative proof via variable rescaling</head><p>In Section 2.2.3 we first introduced a proof technique via variable rescaling, which gives the same asymptotic bounds as the results presented above, but yields worse constants. In the following we explicitly calculate the constants when applying this variable rescaling technique to BosonSampling, where the goal is to establish robust average case hardness for random permanents. We consider the following general framework for polynomial interpolation. Consider a degree d polynomial P (x) that is bounded on a large fraction of points in the interval [0, &#8710;] (&#8710; &lt; 1). That is, suppose there is a set of data points D = {x i } which is equally spaced in the interval [0, &#8710;], such that</p><p>for at least 1 -&#951; fraction of points in D. Here we assume that |D| is a large enough polynomial in d (for example, |D| = 1000d 2 ), and &#951; is a small enough constant (for example, &#951; = 0.001). Our results in this subsection suggest that |P (1)| &#8804; &#948;e O(d&#8710; -1 ) , and we will also calculate the constant in the O notation. The first step is to bound the polynomial P (x) in the interval [0, &#8710;]. Consider the set of points {0, t, 2t, . . . , (|D| -1)t} where t = &#8710; |D| , and we know that P (x) is bounded at 1 -&#951; fraction of these points. From the proof of Lemma 6, it was shown that without loss of generality, we can assume that the polynomial is bounded at the first 1 2 -&#951; fraction of points in D. Combining Lemma 7 and Lemma 5 which introduces a constant 4 in the exponent, we have</p><p>This allows us to directly invoke Lemma 5 and obtain</p><p>Next we plug in the parameters for BosonSampling. The natural polynomial Per(X(&#952;)) is a degree n polynomial in &#952;. Therefore, the polynomial |Per(X(&#952; k ))| 2 is a degree 2kn polynomial in &#952;. When &#952; &#8712; [0, &#8710;], the total variation distance between X(&#952; k ) and Gaussian permanents is bounded by 2.01n 2 &#8710; k [AA11], which we require to be a small constant (say 0.001). This then gives &#8710; -1 = c 1/k &#8226; n 2/k , for some large constant c &#8764; 2000, and the exponent is</p><p>Let k = t log n for some constant t, then the exponent becomes 8t </p><p>Consider BosonSampling experiment with n photons and m = n c (c &gt; 2) output modes. The goal is to prove average case hardness for the output probability up to additive imprecision n! m n &#8776; e -(c-1)n log n .</p><p>(59)</p><p>In the worst case, the output probability is hard within 1/m n = e -cn log n additive error. Therefore our proof shows average case hardness with robustness</p><p>One approach to proving the approximate sampling conjecture is by showing that the output probability of random circuits are #P-hard to approximate on average. Here "approximate" corresponds to a 2 -n poly(n) additive imprecision, which follows naturally from Stockmeyer's approximate counting theorem <ref type="bibr">[Sto85]</ref>.</p><p>Conjecture 2. It is #P-hard to compute p 0 (C) up to additive imprecision 2 -n poly(n) on input a random circuit C &#8764; H A , with probability at least 1 -1 poly(n) over the choice of C as well as the internal randomness of the algorithm.</p><p>Theorem 8 ([BFNV19]). Conjecture 2 implies Conjecture 1.</p><p>Although Conjecture 2 has not been proven so far, necessary conditions of Conjecture 2 were established in the form of average-case hardness of p 0 (C) that is robust up to a smaller additive imprecision. An open direction is therefore to improve these results to match the 2 -n poly(n) additive imprecision required by Conjecture 2.</p><p>As introduced in Section 4, <ref type="bibr">[BFNV19]</ref> developed an approach based on truncated Taylor series. Consider the circuit distribution H A,&#952;,K from Definition 1 which is instantiated by Eq. (5). They proved the exact average-case hardness for circuits drawn from H A,&#952;,K .</p><p>Theorem 9 ([BFNV19]). It is #P-hard to exactly compute 3 4 + 1 poly(n) of the probabilities p 0 (C ) over the choice of C from the distribution H A,&#952;,K where &#952; = 1 poly(n) , K = poly(n). They also showed that the average-case hardness still holds up to an additive imprecision of 2 -poly(m) , but did not quantify the degree of the polynomial involved<ref type="foot">foot_19</ref> . Note that here circuits drawn from the distribution H A,&#952;,K are not unitary and has a small truncation error from the truncation of the Taylor series in Eq. (5). Therefore, Theorem 9 (as well as the 2 -O(m 3 ) robust version) does not imply the exact version of Conjecture 2 (that is, the exact average-case hardness of circuits drawn from H A ). However, as the truncation error is much smaller than the additive imprecision requirement of Conjecture 2, it is shown that the approximate version of Theorem 9 is actually equivalent to Conjecture 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theorem 10 ([BFNV19]</head><p>). Conjecture 2 is equivalent to the following: it is #P-hard to compute p 0 (C ) up to additive imprecision 2 -n poly(n) on input a random circuit C &#8764; H A,&#952;,K where &#952; = 1 poly(n) , K = poly(n), with probability at least 1 -1 poly(n) over the choice of C as well as the internal randomness of the algorithm.</p><p>As discussed in Section 4, Movassagh's subsequent approach based on the Cayley transform <ref type="bibr">[Mov20]</ref> does not have the truncation problem, and therefore they were able to prove the exact version of Conjecture 2. Movassagh also showed a similar robust version of his theorem with quantified robustness of 2 -O(m 3 ) additive imprecision. Therefore these two approaches are equivalent from the viewpoint of providing evidence for the approximate sampling conjecture: to prove Conjecture 2, it is necessary and sufficient to improve the robustness of either one of these approaches to 2 -n poly(n) . Our first result can be viewed as making progress in this direction. Note that in the main text our proof is based on a generalization of the Cayley path approach, but our techniques can also be applied to truncated Taylor series and obtain similar results. On the other hand, for our second result on noisy circuits, the Cayley path approach is more preferable, as in this case we are no longer in the regime of providing evidence for the approximate sampling conjecture, but are interested in tiny deviations from uniformity in the output distribution of noisy circuits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B Limitations of polynomial interpolation techniques</head><p>Here we give a self-contained description of the barrier to proving hardness of approximate sampling using polynomial interpolation techniques. This argument was first developed by Aaronson and Arkhipov for BosonSampling [AA11, Section 9.2], which can also be naturally generalized to random circuit sampling. They show that, assuming anti-concentration (which is proven for random circuit sampling), the average-case hardness proof based on polynomial interpolation cannot tolerate the amount of error required for showing hardness of approximate sampling.</p><p>Consider the output probability of a n-photon m = n c mode BosonSampling experiment. Following the notation in the proof of Corollary 1, the goal is to prove average-case hardness of computing p X = |Per(X)| 2 /m n for X &#8764; G n&#215;n . To do so, we consider</p><p>where X 0 is an arbitrary 0/1 matrix and X 1 &#8764; G n&#215;n . The goal is to recover |Per(X 0 )| 2 using polynomial interpolation, given approximate values of |Per(X(&#952;))| 2 for small values of &#952;. The main idea of the barrier result is to show the following:</p><p>There exists a polynomial that agrees well with |Per(X(&#952;))| 2 when &#952; is small, but is far from |Per(X 0 )| 2 when &#952; = 1.</p><p>This statement implies that it is impossible to recover the worst-case value from noisy average-case observations. To construct such a polynomial P (&#952;), consider</p><p>where 1 denotes the all-one matrix and t &gt; 0. Then the error of the polynomial P (&#952;) is given by</p><p>Using the anti-concentration property of Per(X 1 ), with high probability the error is given by</p><p>To prove hardness of approximate sampling, we require t = 1/q(n) for a sufficiently large polynomial q. However, in this case the worst-case error becomes too large and can be achieved by existing classical algorithms <ref type="bibr">[Gur05,</ref><ref type="bibr">AH14]</ref>.</p><p>On the other hand, choosing t = 1/ (n!) 2 is sufficient for worst-case hardness. In this case we can tolerate an additive error poly(n)/n! for average-case permanents. Consider the output probability and normalize the errors by m n , we conclude that the barrier result of Aasonron and Arkhipov does not rule out the possibility of showing hardness of computing the output probability p X up to additive error e -(c+1)n log n .</p><p>The same argument can also be applied to random circuit sampling, which shows that proving hardness of computing p 0 (C) up to additive error 2 -n /poly(n), which is required for proving hardness of sampling, is impossible using polynomial interpolation techniques. On the other hand, this barrier result does not rule out the possibility of showing hardness of computing p 0 (C) up to additive error 2 -O(n) .</p><p>Interestingly, using essentially the same techniques, our result is a log factor far from the barrier for random circuit sampling, while for BosonSampling our result is only a constant factor away.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C Worst-case hardness for noisy circuits from error detection</head><p>It is a well-known result that computing the output probability p 0 (C) for quantum circuits C is #P-hard in the worst case [FR99, TD02, BJS11, BMS16], where the hardness is robust under constant multiplicative approximation or exponentially small additive approximation. Here, we consider a slightly different approach based on <ref type="bibr">[FGHP99]</ref>. There, it was shown that determining if p 0 (C) = 0 is hard for the complexity class coC = P, which corresponds to deciding whether an efficiently computable Boolean function</p><p>Importantly, it was shown <ref type="bibr">[TO92]</ref> that coC = P is hard for PH under randomized reductions. Therefore, determining if p 0 (C) = 0 is hard for PH unless PH collapses.</p><p>In the following, we show that we can assume an exponentially small gap in the above decision problem without loss of generality. Consider the following promise decision problem.</p><p>Problem 2. On input a n-qubit quantum circuit C, decide if p 0 (C) = 0, or p 0 (C) &#8805; 1 2 2n . We show that this gapped variant is still hard for PH following a similar argument as in <ref type="bibr">[FGHP99]</ref>.</p><p>Lemma 10. Problem 2 is hard for coC = P under a polynomial time reduction.</p><p>Proof. Consider an efficiently computable Boolean function f : {0, 1} n 0 &#8594; {-1, 1}. Using the standard technique for classical reversible computation, we can create a n-qubit</p><p>As x&#8712;{0,1} n 0 f (x) takes integer value, we know that either p 0 (C) = 0, or p 0 (C) &#8805; 1 2 2n 0 &#8805; 1 2 2n . Therefore, deciding Problem 2 on input C is equivalent to deciding if x&#8712;{0,1} n 0 f (x) = 0.</p><p>Next, we show that simulating a noisy circuit up to additive error exp (-O(m log m)) is as hard as Problem 2, and therefore is hard for PH. The proof requires the use of fault-tolerant quantum error detection. Given an arbitrary noise model N in the form of Eq. ( <ref type="formula">28</ref>), its local and stochastic property implies that there exists a constant &#947; th &gt; 0 such that fault-tolerant quantum error detection is possible when &#947; &lt; &#947; th . Different from quantum error correction, the faulttolerance by error detection is conditional, or via post selection. Given a logical circuit C 0 of n 0 qubits and m 0 gates, suppose C implements C 0 with an error detection code using n qubits and m gates. Suppose that C has two output registers x, z, where x denotes the logical output, and z denotes the error detection syndrome, where z = 0 implies that no error is detected throughout the circuit. The fault-tolerance property guarantees that conditioned on no error being detected, the output distribution of C is close to the logical circuit C 0 , that is, Pr Using the fact that coC = P is hard for PH under randomized reductions <ref type="bibr">[TO92]</ref>, Theorem 5 is a direct corollary of Theorem 11.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Proof of Lemma 4</head><p>The proof uses techniques that are developed by Harrow and Low <ref type="bibr">[HL09]</ref>, which were developed to show that random quantum circuits converge to unitary 2-design. Below we show that noisy circuits correspond to entirely different convergence behavior. These techniques can also be applied to analyze local random circuits, which we leave as important future work.</p><p>Consider the quantity CP (collision probability/classical purity) defined as follows,</p><p>p 2 e (x) -1.</p><p>Here GRC(n, d) denotes the global random circuit distribution on n qubits with depth d, as shown in Fig. <ref type="figure">1b</ref>, and p e denotes the output distribution of the noisy circuit. We will drop the subscript U &#8764; GRC(n, d) unless necessary. For simplicity, here we consider i.i.d. depolarizing noise, where each blue dot represents a single qubit noise channel</p><p>where &#947; is the noise rate and I is the identity matrix. Using Cauchy-Schwarz inequality, it is easy to see that for each specific circuit, the classical purity is an upper bound of the total variation distance, as D TV (p e , p uniform ) = 1 2 x p e (x) -</p><p>To see why we use CP to represent the distance, notice that CP is a linear function of the two-copy output density matrix &#961; e &#8855; &#961; e . Therefore, by linearity of expectation, we can consider an initial state |0 n 0 n | &#8855;2 going through an average quantum channel; that is, to understand the convergence behavior of CP , it suffices to track the evolution of the second moment of the density matrix E[&#961; e &#8855; &#961; e ] as the circuit depth increases. More specifically, we can write any Hermitian operator in the Fourier basis as E[&#961; e &#8855; &#961; e ] = 1 2 n p,q &#948;(p, q)&#963; p &#8855; &#963; q , (77)</p><p>where p, q &#8712; {0, 1, 2, 3} n are 2n-bit strings, &#963; p , &#963; q &#8712; {I, X, Y, Z} &#8855;n are the corresponding Pauli operators indexed by 0 &#8594; I, 1 &#8594; X, 2 &#8594; Y, 3 &#8594; Z, and &#948;(p, q) = 1 2 n Tr[E[&#961; e &#8855; &#961; e ]&#963; p &#8855; &#963; q ] = 1 2 n E Tr[(&#961; e &#8855; &#961; e ) &#963; p &#8855; &#963; q ] (78) are the Fourier coefficients, where the second equality follows from linearity. As we take expectation over random quantum circuits U &#8764; GRC(n, d), the Fourier coefficients &#948;(p, q) only depend on (E, n, d), i.e. noise channel, system size, and circuit depth, and they contain all the information needed to evaluate CP . More specifically we have</p><p>where we have defined &#945;(p) := &#948;(p, p).</p><p>In the following we will study the evolution of these coefficients &#948;(p, q) as gates and noise are applied. For the initial state |0 n 0 n | &#8855;2 , we have &#948;(p, q) = 1 2 n for any p, q &#8712; {0, 3} n and 0 otherwise. Next, consider the application of a global random unitary on the Pauli basis, it was <ref type="bibr">[HL09]</ref> that</p><p>Therefore the off-diagonal coefficients &#948;(p, q) (p = q) becomes 0 for depth d &#8805; 1. From now on we only consider diagonal coefficients &#945;(p) = &#948;(p, p). The evolution of &#945; under a global random unitary is therefore given by &#945; (p) = &#945;(p), p = 0 n , 1 2 2n -1 q&#8712;{0,1,2,3} n \{0 n } &#945;(q), p = 0 n .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(81)</head><p>It is easy to observe the following properties: (1) &#945;(0 n ) = 1 2 n is a constant; (2) for p = 0 n , &#945; (p) does not depend on p; (3) p&#8712;{0,1,2,3} n &#945;(p) is preserved by unitary evolution.</p><p>Finally, consider the effect of depolarizing noise on a single qubit, which gives</p><p>Therefore the effect of noise can be summarized as</p><p>where |p| denotes the Hamming weight of p, i.e. the number of non-zero terms.</p><p>Combining everything together, we can write the evolution of p&#8712;{0,1,2,3} n \{0 n } &#945;(p) in one cycle (i.e. one global gate followed by a layer of noise channels) as</p><p>Define the decay coefficient as Lemma 12. For N = O(1) and &#952; &#8712; (0, 0.1), the total variation distance between &#181; &#952; and &#181; H is bounded by</p><p>Proof. First, notice that &#181; &#952; and &#181; H has the same eigenvector distribution. As the eigenvalues and eigenvectors are independently distributed in &#181; H , the integral only concerns the eigenvalue density.</p><p>To see this, we use the two-sided invariance of the Haar measure. Let U &#8764; &#181; H U(N ) be expressed as U = V &#923;V &#8224; , where V is the eigenvectors and &#923; is the eigenvalues. For any unitary Q &#8712; U(N ), QU Q &#8224; is equal to U in distribution. Therefore, QU Q &#8224; is also equal to U in distribution when Q is independently drawn from &#181; H . As QU Q &#8224; = (QV )&#923;(QV ) &#8224; , the eigenvectors V is distributed the same as QV up to global phase, which is distributed the same as Q, which is independent from &#923;. To summarize, the eigenvectors and eigenvalues of Haar random unitary are independently distributed, where the eigenvectors are distributed the same as &#181; H up to global phase, and we quote the eigenvalue density from <ref type="bibr">[HP06]</ref>:</p><p>where &#981; i &#8712; [-&#960;, &#960;]. Let g &#952; be the eigenvalue density of &#181; &#952; . The total variation distance is then reduced to the following integral over Lebesgue measure on [-&#960;, &#960;] N ,</p><p>An example of the transform f &#952; is plotted in Fig. <ref type="figure">2</ref>. It is easy to prove that f &#952; is monotonic and has range [-&#960;, &#960;].</p><p>To prove the upper bound of the total variation distance, we need to calculate the transformed eigenvalue density g &#952; (&#981; 1 , . . . , &#981; N ), by applying f &#952; to &#981; 1 , . . . , &#981; N jointly distributed according to g H . Then g &#952; can be written as g &#952; (&#981; 1 , . . . , &#981; N ) = g H f -1 &#952; (&#981; 1 ), . . . , f -1 &#952; (&#981; N )</p><p>where the inverse transform of f &#952; is given by</p><p>and its derivative given by</p><p>It is easy to prove the following bounds by a simple calculation: assume &#952; is small enough (say &#952; &lt; 0.1), then for all &#981; &#8712; [-&#960;, &#960;],</p><p>where the constants in the above two equations is independent of &#981;. The proof then follows by directly bounding the difference between the densities. Consider </p><p>where the third line follows from Eq. ( <ref type="formula">96</ref>) and the continuity of g H . Finally the integral of the difference between the densities is still bounded by O(&#952;), which concludes the full proof.</p><p>Note that in the matrix form of the Cayley transform</p><p>where H = j e i&#981; j |&#968; j &#968; j |, the magnitude of the denominator in the above equation goes to infinity as &#981; j is close to &#177;&#960;, which will affect the accuracy of the polynomial interpolation in our worst-to-average-case reduction. However, the probability of such an event is small when H is drawn from &#181; H , as shown in the following lemma.</p><p>Lemma 13. The Haar distribution &#181; H over the unitary group U(N ) satisfies</p><p>where &#981; j (j = 1 . . . N ) denotes the eigenvalues of the random unitary distributed according to &#181; H .</p><p>Proof. The proof is a simple union bound: (100)</p><note type="other">Pr</note></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Either a random circuit in the case of RCS, or a random linear optical network in the case of BosonSampling.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>See also the complementary work of<ref type="bibr">[AC17,</ref><ref type="bibr">AG19]</ref> for alternative evidence that directly addresses the hardness of scoring well on benchmarking tests such as cross-entropy.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>Please see Section 2 for a discussion of the differences between these works.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>That is, such a hardness result would imply that no classical algorithm can sample from any distribution within a fixed constant total variation distance of the ideal distribution sampled by the noiseless quantum circuit. Interestingly, we do not even have hardness of exact sampling results for random quantum circuits owing to this same imprecision issue!</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4"><p>Here we mean the output probabilities averaged over the noise, i.e. the corresponding diagonal entries of the density matrix after evolution under the circuit and noise channels.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5"><p>In particular [AAB + 19] argued for the hardness of their experiment under a highly simplified noise model of global depolarizing noise, since then the output probabilities are a convex combination of the true distribution with the uniform distribution. Our result generalizes this statement in a more realistic, per-gate noise model, and more generally to any error-detectable noise model.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6"><p>Indeed, some amount of imprecision is inevitable in any digital model of computation, where real numbers are approximated with poly-sized binary representations to &#948; = 2 -poly(n) accuracy.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7"><p>We note that if one adopts a model of computation which admits exact representations of real numbers, then the Berlekamp-Welch algorithm does work, as noted in<ref type="bibr">[BFNV19]</ref>, but this is not known to work in a digital model of computation where one cannot exactly represent real numbers.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8"><p>More precisely,<ref type="bibr">[BFNV19]</ref> showed (using the techniques of Aaronson and Arkhipov<ref type="bibr">[AA11]</ref>) that it is hard to compute the "truncated" circuit probabilities to accuracy 2 -poly(m) , but did not quantify the polynomial in their argument as it depends on a number of unspecified parameter settings. Movassagh<ref type="bibr">[Mov20]</ref> improved this to show hardness of computing the true output probabilities, and quantified the robustness to 2 -O(m 3 ) .</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9"><p>Technically speaking neither the techniques of<ref type="bibr">[BFNV19]</ref> nor<ref type="bibr">[Mov20]</ref> have all of these properties, but the authors found workarounds to this limitation.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10"><p>This is not entirely straightforward, because Paturi's bound on error blowup is for polynomials which are uniformly close in some interval, and these polynomials are only close at particular points, but this turns out not to be an issue as we will describe later.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11"><p>These points are highly correlated with one another, so unfortunately a union bound is the best tool we know how to use here.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_12"><p>That is, if T d is the dth Chebyshev polynomial, our errors could uniquely specify T d (x k ) as our error polynomial, as all we have done is replaced x with x k .</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="17" xml:id="foot_13"><p>More formally, the previous arguments based on Lagrangian interpolation used a union bound to argue all evaluated points were correct, then invoked Rakhmanov's argument to argue that the polynomial was uniformly bounded in that interval, then invoked Paturi's bound to show that the polynomial remained bounded when extrapolated. Our new proof first argues (using our robust Berlekamp-Welch argument) that the polynomial is uniformly bounded, then applies a simple argument to obtain a bound on the coefficients of the difference polynomial, then directly bounds the error at extrapolation, without the need to invoke Rakhmanov's results.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="18" xml:id="foot_14"><p>More formally, we prove coC=P hardness, which is hard enough to suffice for quantum supremacy arguments as coC=P is outside of the polynomial hierarchy unless PH collapses.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="19" xml:id="foot_15"><p>That is, up to constant multiplicative error on each of the exponentially small output probabilities.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="20" xml:id="foot_16"><p>Technically speaking, the imprecision incurred by Stockmeyer's algorithm is multiplicative, and so depends on the output probability being estimated, which for random circuits is O(2 -n ) with very high probability due to known "anti-concentration" results [HL09, BHH16, HM18, BCG20, DHJB20].</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="21" xml:id="foot_17"><p>More formally, the conjectured rate of convergence of noisy circuits to the uniform distribution is O(e -dn&#947; ) where n is the number of qubits and d is the depth of the circuit, and &#947; is the depolarizing noise rate per qubit. The robustness required for proving hardness of sampling in the low noise setting is O(2 -n ). Therefore, for any circuit</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="24" xml:id="foot_18"><p>The proof in<ref type="bibr">[AA11]</ref> requires c to be at least 5, and it was conjectured that c &gt; 2 suffices.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="25" xml:id="foot_19"><p>In retrospect, setting &#952; = 1 m 2 polylog(m) and K = polylog(m) yields robustness 2 -&#213;(m 3 ) to computing this quantity, but these parameter settings or the arguments justifying them were not given in the original paper.</p></note>
		</body>
		</text>
</TEI>
