A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. The Union-Find (UF) decoder is promising with an average time complexity slightly higher than $O(d^3)$. We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation, we empirically show that this distributed UF decoder has a sublinear average time complexity with regard to $d$, given $O(d^3)$ parallel computing resources. The decoding time per measurement round decreases as $d$ increases, the first time for a quantum error decoder. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure. Using a Xilinx VCU129 FPGA, we successfully implement $d$ up to 21 with an average decoding time of 11.5 ns per measurement round under 0.1% phenomenological noise, and 23.7 ns for $d=17$ under equivalent circuit-level noise. This performance is significantly faster than any existing decoder implementation. Furthermore, we show that Helios can optimize for resource efficiency by decoding $d=51$ on a Xilinx VCU129 FPGA with an average latency of 544 ns per measurement round.
                    
                            
                            The T-Complexity Costs of Error Correction for Control Flow in Quantum Computation
                        
                    
    
            Numerous quantum algorithms require the use of quantum error correction to overcome the intrinsic unreliability of physical qubits. However, quantum error correction imposes a unique performance bottleneck, known as T-complexity, that can make an implementation of an algorithm as a quantum program run more slowly than on idealized hardware. In this work, we identify that programming abstractions for control flow, such as the quantum if-statement, can introduce polynomial increases in the T-complexity of a program. If not mitigated, this slowdown can diminish the computational advantage of a quantum algorithm. To enable reasoning about the costs of control flow, we present a cost model that a developer can use to accurately analyze the T-complexity of a program under quantum error correction and pinpoint the sources of slowdown. To enable the mitigation of these costs, we present a set of program-level optimizations that a developer can use to rewrite a program to reduce its T-complexity, predict the T-complexity of the optimized program using the cost model, and then compile it to an efficient circuit via a straightforward strategy. We implement the program-level optimizations in Spire, an extension of the Tower quantum compiler. Using a set of 11 benchmark programs that use control flow, we empirically show that the cost model is accurate, and that Spire’s optimizations recover programs that are asymptotically efficient, meaning their runtime T-complexity under error correction is equal to their time complexity on idealized hardware. Our results show that optimizing a program before it is compiled to a circuit can yield better results than compiling the program to an inefficient circuit and then invoking a quantum circuit optimizer found in prior work. For our benchmarks, only 2 of 8 tested quantum circuit optimizers recover circuits with asymptotically efficient T-complexity. Compared to these 2 optimizers, Spire uses 54×–2400× less compile time.
        
    
- Award ID(s): 1751011
- PAR ID: 10553924
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the ACM on Programming Languages
- Volume: 8
- Issue: PLDI
- ISSN: 2475-1421
- Page Range / eLocation ID: 492 to 517
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Quantum computers are growing in size, and design decisions are being made now that attempt to squeeze more computation out of these machines. In this spirit, we design a method to boost the computational power of near-term quantum computers by adapting protocols used in quantum error correction to implement "Approximate Quantum Error Correction (AQEC)." By approximating fully-fledged error correction mechanisms, we can increase the compute volume (qubits × gates, or "Simple Quantum Volume (SQV)") of near-term machines. The crux of our design is a fast hardware decoder that can approximately decode detected error syndromes rapidly. Specifically, we demonstrate a proof-of-concept that approximate error decoding can be accomplished online in near-term quantum systems by designing and implementing a novel algorithm in Single-Flux Quantum (SFQ) superconducting logic technology. This avoids a critical decoding backlog, hidden in all offline decoding schemes, that leads to idle time exponential in the number of T gates in a program. Our design utilizes one SFQ processing module per physical qubit. Employing state-of-the-art SFQ synthesis tools, we show that the circuit area, power, and latency are within the constraints of contemporary quantum system designs. Under pure dephasing error models, the proposed accelerator and AQEC solution is able to expand SQV by factors between 3,402 and 11,163 on expected near-term machines. The decoder achieves a 5% accuracy-threshold and pseudo-thresholds of ∼5%, 4.75%, 4.5%, and 3.5% physical error-rates for code distances 3, 5, 7, and 9. Decoding solutions are achieved in a maximum of ∼20 nanoseconds on the largest code distances studied. By avoiding the exponential idle time in offline decoders, we achieve a 10x reduction in required code distances to achieve the same logical performance as alternative designs.
- 
            Cumulative memory—the sum of space used per step over the duration of a computation—is a fine-grained measure of time-space complexity that was introduced to analyze cryptographic applications like password hashing. It is a more accurate cost measure for algorithms that have infrequent spikes in memory usage and are run in environments such as cloud computing that allow dynamic allocation and de-allocation of resources during execution, or when many instances of an algorithm are interleaved in parallel. We prove the first lower bounds on cumulative memory complexity for both sequential classical computation and quantum circuits. Moreover, we develop general paradigms for bounding cumulative memory complexity inspired by the standard paradigms for proving time-space tradeoff lower bounds that can only lower bound the maximum space used during an execution. The resulting lower bounds on cumulative memory that we obtain are just as strong as the best time-space tradeoff lower bounds, which are very often known to be tight. Although previous results for pebbling and random oracle models have yielded time-space tradeoff lower bounds larger than the cumulative memory complexity, our results show that in general computational models such separations cannot follow from known lower bound techniques and are not true for many functions. Among many possible applications of our general methods, we show that any classical sorting algorithm with success probability at least $1/\mathrm{poly}(n)$ requires cumulative memory $\tilde{\Omega}(n^2)$, any classical matrix multiplication algorithm requires cumulative memory $\Omega(n^6/T)$, any quantum sorting circuit requires cumulative memory $\Omega(n^3/T)$, and any quantum circuit that finds $k$ disjoint collisions in a random function requires cumulative memory $\Omega(k^3 n/T^2)$.
- 
            Practical applications of quantum computing depend on fault-tolerant devices with error correction. We study the problem of compiling quantum circuits for quantum computers implementing surface codes. Optimal or near-optimal compilation is critical for both efficiency and correctness. The compilation problem requires (1) mapping circuit qubits to the device qubits and (2) routing execution paths between interacting qubits. We solve this problem efficiently and near-optimally with a novel algorithm that exploits the dependency structure of circuit operations to formulate discrete optimization problems that can be approximated via simulated annealing, a classic and simple algorithm. Our extensive evaluation shows that our approach is powerful and flexible for compiling realistic workloads.
- 
            We present VOQC, the first fully verified optimizer for quantum circuits, written using the Coq proof assistant. Quantum circuits are expressed as programs in a simple, low-level language called SQIR, a simple quantum intermediate representation, which is deeply embedded in Coq. Optimizations and other transformations are expressed as Coq functions, which are proved correct with respect to a semantics of SQIR programs. SQIR uses a semantics of matrices of complex numbers, which is the standard for quantum computation, but treats matrices symbolically in order to reason about programs that use an arbitrary number of quantum bits. SQIR's careful design and our provided automation make it possible to write and verify a broad range of optimizations in VOQC, including full-circuit transformations from cutting-edge optimizers.
 An official website of the United States government