skip to main content

Title: Full-stack, real-system quantum computer studies: architectural comparisons and design insights
In recent years, Quantum Computing (QC) has progressed to the point where small working prototypes are available for use. Termed Noisy Intermediate-Scale Quantum (NISQ) computers, these prototypes are too small for large benchmarks or even for Quantum Error Correction, but they do have sufficient resources to run small benchmarks, particularly if compiled with optimizations to make use of scarce qubits and limited operation counts and coherence times. QC has not yet, however, settled on a particular preferred device implementation technology, and indeed different NISQ prototypes implement qubits with very different physical approaches and therefore widely-varying device and machine characteristics. Our work performs a full-stack, benchmark-driven hardware-software analysis of QC systems. We evaluate QC architectural possibilities, software-visible gates, and software optimizations to tackle fundamental design questions about gate set choices, communication topology, the factors affecting benchmark performance and compiler optimizations. In order to answer key cross-technology and cross-platform design questions, our work has built the first top-to-bottom toolflow to target different qubit device technologies, including superconducting and trapped ion qubits which are the current QC front-runners. We use our toolflow, TriQ, to conduct real-system measurements on 7 running QC prototypes from 3 different groups, IBM, Rigetti, and University of Maryland. From more » these real-system experiences at QC's hardware-software interface, we make observations about native and software-visible gates for different QC technologies, communication topologies, and the value of noise-aware compilation even on lower-noise platforms. This is the largest cross-platform real-system QC study performed thus far; its results have the potential to inform both QC device and compiler design going forward. « less
; ; ; ; ;
Award ID(s):
1730449 1730082
Publication Date:
Journal Name:
46th Annual International Symposium on Computer Architecture (ISCA)
Page Range or eLocation-ID:
527 to 540
Sponsoring Org:
National Science Foundation
More Like this
  1. Trapped ions (TIs) are a leading candidate for building Noisy Intermediate-Scale Quantum (NISQ) hardware. TI qubits have fundamental advantages over other technologies, featuring high qubit quality, coherence time, and qubit connectivity. However, current TI systems are small in size and typically use a single trap architecture, which has fundamental scalability limitations. To progress toward the next major milestone of 50--100 qubit TI devices, a modular architecture termed the Quantum Charge Coupled Device (QCCD) has been proposed. In a QCCD-based TI device, small traps are connected through ion shuttling. While the basic hardware components for such devices have been demonstrated, building a 50--100 qubit system is challenging because of a wide range of design possibilities for trap sizing, communication topology, and gate implementations and the need to match diverse application resource requirements. Toward realizing QCCD-based TI systems with 50--100 qubits, we perform an extensive application-driven architectural study evaluating the key design choices of trap sizing, communication topology, and operation implementation methods. To enable our study, we built a design toolflow, which takes a QCCD architecture's parameters as input, along with a set of applications and realistic hardware performance models. Our toolflow maps the applications onto the target device and simulates theirmore »execution to compute metrics such as application run time, reliability, and device noise rates. Using six applications and several hardware design points, we show that trap sizing and communication topology choices can impact application reliability by up to three orders of magnitude. Microarchitectural gate implementation choices influence reliability by another order of magnitude. From these studies, we provide concrete recommendations to tune these choices to achieve highly reliable and performant application executions. With industry and academic efforts underway to build TI devices with 50-100 qubits, our insights have the potential to influence QC hardware in the near future and accelerate the progress toward practical QC systems.« less
  2. The current phase of quantum computing is in the Noisy Intermediate-Scale Quantum (NISQ) era. On NISQ devices, two-qubit gates such as CNOTs are much noisier than single-qubit gates, so it is essential to minimize their count. Quantum circuit synthesis is a process of decomposing an arbitrary unitary into a sequence of quantum gates, and can be used as an optimization tool to produce shorter circuits to improve overall circuit fidelity. However, the time-to-solution of synthesis grows exponentially with the number of qubits. As a result, synthesis is intractable for circuits on a large qubit scale. In this paper, we propose a hierarchical, block-by-block opti-mization framework, QGo, for quantum circuit optimization. Our approach allows an exponential cost optimization to scale to large circuits. QGo uses a combination of partitioning and synthesis: 1) partition the circuit into a sequence of independent circuit blocks; 2) re-generate and optimize each block using quantum synthesis; and 3) re-compose the final circuit by stitching all the blocks together. We perform our analysis and show the fidelity improvements in three different regimes: small-size circuits on real devices, medium-size circuits on noisy simulations, and large-size circuits on analytical models. Our technique can be applied after existing optimizations tomore »achieve higher circuit fidelity. Using a set of NISQ benchmarks, we show that QGo can reduce the number of CNOT gates by 29.9% on average and up to 50% when compared with industrial compiler optimizations such as t|ket). When executed on the IBM Athens system, shorter depth leads to higher circuit fidelity. We also demonstrate the scalability of our QGo technique to optimize circuits of 60+ qubits, Our technique is the first demonstration of successfully employing and scaling synthesis in the compilation tool chain for large circuits. Overall, our approach is robust for direct incorporation in production compiler toolchains to further improve the circuit fidelity.« less
  3. Crosstalk is a major source of noise in Noisy Intermediate-Scale Quantum (NISQ) systems and is a fundamental challenge for hardware design. When multiple instructions are executed in parallel, crosstalk between the instructions can corrupt the quantum state and lead to incorrect program execution. Our goal is to mitigate the application impact of crosstalk noise through software techniques. This requires (i) accurate characterization of hardware crosstalk, and (ii) intelligent instruction scheduling to serialize the affected operations. Since crosstalk characterization is computationally expensive, we develop optimizations which reduce the characterization overhead. On 3 20-qubit IBMQ systems, we demonstrate two orders of magnitude reduction in characterization time (compute time on the QC device) compared to all-pairs crosstalk measurements. Informed by these characterization, we develop a scheduler that judiciously serializes high crosstalk instructions balancing the need to mitigate crosstalk and exponential decoherence errors from serialization. On real-system runs on 3 IBMQ systems, our scheduler improves the error rate of application circuits by up to 5.6x, compared to the IBM instruction scheduler and offers near-optimal crosstalk mitigation in practice. In a broader picture, the difficulty of mitigating crosstalk has recently driven QC vendors to move towards sparser qubit connectivity or disabling nearby operations entirely inmore »hardware, which can be detrimental to performance. Our work makes the case for software mitigation of crosstalk errors.« less
  4. Despite rapid advances in quantum computing technologies, the qubit connectivity limitation remains to be a critical challenge. Both near-term NISQ quantum computers and relatively long-term scalable quantum architectures do not offer full connectivity. As a result, quantum circuits may not be directly executed on quantum hardware, and a quantum compiler needs to perform qubit routing to make the circuit compatible with the device layout. During the qubit routing step, the compiler inserts SWAP gates and performs circuit transformations. Given the connectivity topology of the target hardware, there are typically multiple qubit routing candidates. The state-of-the-art compilers use a cost function to evaluate the number of SWAP gates for different routes and then select the one with the minimum number of SWAP gates. After qubit routing, the quantum compiler performs gate optimizations upon the circuit with the newly inserted SWAP gates. In this paper, we observe that the aforementioned qubit routing is not optimal, and qubit routing should not be independent on subsequent gate optimizations. We find that with the consideration of gate optimizations, not all of the SWAP gates have the same basis-gate cost. These insights lead to the development of our qubit routing algorithm, NASSC (Not All Swaps havemore »the Same Cost). NASSC is the first algorithm that considers the subsequent optimizations during the routing step. Our optimization-aware qubit routing leads to better routing decisions and benefits subsequent optimizations. We also propose a new optimization-aware decomposition for the inserted SWAP gates. Our experiments show that the routing overhead compiled with our routing algorithm is reduced by up to 69.30% (21.30% on average) in the number of CNOT gates and up to 43.50% (7.61% on average) in the circuit depth compared with the state-of-the-art scheme, SABRE.« less
  5. Quantum technologies currently struggle to scale beyond moderate scale prototypes and are unable to execute even reasonably sized programs due to prohibitive gate error rates or coherence times. Many software approaches rely on heavy compiler optimization to squeeze extra value from noisy machines but are fundamentally limited by hardware. Alone, these software approaches help to maximize the use of available hardware but cannot overcome the inherent limitations posed by the underlying technology. An alternative approach is to explore the use of new, though potentially less developed, technology as a path towards scalability. In this work we evaluate the advantages and disadvantages of a Neutral Atom (NA) architecture. NA systems offer several promising advantages such as long range interactions and native multiqubit gates which reduce communication overhead, overall gate count, and depth for compiled programs. Long range interactions, however, impede parallelism with restriction zones surrounding interacting qubit pairs. We extend current compiler methods to maximize the benefit of these advantages and minimize the cost. Furthermore, atoms in an NA device have the possibility to randomly be lost over the course of program execution which is extremely detrimental to total program execution time as atom arrays are slow to load. When themore »compiled program is no longer compatible with the underlying topology, we need a fast and efficient coping mechanism. We propose hardware and compiler methods to increase system resilience to atom loss dramatically reducing total computation time by circumventing complete reloads or full recompilation every cycle.« less