skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.

Title: Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing
Despite rapid advances in quantum computing technologies, the qubit connectivity limitation remains to be a critical challenge. Both near-term NISQ quantum computers and relatively long-term scalable quantum architectures do not offer full connectivity. As a result, quantum circuits may not be directly executed on quantum hardware, and a quantum compiler needs to perform qubit routing to make the circuit compatible with the device layout. During the qubit routing step, the compiler inserts SWAP gates and performs circuit transformations. Given the connectivity topology of the target hardware, there are typically multiple qubit routing candidates. The state-of-the-art compilers use a cost function to evaluate the number of SWAP gates for different routes and then select the one with the minimum number of SWAP gates. After qubit routing, the quantum compiler performs gate optimizations upon the circuit with the newly inserted SWAP gates. In this paper, we observe that the aforementioned qubit routing is not optimal, and qubit routing should not be independent on subsequent gate optimizations. We find that with the consideration of gate optimizations, not all of the SWAP gates have the same basis-gate cost. These insights lead to the development of our qubit routing algorithm, NASSC (Not All Swaps have the Same Cost). NASSC is the first algorithm that considers the subsequent optimizations during the routing step. Our optimization-aware qubit routing leads to better routing decisions and benefits subsequent optimizations. We also propose a new optimization-aware decomposition for the inserted SWAP gates. Our experiments show that the routing overhead compiled with our routing algorithm is reduced by up to 69.30% (21.30% on average) in the number of CNOT gates and up to 43.50% (7.61% on average) in the circuit depth compared with the state-of-the-art scheme, SABRE.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Quantum computing is gaining momentum in revolutionizing the way we approach complex problem-solving. However, the practical implementation of quantum algorithms remains a significant challenge due to the error-prone and hardware limits of near-term quantum devices. For instance, physical qubit connections are limited, which necessitates the use of quantum SWAP gates to dynamically transform the logical topology during execution. In addition, to optimize fidelity, it is essential to ensure that 1) the allocated hardware has a low error rate and 2) the number of SWAP gates injected into the circuit is minimized. To address these challenges, we propose a suite of algorithms: the Fidelity-aware Graph Extraction Algorithm (FGEA) is used to identify the hardware region with the lowest probability of error, the Frequency-based Mapping Algorithm (FMA) allocates logical-physical qubits that reduce the potential distance of topological transformation, and the Heuristic Routing Algorithm (HRA) searches for an optimal swapping injection strategy. We evaluate the proposed algorithms on the IBM-provided Noisy Intermediate-Scale Quantum (NISQ) computer, using a dataset consisting of 17 different quantum circuits of various sizes. The circuits are executed on the IBM Toronto Falcon processor. The three proposed algorithms outperform the existing SABRE algorithm in reducing the number of SWAP gates required. Therefore, our proposed algorithms hold significant promise in enhancing the fidelity and reducing the number of SWAP gates required in implementing Quantum algorithms. 
    more » « less
  2. Current quantum computers are especially error prone and require high levels of optimization to reduce operation counts and maximize the probability the compiled program will succeed. These computers only support operations decomposed into one- and two-qubit gates and only two-qubit gates between physically connected pairs of qubits. Typical compilers first decompose operations, then route data to connected qubits. We propose a new compiler structure, Orchestrated Trios, that first decomposes to the three-qubit Toffoli, routes the inputs of the higher-level Toffoli operations to groups of nearby qubits, then finishes decomposition to hardware-supported gates. This significantly reduces communication overhead by giving the routing pass access to the higher-level structure of the circuit instead of discarding it. A second benefit is the ability to now select an architecture-tuned Toffoli decomposition such as the 8-CNOT Toffoli for the specific hardware qubits now known after the routing pass. We perform real experiments on IBM Johannesburg showing an average 35% decrease in two-qubit gate count and 23% increase in success rate of a single Toffoli over Qiskit. We additionally compile many near-term benchmark algorithms showing an average 344% increase in (or 4.44x) simulated success rate on the Johannesburg architecture and compare with other architecture types. 
    more » « less
  3. The current phase of quantum computing is in the Noisy Intermediate-Scale Quantum (NISQ) era. On NISQ devices, two-qubit gates such as CNOTs are much noisier than single-qubit gates, so it is essential to minimize their count. Quantum circuit synthesis is a process of decomposing an arbitrary unitary into a sequence of quantum gates, and can be used as an optimization tool to produce shorter circuits to improve overall circuit fidelity. However, the time-to-solution of synthesis grows exponentially with the number of qubits. As a result, synthesis is intractable for circuits on a large qubit scale. In this paper, we propose a hierarchical, block-by-block opti-mization framework, QGo, for quantum circuit optimization. Our approach allows an exponential cost optimization to scale to large circuits. QGo uses a combination of partitioning and synthesis: 1) partition the circuit into a sequence of independent circuit blocks; 2) re-generate and optimize each block using quantum synthesis; and 3) re-compose the final circuit by stitching all the blocks together. We perform our analysis and show the fidelity improvements in three different regimes: small-size circuits on real devices, medium-size circuits on noisy simulations, and large-size circuits on analytical models. Our technique can be applied after existing optimizations to achieve higher circuit fidelity. Using a set of NISQ benchmarks, we show that QGo can reduce the number of CNOT gates by 29.9% on average and up to 50% when compared with industrial compiler optimizations such as t|ket). When executed on the IBM Athens system, shorter depth leads to higher circuit fidelity. We also demonstrate the scalability of our QGo technique to optimize circuits of 60+ qubits, Our technique is the first demonstration of successfully employing and scaling synthesis in the compilation tool chain for large circuits. Overall, our approach is robust for direct incorporation in production compiler toolchains to further improve the circuit fidelity. 
    more » « less
  4. In recent years, Quantum Computing (QC) has progressed to the point where small working prototypes are available for use. Termed Noisy Intermediate-Scale Quantum (NISQ) computers, these prototypes are too small for large benchmarks or even for Quantum Error Correction, but they do have sufficient resources to run small benchmarks, particularly if compiled with optimizations to make use of scarce qubits and limited operation counts and coherence times. QC has not yet, however, settled on a particular preferred device implementation technology, and indeed different NISQ prototypes implement qubits with very different physical approaches and therefore widely-varying device and machine characteristics. Our work performs a full-stack, benchmark-driven hardware-software analysis of QC systems. We evaluate QC architectural possibilities, software-visible gates, and software optimizations to tackle fundamental design questions about gate set choices, communication topology, the factors affecting benchmark performance and compiler optimizations. In order to answer key cross-technology and cross-platform design questions, our work has built the first top-to-bottom toolflow to target different qubit device technologies, including superconducting and trapped ion qubits which are the current QC front-runners. We use our toolflow, TriQ, to conduct real-system measurements on 7 running QC prototypes from 3 different groups, IBM, Rigetti, and University of Maryland. From these real-system experiences at QC's hardware-software interface, we make observations about native and software-visible gates for different QC technologies, communication topologies, and the value of noise-aware compilation even on lower-noise platforms. This is the largest cross-platform real-system QC study performed thus far; its results have the potential to inform both QC device and compiler design going forward. 
    more » « less
  5. Compiling high-level quantum programs to machines that are size constrained (i.e. limited number of quantum bits) and time constrained (i.e. limited number of quantum operations) is challenging. In this paper, we present SQUARE (Strategic QUantum Ancilla REuse), a compilation infrastructure that tackles allocation and reclamation of scratch qubits (called ancilla) in modular quantum programs. At its core, SQUARE strategically performs uncomputation to create opportunities for qubit reuse. Current Noisy Intermediate-Scale Quantum (NISQ) computers and forward-looking Fault-Tolerant (FT) quantum computers have fundamentally different constraints such as data locality, instruction parallelism, and communication overhead. Our heuristic-based ancilla-reuse algorithm balances these considerations and fits computations into resource-constrained NISQ or FT quantum machines, throttling parallelism when necessary. To precisely capture the workload of a program, we propose an improved metric, the "active quantum volume," and use this metric to evaluate the effectiveness of our algorithm. Our results show that SQUARE improves the average success rate of NISQ applications by 1.47X. Surprisingly, the additional gates for uncomputation create ancilla with better locality, and result in substantially fewer swap gates and less gate noise overall. SQUARE also achieves an average reduction of 1.5X (and up to 9.6X) in active quantum volume for FT machines. 
    more » « less