-
The control of cryogenic qubits in today’s superconducting quantum computer prototypes presents significant scalability challenges due to the massive costs of generating/routing the analog control signals that need to be sent from a classical controller at room temperature to the quantum chip inside the dilution refrigerator. Thus, researchers in industry and academia have focused on designing in-fridge classical controllers in order to mitigate these challenges. Due to the maturity of CMOS logic, many industrial efforts (Microsoft, Intel) have focused on Cryo-CMOS as a near-term solution to design in-fridge classical controllers. Meanwhile, Superconducting Single Flux Quantum (SFQ) is an alternative, less mature classical logic family proposed for large-scale in-fridge controllers. SFQ logic has the potential to maximize scalability thanks to its ultra-high speed and very low power consumption. However, architecture design for SFQ logic poses challenges due to its unconventional pulse-driven nature and lack of dense memory and logic. Thus, research at the architecture level is essential to guide architects to design SFQ-based classical controllers for large-scale quantum machines. In this paper, we present DigiQ, the first system-level design of a Noisy Intermediate Scale Quantum (NISQ)-friendly SFQ-based classical controller. We perform a design space exploration of SFQ-based controllers and co-design the quantum gate decompositions and SFQ-based implementation of those decompositions to find an optimal SFQ-friendly design point that trades area and power for latency and control while ensuring good quantum algorithmic performance. Our co-design results in a single instruction, multiple data (SIMD) controller architecture, which has high scalability, but imposes new challenges on the calibration of control pulses. We present software-level solutions to address these challenges, which if unaddressed would degrade quantum circuit fidelity given the imperfections of qubit hardware. To validate and characterize DigiQ, we first implement it using hardware description languages and synthesize it using state-of-the-art/validated SFQ synthesis tools. Our synthesis results show that DigiQ can operate within the tight power and area budget of dilution refrigerators at >42,000-qubit scales. Second, we confirm the effectiveness of DigiQ in running quantum algorithms by modeling the execution time and fidelity of a variety of NISQ applications. We hope that the promising results of this paper motivate experimentalists to further explore SFQ-based quantum controllers to realize large-scale quantum machines with maximized scalability.
-
Scalability of today’s superconducting quantum computers is limited due to the huge costs of generating/routing microwave control pulses per qubit from room temperature. One active research area in both industry and academia is to push the classical controllers to the dilution refrigerator in order to increase the scalability of quantum computers. Superconducting Single Flux Quantum (SFQ) is a classical logic technology with low power consumption and ultra-high speed, and thus is a promising candidate for in-fridge classical controllers with maximized scalability. Prior work has demonstrated high-fidelity SFQ-based single-qubit gates. However, little research has been done on SFQ-based multi-qubit gates, which are necessary to realize SFQ-based universal quantum computing. In this paper, we present the first thorough analysis of SFQ-based two-qubit gates. Our observations show that SFQ-based two-qubit gates tend to have high leakage to the qubit's non-computational subspace, which presents severe design challenges. We show that despite these challenges, we can realize gates with high fidelity by carefully designing optimal control methods and qubit architectures. We develop optimal control methods that suppress leakage, and also investigate various qubit architectures that reduce the leakage. After carefully engineering our SFQ-friendly quantum system, we show that it can achieve similar gate fidelity and gate time to microwave-based quantum systems. The promising results of this paper show that (1) SFQ-based universal quantum computation is both feasible and effective; and (2) SFQ is a promising approach for designing classical controllers for quantum machines because it can increase the scalability while preserving gate fidelity and performance.
-
We present a hybrid optical-electrical analog deep learning (DL) accelerator, the first work to use incoherent optical signals for DL workloads. Incoherent optical designs are more attractive than coherent ones as the former can be more easily realized in practice. However, a significant challenge in analog DL accelerators, where multiply-accumulate operations are dominant, is that there is no known solution to perform accumulation using incoherent optical signals. We overcome this challenge by devising a hybrid approach: accumulation is done in the electrical domain, while multiplication is performed in the optical domain. The key technology enabler of our design is the transistor laser, which performs electrical-to-optical and optical-to-electrical conversions efficiently to tightly integrate electrical and optical devices into compact circuits. As such, our design fully realizes the ultra-high-speed and high-energy-efficiency advantages of analog and optical computing. Our evaluation results using the MNIST benchmark show that our design achieves 2214× and 65× improvements in latency and energy, respectively, compared to a state-of-the-art memristor-based analog design.
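The division of labor described in this abstract (multiplication in the optical domain, accumulation in the electrical domain) can be pictured with a purely conceptual Python sketch. It models no device physics, and the function names `optical_multiply`, `electrical_accumulate`, and `hybrid_mac` are hypothetical labels introduced here for illustration, not names from the paper.

```python
def optical_multiply(inputs, weights):
    """Stand-in for the optical domain: one product per (input, weight) pair.

    Assumption: each product is formed independently; no accumulation happens here.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return [x * w for x, w in zip(inputs, weights)]


def electrical_accumulate(products):
    """Stand-in for the electrical domain: sum the already-converted products."""
    total = 0.0
    for p in products:
        total += p
    return total


def hybrid_mac(inputs, weights):
    """One multiply-accumulate: multiply 'optically', then accumulate 'electrically'."""
    return electrical_accumulate(optical_multiply(inputs, weights))


if __name__ == "__main__":
    # Hypothetical activations and weights, chosen only to exercise the sketch.
    print(hybrid_mac([1.0, 2.0, 3.0], [0.5, 0.25, 2.0]))
```

The conversion between domains, which the abstract attributes to the transistor laser, corresponds to the hand-off between the two functions and is not modeled.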
-
Quantum computers are growing in size, and design decisions are being made now that attempt to squeeze more computation out of these machines. In this spirit, we design a method to boost the computational power of near-term quantum computers by adapting protocols used in quantum error correction to implement "Approximate Quantum Error Correction (AQEC)." By approximating fully-fledged error correction mechanisms, we can increase the compute volume (qubits × gates, or "Simple Quantum Volume (SQV)") of near-term machines. The crux of our design is a fast hardware decoder that can approximately decode detected error syndromes rapidly. Specifically, we demonstrate a proof-of-concept that approximate error decoding can be accomplished online in near-term quantum systems by designing and implementing a novel algorithm in Single-Flux Quantum (SFQ) superconducting logic technology. This avoids a critical decoding backlog, hidden in all offline decoding schemes, that leads to idle time exponential in the number of T gates in a program. Our design utilizes one SFQ processing module per physical qubit. Employing state-of-the-art SFQ synthesis tools, we show that the circuit area, power, and latency are within the constraints of contemporary quantum system designs. Under pure dephasing error models, the proposed accelerator and AQEC solution is able to expand SQV by factors between 3,402 and 11,163 on expected near-term machines. The decoder achieves a 5% accuracy threshold and pseudo-thresholds of ∼5%, 4.75%, 4.5%, and 3.5% physical error rates for code distances 3, 5, 7, and 9. Decoding solutions are achieved in a maximum of ∼20 nanoseconds on the largest code distances studied. By avoiding the exponential idle time in offline decoders, we achieve a 10x reduction in required code distances to achieve the same logical performance as alternative designs.
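As a small worked illustration of the SQV metric defined in this abstract (qubits × gates), the sketch below computes a baseline SQV and applies an expansion factor. The expansion factor 3,402 is the lower bound reported in the abstract; the machine size and gate count are hypothetical values chosen only for the arithmetic, and the function names are introduced here.

```python
def simple_quantum_volume(num_qubits, num_gates):
    """SQV as stated in the abstract: qubits multiplied by gates."""
    return num_qubits * num_gates


def expanded_sqv(baseline_sqv, expansion_factor):
    """SQV after applying a multiplicative expansion factor."""
    return baseline_sqv * expansion_factor


if __name__ == "__main__":
    # Hypothetical machine: 100 qubits, 1,000 gates (not figures from the paper).
    baseline = simple_quantum_volume(100, 1000)
    # 3,402 is the lower end of the expansion range reported in the abstract.
    print(baseline, expanded_sqv(baseline, 3402))
```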
-
We present a new interposer-level optical network based on direct-modulated lasers such as vertical-cavity surface-emitting lasers (VCSELs) or transistor lasers (TLs). Our key observation is that the physics of these lasers is such that they must transmit significantly more power (21×) than is needed by the receiver. We take advantage of this excess optical power to create a new network architecture called Rome, which splits optical signals using passive splitters to allow flexible bandwidth allocation among different transmitter and receiver pairs while imposing minimal power and design costs. Using multi-chip module GPUs (MCM-GPUs) as a case study, we thoroughly evaluate network power and performance, and show that (1) Rome is capable of efficiently scaling up MCM-GPUs with up to 1024 streaming multiprocessors, and (2) Rome outperforms various competing designs in terms of energy efficiency (by up to 4×) and performance (by up to 143%).
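To give a feel for how much splitting a 21× power surplus could support, here is a back-of-the-envelope sketch. It assumes a tree of ideal 1:2 passive splitters, each halving the power and optionally adding a fixed excess loss. The tree topology, the function name `max_fanout`, and the 0.5 dB loss figure are assumptions made for illustration; the abstract does not describe Rome's actual splitter arrangement or loss budget.

```python
def max_fanout(power_ratio, excess_loss_db_per_stage=0.0):
    """Largest power-of-two fan-out of a 1:2 splitter tree that still delivers
    at least the power each receiver needs.

    power_ratio: transmitted power divided by the power one receiver needs.
    excess_loss_db_per_stage: hypothetical extra loss per splitter stage, in dB.
    """
    stages = 0
    while True:
        candidate = stages + 1
        # Fraction of transmitted power reaching each output after `candidate` stages.
        fraction = (0.5 ** candidate) * 10 ** (-excess_loss_db_per_stage * candidate / 10)
        if power_ratio * fraction < 1.0:
            return 2 ** stages
        stages = candidate


if __name__ == "__main__":
    print(max_fanout(21))        # lossless splitters
    print(max_fanout(21, 0.5))   # assumed 0.5 dB excess loss per stage
```

Under these assumptions the 21× surplus covers four lossless splitting stages (16 outputs), and three stages (8 outputs) once the assumed 0.5 dB per-stage loss is included.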
-
We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the-art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
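The intuition that path multiplicity and randomness reduce drops in a bufferless switch can be illustrated with a toy Monte Carlo model. This is not Baldur's routing algorithm. It assumes a single switch stage, uniformly random destinations, and a first-come-first-served rule in which a packet is dropped whenever the link it randomly picks toward its destination is already taken. The function name `drop_fraction` and all parameters are hypothetical.

```python
import random


def drop_fraction(num_ports, multiplicity, trials, seed=0):
    """Estimate the fraction of packets dropped in one bufferless switch stage.

    num_ports: inputs and outputs of the toy switch (low radix, e.g. 4).
    multiplicity: parallel links per output; each packet picks one at random.
    trials: number of independent time slots to simulate.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    sent = 0
    dropped = 0
    for _ in range(trials):
        taken = set()  # (destination, link) pairs already claimed in this slot
        for _src in range(num_ports):
            dest = rng.randrange(num_ports)
            link = rng.randrange(multiplicity)
            sent += 1
            if (dest, link) in taken:
                dropped += 1  # no buffer, so the contending packet is dropped
            else:
                taken.add((dest, link))
    return dropped / sent


if __name__ == "__main__":
    for m in (1, 2, 4):
        print(m, drop_fraction(num_ports=4, multiplicity=m, trials=2000))
```

In this toy model, giving each output more parallel links spreads contending packets across them, so fewer collide; the real design additionally uses multiple stages, which the sketch omits.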