Superconductor electronics (SCE) promise computer systems with orders of magnitude higher speeds and lower energy consumption than their complementary metal–oxide semiconductor (CMOS) counterparts. At the same time, the scalability and resource utilization of superconducting systems are major concerns. Some of these concerns come from device-level challenges and the gap between SCE and CMOS technology nodes, and others come from the way Josephson junctions (JJs) are used. In particular, we notice that a considerable fraction of hardware resources is not involved in logic operations but is instead used for fan-out and buffering. In this article, we ask whether these overheads can be reduced, propose the use of JJs at the cell boundaries to increase the number of outputs that a single stage can drive, and establish a set of rules to discretize critical currents in a way that is conducive to this assignment. Finally, we explore the design trade-offs that the presented approach opens up and demonstrate its promise through detailed analog simulations and modeling analyses. Our experiments indicate that the introduced method saves 48% of the JJ count for a tree with a fan-out of 1024, and on average 43% of the JJ count for signal splitting and 32% for clock splitting in ISCAS'85 benchmarks.
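To make the fan-out overhead concrete, the sketch below counts how many splitter cells a distribution tree needs as a function of how many outputs one cell can drive. It is a generic tree-counting argument, not the paper's method: the function names are hypothetical, and no JJ count per cell is assumed because the abstract does not state one.

```python
import math


def splitter_cells(fan_out, outputs_per_cell):
    """Minimum number of splitter cells needed to reach `fan_out` sinks.

    Each cell adds (outputs_per_cell - 1) signal copies, so a tree that
    starts from one source needs ceil((fan_out - 1) / (outputs_per_cell - 1))
    cells. Generic counting only; no circuit details are modeled.
    """
    if fan_out < 1 or outputs_per_cell < 2:
        raise ValueError("need fan_out >= 1 and outputs_per_cell >= 2")
    return math.ceil((fan_out - 1) / (outputs_per_cell - 1))


def tree_depth(fan_out, outputs_per_cell):
    """Number of splitter stages between the source and the sinks."""
    depth = 0
    reach = 1
    while reach < fan_out:
        reach *= outputs_per_cell
        depth += 1
    return depth


if __name__ == "__main__":
    for k in (2, 4):
        print(k, splitter_cells(1024, k), tree_depth(1024, k))
```

Under these assumptions, a fan-out of 1024 built from 1-to-2 cells needs 1023 cells in 10 stages, while 1-to-4 cells would need 341 cells in 5 stages. The actual JJ savings depend on the per-cell JJ cost, which this sketch leaves out.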
This content will become publicly available on December 19, 2025
Superconductor bistable vortex memory for data storage and readout
Abstract: Despite the advantages of superconductor electronics (SCE), the realization of SCE logic faces a significant challenge: the absence of dense and scalable nonvolatile memory. While various nonvolatile memory technologies, including nondestructive readout, vortex transitional memory, and magnetic memory, have been explored, designing a dense crossbar array and achieving a superconductor random-access memory remain challenging. This work introduces a novel, nonvolatile, high-density, and scalable vortex-based memory design for SCE logic called bistable vortex memory. The proposed design addresses scaling issues with an estimated area of 10 × 10 μm² while offering zero static power and a dynamic energy consumption of 12 aJ for single-bit read and write operations. The current summation capability enables analog operations for in-memory or near-memory computational tasks. We demonstrate the efficacy of our approach with a 32 × 32 superconductor memory array operating at 20 GHz. Additionally, we showcase the accumulation property of the memory through analog simulations conducted on an 8 × 8 superconductor crossbar array.
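The current-summation idea can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' circuit: each cell is assumed to store one bit, each activated row is assumed to contribute a fixed unit current to every column whose cell holds a 1, and `UNIT_CURRENT_UA` and the function names are hypothetical.

```python
# Behavioral sketch of column-wise current summation in a crossbar.
# Assumption: a selected cell storing 1 adds one unit of readout current
# to its column line; a cell storing 0 adds nothing.
UNIT_CURRENT_UA = 1.0  # hypothetical per-cell readout current, in microamps


def column_currents(stored_bits, row_select):
    """Return the summed current on each column for the activated rows."""
    n_cols = len(stored_bits[0])
    totals = [0.0] * n_cols
    for row, active in zip(stored_bits, row_select):
        if not active:
            continue  # unselected rows contribute no current
        for col, bit in enumerate(row):
            totals[col] += bit * UNIT_CURRENT_UA
    return totals


if __name__ == "__main__":
    # 8 x 8 array with ones on and below the diagonal, all rows selected.
    bits = [[1 if c <= r else 0 for c in range(8)] for r in range(8)]
    print(column_currents(bits, [1] * 8))
```

In this model each column current equals the number of selected cells holding a 1 in that column, which is the accumulation behavior an in-memory dot product relies on.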
- Award ID(s): 2124453
- PAR ID: 10579139
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: Superconductor Science and Technology
- Volume: 38
- Issue: 1
- ISSN: 0953-2048
- Page Range / eLocation ID: 015020
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
AI accelerator design has attracted great research interest, and accelerating Deep Neural Networks (DNNs) on Processing-in-Memory (PIM) platforms is an actively explored direction with great potential. PIM platforms, which aim to address the power-wall and memory-wall bottlenecks simultaneously, have shown orders-of-magnitude performance improvements over conventional computing platforms based on the Von Neumann architecture. As one direction for accelerating DNNs in PIM, the resistive memory array (also known as a crossbar) has drawn great research interest owing to its analog current-mode weighted summation operation, which intrinsically matches the dominant Multiplication-and-Accumulation (MAC) operation in DNNs, making it one of the most promising candidates. An alternative direction for PIM-based DNN acceleration is bulk bit-wise logic operations performed directly on the content of digital memories. Thanks to the high fault tolerance of DNNs, recent algorithmic progress has successfully quantized DNN parameters to low bit-width representations while maintaining competitive accuracy levels. Such DNN quantization techniques essentially convert the MAC operation into much simpler addition/subtraction or comparison operations, which can be performed by bulk bit-wise logic operations in a highly parallel fashion. In this paper, we build a comprehensive evaluation framework to quantitatively compare and analyze the aforementioned PIM-based analog and digital approaches for DNN acceleration.
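A minimal sketch of how quantization turns a MAC into simpler operations follows. It is illustrative only and not taken from the paper: it assumes ternary weights in {-1, 0, +1} for the add/subtract case and a bit encoding of ±1 vectors (bit 1 for +1, bit 0 for -1) for the XNOR-popcount case; both function names are hypothetical.

```python
def ternary_dot(activations, weights):
    """Dot product with weights in {-1, 0, +1} using only add/subtract."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        elif w != 0:
            raise ValueError("weights must be -1, 0, or +1")
    return acc


def binary_dot_xnor(a_bits, w_bits, n):
    """Dot product of two length-n +/-1 vectors packed into integers.

    Bit value 1 encodes +1 and bit value 0 encodes -1. Matching bits
    contribute +1 and differing bits contribute -1, so the result is
    2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


if __name__ == "__main__":
    print(ternary_dot([3, 5, 2], [1, -1, 0]))
    print(binary_dot_xnor(0b1011, 0b1101, 4))
```

The bit-wise form is what makes bulk in-memory logic attractive: one XNOR across a memory row followed by a population count replaces a row of multiplications.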
-
Superconductor Electronics (SCE) is a fast and power-efficient technology with great potential for overcoming the scaling limits of conventional CMOS electronics. Nevertheless, the primary challenge confronting SCE today is its integration level, which lags several orders of magnitude behind CMOS circuits. In this study, we propose and simulate a novel logic family grounded in the phase shifts that occur in 0 and π Josephson junctions. The fast phase logic (FPL) eliminates the need for large inductor loops and shunt resistances by combining half-flux and phase logic. As a result, only the Josephson junction (JJ) area limits the integration density. The cells designed with this paradigm are fast, with a clock-to-Q delay of about 4 ps for logic cells, while maintaining parameter margins above 50% for wiring cells. This logic is power efficient and can increase the integration density of SCE chips by at least 100 times.
-
The superior density of passive analog-grade memristive crossbar circuits enables storing large neural network models directly on specialized neuromorphic chips to avoid costly off-chip communication. To ensure efficient use of such circuits in neuromorphic systems, memristor variations must be substantially lower than those of active memory devices. Here we report a 64 × 64 passive crossbar circuit with ~99% functional nonvolatile metal-oxide memristors. The fabrication technology is based on a foundry-compatible process with etch-down patterning and a low-temperature budget. The achieved <26% coefficient of variance in memristor switching voltages is sufficient for programming a 4K-pixel gray-scale pattern with a <4% relative tuning error on average. Analog properties are also successfully verified via experimental demonstration of a 64 × 10 vector-by-matrix multiplication with an average 1% relative conductance import accuracy to model MNIST image classification by an ex-situ trained single-layer perceptron, and via modeling of a large-scale multilayer perceptron classifier based on a more advanced conductance tuning algorithm.
-
Recent algorithmic developments have achieved competitive classification accuracy for neural networks even when the network parameters are constrained to ternary or binary representations. These findings reveal significant opportunities to replace computationally intensive convolution operations (based on multiplication) with more efficient and less complex operations such as addition. In the hardware implementation domain, the processing-in-memory architecture is becoming a promising way to alleviate the energy-hungry data communication between memory and processing units, bringing considerable improvements in system performance and energy efficiency when running such large networks. In this paper, we review several of our recent works on Processing-in-Memory (PIM) accelerators based on Magnetic Random Access Memory computational sub-arrays, which accelerate the inference mode of quantized neural networks using digital non-volatile memory rather than analog crossbar operation. We then investigate the performance of two distinct in-memory addition schemes compared to other digital methods based on processing-in-DRAM, GPU, and ASIC designs in tackling the DNN power-wall and memory-wall bottlenecks.
