Abstract The Ising model provides a natural mapping for many computationally hard combinatorial optimization problems (COPs). Consequently, dynamical system-inspired computing models and hardware platforms that minimize the Ising Hamiltonian have recently been proposed as potential candidates for solving COPs, with the promise of significant performance benefits. However, prior work on designing dynamical systems as Ising machines has primarily considered quadratic interactions among the nodes. Dynamical systems and models that consider higher-order interactions among the Ising spins remain largely unexplored, particularly for applications in computing. Therefore, in this work, we propose Ising spin-based dynamical systems that consider higher-order (> 2) interactions among the Ising spins, which in turn enables us to develop computational models that directly solve many COPs entailing such higher-order interactions (i.e., COPs on hypergraphs). Specifically, we demonstrate our approach by developing dynamical systems that compute the solution to the Boolean NAE-K-SAT (K ≥ 4) problem and solve the Max-K-Cut of a hypergraph. Our work advances the potential of the physics-inspired ‘toolbox’ for solving COPs.
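As a rough illustration of what a higher-order objective looks like, the sketch below counts violated not-all-equal (NAE) clauses for a spin assignment and finds a minimum by exhaustive search. It is not the dynamical system proposed in the paper: the signed-index clause encoding and the function names `nae_violations` and `brute_force_nae` are assumptions made for this example only.

```python
from itertools import product


def nae_violations(clauses, spins):
    """Count NAE clauses whose literals all evaluate to the same value.

    clauses: iterable of tuples of signed, 1-based variable indices
             (a hypothetical DIMACS-like encoding; a negative index
             means the literal is negated).
    spins:   sequence of +1/-1 values, one per variable.
    """
    violated = 0
    for clause in clauses:
        # Value of each literal as a spin: negation flips the sign.
        values = [spins[abs(lit) - 1] * (1 if lit > 0 else -1) for lit in clause]
        # A NAE clause is violated only when every literal agrees.
        if all(v == values[0] for v in values):
            violated += 1
    return violated


def brute_force_nae(clauses, num_vars):
    """Exhaustively search all 2**num_vars assignments (small instances only)."""
    best_energy, best_spins = None, None
    for assignment in product((-1, 1), repeat=num_vars):
        energy = nae_violations(clauses, assignment)
        if best_energy is None or energy < best_energy:
            best_energy, best_spins = energy, assignment
    return best_energy, best_spins


if __name__ == "__main__":
    # Two 4-literal clauses over four variables (K = 4).
    example = [(1, 2, 3, 4), (-1, 2, -3, 4)]
    print(brute_force_nae(example, 4))
```

Each clause couples K spins at once, which is why such problems map to hypergraphs rather than to the pairwise graphs handled by quadratic Ising machines. The count is also unchanged when every spin is flipped, reflecting the symmetry of the NAE constraint.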
Data Movement-Aware, Ping-Pong Ising Machine Supporting Full Connectivity and Variable Bitwidths
Ising computation is an emerging paradigm for efficiently solving time-consuming combinatorial optimization problems (COPs). In particular, Compute-In-Memory (CIM) based Ising machines are promising for Hamiltonian computations that capture the spin state dynamics using a dense memory array. However, the advancement of CIM-based Ising machines for accurately solving complex COPs is limited by the lack of full spin connectivity, the small bitwidths of the spin interaction coefficients (J), data movement energy costs within the CIM, and the area/energy overheads of peripheral CIM analog circuits. In this work, we present a data movement-aware, CIM-based Ising machine for efficiently solving COPs. The unique design contributions are: (i) a “Ping-Pong” transpose array architecture that restricts data movement within the memory array to single-bit spin data while keeping the interaction coefficients (J) stationary, (ii) fully connected spins and multi-bit J for >329× faster solution time, (iii) configurable J bitwidths and spin connectivity to support a wide variety of complex COPs, and (iv) a Bitcell-Reference (BR) based capacitor-bank-less ADC for sensing, occupying 1.81× less area than the baseline SAR ADC. The silicon prototype in 65 nm CMOS demonstrates >4.7× better power efficiency and >9.8× better area efficiency compared to prior works.
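To make the role of the interaction coefficients J concrete, here is a minimal software sketch of a fully connected quadratic Ising energy with integer (multi-bit) J, together with a simple greedy spin-flip descent. It does not model the chip's CIM datapath or its search procedure: the sign convention H = -Σ_{i<j} J_ij s_i s_j, the symmetric zero-diagonal J, and the function names are all assumptions made for illustration.

```python
def ising_energy(J, spins):
    """H(s) = -sum_{i<j} J[i][j] * s_i * s_j for a symmetric integer matrix J."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))


def flip_delta(J, spins, i):
    """Energy change caused by flipping spin i: 2 * s_i * sum_{j != i} J[i][j] * s_j."""
    local_field = sum(J[i][j] * spins[j] for j in range(len(spins)) if j != i)
    return 2 * spins[i] * local_field


def greedy_descent(J, spins):
    """Flip any spin that lowers the energy until no single flip helps.

    Every accepted flip strictly lowers the energy, so the loop terminates,
    but it may stop in a local minimum rather than the ground state.
    """
    spins = list(spins)
    improved = True
    while improved:
        improved = False
        for i in range(len(spins)):
            if flip_delta(J, spins, i) < 0:
                spins[i] = -spins[i]
                improved = True
    return spins


if __name__ == "__main__":
    # Three fully connected spins with signed integer couplings.
    J = [[0, -1, 2],
         [-1, 0, 3],
         [2, 3, 0]]
    start = [1, -1, 1]
    result = greedy_descent(J, start)
    print(result, ising_energy(J, result))
```

Only the one-bit spin values change during the search while J stays fixed, which is the property that makes a coefficient-stationary memory layout attractive.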
- Award ID(s): 2145236
- PAR ID: 10560652
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-8813-8
- Page Range / eLocation ID: 721 to 724
- Format(s): Medium: X
- Location: Bruges, Belgium
- Sponsoring Org: National Science Foundation
More Like this
- Abstract Realizing increasingly complex artificial intelligence (AI) functionalities directly on edge devices calls for unprecedented energy efficiency of edge hardware. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) [1] promises to meet such demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory [2–5]. Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware [6–17], it remains a goal for a RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design. Here, by co-optimizing across all hierarchies of the design from algorithms and architecture to circuits and devices, we present NeuRRAM, a RRAM-based CIM chip that simultaneously delivers versatility in reconfiguring CIM cores for diverse model architectures, energy efficiency that is two-times better than previous state-of-the-art RRAM-CIM chips across various computational bit-precisions, and inference accuracy comparable to software models quantized to four-bit weights across various AI tasks, including accuracy of 99.0 percent on MNIST [18] and 85.7 percent on CIFAR-10 [19] image classification, 84.7-percent accuracy on Google speech command recognition [20], and a 70-percent reduction in image-reconstruction error on a Bayesian image-recovery task.
- Abstract A prominent approach to solving combinatorial optimization problems on parallel hardware is Ising machines, i.e., hardware implementations of networks of interacting binary spin variables. Most Ising machines leverage second-order interactions, although important classes of optimization problems, such as satisfiability problems, map more seamlessly to Ising networks with higher-order interactions. Here, we demonstrate that higher-order Ising machines can solve satisfiability problems more resource-efficiently, in terms of the number of spin variables and their connections, than traditional second-order Ising machines. Further, our results on a benchmark dataset of Boolean k-satisfiability problems show that higher-order Ising machines implemented with coupled oscillators rapidly find solutions that are better than those found by second-order Ising machines, thus improving the current state-of-the-art for Ising machines.
- The increasingly central role of speech-based human-computer interaction necessitates on-device, low-latency, low-power, high-accuracy keyword spotting (KWS). State-of-the-art accuracies on speech-related tasks have been achieved by long short-term memory (LSTM) neural network (NN) models. Such models are typically computationally intensive because of their heavy use of matrix-vector multiplication (MVM) operations. Compute-in-Memory (CIM) architectures, while well suited to MVM operations, have not seen widespread adoption for LSTMs. In this paper, we adapt resistive random access memory based CIM architectures for KWS using LSTMs. We find that a hybrid system composed of CIM cores and digital cores achieves 90% test accuracy on the Google speech data set at the cost of 25 μJ/decision. Our optimized architecture uses 5-bit inputs and analog weights to produce 6-bit outputs. All digital computations are performed with 8-bit precision, leading to a 3.7× improvement in computational efficiency compared to equivalent digital systems at that accuracy.
- In this paper, we pave a novel way towards the concept of a bit-wise In-Memory Convolution Engine (IMCE) that could implement the dominant convolution computation of Deep Convolutional Neural Networks (CNNs) within memory. IMCE employs a parallel computational memory sub-array as its fundamental unit, based on our proposed Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) design. We then propose an accelerator system architecture based on IMCE to efficiently process low bit-width CNNs. This architecture can be leveraged to greatly reduce the energy consumed by convolutional layers and to accelerate CNN inference. The device-to-architecture co-simulation results show that the proposed system architecture can process low bit-width AlexNet on the ImageNet data set at 785.25 μJ/img, consuming ~3× less energy than a recent RRAM-based counterpart. In addition, the chip area is ~4× smaller.