skip to main content


Title: A Monolithic Stochastic Computing Architecture for Energy Efficient Arithmetic
Abstract

As the energy and hardware investments necessary for conventional high‐precision digital computing continue to explode in the era of artificial intelligence (AI), a change in paradigm that can trade precision for energy and resource efficiency is being sought for many computing applications. Stochastic computing (SC) is an attractive alternative since, unlike digital computers, which require many logic gates and a high transistor volume to perform basic arithmetic operations such as addition, subtraction, multiplication, sorting, etc., SC can implement the same using simple logic gates. While it is possible to accelerate SC using traditional silicon complementary metal–oxide–semiconductor (CMOS) technology, the need for extensive hardware investment to generate stochastic bits (s‐bits), the fundamental computing primitive for SC, makes it less attractive. Memristor and spin‐based devices offer natural randomness but depend on hybrid designs involving CMOS peripherals for accelerating SC, which increases area and energy burden. Here, the limitations of existing and emerging technologies are overcome, and a standalone SC architecture embedded in memory and based on 2D memtransistors is experimentally demonstrated. The monolithic and non‐von‐Neumann SC architecture occupies a small hardware footprint and consumes a miniscule amount of energy (<1 nJ) for both s‐bit generation and arithmetic operations, highlighting the benefits of SC.

 
more » « less
NSF-PAR ID:
10384695
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Advanced Materials
Volume:
35
Issue:
2
ISSN:
0935-9648
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Stochastic computing (SC) reduces the complexity of computation by representing numbers with long streams of independent bits. However, increasing performance in SC comes with either an increase in area or a loss in accuracy. Processing in memory (PIM) computes data in-place while having high memory density and supporting bit-parallel operations with low energy consumption. In this article, we propose COSMO, an architecture for co mputing with s tochastic numbers in me mo ry, which enables SC in memory. The proposed architecture is general and can be used for a wide range of applications. It is a highly dense and parallel architecture that supports most SC encodings and operations in memory. It maximizes the performance and energy efficiency of SC by introducing several innovations: (i) in-memory parallel stochastic number generation, (ii) efficient implication-based logic in memory, (iii) novel memory bit line segmenting, (iv) a new memory-compatible SC addition operation, and (v) enabling flexible block allocation. To show the generality and efficiency of our stochastic architecture, we implement image processing, deep neural networks (DNNs), and hyperdimensional (HD) computing on the proposed hardware. Our evaluations show that running DNN inference on COSMO is 141× faster and 80× more energy efficient as compared to GPU. 
    more » « less
  2. In-memory-computing (IMC) SRAM architecture has gained significant attention as it achieves high energy efficiency for computing a convolutional neural network (CNN) model [1]. Recent works investigated the use of analog-mixed-signal (AMS) hardware for high area and energy efficiency [2], [3]. However, AMS hardware output is well known to be susceptible to process, voltage, and temperature (PVT) variations, limiting the computing precision and ultimately the inference accuracy of a CNN. We reconfirmed, through the simulation of a capacitor-based IMC SRAM macro that computes a 256D binary dot product, that the AMS computing hardware has a significant root-mean-square error (RMSE) of 22.5% across the worst-case voltage, temperature (Fig. 16.1.1 top left) and 3-sigma process variations (Fig. 16.1.1 top right). On the other hand, we can implement an IMC SRAM macro using robust digital logic [4], which can virtually eliminate the variability issue (Fig. 16.1.1 top). However, digital circuits require more devices than AMS counterparts (e.g., 28 transistors for a mirror full adder [FA]). As a result, a recent digital IMC SRAM shows a lower area efficiency of 6368F2/b (22nm, 4b/4b weight/activation) [5] than the AMS counterpart (1170F2/b, 65nm, 1b/1b) [3]. In light of this, we aim to adopt approximate arithmetic hardware to improve area and power efficiency and present two digital IMC macros (DIMC) with different levels of approximation (Fig. 16.1.1 bottom left). Also, we propose an approximation-aware training algorithm and a number format to minimize inference accuracy degradation induced by approximate hardware (Fig. 16.1.1 bottom right). We prototyped a 28nm test chip: for a 1b/1b CNN model for CIFAR-10 and across 0.5-to-1.1V supply, the DIMC with double-approximate hardware (DIMC-D) achieves 2569F2/b, 932-2219TOPS/W, 475-20032GOPS, and 86.96% accuracy, while for a 4b/1b CNN model, the DIMC with the single-approximate hardware (DIMC-S) achieves 3814F2/b, 458-990TOPS/W 
    more » « less
  3. Abstract Bayesian networks (BNs) find widespread application in many real-world probabilistic problems including diagnostics, forecasting, computer vision, etc. The basic computing primitive for BNs is a stochastic bit (s-bit) generator that can control the probability of obtaining ‘1’ in a binary bit-stream. While silicon-based complementary metal-oxide-semiconductor (CMOS) technology can be used for hardware implementation of BNs, the lack of inherent stochasticity makes it area and energy inefficient. On the other hand, memristors and spintronic devices offer inherent stochasticity but lack computing ability beyond simple vector matrix multiplication due to their two-terminal nature and rely on extensive CMOS peripherals for BN implementation, which limits area and energy efficiency. Here, we circumvent these challenges by introducing a hardware platform based on 2D memtransistors. First, we experimentally demonstrate a low-power and compact s-bit generator circuit that exploits cycle-to-cycle fluctuation in the post-programmed conductance state of 2D memtransistors. Next, the s-bit generators are monolithically integrated with 2D memtransistor-based logic gates to implement BNs. Our findings highlight the potential for 2D memtransistor-based integrated circuits for non-von Neumann computing applications. 
    more » « less
  4. One of the key knowledge areas in Computer Science (CS) is Digital Logic and Computer Architecture where the learning outcome is an understanding of Boolean algebra, logic gates, registers, or arithmetic logic units, etc. and explaining how software and hardware are related to a computing system. Experimental Centric based Instructional Pedagogy (ECP) with portable laboratory instrumentation might provide real hands-on experience to obtain a practical understanding of those concepts at a lower cost compared with virtual hands-on laboratories that lack direct interaction with real apparatus or no integration of labs in the course. This work presents the initial adaptation of ECP to introduce the fundamentals of digital logic concepts in a Computer Architecture course in Spring 2022 for the first time in a CS department at a university teaching such courses without a lab and serving predominantly minority students. To establish a conducive and dynamic classroom environment by discovering course content through exploration, students majoring in CS were introduced to several logic gate types, worked with breadboards to connect circuits, and carried out operations to produce the necessary output using the commercial ADALM 1K Active Learning Module. To evaluate the impact of the ECP on students; performance in the class, three different evaluation methods were used, such as classroom observation, a signature assignment, and a Motivated Strategies for Learning Questionnaire (MSLQ) survey. The Classroom Observation Protocol for Undergraduate STEM (COPUS) findings indicated greater student engagement when ECP is used; the Signature assignment results indicated improved learning outcomes for students; and the MLSQ survey, which measures students; motivation, critical thinking, curiosity, collaboration, and metacognition, determined a positive impact of the ECP on the CS participants. 
    more » « less
  5. Stochastic computing (SC) is a re-emerging computing paradigm providing low-cost and noise-tolerant designs for a wide range of arithmetic operations. SC circuits operate on uniform bit-streams with the value determined by the probability of observing 1’s in the bit-stream. The accuracy of SC operations highly depends on the correlation between input bit-streams. While some operations such as minimum and maximum value functions require highly correlated inputs, some other such as multiplication operation need uncorrelated or independent inputs for accurate computation. Developing low-cost and accurate correlation manipulation circuits is an important research in SC as these circuits can manage correlation between bit-streams without expensive bit-stream regeneration. This work proposes a novel in-stream correlator and decorrelator circuit that manages 1) correlation between stochastic bit-streams, and 2) distribution of 1’s in the output bit-streams. Compared to state-of-the-art solutions, our designs achieve lower hardware cost and higher accuracy. The output bit-streams enjoy a low-discrepancy distribution of bits which leads to higher quality of results. The effectiveness of the proposed circuits is shown with two case studies: SC design of sorting and median filtering 
    more » « less