
This content will become publicly available on February 20, 2023

Title: DIMC: 2219TOPS/W 2569F²/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware
In-memory-computing (IMC) SRAM architecture has gained significant attention as it achieves high energy efficiency for computing a convolutional neural network (CNN) model [1]. Recent works investigated the use of analog-mixed-signal (AMS) hardware for high area and energy efficiency [2], [3]. However, AMS hardware output is well known to be susceptible to process, voltage, and temperature (PVT) variations, limiting the computing precision and ultimately the inference accuracy of a CNN. We reconfirmed, through the simulation of a capacitor-based IMC SRAM macro that computes a 256D binary dot product, that the AMS computing hardware has a significant root-mean-square error (RMSE) of 22.5% across the worst-case voltage, temperature (Fig. 16.1.1 top left) and 3-sigma process variations (Fig. 16.1.1 top right). On the other hand, we can implement an IMC SRAM macro using robust digital logic [4], which can virtually eliminate the variability issue (Fig. 16.1.1 top). However, digital circuits require more devices than AMS counterparts (e.g., 28 transistors for a mirror full adder [FA]). As a result, a recent digital IMC SRAM shows a lower area efficiency of 6368F²/b (22nm, 4b/4b weight/activation) [5] than the AMS counterpart (1170F²/b, 65nm, 1b/1b) [3]. In light of this, we aim to adopt approximate arithmetic hardware to improve area and power efficiency and present two digital IMC macros (DIMC) with different levels of approximation (Fig. 16.1.1 bottom left). Also, we propose an approximation-aware training algorithm and a number format to minimize inference accuracy degradation induced by approximate hardware (Fig. 16.1.1 bottom right).
We prototyped a 28nm test chip: for a 1b/1b CNN model for CIFAR-10 and across 0.5-to-1.1V supply, the DIMC with double-approximate hardware (DIMC-D) achieves 2569F²/b, 932-2219TOPS/W, 475-20032GOPS, and 86.96% accuracy, while for a 4b/1b CNN model, the DIMC with the single-approximate hardware (DIMC-S) achieves 3814F²/b, 458-990TOPS/W
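As a point of reference for the 256D binary dot product and the 1b/1b case mentioned in the abstract, the following is a minimal Python sketch of an exact binary dot product computed with XNOR and popcount. It does not model the approximate arithmetic hardware or the DIMC circuits, whose details are not given here; the function names and the encoding of bit 1 as +1 and bit 0 as -1 are assumptions made for illustration.

```python
import random

def to_signed(bits):
    """Map {0, 1} bits to {-1, +1} values (assumed encoding: 1 -> +1, 0 -> -1)."""
    return [1 if b else -1 for b in bits]

def signed_dot(weights, activations):
    """Reference dot product of two equal-length {-1, +1} vectors."""
    return sum(w * a for w, a in zip(weights, activations))

def xnor_popcount_dot(w_bits, a_bits):
    """Same dot product on the {0, 1} encoding.

    XNOR is 1 where the two bits agree; each agreement contributes +1 and
    each disagreement -1, so the result is 2 * matches - n.
    """
    n = len(w_bits)
    matches = sum(1 for w, a in zip(w_bits, a_bits) if w == a)  # XNOR, then popcount
    return 2 * matches - n

# A 256-dimensional example with a fixed seed so the run is repeatable.
rng = random.Random(0)
w_bits = [rng.randint(0, 1) for _ in range(256)]
a_bits = [rng.randint(0, 1) for _ in range(256)]

exact = xnor_popcount_dot(w_bits, a_bits)
reference = signed_dot(to_signed(w_bits), to_signed(a_bits))
print(exact, reference)  # the two values agree
```

An approximate adder tree would replace the exact popcount in `xnor_popcount_dot`; comparing its output against `signed_dot` over many random vectors is one way to estimate the resulting error.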
Award ID(s):
1919147
NSF-PAR ID:
10342205
Journal Name:
2022 IEEE International Solid-State Circuits Conference (ISSCC)
Page Range or eLocation-ID:
266 to 268
Sponsoring Org:
National Science Foundation
More Like this
  1. This article presents C3SRAM, an in-memory-computing SRAM macro. The macro is an SRAM module with the circuits embedded in bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations. The macro utilizes analog-mixed-signal (AMS) capacitive-coupling computing to evaluate the main computations of binary neural networks, binary-multiply-and-accumulate operations. Without the need to access the stored weights by individual row, the macro asserts all its rows simultaneously and forms an analog voltage at the read bitline node through capacitive voltage division. With one analog-to-digital converter (ADC) per column, the macro realizes fully parallel vector–matrix multiplication in a single cycle. The network type that the macro supports and the computing mechanism it utilizes are determined by the robustness and error tolerance necessary in AMS computing. The C3SRAM macro is prototyped in a 65-nm CMOS. It demonstrates an energy efficiency of 672 TOPS/W and a speed of 1638 GOPS (20.2 TOPS/mm²), achieving 3975× better energy–delay product than the conventional digital baseline performing the same operation. The macro achieves 98.3% accuracy for MNIST and 85.5% for CIFAR-10, which is among the best in-memory computing works in terms of energy efficiency and inference accuracy tradeoff.
  2. Resonant tunneling diodes (RTDs) have come full-circle in the past 10 years after their demonstration in the early 1990s as the fastest room-temperature semiconductor oscillator, displaying experimental results up to 712 GHz and fmax values exceeding 1.0 THz [1]. Now the RTD is once again the preeminent electronic oscillator above 1.0 THz and is being implemented as a coherent source [2] and a self-oscillating mixer [3], amongst other applications. This paper concerns RTD electroluminescence, an effect that has been studied very little in the past 30+ years of RTD development, and not at room temperature. We present experiments and modeling of an n-type In0.53Ga0.47As/AlAs double-barrier RTD operating as a cross-gap light emitter at ~300 K. The MBE-growth stack is shown in Fig. 1(a). A 15-μm-diam-mesa device was defined by standard planar processing including a top annular ohmic contact with a 5-μm-diam pinhole in the center to couple out enough of the internal emission for accurate free-space power measurements [4]. The emission spectra have the behavior displayed in Fig. 1(b), parameterized by bias voltage (VB). The long-wavelength emission edge is at λ = 1684 nm, close to the In0.53Ga0.47As bandgap energy of Ug ≈ 0.75 eV at 300 K. The spectral peaks for VB = 2.8 and 3.0 V both occur around λ = 1550 nm (hν = 0.75 eV), so blue-shifted relative to the peak of the "ideal", bulk InGaAs emission spectrum shown in Fig. 1(b) [5]. These results are consistent with the model displayed in Fig. 1(c), whereby the broad emission peak is attributed to the radiative recombination between electrons accumulated on the emitter side, and holes generated on the emitter side by interband tunneling with current density Jinter. The blue-shifted main peak is attributed to the quantum-size effect on the emitter side, which creates a radiative recombination rate RN,2 comparable to the band-edge cross-gap rate RN,1.
Further support for this model is provided by the shorter wavelength and weaker emission peak shown in Fig. 1(b) around λ = 1148 nm. Our quantum mechanical calculations attribute this to radiative recombination RR,3 in the RTD quantum well between the electron ground-state level E1,e, and the hole level E1,h. To further test the model and estimate quantum efficiencies, we conducted optical power measurements using a large-area Ge photodiode located ≈3 mm away from the RTD pinhole, and having spectral response between 800 and 1800 nm with a peak responsivity of ℜ ≈ 0.85 A/W at λ = 1550 nm. Simultaneous I-V and L-V plots were obtained and are plotted in Fig. 2(a) with positive bias on the top contact (emitter on the bottom). The I-V curve displays a pronounced NDR region having a peak-to-valley current ratio of 10.7 (typical for In0.53Ga0.47As RTDs). The external quantum efficiency (EQE) was calculated from EQE = e∙IP/(ℜ∙IE∙hν), where IP is the photodiode dc current, IE the RTD current, and ℜ the photodiode responsivity. The plot of EQE is shown in Fig. 2(b) where we see a very rapid rise with VB, but a maximum value (at VB = 3.0 V) of only ≈2×10⁻⁵. To extract the internal quantum efficiency (IQE), we use the expression EQE = ηc∙ηi∙ηr ≡ ηc∙IQE, where ηc, ηi, and ηr are the optical-coupling, electrical-injection, and radiative recombination efficiencies, respectively [6]. Our separate optical calculations yield ηc ≈ 3.4×10⁻⁴ (limited primarily by the small pinhole) from which we obtain the curve of IQE plotted in Fig. 2(b) (right-hand scale). The maximum value of IQE (again at VB = 3.0 V) is 6.0%. From the implicit definition of IQE in terms of ηi and ηr given above, and the fact that the recombination efficiency in In0.53Ga0.47As is likely limited by Auger scattering, this result for IQE suggests that ηi might be significantly high. To estimate ηi, we have used the experimental total current of Fig. 2(a), the Kane two-band model of interband tunneling [7] computed in conjunction with a solution to Poisson's equation across the entire structure, and a rate-equation model of Auger recombination on the emitter side [6] assuming a free-electron density of 2×10¹⁸ cm⁻³. We focus on the high-bias regime above VB = 2.5 V of Fig. 2(a) where most of the interband tunneling should occur in the depletion region on the collector side [Jinter,2 in Fig. 1(c)]. And because of the high quality of the InGaAs/AlAs heterostructure (very few traps or deep levels), most of the holes should reach the emitter side by some combination of drift, diffusion, and tunneling through the valence-band double barriers (Type-I offset) between InGaAs and AlAs. The computed interband current density Jinter is shown in Fig. 3(a) along with the total current density Jtot. At the maximum Jinter (at VB = 3.0 V) of 7.4×10² A/cm², we get ηi = Jinter/Jtot = 0.18, which is surprisingly high considering there is no p-type doping in the device. When combined with the Auger-limited ηr of 0.41 and ηc ≈ 3.4×10⁻⁴, we find a model value of IQE = 7.4% in good agreement with experiment. This leads to the model values for EQE plotted in Fig. 2(b), also in good agreement with experiment. Finally, we address the high Jinter and consider a possible universal nature of the light-emission mechanism. Fig. 3(b) shows the tunneling probability T according to the Kane two-band model in the three materials, In0.53Ga0.47As, GaAs, and GaN, following our observation of a similar electroluminescence mechanism in GaN/AlN RTDs (due to strong polarization field of wurtzite structures) [8]. The expression is Tinter = (π²/9)∙exp[(-π²∙Ug²∙me)/(2h∙P∙E)], where Ug is the bandgap energy, P is the valence-to-conduction-band momentum matrix element, and E is the electric field. Values for the highest calculated internal E fields for the InGaAs and GaN are also shown, indicating that Tinter in those structures approaches values of ~10⁻⁵. As shown, a GaAs RTD would require an internal field of ~6×10⁵ V/cm, which is rarely realized in standard GaAs RTDs, perhaps explaining why there have been few if any reports of room-temperature electroluminescence in the GaAs devices.
[1] E.R. Brown, et al., Appl. Phys. Lett., vol. 58, 2291, 1991. [2] M. Feiginov et al., Appl. Phys. Lett., 99, 233506, 2011. [3] Y. Nishida et al., Nature Sci. Reports, 9, 18125, 2019. [4] P. Fakhimi, et al., 2019 DRC Conference Digest. [5] S. Sze, Physics of Semiconductor Devices, 2nd Ed. 12.2.1 (Wiley, 1981). [6] L. Coldren, Diode Lasers and Photonic Integrated Circuits (Wiley, 1995). [7] E.O. Kane, J. Appl. Phys. 32, 83 (1961). [8] T. Growden, et al., Nature Light: Science & Applications 7, 17150 (2018).
  3. RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. On the other hand, in the presence of RRAM device variations and lower precision, the mapping of DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The non-ideal output from the RRAM macro, due to device and circuit non-idealities, is compensated by adding the precise output from the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for the training of DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (weights and activations), pruning, RRAM IMC-aware training, and employs ensemble learning through different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that the SRAM compensation allows for a realistic IMC architecture with multi-level RRAM cells (MLC) even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.
  4. A search for a heavy resonance decaying to a top quark and a W boson in the fully hadronic final state is presented. The analysis is performed using data from proton-proton collisions at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 137 fb⁻¹ recorded by the CMS experiment at the LHC. The search is focused on heavy resonances, where the decay products of each top quark or W boson are expected to be reconstructed as a single, large-radius jet with a distinct substructure. The production of an excited bottom quark, b*, is used as a benchmark when setting limits on the cross section for a heavy resonance decaying to a top quark and a W boson. The hypotheses of b* quarks with left-handed, right-handed, and vector-like chiralities are excluded at 95% confidence level for masses below 2.6, 2.8, and 3.1 TeV, respectively. These are the most stringent limits on the b* quark mass to date, extending the previous best limits by almost a factor of two.
  5. A search for a heavy resonance decaying into a top quark and a W boson in proton-proton collisions at √s = 13 TeV is presented. The data analyzed were recorded with the CMS detector at the LHC and correspond to an integrated luminosity of 138 fb⁻¹. The top quark is reconstructed as a single jet and the W boson, from its decay into an electron or muon and the corresponding neutrino. A top quark tagging technique based on jet clustering with a variable distance parameter and simultaneous jet grooming is used to identify jets from the collimated top quark decay. The results are interpreted in the context of two benchmark models, where the heavy resonance is either an excited bottom quark b* or a vector-like quark B. A statistical combination with an earlier search by the CMS Collaboration in the all-hadronic final state is performed to place upper cross section limits on these two models. The new analysis extends the lower range of resonance mass probed from 1.4 down to 0.7 TeV. For left-handed, right-handed, and vector-like couplings, b* masses up to 3.0, 3.0, and 3.2 TeV are excluded at 95% confidence level, respectively. The observed upper limits represent the most stringent constraints on the b* model to date.
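
The quantum-efficiency figures quoted in the RTD electroluminescence abstract (item 2 in the list above) can be checked with a few lines of arithmetic. The Python sketch below assumes the relations stated there, IQE as the product of the injection and radiative efficiencies and EQE as the coupling efficiency times IQE, and uses only the numbers given in that abstract. The function and variable names are hypothetical.

```python
def internal_qe(eta_injection, eta_radiative):
    """IQE as the product of injection and radiative-recombination efficiencies."""
    return eta_injection * eta_radiative

def external_qe(eta_coupling, iqe):
    """EQE as the optical-coupling efficiency times the IQE."""
    return eta_coupling * iqe

# Values quoted in the abstract at a bias of 3.0 V.
ETA_INJECTION = 0.18      # Jinter / Jtot
ETA_RADIATIVE = 0.41      # Auger-limited radiative efficiency
ETA_COUPLING = 3.4e-4     # optical coupling through the pinhole
EQE_MEASURED = 2e-5       # approximate measured maximum EQE

iqe_model = internal_qe(ETA_INJECTION, ETA_RADIATIVE)
eqe_model = external_qe(ETA_COUPLING, iqe_model)
iqe_from_measurement = EQE_MEASURED / ETA_COUPLING

print(f"model IQE: {iqe_model * 100:.1f}%")
print(f"model EQE: {eqe_model:.2e}")
print(f"IQE implied by measured EQE: {iqe_from_measurement * 100:.1f}%")
```

The model IQE rounds to the 7.4% quoted in the abstract, and dividing the measured EQE by the coupling efficiency gives a value close to the quoted experimental 6.0%.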