NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A 4.2-to-0.5-V, 0.8-μA–0.8-mA, Power-Efficient Three-Level SIMO Buck Converter for a Quad-Voltage RISC-V Microprocessor

https://doi.org/10.1109/TVLSI.2024.3477632

Kim, Dongkwun; Wang, Zhaoqing; Huang, Paul Xuanyuanliang; Chundi, Pavan Kumar; Kim, Suhwan; Blanco, Andrés A; Krishnamurthy, Ram K; Seok, Mingoo (January 2025, IEEE Transactions on Very Large Scale Integration (VLSI) Systems)

Full Text Available
iMCU: A 28-nm Digital In-Memory Computing-Based Microcontroller Unit for TinyML

https://doi.org/10.1109/JSSC.2024.3362274

Lin, Chuan-Tung; Huang, Paul Xuanyuanliang; Oh, Jonghyun; Wang, Dewei; Seok, Mingoo (January 2024, IEEE Journal of Solid-State Circuits)

Full Text Available
D6CIM: 60.4-TOPS/W, 1.46-TOPS/mm ² , 1005-Kb/mm ² Digital 6T-SRAM-Based Compute-in-Memory Macro Supporting 1-to-8b Fixed-Point Arithmetic in 28-nm CMOS

https://doi.org/10.1109/ESSCIRC59616.2023.10268725

Oh, Jonghyun; Lin, Chuan-Tung; Seok, Mingoo (September 2023, IEEE)

Full Text Available
A Metastability Risk Prediction and Mitigation Technique for Clock-Domain Crossing With Single-Stage Synchronizer in Near-Threshold-Voltage Multivoltage/ Frequency-Domain Network-on-Chip

https://doi.org/10.1109/JSSC.2023.3283961

Lin, Chuxiong; He, Weifeng; Sun, Yanan; Shao, Lin; Zhang, Bo; Yang, Jun; Seok, Mingoo (June 2023, IEEE Journal of Solid-State Circuits)

Full Text Available
iMCU: A 102-μJ, 61-ms Digital In-Memory Computingbased Microcontroller Unit for Edge TinyML

https://doi.org/10.1109/CICC57935.2023.10121221

Lin, Chuan-Tung; Huang, Paul Xuanyuanliang; Oh, Jonghyun; Wang, Dewei; Seok, Mingoo (April 2023, 2023 IEEE Custom Integrated Circuits Conference (CICC))

Full Text Available
DIMCA: An Area-Efficient Digital In-Memory Computing Macro Featuring Approximate Arithmetic Hardware in 28 nm

https://doi.org/10.1109/JSSC.2023.3313519

Lin, Chuan-Tung; Wang, Dewei; Zhang, Bo; Chen, Gregory K; Knag, Phil C; Krishnamurthy, Ram Kumar; Seok, Mingoo (January 2023, IEEE Journal of Solid-State Circuits)

Full Text Available
TICA: A 0.3V, Variation-Resilient 64-Stage Deeply-Pipelined Bitcoin Mining Core with Timing Slack Inference and Clock Frequency Adaption

https://doi.org/10.1109/CICC53496.2022.9772803

Li, Jieyu; He, Weifeng; Zhang, Bo; He, Guanghui; Yang, Jun; Seok, Mingoo (April 2022, 2022 IEEE Custom Integrated Circuits Conference (CICC))

Energy-efficient bitcoin mining cores have gained significant attention since the energy cost for computing dominates the mining expenses [1]. Ultra-low-voltage (ULV) digital circuits have emerged as an attractive approach to improve the energy-efficiency. However, they demand a large timing margin for the worst-case process, voltage, and temperature (PVT) variations, undermining a significant portion of energy savings. Recent works, including multi-phase latch pipeline [1], tunable replica circuits [2]–[3], in-situ error detection and correction (EDAC) [4]–[6], and dynamic timing enhancement [7], can reduce the pessimistic margin. However, it is not straightforward to adopt those techniques in mining cores due to their deeply-pipelined architecture (up to 128 stages [1]). For example, to adopt EDAC, the deep pipeline requires inserting many bulky error detectors as it has many critical paths. Our experiment with a 0.3V 28-nm mining core shows >18.9% registers need to be replaced with error detectors, considering 6σ local process variation only. Also, multiple stages can have timing errors simultaneously, making an error correction process (e.g., clock gating [5], VDD boosting [6]) complex and costly.
more » « less
Full Text Available
MPAM: Reliable, Low-Latency, Near-Threshold-Voltage Multi-Voltage/Frequency-Domain Network-on-Chip with Metastability Risk Prediction and Mitigation

https://doi.org/10.1109/CICC53496.2022.9772849

Lin, Chuxiong; He, Weifeng; Sun, Yannan; Shao, Lin; Zhang, Bo; Yang, Jun; Seok, Mingoo (April 2022, 2022 IEEE Custom Integrated Circuits Conference (CICC))

Emerging applications like a drone and an autonomous vehicle require system-on-a-chips (SoCs) with high reliability, e.g., the mean-time-between-failure (MTBF) needs to be over tens of thousands of hours [1]. Meanwhile, as these applications require increasingly higher performance and energy efficiency, a multi-core architecture is often desirable. Here, each core operates in an independent voltage/frequency (V/F) domain, ideally from the near-threshold voltage (NTV) to super-threshold, while communicating with one another via a network-on-chip (NoC) [2]. However, this makes it challenging to ensure robustness in clock domain crossing against metastability. Metastability becomes even more critical to NTV circuits since metastability resolution time constant T grows super-linearly with voltage scaling [3]. Conventionally, an NoC uses multi-stage (4 stages in [4]) synchronizers to improve MTBF, but they increase latency and cannot completely eliminate metastability. Recently, [5] proposed a novel NTV flip-flop, which has a lower probability of having metastability. Another recent work [6] proposed to detect the necessary condition of metastability and mitigate it by modulating the RX clock and also requesting retransmission to guarantee data correctness. However, as it detects a necessary condition, not actual metastability, it tends to overly request retransmission, hurting latency, throughput, and energy efficiency.
more » « less
Full Text Available
DIMC: 2219TOPS/W 2569F2/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware

https://doi.org/10.1109/ISSCC42614.2022.9731659

Wang, Dewei; Lin, Chuan-Tung; Chen, Gregory K.; Knag, Phil; Krishnamurthy, Ram K.; Seok, Mingoo (February 2022, 2022 IEEE International Solid- State Circuits Conference (ISSCC))

In-memory-computing (IMC) SRAM architecture has gained significant attention as it achieves high energy efficiency for computing a convolutional neural network (CNN) model [1]. Recent works investigated the use of analog-mixed-signal (AMS) hardware for high area and energy efficiency [2], [3]. However, AMS hardware output is well known to be susceptible to process, voltage, and temperature (PVT) variations, limiting the computing precision and ultimately the inference accuracy of a CNN. We reconfirmed, through the simulation of a capacitor-based IMC SRAM macro that computes a 256D binary dot product, that the AMS computing hardware has a significant root-mean-square error (RMSE) of 22.5% across the worst-case voltage, temperature (Fig. 16.1.1 top left) and 3-sigma process variations (Fig. 16.1.1 top right). On the other hand, we can implement an IMC SRAM macro using robust digital logic [4], which can virtually eliminate the variability issue (Fig. 16.1.1 top). However, digital circuits require more devices than AMS counterparts (e.g., 28 transistors for a mirror full adder [FA]). As a result, a recent digital IMC SRAM shows a lower area efficiency of 6368F2/b (22nm, 4b/4b weight/activation) [5] than the AMS counterpart (1170F2/b, 65nm, 1b/1b) [3]. In light of this, we aim to adopt approximate arithmetic hardware to improve area and power efficiency and present two digital IMC macros (DIMC) with different levels of approximation (Fig. 16.1.1 bottom left). Also, we propose an approximation-aware training algorithm and a number format to minimize inference accuracy degradation induced by approximate hardware (Fig. 16.1.1 bottom right). We prototyped a 28nm test chip: for a 1b/1b CNN model for CIFAR-10 and across 0.5-to-1.1V supply, the DIMC with double-approximate hardware (DIMC-D) achieves 2569F2/b, 932-2219TOPS/W, 475-20032GOPS, and 86.96% accuracy, while for a 4b/1b CNN model, the DIMC with the single-approximate hardware (DIMC-S) achieves 3814F2/b, 458-990TOPS/W
more » « less
Full Text Available
CDAR-DRAM: An In-situ Charge Detection and Adaptive Data Restoration DRAM Architecture for Performance and Energy Efficiency Improvement

https://doi.org/10.1109/DAC18074.2021.9586146

Lin, Chuxiong; He, Weifeng; Sun, Yanan; Mao, Zhigang; Seok, Mingoo (December 2021, 2021 58th ACM/IEEE Design Automation Conference (DAC))

As the capacity of DRAM continues to grow, the refresh operation rapidly becomes the performance and power-efficiency bottleneck. Also, restore time, the time given for recharging cells post access, makes an increasingly large amount of negative impact on performance. To tackle these problems, in this paper, we propose an in-situ charge detection and adaptive data restoration DRAM (CDAR-DRAM) architecture, which can dynamically adjust the refresh rate and also relax the constraints on restore time. The proposed CDAR-DRAM employs a low-cost skewed-inverter-based detector, which can reduce the excessive timing margins that prior work added to guarantee the functionality of leaky DRAM cells under the worst-case temperature condition. Moreover, an adaptive DRAM refresh and restore scheme is proposed, which can switch automatically between two modes: (i) a refresh mode that supports adaptive refresh rate, and (ii) a restore mode that relaxes the constraints on restore time dynamically for cells having sufficient charge. With the transistor-and architecture-level simulations, we evaluate the CDAR-DRAM in an 8-core system across different workloads. Compared with the prior art, the proposed architecture achieves a 9.4% improvement in system performance and a 14.3% reduction in energy consumption, without requiring the time-consuming profiling process which many prior works employed.
more » « less
Full Text Available

« Prev Next »

Search for: All records