skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Wang, Shan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available November 20, 2025
  2. Free, publicly-accessible full text available July 23, 2025
  3. With the prosperous development of Deep Neural Network (DNNs), numerous Process-In-Memory (PIM) designs have emerged to accelerate DNN models with exceptional throughput and energy-efficiency. PIM accelerators based on Non-Volatile Memory (NVM) or volatile memory offer distinct advantages for computational efficiency and performance. NVM based PIM accelerators, demonstrated success in DNN inference, face limitations in on-device learning due to high write energy, latency, and instability. Conversely, fast volatile memories, like SRAM, offer rapid read/write operations for DNN training, but suffer from significant leakage currents and large memory footprints. In this paper, for the first time, we present a fully-digital sparse processing in hybrid NVM-SRAM design, synergistically combines the strengths of NVM and SRAM, tailored for on-device continual learning. Our designed NVM and SRAM based PIM circuit macros could support both storage and processing of N:M structured sparsity pattern, significantly improving the storage and computing efficiency. Exhaustive experiments demonstrate that our hybrid system effectively reduces area and power consumption while maintaining high accuracy, offering a scalable and versatile solution for on-device continual learning. 
    more » « less
    Free, publicly-accessible full text available June 23, 2025
  4. Free, publicly-accessible full text available May 20, 2025
  5. Free, publicly-accessible full text available May 1, 2025
  6. While magnetoresistive random-access memory (MRAM) stands out as a leading candidate for embedded nonvolatile memory and last-level cache applications, its endurance is compromised by substantial self-heating due to the high programming current density. The effect of self-heating on the endurance of the magnetic tunnel junction (MTJ) has primarily been studied in spin-transfer torque (STT)-MRAM. Here, we analyze the transient temperature response of two-terminal spin–orbit torque (SOT)-MRAM with a 1 ns switching current pulse using electro-thermal simulations. We estimate a peak temperature range of 350–450 °C in 40 nm diameter MTJs, underscoring the critical need for thermal management to improve endurance. We suggest several thermal engineering strategies to reduce the peak temperature by up to 120 °C in such devices, which could improve their endurance by at least a factor of 1000× at 0.75 V operating voltage. These results suggest that two-terminal SOT-MRAM could significantly outperform conventional STT-MRAM in terms of endurance, substantially benefiting from thermal engineering. These insights are pivotal for thermal optimization strategies in the development of MRAM technologies. 
    more » « less
    Free, publicly-accessible full text available July 7, 2025
  7. Due to the separate memory and computation units in traditional Von-Neumann architecture, massive data transfer dominates the overall computing system’s power and latency, known as the ‘Memory-Wall’ issue. Especially with ever-increasing deep learning-based AI model size and computing complexity, it becomes the bottleneck for state-of-the-art AI computing systems. To address this challenge, In-Memory Computing (IMC) based Neural Network accelerators have been widely investigated to support AI computing within memory. However, most of those works focus only on inference. The on-device training and continual learning have not been well explored yet. In this work, for the first time, we introduce on-device continual learning with STT-assisted-SOT (SAS) Magnetic Random Access Memory (MRAM) based IMC system. On the hardware side, we have fabricated a SAS-MRAM device prototype with 4 Magnetic Tunnel Junctions (MTJ, each at 100nm × 50nm) sharing a common heavy metal layer, achieving significantly improved memory writing and area efficiency compared to traditional SOT-MRAM. Next, we designed fully digital IMC circuits with our SAS-MRAM to support both neural network inference and on-device learning. To enable efficient on-device continual learning for new task data, we present an 8-bit integer (INT8) based continual learning algorithm that utilizes our SAS-MRAM IMC-supported bit-serial digital in-memory convolution operations to train a small parallel reprogramming Network (Rep-Net) while freezing the major backbone model. Extensive studies have been presented based on our fabricated SAS-MRAM device prototype, cross-layer device-circuit benchmarking and simulation, as well as the on-device continual learning system evaluation. 
    more » « less