Search for: All records

Creators/Authors contains: "Cao, Yu"


Total Resources: 38

Resource Type: Conference Paper (15); Journal Article (23); Conference Proceeding, Dataset, Workshop Report (0 each)
Availability: Full Text / Resource Available (37); Citation Only (1)


A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment

Zhang, Fan ; He, Wangxin ; Yeo, Injune ; Liehr, Maximilian ; Cady, Nathaniel ; Cao, Yu ; Seo, Jae-sun ; Fan, Deliang ( September 2023 , Proceedings of ESSCIRC)

In genomic analysis, the major computational bottleneck is the memory- and compute-intensive alignment of DNA short reads, owing to the memory-wall challenge. This work presents the first Resistive RAM (RRAM)-based Compute-in-Memory (CIM) macro design for accelerating state-of-the-art BWT-based genome sequencing alignment. The design supports all the core instructions required by the alignment algorithm, i.e., XNOR-based match, count, and addition. The proposed CIM macro, implemented by integrating HfO2 RRAM with 65nm CMOS, demonstrates the best energy efficiency to date, with 2.07 TOPS/W and 2.12G suffixes/J at 1.0V.
Free, publicly-accessible full text available September 1, 2024
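The abstract names three core operations for BWT-based alignment: XNOR-based match, count, and addition. As a minimal software sketch (not the authors' macro design), the Python below shows where each one appears in standard FM-index backward search. The 3-bit base codes, the `CODE` table, and all function names are hypothetical choices for illustration; in the CIM macro these steps would run inside the RRAM array rather than in a Python loop.

```python
# Hypothetical 3-bit codes; "$" is the BWT end-of-text sentinel.
CODE = {"$": 0b100, "A": 0b000, "C": 0b001, "G": 0b010, "T": 0b011}
MASK = 0b111


def xnor_match(stored: int, query: int) -> int:
    """Match step: 1 if every bit of the two codes agrees (bitwise XNOR is all ones)."""
    return int(((~(stored ^ query)) & MASK) == MASK)


def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (fine for toy inputs only)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def occ(codes, symbol: str, i: int) -> int:
    """Count step: number of matches of `symbol` in BWT[0:i], summed from XNOR matches."""
    q = CODE[symbol]
    return sum(xnor_match(s, q) for s in codes[:i])


def backward_count(text: str, pattern: str) -> int:
    """Occurrences of `pattern` in `text` via backward search (text must end with "$")."""
    codes = [CODE[ch] for ch in bwt(text)]
    # C[c]: number of characters in text that sort strictly before c.
    first = {c: sum(1 for x in text if x < c) for c in set(text)}
    lo, hi = 0, len(text)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # Addition step: new interval bound = C[c] + Occ(c, bound).
        lo = first[ch] + occ(codes, ch, lo)
        hi = first[ch] + occ(codes, ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `backward_count("ACGTACGA$", "ACG")` returns 2, matching the two occurrences of `ACG`. Each loop iteration issues one match/count pass over the stored BWT plus one addition, which is the access pattern a compute-in-memory array can parallelize.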
Learning Optimal Flows for Non-Equilibrium Importance Sampling

Cao, Yu ; Vanden-Eijnden, Eric ( December 2022 , Advances in Neural Information Processing Systems 35 (NeurIPS 2022))

Full Text Available
Continuous data assimilation for the 3D Ladyzhenskaya model: analysis and computations

https://doi.org/10.1016/j.nonrwa.2022.103659

Cao, Yu ; Giorgini, Andrea ; Jolly, Michael ; Pakzad, Ali ( December 2022 , Nonlinear Analysis: Real World Applications)

Full Text Available
XMA2: A crossbar-aware multi-task adaption framework via 2-tier masks

https://doi.org/10.3389/felec.2022.1032485

Zhang, Fan ; Yang, Li ; Meng, Jian ; Seo, Jae-sun ; Cao, Yu ; Fan, Deliang ( December 2022 , Frontiers in Electronics)

Recently, ReRAM crossbar-based deep neural network (DNN) accelerators have been widely investigated. However, most prior works focus on single-task inference because of the high energy consumption of weight reprogramming and the low endurance of ReRAM cells. Adapting a ReRAM crossbar-based DNN accelerator to multiple tasks has not been fully explored. In this study, we propose XMA2, a novel crossbar-aware learning method with a 2-tier masking technique to efficiently adapt a DNN backbone model deployed in the ReRAM crossbar to new tasks. During XMA2-based multi-task adaption (MTA), the tier-1 mask, defined per ReRAM crossbar-based processing element (PE), is learned first to identify the most critical PEs to be reprogrammed for the essential features of the new task. Subsequently, the tier-2 crossbar column-wise mask is applied within the remaining weight-frozen PEs to learn a hardware-friendly, column-wise scaling factor for the new task without modifying the weight values. With these crossbar-aware design innovations, the masking operation required to adapt to a new task can be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead. Extensive experimental results show that, compared with other state-of-the-art multi-task adaption methods, XMA2 achieves the highest accuracy on all popular multi-task learning datasets.
Full Text Available
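To make the two tiers concrete, here is a minimal Python sketch of a crossbar-partitioned matrix-vector product under assumptions of our own: the weight matrix is split into equal `tile`-sized PEs, a tier-1 boolean `pe_mask` marks PEs that are reprogrammed with new weights, and each remaining frozen PE has its per-column partial sums multiplied by a tier-2 scale. The function name, argument layout, and the use of real-valued scales are hypothetical and are not taken from the XMA2 paper.

```python
def crossbar_matvec(x, frozen, new, pe_mask, col_scale, tile):
    """Compute y[j] = sum_i x[i] * W_eff[i][j], one crossbar PE at a time.

    frozen, new : rows x cols weight matrices (new = values written on reprogramming)
    pe_mask     : pe_mask[bi][bj] is True if tier 1 reprograms PE (bi, bj)
    col_scale   : col_scale[bi][bj][k] scales column k of a frozen PE (tier 2)
    tile        : (tr, tc) crossbar size; rows and cols are assumed divisible by it
    """
    rows, cols = len(frozen), len(frozen[0])
    tr, tc = tile
    y = [0.0] * cols
    for bi in range(rows // tr):
        for bj in range(cols // tc):
            reprogram = pe_mask[bi][bj]
            weights = new if reprogram else frozen  # frozen weights are only read, never written
            for k in range(tc):
                j = bj * tc + k
                # Column output of this PE: dot product over its tr input rows.
                partial = sum(x[i] * weights[i][j] for i in range(bi * tr, (bi + 1) * tr))
                if not reprogram:
                    partial *= col_scale[bi][bj][k]  # tier-2 column-wise scaling
                y[j] += partial  # accumulate partial sums from PEs stacked along the rows
    return y
```

In this model only the PEs flagged by tier 1 pay the cost of cell reprogramming; every other PE keeps its stored values and is adapted purely by scaling its column outputs.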
DeePKS + ABACUS as a Bridge between Expensive Quantum Mechanical Models and Machine Learning Potentials

https://doi.org/10.1021/acs.jpca.2c05000

Li, Wenfei ; Ou, Qi ; Chen, Yixiao ; Cao, Yu ; Liu, Renxi ; Zhang, Chunyi ; Zheng, Daye ; Cai, Chun ; Wu, Xifan ; Wang, Han ; et al ( December 2022 , The Journal of Physical Chemistry A)

Full Text Available
XMA: A Crossbar-aware Multi-task Adaption Framework via Shift-based Mask Learning Method

https://doi.org/10.1145/3489517.3530458

Zhang, Fan ; Yang, Li ; Meng, Jian ; Seo, Jae-sun ; Cao, Yu ; Fan, Deliang ( July 2022 , Design Automation Conference (DAC))

ReRAM crossbar arrays, as highly parallel, fast, and energy-efficient structures, have attracted much attention, especially for accelerating Deep Neural Network (DNN) inference on one specific task. However, due to the high energy consumption of weight re-programming and the low endurance of ReRAM cells, adapting the crossbar array to multiple tasks has not been well explored. In this paper, we propose XMA, the first crossbar-aware, shift-based mask learning method for multiple task adaption in a ReRAM crossbar DNN accelerator. XMA leverages the benefit of popular mask-based learning algorithms in mitigating catastrophic forgetting: for each new task, it learns a task-specific, crossbar column-wise, shift-based multi-level mask on top of a frozen backbone model, rather than the commonly used element-wise binary mask. With our crossbar-aware design innovation, the masking operation required to adapt to a new task can be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead and, more importantly, without the power-hungry cell re-programming that prior works require. Extensive experimental results show that, compared with the state-of-the-art multiple task adaption method Piggyback [1], XMA achieves 3.19% higher accuracy on average while saving 96.6% of the memory overhead. Moreover, by eliminating cell re-programming, XMA achieves ∼4.3× higher energy efficiency than Piggyback.
Full Text Available
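Restricting each column-wise mask value to a power of two is what allows the scaling to be done with a shift instead of a multiplier. The Python sketch below illustrates that idea only; the exponent range (`MIN_EXP`, `MAX_EXP`), the zero level used to switch a column off, and the rounding rule (nearest power of two in the log domain) are our assumptions, not the parameters used by XMA.

```python
import math

MIN_EXP, MAX_EXP = -3, 3  # hypothetical mask levels: 2**-3 .. 2**3, plus a zero level


def quantize_to_shift(scale: float):
    """Map a learned non-negative column scale to a shift exponent, or None for the zero level."""
    if scale <= 2.0 ** (MIN_EXP - 1):  # closer to zero than to the smallest nonzero level
        return None
    exp = round(math.log2(scale))  # nearest power of two in the log domain
    return max(MIN_EXP, min(MAX_EXP, exp))


def apply_shift(partial_sum: int, exp):
    """Scale an integer column output by 2**exp using shifts only."""
    if exp is None:
        return 0
    # Note: >> on negative Python ints rounds toward negative infinity.
    return partial_sum << exp if exp >= 0 else partial_sum >> -exp
```

For instance, column partial sums `[20, 8, 13]` with learned scales `[0.3, 1.0, 0.01]` quantize to exponents `-2`, `0`, and the zero level, giving scaled outputs `[5, 8, 0]`.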
A determining form for the 2D Rayleigh-Benard problem

Cao, Yu ; Jolly, Michael S. ; Titi, Edriss S. ( April 2022 , Pure and Applied Functional Analysis)

We construct a determining form for the 2D Rayleigh-Benard (RB) system in a strip with solid horizontal boundaries, in the cases of no-slip and stress-free boundary conditions. The determining form is an ODE in a Banach space of trajectories whose steady states comprise the long-time dynamics of the RB system. In fact, solutions on the global attractor of the RB system can be further identified through the zeros of a scalar equation to which the ODE reduces for each initial trajectory. The twist in this work is that the trajectories are for the velocity field only, which in turn determines the corresponding trajectories of the temperature.
Full Text Available
Hybrid RRAM/SRAM In-Memory Computing for Robust DNN Acceleration

https://doi.org/10.1109/TCAD.2022.3197516

Krishnan, Gokul ; Wang, Zhenyu ; Yeo, Injune ; Yang, Li ; Meng, Jian ; Liehr, Maximilian ; Joshi, Rajiv V. ; Cady, Nathaniel C. ; Fan, Deliang ; Seo, Jae-sun ; et al ( August 2022 , IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. However, in the presence of RRAM device variations and lower precision, the mapping of DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The non-ideal output from the RRAM macro, caused by device and circuit non-idealities, is compensated by adding the precise output from the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for training DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (of weights and activations), pruning, and RRAM IMC-aware training, and employs ensemble learning through different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that SRAM compensation allows for a realistic IMC architecture with multi-level RRAM cells (MLC), even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.
Full Text Available
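The compensation path described above amounts to adding a shifted digital result to the analog one. The sketch below models that with integer dot products. Treating the SRAM weights as an exact residual of the RRAM programming error is a simplifying assumption for illustration, not the training procedure the abstract describes, and `hybrid_mac`, `shift_output`, and their arguments are hypothetical names.

```python
def dot(x, w):
    """Plain integer multiply-and-accumulate."""
    return sum(a * b for a, b in zip(x, w))


def shift_output(value: int, shift: int) -> int:
    """Programmable shifter: scale by 2**shift (negative shift means right shift)."""
    return value << shift if shift >= 0 else value >> -shift


def hybrid_mac(x, rram_weights, sram_weights, shift):
    """Non-ideal RRAM MAC output plus the shifted, exact SRAM MAC output."""
    return dot(x, rram_weights) + shift_output(dot(x, sram_weights), shift)


# Intended weights [3, -2, 5] were programmed into RRAM as [4, -2, 3] (device variation).
x = [1, 2, 1]
print(dot(x, [3, -2, 5]))                          # ideal result
print(hybrid_mac(x, [4, -2, 3], [-1, 0, 2], 0))    # residual stored at full scale
print(hybrid_mac(x, [5, -2, 1], [-1, 0, 2], 1))    # larger error, SRAM output shifted left by 1
```

All three lines print 4: the uncompensated RRAM result would be 3 in the first case and 2 in the second. Changing `shift` rescales the SRAM contribution relative to the RRAM output, which is the role the abstract attributes to the programmable shifter.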
XST: A Crossbar Column-wise Sparse Training for Efficient Continual Learning

https://doi.org/10.23919/DATE54114.2022.9774660

Zhang, Fan ; Yang, Li ; Meng, Jian ; Seo, Jae-Sun ; Cao, Yu ; Fan, Deliang ( March 2022 , 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Leveraging ReRAM crossbar-based In-Memory Computing (IMC) to accelerate single-task DNN inference has been widely studied. However, using the ReRAM crossbar for continual learning has not yet been explored. In this work, we propose XST, a novel crossbar column-wise sparse training framework for continual learning. XST significantly reduces the training cost and saves inference energy. More importantly, it is compatible with existing crossbar-based convolution engines with almost no hardware overhead. Compared with the state-of-the-art CPG method, experiments show that XST achieves 4.95% higher accuracy. Furthermore, XST demonstrates a ~5.59× training speedup and 1.5× inference energy savings.
Full Text Available
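The abstract does not say how columns are chosen, so the following is only a generic illustration of column-wise sparse training: a gradient step is applied to just the `k` crossbar columns with the largest L1 gradient norm, and all other columns keep their stored weights. The selection criterion, the learning-rate update, and the function names are assumptions, not XST's actual algorithm.

```python
def select_columns(grad, k):
    """Indices of the k columns with the largest L1 gradient norm (assumed criterion)."""
    cols = len(grad[0])
    norms = [sum(abs(row[j]) for row in grad) for j in range(cols)]
    top = sorted(range(cols), key=lambda j: norms[j], reverse=True)[:k]
    return sorted(top)


def sparse_column_update(weights, grad, k, lr):
    """Gradient step restricted to the selected columns; other columns are left untouched."""
    chosen = set(select_columns(grad, k))
    return [
        [w - lr * g if j in chosen else w for j, (w, g) in enumerate(zip(w_row, g_row))]
        for w_row, g_row in zip(weights, grad)
    ]
```

Because entire columns are either updated or left unchanged, the resulting writes stay aligned with crossbar columns instead of being scattered across individual cells.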
XBM: A Crossbar Column-wise Binary Mask Learning Method for Efficient Multiple Task Adaption

https://doi.org/10.1109/ASP-DAC52403.2022.9712508

Zhang, Fan ; Yang, Li ; Meng, Jian ; Cao, Yu Kevin ; Seo, Jae-sun ; Fan, Deliang ( January 2022 , 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC))

Recently, using ReRAM crossbar arrays to accelerate DNN inference on a single task has been widely studied. However, using the crossbar array for multiple task adaption has not been well explored. In this paper, we propose, for the first time, XBM, a novel crossbar column-wise binary mask learning method for multiple task adaption in ReRAM crossbar DNN accelerators. XBM leverages the ability of mask-based learning to avoid catastrophic forgetting, learning a task-specific mask for each new task. With our hardware-aware design innovation, the masking operation required to adapt to a new task can be easily implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead and, more importantly, without the power-hungry cell re-programming required by prior works. Extensive experimental results show that, compared with state-of-the-art multiple task adaption methods, XBM maintains similar accuracy on new tasks while requiring only 1.4% of the mask memory of the popular Piggyback method. Moreover, eliminating cell re-programming or tuning saves up to 40% energy during new task adaption.
Full Text Available
