NSF PAR Search | NSF Public Access Repository

TFSRAM: A 249.8TOPS/W Timing-to-First-Spike Compute-in-Memory Neuromorphic Processing Engine With Twin-Column SRAM Synapses

https://doi.org/10.1109/TCASAI.2024.3452649

Li, Ziru; Zheng, Qilin; Ku, Jonathan; Taylor, Brady; Li, Hai (September 2024, IEEE Transactions on Circuits and Systems for Artificial Intelligence)

Full Text Available

MulPi: A Multi-Class and Patient-Independent Computing-in-SRAM Seizure Classifier

https://doi.org/10.1109/BioCAS61083.2024.10798153

Kim, Bokyung; Huang, Qijia; Taylor, Brady; Zheng, Qilin; Ku, Jonathan; Ramos, Nicky; Yeats, Eric; Chen, Yiran; Li, Hai “Helen” (October 2024, IEEE)

Full Text Available

Improving the Efficiency of In-Memory-Computing Macro with a Hybrid Analog-Digital Computing Mode for Lossless Neural Network Inference

https://doi.org/10.1145/3649329.3658472

Zheng, Qilin; Li, Ziru; Ku, Jonathan; Wang, Yitu; Taylor, Brady; Fan, Deliang; Chen, Yiran (June 2024, ACM)

Full Text Available

NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

https://doi.org/10.1109/ISCA59077.2024.00035

Wang, Yitu; Li, Shiyu; Zheng, Qilin; Song, Linghao; Li, Zongwang; Chang, Andrew; Li, Hai “Helen”; Chen, Yiran (June 2024, IEEE)

Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is now also fundamental to retrieval-augmented generation (RAG) for large language models (LLMs). Among all ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the dataset grows, the graph may require hundreds of gigabytes of memory, exceeding the main memory capacity of a single workstation node. Although the graph can be partitioned and a solid-state drive (SSD) used as backing storage, the limited SSD I/O bandwidth severely degrades system performance. To address this challenge, we present NDSearch, a hardware-software co-designed near-data processing (NDP) solution for ANNS processing. NDSearch consists of a novel in-storage computing architecture, SearSSD, that supports the ANNS kernels and leverages logic unit (LUN)-level parallelism inside the NAND flash chips. NDSearch also includes a processing model that is customized for NDP and cooperates with SearSSD. The processing model enables a two-level scheduling scheme that improves data locality and exploits the internal bandwidth in NDSearch, and a speculative searching mechanism that further accelerates the ANNS workload. Our results show that NDSearch improves throughput by up to 31.7×, 14.6×, 7.4×, and 2.9× over a CPU, a GPU, a state-of-the-art SmartSSD-only design, and DeepStore, respectively. NDSearch also achieves two orders of magnitude higher energy efficiency than the CPU and GPU.
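The graph traversal at the heart of this abstract can be illustrated with the generic greedy best-first (beam) search used by proximity-graph ANNS indexes such as HNSW. The Python below is a minimal sketch under stated assumptions, not NDSearch's in-storage kernel: the toy chain graph, the coordinates, the function name `greedy_search`, and the beam-width parameter `ef` are all hypothetical and chosen only for illustration.

```python
import heapq
import math


def greedy_search(graph, vectors, query, entry, k=1, ef=4):
    """Best-first beam search over a proximity graph (HNSW-style base layer).

    graph:   dict mapping node -> list of neighbor nodes (hypothetical toy input)
    vectors: dict mapping node -> coordinate tuple
    ef:      beam width, i.e. how many best nodes are kept (illustrative name)
    Returns the k nearest nodes found, closest first.
    """
    visited = {entry}
    d0 = math.dist(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: nearest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negation: farthest kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the nearest candidate is farther than the worst kept result.
        if d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the farthest kept node
    nearest = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in nearest[:k]]


# Toy index: ten points on a line, each linked to its immediate neighbors.
vectors = {i: (float(i), 0.0) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(greedy_search(graph, vectors, query=(7.2, 0.0), entry=0, k=2, ef=3))  # [7, 8]
```

Each expanded node requires reading its neighbor list and the neighbors' vectors; when the graph no longer fits in main memory and lives on an SSD, these scattered reads become the I/O bottleneck the abstract describes.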
Full Text Available

ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM

https://doi.org/10.1145/3649329.3656496

Ku, Jonathan Hao-Cheng; Zhang, Junyao; Shan, Haoxuan; Samudrala, Saichand; Wu, Jiawen; Zheng, Qilin; Li, Ziru; Rajendran, Jeyavijayan; Chen, Yiran (June 2024, ACM)

Full Text Available

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

Guo, Cong; Cheng, Feng; Du, Zhixu; Kiessling, James; Ku, Jonathan; Li, Shiyu; Li, Zhixu; Ma, Mingyuan; Molom-Ochir, Tergel; Morris, Benjamin; et al. (February 2025, IEEE Circuits and Systems Magazine)

Free, publicly accessible full text available February 6, 2026

Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

https://doi.org/10.1109/TCAD.2024.3409193

Wu, Xueying; Hanson, Edward; Wang, Nansu; Zheng, Qilin; Yang, Xiaoxuan; Yang, Huanrui; Li, Shiyu; Cheng, Feng; Pande, Partha Pratim; Doppa, Janardhan Rao; et al. (June 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available

An Efficient Memory System Design with Specialized Caching Mechanism for Recommendation Inference

https://doi.org/10.1145/3609384

Wang, Yitu; Li, Shiyu; Zheng, Qilin; Chang, Andrew; Li, Hai; Chen, Yiran (October 2023, ACM Transactions on Embedded Computing Systems)

Recommendation systems have been widely embedded into many Internet services. For example, Meta’s deep learning recommendation model (DLRM) shows high predictive accuracy for click-through rate when processing large-scale embedding tables. The SparseLengthSum (SLS) kernel dominates DLRM inference time because of intensive irregular memory accesses to the embedding vectors. Some prior works directly adopt near-data processing (NDP) solutions to obtain higher memory bandwidth and accelerate SLS. However, their inferior memory hierarchy yields a low performance-cost ratio and fails to fully exploit data locality. Although some software-managed cache policies have been proposed to improve the cache hit rate, the resulting cache miss penalty is unacceptable given the high overheads of executing the corresponding programs and of communication between the host and the accelerator. To address these issues, we propose EMS-i, an efficient memory system design that integrates a Solid State Drive (SSD) into the memory hierarchy using Compute Express Link (CXL) for recommendation system inference. We specialize the caching mechanism according to the characteristics of various DLRM workloads and propose a novel prefetching mechanism to further improve performance. In addition, we carefully design the inference kernel and develop a customized mapping scheme for the SLS operation, considering the multi-level parallelism in SLS and the data locality within a batch of queries. Compared to state-of-the-art NDP solutions, EMS-i achieves up to 10.9× speedup over RecSSD and performance comparable to RecNMP with 72% energy savings. EMS-i also reduces memory cost by up to 8.7× and 6.6× relative to RecSSD and RecNMP, respectively.
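For readers unfamiliar with SLS, the sketch below shows its usual semantics (in Caffe2 the operator is spelled `SparseLengthsSum`): each query's embedding rows, given as one segment of a flat index list, are gathered and summed into a single pooled vector. A plain LRU cache sits in front of the table only to show how indices repeated within a batch turn into hits; it is a generic stand-in, not EMS-i's specialized caching or prefetching mechanism, and all names (`LRUCache`, `sparse_lengths_sum`, the toy table) are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache standing in for a fast memory tier (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = load(key)  # slow path, e.g. a read from SSD-backed memory
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used row
        return value


def sparse_lengths_sum(table, indices, lengths, cache):
    """Sum the embedding rows of each segment of `indices`; segment sizes come from `lengths`."""
    dim = len(table[0])
    out, pos = [], 0
    for n in lengths:
        acc = [0.0] * dim
        for idx in indices[pos:pos + n]:
            row = cache.get(idx, lambda i: table[i])
            acc = [a + r for a, r in zip(acc, row)]
        out.append(acc)
        pos += n
    return out


# Toy table of 6 rows with dimension 2; three queries of two lookups each.
table = [[float(i), float(10 * i)] for i in range(6)]
cache = LRUCache(capacity=2)
out = sparse_lengths_sum(table, indices=[0, 2, 2, 5, 2, 0], lengths=[2, 2, 2], cache=cache)
print(out)                        # [[2.0, 20.0], [7.0, 70.0], [2.0, 20.0]]
print(cache.hits, cache.misses)   # 2 4
```

Even this tiny batch reuses row 2 across queries, which is the kind of intra-batch data locality the abstract says EMS-i's kernel and mapping scheme take into account.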
Full Text Available

SpikeSen: Low-Latency In-Sensor-Intelligence Design With Neuromorphic Spiking Neurons

https://doi.org/10.1109/TCSII.2023.3235888

Li, Ziru; Zheng, Qilin; Chen, Yiran; Li, Hai (June 2023, IEEE Transactions on Circuits and Systems II: Express Briefs)

Full Text Available

Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture

https://doi.org/10.1109/DAC56929.2023.10247908

Zheng, Qilin; Li, Shiyu; Wang, Yitu; Li, Ziru; Chen, Yiran; Li, Hai “Helen” (July 2023, IEEE)

Full Text Available