Search for: All records

Creators/Authors contains: "Li, Hai 'Helen'"


  1. Deformable Convolutional Networks (DCN) have been proposed as a powerful tool to boost the representation power of Convolutional Neural Networks (CNN) in computer vision tasks via adaptive sampling of the input feature map. Much like vision transformers, DCNs utilize a more flexible inductive bias than standard CNNs and have also been shown to improve the performance of particular models. For example, drop-in DCN layers were shown to increase the AP score of Mask RCNN by 10.6 points while introducing only 1% additional parameters and FLOPs, improving on the state-of-the-art model at the time of publication. However, despite evidence that placing more DCN layers earlier in the network can further improve performance, this trend has not continued with further scaling of deformations in CNNs, unlike for vision transformers. Benchmarking experiments show that a realistically sized DCN layer (64H×64W, 64 input/output channels) incurs a 4× slowdown on a GPU platform, discouraging more ubiquitous use of deformations in CNNs. These slowdowns are caused by the irregular, input-dependent access patterns of the bilinear interpolation operator, which has a disproportionately low arithmetic intensity (AI) compared to the rest of the DCN. To address this disproportionate slowdown and enable the expanded use of DCNs in CNNs, we propose DefT, a series of workload-aware optimizations for DCN kernels. DefT identifies performance bottlenecks in DCNs and fuses the specific operators observed to limit DCN AI. Our approach also uses statistical information about DCN workloads to adapt the workload tiling to the DCN layer dimensions, minimizing costly out-of-boundary input accesses. Experimental results show that DefT mitigates up to half of the DCN slowdown relative to the current state-of-the-art PyTorch implementation. This translates to a layerwise speedup of up to 134% and a 46% reduction in normalized training time on a fully DCN-enabled ResNet model.
    (A hedged code sketch of the bilinear sampling step follows this record.)
    Free, publicly-accessible full text available March 25, 2024
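
    Sketch for record 1: the bilinear sampling step inside a deformable convolution, written as a minimal NumPy function. The shapes, offsets, and names below are illustrative assumptions, not the DefT kernels; the point is that each sampled value gathers four neighbors at data-dependent locations, so there are few FLOPs per irregular memory access (low arithmetic intensity), and out-of-boundary reads must be handled explicitly.

      import numpy as np

      def bilinear_sample(feat, y, x):
          # Sample a single-channel feature map feat (H, W) at fractional (y, x).
          H, W = feat.shape
          y0, x0 = int(np.floor(y)), int(np.floor(x))
          y1, x1 = y0 + 1, x0 + 1
          wy, wx = y - y0, x - x0

          def at(r, c):
              # Out-of-boundary reads contribute zero; workload-aware tiling
              # aims to minimize how often these occur.
              return feat[r, c] if 0 <= r < H and 0 <= c < W else 0.0

          # Four irregular, offset-dependent loads for only a handful of FLOPs.
          return ((1 - wy) * (1 - wx) * at(y0, x0) + (1 - wy) * wx * at(y0, x1)
                  + wy * (1 - wx) * at(y1, x0) + wy * wx * at(y1, x1))

      # Hypothetical usage: one learned, input-dependent offset per kernel tap.
      feat = np.random.rand(64, 64).astype(np.float32)   # one 64x64 channel
      offset_y, offset_x = 0.37, -1.62                   # example learned offsets
      val = bilinear_sample(feat, 10 + offset_y, 10 + offset_x)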
  2. Free, publicly-accessible full text available March 1, 2024
  3. Neural network models have demonstrated outstanding performance in a variety of applications, from image classification to natural language processing. However, deploying these models to hardware raises efficiency and reliability issues. From the efficiency perspective, the storage, computation, and communication costs of neural network processing are considerable because the models have a large number of parameters and operations. From the robustness standpoint, perturbations in hardware are unavoidable and can degrade the performance of neural networks. This paper therefore investigates effective learning and optimization approaches as well as advanced hardware designs in order to build efficient and robust neural network designs.
  4. The ever-growing parameter size and computation cost of Convolutional Neural Network (CNN) models hinder their deployment onto resource-constrained platforms. Network pruning techniques are proposed to remove the redundancy in CNN parameters and produce a sparse model. Sparsity-aware accelerators are also proposed to reduce the computation cost and memory bandwidth requirements of inference by leveraging model sparsity. The irregularity of sparse patterns, however, limits the efficiency of those designs. Researchers have proposed to address this issue by creating regular sparsity patterns through hardware-aware pruning algorithms, but the pruning rate of these solutions is largely limited by the enforced sparsity patterns. This limitation motivates us to explore other compression methods beyond pruning. We found that kernel decomposition, with its two decoupled computation stages, could potentially take the processing of the sparse pattern off the critical path of inference and achieve a high compression ratio without enforcing sparsity patterns. To exploit these advantages, we propose ESCALATE, an algorithm-hardware co-design approach based on kernel decomposition. At the algorithm level, ESCALATE reorganizes the two computation stages of the decomposed convolution to enable stream processing of the intermediate feature map. We also propose a hybrid quantization to exploit the different reuse frequencies of each part of the decomposed weights. At the architecture level, ESCALATE introduces a novel 'Basis-First' dataflow and its corresponding microarchitecture design to maximize the benefits brought by the decomposed convolution.
    (A hedged code sketch of a decomposed convolution follows this record.)
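
    Sketch for record 4: a generic basis-plus-coefficient decomposition of a convolution layer, to make the "two decoupled computation stages" concrete. The shapes, basis count, and the dense NumPy loops are illustrative assumptions, not ESCALATE's algorithm, dataflow, or quantization scheme.

      import numpy as np

      def conv2d(x, w):
          # Plain valid convolution: x (C, H, W), w (K, C, k, k) -> (K, H-k+1, W-k+1).
          C, H, W = x.shape
          K, _, k, _ = w.shape
          out = np.zeros((K, H - k + 1, W - k + 1), dtype=x.dtype)
          for o in range(K):
              for i in range(out.shape[1]):
                  for j in range(out.shape[2]):
                      out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
          return out

      # Hypothetical layer: K=8 filters approximated by B=3 shared basis kernels.
      C, K, B, k = 4, 8, 3, 3
      basis = np.random.randn(B, C, k, k).astype(np.float32)  # stage-1 weights (few, dense)
      coeff = np.random.randn(K, B).astype(np.float32)        # stage-2 weights (small; can be quantized or sparse)
      x = np.random.randn(C, 16, 16).astype(np.float32)

      # Stage 1: convolve the input with the few basis kernels only.
      basis_maps = conv2d(x, basis)                            # (B, 14, 14)

      # Stage 2: each output channel is a 1x1 combination of the basis maps,
      # which can be streamed over the intermediate feature map.
      out = np.tensordot(coeff, basis_maps, axes=([1], [0]))   # (K, 14, 14)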
  5. Deep learning is the core of artificial intelligence and achieves state-of-the-art results in a wide range of applications. The intensity of computation and data in deep learning processing poses significant challenges to conventional computing platforms. Thus, specialized accelerator architectures have been proposed for the acceleration of deep learning. In this paper, we classify the design space of current deep learning accelerators into three levels: (1) processing engine, (2) memory, and (3) accelerator, and present a constructive view from the perspective of parallelism at each of the three levels.
  6. Deep neural networks (DNNs) have emerged as a key component in various applications. However, the ever-growing DNN size hinders efficient processing on hardware. To tackle this problem, on the algorithmic side, compressed DNN models are explored, of which block-circulant DNN models are memory-efficient and hardware-friendly; on the hardware side, resistive random-access memory (ReRAM) based accelerators are promising for in-situ processing of DNNs. In this work, we design an accelerator named ReBoc for accelerating block-circulant DNNs in ReRAM to reap the benefits of lightweight models and efficient in-situ processing simultaneously. We propose a novel mapping scheme that utilizes Horizontal Weight Slicing and Intra-Crossbar Weight Duplication to map block-circulant DNN models onto ReRAM crossbars with significantly improved crossbar utilization. Moreover, two specific techniques, namely Input Slice Reusing and Input Tile Sharing, are introduced to take advantage of the circulant calculation feature in block-circulant DNNs to reduce data accesses and buffer size. In ReBoc, a DNN model is executed within an intra-layer processing pipeline; ReBoc achieves 96× and 8.86× power efficiency improvements over state-of-the-art FPGA and ASIC accelerators for block-circulant neural networks, respectively. Compared to ReRAM-based DNN accelerators, ReBoc achieves an average 4.1× speedup and 2.6× energy reduction.
    (A hedged code sketch of the block-circulant structure follows this record.)
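
    Sketch for record 6: the block-circulant weight structure that ReBoc-style accelerators exploit, shown functionally in NumPy. The block size, layer dimensions, and the FFT-based evaluation are illustrative assumptions, not ReBoc's ReRAM mapping or dataflow; the point is that each b x b block is defined by a single length-b vector, so storage drops by a factor of b and every block product becomes a circular convolution.

      import numpy as np

      def circulant(v):
          # b x b circulant matrix whose first column is v (column j is v rolled down by j).
          return np.stack([np.roll(v, j) for j in range(len(v))], axis=1)

      # Hypothetical fully-connected layer: a 6x9 weight matrix stored as 2x3 circulant
      # blocks of size b=3, so only 2*3*3 = 18 values are stored instead of 54.
      b = 3
      block_vecs = np.random.randn(2, 3, b).astype(np.float32)  # one defining vector per block
      x = np.random.randn(9).astype(np.float32)

      # Reference: expand the blocks and do a dense matrix-vector product.
      W = np.block([[circulant(block_vecs[i, j]) for j in range(3)] for i in range(2)])
      y_dense = W @ x

      # Circulant trick: each block-vector product is a circular convolution,
      # evaluated here with FFTs purely to show functional equivalence.
      y_circ = np.zeros(6)
      for i in range(2):
          acc = np.zeros(b, dtype=complex)
          for j in range(3):
              acc += np.fft.fft(block_vecs[i, j]) * np.fft.fft(x[j * b:(j + 1) * b])
          y_circ[i * b:(i + 1) * b] = np.real(np.fft.ifft(acc))

      assert np.allclose(y_dense, y_circ, atol=1e-4)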
  7. As the model size of deep neural networks (DNNs) grows for better performance, the increasing computational cost of training and testing makes it extremely difficult to deploy DNNs on end/edge devices with limited resources while also satisfying response time requirements. To address this challenge, model compression, which shrinks model size and thus reduces computation cost, is widely adopted in the deep learning community. However, these algorithm-level solutions often ignore the practical impacts of hardware design, such as the increase in random accesses to the memory hierarchy and the constraints of memory capacity. On the other hand, limited understanding of the computational needs at the algorithm level may lead to unrealistic assumptions during hardware design. In this work, we discuss this mismatch and show how our approach addresses it through an interactive design practice across both software and hardware levels.