NSF PAR Search | NSF Public Access Repository

AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design

https://doi.org/10.1109/ASAP65064.2025.00011

Jayakody, Shakya; Zhao, Youpeng; Wang, Jun (July 2025, Proceedings)

Graph convolutional networks (GCNs) are fundamental in various scientific applications, ranging from biomedical protein-protein interactions (PPI) to large-scale recommendation systems. An essential component for modeling graph structures in GCNs is sparse general matrix-matrix multiplication (SpGEMM). As graph data continues to scale up, SpGEMM is often conducted in an out-of-core fashion because of limited GPU memory in resource-constrained systems. Despite recent efforts to alleviate the memory constraints of out-of-core SpGEMM through GPU feature caching, hybrid CPU-GPU memory layouts, or computation in sparse format, current systems still suffer from both high I/O latency and GPU under-utilization. In this paper, we first identify the problems of existing systems, where sparse-format data alignment and memory allocation are the main performance bottlenecks, and then propose AIRES, a novel algorithm-system co-design solution to accelerate out-of-core SpGEMM computation for GCNs. Specifically, on the algorithm level, AIRES alleviates the data alignment issues at the block level for matrices in sparse formats and develops a tiling algorithm to facilitate row block-wise alignment. On the system level, AIRES employs a three-phase dynamic scheduling scheme featuring a dual-way data transfer strategy over a tiered memory system that integrates GPU memory, GPU Direct Storage (GDS), and host memory to reduce I/O latency and improve throughput. Evaluations show that AIRES significantly outperforms state-of-the-art methods, achieving up to 1.8× lower latency in real-world graph processing benchmarks.
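To make the idea of row block-wise SpGEMM concrete, the following is a minimal sketch and not the AIRES algorithm itself, whose details the abstract does not give. It multiplies two matrices in CSR form one row block at a time, which is the general pattern an out-of-core system could use to bound how much of the left operand is resident at once. The function names (`row_blocks`, `spgemm_block`, `spgemm_tiled`), the `block_nnz` budget, and the greedy block-splitting rule are all assumptions introduced for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# A CSR matrix as (indptr, indices, data); this layout is an illustrative assumption.
CSR = Tuple[List[int], List[int], List[float]]


def row_blocks(indptr: List[int], block_nnz: int) -> Iterator[Tuple[int, int]]:
    """Greedily yield [start, end) row ranges holding at most block_nnz nonzeros.

    A single row larger than the budget still forms its own block.
    """
    n_rows = len(indptr) - 1
    start = 0
    while start < n_rows:
        end = start + 1  # always take at least one row
        while end < n_rows and indptr[end + 1] - indptr[start] <= block_nnz:
            end += 1
        yield start, end
        start = end


def spgemm_block(a: CSR, b: CSR, start: int, end: int) -> List[Dict[int, float]]:
    """Row-wise (Gustavson-style) product of rows [start, end) of A with B."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    rows: List[Dict[int, float]] = []
    for i in range(start, end):
        acc: Dict[int, float] = {}  # sparse accumulator for output row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, v = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + v * b_val[q]
        rows.append(acc)
    return rows


def spgemm_tiled(a: CSR, b: CSR, block_nnz: int) -> CSR:
    """Compute A @ B one row block at a time and assemble the CSR result."""
    indptr: List[int] = [0]
    indices: List[int] = []
    data: List[float] = []
    for start, end in row_blocks(a[0], block_nnz):
        # In an out-of-core setting, only this block of A would need to be loaded here.
        for acc in spgemm_block(a, b, start, end):
            for j in sorted(acc):
                indices.append(j)
                data.append(acc[j])
            indptr.append(len(indices))
    return indptr, indices, data
```

Because each output row depends only on the matching row of the left operand, the result is the same for any choice of `block_nnz`; the budget only changes how many rows are processed per step.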
Free, publicly-accessible full text available July 28, 2026
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

https://doi.org/10.1109/ISCA59077.2024.00077

Zhao, Youpeng; Wu, Di; Wang, Jun (June 2024, IEEE)

Full Text Available
EMBARK: Memory Bounded Architectural Improvement in CSR-CSC Sparse Matrix Multiplication

https://doi.org/10.1109/CIC58953.2023.00012

Jayakody, Shakya; Wang, Jun (November 2023, IEEE)

Full Text Available
AA-forecast: anomaly-aware forecast for extreme events

https://doi.org/10.1007/s10618-023-00919-7

Farhangi, Ashkan; Bian, Jiang; Huang, Arthur; Xiong, Haoyi; Wang, Jun; Guo, Zhishan (May 2023, Data Mining and Knowledge Discovery)

Full Text Available
Integrating Cybersecurity Into a Big Data Ecosystem

https://doi.org/10.1109/MILCOM52596.2021.9652997

Tall, Anne M.; Zou, Cliff C.; Wang, Jun (November 2021, MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM))

Full Text Available
Disperse Access Considered Energy Inefficiency in Intel Optane DC Persistent Memory Servers

https://doi.org/10.1109/ICDCS47774.2020.00107

Li, Daping; Wan, Jiguang; Wang, Jun; Zhou, Jian; Lu, Kai; Xu, Peng; Wu, Fei; Xie, Changsheng (November 2020, 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS))
The Intel Optane DC Persistent Memory Module (AEP), the first commercially available Non-Volatile Memory (NVM) product, offers performance comparable to DRAM while providing larger capacity and data persistence. Existing studies that substitute NVM for DRAM or hybridize the two are either emulator-based or focused on improving the energy efficiency of writes. Unfortunately, the energy efficiency of real AEP systems remains less explored. Based on real AEP hardware, we observe that even though AEP eliminates DRAM-like refresh energy consumption, it consumes significantly different amounts of energy at different performance levels. Specifically, requests separated by time intervals (dispersed) underperform in both performance and energy efficiency compared with requests issued without time intervals (compact). This disparity, together with the potential for exploiting parallelism, motivates us to propose Sprint-AEP, an energy-efficiency-oriented scheduling method for AEP-equipped servers. Sprint-AEP fully activates an adequate number of AEPs to serve most of the requests by deferring write requests and prefetching the hottest data. The remaining AEPs stay in idle mode at low idle power to save energy. We also exploit read parallelism to accelerate the sync and prefetching processes. Compared with energy-unaware AEP usage, our experimental results show that Sprint-AEP saves up to 26% energy with little performance degradation.
Full Text Available
Lelantus: Fine-Granularity Copy-On-Write Operations for Secure Non-Volatile Memories

https://doi.org/10.1109/ISCA45697.2020.00056

Zhou, Jian; Awad, Amro; Wang, Jun (May 2020, International Symposium on Computer Architecture (ISCA))

Full Text Available
