Search results: Creators/Authors contains "Guo, Xiaochen"

  1. Sparse linear algebra is an important kernel in many applications. Among sparse general matrix-matrix multiplication (SpGEMM) algorithms, Gustavson's column-wise SpGEMM has good locality when reading the input matrix and can be parallelized easily by distributing the computation of different columns of the output matrix across processors. However, the sparse accumulation (SPA) step in column-wise SpGEMM, which merges the partial sums from the individual multiplications by row index, remains a performance bottleneck. The state-of-the-art software implementation uses a hash table for the partial-sum search in SPA, which makes SPA the largest contributor to SpGEMM execution time. Three factors make SPA the bottleneck: (1) hash probing requires data-dependent branches that are difficult for a branch predictor to predict correctly; (2) the accumulation of a partial sum depends on the result of the hash probe, which makes the probing latency difficult to hide; and (3) hash collisions require time-consuming linear searches, and optimizations that reduce collisions require an accurate estimate of the number of non-zeros in each column of the output matrix. This work proposes the ASA architecture to accelerate SPA. ASA overcomes these challenges by (1) executing the partial-sum search and accumulation with a single instruction through an ISA extension, eliminating the data-dependent branches of hash probing; (2) using a dedicated on-chip cache to perform the search and accumulation in a pipelined fashion; (3) relying on the parallel search capability of a set-associative cache to reduce search latency; and (4) delaying the merging of overflowed entries. As a result, ASA achieves average speedups of 2.25× and 5.05× over the state-of-the-art software implementation of a Markov clustering application and of its SpGEMM kernel, respectively. Compared with a state-of-the-art hashing accelerator design, ASA achieves an average speedup of 1.95× on the SpGEMM kernel.
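The following is a minimal, illustrative Python sketch (not the paper's implementation) of Gustavson's column-wise SpGEMM and its SPA step. The baseline software SPA would be a plain hash table; here the accumulator is a toy behavioral model of an ASA-style set-associative buffer with deferred merging of overflowed entries. The matrix format, buffer geometry (NUM_SETS, WAYS), and all names are assumptions for illustration only.

```python
# Illustrative sketch of column-wise SpGEMM with a sparse accumulator (SPA).
# Matrices are stored column-wise as {col: {row: value}} dicts.

NUM_SETS, WAYS = 64, 4   # assumed geometry of a small set-associative buffer


class SetAssocSPA:
    """Toy behavioral model of an ASA-style accumulation buffer:
    hits are fused search-and-accumulate operations in a small
    set-associative structure; conflicting entries overflow and
    their merge is deferred until the output column is finished."""

    def __init__(self):
        self.sets = [[] for _ in range(NUM_SETS)]   # [row, partial_sum] ways
        self.overflow = []                           # deferred-merge spills

    def accumulate(self, row, value):
        ways = self.sets[row % NUM_SETS]
        for entry in ways:            # in hardware: parallel tag comparison
            if entry[0] == row:
                entry[1] += value     # search-and-accumulate hit
                return
        if len(ways) < WAYS:
            ways.append([row, value])            # allocate a new way
        else:
            self.overflow.append((row, value))   # merging is delayed

    def drain(self):
        """Merge buffered ways and overflowed entries into one output column."""
        col = {}
        for ways in self.sets:
            for row, val in ways:
                col[row] = col.get(row, 0.0) + val
            ways.clear()
        for row, val in self.overflow:
            col[row] = col.get(row, 0.0) + val
        self.overflow.clear()
        return col


def spgemm_columnwise(A, B):
    """C = A * B with Gustavson's column-wise formulation: column j of C
    is a linear combination of columns of A selected by the non-zeros of B[:, j]."""
    C = {}
    for j, b_col in B.items():
        spa = SetAssocSPA()           # baseline software would use a hash table here
        for k, b_kj in b_col.items():             # non-zero B[k, j]
            for i, a_ik in A.get(k, {}).items():  # column k of A
                spa.accumulate(i, a_ik * b_kj)    # the SPA bottleneck step
        col = spa.drain()
        if col:
            C[j] = col
    return C


# tiny usage example
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {0: 4.0, 1: 5.0}}
print(spgemm_columnwise(A, B))   # {0: {0: 4.0, 2: 8.0, 1: 15.0}}
```

Replacing SetAssocSPA with an ordinary dict recovers the hash-table SPA that the abstract identifies as the bottleneck; the class above only mimics, at a functional level, the set-associative search and delayed overflow merge that ASA implements in hardware.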
  2. Recent research suggests that in vitro neural networks created from dissociated neurons may be used for computing and for performing machine learning tasks. To develop better artificial intelligence systems, a hybrid bio-silicon computer is worth exploring, but its performance is still inferior to that of a silicon-based computer. One reason may be that a living neural network has many intrinsic properties, such as random network connectivity, high network sparsity, and large neural and synaptic variability. These properties may lead to new design considerations, and existing algorithms need to be adjusted for implementation on living neural networks. This work investigates the impact of neural variations and random connections on inference with learning algorithms. A two-layer hybrid bio-silicon platform is constructed, and a five-step design method is proposed for the fast development of living-neural-network algorithms. Neural variations and dynamics are verified by fitting model parameters to biological experimental results. Random connections are generated under different connection probabilities to vary network sparsity. A multi-layer perceptron algorithm is tested with biological constraints on the MNIST dataset. The results show that reasonable inference accuracy can be achieved despite the presence of neural variations and random network connections. A new adaptive pre-processing technique is proposed to ensure good learning accuracy across different levels of living-neural-network sparsity.
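Below is a hedged NumPy sketch (not the paper's platform) of the kind of constraint this abstract describes: an MNIST-sized two-layer perceptron whose hidden connections are masked by a fixed random connectivity pattern and perturbed by per-synapse variability. The layer sizes, connection probability, variability level, and all names are illustrative assumptions; training and data loading are omitted.

```python
# Sketch of inference under biological constraints: fixed random sparse
# connectivity plus per-synapse variability on the hidden layer.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_OUT = 784, 128, 10   # MNIST-sized layers (assumed)
P_CONNECT = 0.1                     # assumed connection probability (controls sparsity)
SYN_VARIABILITY = 0.2               # assumed relative synaptic weight spread

# fixed random connectivity: each possible synapse exists with probability P_CONNECT
mask = (rng.random((N_IN, N_HID)) < P_CONNECT).astype(np.float32)

# nominal weights would normally come from a learning algorithm; random here
W1 = rng.normal(0.0, 0.05, (N_IN, N_HID)).astype(np.float32)
W2 = rng.normal(0.0, 0.05, (N_HID, N_OUT)).astype(np.float32)

# per-synapse multiplicative variability, modeling neural/synaptic mismatch
variability = 1.0 + SYN_VARIABILITY * rng.standard_normal((N_IN, N_HID))


def forward(x):
    """Inference with the sparse, variable hidden layer and a dense readout."""
    h = np.maximum(0.0, x @ (W1 * mask * variability))   # masked, noisy hidden layer
    return h @ W2                                         # readout layer

# usage: one fake 28x28 image flattened to 784 inputs
x = rng.random(784).astype(np.float32)
print(forward(x).shape)   # (10,) class scores
```

In this sketch, lowering P_CONNECT increases network sparsity, which is the kind of variation the abstract's adaptive pre-processing technique is intended to tolerate.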