NSF PAR Search | NSF Public Access Repository

Processing-in-Memory Technology for Machine Learning: From Basic to ASIC

https://doi.org/10.1109/TCSII.2022.3168404

Taylor, Brady; Zheng, Qilin; Li, Ziru; Li, Shiyu; Chen, Yiran (June 2022, IEEE Transactions on Circuits and Systems II: Express Briefs)

Full Text Available
FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective

Sun, Jingwei; Li, Ang; DiValentin, Louis; Hassanzadeh, Amin; Chen, Yiran; Li, Hai (December 2021, Annual Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available
Exploring Applications of STT-RAM in GPU Architectures

https://doi.org/10.1109/TCSI.2020.3031895

Liu, Xiaoxiao; Mao, Mengjie; Bi, Xiuyuan; Li, Hai; Chen, Yiran (January 2021, IEEE Transactions on Circuits and Systems I: Regular Papers)
Full Text Available
ReTransformer: ReRAM-based processing-in-memory architecture for transformer acceleration

https://doi.org/10.1145/3400302.3415640

Yang, Xiaoxuan; Yan, Bonan; Li, Hai; Chen, Yiran (November 2020, IEEE/ACM International Conference on Computer-Aided Design (ICCAD))
Full Text Available
PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Li, Shiyu; Hanson, Edward; Li, Hai; Chen, Yiran (July 2020, International Conference on Machine Learning)

Although state-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks, their high computation demand and massive number of parameters make it difficult to deploy them onto resource-constrained devices. Previous works on CNN acceleration utilize low-rank approximation of the original convolution layers to reduce computation cost. However, these methods are very difficult to apply to sparse models, which limits execution speedup since redundancies within the CNN model are not fully exploited. We argue that kernel-granularity decomposition can be conducted under a low-rank assumption while exploiting the redundancy within the remaining compact coefficients. Based on this observation, we propose PENNI, a CNN model compression framework that achieves model compactness and hardware efficiency simultaneously by (1) implementing kernel sharing in convolution layers via a small number of basis kernels and (2) alternately adjusting bases and coefficients with sparse constraints. Experiments show that we can prune 97% of the parameters and 92% of the FLOPs of ResNet18 on CIFAR10 with no accuracy loss, and achieve a 44% reduction in run-time memory consumption and a 53% reduction in inference latency.
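As an illustration only, the kernel-sharing idea in this abstract can be sketched in plain Python: each convolution kernel is written as a weighted sum of a few shared basis kernels, and because correlation is linear, the feature maps of the basis kernels can be computed once and then combined. The basis kernels, coefficients, input, and the helper names `correlate2d_valid` and `combine` are hypothetical and are not taken from the PENNI code; the sketch does not cover the paper's training procedure or sparsity constraints.

```python
def correlate2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def combine(mats, coeffs):
    """Element-wise weighted sum of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(c * m[r][q] for c, m in zip(coeffs, mats)) for q in range(cols)]
        for r in range(rows)
    ]


# Hypothetical data: two shared 2x2 basis kernels and the coefficients
# that express one particular kernel of the layer in that basis.
basis = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
coeffs = [2, -1]
image = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]

# Reconstruct the full kernel from the shared basis, then convolve directly.
kernel = combine(basis, coeffs)
direct = correlate2d_valid(image, kernel)

# Alternative: convolve with each basis kernel once, then mix the results.
shared = combine([correlate2d_valid(image, b) for b in basis], coeffs)

print(kernel)
print(direct == shared)
```

In this sketch the per-kernel storage is the coefficient vector rather than the full kernel, so zeros in `coeffs` would reduce both storage and the number of basis feature maps that need to be mixed.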
Full Text Available
AutoGrow: Automatic Layer Growing in Deep Convolutional Networks

https://doi.org/10.1145/3394486.3403126

Wen, Wei; Yan, Feng; Chen, Yiran; Li, Hai (July 2020, ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
Full Text Available
DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

Yang, Huanrui; Wen, Wei; Li, Hai (April 2020, International Conference on Learning Representations)

In seeking sparse and efficient neural network models, many previous works investigated enforcing ℓ1 or ℓ0 regularizers to encourage weight sparsity during training. The ℓ0 regularizer measures the parameter sparsity directly and is invariant to the scaling of parameter values, but it cannot provide useful gradients and therefore requires complex optimization techniques. The ℓ1 regularizer is almost everywhere differentiable and can be easily optimized with gradient descent. Yet it is not scale-invariant and applies the same shrinking rate to all parameters, which is inefficient in increasing sparsity. Inspired by the Hoyer measure (the ratio between ℓ1 and ℓ2 norms) used in traditional compressed sensing problems, we present DeepHoyer, a set of sparsity-inducing regularizers that are both differentiable almost everywhere and scale-invariant. Our experiments show that enforcing DeepHoyer regularizers can produce even sparser neural network models than previous works at the same accuracy level. We also show that DeepHoyer can be applied to both element-wise and structural pruning. The code is available at https://github.com/yanghr/DeepHoyer.
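A minimal sketch of the ratio named in this abstract, assuming only the definition given there (the ℓ1 norm divided by the ℓ2 norm). The function name `hoyer_ratio` and the example vectors are hypothetical, and the exact regularizer forms used in the DeepHoyer repository may differ. The sketch shows the scale-invariance property: rescaling a vector changes its ℓ1 norm but not the ratio.

```python
import math


def l1_norm(w):
    """Sum of absolute values."""
    return sum(abs(x) for x in w)


def l2_norm(w):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in w))


def hoyer_ratio(w):
    """Ratio of the l1 norm to the l2 norm of a non-zero vector.

    The value is 1 when exactly one entry is non-zero and sqrt(n) when
    all n entries have the same magnitude, so smaller means sparser.
    """
    l2 = l2_norm(w)
    if l2 == 0.0:
        raise ValueError("ratio is undefined for the all-zero vector")
    return l1_norm(w) / l2


dense = [1.0, 1.0, 1.0, 1.0]        # all entries equal
sparse = [2.0, 0.0, 0.0, 0.0]       # a single non-zero entry
scaled = [10.0 * x for x in dense]  # same pattern, 10x larger values

print(hoyer_ratio(dense), hoyer_ratio(sparse), hoyer_ratio(scaled))
print(l1_norm(dense), l1_norm(scaled))
```

The ratio is the same for `dense` and `scaled`, while the plain ℓ1 norm grows with the scale factor, which is the contrast the abstract draws between the two kinds of regularizer.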
Full Text Available
A Survey of Accelerator Architectures for Deep Neural Networks

https://doi.org/10.1016/j.eng.2020.01.007

Chen, Yiran; Xie, Yuan; Song, Linghao; Chen, Fan; Tang, Tianqi (March 2020, Engineering)

Full Text Available
RED: A ReRAM-based Efficient Accelerator for Deconvolutional Computation

https://doi.org/10.1109/TCAD.2020.2981055

Li, Ziru; Li, Bing; Fan, Zichen; Li, Hai (March 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Deconvolution is a key component in contemporary neural networks, especially generative adversarial networks (GANs) and fully convolutional networks (FCNs). Because deconvolution requires extra operations compared to convolution, implementing it on existing resistive random access memory (ReRAM)-based processing-in-memory (PIM) accelerators incurs considerable degradation in performance and energy efficiency. In this work, we propose a ReRAM-based accelerator design, RED, for providing high-performance and low-energy deconvolution. We analyze the deconvolution execution on the existing ReRAM-based PIMs and utilize its interior computation pattern for design optimization. RED includes two major contributions: a pixel-wise mapping scheme and a zero-skipping data flow. The pixel-wise mapping scheme removes the zero insertion and performs convolutions over several ReRAM arrays, thus enabling parallel computations with non-zero inputs. The zero-skipping data flow, assisted by a customized input buffer design, enhances the computation parallelism and input data reuse. In evaluation, we compare RED against the existing ReRAM-based PIMs and a CMOS-based counterpart on a variety of GAN and FCN models, each of which contains multiple deconvolution layers. The experimental results show that RED achieves a 4.0× to 56.16× speedup and a 1.05× to 18.17× energy efficiency improvement over previous related accelerator designs.
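To make the zero-insertion overhead mentioned in this abstract concrete, here is a small 1-D sketch in plain Python. It compares a deconvolution (transposed convolution) computed by inserting zeros and then convolving against one computed directly from the non-zero input pixels. The function names, the 1-D setting, and the multiply-accumulate counters are assumptions made for illustration; this is only the arithmetic equivalence, not RED's ReRAM mapping or data flow.

```python
def deconv1d_zero_insert(x, w, stride):
    """Transposed convolution via zero insertion followed by convolution."""
    n, k = len(x), len(w)
    # Insert (stride - 1) zeros between neighbouring input values.
    up = [0] * ((n - 1) * stride + 1)
    for i, v in enumerate(x):
        up[i * stride] = v
    # Pad with k - 1 zeros on both sides and slide the flipped kernel.
    padded = [0] * (k - 1) + up + [0] * (k - 1)
    flipped = w[::-1]
    out, macs = [], 0
    for t in range((n - 1) * stride + k):
        acc = 0
        for j in range(k):
            acc += padded[t + j] * flipped[j]
            macs += 1  # counted even when the operand is an inserted zero
        out.append(acc)
    return out, macs


def deconv1d_scatter(x, w, stride):
    """Same result, touching only real input values (no inserted zeros)."""
    n, k = len(x), len(w)
    out = [0] * ((n - 1) * stride + k)
    macs = 0
    for i, v in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += v * wj
            macs += 1
    return out, macs


x = [1, 2, 3]          # hypothetical input
w = [1, 10, 100]       # hypothetical kernel
out_a, macs_a = deconv1d_zero_insert(x, w, stride=2)
out_b, macs_b = deconv1d_scatter(x, w, stride=2)
print(out_a == out_b, macs_a, macs_b)
```

Both functions produce the same output, but the zero-insertion version spends many of its multiply-accumulates on inserted or padded zeros, which is the redundancy that motivates skipping them.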
Full Text Available
AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture

Zhang, Tunhou; Cheng, Hsin-Pai; Li, Zhenwen; Yan, Feng; Huang, Chengyu; Li, Hai; Chen, Yiran (February 2020, The Thirty-Fourth AAAI Conference on Artificial Intelligence)

Resources are an important constraint when deploying Deep Neural Networks (DNNs) on mobile and edge devices. Existing works commonly adopt the cell-based search approach, which limits the flexibility of network patterns in learned cell structures. Moreover, due to the topology-agnostic nature of existing works, including both cell-based and node-based approaches, the search process is time-consuming and the performance of the found architecture may be sub-optimal. To address these problems, we propose AutoShrink, a topology-aware Neural Architecture Search (NAS) for searching efficient building blocks of neural architectures. Our method is node-based and thus can learn flexible network patterns in cell structures within a topological search space. Directed Acyclic Graphs (DAGs) are used to abstract DNN architectures and progressively optimize the cell structure through edge shrinking. As the search space intrinsically shrinks while the edges are progressively removed, AutoShrink explores a more flexible search space with even less search time. We evaluate AutoShrink on image classification and language tasks by crafting ShrinkCNN and ShrinkRNN models. ShrinkCNN is able to achieve up to 48% parameter reduction and save 34% of Multiply-Accumulates (MACs) on ImageNet-1K with accuracy comparable to state-of-the-art (SOTA) models. Specifically, both ShrinkCNN and ShrinkRNN are crafted within 1.5 GPU hours, which is 7.2× and 6.7× faster than the crafting time of SOTA CNN and RNN models, respectively.
Full Text Available
