NSF PAR Search | NSF Public Access Repository


3D Capsule Networks for Object Classification With Weight Pruning

https://doi.org/10.1109/ACCESS.2020.2971950

Kakillioglu, Burak; Ren, Ao; Wang, Yanzhi; Velipasalar, Senem (January 2020, IEEE Access)

Full Text Available
Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI

https://doi.org/10.1109/ISQED51717.2021.9424332

Yuan, Geng; Liao, Zhiheng; Ma, Xiaolong; Cai, Yuxuan; Kong, Zhenglun; Shen, Xuan; Fu, Jingyan; Li, Zhengang; Zhang, Chengming; Peng, Hongwu; et al (April 2021, 22nd International Symposium on Quality Electronic Design (ISQED))
Full Text Available
A low-computation-complexity, energy-efficient, and high-performance linear program solver based on primal–dual interior point method using memristor crossbars

https://doi.org/10.1016/j.nancom.2018.01.001

Cai, Ruizhe; Ren, Ao; Soundarajan, Sucheta; Wang, Yanzhi (December 2018, Nano Communication Networks)

Full Text Available
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers

https://doi.org/10.1145/3297858.3304076

Ren, Ao; Zhang, Tianyun; Ye, Shaokai; Li, Jiayu; Xu, Wenyao; Qian, Xuehai; Lin, Xue; Wang, Yanzhi (April 2019, the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems)

Model compression is an important technique for efficient embedded and hardware implementations of deep neural networks (DNNs), and a number of prior works are dedicated to it. The goal is to simultaneously reduce the model storage size and accelerate the computation, with minor effect on accuracy. Two important categories of DNN model compression techniques are weight pruning and weight quantization. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in the bit representation of weights. These two sources of redundancy can be combined, leading to a higher degree of DNN model compression. However, a systematic framework for joint weight pruning and quantization of DNNs is lacking, which limits the achievable model compression ratio. Moreover, computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for in addition to model size reduction; in particular, the hardware performance overhead resulting from weight pruning needs to be taken into consideration. To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework of DNNs using the Alternating Direction Method of Multipliers (ADMM), a powerful technique for solving non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique in which the regularization target is dynamically updated in each ADMM iteration, resulting in higher model compression performance than the state-of-the-art. The second part is hardware-aware DNN optimizations to facilitate hardware-level implementations. We perform ADMM-based weight pruning and quantization considering (i) the computation reduction and energy efficiency improvement, and (ii) the hardware performance overhead due to irregular sparsity. The first requirement prioritizes convolutional layer compression over fully-connected layers, while the second requires the concept of a break-even pruning ratio, defined as the minimum pruning ratio of a specific layer that results in no hardware performance degradation. Without accuracy loss, ADMM-NN achieves 85× and 24× pruning on the LeNet-5 and AlexNet models, respectively, which is significantly higher than the state-of-the-art. The improvements become more significant when focusing on computation reduction. Combining weight pruning and quantization, we achieve 1,910× and 231× reductions in overall model size on these two benchmarks when focusing on data storage. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50. We release code and models at https://github.com/yeshaokai/admm-nn.
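The ADMM pruning loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation (the linked repository holds that). It assumes a toy loss f(w) = 0.5 * ||w - w0||^2 so that the weight update has a closed form; in a real DNN that step would be a few epochs of gradient training. The names `project_top_k` and `admm_prune` and the values of `rho` and `iters` are hypothetical. The quantity `z - u` plays the role of the regularization target that is updated in each iteration.

```python
def project_top_k(v, k):
    """Euclidean projection onto vectors with at most k nonzeros:
    keep the k largest-magnitude entries and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def admm_prune(w0, k, rho=1.0, iters=50):
    """Toy ADMM pruning: minimize 0.5*||w - w0||^2 subject to w having
    at most k nonzeros. w0 stands in for pretrained weights (assumption)."""
    n = len(w0)
    z = project_top_k(w0, k)   # auxiliary variable carrying the sparsity constraint
    u = [0.0] * n              # scaled dual variable
    for _ in range(iters):
        # w-step: minimize loss + (rho/2)*||w - (z - u)||^2 (closed form here).
        w = [(w0[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-step: project w + u onto the sparsity constraint set.
        z = project_top_k([w[i] + u[i] for i in range(n)], k)
        # dual update: accumulate the disagreement between w and z.
        u = [u[i] + w[i] - z[i] for i in range(n)]
    return z


pruned = admm_prune([0.9, -0.05, 0.4, 0.01, -0.7], k=2)
print(pruned)
```

Returning `z` rather than `w` guarantees that the result satisfies the sparsity constraint exactly, even if the loop is stopped early.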
Full Text Available
An area and energy efficient design of domain-wall memory-based deep convolutional neural networks using stochastic computing

https://doi.org/10.1109/ISQED.2018.8357306

Ma, Xiaolong; Zhang, Yipeng; Yuan, Geng; Ren, Ao; Li, Zhe; Han, Jie; Hu, Jingtong; Wang, Yanzhi (March 2018, 2018 19th International Symposium on Quality Electronic Design (ISQED))

Full Text Available
Ultra-fast robust compressive sensing based on memristor crossbars

https://doi.org/10.1109/ICASSP.2017.7952333

Liu, Sijia; Ren, Ao; Wang, Yanzhi; Varshney, Pramod K. (March 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

In this paper, we propose a new approach for robust compressive sensing (CS) using memristor crossbars that are constructed from recently invented memristor devices. The exciting features of a memristor crossbar, such as high density, low power and great scalability, make it a promising candidate to perform large-scale matrix operations. To apply memristor crossbars to solve a robust CS problem, the alternating direction method of multipliers (ADMM) is employed to split the original problem into subproblems that involve the solution of systems of linear equations. A system of linear equations can then be solved using memristor crossbars with astonishing O(1) time complexity. We also study the impact of hardware variations on the memristor crossbar based CS solver from both theoretical and practical points of view. The resulting overall complexity is O(n), which achieves an O(n^2.5) speed-up compared to the state-of-the-art software approach. Numerical results are provided to illustrate the effectiveness of the proposed CS solver.
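To illustrate how ADMM reduces a sparse-recovery problem to repeated linear solves, here is a minimal software sketch. It uses the standard lasso formulation (minimize 0.5 * ||Ax - b||^2 + lam * ||x||_1) as a stand-in; the paper's robust CS objective may differ. The helper names `solve`, `soft_threshold`, and `admm_lasso`, and the parameter values, are hypothetical. The call to `solve` marks the step that, per the abstract, a memristor crossbar would carry out in hardware.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM for the lasso; each iteration needs one linear solve."""
    m, n = len(A), len(A[0])
    # G = A^T A + rho*I is fixed across iterations.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) + (rho if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        rhs = [Atb[i] + rho * (z[i] - u[i]) for i in range(n)]
        x = solve(G, rhs)  # linear-system subproblem
        z = soft_threshold([x[i] + u[i] for i in range(n)], lam / rho)
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


# Hypothetical underdetermined example: 2 measurements, 3 unknowns.
x_hat = admm_lasso([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]], [1.0, 1.0])
print(x_hat)
```

Because the coefficient matrix `G` never changes, only the right-hand side is updated between iterations, which is why a fixed-cost linear solver dominates the per-iteration cost.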
Full Text Available
Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

https://doi.org/10.1145/3194554.3194625

Ding, Caiwen; Ren, Ao; Yuan, Geng; Ma, Xiaolong; Li, Jiayu; Liu, Ning; Yuan, Bo; Wang, Yanzhi (January 2018, Proceedings of the 2018 on Great Lakes Symposium on VLSI)

Full Text Available
Memristor crossbar-based ultra-efficient next-generation baseband processors

https://doi.org/10.1109/MWSCAS.2017.8053125

Yuan, Geng; Ding, Caiwen; Cai, Ruizhe; Ma, Xiaolong; Zhao, Ziyi; Ren, Ao; Yuan, Bo; Wang, Yanzhi (August 2017, 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS))

As one of the most promising future fundamental devices, the memristor has a unique advantage in implementing low-power, high-speed matrix multiplication. Taking advantage of the high performance on basic matrix operations and the flexibility of memristor crossbars, in this paper we investigate both the discrete Fourier transform (DFT) and the multiple-input multiple-output (MIMO) detection unit in baseband processors. We reformulate the signal processing algorithms and model structures into a matrix-based framework, and present a memristor crossbar based DFT module design and MIMO detector module design. For both designs, experimental results demonstrate significant gains in speed and power efficiency compared with traditional CMOS-based designs.
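The matrix-based view of the DFT mentioned in this abstract can be sketched in software as a single matrix-vector product. This is only an illustration of the reformulation, not the paper's crossbar design; the function names `dft_matrix`, `matvec`, and `dft` are hypothetical.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix F with F[j][k] = exp(-2*pi*i*j*k/n)."""
    w = cmath.exp(-2j * cmath.pi / n)  # primitive n-th root of unity
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def matvec(M, x):
    """Plain matrix-vector product, the operation a crossbar would accelerate."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dft(x):
    """Compute the DFT of x as one matrix-vector multiplication."""
    return matvec(dft_matrix(len(x)), x)


spectrum = dft([1.0, 1.0, 1.0, 1.0])
print(spectrum)
```

A constant input concentrates all energy in the zero-frequency bin, which gives a quick sanity check of the matrix construction.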
Full Text Available
Algorithm-hardware co-optimization of the memristor-based framework for solving SOCP and homogeneous QCQP problems

https://doi.org/10.1109/ASPDAC.2017.7858420

Ren, Ao; Liu, Sijia; Cai, Ruizhe; Wen, Wujie; Varshney, Pramod K.; Wang, Yanzhi (January 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC))

A memristor crossbar, which is constructed with memristor devices, has the unique ability to change and memorize the state of each of its memristor elements. It also has other highly desirable features such as high density, low power operation and excellent scalability. Hence the memristor crossbar technology can potentially be utilized for developing low-complexity and high-scalability solution frameworks for solving a large class of convex optimization problems, which involve extensive matrix operations and have critical applications in multiple disciplines. This paper, as the first attempt in this direction, proposes a novel memristor crossbar-based framework for solving two important convex optimization problems, i.e., second-order cone programming (SOCP) and homogeneous quadratically constrained quadratic programming (QCQP) problems. In this paper, the alternating direction method of multipliers (ADMM) is adopted. It splits the SOCP and homogeneous QCQP problems into sub-problems that involve the solution of linear systems, which can be effectively solved using the memristor crossbar in O(1) time complexity. The proposed algorithm is an iterative procedure that iterates a constant number of times. Therefore, algorithms to solve SOCP and homogeneous QCQP problems have pseudo-O(N) complexity, which is a significant reduction compared to the state-of-the-art software solvers (O(N^3.5) to O(N^4)).
Full Text Available
