NSF PAR Search | NSF Public Access Repository

Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks

https://doi.org/10.1007/978-3-030-86523-8_8

Chin, Ting-Wu; Morcos, Ari S.; Marculescu, Diana (September 2021, Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021)

Full Text Available
Width transfer: on the (in)variance of width optimization

https://doi.org/10.1109/CVPRW53098.2021.00334

Chin, Ting-Wu; Marculescu, Diana; Morcos, Ari S. (June 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
Full Text Available
Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness

https://doi.org/10.1109/CVPRW53098.2021.00362

Chin, Ting-Wu; Zhang, Cha; Marculescu, Diana (June 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
Full Text Available
One Weight Bitwidth to Rule Them All

Chin, Ting-Wu; Chuang, Pierce; Chandra, Vikas; Marculescu, Diana (August 2020, European Conference on Computer Vision Workshops)

Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited. However, when aiming for quantization without accuracy degradation, different tasks may end up with different bitwidths. This creates complexity for software and hardware support and the complexity accumulates when one considers mixed-precision quantization, in which case each layer’s weights use a different bitwidth. Our key insight is that optimizing for the least bitwidth subject to no accuracy degradation is not necessarily an optimal strategy. This is because one cannot decide optimality between two bitwidths if one has smaller model size while the other has better accuracy. In this work, we take the first step to understand if some weight bitwidth is better than others by aligning all to the same model size using a width-multiplier. Under this setting, somewhat surprisingly, we show that using a single bitwidth for the whole network can achieve better accuracy compared to mixed-precision quantization targeting zero accuracy degradation when both have the same model size. In particular, our results suggest that when the number of channels becomes a target hyperparameter, a single weight bitwidth throughout the network shows superior results for model compression.
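The size-alignment idea in this abstract can be illustrated with a little arithmetic. The sketch below is not the authors' code; it assumes a hypothetical stack of plain convolutions in which every layer's input and output channels scale with the width multiplier, so the parameter count grows roughly quadratically with width. The function names and the channel list are made up for illustration.

```python
import math

def conv_params(c_in, c_out, kernel):
    """Weight count of one convolution layer (bias ignored)."""
    return c_in * c_out * kernel * kernel

def model_bits(channels, kernel, bitwidth, width_mult):
    """Total weight storage in bits for a hypothetical stack of convolutions."""
    scaled = [max(1, round(c * width_mult)) for c in channels]
    params = sum(conv_params(a, b, kernel) for a, b in zip(scaled, scaled[1:]))
    return params * bitwidth

def width_for_equal_size(ref_bitwidth, new_bitwidth):
    """Width multiplier that keeps model size fixed when the bitwidth changes.

    Assumes parameters scale with the square of the width multiplier.
    """
    return math.sqrt(ref_bitwidth / new_bitwidth)

# Hypothetical channel configuration, not taken from the paper.
channels = [64, 128, 256]
w = width_for_equal_size(ref_bitwidth=8, new_bitwidth=2)
size_8bit = model_bits(channels, kernel=3, bitwidth=8, width_mult=1.0)
size_2bit = model_bits(channels, kernel=3, bitwidth=2, width_mult=w)
```

Under these assumptions a 2-bit network twice as wide occupies the same number of bits as the 8-bit baseline, which is the kind of equal-size comparison the abstract describes.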
Full Text Available
Single-Path Mobile AutoML: Efficient ConvNet Design and NAS Hyperparameter Optimization

https://doi.org/10.1109/JSTSP.2020.2971421

Stamoulis, Dimitrios; Ding, Ruizhou; Wang, Di; Lymberopoulos, Dimitrios; Priyantha, Bodhi; Liu, Jie; Marculescu, Diana (May 2020, IEEE Journal of Selected Topics in Signal Processing)

Full Text Available
DeepNVM: A Framework for Modeling and Analysis of Non-Volatile Memory Technologies for Deep Learning Applications

https://doi.org/10.23919/DATE48585.2020.9116263

Inci, Ahmet Fatih; Meric Isgenc, Mehmet; Marculescu, Diana (March 2020, 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional SRAM due to their non-volatility, higher cell density, and scalability features. While previous work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM, a framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models and the actual memory behavior of various DL workloads. We present both iso-capacity and iso-area performance and energy analysis for systems whose last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 4.2× and 5× energy-delay product (EDP) reduction and 2.4× and 3× area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide 2.3× EDP reduction on average across all workloads when compared to SRAM. Our comprehensive cross-layer framework is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPU platforms for deep learning applications.
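For readers unfamiliar with the metric, energy-delay product (EDP) is energy multiplied by delay, and an "N× reduction" is the ratio of the baseline EDP to the candidate EDP. The sketch below shows only that arithmetic; the function names and the energy and delay values are hypothetical and are not measurements from the paper or part of the DeepNVM framework.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds

def edp_reduction(baseline, candidate):
    """How many times smaller the candidate's EDP is than the baseline's."""
    return edp(*baseline) / edp(*candidate)

# Hypothetical (energy, delay) pairs, chosen only to make the arithmetic easy.
sram_cache = (2.0, 4.0)
nvm_cache = (1.0, 2.0)
reduction = edp_reduction(sram_cache, nvm_cache)
```

With these made-up values, halving both energy and delay yields a 4× EDP reduction, which shows why EDP rewards improvements on either axis multiplicatively.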
Full Text Available
ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

https://doi.org/10.1109/WACV45572.2020.9093418

Chen, Zhuo; Zhang, Jiyuan; Ding, Ruizhou; Marculescu, Diana (March 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV))

In recent years, Convolutional Neural Networks (CNNs) have shown superior capability in visual learning tasks. While accuracy-wise CNNs provide unprecedented performance, they are also known to be computationally intensive and energy demanding for modern computer systems. In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound. We show the efficacy of ViP through experiments on four CNN models, three representative datasets, both desktop and mobile platforms, and two visual learning tasks, i.e., image classification and object detection. For example, ViP delivers 2.1x speedup with less than 1.5% accuracy degradation in ImageNet classification on VGG16, and 1.8x speedup with 0.025 mAP degradation in PASCAL VOC object detection with Faster-RCNN. ViP also reduces mobile GPU and CPU energy consumption by up to 55% and 70%, respectively. As a complementary method to existing acceleration approaches, ViP achieves 1.9x speedup on ThiNet leading to a combined speedup of 5.23x on VGG16. Furthermore, ViP provides a knob for machine learning practitioners to generate a set of CNN models with varying trade-offs between system speed/energy consumption and accuracy to better accommodate the requirements of their tasks. Code is available at https://github.com/cmu-enyac/VirtualPooling.
Full Text Available
Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours

Stamoulis, Dimitrios; Ding, Ruizhou; Wang, Di; Lymberopoulos, Dimitrios; Priyantha, Bodhi; Liu, Jie; Marculescu, Diana (September 2019, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases)

Full Text Available
Regularizing Activation Distribution for Training Binarized Deep Networks

Ding, Ruizhou; Chin, Ting-Wu; Liu, Zeye; Marculescu, Diana (June 2019, IEEE Conference on Computer Vision and Pattern Recognition)

Full Text Available
Single-Path NAS: Device-Aware Efficient ConvNet Design

Stamoulis, Dimitrios; Ding, Ruizhou; Wang, Di; Lymberopoulos, Dimitrios; Priyantha, Bodhi; Liu, Jie; Marculescu, Diana (June 2019, Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations with Industrial Applications (ODML-CDNNRIA) in Conjunction with International Conference on Machine Learning)

Full Text Available
