

Search for: All records

Award ID contains: 2107085

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Transfer learning on the edge is challenging due to limited on-device resources. Existing work addresses this issue by training only a subset of parameters or by adding model patches. Developed with inference in mind, Inverted Residual Blocks (IRBs) split a convolutional layer into depthwise and pointwise convolutions, leading to more stacked layers, e.g., convolution, normalization, and activation layers. Although they are efficient for inference, IRBs require additional activation maps to be stored in memory when training the weights of convolution layers and the scales of normalization layers. As a result, their high memory cost prohibits training IRBs on resource-limited edge devices and makes them unsuitable for transfer learning. To address this issue, we present MobileTL, a memory- and computationally efficient on-device transfer learning method for models built with IRBs. MobileTL trains only the shifts of the internal normalization layers to avoid storing activation maps for the backward pass. It also approximates the backward computation of the activation layers (e.g., Hard-Swish and ReLU6) as a signed function, which makes it possible to store a binary mask instead of the activation maps for the backward pass. To reduce computation, MobileTL fine-tunes only a few top blocks (those closest to the output) rather than propagating the gradient through the whole network. Our method reduces memory usage by 46% and 53% for MobileNetV2 and V3 IRBs, respectively. For MobileNetV3, we observe a 36% reduction in floating-point operations (FLOPs) when fine-tuning five blocks, while incurring only a 0.6% accuracy reduction on CIFAR-10. Extensive experiments on multiple datasets demonstrate that our method is Pareto-optimal (best accuracy under given hardware constraints) compared to prior work in transfer learning for edge devices. (A minimal sketch of the binary-mask backward idea appears after this listing.)
    Free, publicly-accessible full text available June 27, 2024
  2. Graph Neural Networks (GNNs) have demonstrated great potential in a variety of graph-based applications, such as recommender systems, drug discovery, and object recognition. Nevertheless, resource-efficient GNN learning is a rarely explored topic despite its many benefits for edge computing and Internet of Things (IoT) applications. To improve this state of affairs, this work proposes efficient subgraph-level training via resource-aware graph partitioning (SUGAR). SUGAR first partitions the initial graph into a set of disjoint subgraphs and then performs local training at the subgraph level. We provide a theoretical analysis and conduct extensive experiments on five graph benchmarks to verify its efficacy in practice. Our results across five different hardware platforms demonstrate substantial runtime speedup and memory reduction for SUGAR on large-scale graphs. We believe SUGAR opens a new research direction towards developing GNN methods that are resource-efficient and hence suitable for IoT deployment. (A minimal sketch of subgraph-level training appears after this listing.)
    Free, publicly-accessible full text available June 16, 2024
  3. Free, publicly-accessible full text available June 4, 2024
  4. Free, publicly-accessible full text available June 1, 2024
  5. Recent developments in Federated Learning (FL) focus on optimizing the learning process for data, hardware, and model heterogeneity. However, most approaches assume that all devices are stationary, charging, and always connected to Wi-Fi while training on local data. We argue that when real devices move around, the FL process is negatively impacted and the energy a device spends on communication increases. To mitigate these effects, we propose a dynamic community selection algorithm that improves communication energy efficiency, together with two new aggregation strategies that boost learning performance in Hierarchical FL (HFL). Using real mobility traces, we show that compared to state-of-the-art HFL solutions, our approach is scalable, achieves better accuracy on multiple datasets, converges up to 3.88× faster, and is significantly more energy efficient for both IID and non-IID scenarios.
    Free, publicly-accessible full text available May 9, 2024
  6. This paper presents a new hardware prototype for exploring how centralized and hierarchical federated learning systems are affected by real-world device distribution, availability, and heterogeneity. Our results show considerable learning-performance degradation and wasted energy during training once user mobility is accounted for. Hence, we provide a prototype that can be used as a design-exploration tool to better design, calibrate, and evaluate FL systems for real-world deployment.
    Free, publicly-accessible full text available May 9, 2024
  7. Exposure notification applications have been developed to increase the scale and speed of disease contact tracing. By taking advantage of Bluetooth technology, they track the mobility of the infected population and then inform close contacts to get tested. In this paper, we ask whether these applications can be extended from reactive to preemptive risk-management tools. To this end, we propose a new framework that uses graph neural networks (GNNs) and real-world Foursquare mobility data to predict high-risk locations on an hourly basis. As a proof of concept, we then simulate a risk-informed Foursquare population of over 36,000 people in Austin, TX after the peak of an outbreak. We find that even after 50% of the population has been infected with COVID-19, people can still maintain their mobility while reducing new infections by 13%. These results are a first step towards achieving what we call Quarantine in Motion. (A minimal GNN risk-scoring sketch appears after this listing.)
  8. As the machine learning and systems communities strive for higher energy efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model-compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while providing accurate and fast power, performance, and area models. In this work, we present QUIDAM, a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, total number of processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy vary by more than 5× and 35×, respectively. With the proposed framework, we show that lightweight processing elements achieve on-par accuracy and up to 5.7× better performance per area and energy compared to the best INT16-based implementation. Finally, because the pre-characterized power, performance, and area models are efficient, QUIDAM can speed up the design exploration process by three to four orders of magnitude, as it removes the need for expensive synthesis and characterization of each design. (A minimal exploration-loop sketch appears after this listing.)
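For record 1 (MobileTL), the sketch below illustrates the binary-mask backward idea in PyTorch: the activation's backward pass is approximated by a step function of the input sign, so only a boolean mask is saved instead of the full activation map, and only the normalization-layer shifts inside the blocks chosen for fine-tuning remain trainable. The function and layer names, and the choice of ReLU6/BatchNorm2d, are illustrative assumptions, not the authors' implementation.

```python
import torch

class MaskedReLU6(torch.autograd.Function):
    """ReLU6 whose backward pass is approximated by a step function, so only
    a boolean mask (input > 0) is kept for backward instead of the float
    activation map (sketch of the idea, not the paper's code)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x > 0)              # boolean mask, not the activations
        return torch.clamp(x, min=0.0, max=6.0)   # standard ReLU6 forward

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        return grad_output * mask                 # approximate gradient

def freeze_for_mobile_tl(model: torch.nn.Module, trainable_blocks):
    """Freeze everything, then re-enable only the shifts (biases) of the
    normalization layers inside the assumed top blocks near the output."""
    for p in model.parameters():
        p.requires_grad = False
    for block in trainable_blocks:
        for m in block.modules():
            if isinstance(m, torch.nn.BatchNorm2d) and m.bias is not None:
                m.bias.requires_grad = True       # shift only; scale stays frozen
```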
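For record 2 (SUGAR), a minimal PyTorch sketch of subgraph-level training: the node set is split into disjoint parts, each induced subgraph is built and relabeled, and the model takes one optimizer step per subgraph so peak memory scales with the largest part rather than the full graph. The random partitioner and the model(x, edge_index) interface are assumptions; SUGAR's resource-aware partitioning and theoretical analysis are not reproduced here.

```python
import torch
import torch.nn.functional as F

def partition_nodes(num_nodes, num_parts):
    """Split node ids into disjoint parts; a random split stands in for a
    resource-aware partitioner."""
    return torch.chunk(torch.randperm(num_nodes), num_parts)

def induced_subgraph(edge_index, node_ids, num_nodes):
    """Keep only edges with both endpoints in node_ids and relabel them to
    local indices 0..len(node_ids)-1."""
    lookup = torch.full((num_nodes,), -1, dtype=torch.long)
    lookup[node_ids] = torch.arange(node_ids.numel())
    src, dst = lookup[edge_index[0]], lookup[edge_index[1]]
    keep = (src >= 0) & (dst >= 0)
    return torch.stack([src[keep], dst[keep]])

def train_subgraph_level(model, x, y, edge_index, num_parts, epochs=10):
    """One optimizer step per subgraph; `model(x_sub, edge_index_sub)` is an
    assumed GNN forward signature."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    parts = partition_nodes(x.size(0), num_parts)
    for _ in range(epochs):
        for ids in parts:
            sub_edges = induced_subgraph(edge_index, ids, x.size(0))
            loss = F.cross_entropy(model(x[ids], sub_edges), y[ids])
            opt.zero_grad()
            loss.backward()
            opt.step()
```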
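For record 7, a minimal sketch of a GNN that assigns an hourly risk score to every location in a mobility graph. The graph construction (venues linked by shared visitors), input features, and network size are placeholders for illustration; this is not the paper's Foursquare pipeline.

```python
import torch
import torch.nn as nn

def normalize_adj(adj):
    """Symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2 (dense)."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).rsqrt()
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class LocationRiskGCN(nn.Module):
    """Two-layer GCN that outputs an infection-risk probability per location."""
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, 1)

    def forward(self, x, adj_norm):
        h = torch.relu(adj_norm @ self.lin1(x))          # neighborhood aggregation
        return torch.sigmoid(adj_norm @ self.lin2(h)).squeeze(-1)

# Usage (toy data): one forward pass per hour with that hour's visit features.
adj = torch.rand(100, 100)
adj = ((adj + adj.t()) > 1.5).float()
risk = LocationRiskGCN(in_dim=8)(torch.randn(100, 8), normalize_adj(adj))
```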
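For record 8 (QUIDAM), a toy Python sketch of the kind of exploration loop such a framework enables: each candidate design is scored with fast analytical power, performance, and area models instead of per-design synthesis, and the sweep is ranked by performance per area and performance per watt. The model coefficients, parameter ranges, and the two-dimensional design space are assumptions for illustration only.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class DesignPoint:
    precision_bits: int
    num_pes: int
    throughput: float   # e.g., inferences per second
    area_mm2: float
    power_w: float

def evaluate(precision_bits, num_pes):
    """Stand-in for pre-characterized PPA models; coefficients are placeholders."""
    throughput = num_pes * 1.0e6 / precision_bits
    area = num_pes * 0.002 * precision_bits
    power = num_pes * 0.0005 * precision_bits ** 2
    return DesignPoint(precision_bits, num_pes, throughput, area, power)

def explore(precisions=(4, 8, 16), pe_counts=(64, 128, 256, 512)):
    """Sweep (bit precision, PE count) analytically and rank the results."""
    points = [evaluate(p, n) for p, n in product(precisions, pe_counts)]
    best_per_area = max(points, key=lambda d: d.throughput / d.area_mm2)
    best_per_watt = max(points, key=lambda d: d.throughput / d.power_w)
    return best_per_area, best_per_watt

if __name__ == "__main__":
    per_area, per_watt = explore()
    print("best perf/area:", per_area)
    print("best perf/watt:", per_watt)
```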