Search for: All records

Award ID contains: 1822085

Total Resources: 22

Resource Type: Conference Paper (19); Conference Proceeding (0); Dataset (0); Journal Article (3); Workshop Report (0)

Availability: Full Text / Resource Available (20); Citation Only (2)


Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction

Zhang, Jianyi ; Li, Ang ; Tang, Minxue ; Sun, Jingwei ; Chen, Xiang ; Zhang, Fan ; Chen, Changyou ; Chen, Yiran ; Li, Hai ( July 2023 , International Conference on Machine Learning)

Due to the often limited communication bandwidth of edge devices, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training at each communication round. Compared with engaging all available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (not independent and identically distributed) data. In this paper, we present our key observation that the root cause of this performance degradation is the class imbalance of the grouped data from the randomly selected clients. Based on this observation, we design an efficient heterogeneity-aware client sampling mechanism, Federated Class-balanced Sampling (Fed-CBS), which effectively reduces the class imbalance of the grouped dataset from the intentionally selected clients. We first propose a measure of class imbalance that can be derived in a privacy-preserving way. Based on this measure, we design a computation-efficient client sampling strategy such that the actively selected clients generate a more class-balanced grouped dataset, with theoretical guarantees. Experimental results show that Fed-CBS outperforms existing approaches in terms of test accuracy and convergence rate, while achieving comparable or even better performance than the ideal setting in which all available clients participate in FL training.
Free, publicly-accessible full text available July 23, 2024
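To make "class imbalance of the grouped data from selected clients" concrete, here is a minimal Python sketch. It assumes the imbalance measure is the squared distance between the grouped label distribution and the uniform distribution, and it selects clients with a plain greedy loop. Both `class_imbalance` and `greedy_balanced_sample` are hypothetical illustrations, not Fed-CBS's actual sampling procedure, and they ignore the privacy-preserving derivation mentioned in the abstract (the server here sees raw label counts).

```python
from typing import List, Sequence


def class_imbalance(counts: Sequence[int]) -> float:
    """Assumed measure: squared distance between a label histogram's class
    distribution and the uniform distribution (0.0 = perfectly balanced)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("grouped dataset is empty")
    num_classes = len(counts)
    return sum((c / total - 1.0 / num_classes) ** 2 for c in counts)


def greedy_balanced_sample(client_counts: List[List[int]], k: int) -> List[int]:
    """Hypothetical greedy selector: repeatedly add the client whose label
    histogram makes the grouped (summed) histogram most balanced."""
    if not 0 < k <= len(client_counts):
        raise ValueError("k must be between 1 and the number of clients")
    grouped = [0] * len(client_counts[0])
    chosen: List[int] = []
    remaining = set(range(len(client_counts)))
    for _ in range(k):
        # Score each candidate by the imbalance of grouped + candidate counts;
        # iterating in sorted order makes ties resolve to the lowest index.
        best = min(
            sorted(remaining),
            key=lambda i: class_imbalance(
                [g + c for g, c in zip(grouped, client_counts[i])]
            ),
        )
        chosen.append(best)
        remaining.remove(best)
        grouped = [g + c for g, c in zip(grouped, client_counts[best])]
    return chosen


# Two single-class clients and one mildly skewed client.
clients = [[10, 0], [0, 10], [9, 1]]
print(greedy_balanced_sample(clients, 2))  # [2, 1]
```

Client 2 is picked first because its 9:1 split is closer to uniform than either single-class client; client 1 is then added because it offsets the remaining skew toward class 0. Random selection could instead pick clients 0 and 2, producing a grouped dataset that is 95% class 0.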
ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

Zhang, Jianyi ; Muhamed, Aashiq ; Anantharaman, Aditya ; Wang, Guoyin ; Chen, Changyou ; Zhong, Kai ; Cui, Qingjun ; Xu, Yi ; Zeng, Belinda ; Chilimbi, Trishul ; et al ( July 2023 , The 61st Annual Meeting of the Association for Computational Linguistics)

Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments: it transfers the knowledge contained in the large-scale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge into the student model's parameters alone. In this paper, we show that giving the student access to non-parametric memory, in the form of a knowledge base containing the teacher's soft labels and predictions, can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in the teacher and student embedding spaces. Extensive experiments show that our retrieval mechanism achieves state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).
Free, publicly-accessible full text available July 9, 2024
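The retrieval step can be pictured with a small, hypothetical sketch. It assumes the knowledge base stores one teacher embedding and one teacher soft-label vector per training example, that the student's embedding retrieves the top-k entries by cosine similarity, and that their soft labels are averaged with softmax weights. The names, the cosine/softmax choices, and the `alpha` blend with the student's own prediction are all assumptions; ReAugKD's relational alignment loss is not modeled.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_soft_label(
    query: Sequence[float],
    kb_embeddings: Sequence[Sequence[float]],
    kb_soft_labels: Sequence[Sequence[float]],
    k: int = 2,
    temperature: float = 1.0,
) -> List[float]:
    """Average the soft labels of the k knowledge-base entries most similar
    to `query`, weighting each entry by a softmax over its similarity."""
    sims = [cosine(query, e) for e in kb_embeddings]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    weights = [math.exp(sims[i] / temperature) for i in top]
    z = sum(weights)
    out = [0.0] * len(kb_soft_labels[0])
    for w, i in zip(weights, top):
        for c, p in enumerate(kb_soft_labels[i]):
            out[c] += (w / z) * p
    return out


def blend(student: Sequence[float], retrieved: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical mix of the student's own distribution with the retrieved one."""
    return [alpha * s + (1 - alpha) * r for s, r in zip(student, retrieved)]


# Toy knowledge base: teacher embeddings and their soft labels.
kb_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kb_soft_labels = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(retrieve_soft_label([1.0, 0.0], kb_embeddings, kb_soft_labels, k=1))  # [0.9, 0.1]
```

Because each retrieved soft label is a probability distribution and the softmax weights sum to one, the aggregated result is also a distribution, so it can be blended with the student's prediction directly.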
FedSEA: A Semi-Asynchronous Federated Learning Framework for Extremely Heterogeneous Devices

https://doi.org/10.1145/3560905.3568538

Sun, Jingwei ; Li, Ang ; Duan, Lin ; Alam, Samiul ; Deng, Xuliang ; Guo, Xin ; Wang, Haiming ; Gorlatova, Maria ; Zhang, Mi ; Li, Hai ; et al ( November 2022 , The 20th ACM Conference on Embedded Networked Sensor Systems)

Federated learning (FL) has attracted increasing attention as a promising technique for bringing artificial intelligence to a vast number of edge devices. However, guaranteeing the efficiency of an FL system in practice is very challenging because of the heterogeneous computation resources of different devices. To improve the efficiency of real-world FL systems, asynchronous FL (AFL) and semi-asynchronous FL (SAFL) methods have been proposed so that the server does not need to wait for stragglers. However, existing AFL and SAFL systems suffer from poor accuracy and low efficiency in realistic settings where the data is non-IID across devices and on-device resources are extremely heterogeneous. In this work, we propose FedSEA, a semi-asynchronous FL framework for extremely heterogeneous devices. We show theoretically that unbalanced aggregation frequency is a root cause of the accuracy drop in SAFL. Based on this analysis, we design a training configuration scheduler that balances the aggregation frequency of devices to improve accuracy. To improve system efficiency in realistic settings where on-device resource availability is dynamic, we design a scheduler that efficiently predicts the arrival time of local updates from devices and adjusts the synchronization time point according to these predicted arrival times. We also consider extremely heterogeneous settings with extremely lagging devices whose training takes hundreds of times longer than that of other devices. In the real world, some extreme stragglers may not even be capable of training the global model. To let these devices join training without impairing system efficiency, FedSEA has these extreme stragglers conduct local training on much smaller models.
Our experiments show that, compared with existing approaches, FedSEA improves inference accuracy by 44.34% and reduces the system time cost and local training time cost by 87.02× and 792.9×, respectively. FedSEA also reduces the energy consumption of devices with extremely limited resources by 752.9×.
Full Text Available
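The idea of predicting when local updates will arrive and moving the synchronization point accordingly can be sketched as follows. This is an illustrative stand-in: it assumes an exponential moving average (EMA) of each device's observed round duration as the predictor and a simple "wait until a given fraction of devices is expected" rule for the synchronization point. Neither `ArrivalPredictor` nor its parameters come from the FedSEA paper.

```python
import math
from typing import Dict


class ArrivalPredictor:
    """Hypothetical arrival-time predictor: keeps an exponential moving
    average (EMA) of each device's observed local-round duration."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.estimates: Dict[str, float] = {}

    def update(self, device: str, observed_seconds: float) -> None:
        prev = self.estimates.get(device)
        if prev is None:
            self.estimates[device] = observed_seconds
        else:
            self.estimates[device] = (
                self.alpha * observed_seconds + (1 - self.alpha) * prev
            )

    def sync_point(self, now: float, fraction: float) -> float:
        """Earliest predicted time by which `fraction` of the devices have
        arrived, assuming every device started its local round at `now`."""
        if not self.estimates or not 0 < fraction <= 1:
            raise ValueError("need estimates and 0 < fraction <= 1")
        arrivals = sorted(now + t for t in self.estimates.values())
        index = math.ceil(fraction * len(arrivals)) - 1
        return arrivals[index]


predictor = ArrivalPredictor(alpha=0.5)
for device, seconds in [("a", 10.0), ("a", 20.0), ("b", 5.0), ("c", 100.0)]:
    predictor.update(device, seconds)
print(predictor.sync_point(now=0.0, fraction=0.5))  # 15.0
```

With `fraction=0.5`, the server would aggregate once the two faster devices (predicted at 5 s and 15 s) report, instead of waiting for device `c`; `fraction=1.0` recovers fully synchronous waiting.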
Toward Efficient and Adaptive Design of Video Detection System with Deep Neural Networks

https://doi.org/10.1145/3484946

Mao, Jiachen ; Yang, Qing ; Li, Ang ; Nixon, Kent W. ; Li, Hai ; Chen, Yiran ( May 2022 , ACM Transactions on Embedded Computing Systems)

In the past decade, Deep Neural Networks (DNNs), e.g., Convolutional Neural Networks, have achieved human-level performance in vision tasks such as object classification and detection. However, DNNs are known to be computationally expensive and are thus hard to deploy in real-time and edge applications. Many previous works have focused on DNN model compression to obtain smaller parameter sizes and, consequently, lower computational cost. Such methods, however, often introduce noticeable accuracy degradation. In this work, we optimize Deep Feature Flow (DFF), a state-of-the-art DNN-based video detection framework, from the cloud end using three proposed ideas. First, we propose Asynchronous DFF (ADFF) to execute the neural networks asynchronously. Second, we propose a Video-based Dynamic Scheduling (VDS) method that decides the detection frequency based on the magnitude of movement between video frames. Last, we propose Spatial Sparsity Inference (SSI), which performs inference on only part of the video frame and thus reduces the computation cost. According to our experimental results, ADFF reduces the bottleneck latency from 89 ms to 19 ms. VDS increases detection accuracy by 0.6% mAP without increasing computation cost, and SSI saves a further 0.2 ms at the cost of a 0.6% mAP degradation in detection accuracy.
Full Text Available
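A toy version of motion-driven detection scheduling, in the spirit of VDS, is sketched below: the full detector runs only when the motion since the last key frame exceeds a threshold (or a maximum gap elapses), and other frames would reuse propagated results. Motion is approximated here by the mean absolute pixel difference; the abstract does not specify how movement is measured, so this metric and the `threshold` and `max_gap` parameters are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel values


def motion_magnitude(prev: Frame, cur: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total, count = 0.0, 0
    for row_prev, row_cur in zip(prev, cur):
        for a, b in zip(row_prev, row_cur):
            total += abs(a - b)
            count += 1
    return total / count


def schedule_detection(frames: Sequence[Frame], threshold: float, max_gap: int) -> List[int]:
    """Indices of frames on which the full detector runs. Other frames would
    reuse (propagate) results from the most recent key frame."""
    if not frames:
        return []
    keyframes = [0]
    last_key = 0
    for i in range(1, len(frames)):
        # Compare against the last key frame so slow drift still accumulates.
        moved = motion_magnitude(frames[last_key], frames[i]) > threshold
        if moved or i - last_key >= max_gap:
            keyframes.append(i)
            last_key = i
    return keyframes


still = [[0.0, 0.0], [0.0, 0.0]]
shifted = [[1.0, 1.0], [1.0, 1.0]]
video = [still] * 4 + [shifted] * 2
print(schedule_detection(video, threshold=0.5, max_gap=10))  # [0, 4]
```

Static stretches cost a single detector run, while a sudden change (frame 4) triggers a new key frame immediately; `max_gap` bounds how stale propagated results can become.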
An Audio Frequency Unfolding Framework for Ultra-Low Sampling Rate Sensors

https://doi.org/10.1109/ISQED54688.2022.9806149

Gao, Zhihui ; Tang, Minxue ; Li, Ang ; Chen, Yiran ( April 2022 , 2022 23rd International Symposium on Quality Electronic Design)

Full Text Available
FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective

Sun, Jingwei ; Li, Ang ; DiValentin, Louis ; Hassanzadeh, Amin ; Chen, Yiran ; Li, Hai ( December 2021 , Annual Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available
Automatic Routability Predictor Development Using Neural Architecture Search

https://doi.org/10.1109/ICCAD51958.2021.9643483

Chang, Chen-Chia ; Pan, Jingyu ; Zhang, Tunhou ; Xie, Zhiyao ; Hu, Jiang ; Qi, Weiyi ; Lin, Chun-Wei ; Liang, Rongjian ; Mitra, Joydeep ; Fallon, Elias ; et al ( November 2021 , International Conference on Computer-Aided Design (ICCAD))

Full Text Available
ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition

https://doi.org/10.1145/3466752.3480043

Li, Shiyu ; Hanson, Edward ; Qian, Xuehai ; Li, Hai "Helen" ; Chen, Yiran ( October 2021 , IEEE/ACM International Symposium on Microarchitecture)
The ever-growing parameter size and computation cost of Convolutional Neural Network (CNN) models hinder their deployment onto resource-constrained platforms. Network pruning techniques have been proposed to remove the redundancy in CNN parameters and produce a sparse model. Sparsity-aware accelerators have also been proposed to reduce the computation cost and memory bandwidth requirements of inference by leveraging model sparsity. The irregularity of sparse patterns, however, limits the efficiency of these designs. Researchers have proposed to address this issue by creating a regular sparsity pattern through hardware-aware pruning algorithms. However, the pruning rate of these solutions is largely limited by the enforced sparsity patterns. This limitation motivates us to explore compression methods beyond pruning. We find that kernel decomposition, with its two decoupled computation stages, could take the processing of the sparse pattern off the critical path of inference and achieve a high compression ratio without enforcing sparse patterns. To exploit these advantages, we propose ESCALATE, an algorithm-hardware co-design approach based on kernel decomposition. At the algorithm level, ESCALATE reorganizes the two computation stages of the decomposed convolution to enable stream processing of the intermediate feature map. We propose a hybrid quantization scheme to exploit the different reuse frequencies of each part of the decomposed weight. At the architecture level, ESCALATE proposes a novel ‘Basis-First’ dataflow and a corresponding microarchitecture design to maximize the benefits brought by the decomposed convolution.
Full Text Available
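Kernel decomposition relies on the linearity of convolution: if every kernel is a weighted sum of a few shared basis kernels, the input can first be convolved with the basis kernels, and the resulting maps then combined with each kernel's coefficients. The 1-D Python sketch below checks that this two-stage order gives the same result as direct convolution. The function names are hypothetical, and ESCALATE's sparsity handling, hybrid quantization, and hardware dataflow are not modeled.

```python
from typing import List, Sequence

Vector = Sequence[float]


def conv1d_valid(x: Vector, kernel: Vector) -> List[float]:
    """1-D 'valid' convolution (cross-correlation, as in CNN layers)."""
    n = len(x) - len(kernel) + 1
    return [sum(x[i + j] * kernel[j] for j in range(len(kernel))) for i in range(n)]


def direct(x: Vector, coeffs: Sequence[Vector], basis: Sequence[Vector]) -> List[List[float]]:
    """Rebuild each full kernel from the basis, then convolve."""
    outputs = []
    for c in coeffs:
        kernel = [sum(c[b] * basis[b][j] for b in range(len(basis))) for j in range(len(basis[0]))]
        outputs.append(conv1d_valid(x, kernel))
    return outputs


def basis_first(x: Vector, coeffs: Sequence[Vector], basis: Sequence[Vector]) -> List[List[float]]:
    """Stage 1: convolve the input with each shared basis kernel once.
    Stage 2: linearly combine those maps with each kernel's coefficients."""
    basis_maps = [conv1d_valid(x, bk) for bk in basis]
    width = len(basis_maps[0])
    return [
        [sum(c[b] * basis_maps[b][i] for b in range(len(basis))) for i in range(width)]
        for c in coeffs
    ]


x = [1, 2, 3, 4, 5]
basis = [[1, 0, -1], [1, 1, 1]]    # two shared basis kernels
coeffs = [[2, 0], [1, 3], [0, 1]]  # three output kernels, some zero coefficients
print(basis_first(x, coeffs, basis))  # [[-4, -4, -4], [16, 25, 34], [6, 9, 12]]
```

In this toy case stage 1 convolves the input with two basis kernels instead of three full kernels, and stage 2 is a cheap linear combination in which zero coefficients could simply be skipped.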
TPrune: Efficient Transformer Pruning for Mobile Devices

https://doi.org/10.1145/3446640

Mao, Jiachen ; Yang, Huanrui ; Li, Ang ; Li, Hai ; Chen, Yiran ( July 2021 , ACM Transactions on Cyber-Physical Systems)
The Transformer model structure has boosted the performance of Neural Machine Translation (NMT) tasks to an unprecedented level. Many previous works have tried to make the Transformer model more execution-friendly on resource-constrained platforms. This research can be categorized into three key fields: Model Pruning, Transfer Learning, and Efficient Transformer Variants. Model pruning methods are popular for their simplicity in practice and promising compression rates, and they have achieved great success with convolutional neural networks (CNNs) on many vision tasks. Nonetheless, previous Transformer pruning works did not perform a thorough analysis and evaluation of each Transformer component on off-the-shelf mobile devices. In this work, we analyze and prune Transformer models at line-wise granularity and implement our pruning method on real mobile platforms. We explore the properties of all Transformer components, as well as their sparsity features, and leverage them to guide Transformer model pruning. We name our complete Transformer analysis and pruning pipeline TPrune. In TPrune, we first propose Block-wise Structured Sparsity Learning (BSSL) to analyze Transformer model properties. Then, based on the characteristics derived from BSSL, we apply Structured Hoyer Square (SHS) to derive the final pruned models. Compared with state-of-the-art Transformer pruning methods, TPrune achieves a higher model compression rate with less performance degradation. Experimental results show that our pruned models achieve 1.16×–1.92× speedup on mobile devices with 0%–8% BLEU score degradation compared with the original Transformer model.
Full Text Available
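The abstract does not define Structured Hoyer Square. Assuming it takes the group Hoyer-Square form, (sum of group L2 norms)² divided by the sum of squared group L2 norms, the regularizer value can be sketched as below. It equals 1 when exactly one group is nonzero and equals the number of groups when all group norms are equal, so minimizing it pushes whole groups toward zero; it is also unchanged when all weights are scaled by the same constant. How weights are grouped (rows, blocks) and how the term is weighted against the task loss are left as assumptions.

```python
import math
from typing import Sequence


def group_hoyer_square(groups: Sequence[Sequence[float]]) -> float:
    """Assumed structured Hoyer-Square regularizer:
    (sum of group L2 norms)^2 / (sum of squared group L2 norms).
    Returns 0.0 when every weight is zero."""
    norms = [math.sqrt(sum(w * w for w in g)) for g in groups]
    squared_sum = sum(n * n for n in norms)
    if squared_sum == 0:
        return 0.0
    return sum(norms) ** 2 / squared_sum


print(group_hoyer_square([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]))  # 1.0
print(group_hoyer_square([[1.0, 0.0], [0.0, 1.0]]))              # 2.0
```

The first call has a single surviving group and hits the minimum of 1; the second spreads weight evenly over two groups and hits the maximum for two groups.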
Soteria: Provable Defense against Privacy Leakage in Federated Learning from Representation Perspective

Sun, J. ; Li, A. ; Wang, B. ; Yang, H. ; Li, H. ; Chen, Y. ( June 2021 , IEEE Conference on Computer Vision and Pattern Recognition)
Full Text Available
