Title: DLion: Decentralized Distributed Deep Learning in Micro-Clouds
Deep learning (DL) is a popular technique for building models from large quantities of data such as pictures, videos, and messages generated by edge devices at a rapid pace all over the world. It is often infeasible to migrate large quantities of data from the edge to centralized data center(s) over WANs for training due to privacy, cost, and performance reasons. At the same time, training large DL models on edge devices is infeasible due to their limited resources. An attractive alternative for DL training on distributed data is to use micro-clouds---small-scale clouds deployed near edge devices in multiple locations. However, micro-clouds present the challenges of heterogeneity and dynamism in both compute and network resources. In this paper, we introduce DLion, a new and generic decentralized distributed DL system designed to address the key challenges in micro-cloud environments, in order to reduce overall training time and improve model accuracy. We present three key techniques in DLion: (1) weighted dynamic batching to maximize data parallelism under heterogeneous and dynamic compute capacity, (2) per-link prioritized gradient exchange to reduce communication overhead for model updates based on available network capacity, and (3) direct knowledge transfer to improve model accuracy by merging the best-performing model parameters. We build a prototype of DLion on top of TensorFlow and show that DLion achieves up to a 4.2X speedup in an Amazon GPU cluster, and up to a 2X speedup and 26% higher model accuracy in a CPU cluster, over four state-of-the-art distributed DL systems.
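To make the first technique concrete, here is a minimal sketch of weighted dynamic batching, assuming a hypothetical helper (assign_batch_sizes) and a per-worker throughput probe; it illustrates the idea of sizing each worker's batch in proportion to its measured compute capacity, and is not DLion's actual interface:

```python
# Hypothetical sketch: split a global batch across heterogeneous workers
# in proportion to each worker's measured throughput, so faster nodes
# process more samples per synchronization round.

def assign_batch_sizes(global_batch, throughputs):
    """Split a global batch across workers proportionally to throughput.

    throughputs: samples/sec measured per worker in the previous round.
    Returns per-worker batch sizes summing to global_batch.
    """
    total = sum(throughputs)
    sizes = [int(global_batch * t / total) for t in throughputs]
    # Hand any remainder from integer truncation to the fastest worker.
    sizes[throughputs.index(max(throughputs))] += global_batch - sum(sizes)
    return sizes

# Example: a 512-sample global batch over three heterogeneous workers.
print(assign_batch_sizes(512, [120.0, 60.0, 30.0]))  # [293, 146, 73]
```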
Award ID(s):
1717834
NSF-PAR ID:
10292672
Author(s) / Creator(s):
Date Published:
Journal Name:
HPDC '21
Page Range / eLocation ID:
227 to 238
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The rapidly increasing number of vehicles brings convenience to daily life but also poses significant challenges for the transportation network and air quality. It has been shown that platooning/clustering-based driving can significantly reduce road congestion and exhaust emissions and improve road capacity and energy efficiency. This paper aims to improve the stability of vehicle clustering to extend the lifetime of cooperative driving. Specifically, we use a Graph Neural Network (GNN) model to learn effective node representations, which help aggregate vehicles with similar patterns into stable clusters. To the best of our knowledge, this is the first generalized learnable GNN-based model for vehicular ad hoc network clustering. In addition, our centralized approach makes full use of the ubiquitous presence of base stations and edge clouds: compared to distributed clustering approaches, a base station has a vantage view of the vehicle distribution within its coverage area. Specifically, eNodeB-assisted clustering can greatly reduce the control message overhead during cluster formation and offload to the eNodeB the complex computations required for machine learning algorithms. We evaluated the performance of the proposed clustering algorithms on the open-source highD dataset. The experimental results demonstrate that the average cluster lifetime and cluster efficiency of our GNN-based clustering algorithm outperform state-of-the-art baselines.
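As a rough illustration of the clustering idea (not the paper's actual model), the sketch below runs one round of mean-neighbor message passing to produce node embeddings that mix each vehicle's features with its neighbors', then greedily groups similar embeddings; embed, greedy_cluster, and the toy features are assumptions for illustration:

```python
import numpy as np

def embed(features, adjacency):
    """One mean-aggregation GNN layer: concatenate each node's features
    with the average of its neighbors' features."""
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    return np.concatenate([features, adjacency @ features / deg], axis=1)

def greedy_cluster(embeddings, radius):
    """Assign each node to the first centroid within `radius`, else open
    a new cluster; vehicles with similar embeddings share a cluster."""
    centroids, labels = [], []
    for e in embeddings:
        dists = [np.linalg.norm(e - c) for c in centroids]
        if dists and min(dists) <= radius:
            labels.append(int(np.argmin(dists)))
        else:
            centroids.append(e)
            labels.append(len(centroids) - 1)
    return labels

# Toy example: 4 vehicles with (position, speed) features, chain topology.
feats = np.array([[0.0, 30.0], [1.0, 31.0], [50.0, 10.0], [51.0, 11.0]])
adj = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
                [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
print(greedy_cluster(embed(feats, adj), radius=10.0))  # [0, 0, 1, 1]
```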
  2. Multi-spectral satellite images that remotely sense the Earth's surface at regular intervals are often contaminated by cloud occlusion. Remote sensing imagery captured via satellites, drones, and aircraft has successfully influenced a wide range of fields, such as monitoring vegetation health, tracking droughts, and weather forecasting. Researchers studying the Earth's surface are often hindered in gathering reliable observations by contaminated reflectance values that are sensitive to thin, thick, and cirrus clouds, as well as their shadows. In this study, we propose a deep learning network architecture, CloudNet, to restore cloud-occluded remote sensing imagery captured by the Landsat-8 satellite for both visible and non-visible spectral bands. We propose a deep neural network model, trained on a distributed storage cluster, that leverages historical trends within Landsat-8 imagery while complementing this analysis with high-resolution Sentinel-2 imagery. Our empirical benchmarks profile the efficiency of the CloudNet model over a range of cloud-occluded pixel fractions in the input image. We further compare CloudNet's performance with state-of-the-art deep learning approaches such as SpAGAN and ResNet. We propose a novel method, dynamic hierarchical transfer learning, to reduce computational resource requirements while training the model to the desired accuracy. Our model regenerates features of cloudy images with a high PSNR of 34.28 dB.
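For reference, the 34.28 dB figure is a peak signal-to-noise ratio; a standard generic PSNR computation over image arrays (not CloudNet's code) looks like this:

```python
import numpy as np

def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((np.asarray(reference) - np.asarray(reconstruction)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```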
  3. Edge devices provide inference on predictive tasks to many end-users. However, deploying deep neural networks that achieve state-of-the-art accuracy on these devices is infeasible due to edge resource constraints. Nevertheless, cloud-only processing, the de-facto standard, is also problematic, since uploading large amounts of data imposes severe communication bottlenecks. We propose a novel end-to-end hybrid learning framework that allows the edge to selectively query the cloud only on those hard examples that the cloud can classify correctly. Our framework optimizes over neural architectures and trains edge predictors and routing models so that the overall accuracy remains high while the overall latency is minimized. Training a hybrid learner is difficult because we lack annotations of hard edge examples. We introduce a novel proxy supervision in this context and show that our method adapts seamlessly and near-optimally across different latency regimes. On the ImageNet dataset, our proposed method, deployed on a micro-controller unit, exhibits a 25% reduction in latency compared to cloud-only processing while suffering no excess loss.
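A hedged sketch of the routing idea follows: the edge model answers when its confidence is high and forwards hard inputs to the cloud. The fixed threshold here stands in for the learned routing model described in the paper, and all names are illustrative:

```python
def route(edge_probs, cloud_predict, x, threshold=0.8):
    """Answer on the edge when confident; forward hard inputs to the cloud.

    edge_probs: class probabilities from the edge model for input x.
    cloud_predict: callable invoking the (remote) cloud model.
    """
    confidence = max(edge_probs)
    if confidence >= threshold:
        return edge_probs.index(confidence), "edge"
    return cloud_predict(x), "cloud"

# Example with a stub cloud model: low edge confidence triggers a cloud query.
print(route([0.95, 0.05], lambda x: 0, x=None))  # (0, 'edge')
print(route([0.55, 0.45], lambda x: 1, x=None))  # (1, 'cloud')
```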
  4. Autonomous vehicles (AVs) are expected to revolutionize transportation and significantly improve road safety. However, these benefits do not come without cost; AVs require large Deep-Learning (DL) models and powerful hardware platforms to operate reliably in real-time, consuming between several hundred watts and one kilowatt of power. This power consumption can dramatically reduce a vehicle's driving range and affect emissions. To address this problem, we propose SAGE: a methodology for selectively offloading the key energy-consuming modules of DL architectures to the cloud to optimize edge energy usage while meeting real-time latency constraints. Furthermore, we leverage Head Network Distillation (HND) to introduce efficient bottlenecks within the DL architecture in order to minimize the network overhead costs of offloading with almost no degradation in the model's performance. We evaluate SAGE using an Nvidia Jetson TX2 and an industry-standard Nvidia Drive PX2 as the AV edge devices, and demonstrate that our offloading strategy is practical for a wide range of DL models and internet connection bandwidths on 3G, 4G LTE, and WiFi technologies. Compared to edge-only computation, SAGE reduces energy consumption by an average of 36.13%, 47.07%, and 55.66% for an AV with one low-resolution camera, one high-resolution camera, and three high-resolution cameras, respectively. SAGE also reduces upload data size by up to 98.40% compared to direct camera offloading.
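The offloading decision can be pictured with a simple sketch; the quantities and the should_offload helper below are assumptions for illustration, not SAGE's actual methodology:

```python
def should_offload(local_energy_j, offload_energy_j, bottleneck_bytes,
                   bandwidth_bps, compute_time_s, latency_budget_s):
    """Offload the network tail only if the bottleneck tensor can be
    uploaded within the latency budget and doing so saves edge energy."""
    transfer_s = bottleneck_bytes * 8 / bandwidth_bps
    meets_deadline = compute_time_s + transfer_s <= latency_budget_s
    saves_energy = offload_energy_j < local_energy_j
    return meets_deadline and saves_energy

# Example: a 50 KB bottleneck over a 10 Mbps link, 100 ms latency budget.
print(should_offload(8.0, 3.0, 50_000, 10e6, 0.04, 0.1))  # True
```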
  5. Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) straggler problem—where clients lag due to data or (computing and network) resource heterogeneity, and (2) communication bottleneck—where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only one single dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under Non-i.i.d. training data. FedAT synergistically combines synchronous, intra-tier training and asynchronous, cross-tier training. By bridging the synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement. FedAT compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, which minimizes the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5×, compared to state-of-the-art FL methods. 
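As a minimal sketch of straggler-aware weighted aggregation in the spirit of FedAT (the inverse-update-count weighting is an assumption, not the paper's exact heuristic), slower tiers update less often and therefore receive larger weights so the global model stays balanced across tiers:

```python
import numpy as np

def aggregate(tier_models, tier_update_counts):
    """Average tier models, weighting slower (less frequently updating)
    tiers more heavily to counteract bias toward fast tiers."""
    inv = 1.0 / np.asarray(tier_update_counts, dtype=float)
    weights = inv / inv.sum()
    return sum(w * m for w, m in zip(weights, tier_models))

# Example: the slow tier (5 updates) outweighs the fast one (20 updates).
fast, slow = np.array([1.0, 1.0]), np.array([3.0, 3.0])
print(aggregate([fast, slow], [20, 5]))  # [2.6 2.6]
```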