skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 10:00 PM ET on Thursday, February 12 until 1:00 AM ET on Friday, February 13 due to maintenance. We apologize for the inconvenience.


Title: Fast and scalable all-optical network architecture for distributed deep learning
With the ever-increasing size of training models and datasets, network communication has emerged as a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture. ODDL utilizes a fast yet scalable all-optical network architecture to accelerate distributed training. One of the key features of the architecture is its flow-based transmit scheduling with fast reconfiguration. This allows ODDL to allocate dedicated optical paths for each traffic stream dynamically, resulting in low network latency and high network utilization. Additionally, ODDL provides physically isolated and tailored network resources for training tasks by reconfiguring the optical switch using LCoS-WSS technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate and fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Our evaluation on real-world traces showcases ODDL’s remarkable performance. When implemented with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6× and 1.7× compared to conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, and our experiments show that ODDL can achieve comparable training time compared to that of anidealelectrical switching network.  more » « less
Award ID(s):
1907142
PAR ID:
10539248
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
IEEE Xplore
Date Published:
Journal Name:
Journal of Optical Communications and Networking
Volume:
16
Issue:
3
ISSN:
1943-0620
Page Range / eLocation ID:
342
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many times, training a large scale deep learning neural network on a single machine becomes more and more difficult for a complex network model. Distributed training provides an efficient solution, but Byzantine attacks may occur on participating workers. They may be compromised or suffer from hardware failures. If they upload poisonous gradients, the training will become unstable or even converge to a saddle point. In this paper, we propose FABA, a Fast Aggregation algorithm against Byzantine Attacks, which removes the outliers in the uploaded gradients and obtains gradients that are close to the true gradients. We show the convergence of our algorithm. The experiments demonstrate that our algorithm can achieve similar performance to non-Byzantine case and higher efficiency as compared to previous algorithms. 
    more » « less
  2. null (Ed.)
    Bacteria identification can be a time-consuming process. Machine learning algorithms that use deep convolutional neural networks (CNNs) provide a promising alternative. Here, we present a deep learning based approach paired with Raman spectroscopy to rapidly and accurately detect the identity of a bacteria class. We propose a simple 4-layer CNN architecture and use a 30-class bacteria isolate dataset for training and testing. We achieve an identification accuracy of around 86% with identification speeds close to real-time. This optical/biological detection method is promising for applications in the detection of microbes in liquid biopsies and concentrated environmental liquid samples, where fast and accurate detection is crucial. This study uses a recently published dataset of Raman spectra from bacteria samples and an improved CNN model built with TensorFlow. Results show improved identification accuracy and reduced network complexity. 
    more » « less
  3. Deep reinforcement learning (DRL) has gained immense success in many applications, including gaming AI, robotics, and system scheduling. Distributed algorithms and architectures have been vastly proposed (e.g., actor-learner architecture) to accelerate DRL training with large-scale server-based clusters. However, training on-policy algorithms with the actor-learner architecture unavoidably induces resource wasting due to synchronization between learners and actors, thus resulting in significantly extra billing. As a promising alternative, serverless computing naturally fits on-policy synchronization and alleviates resource wasting in distributed DRL training with pay-as-you-go pricing. Yet, none has leveraged serverless computing to facilitate DRL training. This paper proposes MinionsRL, the first serverless distributed DRL training framework that aims to accelerate DRL training- and cost-efficiency with dynamic actor scaling. We prototype MinionsRL on top of Microsoft Azure Container Instances and evaluate it with popular DRL tasks from OpenAI Gym. Extensive experiments show that MinionsRL reduces total training time by up to 52% and training cost by 86% compared to latest solutions. 
    more » « less
  4. The edge computing paradigm allows computationally intensive tasks to be offloaded from small devices to nearby (more) powerful servers, via an edge network. The intersection between such edge computing paradigm and Machine Learning (ML), in general, and deep learning in particular, has brought to light several advantages for network operators: from automating management tasks, to gain additional insights on their networks. Most of the existing approaches that use ML to drive routing and traffic control decisions are valuable but rarely focus on challenged networks, that are characterized by continually varying network conditions and the high volume of traffic generated by edge devices. In particular, recently proposed distributed ML-based architectures require either a long synchronization phase or a training phase that is unsustainable for challenged networks. In this paper, we fill this knowledge gap with Blaster, a federated architecture for routing packets within a distributed edge network, to improve the application's performance and allow scalability of data-intensive applications. We also propose a novel path selection model that uses Long Short Term Memory (LSTM) to predict the optimal route. Finally, we present some initial results obtained by testing our approach via simulations and with a prototype deployed over the GENI testbed. By leveraging a Federated Learning (FL) model, our approach shows that we can optimize the communication between SDN controllers, preserving bandwidth for the data traffic. 
    more » « less
  5. Deep reinforcement learning (DRL) has demonstrated significant potential in various applications, including gaming AI, robotics, and system scheduling. DRL algorithms produce, sample, and learn from training data online through a trial-and-error process, demanding considerable time and computational resources. To address this, distributed DRL algorithms and paradigms have been developed to expedite training using extensive resources. Through carefully designed experiments, we are the first to observe that strategically increasing the actor-environment interactions by spawning more concurrent actors at certain training rounds within ephemeral time frames can significantly enhance training efficiency. Yet, current distributed DRL solutions, which are predominantly server-based (or serverful), fail to capitalize on these opportunities due to their long startup times, limited adaptability, and cumbersome scalability. This paper proposesNitro, a generic training engine for distributed DRL algorithms that enforces timely and effective boosting with concurrent actors instantaneously spawned by serverless computing. With serverless functions,Nitroadjusts data sampling strategies dynamically according to the DRL training demands.Nitroseizes the opportunity of real-time boosting by accurately and swiftly detecting an empirical metric. To achieve cost efficiency, we design a heuristic actor scaling algorithm to guideNitrofor cost-aware boosting budget allocation. We integrateNitrowith state-of-the-art DRL algorithms and frameworks and evaluate them on AWS EC2 and Lambda. Experiments with Mujoco and Atari benchmarks show thatNitroimproves the final rewards (i.e., training quality) by up to 6× and reduces training costs by up to 42%. 
    more » « less