

This content will become publicly available on February 22, 2025

Title: Fast and scalable all-optical network architecture for distributed deep learning

With the ever-increasing size of training models and datasets, network communication has emerged as a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture. ODDL utilizes a fast yet scalable all-optical network architecture to accelerate distributed training. One of the key features of the architecture is its flow-based transmit scheduling with fast reconfiguration. This allows ODDL to dynamically allocate a dedicated optical path to each traffic flow, resulting in low network latency and high network utilization. Additionally, ODDL provides physically isolated and tailored network resources for training tasks by reconfiguring the optical switch using LCoS-WSS technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate and fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Our evaluation on real-world traces showcases ODDL’s remarkable performance. When implemented with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6× and 1.7× compared to conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, and our experiments show that ODDL achieves training time comparable to that of an ideal electrical switching network.

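The abstract does not spell out the scheduling algorithm, but the flow-based idea — each traffic flow gets a dedicated lightpath, and every circuit setup pays a fixed reconfiguration delay — can be illustrated with a toy simulator. Everything below is an illustrative assumption, not the paper's design: the names (Flow, schedule_flows), the one-transceiver-per-node simplification, and the 100 Gbps / 20 µs parameter values.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    src: int
    dst: int
    size_bits: float
    release: float  # time (s) at which the flow becomes ready to send

def schedule_flows(flows, bandwidth_bps=100e9, reconfig_s=20e-6):
    """Give each flow a dedicated lightpath: a flow starts once the tunable
    transceivers at both endpoints are free, pays a fixed optical
    reconfiguration delay, then transmits at full line rate."""
    busy_until = {}          # node -> time its transceiver frees up
    finished = []
    for f in sorted(flows, key=lambda f: f.release):
        start = max(f.release,
                    busy_until.get(f.src, 0.0),
                    busy_until.get(f.dst, 0.0)) + reconfig_s
        end = start + f.size_bits / bandwidth_bps
        busy_until[f.src] = busy_until[f.dst] = end
        finished.append((f, end))
    return finished

# Two endpoint-disjoint flows run concurrently; the third waits for node 0.
demo = [Flow(0, 1, 8e9, 0.0), Flow(2, 3, 8e9, 0.0), Flow(0, 2, 4e9, 0.0)]
for f, t in schedule_flows(demo):
    print(f"{f.src}->{f.dst} finished at {t * 1e3:.3f} ms")
```

The toy model captures the trade-off a circuit-switched optical fabric makes: flows on disjoint endpoints proceed in parallel at full bandwidth, while contending flows queue behind the reconfiguration and transmission of earlier circuits.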
 
NSF-PAR ID: 10492071
Author(s) / Creator(s): ; ; ; ; ;
Publisher / Repository: Optical Society of America
Date Published:
Journal Name: Journal of Optical Communications and Networking
Volume: 16
Issue: 3
ISSN: 1943-0620; JOCNBB
Format: Medium: X; Size: Article No. 342
Sponsoring Org: National Science Foundation
More Like this
  1. Optical neural networks (ONNs), implemented on an array of cascaded Mach–Zehnder interferometers (MZIs), have recently been proposed as a possible replacement for conventional deep learning hardware. They potentially offer higher energy efficiency and computational speed than their electronic counterparts. By utilizing tunable phase shifters, one can adjust the output of each MZI to enable emulation of arbitrary matrix–vector multiplication. These phase shifters are central to the programmability of ONNs, but they require a large footprint and are relatively slow. Here we propose an ONN architecture that utilizes parity–time (PT) symmetric couplers as its building blocks. Instead of modulating phase, gain–loss contrasts across the array are adjusted as a means to train the network. We demonstrate that PT-symmetric ONNs (PT-ONNs) are adequately expressive by performing the digit-recognition task on the Modified National Institute of Standards and Technology dataset. Compared to conventional ONNs, the PT-ONN achieves comparable accuracy (67% versus 71%) while circumventing the problems associated with changing phase. Our approach may lead to new and alternative avenues for fast training in chip-scale ONNs.

     
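A small numerical sketch of the building block described in item 1 above: a two-port coupler whose trainable knob is the gain–loss contrast γ rather than a phase. The Hamiltonian below is the standard textbook PT-symmetric coupler model, used here purely as an assumption; the paper's actual device parameters and mesh layout may differ, and all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def pt_coupler(gamma, kappa=1.0, length=1.0):
    """2x2 transfer matrix of a PT-symmetric coupler: gain +i*gamma in one
    waveguide, loss -i*gamma in the other, evanescent coupling kappa."""
    H = np.array([[1j * gamma, kappa],
                  [kappa, -1j * gamma]])
    return expm(-1j * H * length)

def mesh_forward(x, gammas):
    """Propagate a field vector through cascaded couplers on alternating
    neighbor pairs (a rectangular-mesh-style layout, assumed here)."""
    x = x.astype(complex)
    for layer, row in enumerate(gammas):    # one gamma per coupler per layer
        offset = layer % 2                  # alternate even/odd channel pairs
        for k, g in enumerate(row):
            i = offset + 2 * k
            x[i:i + 2] = pt_coupler(g) @ x[i:i + 2]
    return np.abs(x) ** 2                   # photodetected intensities

# Four channels, two coupler layers; training would adjust the gammas.
print(mesh_forward(np.array([1.0, 0.0, 0.0, 0.0]), [[0.3, -0.2], [0.1]]))
```

Note that, unlike a lossless MZI mesh, the output power here is not conserved; the gain–loss contrast is exactly what makes the array trainable without phase modulation.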
  2. Bacteria identification can be a time-consuming process. Machine learning algorithms that use deep convolutional neural networks (CNNs) provide a promising alternative. Here, we present a deep learning-based approach paired with Raman spectroscopy to rapidly and accurately identify a bacterial class. We propose a simple 4-layer CNN architecture and use a 30-class bacteria isolate dataset for training and testing. We achieve an identification accuracy of around 86% with identification speeds close to real time. This optical/biological detection method is promising for applications in the detection of microbes in liquid biopsies and concentrated environmental liquid samples, where fast and accurate detection is crucial. This study uses a recently published dataset of Raman spectra from bacteria samples and an improved CNN model built with TensorFlow. Results show improved identification accuracy and reduced network complexity.
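For concreteness, a hedged Keras sketch of a simple four-convolutional-layer 1D CNN for 30-class spectra, in the spirit of item 2 above. The layer widths and the assumed 1000-sample spectrum length are illustrative choices, not the authors' exact architecture.

```python
import tensorflow as tf

def build_model(spectrum_len=1000, n_classes=30):
    """Four stacked Conv1D blocks over a raw Raman spectrum, then a
    global-pooling softmax classifier head."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(spectrum_len, 1)),
        tf.keras.layers.Conv1D(16, 5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(32, 5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

A network this small is cheap to run, which is consistent with the near-real-time identification speeds the abstract reports.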
  3. Training a large-scale deep neural network on a single machine becomes increasingly difficult as network models grow in complexity. Distributed training provides an efficient solution, but Byzantine attacks may occur on participating workers: they may be compromised or suffer hardware failures. If they upload poisonous gradients, the training becomes unstable or may even converge to a saddle point. In this paper, we propose FABA, a Fast Aggregation algorithm against Byzantine Attacks, which removes outliers from the uploaded gradients and obtains gradients that are close to the true gradients. We show the convergence of our algorithm. Experiments demonstrate that our algorithm achieves performance similar to the non-Byzantine case and higher efficiency than previous algorithms.

     
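A minimal NumPy sketch of the aggregation rule described in item 3 above: repeatedly discard the uploaded gradient farthest from the current mean, then average the survivors. The function name and the assumed attacker count n_byzantine are illustrative, and this is a sketch of the outlier-filtering idea rather than a faithful reproduction of FABA's exact procedure.

```python
import numpy as np

def faba_aggregate(grads, n_byzantine):
    """Average worker gradients after iteratively dropping outliers.

    grads: list of 1-D gradient vectors, one per worker.
    n_byzantine: assumed number of poisoned uploads to remove."""
    kept = [np.asarray(g, dtype=float) for g in grads]
    for _ in range(n_byzantine):
        mean = np.mean(kept, axis=0)
        dists = [np.linalg.norm(g - mean) for g in kept]
        kept.pop(int(np.argmax(dists)))   # drop the farthest-from-mean upload
    return np.mean(kept, axis=0)

# Seven honest workers near the true gradient, plus two poisoned uploads.
rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.1, size=4) for _ in range(7)]
poisoned = [np.full(4, 50.0), np.full(4, -40.0)]
print(faba_aggregate(honest + poisoned, n_byzantine=2))  # ~= all-ones vector
```

In the example, both poisoned vectors are eliminated in two passes and the aggregate lands close to the honest workers' true gradient.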
  4. There are increasing requirements for data center interconnection (DCI) services, which use fiber to connect DCs distributed across a metro area and quickly establish high-capacity optical paths between cloud services, mobile edge computing, and users. In such networks, coherent transceivers with various optical frequency ranges, modulators, and modulation formats installed at each connection point must be used to meet service requirements such as fast-varying traffic requests between user computing resources. This requires technology and architectures that enable users and DCI operators to cooperate to achieve fast provisioning of WDM links and flexible route switching in a short time, independent of the transceivers' implementations and characteristics. We propose an approach to estimate the end-to-end (EtE) generalized signal-to-noise ratio (GSNR) accurately and quickly, not by measuring the GSNR at the operational route and wavelength of the EtE optical path but by simply applying a quality-of-transmission probe channel link by link, at a wavelength and modulation format convenient for measurement. Assuming connections between transceivers with various frequency ranges, modulators, and modulation formats, we propose a device software architecture in which the DCI operator optimizes the transmission mode between user transceivers with high accuracy using only common parameters such as the bit error rate. In this paper, we first implement software libraries for fast WDM provisioning and experimentally build different routes to verify the accuracy of this approach. For the operational EtE GSNR measurements, the accuracy estimated from the sum of the per-link measurements was 0.6 dB, and the wavelength-dependent error was about 0.2 dB. Then, using field fibers deployed in the NSF COSMOS testbed, a Linux-based transmission device software architecture, and transceivers with different optical frequency ranges, modulators, and modulation formats, fast WDM provisioning of an optical path was completed within 6 min.

     
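The link-by-link estimate in item 4 above is consistent with the common model in which inverse GSNRs add in linear units, 1/GSNR_e2e = Σᵢ 1/GSNRᵢ. The snippet below applies that model (an assumption here, not necessarily the paper's exact estimator) to per-link probe measurements expressed in dB.

```python
import math

def db_to_lin(x_db):
    return 10 ** (x_db / 10)

def lin_to_db(x):
    return 10 * math.log10(x)

def e2e_gsnr_db(link_gsnrs_db):
    """Combine per-link GSNR measurements (dB) into an end-to-end estimate,
    assuming inverse GSNRs accumulate additively in linear units."""
    inv_total = sum(1.0 / db_to_lin(g) for g in link_gsnrs_db)
    return lin_to_db(1.0 / inv_total)

# e.g. three links measured by the probe channel at 24, 22, and 25 dB
print(f"EtE GSNR ~ {e2e_gsnr_db([24.0, 22.0, 25.0]):.2f} dB")  # ~18.7 dB
```

The combined value is always below the worst single link, which is why a per-link probe at a convenient wavelength can bound the end-to-end quality without occupying the operational route.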
  5. The edge computing paradigm allows computationally intensive tasks to be offloaded from small devices to nearby, more powerful servers via an edge network. The intersection between edge computing and Machine Learning (ML) in general, and deep learning in particular, has brought to light several advantages for network operators, from automating management tasks to gaining additional insights into their networks. Most existing approaches that use ML to drive routing and traffic control decisions are valuable but rarely focus on challenged networks, which are characterized by continually varying network conditions and high volumes of traffic generated by edge devices. In particular, recently proposed distributed ML-based architectures require either a long synchronization phase or a training phase that is unsustainable for challenged networks. In this paper, we fill this knowledge gap with Blaster, a federated architecture for routing packets within a distributed edge network, designed to improve application performance and enable scalability of data-intensive applications. We also propose a novel path selection model that uses Long Short-Term Memory (LSTM) to predict the optimal route. Finally, we present initial results obtained by testing our approach via simulations and with a prototype deployed over the GENI testbed. By leveraging a Federated Learning (FL) model, our approach shows that we can optimize the communication between SDN controllers, preserving bandwidth for the data traffic.
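A hedged Keras sketch of the LSTM path-selection idea in item 5 above: an LSTM reads a short window of recent per-path metrics and scores the candidate routes. The window length, feature count, layer sizes, and feature layout are all assumptions, not Blaster's actual model.

```python
import tensorflow as tf

def build_path_selector(window=10, n_features=8, n_paths=4):
    """Score n_paths candidate routes from a sliding window of recent
    network measurements (e.g. per-path latency, loss, utilization)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_features)),
        tf.keras.layers.LSTM(32),                              # temporal state
        tf.keras.layers.Dense(n_paths, activation="softmax"),  # route scores
    ])

model = build_path_selector()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

In a federated setting, each edge site would train such a model on its local traffic and share only the model updates, which is how an architecture like Blaster can avoid both long synchronization phases and raw-data exchange between SDN controllers.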