skip to main content


Title: Scaling Deep-Learning Inference with Chiplet-based Architecture and Photonic Interconnects
Chiplet-based architectures have been proposed to scale computing systems for deep neural networks (DNNs). Prior work has shown that for the chiplet-based DNN accelerators, the electrical network connecting the chiplets poses a major challenge to system performance, energy consumption, and scalability. Some emerging interconnect technologies such as silicon photonics can potentially overcome the challenges facing electrical interconnects as photonic interconnects provide high bandwidth density, superior energy efficiency, and ease of implementing broadcast and multicast operations that are prevalent in DNN inference. In this paper, we propose a chiplet-based architecture named SPRINT for DNN inference. SPRINT uses a global buffer to simplify the data transmission between storage and computation, and includes two novel designs: (1) a reconfigurable photonic network that can support diverse communications in DNN inference with minimal implementation cost, and (2) a customized dataflow that exploits the ease of broadcast and multicast feature of photonic interconnects to support highly parallel DNN computations. Simulation studies using ResNet50 DNN model show that SPRINT achieves 46% and 61% execution time and energy consumption reduction, respectively, as compared to other state-of-the-art chiplet-based architectures with electrical or photonic interconnects.  more » « less
Award ID(s):
1901165
NSF-PAR ID:
10330058
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
58th ACM/IEEE Design Automation Conference (DAC)
Page Range / eLocation ID:
931 to 936
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In pursuit of higher inference accuracy, deep neural network (DNN) models have significantly increased in complexity and size. To overcome the consequent computational challenges, scalable chiplet-based accelerators have been proposed. However, data communication using metallic-based interconnects in these chiplet-based DNN accelerators is becoming a primary obstacle to performance, energy efficiency, and scalability. The photonic interconnects can provide adequate data communication support due to some superior properties like low latency, high bandwidth and energy efficiency, and ease of broadcast communication. In this paper, we propose SPACX: a Silicon Photonics-based Chiplet ACcelerator for DNN inference applications. Specifically, SPACX includes a photonic network design that enables seamless single-chiplet and cross-chiplet broadcast communications, and a tailored dataflow that promotes data broadcast and maximizes parallelism. Furthermore, we explore the broadcast granularities of the photonic network and implications on system performance and energy efficiency. A flexible bandwidth allocation scheme is also proposed to dynamically adjust communication bandwidths for different types of data. Simulation results using several DNN models show that SPACX can achieve 78% and 75% reduction in execution time and energy, respectively, as compared to other state-of-the-art chiplet-based DNN accelerators. 
    more » « less
  2. Abstract

    As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To improve scalability, we propose a digital optical neural network (DONN) with intralayer optical interconnects and reconfigurable input values. The path-length-independence of optical energy consumption enables information locality between a transmitter and a large number of arbitrarily arranged receivers, which allows greater flexibility in architecture design to circumvent scaling limitations. In a proof-of-concept experiment, we demonstrate optical multicast in the classification of 500 MNIST images with a 3-layer, fully-connected network. We also analyze the energy consumption of the DONN and find that digital optical data transfer is beneficial over electronics when the spacing of computational units is on the order of$$>10\,\upmu $$>10μm.

     
    more » « less
  3. Today’s Deep Neural Network (DNN) inference systems contain hundreds of billions of parameters, resulting in significant latency and energy overheads during inference due to frequent data transfers between compute and memory units. Processing-in-Memory (PiM) has emerged as a viable solution to tackle this problem by avoiding the expensive data movement. PiM approaches based on electrical devices suffer from throughput and energy efficiency issues. In contrast, Optically-addressed Phase Change Memory (OPCM) operates with light and achieves much higher throughput and energy efficiency compared to its electrical counterparts. This paper introduces a system-level design that takes the OPCM programming overhead into consideration, and identifies that the programming cost dominates the DNN inference on OPCM-based PiM architectures. We explore the design space of this system and identify the most energy-efficient OPCM array size and batch size. We propose a novel thresholding and reordering technique on the weight blocks to further reduce the programming overhead. Combining these optimizations, our approach achieves up to 65.2x higher throughput than existing photonic accelerators for practical DNN workloads. 
    more » « less
  4. An optical circuit-switched network core has the potential to overcome the inherent challenges of a conventional electrical packet-switched core of today's compute clusters. As optical circuit switches (OCS) directly handle the photon beams without any optical-electrical-optical (O/E/O) conversion and packet processing, OCS-based network cores have the following desirable properties: a) agnostic to data-rate, b) negligible/zero power consumption, c) no need of transceivers, d) negligible forwarding latency, and e) no need for frequent upgrade. Unfortunately, OCS can only provide point-to-point (unicast) circuits. They do not have built-in support for one-to-many (multicast) communication, yet multicast is fundamental to a plethora of data-intensive applications running on compute clusters nowadays. In this paper, we propose Shufflecast, a novel optical network architecture for next-generation compute clusters that can support high-performance multicast satisfying all the properties of an OCS-based network core. Shufflecast leverages small fanout, inexpensive, passive optical splitters to connect the Top-of-rack (ToR) switch ports, ensuring data-rate agnostic, low-power, physical-layer multicast. We thoroughly analyze Shufflecast's highly scalable data plane, light-weight control plane, and graceful failure handling. Further, we implement a complete prototype of Shufflecast in our testbed and extensively evaluate the network. Shufflecast is more power-efficient than the state-of-the-art multicast mechanisms. Also, Shufflecast is more cost-efficient than a conventional packet-switched network. By adding Shufflecast alongside an OCS-based unicast network, an all-optical network core with the aforementioned desirable properties supporting both unicast and multicast can be realized. 
    more » « less
  5. Photonic network-on-chip (PNoC) architectures employ photonic links with dense wavelength-division multiplexing (DWDM) to enable high throughput on-chip transfers. Unfortunately, increasing the DWDM degree (i.e., using a larger number of wavelengths) to achieve a higher aggregated data rate in photonic links and, hence, higher throughput in PNoCs, requires sophisticated and costly laser sources along with extra photonic hardware. This extra hardware can introduce undesired noise to the photonic link and increase the bit error rate (BER), power, and area consumption of PNoCs. To mitigate these issues, the use of 4-pulse amplitude modulation (4-PAM) signaling, instead of the conventional on-off keying (OOK) signaling, can halve the wavelength signals utilized in photonic links for achieving the target aggregate data rate while reducing the overhead of crosstalk noise, BER, and photonic hardware. There are various designs of 4-PAM modulators reported in the literature. For example, the signal superposition (SS)–, electrical digital-to-analog converter (EDAC)–, and optical digital-to-analog converter (ODAC)–based designs of 4-PAM modulators have been reported. However, it is yet to be explored how these SS-, EDAC-, and ODAC-based 4-PAM modulators can be utilized to design DWDM-based photonic links and PNoC architectures. In this article, we provide a systematic analysis of the SS, EDAC, and ODAC types of 4-PAM modulators from prior work with regards to their applicability and utilization overheads. We then present a heuristic-based search method to employ these 4-PAM modulators for designing DWDM-based SS, EDAC, and ODAC types of 4-PAM photonic links with two different design goals: (i) to attain the desired BER of 10 -9 at the expense of higher optical power and lower aggregate data rate and (ii) to attain maximum aggregate data rate with the desired BER of 10 -9 at the expense of longer packet transfer latency. We then employ our designed 4-PAM SS–, 4-PAM EDAC–, 4-PAM ODAC–, and conventional OOK modulator–based photonic links to constitute corresponding variants of the well-known CLOS and SWIFT PNoC architectures. We eventually compare our designed SS-, EDAC-, and ODAC-based variants of 4-PAM links and PNoCs with the conventional OOK links and PNoCs in terms of performance and energy efficiency in the presence of inter-channel crosstalk. From our link-level and PNoC-level evaluation, we have observed that the 4-PAM EDAC–based variants of photonic links and PNoCs exhibit better performance and energy efficiency compared with the OOK-, 4-PAM SS–, and 4-PAM ODAC–based links and PNoCs. 
    more » « less