- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name: 58th ACM/IEEE Design Automation Conference (DAC)
- Page Range or eLocation-ID: 931 to 936
- Sponsoring Org: National Science Foundation
More Like this
In pursuit of higher inference accuracy, deep neural network (DNN) models have significantly increased in complexity and size. To overcome the consequent computational challenges, scalable chiplet-based accelerators have been proposed. However, data communication over metallic interconnects in these chiplet-based DNN accelerators is becoming a primary obstacle to performance, energy efficiency, and scalability. Photonic interconnects can provide adequate data communication support owing to properties such as low latency, high bandwidth and energy efficiency, and ease of broadcast communication. In this paper, we propose SPACX: a Silicon Photonics-based Chiplet ACcelerator for DNN inference applications. Specifically, SPACX includes a photonic network design that enables seamless single-chiplet and cross-chiplet broadcast communications, and a tailored dataflow that promotes data broadcast and maximizes parallelism. Furthermore, we explore the broadcast granularities of the photonic network and their implications for system performance and energy efficiency. A flexible bandwidth allocation scheme is also proposed to dynamically adjust communication bandwidths for different types of data. Simulation results using several DNN models show that SPACX can achieve 78% and 75% reductions in execution time and energy, respectively, compared to other state-of-the-art chiplet-based DNN accelerators.
As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To improve scalability, we propose a digital optical neural network (DONN) with intralayer optical interconnects and reconfigurable input values. The path-length-independence of optical energy consumption enables information locality between a transmitter and a large number of arbitrarily arranged receivers, which allows greater flexibility in architecture design to circumvent scaling limitations. In a proof-of-concept experiment, we demonstrate optical multicast in the classification of 500 MNIST images with a 3-layer, fully-connected network. We also analyze the energy consumption of the DONN and find that digital optical data transfer is beneficial over electronics when the spacing of computational units is on the order of $>10\,\mu\text{m}$.
Photonic network-on-chip (PNoC) architectures employ photonic links with dense wavelength-division multiplexing (DWDM) to enable high throughput on-chip transfers. Unfortunately, increasing the DWDM degree (i.e., using a larger number of wavelengths) to achieve a higher aggregated data rate in photonic links and, hence, higher throughput in PNoCs, requires sophisticated and costly laser sources along with extra photonic hardware. This extra hardware can introduce undesired noise to the photonic link and increase the bit error rate (BER), power, and area consumption of PNoCs. To mitigate these issues, the use of 4-pulse amplitude modulation (4-PAM) signaling, instead of the conventional on-off keying (OOK) signaling, can halve the wavelength signals utilized in photonic links for achieving the target aggregate data rate while reducing the overhead of crosstalk noise, BER, and photonic hardware. There are various designs of 4-PAM modulators reported in the literature. For example, the signal superposition (SS)–, electrical digital-to-analog converter (EDAC)–, and optical digital-to-analog converter (ODAC)–based designs of 4-PAM modulators have been reported. However, it is yet to be explored how these SS-, EDAC-, and ODAC-based 4-PAM modulators can be utilized to design DWDM-based photonic links and PNoC architectures. In this article, we provide a systematic analysis of the SS, EDAC, and …
Shufflecast: An Optical, Data-Rate Agnostic and Low-Power Multicast Architecture for Next-Generation Compute Clusters
An optical circuit-switched network core has the potential to overcome the inherent challenges of a conventional electrical packet-switched core of today's compute clusters. As optical circuit switches (OCS) directly handle the photon beams without any optical-electrical-optical (O/E/O) conversion and packet processing, OCS-based network cores have the following desirable properties: a) agnostic to data-rate, b) negligible/zero power consumption, c) no need of transceivers, d) negligible forwarding latency, and e) no need for frequent upgrade. Unfortunately, OCS can only provide point-to-point (unicast) circuits. They do not have built-in support for one-to-many (multicast) communication, yet multicast is fundamental to a plethora of data-intensive applications running on compute clusters nowadays. In this paper, we propose Shufflecast, a novel optical network architecture for next-generation compute clusters that can support high-performance multicast satisfying all the properties of an OCS-based network core. Shufflecast leverages small fanout, inexpensive, passive optical splitters to connect the Top-of-rack (ToR) switch ports, ensuring data-rate agnostic, low-power, physical-layer multicast. We thoroughly analyze Shufflecast's highly scalable data plane, light-weight control plane, and graceful failure handling. Further, we implement a complete prototype of Shufflecast in our testbed and extensively evaluate the network. Shufflecast is more power-efficient than the state-of-the-art multicast mechanisms. Also, Shufflecast is more …
UPTPU: Improving Energy Efficiency of a Tensor Processing Unit through Underutilization Based Power-Gating
The AI boom is bringing a plethora of domain-specific architectures for neural network computations. Google's Tensor Processing Unit (TPU), a deep neural network (DNN) accelerator, has replaced the CPUs/GPUs in its data centers, claiming an inference rate more than 15X higher. However, the unprecedented growth in DNN workloads driven by the widespread use of AI services points to increasing energy consumption in TPU-based data centers. In this work, we parametrize the extreme hardware underutilization in the TPU systolic array and propose UPTPU: an intelligent, dataflow-adaptive power-gating paradigm that improves TPU energy efficiency by 3.5X to 6.5X across different input batch sizes.