In pursuit of higher inference accuracy, deep neural network (DNN) models have significantly increased in complexity and size. To overcome the consequent computational challenges, scalable chiplet-based accelerators have been proposed. However, data communication over metallic interconnects in these chiplet-based DNN accelerators is becoming a primary obstacle to performance, energy efficiency, and scalability. Photonic interconnects can provide the needed communication support owing to their low latency, high bandwidth, high energy efficiency, and ease of broadcast communication. In this paper, we propose SPACX: a Silicon Photonics-based Chiplet ACcelerator for DNN inference applications. Specifically, SPACX includes a photonic network design that enables seamless single-chiplet and cross-chiplet broadcast communications, and a tailored dataflow that promotes data broadcast and maximizes parallelism. Furthermore, we explore the broadcast granularities of the photonic network and their implications for system performance and energy efficiency. A flexible bandwidth allocation scheme is also proposed to dynamically adjust communication bandwidths for different types of data. Simulation results using several DNN models show that SPACX reduces execution time by 78% and energy by 75% compared to other state-of-the-art chiplet-based DNN accelerators.
Opportunities for Cross-Layer Design in High-Performance Computing Systems with Integrated Silicon Photonic Networks
With the ever-growing complexity of high-performance computing (HPC) systems to satisfy emerging application requirements (e.g., the high memory bandwidth required by machine learning applications), the performance bottleneck in such systems has moved from being computation-centric to being more communication-centric. Silicon photonic interconnection networks have been proposed to address the aggressive communication requirements in HPC systems and to realize higher bandwidth, lower latency, and better energy efficiency. There have been many successful efforts on developing silicon photonic devices, integrated circuits, and architectures for HPC systems. Moreover, many efforts have been made to address and mitigate the impact of different challenges (e.g., fabrication process and thermal variations) in silicon photonic interconnects. However, most of these efforts have focused on only a single design layer in the system design space (e.g., device, circuit, or architecture level). Therefore, there is often a gap between what a design technique can improve in one layer and what it might impair in another. In this paper, we discuss the promise of cross-layer design methodologies for HPC systems integrating silicon photonic interconnects. In particular, we discuss how cross-layer design solutions, based on cooperatively designing and exchanging design objectives among different system design layers, can help achieve the best possible performance when integrating silicon photonics into HPC systems.
- Award ID(s): 1813370
- PAR ID: 10190938
- Date Published:
- Journal Name: IEEE/ACM Design, Automation and Test in Europe (DATE) Conference and Exhibition
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Chiplet-based architectures have been proposed to scale computing systems for deep neural networks (DNNs). Prior work has shown that for chiplet-based DNN accelerators, the electrical network connecting the chiplets poses a major challenge to system performance, energy consumption, and scalability. Emerging interconnect technologies such as silicon photonics can potentially overcome the challenges facing electrical interconnects, as photonic interconnects provide high bandwidth density, superior energy efficiency, and ease of implementing the broadcast and multicast operations that are prevalent in DNN inference. In this paper, we propose a chiplet-based architecture named SPRINT for DNN inference. SPRINT uses a global buffer to simplify the data transmission between storage and computation, and includes two novel designs: (1) a reconfigurable photonic network that can support diverse communications in DNN inference with minimal implementation cost, and (2) a customized dataflow that exploits the ease of broadcast and multicast in photonic interconnects to support highly parallel DNN computations. Simulation studies using the ResNet50 DNN model show that SPRINT reduces execution time by 46% and energy consumption by 61% compared to other state-of-the-art chiplet-based architectures with electrical or photonic interconnects.
- Microwave photonics uses light to carry and process microwave signals over a photonic link. However, light can instead be used as a stimulus to microwave devices that directly control microwave signals. Such optically controlled amplitude and phase-shift switches are investigated for use in reconfigurable microwave systems, but they suffer from a large footprint, the high optical power level required for switching, a lack of scalability, and complex integration requirements, restricting their implementation in practical microwave systems. Here, we report Monolithic Optically Reconfigurable Integrated Microwave Switches (MORIMSs) built on a CMOS-compatible silicon photonic chip that address all of these stringent requirements. Our scalable micrometer-scale switches provide higher switching efficiency and require optical power orders of magnitude lower than the state of the art. This work also opens a new research direction on silicon photonic platforms integrating microwave circuitry, and it has important implications for reconfigurable microwave and millimeter-wave devices for future communication networks.
- Programmable photonic integrated circuits are expected to play an increasingly important role in enabling high-bandwidth optical interconnects and large-scale in-memory computing as needed to support the rise of artificial intelligence and machine learning technology. To that end, chalcogenide-based non-volatile phase-change materials (PCMs) present a promising solution due to zero static power. However, high switching voltage and a small number of operating levels present serious roadblocks to the widespread adoption of PCM-programmable units. Here, we demonstrate an electrically programmable wide bandgap Sb₂S₃-clad silicon ring resonator using a silicon microheater at a complementary-metal–oxide–semiconductor compatible voltage of <3 V. Our device shows a low switching energy of 35.33 nJ (0.48 mJ) for amorphization (crystallization) and reversible phase transitions with high endurance (>2000 switching events) near 1550 nm. Combining a volatile thermo-optic effect with non-volatile PCMs, we demonstrate 7-bit (127 levels) operation with excellent repeatability and reduced power consumption. Our demonstration of low-voltage and low-energy operation, combined with the hybrid volatile–nonvolatile approach, marks a significant step toward integrating PCM-based programmable units in large-scale optical interconnects.
- Parallel I/O is an effective method to optimize data movement between memory and storage for many scientific applications. Poor performance of traditional disk-based file systems has led to the design of I/O libraries that take advantage of faster memory layers, such as on-node memory, present in high-performance computing (HPC) systems. By allowing caching and prefetching of data for applications that alternate between computation and I/O phases, a faster memory layer also provides opportunities for hiding the latency of I/O phases by overlapping them with computation phases, a technique called asynchronous I/O. Since asynchronous parallel I/O in HPC systems is still in the initial stages of development, there has not been a systematic study of the factors affecting its performance. In this paper, we perform a systematic study of various factors affecting the performance and efficacy of asynchronous I/O; we develop a performance model to estimate the aggregate I/O bandwidth achievable by iterative applications using synchronous and asynchronous I/O based on past observations; and we evaluate the performance of the recently developed asynchronous I/O feature of a parallel I/O library (HDF5) using benchmarks and real-world science applications. Our study covers parallel file systems on two large-scale HPC systems, Summit and Cori, the former with GPFS storage and the latter with a Lustre parallel file system.

