Title: Shufflecast: An Optical, Data-Rate Agnostic and Low-Power Multicast Architecture for Next-Generation Compute Clusters
An optical circuit-switched network core has the potential to overcome the inherent challenges of a conventional electrical packet-switched core of today's compute clusters. As optical circuit switches (OCS) directly handle the photon beams without any optical-electrical-optical (O/E/O) conversion and packet processing, OCS-based network cores have the following desirable properties: a) agnostic to data-rate, b) negligible/zero power consumption, c) no need of transceivers, d) negligible forwarding latency, and e) no need for frequent upgrade. Unfortunately, OCS can only provide point-to-point (unicast) circuits. They do not have built-in support for one-to-many (multicast) communication, yet multicast is fundamental to a plethora of data-intensive applications running on compute clusters nowadays. In this paper, we propose Shufflecast, a novel optical network architecture for next-generation compute clusters that can support high-performance multicast satisfying all the properties of an OCS-based network core. Shufflecast leverages small fanout, inexpensive, passive optical splitters to connect the Top-of-rack (ToR) switch ports, ensuring data-rate agnostic, low-power, physical-layer multicast. We thoroughly analyze Shufflecast's highly scalable data plane, light-weight control plane, and graceful failure handling. Further, we implement a complete prototype of Shufflecast in our testbed and extensively evaluate the network. Shufflecast is more power-efficient than the state-of-the-art multicast mechanisms. Also, Shufflecast is more cost-efficient than a conventional packet-switched network. By adding Shufflecast alongside an OCS-based unicast network, an all-optical network core with the aforementioned desirable properties supporting both unicast and multicast can be realized.
Award ID(s):
1718980 1815525
Publication Date:
Journal Name:
IEEE/ACM Transactions on Networking
Page Range or eLocation-ID:
1 to 16
Sponsoring Org:
National Science Foundation
More Like this
  1. We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the-art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
  2. Emerging distributed cloud architectures, e.g., fog and mobile edge computing, are playing an increasingly important role in the efficient delivery of real-time stream-processing applications (also referred to as augmented information services), such as industrial automation and metaverse experiences (e.g., extended reality, immersive gaming). While such applications require processed streams to be shared and simultaneously consumed by multiple users/devices, existing technologies lack efficient mechanisms to deal with their inherent multicast nature, leading to unnecessary traffic redundancy and network congestion. In this paper, we establish a unified framework for distributed cloud network control with generalized (mixed-cast) traffic flows that allows optimizing the distributed execution of the required packet processing, forwarding, and replication operations. We first characterize the enlarged multicast network stability region under the new control framework (with respect to its unicast counterpart). We then design a novel queuing system that allows scheduling data packets according to their current destination sets, and leverage Lyapunov drift-plus-penalty control theory to develop the first fully decentralized, throughput- and cost-optimal algorithm for multicast flow control. Numerical experiments validate analytical results and demonstrate the performance gain of the proposed design over existing network control policies.
  3. Photonic network-on-chip (PNoC) architectures employ photonic links with dense wavelength-division multiplexing (DWDM) to enable high throughput on-chip transfers. Unfortunately, increasing the DWDM degree (i.e., using a larger number of wavelengths) to achieve a higher aggregated data rate in photonic links and, hence, higher throughput in PNoCs, requires sophisticated and costly laser sources along with extra photonic hardware. This extra hardware can introduce undesired noise to the photonic link and increase the bit error rate (BER), power, and area consumption of PNoCs. To mitigate these issues, the use of 4-pulse amplitude modulation (4-PAM) signaling, instead of the conventional on-off keying (OOK) signaling, can halve the wavelength signals utilized in photonic links for achieving the target aggregate data rate while reducing the overhead of crosstalk noise, BER, and photonic hardware. There are various designs of 4-PAM modulators reported in the literature. For example, the signal superposition (SS)–, electrical digital-to-analog converter (EDAC)–, and optical digital-to-analog converter (ODAC)–based designs of 4-PAM modulators have been reported. However, it is yet to be explored how these SS-, EDAC-, and ODAC-based 4-PAM modulators can be utilized to design DWDM-based photonic links and PNoC architectures. In this article, we provide a systematic analysis of the SS, EDAC, and ODAC types of 4-PAM modulators from prior work with regards to their applicability and utilization overheads. We then present a heuristic-based search method to employ these 4-PAM modulators for designing DWDM-based SS, EDAC, and ODAC types of 4-PAM photonic links with two different design goals: (i) to attain the desired BER of 10^-9 at the expense of higher optical power and lower aggregate data rate and (ii) to attain maximum aggregate data rate with the desired BER of 10^-9 at the expense of longer packet transfer latency. 
We then employ our designed 4-PAM SS–, 4-PAM EDAC–, 4-PAM ODAC–, and conventional OOK modulator–based photonic links to constitute corresponding variants of the well-known CLOS and SWIFT PNoC architectures. We eventually compare our designed SS-, EDAC-, and ODAC-based variants of 4-PAM links and PNoCs with the conventional OOK links and PNoCs in terms of performance and energy efficiency in the presence of inter-channel crosstalk. From our link-level and PNoC-level evaluation, we have observed that the 4-PAM EDAC–based variants of photonic links and PNoCs exhibit better performance and energy efficiency compared with the OOK-, 4-PAM SS–, and 4-PAM ODAC–based links and PNoCs.
  4. Electro-optic (EO) modulators rely on the interaction of optical and electrical signals with second-order nonlinear media. For the optical signal, this interaction can be strongly enhanced using dielectric slot–waveguide structures that exploit a field discontinuity at the interface between a high-index waveguide core and the low-index EO cladding. In contrast to this, the electrical signal is usually applied through conductive regions in the direct vicinity of the optical waveguide. To avoid excessive optical loss, the conductivity of these regions is maintained at a moderate level, thus leading to inherent RC limitations of the modulation bandwidth. In this paper, we show that these limitations can be overcome by extending the slot–waveguide concept to the modulating radio-frequency (RF) signal. Our device combines an RF slotline that relies on BaTiO3 as a high-k dielectric material with a conventional silicon photonic slot waveguide and a highly efficient organic EO cladding material. In a proof-of-concept experiment, we demonstrate a 1 mm long Mach–Zehnder modulator that offers a 3 dB bandwidth of 76 GHz and a 6 dB bandwidth of 110 GHz along with a small π-voltage of 1.3 V (UπL = 1.3 V mm). We further demonstrate the viability of the device in a data-transmission experiment using four-state pulse-amplitude modulation (PAM4) at line rates up to 200 Gbit/s. Our first-generation devices leave vast room for further improvement and may open an attractive route towards highly efficient silicon photonic modulators that combine sub-1 mm device lengths with sub-1 V drive voltages and modulation bandwidths of more than 100 GHz.
  5. The property of (quasi-)reversibility of Markov chains has led to elegant characterizations of the steady-state distribution for complex queueing networks, e.g., the celebrated Jackson networks and the BCMP (Baskett, Chandy, Muntz, Palacios) and Kelly theorems. In a nutshell, despite the complicated interaction, in the steady state the queues in such networks exhibit independence, which leads to explicit calculations of distributional properties of the queueing network that may seem impossible at the outset. The model of stochastic processing network (cf. Harrison 2000) captures a variety of dynamic resource allocation problems, including the flow-level networks used for modeling bandwidth sharing in the Internet, switched networks (cf. Shah, Wischik 2006) for modeling packet scheduling in Internet routers and wireless medium access, and hybrid flow-packet networks for modeling job- and packet-level scheduling in data centers. Unlike before, an appropriate resource allocation or scheduling policy is required in such networks to achieve good performance. Given the complexity, asymptotic analytic approaches, such as fluid models or Lyapunov-Foster criteria to establish positive recurrence, and heavy-traffic or diffusion approximations to characterize the scaled steady-state distribution, became the method of choice. Remarkable progress has been made along these lines over the past few decades, but much more is needed to match the explicit calculations available for reversible networks. In this work, we will present an alternative to this approach that leads to a non-asymptotic, explicit characterization of the steady-state distribution akin to the BCMP / Kelly theorems. This involves (a) identifying a "relaxation" of the given stochastic processing network in terms of an appropriate (quasi-)reversible queueing network, and (b) finding a resource allocation or scheduling policy of interest that "emulates" the "relaxed" networks within "small error". 
The proof is in the pudding, so we will present three examples of this program: (i) distributed scheduling in wireless networks, (ii) scheduling in switched networks, and (iii) flow-packet scheduling in a data center. The notion of "baseline performance" (cf. Harrison, Mandayam, Shah, Yang 2014) will naturally emerge as a consequence of this program. We will discuss open questions pertaining to multi-hop networks and computational complexity.