skip to main content

Title: Baldur: A Power-Efficient and Scalable Network Using All-Optical Switches
We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.  more » « less
Award ID(s):
1640196 1640192
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
2020 IEEE Symposium on High Performance Computer Architecture (HPCA)
Page Range / eLocation ID:
153 to 166
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Green wireless networks Wake-up radio Energy harvesting Routing Markov decision process Reinforcement learning 1. Introduction With 14.2 billions of connected things in 2019, over 41.6 billions expected by 2025, and a total spending on endpoints and services that will reach well over $1.1 trillion by the end of 2026, the Internet of Things (IoT) is poised to have a transformative impact on the way we live and on the way we work [1–3]. The vision of this ‘‘connected continuum’’ of objects and people, however, comes with a wide variety of challenges, especially for those IoT networks whose devices rely on some forms of depletable energy support. This has prompted research on hardware and software solutions aimed at decreasing the depen- dence of devices from ‘‘pre-packaged’’ energy provision (e.g., batteries), leading to devices capable of harvesting energy from the environment, and to networks – often called green wireless networks – whose lifetime is virtually infinite. Despite the promising advances of energy harvesting technologies, IoT devices are still doomed to run out of energy due to their inherent constraints on resources such as storage, processing and communica- tion, whose energy requirements often exceed what harvesting can provide. The communication circuitry of prevailing radio technology, especially, consumes relevant amount of energy even when in idle state, i.e., even when no transmissions or receptions occur. Even duty cycling, namely, operating with the radio in low energy consumption ∗ Corresponding author. E-mail address: (G. Koutsandria). (sleep) mode for pre-set amounts of time, has been shown to only mildly alleviate the problem of making IoT devices durable [4]. An effective answer to eliminate all possible forms of energy consumption that are not directly related to communication (e.g., idle listening) is provided by ultra low power radio triggering techniques, also known as wake-up radios [5,6]. Wake-up radio-based networks allow devices to remain in sleep mode by turning off their main radio when no communication is taking place. Devices continuously listen for a trigger on their wake-up radio, namely, for a wake-up sequence, to activate their main radio and participate to communication tasks. Therefore, devices wake up and turn their main radio on only when data communication is requested by a neighboring device. Further energy savings can be obtained by restricting the number of neighboring devices that wake up when triggered. This is obtained by allowing devices to wake up only when they receive specific wake-up sequences, which correspond to particular protocol requirements, including distance from the destina- tion, current energy status, residual energy, etc. This form of selective awakenings is called semantic addressing [7]. Use of low-power wake-up radio with semantic addressing has been shown to remarkably reduce the dominating energy costs of communication and idle listening of traditional radio networking [7–12]. This paper contributes to the research on enabling green wireless networks for long lasting IoT applications. Specifically, we introduce a ABSTRACT This paper presents G-WHARP, for Green Wake-up and HARvesting-based energy-Predictive forwarding, a wake-up radio-based forwarding strategy for wireless networks equipped with energy harvesting capabilities (green wireless networks). Following a learning-based approach, G-WHARP blends energy harvesting and wake-up radio technology to maximize energy efficiency and obtain superior network performance. Nodes autonomously decide on their forwarding availability based on a Markov Decision Process (MDP) that takes into account a variety of energy-related aspects, including the currently available energy and that harvestable in the foreseeable future. Solution of the MDP is provided by a computationally light heuristic based on a simple threshold policy, thus obtaining further computational energy savings. The performance of G-WHARP is evaluated via GreenCastalia simulations, where we accurately model wake-up radios, harvestable energy, and the computational power needed to solve the MDP. Key network and system parameters are varied, including the source of harvestable energy, the network density, wake-up radio data rate and data traffic. We also compare the performance of G-WHARP to that of two state-of-the-art data forwarding strategies, namely GreenRoutes and CTP-WUR. Results show that G-WHARP limits energy expenditures while achieving low end-to-end latency and high packet delivery ratio. Particularly, it consumes up to 34% and 59% less energy than CTP-WUR and GreenRoutes, respectively. 
    more » « less
  2. Abstract

    The large scale control over thousands of quantum emitters desired by quantum network technology is limited by the power consumption and cross-talk inherent in current microwave techniques. Here we propose a quantum repeater architecture based on densely-packed diamond color centers (CCs) in a programmable electrode array, with quantum gates driven by electric or strain fields. This ‘field programmable spin array’ (FPSA) enables high-speed spin control of individual CCs with low cross-talk and power dissipation. Integrated in a slow-light waveguide for efficient optical coupling, the FPSA serves as a quantum interface for optically-mediated entanglement. We evaluate the performance of the FPSA architecture in comparison to a routing-tree design and show an increased entanglement generation rate scaling into the thousand-qubit regime. Our results enable high fidelity control of dense quantum emitter arrays for scalable networking.

    more » « less
  3. We aim to preserve a large amount of data generated insidebase station-less sensor networks(BSNs) while considering that sensor nodes are selfish. BSNs refer to emerging sensing applications deployed in challenging and inhospitable environments (e.g., underwater exploration); as such, there do not exist data-collecting base stations in the BSN to collect the data. Consequently, the generated data has to be stored inside the BSN before uploading opportunities become available. Our goal is to preserve the data inside the BSN with minimum energy cost by incentivizing the storage- and energy-constrained sensor nodes to participate in the data preservation process. We refer to the problem as DPP:datapreservationproblem in the BSN. Previous research assumes that all the sensor nodes are cooperative and that sensors have infinite battery power and design a minimum-cost flow-based data preservation solution. However, in a distributed setting and under different control, the resource-constrained sensor nodes could behave selfishly only to conserve their resources and maximize their benefit.

    In this article, we first solve DPP by designing an integer linear programming (ILP)-based optimal solution without considering selfishness. We then establish a game-theoretical framework that achieves provably truthful and optimal data preservation in BSNs. For a special case of DPP wherein nodes are not energy-constrained, referred to as DPP-W, we design a data preservation game DPG-1 that integrates algorithmic mechanism design (AMD) and a more efficient minimum cost flow-based data preservation solution. We show that DPG-1 yields dominant strategies for sensor nodes and delivers truthful and optimal data preservation. For the general case of DPP (wherein nodes are energy-constrained), however, DPG-1 fails to achieve truthful and optimal data preservation. Utilizing packet-level flow observation of sensor node behaviors computed by minimum cost flow and ILP, we uncover the cause of the failure of the DPG-1. It is due to the packet dropping by the selfish nodes that manipulate the AMD technique. We then design a data preservation game DPG-2 for DPP that traces and punishes manipulative nodes in the BSN. We show that DPG-2 delivers dominant strategies for truth-telling nodes and achieves provably optimal data preservation with cheat-proof guarantees. Via extensive simulations under different network parameters and dynamics, we show that our games achieve system-wide data preservation solutions with optimal energy cost while enforcing truth-telling of sensor nodes about their private cost types. One salient feature of our work is its integrated game theory and network flows approach. With the observation of flow level sensor node behaviors provided by the network flows, our proposed games can synthesize “microscopic” (i.e., selfish and local) behaviors of sensor nodes and yield targeted “macroscopic” (i.e., optimal and global) network performance of data preservation in the BSN.

    more » « less
  4. We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior FPGA-accelerated simulation tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient performance to run modern applications at scale, to enable true hardware-software co-design. As an example, we demonstrate automatically generating and deploying a target cluster of 1,024 3.2 GHz quad-core server nodes, each with 16 GB of DRAM, interconnected by a 200 Gbit/s network with 2 microsecond latency, which simulates at a 3.4 MHz processor clock rate (less than 1,000x slowdown over real-time). In aggregate, this FireSim instantiation simulates 4,096 cores and 16 TB of memory, runs ~ 14 billion instructions per second, and harnesses 12.8 million dollars worth of FPGAs-at a total cost of only ~ $100 per simulation hour to the user. We present several examples to show how FireSim can be used to explore various research directions in warehouse-scale machine design, including modeling networks with high-bandwidth and low-latency, integrating arbitrary RTL designs for a variety of commodity and specialized datacenter nodes, and modeling a variety of datacenter organizations, as well as reusing the scale-out FireSim infrastructure to enable fast, massively parallel cycle-exact single-node microarchitectural experimentation. 
    more » « less
  5. The control of cryogenic qubits in today’s super-conducting quantum computer prototypes presents significant scalability challenges due to the massive costs of generating/routing the analog control signals that need to be sent from a classical controller at room temperature to the quantum chip inside the dilution refrigerator. Thus, researchers in industry and academia have focused on designing in-fridge classical controllers in order to mitigate these challenges. Due to the maturity of CMOS logic, many industrial efforts (Microsoft, Intel) have focused on Cryo-CMOS as a near-term solution to design in-fridge classical controllers. Meanwhile, Supercon-ducting Single Flux Quantum (SFQ) is an alternative, less mature classical logic family proposed for large-scale in-fridge controllers. SFQ logic has the potential to maximize scalability thanks to its ultra-high speed and very low power consumption. However, architecture design for SFQ logic poses challenges due to its unconventional pulse-driven nature and lack of dense memory and logic. Thus, research at the architecture level is essential to guide architects to design SFQ-based classical controllers for large-scale quantum machines.In this paper, we present DigiQ, the first system-level design of a Noisy Intermediate Scale Quantum (NISQ)-friendly SFQ-based classical controller. We perform a design space exploration of SFQ-based controllers and co-design the quantum gate decompositions and SFQ-based implementation of those decompositions to find an optimal SFQ-friendly design point that trades area and power for latency and control while ensuring good quantum algorithmic performance. Our co-design results in a single instruction, multiple data (SIMD) controller architecture, which has high scalability, but imposes new challenges on the calibration of control pulses. We present software-level solutions to address these challenges, which if unaddressed would degrade quantum circuit fidelity given the imperfections of qubit hardware.To validate and characterize DigiQ, we first implement it using hardware description languages and synthesize it using state-of-the-art/validated SFQ synthesis tools. Our synthesis results show that DigiQ can operate within the tight power and area budget of dilution refrigerators at >42,000-qubit scales. Second, we confirm the effectiveness of DigiQ in running quantum algorithms by modeling the execution time and fidelity of a variety of NISQ applications. We hope that the promising results of this paper motivate experimentalists to further explore SFQ-based quantum controllers to realize large-scale quantum machines with maximized scalability. 
    more » « less