Abstract Block-Adaptive-Tree Solar-wind Roe-type Upwind Scheme (BATSRUS), our state-of-the-art extended magnetohydrodynamic code, is the most used and one of the most resource-consuming models in the Space Weather Modeling Framework. It has always been our objective to improve its efficiency and speed with emerging techniques, such as GPU acceleration. To utilize the GPU nodes on modern supercomputers, we port BATSRUS to GPUs with the OpenACC API. Porting the code to a single GPU requires rewriting and optimizing the most used functionalities of the original code into a new solver, which accounts for around 1% of the entire program in length. To port it to multiple GPUs, we implement a new message-passing algorithm to support its unique block-adaptive grid feature. We conduct weak scaling tests on as many as 256 GPUs and find good performance. The program has 50%–60% parallel efficiency on up to 256 GPUs and up to 95% efficiency within a single node (four GPUs). Running large problems on more than one node has reduced efficiency due to hardware bottlenecks. We also demonstrate our ability to run representative magnetospheric simulations on GPUs. The performance for a single A100 GPU is about the same as 270 AMD “Rome” CPU cores (2.1 128-core nodes), and it runs 3.6 times faster than real time. The simulation can run 6.9 times faster than real time on four A100 GPUs.
Direct-modulated optical networks for interposer systems
We present a new interposer-level optical network based on direct-modulated lasers such as vertical-cavity surface-emitting lasers (VCSELs) or transistor lasers (TLs). Our key observation is that the physics of these lasers requires them to transmit significantly more power (21×) than the receiver needs. We take advantage of this excess optical power to create a new network architecture called Rome, which splits optical signals using passive splitters to allow flexible bandwidth allocation among different transmitter and receiver pairs while imposing minimal power and design costs. Using multi-chip module GPUs (MCM-GPUs) as a case study, we thoroughly evaluate network power and performance, and show that (1) Rome is capable of efficiently scaling up MCM-GPUs with up to 1024 streaming multiprocessors, and (2) Rome outperforms various competing designs in terms of energy efficiency (by up to 4×) and performance (by up to 143%).
- PAR ID: 10184049
- Date Published:
- Journal Name: NOCS '19
- Page Range / eLocation ID: 1 to 8
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We introduce an ensemble of artificial intelligence models for gravitational wave detection that we trained on the Summit supercomputer using 32 nodes, equivalent to 192 NVIDIA V100 GPUs, within 2 h. Once the models were fully trained, we optimized them for accelerated inference using NVIDIA TensorRT. We deployed our inference-optimized AI ensemble on the ThetaGPU supercomputer at the Argonne Leadership Computing Facility to conduct distributed inference. Using the entire ThetaGPU supercomputer, consisting of 20 nodes, each of which has 8 NVIDIA A100 Tensor Core GPUs and 2 AMD Rome CPUs, our NVIDIA TensorRT-optimized AI ensemble processed an entire month of advanced LIGO data (including Hanford and Livingston data streams) within 50 s. Our inference-optimized AI ensemble retains the same sensitivity as traditional AI models: it identifies all known binary black hole mergers previously identified in this advanced LIGO dataset and reports no misclassifications, while also providing a 3× inference speedup compared to traditional artificial intelligence models. We used time slides to quantify the performance of our AI ensemble when processing up to 5 years' worth of advanced LIGO data. In this synthetically enhanced dataset, our AI ensemble reports an average of one misclassification for every month of searched advanced LIGO data. We also present the receiver operating characteristic curve of our AI ensemble using this 5-year-long advanced LIGO dataset. This approach provides the required tools to conduct accelerated, AI-driven gravitational wave detection at scale.
-
Abstract Nonlinear microscopy provides excellent depth penetration and axial sectioning for 3D imaging, yet widespread adoption is limited by reliance on expensive ultrafast pulsed lasers. This work circumvents such limitations by employing rare‐earth doped upconverting nanoparticles (UCNPs), specifically Yb³⁺/Tm³⁺ co‐doped NaYF₄ nanocrystals, which exhibit strong multimodal nonlinear optical responses under continuous‐wave (CW) excitation. These UCNPs emit multiple wavelengths at UV (λ ≈ 350 nm), blue (λ ≈ 450 nm), and NIR (λ ≈ 800 nm), whose intensities are nonlinearly governed by excitation power. Exploiting these properties, multi‐colored nonlinear emissions enable functional imaging of cerebral blood vessels in the deep brain. Using a simple optical setup, high‐resolution in vivo 3D imaging of mouse cerebrovascular networks at depths up to 800 µm is achieved, surpassing the performance of conventional imaging methods using CW lasers. In vivo cerebrovascular flow dynamics are also visualized with wide‐field video‐rate imaging under low‐powered CW excitation. Furthermore, UCNPs enable depth‐selective, 3D‐localized photo‐modulation through turbid media, presenting spatiotemporally targeted light beacons. This innovative approach, leveraging UCNPs' intrinsic nonlinear optical characteristics, significantly advances multimodal nonlinear microscopy with CW lasers, opening new opportunities in bio‐imaging, remote optogenetics, and photodynamic therapy.
-
High-peak-power lasers are fundamental to high-field science: increased laser intensity has enabled laboratory astrophysics, relativistic plasma physics, and compact laser-based particle accelerators. However, the meter-scale optics required for multi-petawatt lasers to avoid light-induced damage make further increases in power challenging. Plasma tolerates orders-of-magnitude higher light flux than glass, but previous efforts to miniaturize lasers by constructing plasma analogs for conventional optics were limited by low efficiency and poor optical quality. We describe a new approach to plasma optics based on avalanche ionization of atomic clusters that produces plasma volume transmission gratings with dramatically increased diffraction efficiency. We measure an average efficiency of up to 36% and a single-shot efficiency of up to 60%, which is comparable to key components of high-power laser beamlines, while maintaining high spatial quality and focusability. These results suggest that plasma diffraction gratings may be a viable component of future lasers with peak power beyond 10 PW.
-
Attacks based on power analysis have long existed and been studied, with some recent works focused on data exfiltration from victim systems without using conventional communications (e.g., WiFi). Nonetheless, prior works typically rely on intrusive direct power measurement, either by implanting meters in the power outlet or tapping into the power cable, thus jeopardizing the stealthiness of attacks. In this paper, we propose NoDE (Noise for Data Exfiltration), a new system for stealthy data exfiltration from enterprise desktop computers. Specifically, NoDE achieves data exfiltration over a building's power network by exploiting high-frequency voltage ripples (i.e., switching noises) generated by power factor correction circuits built into today's computers. Located at a distance, and even in a different room, the receiver can non-intrusively measure the voltage of a power outlet to capture the high-frequency switching noises for online information decoding without supervised training/learning. To evaluate NoDE, we run experiments on seven different computers from top vendors using top-brand power supply units. Our results show that for a single transmitter, NoDE achieves a rate of up to 28.48 bits/second at a distance of 90 feet (27.4 meters) without line of sight, demonstrating a practically stealthy threat. Based on the orthogonality of switching noise frequencies of different computers, we also demonstrate simultaneous data exfiltration from four computers using only one receiver. Finally, we present a few possible defenses, such as installing noise filters, and discuss their limitations.

