Search for: All records

Creators/Authors contains: "Zhang, T."

  1. Accurate simulation of earthquake scenarios is essential for advancing seismic hazard analysis and risk mitigation strategies. At the San Diego Supercomputer Center (SDSC), our research focuses on optimizing the performance and reliability of large-scale earthquake simulations using the AWP-ODC software. By implementing GPU-aware MPI calls, we let MPI operate directly on data in GPU memory, eliminating the need for explicit data transfers between CPU and GPU. This GPU-aware MPI achieves nearly ideal parallel efficiency at full scale across both NVIDIA and AMD GPUs, leveraging MVAPICH-Plus support on Frontier at Oak Ridge National Laboratory and Vista at the Texas Advanced Computing Center. We used MVAPICH-Plus 4.0 to enable ZFP compression, which significantly enhances inter-node communication efficiency, a critical improvement given the communication bottleneck inherent in large-scale simulations. Our GPU-aware AWP-ODC versions include linear forward, topography, and nonlinear Iwan-type solvers with discontinuous mesh support. On the Frontier system with MVAPICH 4.0, HIP-aware MPI calls on MI250X GPUs deliver nearly ideal weak-scaling speedup up to 8,192 nodes for both the linear and topography versions. On TACC's Vista system, CUDA-aware MPI calls on GH200 GPUs substantially outperform their non-GPU-aware counterparts across all three solver versions. This poster will present a detailed evaluation of GPU-aware AWP-ODC using MVAPICH, including the impact of ZFP message compression compared to the native versions. Our results highlight the importance of MVAPICH support for GPU-aware MPI and on-the-fly compression techniques for accelerating and scaling earthquake simulations.
    Free, publicly-accessible full text available August 20, 2026
  2. We have implemented GPU-aware support across all AWP-ODC versions and enhanced message-passing collective communications for this memory-bound finite-difference solver. This provides cutting-edge communication support for production simulations on leadership-class computing facilities, including OLCF Frontier and TACC Vista. We achieved significant performance gains, reaching 37 sustained petaflop/s and reducing time-to-solution by 17.2% using the GPU-aware feature on 8,192 Frontier nodes, or 65,536 MI250X GCDs (8 GCDs per node). The AWP-ODC code has also been optimized for TACC Vista, an Arm-based system built on NVIDIA GH200 Grace Hopper Superchips, demonstrating excellent application performance. This poster will showcase studies and GPU performance characteristics. We will discuss our verification of GPU-aware development and the use of high-performance MVAPICH libraries, including on-the-fly compression, on modern GPU clusters.
    Free, publicly-accessible full text available September 10, 2026
  3. Free, publicly-accessible full text available April 30, 2026
  4. We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive bias, unlocking new capabilities in decoder-only generation. The key design choice that enables random order is to insert a "position instruction token" before each image token to be predicted, representing the spatial location of the next image token. Although it is trained on randomly permuted token sequences, a more challenging task than fixed-order generation, RandAR achieves performance comparable to its conventional raster-order counterpart. More importantly, decoder-only transformers trained on random orders acquire new capabilities. To address the efficiency bottleneck of AR models, RandAR adopts parallel decoding with KV-Cache at inference time, achieving a 2.5x acceleration without sacrificing generation quality. Additionally, RandAR supports inpainting, outpainting, and resolution extrapolation in a zero-shot manner. We hope RandAR inspires new directions for decoder-only visual generation models and broadens their applications across diverse scenarios. Our project page is at https://rand-ar.github.io/.
    Free, publicly-accessible full text available June 11, 2026
  5. Free, publicly-accessible full text available April 30, 2026
  6. Free, publicly-accessible full text available April 11, 2026
  7. Free, publicly-accessible full text available April 6, 2026
  8. Free, publicly-accessible full text available April 30, 2026
  9. Free, publicly-accessible full text available December 10, 2025
  10. Free, publicly-accessible full text available December 1, 2025
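
The RandAR abstract in entry 4 describes inserting a "position instruction token" before each image token so that a decoder-only model can be trained on randomly permuted sequences. The sketch below illustrates only that sequence layout, under stated assumptions: the tuple encoding `("pos", row, col)` / `("img", value)` and the function names `build_random_order_sequence` and `restore_raster_order` are hypothetical and chosen for illustration. The actual model presumably uses learned embeddings rather than tuples, and this is not the authors' implementation.

```python
import random


def build_random_order_sequence(image_tokens, grid_width, rng):
    """Interleave position instruction tokens with image tokens.

    image_tokens is a flat, raster-ordered list of token ids.
    Returns (sequence, order), where order is the random permutation used.
    The tuple encoding is a hypothetical stand-in for learned embeddings.
    """
    order = list(range(len(image_tokens)))
    rng.shuffle(order)  # random generation order instead of raster order

    sequence = []
    for raster_index in order:
        row, col = divmod(raster_index, grid_width)
        # The position instruction token announces where the next
        # image token belongs in the 2D grid.
        sequence.append(("pos", row, col))
        sequence.append(("img", image_tokens[raster_index]))
    return sequence, order


def restore_raster_order(sequence, grid_width, num_tokens):
    """Place each image token back at the location named by the
    position instruction token that precedes it."""
    grid = [None] * num_tokens
    for pos_token, img_token in zip(sequence[0::2], sequence[1::2]):
        _, row, col = pos_token
        grid[row * grid_width + col] = img_token[1]
    return grid


if __name__ == "__main__":
    tokens = [10, 11, 12, 13]  # a toy 2x2 "image"
    seq, order = build_random_order_sequence(tokens, 2, random.Random(0))
    print(seq)
    print(restore_raster_order(seq, 2, len(tokens)) == tokens)
```

Because every image token carries its own location in the preceding position token, the original raster layout can be recovered regardless of the order in which tokens were emitted. The abstract attributes the zero-shot inpainting and outpainting capabilities to this random-order training.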