NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Deep Probabilistic Framework for Continuous Time Dynamic Graph Generation

https://doi.org/10.1609/aaai.v39i16.33896

Hosseini, Ryien; Simini, Filippo; Vishwanath, Venkatram; Hoffmann, Henry (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Recent advancements in graph representation learning have shifted attention towards dynamic graphs, which exhibit evolving topologies and features over time. The increased use of such graphs creates a paramount need for generative models suitable for applications such as data augmentation, obfuscation, and anomaly detection. However, there are few generative techniques that handle continuously changing temporal graph data; existing work largely relies on augmenting static graphs with additional temporal information to model dynamic interactions between nodes. In this work, we propose a fundamentally different approach: We instead directly model interactions as a joint probability of an edge forming between two nodes at a given time. This allows us to autoregressively generate new synthetic dynamic graphs in a largely assumption free, scalable, and inductive manner. We formalize this approach as DG-Gen, a generative framework for continuous time dynamic graphs, and demonstrate its effectiveness over five datasets. Our experiments demonstrate that DG-Gen not only generates higher fidelity graphs compared to traditional methods but also significantly advances link prediction tasks.
more » « less
Free, publicly-accessible full text available April 11, 2026
Evaluating Energy Efficiency of Ai Accelerators Using Two Mlperf Benchmarks

https://doi.org/10.1109/CCGRID64434.2025.00035

Ferdaus, Farah; Wu, Xingfu; Taylor, Valerie; Lan, Zhiling; Shanmugavelu, Sanjif; Vishwanath, Venkatram; Papka, Michael E (May 2025, IEEE)

Free, publicly-accessible full text available May 19, 2026
Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor

Xie, Zhen; Emani, Murali; Yu, Xiaodong; Tao, Dingwen; He, Xin; Su, Pengfei; Zhou, Keren; Vishwanath, Venkatram (July 2024, 2024 USENIX Annual Technical Conference (USENIX ATC 24))

Full Text Available
Stimulus: Accelerate Data Management for Scientific AI applications in HPC

https://doi.org/10.1109/CCGrid54584.2022.00020

Devarajan, Hariharan; Kougkas, Anthony; Zheng, Huihuo; Vishwanath, Venkatram; Sun, Xian-He (May 2022, 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid))

Modern scientific workflows couple simulations with AI-powered analytics by frequently exchanging data to accelerate time-to-science to reduce the complexity of the simulation planes. However, this data exchange is limited in performance and portability due to a lack of support for scientific data formats in AI frameworks. We need a cohesive mechanism to effectively integrate at scale complex scientific data formats such as HDF5, PnetCDF, ADIOS2, GNCF, and Silo into popular AI frameworks such as TensorFlow, PyTorch, and Caffe. To this end, we designed Stimulus, a data management library for ingesting scientific data effectively into the popular AI frameworks. We utilize the StimOps functions along with StimPack abstraction to enable the integration of scientific data formats with any AI framework. The evaluations show that Stimulus outperforms several large-scale applications with different use-cases such as Cosmic Tagger (consuming HDF5 dataset in PyTorch), Distributed FFN (consuming HDF5 dataset in TensorFlow), and CosmoFlow (converting HDF5 into TFRecord and then consuming that in TensorFlow) by 5.3 x, 2.9 x, and 1.9 x respectively with ideal I/O scalability up to 768 GPUs on the Summit supercomputer. Through Stimulus, we can portably extend existing popular AI frameworks to cohesively support any complex scientific data format and efficiently scale the applications on large-scale supercomputers.
more » « less
Full Text Available
DLIO: A Data-Centric Benchmark for Scientific Deep Learning Applications

https://doi.org/10.1109/CCGrid51090.2021.00018

Devarajan, Hariharan; Zheng, Huihuo; Kougkas, Anthony; Sun, Xian-He; Vishwanath, Venkatram (May 2021, IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid))
null (Ed.)
Deep learning has been shown as a successful method for various tasks, and its popularity results in numerous open-source deep learning software tools. Deep learning has been applied to a broad spectrum of scientific domains such as cosmology, particle physics, computer vision, fusion, and astrophysics. Scientists have performed a great deal of work to optimize the computational performance of deep learning frameworks. However, the same cannot be said for I/O performance. As deep learning algorithms rely on big-data volume and variety to effectively train neural networks accurately, I/O is a significant bottleneck on large-scale distributed deep learning training. This study aims to provide a detailed investigation of the I/O behavior of various scientific deep learning workloads running on the Theta supercomputer at Argonne Leadership Computing Facility. In this paper, we present DLIO, a novel representative benchmark suite built based on the I/O profiling of the selected workloads. DLIO can be utilized to accurately emulate the I/O behavior of modern scientific deep learning applications. Using DLIO, application developers and system software solution architects can identify potential I/O bottlenecks in their applications and guide optimizations to boost the I/O performance leading to lower training times by up to 6.7x.
more » « less
Full Text Available
SeeSAw: Optimizing Performance of In-Situ Analytics Applications under Power Constraints

https://doi.org/10.1109/IPDPS47924.2020.00086

Marincic, Ivana; Vishwanath, Venkatram; Hoffmann, Henry (May 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

https://doi.org/10.1109/MLHPC54614.2021.00009

Farrell, Steven; Emani, Murali; Balma, Jacob; Drescher, Lukas; Drozd, Aleksandr; Fink, Andreas; Fox, Geoffrey; Kanter, David; Kurth, Thorsten; Mattson, Peter; et al (November 2021, 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC))

Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf ™ is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications, driven by the MLCommons ™ Association. We present the results from the first submission round including a diverse set of some of the world’s largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization and communication scheduling enabling overall >10× (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system’s memory hierarchy and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch-sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O and network behaviour to parameterize extended roofline performance models in future rounds.
more » « less
Full Text Available
A terminology for in situ visualization and analysis systems

https://doi.org/10.1177/1094342020935991

Childs, Hank; Ahern, Sean D.; Ahrens, James; Bauer, Andrew C.; Bennett, Janine; Bethel, E. Wes; Bremer, Peer-Timo; Brugger, Eric; Cottam, Joseph; Dorier, Matthieu; et al (November 2020, The International Journal of High Performance Computing Applications)
null (Ed.)
The term “in situ processing” has evolved over the last decade to mean both a specific strategy for visualizing and analyzing data and an umbrella term for a processing paradigm. The resulting confusion makes it difficult for visualization and analysis scientists to communicate with each other and with their stakeholders. To address this problem, a group of over 50 experts convened with the goal of standardizing terminology. This paper summarizes their findings and proposes a new terminology for describing in situ systems. An important finding from this group was that in situ systems are best described via multiple, distinct axes: integration type, proximity, access, division of execution, operation controls, and output type. This paper discusses these axes, evaluates existing systems within the axes, and explores how currently used terms relate to the axes.
more » « less
Full Text Available

Search for: All records