NSF PAR Search | NSF Public Access Repository

Calo4pQVAE: Quantum-Assisted 4-Partite VAE Surrogate for High Energy Particle-Calorimeter Interactions

https://doi.org/10.1109/QCE60285.2024.10262

Gonzalez, Sebastian; Jia, Hao; Toledo-Marin, J Quetzalcoatl; Hoque, Sehmimul; Abhishek, Abhishek; Lu, Ian; Sogutlu, Deniz; Anderson, Soren; Gay, Colin; Paquet, Eric; et al (September 2024, IEEE)

As we approach the High Luminosity Large Hadron Collider (HL-LHC), set to begin collisions by the end of this decade, it is clear that the computational demands of traditional collision simulations have become untenably high. Current methods, relying heavily on first-principles Monte Carlo simulations for event showers in calorimeters, are estimated to require millions of CPU-years annually, a demand that far exceeds current capabilities. This bottleneck presents a unique opportunity for breakthroughs in computational physics through the integration of generative AI with quantum computing technologies. We propose a Quantum-Assisted deep generative model. In particular, we combine a variational autoencoder (VAE) with a Restricted Boltzmann Machine (RBM) embedded in its latent space as a prior. The RBM in latent space provides further expressiveness compared to a legacy VAE, where the prior is a fixed Gaussian distribution. By crafting the RBM couplings, we leverage D-Wave’s Quantum Annealer to significantly speed up the shower sampling time. By combining classical and quantum computing, this framework sets a path towards utilizing large-scale quantum simulations as priors in deep generative models and demonstrates their ability to generate high-quality synthetic data for the HL-LHC experiments.
Full Text Available
Supercharging distributed computing environments for high-performance data engineering

https://doi.org/10.3389/fhpcp.2024.1384619

Perera, Niranda; Sarker, Arup Kumar; Shan, Kaiying; Fetea, Alex; Kamburugamuve, Supun; Kanewala, Thejaka Amila; Widanage, Chathura; Staylor, Mills; Zhong, Tianle; Abeykoon, Vibhatha; et al (July 2024, Frontiers in High Performance Computing)

The data engineering and data science community has embraced the idea of using Python and R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these frameworks are now ever more important in order to process terabytes of data. They can easily exceed the capabilities of a single machine but also demand significant developer time and effort due to their convenience and ability to manipulate data with high-level abstractions that can be optimized. Therefore, it is essential to design scalable dataframe solutions. There have been multiple efforts to tackle this problem in the most efficient fashion, the most notable being the dataframe systems developed using distributed computing environments such as Dask and Ray. Even though Dask and Ray's distributed computing features look very promising, we perceive that the Dask Dataframes and Ray Datasets still have room for optimization. In this paper, we present CylonFlow, an alternative distributed dataframe execution methodology that enables state-of-the-art performance and scalability on the same Dask and Ray infrastructure (supercharging them!). To achieve this, we integrate a high-performance dataframe system, Cylon, which was originally based on an entirely different execution paradigm, into Dask and Ray. Our experiments show that on a pipeline of dataframe operators, CylonFlow achieves 30× more distributed performance than Dask Dataframes. Interestingly, it also enables superior sequential performance due to leveraging the native C++ execution of Cylon. We believe the performance of Cylon in conjunction with CylonFlow extends beyond the data engineering domain and can be used to consolidate high-performance computing and distributed computing ecosystems.
Full Text Available
FAIR for AI: An interdisciplinary and international community building perspective

https://doi.org/10.1038/s41597-023-02298-6

Huerta, E. A.; Blaiszik, Ben; Brinson, L. Catherine; Bouchard, Kristofer E.; Diaz, Daniel; Doglioni, Caterina; Duarte, Javier M.; Emani, Murali; Foster, Ian; Fox, Geoffrey; et al (December 2023, Scientific Data)

Full Text Available
In-depth analysis on parallel processing patterns for high-performance Dataframes

https://doi.org/10.1016/j.future.2023.07.007

Perera, Niranda; Sarker, Arup Kumar; Staylor, Mills; von Laszewski, Gregor; Shan, Kaiying; Kamburugamuve, Supun; Widanage, Chathura; Abeykoon, Vibhatha; Kanewela, Thejaka Amila; Fox, Geoffrey (December 2023, Future Generation Computer Systems)

Full Text Available
Opportunities for enhancing MLCommons efforts while leveraging insights from educational MLCommons earthquake benchmarks efforts

https://doi.org/10.3389/fhpcp.2023.1233877

von Laszewski, Gregor; Fleischer, J. P.; Knuuti, Robert; Fox, Geoffrey C.; Kolessar, Jake; Butler, Thomas S.; Fox, Judy (October 2023, Frontiers in High Performance Computing)

MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by others, educational institutions provide valuable opportunities for engagement. In this article, we identify numerous insights obtained from different viewpoints as part of efforts to utilize high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. As this activity was conducted across multiple educational efforts, we project if and how it is possible to make such efforts available on a wider scale. This includes the integration of sophisticated benchmarks into courses and research activities at universities, exposing the students and researchers to topics that are otherwise typically not sufficiently covered in current course curricula, as we witnessed from our practical experience across multiple organizations. As such, we have outlined the many lessons we learned throughout these efforts, culminating in the need for benchmark carpentry for scientists using advanced computational resources. The article also presents the analysis of an earthquake prediction code benchmark while focusing on the accuracy of the results and not only on the runtime; notably, this benchmark was created as a result of our lessons learned. Energy traces were produced throughout these benchmarks, which are vital to analyzing the power expenditure within HPC environments. Additionally, one of the insights is that in the short time of the project with limited student availability, the activity was only possible by utilizing a benchmark runtime pipeline while developing and using software to generate jobs from the permutation of hyperparameters automatically. It integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. The software is part of a collection called cloudmesh with its newly developed components, cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).
Full Text Available
Deep Learning Patterns Enabling AI for Science

https://doi.org/10.1109/JVA60410.2023.00014

Fox, Geoffrey (July 2023, IEEE)
High Performance Dataframes from Parallel Processing Patterns

Niranda Perera, Supun Kamburugamuve (April 2023, Springer)

Full Text Available
Hybrid Cloud and HPC Approach to High-Performance Dataframes

https://doi.org/10.1109/BigData55660.2022.10020958

Shan, Kaiying; Perera, Niranda; Lenadora, Damitha; Zhong, Tianle; Kumar Sarker, Arup; Kamburugamuve, Supun; Amila Kanewela, Thejaka; Widanage, Chathura; Fox, Geoffrey (December 2022, 2022 IEEE International Conference on Big Data (Big Data))
Optimizing Earthquake Nowcasting With Machine Learning: The Role of Strain Hardening in the Earthquake Cycle

https://doi.org/10.1029/2022EA002343

Rundle, John B.; Yazbeck, Joe; Donnellan, Andrea; Fox, Geoffrey; Ludwig, Lisa Grant; Heflin, Michael; Crutchfield, James (November 2022, Earth and Space Science)

Full Text Available
Does the Catalog of California Earthquakes, with Aftershocks Included, Contain Information about Future Large Earthquakes?

John B. Rundle; Andrea Donnellan; Geoffrey Fox; Lisa Grant Ludwig; James P Crutchfield (August 2022, Earth and Space Science Open Archive)

Yes
Full Text Available
