


Title: Accelerated dynamic data reduction using spatial and temporal properties

Due to improvements in high-performance computing (HPC) capabilities, many of today’s applications produce petabytes of data, causing bottlenecks within the system. Importance-based sampling methods, including our spatio-temporal hybrid data sampling method, are capable of resolving these bottlenecks. While our hybrid method has been shown to outperform existing methods, its effectiveness relies heavily on user parameters, such as the number of histogram bins, the error threshold, and the number of regions. Moreover, its throughput must be increased so that the sampling step does not itself become a bottleneck. In this article, we resolve both of these issues. First, we assess the effects of several user input parameters and detail techniques to help determine optimal parameters. Next, we detail and implement accelerated versions of our method using OpenMP and CUDA. Upon analyzing our implementations, we find 9.8× to 31.5× throughput improvements. We then demonstrate how our method can accept different base sampling algorithms and examine the effects these algorithms have. Finally, we compare our sampling methods to the lossy compressor cuSZ in terms of data preservation and data movement.
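To make the role of these user parameters concrete, the sketch below shows a generic histogram-based importance-sampling step: scalar values that fall in sparsely populated histogram bins are retained with higher probability than common ones, so the bin count and the sample budget directly control what gets preserved. This is an illustrative, single-threaded Python sketch under our own assumptions; the function name, parameters, and synthetic field are hypothetical, and it is not the paper's spatio-temporal hybrid method or its OpenMP/CUDA implementations.

```python
import numpy as np

def importance_sample(values, n_bins=64, sample_fraction=0.01, rng=None):
    """Minimal histogram-based importance sampling sketch (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    flat = values.ravel()

    # Histogram the scalar field; rare values land in sparsely populated bins.
    counts, edges = np.histogram(flat, bins=n_bins)
    bin_idx = np.clip(np.digitize(flat, edges[1:-1]), 0, n_bins - 1)

    # Selection probability is inversely proportional to bin population.
    importance = 1.0 / np.maximum(counts[bin_idx], 1)
    prob = importance / importance.sum()

    # Keep only a small fraction of the points, biased toward rare values.
    n_samples = max(1, int(sample_fraction * flat.size))
    return rng.choice(flat.size, size=n_samples, replace=False, p=prob)

# Example: retain 1% of a synthetic 64^3 scalar field.
field = np.random.default_rng(0).normal(size=(64, 64, 64))
kept = importance_sample(field, n_bins=64, sample_fraction=0.01)
print(kept.size, "of", field.size, "points retained")
```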

 
Award ID(s): 2018069
NSF-PAR ID: 10487816
Author(s) / Creator(s): ; ; ; ; ; ;
Publisher / Repository: SAGE
Date Published:
Journal Name: The International Journal of High Performance Computing Applications
Volume: 37
Issue: 5
ISSN: 1094-3420
Page Range / eLocation ID: 539 to 559
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Computationally modeling how mutations affect protein–protein binding not only helps uncover the biophysics of protein interfaces, but also enables the redesign and optimization of protein interactions. Traditional high‐throughput methods for estimating binding free energy changes are currently limited to mutations directly at the interface due to difficulties in accurately modeling how long‐distance mutations propagate their effects through the protein structure. However, the modeling and design of such mutations is of substantial interest as it allows for greater control and flexibility in protein design applications. We have developed a method that combines high‐throughput Rosetta‐based side‐chain optimization with conformational sampling using classical molecular dynamics simulations, finding significant improvements in our ability to accurately predict long‐distance mutational perturbations to protein binding. Our approach uses an analytical framework grounded in alchemical free energy calculations while enabling exploration of a vastly larger sequence space. When comparing to experimental data, we find that our method can predict internal long‐distance mutational perturbations with a level of accuracy similar to that of traditional methods in predicting the effects of mutations at the protein–protein interface. This work represents a new and generalizable approach to optimize protein free energy landscapes for desired biological functions.
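    For context, the quantity such methods aim to predict is typically the change in binding free energy upon mutation, expressed through the thermodynamic cycle that underlies alchemical calculations; the relation below is the generic textbook form, not necessarily the exact formulation used in this work.

```latex
% Change in binding free energy when mutating wild type (wt) to mutant (mut),
% via the standard alchemical thermodynamic cycle (generic form):
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}
  = \Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}^{\mathrm{complex}}
  - \Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}^{\mathrm{unbound}}
```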

     
  2. While there are several bottlenecks in hybrid organic–inorganic perovskite (HOIP) solar cell production steps, including composition screening, fabrication, material stability, and device performance, machine learning approaches have begun to tackle each of these issues in recent years. Different algorithms have successfully been adopted to solve the unique problems at each step of HOIP development. Specifically, high-throughput experimentation produces the vast amounts of training data required to effectively implement machine learning methods. Here, we present an overview of machine learning models, including linear regression, neural networks, deep learning, and statistical forecasting. Experimental examples from the literature, where machine learning is applied to HOIP composition screening, thin film fabrication, thin film characterization, and full device testing, are discussed. These paradigms give insights into the future of HOIP solar cell research. As databases expand and computational power improves, increasingly accurate predictions of HOIP behavior are becoming possible.
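    As a minimal illustration of the simplest model class mentioned above, the sketch below fits a linear regression from hypothetical composition descriptors to a device metric; the descriptors, target, and data are synthetic placeholders rather than values from any HOIP dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical descriptors per composition:
# [FA fraction, Cs fraction, Br fraction, tolerance factor] -- illustrative only.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 4))
# Hypothetical target, e.g. measured power conversion efficiency (%).
y = 15 + 5 * X[:, 0] - 3 * X[:, 2] + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out compositions:", model.score(X_test, y_test))
```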

     
  3. Weathering and transport of potentially acid generating material (PAGM) at abandoned mines can degrade downstream environments and contaminate water resources. Monitoring the thousands of abandoned mine lands (AMLs) for exposed PAGM using field surveys is time intensive. Here, we explore the use of Remotely Piloted Aerial Systems (RPASs) as a complementary remote sensing platform to map the spatial and temporal changes of PAGM across a mine waste rock pile on an AML. We focus on testing the ability of established supervised and unsupervised classification algorithms to map PAGM on imagery with very high spatial resolution but low spectral sampling. At the Perry Canyon, NV, USA AML, we carried out six flights over a 29-month period, using an RPAS equipped with a 5-band multispectral sensor measuring in the visible to near infrared (400–1000 nm). We built six different 3 cm resolution orthorectified reflectance maps, and our tests using supervised and unsupervised classifications revealed benefits to each approach. Supervised classification schemes allowed accurate mapping of classes that lacked published spectral libraries, such as acid mine drainage (AMD) and efflorescent mineral salts (EMS). The unsupervised method produced similar maps of PAGM, as compared to supervised schemes, but with little user input. Our classified multi-temporal maps, validated with multiple field and lab-based methods, revealed persistent and slowly growing ‘hotspots’ of jarosite on the mine waste rock pile, whereas EMS exhibited more rapid fluctuations in extent. The mapping methods we detail for an RPAS carrying a broadband multispectral sensor can be applied extensively to AMLs. Our methods show promise to increase the spatial and temporal coverage of accurate maps critical for environmental monitoring and reclamation efforts over AMLs.
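    To illustrate the two classification strategies compared above, the sketch below trains a supervised classifier on a handful of labeled 5-band reflectance pixels and, separately, clusters all pixels without labels. The band count matches the sensor described here, but the image, labels, and class names are synthetic stand-ins, not the authors' processing pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic 5-band reflectance image (rows x cols x bands), values in [0, 1].
image = rng.uniform(0.0, 1.0, size=(100, 100, 5))
pixels = image.reshape(-1, 5)

# Supervised: train on a few user-labeled pixels.
# Hypothetical classes: 0 = background, 1 = jarosite, 2 = EMS.
train_idx = rng.choice(pixels.shape[0], size=300, replace=False)
train_labels = rng.integers(0, 3, size=300)  # stand-in for field labels
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(pixels[train_idx], train_labels)
supervised_map = clf.predict(pixels).reshape(100, 100)

# Unsupervised: cluster all pixels, then interpret the clusters afterward.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
unsupervised_map = kmeans.labels_.reshape(100, 100)
```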
  4. Massive multiuser (MU) multiple-input multiple-output (MIMO) promises significant improvements in spectral efficiency compared to small-scale MIMO. Typical massive MU-MIMO base-station (BS) designs rely on centralized linear data detectors and precoders which entail excessively high complexity, interconnect data rates, and chip input/output (I/O) bandwidth when executed on a single computing fabric. To resolve these complexity and bandwidth bottlenecks, we propose new decentralized algorithms for data detection and precoding that use coordinate descent. Our methods parallelize computations across multiple computing fabrics, while minimizing interconnect and I/O bandwidth. The proposed decentralized algorithms achieve near-optimal error-rate performance and multi-Gbps throughput at sub-1 ms latency when implemented on a multi-GPU cluster with half-precision floating-point arithmetic. 
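    A minimal, centralized sketch of coordinate descent applied to the regularized least-squares (MMSE-style) uplink detection problem is given below; it shows only the per-user coordinate update and omits the decentralization across computing fabrics, the precoder, and the half-precision GPU implementation described here.

```python
import numpy as np

def cd_detect(H, y, lam=0.1, iters=20):
    """Coordinate descent for  min_s ||y - H s||^2 + lam * ||s||^2  (sketch)."""
    B, U = H.shape
    s = np.zeros(U, dtype=complex)
    r = y.astype(complex).copy()                  # residual y - H s
    col_norm2 = np.sum(np.abs(H) ** 2, axis=0)    # per-user channel energy
    for _ in range(iters):
        for k in range(U):
            hk = H[:, k]
            # Exact minimizer of the objective along coordinate k.
            delta = (np.vdot(hk, r) - lam * s[k]) / (col_norm2[k] + lam)
            s[k] += delta
            r -= hk * delta
    return s  # soft estimates; hard symbol slicing would follow

# Toy uplink example: 64 BS antennas, 8 users, QPSK symbols.
rng = np.random.default_rng(3)
B, U = 64, 8
H = (rng.normal(size=(B, U)) + 1j * rng.normal(size=(B, U))) / np.sqrt(2)
x = (rng.choice([-1.0, 1.0], U) + 1j * rng.choice([-1.0, 1.0], U)) / np.sqrt(2)
y = H @ x + 0.05 * (rng.normal(size=B) + 1j * rng.normal(size=B))
print(np.round(cd_detect(H, y), 2))
```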
  5. Abstract. Smoke from wildfires is a significant source of air pollution, which can adversely impact air quality and ecosystems downwind. With the recently increasing intensity and severity of wildfires, the threat to air quality is expected to increase. Satellite-derived biomass burning emissions can fill in gaps in the absence of aircraft or ground-based measurement campaigns and can help improve the online calculation of biomass burning emissions as well as the biomass burning emissions inventories that feed air quality models. This study focuses on satellite-derived NOx emissions using the high-spatial-resolution TROPOspheric Monitoring Instrument (TROPOMI) NO2 dataset. Advancements and improvements to the satellite-based determination of forest fire NOx emissions are discussed, including information on plume height and effects of aerosol scattering and absorption on the satellite-retrieved vertical column densities. Two common top-down emission estimation methods, (1) an exponentially modified Gaussian (EMG) and (2) a flux method, are applied to synthetic data to determine the accuracy and the sensitivity to different parameters, including wind fields, satellite sampling, noise, lifetime, and plume spread. These tests show that emissions can be accurately estimated from single TROPOMI overpasses. The effect of smoke aerosols on TROPOMI NO2 columns (via air mass factors, AMFs) is estimated, and these satellite columns and emission estimates are compared to aircraft observations from four different aircraft campaigns measuring biomass burning plumes in 2018 and 2019 in North America. Our results indicate that applying an explicit aerosol correction to the TROPOMI NO2 columns improves the agreement with the aircraft observations (by about 10 %–25 %). The aircraft- and satellite-derived emissions are in good agreement within the uncertainties. Both top-down emissions methods work well; however, the EMG method seems to output more consistent results and has better agreement with the aircraft-derived emissions. Assuming a Gaussian plume shape for various biomass burning plumes, we estimate an average NOx e-folding time of 2 ± 1 h from TROPOMI observations. Based on chemistry transport model simulations and aircraft observations, the net emissions of NOx are 1.3 to 1.5 times greater than the satellite-derived NO2 emissions. A correction factor of 1.3 to 1.5 should thus be used to infer net NOx emissions from the satellite retrievals of NO2.
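    For reference, the two top-down estimates can be written in their generic literature forms (the notation below is ours and may differ from this study's exact implementation): the EMG fit of the along-wind NO2 line density yields an e-folding distance x_0 and total plume mass a, which together with the mean wind speed w give the effective lifetime and emission rate, while the flux method integrates the column times the wind component normal to a transect downwind of the fire.

```latex
% Generic forms of the two top-down estimates (illustrative notation):
% EMG method: fitted e-folding distance x_0 and total plume mass a give
\tau_{\mathrm{eff}} = \frac{x_0}{w}, \qquad
E_{\mathrm{EMG}} \approx \frac{a}{\tau_{\mathrm{eff}}} = \frac{a\,w}{x_0}
% Flux method: column V integrated against the wind component normal to a transect
E_{\mathrm{flux}} \approx \int_{\mathrm{transect}} V(s)\,
  \bigl(\mathbf{w}\cdot\hat{\mathbf{n}}\bigr)\,\mathrm{d}s
```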