Title: Accelerated dynamic data reduction using spatial and temporal properties
Due to improvements in high-performance computing (HPC) capabilities, many of today's applications produce petabytes of data, causing bottlenecks within the system. Importance-based sampling methods, including our spatio-temporal hybrid data sampling method, can resolve these bottlenecks. While our hybrid method has been shown to outperform existing methods, its effectiveness depends heavily on user parameters such as the number of histogram bins, the error threshold, and the number of regions. Moreover, its throughput must increase to avoid becoming a bottleneck itself. In this article, we resolve both of these issues. First, we assess the effects of several user input parameters and detail techniques to help determine optimal values. Next, we detail and implement accelerated versions of our method using OpenMP and CUDA. Upon analyzing our implementations, we find 9.8× to 31.5× throughput improvements. We then demonstrate how our method can accept different base sampling algorithms and examine the effects these algorithms have. Finally, we compare our sampling methods to the lossy compressor cuSZ in terms of data preservation and data movement.
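The abstract describes importance-based sampling driven by histogram bins and an error threshold. As a rough illustration of the general idea (a minimal sketch using histogram-rarity weights; the function name and parameters are hypothetical and this is not the authors' spatio-temporal hybrid method):

```python
import random
from collections import Counter

def histogram_importance_sample(values, n_bins=32, sample_fraction=0.05, seed=None):
    """Keep a fraction of values, preferring those from sparsely populated
    histogram bins (rare values tend to carry more information).
    A simplified sketch of importance-based sampling, not the paper's method."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def bin_of(v):
        return min(int((v - lo) / width), n_bins - 1)

    counts = Counter(bin_of(v) for v in values)
    # Importance weight: inverse of the bin population.
    weights = [1.0 / counts[bin_of(v)] for v in values]
    n_keep = max(1, int(sample_fraction * len(values)))
    idx = list(range(len(values)))
    kept = rng.choices(idx, weights=weights, k=n_keep)  # with replacement, for simplicity
    return sorted(set(kept))
```

The `n_bins` and `sample_fraction` arguments stand in for the user parameters (histogram bins, error threshold, number of regions) whose tuning the article studies; the paper's accelerated versions parallelize this kind of loop with OpenMP and CUDA.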
Award ID(s): 2018069, 1943114, 1910197
PAR ID: 10487816
Author(s) / Creator(s): ; ; ; ; ; ;
Publisher / Repository: SAGE
Date Published:
Journal Name: The International Journal of High Performance Computing Applications
Volume: 37
Issue: 5
ISSN: 1094-3420
Page Range / eLocation ID: 539 to 559
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like This
  1. Massive multiuser (MU) multiple-input multiple-output (MIMO) promises significant improvements in spectral efficiency compared to small-scale MIMO. Typical massive MU-MIMO base-station (BS) designs rely on centralized linear data detectors and precoders which entail excessively high complexity, interconnect data rates, and chip input/output (I/O) bandwidth when executed on a single computing fabric. To resolve these complexity and bandwidth bottlenecks, we propose new decentralized algorithms for data detection and precoding that use coordinate descent. Our methods parallelize computations across multiple computing fabrics, while minimizing interconnect and I/O bandwidth. The proposed decentralized algorithms achieve near-optimal error-rate performance and multi-Gbps throughput at sub-1 ms latency when implemented on a multi-GPU cluster with half-precision floating-point arithmetic. 
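The decentralized detectors above are built on coordinate descent. As a toy real-valued illustration of that update style (a sketch only; the actual system is complex-valued and distributed across GPUs, and all names here are illustrative):

```python
def coordinate_descent_lsq(H, y, iters=50):
    """Solve min_x ||y - H x||^2 one coordinate at a time, keeping a
    running residual so each per-coordinate update is cheap. This is
    the serial skeleton of coordinate-descent data detection."""
    m, n = len(H), len(H[0])
    x = [0.0] * n
    r = list(y)  # residual r = y - H x
    for _ in range(iters):
        for j in range(n):
            hj = [H[i][j] for i in range(m)]
            norm2 = sum(h * h for h in hj)
            if norm2 == 0:
                continue
            # Optimal update for coordinate j with the others held fixed.
            delta = sum(r[i] * hj[i] for i in range(m)) / norm2
            x[j] += delta
            for i in range(m):
                r[i] -= delta * hj[i]
    return x
```

Because each coordinate touches only one column of H, the per-update work and data movement are small, which is what makes the decentralized, low-bandwidth mapping across computing fabrics possible.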
  2. While there are several bottlenecks in hybrid organic–inorganic perovskite (HOIP) solar cell production steps, including composition screening, fabrication, material stability, and device performance, machine learning approaches have begun to tackle each of these issues in recent years. Different algorithms have successfully been adopted to solve the unique problems at each step of HOIP development. Specifically, high-throughput experimentation produces the vast amounts of training data required to effectively implement machine learning methods. Here, we present an overview of machine learning models, including linear regression, neural networks, deep learning, and statistical forecasting. Experimental examples from the literature, where machine learning is applied to HOIP composition screening, thin film fabrication, thin film characterization, and full device testing, are discussed. These paradigms give insights into the future of HOIP solar cell research. As databases expand and computational power improves, increasingly accurate predictions of HOIP behavior are becoming possible.
  3. Abstract Computationally modeling how mutations affect protein–protein binding not only helps uncover the biophysics of protein interfaces, but also enables the redesign and optimization of protein interactions. Traditional high‐throughput methods for estimating binding free energy changes are currently limited to mutations directly at the interface due to difficulties in accurately modeling how long‐distance mutations propagate their effects through the protein structure. However, the modeling and design of such mutations is of substantial interest as it allows for greater control and flexibility in protein design applications. We have developed a method that combines high‐throughput Rosetta‐based side‐chain optimization with conformational sampling using classical molecular dynamics simulations, finding significant improvements in our ability to accurately predict long‐distance mutational perturbations to protein binding. Our approach uses an analytical framework grounded in alchemical free energy calculations while enabling exploration of a vastly larger sequence space. When comparing to experimental data, we find that our method can predict internal long‐distance mutational perturbations with a level of accuracy similar to that of traditional methods in predicting the effects of mutations at the protein–protein interface. This work represents a new and generalizable approach to optimize protein free energy landscapes for desired biological functions. 
  4.
    Sampling-based methods promise scalability improvements when paired with stochastic gradient descent in training Graph Convolutional Networks (GCNs). While effective in alleviating the neighborhood explosion, these methods, due to bandwidth and memory bottlenecks, incur computational overheads in preprocessing and loading new samples on heterogeneous systems, which significantly degrade sampling performance. By decoupling the frequency of sampling from the sampling strategy, we propose LazyGCN, a general yet effective framework that can be integrated with any sampling strategy to substantially improve the training time. The basic idea behind LazyGCN is to perform sampling periodically and effectively recycle the sampled nodes to mitigate data preparation overhead. We theoretically analyze the proposed algorithm and show that under a mild condition on the recycling size, by reducing the variance of inner layers, we are able to obtain the same convergence rate as the underlying sampling method. We also give corroborating empirical evidence on large real-world graphs, demonstrating that the proposed scheme can significantly reduce the number of sampling steps and yield superior speedup without compromising accuracy.
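The core LazyGCN idea, sampling periodically and recycling the sample in between, can be sketched as a training-loop skeleton (names and arguments here are illustrative, not LazyGCN's API):

```python
def lazy_training_loop(sample_fn, train_step, n_iters, recycle=4):
    """Periodic-sampling skeleton in the spirit of LazyGCN: draw a new
    mini-batch of sampled nodes only every `recycle` iterations and reuse
    it in between, trading some gradient freshness for far less
    data-preparation overhead."""
    batch = None
    n_refreshes = 0
    for t in range(n_iters):
        if t % recycle == 0:    # refresh the sample periodically
            batch = sample_fn()
            n_refreshes += 1
        train_step(batch)       # recycled batch between refreshes
    return n_refreshes
```

With `recycle=4`, a 10-iteration run calls the (expensive) sampler only 3 times instead of 10; the paper's analysis shows how large the recycling window can be while preserving the underlying method's convergence rate.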
  5.
    TUNERCAR is a toolchain that jointly optimizes racing strategy, planning methods, control algorithms, and vehicle parameters for an autonomous racecar. In this paper, we detail the target hardware, software, simulators, and systems infrastructure for this toolchain. Our methodology employs a parallel implementation of CMA-ES which enables simulations to proceed 6 times faster than real-world rollouts. We show our approach can reduce lap times in autonomous racing, given a fixed computational budget. For all tested tracks, our method provides the lowest lap times, with relative improvements of 7–21%. We demonstrate improvements of over 15 seconds/lap over a naive random search method with an equivalent computational budget, and of over 2 seconds/lap over expert solutions. We further compare the performance of our method against hand-tuned solutions submitted by over 30 international teams composed of graduate students working in the field of autonomous vehicles. Finally, we discuss the effectiveness of utilizing an online planning mechanism to reduce the reality gap between our simulation and actual tests.
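The population-based search that TUNERCAR parallelizes can be illustrated with a toy evolution strategy (a serial sketch standing in for CMA-ES; the objective, names, and parameters are illustrative, and in the real toolchain each generation's rollouts run concurrently in simulation):

```python
import random

def simple_evolution_strategy(objective, dim, pop_size=8, iters=30, sigma=0.5, seed=0):
    """Toy (1+lambda) evolution strategy: perturb the current best
    parameter vector, evaluate the candidates, and keep any improvement.
    Each generation's candidates are independent, which is what makes
    parallel evaluation (as in TUNERCAR's CMA-ES) straightforward."""
    rng = random.Random(seed)
    best = [0.0] * dim
    best_f = objective(best)
    for _ in range(iters):
        candidates = [[b + rng.gauss(0, sigma) for b in best]
                      for _ in range(pop_size)]
        # In the real toolchain these evaluations are simulator rollouts
        # executed in parallel.
        scored = [(objective(c), c) for c in candidates]
        f, c = min(scored)
        if f < best_f:
            best_f, best = f, c
    return best, best_f
```

Here `objective` plays the role of a lap-time measurement from a simulated rollout; CMA-ES additionally adapts the search distribution's covariance rather than using a fixed `sigma`.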