skip to main content

Search for: All records

Creators/Authors contains: "Nguyen, Tri"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available May 1, 2023
  2. Concurrent kernel execution on GPU has proven an effective technique to improve system throughput by maximizing the resource utilization. In order to increase programmability and meet the increasing memory requirements of data-intensive applications, current GPUs support Unified Virtual Memory (UVM), which provides a virtual memory abstraction with demand paging. By allowing applications to oversubscribe GPU memory, UVM provides increased opportunities to share GPU resources across applications. However, in the presence of applications with competing memory requirements, GPU sharing can lead to performance degradation due to thrashing. NVIDIA's Multiple Process Service (MPS) offers the capability to space share bare metal GPUs,more »thereby enabling cluster workload managers, such as Slurm, to share a single GPU across MPI ranks with limited control over resource partitioning. However, it is not possible to preempt, schedule, or throttle a running GPU process through MPS. These features would enable new OS-managed scheduling policies to be implemented for GPU kernels to dynamically handle resource contention and offer consistent performance. The contribution of this paper is two-fold. We first show how memory oversubscription can impact the performance of concurrent GPU applications. Then, we propose three methods to transparently mitigate memory interference through kernel preemption and scheduling policies. To implement our policies, we develop our own runtime system (PILOT) to serve as an alternative to NVIDIA's MPS. In the presence of memory over-subscription, we noticed a dramatic improvement in the overall throughput when using our scheduling policies and runtime hints.« less
    Free, publicly-accessible full text available January 1, 2023
  3. Free, publicly-accessible full text available December 1, 2022
  4. The field of transient astronomy has seen a revolution with the first gravitational-wave detections and the arrival of multi-messenger observations they enabled. Transformed by the first detection of binary black hole and binary neutron star mergers, computational demands in gravitational-wave astronomy are expected to grow by at least a factor of two over the next five years as the global network of kilometer-scale interferometers are brought to design sensitivity. With the increase in detector sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important as an enabler of multi-messenger followup. In this work, we report a novel implementation and deploymentmore »of deep learning inference for real-time gravitational-wave data denoising and astrophysical source identification. This is accomplished using a generic Inference-as-a-Service model that is capable of adapting to the future needs of gravitational-wave data analysis. Our implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private (dedicated) as-a-service computing. Based on our results, we propose a paradigm shift in low-latency and offline computing in gravitational-wave astronomy. Such a shift can address key challenges in peak-usage, scalability and reliability, and provide a data analysis platform particularly optimized for deep learning applications. The achieved sub-millisecond scale latency will also be relevant for any machine learning-based real-time control systems that may be invoked in the operation of near-future and next generation ground-based laser interferometers, as well as the front-end collection, distribution and processing of data from such instruments.« less
  5. Abstract

    In pursuit of scientific discovery, vast collections of unstructured structural and functional images are acquired; however, only an infinitesimally small fraction of this data is rigorously analyzed, with an even smaller fraction ever being published. One method to accelerate scientific discovery is to extract more insight from costly scientific experiments already conducted. Unfortunately, data from scientific experiments tend only to be accessible by the originator who knows the experiments and directives. Moreover, there are no robust methods to search unstructured databases of images to deduce correlations and insight. Here, we develop a machine learning approach to create image similaritymore »projections to search unstructured image databases. To improve these projections, we develop and train a model to include symmetry-aware features. As an exemplar, we use a set of 25,133 piezoresponse force microscopy images collected on diverse materials systems over five years. We demonstrate how this tool can be used for interactive recursive image searching and exploration, highlighting structural similarities at various length scales. This tool justifies continued investment in federated scientific databases with standardized metadata schemas where the combination of filtering and recursive interactive searching can uncover synthesis-structure-property relations. We provide a customizable open-source package ( of this interactive tool for researchers to use with their data.

    « less
  6. Free, publicly-accessible full text available September 1, 2022
  7. In this community review report, we discuss applications and techniques for fast machine learning (ML) in science—the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions canmore »be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.« less
    Free, publicly-accessible full text available April 12, 2023
  8. We present a simple, yet effective, auxiliary learning task for the problem of neuron segmentation in electron microscopy volumes. The auxiliary task consists of the prediction of Local Shape Descriptors (LSDs), which we combine with conventional voxel-wise direct neighbor affinities for neuron boundary detection. The shape descriptors are designed to capture local statistics about the neuron to be segmented, such as diameter, elongation, and direction.On a large study comparing several existing methods across various specimen, imaging techniques, and resolutions, we find that auxiliary learning of LSDs consistently increases segmentation accuracy of affinity-based methods over a range of metrics. Furthermore, themore »addition of LSDs promotes affinity-based segmentation methods to be on par with the current state of the art for neuron segmentation (Flood-Filling Networks,FFN), while being two orders of magnitudes more efficient—a critical requirement for the processing of future petabyte-sized datasets. Implementations of the new auxiliary learning task,network architectures, training, prediction, and evaluation code, as well as the datasets used in this study are publicly available as a benchmark for future method contributions.« less