skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: FastZ: accelerating gapped whole genome alignment on GPUs
Recognizing the importance of whole genome alignment (WGA), the National Institutes for Health maintains LASTZ, a sequential WGA application. As genomic data grows, there is a compelling need for scalable, high-performance WGA. Unfortunately, high-sensitivity, `gapped' alignment which uses dynamic programming (DP) is slow, whereas faster alignment with ungapped filtering is often less sensitive. We develop FastZ, a GPU-accelerated, gapped WGA software which matches gapped LASTZ in sensitivity. FastZ employs a novel inspector-executor scheme in which (a) the lightweight inspector elides DP traceback except in common, extremely short alignments, where the inspector performs limited, eager traceback to eliminate the executor, and (b) executor trimming avoids unnecessary work. Further, FastZ employs register-based cyclic-buffering to drastically reduce memory traffic, and groups DP problems by size for load balance. FastZ running on an RTX 3080 GPU and our multicore implementation of LASTZ achieve 111x and 20x speedups over the sequential LASTZ, respectively.  more » « less
Award ID(s):
1633412
PAR ID:
10337858
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Page Range / eLocation ID:
1 to 13
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The inspector/executor paradigm permits using runtime information in concert with compiler optimization. An inspector collects information that is only available at runtime; this information is used by an optimized executor that was created at compile time. Inspectors are widely used in optimizing irregular computations, where information about data dependences, loop bounds, data structures, and memory access pa erns are collected at runtime and used to guide code transformation, parallelization, and data layout. Most research that uses inspectors relies on instantiating inspector templates, invoking inspector library code, or manually writing inspectors. is paper describes abstractions for generating inspectors for loop and data transformations for sparse matrix computations using the Sparse Polyhedral Framework (SPF). SPF is an extension of the polyhedral framework for transformation and code generation. SPF extends the polyhedral framework to represent runtime information with uninterpreted functions and inspector computations that explicitly realize such functions at runtime. It has previously been used to derive inspectors for data and iteration space reordering. is paper introduces data transformations into SPF, such as conversions between sparse matrix formats, and show how prior work can be supported by SPF. We also discuss possible extensions to support inspector composition and incorporate other optimizations. is work represents a step towards creating composable inspectors in keeping with the composability of a ne transformations on the executors. 
    more » « less
  2. Cathie Olschanowsky (Ed.)
    Sparse computations are important in scientific computing. Many scientific applications compute on sparse data. Data is said to be sparse if it has a relatively small number of non-zeros. Sparse formats use auxiliary arrays to store non-zeros, as a result, the contents of auxiliary arrays are not known until run-time. The Inspector/Executor (I/E) paradigm uses run-time information for compiler optimizations. An inspector computes information at run-time to drive transformations. The executor—a compile-time transformation of the original code— uses information computed by the inspector. The sparse polyhedral framework (SPF) encompasses a series of tools to support I/E run-time transformations. This work introduces a unified framework that wraps SPF tools while providing a holistic view of computation as an intermediate representation (IR). This work also introduces a method to automatically synthesize inspectors to transform between sparse formats and improvements to SPF to explore the performance of irregular applications. 
    more » « less
  3. The GraphBLAS C specification provisional release 1.0 is complete. To manage the scope of the project, we had to defer important functionality to a future version of the specification. For example, we are well aware that many algorithms benefit from an inspector-executor execution strategy. We also know that users would benefit from a number of standard predefined semirings as well as more general user-defined types. These and other features are described in this paper in the context of a future release of the GraphBLAS C API. 
    more » « less
  4. Machine Learning (ML) opens exciting scientific opportunities in K-12 STEM classrooms. However, students struggle with interpreting ML patterns due to limited data literacy. Face glyphs offer unique benefit by leveraging our brain’s facial feature processing. Yet, they have limitations like lacking contextual information and data biases. To address this, we created three enhanced face glyph visualizations: feature-independent and feature-aligned range views, and the sequential feature inspector. In a study with 25 high school students, feature-aligned range visualization helped contextual analysis, and the sequential feature inspector reduced missing data risks. Face glyphs also benefit the global interpretation of data. 
    more » « less
  5. There has been a growing interest in using GPU to accelerate data analytics due to its massive parallelism and high memory bandwidth. The main constraint of using GPU for data analytics is the limited capacity of GPU memory. Heterogeneous CPU-GPU query execution is a compelling approach to mitigate the limited GPU memory capacity and PCIe bandwidth. However, the design space of heterogeneous CPU-GPU query execution has not been fully explored. We aim to improve state-of-the-art CPU-GPU data analytics engine by optimizing data placement and heterogeneous query execution. First, we introduce a semantic-aware fine-grained caching policy which takes into account various aspects of the workload such as query semantics, data correlation, and query frequency when determining data placement between CPU and GPU. Second, we introduce a heterogeneous query executor which can fully exploit data in both CPU and GPU and coordinate query execution at a fine granularity. We integrate both solutions in Mordred, our novel hybrid CPU-GPU data analytics engine. Evaluation on the Star Schema Benchmark shows that the semantic-aware caching policy can outperform the best traditional caching policy by up to 3x. Compared to existing GPU DBMSs, Mordred can outperform by an order of magnitude. 
    more » « less