Title: GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data
Award ID(s): 1925960
PAR ID: 10090849
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Computers in Biology and Medicine
Volume: 101
Issue: C
ISSN: 0010-4825
Page Range / eLocation ID: 163 to 173
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the log-structured merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations lookup, count, and range query at average rates of 75 M, 32 M, and 23 M queries/s, respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU.
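     To make the LSM idea concrete, here is a minimal sketch of a batched GPU dictionary built from Thrust sort/merge primitives. The names (GpuLsmSketch, insert_batch) and the level structure are illustrative assumptions, not the paper's implementation, and the sketch omits the deletion tombstones and duplicate-key resolution that a real GPU LSM needs.

```cuda
// Minimal sketch of an LSM-style GPU dictionary built on Thrust primitives.
// Assumptions: names and layout are hypothetical; deletions (tombstones) and
// duplicate-key resolution, which the real GPU LSM supports, are omitted.
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/merge.h>
#include <thrust/binary_search.h>
#include <cstdint>
#include <utility>
#include <vector>

struct GpuLsmSketch {
    // Level i holds one sorted run; newer data lives in lower levels.
    std::vector<thrust::device_vector<uint32_t>> level_keys, level_vals;

    // Insert a batch: sort it on the GPU, then cascade-merge it through
    // occupied levels until an empty level (or a new one) absorbs the run.
    void insert_batch(thrust::device_vector<uint32_t> keys,
                      thrust::device_vector<uint32_t> vals) {
        thrust::sort_by_key(keys.begin(), keys.end(), vals.begin());
        size_t lvl = 0;
        while (lvl < level_keys.size() && !level_keys[lvl].empty()) {
            thrust::device_vector<uint32_t> mk(keys.size() + level_keys[lvl].size());
            thrust::device_vector<uint32_t> mv(mk.size());
            thrust::merge_by_key(keys.begin(), keys.end(),
                                 level_keys[lvl].begin(), level_keys[lvl].end(),
                                 vals.begin(), level_vals[lvl].begin(),
                                 mk.begin(), mv.begin());
            level_keys[lvl].clear();
            level_vals[lvl].clear();
            keys.swap(mk);
            vals.swap(mv);
            ++lvl;
        }
        if (lvl == level_keys.size()) {
            level_keys.emplace_back();
            level_vals.emplace_back();
        }
        level_keys[lvl] = std::move(keys);
        level_vals[lvl] = std::move(vals);
    }

    // Lookup: binary-search each non-empty level, newest first.
    bool lookup(uint32_t key, uint32_t& value_out) {
        for (size_t i = 0; i < level_keys.size(); ++i) {
            if (level_keys[i].empty()) continue;
            auto it = thrust::lower_bound(level_keys[i].begin(),
                                          level_keys[i].end(), key);
            if (it != level_keys[i].end() && *it == key) {
                value_out = level_vals[i][it - level_keys[i].begin()];
                return true;
            }
        }
        return false;
    }
};
```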
  2. We propose a new static high-performance mesh data structure for triangle surface meshes on the GPU. Our data structure is carefully designed for parallel execution while capturing mesh locality and confining data access, as much as possible, within the GPU's fast shared memory. We achieve this by subdividing the mesh into patches and representing these patches compactly using a matrix-based representation. Our patching technique is decorated with ribbons, thin mesh strips around patches that eliminate the need to communicate between different computation thread blocks, resulting in consistent high throughput. We call our data structure RXMesh: Ribbon-matriX Mesh. We hide the complexity of our data structure behind a flexible but powerful programming model that helps deliver high performance by inducing load balance even in highly irregular input meshes. We show the efficacy of our programming model on common geometry processing applications: mesh smoothing and filtering, geodesic distance, and vertex normal computation. For evaluation, we benchmark our data structure against well-optimized GPU and (single and multi-core) CPU data structures and show significant speedups.
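     A hypothetical sketch of the patch-per-thread-block pattern this abstract describes is shown below: each block stages one patch's face-vertex matrix (patch-local 16-bit indices, ribbon vertices included) into shared memory and computes a per-vertex quantity without touching other blocks' data. The PatchDesc layout, size caps, and the incident-triangle-count example are assumptions, not the actual RXMesh API.

```cuda
// Hypothetical patch-per-block kernel in the spirit of a ribbon/matrix layout.
// Assumptions: PatchDesc fields, size caps, and the incident-triangle-count
// example are illustrative only; the ribbon is assumed to duplicate the
// neighboring faces so owned vertices see all their incident faces locally.
#include <cstdint>

constexpr int MAX_PATCH_FACES = 512;   // assumed per-patch cap
constexpr int MAX_PATCH_VERTS = 512;

struct PatchDesc {
    const uint16_t* face_vertices;     // 3 * num_faces patch-local vertex indices
    int num_faces;                     // includes ribbon (halo) faces
    int num_owned_vertices;            // owned vertices come first, ribbon after
    uint32_t* tri_count_out;           // output slots for owned vertices only
};

__global__ void patch_incident_tri_count(const PatchDesc* patches) {
    __shared__ uint16_t s_fv[3 * MAX_PATCH_FACES];
    __shared__ uint32_t s_cnt[MAX_PATCH_VERTS];

    PatchDesc p = patches[blockIdx.x];

    // Stage the patch's face-vertex matrix into fast shared memory.
    for (int i = threadIdx.x; i < 3 * p.num_faces; i += blockDim.x)
        s_fv[i] = p.face_vertices[i];
    for (int v = threadIdx.x; v < MAX_PATCH_VERTS; v += blockDim.x)
        s_cnt[v] = 0;
    __syncthreads();

    // Accumulate entirely in shared memory; no cross-block communication.
    for (int f = threadIdx.x; f < p.num_faces; f += blockDim.x)
        for (int k = 0; k < 3; ++k)
            atomicAdd(&s_cnt[s_fv[3 * f + k]], 1u);
    __syncthreads();

    // Write back only the vertices this patch owns; ribbon copies are discarded.
    for (int v = threadIdx.x; v < p.num_owned_vertices; v += blockDim.x)
        p.tri_count_out[v] = s_cnt[v];
}
```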
  3. Utilizing memory and register bandwidth in modern architectures may require swizzles (non-trivial mappings of data and computations onto hardware resources, such as shuffles). We develop Swizzle Inventor to help programmers implement swizzle programs, by writing program sketches that omit swizzles and delegating their creation to an automatic synthesizer. Our synthesis algorithm scales to real-world programs, allowing us to invent new GPU kernels for stencil computations, matrix transposition, and a finite field multiplication algorithm (used in cryptographic applications). The synthesized 2D convolution and finite field multiplication kernels are on average 1.5–3.2x and 1.1–1.7x faster, respectively, than expert-optimized CUDA kernels.
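     For intuition about the target programs, the hand-written kernel below shows the kind of shuffle-based swizzle being synthesized: a 3-point stencil that exchanges neighbor values through warp shuffles instead of shared memory. This is an illustrative example, not one of the paper's synthesized kernels, and it assumes the block size is a multiple of 32; the synthesizer's job is to discover lane arithmetic like this automatically from a sketch that leaves the shuffle arguments as holes.

```cuda
// Hand-written shuffle-based swizzle: a 3-point 1D stencil whose neighbor
// exchange happens through warp shuffles rather than shared memory.
// Assumption: blockDim.x is a multiple of 32 so full-warp masks are valid.
__global__ void stencil3_shfl(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Keep the whole warp active; out-of-range threads carry a dummy value.
    float c = (i < n) ? in[i] : 0.0f;
    const unsigned full = 0xffffffffu;
    // The "swizzle": neighbors arrive via register exchange, not shared memory.
    float l = __shfl_up_sync(full, c, 1);
    float r = __shfl_down_sync(full, c, 1);
    int lane = threadIdx.x & 31;
    // Warp-boundary and array-boundary lanes fall back to global loads.
    if (lane == 0)                 l = (i > 0)     ? in[i - 1] : c;
    if (lane == 31 || i + 1 >= n)  r = (i + 1 < n) ? in[i + 1] : c;
    if (i < n) out[i] = 0.25f * l + 0.5f * c + 0.25f * r;
}
```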
  4. Studying dynamic functional connectivity (DFC) using fMRI data of the brain gives neuroscientists much richer information than studying the brain as a static entity. Mining dynamic connectivity graphs from these brain studies can be used to classify diseased versus healthy brains. However, constructing and mining dynamic functional connectivity graphs of the brain can be time consuming due to the size of fMRI data. In this paper, we propose a highly scalable GPU-based parallel algorithm called GPU-DFC for computing dynamic functional connectivity of fMRI data at both the region and voxel level. Our algorithm exploits sparsification of the correlation matrices and stores them in CSR format. Further reduction of the correlation matrices is achieved by parallel decomposition techniques. Our GPU-DFC algorithm achieves a 2x speed-up for computing dynamic correlations compared to state-of-the-art GPU-based techniques and more than 40x compared to a sequential CPU version. In terms of storage, our proposed matrix decomposition technique reduces the size of the correlation matrices by more than 100x. Values reconstructed from the decomposed matrices are comparable to the correlations computed from the original data. The implemented code is available under the GPL license on our lab's GitHub portal (https://github.com/pcdslab/GPU-DFC).
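     As a rough illustration of the per-window step (not the released GPU-DFC kernel, which additionally performs the CSR compression and matrix-decomposition stages), the sketch below assigns one thread per pair of time series, computes the Pearson correlation over the current sliding window, and zeroes out entries below a sparsification threshold. Names, data layout, and the threshold parameter are assumptions.

```cuda
// Hypothetical per-window correlation kernel with threshold sparsification.
// Assumptions: names, row-major [n_series x n_time] layout, and the threshold
// are illustrative; CSR conversion and decomposition would follow separately.
__global__ void windowed_correlation(const float* __restrict__ ts,   // [n_series x n_time]
                                     int n_series, int n_time,
                                     int win_start, int win_len,
                                     float threshold,
                                     float* __restrict__ corr)       // [n_series x n_series]
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_series || j >= n_series || j < i) return;

    const float* x = ts + (size_t)i * n_time + win_start;
    const float* y = ts + (size_t)j * n_time + win_start;

    // Accumulate the sums needed for Pearson's r over the current window.
    float sx = 0.f, sy = 0.f, sxx = 0.f, syy = 0.f, sxy = 0.f;
    for (int t = 0; t < win_len; ++t) {
        float a = x[t], b = y[t];
        sx += a; sy += b; sxx += a * a; syy += b * b; sxy += a * b;
    }
    float n   = (float)win_len;
    float cov = sxy - sx * sy / n;
    float vx  = sxx - sx * sx / n;
    float vy  = syy - sy * sy / n;
    float r   = (vx > 0.f && vy > 0.f) ? cov * rsqrtf(vx * vy) : 0.f;

    // Sparsify: drop weak correlations so the matrix can be stored in CSR form.
    float val = (fabsf(r) >= threshold) ? r : 0.f;
    corr[(size_t)i * n_series + j] = val;
    corr[(size_t)j * n_series + i] = val;
}
```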