

Title: Moving Just Enough Deep Sequencing Data to Get the Job Done
Motivation: As the size of high-throughput DNA sequence datasets continues to grow, the cost of transferring and storing them may prevent their processing in all but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest.

Results: Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of detected RNA transcripts in an RNA-Seq workflow. We used transcript detection to decide on a cutoff point. We then physically transferred the minimal partial dataset and compared it with the transfer of the full dataset, which showed a reduction of approximately 25% in total transfer time. These results suggest that as sequencing datasets get larger, one way to speed up analysis is simply to transfer the minimal amount of data that still sufficiently detects the biological signal.

Availability: All results were generated using public datasets from NCBI and publicly available open-source software.
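The subsampling-and-cutoff idea described above can be sketched in a few lines of Python. The sketch below is illustrative only: the function names (subsample_fastq, choose_cutoff), the 5% tolerance, and the example transcript counts are all hypothetical, and the per-fraction counts would in practice come from re-running the RNA-Seq workflow on each subsample. Dedicated tools such as seqtk sample handle the subsampling step more efficiently; the point here is only the cutoff logic.

```python
# Hedged sketch: subsample a FASTQ file and pick the smallest fraction whose
# detected-transcript count stays within a tolerance of the full dataset.
import gzip
import random

def subsample_fastq(in_path, out_path, fraction, seed=42):
    """Write a random ~`fraction` subset of FASTQ records (4 lines each)."""
    rng = random.Random(seed)
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:                  # end of file
                break
            if rng.random() < fraction:
                fout.writelines(record)

def choose_cutoff(detected_by_fraction, full_count, tolerance=0.05):
    """Smallest sampled fraction whose detected-transcript count is within
    `tolerance` of the count obtained from the full dataset."""
    for frac in sorted(detected_by_fraction):
        if detected_by_fraction[frac] >= (1 - tolerance) * full_count:
            return frac
    return 1.0

# Made-up numbers: detected transcripts for each sampled fraction.
counts = {0.1: 18000, 0.25: 23500, 0.5: 24800, 0.75: 25000}
print(choose_cutoff(counts, full_count=25100))   # -> 0.5
```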
Award ID(s):
1659300
NSF-PAR ID:
10132575
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Bioinformatics and Biology Insights
Volume:
13
ISSN:
1177-9322
Page Range / eLocation ID:
117793221985635
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Summary

    With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. The impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has typically not been widely explored, and assessments based on simulation tend to start from an assumed count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may have. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction (PCR) amplification, cellular barcode (CB) and UMI selection, and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines for producing gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification, and show a typical use case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment.
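    Because the abstract emphasizes how raw reads are resolved into UMI counts, the following hypothetical Python sketch shows the simplest possible UMI-deduplication step performed by downstream quantification pipelines: collapse reads sharing the same cell barcode (CB), UMI, and gene into one molecule. It is not part of minnow and ignores the UMI sequencing errors and gene-level ambiguity that minnow is designed to model.

    ```python
    # Naive UMI deduplication (illustrative only): count unique (CB, UMI, gene)
    # combinations, then tally molecules per cell and gene.
    from collections import defaultdict

    def count_molecules(alignments):
        """alignments: iterable of (cell_barcode, umi, gene) tuples, one per read."""
        molecules = {(cb, umi, gene) for cb, umi, gene in alignments}  # drop PCR duplicates
        counts = defaultdict(lambda: defaultdict(int))
        for cb, umi, gene in molecules:
            counts[cb][gene] += 1              # one count per unique molecule
        return counts

    reads = [
        ("CELL1", "AACGT", "geneA"),
        ("CELL1", "AACGT", "geneA"),   # PCR duplicate: same CB, UMI, and gene
        ("CELL1", "GGTAC", "geneA"),
        ("CELL2", "AACGT", "geneB"),
    ]
    print({cb: dict(genes) for cb, genes in count_molecules(reads).items()})
    # e.g. {'CELL1': {'geneA': 2}, 'CELL2': {'geneB': 1}}
    ```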

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    DNA sequencing plays an important role in the bioinformatics research community and matters for all organisms, especially humans, from multiple perspectives. These include understanding how specific mutations increase or decrease the risk of developing a disease or condition, and finding the implications of, and connections between, genotype and phenotype. Advancements in high-throughput sequencing techniques, tools, and equipment have helped generate big genomic datasets, owing to the tremendous decrease in DNA sequencing costs. However, these advancements have posed great challenges for genomic data storage, analysis, and transfer. Accessing, manipulating, and sharing big genomic datasets present major challenges in terms of time and size, as well as privacy. Because data size plays a central role in these challenges, data minimization techniques have recently attracted much interest in the bioinformatics research community, and it is critical to develop new ways to minimize data size. This paper presents a new real-time data minimization mechanism for big genomic datasets that shortens transfer time in a more secure manner, despite the potential occurrence of a data breach. Our method applies random sampling from Fourier transform theory to real-time-generated big genomic datasets in both the FASTA and FASTQ formats and assigns the shortest possible codewords to the most frequent characters of the datasets. Our results indicate that the proposed data minimization algorithm reduces the size of FASTA datasets by up to 79% while being 98-fold faster and more secure than the standard data-encoding method, and reduces the size of FASTQ datasets by up to 45% while being 57-fold faster than the standard data-encoding approach. Based on these results, we conclude that the proposed data minimization algorithm provides the best performance among current data-encoding approaches for big, real-time-generated genomic datasets.
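    The assign-the-shortest-codewords-to-the-most-frequent-characters idea can be illustrated with a small, hypothetical Python sketch that uses classic Huffman coding over exact character counts. This is not the paper's algorithm, which estimates frequencies via random sampling from Fourier transform theory and targets real-time streams; the sketch only shows the codeword-assignment principle.

    ```python
    # Illustrative only: frequency-ranked prefix codes for a DNA-like string.
    # Frequent symbols receive shorter bit-strings; rare symbols receive longer ones.
    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Map each symbol to a prefix-free bit-string based on its frequency."""
        counts = Counter(text)
        heap = [(c, i, sym) for i, (sym, c) in enumerate(counts.items())]
        heapq.heapify(heap)
        if len(heap) == 1:                      # degenerate single-symbol input
            return {heap[0][2]: "0"}
        tiebreak = len(heap)                    # keeps tuple comparisons well-defined
        while len(heap) > 1:
            c1, _, left = heapq.heappop(heap)
            c2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (c1 + c2, tiebreak, (left, right)))
            tiebreak += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):         # internal node
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                               # leaf symbol
                codes[node] = prefix
        walk(heap[0][2], "")
        return codes

    seq = "ACGTACGTAAAACCGGGGGGGGTT"            # toy FASTA-like sequence
    codes = huffman_codes(seq)
    encoded = "".join(codes[ch] for ch in seq)
    print(codes, len(encoded), "bits vs", 8 * len(seq), "bits as ASCII")
    ```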
    Over the past decade, museum genomics studies have focused on obtaining DNA of sufficient quality and quantity for sequencing from fluid-preserved natural history specimens, primarily for use in systematic studies. While these studies have opened windows onto the evolutionary and biodiversity knowledge of many species worldwide, published works often focus on the success of these DNA sequencing efforts, which is undoubtedly less common than obtaining minimal or sometimes no DNA or unusable sequence data from specimens in natural history collections. Here, we attempt to obtain and sequence DNA extracts from 115 fresh and 41 degraded samples of homalopsid snakes, as well as from two degraded samples of a poorly known snake, Hydrablabes periops. Hydrablabes has been suggested to belong to at least two different families (Natricidae and Homalopsidae), and with no fresh tissues known to be available, intractable museum specimens currently provide the only opportunity to determine this snake's taxonomic affinity. Although our aim was to generate a target-capture dataset for these samples, to be included in a broader phylogenetic study, results were less than ideal due to large amounts of missing data, especially when using the same downstream methods as for standard, high-quality samples. However, rather than discount the results entirely, we used mapping methods with references and pseudoreferences, along with phylogenetic analyses, to maximize any usable molecular data from our sequencing efforts, identify the taxonomic affinity of H. periops, and compare sequencing success between fresh and degraded tissue samples. This resulted in largely complete mitochondrial genomes for five specimens and hundreds to thousands of nuclear loci (ultra-conserved loci, anchored-hybrid enrichment loci, and a variety of loci frequently used in squamate phylogenetic studies) from fluid-preserved snakes, including a specimen of H. periops from the Field Museum of Natural History collection. We combined our H. periops data with previously published genomic and Sanger-sequenced datasets to confirm the familial designation of this taxon, reject previous taxonomic hypotheses, and make biogeographic inferences for Hydrablabes. A second H. periops specimen, despite seemingly similar initial raw sequencing results and processing through the same protocols, yielded little usable molecular data. We discuss the successes and failures of using different pipelines and methods to maximize the products from these data and provide expectations for others who are looking to use DNA sequencing efforts on specimens that likely have degraded DNA. Life Science Identifier (Hydrablabes periops): urn:lsid:zoobank.org:pub:F2AA44E2-D2EF-4747-972A-652C34C2C09D.
    In the age of Big Genomics Data, institutions such as the National Human Genome Research Institute (NHGRI) are challenged in their efforts to share volumes of data between researchers, a process that has been plagued by unreliable transfers and slow speeds. These problems arise from throughput bottlenecks of traditional transfer technologies. Two factors that affect the efficiency of data transmission are the channel bandwidth and the amount of data. Increasing the bandwidth is one way to transmit data efficiently, but it might not always be possible due to resource limitations. Another way to maximize channel utilization is to decrease the number of bits needed to transmit a dataset. Traditionally, transmission of big genomic data between two geographical locations is done using general-purpose protocols, such as hypertext transfer protocol (HTTP) and secure file transfer protocol (FTP). In this paper, we present a novel deep learning-based data minimization algorithm that 1) minimizes the datasets during transfer over the carrier channels, and 2) protects the data from man-in-the-middle (MITM) and other attacks by changing the binary representation (content-encoding) several times for the same dataset: we assign different codewords to the same character in different parts of the dataset. Our data minimization strategy exploits the limited alphabet of DNA sequences and modifies the binary representation (codeword) of dataset characters using a deep learning-based convolutional neural network (CNN), ensuring that minimal codewords are assigned to the highest-frequency characters at different time slots during the transfer. The algorithm transmits big genomic DNA datasets with minimal bits and latency, yielding an efficient and expedient process. Our tested heuristic model, simulation, and real implementation results indicate that the proposed data minimization algorithm is up to 99 times faster and more secure than the content-encoding scheme currently used in HTTP, and 96 times faster than FTP, on the tested datasets. The protocol, developed in C#, will be available to the wider genomics community and domain scientists.
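    To make the rotating-codebook idea concrete, here is a small, hypothetical Python sketch in which each chunk (time slot) of the data gets its own frequency-ranked codeword table, so the same character is encoded differently in different parts of the dataset. Direct per-chunk counting stands in for the paper's CNN-based frequency prediction, and no claim is made about matching its security or performance figures.

    ```python
    # Illustrative only: per-chunk ("time slot") codebooks so the same character
    # maps to different bit-strings in different parts of the dataset.
    from collections import Counter

    # Hypothetical prefix-free codewords, shortest first (rank 0 = most frequent).
    CODEWORDS = ["0", "10", "110", "1110", "11110"]

    def chunk_codebook(chunk):
        """Give the shortest codewords to the most frequent characters of a chunk."""
        ranked = [sym for sym, _ in Counter(chunk).most_common()]
        return {sym: CODEWORDS[i] for i, sym in enumerate(ranked)}

    def encode_rotating(data, chunk_size=1024):
        """Encode each chunk with its own codebook; the receiver needs both."""
        encoded = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            book = chunk_codebook(chunk)
            bits = "".join(book[ch] for ch in chunk)
            encoded.append((book, bits))        # ship (codebook, bits) per chunk
        return encoded

    toy = "AAAACCGT" * 300 + "GGGGTTAC" * 300   # character frequencies shift midway
    chunks = encode_rotating(toy, chunk_size=1200)
    print(chunks[0][0])                          # codebook for the first time slot
    print(chunks[-1][0])                         # a different codebook later on
    ```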
  5. David, Lawrence A. (Ed.)
    ABSTRACT Shotgun metagenomic sequencing has transformed our understanding of microbial community ecology. However, preparing metagenomic libraries for high-throughput DNA sequencing remains a costly, labor-intensive, and time-consuming procedure, which in turn limits the utility of metagenomes. Several library preparation procedures have recently been developed to offset these costs, but it is unclear how these newer procedures compare to current standards in the field. In particular, it is not clear if all such procedures perform equally well across different types of microbial communities or if features of the biological samples being processed (e.g., DNA amount) impact the accuracy of the approach. To address these questions, we assessed how five different shotgun DNA sequence library preparation methods, including the commonly used Nextera Flex kit, perform when applied to metagenomic DNA. We measured each method’s ability to produce metagenomic data that accurately represent the underlying taxonomic and genetic diversity of the community. We performed these analyses across a range of microbial community types (e.g., soil, coral associated, and mouse gut associated) and input DNA amounts. We find that the type of community and amount of input DNA influence each method’s performance, indicating that careful consideration may be needed when selecting between methods, especially for low-complexity communities. However, the cost-effective preparation methods that we assessed are generally comparable to the current gold-standard Nextera DNA Flex kit for high-complexity communities. Overall, the results from this analysis will help expand and even facilitate access to metagenomic approaches in future studies. IMPORTANCE Metagenomic library preparation methods and sequencing technologies continue to advance rapidly, allowing researchers to characterize microbial communities in previously underexplored environmental samples and systems. However, widely accepted standardized library preparation methods can be cost-prohibitive. Newly available approaches may be less expensive, but their efficacy in comparison to standardized methods remains unknown. In this study, we compared five different metagenomic library preparation methods. We evaluated each method across a range of microbial communities varying in complexity and quantity of input DNA. Our findings demonstrate the importance of considering sample properties, including community type, composition, and DNA amount, when choosing the most appropriate metagenomic library preparation method. 