

Title: Exploring Lossy Compression of Gene Expression Matrices
Gene Expression Matrices (GEMs) are a fundamental data type in the genomics domain. As the size and scope of genomics experiments increase, researchers are struggling to process large GEMs through downstream workflows with currently accepted practices. In this paper, we propose a methodology to reduce the size of GEMs using multiple approaches. Our method partitions data into discrete fields based on data type and employs state-of-the-art lossless and lossy compression algorithms to reduce the input data size. This work explores a variety of lossless and lossy compression methods to determine which methods work best for each component of a GEM. We evaluate the accuracy of the compressed GEMs by running them through the Knowledge Independent Network Construction (KINC) workflow and comparing the quality of the resulting gene co-expression network with a lossless control to verify result fidelity. Results show that a combination of lossy and lossless compression achieves compression ratios up to 9.77× on a yeast GEM while still preserving the biological integrity of the data. Applying the methodology to the Cancer Cell Line Encyclopedia (CCLE) GEM resulted in compression ratios up to 9.26×. By using this methodology, researchers in the genomics domain may be able to process previously inaccessible GEMs while realizing a significant reduction in computational costs.
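As a rough illustration of the field-partitioning idea described in the abstract, the Python sketch below separates a GEM's gene identifiers (kept lossless) from its expression values (quantized lossily) and compresses each part. The helper names, the use of gzip for the lossless stage, and rounding to a fixed number of decimals for the lossy stage are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch: split a GEM into typed fields and compress each one differently.
# gzip for the text field and decimal rounding for the numeric field are
# stand-ins for the lossless/lossy compressors used in the paper.
import gzip
import numpy as np

def compress_gem(gene_names, expression, decimals=2):
    """gene_names: list[str] row labels; expression: 2-D float array."""
    # Lossless: gene identifiers must survive exactly, so gzip the raw text.
    labels_blob = gzip.compress("\n".join(gene_names).encode("utf-8"))
    # Lossy: round to a fixed number of decimals, narrow to float32, then gzip.
    quantized = np.round(expression, decimals).astype(np.float32)
    values_blob = gzip.compress(quantized.tobytes())
    return labels_blob, values_blob, quantized.shape

def decompress_gem(labels_blob, values_blob, shape):
    gene_names = gzip.decompress(labels_blob).decode("utf-8").split("\n")
    values = np.frombuffer(gzip.decompress(values_blob), dtype=np.float32).reshape(shape)
    return gene_names, values

if __name__ == "__main__":
    genes = [f"gene_{i:05d}" for i in range(1000)]                 # placeholder IDs
    gem = np.random.lognormal(mean=2.0, sigma=1.0, size=(1000, 50))
    lb, vb, shape = compress_gem(genes, gem)
    raw_bytes = gem.nbytes + sum(len(g) + 1 for g in genes)
    print(f"approximate compression ratio: {raw_bytes / (len(lb) + len(vb)):.2f}x")
```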
Award ID(s):
1659300 1910197
NSF-PAR ID:
10132572
Journal Name:
2019 IEEE/ACM 5th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-5)
Page Range / eLocation ID:
28 to 34
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach where no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but also allows for exploring non-linear relationships between genes by leveraging deep learning image analysis algorithms. Using the learned embedded feature space, we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and non-conventional interdependent non-linear behavior associated with brain-specific tumor sub-types that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available at https://github.com/bhusain/EdgeScaping under the MIT license.
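A minimal sketch of the pairwise-to-image conversion idea, assuming a plain 2-D histogram over normalized expression values as the image encoding; the 32x32 resolution and function names are illustrative, and EdgeScaping's actual conversion may differ.

```python
# Sketch: render the joint expression of two genes across samples as a small
# image suitable for an image-analysis (CNN-style) feature extractor.
import numpy as np

def pair_to_image(gene_a, gene_b, bins=32):
    """Return a bins x bins image of the pairwise expression distribution."""
    def norm(x):
        # Normalize each gene's expression to [0, 1] so the image spans the grid.
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    hist, _, _ = np.histogram2d(norm(gene_a), norm(gene_b),
                                bins=bins, range=[[0, 1], [0, 1]])
    # Scale counts to [0, 1] so every pair image is directly comparable.
    return hist / hist.max() if hist.max() > 0 else hist

if __name__ == "__main__":
    samples = 1000
    a = np.random.lognormal(size=samples)
    b = np.sqrt(a) + np.random.normal(scale=0.1, size=samples)  # non-linear relationship
    img = pair_to_image(a, b)
    print(img.shape)  # (32, 32)
```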
  2. High-performance computing (HPC) systems that run scientific simulations of significance produce a large amount of data during runtime. Transferring or storing such big datasets causes a severe I/O bottleneck and a considerable storage burden. Applying compression techniques, particularly lossy compressors, can reduce the size of the data and mitigate such overheads. Unlike lossless compression algorithms, error-controlled lossy compressors can significantly reduce the data size while respecting a user-defined error bound. DCTZ is a transform-based lossy compressor with a highly efficient encoding and a purpose-built error control mechanism that achieves high compression ratios with high data fidelity. However, since DCTZ quantizes the DCT coefficients in the frequency domain, it may only partially control the relative error bound defined by the user. In this paper, we aim to improve the compression quality of DCTZ. Specifically, we propose a preconditioning method based on level offsetting and scaling to control the magnitude of the input to the DCTZ framework, thereby enforcing stricter error bounds. We evaluate the performance of our method in terms of compression ratio and rate distortion with real-world HPC datasets. Our experimental results show that our method can achieve a higher compression ratio than other state-of-the-art lossy compressors with a tighter error bound while precisely guaranteeing the user-defined error bound.
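A minimal sketch, assuming a simple offset-and-scale preconditioner placed in front of a DCT-plus-uniform-quantization stage; it only illustrates how level offsetting and scaling normalize the input magnitude before the transform, not the actual DCTZ preconditioning scheme.

```python
# Sketch: precondition a block by level offsetting and scaling, then apply a
# DCT and a uniform quantizer; decompression inverts every step.
import numpy as np
from scipy.fft import dct, idct

def compress_block(x, quant_step=1e-3):
    """Precondition, transform, and quantize one 1-D block of doubles."""
    offset, scale = x.min(), max(x.max() - x.min(), np.finfo(float).tiny)
    y = (x - offset) / scale                    # preconditioned values in [0, 1]
    coeffs = dct(y, norm="ortho")               # frequency-domain representation
    q = np.round(coeffs / quant_step).astype(np.int32)   # lossy step
    return q, offset, scale, quant_step

def decompress_block(q, offset, scale, quant_step):
    coeffs = q.astype(np.float64) * quant_step
    y = idct(coeffs, norm="ortho")
    return y * scale + offset                   # undo the preconditioning

if __name__ == "__main__":
    block = np.random.normal(loc=1e5, scale=10.0, size=64)   # large-magnitude data
    rec = decompress_block(*compress_block(block))
    print("max abs error:", np.max(np.abs(block - rec)))
```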
  3. As the amount of data produced by HPC applications reaches the exabyte range, compression techniques are often adopted to reduce the checkpoint time and volume. Since lossless techniques are limited in their ability to achieve appreciable data reduction, lossy compression becomes a preferable option. In this work, a lossy compression technique with highly efficient encoding, purpose-built error control, and high compression ratios is proposed. Specifically, we apply a discrete cosine transform with a novel block decomposition strategy directly to double-precision floating-point datasets instead of relying on prevailing prediction-based techniques. Further, we design an adaptive quantization with two task-oriented quantizers: one that guarantees error bounds and one that targets higher compression ratios. Using real-world HPC datasets, our approach achieves 3x-38x compression ratios while guaranteeing specified error bounds, showing performance comparable to the state-of-the-art lossy compression methods SZ and ZFP. Moreover, our method provides viable reconstructed data for various checkpoint/restart scenarios in the FLASH application and is thus a promising approach for lossy data compression in HPC I/O software stacks.
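A minimal sketch of block-decomposed DCT compression with a pointwise error guarantee. The block size, the uniform quantizer, and the "patch out-of-bound values with exact corrections" step are illustrative choices and are not claimed to match the proposed compressor's internal mechanism.

```python
# Sketch: block the data, DCT each block, quantize the coefficients, then
# verify reconstruction and store exact values wherever the bound is violated.
import numpy as np
from scipy.fft import dct, idct

BLOCK = 64

def compress(data, error_bound=1e-3):
    """Compress a 1-D float64 array block by block, enforcing |x - x'| <= error_bound."""
    pad = (-len(data)) % BLOCK
    blocks = np.concatenate([data, np.zeros(pad)]).reshape(-1, BLOCK)
    step = error_bound                                   # assumed quantization step
    quant = np.round(dct(blocks, axis=1, norm="ortho") / step).astype(np.int32)

    # Guarantee the bound: reconstruct, find violations, store exact corrections.
    recon = idct(quant * step, axis=1, norm="ortho")
    bad = np.abs(recon - blocks) > error_bound
    corrections = {tuple(idx): blocks[tuple(idx)] for idx in np.argwhere(bad)}
    return quant, corrections, len(data), step

def decompress(quant, corrections, n, step):
    recon = idct(quant * step, axis=1, norm="ortho")
    for (i, j), exact in corrections.items():
        recon[i, j] = exact                              # patch violating points losslessly
    return recon.reshape(-1)[:n]

if __name__ == "__main__":
    x = np.cumsum(np.random.normal(size=10_000))         # smooth-ish HPC-like field
    q, corr, n, step = compress(x)
    xr = decompress(q, corr, n, step)
    print("max abs error:", np.max(np.abs(x - xr)), "| corrections stored:", len(corr))
```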
  4. As the scale and complexity of high-performance computing (HPC) systems keep growing, data compression techniques are often adopted to reduce the data volume and processing time. While lossy compression is preferable to lossless compression because of its potential for high compression ratios, it is only worth the effort if an optimal balance is found between volume reduction and information loss. Among the many lossy compression techniques, transform-based algorithms make better use of spatial redundancy. However, transform-based lossy compressors have received relatively little attention because their compression performance on scientific datasets is not well understood. The insight of this paper is that, in transform-based lossy compressors, quantifying dominant coefficients at the block level reveals the right balance, potentially impacting overall compression ratios. Motivated by this, we characterize three transform-based lossy compression mechanisms with different information compaction methods using statistical features that capture data characteristics. We then build several prediction models using these statistical features and the characteristics of dominant coefficients, and evaluate the effectiveness of each model using six HPC datasets from three production-level simulations at scale. Our results demonstrate that the random forest classifier captures the behavior of dominant coefficients precisely, achieving nearly 99% prediction accuracy.
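A minimal sketch of the prediction idea, assuming a handful of cheap block statistics as features and a label derived from DCT energy concentration; the feature set, label definition, and synthetic data are assumptions, with scikit-learn's RandomForestClassifier standing in for the paper's prediction models.

```python
# Sketch: predict whether a block has strongly dominant transform coefficients
# from simple statistical features, using a random forest classifier.
import numpy as np
from scipy.fft import dct
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def block_features(block):
    """Cheap statistics intended to capture how 'compressible' a block looks."""
    return [block.mean(), block.std(), skew(block), kurtosis(block),
            np.abs(np.diff(block)).mean()]

def dominant_label(block, keep=0.9):
    """1 if 90% of DCT energy sits in the top quarter of coefficients, else 0."""
    c = np.abs(dct(block, norm="ortho")) ** 2
    top = np.sort(c)[::-1][: len(c) // 4].sum()
    return int(top >= keep * c.sum())

# Synthetic blocks: smooth random walks (compressible) and white noise (not).
rng = np.random.default_rng(0)
blocks = [np.cumsum(rng.normal(size=64)) for _ in range(500)] + \
         [rng.normal(size=64) for _ in range(500)]
X = np.array([block_features(b) for b in blocks])
y = np.array([dominant_label(b) for b in blocks])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("prediction accuracy:", clf.score(X_te, y_te))
```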
  5. Valencia, Alfonso (Ed.)
    Motivation: Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications.
    Results: We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications.
    Availability and implementation: The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation.
    Supplementary information: Supplementary data are available at Bioinformatics online.
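A minimal sketch of the size-versus-fidelity tradeoff explored above, with uniform quantization of a synthetic raw-signal trace standing in for the lossy time-series compressors the authors evaluated; a real evaluation would additionally basecall the reconstructed signal, which is beyond this sketch.

```python
# Sketch: quantize a synthetic nanopore-like raw signal at increasing error
# tolerances and compare compressed sizes against compressing the original
# integer signal losslessly.
import gzip
import numpy as np

def lossy_compressed_size(signal, max_error):
    """Bytes after uniform quantization with |signal - reconstruction| <= max_error."""
    step = 2 * max_error
    quantized = np.round(signal / step).astype(np.int16)
    return len(gzip.compress(quantized.tobytes())), quantized.astype(np.float64) * step

rng = np.random.default_rng(1)
# Random-walk stand-in for a raw current trace (real data lives in fast5 files).
signal = np.round(np.cumsum(rng.normal(scale=3.0, size=200_000)) + 500.0).astype(np.int16)

lossless_size = len(gzip.compress(signal.tobytes()))
for max_err in (1, 2, 4, 8):
    size, recon = lossy_compressed_size(signal, max_err)
    saving = 100.0 * (1.0 - size / lossless_size)
    print(f"error bound {max_err}: {saving:.1f}% smaller than lossless, "
          f"observed max error {np.max(np.abs(signal - recon)):.2f}")
```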