Sketching and Sublinear Data Structures in Genomics

Marçais, Guillaume; Solomon, Brad; Patro, Rob; Kingsford, Carl

doi:10.1146/annurev-biodatasci-072018-021156

Large-scale genomics demands computational methods that scale sublinearly with the growth of data. We review several data structures and sketching techniques that have been used in genomic analysis methods. Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full-text indices, approximate membership query data structures, locality-sensitive hashing, and minimizers schemes. We describe these techniques at a high level and give several representative applications of each.

Award ID(s): 1763680; 1750472

PAR ID: 10132819

Author(s) / Creator(s): Marçais, Guillaume; Solomon, Brad; Patro, Rob; Kingsford, Carl

Date Published: 2019-07-20

Journal Name: Annual Review of Biomedical Data Science

Volume: 2

Issue: 1

ISSN: 2574-3414

Page Range / eLocation ID: 93 to 118

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1146/annurev-biodatasci-072018-021156
