Large-scale sequence comparisons with sourmash

Pierce, N. Tessa; Irber, Luiz; Reiter, Taylor; Brooks, Phillip; Brown, C. Titus

doi:10.12688/f1000research.19675.1

The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
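
The idea behind MinHash-based sketching can be illustrated with a minimal bottom-k sketch in plain Python. This is an illustrative sketch only, not sourmash's implementation or API: the function names (`kmers`, `hash_kmer`, `sketch`, `estimate_jaccard`), the BLAKE2b hash, and the default values of `k` and `num` are all assumptions chosen for the example, and it does not reproduce sourmash's hash function or its handling of reverse complements.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of the sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, num=500):
    """Bottom-k sketch: the `num` smallest distinct k-mer hashes, sorted."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:num]


def estimate_jaccard(a, b, num=500):
    """Estimate Jaccard similarity from two sketches built with the same `num`."""
    # The `num` smallest hashes of the union form a sketch of the combined set.
    merged = sorted(set(a) | set(b))[:num]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    # Fraction of the merged sketch that is present in both input sketches.
    return sum(1 for h in merged if h in shared) / len(merged)
```

Each sketch keeps at most `num` integers regardless of how long the input sequence is, which is what makes comparisons between very large data sets cheap in time and memory. Calling `estimate_jaccard(sketch(seq_a), sketch(seq_b))` on two sequences then approximates the fraction of distinct k-mers they share.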

Award ID(s): 1711984

PAR ID: 10192442

Date Published: 2019-01-01

Journal Name: F1000Research

Volume: 8

ISSN: 2046-1402

Page Range / eLocation ID: 1006

Sponsoring Org: National Science Foundation

Free publicly accessible full text (journal article): https://doi.org/10.12688/f1000research.19675.1