Search for: All records
- Liu, Shaopeng; Koslicki, David (Bioinformatics)
Abstract
Motivation: K-mer-based methods are used ubiquitously in computational biology. However, choosing the optimal value of k for a specific application often remains heuristic. Simply reconstructing a new k-mer set with another k-mer size is computationally expensive, especially in metagenomic analysis, where datasets are large. Here, we introduce a hashing-based technique that leverages a kind of bottom-m sketch as well as a k-mer ternary search tree (KTST) to obtain k-mer-based similarity estimates for a range of k values. By truncating k-mers stored in a pre-built KTST with a large k = k_max value, we can simultaneously obtain k-mer-based estimates for all k values up to k_max. This truncation approach avoids reconstructing new k-mer sets when k changes, making analysis more time- and space-efficient.
Results: We derived the theoretical expression of the bias factor due to truncation and showed that the biases are negligible in practice: when using a KTST to estimate the containment index between a RefSeq-based microbial reference database and simulated metagenome data for 10 values of k, the method ran close to 10× faster than a classic MinHash approach while using less than one-fifth of the space to store the data structure.
Availability and implementation: A Python implementation of this method, CMash, is available at https://github.com/dkoslicki/CMash. The reproduction of all experiments presented herein can be accessed via https://github.com/KoslickiLab/CMASH-reproducibles.
Supplementary information: Supplementary data are available at Bioinformatics online.
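The truncation idea in the abstract above can be illustrated with a small Python sketch. This is not CMash's API and it does not use a KTST or a bottom-m sketch: it assumes plain Python sets and exact containment, and the function names (`kmers`, `truncate`, `containment`), the toy sequences, and `K_MAX` are all hypothetical. It only shows that truncating stored k_max-mers to their length-k prefixes recovers almost all k-mers for smaller k; the few k-mers near the end of the sequence that are missed are the kind of discrepancy the bias factor accounts for.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def truncate(kmer_set, k):
    """Truncate every stored k_max-mer to its length-k prefix."""
    return {kmer[:k] for kmer in kmer_set}


def containment(a, b):
    """Containment index of set a in set b: |a & b| / |a|."""
    return len(a & b) / len(a) if a else 0.0


# Toy inputs (assumptions for illustration only).
K_MAX = 6
reference = "ACGTTGCAAGGCTTAC"
sample = "TTGCAAGGCTTACGGA"

# Build the k-mer sets once, at the largest k.
ref_kmax = kmers(reference, K_MAX)
sample_kmax = kmers(sample, K_MAX)

for k in (3, 4, 5, 6):
    # Exact value: rebuild the k-mer sets from scratch for this k.
    exact = containment(kmers(reference, k), kmers(sample, k))
    # Truncation: reuse the k_max sets, keeping only length-k prefixes.
    approx = containment(truncate(ref_kmax, k), truncate(sample_kmax, k))
    # k-mers lost by truncation all start in the last K_MAX - k positions.
    missed = kmers(reference, k) - truncate(ref_kmax, k)
    print(k, round(exact, 3), round(approx, 3), sorted(missed))
```

At k = `K_MAX` the two values coincide by construction; for smaller k they can differ slightly because of the k-mers at the end of each sequence that no k_max-mer prefix covers.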
- Chu, Peng; Bian, Xiao; Liu, Shaopeng; Ling, Haibin (European Conf. on Computer Vision (ECCV))
- Wu, Dazhong; Liu, Shaopeng; Zhang, Li; Terpenny, Janis; Gao, Robert X.; Kurfess, Thomas; Guzzo, Judith A. (Journal of Manufacturing Systems)