The Naïve Bayes classifier++ for metagenomic taxonomic classification—query evaluation

Duan, Haozhe Neil; Hearne, Gavin; Polikar, Robi (ORCID: 0000-0002-2739-4228); Rosen, Gail L (ORCID: 0000-0003-1763-5750); Kendziorski, Christina (ed.)

doi:10.1093/bioinformatics/btae743

Citation Details

Abstract

Motivation: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program under variations in canonicality, k-mer size, database, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.

Results: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory that Kraken2 requires, but at the cost of long querying time. Major NBC++ enhancements include accommodating canonical k-mer storage (leading to significant storage savings) and adaptable, optimized memory allocation that accelerates query analysis and enables the software to be run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information.

Availability and implementation: Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes.

Award ID(s): 1936791; 2107108; 1936782

PAR ID: 10566179

Author(s) / Creator(s): Duan, Haozhe Neil; Hearne, Gavin; Polikar, Robi; Rosen, Gail L; Kendziorski, Christina (ed.)

Publisher / Repository: Oxford University Press

Date Published: 2024-12-19

Journal Name: Bioinformatics

Volume: 41

Issue: 1

ISSN: 1367-4811

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1093/bioinformatics/btae743
