MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets

Vik, Dean (ORCID: 0000-0002-7546-899X); Bolduc, Benjamin (ORCID: 0000-0003-2420-0755); Roux, Simon; Sun, Christine L.; Pratama, Akbar Adjie (ORCID: 0000-0003-1079-744X); Krupovic, Mart (ORCID: 0000-0001-5486-0098); Sullivan, Matthew B.

doi:10.1038/s43705-023-00295-9


Abstract: Our knowledge of viral sequence space has exploded with advances in sequencing technologies and with large-scale sampling and analytical efforts. Although archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) by training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 offers a significant improvement over its predecessor in scalability, usability, and flexibility, and will allow users to supply custom training datasets as archaeal virus discovery progresses. Benchmarking showed that a model trained with viral sequences from hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses, with a false detection rate below 2%, using a random forest prediction threshold of 80% on a separate benchmarking dataset from the same habitats.
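The benchmark above reports two numbers at a fixed random forest prediction threshold: the share of archaeal viruses recovered and the false detection rate. As a rough illustration only (not MArVD2's own code), the sketch below assumes the model emits a per-sequence archaeal probability, calls a sequence archaeal when that probability is at or above the threshold, and interprets "false detection rate" as FP / (TP + FP). The function names and the toy scores are hypothetical.

```python
from typing import List, Tuple


def threshold_predictions(archaeal_probs: List[float], threshold: float = 0.8) -> List[bool]:
    """Call a sequence archaeal only if its (assumed) archaeal probability meets the threshold."""
    return [p >= threshold for p in archaeal_probs]


def recall_and_fdr(predicted: List[bool], is_archaeal: List[bool]) -> Tuple[float, float]:
    """Return (recall on archaeal viruses, false detection rate = FP / (TP + FP))."""
    tp = sum(p and t for p, t in zip(predicted, is_archaeal))
    fp = sum(p and not t for p, t in zip(predicted, is_archaeal))
    fn = sum((not p) and t for p, t in zip(predicted, is_archaeal))
    recall = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return recall, fdr


# Hypothetical toy scores: three archaeal and three bacterial viruses.
probs = [0.95, 0.88, 0.60, 0.70, 0.30, 0.10]
truth = [True, True, True, False, False, False]

strict = recall_and_fdr(threshold_predictions(probs, 0.8), truth)
lenient = recall_and_fdr(threshold_predictions(probs, 0.5), truth)
print(strict)   # (0.666..., 0.0)
print(lenient)  # (1.0, 0.25)
```

On these toy scores, the stricter threshold of 0.8 misses one archaeal virus but makes no false calls, while 0.5 recovers all three at the cost of one bacterial virus being misclassified. This is the kind of trade-off a high threshold such as 80% is meant to manage.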

Award ID(s): 2149505

PAR ID: 10446500

Author(s) / Creator(s): Vik, Dean; Bolduc, Benjamin; Roux, Simon; Sun, Christine L.; Pratama, Akbar Adjie; Krupovic, Mart; Sullivan, Matthew B.

Publisher / Repository: Oxford University Press

Date Published: 2023-08-24

Journal Name: ISME Communications

Volume: 3

Issue: 1

ISSN: 2730-6151

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1038/s43705-023-00295-9
