SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning

Balaji, Advait (ORCID: 0000-0001-9858-9578); Kille, Bryce (ORCID: 0000-0003-2946-6915); Kappell, Anthony D. (ORCID: 0000-0003-3511-9207); Godbold, Gene D. (ORCID: 0000-0002-5702-4690); Diep, Madeline (ORCID: 0000-0002-9908-0367); Elworth, R. A. Leo (ORCID: 0000-0002-3945-0661); Qian, Zhiqin; Albin, Dreycey; Nasko, Daniel J. (ORCID: 0000-0002-8359-6975); Shah, Nidhi; Pop, Mihai (ORCID: 0000-0001-9617-5304); Segarra, Santiago (ORCID: 0000-0002-8408-9633); Ternus, Krista L. (ORCID: 0000-0003-1138-5308); Treangen, Todd J. (ORCID: 0000-0002-3760-564X)

doi:10.1186/s13059-022-02695-x

Abstract: The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and is available for download at www.gitlab.com/treangenlab/seqscreen.

Award ID(s): 2126387

PAR ID: 10367978

Author(s) / Creator(s): Balaji, Advait; Kille, Bryce; Kappell, Anthony D.; Godbold, Gene D.; Diep, Madeline; Elworth, R. A. Leo; Qian, Zhiqin; Albin, Dreycey; Nasko, Daniel J.; Shah, Nidhi; Pop, Mihai; Segarra, Santiago; Ternus, Krista L.; Treangen, Todd J.

Publisher / Repository: Springer Science + Business Media

Date Published: 2022-06-20

Journal Name: Genome Biology

Volume: 23

Issue: 1

ISSN: 1474-760X

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1186/s13059-022-02695-x