Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences

Sokhansanj, Bahrad A.; Rosen, Gail L.

doi:10.1128/msystems.00035-22

Citation Details

ABSTRACT Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a potentially powerful way to build complex sequence-to-phenotype models. Unfortunately, while they can be predictive, deep learning typically produces “black box” models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.

Award ID(s): 2107108

PAR ID: 10356866

Author(s) / Creator(s): Sokhansanj, Bahrad A.; Rosen, Gail L.

Editor(s): Gaglia, Marta M.

Date Published: 2022-04-26

Journal Name: mSystems

Volume: 7

Issue: 2

ISSN: 2379-5077

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Journal Article:
https://doi.org/10.1128/msystems.00035-22