Title: Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity
Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture’s interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent input to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron’s reduced risk of severe disease, in accord with epidemiological and experimental data.
Award ID(s):
2107108
PAR ID:
10463981
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Biology
Volume:
11
Issue:
12
ISSN:
2079-7737
Page Range / eLocation ID:
1786
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. SARS-CoV-2 infection can result in a range of outcomes from asymptomatic/mild disease to severe COVID-19/fatality. In this study, we investigated the differential expression of small noncoding RNAs (sncRNAs) between patient cohorts defined by disease severity. We collected plasma samples, stratified these based on clinical outcomes, and sequenced their circulating sncRNAs. Excitingly, we found YRNA HY4 displays significant differential expression (p=0.025) between patients experiencing mild and severe disease. In agreement with recent reports identifying plasma YRNAs as indicators of influenza infection severity, our results strongly suggest that circulating HY4 levels represent a powerful prognostic indicator of likely SARS-CoV-2 patient infection outcome. 
  2. Coronavirus Disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus transmits rapidly; it has a basic reproductive number (R0) of 2.2-2.7. In March 2020, the World Health Organization declared the COVID-19 outbreak a pandemic. COVID-19 is currently affecting more than 200 countries with 6M active cases. An effective testing strategy for COVID-19 is crucial to controlling the outbreak, but the demand for testing surpasses the availability of test kits that use Reverse Transcription Polymerase Chain Reaction (RT-PCR). In this paper, we present a technique to screen for COVID-19 using artificial intelligence. Our technique takes only seconds to screen for the presence of the virus in a patient. We collected a dataset of chest X-ray images and trained several popular deep convolutional neural network-based models (VGG, MobileNet, Xception, DenseNet, InceptionResNet) to classify the chest X-rays. Unsatisfied with these models, we then designed and built a Residual Attention Network that was able to screen for COVID-19 with a testing accuracy of 98% and a validation accuracy of 100%. A visualization of our model's feature maps shows the areas in a chest X-ray that are important for classification. Our work can help to increase the adoption of AI-assisted applications in clinical practice. The code and dataset used in this project are available at https://github.com/vishalshar/covid-19-screening-using-RAN-on-X-ray-images.
  3. The coronavirus disease 2019 (COVID-19) pandemic challenged the workings of human society, but in doing so, it advanced our understanding of the ecology and evolution of infectious diseases. Fluctuating transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) demonstrated the highly dynamic nature of human social behavior, often without government intervention. Evolution of SARS-CoV-2 in the first two years following spillover resulted primarily in increased transmissibility, while in the third year, the globally dominant virus variants had all evolved substantial immune evasion. The combination of viral evolution and the buildup of host immunity through vaccination and infection greatly decreased the realized virulence of SARS-CoV-2 due to the age dependence of disease severity. The COVID-19 pandemic was exacerbated by presymptomatic, asymptomatic, and highly heterogeneous transmission, as well as highly variable disease severity and the broad host range of SARS-CoV-2. Insights and tools developed during the COVID-19 pandemic could provide a stronger scientific basis for preventing, mitigating, and controlling future pandemics. 
  4. In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling. We applied these methods to clinical specimens gathered from 669 patients in New York City during the first two months of the outbreak, yielding a broad molecular portrait of the emerging COVID-19 disease. We find significant enrichment of a NYC-distinctive clade of the virus (20C), as well as host responses in interferon, ACE, hematological, and olfaction pathways. In addition, we use 50,821 patient records to find that renin–angiotensin–aldosterone system inhibitors have a protective effect for severe COVID-19 outcomes, unlike similar drugs. Finally, spatial transcriptomic data from COVID-19 patient autopsy tissues reveal distinct ACE2 expression loci, with macrophage and neutrophil infiltration in the lungs. These findings can inform public health and may help develop and drive SARS-CoV-2 diagnostic, prevention, and treatment strategies.
  5. Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.