skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Using machine learning to detect coronaviruses potentially infectious to humans
Abstract Establishing the host range for novel viruses remains a challenge. Here, we address the challenge of identifying non-human animal coronaviruses that may infect humans by creating an artificial neural network model that learns from spike protein sequences of alpha and beta coronaviruses and their binding annotation to their host receptor. The proposed method produces a human-Binding Potential (h-BiP) score that distinguishes, with high accuracy, the binding potential among coronaviruses. Three viruses, previously unknown to bind human receptors, were identified: Bat coronavirus BtCoV/133/2005 and Pipistrellus abramus bat coronavirus HKU5-related (both MERS related viruses), andRhinolophus affiniscoronavirus isolate LYRa3 (a SARS related virus). We further analyze the binding properties of BtCoV/133/2005 and LYRa3 using molecular dynamics. To test whether this model can be used for surveillance of novel coronaviruses, we re-trained the model on a set that excludes SARS-CoV-2 and all viral sequences released after the SARS-CoV-2 was published. The results predict the binding of SARS-CoV-2 with a human receptor, indicating that machine learning methods are an excellent tool for the prediction of host expansion events.  more » « less
Award ID(s):
2030491 1934568
PAR ID:
10421032
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Reports
Volume:
13
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Pandemics originating from non-human animals highlight the need to understand how natural hosts have evolved in response to emerging human pathogens and which groups may be susceptible to infection and/or potential reservoirs to mitigate public health and conservation concerns. Multiple zoonotic coronaviruses, such as severe acute respiratory syndrome-associated coronavirus (SARS-CoV), SARS-CoV-2 and Middle Eastern respiratory syndrome-associated coronavirus (MERS-CoV), are hypothesized to have evolved in bats. We investigate angiotensin-converting enzyme 2 (ACE2), the host protein bound by SARS-CoV and SARS-CoV-2, and dipeptidyl-peptidase 4 (DPP4 or CD26), the host protein bound by MERS-CoV, in the largest bat datasets to date. Both the ACE2 and DPP4 genes are under strong selection pressure in bats, more so than in other mammals, and in residues that contact viruses. Additionally, mammalian groups vary in their similarity to humans in residues that contact SARS-CoV, SARS-CoV-2 and MERS-CoV, and increased similarity to humans in binding residues is broadly predictive of susceptibility to SARS-CoV-2. This work augments our understanding of the relationship between coronaviruses and mammals, particularly bats, provides taxonomically diverse data for studies of how host proteins are bound by coronaviruses and can inform surveillance, conservation and public health efforts. 
    more » « less
  2. null (Ed.)
    Abstract Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and SARS-CoV-2 are not phylogenetically closely related; however, both use the angiotensin-converting enzyme 2 (ACE2) receptor in humans for cell entry. This is not a universal sarbecovirus trait; for example, many known sarbecoviruses related to SARS-CoV-1 have two deletions in the receptor binding domain of the spike protein that render them incapable of using human ACE2. Here, we report three sequences of a novel sarbecovirus from Rwanda and Uganda that are phylogenetically intermediate to SARS-CoV-1 and SARS-CoV-2 and demonstrate via in vitro studies that they are also unable to utilize human ACE2. Furthermore, we show that the observed pattern of ACE2 usage among sarbecoviruses is best explained by recombination not of SARS-CoV-2, but of SARS-CoV-1 and its relatives. We show that the lineage that includes SARS-CoV-2 is most likely the ancestral ACE2-using lineage, and that recombination with at least one virus from this group conferred ACE2 usage to the lineage including SARS-CoV-1 at some time in the past. We argue that alternative scenarios such as convergent evolution are much less parsimonious; we show that biogeography and patterns of host tropism support the plausibility of a recombination scenario, and we propose a competitive release hypothesis to explain how this recombination event could have occurred and why it is evolutionarily advantageous. The findings provide important insights into the natural history of ACE2 usage for both SARS-CoV-1 and SARS-CoV-2 and a greater understanding of the evolutionary mechanisms that shape zoonotic potential of coronaviruses. This study also underscores the need for increased surveillance for sarbecoviruses in southwestern China, where most ACE2-using viruses have been found to date, as well as other regions such as Africa, where these viruses have only recently been discovered. 
    more » « less
  3. null (Ed.)
    The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of COVID-19. The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of ACE2 sequences from 410 vertebrate species, including 252 mammals, to study the conservation of ACE2 and its potential to be used as a receptor by SARS-CoV-2. We designed a five-category binding score based on the conservation properties of 25 amino acids important for the binding between ACE2 and the SARS-CoV-2 spike protein. Only mammals fell into the medium to very high categories and only catarrhine primates into the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 spike protein binding and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (frequency <0.001) variants in 10/25 binding sites. In addition, we found significant signals of selection and accelerated evolution in the ACE2 coding sequence across all mammals, and specific to the bat lineage. Our results, if confirmed by additional experimental data, may lead to the identification of intermediate host species for SARS-CoV-2, guide the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care. 
    more » « less
  4. Coronaviruses are positive sense, single-stranded, enveloped, and non-segmented RNA viruses that belong to the Coronaviridae family within the order Nidovirales and suborder Coronavirinae. Two Alphacoronavirus strains: HCoV-229E and HCoV-NL63 and five Betacoronaviruses: HCoV-HKU1, HCoV-OC43, SARS-CoV, MERS-CoV, and SARS-CoV-2 have so far been recognized as Human Coronaviruses (HCoVs). Coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 is currently the greatest concern for humanity. Despite the overflow of research on SARS-CoV-2 and other HCoVs published every week, existing knowledge in this area is insufficient for the complete understanding of the viruses and the diseases caused by them. This review is based on the analysis of 210 published works, and it attempts to cover the basic biology of coronaviruses, including the genetic characteristics, life cycle, and host-pathogen interaction, pathogenesis, the antiviral drugs, and vaccines against HCoVs, especially focusing on SARS-CoV-2. Furthermore, we will briefly discuss the potential link between extracellular vesicles (EVs) and SARS-CoV-2/COVID-19 pathophysiology. 
    more » « less
  5. null (Ed.)
    Viruses such as the novel coronavirus, SARS-CoV-2, that is wreaking havoc on the world, depend on interactions of its own proteins with those of the human host cells. Relatively small changes in sequence such as between SARS-CoV and SARS-CoV-2 can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins towards human proteins and compare it to that of the above other viruses. We build supervised machine learning models, using Generalized Additive Models to predict interactions based on sequence features and find that our models perform well with an AUC-PR of 0.65 in a class-skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize what combinations of sequences from the virus and the host is correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses, for example orf3a’s interactions are more diverged than orf7b interactions when comparing SARS-CoV and SARS-CoV-2. 
    more » « less