Title: Using machine learning to detect coronaviruses potentially infectious to humans
Abstract

Establishing the host range for novel viruses remains a challenge. Here, we address the challenge of identifying non-human animal coronaviruses that may infect humans by creating an artificial neural network model that learns from spike protein sequences of alpha and beta coronaviruses and their binding annotation to their host receptor. The proposed method produces a human-Binding Potential (h-BiP) score that distinguishes, with high accuracy, the binding potential among coronaviruses. Three viruses, previously unknown to bind human receptors, were identified: Bat coronavirus BtCoV/133/2005 and Pipistrellus abramus bat coronavirus HKU5-related (both MERS-related viruses), and Rhinolophus affinis coronavirus isolate LYRa3 (a SARS-related virus). We further analyze the binding properties of BtCoV/133/2005 and LYRa3 using molecular dynamics. To test whether this model can be used for surveillance of novel coronaviruses, we re-trained the model on a set that excludes SARS-CoV-2 and all viral sequences released after SARS-CoV-2 was published. The results predict the binding of SARS-CoV-2 with a human receptor, indicating that machine learning methods are an excellent tool for the prediction of host expansion events.

 
Award ID(s):
2030491 1934568
PAR ID:
10421032
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Reports
Volume:
13
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and SARS-CoV-2 are not phylogenetically closely related; however, both use the angiotensin-converting enzyme 2 (ACE2) receptor in humans for cell entry. This is not a universal sarbecovirus trait; for example, many known sarbecoviruses related to SARS-CoV-1 have two deletions in the receptor binding domain of the spike protein that render them incapable of using human ACE2. Here, we report three sequences of a novel sarbecovirus from Rwanda and Uganda that are phylogenetically intermediate to SARS-CoV-1 and SARS-CoV-2 and demonstrate via in vitro studies that they are also unable to utilize human ACE2. Furthermore, we show that the observed pattern of ACE2 usage among sarbecoviruses is best explained by recombination not of SARS-CoV-2, but of SARS-CoV-1 and its relatives. We show that the lineage that includes SARS-CoV-2 is most likely the ancestral ACE2-using lineage, and that recombination with at least one virus from this group conferred ACE2 usage to the lineage including SARS-CoV-1 at some time in the past. We argue that alternative scenarios such as convergent evolution are much less parsimonious; we show that biogeography and patterns of host tropism support the plausibility of a recombination scenario, and we propose a competitive release hypothesis to explain how this recombination event could have occurred and why it is evolutionarily advantageous. The findings provide important insights into the natural history of ACE2 usage for both SARS-CoV-1 and SARS-CoV-2 and a greater understanding of the evolutionary mechanisms that shape zoonotic potential of coronaviruses. This study also underscores the need for increased surveillance for sarbecoviruses in southwestern China, where most ACE2-using viruses have been found to date, as well as other regions such as Africa, where these viruses have only recently been discovered.
  2. The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of COVID-19. The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of ACE2 sequences from 410 vertebrate species, including 252 mammals, to study the conservation of ACE2 and its potential to be used as a receptor by SARS-CoV-2. We designed a five-category binding score based on the conservation properties of 25 amino acids important for the binding between ACE2 and the SARS-CoV-2 spike protein. Only mammals fell into the medium to very high categories and only catarrhine primates into the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 spike protein binding and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (frequency <0.001) variants in 10/25 binding sites. In addition, we found significant signals of selection and accelerated evolution in the ACE2 coding sequence across all mammals, and specific to the bat lineage. Our results, if confirmed by additional experimental data, may lead to the identification of intermediate host species for SARS-CoV-2, guide the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care.
  3. Coronaviruses are positive-sense, single-stranded, enveloped, and non-segmented RNA viruses that belong to the subfamily Coronavirinae of the family Coronaviridae, within the order Nidovirales. Two alphacoronaviruses (HCoV-229E and HCoV-NL63) and five betacoronaviruses (HCoV-HKU1, HCoV-OC43, SARS-CoV, MERS-CoV, and SARS-CoV-2) have so far been recognized as human coronaviruses (HCoVs). Coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 is currently the greatest concern for humanity. Despite the overflow of research on SARS-CoV-2 and other HCoVs published every week, existing knowledge in this area is insufficient for the complete understanding of the viruses and the diseases caused by them. This review is based on the analysis of 210 published works, and it attempts to cover the basic biology of coronaviruses, including the genetic characteristics, life cycle, host-pathogen interaction, pathogenesis, and the antiviral drugs and vaccines against HCoVs, especially focusing on SARS-CoV-2. Furthermore, we will briefly discuss the potential link between extracellular vesicles (EVs) and SARS-CoV-2/COVID-19 pathophysiology.
  4. Viruses such as the novel coronavirus SARS-CoV-2, which is wreaking havoc on the world, depend on interactions of their own proteins with those of the human host cells. Relatively small changes in sequence, such as between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins to human proteins and compare it to that of the other viruses above. We build supervised machine learning models, using Generalized Additive Models to predict interactions based on sequence features, and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize which combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a’s interactions are more diverged than orf7b interactions when comparing SARS-CoV and SARS-CoV-2.
  5. Australia’s 81 bat species play vital ecological and economic roles via suppression of insect pests and maintenance of native forests through pollination and seed dispersal. Bats also host a wide diversity of coronaviruses globally, including several viral species that are closely related to SARS-CoV-2 and other emergent human respiratory coronaviruses. Although there are hundreds of studies of bat coronaviruses globally, there are only three studies of bat coronaviruses in Australian bat species, and no systematic studies of drivers of shedding. These limited studies have identified two betacoronaviruses and seven alphacoronaviruses, but fewer than half of Australian species are included in these studies and further research is therefore needed. There is no current evidence of spillover of coronaviruses from bats to humans in Australia, either directly or indirectly via intermediate hosts. The limited available data are inadequate to determine whether this lack of evidence indicates that spillover does not occur or occurs but is undetected. Conversely, multiple international agencies have flagged the potential transmission of human coronaviruses (including SARS-CoV-2) from humans to bats, and the consequent threat to bat conservation and human health. Australia has a long history of bat research across a broad range of ecological and associated disciplines, as well as expertise in viral spillover from bats. This strong foundation is an ideal platform for developing integrative approaches to understanding bat health and sustainable protection of human health.