Back-and-forth transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) between humans and animals will establish wild reservoirs of virus that endanger long-term efforts to control COVID-19 in people and to protect vulnerable animal populations. Better targeting of surveillance and laboratory experiments to validate zoonotic potential requires predicting high-risk host species. A major bottleneck to this effort is the small number of species with available sequences for angiotensin-converting enzyme 2 (ACE2), a key receptor required for viral cell entry. We overcome this bottleneck by using machine learning to combine species' ecological and biological traits with three-dimensional modelling of host-virus protein–protein interactions. This approach enables predictions about the zoonotic capacity of SARS-CoV-2 for more than 5000 mammal species, an order of magnitude more species than previously possible. Our predictions are strongly corroborated by in vivo studies. The predicted zoonotic capacity and proximity to humans suggest enhanced transmission risk from several common mammals and point to priority areas of geographic overlap between these species and global COVID-19 hotspots. With molecular data available for only a small fraction of potential animal hosts, linking data across biological scales offers a conceptual advance that may expand our predictive modelling capacity for zoonotic viruses with similarly unknown host ranges.
Mammal virus diversity estimates are unstable due to accelerating discovery effort
Host-virus association data underpin research into the distribution and eco-evolutionary correlates of viral diversity and zoonotic risk across host species. However, current knowledge of the wildlife virome is inherently constrained by historical discovery effort, and there are concerns that the reliability of ecological inference from host-virus data may be undermined by taxonomic and geographical sampling biases. Here, we evaluate whether current estimates of host-level viral diversity in wild mammals are stable enough to be considered biologically meaningful, by analysing a comprehensive dataset of discovery dates of 6571 unique mammal host-virus associations between 1930 and 2018. We show that virus discovery rates in mammal hosts are either constant or accelerating, with little evidence of declines towards viral richness asymptotes, even in highly sampled hosts. Consequently, inference of relative viral richness across host species has been unstable over time, particularly in bats, where intensified surveillance since the early 2000s caused a rapid rearrangement of species' ranked viral richness. Our results illustrate that comparative inference of host-level virus diversity across mammals is highly sensitive to even short-term changes in sampling effort. We advise caution to avoid overinterpreting patterns in current data, since it is feasible that an analysis conducted today could draw quite different conclusions than one conducted only a decade ago.
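The rank instability described in this abstract can be illustrated with a minimal sketch. It assumes each association is stored as a (host, virus, discovery year) tuple. The host and virus names, the years, and the helper functions `cumulative_richness` and `ranked_hosts` are hypothetical and are not taken from the study's dataset or code.

```python
from collections import defaultdict


def cumulative_richness(records, year):
    """Count distinct viruses reported per host up to and including `year`."""
    viruses = defaultdict(set)
    for host, virus, discovered in records:
        if discovered <= year:
            viruses[host].add(virus)
    return {host: len(found) for host, found in viruses.items()}


def ranked_hosts(richness):
    """Order hosts from highest to lowest richness, breaking ties by name."""
    return sorted(richness, key=lambda host: (-richness[host], host))


# Hypothetical toy records: (host, virus, year the association was first reported).
records = [
    ("host_A", "v1", 1990), ("host_A", "v2", 1995), ("host_A", "v3", 1998),
    ("host_B", "v4", 1992), ("host_B", "v5", 2005), ("host_B", "v6", 2008),
    ("host_B", "v7", 2010), ("host_C", "v8", 2001),
]

early = ranked_hosts(cumulative_richness(records, 2000))
late = ranked_hosts(cumulative_richness(records, 2010))
print(early)  # ranking based on associations known by 2000
print(late)   # ranking based on associations known by 2010
```

In this toy example, host_B overtakes host_A once its later discoveries are counted, which is the kind of rank rearrangement that a change in sampling effort can produce.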
- Award ID(s):
- 2021909
- PAR ID:
- 10312508
- Date Published:
- Journal Name:
- Biology Letters
- Volume:
- 18
- Issue:
- 1
- ISSN:
- 1744-957X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Bats host a number of pathogens that cause severe disease and onward transmission in humans and domestic animals. Some of these pathogens, including henipaviruses and filoviruses, are considered a concern for future pandemics. There has been substantial effort to identify these viruses in bats. However, the reservoir hosts for Ebola virus are still unknown and henipaviruses are largely uncharacterized across their distribution. Identifying reservoir species is critical to understanding the viral ecology within these hosts and the conditions that lead to spillover. We collated surveillance data to identify taxonomic patterns in prevalence and seroprevalence and to assess sampling efforts across species. We systematically collected data on filovirus and henipavirus detections and used a machine-learning algorithm, phylofactorization, to search the bat phylogeny for cladistic patterns in filovirus and henipavirus infection, accounting for sampling efforts. Across sampled bat species, evidence for filovirus infection was widely dispersed across the sampled phylogeny. We found major gaps in filovirus sampling in bats, especially in Western Hemisphere species. Evidence for henipavirus infection was clustered within the Pteropodidae; however, no other clades have been as intensely sampled. The major predictor of filovirus and henipavirus exposure or infection was sampling effort. Based on these results, we recommend expanding surveillance for these pathogens across the bat phylogenetic tree.
-
Mammals host a wide diversity of parasites. Lice, comprising more than 5,000 species, are one group of ectoparasites whose major lineages have a somewhat patchwork distribution across the major groups of mammals. Here we explored patterns in the diversification of mammalian lice by reconstructing a higher-level phylogeny of these lice, leveraging whole genome sequence reads to assemble single-copy orthologue genes across the genome. The evolutionary tree of lice indicated that three of the major lineages of placental mammal lice had a single common ancestor. Comparisons of this parasite phylogeny with that for their mammalian hosts indicated that the common ancestor of elephants, elephant shrews and hyraxes (that is, Afrotheria) was the ancestral host of this group of lice. Other groups of placental mammals obtained their lice via host-switching out of these Afrotherian ancestors. In addition, reconstructions of the ancestral host group (bird versus mammal) for all parasitic lice supported an avian ancestral host, indicating that the ancestor of Afrotheria acquired these parasites via host-switching from an ancient avian host. These results shed new light on the long-standing question of why the major groups of parasitic lice are not uniformly distributed across mammals and reveal the origins of mammalian lice.
-
The fields of viral ecology and evolution are rapidly expanding, motivated in part by concerns around emerging zoonoses. One consequence is the proliferation of host–virus association data, which underpin viral macroecology and zoonotic risk prediction but remain fragmented across numerous data portals. In the present article, we propose that synthesis of host–virus data is a central challenge for characterizing the global virome and developing foundational theory in viral ecology. To illustrate this, we build an open database of mammal host–virus associations that reconciles four published data sets. We show that this offers a substantially richer view of the known virome than any individual source data set but also that databases such as these risk becoming out of date as viral discovery accelerates. We argue for a shift in practice toward the development, incremental updating, and use of synthetic data sets in viral ecology, to improve replicability and facilitate work to predict the structure and dynamics of the global virome.
-
Scarpino, Samuel V (Ed.) Viruses of microbes are ubiquitous biological entities that reprogram their hosts’ metabolisms during infection in order to produce viral progeny, impacting the ecology and evolution of microbiomes with broad implications for human and environmental health. Advances in genome sequencing have led to the discovery of millions of novel viruses and an appreciation for the great diversity of viruses on Earth. Yet, with knowledge of only “who is there?” we fall short in our ability to infer the impacts of viruses on microbes at population, community, and ecosystem scales. To do this, we need a more explicit understanding of “who do they infect?” Here, we developed a novel machine learning (ML) model, Virus-Host Interaction Predictor (VHIP), to predict virus-host interactions (infection/non-infection) from input virus and host genomes. This ML model was trained and tested on a high-value manually curated set of 8849 virus-host pairs and their corresponding sequence data. The resulting dataset, ‘Virus Host Range network’ (VHRnet), is core to VHIP functionality. Each data point that underlies the VHIP training and testing represents a lab-tested virus-host pair in VHRnet, from which meaningful signals of viral adaptation to host were computed from genomic sequences. VHIP departs from existing virus-host prediction models in its ability to predict multiple interactions rather than predicting a single most likely host or host clade. As a result, VHIP is able to infer the complexity of virus-host networks in natural systems. VHIP has an 87.8% accuracy rate at predicting interactions between virus-host pairs at the species level and can be applied to novel viral and host population genomes reconstructed from metagenomic datasets.

