Title: Sampling and summarizing transmission trees with multi-strain infections
Abstract

Motivation

The combination of genomic and epidemiological data holds the potential to enable accurate pathogen transmission history inference. However, the inference of outbreak transmission histories remains challenging due to various factors such as within-host pathogen diversity and multi-strain infections. Current computational methods ignore within-host diversity and/or multi-strain infections, often failing to accurately infer the transmission history. Thus, there is a need for efficient computational methods for transmission tree inference that accommodate the complexities of real data.

Results

We formulate the direct transmission inference (DTI) problem for inferring transmission trees that support multi-strain infections given a timed phylogeny and additional epidemiological data. We establish hardness for the decision and counting versions of the DTI problem. We introduce Transmission Tree Uniform Sampler (TiTUS), a method that uses SATISFIABILITY to almost uniformly sample from the space of transmission trees. We introduce criteria that prioritize parsimonious transmission trees that we subsequently summarize using a novel consensus tree approach. We demonstrate TiTUS’s ability to accurately reconstruct transmission trees on simulated data as well as a documented HIV transmission chain.

Availability and implementation

https://github.com/elkebir-group/TiTUS.

Supplementary information

Supplementary data are available at Bioinformatics online.
Award ID(s):
1850502 2027669
NSF-PAR ID:
10289261
Journal Name:
Bioinformatics
Volume:
36
Issue:
Supplement_1
ISSN:
1367-4803
Page Range / eLocation ID:
i362 to i370
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Disease dynamics are governed by variation of individuals, species, and environmental conditions across space and time. In some cases, an alternate reservoir host amplifies pathogen loads and drives disease transmission to less competent hosts in a process called pathogen spillover. Spillover is frequently associated with multi‐host disease systems where a single species is more tolerant of infection and more competent in pathogen transmission compared to other hosts. Pathogen spillover must be driven by biotic factors, including host and community characteristics, yet biotic factors interact with the abiotic environment (e.g., temperature) to create disease. Despite its fundamental role in disease dynamics, the influence of the abiotic environment on pathogen spillover has seldom been examined. Improving our understanding of disease processes such as pathogen spillover hinges on disentangling the effects of interrelated biotic and abiotic factors over space and time. We applied 10 yr of fine‐scale microclimate, disease, and tree community data in a path analysis to investigate the relative influence of biotic and abiotic factors on pathogen spillover for the emerging infectious forest disease sudden oak death (SOD). Disease transmission in SOD is primarily driven by the reservoir host California bay laurel, which supports high foliar pathogen loads that spill over onto neighboring oak trees and create lethal canker infections. The foliar pathogen load and susceptibility of oaks are expected to be sensitive to forest microclimate conditions. We found that biotic factors of pathogen load and tree diversity had relatively stronger effects on pathogen spillover compared to abiotic microclimate factors, with pathogen load increasing oak infection and tree diversity reducing oak infection. Abiotic factors still had significant effects, with greater heat exposure during summer months reducing pathogen loads and optimal pathogen conditions during the wet season increasing oak infection. Our results offer clues to possible disease dynamics under future climate change where hotter and drier or warmer and wetter conditions could have opposing effects on pathogen spillover in the SOD system. Disentangling direct and indirect effects of biotic and abiotic factors affecting disease processes can provide key insights into disease dynamics including potential avenues for reducing disease spread and predicting future epidemics.

     
  2. Abstract Motivation

    Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees.

    Results

    We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T.

    Availability and implementation

    https://github.com/elkebir-group/MCT.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract Motivation

    Cells in an organism share a common evolutionary history, called a cell lineage tree. A cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data.

    Results

    In this article, we propose a new method called ScisTree, which infers a cell lineage tree and calls genotypes from noisy single cell genotype data. Unlike most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring a cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets.

    Availability and implementation

    The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree.

    Supplementary information

    Supplementary data are available at Bioinformatics online.
  4. Abstract Motivation

    A phylogenetic network is a powerful model to represent entangled evolutionary histories with both divergent (speciation) and convergent (e.g. hybridization, reassortment, recombination) evolution. The standard approach to inference of hybridization networks is to (i) reconstruct rooted gene trees and (ii) leverage gene tree discordance for network inference. Recently, we introduced a method called RF-Net for accurate inference of virus reassortment and hybridization networks from input gene trees in the presence of errors commonly found in phylogenetic trees. While RF-Net demonstrated the ability to accurately infer networks with up to four reticulations from erroneous input gene trees, its application was limited by the number of reticulations it could handle in a reasonable amount of time. This limitation is particularly restrictive in the inference of the evolutionary history of segmented RNA viruses such as influenza A virus (IAV), where reassortment is one of the major mechanisms shaping the evolution of these pathogens.

    Results

    Here, we expand the functionality of RF-Net, making it significantly more applicable in practice. Crucially, we introduce a fast extension to RF-Net, called Fast-RF-Net, that can handle large numbers of reticulations without sacrificing accuracy. In addition, we develop automatic stopping criteria to select the appropriate number of reticulations heuristically and implement a feature for RF-Net to output error-corrected input gene trees. We then conduct a comprehensive study of the original method and its novel extensions and confirm their efficacy in practice using extensive simulation and empirical IAV evolutionary analyses.

    Availability and implementation

    RF-Net 2 is available at https://github.com/flu-crew/rf-net-2.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Co-infections of hosts by multiple pathogen species are ubiquitous, but predicting their impact on disease remains challenging. Interactions between co-infecting pathogens within hosts can alter pathogen transmission, with the impact on transmission typically dependent on the relative arrival order of pathogens within hosts (within-host priority effects). However, it is unclear how these within-host priority effects influence multi-pathogen epidemics, particularly when the arrival order of pathogens at the host-population scale varies. Here, we combined models and experiments with zooplankton and their naturally co-occurring fungal and bacterial pathogens to examine how within-host priority effects influence multi-pathogen epidemics. Epidemiological models parametrized with within-host priority effects measured at the single-host scale predicted that advancing the start date of bacterial epidemics relative to fungal epidemics would decrease the mean bacterial prevalence in a multi-pathogen setting, while models without within-host priority effects predicted the opposite effect. We tested these predictions with experimental multi-pathogen epidemics. Empirical dynamics matched predictions from the model including within-host priority effects, providing evidence that within-host priority effects influenced epidemic dynamics. Overall, within-host priority effects may be a key element of predicting multi-pathogen epidemic dynamics in the future, particularly as shifting disease phenology alters the order of infection within hosts. 