Title: Sapling: Inferring and Summarizing Tumor Phylogenies from Bulk Data Using Backbone Trees
Cancer phylogenies are key to understanding tumor evolution. There exist many important downstream analyses that take as input a single or a small number of trees. However, due to uncertainty, one typically infers many, equally-plausible phylogenies from bulk DNA sequencing data of tumors. We introduce Sapling, a heuristic method to solve the Backbone Tree Inference from Reads problem, which seeks a small set of backbone trees on a smaller subset of mutations that collectively summarize the entire solution space. Sapling also includes a greedy algorithm to solve the Backbone Tree Expansion from Reads problem, which aims to expand an inferred backbone tree into a full tree. We prove that both problems are NP-hard. On simulated and real data, we demonstrate that Sapling is capable of inferring high-quality backbone trees that adequately summarize the solution space and that can be expanded into full trees.
Award ID(s):
2046488
PAR ID:
10582744
Author(s) / Creator(s):
;
Editor(s):
Pissis, Solon P; Sung, Wing-Kin
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Volume:
312
ISSN:
1868-8969
ISBN:
978-3-95977-340-9
Page Range / eLocation ID:
7:1-7:19
Subject(s) / Keyword(s):
Cancer; intra-tumor heterogeneity; consensus; maximum agreement; Applied computing → Computational biology
Format(s):
Medium: X Size: 19 pages; 12118910 bytes Other: application/pdf
Size(s):
19 pages 12118910 bytes
Right(s):
Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Motivation: Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. Results: We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T. Availability and implementation: https://github.com/elkebir-group/MCT. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Abstract: Management of tree cover, either to curb bush encroachment or to mitigate losses of woody cover to over‐browsing, is a major concern in savanna ecosystems. Once established, trees are often “trapped” as saplings, since interactions among disturbance, plant competition, and precipitation delay sapling recruitment into adult size classes. Saplings can be directly suppressed by wildlife browsing and competition from adjacent plants, and indirectly facilitated by grazers, such as cattle, which feed on neighboring grasses. Yet few experimental studies have simultaneously quantified the effects of cattle and wildlife on sapling growth, particularly over long time scales. We used a series of replicated 4‐ha herbivore‐manipulation plots to investigate the net effects of wildlife and moderate cattle grazing on Acacia drepanolobium sapling growth over 10 years that encompassed extended wet and dry periods. We also simulated more intense cattle grazing using grass removal treatments (0.5‐m radius around saplings), and we quantified the role of intraspecific tree competition using neighborhood tree surveys (trees within a 3‐m radius). Wildlife, which included elephants, had a positive effect on sapling growth. Wildlife also reduced neighbor tree density during the 10‐yr study, which likely caused the positive effect of wildlife on saplings. Although moderate cattle grazing did not affect sapling growth, grass removal treatments simulating heavy grazing increased sapling growth. Both grass removal and neighbor tree effects on saplings were strongest during above‐average rainfall years following drought. This highlights that livestock‐driven reductions in grass cover and catastrophic wildlife damage to trees during droughts present a need, or an opportunity, for targeted management of sapling growth and woody plant cover during ensuing wet periods.
  3. Several algorithms build on the perfect phylogeny model to infer evolutionary trees. This problem is particularly hard when evolutionary trees are inferred from the fraction of genomes that have mutations in different positions, across different samples. Existing algorithms might do extensive searches over the space of possible trees. At the center of these algorithms is a projection problem that assigns a fitness cost to phylogenetic trees. In order to perform a wide search over the space of the trees, it is critical to solve this projection problem fast. In this paper, we use Moreau's decomposition for proximal operators, and a tree reduction scheme, to develop a new algorithm to compute this projection. Our algorithm terminates with an exact solution in a finite number of steps, and is extremely fast. In particular, it can search over all evolutionary trees with fewer than 11 nodes (more than 2 billion trees), a size relevant for several biological problems, in about 2 hours.
  4. Xia, Xuhua (Ed.)
    Abstract: Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore do not have divergence times estimated. These trees give an incomplete view of evolutionary histories since many applications of phylogenies require time trees. Many methods have been developed to convert the inferred branch lengths from substitution units to time units using calibration points, but none is universally accepted, as they are challenged in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a non-convex optimization problem where the variance of log-transformed rate multipliers is minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.
  5. Bark beetles naturally inhabit forests and can cause large-scale tree mortality when they reach epidemic population numbers. A recent epidemic (1990s–2010s), primarily driven by mountain pine beetles (Dendroctonus ponderosae), was a leading mortality agent in western United States forests. Predictive models of beetle populations and their impact on forests largely depend on host related parameters, such as stand age, basal area, and density. We hypothesized that bark beetle attack patterns are also dependent on inferred beetle population densities: large epidemic populations of beetles will preferentially attack large-diameter trees, and successfully kill them with overwhelming numbers. Conversely, small endemic beetle populations will opportunistically attack stressed and small trees. We tested this hypothesis using 12 years of repeated field observations of three dominant forest species (lodgepole pine Pinus contorta, Engelmann spruce Picea engelmannii, and subalpine fir Abies lasiocarpa) in subalpine forests of southeastern Wyoming paired with a Bayesian modeling approach. The models provide probabilistic predictions of beetle attack patterns that are free of assumptions required by frequentist models that are often violated in these data sets. Furthermore, we assessed seedling/sapling regeneration in response to overstory mortality and hypothesized that higher seedling/sapling establishment occurs in areas with highest overstory mortality because resources are freed from competing trees. Our results indicate that large-diameter trees were more likely to be attacked and killed by bark beetles than small-diameter trees during epidemic years for all species, but there was no shift toward preferentially attacking small-diameter trees in post-epidemic years. However, probabilities of bark beetle attack and mortality increased for small-diameter lodgepole pine and Engelmann spruce trees in post-epidemic years compared to epidemic years. We also show an increase in overall understory growth (graminoids, forbs, and shrubs) and seedling/sapling establishment in response to beetle-caused overstory mortality, especially in lodgepole pine dominated stands. Our observations provide evidence of the trajectories of attack and mortality as well as early forest regrowth of three common tree species during the transition from epidemic to post-epidemic stages of bark beetle populations in the field.