Title: DaTeR: error-correcting phylogenetic chronograms using relative time constraints
Abstract Motivation

A chronogram is a dated phylogenetic tree whose branch lengths have been scaled to represent time. Such chronograms are computed based on available date estimates (e.g. from dated fossils), which provide absolute time constraints for one or more nodes of an input undated phylogeny, coupled with an appropriate underlying model for evolutionary rate variation along the branches of the phylogeny. However, traditional methods for phylogenetic dating cannot take into account relative time constraints, such as those provided by inferred horizontal transfer events. In many cases, chronograms computed using only absolute time constraints are inconsistent with known relative time constraints.

Results

In this work, we introduce a new approach, Dating Trees using Relative constraints (DaTeR), for phylogenetic dating that can take into account both absolute and relative time constraints. The key idea is to use existing Bayesian approaches for phylogenetic dating to sample posterior chronograms satisfying desired absolute time constraints, minimally adjust or ‘error-correct’ these sampled chronograms to satisfy all given relative time constraints, and aggregate across all error-corrected chronograms. DaTeR uses a constrained optimization framework for the error-correction step, finding minimal deviations from previously assigned dates or branch lengths. We applied DaTeR to a biological dataset of 170 Cyanobacterial taxa and a reliable set of 24 transfer-based relative constraints, under six different molecular dating models. Our extensive analysis of this dataset demonstrates that DaTeR is both highly effective and scalable and that its application can significantly improve estimated chronograms.
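The error-correction step can be pictured with a small sketch. This is not DaTeR's implementation: the abstract only says that a constrained optimization finds minimal deviations from previously assigned dates or branch lengths. The sketch below assumes a least-squares objective on node ages (time before present, so larger means older), treats both tree edges and transfer-based relative constraints as "node a is at least as old as node b", and swaps in Dykstra's alternating projection as the solver. The function name `error_correct` and the toy node labels are hypothetical.

```python
def error_correct(ages, older_than, max_sweeps=1000, tol=1e-12):
    """Least-squares adjustment of node ages (illustrative sketch only).

    ages:       dict mapping node -> age (time before present).
    older_than: list of (a, b) pairs meaning age[a] >= age[b] must hold.
    Uses Dykstra's alternating projection, which converges to the point
    closest to the input ages that satisfies every pairwise constraint.
    """
    x = dict(ages)
    corr = [0.0] * len(older_than)  # one stored correction per constraint
    for _ in range(max_sweeps):
        largest_move = 0.0
        for i, (a, b) in enumerate(older_than):
            # Undo this constraint's previous correction before re-projecting.
            ya = x[a] + corr[i]
            yb = x[b] - corr[i]
            gap = ya - yb
            # If violated (a younger than b), move both nodes to their midpoint.
            c = gap / 2.0 if gap < 0 else 0.0
            new_a, new_b = ya - c, yb + c
            largest_move = max(largest_move,
                               abs(new_a - x[a]), abs(new_b - x[b]))
            x[a], x[b] = new_a, new_b
            corr[i] = c
        if largest_move < tol:
            break
    return x


# Toy chronogram (internal nodes only): root -> P -> X, and root -> Y.
ages = {"root": 10.0, "P": 5.0, "X": 4.0, "Y": 8.0}
tree_edges = [("root", "P"), ("P", "X"), ("root", "Y")]
# Hypothetical transfer-based constraint: X must be at least as old as Y.
transfer_constraints = [("X", "Y")]
corrected = error_correct(ages, tree_edges + transfer_constraints)
```

In this toy case the sampled ages violate the relative constraint (X at 4 is younger than Y at 8). The correction pulls P, X and Y to a common age of roughly 5.67 and leaves the root at 10, so the parent-child order and the relative constraint both hold. A realistic version would also keep leaf dates and absolute calibrations fixed, and would be applied to each sampled posterior chronogram before aggregating.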

Availability and implementation

Freely available from https://compbio.engr.uconn.edu/software/dater/

Supplementary information

Supplementary data are available at Bioinformatics online.

 
NSF-PAR ID:
10398359
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
Volume:
39
Issue:
2
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Chronograms—phylogenies with branch lengths proportional to time—represent key data on timing of evolutionary events, allowing us to study natural processes in many areas of biological research. Chronograms also provide valuable information that can be used for education, science communication, and conservation policy decisions. Yet, achieving a high-quality reconstruction of a chronogram is a difficult and resource-consuming task. Here we present DateLife, phylogenetic software implemented as an R package and an R Shiny web application available at www.datelife.org, that provides services for efficient and easy discovery, summary, reuse, and reanalysis of node age data mined from a curated database of expert, peer-reviewed, and openly available chronograms. The main DateLife workflow starts with one or more scientific taxon names provided by a user. Names are processed and standardized to a unified taxonomy, allowing DateLife to run a name match across its local chronogram database that is curated from Open Tree of Life’s phylogenetic repository, and extract all chronograms that contain at least two queried taxon names, along with their metadata. Finally, node ages from matching chronograms are mapped using the congruification algorithm to corresponding nodes on a tree topology, either extracted from Open Tree of Life’s synthetic phylogeny or one provided by the user. Congruified node ages are used as secondary calibrations to date the chosen topology, with or without initial branch lengths, using different phylogenetic dating methods such as BLADJ, treePL, PATHd8, and MrBayes.

    We performed a cross-validation test to compare node ages resulting from a DateLife analysis (i.e., phylogenetic dating using secondary calibrations) to those from the original chronograms (i.e., obtained with primary calibrations), and found that DateLife’s node age estimates are consistent with the age estimates from the original chronograms, with the largest variation in ages occurring around topologically deeper nodes. Because the results from any software for scientific analysis can only be as good as the data used as input, we highlight the importance of considering the results of a DateLife analysis in the context of the input chronograms. DateLife can help to increase awareness of the existing disparities among alternative hypotheses of dates for the same diversification events, and to support exploration of the effect of alternative chronogram hypotheses on downstream analyses, providing a framework for a more informed interpretation of evolutionary results.

     
  2. Abstract

    The amount and patterns of phylodiversity in a community are often used to draw inferences about the local and historical factors affecting community assembly and can be used to prioritize communities and locations for conservation. Because measures of phylodiversity are based on the topology and branch lengths of phylogenetic trees, which are affected by the number and diversity of taxa in the tree, these analyses may be sensitive to changes in taxon sampling and tree reconstruction methods.

    To investigate the effects of taxon sampling and tree reconstruction methods on measures of phylodiversity, we investigated the community phylogenetics of the Ordway‐Swisher Biological Station (Florida), which is home to over 600 species of vascular plants. We studied the effects of (a) the number of taxa included in the regional phylogeny; (b) random versus targeted sampling of species to assemble the regional species pool; (c) including only species from specific clades rather than broad sampling; (d) using trees reconstructed directly for the taxa under study compared to trees pruned from a larger reconstructed tree; and (e) using phylograms compared to chronograms.

    We found that including more taxa in a study increases the likelihood of observing significantly nonrandom phylogenetic patterns. However, there were no consistent trends in the phylodiversity patterns based on random taxon sampling compared to targeted sampling, or within individual clades compared to the complete dataset. Using pruned and reconstructed phylogenies resulted in similar patterns of phylodiversity, while chronograms in some cases led to significantly different results from phylograms.

    The methods commonly used in community phylogenetic studies can significantly impact the results, potentially influencing both inferences of community assembly and conservation decisions. We highlight the need for both careful selection of methods in community phylogenetic studies and appropriate interpretation of results, depending on the specific questions to be addressed.

     
  3. Premise

    Phylogenetic relationships within major angiosperm clades are increasingly well resolved, but largely informed by plastid data. Areas of poor resolution persist within the Dipsacales, including placement of Heptacodium and Zabelia, and relationships within the Caprifolieae and Linnaeeae, hindering our interpretation of morphological evolution. Here, we sampled a significant number of nuclear loci using a Hyb‐Seq approach and used these data to infer the Dipsacales phylogeny and estimate divergence times.

    Methods

    Sampling all major clades within the Dipsacales, we applied the Angiosperms353 probe set to 96 species. Data were filtered based on locus completeness and taxon recovery per locus, and trees were inferred using RAxML and ASTRAL. Plastid loci were assembled from off‐target reads, and 10 fossils were used to calibrate dated trees.

    Results

    Varying numbers of targeted loci and off‐target plastomes were recovered from most taxa. Nuclear and plastid data confidently place Heptacodium with Caprifolieae, implying homoplasy in calyx morphology, ovary development, and fruit type. Placement of Zabelia, and relationships within the Caprifolieae and Linnaeeae, remain uncertain. Dipsacales diversification began earlier than suggested by previous angiosperm‐wide dating analyses, but many major splitting events date to the Eocene.

    Conclusions

    The Angiosperms353 probe set facilitated the assembly of a large, single‐copy nuclear dataset for the Dipsacales. Nevertheless, many relationships remain unresolved, and resolution was poor for woody clades with low rates of molecular evolution. We favor expanding the Angiosperms353 probe set to include more variable loci and loci of special interest, such as developmental genes, within particular clades.

     
  4. Abstract Motivation

    Phylogenomics faces a dilemma: on the one hand, the most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction.

    Results

    We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division to quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees.

    Availability and implementation

    QuCo is available on https://github.com/maryamrabiee/quco.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract Aim

    The origin of the amphitropic Mediterranean Basin and southern African disjunction (European–African amphitropical disjunction; EAAD) pattern is generally attributed to recent dispersal events. However, our knowledge is limited because the origin of the EAAD pattern has been almost exclusively studied in plants. Here, we investigate the origin of this wide‐ranging disjunction pattern in a group of wingless insects, consisting of two major clades, both of which have EAAD distributions.

    Location

    Sub‐Saharan Africa and Mediterranean region.

    Taxon

    Tribe Dendarini (Coleoptera: Tenebrionidae).

    Methods

    We reconstructed a dated molecular phylogeny of major lineages within Dendarini using maximum likelihood and Bayesian inference. The employed dataset included sequences of six genes (two mitochondrial and four nuclear fragments) generated for 72 species. To investigate the sequence and timing leading to present‐day wide‐ranging disjunction patterns, we conducted parametric historical biogeography analyses.

    Results

    The dated phylogenetic framework supports the monophyly of all major Dendarini lineages and highlights the origin of the tribe in sub‐Saharan Africa during the Middle Eocene. From there, representatives of the two major lineages colonized the Mediterranean region at the Oligocene‐Miocene boundary, with one lineage first reaching North Africa, whilst the other reached southern Europe.

    Main conclusions

    The origin of the EAAD in Dendarini beetles is ancient and better explained by the progressive fragmentation of the pan‐African rainforest that started in the Early Eocene than by other scenarios. This and the increased aridification associated with the global long‐term cooling trend that took place at that time had a strong influence on the diversification and distribution of xerophilic organisms such as dendarine beetles. This challenges the understanding of the origin of EAAD patterns, highlighting that they do not only result from recent dispersal events between the Pliocene and Pleistocene.

     