NSF PAR Search | NSF Public Access Repository


DateLife: Leveraging Databases and Analytical Tools to Reveal the Dated Tree of Life

https://doi.org/10.1093/sysbio/syae015

Sánchez Reyes, Luna L; McTavish, Emily Jane; O’Meara, Brian (March 2024, Systematic Biology)
Silvestro, Daniele (Ed.)

Abstract Chronograms—phylogenies with branch lengths proportional to time—represent key data on timing of evolutionary events, allowing us to study natural processes in many areas of biological research. Chronograms also provide valuable information that can be used for education, science communication, and conservation policy decisions. Yet, achieving a high-quality reconstruction of a chronogram is a difficult and resource-consuming task. Here we present DateLife, a phylogenetic software implemented as an R package and an R Shiny web application available at www.datelife.org, that provides services for efficient and easy discovery, summary, reuse, and reanalysis of node age data mined from a curated database of expert, peer-reviewed, and openly available chronograms. The main DateLife workflow starts with one or more scientific taxon names provided by a user. Names are processed and standardized to a unified taxonomy, allowing DateLife to run a name match across its local chronogram database that is curated from Open Tree of Life’s phylogenetic repository, and extract all chronograms that contain at least two queried taxon names, along with their metadata. Finally, node ages from matching chronograms are mapped using the congruification algorithm to corresponding nodes on a tree topology, either extracted from Open Tree of Life’s synthetic phylogeny or one provided by the user. Congruified node ages are used as secondary calibrations to date the chosen topology, with or without initial branch lengths, using different phylogenetic dating methods such as BLADJ, treePL, PATHd8, and MrBayes. 
We performed a cross-validation test to compare node ages resulting from a DateLife analysis (i.e., phylogenetic dating using secondary calibrations) to those from the original chronograms (i.e., obtained with primary calibrations), and found that DateLife’s node age estimates are consistent with the age estimates from the original chronograms, with the largest variation in ages occurring around topologically deeper nodes. Because the results from any software for scientific analysis can only be as good as the data used as input, we highlight the importance of considering the results of a DateLife analysis in the context of the input chronograms. DateLife can help to increase awareness of the existing disparities among alternative hypotheses of dates for the same diversification events, and to support exploration of the effect of alternative chronogram hypotheses on downstream analyses, providing a framework for a more informed interpretation of evolutionary results.
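The matching and summary steps described in this abstract can be illustrated with a small sketch. This is not DateLife's code or API: the function names, the dictionary layout, and the toy chronograms (placeholder taxa and ages) are all assumptions made for illustration, and name standardization is reduced to whitespace and case normalization instead of a real taxonomic name resolution service. The sketch keeps only chronograms that contain at least two queried names, then takes the median age reported for each queried pair.

```python
from statistics import median


def normalize(name):
    # Stand-in for taxonomic name standardization: collapse whitespace, lowercase.
    return " ".join(name.split()).lower()


def find_matching(query_names, chronograms, min_shared=2):
    """Return {source: shared names} for chronograms with >= min_shared queried taxa."""
    query = {normalize(n) for n in query_names}
    matches = {}
    for source, chrono in chronograms.items():
        shared = query & {normalize(t) for t in chrono["tips"]}
        if len(shared) >= min_shared:
            matches[source] = shared
    return matches


def summarize_pair_ages(query_names, chronograms):
    """Median age of the most recent common ancestor for each queried taxon pair."""
    query = {normalize(n) for n in query_names}
    ages = {}
    for source in find_matching(query_names, chronograms):
        for (a, b), age in chronograms[source]["mrca_ages"].items():
            pair = frozenset((normalize(a), normalize(b)))
            if pair <= query:  # keep only pairs made entirely of queried taxa
                ages.setdefault(pair, []).append(age)
    return {pair: median(values) for pair, values in ages.items()}


# Hypothetical toy database: placeholder taxa, ages in million years.
toy_chronograms = {
    "study_1": {
        "tips": ["Taxon a", "Taxon b", "Taxon c"],
        "mrca_ages": {("Taxon a", "Taxon b"): 10.0, ("Taxon a", "Taxon c"): 30.0},
    },
    "study_2": {
        "tips": ["Taxon a", "Taxon b"],
        "mrca_ages": {("Taxon a", "Taxon b"): 14.0},
    },
    "study_3": {
        "tips": ["Taxon a", "Taxon d"],
        "mrca_ages": {("Taxon a", "Taxon d"): 50.0},
    },
}

print(find_matching(["taxon A", "Taxon b"], toy_chronograms))
print(summarize_pair_ages(["taxon A", "Taxon b"], toy_chronograms))
```

In the toy data, study_3 is skipped because it shares only one queried name. The median is one simple choice of summary; the abstract does not say which summary statistics DateLife uses, and the congruification and dating steps are not covered here.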
Leveraging shared ancestral variation to detect local introgression

https://doi.org/10.1371/journal.pgen.1010155

Lopez Fang, Lesly; Peede, David; Ortega-Del Vecchyo, Diego; McTavish, Emily Jane; Huerta-Sánchez, Emilia (January 2024, PLOS Genetics)
Zhu, Xiaofeng (Ed.)
Introgression is a common evolutionary phenomenon that results in shared genetic material across non-sister taxa. Existing statistical methods such as Patterson’s D statistic can detect introgression by measuring an excess of shared derived alleles between populations. The D statistic is effective at detecting genome-wide patterns of introgression but can give spurious inferences of introgression when applied to local regions. We propose a new statistic, D⁺, that leverages both shared ancestral and derived alleles to infer local introgressed regions. Incorporating both shared derived and ancestral alleles increases the number of informative sites per region, improving our ability to identify local introgression. We use a coalescent framework to derive the expected value of this statistic as a function of different demographic parameters under an instantaneous admixture model and use coalescent simulations to compute the power and precision of D⁺. While the power of D and D⁺ is comparable, D⁺ has better precision than D. We apply D⁺ to empirical data from the 1000 Genomes Project and Heliconius butterflies to infer local targets of introgression in humans and in butterflies.
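The two statistics can be sketched from site-pattern counts. The abstract does not give formulas, so this sketch assumes the standard ABBA/BABA form of Patterson's D and assumes that D⁺ adds the shared-ancestral patterns BAAA and ABAA to the numerator and denominator. Patterns are written in the order (P1, P2, P3, outgroup), with A for the ancestral allele and B for the derived allele. The function names and the example counts are hypothetical.

```python
from collections import Counter


def count_patterns(sites):
    """Count site patterns given as 4-character strings over (P1, P2, P3, outgroup)."""
    return Counter(sites)


def patterson_d(abba, baba):
    # D = (ABBA - BABA) / (ABBA + BABA); undefined when there are no informative sites.
    total = abba + baba
    if total == 0:
        return float("nan")
    return (abba - baba) / total


def d_plus(abba, baba, baaa, abaa):
    # Assumed form: adds shared-ancestral patterns (BAAA, ABAA) to the derived ones.
    total = abba + baba + baaa + abaa
    if total == 0:
        return float("nan")
    return ((abba - baba) + (baaa - abaa)) / total


# Hypothetical counts for one genomic window.
sites = ["ABBA"] * 6 + ["BABA"] * 2 + ["BAAA"] * 10 + ["ABAA"] * 2 + ["AAAA"] * 5
counts = count_patterns(sites)

d = patterson_d(counts["ABBA"], counts["BABA"])
dp = d_plus(counts["ABBA"], counts["BABA"], counts["BAAA"], counts["ABAA"])
print(f"informative sites for D: {counts['ABBA'] + counts['BABA']}, D = {d:.2f}")
print(f"informative sites for D+: {sum(counts[p] for p in ('ABBA', 'BABA', 'BAAA', 'ABAA'))}, D+ = {dp:.2f}")
```

The example makes the abstract's point about informative sites concrete: in the same window, the D⁺ denominator draws on more sites than the D denominator because the shared-ancestral patterns are counted as well.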
Full Text Available
Approachable Case Studies Support Learning and Reproducibility in Data Science: An Example from Evolutionary Biology

https://doi.org/10.1080/26939169.2022.2099487

Sanchez Reyes, Luna L.; McTavish, Emily Jane (September 2022, Journal of Statistics and Data Science Education)

Full Text Available
Rapid alignment updating with Extensiphy

https://doi.org/10.1111/2041-210X.13790

Field, Jasper Toscani; Abrams, A. Jeanine; Cartee, John C.; McTavish, Emily Jane (March 2022, Methods in Ecology and Evolution)

Full Text Available
Color Polymorphism is a Driver of Diversification in the Lizard Family Lacertidae

https://doi.org/10.1093/sysbio/syab046

Brock, Kinsey M; McTavish, Emily Jane; Edwards, Danielle L (June 2021, Systematic Biology)
Eaton, Deren (Ed.)
Abstract Color polymorphism (two or more heritable color phenotypes maintained within a single breeding population) is an extreme type of intraspecific diversity widespread across the tree of life. Color polymorphism is hypothesized to be an engine for speciation, where morph loss or divergence between distinct color morphs within a species results in the rapid evolution of new lineages, and thus, color polymorphic lineages are expected to display elevated diversification rates. Multiple species in the lizard family Lacertidae are color polymorphic, making them an ideal group to investigate the evolutionary history of this trait and its influence on macroevolution. Here, we produce a comprehensive species-level phylogeny of the lizard family Lacertidae to reconstruct the evolutionary history of color polymorphism and test if color polymorphism has been a driver of diversification. Accounting for phylogenetic uncertainty with multiple phylogenies and simulation studies, we estimate an ancient origin of color polymorphism (111 Ma) within the Lacertini tribe (subfamily Lacertinae). Color polymorphism most likely evolved only a few times in the Lacertidae and has been lost at a much faster rate than gained. Evolutionary transitions to color polymorphism are associated with shifts to increased net diversification rates in this family of lizards. Taken together, our empirical results support long-standing theoretical expectations that color polymorphism is a driver of diversification. [Color polymorphism; Lacertidae; state-dependent speciation extinction models; trait-dependent diversification.]
Full Text Available
Linking Biodiversity Data Using Evolutionary History

https://doi.org/10.3897/biss.3.36207

McTavish, Emily Jane (June 2019, Biodiversity Information Science and Standards)

All life on earth is linked by a shared evolutionary history. Even before Darwin developed the theory of evolution, Linnaeus categorized types of organisms based on their shared traits. We now know these traits derived from these species’ shared ancestry. This evolutionary history provides a natural framework to harness the enormous quantities of biological data being generated today. The Open Tree of Life project is a collaboration developing tools to curate and share evolutionary estimates (phylogenies) covering the entire tree of life (Hinchliff et al. 2015, McTavish et al. 2017). The tree is viewable at https://tree.opentreeoflife.org, and the data is all freely available online. The taxon identifiers used in the Open Tree unified taxonomy (Rees and Cranston 2017) are mapped to identifiers across biological informatics databases, including the Global Biodiversity Information Facility (GBIF), NCBI, and others. Linking these identifiers allows researchers to easily unify data from across these different resources (Fig. 1). Leveraging a unified evolutionary framework across the diversity of life provides new avenues for integrative, wide-scale research. Downstream tools, such as R packages developed by the R OpenSci foundation (rotl, rgbif) (Michonneau et al. 2016, Chamberlain 2017) and other tools (Revell 2012), make accessing and combining this information straightforward for students as well as researchers (e.g. https://mctavishlab.github.io/BIO144/labs/rotl-rgbif.html).
Figure 1. Example linking phylogenetic relationships accessed from the Open Tree of Life with specimen location data from the Global Biodiversity Information Facility.
For example, a recent publication by Santorelli et al. 2018 linked evolutionary information from Open Tree with species locality data gathered from a local field study as well as GBIF species location records to test a river-barrier hypothesis in the Amazon. 
By combining these data, the authors were able to test a widely held biogeographic hypothesis across 1952 species in 14 taxonomic groups, and found that a river that had been postulated to drive endemism was in fact not a barrier to gene flow. However, data provenance and taxonomic name reconciliation remain key hurdles to applying data from these large digital biodiversity and evolution community resources to answering biological questions. In the Amazonian river analysis, while the authors used GBIF records as a secondary check on their species records, they relied on their intensive local field study for their major conclusions, and preferred taxon-specific phylogenetic resources over Open Tree where they were available (Santorelli et al. 2018). When Li et al. 2018 assessed large scale phylogenetic approaches, including Open Tree, for measuring community diversity, they found that synthesis phylogenies were less resolved than purpose-built phylogenies, but also found that these synthetic phylogenies were sufficient for community level phylogenetic diversity analyses. Nonetheless, data quality concerns have limited adoption of analyses using data from centralized resources (McTavish et al. 2017). Taxonomic name recognition and reconciliation across databases also remain a hurdle for large scale analyses, despite several ongoing efforts to improve taxonomic interoperability and unify taxonomies, such as Catalogue of Life + (Bánki et al. 2018). In order to support innovative science, large scale digital data resources need to facilitate data linkage between resources, and address researchers' data quality and provenance concerns. I will present the model that the Open Tree of Life is using to provide evolutionary data at the scale of the entire tree of life, while maintaining traceable provenance to the publications and taxonomies these evolutionary relationships are inferred from. 
I will discuss the hurdles to adoption of these large scale resources by researchers, as well as the opportunities for new research avenues provided by the connections between evolutionary inferences and biodiversity digital databases.
Full Text Available
