Title: An R package and online resource for macroevolutionary studies using the ray‐finned fish tree of life
Abstract

Comprehensive, time‐scaled phylogenies provide a critical resource for many questions in ecology, evolution and biodiversity. Methodological advances have increased the breadth of taxonomic coverage in phylogenetic data; however, accessing and reusing these data remain challenging.

We introduce the Fish Tree of Life website and associated R package fishtree to provide convenient access to sequences, phylogenies, fossil calibrations and diversification rate estimates for the most diverse group of vertebrate organisms, the ray‐finned fishes. The Fish Tree of Life website presents subsets and visual summaries of phylogenetic and comparative data, and is complemented by the R package, which provides flexible programmatic access to the same underlying data source for advanced users wishing to extend or reanalyse the data.

We demonstrate functionality with an overview of the website, and show three examples of advanced usage through the R package. First, we test for the presence of long branch attraction artefacts across the fish tree of life. The second example examines the effects of habitat on diversification rate in the pufferfishes. The final example demonstrates how a community phylogenetic analysis could be conducted with the package.

This resource makes a large comparative vertebrate dataset easily accessible via the website, while the R package enables the rapid reuse and reproducibility of research results via its ability to easily integrate with other R packages and software for molecular biology and comparative methods.

 
NSF-PAR ID:
10457229
Author(s) / Creator(s):
 ;  ;  ;  ;
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Methods in Ecology and Evolution
Volume:
10
Issue:
7
ISSN:
2041-210X
Page Range / eLocation ID:
p. 1118-1124
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Phenotypic data are crucial for understanding genotype–phenotype relationships, assessing the tree of life and revealing trends in trait diversity over time. Large‐scale description of whole organisms for quantitative analyses (phenomics) presents several challenges, and technological advances in the collection of genomic data outpace those for phenomic data. Reasons for this disparity include the time‐consuming and expensive nature of collecting discrete phenotypic data and mining previously published data on a given species (both often requiring anatomical expertise across taxa), and computational challenges involved with analysing high‐dimensional datasets.

    One approach to building approximations of organismal phenomes is to combine published datasets of discrete characters assembled for phylogenetic analyses into a phenomic dataset. Despite a wealth of legacy datasets in the literature for many groups, relatively few methods exist for automating the assembly, analysis, and visualization of phenomic datasets in phylogenetic contexts. Here, we introduce a new R package phenotools for integrating (fusing original or legacy datasets), curating (finding and removing duplicates) and visualizing phenomic datasets.

    We demonstrate the utility of the proposed toolkit with a morphological dataset for flightless birds and two morphological datasets for theropod dinosaurs and provide recommendations for character construction to maximize accessibility in future workflows. Visualization tools allow rapid identification of anatomical subregions with difficult or problematic histories of homology.

    We anticipate these tools aiding automation of the assembly and visualization of phenomic datasets to inform evolutionary relationships and rates of phenotypic evolution.

     
  2. Summary

    The tree of life is highly reticulate, with the history of population divergence emerging from populations of gene phylogenies that reflect histories of introgression, lineage sorting and divergence. In this study, we investigate global patterns of oak diversity and test the hypothesis that there are regions of the oak genome that are broadly informative about phylogeny.

    We utilize fossil data and restriction‐site associated DNA sequencing (RAD‐seq) for 632 individuals representing nearly 250 Quercus species to infer a time‐calibrated phylogeny of the world's oaks. We use a reversible‐jump Markov chain Monte Carlo method to reconstruct shifts in lineage diversification rates, accounting for among‐clade sampling biases. We then map the > 20 000 RAD‐seq loci back to an annotated oak genome and investigate genomic distribution of introgression and phylogenetic support across the phylogeny.

    Oak lineages have diversified among geographic regions, followed by ecological divergence within regions, in the Americas and Eurasia. Roughly 60% of oak diversity traces back to four clades that experienced increases in net diversification, probably in response to climatic transitions or ecological opportunity.

    The strong support for the phylogeny contrasts with high genomic heterogeneity in phylogenetic signal and introgression. Oaks are phylogenomic mosaics, and their diversity may in fact depend on the gene flow that shapes the oak genome.

     
  3. Abstract

    Population dynamics play a central role in the historical and current development of fundamental and applied ecological science. The nascent culture of open data promises to increase the value of population dynamics studies to the field of ecology. However, synthesis of population data is constrained by the difficulty in identifying relevant datasets, by the heterogeneity of available data and by access to raw (as opposed to aggregated or derived) observations.

    To obviate these issues, we built a relational database, popler, and its R client, the library popler. popler accommodates the vast majority of population data under a common structure, and without the need for aggregating raw observations. The popler R library is designed for users unfamiliar with the structure of the database and with the SQL language. This R library allows users to identify, download, explore and cite datasets salient to their needs.

    We implemented popler as a PostgreSQL instance, where we stored population data originated by the United States Long Term Ecological Research (LTER) Network. Our focus on the US LTER data aims to leverage the potential of this vast open data resource. The database currently contains 305 datasets from 25 LTER sites. popler is designed to accommodate automatic updates of existing datasets, and to accommodate additional datasets from LTER as well as non‐LTER studies.

    The combination of the online database and the R library popler is a resource for data synthesis efforts in population ecology. The common structure of popler simplifies comparative analyses, and the availability of raw data confers flexibility in data analysis. The popler R library maximizes these opportunities by providing a user‐friendly interface to the online database.

     
  4. Abstract

    Gene flow is increasingly recognized as an important macroevolutionary process. The many mechanisms that contribute to gene flow (e.g. introgression, hybridization, lateral gene transfer) uniquely affect the diversification dynamics of species, making it important to be able to account for these idiosyncrasies when constructing phylogenetic models. Existing phylogenetic‐network simulators for macroevolution are limited in the ways they model gene flow.

    We present SiPhyNetwork, an R package for simulating phylogenetic networks under a birth–death‐hybridization process.

    Our package unifies the existing birth–death‐hybridization models while also extending the toolkit for modelling gene flow. This tool can create patterns of reticulation such as hybridization, lateral gene transfer, and introgression.

    Specifically, we model different reticulate events by allowing events to either add, remove or keep constant the number of lineages. Additionally, we allow reticulation events to be trait dependent, creating the ability to model the expanse of isolating mechanisms that prevent gene flow. This tool makes it possible for researchers to model many of the complex biological factors associated with gene flow in a phylogenetic context.

     
  5. Abstract

    Many important demographic processes are seasonal, including survival. For many species, mortality risk is significantly higher at certain times of the year than at others, whether because resources are scarce, susceptibility to predators or disease is high, or both. Despite the importance of survival modelling in wildlife sciences, no tools are available to estimate the peak, duration and relative importance of these ‘seasons of mortality’.

    We present cyclomort, an R package that estimates the timing, duration and intensity of any number of mortality seasons with reliable confidence intervals. The package includes a model selection approach to determine the number of mortality seasons and to test whether seasons of mortality vary across discrete grouping factors.

    We illustrate the periodic hazard function model and workflow of cyclomort with simulated data. We then estimate mortality seasons of two caribou (Rangifer tarandus) populations that have strikingly different mortality patterns, including different numbers and timing of mortality peaks, and a marked change in one population over time.

    The cyclomort package was developed to estimate mortality seasons for wildlife, but the package can model any time‐to‐event processes with a periodic component.

     