The Copepoda is a clade of pancrustaceans containing 14,485 species that are extremely varied in their morphology and lifestyle. Not only do copepods dominate marine plankton and sediment communities and make up a sizeable component of the freshwater plankton, but over 6,000 species are symbiotically associated with every major phylum of marine metazoans, mostly as parasites. Unfortunately, our understanding of copepod evolutionary relationships is relatively limited in part because of their extremely divergent morphology, sparse taxon sampling in molecular phylogenetic analyses, a reliance on only a handful of molecular markers, and little taxonomic overlap between phylogenetic studies. Here, a synthesis tree method is used to integrate published phylogenies into a more comprehensive tree of copepods by leveraging phylogenetic and taxonomic data. A literature review in this study finds fewer than 500 species of copepods have been sampled in molecular phylogenetic studies. Using the Open Tree of Life platform, those taxa that have been sampled in previous phylogenetic studies are grafted together and combined with the underlying copepod taxonomic hierarchy from the Open Tree of Life Taxonomy to make a synthesis phylogeny of all copepod species. Taxon sampling with respect to molecular phylogenetic analyses is reviewed for all orders of copepods and shows only 3% of copepod species have been sampled in phylogenetic studies. The resulting synthesis phylogeny reveals copepods have transitioned to a parasitic lifestyle on at least 14 occasions. We examine the underlying phylogenetic, taxonomic, and natural history data supporting these transitions to parasitism; review the species diversity of each parasitic clade; and identify key areas for further phylogenetic investigation. 
                        more » 
                        « less   
                    
                            
                            Phylotastic: Improving Access to Tree-of-Life Knowledge With Flexible, on-the-Fly Delivery of Trees
                        
                    
    
            A comprehensive phylogeny of species, i.e., a tree of life, has potential uses in a variety of contexts, including research, education, and public policy. Yet, accessing the tree of life typically requires special knowledge, complex software, or long periods of training. The Phylotastic project aims make it as easy to get a phylogeny of species as it is to get driving directions from mapping software. In prior work, we presented a design for an open system to validate and manage taxon names, find phylogeny resources, extract subtrees matching a user’s taxon list, scale trees to time, and integrate related resources such as species images. Here, we report the implementation of a set of tools that together represent a robust, accessible system for on-the-fly delivery of phylogenetic knowledge. This set of tools includes a web portal to execute several customizable workflows to obtain species phylogenies (scaled by geologic time and decorated with thumbnail images); more than 30 underlying web services (accessible via a common registry); and code toolkits in R and Python (allowing others to develop custom applications using Phylotastic services). The Phylotastic system, accessible via http://www.phylotastic.org , provides a unique resource to access the current state of phylogenetic knowledge, useful for a variety of cases in which a tree extracted quickly from online resources (as distinct from a tree custom-made from character data) is sufficient, as it is for many casual uses of trees identified here. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1458595
- PAR ID:
- 10196306
- Date Published:
- Journal Name:
- Evolutionary Bioinformatics
- Volume:
- 16
- ISSN:
- 1176-9343
- Page Range / eLocation ID:
- 117693431989938
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            The reconstruction of phylogenetic networks is an important but challenging problem in phylogenetics and genome evolution, as the space of phylogenetic networks is vast and cannot be sampled well. One approach to the problem is to solve the minimum phylogenetic network problem, in which phylogenetic trees are first inferred, and then the smallest phylogenetic network that displays all the trees is computed. The approach takes advantage of the fact that the theory of phylogenetic trees is mature, and there are excellent tools available for inferring phylogenetic trees from a large number of biomolecular sequences. A tree–child network is a phylogenetic network satisfying the condition that every nonleaf node has at least one child that is of indegree one. Here, we develop a new method that infers the minimum tree–child network by aligning lineage taxon strings in the phylogenetic trees. This algorithmic innovation enables us to get around the limitations of the existing programs for phylogenetic network inference. Our new program, named ALTS, is fast enough to infer a tree–child network with a large number of reticulations for a set of up to 50 phylogenetic trees with 50 taxa that have only trivial common clusters in about a quarter of an hour on average.more » « less
- 
            All life on earth is linked by a shared evolutionary history. Even before Darwin developed the theory of evolution, Linnaeus categorized types of organisms based on their shared traits. We now know these traits derived from these species’ shared ancestry. This evolutionary history provides a natural framework to harness the enormous quantities of biological data being generated today. The Open Tree of Life project is a collaboration developing tools to curate and share evolutionary estimates (phylogenies) covering the entire tree of life (Hinchliff et al. 2015, McTavish et al. 2017). The tree is viewable at https://tree.opentreeoflife.org, and the data is all freely available online. The taxon identifiers used in the Open Tree unified taxonomy (Rees and Cranston 2017) are mapped to identifiers across biological informatics databases, including the Global Biodiversity Information Facility (GBIF), NCBI, and others. Linking these identifiers allows researchers to easily unify data from across these different resources (Fig. 1). Leveraging a unified evolutionary framework across the diversity of life provides new avenues for integrative wide scale research. Downstream tools, such as R packages developed by the R OpenSci foundation (rotl, rgbif) (Michonneau et al. 2016, Chamberlain 2017) and others tools (Revell 2012), make accessing and combining this information straightforward for students as well as researchers (e.g. https://mctavishlab.github.io/BIO144/labs/rotl-rgbif.html). Figure 1. Example linking phylogenetic relationships accessed from the Open Tree of Life with specimen location data from Global Biodiversity Information Facility. For example, a recent publication by Santorelli et al. 2018 linked evolutionary information from Open Tree with species locality data gathered from a local field study as well as GBIF species location records to test a river-barrier hypothesis in the Amazon. By combining these data, the authors were able test a widely held biogeographic hypothesis across 1952 species in 14 taxonomic groups, and found that a river that had been postulated to drive endemism, was in fact not a barrier to gene flow. However, data provenance and taxonomic name reconciliation remain key hurdles to applying data from these large digital biodiversity and evolution community resources to answering biological questions. In the Amazonian river analysis, while they leveraged use of GBIF records as a secondary check on their species records, they relied on their an intensive local field study for their major conclusions, and preferred taxon specific phylogenetic resources over Open Tree where they were available (Santorelli et al. 2018). When Li et al. 2018 assessed large scale phylogenetic approaches, including Open Tree, for measuring community diversity, they found that synthesis phylogenies were less resolved than purpose-built phylogenies, but also found that these synthetic phylogenies were sufficient for community level phylogenetic diversity analyses. Nonetheless, data quality concerns have limited adoption of analyses data from centralized resources (McTavish et al. 2017). Taxonomic name recognition and reconciliation across databases also remains a hurdle for large scale analyses, despite several ongoing efforts to improve taxonomic interoperability and unify taxonomies, such at Catalogue of Life + (Bánki et al. 2018). In order to support innovative science, large scale digital data resources need to facilitate data linkage between resources, and address researchers' data quality and provenance concerns. I will present the model that the Open Tree of Life is using to provide evolutionary data at the scale of the entire tree of life, while maintaining traceable provenance to the publications and taxonomies these evolutionary relationships are inferred from. I will discuss the hurdles to adoption of these large scale resources by researchers, as well as the opportunities for new research avenues provided by the connections between evolutionary inferences and biodiversity digital databases.more » « less
- 
            Large systematic revisionary projects incorporating data for hundreds or thousands of taxa require an integrative approach, with a strong biodiversity-informatics core for efficient data management to facilitate research on the group. Our original biodiversity informatics platform, 3i (Internet-accessible Interactive Identification) combined a customized MS Access database backend with ASP-based web interfaces to support revisionary syntheses of several large genera of leafhopers (Hemiptera: Auchenorrhyncha: Cicadellidae). More recently, for our National Science Foundation sponsored project, “GoLife: Collaborative Research: Integrative genealogy, ecology and phenomics of deltocephaline leafhoppers (Hemiptera: Cicadellidae), and their microbial associates”, we selected the new open-source platform TaxonWorks as the cyberinfrastructure. In the scope of the project, the original “3i World Auchenorrhyncha Database” was imported into TaxonWorks. At the present time, TaxonWorks has many tools to automatically import nomenclature, citations, and specimen based collection data. At the time of the initial migration of the 3i database, many of those tools were still under development, and complexity of the data in the database required a custom migration script, which is still probably the most efficient solution for importing datasets with long development history. At the moment, the World Auchenorrhyncha Database comprehensively covers nomenclature of the group and includes data on 70 valid families, 6,816 valid genera, 47,064 valid species as well as synonymy and subsequent combinations (Fig. 1). In addition, many taxon records include the original citation, bibliography, type information, etymology, etc. The bibliography of the group includes 37,579 sources, about 1/3 of which are associated with PDF files. Species have distribution records, either derived from individual specimens or as country and state level asserted distribution, as well as biological associations indicating host plants, predators, and parasitoids. Observation matrices in TaxonWorks are designed to handle morphological data associated with taxa or specimens. The matrices may be used to automatically generate interactive identification keys and taxon descriptions. They can also be downloaded to be imported, for example, into Lucid builder, or to perform phylogenetic analysis using an external application. At the moment there are 36 matrices associated with the project. The observation matrix from GoLife project covers 798 taxa by 210 descriptors (most of which are qualitative multi-state morphological descriptors) (Fig. 2). Illustrations are provided for 9,886 taxa and organized in the specialized image matrix and could be used as a pictorial key for determination of species and taxa of a higher rank. For the phylogenetic analysis, a dataset was constructed for 730 terminal taxa and >160,000 nucleotide positions obtained using anchored hybrid enrichment of genomic DNA for a sample of leafhoppers from the subfamily Deltocephalinae and outgroups. The probe kit targets leafhopper genes, as well as some bacterial genes (endosymbionts and plant pathogens transmitted by leafhoppers). The maximum likelihood analyses of concatenated nucleotide and amino acid sequences as well as coalescent gene tree analysis yielded well-resolved phylogenetic trees (Cao et al. 2022). Raw sequence data have been uploaded to the Sequence Read Archive on GenBank. Occurrence and morphological data, as well as diagnostic images, for voucher specimens have been incorporated into TaxonWorks. Data in TaxonWorks could be exported in raw format, get accessed via Application Programming Interface (API), or be shared with external data aggregators like Catalogue of Life, GBIF, iDigBio.more » « less
- 
            Blair, Jaime E. (Ed.)Phytophthora species cause severe diseases on food, forest, and ornamental crops. Since the genus was described in 1876, it has expanded to comprise over 190 formally described species. There is a need for an open access phylogenetic tool that centralizes diverse streams of sequence data and metadata to facilitate research and identification of Phytophthora species. We used the Tree-Based Alignment Selector Toolkit (T-BAS) to develop a phylogeny of 192 formally described species and 33 informal taxa in the genus Phytophthora using sequences of eight nuclear genes. The phylogenetic tree was inferred using the RAxML maximum likelihood program. A search engine was also developed to identify microsatellite genotypes of P . infestans based on genetic distance to known lineages. The T-BAS tool provides a visualization framework allowing users to place unknown isolates on a curated phylogeny of all Phytophthora species. Critically, the tree can be updated in real-time as new species are described. The tool contains metadata including clade, host species, substrate, sexual characteristics, distribution, and reference literature, which can be visualized on the tree and downloaded for other uses. This phylogenetic resource will allow data sharing among research groups and the database will enable the global Phytophthora community to upload sequences and determine the phylogenetic placement of an isolate within the larger phylogeny and to download sequence data and metadata. The database will be curated by a community of Phytophthora researchers and housed on the T-BAS web portal in the Center for Integrated Fungal Research at NC State. The T-BAS web tool can be leveraged to create similar metadata enhanced phylogenies for other Oomycete, bacterial or fungal pathogens.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    