Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
more »
« less
Estimating phylogenies from genomes: A beginners review of commonly used genomic data in vertebrate phylogenomics
Abstract Despite the increasing feasibility of sequencing whole genomes from diverse taxa, a persistent problem in phylogenomics is the selection of appropriate genetic markers or loci for a given taxonomic group or research question. In this review, we aim to streamline the decision-making process when selecting specific markers to use in phylogenomic studies by introducing commonly used types of genomic markers, their evolutionary characteristics, and their associated uses in phylogenomics. Specifically, we review the utilities of ultraconserved elements (including flanking regions), anchored hybrid enrichment loci, conserved nonexonic elements, untranslated regions, introns, exons, mitochondrial DNA, single nucleotide polymorphisms, and anonymous regions (nonspecific regions that are evenly or randomly distributed across the genome). These various genomic elements and regions differ in their substitution rates, likelihood of neutrality or of being strongly linked to loci under selection, and mode of inheritance, each of which are important considerations in phylogenomic reconstruction. These features may give each type of marker important advantages and disadvantages depending on the biological question, number of taxa sampled, evolutionary timescale, cost effectiveness, and analytical methods used. We provide a concise outline as a resource to efficiently consider key aspects of each type of genetic marker. There are many factors to consider when designing phylogenomic studies, and this review may serve as a primer when weighing options between multiple potential phylogenomic markers.
more »
« less
- PAR ID:
- 10414943
- Editor(s):
- Springer, Mark
- Date Published:
- Journal Name:
- Journal of Heredity
- Volume:
- 114
- Issue:
- 1
- ISSN:
- 0022-1503
- Page Range / eLocation ID:
- 1 to 13
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Sork, Victoria (Ed.)Abstract When species are continuously distributed across environmental gradients, the relative strength of selection and gene flow shape spatial patterns of genetic variation, potentially leading to variable levels of differentiation across loci. Determining whether adaptive genetic variation tends to be structured differently than neutral variation along environmental gradients is an open and important question in evolutionary genetics. We performed exome-wide population genomic analysis on deer mice sampled along an elevational gradient of nearly 4000 m of vertical relief. Using a combination of selection scans, genotype-environment associations, and geographic cline analyses, we found that a large proportion of the exome has experienced a history of altitude-related selection. Elevational clines for nearly 30% of these putatively adaptive loci were shifted significantly up- or down-slope of clines for loci that did not bear similar signatures of selection. Many of these selection targets can be plausibly linked to known phenotypic differences between highland and lowland deer mice, although the vast majority of these candidates have not been reported in other studies of highland taxa. Together, these results suggest new hypotheses about the genetic basis of physiological adaptation to high-altitude, and the spatial distribution of adaptive genetic variation along environmental gradients.more » « less
-
Alanio, Alexandre (Ed.)ABSTRACT Modern taxonomic classification is often based on phylogenetic analyses of a few molecular markers, although single-gene studies are still common. Here, we leverage genome-scale molecular phylogenetics (phylogenomics) of species and populations to reconstruct evolutionary relationships in a dense data set of 710 fungal genomes from the biomedically and technologically important genusAspergillus. To do so, we generated a novel set of 1,362 high-quality molecular markers specific forAspergillusand provided profile Hidden Markov Models for each, facilitating their use by others. Examining the resulting phylogeny helped resolve ongoing taxonomic controversies, identified new ones, and revealed extensive strain misidentification (7.59% of strains were previously misidentified), underscoring the importance of population-level sampling in species classification. These findings were corroborated using the current standard, taxonomically informative loci. These findings suggest that phylogenomics of species and populations can facilitate accurate taxonomic classifications and reconstructions of the Tree of Life.IMPORTANCEIdentification of fungal species relies on the use of molecular markers. Advances in genomic technologies have made it possible to sequence the genome of any fungal strain, making it possible to use genomic data for the accurate assignment of strains to fungal species (and for the discovery of new ones). We examined the usefulness and current limitations of genomic data using a large data set of 710 publicly available genomes from multiple strains and species of the biomedically, agriculturally, and industrially important genusAspergillus. Our evolutionary genomic analyses revealed that nearly 8% of publicly availableAspergillusgenomes are misidentified. Our work highlights the usefulness of genomic data for fungal systematic biology and suggests that systematic genome sequencing of multiple strains, including reference strains (e.g., type strains), of fungal species will be required to reduce misidentification errors in public databases.more » « less
-
Wiegmann, Brian (Ed.)Abstract Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets apparently driven by the increase in loci length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As loci length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.]more » « less
-
Abstract Evolutionary processes may have substantial impacts on community assembly, but evidence for phylogenetic relatedness as a determinant of interspecific interaction strength remains mixed. In this perspective, we consider a possible role for discordance between gene trees and species trees in the interpretation of phylogenetic signal in studies of community ecology. Modern genomic data show that the evolutionary histories of many taxa are better described by a patchwork of histories that vary along the genome rather than a single species tree. If a subset of genomic loci harbour trait‐related genetic variation, then the phylogeny at these loci may be more informative of interspecific trait differences than the genome background. We develop a simple method to detect loci harbouring phylogenetic signal and demonstrate its application through a proof‐of‐principle analysis ofPenicilliumgenomes and pairwise interaction strength. Our results show that phylogenetic signal that may be masked genome‐wide could be detectable using phylogenomic techniques and may provide a window into the genetic basis for interspecific interactions.more » « less