
Title: Estimating phylogenies from genomes: A beginners review of commonly used genomic data in vertebrate phylogenomics
Abstract: Despite the increasing feasibility of sequencing whole genomes from diverse taxa, a persistent problem in phylogenomics is selecting appropriate genetic markers or loci for a given taxonomic group or research question. In this review, we aim to streamline the decision-making process for selecting markers in phylogenomic studies by introducing commonly used types of genomic markers, their evolutionary characteristics, and their associated uses in phylogenomics. Specifically, we review the utility of ultraconserved elements (including their flanking regions), anchored hybrid enrichment loci, conserved nonexonic elements, untranslated regions, introns, exons, mitochondrial DNA, single nucleotide polymorphisms, and anonymous regions (nonspecific regions that are evenly or randomly distributed across the genome). These genomic elements and regions differ in their substitution rates, their likelihood of being neutral or strongly linked to loci under selection, and their mode of inheritance, each of which is an important consideration in phylogenomic reconstruction. Depending on the biological question, number of taxa sampled, evolutionary timescale, cost effectiveness, and analytical methods used, these features may give each type of marker important advantages and disadvantages. We provide a concise outline as a resource for efficiently weighing the key aspects of each type of genetic marker. Because many factors must be considered when designing phylogenomic studies, this review may serve as a primer when choosing among multiple potential phylogenomic markers.
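Differences in substitution rate among marker types show up in practice as differences in how many alignment columns vary among taxa. As a rough illustration (not a method taken from the review), the sketch below counts variable and parsimony-informative sites in one aligned locus; a site is parsimony-informative when at least two character states each occur in at least two taxa. It assumes the alignment is a Python dict mapping taxon names to equal-length strings, and the set of characters treated as missing data (`-`, `?`, `N`, `X`) is also an assumption.

```python
from collections import Counter

# Characters treated as missing data; this set is an assumption, adjust as needed.
MISSING = set("-?NnXx")


def site_stats(alignment: dict[str, str]) -> dict[str, float]:
    """Count variable and parsimony-informative sites in one aligned locus.

    `alignment` maps taxon names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    variable = informative = 0
    for i in range(length):
        # Tally the character states in this column, skipping missing data.
        counts = Counter(s[i].upper() for s in seqs if s[i] not in MISSING)
        if len(counts) < 2:
            continue  # invariant (or entirely missing) column
        variable += 1
        # Parsimony-informative: at least two states, each shared by >= 2 taxa.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1

    return {
        "length": length,
        "variable": variable,
        "informative": informative,
        "prop_informative": informative / length if length else 0.0,
    }


if __name__ == "__main__":
    aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "ATGTTC", "D": "ATGAAC"}
    print(site_stats(aln))
```

Comparing `prop_informative` across candidate loci gives only a crude first look at per-locus variability; it ignores saturation, rate heterogeneity among sites, and alignment error, all of which matter when choosing markers for a given timescale.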
Award ID(s): 1906188, 1856266
Author(s) / Creator(s): …; Springer, Mark
Journal Name: Journal of Heredity
Page Range / eLocation ID: 1 to 13
Sponsoring Org: National Science Foundation
More like this
  1. Abstract: Marker selection has emerged as an important component of phylogenomic study design owing to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate, among other factors. The most commonly used reduced-representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes (10 snakes and 7 lizards). The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC with comparable methods, discuss important aspects of those methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
  2. Abstract

    Selection that acts in a sex-specific manner causes the evolution of sexual dimorphism. Sex-specific phenotypic selection has been demonstrated in many taxa and can be in the same direction in the two sexes (differing only in magnitude), limited to one sex, or in opposing directions (antagonistic). Attempts to detect the signal of sex-specific selection from genomic data have confronted numerous difficulties. These challenges highlight the utility of “direct approaches,” in which fitness is predicted from individual genotype within each sex. Here, we directly measured selection on single nucleotide polymorphisms (SNPs) in a natural population of the sexually dimorphic, dioecious plant Silene latifolia. We measured flowering phenotypes, estimated fitness over one reproductive season as well as survival to the next year, and genotyped all adults and a subset of their offspring for SNPs across the genome. We found that while phenotypic selection was congruent (fitness covaried similarly with flowering traits in both sexes), SNPs showed clear evidence for sex-specific selection. SNP-level selection was particularly strong in males and may involve an important gametic component (e.g., pollen competition). While the most significant SNPs under selection in males differed from those under selection in females, paternity selection showed a highly polygenic tradeoff with female survival. Alleles that increased male mating success tended to reduce female survival, indicating sexual antagonism at the genomic level. Perhaps most importantly, this experiment demonstrates that selection within natural populations can be strong enough to measure sex-specific fitness effects of individual loci.

    Males and females typically differ phenotypically, a phenomenon known as sexual dimorphism. These differences arise when selection on males differs from selection on females, either in magnitude or direction. Estimated relationships between traits and fitness indicate that sex-specific selection is widespread, occurring in both plants and animals, and explains why so many species exhibit sexual dimorphism. Finding the specific loci experiencing sex-specific selection is a challenging prospect but one worth undertaking given the extensive evolutionary consequences. Flowering plants with separate sexes are ideal organisms for such studies, given that the fitness of females can be estimated by counting the number of seeds they produce. Estimating male fitness has become easier now that thousands of genetic markers can be used to assign paternity to seeds. We undertook just such a study in S. latifolia, a short-lived, herbaceous plant. We identified loci under sex-specific selection in this species and found more loci affecting fitness in males than in females. Importantly, loci with major effects on male fitness were distinct from the loci with major effects on females. We detected sexual antagonism only when considering the aggregate effect of many loci. Hence, even though males and females share the same genome, this does not necessarily impose a constraint on their independent evolution.

  3. Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also signals of reticulate processes, such as recombination and introgression, that are emerging with greater frequency. We focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible by routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). We also note that macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which phylogeny building increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.

  4. Buerkle, Alex (Ed.)
    Inferences about past processes of adaptation and speciation require a gene-scale and genome-wide understanding of the evolutionary history of diverging taxa. In this study, we use genome-wide capture of nuclear gene sequences, plus skimming of organellar sequences, to investigate the phylogenomics of monkeyflowers in Mimulus section Erythranthe (27 accessions from seven species). Taxa within Erythranthe, particularly the parapatric and putatively sister species M. lewisii (bee-pollinated) and M. cardinalis (hummingbird-pollinated), have been a model system for investigating the ecological genetics of speciation and adaptation for over five decades. Across >8000 nuclear loci, multiple methods resolve a predominant species tree in which M. cardinalis groups with other hummingbird-pollinated taxa (37% of gene trees), rather than being sister to M. lewisii (32% of gene trees). We independently corroborate a single evolution of hummingbird pollination syndrome in Erythranthe by demonstrating functional redundancy in genetic complementation tests of floral traits in hybrids; together, these analyses overturn a textbook case of pollination-syndrome convergence. Strong asymmetries in allele sharing (Patterson’s D-statistic and related tests) indicate that gene tree discordance reflects ancient and recent introgression rather than incomplete lineage sorting. Consistent with abundant introgression blurring the history of divergence, low-recombination and adaptation-associated regions support the new species tree, while high-recombination regions generate phylogenetic evidence for sister status for M. lewisii and M. cardinalis. Population-level sampling of core taxa also revealed two instances of chloroplast capture, with Sierran M. lewisii and Southern Californian M. parishii each carrying organelle genomes nested within respective sympatric M. cardinalis clades. A recent organellar transfer from M. cardinalis, an outcrosser where selfish cytonuclear dynamics are more likely, may account for the unexpected cytoplasmic male sterility effects of selfer M. parishii organelles in hybrids with M. lewisii. Overall, our phylogenomic results reveal extensive reticulation throughout the evolutionary history of a classic monkeyflower radiation, suggesting that natural selection (re-)assembles and maintains species-diagnostic traits and barriers in the face of gene flow. Our findings further underline the challenges, even in reproductively isolated species, in distinguishing re-use of adaptive alleles from true convergence and emphasize the value of a phylogenomic framework for reconstructing the evolutionary genetics of adaptation and speciation.
  5. Abstract

    Context

    Processes that shape genomic and ecological divergence can reveal important evolutionary dynamics to inform the conservation of threatened species. Fontainea is a genus of rainforest shrubs and small trees including critically endangered and threatened species restricted to narrow but complex geographic and ecological regions. Several species of Fontainea are subject to spatially explicit conditions and experience limited intraspecific gene flow, likely generating genetic differentiation and local adaptation.


    Here, we explored the genetic and ecological mechanisms underlying patterns of diversification in two closely related threatened Fontainea species. Our aim was to compare spatial patterns of genetic variation between the vulnerable Fontainea australis (Southern Fontainea) and the critically endangered F. oraria (Coastal Fontainea), both endemic to the heterogeneous subtropical region of central eastern Australia, where large-scale clearing has severely reduced rainforest habitat to a fraction (< 1%) of its pre-European settlement extent.


    We used a set of 10,000 reduced-representation markers to infer genetic relationships and the drivers of spatial genetic variation across the two species. In addition, we employed a combination of univariate and multivariate genome-environment association analysis using a set of topo-climatic variables to explore potential patterns of local adaptation as a factor impacting genomic divergence.


    Our study revealed that Coastal Fontainea have a close genetic relationship with Southern Fontainea. We showed that isolation by distance has played a key role in their genetic variation, indicating that vicariance can explain the spatial genetic distribution of the two species. Genotype-environment analyses showed a strong association with temperature and topographic features, suggesting adaptation to localised thermal environments. We used a multivariate redundancy analysis to identify a range of putatively adapted loci associated with local environmental conditions.


    Divergent selection at the local-habitat scale, driven by dispersal limitations and environmental heterogeneity (including physical barriers), is a likely contributor to adaptive divergence between the two Fontainea species. Our findings indicate that Southern and Coastal Fontainea comprise distinct genetic groups and ecotypes that together may form a single species continuum, with further phenotypic research recommended to confirm the current species boundaries. Proactive conservation actions, including assisted migration to enhance the resilience of populations lacking stress-tolerant single nucleotide polymorphisms (SNPs), may be required to secure the long-term future of both taxa. This is especially vital for the critically endangered Coastal Fontainea, given projections of habitat decline for the species under future climate scenarios.

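Item 4 above uses Patterson's D-statistic (the ABBA-BABA test) to argue that gene tree discordance reflects introgression rather than incomplete lineage sorting. Under incomplete lineage sorting alone, ABBA and BABA site patterns are expected in roughly equal numbers, so D is near zero; a consistent excess of one pattern is the kind of asymmetry in allele sharing that points to gene flow. The sketch below computes only the point estimate and rests on several assumptions: one aligned sequence per taxon, the rooted tree (((P1, P2), P3), O), the outgroup base taken as ancestral, and only unambiguous biallelic sites counted. It omits the block-jackknife significance test used in practice and is not the pipeline used in that study.

```python
def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Point estimate of Patterson's D for the rooted tree (((P1, P2), P3), O).

    Each argument is one aligned sequence; the outgroup base is taken as ancestral.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")

    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        # Use only unambiguous sites with exactly two states across the four taxa.
        if any(x not in "ACGT" for x in (a, b, c, o)) or len({a, b, c, o}) != 2:
            continue
        if a == o and b == c != o:
            abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c != o:
            baba += 1  # P1 and P3 share the derived state

    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites in the alignment")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy alignment: three ABBA sites, one BABA site, one BBAA site, one invariant site.
    print(patterson_d("AAGGAA", "GGAGTA", "GGGATA", "AAAAAA"))
```

A positive D indicates excess allele sharing between P2 and P3, a negative D excess sharing between P1 and P3. BBAA sites, which match the assumed species tree, do not enter the statistic at all.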