Title: homologizer: Phylogenetic phasing of gene copies into polyploid subgenomes
Abstract Organisms such as allopolyploids and F1 hybrids contain multiple distinct subgenomes, each potentially with its own evolutionary history. These organisms present a challenge for multilocus phylogenetic inference and other analyses since it is not apparent which gene copies from different loci are from the same subgenome and thus share an evolutionary history. Here we introduce homologizer, a flexible Bayesian approach that uses a phylogenetic framework to infer the phasing of gene copies across loci into their respective subgenomes. Through the use of simulation tests, we demonstrate that homologizer is robust to a wide range of factors, such as incomplete lineage sorting and the phylogenetic informativeness of loci. Furthermore, we establish the utility of homologizer on real data, by analysing a multilocus dataset consisting of nine diploids and 19 tetraploids from the fern family Cystopteridaceae. Finally, we describe how homologizer may potentially be used beyond its core phasing functionality to identify non‐homologous sequences, such as hidden paralogs or contaminants.
Award ID(s):
1753800 1753673
PAR ID:
10401462
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Methods in Ecology and Evolution
Volume:
14
Issue:
5
ISSN:
2041-210X
Page Range / eLocation ID:
p. 1230-1244
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Species in the genus Sphagnum create, maintain, and dominate boreal peatlands through ‘extended phenotypes’ that allow these organisms to engineer peatland ecosystems and thereby impact global biogeochemical cycles. One such phenotype is the production of peat, or incompletely decomposed biomass, that accumulates when rates of growth exceed decomposition. Interspecific variation in peat production is thought to be responsible for the establishment and maintenance of ecological gradients such as the microtopographic hummock‐hollow gradient, along which sympatric species sort within communities. This study investigated the mode and tempo of functional trait evolution across 15 species of Sphagnum using data from the most extensive studies of Sphagnum functional traits to date and phylogenetic comparative methods. We found evidence for phylogenetic conservatism of the niche descriptor height‐above‐water‐table and of traits related to growth, decay and litter quality. However, we failed to detect the influence of phylogeny on interspecific variation in other traits such as shoot density and suggest that environmental context can obscure phylogenetic signal. Trait correlations indicate possible adaptive syndromes that may relate to niche and its construction. This study is the first to formally test the extent to which functional trait variation among Sphagnum species is a result of shared evolutionary history.
  2. Summary Processes affecting rates of sequence polymorphism are fundamental to the evolution of gene duplicates. The relationship between gene activity and sequence polymorphism can influence the likelihood that functionally redundant gene copies are co‐maintained in stable evolutionary equilibria vs other outcomes such as neofunctionalization. Here, we investigate genic variation in epigenome‐associated polymorphism rates in Arabidopsis thaliana and consider whether these affect the evolution of gene duplicates. We compared the frequency of sequence polymorphism and patterns of genetic differentiation between genes classified by exon methylation patterns: unmethylated (unM), gene‐body methylated (gbM), and transposon‐like methylated (teM) states, which reflect divergence in gene expression. We found that the frequency of polymorphism was higher in teM (transcriptionally repressed, tissue‐specific) genes and lower in gbM (active, constitutively expressed) genes. Comparisons of gene duplicates were largely consistent with genome‐wide patterns: gene copies that exhibit teM accumulate more variation, evolve faster, and are in chromatin states associated with reduced DNA repair. This relationship between expression, the epigenome, and polymorphism may lead to the breakdown of equilibrium states that would otherwise maintain genetic redundancies. Epigenome‐mediated polymorphism rate variation may facilitate the evolution of novel gene functions in duplicate paralogs maintained over evolutionary time.
  3. Abstract— Like many fern lineages comprising reticulate species complexes, Polypodium s.s. (Polypodiaceae) has a history shaped by rapid diversification, hybridization, and polyploidy that poses substantial challenges for phylogenetic inference with plastid and single-locus nuclear markers. Using target capture probes for 408 nuclear loci developed by the GoFlag project and a custom bioinformatic pipeline, SORTER, we constructed multi-locus nuclear datasets for diploid temperate and Mesoamerican species of Polypodium and five allotetraploid species belonging to the well-studied Polypodium vulgare complex. SORTER employs a clustering approach to separate putatively paralogous copies of targeted loci into orthologous matrices and haplotype phasing to infer allopolyploid haplotypes across loci, resulting in datasets amenable to both concatenated maximum likelihood and multi-species coalescent phylogenetic analyses. By comparing phylogenies derived from maximum likelihood and multi-species coalescent analyses of unphased and phased datasets, as well as evaluating discordance among gene trees and species trees, we recover support for incomplete lineage sorting within Polypodium s.s., novel relationships among diploid taxa of the Polypodium vulgare complex and its Mesoamerican sister clade, and the placement of several Polypodium species within other genera. Additionally, we were able to infer well-supported phylogenies that identified the hypothesized progenitors of the allotetraploid species, indicating that SORTER is an effective and accurate tool for reconstructing homeolog haplotypes of allopolyploids in fern taxa and other non-model organisms from target capture data.
  4. Abstract Evolutionary biologists characterize macroevolutionary trends of phenotypic change across the tree of life using phylogenetic comparative methods. However, within‐species variation can complicate such investigations. For this reason, procedures for incorporating nonstructured (random) intraspecific variation have been developed. Likewise, evolutionary biologists seek to understand microevolutionary patterns of phenotypic variation within species, such as sex‐specific differences or allometric trends. Additionally, there is a desire to compare such within‐species patterns across taxa, but current analytical approaches cannot be used to interrogate within‐species patterns while simultaneously accounting for phylogenetic non‐independence. Consequently, deciphering how intraspecific trends evolve remains a challenge. Here we introduce an extended phylogenetic generalized least squares (E‐PGLS) procedure which facilitates comparisons of within‐species patterns across species while simultaneously accounting for phylogenetic non‐independence. Our method uses an expanded phylogenetic covariance matrix, a hierarchical linear model, and permutation methods to obtain empirical sampling distributions and effect sizes for model effects that can evaluate differences in intraspecific trends across species for both univariate and multivariate data, while conditioning them on the phylogeny. The method has appropriate statistical properties for both balanced and imbalanced data. Additionally, the procedure obtains evolutionary covariance estimates that reflect those from existing approaches for nonstructured intraspecific variation. Importantly, E‐PGLS can detect differences in structured (i.e. microevolutionary) intraspecific patterns across species when such trends are present. Thus, E‐PGLS extends the reach of phylogenetic comparative methods into the intraspecific comparative realm, by providing the ability to compare within‐species trends across species while simultaneously accounting for shared evolutionary history.
  5. Abstract Traits underlie organismal responses to their environment and are essential to predict community responses to environmental conditions under global change. Species differ in life‐history traits, morphometrics, diet type, reproductive characteristics and habitat utilization. Trait associations are widely analysed using phylogenetic comparative methods (PCM) to account for correlations among related species. Similarly, traits are measured for some but not all species, and missing continuous traits (e.g. growth rate) can be imputed using ‘phylogenetic trait imputation’ (PTI), based on evolutionary relatedness and trait covariance. However, PTI has not been available for categorical traits, and estimating covariance among traits without ecological constraints risks inferring implausible evolutionary mechanisms. Here, we extend previous PCM and PTI methods by (1) specifying covariance among traits as a structural equation model (SEM), and (2) incorporating associations among both continuous and categorical traits. Fitting a SEM replaces the covariance among traits with a set of linear path coefficients specifying potential evolutionary mechanisms. Estimated parameters then represent regression slopes (i.e. the average change in trait Y given an exogenous change in trait X) that can be used to calculate both direct effects (X impacts Y) and indirect effects (X impacts Z and Z impacts Y). We demonstrate phylogenetic structural‐equation mixed‐trait imputation using 33 variables representing life history, reproductive, morphological, and behavioural traits for all >32,000 described fishes worldwide. SEM coefficients suggest that one degree Celsius increase in habitat is associated with an average 3.5% increase in natural mortality (including a 1.4% indirect impact that acts via temperature effects on the growth coefficient), and an average 3.0% decrease in fecundity (via indirect impacts on maximum age and length). Cross‐validation indicates that the model explains 54%–89% of variance for withheld measurements of continuous traits and has an area under the receiver‐operator‐characteristics curve of 0.86–0.99 for categorical traits. We use imputed traits to classify all fishes into life‐history types, and confirm a phylogenetic signal in three dominant life‐history strategies in fishes. PTI using phylogenetic SEMs ensures that estimated parameters are interpretable as regression slopes, such that the inferred evolutionary relationships can be compared with long‐term evolutionary and rearing experiments.