Title: pSONIC: Ploidy-aware Syntenic Orthologous Networks Identified via Collinearity
Abstract: With the rapid rise in availability of high-quality genomes for closely related species, methods for orthology inference that incorporate synteny are increasingly useful. Polyploidy perturbs the 1:1 expected frequencies of orthologs between two species, complicating the identification of orthologs. Here we present a method of ortholog inference, Ploidy-aware Syntenic Orthologous Networks Identified via Collinearity (pSONIC). We demonstrate the utility of pSONIC using four species in the cotton tribe (Gossypieae), including one allopolyploid, and place between 75% and 90% of genes from each species into nearly 32,000 orthologous groups, 97% of which consist of at most singletons or tandemly duplicated genes, 58.8% more than comparable methods that do not incorporate synteny. We show that 99% of singleton gene groups follow the expected tree topology and that our ploidy-aware algorithm recovers 97.5% identical groups when compared to splitting the allopolyploid into its two respective subgenomes, treating each as separate “species.”
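To illustrate how polyploidy changes the expected ortholog ratio, the following minimal Python sketch checks whether a candidate group has the copy number expected from each species' ploidy (one copy for a diploid, two for an allotetraploid). It is not pSONIC's implementation: the species labels, the `EXPECTED_COPIES` table, and both function names are hypothetical and chosen only for illustration, and the sketch ignores synteny and tandem duplicates entirely.

```python
from collections import Counter

# Hypothetical expected copy numbers per species:
# 1 for each diploid, 2 for the allotetraploid (one copy per subgenome).
EXPECTED_COPIES = {"diploid_A": 1, "diploid_B": 1, "allotetraploid": 2}


def copies_per_species(group):
    """Count genes per species in a group given as (species, gene_id) pairs."""
    return Counter(species for species, _gene in group)


def matches_expected_ploidy(group, expected=EXPECTED_COPIES):
    """Return True if every species contributes exactly its expected copy number."""
    counts = copies_per_species(group)
    return all(counts.get(species, 0) == n for species, n in expected.items())


# A group with the expected 1:1:2 ratio.
group_ok = [
    ("diploid_A", "a1"),
    ("diploid_B", "b1"),
    ("allotetraploid", "t1"),
    ("allotetraploid", "t2"),
]

# The same group with one extra copy in a diploid, breaking the expected ratio.
group_extra = group_ok + [("diploid_A", "a2")]

print(matches_expected_ploidy(group_ok))
print(matches_expected_ploidy(group_extra))
```

Under a naive 1:1 expectation the first group would look like it contains a duplicate in the allotetraploid; with a ploidy-aware expectation it is treated as a complete group.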
Award ID(s):
1829176
PAR ID:
10308592
Author(s) / Creator(s):
 ;  ;  
Editor(s):
Morrell, P L
Date Published:
Journal Name:
G3 Genes|Genomes|Genetics
Volume:
11
Issue:
8
ISSN:
2160-1836
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Polyploidy, or whole-genome duplication, is expected to confound the inference of species trees with phylogenetic methods for two reasons. First, the presence of retained duplicated genes requires the reconciliation of the inferred gene trees to a proposed species tree. Second, even if the analyses are restricted to shared single copy genes, the occurrence of reciprocal gene loss, where the surviving genes in different species are paralogs from the polyploidy rather than orthologs, will mean that such genes will not have evolved under the corresponding species tree and may not produce gene trees that allow inference of that species tree. Here we analyze three different ancient polyploidy events, using synteny-based inferences of orthology and paralogy to infer gene trees from nearly 17,000 sets of homologous genes. We find that the simple use of single copy genes from polyploid organisms provides reasonably robust phylogenetic signals, despite the presence of reciprocal gene losses. Such gene trees are also most often in accord with the species relationships inferred from maximum likelihood models of gene loss after polyploidy, a completely distinct phylogenetic signal present in these genomes. As seen in other studies, however, we find that methods for inferring phylogenetic confidence yield high support values even in cases where the underlying data suggest meaningful conflict in the phylogenetic signals.
  2. Abstract We describe POInTbrowse, a web portal that gives access to the orthology inferences made for polyploid genomes with POInT, the Polyploidy Orthology Inference Tool. Ancient, or paleo-, polyploidy events are widely distributed across the eukaryotic phylogeny, and the combination of duplicated and lost duplicated genes that these polyploidies produce can confound the identification of orthologous genes between genomes. POInT uses conserved synteny and phylogenetic models to infer orthologous genes between genomes with a shared polyploidy. It also gives confidence estimates for those orthology inferences. POInTbrowse gives both graphical and query-based access to these inferences from 12 different polyploidy events, allowing users to visualize genomic regions produced by polyploidies and perform batch queries for each polyploidy event, downloading gene trees and coding sequences for orthologous genes meeting user-specified criteria. POInTbrowse and the associated data are online at https://wgd.statgen.ncsu.edu.
  3. Hejnol, Andreas (Ed.)
    Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species, a phenomenon observed among several important families of genes such as transporters and transcription factors, are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, a software that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.
  4. Wittkopp, Patricia (Ed.)
    Abstract In Drosophila melanogaster and D. simulans head tissue, 60% of orthologous genes show evidence of sex-biased expression in at least one species. Of these, ∼39% (2,192) are conserved in direction. We hypothesize enrichment of open chromatin in the sex where we see expression bias and closed chromatin in the opposite sex. Male-biased orthologs are significantly enriched for H3K4me3 marks in males of both species (∼89% of male-biased orthologs vs. ∼76% of unbiased orthologs). Similarly, female-biased orthologs are significantly enriched for H3K4me3 marks in females of both species (∼90% of female-biased orthologs vs. ∼73% of unbiased orthologs). The sex-bias ratio in female-biased orthologs was similar in magnitude between the two species, regardless of the closed chromatin (H3K27me2me3) marks in males. However, in male-biased orthologs, the presence of H3K27me2me3 in both species significantly reduced the correlation between D. melanogaster sex-bias ratio and the D. simulans sex-bias ratio. Male-biased orthologs are enriched for evidence of positive selection in the D. melanogaster group. There are more male-biased genes than female-biased genes in both species. For orthologs with gains/losses of sex-bias between the two species, there is an excess of male-bias compared to female-bias, but there is no consistent pattern in the relationship between H3K4me3 or H3K27me2me3 chromatin marks and expression. These data suggest chromatin state is a component of the maintenance of sex-biased expression and divergence of sex-bias between species is reflected in the complexity of the chromatin status. 
  5. Kubatko, Laura (Ed.)
    Abstract Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference. [Gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.]