Title: Signal, bias, and the role of transcriptome assembly quality in phylogenomic inference
Abstract Background: Phylogenomic approaches have great power to reconstruct evolutionary histories; however, they rely on multi-step processes in which each stage has the potential to affect the accuracy of the final result. Many studies have empirically tested and established methodology for resolving robust phylogenies, including selecting appropriate evolutionary models, identifying orthologs, or isolating partitions with strong phylogenetic signal. However, few have investigated errors that may be introduced at earlier stages of the analysis. Biases introduced during the generation of the phylogenomic dataset itself could produce downstream effects on analyses of evolutionary history. Transcriptomes are widely used in phylogenomic studies, though there is little understanding of how a poor-quality assembly of these datasets could impact the accuracy of phylogenomic hypotheses. Here we examined how transcriptome assembly quality affects phylogenomic inferences by creating independent datasets from the same input data representing high-quality and low-quality transcriptome assembly outcomes. Results: By studying the performance of phylogenomic datasets derived from alternative high- and low-quality assembly inputs in a controlled experiment, we show that high-quality transcriptomes produce richer phylogenomic datasets with a greater number of unique partitions than low-quality assemblies. High-quality assemblies also give rise to partitions that have lower alignment ambiguity and less compositional bias. In addition, high-quality partitions hold stronger phylogenetic signal than their low-quality transcriptome assembly counterparts in both concatenation- and coalescent-based analyses. Conclusions: Our findings demonstrate the importance of transcriptome assembly quality in phylogenomic analyses and suggest that a portion of the uncertainty observed in such studies could be alleviated at the assembly stage.
Award ID(s):
1638296
PAR ID:
10351224
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
BMC Ecology and Evolution
Volume:
21
Issue:
1
ISSN:
2730-7182
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Morphological characters and nuclear ribosomal DNA (rDNA) phylogenies have so far been the basis of the current classifications of arbuscular mycorrhizal (AM) fungi. Improved understanding of the evolutionary history of AM fungi requires extensive ortholog sampling and analyses of genome and transcriptome data from a wide range of taxa. To circumvent the need for axenic culturing of AM fungi, we gathered and combined genomic data from single nuclei to generate de novo genome assemblies covering seven families of AM fungi. We successfully sequenced the genomes of 15 AM fungal species for which genome data was not previously available. Comparative analysis of the previously published Rhizophagus irregularis DAOM197198 assembly confirms that our novel workflow generates genome assemblies suitable for phylogenomic analysis. Predicted genes of our assemblies, together with published protein sequences of AM fungi and their sister clades, were used for phylogenomic analyses. We evaluated the phylogenetic placement of Glomeromycota in relation to its sister phyla (Mucoromycota and Mortierellomycota), and found no support to reject a polytomy. Finally, we explored the phylogenetic relationships within Glomeromycota. Our results support family-level classification from previous phylogenetic studies, and the polyphyly of the order Glomerales with Claroideoglomeraceae as the sister group to Glomeraceae and Diversisporales.
  2. Abstract Background: High-quality genomic resources facilitate investigations into behavioral ecology, morphological and physiological adaptations, and the evolution of genomic architecture. Lizards in the genus Sceloporus have a long history as important ecological, evolutionary, and physiological models, making them a valuable target for the development of genomic resources. Findings: We present a high-quality chromosome-level reference genome assembly, SceUnd1.0 (using 10X Genomics Chromium, HiC, and Pacific Biosciences data), and tissue/developmental stage transcriptomes for the eastern fence lizard, Sceloporus undulatus. We performed synteny analysis with other snake and lizard assemblies to identify broad patterns of chromosome evolution including the fusion of micro- and macrochromosomes. We also used this new assembly to provide improved reference-based genome assemblies for 34 additional Sceloporus species. Finally, we used RNAseq and whole-genome resequencing data to compare 3 assemblies, each representing an increased level of cost and effort: Supernova Assembly with data from 10X Genomics Chromium, HiRise Assembly that added data from HiC, and PBJelly Assembly that added data from Pacific Biosciences sequencing. We found that the Supernova Assembly contained the full genome and was a suitable reference for RNAseq and single-nucleotide polymorphism calling, but the chromosome-level scaffolds provided by the addition of HiC data allowed synteny and whole-genome association mapping analyses. The subsequent addition of PacBio data doubled the contig N50 but provided negligible gains in scaffold length. Conclusions: These new genomic resources provide valuable tools for advanced molecular analysis of an organism that has become a model in physiology and evolutionary ecology.
  3. ABSTRACT For most species, transcriptome data are much more readily available than genome data. Without a reference genome, gene calling is cumbersome and inaccurate because of the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably to or better than existing tools on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high‐quality reference genomes. These features, coupled with the pipeline’s cross‐platform implementation, focus on automation, and accessible, user‐friendly interface, make UnigeneFinder a useful tool for many downstream sequence‐based analyses in nonmodel organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichments, and evolutionary analyses. UnigeneFinder also runs efficiently both on high‐performance computing (HPC) systems and personal computers, further reducing barriers to use.
  4. Abstract Scorpions are ancient and historically renowned for their potent venom. Traditionally, the systematics of this group of arthropods was supported by morphological characters, until recent phylogenomic analyses (using RNAseq data) revealed most of the higher‐level taxa to be non‐monophyletic. While these phylogenomic hypotheses are stable for almost all lineages, some nodes have been hard to resolve due to minimal taxonomic sampling (e.g. family Chactidae). Similarly, it has been shown that some nodes in the Arachnid Tree of Life show disagreement between hypotheses generated using transcriptomes and other genomic sources such as ultraconserved elements (UCEs). Here, we compared the phylogenetic signal of transcriptomes vs. UCEs by retrieving UCEs from new and previously published scorpion transcriptomes and genomes, and reconstructed phylogenies using both datasets independently. We reexamined the monophyly and phylogenetic placement of Chactidae, sampling an additional chactid species using both datasets. Our results showed that both genome‐scale datasets recovered highly similar topologies, with Chactidae rendered paraphyletic owing to the placement of Nullibrotheas allenii. As a first step toward redressing the systematics of Chactidae, we establish the family Anuroctonidae (new family) to accommodate the genus Anuroctonus.
  5. Abstract Evolutionary processes may have substantial impacts on community assembly, but evidence for phylogenetic relatedness as a determinant of interspecific interaction strength remains mixed. In this perspective, we consider a possible role for discordance between gene trees and species trees in the interpretation of phylogenetic signal in studies of community ecology. Modern genomic data show that the evolutionary histories of many taxa are better described by a patchwork of histories that vary along the genome rather than a single species tree. If a subset of genomic loci harbour trait‐related genetic variation, then the phylogeny at these loci may be more informative of interspecific trait differences than the genome background. We develop a simple method to detect loci harbouring phylogenetic signal and demonstrate its application through a proof‐of‐principle analysis of Penicillium genomes and pairwise interaction strength. Our results show that phylogenetic signal that may be masked genome‐wide could be detectable using phylogenomic techniques and may provide a window into the genetic basis for interspecific interactions.