This content will become publicly available on January 21, 2026

Title: A comparison of phylogenomic inference pipelines for low‐coverage whole‐genome sequencing in Formica ants
Abstract A rapid proliferation in the availability of whole genome sequences (WGS), often with relatively low read depth, offers an unprecedented opportunity for phylogenomic advances using publicly available data, but there are several key challenges in applying these data. Using low‐coverage WGS data for the ant species of Formica, we conducted detailed comparisons of two different analytical pipelines (reference‐based vs. de novo genome assembly), four types of datasets (5‐kbp‐window, ultra‐conserved element [UCE], single‐copy ortholog [BUSCO] and mitogenome), and a series of analytical procedures (e.g. concatenation vs. coalescent analyses) to identify which are robust to typical WGS data. The results show that, at a shallow scale of phylogenetic relationships among closely related species, 5‐kbp‐windows from the reference‐based pipeline and UCEs from the de novo assemblies are more successful than the BUSCOs in recovering informative markers for phylogenetic inference. Compared with concatenation analyses, coalescent analyses often resulted in disparate deeper relationships in the phylogeny. This study also uncovers evident mito‐nuclear discordance and demonstrates genome‐wide gene conflicts in phylogenetic signals, both pointing to possible incomplete lineage sorting and/or hybridization during the early, rapid radiation of Formica ants. Divergence dating analyses show that different types of data and analytical methods could result in inconsistent time estimates, highlighting the potential need for multiple approaches to better understand species divergence. The strengths and weaknesses of different analytical pipelines and strategies are discussed. Findings from this study provide valuable insights for large‐scale phylogenomic projects using WGS data.
Award ID(s):
1942252 1754834
PAR ID:
10587756
Editor(s):
Blaimer, Bonnie
Publisher / Repository:
Systematic Entomology
Date Published:
Journal Name:
Systematic Entomology
ISSN:
0307-6970
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract A phylogenomic analysis of the so far phylogenetically unresolved subfamily Bromelioideae (Bromeliaceae) was performed to infer species relationships as the basis for future taxonomic treatment, stabilization of generic concept, and further analyses of evolution and biogeography of the subfamily. A target‐enrichment approach was chosen, using the Angiosperms353 v.4 kit RNA‐baits and including 86 Bromelioideae species representing previously identified major evolutionary lineages. Phylogenetic analyses were based on 125 target nuclear loci, assembled off‐target plastome as well as mitogenome reads. A Bromelioideae phylogeny with a mostly well‐resolved backbone is provided based on nuclear (194 kbp), plastome (109 kbp), and mitogenome data (34 kbp). For the nuclear markers, a coalescent‐based analysis of single‐locus gene trees was performed as well as a supermatrix analysis of concatenated gene alignments. Nuclear and plastome datasets provide well‐resolved trees, which showed only minor topological incongruences. The mitogenome tree is not sufficiently resolved. A total of 26 well‐supported clades were identified. The genera Aechmea, Canistrum, Hohenbergia, Neoregelia, and Quesnelia were revealed to be polyphyletic. In core Bromelioideae, Acanthostachys is sister to the remainder. Among the 26 recognized clades, 12 correspond with currently employed taxonomic concepts. Hence, the presented phylogenetic framework will serve as an important basis for future taxonomic revisions as well as to better understand the evolutionary drivers and processes in this exciting subfamily.
  2. INTRODUCTION Resolving the role that different environmental forces may have played in the apparent explosive diversification of modern placental mammals is crucial to understanding the evolutionary context of their living and extinct morphological and genomic diversity. RATIONALE Limited access to whole-genome sequence alignments that sample living mammalian biodiversity has hampered phylogenomic inference, which until now has been limited to relatively small, highly constrained sequence matrices often representing <2% of a typical mammalian genome. To eliminate this sampling bias, we used an alignment of 241 whole genomes to comprehensively identify and rigorously analyze noncoding, neutrally evolving sequence variation in coalescent and concatenation-based phylogenetic frameworks. These analyses were followed by validation with multiple classes of phylogenetically informative structural variation. This approach enabled the generation of a robust time tree for placental mammals that evaluated age variation across hundreds of genomic loci that are not restricted by protein coding annotations. RESULTS Coalescent and concatenation phylogenies inferred from multiple treatments of the data were highly congruent, including support for higher-level taxonomic groupings that unite primates+colugos with treeshrews (Euarchonta), bats+cetartiodactyls+perissodactyls+carnivorans+pangolins (Scrotifera), all scrotiferans excluding bats (Fereuungulata), and carnivorans+pangolins with perissodactyls (Zooamata). However, because these approaches infer a single best tree, they mask signatures of phylogenetic conflict that result from incomplete lineage sorting and historical hybridization. Accordingly, we also inferred phylogenies from thousands of noncoding loci distributed across chromosomes with historically contrasting recombination rates. 
Throughout the radiation of modern orders (such as rodents, primates, bats, and carnivores), we observed notable differences between locus trees inferred from the autosomes and the X chromosome, a pattern typical of speciation with gene flow. We show that in many cases, previously controversial phylogenetic relationships can be reconciled by examining the distribution of conflicting phylogenetic signals along chromosomes with variable historical recombination rates. Lineage divergence time estimates were notably uniform across genomic loci and robust to extensive sensitivity analyses in which the underlying data, fossil constraints, and clock models were varied. The earliest branching events in the placental phylogeny coincide with the breakup of continental landmasses and rising sea levels in the Late Cretaceous. This signature of allopatric speciation is congruent with the low genomic conflict inferred for most superordinal relationships. By contrast, we observed a second pulse of diversification immediately after the Cretaceous-Paleogene (K-Pg) extinction event superimposed on an episode of rapid land emergence. Greater geographic continuity coupled with tumultuous climatic changes and increased ecological landscape at this time provided enhanced opportunities for mammalian diversification, as depicted in the fossil record. These observations dovetail with increased phylogenetic conflict observed within clades that diversified in the Cenozoic. CONCLUSION Our genome-wide analysis of multiple classes of sequence variation provides the most comprehensive assessment of placental mammal phylogeny, resolves controversial relationships, and clarifies the timing of mammalian diversification. 
We propose that the combination of Cretaceous continental fragmentation and lineage isolation, followed by the direct and indirect effects of the K-Pg extinction at a time of rapid land emergence, synergistically contributed to the accelerated diversification rate of placental mammals during the early Cenozoic. The timing of placental mammal evolution. Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting. 
  3. Morphological characters and nuclear ribosomal DNA (rDNA) phylogenies have so far been the basis of the current classifications of arbuscular mycorrhizal (AM) fungi. Improved understanding of the evolutionary history of AM fungi requires extensive ortholog sampling and analyses of genome and transcriptome data from a wide range of taxa. To circumvent the need for axenic culturing of AM fungi, we gathered and combined genomic data from single nuclei to generate de novo genome assemblies covering seven families of AM fungi. We successfully sequenced the genomes of 15 AM fungal species for which genome data were not previously available. Comparative analysis of the previously published Rhizophagus irregularis DAOM197198 assembly confirms that our novel workflow generates genome assemblies suitable for phylogenomic analysis. Predicted genes of our assemblies, together with published protein sequences of AM fungi and their sister clades, were used for phylogenomic analyses. We evaluated the phylogenetic placement of Glomeromycota in relation to its sister phyla (Mucoromycota and Mortierellomycota), and found no support to reject a polytomy. Finally, we explored the phylogenetic relationships within Glomeromycota. Our results support family level classification from previous phylogenetic studies, and the polyphyly of the order Glomerales with Claroideoglomeraceae as the sister group to Glomeraceae and Diversisporales.
  4. Premise: Large genomic data sets offer the promise of resolving historically recalcitrant species relationships. However, different methodologies can yield conflicting results, especially when clades have experienced ancient, rapid diversification. Here, we analyzed the ancient radiation of Ericales and explored sources of uncertainty related to species tree inference, conflicting gene tree signal, and the inferred placement of gene and genome duplications. Methods: We used a hierarchical clustering approach, with tree‐based homology and orthology detection, to generate six filtered phylogenomic matrices consisting of data from 97 transcriptomes and genomes. Support for species relationships was inferred from multiple lines of evidence including shared gene duplications, gene tree conflict, gene‐wise edge‐based analyses, concatenation, and coalescent‐based methods, and is summarized in a consensus framework. Results: Our consensus approach supported a topology largely concordant with previous studies, but suggests that the data are not capable of resolving several ancient relationships because of lack of informative characters, sensitivity to methodology, and extensive gene tree conflict correlated with paleopolyploidy. We found evidence of a whole‐genome duplication before the radiation of all or most ericalean families, and demonstrate that tree topology and heterogeneous evolutionary rates affect the inferred placement of genome duplications. Conclusions: We provide several hypotheses regarding the history of Ericales, and confidently resolve most nodes, but demonstrate that a series of ancient divergences are unresolvable with these data. Whether paleopolyploidy is a major source of the observed phylogenetic conflict warrants further investigation.
  5. Ruane, Sara (Ed.)
    Abstract Some phylogenetic problems remain unresolved even when large amounts of sequence data are analyzed and methods that accommodate processes such as incomplete lineage sorting are employed. In addition to investigating biological sources of phylogenetic incongruence, it is also important to reduce noise in the phylogenomic dataset by using an appropriate filtering approach that addresses gene tree estimation errors. We present the results of a case study in manakins, focusing on the very difficult clade comprising the genera Antilophia and Chiroxiphia. Previous studies suggest that Antilophia is nested within Chiroxiphia, though relationships among Antilophia+Chiroxiphia species have been highly unstable. We extracted more than 11,000 loci (ultra-conserved elements and introns) from whole genomes and conducted analyses using concatenation and multispecies coalescent methods. Topologies resulting from analyses using all loci differed depending on the data type and analytical method, with 2 clades (Antilophia+Chiroxiphia and Manacus+Pipra+Machaeopterus) in the manakin tree showing incongruent results. We hypothesized that gene trees that conflicted with a long coalescent branch (e.g., the branch uniting Antilophia+Chiroxiphia) might be enriched for cases of gene tree estimation error, so we conducted analyses that either constrained those gene trees to include monophyly of Antilophia+Chiroxiphia or excluded these loci. While constraining trees reduced some incongruence, excluding the trees led to completely congruent species trees, regardless of the data type or model of sequence evolution used. We found that a suite of gene metrics (most importantly the number of informative sites and likelihood of intralocus recombination) collectively explained the loci that resulted in non-monophyly of Antilophia+Chiroxiphia.
We also found evidence for introgression that may have contributed to the discordant topologies we observe in Antilophia+Chiroxiphia and led to deviations from expectations given the multispecies coalescent model. Our study highlights the importance of identifying factors that can obscure phylogenetic signal when dealing with recalcitrant phylogenetic problems, such as gene tree estimation error, incomplete lineage sorting, and reticulation events. [Birds; c-gene; data type; gene estimation error; model fit; multispecies coalescent; phylogenomics; reticulation] 