Abstract Phlebotomine sand flies are the vectors of leishmaniasis, a neglected tropical disease. High-quality reference genomes are an important tool for understanding the biology and eco-evolutionary dynamics underpinning disease epidemiology. Previous reference sequences for leishmaniasis vectors were limited by the sequencing technologies available at the time and are inadequate for high-resolution genomic inquiry. Here, we present updated reference assemblies of two sand flies, Phlebotomus papatasi and Lutzomyia longipalpis. These chromosome-level assemblies were generated using an ultra-low input library protocol, PacBio HiFi long reads, and Hi-C technology. The new P. papatasi reference has a final assembly span of 351.6 Mb and contig and scaffold N50s of 926 kb and 111.8 Mb, respectively. The new Lu. longipalpis reference has a final assembly span of 147.8 Mb and contig and scaffold N50s of 1.09 Mb and 40.6 Mb, respectively. Benchmarking Universal Single-Copy Orthologue (BUSCO) assessments found 94.5% and 95.6% complete single-copy Insecta orthologs for P. papatasi and Lu. longipalpis, respectively. These improved assemblies will serve as an invaluable resource for future genomic work on phlebotomine sand flies.
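The contig and scaffold N50 values quoted above summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together contain at least half of the total assembly span. A minimal Python sketch of that definition follows; the function name `n50` is hypothetical (not taken from any assembly tool), it uses the common "at least half" convention, and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (contigs or scaffolds).

    Sorts lengths from longest to shortest and returns the first length at
    which the running total reaches at least half of the total span.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # "at least half" convention
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Hypothetical contig lengths, in bp, for illustration only.
contigs = [100, 80, 50, 40, 30]
print(n50(contigs))  # 80: 100 + 80 = 180 is the first running total >= 150
```

Applied separately to the contig lengths and to the scaffold lengths of one assembly, the same function yields the two figures reported; scaffold N50 is usually the larger because Hi-C scaffolding joins contigs into chromosome-scale sequences.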
This content will become publicly available on December 1, 2026
Universal orthologs infer deep phylogenies and improve genome quality assessments
Abstract
Background: Universal single-copy orthologs are the most conserved components of genomes. Although they are routinely used for studying evolutionary histories and assessing new assemblies, current methods do not incorporate information from available genomic data.
Results: Here, we first determine the influence of evolutionary history on universal gene content and find that, across 11,098 genomes of plants, fungi, and animals comprising 2606 taxonomic groups, 215 groups vary significantly from their respective lineages in BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness. Additionally, 169 groups display an elevated complement of duplicated orthologs, likely from ancestral whole-genome duplication events. Second, we investigate the extent of taxonomic congruence in broad BUSCO-derived phylogenies. For 275 suitable families out of 543 tested, sites evolving at higher rates produce phylogenies that are at most 23.84% more taxonomically concordant and at least 46.15% less terminally variable than those from lower-rate sites. We find that BUSCO concatenated and coalescent trees have comparable accuracy and conclude that higher-rate sites from concatenated alignments produce the most congruent and least variable phylogenies. Finally, we show that undetected yet pervasive BUSCO gene loss events lead to misrepresentations of assembly quality. To overcome this, we filter a Curated set of BUSCOs (CUSCOs) that provides up to 6.99% fewer false positives than the standard search, and we introduce novel methods for comparing assemblies using gene synteny.
Conclusions: Overall, we highlight the importance of considering evolutionary histories during assembly evaluations and release the phyca software toolkit, which reconstructs consistent phylogenies and offers more precise assembly assessments.
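BUSCO completeness, the quantity compared across lineages above, is reported as percentages of the n marker genes in a lineage dataset: complete (C), split into single-copy (S) and duplicated (D), plus fragmented (F) and missing (M), so that C = S + D and C + F + M = 100%. The Python sketch below shows only that bookkeeping, assuming per-gene status codes are already available. The function names, status-code input format, and toy data are hypothetical; this is not code from the phyca toolkit or from BUSCO itself.

```python
from collections import Counter

VALID = ("S", "D", "F", "M")


def busco_summary(statuses):
    """Convert per-gene status codes into percentages of n.

    Codes (assumed input format): "S" complete single-copy,
    "D" complete duplicated, "F" fragmented, "M" missing.
    """
    statuses = list(statuses)
    n = len(statuses)
    if n == 0:
        raise ValueError("no marker genes given")
    counts = Counter(statuses)
    unknown = set(counts) - set(VALID)
    if unknown:
        raise ValueError(f"unexpected status codes: {sorted(unknown)}")
    pct = {code: 100.0 * counts[code] / n for code in VALID}
    pct["C"] = pct["S"] + pct["D"]  # complete = single-copy + duplicated
    return pct, n


def format_summary(pct, n):
    """One-line summary, formatted similarly to BUSCO's short summary."""
    return (f"C:{pct['C']:.1f}%[S:{pct['S']:.1f}%,D:{pct['D']:.1f}%],"
            f"F:{pct['F']:.1f}%,M:{pct['M']:.1f}%,n:{n}")


# Toy input: 100 hypothetical marker genes.
toy = ["S"] * 90 + ["D"] * 5 + ["F"] * 3 + ["M"] * 2
pct, n = busco_summary(toy)
print(format_summary(pct, n))  # C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:100
```

In this bookkeeping, an ancestral whole-genome duplication that retains extra gene copies shifts genes from S to D without changing C. Genuine gene loss, by contrast, shifts genes to M and lowers C even when the assembly itself is sound, which is the kind of misrepresentation of assembly quality the abstract describes.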
- Award ID(s): 2022055
- PAR ID: 10632967
- Publisher / Repository: BMC Biology
- Date Published:
- Journal Name: BMC Biology
- Volume: 23
- Issue: 1
- ISSN: 1741-7007
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Suh, Alexander (Ed.). Abstract Although spiders are one of the most diverse groups of arthropods, the genetic architecture of their evolutionary adaptations is largely unknown. Specifically, an ancient genome-wide duplication during arachnid evolution ~450 mya produced a vast array of gene families, yet the extent to which selection has shaped this variation is understudied. To aid comparative genome sequence analyses, we provide a chromosome-level genome of the Western black widow spider (Latrodectus hesperus), a focus because of its silk properties, its venom applications, and its use as a model for urban adaptation. We used long-read and Hi-C sequencing data, combined with transcriptomes, to assemble 14 chromosomes in a 1.46 Gb genome, with 38,393 genes annotated and a BUSCO score of 95.3%. Our analyses identified high repetitive gene content and heterozygosity, consistent with other spider genomes, which has led to challenges in genome characterization. Our comparative evolutionary analyses of eight genomes available for species within the Araneoidea (orb weavers and their descendants) identified 1,827 single-copy orthologs. Of these, 155 exhibit significant positive selection, primarily associated with developmental genes and with traits linked to sensory perception. These results support the hypothesis that several traits unique to spiders emerged from the adaptive evolution of ohnologs (retained ancestrally duplicated genes) from ancient genome-wide duplication. These comparative spider genome analyses can serve as a model for understanding how positive selection continually shapes ancestral duplications to generate novel traits today, within and between diverse taxonomic groups.
- Abstract The brown bear (Ursus arctos) is the second-largest and most widespread extant terrestrial carnivore on Earth and has recently emerged as a medical model for human metabolic diseases. Here, we report a fully phased chromosome-level assembly of a male North American brown bear built by combining Pacific Biosciences (PacBio) HiFi data and publicly available Hi-C data. The final genome size is 2.47 gigabases (Gb), with scaffold and contig N50 lengths of 70.08 and 43.94 megabases (Mb), respectively. Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis revealed that 94.5% of single-copy orthologs from Mammalia were present in the genome (the highest of any ursid genome to date). Repetitive elements accounted for 44.48% of the genome, and a total of 20,480 protein-coding genes were identified. Whole-genome alignment shows that the brown bear genome is highly syntenic with that of the polar bear, and our phylogenetic analysis of 7,246 single-copy orthologs supports the currently proposed species tree for Ursidae. This highly contiguous genome assembly will support future research on both the evolutionary history of the bear family and the physiological mechanisms behind hibernation, the latter of which has broad medical implications.
- Wheat, Christopher (Ed.). Abstract The blackstripe livebearer Poeciliopsis prolifica is a live-bearing fish in the family Poeciliidae with a high level of postfertilization maternal investment (matrotrophy). This viviparous, matrotrophic species has evolved a structure similar to the mammalian placenta. Placentas have evolved independently multiple times in Poeciliidae from nonplacental ancestors, providing an opportunity to study placental evolution. However, high-quality reference genomes are lacking for the placental species in Poeciliidae. In this study, we present a 674 Mb assembly of P. prolifica in 504 contigs with excellent contiguity (contig N50 7.7 Mb) and completeness (97.2% Benchmarking Universal Single-Copy Orthologs [BUSCO] completeness score, comprising 92.6% single-copy and 4.6% duplicated BUSCOs). A total of 27,227 protein-coding genes were annotated from merged datasets based on bioinformatic prediction, RNA sequencing, and homology evidence. Phylogenomic analyses revealed that P. prolifica diverged from the guppy (Poecilia reticulata) ∼19 Ma. Our research provides the resources and genomic toolkit needed to investigate the genetic underpinnings of placentation.
- Abstract The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the world’s richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of the Angiosperms353 universal loci for 50 of the ca. 75 species, almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Through visual inspection of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we found that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where repeated genome duplication is widespread. After accounting for artifactual orthologs as a source of gene tree error, we identified a significant but nonspecific signal of introgression using Patterson’s D and f4 statistics. Despite this phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact on phylogenetic inference of filtering data-processing artifacts and of standard filtering approaches.
