Title: Long-Read Assembly and Annotation of the Parasitoid Wasp Muscidifurax raptorellus, a Biological Control Agent for Filth Flies
The parasitoid wasp Muscidifurax raptorellus (Hymenoptera: Pteromalidae) is a gregarious species that has received extensive attention for its potential in the biological control of house flies, stable flies, and other filth flies. It has a high reproductive capacity and is easy to rear. However, no genome assembly is available for M. raptorellus or for any other species in this genus. We previously assembled a complete circular mitochondrial genome 24,717 bp in length. Here, we assembled and annotated a high-quality nuclear genome of M. raptorellus using a combination of long-read (104× genome coverage) and short-read (326× genome coverage) sequencing technologies. The assembled genome is 314 Mbp in 226 contigs, with a 97.9% BUSCO completeness score and a contig N50 of 4.67 Mb, indicating excellent contiguity. Our assembly lays the foundation for comparative and evolutionary genomic analyses in the genus Muscidifurax and for possible future biocontrol applications.
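Two of the headline metrics above can be made concrete with a small sketch. Contig N50 is the length L such that contigs of length L or longer together contain at least half of the assembled bases, and fold coverage is total sequenced bases divided by genome size. The Python below is illustrative only, not the authors' pipeline: the toy contig lengths are invented, and using the 314 Mbp assembly size as the genome size in the coverage arithmetic is an assumption, since the abstract does not say which size estimate the 104× figure refers to.

```python
"""Assembly summary metrics: contig N50 and fold coverage (illustrative sketch)."""


def n50(contig_lengths: list[int]) -> int:
    """Return the contig N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fold_coverage(total_sequenced_bases: int, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_sequenced_bases / genome_size


def bases_for_coverage(depth: int, genome_size: int) -> int:
    """Sequenced bases needed to reach a given average depth."""
    return depth * genome_size


if __name__ == "__main__":
    toy_contigs = [8_000_000, 5_000_000, 4_000_000, 2_000_000, 1_000_000]  # invented
    print(n50(toy_contigs))  # 5000000
    print(bases_for_coverage(104, 314_000_000))  # 32656000000
```

Under that assumption, 104× long-read coverage of a 314 Mbp genome corresponds to roughly 32.7 Gbp of long-read sequence.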
Award ID(s): 1928770, 1950078
NSF-PAR ID: 10308520
Journal Name: Frontiers in Genetics
Volume: 12
ISSN: 1664-8021
Sponsoring Org: National Science Foundation
More Like this
  1. Background

    Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short-read sequencing technologies, with highly uneven read coverage indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of the short-read-based assembly had significantly higher and lower coverage than background, respectively.

    Results

    To understand the causes of such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverage and found that high-coverage regions tend to have higher simple sequence repeat and tandem gene densities than background regions. To determine whether the high-coverage regions were misassembled, we examined a recently available long-read-based tomato assembly and found that 27.8% (1.41 Mb) of high-coverage regions were potential misassemblies of duplicated sequences, compared with 1.4% of background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high-coverage regions, we found that misassembled high-coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposon elements.

    Conclusions

    Our study provides insights into the causes of variable-coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads; the generality of these causes and factors should be tested further in other species.
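    To illustrate what "significantly higher and lower coverage compared to background" can mean operationally, the sketch below averages per-base read depth over fixed windows and labels each window relative to the median window depth. The windowing scheme and the fold-change cut-offs (`high_fold`, `low_fold`) are hypothetical choices for illustration; the abstract does not state the statistical criteria the authors used.

```python
"""Flag high- and low-coverage windows relative to background (illustrative sketch)."""
from statistics import median


def window_mean_depths(per_base_depth: list[int], window_size: int) -> list[float]:
    """Average per-base read depth over consecutive, non-overlapping windows.
    The final window may be shorter than window_size."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    means = []
    for start in range(0, len(per_base_depth), window_size):
        window = per_base_depth[start:start + window_size]
        means.append(sum(window) / len(window))
    return means


def classify_windows(window_depths: list[float],
                     high_fold: float = 2.0,
                     low_fold: float = 0.5) -> list[str]:
    """Label each window 'high', 'low', or 'background' by comparing its depth
    with the median window depth. Cut-offs are hypothetical, not the study's."""
    background = median(window_depths)
    if background <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for depth in window_depths:
        if depth >= high_fold * background:
            labels.append("high")
        elif depth <= low_fold * background:
            labels.append("low")
        else:
            labels.append("background")
    return labels


if __name__ == "__main__":
    depths = [30, 31, 29, 30, 75, 12, 30]  # toy window depths
    print(classify_windows(depths))
```

    A collapsed duplicate in a short-read assembly would typically surface as a "high" window, because reads from both genomic copies pile onto the single assembled copy.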
  2. Background

    Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical for providing ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest they typically yield smaller and more fragmented assemblies than regular metagenomes.

    Methods

    Here we evaluated de novo assembly methods for 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize assembly approaches specifically for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. We then compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to determine which are best suited for PCR-amplified metagenomes.

    Results

    Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets than in unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication with an assembly algorithm originally designed to recover genomes from libraries generated after whole-genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10- to 100-fold for low-input metagenomes.

    Conclusions

    PCR-amplified metagenomes have enabled scientists to explore communities that are traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can improve de novo genome assembly from PCR-amplified datasets and enable better genome recovery from low-input metagenomes.

     
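    Two steps from this abstract can be sketched in a few lines: the assembly-size metric defined in Methods (total bp in contigs ≥ 10 kb) and removal of exact-duplicate read pairs. The deduplication function is a simplified stand-in assumed for illustration; it is not the tool the authors used (the abstract does not name one), and production deduplicators also tolerate sequencing errors and handle other kinds of duplicates.

```python
"""Assembly size (bp in contigs >= 10 kb) and exact read-pair deduplication (sketch)."""
from typing import Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # (read 1 sequence, read 2 sequence)


def assembly_size(contig_lengths: Iterable[int], min_length: int = 10_000) -> int:
    """Total bases in contigs of at least min_length bp (10 kb by default)."""
    return sum(length for length in contig_lengths if length >= min_length)


def deduplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each read pair the first time its exact (R1, R2) sequences appear.
    Later identical copies, treated here as likely PCR duplicates, are dropped."""
    seen: set[ReadPair] = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


if __name__ == "__main__":
    print(assembly_size([12_000, 9_999, 10_000, 50_000]))  # 72000
    reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC"), ("GGCC", "AATT")]
    print(list(deduplicate_pairs(reads)))
```

    Collapsing duplicates flattens the depth spikes that PCR creates over short inserts, which is one plausible reason deduplication helps; the abstract itself reports only the combined effect of deduplication with single-cell SPAdes.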
  3. Jewel wasps in the genus Nasonia are parasitoids with haplodiploid sex determination and rapid development, and they are easy to culture in the laboratory. They are excellent models for insect genetics, genomics, epigenetics, development, and evolution. Nasonia vitripennis (Nv) and N. giraulti (Ng) are closely related species that can be intercrossed, particularly after removal of the intracellular bacterium Wolbachia; these crosses serve as a powerful tool to map and positionally clone morphological, behavioral, expression, and methylation phenotypes. The Nv reference genome was assembled using Sanger, PacBio, and Nanopore approaches and annotated with extensive RNA-seq data. In contrast, the Ng genome is available only through low-coverage resequencing, so a de novo Ng assembly is urgently needed to advance this system. In this study, we report a high-quality Ng assembly generated from 10X Genomics linked-reads at 670× sequencing depth. The current assembly has a genome size of 259,040,977 bp in 3,160 scaffolds, with 38.05% GC content and a 98.6% BUSCO completeness score. Ninety-seven percent of the RNA reads align perfectly to the genome, indicating high contiguity and completeness. A total of 14,777 genes are annotated in the Ng genome, and 72% of the annotated genes have a one-to-one ortholog in the Nv genome. We report 5 million Ng-Nv SNPs, which will facilitate mapping and population genomic studies in Nasonia. In addition, we identified 42 Ng-specific genes by comparison with the Nv genome and annotation. This is the first de novo assembly for this important species in the Nasonia model system, providing a useful new genomic toolkit.
  4. Abstract

    The angiosperm genus Silene is a model system for several traits of ecological and evolutionary significance in plants, including breeding system and sex chromosome evolution, host-pathogen interactions, invasive species biology, heavy metal tolerance, and cytonuclear interactions. Despite its importance, genomic resources for this large genus of approximately 850 species are scarce, with only one published whole-genome sequence (from the dioecious species Silene latifolia). Here, we provide genomic and transcriptomic resources for a hermaphroditic representative of this genus (S. noctiflora), including a PacBio Iso-Seq transcriptome, which uses long-read, single-molecule sequencing technology to analyze full-length mRNA transcripts. Using these data, we have assembled and annotated high-quality full-length cDNA sequences for approximately 14,126 S. noctiflora genes and 25,317 isoforms. We demonstrated the utility of these data for distinguishing between recent and highly similar gene duplicates by identifying novel paralogous genes in an essential protease complex. Furthermore, we provide a draft assembly for the approximately 2.7-Gb genome of this species, which is near the upper end of the range of genome sizes reported for diploids in this genus and threefold larger than the 0.9-Gb genome of Silene conica, another species in the same subgenus. Karyotyping confirmed that S. noctiflora is a diploid, indicating that its large genome size is not due to polyploidization. These resources should facilitate further study and development of this genus as a model in plant ecology and evolution.
  5. Abstract

    The diversity among Drosophila species presents an opportunity to study the molecular mechanisms underlying the evolution of biological phenomena. A challenge to investigating these species is that, unlike D. melanogaster with its plethora of molecular and genetic tools, many other species lack sequenced genomes, which are a prerequisite for employing these tools. Selecting transgenic flies through white (w) complementation is common practice in numerous Drosophila species. Although tolerated, disruption of w is associated with impaired vision, among other effects, in D. melanogaster. D. nebulosa has a unique mating behavior that requires vision, and these flies are unable to mate successfully in the dark. Here, we hypothesized that disrupting w would impede mating success. As a first step, we used PacBio long-read sequencing to assemble a high-quality annotated genome of D. nebulosa. Using these data, we employed CRISPR/Cas9 to successfully disrupt the w gene. As expected, D. nebulosa males null for w did not court females, unlike mutant strains of several other Drosophila species whose w gene has been disrupted. In the absence of mating, no females became homozygous null for w. We conclude that CRISPR/Cas9 genome engineering is an effective tool for gene disruption in D. nebulosa, and that the w gene is necessary for mating. Thus, an alternative selectable marker unrelated to vision is desirable.

     