Title: Draft genome assemblies of the avian louse Brueelia nebulosa and its associates using long-read sequencing from an individual specimen

Sequencing high molecular weight (HMW) DNA with long-read and linked-read technologies has promoted a major increase in more complete genome sequences for nonmodel organisms. Sequencing approaches that rely on HMW DNA have been limited to larger organisms or pools of multiple individuals, but recent advances have allowed for sequencing from individuals of small-bodied organisms. Here, we use HMW DNA sequencing with PacBio long reads and TELL-Seq linked reads to assemble and annotate the genome from a single individual feather louse (Brueelia nebulosa) from a European Starling (Sturnus vulgaris). We assembled a genome with a relatively high scaffold N50 (637 kb) and with BUSCO scores (96.1%) comparable to louse genomes assembled from pooled individuals. We annotated a number of genes (10,938) similar to the human louse (Pediculus humanus) genome. Additionally, calling phased variants revealed that the Brueelia genome is more heterozygous (∼1%) than expected for a highly obligate and dispersal-limited parasite. We also assembled and annotated the mitochondrial genome and primary endosymbiont (Sodalis) genome from the individual louse, which showed evidence for heteroplasmy in the mitogenome and a reduced genome size in the endosymbiont compared to its free-living relative. Our study is a valuable demonstration of the capability to obtain high-quality genomes from individual small, nonmodel organisms. Applying this approach to other organisms could greatly increase our understanding of the diversity and evolution of individual genomes.
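
For readers unfamiliar with the scaffold N50 statistic quoted in the abstract, the following is a minimal sketch of how it is conventionally computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Hypothetical helper for illustration; not taken from the study.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up scaffold lengths, purely for demonstration.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

With these made-up lengths the total is 54, and the three longest sequences (10, 9, 8) sum to 27, so the N50 is 8.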

Publisher / Repository:
Oxford University Press
Journal Name:
G3: Genes, Genomes, Genetics
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    De novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased, haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species.


    Using Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements.


    Knowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.

  2. Abstract

    Sex determination systems and genetic sex differentiation across fishes are highly diverse but are unknown for most Cypriniformes, including Rio Grande silvery minnow (Hybognathus amarus). In this study, we aimed to detect and validate sex-linked markers to infer sex determination system and to demonstrate the utility of combining several methods for sex-linked marker detection in nonmodel organisms. To identify potential sex-linked markers, Nextera-tagmented reductively amplified DNA (nextRAD) libraries were generated from 66 females, 64 males, and 60 larvae of unknown sex. These data were combined with female and male de novo genomes from Nanopore long-read sequences. We identified five potential unique male nextRAD-tags and one potential unique male contig, suggesting an XY sex determination system. We also identified two single-nucleotide polymorphisms (SNPs) in the same contig with values of FST, allele frequencies, and heterozygosity conforming with expectations of an XY system. Through PCR we validated the marker containing the sex-linked SNPs and a single nextRAD-tag sex-associated marker but it was not male specific. Instead, more copies of this locus in the male genome were suggested by enhanced amplification in males. Results are consistent with an XY system with low differentiation between sex-determining regions. Further research is needed to confirm the level of differentiation between the sex chromosomes. Nonetheless, this study highlighted the power of combining reduced representation and whole-genome sequencing for identifying sex-linked markers, especially when reduced representation sequencing does not include extensive variation between sexes, either because such variation is not present or not captured.

  3. Zufall, Rebecca (Ed.)
    Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Postzygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. Although many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements) to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited DNA material access, thereby providing new insights into the evolution of chromosome rearrangements.
  4. Background

    Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes.


    Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes.


    Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥10 kb by 10 to 100-fold for low input metagenomes.


    PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.

  5. Abstract

    The spiral gingers (Costus L.) are a pantropical genus of herbaceous perennial monocots; the Neotropical clade of Costus radiated rapidly in the past few million years into over 60 species. The Neotropical spiral gingers have a rich history of evolutionary and ecological research that can motivate and inform modern genetic investigations. Here, we present the first 2 chromosome-level genome assemblies in the genus, for C. pulverulentus and C. lasius, and briefly compare their synteny. We assembled the C. pulverulentus genome from a combination of short-read data, Chicago and Dovetail Hi-C chromatin-proximity sequencing, and alignment with a linkage map. We annotated the genome by mapping a C. pulverulentus transcriptome and querying mapped transcripts against a protein database. We assembled the C. lasius genome with Pacific Biosciences HiFi long reads and alignment to the C. pulverulentus genome. These 2 assemblies are the first published genomes for non-cultivated tropical plants. These genomes solidify the spiral gingers as a model system and will facilitate research on the poorly understood genetic basis of tropical plant diversification.
