Title: Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes
Insect silk is a versatile biomaterial. Lepidoptera and Trichoptera display some of the most diverse uses of silk, with varying strength, adhesive qualities, and elastic properties. Silk fibroin genes are long (>20 Kbp), with many repetitive motifs that make them challenging to sequence. Most research thus far has focused on conserved N- and C-terminal regions of fibroin genes because a full comparison of repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genomes were highly contiguous (N50 = 9.7 Mbp/32.4 Mbp, L50 = 13/11) and complete (BUSCO complete = 99.3%/95.2%), with complete and contiguous recovery of silk heavy fibroin gene sequences. We show that HiFi long-read sequencing is helpful for understanding genes with long, repetitive regions.
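The contiguity statistics quoted in the abstract (N50 and L50) follow a standard definition: sort contigs from longest to shortest and accumulate lengths until at least half of the total assembly length is covered. N50 is the length of the contig that crosses that threshold, and L50 is the number of contigs needed to reach it. The following is a minimal sketch of that definition only, not the authors' pipeline; the function name `n50_l50` and the toy contig lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for an iterable of contig lengths.

    Hypothetical helper for illustration; lengths may be in any
    consistent unit (bp, kbp, Mbp).
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The first contig that brings the running sum to at least half
        # of the assembly defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Toy example (made-up lengths, not data from either genome).
toy_contigs = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_contigs)
print(n50, l50)
```

A larger N50 together with a smaller L50 indicates that fewer, longer contigs account for half of the assembly, which is the sense in which the abstract describes both genomes as highly contiguous.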
Award ID(s):
2217159
PAR ID:
10438946
Journal Name:
Gigabyte
Volume:
2022
ISSN:
2709-4715
Page Range / eLocation ID:
1 to 14
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Arthropod silk is vital to the evolutionary success of hundreds of thousands of species. The primary proteins in silks are often encoded by long, repetitive gene sequences. Until recently, sequencing and assembling these complex gene sequences proved intractable given their repetitive structure. Here, using high-quality long-read sequencing, we show that there is extensive variation, in both length and repeat motif order, between alleles of silk genes within individual arthropods. Further, this variation exists across two deep, independent origins of silk which diverged more than 500 Mya: the insect clade containing caddisflies and butterflies, and spiders. This remarkable convergence in previously overlooked patterns of allelic variation across multiple origins of silk suggests common mechanisms for the generation and maintenance of structural protein-coding genes. Future genomic efforts to connect genotypes to phenotypes should account for such allelic variation.
  2. We present the first long-read de novo assembly and annotation of the luna moth (Actias luna) and provide the full characterization of heavy chain fibroin (h-fibroin), a long and highly repetitive gene (>20 kb) essential in silk fiber production. There are >160,000 described species of moths and butterflies (Lepidoptera), but only within the last 5 years have we begun to recover high-quality annotated whole genomes across the order that capture h-fibroin. Using PacBio HiFi reads, we produce the first high-quality long-read reference genome for this species. The assembled genome has a length of 532 Mb, a contig N50 of 16.8 Mb, an L50 of 14 contigs, and 99.4% completeness (BUSCO). Our annotation using Bombyx mori protein and A. luna RNAseq evidence captured a total of 20,866 genes at 98.9% completeness with 10,267 functionally annotated proteins and a full-length h-fibroin annotation of 2,679 amino acid residues.
  3. Macqueen, D (Ed.)
    Spider silks are renowned for their high-performance mechanical properties. Contributing to these properties are proteins encoded by the spidroin (spider fibroin) gene family. Spidroins have been discovered mostly through cDNA studies of females based on the presence of conserved terminal regions and a repetitive central region. Recently, genome sequencing of the golden orb-web weaver, Trichonephila clavipes, provided a complete picture of spidroin diversity. Here, we refine the annotation of T. clavipes spidroin genes, including the reclassification of some as non-spidroins. We rename these non-spidroins as spidroin-like (SpL) genes because they have repetitive sequences and amino acid compositions like spidroins, but entirely lack the archetypal terminal domains of spidroins. We then examined the function of these spidroin and SpL genes through tissue- and sex-specific gene expression studies. Using qPCR, we show that some silk genes are upregulated in male silk glands compared to females, despite males producing less silk in general. We also find that an enigmatic spidroin that lacks a spidroin C-terminal domain is highly expressed in silk glands, suggesting that spidroins could assemble into fibers without a canonical terminal region. Further, we show that two SpL genes are expressed in silk glands, with one gene highly evolutionarily conserved across species, providing evidence that particular SpL genes are important to silk production. Together, these findings challenge long-standing paradigms regarding the evolutionary and functional significance of the proteins and conserved motifs essential for producing spider silks.
  4. The fish order Syngnathiformes has been referred to as a collection of misfit fishes, comprising commercially important fish such as red mullets as well as the highly diverse seahorses, pipefishes, and seadragons of the well-known family Syngnathidae, with their unique adaptations including male pregnancy. Another ornate member of this order is the mandarinfish. No less than two types of chromatophores have been discovered in the spectacularly colored mandarinfish: the cyanophore (producing blue color) and the dichromatic cyano-erythrophore (producing blue and red). The phylogenetic position of mandarinfish in Syngnathiformes, and their promise of additional genetic discoveries beyond the chromatophores, made mandarinfish an appealing target for whole-genome sequencing. We used linked sequences to create synthetic long reads, producing a highly contiguous genome assembly for the mandarinfish. The genome assembly comprises 483 Mbp (longest scaffold 29 Mbp), has an N50 of 12 Mbp, and an L50 of 14 scaffolds. The assembly completeness is also high, with 92.6% complete, 4.4% fragmented, and 2.9% missing out of 4584 BUSCO genes found in ray-finned fishes. Outside the family Syngnathidae, the mandarinfish represents one of the most contiguous syngnathiform genome assemblies to date. The mandarinfish genomic resource will likely serve as a high-quality outgroup to syngnathid fish, and furthermore for research on the genomic underpinnings of the evolution of novel pigmentation.
  5. Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on-targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% Alu Yb, 51% Alu Ya, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.