Title: Effect of sequence depth and length in long-read assembly of the maize inbred NC358
Abstract Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
Award ID(s):
1744001
PAR ID:
10226860
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
11
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We present the first long-read de novo assembly and annotation of the luna moth (Actias luna) and provide the full characterization of heavy chain fibroin (h-fibroin), a long and highly repetitive gene (>20 kb) essential in silk fiber production. There are >160,000 described species of moths and butterflies (Lepidoptera), but only within the last 5 years have we begun to recover high-quality annotated whole genomes across the order that capture h-fibroin. Using PacBio HiFi reads, we produce the first high-quality long-read reference genome for this species. The assembled genome has a length of 532 Mb, a contig N50 of 16.8 Mb, an L50 of 14 contigs, and 99.4% completeness (BUSCO). Our annotation using Bombyx mori protein and A. luna RNAseq evidence captured a total of 20,866 genes at 98.9% completeness with 10,267 functionally annotated proteins and a full-length h-fibroin annotation of 2,679 amino acid residues. 
  2. Hoffmann, Federico (Ed.)
    Abstract The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a “state-of-the-field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations. 
  3. Abstract The symbiosis between clownfish and giant tropical sea anemones (Order Actiniaria) is one of the most iconic on the planet. Distributed on tropical reefs, 28 species of clownfishes form obligate mutualistic relationships with 10 nominal species of venomous sea anemones. Our understanding of the symbiosis is limited by the fact that most research has been focused on the clownfishes. Chromosome-scale reference genomes are available for all clownfish species, yet there are no published reference genomes for the host sea anemones. Recent studies have shown that the clownfish-hosting sea anemones belong to three distinct clades of sea anemones that have evolved symbiosis with clownfishes independently. Here we present the first high-quality long-read assemblies for three species of clownfish-hosting sea anemones, one from each of these clades: Entacmaea quadricolor, Stichodactyla haddoni, and Radianthus doreensis. PacBio HiFi sequencing yielded 1,597,562, 3,101,773, and 1,918,148 reads for E. quadricolor, S. haddoni, and R. doreensis, respectively. All three assemblies were highly contiguous and complete, with N50 values above 4 Mb and BUSCO completeness above 95% on the Metazoa dataset. Genome structural annotation with BRAKER3 predicted 20,454, 18,948, and 17,056 protein-coding genes in the E. quadricolor, S. haddoni, and R. doreensis genomes, respectively. These new resources will form the basis of comparative genomic analyses that will allow us to deepen our understanding of this mutualism from the host perspective.
    Significance Chromosome-scale genomes are available for all 28 clownfish species, yet there are no high-quality reference genomes published for the clownfish-hosting sea anemones. The lack of genomic resources impedes our ability to understand the evolution of this iconic symbiosis from the host perspective. The clownfish-hosting sea anemones belong to three clades of sea anemones that have evolved mutualism with clownfish independently. Here we assembled the first high-quality long-read genomes for three species of host sea anemones, each belonging to a different host clade: Entacmaea quadricolor, Stichodactyla haddoni, and Radianthus doreensis. These resources will enable in-depth comparative genomics of clownfish-hosting sea anemones, providing a critical perspective for understanding how the symbiosis has evolved. Finally, these reference genomes represent a significant increase in the number of high-quality long-read genome assemblies for sea anemones (11 currently published) and double the number of high-quality reference genomes for the sea anemone superfamily Actinoidea.
  4. The blue crab, Callinectes sapidus (Rathbun, 1896), is an economically, culturally, and ecologically important species found across the temperate and tropical North and South American Atlantic coast. A reference genome will enable research for this high-value species. Initial assembly combined 200× coverage Illumina paired-end reads, a 60× 8 kb mate-paired library, and 50× PacBio data using the MaSuRCA assembler, resulting in a 985 Mb assembly with a scaffold N50 of 153 kb. Dovetail Chicago and Hi-C sequencing with the 3D-DNA assembler and Juicebox assembly tools were then used for chromosome scaffolding. The 50 largest scaffolds span 810 Mb, are 1.5–37 Mb long, and have a repeat content of 36%. The 190 Mb of unplaced sequence is in 3921 sequences over 10 kb with a repeat content of 68%. The final assembly N50 is 18.9 Mb for scaffolds and 9317 bases for contigs. Of the arthropod BUSCO genes, ∼88% (888/1013) were complete and single-copy. Using 309 million RNAseq read pairs from 12 different tissues and developmental stages, 25,249 protein-coding genes were predicted. Between the C. sapidus and Portunus trituberculatus genomes, 41 of 50 large scaffolds had high nucleotide identity and protein-coding synteny, but 9 scaffolds in both assemblies were not clear matches. The protein-coding genes included 9423 one-to-one putative orthologs, of which 7165 were syntenic between the two crab species. Overall, the two crab genome assemblies show strong similarities at the nucleotide, protein, and chromosome level and verify the blue crab genome as an excellent reference for this important seafood species.
  5. Abstract Rapid technological improvements are democratizing access to high-quality, chromosome-scale genome assemblies. No longer the domain of only the most highly studied model organisms, now non-traditional and emerging model species can be genome-enabled using a combination of sequencing technologies and assembly software. Consequently, old ideas built on sparse sampling across the tree of life have recently been amended in the face of genomic data drawn from a growing number of high-quality reference genomes. Arguably the most valuable are those long-studied species for which much is already known about their biology; what many term emerging model species. Here, we report a highly complete chromosome-scale genome assembly for the brown anole, Anolis sagrei, a lizard species widely studied across a variety of disciplines and for which a high-quality reference genome was long overdue. This assembly exceeds the vast majority of existing reptile and snake genomes in contiguity (N50 = 253.6 Mb) and annotation completeness. Through the analysis of this genome and population resequence data, we examine the history of repetitive element accumulation, identify the X chromosome, and propose a hypothesis for the evolutionary history of fusions between autosomes and the X that led to the sex chromosomes of A. sagrei.