Title: Improved Reference Genome for Cyclotella cryptica CCMP332, a Model for Cell Wall Morphogenesis, Salinity Adaptation, and Lipid Production in Diatoms (Bacillariophyta)
Abstract The diatom, Cyclotella cryptica, is a well-established model species for physiological studies and biotechnology applications of diatoms. To further facilitate its use as a model diatom, we report an improved reference genome assembly and annotation for C. cryptica strain CCMP332. We used a combination of long- and short-read sequencing to assemble a high-quality and contaminant-free genome. The genome is 171 Mb in size and consists of 662 scaffolds with a scaffold N50 of 494 kb. This represents a 176-fold decrease in scaffold number and 41-fold increase in scaffold N50 compared to the previous assembly. The genome contains 21,250 predicted genes, 75% of which were assigned putative functions. Repetitive DNA comprises 59% of the genome, and an improved classification of repetitive elements indicated that a historically steady accumulation of transposable elements has contributed to the relatively large size of the C. cryptica genome. The high-quality C. cryptica genome will serve as a valuable reference for ecological, genetic, and biotechnology studies of diatoms.
Award ID(s):
1651087
NSF-PAR ID:
10231024
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
G3 Genes|Genomes|Genetics
Volume:
10
Issue:
9
ISSN:
2160-1836
Page Range / eLocation ID:
2965 to 2974
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Calcareous outcrops, rocky areas composed of calcium carbonate (CaCO3), often host a diverse, specialized, and threatened biomineralizing fauna. Despite the repeated evolution of physiological and morphological adaptations to colonize these mineral-rich substrates, there is a lack of genomic resources for calcareous rock endemic species. This has hampered our ability to understand the genomic mechanisms underlying calcareous rock specialization and manage these threatened species. Results Here, we present a new draft genome assembly of the threatened limestone endemic land snail Oreohelix idahoensis and genome skim data for two other Oreohelix species. The O. idahoensis genome assembly (scaffold N50: 404.19 kb; 86.6% BUSCO genes) is the largest (~5.4 Gb) and most repetitive mollusc genome assembled to date (85.74% assembly size). The repetitive landscape was unusually dominated by an expansion of long terminal repeat (LTR) transposable elements (57.73% assembly size) which have shaped the evolution of genome size, gene composition through retrotransposition of host genes, and ectopic recombination. Genome skims revealed repeat content is more than 2–3 fold higher in limestone endemic O. idahoensis compared to non-calcareous Oreohelix species. Gene family size analysis revealed stress and biomineralization genes have expanded significantly in the O. idahoensis genome. Conclusions Hundreds of threatened land snail species are endemic to calcareous rock regions but there are very few genomic resources available to guide their conservation or determine the genomic architecture underlying CaCO3 resource specialization. Our study provides one of the first high-quality draft genomes of a calcareous rock endemic land snail which will serve as a foundation for the conservation genomics of this threatened species and for other groups. The high proportion and activity of LTRs in the O. idahoensis genome is unprecedented in molluscan genomics and sheds new light on how transposable element content can vary across molluscs. The genomic resources reported here will enable further studies of the genomic mechanisms underlying calcareous rock specialization and the evolution of transposable element content across molluscs.
  2. Abstract

    The parasitoid wasp Venturia canescens is an important biological control agent of stored-product moth pests and serves as a model to study the function and evolution of domesticated endogenous viruses (DEVs). The DEVs discovered in V. canescens are known as virus-like particles (VcVLPs), which are produced using nudivirus-derived components and incorporate wasp-derived virulence proteins instead of packaged nucleic acids. Previous studies of virus-derived components in the V. canescens genome identified 53 nudivirus-like genes organized in six gene clusters and several viral pseudogenes, but how VcVLP genes are organized among wasp chromosomes following their integration in the ancestral wasp genome is largely unknown. Here, we present a chromosomal-scale genome of V. canescens consisting of 11 chromosomes and 56 unplaced small scaffolds. The genome size is 290.8 Mbp with an N50 scaffold size of 24.99 Mbp. A high-quality gene set including 11,831 protein-coding genes was produced using RNA-Seq data as well as publicly available peptide sequences from related Hymenoptera. A manual annotation of genes of viral origin produced 61 intact and 19 pseudogenized nudivirus-derived genes. The genome assembly revealed that two previously identified clusters were joined into a single cluster and a total of 5 gene clusters comprising 60 intact nudivirus-derived genes were located on three chromosomes. In contrast, pseudogenes are dispersed among 8 chromosomes with only 4 pseudogenes associated with nudivirus gene clusters. The architecture of genes encoding VcVLP components suggests it originates from a recent virus acquisition and there is a link between the processes of dispersal and pseudogenization. This high-quality genome assembly and annotation represents the first chromosome-scale assembly for parasitoid wasps associated with VLPs, and is publicly available in the National Center for Biotechnology Information Genome and RefSeq databases, providing a valuable resource for future studies of DEVs in parasitoid wasps.

     
  3. Wheat, Christopher (Ed.)
    Abstract

    Echinometra lucunter, the rock-boring sea urchin, is a widely distributed echinoid and a model for ecological studies of reproduction, responses to climate change, and speciation. We present a near chromosome-level genome assembly of E. lucunter, including 21 scaffolds larger than 10 Mb predicted to represent each of the chromosomes of the species. The 760.4 Mb assembly has a scaffold N50 of 30.0 Mb and BUSCO (benchmarking universal single-copy orthologue) single-copy and duplicated scores of 95.8% and 1.4%, respectively. Ab initio gene model prediction and annotation with transcriptomic data constructed 33,989 gene models composing 50.4% of the assembly, including 37,036 transcripts. Repetitive elements make up approximately 39.6% of the assembly, and unresolved gap sequences are estimated to be 0.65%. Whole genome alignment with Echinometra sp. EZ revealed high synteny and conservation between the two species, further bolstering Echinometra as an emerging genus for comparative genomics studies. This genome assembly represents a high-quality genomic resource for future evolutionary and developmental studies of this species and more broadly of echinoderms.

     
  4. Abstract Comparisons of high-quality reference butterfly and moth genomes have been instrumental to advancing our understanding of how hybridization and natural selection drive genomic change during the origin of new species and novel traits. Here, we present a genome assembly of the Southern Dogface butterfly, Zerene cesonia (Pieridae), whose brilliant wing colorations have been implicated in developmental plasticity, hybridization, sexual selection, and speciation. We assembled 266,407,278 bp of the Z. cesonia genome, which accounts for 98.3% of the estimated 271 Mb genome size. The final haploid genome was assembled using a hybrid approach involving Chicago libraries with Hi-Rise assembly and a diploid Meraculous assembly. In the final assembly, nearly all autosomes and the Z chromosome were assembled into single scaffolds. The largest 29 scaffolds accounted for 91.4% of the genome assembly, with the remaining ∼8% distributed among another 247 scaffolds and an overall N50 of 9.2 Mb. Tissue-specific RNA-seq informed annotations identified 16,442 protein-coding genes, which included 93.2% of the arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO). The Z. cesonia genome assembly had ∼9% identified as repetitive elements, with a transposable element landscape rich in helitrons. Similar to other Lepidoptera genomes, Z. cesonia showed a high conservation of chromosomal synteny. The Z. cesonia assembly provides a high-quality reference for studies of chromosomal arrangements in the Pierid family, as well as for population, phylo-, and functional genomic studies of adaptation and speciation.
  5. Abstract

    The brown bear (Ursus arctos) is the second largest and most widespread extant terrestrial carnivore on Earth and has recently emerged as a medical model for human metabolic diseases. Here, we report a fully phased chromosome-level assembly of a male North American brown bear built by combining Pacific Biosciences (PacBio) HiFi data and publicly available Hi-C data. The final genome size is 2.47 Gigabases (Gb) with a scaffold and contig N50 length of 70.08 and 43.94 Megabases (Mb), respectively. Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis revealed that 94.5% of single copy orthologs from Mammalia were present in the genome (the highest of any ursid genome to date). Repetitive elements accounted for 44.48% of the genome and a total of 20,480 protein coding genes were identified. Based on whole genome alignment to the polar bear, the brown bear is highly syntenic with the polar bear, and our phylogenetic analysis of 7,246 single-copy orthologs supports the currently proposed species tree for Ursidae. This highly contiguous genome assembly will support future research on both the evolutionary history of the bear family and the physiological mechanisms behind hibernation, the latter of which has broad medical implications.

     