This content will become publicly available on December 1, 2024
Mitochondrial genomes are known for their compact size and conserved gene order; however, recent studies employing long-read sequencing technologies have revealed the presence of atypical mitogenomes in some species. In this study, we assembled and annotated the mitogenomes of five Antarctic notothenioids, including four icefishes (
- Journal Name: Scientific Reports
- Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. Results: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Conclusions: Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
Long‐read sequencing is driving a new reality for genome science in which highly contiguous assemblies can be produced efficiently with modest resources. Genome assemblies from long‐read sequences are particularly exciting for understanding the evolution of complex genomic regions that are often difficult to assemble. In this study, we utilized long‐read sequencing data to generate a high‐quality genome assembly for an Antarctic eelpout, Ophthalmolycus amberensis, the first for the globally distributed family Zoarcidae. We used this assembly to understand how O. amberensis has adapted to the harsh Southern Ocean and compared it to another group of Antarctic fishes: the notothenioids. We showed that selection has largely acted on different targets in eelpouts relative to notothenioids. However, we did find some overlap; in both groups, genes involved in membrane structure, thermal tolerance and vision have evidence of positive selection. We found evidence for historical shifts of transposable element activity in O. amberensis and other polar fishes, perhaps reflecting a response to environmental change. We were specifically interested in the evolution of two complex genomic loci known to underlie key adaptations to polar seas: haemoglobin and antifreeze proteins (AFPs). We observed unique evolution of the haemoglobin MN cluster in eelpouts and related fishes in the suborder Zoarcoidei relative to other Perciformes. For AFPs, we identified the first species in the suborder with no evidence of afpIII sequences (Cebidichthys violaceus) in the genomic region where they are found in all other Zoarcoidei, potentially reflecting a lineage‐specific loss of this cluster. Beyond polar fishes, our results highlight the power of long‐read sequencing to understand genome evolution.
Sequencing high molecular weight (HMW) DNA with long-read and linked-read technologies has promoted a major increase in more complete genome sequences for nonmodel organisms. Sequencing approaches that rely on HMW DNA have been limited to larger organisms or pools of multiple individuals, but recent advances have allowed for sequencing from individuals of small-bodied organisms. Here, we use HMW DNA sequencing with PacBio long reads and TELL-Seq linked reads to assemble and annotate the genome from a single individual feather louse (Brueelia nebulosa) from a European Starling (Sturnus vulgaris). We assembled a genome with a relatively high scaffold N50 (637 kb) and with BUSCO scores (96.1%) comparable to louse genomes assembled from pooled individuals. We annotated a number of genes (10,938) similar to the human louse (Pediculus humanus) genome. Additionally, calling phased variants revealed that the Brueelia genome is more heterozygous (∼1%) than expected for a highly obligate and dispersal-limited parasite. We also assembled and annotated the mitochondrial genome and primary endosymbiont (Sodalis) genome from the individual louse, which showed evidence for heteroplasmy in the mitogenome and a reduced genome size in the endosymbiont compared to its free-living relative. Our study is a valuable demonstration of the capability to obtain high-quality genomes from individual small, nonmodel organisms. Applying this approach to other organisms could greatly increase our understanding of the diversity and evolution of individual genomes.
INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
The basal South American notothenioid Eleginops maclovinus (Patagonia blennie or róbalo) occupies a uniquely important phylogenetic position in Notothenioidei as the singular closest sister species to the Antarctic cryonotothenioid fishes. Its genome and the traits encoded therein would be the nearest representatives of the temperate ancestor from which the Antarctic clade arose, providing an ancestral reference for deducing polar derived changes. In this study, we generated a gene- and chromosome-complete assembly of the E. maclovinus genome using long read sequencing and HiC scaffolding. We compared its genome architecture with the more basally divergent Cottoperca gobio and the derived genomes of nine cryonotothenioids representing all five Antarctic families. We also reconstructed a notothenioid phylogeny using 2918 proteins of single-copy orthologous genes from these genomes that reaffirmed E. maclovinus’ phylogenetic position. We additionally curated E. maclovinus’ repertoire of circadian rhythm genes, ascertained their functionality by transcriptome sequencing, and compared its pattern of gene retention with C. gobio and the derived cryonotothenioids. Through reconstructing circadian gene trees, we also assessed the potential role of the retained genes in cryonotothenioids by referencing to the functions of the human orthologs. Our results found E. maclovinus to share greater conservation with the Antarctic clade, solidifying its evolutionary status as the direct sister and best suited ancestral proxy of cryonotothenioids. The high-quality genome of E. maclovinus will facilitate inquiries into cold derived traits in temperate to polar evolution, and conversely on the paths of readaptation to non-freezing habitats in various secondarily temperate cryonotothenioids through comparative genomic analyses.