Title: Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE–gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%–85% of repetitive sequences were “unclassified” following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Award ID(s):
2312253
PAR ID:
10489112
Publisher / Repository:
Genome Research
Date Published:
Journal Name:
Genome Research
Volume:
33
Issue:
10
ISSN:
1088-9051
Page Range / eLocation ID:
1708 to 1717
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The symbiosis between clownfish and giant tropical sea anemones (Order Actiniaria) is one of the most iconic on the planet. Distributed on tropical reefs, 28 species of clownfishes form obligate mutualistic relationships with 10 nominal species of venomous sea anemones. Our understanding of the symbiosis is limited by the fact that most research has been focused on the clownfishes. Chromosome-scale reference genomes are available for all clownfish species, yet there are no published reference genomes for the host sea anemones. Recent studies have shown that the clownfish-hosting sea anemones belong to three distinct clades of sea anemones that have evolved symbiosis with clownfishes independently. Here we present the first high-quality long-read assemblies for three species of clownfish-hosting sea anemones, one belonging to each of these clades: Entacmaea quadricolor, Stichodactyla haddoni, and Radianthus doreensis. PacBio HiFi sequencing yielded 1,597,562, 3,101,773, and 1,918,148 million reads for E. quadricolor, S. haddoni, and R. doreensis, respectively. All three assemblies were highly contiguous and complete, with N50 values above 4 Mb and BUSCO completeness above 95% on the Metazoa dataset. Genome structural annotation with BRAKER3 predicted 20,454, 18,948, and 17,056 protein-coding genes in the E. quadricolor, S. haddoni, and R. doreensis genomes, respectively. These new resources will form the basis of comparative genomic analyses that will allow us to deepen our understanding of this mutualism from the host perspective. Significance: Chromosome-scale genomes are available for all 28 clownfish species, yet there are no high-quality reference genomes published for the clownfish-hosting sea anemones. The lack of genomic resources impedes our ability to understand the evolution of this iconic symbiosis from the host perspective. The clownfish-hosting sea anemones belong to three clades of sea anemones that have evolved mutualism with clownfish independently. 
Here we assembled the first high-quality long-read genomes for three species of host sea anemones, each belonging to a different host clade: Entacmaea quadricolor, Stichodactyla haddoni, and Radianthus doreensis. These resources will enable in-depth comparative genomics of clownfish-hosting sea anemones, providing a critical perspective for understanding how the symbiosis has evolved. Finally, these reference genomes present a significant increase in the number of high-quality long-read genome assemblies for sea anemones (11 currently published) and double the number of high-quality reference genomes for the sea anemone superfamily Actinoidea. 
  2. Hodgins, Kathryn (Ed.)
    Abstract Antifreeze proteins (AFPs) have enabled teleost fishes to repeatedly colonize polar seas. Four AFP types have convergently evolved in several fish lineages. AFPs inhibit ice crystal growth and lower tissue freezing point. In lineages with AFPs, species inhabiting colder environments may possess more AFP copies. Elucidating how differences in AFP copy number evolve is challenging due to the genes’ tandem array structure and consequently poor resolution of these repetitive regions. Here, we explore the evolution of type III AFPs (AFP III) in the globally distributed suborder Zoarcoidei, leveraging six new long-read genome assemblies. Zoarcoidei has fewer genomic resources relative to other polar fish clades while it is one of the few groups of fishes adapted to both the Arctic and Southern Oceans. Combining these new assemblies with additional long-read genomes available for Zoarcoidei, we conducted a comprehensive phylogenetic test of AFP III evolution and modeled the effects of thermal habitat and depth on AFP III gene family evolution. We confirm a single origin of AFP III via neofunctionalization of the enzyme sialic acid synthase B. We also show that AFP copy number increased under low temperature but decreased with depth, potentially because pressure lowers freezing point. Associations between the environment and AFP III copy number were driven by duplications of paralogs that were translocated out of the ancestral locus at which AFP III arose. Our results reveal novel environmental effects on AFP evolution and demonstrate the value of high-quality genomic resources for studying how structural genomic variation shapes convergent adaptation. 
  3. Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences. 
  4. Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously-annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely-related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.4% of human protein-coding genes to a chimpanzee genome assembly with 98.7% sequence identity. Availability: The source code for Liftoff is available at https://github.com/agshumate/Liftoff 
  5. Introduction: Eukaryotic life depends on the functional elements encoded by both the nuclear genome and organellar genomes, such as those contained within the mitochondria. The content, size, and structure of the mitochondrial genome vary across organisms, with potentially large implications for phenotypic variance and resulting evolutionary trajectories. Among yeasts in the subphylum Saccharomycotina, extensive differences have been observed in various species relative to the model yeast Saccharomyces cerevisiae, but mitochondrial genome sampling across many groups has been scarce, even as hundreds of nuclear genomes have become available. Methods: By extracting mitochondrial assemblies from existing short-read genome sequence datasets, we have greatly expanded both the number of available genomes and the coverage across sparsely sampled clades. Results: Comparison of 353 yeast mitochondrial genomes revealed that, while size and GC content were fairly consistent across species, those in the genera Metschnikowia and Saccharomyces trended larger, while several species in the order Saccharomycetales, which includes S. cerevisiae, exhibited lower GC content. Extreme examples for both size and GC content were scattered throughout the subphylum. All mitochondrial genomes shared a core set of protein-coding genes for Complexes III, IV, and V, but they varied in the presence or absence of mitochondrially-encoded canonical Complex I genes. We traced the loss of Complex I genes to a major event in the ancestor of the orders Saccharomycetales and Saccharomycodales, but we also observed several independent losses in the orders Phaffomycetales, Pichiales, and Dipodascales. In contrast to prior hypotheses based on smaller-scale datasets, comparison of evolutionary rates in protein-coding genes showed no bias towards elevated rates among aerobically fermenting (Crabtree/Warburg-positive) yeasts. 
Mitochondrial introns were widely distributed, but they were highly enriched in some groups. The majority of mitochondrial introns were poorly conserved within groups, but several were shared within groups, between groups, and even across taxonomic orders, which is consistent with horizontal gene transfer, likely involving homing endonucleases acting as selfish elements. Discussion: As the number of available fungal nuclear genomes continues to expand, the methods described here to retrieve mitochondrial genome sequences from these datasets will prove invaluable to ensuring that studies of fungal mitochondrial genomes keep pace with their nuclear counterparts. 