Title: Chromosome-scale genome assembly of the brown anole (Anolis sagrei), an emerging model species
Abstract Rapid technological improvements are democratizing access to high quality, chromosome-scale genome assemblies. No longer the domain of only the most highly studied model organisms, now non-traditional and emerging model species can be genome-enabled using a combination of sequencing technologies and assembly software. Consequently, old ideas built on sparse sampling across the tree of life have recently been amended in the face of genomic data drawn from a growing number of high-quality reference genomes. Arguably the most valuable are those long-studied species for which much is already known about their biology; what many term emerging model species. Here, we report a highly complete chromosome-scale genome assembly for the brown anole, Anolis sagrei, a lizard species widely studied across a variety of disciplines and for which a high-quality reference genome was long overdue. This assembly exceeds the vast majority of existing reptile and snake genomes in contiguity (N50 = 253.6 Mb) and annotation completeness. Through the analysis of this genome and population resequence data, we examine the history of repetitive element accumulation, identify the X chromosome, and propose a hypothesis for the evolutionary history of fusions between autosomes and the X that led to the sex chromosomes of A. sagrei.
Award ID(s):
1927156 1927194 1827647
PAR ID:
10376902
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Communications Biology
Volume:
5
Issue:
1
ISSN:
2399-3642
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Enard, David (Ed.)
    Abstract High-quality reference genomes are fundamental tools for understanding population history, and can provide estimates of genetic and demographic parameters relevant to the conservation of biodiversity. The federally endangered Pacific pocket mouse (PPM), which persists in three small, isolated populations in southern California, is a promising model for studying how demographic history shapes genetic diversity, and how diversity in turn may influence extinction risk. To facilitate these studies in PPM, we combined PacBio HiFi long reads with Omni-C and Hi-C data to generate a de novo genome assembly, and annotated the genome using RNAseq. The assembly comprised 28 chromosome-length scaffolds (N50 = 72.6 Mb) and the complete mitochondrial genome, and included a long heterochromatic region on chromosome 18 not represented in the previously available short-read assembly. Heterozygosity was highly variable across the genome of the reference individual, with 18% of windows falling in runs of homozygosity (ROH) >1 Mb, and nearly 9% in tracts spanning >5 Mb. Yet outside of ROH, heterozygosity was relatively high (0.0027), and historical Ne estimates were large. These patterns of genetic variation suggest recent inbreeding in a formerly large population. Currently the most contiguous assembly for a heteromyid rodent, this reference genome provides insight into the past and recent demographic history of the population, and will be a critical tool for management and future studies of outbreeding depression, inbreeding depression, and genetic load.
  2. Abstract The symbiosis between clownfish and giant tropical sea anemones (Order Actiniaria) is one of the most iconic on the planet. Distributed on tropical reefs, 28 species of clownfishes form obligate mutualistic relationships with 10 nominal species of venomous sea anemones. Our understanding of the symbiosis is limited by the fact that most research has been focused on the clownfishes. Chromosome-scale reference genomes are available for all clownfish species, yet there are no published reference genomes for the host sea anemones. Recent studies have shown that the clownfish-hosting sea anemones belong to three distinct clades of sea anemones that have evolved symbiosis with clownfishes independently. Here we present the first high-quality long-read assemblies for three species of clownfish-hosting sea anemones, one from each of these clades: Entacmaea quadricolor, Stichodactyla haddoni, and Radianthus doreensis. PacBio HiFi sequencing yielded 1,597,562, 3,101,773, and 1,918,148 reads for E. quadricolor, S. haddoni, and R. doreensis, respectively. All three assemblies were highly contiguous and complete, with N50 values above 4 Mb and BUSCO completeness above 95% on the Metazoa dataset. Genome structural annotation with BRAKER3 predicted 20,454, 18,948, and 17,056 protein-coding genes in the E. quadricolor, S. haddoni, and R. doreensis genomes, respectively. These new resources will form the basis of comparative genomic analyses that will allow us to deepen our understanding of this mutualism from the host perspective.
    Significance: Chromosome-scale genomes are available for all 28 clownfish species, yet there are no high-quality reference genomes published for the clownfish-hosting sea anemones. The lack of genomic resources impedes our ability to understand the evolution of this iconic symbiosis from the host perspective. The clownfish-hosting sea anemones belong to three clades of sea anemones that have evolved mutualism with clownfish independently. Here we assembled the first high-quality long-read genomes for three species of host sea anemones, each belonging to a different host clade: Entacmaea quadricolor, Stichodactyla haddoni, and Radianthus doreensis. These resources will enable in-depth comparative genomics of clownfish-hosting sea anemones, providing a critical perspective for understanding how the symbiosis has evolved. Finally, these reference genomes represent a significant increase in the number of high-quality long-read genome assemblies for sea anemones (11 currently published) and double the number of high-quality reference genomes for the sea anemone superfamily Actinioidea.
  3. Abstract Genomic resources across squamate reptiles (lizards and snakes) have lagged behind other vertebrate systems and high-quality reference genomes remain scarce. Of the 23 chromosome-scale reference genomes across the order, only 12 of the ~60 squamate families are represented. Within geckos (infraorder Gekkota), a species-rich clade of lizards, chromosome-level genomes are exceptionally sparse, representing only two of the seven extant families. Using the latest advances in genome sequencing and assembly methods, we generated one of the highest-quality squamate genomes to date for the leopard gecko, Eublepharis macularius (Eublepharidae). We compared this assembly to the previous, short-read-only E. macularius reference genome published in 2016 and examined potential factors within the assembly influencing the contiguity of genome assemblies using PacBio HiFi data. Briefly, the read N50 of the PacBio HiFi reads generated for this study was equal to the contig N50 of the previous E. macularius reference genome at 20.4 kilobases. The HiFi reads were assembled into a total of 132 contigs, which were further scaffolded using Hi-C data into 75 total sequences representing all 19 chromosomes. We found that 9 of the 19 chromosomal scaffolds were each assembled as a near-single contig, whereas the other 10 chromosomes were each scaffolded together from multiple contigs. We qualitatively identified that the percent repeat content within a chromosome broadly affects its assembly contiguity prior to scaffolding. This genome assembly signifies a new age for squamate genomics where high-quality reference genomes rivaling some of the best vertebrate genome assemblies can be generated for a fraction of previous cost estimates. This new E. macularius reference assembly is available on NCBI at JAOPLA010000000.
  4. Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
  5. Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.