skip to main content


Title: Long-read genome assemblies for the study of chromosome expansion: Drosophila kikkawai , Drosophila takahashii , Drosophila bipectinata , and Drosophila ananassae
Abstract

Flow cytometry estimates of genome sizes among species of Drosophila show a 3-fold variation, ranging from ∼127 Mb in Drosophila mercatorum to ∼400 Mb in Drosophila cyrtoloma. However, the assembled portion of the Muller F element (orthologous to the fourth chromosome in Drosophila melanogaster) shows a nearly 14-fold variation in size, ranging from ∼1.3 Mb to >18 Mb. Here, we present chromosome-level long-read genome assemblies for 4 Drosophila species with expanded F elements ranging in size from 2.3 to 20.5 Mb. Each Muller element is present as a single scaffold in each assembly. These assemblies will enable new insights into the evolutionary causes and consequences of chromosome size expansion.

 
more » « less
NSF-PAR ID:
10459855
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
G3: Genes, Genomes, Genetics
ISSN:
2160-1836
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Annotating the genomes of multiple species allows us to analyze the evolution of their genes. While many eukaryotic genome assemblies already include computational gene predictions, these predictions can benefit from review and refinement through manual gene annotation. The Genomics Education Partnership (GEP; https://thegep.org/ ) developed a structural annotation protocol for protein-coding genes that enables undergraduate student and faculty researchers to create high-quality gene annotations that can be utilized in subsequent scientific investigations. For example, this protocol has been utilized by the GEP faculty to engage undergraduate students in the comparative annotation of genes involved in the insulin signaling pathway in 27 Drosophila species, using D. melanogaster as the reference genome. Students construct gene models using multiple lines of computational and empirical evidence including expression data (e.g., RNA-Seq), sequence similarity (e.g., BLAST and multiple sequence alignment), and computational gene predictions. Quality control measures require each gene be annotated by at least two students working independently, followed by reconciliation of the submitted gene models by a more experienced student. This article provides an overview of the annotation protocol and describes how discrepancies in student submitted gene models are resolved to produce a final, high-quality gene set suitable for subsequent analyses. The protocol can be adapted to other scientific questions (e.g., expansion of the Drosophila Muller F element) and species (e.g., parasitoid wasps) to provide additional opportunities for undergraduate students to participate in genomics research. These student annotation efforts can substantially improve the quality of gene annotations in publicly available genomic databases. 
    more » « less
  2. Abstract The F element of the Drosophila karyotype (the fourth chromosome in Drosophila melanogaster) is often referred to as the “dot chromosome” because of its appearance in a metaphase chromosome spread. This chromosome is distinct from other Drosophila autosomes in possessing both a high level of repetitious sequences (in particular, remnants of transposable elements) and a gene density similar to that found in the other chromosome arms, ∼80 genes distributed throughout its 1.3-Mb “long arm.” The dot chromosome is notorious for its lack of recombination and is often neglected as a consequence. This and other features suggest that the F element is packaged as heterochromatin throughout. F element genes have distinct characteristics (e.g., low codon bias, and larger size due both to larger introns and an increased number of exons), but exhibit expression levels comparable to genes found in euchromatin. Mapping experiments show the presence of appropriate chromatin modifications for the formation of DNaseI hypersensitive sites and transcript initiation at the 5′ ends of active genes, but, in most cases, high levels of heterochromatin proteins are observed over the body of these genes. These various features raise many interesting questions about the relationships of chromatin structures with gene and chromosome function. The apparent evolution of the F element as an autosome from an ancestral sex chromosome also raises intriguing questions. The findings argue that the F element is a unique chromosome that occupies its own space in the nucleus. Further study of the F element should provide new insights into chromosome structure and function. 
    more » « less
  3. Sethuraman, A (Ed.)
    Abstract Spiny lizards in the genus Sceloporus are a model system among squamate reptiles for studies of chromosomal evolution. While most pleurodont iguanians retain an ancestral karyotype formula of 2n = 36 chromosomes, Sceloporus exhibits substantial karyotype variation ranging from 2n =  22 to 46 chromosomes. We present two annotated chromosome-scale genome assemblies for the Plateau Fence Lizard (Sceloporus tristichus) to facilitate research on the role of pericentric inversion polymorphisms on adaptation and speciation. Based on previous karyotype work using conventional staining, the S. tristichus genome is characterized as 2n =  22 with six pairs of macrochromosomes and five pairs of microchromosomes and a pericentric inversion polymorphism on chromosome 7 that is geographically variable. We provide annotated, chromosome-scale genomes for two lizards located at opposite ends of a dynamic hybrid zone that are each fixed for different inversion polymorphisms. The assembled genomes are 1.84–1.87 Gb (1.72 Gb for scaffolds mapping to chromosomes) with a scaffold N50 of 267.5 Mb. Functional annotation of the genomes resulted in ∼15K predicted gene models. Our assemblies confirmed the presence of a 4.62-Mb pericentric inversion on chromosome 7, which contains 62 annotated coding genes with known functions. In addition, we collected population genomics data using double digest RAD-sequencing for 44 S. tristichus to estimate population structure and phylogeny across the Colorado Plateau. These new genomic resources provide opportunities to perform genomic scans and investigate the formation and spread of pericentric inversions in a naturally occurring hybrid zone. 
    more » « less
  4. Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are being generated for all known eukaryotic species as advocated by the Earth BioGenome Project, increasing genomic information on green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops with large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples of most species of plants, especially those taxa from extreme environments, remains one of the biggest hurdles to increasing our genomic inventory. Furthermore, the annotation of plant genomes is at present undergoing intensive improvement. It is our hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future. 
    more » « less
  5. Abstract

    Despite being quite specious (~10,000 extant species), birds have a fairly uniform genome size and karyotype (including the common occurrence of microchromosomes) relative to other vertebrate lineages. Storks (Family Ciconiidae) are a charismatic and distinct group of large wading birds with nearly worldwide distribution but few genomic resources. Here we present an annotated chromosome-level reference genome and chromosome orthology analysis for the wood stork (Mycteria americana), a species that has been federally protected under the Endangered Species Act since 1984. The annotated chromosome-level reference assembly was produced using the blood of a wild female wood stork chick, has a length of 1.35 Gb, a contig N50 of 37 Mb, a scaffold N50 of 80 Mb, and a BUSCO score of 98.8%. We identified 31 autosomal pairs and two sex chromosomes in the wood stork genome, but failed to identify four additional autosomal microchromosomes previously found via karyotyping. Orthology analyses confirmed reported synapomorphies unique to storks and identified the chromosomes participating in these fusions. This study highlights the difficulty and potential problems associated with delineating microchromosomes in reference genome assemblies. It also provides a foundation for studying karyotype evolution in the core water bird clade that includes penguins, albatrosses, storks, cormorants, herons, and ibises. Finally, our reference genome will allow for numerous genomic studies, such as genome-wide association studies of local adaptation, that will aid in wood stork conservation.

     
    more » « less