Title: The Compact Macronuclear Genome of the Ciliate Halteria grandinella: A Transcriptome-Like Genome with 23,000 Nanochromosomes
ABSTRACT Achieving protein diversity through genome and transcriptome processing is essential for organismal complexity and adaptation. The present work shows that the macronuclear genome of Halteria grandinella, a cosmopolitan unicellular eukaryote, is composed almost entirely of gene-sized nanochromosomes with extremely short nongenic regions. This challenges our usual understanding of chromosomal structure and suggests the possibility of novel mechanisms in transcriptional regulation. Comprehensive analysis of multiple data sets reveals that Halteria transcription dynamics are influenced by: (i) nonuniform nanochromosome copy numbers correlated with gene-expression level; (ii) dynamic alterations at both the DNA and RNA levels, including alternative internal eliminated sequence (IES) deletions during macronucleus formation and large-scale alternative splicing in transcript maturation; and (iii) extremely short 5′ and 3′ untranslated regions (UTRs) and universal TATA box-like motifs in the compact 5′ subtelomeric regions of most chromosomes. This study broadens the view of ciliate biology and the evolution of unicellular eukaryotes, and identifies Halteria as having one of the most compact known eukaryotic genomes, indicating that complex cell structure does not require complex gene architecture.
Award ID(s):
1927159
PAR ID:
10351089
Author(s) / Creator(s):
; ; ;
Editor(s):
Katz, Laura A.; Capone, Douglas G.
Date Published:
Journal Name:
mBio
Volume:
12
Issue:
1
ISSN:
2161-2129
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies. 
  2. The rapid evolution of repetitive DNA sequences, including satellite DNA, tandem duplications, and transposable elements, underlies phenotypic evolution and contributes to hybrid incompatibilities between species. However, repetitive genomic regions are fragmented and misassembled in most contemporary genome assemblies. We generated highly contiguous de novo reference genomes for the Drosophila simulans species complex (D. simulans, D. mauritiana, and D. sechellia), which speciated ∼250,000 yr ago. Our assemblies are comparable in contiguity and accuracy to the current D. melanogaster genome, allowing us to directly compare repetitive sequences between these four species. We find that at least 15% of the D. simulans complex species genomes fail to align uniquely to D. melanogaster owing to structural divergence—twice the number of single-nucleotide substitutions. We also find rapid turnover of satellite DNA and extensive structural divergence in heterochromatic regions, whereas the euchromatic gene content is mostly conserved. Despite the overall preservation of gene synteny, euchromatin in each species has been shaped by clade- and species-specific inversions, transposable elements, expansions and contractions of satellite and tRNA tandem arrays, and gene duplications. We also find rapid divergence among Y-linked genes, including copy number variation and recent gene duplications from autosomes. Our assemblies provide a valuable resource for studying genome evolution and its consequences for phenotypic evolution in these genetic model species.
  3. Abstract. Background: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short read sequencing technologies with highly uneven read coverages indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of short-read based assembly had significantly higher and lower coverage compared to background, respectively. Results: To understand what the causes may be for such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverages and found that high coverage regions tend to have higher simple sequence repeat and tandem gene densities compared to background regions. To determine if the high coverage regions were misassembled, we examined a recently available tomato long-read based assembly and found that 27.8% (1.41 Mb) of high coverage regions were potentially misassembled of duplicate sequences, compared to 1.4% in background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high coverage regions, we found that misassembled, high coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposon elements. Conclusions: Our study provides insights on the causes of variable coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads, and the generality of these causes and factors should be tested further in other species.
  4. Variation in gene regulation is ubiquitous, yet identifying the mechanisms producing such variation, especially for complex traits, is challenging. Snake venoms provide a model system for studying the phenotypic impacts of regulatory variation in complex traits because of their genetic tractability. Here, we sequence the genome of the Tiger Rattlesnake, which possesses the simplest and most toxic venom of any rattlesnake species, to determine whether the simple venom phenotype is the result of a simple genotype through gene loss or a complex genotype mediated through regulatory mechanisms. We generate the most contiguous snake-genome assembly to date and use this genome to show that gene loss, chromatin accessibility, and methylation levels all contribute to the production of the simplest, most toxic rattlesnake venom. We provide the most complete characterization of the venom gene-regulatory network to date and identify key mechanisms mediating phenotypic variation across a polygenic regulatory network. 
  5. Fodor, Anthony (Ed.)
    ABSTRACT Are two adjacent genes in the same operon? What are the order and spacing between several transcription factor binding sites? Genome browsers are software data visualization and exploration tools that enable biologists to answer questions such as these. In this paper, we report on a major update to our browser, Genome Explorer, that provides nearly instantaneous scaling and traversing of a genome, enabling users to quickly and easily zoom into an area of interest. The user can rapidly move between scales that depict the entire genome, individual genes, and the sequence; Genome Explorer presents the most relevant detail and context for each scale. By downloading the data for the entire genome to the user’s web browser and dynamically generating visualizations locally, we enable fine control of zoom and pan functions and real-time redrawing of the visualization, resulting in smoother and more intuitive exploration of a genome than is possible with other browsers. Further, genome features are presented together, in-line, using familiar graphical depictions. In contrast, many other browsers depict genome features using data tracks, which have low information density and can visually obscure the relative positions of features. Genome Explorer diagrams have a high information density that allows larger amounts of genome context and sequence information to be presented on a monitor of a given size than is possible with tracks-based browsers. Genome Explorer provides optional data tracks for the analysis of large-scale data sets and a unique comparative mode that aligns genomes at orthologous genes with synchronized zooming. IMPORTANCE Genome browsers provide graphical depictions of genome information to speed the uptake of complex genome data by scientists. They provide search operations to help scientists find information and zoom operations to enable scientists to view genome features at different resolutions. We introduce the Genome Explorer browser, which provides extremely fast zooming and panning of genome visualizations and displays with high information density.