
This content will become publicly available on October 20, 2023

Title: An Improved 1.5-Gigabase Draft Assembly of Massospora cicadina (Zoopagomycota), an Obligate Fungal Parasite of 13- and 17-Year Cicadas
ABSTRACT A 1.488-Gb draft genome sequence was assembled for the fungus Massospora cicadina, an obligate parasite of periodical cicadas. The M. cicadina genome has experienced massive expansion via transposable elements (TEs), which account for 92% of the genome.
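As a rough sense of scale, here is simple arithmetic on the figures above. It is not an additional result from the authors, and it assumes that the 92% figure applies to the full 1.488-Gb assembled length. On that basis, TE-derived sequence would account for roughly 1.37 Gb, which leaves only about 0.12 Gb of non-TE sequence:

```latex
0.92 \times 1.488\ \text{Gb} \approx 1.37\ \text{Gb (TE-derived)}, \qquad
1.488\ \text{Gb} - 1.37\ \text{Gb} \approx 0.12\ \text{Gb (non-TE)}
```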
Award ID(s):
1441715 1429826 2215705
Author(s) / Creator(s):
Rokas, Antonis
Date Published:
Journal Name:
Microbiology Resource Announcements
Medium: X
Sponsoring Org:
National Science Foundation
More Like this

  1. Plant genome size ranges widely, providing many opportunities to examine how genome size variation affects plant form and function. We analyzed trends in chromosome number, genome size, and leaf traits for the woody angiosperm clade Viburnum to examine the evolutionary associations, functional implications, and possible drivers of genome size.


    Chromosome counts and genome size estimates were mapped onto a Viburnum phylogeny to infer the location and frequency of polyploidization events and trends in genome size evolution. Genome size was analyzed with leaf anatomical and physiological data to evaluate the influence of genome size on plant function.


    We discovered nine independent polyploidization events, two reductions in base chromosome number, and substantial variation in genome size with a slight trend toward genome size reduction in polyploids. We did not find strong relationships between genome size and the functional and morphological traits that have been highlighted at broader phylogenetic scales.


    Polyploidization events were sometimes associated with rapid radiations, demonstrating that polyploid lineages can be highly successful. Relationships between genome size and plant physiological function observed at broad phylogenetic scales may be largely irrelevant to the evolutionary dynamics of genome size at smaller scales. The view that plants readily tolerate changes in ploidy and genome size, and often do so, appears to apply to Viburnum.

  2. INTRODUCTION One of the central applications of the human reference genome has been to serve as a baseline for comparison in nearly all human genomic studies. Unfortunately, many difficult regions of the reference genome have remained unresolved for decades and are affected by collapsed duplications, missing sequences, and other issues. Relative to the current human reference genome, GRCh38, the Telomere-to-Telomere CHM13 (T2T-CHM13) genome closes all remaining gaps, adds nearly 200 million base pairs (Mbp) of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for scientific inquiry. RATIONALE We demonstrate how the T2T-CHM13 reference genome universally improves read mapping and variant identification in a globally diverse cohort. This cohort includes all 3202 samples from the expanded 1000 Genomes Project (1KGP), sequenced with short reads, as well as 17 globally diverse samples sequenced with long reads. By applying state-of-the-art methods for calling single-nucleotide variants (SNVs) and structural variants (SVs), we document the strengths and limitations of T2T-CHM13 relative to its predecessors and highlight its promise for revealing new biological insights within technically challenging regions of the genome. RESULTS Across the 1KGP samples, we found more than 1 million additional high-quality variants genome-wide using T2T-CHM13 than with GRCh38. Within previously unresolved regions of the genome, we identified hundreds of thousands of variants per sample—a promising opportunity for evolutionary and biomedical discovery. T2T-CHM13 improves the Mendelian concordance rate among trios and eliminates tens of thousands of spurious SNVs per sample, including a reduction of false positives in 269 challenging, medically relevant genes by up to a factor of 12. 
These corrections are in large part due to improvements to 70 protein-coding genes in >9 Mbp of inaccurate sequence caused by falsely collapsed or duplicated regions in GRCh38. Using the T2T-CHM13 genome also yields a more comprehensive view of SVs genome-wide, with a greatly improved balance of insertions and deletions. Finally, by providing numerous resources for T2T-CHM13 (including 1KGP genotypes, accessibility masks, and prominent annotation databases), our work will facilitate the transition to T2T-CHM13 from the current reference genome. CONCLUSION The vast improvements in variant discovery across samples of diverse ancestries position T2T-CHM13 to succeed as the next prevailing reference for human genetics. T2T-CHM13 thus offers a model for the construction and study of high-quality reference genomes from globally diverse individuals, such as is now being pursued through collaboration with the Human Pangenome Reference Consortium. As a foundation, our work underscores the benefits of an accurate and complete reference genome for revealing diversity across human populations. Genomic features and resources available for T2T-CHM13. Comparisons to GRCh38 reveal broad improvements in SNVs, indels, and SVs discovered across diverse human populations by means of short-read (1KGP) and long-read sequencing (LRS). These improvements are due to resolution of complex genomic loci (nonsyntenic and previously unresolved), duplication errors, and discordant haplotypes, including those in medically relevant genes. 
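The Mendelian concordance rate mentioned above is, in general terms, the fraction of sites where a child's genotype could have been inherited from its two parents. The Python sketch below illustrates that idea for simple diploid genotypes encoded as allele-index pairs. It is a hypothetical illustration and not the pipeline used in the study. The function names `is_mendelian_consistent` and `mendelian_concordance` are ours. The sketch ignores missing calls, phasing, sex chromosomes, and de novo mutations.

```python
from itertools import product
from typing import Iterable, Tuple

# A diploid genotype as a pair of allele indices, e.g. (0, 1) for a heterozygote.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can carry one allele from each parent."""
    target = sorted(child)
    # Try every combination of one maternal and one paternal allele.
    for m_allele, f_allele in product(mother, father):
        if sorted((m_allele, f_allele)) == target:
            return True
    return False


def mendelian_concordance(trio_sites: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) sites that are Mendelian-consistent."""
    sites = list(trio_sites)
    if not sites:
        raise ValueError("no sites to evaluate")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in sites)
    return consistent / len(sites)


# Hypothetical toy trio data, for illustration only.
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent
    ((1, 1), (0, 1), (0, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # violation: the mother cannot pass allele 1
    ((0, 0), (0, 1), (0, 0)),  # consistent
]
print(mendelian_concordance(sites))  # 0.75
```

Under this definition, a spurious variant call in a falsely collapsed region tends to appear as a Mendelian violation, which is why a better reference can raise the concordance rate.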
  3. Abstract Background Dalbergia odorifera is an economically and culturally important species in the Fabaceae because of the high-quality lumber and traditional Chinese medicines made from this plant; however, overexploitation has made D. odorifera increasingly scarce. Given the rarity and the multiple uses of this species, it is important to expand its genomic resources for applications such as tracking illegal logging, determining the effective population size of wild stands, delineating pedigrees in marker-assisted breeding programs, and resolving gene networks in functional genomics studies. Although the nuclear and chloroplast genomes of D. odorifera have been published, its complete mitochondrial genome had not been assembled or assessed for sequence transfer to other genomic compartments until now. Such work is essential for understanding structural and functional genome evolution in a lineage (Fabaceae) with frequent intergenomic sequence transfers. Results We integrated Illumina short reads and PacBio CLR long reads to assemble and annotate the complete mitochondrial genome of D. odorifera. The mitochondrial genome was organized as a single circular structure of 435 kb containing 33 protein-coding genes, 4 rRNA genes, and 17 tRNA genes. Nearly 4.0% (17,386 bp) of the genome was annotated as repetitive DNA. The sequence transfer analysis showed that 114 kb of DNA originating from the mitochondrial genome has been transferred to the nuclear genome, with most of the transfer events having taken place relatively recently. Sequence transfer from the mitochondrial to the nuclear genome occurred at a high frequency, similar to that of transfer from the chloroplast to the nuclear genome. Conclusion In this study, the complete mitochondrial genome of D. odorifera was assembled for the first time, providing a baseline resource for understanding genome evolution in the highly speciose Fabaceae.
In particular, the assessment of intergenomic sequence transfer suggests that transfers have been common and recent, indicating a possible role in environmental adaptation, as has been found in other lineages. The high turnover rate of genomic collinearity and the large differences in mitochondrial genome size found in the comparative analyses herein provide evidence for the rapid evolution of mitochondrial genome structure compared with chloroplasts in Faboideae, whereas phylogenetic analyses using functional genes indicate that mitochondrial genes evolve very slowly compared with chloroplast genes.
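The reported repeat fraction can be checked against the stated genome length. This check treats 435 kb as 435,000 bp, and that length is itself a rounded value, so the result is only approximate:

```latex
\frac{17{,}386\ \text{bp}}{435{,}000\ \text{bp}} \approx 0.0400 \approx 4.0\%
```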
  4. Mitchell, Aaron P. (Ed.)
    ABSTRACT Candida albicans is an opportunistic fungal pathogen of humans that is typically diploid yet has a highly labile genome tolerant of large-scale perturbations, including chromosomal aneuploidy and loss-of-heterozygosity events. The ability to rapidly generate genetic variation is crucial for C. albicans to adapt to changing or stressful environments, like those encountered in the host. Genetic variation occurs via stress-induced mutagenesis or can be generated through its parasexual cycle, in which tetraploids arise via diploid mating or stress-induced mitotic defects and undergo nonmeiotic ploidy reduction. However, it remains largely unknown how genetic background contributes to C. albicans genome instability in vitro or in the host environment. Here, we tested how genetic background, ploidy, and the host environment impact C. albicans genome stability. We found that host association induced both loss-of-heterozygosity events and genome size changes, regardless of genetic background or ploidy. However, the magnitude and types of genome changes varied across C. albicans strain backgrounds and ploidy states. We then assessed whether host-induced genomic changes had fitness consequences for growth rate and nonlethal virulence phenotypes and found that many host-derived isolates changed significantly relative to their parental strain. Interestingly, diploid host-associated C. albicans predominantly decreased host reproductive fitness, whereas tetraploid host-associated C. albicans increased host reproductive fitness. Together, these results are important for understanding how host-induced genomic changes in C. albicans alter its relationship with the host.
    IMPORTANCE Candida albicans is an opportunistic fungal pathogen of humans. The ability to generate genetic variation is essential for adaptation, and C. albicans and other fungal pathogens generate such variation in part by changing their genome size. Stressful environments, including the host, induce C. albicans genome instability. Here, we investigated how C. albicans genetic background and ploidy state impact genome instability, both in vitro and in a host environment. We show that the host environment induces genome instability, but the magnitude depends on C. albicans genetic background. Furthermore, we show that tetraploid C. albicans is highly unstable in host environments and rapidly reduces its genome size. These reductions in genome size often resulted in reduced virulence. In contrast, diploid C. albicans displayed modest host-induced genome size changes, yet these frequently resulted in increased virulence. Such studies are essential for understanding how opportunistic pathogens respond and potentially adapt to the host environment.
  5. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in the human reference genome GRCh38 owing to technological limitations at the time of its development. The lack of linear sequence in these regions, particularly in centromeres, made it impossible to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome makes it possible to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations.
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we show that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and nonhuman primates while also providing high-confidence annotations of retroelement transduction events.
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx. 
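The abstract relates nascent transcription to CpG density but does not say how that metric was defined. The Python sketch below assumes two common definitions. The first is CpG density, the number of CG dinucleotides per dinucleotide position. The second is the observed/expected CpG ratio, (CpG count × length) / (C count × G count). The helper name `cpg_stats` is hypothetical, and the sketch is not the authors' method.

```python
from typing import NamedTuple


class CpGStats(NamedTuple):
    cpg_count: int      # number of CG dinucleotides on the given strand
    cpg_density: float  # CpGs per dinucleotide position (length - 1)
    obs_exp: float      # observed/expected CpG ratio
    gc_fraction: float  # (C + G) / length


def cpg_stats(seq: str) -> CpGStats:
    """Compute simple CpG metrics for one DNA sequence (hypothetical helper)."""
    s = seq.upper()
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg = s.count("CG")
    density = cpg / (n - 1)
    # Expected CpG count if C and G were placed independently is (C * G) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return CpGStats(cpg, density, obs_exp, (c + g) / n)


stats = cpg_stats("ACGTCGAT")
print(stats.cpg_count, stats.obs_exp, stats.gc_fraction)  # 2 4.0 0.5
```

In practice, metrics like these are usually computed per repeat copy or in windows along the assembly. The choice of window or annotation unit is a further assumption that this sketch does not make.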