skip to main content

Title: CPGView : A package for visualizing detailed chloroplast genome structures

Chloroplast genomes have been widely used in studying plant phylogeny and evolution. Several chloroplast genome visualization tools have been developed to display the distribution of genes on the genome. However, these tools do not draw features, such as exons, introns, repetitive elements, and variable sites, disallowing in‐depth examination of the genome structures. Here, we developed and validated a software package called Chloroplast Genome Viewers (CPGView). CPGView can draw three maps showing (i) the distributions of genes, variable sites, and repetitive sequences, including microsatellites, tandem and dispersed repeats; (ii) the structure of the cis‐splicing genes after adjusting the exon‐intron boundary positions using a coordinate scaling algorithm, and (iii) the structure of the trans‐splicing generps12. To test the accuracy of CPGView, we sequenced, assembled, and annotated 31 chloroplast genomes from 31 genera of 22 families. CPGView drew maps correctly for all the 31 chloroplast genomes. Lastly, we used CPGView to examine 5998 publicly released chloroplast genomes from 2513 genera of 553 families. CPGView succeeded in plotting maps for 5882 but failed to plot maps for 116 chloroplast genomes. Further examination showed that the annotations of these 116 genomes had various errors needing manual correction. The test on newly generated data and publicly available data demonstrated the ability of CPGView to identify errors in the annotations of chloroplast genomes. CPGView will become a widely used tool to study the detailed structure of chloroplast genomes. The web version of CPGView can be accessed from

more » « less
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Date Published:
Journal Name:
Molecular Ecology Resources
Page Range / eLocation ID:
p. 694-704
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx. 
    more » « less

    Plastids contain their own genomes, which are transcribed by two types of RNA polymerases. One of those enzymes is a bacterial‐type, multi‐subunit polymerase encoded by the plastid genome. The plastid‐encoded RNA polymerase (PEP) is required for efficient expression of genes encoding proteins involved in photosynthesis. Despite the importance of PEP, its DNA binding locations have not been studied on the genome‐wide scale at high resolution. We established a highly specific approach to detect the genome‐wide pattern of PEP binding to chloroplast DNA using plastid chromatin immunoprecipitation–sequencing (ptChIP‐seq). We found that in matureArabidopsis thalianachloroplasts, PEP has a complex DNA binding pattern with preferential association at genes encoding rRNA, tRNA, and a subset of photosynthetic proteins. Sigma factors SIG2 and SIG6 strongly impact PEP binding to a subset of tRNA genes and have more moderate effects on PEP binding throughout the rest of the genome. PEP binding is commonly enriched on gene promoters, around transcription start sites. Finally, the levels of PEP binding to DNA are correlated with levels of RNA accumulation, which demonstrates the impact of PEP on chloroplast gene expression. Presented data are available through a publicly available Plastid Genome Visualization Tool (Plavisto) at

    more » « less
  3. Summary Open Research Badges

    This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at;

    more » « less
  4. Abstract Background

    Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative-splicing events exacerbates such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes.


    In this study, we first provide a pipeline to generate a set of the simulated benchmark transcriptome and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches including both de novo and genome-guided methods. The results showed that the assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or for genome-guided methods when the reference is not available from the same genome. To improve the transcriptome assembly performance, leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble.


    Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any de novo assemblers we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy both for de novo and genome-guided assemblers can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from:

    more » « less
  5. Heřmanský–Pudlák syndrome (HPS), a rare autosomal recessive disorder, manifests with oculocutaneous albinism and a bleeding diathesis. However, severity of disease can be variable and is typically related to the genetic subtype of HPS; HPS type 6 (HPS‐6) is an uncommon subtype generally associated with mild disease. A Caucasian adult female presented with a history of severe bleeding; ophthalmologic examination indicated occult oculocutaneous albinism. The patient was diagnosed with a platelet storage pool disorder, and platelet whole mount electron microscopy demonstrated absent delta granules. Genome‐wide SNP analysis showed regions of homozygosity that included theHPS1andHPS6genes. Full lengthHPS1transcript was amplified by PCR of genomic DNA. Targeted next‐generation sequencing identified a novel homozygous missense variant inHPS6(c.383 T > C; p.V128A); this was associated with significantly reducedHPS6mRNA and protein expression in the patient's fibroblasts compared to control cells. These findings highlight the variable severity of disease manifestations in patients with HPS, and illustrate that HPS can be diagnosed in patients with excessive bleeding and occult oculocutaneous albinism. Genetic analysis and platelet electron microscopy are useful diagnostic tests in evaluating patients with suspected HPS.

    Clinical Trial registration:

    Registration Numbers: NCT00001456 and NCT00084305.

    more » « less