

Title: Chromosome-scale assembly of the Verbenaceae species Queen’s Wreath (Petrea volubilis L.)
Abstract Objectives

Petrea volubilis, a member of the Order Lamiales and the Verbenaceae family, is an important horticultural species that has been used in traditional folk medicine. To provide a genome sequence for comparative studies within the Order Lamiales that includes important families such as Lamiaceae (mints), we generated a long-read, chromosome-scale genome assembly of this species.

Data description

Using a total of 45.5 Gb of Pacific Biosciences long-read sequence, we generated a 480.2 Mb assembly of P. volubilis, of which 93% is chromosome anchored. Representation of genic regions was robust, with 96.6% of the Benchmarking Universal Single-Copy Orthologs (BUSCO) present in the genome assembly. A total of 57.8% of the genome was annotated as repetitive sequence. Using a gene annotation pipeline that included refinement of gene models using transcript evidence, 30,982 high-confidence genes were annotated. Access to the P. volubilis genome will facilitate evolutionary studies in the Lamiales, a key order of Asterids that includes significant crop and medicinal plant species.

 
NSF-PAR ID:
10400077
Author(s) / Creator(s):
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
BMC Genomic Data
Volume:
24
Issue:
1
ISSN:
2730-6844
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  2. Abstract Background

    The increasing number of chromosome-level genome assemblies has advanced our knowledge and understanding of macroevolutionary processes. Here, we introduce the genome of the desert horned lizard, Phrynosoma platyrhinos, an iguanid lizard occupying extreme desert conditions of the American southwest. We conduct analysis of the chromosomal structure and composition of this species and compare these features across genomes of 12 other reptiles (5 species of lizards, 3 snakes, 3 turtles, and 1 bird).

    Findings

    The desert horned lizard genome was sequenced using Illumina paired-end reads and assembled and scaffolded using Dovetail Genomics Hi-C and Chicago long-range contact data. The resulting genome assembly has a total length of 1,901.85 Mb, scaffold N50 length of 273.213 Mb, and includes 5,294 scaffolds. The chromosome-level assembly is composed of 6 macrochromosomes and 11 microchromosomes. A total of 20,764 genes were annotated in the assembly. GC content and gene density are higher for microchromosomes than macrochromosomes, while repeat element distributions show the opposite trend. Pathway analyses provide preliminary evidence that microchromosome and macrochromosome gene content are functionally distinct. Synteny analysis indicates that large microchromosome blocks are conserved among closely related species, whereas macrochromosomes show evidence of frequent fusion and fission events among reptiles, even between closely related species.

    Conclusions

    Our results demonstrate dynamic karyotypic evolution across Reptilia, with frequent inferred splits, fusions, and rearrangements that have resulted in shuffling of chromosomal blocks between macrochromosomes and microchromosomes. Our analyses also provide new evidence for distinct gene content and chromosomal structure between microchromosomes and macrochromosomes within reptiles.

     
  3. Abstract Background

    The maize inbred line A188 is an attractive model for elucidation of gene function and improvement due to its high embryogenic capacity and many contrasting traits to the first maize reference genome, B73, and other elite lines. The lack of a genome assembly of A188 limits its use as a model for functional studies.

    Results

    Here, we present a chromosome-level genome assembly of A188 using long reads and optical maps. Comparison of A188 with B73 using both whole-genome alignments and read depths from sequencing reads identifies approximately 1.1 Gb of syntenic sequences as well as extensive structural variation, including a 1.8-Mb duplication containing the Gametophyte factor1 locus for unilateral cross-incompatibility, and six inversions of 0.7 Mb or greater. Increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 is associated with elevated expression during seed development. High ccd1 expression in seeds together with low expression of yellow endosperm 1 (y1) reduces carotenoid accumulation, accounting for the white seed phenotype of A188. Furthermore, transcriptome and epigenome analyses reveal enhanced expression of defense pathways and altered DNA methylation patterns of the embryonic callus.

    Conclusions

    The A188 genome assembly provides a high-resolution sequence for a complex genome species and a foundational resource for analyses of genome variation and gene function in maize. The genome, in comparison to B73, contains extensive intra-species structural variations and other genetic differences. Expression and network analyses identify discrete profiles for embryonic callus and other tissues.

     
  4. Pyhäjärvi, T (Ed.)
    Abstract Blackberries (Rubus spp.) are the fourth most economically important berry crop worldwide. Genome assemblies and annotations have been developed for Rubus species in subgenus Idaeobatus, including black raspberry (R. occidentalis), red raspberry (R. idaeus), and R. chingii, but very few genomic resources exist for blackberries and their relatives in subgenus Rubus. Here we present a chromosome-length assembly and annotation of the diploid blackberry germplasm accession “Hillquist” (R. argutus). “Hillquist” is the only known source of primocane-fruiting (annual-fruiting) in tetraploid fresh-market blackberry breeding programs and is represented in the pedigree of many important cultivars worldwide. The “Hillquist” assembly, generated using Pacific Biosciences long reads scaffolded with high-throughput chromosome conformation capture sequencing, consisted of 298 Mb, of which 270 Mb (90%) was placed on 7 chromosome-length scaffolds with an average length of 38.6 Mb. Approximately 52.8% of the genome was composed of repetitive elements. The genome sequence was highly collinear with a novel maternal haplotype-resolved linkage map of the tetraploid blackberry selection A-2551TN and genome assemblies of R. chingii and red raspberry. A total of 38,503 protein-coding genes were predicted, of which 72% were functionally annotated. Eighteen flowering gene homologs within a previously mapped locus aligning to an 11.2 Mb region on chromosome Ra02 were identified as potential candidate genes for primocane-fruiting. The utility of the “Hillquist” genome has been demonstrated here by the development of the first genotyping-by-sequencing-based linkage map of tetraploid blackberry and the identification of possible candidate genes for primocane-fruiting. This chromosome-length assembly will facilitate future studies in Rubus biology, genetics, and genomics and strengthen applied breeding programs. 
  5. Sharakhov, Igor V. (Ed.)
    Rubus idaeus L. (red raspberry) is a perennial woody plant species of the Rosaceae family that is widely cultivated in the temperate regions of the world and is thus an economically important soft fruit species. It is prized for its flavour and aroma, as well as a high content of healthful compounds such as vitamins and antioxidants. Breeding programs exist globally for red raspberry, but variety development is a long and challenging process. Genomic and molecular tools for red raspberry are valuable resources for breeding. Here, a chromosome-length genome sequence assembly and related gene predictions for the red raspberry cultivar ‘Anitra’ are presented, comprising PacBio long read sequencing scaffolded using Hi-C sequence data. The assembled genome sequence totalled 291.7 Mbp, with 247.5 Mbp (84.8%) incorporated into seven sequencing scaffolds with an average length of 35.4 Mbp. A total of 39,448 protein-coding genes were predicted, 75% of which were functionally annotated. The seven chromosome scaffolds were anchored to a previously published genetic linkage map with a high degree of synteny, and comparisons to genomes of closely related species within the Rosoideae revealed chromosome-scale rearrangements that have occurred over relatively short evolutionary periods. A chromosome-level genomic sequence of R. idaeus will be a valuable resource for understanding genome structure and function in red raspberry and a useful resource for researchers and plant breeders.