

Title: An improved reference of the grapevine genome reasserts the origin of the PN40024 highly homozygous genotype
Abstract The genome sequence of the diploid and highly homozygous Vitis vinifera genotype PN40024 serves as the reference for many grapevine studies. Despite several improvements to the PN40024 genome assembly, its current version, PN12X.v2, is quite fragmented and represents only the haploid state of the genome, with mixed haplotypes. Although nearly homozygous, this genome contains several heterozygous regions that have yet to be resolved. Taking advantage of the ability of long-read sequencing technologies to fully discriminate haplotype sequences, an improved version of the reference, called PN40024.v4, was generated. By incorporating long genomic sequencing reads into the assembly, the contiguity of the 12X.v2 scaffolds was greatly increased: the total number of scaffolds decreased from 2,059 to 640, and the number of N bases was reduced by 88%. Additionally, the full alternative haplotype sequence was built for the first time, chromosome anchoring was improved, and the number of unplaced scaffolds was halved. To obtain a high-quality gene annotation that outperforms previous versions, a liftover approach was complemented with an optimized annotation workflow for Vitis. Integration of the gene reference catalogue and its manual curation also helped improve the annotation, yielding the most reliable estimate to date of 35,230 genes. Finally, we demonstrated that PN40024 resulted from 9 selfings of cv. “Helfensteiner” (a cross of cv. “Pinot noir” and “Schiava grossa”) rather than from “Pinot noir” alone. These advances will help maintain the PN40024 genome as a gold-standard reference and will also contribute toward the eventual construction of the grapevine pangenome.
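The finding that nine selfings of “Helfensteiner” produced a nearly homozygous genotype can be set against the textbook expectation for repeated self-fertilization. This is an idealized sketch, not the authors' analysis: it assumes neutral, independently segregating loci, no selection, and that each selfing halves the expected heterozygosity. If H_0 is the heterozygosity of the founder and H_t that after t selfings:

```latex
H_t = H_0 \left(\tfrac{1}{2}\right)^{t},
\qquad
H_9 = \frac{H_0}{2^{9}} = \frac{H_0}{512} \approx 0.002\,H_0
```

Under these assumptions, roughly 0.2% of loci that were heterozygous in “Helfensteiner” would be expected to remain heterozygous after nine selfings, and about 99.8% would be expected to have become homozygous. Real pedigrees can depart from this benchmark, for example under selection. It is therefore only a rough reference point for the residual heterozygous regions described in the abstract.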
Award ID(s): 1950621
PAR ID: 10497413
Editor(s): Whiteman, N
Publisher / Repository: Oxford University Press
Journal Name: G3: Genes, Genomes, Genetics
Volume: 13
Issue: 5
ISSN: 2160-1836
Sponsoring Org: National Science Foundation
More like this
  1. Ingvarsson, P (Ed.)
    Abstract Eucalyptus grandis is a hardwood tree used worldwide, as a pure species or as a hybrid partner, to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials, and biorefinery products. The current v2.0 genome reference for the species served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The 2 haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp), respectively. These haplotype assemblies are 70–90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472, respectively, contain long introns (>10 kbp) in gene models previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness from 93.8% to 99.4% in the v4.0 assembly. We found that 6,493 and 6,346 genes are within tandem duplicate arrays (HAP1 and HAP2, respectively, 18.4% and 17.8% of the total) and that >43.8% of the haplotype assemblies consist of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pangenome variation in the species.
  2. Holland, J. (Ed.)
    Abstract Douglas-fir (Pseudotsuga menziesii) is native to western North America. It grows in a wide range of environmental conditions and is an important timber tree. Although there are several studies on the gene expression responses of Douglas-fir to abiotic cues, the absence of high-quality transcriptome and genome data is a barrier to further investigation. As for most conifers, the available transcriptome and genome reference datasets for Douglas-fir remain fragmented and require refinement. We aimed to generate a highly accurate and complete reference transcriptome and genome annotation. Using long-read (LR) and short-read (SR) sequencing platforms, we deep-sequenced the transcriptome of Douglas-fir needles from seedlings grown either under nonstress control conditions or under combined heat and drought stress. We used 2 computational approaches, namely de novo and genome-guided LR transcriptome assembly. Using the LR de novo assembly, we identified 1.3X more high-quality transcripts, 1.85X more “complete” genes, and 2.7X more functionally annotated genes compared to the genome-guided assembly approach. We predicted 666 long noncoding RNAs and 12,778 unique protein-coding transcripts, including 2,016 putative transcription factors. We combined the LR de novo assembled transcriptome with paired-end SR data and a published single-end SR transcriptome to generate an improved genome annotation. The annotation was generated with BRAKER2 and refined based on functional annotation, repetitive content, and transcriptome alignment. This high-quality genome annotation has 51,419 unique gene models derived from 322,631 initial predictions. Overall, our informatics approach provides a new reference Douglas-fir transcriptome assembly and genome annotation with considerably improved completeness and functional annotation.
  3. Harris, T (Ed.)
    Abstract Potato is a key food crop with a complex, polyploid genome. Advancements in sequencing technologies coupled with improvements in genome assembly algorithms have enabled generation of phased, chromosome-scale genome assemblies for cultivated tetraploid potato. The SpudDB database houses potato genome sequence and annotation, with the doubled monoploid DM 1–3 516 R44 (hereafter DM) genome serving as the reference genome and haplotype. Diverse annotation data types for DM genes are provided through a suite of Gene Report Pages including gene expression profiles across 438 potato samples. To further annotate potato genes based on expression, 65 gene co-expression modules were constructed that permit the identification of tightly co-regulated genes within DM across development and responses to wounding, abiotic stress, and biotic stress. Genome browser views of DM and 28 other potato genomes are provided along with a download page for genome sequence and annotation. To link syntenic genes within and between haplotypes, syntelogs were identified across 25 cultivated potato genomes. Through access to potato genome sequences and associated annotations, SpudDB can enable potato biologists, geneticists, and breeders to continue to improve this key food crop. 
  4.
    Legumes are of great interest for sustainable agricultural production because they fix atmospheric nitrogen, which improves the soil. Medicago truncatula is a well-established model legume, and extensive studies in fundamental molecular, physiological, and developmental biology have been undertaken to translate into trait improvements in economically important legume crops worldwide. However, the M. truncatula reference genome was generated from accession Jemalong A17, which is highly recalcitrant to transformation. M. truncatula R108 is more attractive for genetic studies due to its high transformation efficiency and its Tnt1-insertion population resource for functional genomics. Accurate synteny analysis and comprehensive genome-scale comparisons require a chromosome-length genome assembly for M. truncatula cv. R108. Here, we performed in situ Hi-C (48×) to anchor, order, and orient scaffolds and to correct contig misjoins in a previously published genome assembly (R108 v1.0), resulting in an improved genome assembly containing eight chromosome-length scaffolds that span 97.62% of the sequenced bases in the input assembly. The long-range physical information generated using Hi-C allowed us to obtain a chromosome-length ordering of the genome assembly, better validate misjoins in the previous draft, and gain further insight for accurately predicting synteny between A17 and R108 regions corresponding to the known chromosome 4/8 translocation. Furthermore, mapping the Tnt1 insertion landscape onto this reference assembly provides an important resource for M. truncatula functional genomics by supporting efficient mutant gene identification in Tnt1 insertion lines. Our data provide a much-needed foundational resource that supports functional and molecular research into the Leguminosae for sustainable agriculture and feeding the future.
    The western painted turtle, Chrysemys picta bellii, has the greatest tolerance to anoxia of any tetrapod studied to date. These turtles reside in the northern United States and southern Canada and survive months of anoxia while submerged in ice-locked ponds and bogs. Reference genomes provide an important resource for elucidating the molecular bases of such unique physiological traits. An initial reference genome for this species was published in 2013, but the assembly is highly fragmented, which poses several limitations for downstream analyses and biological interpretation. In this study, we combined PacBio HiFi, 10x Genomics Chromium, and Hi-C sequence data with BioNano optical mapping, all derived from a single individual, to generate a new haplotype-resolved chromosome-level assembly for C. picta bellii, called SLU_Cpb5.0. The primary assembly is 2.372 Gb in size with a scaffold N50 of 133.6 Mb, a 6.5-fold improvement over the existing assembly. Genome annotation of SLU_Cpb5.0 revealed 12,242 novel genes compared to previous assemblies. Our PacBio Iso-Seq RNA sequencing data for twelve tissues revealed over 100,000 novel transcript isoforms and 4,325 novel genes that were not annotated by the standard NCBI pipeline. We also observed distinct patterns of tissue-specific isoform expression, creating a robust foundation for future characterization of the functions of these genes. The improved genome assembly and annotation will facilitate comparative genomics studies to better understand the genetic basis of C. picta bellii's extreme physiological adaptations and other aspects of its biology.
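Several of the related abstracts report assembly contiguity as N50: item 1 gives contig N50 values, and item 5 gives a scaffold N50. The following minimal Python sketch shows how that statistic is usually computed. It assumes the common definition: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The function name `n50` and the example lengths are hypothetical and are not taken from any of the studies.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (e.g. in kbp); total is 290, half is 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 80 + 70 = 150 >= 145, so prints 70
```

Sorting from longest to shortest and stopping at the first sequence whose cumulative length reaches half the total is the standard way this metric is described. Because the result depends on the total length used as the denominator, N50 values are only directly comparable between assemblies of similar size.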