Title: GrameneOryza: a comprehensive resource for Oryza genomes, genetic variation, and functional data
Abstract Rice is a vital staple crop, sustaining over half of the global population, and is a key model for genetic research. To support the growing need for comprehensive and accessible rice genomic data, GrameneOryza (https://oryza.gramene.org) was developed as an online resource adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) principles of data management. It distinguishes itself through its comprehensive multispecies focus, encompassing a wide variety of Oryza genomes and related species, and its integration with FAIR principles to ensure data accessibility and usability. It offers a community-curated selection of high-quality Oryza genomes, genetic variation, gene function, and trait data. The latest release, version 8, includes 28 Oryza genomes, covering wild rice and domesticated cultivars. These genomes, along with Leersia perrieri and seven additional outgroup species, form the basis for 38K protein-coding gene family trees, essential for identifying orthologs, paralogs, and developing pan-gene sets. GrameneOryza’s genetic variation data features 66 million single-nucleotide variants (SNVs) anchored to the Os-Nipponbare-Reference-IRGSP-1.0 genome, derived from various studies, including the Rice Genome 3K (RG3K) project. The RG3K sequence reads were also mapped to seven additional platinum-quality Asian rice genomes, resulting in 19 million SNVs for each genome, significantly expanding the coverage of genetic variation beyond the Nipponbare reference. Of the 66 million SNVs on IRGSP-1.0, 27 million acquired standardized reference SNP cluster identifiers (rsIDs) from the European Variation Archive release v5. Additionally, 1200 distinct phenotypes provide a comprehensive overview of quantitative trait loci (QTL) features. The newly introduced Oryza CLIMtools portal offers insights into environmental impacts on genome adaptation.
The platform’s integrated search interface, along with a BLAST server and curation tools, facilitates user access to genomic, phylogenetic, gene function, and QTL data, supporting broad research applications. Database URL: https://oryza.gramene.org
Award ID(s):
1950621 2029854 2122357 2122358
PAR ID:
10620775
Publisher / Repository:
Oxford University Press
Journal Name:
Database
Volume:
2025
ISSN:
1758-0463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Plant Reactome (https://plantreactome.gramene.org) is a freely accessible, comprehensive plant pathway knowledgebase. It provides curated reference pathways from rice (Oryza sativa) and gene-orthology-based pathway projections to 129 additional species, spanning single-cell photoautotrophs, non-vascular plants, and higher plants, thus encompassing a wide-ranging taxonomic diversity. Currently, Plant Reactome houses a collection of 339 reference pathways, covering metabolic and transport pathways, hormone signaling, genetic regulations of developmental processes, and intricate transcriptional networks that orchestrate a plant's response to abiotic and biotic stimuli. Beyond being a mere repository, Plant Reactome serves as a dynamic data discovery platform. Users can analyze and visualize omics data, such as gene expression, gene-gene interaction, proteome, and metabolome data, all within the rich context of plant pathways. Plant Reactome is dedicated to fostering data interoperability, upholding global data standards, and embracing the tenets of the Findable, Accessible, Interoperable and Re-usable (FAIR) data policy. 
  2. Abstract The genetic basis of traits shapes and constrains how adaptation proceeds in nature; rapid adaptation can proceed using stores of polygenic standing genetic variation or hard selective sweeps, and increasing polygenicity fuels genetic redundancy, reducing gene re-use (genetic convergence). Guppy life history traits evolve rapidly and convergently among natural high- and low-predation environments in northern Trinidad. This system has been studied extensively at the phenotypic level, but little is known about the underlying genetic architecture. Here, we use four independent F2 QTL crosses to examine the genetic basis of seven (five female, two male) guppy life history phenotypes and discuss how these genetic architectures may facilitate or constrain rapid adaptation and convergence. We use RAD-sequencing data (16,539 SNPs) from 370 male and 267 female F2 individuals. We perform linkage mapping, estimates of genome-wide and per-chromosome heritability (multi-locus associations), and QTL mapping (single-locus associations). Our results are consistent with architectures of many loci of small-effect for male age and size at maturity and female interbrood period. Male trait associations are clustered on specific chromosomes, but female interbrood period exhibits a weak genome-wide signal suggesting a potentially highly polygenic component. Offspring weight and female size at maturity are also associated with a single significant QTL each. These results suggest rapid, repeatable phenotypic evolution of guppies may be facilitated by polygenic trait architectures, but subsequent genetic redundancy may limit gene re-use across populations, in agreement with an absence of strong signatures of genetic convergence from recent analyses of wild guppies. 
  3. INTRODUCTION One of the central applications of the human reference genome has been to serve as a baseline for comparison in nearly all human genomic studies. Unfortunately, many difficult regions of the reference genome have remained unresolved for decades and are affected by collapsed duplications, missing sequences, and other issues. Relative to the current human reference genome, GRCh38, the Telomere-to-Telomere CHM13 (T2T-CHM13) genome closes all remaining gaps, adds nearly 200 million base pairs (Mbp) of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for scientific inquiry. RATIONALE We demonstrate how the T2T-CHM13 reference genome universally improves read mapping and variant identification in a globally diverse cohort. This cohort includes all 3202 samples from the expanded 1000 Genomes Project (1KGP), sequenced with short reads, as well as 17 globally diverse samples sequenced with long reads. By applying state-of-the-art methods for calling single-nucleotide variants (SNVs) and structural variants (SVs), we document the strengths and limitations of T2T-CHM13 relative to its predecessors and highlight its promise for revealing new biological insights within technically challenging regions of the genome. RESULTS Across the 1KGP samples, we found more than 1 million additional high-quality variants genome-wide using T2T-CHM13 than with GRCh38. Within previously unresolved regions of the genome, we identified hundreds of thousands of variants per sample—a promising opportunity for evolutionary and biomedical discovery. T2T-CHM13 improves the Mendelian concordance rate among trios and eliminates tens of thousands of spurious SNVs per sample, including a reduction of false positives in 269 challenging, medically relevant genes by up to a factor of 12. These corrections are in large part due to improvements to 70 protein-coding genes in >9 Mbp of inaccurate sequence caused by falsely collapsed or duplicated regions in GRCh38. Using the T2T-CHM13 genome also yields a more comprehensive view of SVs genome-wide, with a greatly improved balance of insertions and deletions. Finally, by providing numerous resources for T2T-CHM13 (including 1KGP genotypes, accessibility masks, and prominent annotation databases), our work will facilitate the transition to T2T-CHM13 from the current reference genome. CONCLUSION The vast improvements in variant discovery across samples of diverse ancestries position T2T-CHM13 to succeed as the next prevailing reference for human genetics. T2T-CHM13 thus offers a model for the construction and study of high-quality reference genomes from globally diverse individuals, such as is now being pursued through collaboration with the Human Pangenome Reference Consortium. As a foundation, our work underscores the benefits of an accurate and complete reference genome for revealing diversity across human populations. Genomic features and resources available for T2T-CHM13. Comparisons to GRCh38 reveal broad improvements in SNVs, indels, and SVs discovered across diverse human populations by means of short-read (1KGP) and long-read sequencing (LRS). These improvements are due to resolution of complex genomic loci (nonsyntenic and previously unresolved), duplication errors, and discordant haplotypes, including those in medically relevant genes.
  4. Summary Eukaryotic genomes harbor many forms of variation, including nucleotide diversity and structural polymorphisms, which experience natural selection and contribute to genome evolution and biodiversity. However, harnessing this variation for agriculture hinges on our ability to detect, quantify, catalog, and utilize genetic diversity. Here, we explore seven complete genomes of the emerging biofuel crop pennycress (Thlaspi arvense) drawn from across the species’s current genetic diversity to catalogue variation in genome structure and content. Across this new pangenome resource, we find contrasting evolutionary modes in different genomic regions. Gene-poor, repeat-rich pericentromeric regions experience frequent rearrangements, including repeated centromere repositioning. In contrast, conserved gene-dense chromosome arms maintain large-scale synteny across accessions, even in fast-evolving immune genes where microsynteny breaks down across species but the macrosynteny of gene cluster positioning is maintained. Our findings highlight that multiple elements of the genome experience dynamic evolution that conserves functional content on the chromosome scale but allows rearrangement and presence-absence variation on a local scale. This diversity is invisible to classical reference-based approaches and highlights the strength and utility of pangenomic resources. These results provide a valuable case study of rapid genomic structural evolution within a species and powerful resources for crop development in an emerging biofuel crop.
  5. Modern crop varieties display a degree of mismatch between their current distributions and the suitability of the local climate for their productivity. To address this issue, we present Oryza CLIMtools (https://gramene.org/CLIMtools/oryza_v1.0/), the first resource for pan-genome prediction of climate-associated genetic variants in a crop species. Oryza CLIMtools consists of interactive web-based databases that enable the user to (1) explore the local environments of traditional rice varieties (landraces) in South-East Asia and (2) investigate the environment by genome associations for 658 Indica and 283 Japonica rice landrace accessions collected from georeferenced local environments and included in the 3K Rice Genomes Project. We demonstrate the value of these resources by identifying an interplay between flowering time and temperature in the local environment that is facilitated by adaptive natural variation in OsHD2 and disrupted by a natural variant in OsSOC1. Prior quantitative trait locus analysis has suggested the importance of heterotrimeric G proteins in the control of agronomic traits. Accordingly, we analyzed the climate associations of natural variants in the different heterotrimeric G protein subunits. We identified a coordinated role of G proteins in adaptation to the prevailing potential evapotranspiration gradient and revealed their regulation of key agronomic traits, including plant height and seed and panicle length. We conclude by highlighting the prospect of targeting heterotrimeric G proteins to produce climate-resilient crops.