Title: Efficient curation of genebanks using next generation sequencing reveals substantial duplication of germplasm accessions
Abstract Genebanks are valuable resources for crop improvement through the acquisition, ex-situ conservation and sharing of unique germplasm among plant breeders and geneticists. With over seven million existing accessions and increasing storage demands and costs, genebanks need efficient characterization and curation to make them more accessible and usable and to reduce operating costs, so that the crop improvement community can most effectively leverage this vast resource of untapped novel genetic diversity. However, the sharing and inconsistent documentation of germplasm often results in unintentionally duplicated collections with poor characterization and many identical accessions that can be hard or impossible to identify without passport information and unmatched accession identifiers. Here we demonstrate the use of genotypic information from these accessions using a cost-effective next generation sequencing platform to find and remove duplications. We identify and characterize over 50% duplicated accessions both within and across genebank collections of Aegilops tauschii, an important wild relative of wheat and source of genetic diversity for wheat improvement. We present a pipeline to identify and remove identical accessions within and among genebanks and curate globally unique accessions. We also show how this approach can be applied to future collection efforts to avoid the accumulation of identical material. When coordinated across global genebanks, this approach will ultimately allow for cost-effective and efficient management of germplasm and better stewarding of these valuable resources.
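The abstract does not spell out how identical accessions are detected, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. It assumes genotypes are coded as allele counts (0 or 2 for inbred lines), that missing calls are marked with `None`, and that two accessions count as duplicates when at least 99% of their jointly called sites match. The accession names, the `identity` and `duplicate_groups` helpers, and the 0.99 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations

MISSING = None  # assumed marker for a missing genotype call


def identity(a, b):
    """Fraction of matching calls over the sites called in both samples."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not MISSING and y is not MISSING]
    if not shared:
        return 0.0  # no jointly called sites: treat as not comparable
    return sum(x == y for x, y in shared) / len(shared)


def duplicate_groups(genotypes, threshold=0.99):
    """Group accessions whose pairwise identity meets the threshold.

    Groups are built with a small union-find, so identity is treated as
    transitive: if A matches B and B matches C, all three are grouped.
    """
    parent = {name: name for name in genotypes}

    def find(name):
        # Walk up to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(genotypes, 2):
        if identity(genotypes[a], genotypes[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in genotypes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical accessions from three genebanks (names are made up).
genotypes = {
    "GB1_001": [0, 0, 2, 2, 0, 2, 0, 0],
    "GB2_417": [0, 0, 2, 2, 0, 2, 0, 0],
    "GB1_002": [2, 0, 0, 2, 2, 0, 0, 2],
    "GB3_055": [0, 0, 2, None, 0, 2, 0, 0],
}

for group in duplicate_groups(genotypes):
    keep, *redundant = group
    print(f"keep {keep}; redundant: {redundant}")
```

In this sketch each group retains one representative and flags the rest as redundant. Real data would need a threshold tuned to the sequencing error rate and a minimum number of jointly called sites before two samples are compared.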
Award ID(s):
1822162
PAR ID:
10153345
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Volume:
9
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Genetic diversity found in crop wild relatives is critical to preserve and utilize for crop improvement to achieve sustainable food production amid climate change and increased demand. We genetically characterized a large collection of 1,041 Aegilops accessions distributed among 23 different species using more than 45K single nucleotide polymorphisms identified by genotyping-by-sequencing. The Wheat Genetics Resource Center (WGRC) Aegilops germplasm collection was curated through the identification of misclassified and redundant accessions. There were 49 misclassified and 28 sets of redundant accessions within the four diploid species. The curated germplasm sets now have improved utility for genetic studies and wheat improvement. We constructed a phylogenetic tree and principal component analysis cluster for all Aegilops species together, giving one of the most comprehensive views of Aegilops. The Sitopsis section and the U genome Aegilops clade were further scrutinized with in-depth population analysis. The genetic relatedness among the pair of Aegilops species provided strong evidence for the species evolution, speciation, and diversification. We inferred genome symbols for two species, Ae. neglecta and Ae. columnaris, based on the sequence read mapping and the presence of segregating loci on the pertinent genomes as well as genetic clustering. The high genetic diversity observed among Aegilops species indicated that the genus could play an even greater role in providing the critical need for untapped genetic diversity for future wheat breeding and improvement. To fully characterize these Aegilops species, there is an urgent need to generate reference assemblies for these wild wheats, especially for the polyploid Aegilops.
  3. Abstract A-genome diploid wheats represent the earliest domesticated and cultivated wheat species in the Fertile Crescent and include the donor of the wheat A sub-genome. The A-genome species encompass the cultivated einkorn (Triticum monococcum L. subsp. monococcum), wild einkorn (T. monococcum L. subsp. aegilopoides (Link) Thell.), and Triticum urartu. We evaluated the collection of 930 accessions in the Wheat Genetics Resource Center (WGRC) using genotyping by sequencing and identified 13,860 curated single-nucleotide polymorphisms. Genomic analysis detected misclassified and genetically identical (>99%) accessions, with most of the identical accessions originating from the same or nearby locations. About 56% (n = 520) of the WGRC A-genome species collections were genetically identical, supporting the need for genomic characterization for effective curation and maintenance of these collections. Population structure analysis confirmed the morphology-based classifications of the accessions and reflected the species geographic distributions. We also showed that T. urartu is the closest A-genome diploid to the A-subgenome in common wheat (Triticum aestivum L.) through phylogenetic analysis. Population analysis within the wild einkorn group showed three genetically distinct clusters, which corresponded with wild einkorn races α, β, and γ described previously. The T. monococcum genome-wide FST scan identified candidate genomic regions harboring a domestication selection signature at the Non-brittle rachis 1 (Btr1) locus on the short arm of chromosome 3Am at ∼70 Mb. We established an A-genome core set (79 accessions) based on allelic diversity, geographical distribution, and available phenotypic data. The individual species core set maintained at least 79% of allelic variants in the A-genome collection and constituted a valuable genetic resource to improve wheat and domesticated einkorn in breeding programs. 
  4. Societal Impact Statement: Crop genetic resources, particularly seeds held in ex situ germplasm collections, have enormous value in breeding climate‐resilient crops. Much of this value accrues from information associated with germplasm accessions. Here, we argue that flavor, culinary attributes, and other traditional ecological knowledge (TEK) are important characteristics alongside genomic information and high‐throughput phenotypes. We explore both the value of this information and the potential risks of exploitation of sensitive TEK. We also examine the potential of in situ conservation to preserve not just the genetic diversity of crops, but the TEK associated with them. Summary: Crop genetic diversity is essential for meeting the challenges posed to agriculture by a rapidly changing climate. Harnessing that diversity requires well‐organized information, often held by ex situ genebanks and associated databases. However, the characterization of crop germplasm often lacks information on its cultural and culinary background, specifically its flavor or taste. For most crops, characterization data is lacking, but when it is present it is more likely to include whole genome information, high‐throughput estimation of growth characteristics, and chemical profiles indicating flavor rather than details on the dishes for which particular varieties are favored or how smallholder farms have grown particular accessions. This loss of cultural and culinary information, and the broader loss of traditional ecological knowledge (TEK), is more than just missing information. It is a loss of legacy when landraces are no longer grown by the communities that developed them. In the face of climate change, TEK has great value for developing more sustainable or resilient practices. And with increasingly global palates, we must balance consumers enjoying dishes from new crops with the appropriation of culturally meaningful foods.
Our aim here is to explore this flavor gap, to understand the risks in sharing data and the benefits of honoring long‐established uses. We emphasize the importance of ensuring the fair representation of diverse peoples in genebanks and consider both ex situ and in situ conservation approaches. Finally, we analyze the impact of modern breeding choices on culinary diversity, emphasizing the preservation of ancestral knowledge and flavor profiles. 
  5. Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder’s toolbox, we now can more efficiently tap the genetic diversity within large germplasm collections. In this study, we evaluated the potential of genomic prediction for enhancing selection of accessions from the USDA Pea Germplasm Collection, using a set of 482 pea (Pisum sativum L.) accessions genotyped with 30,600 single nucleotide polymorphic (SNP) markers and phenotyped for seed yield and yield-related components. Genomic prediction models and several factors affecting predictive ability were evaluated in a series of cross-validation schemes across complex traits. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60, with no model working best across all traits. Increasing the training population size improved the predictive ability of most traits, including seed yield. Predictive abilities increased and reached a plateau with increasing number of markers, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, but we observed a slight improvement in seed yield. By applying the best genomic prediction model (e.g., RR-BLUP), we then examined the distribution of genotyped but nonphenotyped accessions and the reliability of genomic estimated breeding values (GEBV). The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions.
Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, physiological traits, etc.) into the genomic prediction framework can enhance prediction accuracy. 