
Title: Efficient curation of genebanks using next generation sequencing reveals substantial duplication of germplasm accessions

Genebanks are valuable resources for crop improvement through the acquisition, ex situ conservation and sharing of unique germplasm among plant breeders and geneticists. With over seven million existing accessions and increasing storage demands and costs, genebanks need efficient characterization and curation to make them more accessible and usable and to reduce operating costs, so that the crop improvement community can most effectively leverage this vast resource of untapped novel genetic diversity. However, the sharing and inconsistent documentation of germplasm often results in unintentionally duplicated collections with poor characterization and many identical accessions that can be hard or impossible to identify when passport information is missing and accession identifiers do not match. Here we demonstrate the use of genotypic information from these accessions, obtained with a cost-effective next generation sequencing platform, to find and remove duplications. We find that over 50% of accessions are duplicated, both within and across genebank collections of Aegilops tauschii, an important wild relative of wheat and source of genetic diversity for wheat improvement. We present a pipeline to identify and remove identical accessions within and among genebanks and curate globally unique accessions. We also show how this approach can be applied to future collection efforts to avoid the accumulation of identical material. When coordinated across global genebanks, this approach will ultimately allow for cost-effective and efficient management of germplasm and better stewardship of these valuable resources.
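The core idea of the abstract, flagging accessions whose genotype calls are effectively identical, can be illustrated with a minimal sketch. This is not the authors' pipeline: the 0/1/2 genotype encoding, the use of `None` for missing calls, the 0.99 identity threshold, and the accession names are all assumptions made for illustration only.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of sites with identical calls, skipping sites missing (None) in either sample."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def duplicate_groups(genotypes, threshold=0.99):
    """Group accessions whose pairwise identity meets the (assumed) threshold.

    Groups are built transitively with a small union-find, so A~B and B~C
    places A, B and C in the same group.
    """
    parent = {name: name for name in genotypes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(genotypes, 2):
        if identity(genotypes[a], genotypes[b]) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for name in genotypes:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member are putative duplicates.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical accessions from three genebanks (names and calls are invented).
genotypes = {
    "GB1_001": [0, 2, 2, 0, 1, 0, 2, 0],
    "GB2_417": [0, 2, 2, 0, 1, 0, 2, 0],
    "GB1_002": [2, 0, 0, 2, 0, 2, 0, 2],
    "GB3_090": [0, 2, None, 0, 1, 0, 2, 0],
}
groups = duplicate_groups(genotypes)
print(groups)
```

In practice a real analysis would also need to account for sequencing error, heterozygous calls and a minimum number of shared sites before trusting an identity value; those choices are left out of this sketch.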

Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Sponsoring Org:
National Science Foundation
More Like this
  1. Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder’s toolbox, we can now tap the genetic diversity within large germplasm collections more efficiently. In this study, we evaluated the potential of genomic prediction on a set of 482 pea (Pisum sativum L.) accessions, genotyped with 30,600 single nucleotide polymorphism (SNP) markers and phenotyped for seed yield and yield-related components, for enhancing selection of accessions from the USDA Pea Germplasm Collection. Genomic prediction models and several factors affecting predictive ability were evaluated in a series of cross-validation schemes across complex traits. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60 and no model working best across all traits. Increasing the training population size improved the predictive ability for most traits, including seed yield. Predictive abilities increased and reached a plateau with increasing number of markers, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, but we observed a slight improvement for seed yield. By applying the best genomic prediction model (e.g., RR-BLUP), we then examined the distribution of genotyped but nonphenotyped accessions and the reliability of genomic estimated breeding values (GEBV). The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions. Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, physiological traits, etc.) into the genomic prediction framework can enhance prediction accuracy.
  2. A-genome diploid wheats represent the earliest domesticated and cultivated wheat species in the Fertile Crescent and include the donor of the wheat A sub-genome. The A-genome species encompass the cultivated einkorn (Triticum monococcum L. subsp. monococcum), wild einkorn (T. monococcum L. subsp. aegilopoides (Link) Thell.), and Triticum urartu. We evaluated the collection of 930 accessions in the Wheat Genetics Resource Center (WGRC) using genotyping by sequencing and identified 13,860 curated single-nucleotide polymorphisms. Genomic analysis detected misclassified and genetically identical (>99%) accessions, with most of the identical accessions originating from the same or nearby locations. About 56% (n = 520) of the WGRC A-genome species collections were genetically identical, supporting the need for genomic characterization for effective curation and maintenance of these collections. Population structure analysis confirmed the morphology-based classifications of the accessions and reflected the species geographic distributions. We also showed that T. urartu is the closest A-genome diploid to the A-subgenome in common wheat (Triticum aestivum L.) through phylogenetic analysis. Population analysis within the wild einkorn group showed three genetically distinct clusters, which corresponded with wild einkorn races α, β, and γ described previously. The T. monococcum genome-wide FST scan identified candidate genomic regions harboring a domestication selection signature at the Non-brittle rachis 1 (Btr1) locus on the short arm of chromosome 3Am at ∼70 Mb. We established an A-genome core set (79 accessions) based on allelic diversity, geographical distribution, and available phenotypic data. The individual species core set maintained at least 79% of allelic variants in the A-genome collection and constituted a valuable genetic resource to improve wheat and domesticated einkorn in breeding programs.
  3. Background

    Plants have complex and dynamic immune systems that have evolved to resist pathogens. Humans have worked to enhance these defenses in crops through breeding. However, many crops harbor only a fraction of the genetic diversity present in wild relatives. Increased utilization of diverse germplasm to search for desirable traits, such as disease resistance, is therefore a valuable step towards breeding crops that are adapted to both current and emerging threats. Here, we examine diversity of defense responses across four populations of the long-generation tree crop Theobroma cacao L., as well as four non-cacao Theobroma species, with the goal of identifying genetic elements essential for protection against the oomycete pathogen Phytophthora palmivora.


    We began by creating a new, highly contiguous genome assembly for the P. palmivora-resistant genotype SCA 6 (Additional file 1: Tables S1-S5), deposited in GenBank under accessions CP139290-CP139299. We then used this high-quality assembly to combine RNA and whole-genome sequencing data to discover several genes and pathways associated with resistance. Many of these are unique, i.e., differentially regulated in only one of the four populations (diverged 40 k–900 k generations). Among the pathways shared across all populations is phenylpropanoid biosynthesis, a metabolic pathway with well-documented roles in plant defense. One gene in this pathway, caffeoyl shikimate esterase (CSE), was upregulated across all four populations following pathogen treatment, indicating its broad importance for cacao’s defense response. Further experimental evidence suggests this gene hydrolyzes caffeoyl shikimate to create caffeic acid, an antimicrobial compound and known inhibitor of Phytophthora spp.


    Our results indicate most expression variation associated with resistance is unique to populations. Moreover, our findings demonstrate the value of using a broad sample of evolutionarily diverged populations for revealing the genetic bases of cacao resistance to P. palmivora. This approach has promise for further revealing and harnessing valuable genetic resources in this and other long-generation plants.
  4. Summary

    We report reference‐quality genome assemblies and annotations for two accessions of soybean (Glycine max) and for one accession of Glycine soja, the closest wild relative of G. max. The G. max assemblies provided are for widely used US cultivars: the northern line Williams 82 (Wm82) and the southern line Lee. The Wm82 assembly improves the prior published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 single‐nucleotide polymorphisms (SNPs) per kb between Wm82 and Lee, and 4.7 SNPs per kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgression and haplotype structure. Comparisons against the US germplasm collection show placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan‐gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found approximately 40–42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and approximately 32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for the domestication and improvement of soybean, serving as a basis for future research and crop improvement efforts for this important crop species.
  5. BACKGROUND

    The wheat stem sawfly (WSS, Cephus cinctus) is a major pest of wheat (Triticum aestivum) and can cause significant yield losses. WSS damage results from stem boring and/or cutting, leading to the lodging of wheat plants. Although solid‐stem wheat genotypes can effectively reduce larval survival, they may have lower yields than hollow‐stem genotypes and show inconsistent solidness expression. Because of limited resistance sources to WSS, evaluating diverse wheat germplasm for novel resistance genes is crucial. We evaluated 91 accessions across five wild wheat species (Triticum monococcum, T. urartu, T. turgidum, T. timopheevii, and Aegilops tauschii) and common wheat cultivars (T. aestivum) for antixenosis (host selection) and antibiosis (host suitability) to WSS. Host selection was measured as the number of eggs after adult oviposition, and host suitability was determined by examining the presence or absence of larval infestation within the stem. The plants were grown in the greenhouse and brought to the field for WSS infestation. In addition, a phylogenetic analysis was performed to determine the relationship between the WSS traits and phylogenetic clustering.


    Overall, Ae. tauschii, T. turgidum and T. urartu had lower egg counts and larval infestation than T. monococcum and T. timopheevii. T. monococcum, T. timopheevii, T. turgidum, and T. urartu had lower larval weights compared with T. aestivum.


    This study shows that wild relatives of wheat could be a valuable source of alleles for enhancing resistance to WSS and identifies specific germplasm resources that may be useful for breeding. © 2024 The Authors. Pest Management Science published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.