Title: Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize
Abstract

Background

Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for the fitness effect of mutations.

Results

Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.

Conclusions

Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (https://doi.org/10.25739/hybz-2957).

 
Award ID(s):
1822330
NSF-PAR ID:
10370540
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Genome Biology
Volume:
23
Issue:
1
ISSN:
1474-760X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Effective evaluation of millions of crop genetic stocks is an essential component of exploiting genetic diversity to achieve global food security. By leveraging genomics and data analytics, genomic prediction is a promising strategy to efficiently explore the potential of these gene banks by starting with phenotyping a small designed subset. Reliable genomic predictions have enhanced selection of many macroscopic phenotypes in plants and animals. However, the use of genomic prediction strategies for analysis of microscopic phenotypes is limited. Here, we exploited the power of genomic prediction for eight maize traits related to the shoot apical meristem (SAM), the microscopic stem cell niche that generates all the above‐ground organs of the plant. With 435 713 genomewide single‐nucleotide polymorphisms (SNPs), we predicted SAM morphology traits for 2687 diverse maize inbreds based on a model trained from 369 inbreds. An empirical validation experiment with 488 inbreds obtained a prediction accuracy of 0.37–0.57 across eight traits. In addition, we show that a significantly higher prediction accuracy was achieved by leveraging the U value (upper bound for reliability) that quantifies the genomic relationships of the validation set with the training set. Our findings suggest that double selection considering both prediction and reliability can be implemented in choosing selection candidates for phenotyping when exploring new diversity is desired. In this case, individuals with less extreme predicted values and moderate reliability values can be considered. Our study expands the turbocharging of gene banks via genomic prediction from the macrophenotypes into the microphenotypic space.

     
  2. Abstract

    Base‐editing technologies enable the introduction of point mutations at targeted genomic sites in mammalian cells, with higher efficiency and precision than traditional genome‐editing methods that use DNA double‐strand breaks, such as zinc finger nucleases (ZFNs), transcription‐activator‐like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR‐associated protein 9 (CRISPR‐Cas9) system. This allows the generation of single‐nucleotide‐variant isogenic cell lines (i.e., cell lines whose genomic sequences differ from each other only at a single, edited nucleotide) in a more time‐ and resource‐effective manner. These single‐nucleotide‐variant clonal cell lines represent a powerful tool with which to assess the functional role of genetic variants in a native cellular context. Base editing can therefore facilitate genotype‐to‐phenotype studies in a controlled laboratory setting, with applications in both basic research and clinical settings. Here, we provide optimized protocols (including experimental design, methods, and analyses) to design base‐editing constructs, transfect adherent cells, quantify base‐editing efficiencies in bulk, and generate single‐nucleotide‐variant clonal cell lines. © 2020 Wiley Periodicals LLC.

    Basic Protocol 1: Design and production of plasmids for base‐editing experiments

    Basic Protocol 2: Transfection of adherent cells and harvesting of genomic DNA

    Basic Protocol 3: Genotyping of harvested cells using Sanger sequencing

    Alternate Protocol 1: Next‐generation sequencing to quantify base editing

    Basic Protocol 4: Single‐cell isolation of base‐edited cells using FACS

    Alternate Protocol 2: Single‐cell isolation of base‐edited cells using dilution plating

    Basic Protocol 5: Clonal expansion to generate isogenic cell lines and genotyping of clones

     
  3. Abstract

    Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single‐nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well‐documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single‐nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which subsequently has played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever‐expanding SV compendium propelled by biotechnology advancements.

     
  4. Abstract

    Background

    The maize inbred line A188 is an attractive model for elucidation of gene function and improvement due to its high embryogenic capacity and many contrasting traits to the first maize reference genome, B73, and other elite lines. The lack of a genome assembly of A188 limits its use as a model for functional studies.

    Results

    Here, we present a chromosome-level genome assembly of A188 using long reads and optical maps. Comparison of A188 with B73 using both whole-genome alignments and read depths from sequencing reads identifies approximately 1.1 Gb of syntenic sequences as well as extensive structural variation, including a 1.8-Mb duplication containing the Gametophyte factor1 locus for unilateral cross-incompatibility, and six inversions of 0.7 Mb or greater. Increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 is associated with elevated expression during seed development. High ccd1 expression in seeds together with low expression of yellow endosperm 1 (y1) reduces carotenoid accumulation, accounting for the white seed phenotype of A188. Furthermore, transcriptome and epigenome analyses reveal enhanced expression of defense pathways and altered DNA methylation patterns of the embryonic callus.

    Conclusions

    The A188 genome assembly provides a high-resolution sequence for a complex genome species and a foundational resource for analyses of genome variation and gene function in maize. The genome, in comparison to B73, contains extensive intra-species structural variations and other genetic differences. Expression and network analyses identify discrete profiles for embryonic callus and other tissues.

     
  5. Genomic structural variants (SVs) can play important roles in adaptation and speciation. Yet the overall fitness effects of SVs are poorly understood, partly because accurate population-level identification of SVs requires multiple high-quality genome assemblies. Here, we use 31 chromosome-scale, haplotype-resolved genome assemblies of Theobroma cacao (an outcrossing, long-lived tree species that is the source of chocolate) to investigate the fitness consequences of SVs in natural populations. Among the 31 accessions, we find over 160,000 SVs, which together cover eight times more of the genome than single-nucleotide polymorphisms and short indels (125 versus 15 Mb). Our results indicate that a vast majority of these SVs are deleterious: they segregate at low frequencies and are depleted from functional regions of the genome. We show that SVs influence gene expression, which likely impairs gene function and contributes to the detrimental effects of SVs. We also provide empirical support for a theoretical prediction that SVs, particularly inversions, increase genetic load through the accumulation of deleterious nucleotide variants as a result of suppressed recombination. Despite the overall detrimental effects, we identify individual SVs bearing signatures of local adaptation, several of which are associated with genes differentially expressed between populations. Genes involved in pathogen resistance are strongly enriched among these candidates, highlighting the contribution of SVs to this important local adaptation trait. Beyond revealing empirical evidence for the evolutionary importance of SVs, these 31 de novo assemblies provide a valuable resource for genetic and breeding studies in T. cacao.

     