Title: Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize
Abstract

Background

Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for the fitness effect of mutations.

Results

Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.
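
As a rough illustration of what prioritizing a small fraction of single-site variants by predicted conservation might look like, the sketch below ranks variants by a per-site score and keeps only the top fraction. The function name `prioritize_variants`, the `top_fraction` parameter, and the toy scores are assumptions made for this example; they are not part of PICNC or its released models.

```python
def prioritize_variants(scores, top_fraction=0.01):
    """Return indices of the variants with the highest predicted conservation.

    scores       -- one predicted conservation score per variant (hypothetical input)
    top_fraction -- share of variants to keep; 0.01 is an assumed cutoff
    """
    # Keep at least one variant so a small input still yields a candidate.
    n_keep = max(1, int(len(scores) * top_fraction))
    # Rank variant indices from highest to lowest score; ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the selected indices in their original (e.g. genomic) order.
    return sorted(ranked[:n_keep])


# Toy example: 1,000 variants with made-up, increasing scores.
toy_scores = [i / 1000 for i in range(1000)]
selected = prioritize_variants(toy_scores, top_fraction=0.01)
```

The selected indices could then be used to subset a genotype matrix before fitting a genomic prediction model; how that subset is actually used in the study is not specified in the abstract.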

Conclusions

Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (https://doi.org/10.25739/hybz-2957).

 
Award ID(s): 1822330
PAR ID: 10370540
Publisher / Repository: Springer Science + Business Media
Journal Name: Genome Biology
Volume: 23
Issue: 1
ISSN: 1474-760X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Base‐editing technologies enable the introduction of point mutations at targeted genomic sites in mammalian cells, with higher efficiency and precision than traditional genome‐editing methods that use DNA double‐strand breaks, such as zinc finger nucleases (ZFNs), transcription‐activator‐like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR‐associated protein 9 (CRISPR‐Cas9) system. This allows the generation of single‐nucleotide‐variant isogenic cell lines (i.e., cell lines whose genomic sequences differ from each other only at a single, edited nucleotide) in a more time‐ and resource‐effective manner. These single‐nucleotide‐variant clonal cell lines represent a powerful tool with which to assess the functional role of genetic variants in a native cellular context. Base editing can therefore facilitate genotype‐to‐phenotype studies in a controlled laboratory setting, with applications in both basic research and clinical settings. Here, we provide optimized protocols (including experimental design, methods, and analyses) to design base‐editing constructs, transfect adherent cells, quantify base‐editing efficiencies in bulk, and generate single‐nucleotide‐variant clonal cell lines. © 2020 Wiley Periodicals LLC.

    Basic Protocol 1: Design and production of plasmids for base‐editing experiments

    Basic Protocol 2: Transfection of adherent cells and harvesting of genomic DNA

    Basic Protocol 3: Genotyping of harvested cells using Sanger sequencing

    Alternate Protocol 1: Next‐generation sequencing to quantify base editing

    Basic Protocol 4: Single‐cell isolation of base‐edited cells using FACS

    Alternate Protocol 2: Single‐cell isolation of base‐edited cells using dilution plating

    Basic Protocol 5: Clonal expansion to generate isogenic cell lines and genotyping of clones

     
  2. Abstract

    Background

    Fusion of RNA-binding proteins (RBPs) to RNA base-editing enzymes (such as APOBEC1 or ADAR) has emerged as a powerful tool for the discovery of RBP binding sites. However, current methods that analyze sequencing data from RNA-base editing experiments are vulnerable to false positives due to off-target editing, genetic variation and sequencing errors.

    Results

    We present FLagging Areas of RNA-editing Enrichment (FLARE), a Snakemake-based pipeline that builds on the outputs of the SAILOR edit site discovery tool to identify regions statistically enriched for RNA editing. FLARE can be configured to analyze any type of RNA editing, including C to U and A to I. We applied FLARE to C-to-U editing data from an RBFOX2-APOBEC1 STAMP experiment, to show that our approach attains high specificity for detecting RBFOX2 binding sites. We also applied FLARE to detect regions of exogenously introduced as well as endogenous A-to-I editing.

    Conclusions

    FLARE is a fast and flexible workflow that identifies significantly edited regions from RNA-seq data. The FLARE codebase is available at https://github.com/YeoLab/FLARE.

     
  3. Abstract

    Background

    Structural variation (SV), which ranges from 50 bp to ~3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which a part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, including discordant read-pairs, read depth, and split reads, are commonly used for SV detection from high-throughput sequence data. Many tools have been developed for detecting SVs by using one or more of these signals.

    Results

    In this paper, we develop a new method called EigenDel for detecting germline submicroscopic genomic deletions. EigenDel first takes advantage of discordant read-pairs and clipped reads to get initial deletion candidates, and then it clusters similar candidates by using unsupervised learning methods. After that, EigenDel uses a carefully designed approach for calling true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low-coverage sequence data.

    Conclusions

    Our results show that EigenDel outperforms other major methods in balancing accuracy and sensitivity and in reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel.

     
  4. Abstract

    Background

    Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity.

    Results

    We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA.

    Conclusions

    EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.

     
  5. Abstract

    Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single‐nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well‐documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single‐nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which has subsequently played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever‐expanding SV compendium propelled by biotechnology advancements.

     