Title: Accelerating structure‐function mapping using the ViVa webtool to mine natural variation
Abstract

Thousands of sequenced genomes are now publicly available capturing a significant amount of natural variation within plant species; yet, much of these data remain inaccessible to researchers without significant bioinformatics experience. Here, we present a webtool called ViVa (Visualizing Variation) which aims to empower any researcher to take advantage of the amazing genetic resource collected in the Arabidopsis thaliana 1001 Genomes Project (http://1001genomes.org). ViVa facilitates data mining on the gene, gene family, or gene network level. To test the utility and accessibility of ViVa, we assembled a team with a range of expertise within biology and bioinformatics to analyze the natural variation within the well‐studied nuclear auxin signaling pathway. Our analysis has provided further confirmation of existing knowledge and has also helped generate new hypotheses regarding this well‐studied pathway. These results highlight how natural variation could be used to generate and test hypotheses about less‐studied gene families and networks, especially when paired with biochemical and genetic characterization. ViVa is also readily extensible to databases of interspecific genetic variation in plants as well as other organisms, such as the 3,000 Rice Genomes Project (http://snp-seek.irri.org/) and human genetic variation (https://www.ncbi.nlm.nih.gov/clinvar/).

 
NSF-PAR ID: 10197229
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: Plant Direct
Volume: 3
Issue: 7
ISSN: 2475-4455
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Understanding the mechanisms underlying biological invasions and rapid adaptation to global change remains a fundamental challenge, particularly in small populations lacking in genetic variation. Two understudied mechanisms that could facilitate adaptive evolution and adaptive plasticity are the increased genetic variation due to transposable elements (TEs), and associated or independent modification of gene expression through epigenetic changes.

    Here, we focus on the potential role of these genetic and non‐genetic mechanisms for facilitating invasion success. Because novel or stressful environments are known to induce both epigenetic changes and TE activity, these mechanisms may play an underappreciated role in generating phenotypic and genetic variation for selection to act on. We review how these mechanisms operate, the evidence for how they respond to novel or stressful environments, and how these mechanisms can contribute to the success of biological invasions by facilitating adaptive evolution and phenotypic plasticity.

    Because genetic and phenotypic variations due to TEs and epigenetic changes are often well regulated or “hidden” in the native environment, the independent and combined contribution of these mechanisms may only become important when populations colonize novel environments. A focus on the mechanisms that generate and control the expression of this variation in new environments may provide insights into biological invasions that would otherwise not be obvious.

    Global changes and human activities impact on ecosystems and allow new opportunities for biological invasions. Invasive species succeed by adapting rapidly to new environments. The degree to which rapid responses to environmental change could be mediated by the epigenome—the regulatory system that integrates how environmental and genomic variation jointly shape phenotypic variation—requires greater attention if we want to understand the mechanisms by which populations successfully colonize and adapt to new environments.

    A free Plain Language Summary can be found within the Supporting Information of this article.

     
  2. Abstract Background

    Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations.

    Results

    Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.

    Conclusions

    Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (https://doi.org/10.25739/hybz-2957).

     
  3. Summary Open Research Badges

    This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at https://github.com/SNAnderson/maizeTE_variation; https://mcstitzer.github.io/maize_TEs.

     
  4. Abstract

    Expression of herbivore defense traits can change dramatically during the course of plant development. Little is known, however, about the degree of genetic or sexual variation in these ontogenetic defense trajectories or whether the trajectories themselves are adaptive, especially in long‐lived species.

    We used a 13‐year dataset of chemical defense traits, growth and survivorship from a common garden of trembling aspen (Populus tremuloides) genotypes to document long‐term defense trajectories and their relationship to tree fitness during juvenile and early mature stages.

    Overall, concentrations of the two principal classes of aspen defense compounds (salicinoid phenolic glycosides [SPGs] and condensed tannins [CTs]) decreased to differing degrees in foliage of juvenile trees and then remained relatively constant in maturity. Initial values, juvenile rates of change and average mature values all exhibited significant genetic variation for both SPGs and CTs.

    Relationships between defense trajectory parameters and metrics of tree fitness (growth and survivorship) depended on compound type and tree sex. Females with higher‐allocation SPG trajectories (high initial juvenile concentrations, slow juvenile declines, high mature concentrations) grew more slowly relative to females with lower‐allocation trajectories. In males, higher‐allocation SPG trajectories had a lesser effect on growth but were associated with reduced mortality. Juvenile CT trajectories were not correlated with tree fitness, but average CT concentration in maturity was positively related to growth in females.

    These results suggest that ontogenetic defense trajectories are adaptive and subject to natural selection. Genotypic variation and ontogeny shape tree defensive chemistry, both independently and interactively. These patterns of defense expression have the potential to structure trophic interactions and the genetic composition of forests in both space and time.

    A free Plain Language Summary can be found within the Supporting Information of this article.

     
  5. Abstract Background

    Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative-splicing events exacerbates such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes.

    Results

    In this study, we first provide a pipeline to generate simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches including both de novo and genome-guided methods. The results showed that the assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or for genome-guided methods when the reference is not available from the same genome. To improve the transcriptome assembly performance, leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble.

    Conclusions

    Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any de novo assemblers we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy both for de novo and genome-guided assemblers can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/.

     