

Search for: All records

Award ID contains: 1546617


  1. Abstract

    Plant alkaloids constitute an important class of bioactive chemicals with applications in medicine and agriculture. However, the knowledge gap regarding the diversity and biosynthesis of phytoalkaloids prevents systematic advances in biotechnology for engineered production of these high-value compounds. In particular, the identification of cytochrome P450s driving the structural diversity of phytoalkaloids has remained challenging. Here, we use a combination of reverse genetics with discovery metabolomics and multivariate statistical analysis, followed by in planta transient assays, to investigate alkaloid diversity and functionally characterize two candidate cytochrome P450 genes from Atropa belladonna without a priori knowledge of their functions or information regarding the identities of key pathway intermediates. This approach uncovered a largely unexplored root-localized alkaloid sub-network that relies on pseudotropine as precursor. The two cytochrome P450s catalyze N-demethylation and ring-hydroxylation reactions within the early steps in the biosynthesis of diverse N-demethylated modified tropane alkaloids.

     
  2. Summary

    Plant metabolites from diverse pathways are important for plant survival, human nutrition and medicine. The pathway memberships of most plant enzyme genes are unknown. While co‐expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts.

    Utilising > 600 tomato (Solanum lycopersicum) expression data combinations, three strategies for predicting memberships in 85 pathways were explored.

    Optimal predictions for different pathways require distinct data combinations indicative of pathway functions. Naive prediction (i.e. identifying pathways with the most similarly expressed genes) is error prone. In 52 pathways, unsupervised learning performed better than supervised approaches, possibly due to limited training data availability. Using gene‐to‐pathway expression similarities led to prediction models that outperformed those based simply on expression levels. Using 36 experimentally validated genes, the pathway‐best model prediction accuracy is 58.3%, significantly better than that for predicting annotated genes without experimental evidence (37.0%) or random guess (1.2%), demonstrating the importance of data quality.

    Our study highlights the need to extensively explore expression‐based features and prediction strategies to maximise the accuracy of metabolic pathway membership assignment. The prediction framework outlined here can be applied to other species and serves as a baseline model for future comparisons.
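The accuracy figures quoted in the summary above can be sanity-checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the function names are hypothetical, the random baseline assumes a uniform guess among the 85 pathways, and the count of 21 correct genes out of 36 is inferred from the reported 58.3% rather than stated in the summary.

```python
# Hypothetical helpers for checking the percentages quoted in the summary.
# Assumptions (not stated in the source): a random guess picks one of the
# 85 pathways uniformly, and 58.3% corresponds to 21 of 36 validated genes.

def uniform_random_accuracy(n_pathways: int) -> float:
    """Expected accuracy when one of n_pathways is guessed uniformly at random."""
    return 1.0 / n_pathways


def accuracy(n_correct: int, n_total: int) -> float:
    """Fraction of genes assigned to the correct pathway."""
    return n_correct / n_total


random_baseline = uniform_random_accuracy(85)
best_model = accuracy(21, 36)  # inferred count, see assumptions above

print(f"random guess: {random_baseline:.1%}")  # random guess: 1.2%
print(f"best model:   {best_model:.1%}")       # best model:   58.3%
```

Under these assumptions both values match the reported 1.2% and 58.3%. The 37.0% figure for annotated genes without experimental evidence cannot be checked the same way, because the summary does not give the number of genes behind it.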

     
  3. Abstract

    Background: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short-read sequencing technologies with highly uneven read coverage, indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of the short-read-based assembly had significantly higher and lower coverage compared to background, respectively.

    Results: To understand what causes such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverage and found that high coverage regions tend to have higher simple sequence repeat and tandem gene densities compared to background regions. To determine if the high coverage regions were misassembled, we examined a recently available tomato long-read-based assembly and found that 27.8% (1.41 Mb) of high coverage regions were potential misassemblies of duplicate sequences, compared to 1.4% in background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high coverage regions, we found that misassembled, high coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposon elements.

    Conclusions: Our study provides insights on the causes of variable coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads. The generality of these causes and factors should be tested further in other species.
  4. Abstract

    Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic and synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility and histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.
  5. Marshall-Colon, Amy (Ed.)
    Abstract Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one. 
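The abstract above reports F-measures against reference points of 0.5 (random) and 1.0 (perfect). As a minimal sketch of how such a score is computed, the snippet below assumes the F-measure is the usual F1 score (harmonic mean of precision and recall) and that the SM and GM classes are balanced; neither assumption is confirmed by the abstract, and the function name and counts are illustrative only.

```python
# Illustrative F1 computation; `f_measure` and the counts are hypothetical.
# Assumption: "F-measure" means F1, the harmonic mean of precision and recall.

def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Coin-flip predictions on 100 SM and 100 GM genes (balanced classes assumed):
# about half of the SM genes are recovered and half of the SM calls are wrong.
random_like = f_measure(tp=50, fp=50, fn=50)

# Every SM gene is recovered and no GM gene is mislabelled.
perfect = f_measure(tp=100, fp=0, fn=0)

print(random_like, perfect)  # 0.5 1.0
```

With balanced classes, a coin-flip predictor gives precision and recall of 0.5 each, hence an F1 of 0.5, which is consistent with the random baseline quoted in the abstract.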
  6. The evolution of transcriptional regulatory mechanisms is central to how stress response and tolerance differ between species. However, it remains largely unknown how divergence in cis-regulatory sites and, subsequently, transcription factor (TF) binding specificity contribute to stress-responsive expression divergence, particularly between wild and domesticated species. By profiling wound-responsive gene transcriptomes in wild Solanum pennellii and domesticated S. lycopersicum, we found extensive wound-response divergence and identified 493 S. lycopersicum and 278 S. pennellii putative cis-regulatory elements (pCREs) that were predictive of wound-responsive gene expression. Only 24-52% of these wound-response pCREs (depending on wound-response patterns) were consistently enriched in the putative promoter regions of wound-responsive genes across species. In addition, between these two species, their differences in pCRE site sequences were significantly and positively correlated with differences in wound-responsive gene expression. Furthermore, ~11-39% of pCREs were specific to only one of the species and likely bound by TFs from different families. These findings indicate substantial regulatory divergence in these two plant species that diverged ~3-7 million years ago. Our study provides insights into the mechanistic basis of how the transcriptional response to wounding is regulated and, importantly, the contribution of cis-regulatory components to variation in wound-responsive gene expression between a wild and a domesticated plant species.