Title: Mapping of specialized metabolite terms onto a plant phylogeny using text mining and large language models
SUMMARY Plants produce a staggering array of chemicals that underpin organismal function and serve as important human nutrients and medicines. However, how these compounds evolved and how they are distributed across the plant kingdom remain poorly defined, hindering a systematic view and understanding of plant chemical diversity. Recent advances in plant genome/transcriptome sequencing have provided a well‐defined molecular phylogeny of plants, onto which the presence of diverse natural products can be mapped to systematically determine their phylogenetic distribution. Here, we built a proof‐of‐concept workflow in which previously reported tyrosine‐derived plant natural products were mapped onto the plant tree of life. Plant chemical–species associations were mined from the literature, filtered, evaluated through manual inspection of over 2500 scientific articles, and mapped onto the plant phylogeny. The resulting "phylochemical" map confirmed several highly lineage‐specific compound class distributions, such as those of betalain pigments and Amaryllidaceae alkaloids. The map also highlighted several lineages enriched in dopamine‐derived compounds, including the orders Caryophyllales, Liliales, and Fabales. Additionally, applying large language models, with our manually curated data as a ground‐truth set, showed that post‐mining processing can largely be automated with a low false‐positive rate, which is critical for generating a reliable phylochemical map. Although a high false‐negative rate remains a challenge, our study demonstrates that combining text mining with language model‐based processing can generate broader phylochemical maps, which will serve as a valuable community resource to uncover key evolutionary events that underlie plant chemical diversity and enable system‐level views of nature's millions of years of chemical experimentation.
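The abstract does not give the workflow's code or state exactly how its false‐positive and false‐negative rates were computed. As an illustration only, the Python sketch below assumes the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP), scores hypothetical LLM accept/reject verdicts against a manually curated ground truth, and then collapses accepted species‐level associations onto orders to form a toy phylochemical map. The species names, labels, and the function names `error_rates` and `phylochemical_map` are hypothetical and not taken from the study.

```python
from collections import defaultdict

# Toy, hypothetical data (not from the study): each mined record is a
# candidate (species, order, compound class) association from the literature.
records = [
    ("species_1", "Caryophyllales", "betalain"),
    ("species_2", "Caryophyllales", "dopamine-derived"),
    ("species_3", "Fabales", "dopamine-derived"),
    ("species_4", "Liliales", "betalain"),
]
# True = association accepted. `curated` plays the role of the manually
# curated ground truth; `llm` stands in for an LLM's accept/reject verdicts.
curated = [True, True, True, False]
llm = [True, True, False, False]


def error_rates(truth, predicted):
    """Return (false-positive rate, false-negative rate) using the standard
    definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


def phylochemical_map(records, accepted):
    """Collapse accepted species-level associations onto orders,
    giving {order: set of compound classes}."""
    table = defaultdict(set)
    for (_species, order, compound_class), keep in zip(records, accepted):
        if keep:
            table[order].add(compound_class)
    return dict(table)


if __name__ == "__main__":
    print(error_rates(curated, llm))
    print(phylochemical_map(records, curated))
    print(phylochemical_map(records, llm))
```

In this toy run the LLM rejects the one spurious association (no false positives) but also misses a true one, so Fabales drops out of its map. This mirrors why a low false‐positive rate keeps the map reliable while a high false‐negative rate leaves it incomplete.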
Award ID(s):
1938597, 1836824
PAR ID:
10587335
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
The Plant Journal
Volume:
120
Issue:
1
ISSN:
0960-7412
Size(s):
p. 406-419
Sponsoring Org:
National Science Foundation
More Like this
  1. L-Tyrosine is an essential amino acid for protein synthesis and is also used in plants to synthesize diverse natural products. Plants primarily synthesize tyrosine via TyrA arogenate dehydrogenases (TyrAa or ADH), which are typically strongly feedback-inhibited by tyrosine. However, two plant lineages, Fabaceae (legumes) and Caryophyllales, have TyrA enzymes that exhibit relaxed sensitivity to tyrosine inhibition and are associated with elevated production of tyrosine-derived compounds, such as the betalain pigments uniquely produced in core Caryophyllales. Although we previously showed that a single D222N substitution is primarily responsible for the deregulation of legume TyrAs, it was unknown when and how the deregulated Caryophyllales TyrAs emerged. Here, through phylogeny-guided TyrA structure–function analysis, we found that functionally deregulated TyrAs evolved early in the core Caryophyllales, before the origin of betalains, and that the E208D amino acid substitution in the active site, located at a different, opposite position from the D222N site in legume TyrAs, played a key role in TyrA functionalization. Unlike in legumes, however, additional substitutions at non-active-site residues further contributed to the deregulation of TyrAs in Caryophyllales. Introducing a mutation analogous to E208D partially deregulated tyrosine-sensitive TyrAs, such as Arabidopsis TyrA2 (AtTyrA2). Moreover, combining D222N and E208D additively deregulated AtTyrA2, and expressing this variant in Nicotiana benthamiana led to highly elevated tyrosine accumulation in planta. The present study demonstrates that phylogeny-guided characterization of key residues underlying primary metabolic innovations can provide powerful tools to boost the production of essential plant natural products.
  2. Abstract The production of complex mixtures of secondary metabolites is a ubiquitous feature of plants. Several evolutionary hypotheses seek to explain how phytochemical diversity is maintained, including the synergy hypothesis, the interaction diversity hypothesis, and the screening hypothesis. We experimentally tested a set of predictions derived from these hypotheses by manipulating the richness and structural diversity of phenolic metabolites in the diets of eight plant consumers. Across 3940 total bioassays, there was clear support for the interaction diversity hypothesis over the synergy or screening hypotheses. The number of consumers affected by a particular phenolic composition increased with increasing richness and structural diversity of compounds. Furthermore, the bioactivity of phenolics was consumer‐specific. All compounds tested reduced the performance of at least one consumer, but no compounds affected all consumers. These results show how phytochemical diversity may be maintained in nature by a complex selective landscape exerted by diverse communities of plant consumers. 
  3. Original data and R code to accompany the manuscript "Interaction diversity explains the maintenance of phytochemical diversity" by Susan R. Whitehead, Ethan Bass, Alexsandra Corrigan, André Kessler, and Katja Poveda, accepted for publication in Ecology Letters. Abstract: The production of complex mixtures of secondary metabolites is a ubiquitous feature of plants. Several evolutionary hypotheses seek to explain how phytochemical diversity is maintained, including the synergy hypothesis, the interaction diversity hypothesis, and the screening hypothesis. We experimentally tested predictions derived from these hypotheses by manipulating the richness and structural diversity of phenolic metabolites in the diets of eight plant consumers. Across 3940 total bioassays, there was clear support for the interaction diversity hypothesis over the synergy or screening hypotheses. The number of consumers affected by a particular phenolic composition increased with increasing richness and structural diversity of compounds. Furthermore, the bioactivity of phenolics was consumer-specific. All compounds tested reduced the performance of at least one consumer, but no compounds affected all consumers. These results show how phytochemical diversity may be maintained in nature by a complex selective landscape exerted by diverse communities of plant consumers. https://github.com/WhiteheadLabVT/Phytochemical-Diversity-Experiment/releases/tag/v1.0.0
  4. Plant natural products (PNPs) play important roles in plant physiology and have been applied across diverse fields of human society. Understanding their biosynthetic pathways informs our understanding of plant evolution while also enabling sustainable production through metabolic engineering. However, discovering PNP biosynthetic pathways remains challenging due to the diversity of enzymes involved and the limitations of traditional gene mining approaches. In this review, we summarize state-of-the-art strategies and recent examples for predicting PNP biosynthetic pathways with multiomics-guided tools and for characterizing them in heterologous host systems, and we share our perspectives on systematic pipelines that integrate these bioinformatic and biochemical approaches.
  5. SUMMARY Plants synthesize natural products via lineage‐specific offshoots of their core metabolic pathways, including fatty acid synthesis. Recent studies have shed light on new fatty acid‐derived natural products and their biosynthetic pathways in disparate plant species. Inspired by this progress, we set out to develop tools for exploring the evolution of fatty acid‐derived products. We sampled multiple species from all major clades of euphyllophytes, including ferns, gymnosperms, and angiosperms (monocots and eudicots), and we show that the compositional profiles (though not necessarily the total amounts) of fatty acid‐derived surface waxes from preserved plant specimens are consistent with those obtained from freshly collected tissue in a semi‐quantitative and sometimes quantitative manner. We then sampled herbarium specimens representing 57 monocot species to assess the phylogenetic distribution and evolution of two fatty acid‐derived natural products found in that clade: beta‐diketones and alkylresorcinols. These chemical data, combined with analyses of 26 monocot genomes, revealed a co‐occurrence (though not necessarily a causal relationship) between whole genome duplication and the evolution of diketone synthases from an ancestral alkylresorcinol synthase‐like polyketide synthase. Limitations of using herbarium specimen wax profiles as proxies for those of fresh tissue seem likely to include loss of epicuticular wax crystals, effects of preservation techniques, and variation in wax chemical profiles due to genotype or environment. Nevertheless, this work reinforces the widespread utility of herbarium specimens for studying leaf surface waxes (and possibly other chemical classes) and reveals some of the evolutionary history of fatty acid‐derived natural products within monocots.
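Item 5 above compares the compositional profiles of surface waxes from herbarium and fresh tissue "though not necessarily the total amounts", but its abstract does not say which similarity measure was used. As one illustrative option (our assumption, not the authors' method), the Python sketch below normalizes each sample to relative proportions and then computes a Pearson correlation, so two samples with the same composition but different totals score as identical. The wax amounts and the helper names `to_proportions`, `pearson`, and `profile_similarity` are hypothetical.

```python
import math

# Toy, hypothetical wax amounts (arbitrary units), not data from the study.
fresh = {"alkanes": 40.0, "primary alcohols": 30.0,
         "beta-diketones": 20.0, "alkylresorcinols": 10.0}
# Same relative composition at half the total amount.
herbarium = {"alkanes": 20.0, "primary alcohols": 15.0,
             "beta-diketones": 10.0, "alkylresorcinols": 5.0}
# A different composition, for contrast.
other = {"alkanes": 10.0, "primary alcohols": 10.0,
         "beta-diketones": 40.0, "alkylresorcinols": 40.0}


def to_proportions(amounts):
    """Convert absolute amounts to relative proportions (summing to 1)."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences
    (assumes neither sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def profile_similarity(a, b):
    """Compare compositional profiles while ignoring total amounts.
    Compound classes missing from one sample are treated as zero."""
    classes = sorted(set(a) | set(b))
    pa, pb = to_proportions(a), to_proportions(b)
    return pearson([pa.get(c, 0.0) for c in classes],
                   [pb.get(c, 0.0) for c in classes])


if __name__ == "__main__":
    print(round(profile_similarity(fresh, herbarium), 6))
    print(round(profile_similarity(fresh, other), 6))
```

Normalizing first is what separates "composition" from "amount". A real comparison would likely also need replicate samples and a tolerance for what counts as semi‐quantitative agreement.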