Title: Machine learning enables identification of an alternative yeast galactose utilization pathway
How genomic differences contribute to phenotypic differences is a major question in biology. The recently characterized genomes, isolation environments, and qualitative patterns of growth on 122 sources and conditions of 1,154 strains from 1,049 fungal species (nearly all known) in the yeast subphylum Saccharomycotina provide a powerful, yet complex, dataset for addressing this question. We used a random forest algorithm trained on these genomic, metabolic, and environmental data to predict growth on several carbon sources with high accuracy. Known structural genes involved in assimilation of these sources and presence/absence patterns of growth in other sources were important features contributing to prediction accuracy. By further examining growth on galactose, we found that it can be predicted with high accuracy from either genomic (92.2%) or growth data (82.6%) but not from isolation environment data (65.6%). Prediction accuracy was even higher (93.3%) when we combined genomic and growth data. After the GALactose utilization genes, the most important feature for predicting growth on galactose was growth on galactitol, raising the hypothesis that several species in two orders, Serinales and Pichiales (containing the emerging pathogen Candida auris and the genus Ogataea, respectively), have an alternative galactose utilization pathway because they lack the GAL genes. Growth and biochemical assays confirmed that several of these species utilize galactose through an alternative oxidoreductive D-galactose pathway, rather than the canonical GAL pathway. Machine learning approaches are powerful for investigating the evolution of the yeast genotype–phenotype map, and their application will uncover novel biology, even in well-studied traits.
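The reasoning in the abstract, that a growth trait (galactitol) can flag species that grow on galactose despite lacking the GAL genes, can be illustrated with a minimal sketch. This is not the random forest used in the study: it compares two hand-written rules on a toy presence/absence table. Every record, the names `RECORDS`, `genes_only`, `genes_or_galactitol`, and `candidates`, and the resulting accuracies are hypothetical and chosen only for illustration.

```python
# Toy presence/absence records; all values are invented for illustration
# and are NOT data from the study.
# Each tuple: (has_GAL_genes, grows_on_galactitol, grows_on_galactose)
RECORDS = [
    (1, 0, 1),
    (1, 1, 1),
    (1, 0, 1),
    (0, 1, 1),  # lacks GAL genes but grows: candidate alternative pathway
    (0, 1, 1),  # lacks GAL genes but grows: candidate alternative pathway
    (0, 0, 0),
    (0, 0, 0),
    (1, 0, 0),  # GAL genes present, yet no growth
]


def accuracy(predict, records):
    """Fraction of records whose galactose growth is predicted correctly."""
    correct = sum(
        1 for gal, galactitol, grows in records
        if predict(gal, galactitol) == grows
    )
    return correct / len(records)


def genes_only(gal, galactitol):
    """Predict growth from GAL gene presence alone (genomic feature)."""
    return gal


def genes_or_galactitol(gal, galactitol):
    """Predict growth if GAL genes are present or the strain grows on galactitol."""
    return 1 if (gal or galactitol) else 0


def candidates(records):
    """Indices of records that grow on galactose and galactitol without GAL genes."""
    return [
        i for i, (gal, galactitol, grows) in enumerate(records)
        if grows and not gal and galactitol
    ]


if __name__ == "__main__":
    print("genes only:", accuracy(genes_only, RECORDS))
    print("genes + galactitol:", accuracy(genes_or_galactitol, RECORDS))
    print("candidate alternative-pathway records:", candidates(RECORDS))
```

In this toy table, the records that the gene-only rule misses but the combined rule recovers are exactly the ones a follow-up growth or biochemical assay would target.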
Award ID(s):
2110404 2110403
PAR ID:
10515152
Publisher / Repository:
PNAS
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
18
ISSN:
0027-8424
Page Range / eLocation ID:
e2315314121
Sponsoring Org:
National Science Foundation
More Like this
  1. Townsend, Jeffrey (Ed.)
    Abstract Xylose is the second most abundant monomeric sugar in plant biomass. Consequently, xylose catabolism is an ecologically important trait for saprotrophic organisms, as well as a fundamentally important trait for industries that hope to convert plant mass to renewable fuels and other bioproducts using microbial metabolism. Although common across fungi, xylose catabolism is rare within Saccharomycotina, the subphylum that contains most industrially relevant fermentative yeast species. The genomes of several yeasts unable to consume xylose have been previously reported to contain the full set of genes in the XYL pathway, suggesting the absence of a gene–trait correlation for xylose metabolism. Here, we measured growth on xylose and systematically identified XYL pathway orthologs across the genomes of 332 budding yeast species. Although the XYL pathway coevolved with xylose metabolism, we found that pathway presence only predicted xylose catabolism about half of the time, demonstrating that a complete XYL pathway is necessary, but not sufficient, for xylose catabolism. We also found that XYL1 copy number was positively correlated, after phylogenetic correction, with xylose utilization. We then quantified codon usage bias of XYL genes and found that XYL3 codon optimization was significantly higher, after phylogenetic correction, in species able to consume xylose. Finally, we showed that codon optimization of XYL2 was positively correlated, after phylogenetic correction, with growth rates in xylose medium. We conclude that gene content alone is a weak predictor of xylose metabolism and that using codon optimization enhances the prediction of xylose metabolism from yeast genome sequence data. 
  2. Abstract Background: Cost-effective production of biofuels from lignocellulose requires the fermentation of D-xylose. Many yeast species within and closely related to the genera Spathaspora and Scheffersomyces (both of the order Serinales) natively assimilate and ferment xylose. Other species consume xylose inefficiently, leading to extracellular accumulation of xylitol. Xylitol excretion is thought to be due to the different cofactor requirements of the first two steps of xylose metabolism. Xylose reductase (XR) generally uses NADPH to reduce xylose to xylitol, while xylitol dehydrogenase (XDH) generally uses NAD+ to oxidize xylitol to xylulose, creating an imbalanced redox pathway. This imbalance is thought to be particularly consequential in hypoxic or anoxic environments. Results: We screened the growth of xylose-fermenting yeast species in high and moderate aeration and identified both ethanol producers and xylitol producers. Selected species were further characterized for their XR and XDH cofactor preferences by enzyme assays and gene expression patterns by RNA-Seq. Our data revealed that xylose metabolism is more redox balanced in some species, but it is strongly affected by oxygen levels. Under high aeration, most species switched from ethanol production to xylitol accumulation, despite the availability of ample oxygen to accept electrons from NADH. This switch was followed by decreases in enzyme activity and the expression of genes related to xylose metabolism, suggesting that bottlenecks in xylose fermentation are not always due to cofactor preferences. Finally, we expressed XYL genes from multiple Scheffersomyces species in a strain of Saccharomyces cerevisiae. Recombinant S. cerevisiae expressing XYL1 from Scheffersomyces xylosifermentans, which encodes an XR without a cofactor preference, showed improved anaerobic growth on xylose as the primary carbon source compared to the S. cerevisiae strain expressing XYL genes from Scheffersomyces stipitis.
    Conclusion: Collectively, our data do not support the hypothesis that xylitol accumulation occurs primarily due to differences in cofactor preferences between xylose reductase and xylitol dehydrogenase; instead, gene expression plays a major role in response to oxygen levels. We have also identified the yeast Sc. xylosifermentans as a potential source for genes that can be engineered into S. cerevisiae to improve xylose fermentation and biofuel production.
  3. Organisms exhibit extensive variation in ecological niche breadth, from very narrow (specialists) to very broad (generalists). Two general paradigms have been proposed to explain this variation: trade-offs between performance efficiency and breadth; and the joint influence of extrinsic (environmental) and intrinsic (genomic) factors. We assembled genomic, metabolic, and ecological data from nearly all known species of the ancient fungal subphylum Saccharomycotina (1,154 yeast strains from 1,051 species), grown in 24 different environmental conditions, to examine niche breadth evolution. We found that large differences in the breadth of carbon utilization traits between yeasts stem from intrinsic differences in genes encoding specific metabolic pathways, but limited evidence for trade-offs. These comprehensive data argue that intrinsic factors shape niche breadth variation in microbes. 
  4. Summary Plant metabolites from diverse pathways are important for plant survival, human nutrition and medicine. The pathway memberships of most plant enzyme genes are unknown. While co‐expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilising > 600 tomato (Solanum lycopersicum) expression data combinations, three strategies for predicting memberships in 85 pathways were explored. Optimal predictions for different pathways require distinct data combinations indicative of pathway functions. Naive prediction (i.e. identifying pathways with the most similarly expressed genes) is error prone. In 52 pathways, unsupervised learning performed better than supervised approaches, possibly due to limited training data availability. Using gene‐to‐pathway expression similarities led to prediction models that outperformed those based simply on expression levels. Using 36 experimentally validated genes, the pathway‐best model prediction accuracy is 58.3%, significantly better compared with that for predicting annotated genes without experimental evidence (37.0%) or random guess (1.2%), demonstrating the importance of data quality. Our study highlights the need to extensively explore expression‐based features and prediction strategies to maximise the accuracy of metabolic pathway membership assignment. The prediction framework outlined here can be applied to other species and serves as a baseline model for future comparisons.
  5. Abstract Evolutionary adaptation increases the fitness of a species in its environment. It can occur through rewiring of gene regulatory networks, such that an organism responds appropriately to environmental changes. We investigated whether sirtuin deacetylases, which repress transcription and require NAD+ for activity, serve as transcriptional rewiring points that facilitate the evolution of potentially adaptive traits. If so, bringing genes under the control of sirtuins could enable organisms to mount appropriate responses to stresses that decrease NAD+ levels. To explore how the genomic targets of sirtuins shift over evolutionary time, we compared two yeast species, Saccharomyces cerevisiae and Kluyveromyces lactis, that display differences in cellular metabolism and life cycle timing in response to nutrient availability. We identified sirtuin-regulated genes through a combination of chromatin immunoprecipitation and RNA expression. In both species, regulated genes were associated with NAD+ homeostasis, mating, and sporulation, but the specific genes differed. In addition, regulated genes in K. lactis were associated with other processes, including utilization of nonglucose carbon sources, detoxification of arsenic, and production of the siderophore pulcherrimin. Consistent with the species-restricted regulation of these genes, sirtuin deletion affected relevant phenotypes in K. lactis but not S. cerevisiae. Finally, sirtuin-regulated gene sets were depleted for broadly conserved genes, consistent with sirtuins regulating processes restricted to a few species. Taken together, these results are consistent with the notion that sirtuins serve as rewiring points that allow species to evolve distinct responses to low NAD+ stress. 