Title: Estimating Cellular Goals from High-Dimensional Biological Data
Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, the measured macromolecular composition of cells, and measurements of the cell's growth rate and metabolism in different conditions. The cellular goal (the optimization problem that the cell is trying to solve) can be challenging to derive experimentally for many organisms, including human or mammalian cells, which have complex metabolic capabilities and are not well understood. Existing approaches to learning goals from data include (a) estimating a linear objective function, or (b) estimating linear constraints that model complex biochemical reactions and constrain the cell's operation. The latter approach is important because the known reactions are often not enough to explain observations; therefore, there is a need to automatically extend the model complexity by learning new reactions. However, this leads to nonconvex optimization problems, and existing tools cannot scale to realistically large metabolic models. Hence, constraint estimation is still used sparingly despite its benefits for modeling cell metabolism, which is important for developing novel antimicrobials against pathogens, discovering cancer drug targets, and producing value-added chemicals. Here, we develop the first approach to estimating constraint reactions from data that can scale to realistically large metabolic models. Previous tools were used on problems having fewer than 75 reactions and 60 metabolites, which limits real-life-size applications. We perform extensive experiments using 75 large-scale metabolic network models for different organisms (including bacteria, yeasts, and mammals) and show that our algorithm can recover cellular constraint reactions. The recovered constraints enable accurate prediction of metabolic states in hundreds of growth environments not seen in training data, and we recover useful cellular goals even when some measurements are missing.
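
The abstract refers to linear constraints and a linear objective without showing either. The sketch below is not the paper's algorithm; it only illustrates, on a hypothetical five-reaction toy network, what it means for a flux vector to satisfy steady-state mass balance and flux bounds, and how a linear objective scores it. The network, the bounds, the objective weights, and all names (`S`, `LOWER`, `UPPER`, `C_OBJ`, `is_feasible`) are assumptions made for illustration.

```python
# Hypothetical toy network (not from the paper):
#   r0: uptake -> A,  r1: A -> B,  r2: A -> C,
#   r3: B -> biomass, r4: C -> byproduct
# Rows are metabolites (A, B, C); columns are reactions r0..r4.
S = [
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0, -1,  0],  # B
    [0,  0,  1,  0, -1],  # C
]
LOWER = [0, 0, 0, 0, 0]        # assumed lower flux bounds
UPPER = [10, 10, 10, 10, 10]   # assumed upper flux bounds
C_OBJ = [0, 0, 0, 1, 0]        # assumed objective: flux through r3 (biomass)


def mass_balance(S, v):
    """Return S @ v, the net production rate of each metabolite."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def is_feasible(S, v, lower, upper, tol=1e-9):
    """True if v is at steady state (S v = 0) and within its bounds."""
    balanced = all(abs(r) <= tol for r in mass_balance(S, v))
    bounded = all(lo - tol <= x <= hi + tol
                  for lo, x, hi in zip(lower, v, upper))
    return balanced and bounded


def objective(c, v):
    """Linear objective value c^T v."""
    return sum(ci * vi for ci, vi in zip(c, v))


# Three candidate flux vectors.
v_all_biomass = [10, 10, 0, 10, 0]  # all uptake routed to biomass
v_split = [10, 6, 4, 6, 4]          # uptake split between both branches
v_unbalanced = [10, 10, 0, 5, 0]    # metabolite B accumulates

for name, v in [("all_biomass", v_all_biomass),
                ("split", v_split),
                ("unbalanced", v_unbalanced)]:
    print(name, is_feasible(S, v, LOWER, UPPER), objective(C_OBJ, v))
```

In this toy setting, learning a goal in the sense of approach (a) would mean choosing `C_OBJ` so that observed flux vectors score best among the feasible ones, while approach (b) would mean adding columns or rows to the constraint system; a realistic model would be solved with a linear-programming solver rather than by checking hand-picked candidates.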
Journal Name:
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Page Range or eLocation-ID:
2202 to 2211
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Constraint-based modeling has been applied to analyze metabolism of numerous organisms via flux balance analysis and genome-scale metabolic models, including mammalian cells such as the Chinese hamster ovary (CHO) cells, the principal cell factory platform for therapeutic protein production. Unfortunately, the application of genome-scale model methodologies using the conventional biomass objective function is challenged by the presence of overly-restrictive constraints, including essential amino acid exchange fluxes that can lead to improper predictions of growth rates and intracellular flux distributions. In this study, these constraints are found to be reliably predicted by an “essential nutrient minimization” approach. After modifying these constraints with the predicted minimal uptake values, a series of unconventional objective functions are applied to minimize each individual non-essential nutrient uptake rate, revealing useful insights about metabolic exchange rates and flows across different cell lines and culture conditions. This unconventional uptake-rate objective functions (UOFs) approach is able to distinguish metabolic differences between three distinct CHO cell lines (CHO-K1, -DG44, and -S) not directly observed using the conventional biomass growth maximization solutions. Further, a comparison of model predictions with experimental data from literature correctly correlates with the specific CHO-DG44-derived cell line used experimentally, and the corresponding dual prices provide fruitful information concerning coupling relationships between nutrients. The UOFs approach is likely to be particularly suited for mammalian cells and other complex organisms which contain multiple distinct essential nutrient inputs, and may offer enhanced applicability for characterizing cell metabolism and physiology as well as media optimization and biomanufacturing control.
  2. Abstract Background

    Genome-scale metabolic network models and constraint-based modeling techniques have become important tools for analyzing cellular metabolism. Thermodynamically infeasible cycles (TICs) causing unbounded metabolic flux ranges are often encountered. TICs satisfy the mass balance and directionality constraints but violate the second law of thermodynamics. Current practices involve implementing additional constraints to ensure not only optimal but also loopless flux distributions. However, the mixed-integer linear programming problems that must then be solved become computationally intractable for genome-scale metabolic models.


    We aimed to identify the fewest needed constraints sufficient for optimality under the loopless requirement. We found that loopless constraints are required only for the reactions that share elementary flux modes representing TICs with reactions that are part of the objective function. We put forth the concept of localized loopless constraints (LLCs) to enforce this minimal required set of loopless constraints. By combining with a novel procedure for minimal null-space calculation, the computational time for loopless flux variability analysis (ll-FVA) is reduced by a factor of 10–150 compared to the original loopless constraints and by 4–20 times compared to the current fastest method Fast-SNP, with the percent improvement increasing with model size. Importantly, LLCs offer a scalable strategy for loopless flux calculations for multi-compartment/multi-organism models of large sizes, for example, shortening the CPU time for ll-FVA from 35 h to less than 2 h for a model with more than 10^4 reactions.

    Availability and implementation

    Matlab functions are available in the Supplementary Material or at

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  3. ABSTRACT Microbes face a trade-off between being metabolically independent and relying on neighboring organisms for the supply of some essential metabolites. This balance of conflicting strategies affects microbial community structure and dynamics, with important implications for microbiome research and synthetic ecology. A “gedanken” (thought) experiment to investigate this trade-off would involve monitoring the rise of mutual dependence as the number of metabolic reactions allowed in an organism is increasingly constrained. The expectation is that below a certain number of reactions, no individual organism would be able to grow in isolation and cross-feeding partnerships and division of labor would emerge. We implemented this idealized experiment using in silico genome-scale models. In particular, we used mixed-integer linear programming to identify trade-off solutions in communities of Escherichia coli strains. The strategies that we found revealed a large space of opportunities in nuanced and nonintuitive metabolic division of labor, including, for example, splitting the tricarboxylic acid (TCA) cycle into two separate halves. The systematic computation of possible solutions in division of labor for 1-, 2-, and 3-strain consortia resulted in a rich and complex landscape. This landscape displayed a nonlinear boundary, indicating that the loss of an intracellular reaction was not necessarily compensated for by a single imported metabolite. Different regions in this landscape were associated with specific solutions and patterns of exchanged metabolites. Our approach also predicts the existence of regions in this landscape where independent bacteria are viable but are outcompeted by cross-feeding pairs, providing a possible incentive for the rise of division of labor.
    IMPORTANCE Understanding how microbes assemble into communities is a fundamental open issue in biology, relevant to human health, metabolic engineering, and environmental sustainability. A possible mechanism for interactions of microbes is through cross-feeding, i.e., the exchange of small molecules. These metabolic exchanges may allow different microbes to specialize in distinct tasks and evolve division of labor. To systematically explore the space of possible strategies for division of labor, we applied advanced optimization algorithms to computational models of cellular metabolism. Specifically, we searched for communities able to survive under constraints (such as a limited number of reactions) that would not be sustainable by individual species. We found that predicted consortia partition metabolic pathways in ways that would be difficult to identify manually, possibly providing a competitive advantage over individual organisms. In addition to helping understand diversity in natural microbial communities, our approach could assist in the design of synthetic consortia.
  4. Chinese hamster ovary (CHO) cells are the most commonly used cell lines in biopharmaceutical manufacturing. Genome-scale metabolic models have become a valuable tool to study cellular metabolism. Despite the availability of a reference global genome-scale CHO model, context-specific metabolic models may still be required for specific cell lines (for example, CHO-K1, CHO-S, and CHO-DG44) and for specific process conditions. Many integration algorithms are available to reconstruct specific genome-scale models. These methods are mainly based on integrating omics data (i.e., transcriptomics, proteomics, and metabolomics) into reference genome-scale models. In the present study, we aimed to investigate the impact of the time points of transcriptomics integration on the genome-scale CHO model by assessing the prediction of growth rates with each reconstructed model. We also evaluated the feasibility of applying extracted models to different cell lines (generated from the same parental cell line). Our findings illustrate that gene expression at various stages of culture slightly impacts the reconstructed models. However, growth-rate prediction remains robust not only across different growth phases but also when the models are extended to other cell lines.
  5. Mackelprang, Rachel (Ed.)
    ABSTRACT Microbial acclimation to different temperature conditions can involve broad changes in cell composition and metabolic efficiency. A systems-level view of these metabolic responses in nonmesophilic organisms, however, is currently missing. In this study, thermodynamically constrained genome-scale models were applied to simulate the metabolic responses of a deep-sea psychrophilic bacterium, Shewanella psychrophila WP2, under suboptimal (4°C), optimal (15°C), and supraoptimal (20°C) growth temperatures. The models were calibrated with experimentally determined growth rates of WP2. Gibbs free energy change of reactions (ΔrG′), metabolic fluxes, and metabolite concentrations were predicted using random simulations to characterize temperature-dependent changes in the metabolism. The modeling revealed the highest metabolic efficiency at the optimal temperature, and it suggested distinct patterns of ATP production and consumption that could lead to lower metabolic efficiency under suboptimal or supraoptimal temperatures. The modeling also predicted rearrangement of fluxes through multiple metabolic pathways, including the glycolysis pathway, Entner-Doudoroff pathway, tricarboxylic acid (TCA) cycle, and electron transport system, and these predictions were corroborated through comparisons to WP2 transcriptomes. Furthermore, predictions of metabolite concentrations revealed the potential conservation of reducing equivalents and ATP in the suboptimal temperature, consistent with experimental observations from other psychrophiles. Taken together, the WP2 models provided mechanistic insights into the metabolism of a psychrophile in response to different temperatures.
    IMPORTANCE Metabolic flexibility is a central component of any organism’s ability to survive and adapt to changes in environmental conditions. This study represents the first application of thermodynamically constrained genome-scale models in simulating the metabolic responses of a deep-sea psychrophilic bacterium to various temperatures. The models predicted differences in metabolic efficiency that were attributed to changes in metabolic pathway utilization and metabolite concentration during growth under optimal and nonoptimal temperatures. Experimental growth measurements were used for model calibration, and temperature-dependent transcriptomic changes corroborated the model-predicted rearrangement of metabolic fluxes. Overall, this study highlights the utility of modeling approaches in studying the temperature-driven metabolic responses of an extremophilic organism.