skip to main content


Title: Accelerating flux balance calculations in genome-scale metabolic models by localizing the application of loopless constraints
Abstract Background

Genome-scale metabolic network models and constraint-based modeling techniques have become important tools for analyzing cellular metabolism. Thermodynamically infeasible cycles (TICs) causing unbounded metabolic flux ranges are often encountered. TICs satisfy the mass balance and directionality constraints but violate the second law of thermodynamics. Current practices involve implementing additional constraints to ensure not only optimal but also loopless flux distributions. However, the mixed integer linear programming problems required to solve become computationally intractable for genome-scale metabolic models.

Results

We aimed to identify the fewest needed constraints sufficient for optimality under the loopless requirement. We found that loopless constraints are required only for the reactions that share elementary flux modes representing TICs with reactions that are part of the objective function. We put forth the concept of localized loopless constraints (LLCs) to enforce this minimal required set of loopless constraints. By combining with a novel procedure for minimal null-space calculation, the computational time for loopless flux variability analysis (ll-FVA) is reduced by a factor of 10–150 compared to the original loopless constraints and by 4–20 times compared to the current fastest method Fast-SNP with the percent improvement increasing with model size. Importantly, LLCs offer a scalable strategy for loopless flux calculations for multi-compartment/multi-organism models of large sizes, for example, shortening the CPU time for ll-FVA from 35 h to less than 2 h for a model with more than104 reactions.

Availability and implementation

Matlab functions are available in the Supplementary Material or at https://github.com/maranasgroup/lll-FVA

Supplementary information

Supplementary data are available at Bioinformatics online.

 
more » « less
NSF-PAR ID:
10393428
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
34
Issue:
24
ISSN:
1367-4803
Page Range / eLocation ID:
p. 4248-4255
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Summary

    In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.

    Availability and Implementation

    https://github.com/opencobra/cobratoolbox.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  2. Abstract

    Constraint‐based reconstruction and analysis (COBRA) modeling results can be difficult to interpret given the large numbers of reactions in genome‐scale models. While paths in metabolic networks can be found, existing methods are not easily combined with constraint‐based approaches. To address this limitation, two tools (MapMaker and PathTracer) were developed to find paths (including cycles) between metabolites, where each step transfers carbon from reactant to product. MapMaker predicts carbon transfer maps (CTMs) between metabolites using only information on molecular formulae and reaction stoichiometry, effectively determining which reactants and products share carbon atoms. MapMaker correctly assigned CTMs for over 97% of the 2,251 reactions in anEscherichia colimetabolic model (iJO1366). Using CTMs as inputs, PathTracer finds paths between two metabolites. PathTracer was applied to iJO1366 to investigate the importance of using CTMs and COBRA constraints when enumerating paths, to find active and high flux paths in flux balance analysis (FBA) solutions, to identify paths for putrescine utilization, and to elucidate a potential CO2fixation pathway inE. coli. These results illustrate how MapMaker and PathTracer can be used in combination with constraint‐based models to identify feasible, active, and high flux paths between metabolites.

     
    more » « less
  3. Abstract Motivation

    A chronogram is a dated phylogenetic tree whose branch lengths have been scaled to represent time. Such chronograms are computed based on available date estimates (e.g. from dated fossils), which provide absolute time constraints for one or more nodes of an input undated phylogeny, coupled with an appropriate underlying model for evolutionary rates variation along the branches of the phylogeny. However, traditional methods for phylogenetic dating cannot take into account relative time constraints, such as those provided by inferred horizontal transfer events. In many cases, chronograms computed using only absolute time constraints are inconsistent with known relative time constraints.

    Results

    In this work, we introduce a new approach, Dating Trees using Relative constraints (DaTeR), for phylogenetic dating that can take into account both absolute and relative time constraints. The key idea is to use existing Bayesian approaches for phylogenetic dating to sample posterior chronograms satisfying desired absolute time constraints, minimally adjust or ‘error-correct’ these sampled chronograms to satisfy all given relative time constraints, and aggregate across all error-corrected chronograms. DaTeR uses a constrained optimization framework for the error-correction step, finding minimal deviations from previously assigned dates or branch lengths. We applied DaTeR to a biological dataset of 170 Cyanobacterial taxa and a reliable set of 24 transfer-based relative constraints, under six different molecular dating models. Our extensive analysis of this dataset demonstrates that DaTeR is both highly effective and scalable and that its application can significantly improve estimated chronograms.

    Availability and implementation

    Freely available from https://compbio.engr.uconn.edu/software/dater/

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  4. Abstract Motivation

    Predicting the secondary structure of an ribonucleic acid (RNA) sequence is useful in many applications. Existing algorithms [based on dynamic programming] suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications.

    Results

    We present a novel alternative O(n3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5′-to-3′) direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models.

    Availability and implementation

    Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100 000nt).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract

    Constraint-based modeling has been applied to analyze metabolism of numerous organisms via flux balance analysis and genome-scale metabolic models, including mammalian cells such as the Chinese hamster ovary (CHO) cells—the principal cell factory platform for therapeutic protein production. Unfortunately, the application of genome-scale model methodologies using the conventional biomass objective function is challenged by the presence of overly-restrictive constraints, including essential amino acid exchange fluxes that can lead to improper predictions of growth rates and intracellular flux distributions. In this study, these constraints are found to be reliably predicted by an “essential nutrient minimization” approach. After modifying these constraints with the predicted minimal uptake values, a series of unconventional objective functions are applied to minimize each individual non-essential nutrient uptake rate, revealing useful insights about metabolic exchange rates and flows across different cell lines and culture conditions. This unconventional uptake-rate objective functions (UOFs) approach is able to distinguish metabolic differences between three distinct CHO cell lines (CHO-K1, -DG44, and -S) not directly observed using the conventional biomass growth maximization solutions. Further, a comparison of model predictions with experimental data from literature correctly correlates with the specific CHO-DG44-derived cell line used experimentally, and the corresponding dual prices provide fruitful information concerning coupling relationships between nutrients. The UOFs approach is likely to be particularly suited for mammalian cells and other complex organisms which contain multiple distinct essential nutrient inputs, and may offer enhanced applicability for characterizing cell metabolism and physiology as well as media optimization and biomanufacturing control.

     
    more » « less