skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Shortest Hyperpaths in Directed Hypergraphs for Reaction Pathway Inference
Signaling and metabolic pathways, which consist of chains of reactions that produce target molecules from source compounds, are cornerstones of cellular biology. Properly modeling the reaction networks that represent such pathways requires directed hypergraphs, where each molecule or compound maps to a vertex, and each reaction maps to a hyperedge directed from its set of input reactants to its set of output products. Inferring the most likely series of reactions that produces a given set of targets from a given set of sources, where for each reaction its reactants are produced by prior reactions in the series, corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. We give the first exact algorithm for general shortest hyperpaths that can find provably optimal solutions for large, real-world, reaction networks. In particular, we derive a novel graph-theoretic characterization of hyperpaths, which we leverage in a new integer linear programming formulation of shortest hyperpaths that for the first time handles cycles, and develop a cutting-plane algorithm that can solve this integer linear program to optimality in practice. Through comprehensive experiments over all of the thousands of instances from the standard Reactome and NCI-PID reaction databases, we demonstrate that our cutting- plane algorithm quickly finds an optimal hyperpath—inferring the most likely pathway— with a median running time of under 10 seconds, and a maximum time of less than 30 minutes, even on instances with thousands of reactions. We also explore for the first time how well hyperpaths infer true pathways, and show that shortest hyperpaths accurately recover known pathways, typically with very high precision and recall. Source code implementing our cutting-plane algorithm for shortest hyperpaths is avail- able free for research use in a new tool called Mmunin.  more » « less
Award ID(s):
2041613
PAR ID:
10530392
Author(s) / Creator(s):
;
Publisher / Repository:
Mary Ann Liebert, Inc.
Date Published:
Journal Name:
Journal of Computational Biology
Volume:
30
Issue:
11
ISSN:
1557-8666
Page Range / eLocation ID:
1198 to 1225
Subject(s) / Keyword(s):
cell signaling networks combinatorial optimization cutting plane algorithms directed hypergraphs inferring pathways integer linear programming metabolic networks shortest hyperpaths systems biology
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Signaling and metabolic pathways, which consist of a series of reactions producing target molecules from source compounds, are cornerstones of cellular biology. The cellular reaction networks containing such pathways can be precisely modeled by directed hypergraphs, where each reaction corresponds to a hyperedge, directed from its set of reactants to its set of products. Given such a network represented by a directed hypergraph, inferring the most likely set of reactions that produce a given target from a given set of sources corresponds to finding a shortest hyperpath, which is NP-complete. The best methods currently available for shortest hyperpaths either offer no guarantee of optimality, or exclude hyperpaths containing cycles even though cycles are abundant in real biological pathways. We derive a novel graph-theoretic characterization of hyperpaths, leveraged in a new formulation of the general shortest hyperpath problem as an integer linear program that for the first time handles hyperpaths containing cycles, and present a novel cutting-plane algorithm that can solve this integer program to optimality in practice. This represents a major advance over the best prior exact algorithm, which was limited to acyclic hyperpaths (and hence fails to find a solution for the many biological instances where all hyperpaths are in fact cyclic). In comprehensive experiments over thousands of instances from the standard NCI-PID and Reactome databases, we demonstrate that our cutting-plane algorithm quickly finds an optimal hyperpath, with a median running-time of under ten seconds and a maximum time of around thirty minutes, even on large instances with many thousands of reactions. Source code implementing our cutting-plane algorithm for shortest hyperpaths in a new tool called Mmunin is available free for research use at http://mmunin.cs.arizona.edu. 
    more » « less
  2. Carbone, Alessandra; El-Kebir, Mohammed (Ed.)
    Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions then corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The state of the art for shortest hyperpaths in cell-signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees. We present for the first time a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. Its accuracy is demonstrated through exhaustive experiments on all instances from the standard NCI-PID and Reactome pathway databases, which show the heuristic finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all possible hyperpaths shows that the solution found by the heuristic is in fact optimal. 
    more » « less
  3. Abstract Background Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs , where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The current state-of-the-art for shortest hyperpaths in cell signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees. Results We present, for the first time, a heuristic for general shortest hyperpaths that properly handles cycles , and is guaranteed to be efficient . We show the heuristic finds provably optimal hyperpaths for the class of singleton-tail hypergraphs, and also give a practical algorithm for tractably generating all source-sink hyperpaths. The accuracy of the heuristic is demonstrated through comprehensive experiments on all source-sink instances from the standard NCI-PID and Reactome pathway databases, which show it finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all source-sink hyperpaths shows the solution found by the heuristic was in fact optimal . Conclusions The new shortest hyperpath heuristic is both fast and accurate . This makes finding source-sink hyperpaths, which in general may contain cycles, now practical for real cell signaling networks. Availability Source code for the hyperpath heuristic in a new tool we call  (as well as for hyperpath enumeration, and all dataset instances) is available free for non-commercial use at . 
    more » « less
  4. Ma, J (Ed.)
    Perhaps the most fundamental model in synthetic and sys- tems biology for inferring pathways in metabolic reaction networks is a metabolic factory: a system of reactions that starts from a set of source compounds and produces a set of target molecules, while conserving or not depleting intermediate metabolites. Finding a shortest factory—that minimizes a sum of real-valued weights on its reactions to infer the most likely pathway—is NP-complete. The current state-of-the-art for shortest factories solves a mixed-integer linear program with a major drawback: it requires the user to set a critical parameter, where too large a value can make optimal solutions infeasible, while too small a value can yield degenerate solutions due to numerical error. We present the first robust algorithm for optimal factories that is both parameter-free (relieving the user from determining a parameter setting) and degeneracy-free (guaranteeing it finds an optimal nondegen- erate solution). We also give for the first time a complete characterization of the graph-theoretic structure of shortest factories via cuts of hyper- graphs that reveals two important classes of degenerate solutions which were overlooked and potentially output by the prior state-of-the-art. In addition we settle the relationship between the two established pathway models of hyperpaths and factories by proving that hyperpaths are actu- ally a subclass of factories. Comprehensive experiments over all instances from the standard metabolic reaction databases in the literature demon- strate our algorithm is fast in practice, quickly finding optimal factories in large real-world networks containing thousands of reactions. A preliminary implementation of our algorithm for robust optimal factories in a new tool called Freeia is available free for research use at http://freeia.cs.arizona.edu. 
    more » « less
  5. We consider a variant of the vehicle routing problem (VRP) where each customer has a unit demand and the goal is to minimize the total cost of routing a fleet of capacitated vehicles from one or multiple depots to visit all customers. We propose two parallel algorithms to efficiently solve the column-generation-based linear-programming relaxation for this VRP. Specifically, we focus on algorithms for the “pricing problem,” which corresponds to the resource-constrained elementary shortest path problem. The first algorithm extends the pulse algorithm for which we derive a new bounding scheme on the maximum load of any route. The second algorithm is based on random coloring from parameterized complexity which can be also combined with other techniques in the literature for improving VRPs, including cutting planes and column enumeration. We conduct numerical studies using VRP benchmarks (with 50–957 nodes) and instances of a medical home care delivery problem using census data in Wayne County, Michigan. Using parallel computing, both pulse and random coloring can significantly improve column generation for solving the linear programming relaxations and we can obtain heuristic integer solutions with small optimality gaps. Combining random coloring with column enumeration, we can obtain improved integer solutions having less than 2% optimality gaps for most VRP benchmark instances and less than 1% optimality gaps for the medical home care delivery instances, both under a 30-minute computational time limit. The use of cutting planes (e.g., robust cuts) can further reduce optimality gaps on some hard instances, without much increase in the run time. Summary of Contribution: The vehicle routing problem (VRP) is a fundamental combinatorial problem, and its variants have been studied extensively in the literature of operations research and computer science. In this paper, we consider general-purpose algorithms for solving VRPs, including the column-generation approach for the linear programming relaxations of the integer programs of VRPs and the column-enumeration approach for seeking improved integer solutions. We revise the pulse algorithm and also propose a random-coloring algorithm that can be used for solving the elementary shortest path problem that formulates the pricing problem in the column-generation approach. We show that the parallel implementation of both algorithms can significantly improve the performance of column generation and the random coloring algorithm can improve the solution time and quality of the VRP integer solutions produced by the column-enumeration approach. We focus on algorithmic design for VRPs and conduct extensive computational tests to demonstrate the performance of various approaches. 
    more » « less