skip to main content


Title: QuTIE: quantum optimization for target identification by enzymes
Abstract Summary

Target identification by enzymes (TIE) problem aims to identify the set of enzymes in a given metabolic network, such that their inhibition eliminates a given set of target compounds associated with a disease while incurring minimum damage to the rest of the compounds. This is a NP-hard problem, and thus optimal solutions using classical computers fail to scale to large metabolic networks. In this article, we develop the first quantum optimization solution, called QuTIE (quantum optimization for target identification by enzymes), to this NP-hard problem. We do that by developing an equivalent formulation of the TIE problem in quadratic unconstrained binary optimization form. We then map it to a logical graph, and embed the logical graph on a quantum hardware graph. Our experimental results on 27 metabolic networks from Escherichia coli, Homo sapiens, and Mus musculus show that QuTIE yields solutions that are optimal or almost optimal. Our experiments also demonstrate that QuTIE can successfully identify enzyme targets already verified in wet-lab experiments for 14 major disease classes.

Availability and implementation

Code and sample data are available at: https://github.com/ngominhhoang/Quantum-Target-Identification-by-Enzymes.

 
more » « less
Award ID(s):
2111679
NSF-PAR ID:
10468327
Author(s) / Creator(s):
; ;
Editor(s):
Lengauer, Thomas
Publisher / Repository:
Oxford Academic
Date Published:
Journal Name:
Bioinformatics Advances
Volume:
3
Issue:
1
ISSN:
2635-0041
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    A factory in a metabolic network specifies how to produce target molecules from source compounds through biochemical reactions, properly accounting for reaction stoichiometry to conserve or not deplete intermediate metabolites. While finding factories is a fundamental problem in systems biology, available methods do not consider the number of reactions used, nor address negative regulation.

    Methods

    We introduce the new problem of finding optimal factories that use the fewest reactions, for the first time incorporating both first- and second-order negative regulation. We model this problem with directed hypergraphs, prove it is NP-complete, solve it via mixed-integer linear programming, and accommodate second-order negative regulation by an iterative approach that generates next-best factories.

    Results

    This optimization-based approach is remarkably fast in practice, typically finding optimal factories in a few seconds, even for metabolic networks involving tens of thousands of reactions and metabolites, as demonstrated through comprehensive experiments across all instances from standard reaction databases.

    Availability and implementation

    Source code for an implementation of our new method for optimal factories with negative regulation in a new tool called Odinn, together with all datasets, is available free for non-commercial use at http://odinn.cs.arizona.edu.

     
    more » « less
  2. Signaling and metabolic pathways, which consist of a series of reactions producing target molecules from source compounds, are cornerstones of cellular biology. The cellular reaction networks containing such pathways can be precisely modeled by directed hypergraphs, where each reaction corresponds to a hyperedge, directed from its set of reactants to its set of products. Given such a network represented by a directed hypergraph, inferring the most likely set of reactions that produce a given target from a given set of sources corresponds to finding a shortest hyperpath, which is NP-complete. The best methods currently available for shortest hyperpaths either offer no guarantee of optimality, or exclude hyperpaths containing cycles even though cycles are abundant in real biological pathways. We derive a novel graph-theoretic characterization of hyperpaths, leveraged in a new formulation of the general shortest hyperpath problem as an integer linear program that for the first time handles hyperpaths containing cycles, and present a novel cutting-plane algorithm that can solve this integer program to optimality in practice. This represents a major advance over the best prior exact algorithm, which was limited to acyclic hyperpaths (and hence fails to find a solution for the many biological instances where all hyperpaths are in fact cyclic). In comprehensive experiments over thousands of instances from the standard NCI-PID and Reactome databases, we demonstrate that our cutting-plane algorithm quickly finds an optimal hyperpath, with a median running-time of under ten seconds and a maximum time of around thirty minutes, even on large instances with many thousands of reactions. Source code implementing our cutting-plane algorithm for shortest hyperpaths in a new tool called Mmunin is available free for research use at http://mmunin.cs.arizona.edu. 
    more » « less
  3. LaValle, Steve M. ; Lin, Ming ; Ojala, Timo ; Shell, Dylan ; Yu, Jingjin (Ed.)
    The line coverage problem is the task of servicing a given set of one-dimensional features in an environment. Its applications include the inspection of road networks, power lines, and oil and gas lines. The line coverage problem is a generalization of the standard arc routing problems, and is NP-hard in general. We address the single robot line coverage problem where the service and deadhead costs are distinct and asymmetric. We model the problem as an optimization problem that minimizes the total cost of travel on a given graph. We present approximation algorithms to obtain bounded solutions efficiently, using the minimum cost flow problem. We build the main algorithm in stages by considering three simpler subproblems. The subproblems are based on the structure of the required graph, i.e., the graph induced by the features that require servicing. We fi rst present an optimal algorithm for the case of Eulerian graphs with only required edges. Next we consider general graphs, not necessarily Eulerian, with only required edges and present a 2-approximation algorithm. Finally, we consider the general case with both required and non-required edges. The approximation algorithm is dependent on the Asymmetric Traveling Salesperson Problem (ATSP), and is bounded by alpha(C) + 2, where alpha(C) is the approximation factor of the ATSP algorithm with C connected components. Our upper bound is also an improvement over the existing results for the asymmetric rural postman problem. 
    more » « less
  4. Abstract Motivation

    Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees.

    Results

    We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T.

    Availability and implementation

    https://github.com/elkebir-group/MCT.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Influence maximization aims to select k most-influential vertices or seeds in a network, where influence is defined by a given diffusion process. Although computing optimal seed set is NP-Hard, efficient approximation algorithms exist. However, even state-of-the-art parallel implementations are limited by a sampling step that incurs large memory footprints. This in turn limits the problem size reach and approximation quality. In this work, we study the memory footprint of the sampling process collecting reverse reachability information in the IMM (Influence Maximization via Martingales) algorithm over large real-world social networks. We present a memory-efficient optimization approach (called HBMax) based on Ripples, a state-of-the-art multi-threaded parallel influence maximization solution. Our approach, HBMax, uses a portion of the reverse reachable (RR) sets collected by the algorithm to learn the characteristics of the graph. Then, it compresses the intermediate reverse reachability information with Huffman coding or bitmap coding, and queries on the partially decoded data, or directly on the compressed data to preserve the memory savings obtained through compression. Considering a NUMA architecture, we scale up our solution on 64 CPU cores and reduce the memory footprint by up to 82.1% with average 6.3% speedup (encoding overhead is offset by performance gain from memory reduction) without loss of accuracy. For the largest tested graph Twitter7 (with 1.4 billion edges), HBMax achieves 5.9× compression ratio and 2.2× speedup. 
    more » « less