skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Leibniz International Proceedings in Informatics (LIPIcs):23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Not only do many biological populations undergo evolution, but population members may also migrate from one location to another. For example, tumor cells may migrate from the primary tumor and seed a new metastasis, and pathogens may migrate from one host to another. One may represent a population’s migration history by labeling the vertices of a given phylogeny T with locations such that an edge incident to vertices with distinct locations represents a migration. Additionally, in some biological populations, taxa from distinct lineages may comigrate from one location to another in a single event, a phenomenon known as a comigration. Here, we show that a previous problem statement for inferring migration histories that are parsimonious in terms of migrations and comigrations may lead to temporally inconsistent solutions. To remedy this deficiency, we introduce precise definitions of temporal consistency of comigrations in a phylogeny, leading to three successive problems. First, we formulate the Temporally Consistent Comigrations (TCC) problem to check if a set of comigrations is temporally consistent and provide a linear time algorithm for solving this problem. Second, we formulate the Parsimonious Consistent Comigration (PCC) problem, which aims to find comigrations given a location labeling of a phylogeny. We show that PCC is NP-hard. Third, we formulate the Parsimonious Consistent Comigration History (PCCH) problem, which infers the migration history given a phylogeny and locations of its extant vertices only. We show that PCCH is NP-hard as well. On the positive side, we propose integer linear programming models to solve the PCC and PCCH problems. We apply our approach to real and simulated data.  more » « less
Award ID(s):
2046488
PAR ID:
10500546
Author(s) / Creator(s):
; ;
Editor(s):
Belazzougui, Djamal; Ouangraoua, Aïda
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Subject(s) / Keyword(s):
Metastasis Migration Integer Linear Programming Maximum parsimony Applied computing → Computational biology
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the Maximum Independent Set problem we are asked to find a set of pairwise nonadjacent vertices in a given graph with the maximum possible cardinality. In general graphs, this classical problem is known to be NP-hard and hard to approximate within a factor of n1−ε for any ε > 0. Due to this, investigating the complexity of Maximum Independent Set in various graph classes in hope of finding better tractability results is an active research direction. In H-free graphs, that is, graphs not containing a fixed graph H as an induced subgraph, the problem is known to remain NP-hard and APX-hard whenever H contains a cycle, a vertex of degree at least four, or two vertices of degree at least three in one connected component. For the remaining cases, where every component of H is a path or a subdivided claw, the complexity of Maximum Independent Set remains widely open, with only a handful of polynomial-time solvability results for small graphs H such as P5, P6, the claw, or the fork. We prove that for every such “possibly tractable” graph H there exists an algorithm that, given an H-free graph G and an accuracy parameter ε > 0, finds an independent set in G of cardinality within a factor of (1 – ε) of the optimum in time exponential in a polynomial of log | V(G) | and ε−1. That is, we show that for every graph H for which Maximum Independent Set is not known to be APX-hard in H-free graphs, the problem admits a quasi-polynomial time approximation scheme in this graph class. Our algorithm works also in the more general weighted setting, where the input graph is supplied with a weight function on vertices and we are maximizing the total weight of an independent set. 
    more » « less
  2. null (Ed.)
    Abstract Motivation The combination of genomic and epidemiological data holds the potential to enable accurate pathogen transmission history inference. However, the inference of outbreak transmission histories remains challenging due to various factors such as within-host pathogen diversity and multi-strain infections. Current computational methods ignore within-host diversity and/or multi-strain infections, often failing to accurately infer the transmission history. Thus, there is a need for efficient computational methods for transmission tree inference that accommodate the complexities of real data. Results We formulate the direct transmission inference (DTI) problem for inferring transmission trees that support multi-strain infections given a timed phylogeny and additional epidemiological data. We establish hardness for the decision and counting version of the DTI problem. We introduce Transmission Tree Uniform Sampler (TiTUS), a method that uses SATISFIABILITY to almost uniformly sample from the space of transmission trees. We introduce criteria that prioritize parsimonious transmission trees that we subsequently summarize using a novel consensus tree approach. We demonstrate TiTUS’s ability to accurately reconstruct transmission trees on simulated data as well as a documented HIV transmission chain. Availability and implementation https://github.com/elkebir-group/TiTUS. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. null (Ed.)
    Abstract Motivation While each cancer is the result of an isolated evolutionary process, there are repeated patterns in tumorigenesis defined by recurrent driver mutations and their temporal ordering. Such repeated evolutionary trajectories hold the potential to improve stratification of cancer patients into subtypes with distinct survival and therapy response profiles. However, current cancer phylogeny methods infer large solution spaces of plausible evolutionary histories from the same sequencing data, obfuscating repeated evolutionary patterns. Results To simultaneously resolve ambiguities in sequencing data and identify cancer subtypes, we propose to leverage common patterns of evolution found in patient cohorts. We first formulate the Multiple Choice Consensus Tree problem, which seeks to select a tumor tree for each patient and assign patients into clusters in such a way that maximizes consistency within each cluster of patient trees. We prove that this problem is NP-hard and develop a heuristic algorithm, Revealing Evolutionary Consensus Across Patients (RECAP), to solve this problem in practice. Finally, on simulated data, we show RECAP outperforms existing methods that do not account for patient subtypes. We then use RECAP to resolve ambiguities in patient trees and find repeated evolutionary trajectories in lung and breast cancer cohorts. Availability and implementation https://github.com/elkebir-group/RECAP. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  4. Cognitive radio (CR) technology is envisioned to use wireless spectrum opportunistically when the primary user (PU) is not using it. In cognitive radio ad-hoc networks (CRAHNs), the mobile users form a distributed multi-hop network using the unused spectrum. The qualities of the channels are different in different locations. When a user moves from one place to another, it needs to switch the channel to maintain the quality-of-service (QoS) required by different applications. The QoS of a channel depends on the amount of usage. A user can select the channels that meet the QoS requirement during its movement. In this paper, we study the mobility patterns of users, predict their next locations and probabilities to move there based on its history. We extract the mobility patterns from each user’s location history and match the recent trajectory with the patterns to find future locations. We construct a spectrum database using Wi-Fi access point location data and the free space path loss formula. We propose a machine learning-based mechanism to predict spectrum status of some missing locations in the spectrum database. We formulate a problem to select the current channel in order to minimize the total number of channel switches during a certain number of next moves of a user. We conduct an extensive simulation combining real and synthetic datasets to support our model. 
    more » « less
  5. null (Ed.)
    Consider the following two fundamental open problems in complexity theory: • Does a hard-on-average language in NP imply the existence of one-way functions? • Does a hard-on-average language in NP imply a hard-on-average problem in TFNP (i.e., the class of total NP search problem)? Our main result is that the answer to (at least) one of these questions is yes. Both one-way functions and problems in TFNP can be interpreted as promise-true distributional NP search problems—namely, distributional search problems where the sampler only samples true statements. As a direct corollary of the above result, we thus get that the existence of a hard-on-average distributional NP search problem implies a hard-on-average promise-true distributional NP search problem. In other words, It is no easier to find witnesses (a.k.a. proofs) for efficiently-sampled statements (theorems) that are guaranteed to be true. This result follows from a more general study of interactive puzzles—a generalization of average-case hardness in NP—and in particular, a novel round-collapse theorem for computationallysound protocols, analogous to Babai-Moran’s celebrated round-collapse theorem for informationtheoretically sound protocols. As another consequence of this treatment, we show that the existence of O(1)-round public-coin non-trivial arguments (i.e., argument systems that are not proofs) imply the existence of a hard-on-average problem in NP/poly. 
    more » « less