Search for: All records
Editors contains: "Puglisi, Simon J"

  1. Inenaga, Shunsuke; Puglisi, Simon J (Ed.)
    Within the field of haplotype analysis, the Positional Burrows-Wheeler Transform (PBWT) stands out as a key innovation, addressing numerous challenges in genomics. For example, Sanaullah et al. introduced a PBWT-based method for the haplotype threading problem, which involves representing a query haplotype through a minimal set of substrings. To solve this problem using the PBWT data structure, they formulate the Minimal Positional Substring Cover (MPSC) problem and subsequently present a solution for it. Additionally, they present and solve several variants of this problem: k-MPSC, leftmost MPSC, rightmost MPSC, and length-maximal MPSC. Yet, a full PBWT is required for each of their solutions, which imposes a significant memory requirement. Here, we take advantage of the latest results on run-length encoding the PBWT to solve the MPSC problem in a sublinear amount of space. Our methods involve demonstrating that k-Set Maximal Exact Matches (k-SMEMs) can be computed in a sublinear amount of space via efficient computation of k-Matching Statistics (k-MS). This leads to a solution that requires sublinear space not only for the MPSC problem but for all of its variants proposed by Sanaullah et al. Most importantly, we present experimental results on haplotype panels from the 1000 Genomes Project that show the utility of these theoretical results. We conclusively demonstrate that our approach markedly decreases the memory required to solve the MPSC problem, achieving a reduction of at least two orders of magnitude compared to the method proposed by Sanaullah et al. This efficiency allows us to solve large instances of the problem to which other methods are unable to scale. In summary, the creation of μ-PBWT paves the way for new possibilities in conducting in-depth genetic research and analysis on a large scale. All source code is publicly available at https://github.com/dlcgold/muPBWT/tree/k-smem.
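    For readers unfamiliar with the PBWT, the sketch below builds the positional prefix arrays at its core (Durbin's construction) for a toy haplotype panel. It is only an illustration of the underlying data structure; the run-length-encoded μ-PBWT and the k-SMEM / k-MS machinery described above are not reproduced here, and the panel is made up.

```python
# Minimal sketch of the positional prefix-array update behind the PBWT
# (Durbin 2014, Algorithm 1). Illustrative only: the mu-PBWT and the
# MPSC / k-SMEM algorithms described above add run-length encoding and
# matching statistics on top of this.

def pbwt_prefix_arrays(panel):
    """panel: list of equal-length 0/1 haplotype strings (rows = haplotypes)."""
    m = len(panel)
    n = len(panel[0]) if m else 0
    a = list(range(m))      # a sorts haplotypes by their reversed length-k prefixes
    arrays = [a[:]]
    for k in range(n):
        zeros, ones = [], []
        for h in a:         # stable partition by the allele at site k
            (zeros if panel[h][k] == "0" else ones).append(h)
        a = zeros + ones
        arrays.append(a[:])
    return arrays

if __name__ == "__main__":
    panel = ["0101", "0011", "0111", "1101"]
    for k, a in enumerate(pbwt_prefix_arrays(panel)):
        print(k, a)
```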
  2. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J.; Herman, Grzegorz (Ed.)
    In this paper, we study efficient parallel edit distance algorithms, both in theory and in practice. Given two strings A[1..n] and B[1..m], and a set of operations allowed to edit the strings, the edit distance between A and B is the minimum number of operations required to transform A into B. In this paper, we use edit distance to refer to the Levenshtein distance, which allows for unit-cost single-character edits (insertions, deletions, substitutions). Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with Θ(nm) cost. In many real-world applications, the strings to be compared are similar to each other and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., algorithms that achieve asymptotically better cost bounds than the standard Θ(nm) algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS), and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using a parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaptation to make it output-sensitive. All of the algorithms studied in this paper have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space. We test and compare our algorithms on both synthetic data and real-world data, including DNA sequences, Wikipedia texts, GitHub repositories, etc. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. On cases with fewer than 10⁵ edits, our algorithm can process input sequences of size 10⁹ in about ten seconds, while ParlayLib can only process sequences of sizes up to 10⁶ in the same amount of time. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.
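    To make "output-sensitive" concrete, here is a small sequential sketch of a banded dynamic program with band doubling (Ukkonen-style), whose cost grows with the edit distance d rather than with nm. It is not the paper's parallel BFS- or DaC-based algorithm and uses no LCP structure; it only illustrates the cost regime being targeted.

```python
# Output-sensitive (sequential) Levenshtein distance: banded DP with band
# doubling, roughly O((|a| + |b|) * d) time where d is the answer. This is a
# baseline illustration, not the parallel algorithms discussed above.

def edit_distance(a, b):
    t = max(1, abs(len(a) - len(b)))
    while True:
        d = _banded(a, b, t)
        if d <= t:              # band was wide enough, so the value is exact
            return d
        t *= 2

def _banded(a, b, t):
    """Edit distance restricted to the band |i - j| <= t (cells outside are +inf)."""
    n, m = len(a), len(b)
    INF = float("inf")
    prev = [j if j <= t else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= t:
            cur[0] = i
        for j in range(max(1, i - t), min(m, i + t) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,   # match / substitution
                         prev[j] + 1,          # delete a[i-1]
                         cur[j - 1] + 1)       # insert b[j-1]
        prev = cur
    return prev[m]

if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```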
  3. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We consider variants of the classic Multiway Cut problem. Multiway Cut asks to partition a graph G into k parts so as to separate k given terminals. Recently, Chandrasekaran and Wang (ESA 2021) introduced l_p-norm Multiway Cut, a generalization of the problem, in which the goal is to minimize the l_p norm of the edge boundaries of the k parts. We provide an O(log^{1/2} n log^{1/2 + 1/p} k) approximation algorithm for this problem, improving upon the approximation guarantee of O(log^{3/2} n log^{1/2} k) due to Chandrasekaran and Wang. We also introduce and study Norm Multiway Cut, a further generalization of Multiway Cut. We assume that we are given access to an oracle, which answers certain queries about the norm. We present an O(log^{1/2} n log^{7/2} k) approximation algorithm with a weaker oracle and an O(log^{1/2} n log^{5/2} k) approximation algorithm with a stronger oracle. Additionally, we show that, without any oracle access, there is no n^{1/4-ε} approximation algorithm for any ε > 0, assuming the Hypergraph Dense-vs-Random Conjecture.
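    As a small illustration of the objective being minimized, the snippet below evaluates the l_p norm of the edge boundaries for a given partition of a toy graph; the graph, partition, and terminals are made up, and finding the minimizing partition is the hard part the algorithms above address.

```python
# Evaluate the l_p-norm Multiway Cut objective for a candidate partition:
# the l_p norm of the vector of edge-boundary sizes of the k parts.

def lp_norm_cut_value(edges, parts, p):
    """edges: iterable of (u, v) pairs; parts: list of vertex sets; p: norm exponent."""
    boundary = [0] * len(parts)
    for u, v in edges:
        for idx, part in enumerate(parts):
            # an edge lies on the boundary of a part if exactly one endpoint is inside
            if (u in part) != (v in part):
                boundary[idx] += 1
    return sum(b ** p for b in boundary) ** (1.0 / p)

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
    parts = [{1, 2}, {3}, {4}]                   # terminals 1, 3, 4 kept in separate parts
    print(lp_norm_cut_value(edges, parts, p=2))  # sqrt(3^2 + 3^2 + 2^2)
```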
  4. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J.; Herman, Grzegorz (Ed.)
    We present efficient algorithms for solving systems of linear equations in 1-Laplacians of well-shaped simplicial complexes. 1-Laplacians, or higher-dimensional Laplacians, generalize graph Laplacians to higher-dimensional simplicial complexes and play a key role in computational topology and topological data analysis. Previously, nearly-linear time solvers were developed for simplicial complexes with known collapsing sequences and bounded Betti numbers, such as those triangulating a three-ball in ℝ³ (Cohen, Fasy, Miller, Nayyeri, Peng, and Walkington [SODA'2014], Black, Maxwell, Nayyeri, and Winkelman [SODA'2022], Black and Nayyeri [ICALP'2022]). Furthermore, Nested Dissection provides quadratic time solvers for more general systems with nonzero structures representing well-shaped simplicial complexes embedded in ℝ³. We generalize the specialized solvers for 1-Laplacians to simplicial complexes with additional geometric structures but without collapsing sequences and bounded Betti numbers, and we improve the runtime of Nested Dissection. We focus on simplicial complexes that meet two conditions: (1) each individual simplex has a bounded aspect ratio, and (2) they can be divided into "disjoint" and balanced regions with well-shaped interiors and boundaries. Our solvers draw inspiration from the Incomplete Nested Dissection for stiffness matrices of well-shaped trusses (Kyng, Peng, Schwieterman, and Zhang [STOC'2018]). 
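    For readers unfamiliar with the operator involved, the sketch below assembles the 1-Laplacian L_1 = B_1^T B_1 + B_2 B_2^T from the boundary matrices of a small, made-up simplicial complex using NumPy. It only constructs the matrix whose linear systems the solvers above target; it says nothing about solving them quickly.

```python
# Assemble the 1-Laplacian of a toy simplicial complex: L_1 = B1^T B1 + B2 B2^T,
# where B1 is the vertex-edge boundary matrix and B2 the edge-triangle boundary
# matrix. Illustrative only; the paper is about solving L_1 x = b efficiently.

import numpy as np

def boundary_matrices(vertices, edges, triangles):
    """edges: list of (u, v) with u < v; triangles: list of (u, v, w) with u < v < w."""
    vi = {v: i for i, v in enumerate(vertices)}
    ei = {e: i for i, e in enumerate(edges)}
    B1 = np.zeros((len(vertices), len(edges)))
    for j, (u, v) in enumerate(edges):
        B1[vi[u], j] = -1.0
        B1[vi[v], j] = 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for j, (u, v, w) in enumerate(triangles):
        B2[ei[(v, w)], j] = 1.0      # boundary of (u,v,w) = (v,w) - (u,w) + (u,v)
        B2[ei[(u, w)], j] = -1.0
        B2[ei[(u, v)], j] = 1.0
    return B1, B2

if __name__ == "__main__":
    verts = [0, 1, 2, 3]
    edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
    tris = [(0, 1, 2), (1, 2, 3)]
    B1, B2 = boundary_matrices(verts, edges, tris)
    L1 = B1.T @ B1 + B2 @ B2.T       # the 1-Laplacian, acting on edge vectors
    print(L1)
```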
  5. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J.; Herman, Grzegorz (Ed.)
    Computing the diameter of a graph, i.e., the largest distance, is a fundamental problem that is central in fine-grained complexity. In undirected graphs, the Strong Exponential Time Hypothesis (SETH) yields a lower bound on the time vs. approximation trade-off that is quite close to the upper bounds. In directed graphs, however, where only some of the upper bounds apply, much larger gaps remain. Since d(u,v) may not be the same as d(v,u), there are multiple ways to define the problem, the two most natural being the (one-way) diameter (max_{u,v} d(u,v)) and the roundtrip diameter (max_{u,v} d(u,v)+d(v,u)). In this paper we make progress on the outstanding open question for each of them.
    - We design the first algorithm for diameter in sparse directed graphs to achieve n^{1.5-ε} time with an approximation factor better than 2. The new upper bound trade-off makes the directed case appear more similar to the undirected case. Notably, this is the first algorithm for diameter in sparse graphs that benefits from fast matrix multiplication.
    - We design new hardness reductions separating roundtrip diameter from directed and undirected diameter. In particular, a 1.5-approximation in subquadratic time would refute the All-Nodes k-Cycle hypothesis, and any (2-ε)-approximation would imply a breakthrough algorithm for approximate 𝓁_∞-Closest-Pair. Notably, these are the first conditional lower bounds for diameter that are not based on SETH.
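    To fix the two definitions, the sketch below computes the one-way diameter and the roundtrip diameter of a small, made-up unweighted digraph exactly, by running BFS from every vertex. This is the quadratic baseline that the approximation algorithms above are designed to beat.

```python
# Exact one-way diameter and roundtrip diameter of an unweighted digraph via
# all-pairs BFS (the quadratic baseline, for illustration only).

from collections import deque

def bfs_dists(adj, s):
    """Single-source distances in a digraph given as {u: [out-neighbors]}."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameters(adj):
    inf = float("inf")
    d = {u: bfs_dists(adj, u) for u in adj}
    one_way = max(d[u].get(v, inf) for u in adj for v in adj)
    roundtrip = max(d[u].get(v, inf) + d[v].get(u, inf) for u in adj for v in adj)
    return one_way, roundtrip

if __name__ == "__main__":
    adj = {1: [2], 2: [3, 1], 3: [4], 4: [1]}   # a small strongly connected digraph
    print(diameters(adj))                        # (3, 4)
```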
  6. Bonchi, Filippo; Puglisi, Simon J. (Ed.)
    We present an algorithm for constructing a depth-first search tree in planar digraphs; the algorithm can be implemented in the complexity class AC^1(UL ∩ co-UL), which is contained in AC^2. Prior to this (for more than a quarter-century), the fastest uniform deterministic parallel algorithm for this problem ran in O(log^10 n) time (corresponding to the complexity class AC^10 ⊆ NC^11). We also consider the problem of computing depth-first search trees in other classes of graphs, and obtain additional new upper bounds.
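    For reference, a depth-first search tree is the object produced by the short sequential recursion below; the contribution described above is computing such a tree within small parallel circuit classes, which this sketch does not attempt. The example digraph is made up.

```python
# Sequential construction of a DFS tree of a digraph (tree edges only),
# following out-neighbors in the order they are listed. Illustration of the
# object being computed, not of the parallel algorithm discussed above.

def dfs_tree(adj, root):
    tree, visited = [], {root}
    def visit(u):
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                tree.append((u, v))
                visit(v)
    visit(root)
    return tree

if __name__ == "__main__":
    adj = {0: [1, 2], 1: [2, 3], 2: [0, 3], 3: []}
    print(dfs_tree(adj, 0))    # [(0, 1), (1, 2), (2, 3)]
```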
  7. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J.; Herman, Grzegorz (Ed.)
    Boob et al. [Boob et al., 2020] described an iterative peeling algorithm called Greedy++ for the Densest Subgraph Problem (DSG) and conjectured that it converges to an optimum solution. Chekuri, Quanrud, and Torres [Chandra Chekuri et al., 2022] extended the algorithm to supermodular density problems (of which DSG is a special case) and proved that the resulting algorithm Super-Greedy++ (and hence also Greedy++) converges. In this paper we revisit the convergence proof and provide a different perspective. This is done via a connection to Fujishige’s quadratic program for finding a lexicographically optimal base in a (contra) polymatroid [Satoru Fujishige, 1980], and a noisy version of the Frank-Wolfe method from convex optimization [Frank and Wolfe, 1956; Jaggi, 2013]. This yields a simpler convergence proof, and also shows the stronger property that Super-Greedy++ converges to the optimal dense decomposition vector, answering a question raised in Harb et al. [Harb et al., 2022]. A second contribution of the paper is to understand Thorup’s work on ideal tree packing and greedy tree packing [Thorup, 2007; Thorup, 2008] via the Frank-Wolfe algorithm applied to find a lexicographically optimum base in the graphic matroid. This yields a simpler and transparent proof. The two results appear disparate but are unified via Fujishige’s result and convex optimization.
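    For intuition, here is a rough, unoptimized rendering of the Greedy++ peeling idea as described by Boob et al.: per-vertex loads accumulate across passes and bias later peeling orders toward denser parts of the graph. The convergence analysis discussed above concerns this iterative process, not this particular toy implementation, which assumes a simple undirected graph given as a list of distinct edges.

```python
# Greedy++-style iterative peeling for the Densest Subgraph Problem (sketch).
# Each pass peels the vertex minimizing load + current degree, credits that
# degree to its load, and tracks the best density |E(S)| / |S| seen.

def greedy_plus_plus(edges, passes=10):
    nodes = {u for e in edges for u in e}
    load = {v: 0.0 for v in nodes}
    best = 0.0
    for _ in range(passes):
        adj = {v: set() for v in nodes}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        deg = {v: len(adj[v]) for v in nodes}
        alive, m = set(nodes), len(edges)
        while alive:
            best = max(best, m / len(alive))                # density of the current subgraph
            v = min(alive, key=lambda x: load[x] + deg[x])  # peel the "lightest" vertex
            load[v] += deg[v]
            for w in adj[v]:
                if w in alive:
                    deg[w] -= 1
            m -= deg[v]
            alive.remove(v)
    return best

if __name__ == "__main__":
    # K4 on {0,1,2,3} plus a pendant path; the densest subgraph is the K4 (density 1.5)
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    print(greedy_plus_plus(edges, passes=20))
```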
  8. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    The k-Detour problem is a basic path-finding problem: given a graph G on n vertices, with specified nodes s and t, and a positive integer k, the goal is to determine if G has an st-path of length exactly dist(s,t) + k, where dist(s,t) is the length of a shortest path from s to t. The k-Detour problem is NP-hard when k is part of the input, so researchers have sought efficient parameterized algorithms for this task, running in f(k) poly(n) time, for f(⋅) as slow-growing as possible. We present faster algorithms for k-Detour in undirected graphs, running in 1.853^k poly(n) randomized and 4.082^k poly(n) deterministic time. The previous fastest algorithms for this problem took 2.746^k poly(n) randomized and 6.523^k poly(n) deterministic time [Bezáková-Curticapean-Dell-Fomin, ICALP 2017]. Our algorithms use the fact that detecting a path of a given length in an undirected graph is easier if we are promised that the path belongs to what we call a "bipartitioned" subgraph, where the nodes are split into two parts and the path must satisfy constraints on those parts. Previously, this idea was used to obtain the fastest known algorithm for finding paths of length k in undirected graphs [Björklund-Husfeldt-Kaski-Koivisto, JCSS 2017], intuitively by looking for paths of length k in randomly bipartitioned subgraphs. Our algorithms for k-Detour stem from a new application of this idea, which does not involve choosing the bipartitioned subgraphs randomly. Our work has direct implications for the k-Longest Detour problem, another related path-finding problem. In this problem, we are given the same input as in k-Detour, but are now tasked with determining if G has an st-path of length at least dist(s,t)+k. Our results for k-Detour imply that we can solve k-Longest Detour in 3.432^k poly(n) randomized and 16.661^k poly(n) deterministic time. The previous fastest algorithms for this problem took 7.539^k poly(n) randomized and 42.549^k poly(n) deterministic time [Fomin et al., STACS 2022].
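    The brute-force check below only pins down the k-Detour problem statement on a toy graph: compute dist(s,t) by BFS, then exhaustively search for a simple s-t path of length exactly dist(s,t)+k. It runs in exponential time and bears no relation to the c^k poly(n) algorithms above.

```python
# Brute-force k-Detour decision procedure for small graphs (problem illustration only).

from collections import deque

def shortest_dist(adj, s, t):
    """BFS distance from s to t in an unweighted graph {u: [neighbors]}."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist.get(t)

def has_k_detour(adj, s, t, k):
    """Is there a simple s-t path of length exactly dist(s, t) + k?"""
    d = shortest_dist(adj, s, t)
    if d is None:
        return False
    target = d + k

    def dfs(u, length, visited):
        if u == t:
            return length == target
        if length >= target:
            return False
        return any(dfs(v, length + 1, visited | {v})
                   for v in adj[u] if v not in visited)

    return dfs(s, 0, {s})

if __name__ == "__main__":
    # 4-cycle: both simple 0-2 paths have length 2, so k = 0 succeeds and k = 2 fails
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(has_k_detour(adj, 0, 2, 0), has_k_detour(adj, 0, 2, 2))
```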
  9. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    The min-diameter of a directed graph G is a measure of the largest distance between nodes. It is equal to the maximum min-distance d_{min}(u,v) across all pairs u,v ∈ V(G), where d_{min}(u,v) = min(d(u,v), d(v,u)). Min-diameter approximation in directed graphs has attracted attention recently as an offshoot of the classical and well-studied diameter approximation problem. Our work provides a 3/2-approximation algorithm for min-diameter in DAGs running in time O(m^{1.426} n^{0.288}), and a faster almost-3/2-approximation variant which runs in time O(m^{0.713} n). (An almost-α-approximation algorithm determines the min-diameter to within a multiplicative factor of α plus constant additive error.) This is the first known algorithm to achieve a 3/2-approximation for min-diameter in sparse DAGs in truly subquadratic time O(m^{2-ε}) for ε > 0; previously only a 2-approximation was known. By a conditional lower bound result of [Abboud et al, SODA 2016], a better-than-3/2 approximation cannot be achieved in truly subquadratic time under the Strong Exponential Time Hypothesis (SETH), so our result is conditionally tight. We additionally obtain a new conditional lower bound for min-diameter approximation in general directed graphs, showing that under SETH, one cannot achieve an approximation factor below 2 in truly subquadratic time. Our work also presents the first study of approximating bichromatic min-diameter, which is the maximum min-distance between oppositely colored vertices in a 2-colored graph. We show that SETH implies that in DAGs, a better-than-2 approximation cannot be achieved in truly subquadratic time, and that in general graphs, an approximation within a factor below 5/2 is similarly out of reach. We then obtain an O(m)-time algorithm which determines if bichromatic min-diameter is finite, and an almost-2-approximation algorithm for bichromatic min-diameter with runtime Õ(min(m^{4/3} n^{1/3}, m^{1/2} n^{3/2})).
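    To make the definitions concrete, the sketch below computes the min-diameter, and optionally the bichromatic min-diameter, of a small made-up unweighted digraph exactly via all-pairs BFS. This is the roughly quadratic baseline against which the subquadratic approximation algorithms above are measured.

```python
# Exact min-diameter and bichromatic min-diameter via all-pairs BFS (baseline only).

from collections import deque

def bfs_dists(adj, s):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def min_diameter(adj, color=None):
    """max over pairs u != v of d_min(u, v) = min(d(u, v), d(v, u)); if `color` is
    given, only oppositely colored pairs are considered (bichromatic min-diameter)."""
    d = {u: bfs_dists(adj, u) for u in adj}
    inf = float("inf")
    best = 0
    for u in adj:
        for v in adj:
            if u == v or (color is not None and color[u] == color[v]):
                continue
            best = max(best, min(d[u].get(v, inf), d[v].get(u, inf)))
    return best

if __name__ == "__main__":
    adj = {1: [2], 2: [3], 3: [4], 4: []}                             # a path DAG
    print(min_diameter(adj))                                          # 3
    print(min_diameter(adj, color={1: "r", 2: "b", 3: "b", 4: "r"}))  # 2
```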