skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Leibniz International Proceedings in Informatics (LIPIcs):14th Innovations in Theoretical Computer Science Conference (ITCS 2023)
The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost 1/a for a parameter a ≥ 1. This basic variant, denoted ED_a, bridges classical edit distance (a = 1) with Hamming distance (a → ∞), leading to interesting algorithmic challenges: Does the time complexity of computing ED_a interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating ED_a? We first present a simple deterministic exact algorithm for ED_a and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a (1+ε)-approximation of ED_a(X,Y), given strings X,Y of total length n and a bound k ≥ ED_a(X,Y). For simplicity, let us focus on k ≥ 1 and a constant ε > 0; then, our algorithm takes Õ(n/a + ak³) time. Unless a = Õ(1), in which case ED_a resembles the standard edit distance, and for the most interesting regime of small enough k, this running time is sublinear in n. We also consider a very natural version that asks to find a (k_I, k_S)-alignment, i.e., an alignment with at most k_I indels and k_S substitutions. In this setting, we give an exact algorithm and, more importantly, an Õ((nk_I)/k_S + k_S k_I³)-time (1,1+ε)-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for ED_a for a = Θ(k_S/k_I), and its running time is again sublinear in n whenever k_I ≪ k_S and the overall distance is small enough. These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving (1+ε)-approximation in sublinear time, even for a favorable choice of k.  more » « less
Award ID(s):
2223282 2217058
PAR ID:
10496600
Author(s) / Creator(s):
; ; ;
Editor(s):
Tauman Kalai, Yael
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Journal Name:
SourceOECD ITCS
ISSN:
1608-1218
Subject(s) / Keyword(s):
edit distance Hamming distance Longest Common Extension queries Theory of computation → Pattern matching Theory of computation → Streaming, sublinear and near linear time algorithms
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Kumar, Amit; Ron-Zewi, Noga (Ed.)
    We study the Matrix Multiplication Verification Problem (MMV) where the goal is, given three n × n matrices A, B, and C as input, to decide whether AB = C. A classic randomized algorithm by Freivalds (MFCS, 1979) solves MMV in Õ(n²) time, and a longstanding challenge is to (partially) derandomize it while still running in faster than matrix multiplication time (i.e., in o(n^ω) time). To that end, we give two algorithms for MMV in the case where AB - C is sparse. Specifically, when AB - C has at most O(n^δ) non-zero entries for a constant 0 ≤ δ < 2, we give (1) a deterministic O(n^(ω-ε))-time algorithm for constant ε = ε(δ) > 0, and (2) a randomized Õ(n²)-time algorithm using δ/2 ⋅ log₂ n + O(1) random bits. The former algorithm is faster than the deterministic algorithm of Künnemann (ESA, 2018) when δ ≥ 1.056, and the latter algorithm uses fewer random bits than the algorithm of Kimbrel and Sinha (IPL, 1993), which runs in the same time and uses log₂ n + O(1) random bits (in turn fewer than Freivalds’s algorithm). Our algorithms are simple and use techniques from coding theory. Let H be a parity-check matrix of a Maximum Distance Separable (MDS) code, and let G = (I | G') be a generator matrix of a (possibly different) MDS code in systematic form. Our deterministic algorithm uses fast rectangular matrix multiplication to check whether HAB = HC and H(AB)^T = H(C^T), and our randomized algorithm samples a uniformly random row g' from G' and checks whether g'AB = g'C and g'(AB)^T = g'C^T. We additionally study the complexity of MMV. We first show that all algorithms in a natural class of deterministic linear algebraic algorithms for MMV (including ours) require Ω(n^ω) time. We also show a barrier to proving a super-quadratic running time lower bound for matrix multiplication (and hence MMV) under the Strong Exponential Time Hypothesis (SETH). Finally, we study relationships between natural variants and special cases of MMV (with respect to deterministic Õ(n²)-time reductions). 
    more » « less
  2. null (Ed.)
    The edit distance between two strings is defined as the smallest number of insertions , deletions , and substitutions that need to be made to transform one of the strings to another one. Approximating edit distance in subquadratic time is “one of the biggest unsolved problems in the field of combinatorial pattern matching” [37]. Our main result is a quantum constant approximation algorithm for computing the edit distance in truly subquadratic time. More precisely, we give an quantum algorithm that approximates the edit distance within a factor of 3. We further extend this result to an quantum algorithm that approximates the edit distance within a larger constant factor. Our solutions are based on a framework for approximating edit distance in parallel settings. This framework requires as black box an algorithm that computes the distances of several smaller strings all at once. For a quantum algorithm, we reduce the black box to metric estimation and provide efficient algorithms for approximating it. We further show that this framework enables us to approximate edit distance in distributed settings. To this end, we provide a MapReduce algorithm to approximate edit distance within a factor of , with sublinearly many machines and sublinear memory. Also, our algorithm runs in a logarithmic number of rounds. 
    more » « less
  3. Mulzer, Wolfgang; Phillips, Jeff M (Ed.)
    Let d be a (well-behaved) shortest-path metric defined on a path-connected subset of ℝ² and let 𝒟 = {D_1,…,D_n} be a set of geodesic disks with respect to the metric d. We prove that 𝒢^×(𝒟), the intersection graph of the disks in 𝒟, has a clique-based separator consisting of O(n^{3/4+ε}) cliques. This significantly extends the class of objects whose intersection graphs have small clique-based separators. Our clique-based separator yields an algorithm for q-Coloring that runs in time 2^O(n^{3/4+ε}), assuming the boundaries of the disks D_i can be computed in polynomial time. We also use our clique-based separator to obtain a simple, efficient, and almost exact distance oracle for intersection graphs of geodesic disks. Our distance oracle uses O(n^{7/4+ε}) storage and can report the hop distance between any two nodes in 𝒢^×(𝒟) in O(n^{3/4+ε}) time, up to an additive error of one. So far, distance oracles with an additive error of one that use subquadratic storage and sublinear query time were not known for such general graph classes. 
    more » « less
  4. We study the bit complexity of two related fundamental computational problems in linear algebra and control theory. Our results are: (1) An Õ(n^{ω+3}a+n⁴a²+n^ωlog(1/ε)) time algorithm for finding an ε-approximation to the Jordan Normal form of an integer matrix with a-bit entries, where ω is the exponent of matrix multiplication. (2) An Õ(n⁶d⁶a+n⁴d⁴a²+n³d³log(1/ε)) time algorithm for ε-approximately computing the spectral factorization P(x) = Q^*(x)Q(x) of a given monic n× n rational matrix polynomial of degree 2d with rational a-bit coefficients having a-bit common denominators, which satisfies P(x)⪰0 for all real x. The first algorithm is used as a subroutine in the second one. Despite its being of central importance, polynomial complexity bounds were not previously known for spectral factorization, and for Jordan form the best previous best running time was an unspecified polynomial in n of degree at least twelve [Cai, 1994]. Our algorithms are simple and judiciously combine techniques from numerical and symbolic computation, yielding significant advantages over either approach by itself. 
    more » « less
  5. Leroux, Jérôme; Lombardy, Sylvain; Peleg, David (Ed.)
    Feedback Vertex Set (FVS) is one of the most studied vertex deletion problems in the field of graph algorithms. In the decision version of the problem, given a graph G and an integer k, the question is whether there exists a set S of at most k vertices in G such that G-S is acyclic. It is one of the first few problems which were shown to be NP-complete, and has been extensively studied from the viewpoint of approximation and parameterized algorithms. The best-known polynomial time approximation algorithm for FVS is a 2-factor approximation, while the best known deterministic and randomized FPT algorithms run in time 𝒪^*(3.460^k) and 𝒪^*(2.7^k) respectively. In this paper, we contribute to the newly established area of parameterized approximation, by studying FVS in this paradigm. In particular, we combine the approaches of parameterized and approximation algorithms for the study of FVS, and achieve an approximation guarantee with a factor better than 2 in randomized FPT running time, that improves over the best known parameterized algorithm for FVS. We give three simple randomized (1+ε) approximation algorithms for FVS, running in times 𝒪^*(2^{εk}⋅ 2.7^{(1-ε)k}), 𝒪^*(({(4/(1+ε))^{(1+ε)}}⋅{(ε/3)^ε})^k), and 𝒪^*(4^{(1-ε)k}) respectively for every ε ∈ (0,1). Combining these three algorithms, we obtain a factor (1+ε) approximation algorithm for FVS, which has better running time than the best-known (randomized) FPT algorithm for every ε ∈ (0, 1). This is the first attempt to look at a parameterized approximation of FVS to the best of our knowledge. Our algorithms are very simple, and they rely on some well-known reduction rules used for arriving at FPT algorithms for FVS. 
    more » « less