We introduce synchronization strings , which provide a novel way to efficiently deal with synchronization errors , i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to cope with than more commonly considered Hamming-type errors , i.e., symbol substitutions and erasures. For every ε > 0, synchronization strings allow us to index a sequence with an ε -O(1) -size alphabet, such that one can efficiently transform k synchronization errors into (1 + ε)k Hamming-type errors . This powerful new technique has many applications. In this article, we focus on designing insdel codes , i.e., error correcting block codes (ECCs) for insertion-deletion channels. While ECCs for both Hamming-type errors and synchronization errors have been intensely studied, the latter has largely resisted progress. As Mitzenmacher puts it in his 2009 survey [30]: “ Channels with synchronization errors...are simply not adequately understood by current theory. Given the near-complete knowledge, we have for channels with erasures and errors...our lack of understanding about channels with synchronization errors is truly remarkable. ” Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed and only since 2016 are there constructions of constant rate insdel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regimes, these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straightforward application of our synchronization strings-based indexing method gives a simple black-box construction that transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs into the realm of insdel codes. Most notably, for the complete noise spectrum, we obtain efficient “near-MDS” insdel codes, which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0, we give a family of insdel codes achieving a rate of 1 - δ - ε over a constant-size alphabet that efficiently corrects a δ fraction of insertions or deletions.
more »
« less
Deletions and Insertions of the Symbol “0” and Asymmetric/Unidirectional Error Control Codes for the L Metric
This paper gives some theory and efficient design of binary block codes capable of controlling the deletions of the symbol “0” (referred to as 0-deletions) and/or the insertions of the symbol “0” (referred to as 0-insertions). This problem of controlling 0-deletions and/or 0-insertions (referred to as 0-errors) is shown to be equivalent to the efficient design of L1 metric asymmetric error control codes over the natural alphabet, IIN . In this way, it is shown that the t0 -insertion correcting codes are actually capable of controlling much more; namely, they can correct t0 -errors, detect (t+1)0 -errors and, simultaneously, detect all occurrences of only 0-deletions or only 0-insertions in every received word (briefly, they are t -Symmetric 0-Error Correcting/ (t+1) -Symmetric 0-Error Detecting/All Unidirectional 0-Error Detecting ( t -Sy0EC/ (t+1) -Sy0ED/AU0ED) codes). From the relations with the L1 distance error control codes, new improved bounds are given for the optimal t0 -error correcting codes. Optimal non-systematic code designs are given. Decoding can be efficiently performed by algebraic means using the Extended Euclidean Algorithm (EEA).
more »
« less
- Award ID(s):
- 2006571
- PAR ID:
- 10562482
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE Transactions on Information Theory
- Volume:
- 69
- Issue:
- 1
- ISSN:
- 0018-9448
- Page Range / eLocation ID:
- 86 to 106
- Subject(s) / Keyword(s):
- Deletion/insertion of zero errors , repetition/sticky errors , L₁ distance , asymmetric distance , elementary symmetric functions , constant weight codes
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We introduce synchronization strings, which provide a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than more commonly considered half-errors, i.e., symbol corruptions and erasures. For every ε > 0, synchronization strings allow to index a sequence with an ε-O(1) size alphabet such that one can efficiently transform k synchronization errors into (1 + ε)k half-errors. This powerful new technique has many applications. In this paper we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. As Mitzenmacher puts it in his 2009 survey: "Channels with synchronization errors ... are simply not adequately understood by current theory. Given the near-complete knowledge we have for channels with erasures and errors ... our lack of understanding about channels with synchronization errors is truly remarkable." Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed and only since 2016 are there constructions of constant rate indel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regime these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straight forward application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, for the complete noise spectrum we obtain efficient "near-MDS" insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0 we give insdel codes achieving a rate of 1 - ξ - ε over a constant size alphabet that efficiently correct a δ´ fraction of insertions or deletions.more » « less
-
In this paper the theory and design of codes capable of correcting t insertion/deletion of the symbol 0 in each and every bucket of zeros (i. e., zeros in between two consecutive ones) are studied. It is shown that this problem is related to the zero error capacity achieving codes in limited magnitude error channel. Close to optimal non-systematic code designs and the encoding/decoding algorithms are described.more » « less
-
We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string I. In particular, for every length n and every ε >0, one can in near linear time construct a string I ∈ Σ′n with |Σ′| = Oε(1), such that, indexing any string S ∈ Σn, symbol-by-symbol, with I results in a string S′ ∈ Σ″n where Σ″ = Σ × Σ′ for which edit distance computations are easy, i.e., one can compute a (1+ε)-approximation of the edit distance between S′ and any other string in O(n (log n)) time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC ‘17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP ‘18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.more » « less
An official website of the United States government

