Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Abstract Sequence mappability is an important task in genome resequencing. In the (
k ,m )mappability problem, for a given sequenceT of lengthn , the goal is to compute a table whosei th entry is the number of indices such that the length$$j \ne i$$ $j\ne i$m substrings ofT starting at positionsi andj have at mostk mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of . We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for$$k=1$$ $k=1$ , works in$$k=O(1)$$ $k=O\left(1\right)$ space and, with high probability, in$$O(n)$$ $O\left(n\right)$ time. Our algorithm requires a careful adaptation of the$$O(n \cdot \min \{m^k,\log ^k n\})$$ $O(n\xb7min\{{m}^{k},{log}^{k}n\})$k errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the allpairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop time algorithms to compute$$O(n^2)$$ $O\left({n}^{2}\right)$all (k ,m )mappability tables for a fixedm and all or a fixed$$k\in \{0,\ldots ,m\}$$ $k\in \{0,\dots ,m\}$k and all . Finally, we show that, for$$m\in \{k,\ldots ,n\}$$ $m\in \{k,\dots ,n\}$ , the ($$k,m = \Theta (\log n)$$ $k,m=\Theta (logn)$k ,m )mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018. 
null (Ed.)Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(nlogn)word data structure with O(logwn) query time, where w=Ω(logn) is the word size. Our construction is based on a nontrivial reduction allowing for us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)word data structure with O(nlogϵn) query time, where ϵ>0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].more » « less