The problem of reconstructing a sequence when observed through multiple looks over deletion channels occurs in “de novo” DNA sequencing. The DNA could be sequenced multiple times, yielding several “looks” of it, but each time the sequencer could be noisy with (independent) deletion impairments. The main goal of this paper is to develop reconstruction algorithms for a sequence observed through the lens of a fixed number of deletion channels. We use the probabilistic model of the deletion channels to develop both symbol-wise and sequence maximum likelihood decoding crite- ria, and algorithms motivated by them. Numerical evaluations demonstrate improvement in terms of edit distance error, over earlier algorithms.
more »
« less
Algorithms for reconstruction over single and multiple deletion channels
Recent advances in DNA sequencing technology and DNA storage systems have rekindled the interest in deletion channels. Multiple recent works have looked at variants of sequence reconstruction over a single and over multiple deletion channels, a notoriously difficult problem due to its highly combinatorial nature. Although works in theoretical computer science have provided algorithms which guarantee perfect reconstruction with multiple independent observations from the deletion channel, they are only applicable in the large blocklength regime and more restrictively, when the number of observations is also large. Indeed, with only a few observations, perfect reconstruction of the input sequence may not even be possible in most cases. In such situations, maximum likelihood (ML) and maximum aposteriori (MAP) estimates for the deletion channels are natural questions that arise and these have remained open to the best of our knowledge. In this work, we take steps to answer the two aforementioned questions. Specifically: 1. We show that solving for the ML estimate over the single deletion channel (which can be cast as a discrete optimization problem) is equivalent to solving its relaxation, a continuous optimization problem; 2. We exactly compute the symbolwise posterior distributions (under some assumptions on the priors) for both the single as well as multiple deletion channels. As part of our contributions, we also introduce tools to visualize and analyze error events, which we believe could be useful in other related problems concerning deletion channels.
more »
« less
- PAR ID:
- 10273066
- Date Published:
- Journal Name:
- IEEE transactions on information theory
- Volume:
- 67
- Issue:
- 6
- ISSN:
- 1557-9654
- Page Range / eLocation ID:
- 3389-3410
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The problem of reconstructing a sequence when observed through multiple looks over deletion channels occurs in “de novo” DNA sequencing. The DNA could be sequenced multiple times, yielding several “looks” of it, but each time the sequencer could be noisy with (independent) deletion impairments. The main goal of this paper is to develop reconstruction algorithms for a sequence observed through the lens of a fixed number of deletion channels. We use the probabilistic model of the deletion channels to develop both symbol-wise and sequence maximum likelihood decoding crite- ria, and algorithms motivated by them. Numerical evaluations demonstrate improvement in terms of edit distance error, over earlier algorithms.more » « less
-
The problem of reconstructing a sequence when observed through multiple looks over deletion channels occurs in “de novo” DNA sequencing. The DNA could be sequenced multiple times, yielding several “looks” of it, but each time the sequencer could be noisy with (independent) deletion impairments. The main goal of this paper is to develop reconstruction algorithms for a sequence observed through the lens of a fixed number of deletion channels. We use the probabilistic model of the deletion channels to develop both symbol-wise and sequence maximum likelihood decoding criteria, and algorithms motivated by them. Numerical evaluations demonstrate improvement in terms of edit distance error, over earlier algorithms.more » « less
-
Maximum likelihood (ML) and symbolwise maximum aposteriori (MAP) estimation for discrete input sequences play a central role in a number of applications that arise in communications, information and coding theory. Many instances of these problems are proven to be intractable, for example through reduction to NP-complete integer optimization problems. In this work, we prove that the ML estimation of a discrete input sequence (with no assumptions on the encoder/channel used) is equivalent to the solution of a continuous non-convex optimization problem, and that this formulation is closely related to the computation of symbolwise MAP estimates. This equivalence is particularly useful in situations where a function we term the expected likelihood is efficiently computable. In such situations, we give a ML heuristic and show numerics for sequence estimation over the deletion channel.more » « less
-
We develop a method to jointly estimate the carrier frequency offset (CFO) and the narrowband channel in millimeter wave (mmWave) MIMO systems operating with one-bit analog-to-digital converters (ADCs). We assume perfect timing synchronization and transform the underlying CFO-channel optimization problem to a higher dimensional space using lifting techniques. Exploiting the sparse nature of mmWave MIMO channels in the angle domain, we perform the joint estimation by solving a noisy quantized compressed sensing problem of the lifted version, using generalized approximate message passing. Simulation results show that our method is able to recover both the channel and the CFO using one-bit measurements.more » « less
An official website of the United States government

