NSF PAR Search | NSF Public Access Repository

On k-Mer-Based and Maximum Likelihood Estimation Algorithms for Trace Reconstruction

https://doi.org/10.1109/TIT.2025.3541375

Cheng, Kuan; Grigorescu, Elena; Li, Xin; Sudan, Madhu; Zhu, Minshen (April 2025, IEEE Transactions on Information Theory)

Free, publicly-accessible full text available April 1, 2026
On k-Mer-Based and Maximum Likelihood Estimation Algorithms for Trace Reconstruction

Cheng, Kuan; Grigorescu, Elena; Li, Xin; Sudan, Madhu; Zhu, Minshen (July 2024, IEEE Press)

The goal of the trace reconstruction problem is to recover a string x ∈ {0, 1}^n given many independent traces of x, where a trace is a subsequence obtained by deleting bits of x independently with some given probability. In this paper we consider two kinds of algorithms for the trace reconstruction problem. We first observe that the state-of-the-art result of Chase (STOC 2021), which is based on statistics of arbitrary length-k subsequences, can also be obtained by considering the “k-mer statistics”, i.e., statistics regarding occurrences of contiguous k-bit strings (a.k.a. k-mers) in the initial string x, for k = 2n^{1/5}. Mazooji and Shomorony (ISIT 2023) show that such statistics (called the k-mer density map) can be estimated within ε accuracy from poly(n, 2^k, 1/ε) traces. We call an algorithm k-mer-based if it reconstructs x given estimates of the k-mer density map. Such algorithms essentially capture all the analyses in the worst-case and smoothed-complexity models of the trace reconstruction problem we know of so far. Our first, and technically more involved, result shows that any k-mer-based algorithm for trace reconstruction must use exp(Ω(n^{1/5})) traces, under the assumption that the estimator requires poly(2^k, 1/ε) traces, thus establishing the optimality of this number of traces. Our analysis also shows that the analysis technique used by Chase is essentially tight, and hence new techniques are needed in order to improve the worst-case upper bound. Our second, simple, result considers the performance of the Maximum Likelihood Estimator (MLE), which picks the source string that has the maximum likelihood of generating the samples (traces). We show that the MLE algorithm uses a nearly optimal number of traces, i.e., up to a factor of n in the number of samples needed for an optimal algorithm, and show that this factor of n loss may be necessary under general “model estimation” settings.
Full Text Available
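
The objects named in the abstract above (traces, k-mers, and the maximum likelihood estimator) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names are hypothetical, `kmer_counts` computes plain occurrence counts rather than the k-mer density map of Mazooji and Shomorony, and `brute_force_mle` enumerates all 2^n candidates, so it only makes sense for very small n. It relies on the fact that, for a fixed length n and deletion probability q, the likelihood of a trace t given x is proportional to the number of ways t embeds in x as a subsequence.

```python
import itertools
import random
from collections import Counter


def sample_trace(x: str, q: float, rng: random.Random) -> str:
    """Pass x through the deletion channel: drop each bit independently with probability q."""
    return "".join(bit for bit in x if rng.random() >= q)


def kmer_counts(x: str, k: int) -> Counter:
    """Count occurrences of every contiguous length-k substring (k-mer) of x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))


def count_embeddings(t: str, x: str) -> int:
    """Number of ways t occurs as a (not necessarily contiguous) subsequence of x."""
    # ways[j] = number of embeddings of t[:j] into the prefix of x scanned so far
    ways = [1] + [0] * len(t)
    for ch in x:
        # iterate j downwards so each character of x is used at most once per embedding
        for j in range(len(t), 0, -1):
            if t[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(t)]


def brute_force_mle(traces: list[str], n: int) -> str:
    """Return a length-n string maximizing the likelihood of the traces (exponential time).

    The factor q^(n-|t|) * (1-q)^|t| is the same for every candidate, so it is
    enough to maximize the product of embedding counts.
    """
    best, best_score = "", -1
    for bits in itertools.product("01", repeat=n):
        candidate = "".join(bits)
        score = 1
        for t in traces:
            score *= count_embeddings(t, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    x = "0110100"
    traces = [sample_trace(x, 0.2, rng) for _ in range(200)]
    print(kmer_counts(x, 2))
    print(brute_force_mle(traces, len(x)))
```

The brute-force search is only meant to make the definition of the MLE concrete; the abstract's claims concern how many traces such an estimator needs, not how to compute it efficiently.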
Random Shortening of Linear Codes and Applications

https://doi.org/10.1007/978-3-031-49193-1_14

Chen, Xue; Cheng, Kuan; Li, Xin; Mao, Songtao (December 2023, Lecture notes in computer science)

Random linear codes (RLCs) are well known to have nice combinatorial properties and near-optimal parameters in many different settings. However, getting explicit constructions matching the parameters of RLCs is challenging, and RLCs are hard to decode efficiently. This motivated several previous works to study the problem of partially derandomizing RLCs, by applying certain operations to an explicit mother code. Among them, one of the most well studied operations is random puncturing, where a series of works culminated in the work of Guruswami and Mosheiff (FOCS ’22), which showed that a random puncturing of a low-biased code is likely to possess almost all interesting local properties of RLCs. In this work, we provide an in-depth study of another, dual operation of random puncturing, known as random shortening, which can be viewed equivalently as random puncturing on the dual code. Our main results show that for any small ε, by starting from a mother code with certain weaker conditions (e.g., having a large distance) and performing a random (or even pseudorandom) shortening, the new code is ε-biased with high probability. Our results hold for any field size and yield a shortened code with constant rate. This can be viewed as a complement to random puncturing, and together, we can obtain codes with properties like RLCs from weaker initial conditions. Our proofs involve several non-trivial methods of estimating the weight distribution of codewords, which may be of independent interest.
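
The two operations contrasted in the abstract above can be sketched on a toy binary code. The following Python sketch is illustrative only and makes several assumptions: it works over GF(2) (the paper's results hold for any field size), it enumerates codewords by brute force, all function names are hypothetical, and `bias` uses the standard binary notion that a code is ε-biased when every nonzero codeword has relative weight within ε/2 of 1/2. The final check illustrates the duality mentioned in the abstract: the dual of a shortened code equals the punctured dual code.

```python
import itertools


def span(generators):
    """All GF(2) linear combinations of the generator rows, as a set of tuples."""
    n = len(generators[0])
    code = set()
    for coeffs in itertools.product((0, 1), repeat=len(generators)):
        word = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                word = [(a + b) % 2 for a, b in zip(word, g)]
        code.add(tuple(word))
    return code


def drop(word, positions):
    """Delete the coordinates listed in positions from a single word."""
    return tuple(b for i, b in enumerate(word) if i not in positions)


def puncture(code, positions):
    """Puncturing: delete the chosen coordinates from every codeword."""
    return {drop(w, positions) for w in code}


def shorten(code, positions):
    """Shortening: keep codewords that vanish on the chosen coordinates, then delete them."""
    return {drop(w, positions) for w in code if all(w[i] == 0 for i in positions)}


def dual(code, n):
    """All length-n binary vectors orthogonal (mod 2) to every codeword (brute force)."""
    return {
        v for v in itertools.product((0, 1), repeat=n)
        if all(sum(a * b for a, b in zip(v, w)) % 2 == 0 for w in code)
    }


def bias(code):
    """Largest |1 - 2 * (relative weight)| over the nonzero codewords."""
    return max((abs(1 - 2 * sum(w) / len(w)) for w in code if any(w)), default=0.0)


if __name__ == "__main__":
    # A hypothetical [5, 3] mother code chosen only for illustration.
    generators = [(1, 0, 0, 1, 1), (0, 1, 0, 1, 0), (0, 0, 1, 0, 1)]
    code = span(generators)
    positions = {0}
    shortened = shorten(code, positions)
    print(sorted(shortened), bias(shortened))
    # Duality: dual of the shortened code == puncturing of the dual code.
    print(dual(shortened, 4) == puncture(dual(code, 5), positions))
```

In the setting of the abstract the set of positions would be drawn at random (or pseudorandomly); here it is fixed so the output is easy to check by hand.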
Ultrathin rubbery bio-optoelectronic stimulators for untethered cardiac stimulation

https://doi.org/10.1126/sciadv.adq5061

Rao, Zhoulyu; Ershad, Faheem; Guan, Ying-Shi; Paccola_Mesquita, Fernanda C; da_Costa, Ernesto Curty; Morales-Garza, Marco A; Moctezuma-Ramirez, Angel; Kan, Bin; Lu, Yuntao; Patel, Shubham; et al (December 2024, Science Advances)

Untethered electrical stimulation or pacing of the heart is of critical importance in addressing the pressing needs of cardiovascular diseases in both clinical therapies and fundamental studies. Among various stimulation methods, light illumination–induced electrical stimulation via photoelectric effect without any genetic modifications to beating cells/tissues or whole heart has profound benefits. However, a critical bottleneck lies in the lack of a suitable material with tissue-like mechanical softness and deformability and sufficient optoelectronic performances toward effective stimulation. Here, we introduce an ultrathin (<500 nm), stretchy, and self-adhesive rubbery bio-optoelectronic stimulator (RBOES) in a bilayer construct of a rubbery semiconducting nanofilm and a transparent, stretchable gold nanomesh conductor. The RBOES could maintain its optoelectronic performance when it was stretched by 20%. The RBOES was validated to effectively accelerate the beating of the human induced pluripotent stem cell–derived cardiomyocytes. Furthermore, acceleration of ex vivo perfused rat hearts by optoelectronic stimulation with the self-adhered RBOES was achieved with repetitive pulsed light illumination.
Full Text Available
Improved Decoding of Expander Codes

https://doi.org/10.1109/TIT.2023.3239163

Chen, Xue; Cheng, Kuan; Li, Xin; Ouyang, Minghui (June 2023, IEEE Transactions on Information Theory)

Full Text Available
Linear Insertion Deletion Codes in the High-Noise and High-Rate Regimes

Cheng, Kuan; Jin, Zhengzhong; Li, Xin; Wei, Zhide; Zheng, Yu (July 2023, Leibniz international proceedings in informatics)

Full Text Available
On Relaxed Locally Decodable Codes for Hamming and Insertion-Deletion Errors

Block, Alexander R.; Blocki, Jeremiah; Cheng, Kuan; Grigorescu, Elena; Li, Xin; Zheng, Yu; Zhu, Minshen (July 2023, Leibniz international proceedings in informatics)

Full Text Available
Leibniz International Proceedings in Informatics (LIPIcs):50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

https://doi.org/10.4230/LIPIcs.ICALP.2023.41

Cheng, Kuan; Jin, Zhengzhong; Li, Xin; Wei, Zhide; Zheng, Yu (January 2023, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Ed.)
This work continues the study of linear error correcting codes against adversarial insertion deletion errors (insdel errors). Previously, the work of Cheng, Guruswami, Haeupler, and Li [Kuan Cheng et al., 2021] showed the existence of asymptotically good linear insdel codes that can correct arbitrarily close to 1 fraction of errors over some constant size alphabet, or achieve rate arbitrarily close to 1/2 even over the binary alphabet. As shown in [Kuan Cheng et al., 2021], these bounds are also the best possible. However, known explicit constructions in [Kuan Cheng et al., 2021], and subsequent improved constructions by Con, Shpilka, and Tamo [Con et al., 2022] all fall short of meeting these bounds. Over any constant size alphabet, they can only achieve rate < 1/8 or correct < 1/4 fraction of errors; over the binary alphabet, they can only achieve rate < 1/1216 or correct < 1/54 fraction of errors. Apparently, previous techniques face inherent barriers to achieve rate better than 1/4 or correct more than 1/2 fraction of errors. In this work we give new constructions of such codes that meet these bounds, namely, asymptotically good linear insdel codes that can correct arbitrarily close to 1 fraction of errors over some constant size alphabet, and binary asymptotically good linear insdel codes that can achieve rate arbitrarily close to 1/2. All our constructions are efficiently encodable and decodable. Our constructions are based on a novel approach of code concatenation, which embeds the index information implicitly into codewords. This significantly differs from previous techniques and may be of independent interest. Finally, we also prove the existence of linear concatenated insdel codes with parameters that match random linear codes, and propose a conjecture about linear insdel codes.
On Relaxed Locally Decodable Codes for Hamming and Insertion-Deletion Errors

https://doi.org/10.4230/LIPIcs.CCC.2023.14

Block, Alexander R.; Blocki, Jeremiah; Cheng, Kuan; Grigorescu, Elena; Li, Xin; Zheng, Yu; Zhu, Minshen (January 2023, 38th Computational Complexity Conference (CCC 2023))
Ta-Shma, Amnon (Ed.)
Locally Decodable Codes (LDCs) are error-correcting codes C:Σⁿ → Σ^m, encoding messages in Σⁿ to codewords in Σ^m, with super-fast decoding algorithms. They are important mathematical objects in many areas of theoretical computer science, yet the best constructions so far have codeword length m that is super-polynomial in n, for codes with constant query complexity and constant alphabet size. In a very surprising result, Ben-Sasson, Goldreich, Harsha, Sudan, and Vadhan (SICOMP 2006) show how to construct a relaxed version of LDCs (RLDCs) with constant query complexity and almost linear codeword length over the binary alphabet, and used them to obtain significantly-improved constructions of Probabilistically Checkable Proofs. In this work, we study RLDCs in the standard Hamming-error setting, and introduce their variants in the insertion and deletion (Insdel) error setting. Standard LDCs for Insdel errors were first studied by Ostrovsky and Paskin-Cherniavsky (Information Theoretic Security, 2015), and are further motivated by recent advances in DNA random access bio-technologies. Our first result is an exponential lower bound on the length of Hamming RLDCs making 2 queries (even adaptively), over the binary alphabet. This answers a question explicitly raised by Gur and Lachish (SICOMP 2021) and is the first exponential lower bound for RLDCs. Combined with the results of Ben-Sasson et al., our result exhibits a "phase-transition"-type behavior on the codeword length for some constant-query complexity. We achieve these lower bounds via a transformation of RLDCs to standard Hamming LDCs, using a careful analysis of restrictions of message bits that fix codeword bits. We further define two variants of RLDCs in the Insdel-error setting, a weak and a strong version. On the one hand, we construct weak Insdel RLDCs with almost linear codeword length and constant query complexity, matching the parameters of the Hamming variants. On the other hand, we prove exponential lower bounds for strong Insdel RLDCs. These results demonstrate that, while these variants are equivalent in the Hamming setting, they are significantly different in the insdel setting. Our results also prove a strict separation between Hamming RLDCs and Insdel RLDCs.
Full Text Available
