-
Meka, Raghu (Ed.)
Differential privacy and sublinear algorithms are both rapidly emerging algorithmic themes in the era of big data analysis. Although recent works have shown the existence of differentially private sublinear algorithms for many problems, including graph parameter estimation and clustering, little is known about hardness results for such algorithms. In this paper, we initiate the study of lower bounds for problems that seek algorithms that are both differentially private and sublinear-time. Our main result shows that the two desiderata are incompatible in general. In particular, we prove that a simple problem based on one-way marginals admits a differentially private algorithm as well as a sublinear-time algorithm, but does not admit a "strictly" sublinear-time algorithm that is also differentially private.
Free, publicly accessible full text available January 1, 2026.
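As a rough illustration of the two desiderata above (not the paper's construction or its lower-bound argument), the sketch below estimates the one-way marginals of a binary dataset in two standard ways: an epsilon-differentially private estimate via the Laplace mechanism, and a sublinear-time estimate that reads only a sampled handful of rows. The parameter choices (epsilon, sample_size, the synthetic data) are illustrative assumptions.

```python
import numpy as np

def one_way_marginals(data):
    """Exact one-way marginals: the mean of each binary column."""
    return data.mean(axis=0)

def dp_marginals_laplace(data, epsilon):
    """epsilon-DP estimate of the one-way marginals via the Laplace mechanism.

    Changing one record shifts each marginal by at most 1/n, so the L1
    sensitivity of the d-dimensional marginal vector is at most d/n.
    """
    n, d = data.shape
    sensitivity = d / n
    noise = np.random.laplace(scale=sensitivity / epsilon, size=d)
    return data.mean(axis=0) + noise

def sampled_marginals(data, sample_size, seed=None):
    """Sublinear-time (sampling-based) estimate that reads only a few rows."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=sample_size, replace=True)
    return data[idx].mean(axis=0)

# Example: n = 10,000 records with d = 5 binary attributes (synthetic, illustrative).
data = (np.random.default_rng(0).random((10_000, 5)) < 0.3).astype(float)
print(one_way_marginals(data))
print(dp_marginals_laplace(data, epsilon=1.0))
print(sampled_marginals(data, sample_size=500, seed=1))
```

Each estimator satisfies one desideratum in isolation; the point of the paper is that, for a related one-way-marginals-style problem, no single algorithm can be both strictly sublinear-time and differentially private.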
-
The goal of the trace reconstruction problem is to recover a string x ∈ {0,1}^n given many independent traces of x, where a trace is a subsequence obtained by deleting each bit of x independently with some given probability. In this paper we consider two kinds of algorithms for the trace reconstruction problem. We first observe that the state-of-the-art result of Chase (STOC 2021), which is based on statistics of arbitrary length-k subsequences, can also be obtained by considering "k-mer statistics", i.e., statistics of occurrences of contiguous k-bit strings (a.k.a. k-mers) in the initial string x, for k = Θ̃(n^{1/5}). Mazooji and Shomorony (ISIT 2023) show that such statistics (called the k-mer density map) can be estimated within accuracy ε from poly(n, 2^k, 1/ε) traces. We call an algorithm k-mer-based if it reconstructs x given estimates of the k-mer density map. Such algorithms essentially capture all the analyses in the worst-case and smoothed-complexity models of the trace reconstruction problem known so far. Our first, and technically more involved, result shows that any k-mer-based algorithm for trace reconstruction must use exp(Ω̃(n^{1/5})) traces, under the assumption that the estimator requires poly(2^k, 1/ε) traces, thus establishing the optimality of this number of traces. Our analysis also shows that the analysis technique used by Chase is essentially tight, and hence new techniques are needed in order to improve the worst-case upper bound. Our second, simpler, result considers the performance of the Maximum Likelihood Estimator (MLE), which picks the source string that has the maximum likelihood of generating the observed traces. We show that the MLE algorithm uses a nearly optimal number of traces, i.e., it is within a factor of n of the number of samples needed by an optimal algorithm, and we show that this factor-of-n loss may be necessary under general "model estimation" settings.
Free, publicly accessible full text available July 7, 2025.
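To make the objects in this abstract concrete, here is a small, self-contained sketch (not the authors' estimator or lower-bound construction) of the deletion channel, k-mer counts, and a brute-force Maximum Likelihood Estimator that is only feasible for toy string lengths. The string, deletion probability, and number of traces are illustrative assumptions.

```python
import itertools
import math
import random
from collections import Counter

def draw_trace(x, q, rng=random):
    """Pass x through the deletion channel: drop each bit independently with prob q."""
    return [b for b in x if rng.random() >= q]

def kmer_counts(x, k):
    """Occurrence counts of each contiguous k-bit substring (k-mer) of x."""
    return Counter(tuple(x[i:i + k]) for i in range(len(x) - k + 1))

def subsequence_count(x, t):
    """Number of ways t occurs as a subsequence of x (standard counting DP)."""
    m = len(t)
    ways = [1] + [0] * m  # ways[j] = embeddings of t[:j] into the prefix of x read so far
    for b in x:
        for j in range(m, 0, -1):
            if t[j - 1] == b:
                ways[j] += ways[j - 1]
    return ways[m]

def log_likelihood(x, traces, q):
    """log P(traces | x) under i.i.d. deletions with probability q."""
    n, total = len(x), 0.0
    for t in traces:
        count = subsequence_count(x, t)
        if count == 0:
            return float("-inf")  # t cannot arise from x at all
        total += math.log(count) + len(t) * math.log(1 - q) + (n - len(t)) * math.log(q)
    return total

def mle_reconstruct(traces, n, q):
    """Brute-force MLE over all 2^n candidate strings (toy n only)."""
    candidates = itertools.product([0, 1], repeat=n)
    return max(candidates, key=lambda x: log_likelihood(list(x), traces, q))

rng = random.Random(1)
x, q = [1, 0, 1, 1, 0, 0, 1, 0], 0.2
traces = [draw_trace(x, q, rng) for _ in range(200)]
print(kmer_counts(x, 3))
print(mle_reconstruct(traces, len(x), q))
```

The exhaustive MLE above is exponential-time by design; the abstract's point about the MLE concerns its sample complexity, not an efficient implementation.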
-
Ta-Shma, Amnon (Ed.)
Locally Decodable Codes (LDCs) are error-correcting codes C: Σ^n → Σ^m, encoding messages in Σ^n to codewords in Σ^m, with super-fast decoding algorithms. They are important mathematical objects in many areas of theoretical computer science, yet the best constructions so far have codeword length m that is super-polynomial in n, for codes with constant query complexity and constant alphabet size. In a very surprising result, Ben-Sasson, Goldreich, Harsha, Sudan, and Vadhan (SICOMP 2006) show how to construct a relaxed version of LDCs (RLDCs) with constant query complexity and almost linear codeword length over the binary alphabet, and used them to obtain significantly improved constructions of Probabilistically Checkable Proofs. In this work, we study RLDCs in the standard Hamming-error setting, and introduce their variants in the insertion and deletion (Insdel) error setting. Standard LDCs for Insdel errors were first studied by Ostrovsky and Paskin-Cherniavsky (Information Theoretic Security, 2015), and are further motivated by recent advances in DNA random-access bio-technologies. Our first result is an exponential lower bound on the length of Hamming RLDCs making 2 queries (even adaptively), over the binary alphabet. This answers a question explicitly raised by Gur and Lachish (SICOMP 2021) and is the first exponential lower bound for RLDCs. Combined with the results of Ben-Sasson et al., our result exhibits a "phase-transition"-type behavior in the codeword length at some constant query complexity. We achieve these lower bounds via a transformation of RLDCs to standard Hamming LDCs, using a careful analysis of restrictions of message bits that fix codeword bits. We further define two variants of RLDCs in the Insdel-error setting, a weak and a strong version. On the one hand, we construct weak Insdel RLDCs with almost linear codeword length and constant query complexity, matching the parameters of the Hamming variants. On the other hand, we prove exponential lower bounds for strong Insdel RLDCs. These results demonstrate that, while these variants are equivalent in the Hamming setting, they are significantly different in the Insdel setting. Our results also prove a strict separation between Hamming RLDCs and Insdel RLDCs.
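To make "constant query complexity" and local decoding concrete, here is the textbook 2-query local decoder for the Hadamard code, a classical LDC example rather than any of the codes constructed or analyzed in this work. Its codeword length 2^n also illustrates why short codewords with constant queries are hard to obtain. The corruption positions and repetition count below are illustrative.

```python
import random

def hadamard_encode(msg):
    """Hadamard code: one bit <msg, a> mod 2 for every a in {0,1}^n (codeword length 2^n)."""
    n = len(msg)
    return [sum(m & ((a >> j) & 1) for j, m in enumerate(msg)) % 2 for a in range(2 ** n)]

def decode_bit(codeword, n, i, rng=random):
    """2-query local decoder for message bit i.

    Queries positions a and a XOR e_i of the (possibly corrupted) codeword and XORs them.
    If fewer than a 1/4 fraction of positions are corrupted, both queries are correct with
    probability > 1/2, so repeating and taking a majority recovers msg[i] w.h.p.
    """
    a = rng.randrange(2 ** n)
    return codeword[a] ^ codeword[a ^ (1 << i)]

msg = [1, 0, 1, 1]
word = hadamard_encode(msg)
for pos in (3, 7, 11):       # corrupt a few of the 16 codeword positions
    word[pos] ^= 1
votes = [decode_bit(word, len(msg), 2) for _ in range(51)]
print(int(sum(votes) > len(votes) // 2), msg[2])  # majority vote vs. the true bit
```

Relaxed LDCs, as studied in this work, additionally allow the decoder to output a "don't know" symbol on corrupted codewords, which is what enables almost linear codeword length at constant query complexity in the Hamming setting.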