Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.)Data structures on a multiset of genomic k-mers are at the heart of many bioinformatic tools. As genomic datasets grow in scale, the efficiency of these data structures increasingly depends on how well they leverage the inherent patterns in the data. One recent and effective approach is the use of learned indexes that approximate the rank function of a multiset using a piecewise linear function with very few segments. However, theoretical worst-case analysis struggles to predict the practical performance of these indexes. We address this limitation by developing a novel measure of piecewise-linear approximability of the data, called CaPLa (Canonical Piecewise Linear approximability). CaPLa builds on the empirical observation that a power-law model often serves as a reasonable proxy for piecewise linear-approximability, while explicitly accounting for deviations from a true power-law fit. We prove basic properties of CaPLa and present an efficient algorithm to compute it. We then demonstrate that CaPLa can accurately predict space bounds for data structures on real data. Empirically, we analyze over 500 genomes through the lens of CaPLa, revealing that it varies widely across the tree of life and even within individual genomes. Finally, we study the robustness of CaPLa as a measure and the factors that make genomic k-mer multisets different from random ones.more » « less
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.)We consider algorithmic problems motivated by modular robotic reconfiguration in the sliding square model, in which we are given n square-shaped modules in a (labeled or unlabeled) start configuration and need to find a schedule of sliding moves to transform it into a desired goal configuration, maintaining connectivity of the configuration at all times. Recent work has aimed at minimizing the total number of moves, resulting in fully sequential schedules that can perform reconfiguration in 𝒪(n²) moves, or 𝒪(nP) for arrangements of bounding box perimeter size P. We provide first results in the sliding square model that exploit parallel motion, performing reconfiguration in worst-case optimal makespan of 𝒪(P). We also provide tight bounds on the complexity of the problem by showing that even deciding the possibility of reconfiguration within makespan 1 is NP-complete in the unlabeled case. In the labeled variant, we note that deciding the same for makespan 2 is NP-complete, while makespan 1 is straightforward.more » « less
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.)A covering path for a finite set P of points in the plane is a polygonal path such that every point of P lies on a segment of the path. The vertices of the path need not be at points of P. A covering path is plane if its segments do not cross each other. Let π(n) be the minimum number such that every set of n points in the plane admits a plane covering path with at most π(n) segments. We prove that π(n) ≤ ⌈6n/7⌉. This improves the previous best-known upper bound of ⌈21n/22⌉, due to Biniaz (SoCG 2023). Our proof is constructive and yields a simple O(n log n)-time algorithm for computing a plane covering path.more » « less
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.)
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.){"Abstract":["The classical rank aggregation problem seeks to combine a set X of n permutations into a single representative "consensus" permutation. In this paper, we investigate two fundamental rank aggregation tasks under the well-studied Ulam metric: computing a median permutation (which minimizes the sum of Ulam distances to X) and computing a center permutation (which minimizes the maximum Ulam distance to X) in two settings.\r\n- Continuous Setting: In the continuous setting, the median/center is allowed to be any permutation. It is known that computing a center in the Ulam metric is NP-hard and we add to this by showing that computing a median is NP-hard as well via a simple reduction from the Max-Cut problem. While this result may not be unexpected, it had remained elusive until now and confirms a speculation by Chakraborty, Das, and Krauthgamer [SODA '21].\r\n- Discrete Setting: In the discrete setting, the median/center must be a permutation from the input set. We fully resolve the fine-grained complexity of the discrete median and discrete center problems under the Ulam metric, proving that the naive Õ(n² L)-time algorithm (where L is the length of the permutation) is conditionally optimal. This resolves an open problem raised by Abboud, Bateni, Cohen-Addad, Karthik C. S., and Seddighin [APPROX '23]. Our reductions are inspired by the known fine-grained lower bounds for similarity measures, but we face and overcome several new highly technical challenges."]}more » « less
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.)In a seminal work, Chierichetti et al. [Chierichetti et al., 2017] introduced the (t,k)-fair clustering problem: Given a set of red points and a set of blue points in a metric space, a clustering is called fair if the number of red points in each cluster is at most t times and at least 1/t times the number of blue points in that cluster. The goal is to compute a fair clustering with at most k clusters that optimizes certain objective function. Considering this problem, they designed a polynomial-time O(1)- and O(t)-approximation for the k-center and the k-median objective, respectively. Recently, Carta et al. [Carta et al., 2024] studied this problem with the sum-of-radii objective and obtained a (6+ε)-approximation with running time O((k log_{1+ε}(k/ε))^k n^O(1)), i.e., fixed-parameter tractable in k. Here n is the input size. In this work, we design the first polynomial-time O(1)-approximation for (t,k)-fair clustering with the sum-of-radii objective, improving the result of Carta et al. Our result places sum-of-radii in the same group of objectives as k-center, that admit polynomial-time O(1)-approximations. This result also implies a polynomial-time O(1)-approximation for the Euclidean version of the problem, for which an f(k)⋅n^O(1)-time (1+ε)-approximation was known due to Drexler et al. [Drexler et al., 2023]. Here f is an exponential function of k. We are also able to extend our result to any arbitrary 𝓁 ≥ 2 number of colors when t = 1. This matches known results for the k-center and k-median objectives in this case. The significant disparity of sum-of-radii compared to k-center and k-median presents several complex challenges, all of which we successfully overcome in our work. Our main contribution is a novel cluster-merging-based analysis technique for sum-of-radii that helps us achieve the constant-approximation bounds.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)We consider hypergraph network design problems where the goal is to construct a hypergraph that satisfies certain connectivity requirements. For graph network design problems where the goal is to construct a graph that satisfies certain connectivity requirements, the number of edges in every feasible solution is at most quadratic in the number of vertices. In contrast, for hypergraph network design problems, we might have feasible solutions in which the number of hyperedges is exponential in the number of vertices. This presents an additional technical challenge in hypergraph network design problems compared to graph network design problems: in order to solve the problem in polynomial time, we first need to show that there exists a feasible solution in which the number of hyperedges is polynomial in the input size. The central theme of this work is to overcome this additional technical challenge for certain hypergraph network design problems. We show that these hypergraph network design problems admit solutions in which the number of hyperedges is polynomial in the number of vertices and moreover, can be solved in strongly polynomial time. Our work improves on the previous fastest pseudo-polynomial run-time for these problems. As applications of our results, we derive the first strongly polynomial time algorithms for (i) degree-specified hypergraph connectivity augmentation using hyperedges and (ii) degree-specified hypergraph node-to-area connectivity augmentation using hyperedges.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)We describe a simple deterministic near-linear time approximation scheme for uncapacitated minimum cost flow in undirected graphs with positive real edge weights, a problem also known as transshipment. Specifically, our algorithm takes as input a (connected) undirected graph G = (V, E), vertex demands b ∈ R^V such that ∑_{v ∈ V} b(v) = 0, positive edge costs c ∈ R_{≥ 0}^E, and a parameter ε > 0. In O(ε^{-2} m log^{O(1)} n) time, it returns a flow f such that the net flow out of each vertex is equal to the vertex’s demand and the cost of the flow is within a (1 ± ε) factor of optimal. Our algorithm is combinatorial and has no running time dependency on the demands or edge costs. With the exception of a recent result presented at STOC 2022 for polynomially bounded edge weights, all almost- and near-linear time approximation schemes for transshipment relied on randomization to embed the problem instance into low-dimensional space. Our algorithm instead deterministically approximates the cost of routing decisions that would be made if the input were subject to a random tree embedding. To avoid computing the Ω(n²) vertex-vertex distances that an approximation of this kind suggests, we also take advantage of the clustering method used in the well-known Thorup-Zwick distance oracle.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)We study the problem of robust multivariate polynomial regression: let p: ℝⁿ → ℝ be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (𝐱_i,y_i) ∈ [-1,1]ⁿ × ℝ that are noisy versions of (𝐱_i,p(𝐱_i)). More precisely, each 𝐱_i is sampled independently from some distribution χ on [-1,1]ⁿ, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most ρ < 1/2, and otherwise satisfies |y_i-p(𝐱_i)| ≤ σ. The goal is to output a polynomial p̂, of degree at most d in each variable, within an 𝓁_∞-distance of at most O(σ) from p. Kane, Karmalkar, and Price [FOCS'17] solved this problem for n = 1. We generalize their results to the n-variate setting, showing an algorithm that achieves a sample complexity of O_n(dⁿlog d), where the hidden constant depends on n, if χ is the n-dimensional Chebyshev distribution. The sample complexity is O_n(d^{2n}log d), if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most O(σ), and the run-time depends on log(1/σ). In the setting where each 𝐱_i and y_i are known up to N bits of precision, the run-time’s dependence on N is linear. We also show that our sample complexities are optimal in terms of dⁿ. Furthermore, we show that it is possible to have the run-time be independent of 1/σ, at the cost of a higher sample complexity.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)The maximum coverage problem is to select k sets, from a collection of m sets, such that the cardinality of their union, in a universe of size n, is maximized. We consider (1-1/e-ε)-approximation algorithms for this NP-hard problem in three standard data stream models. 1) Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses ε^{-2} k ⋅ polylog(n,m) space. The best previous result (Assadi and Khanna, SODA 2018) used (n +ε^{-4} k) polylog(n,m) space. While both algorithms use O(ε^{-1} log m) passes, our analysis shows that, when ε ≤ 1/log log m, it is possible to reduce the number of passes by a 1/log log m factor without incurring additional space. 2) Random Order Model. In this model, there are no deletions, and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and k polylog(n,m) space suffices for arbitrary small constant ε. The best previous result, by Warneke et al. (ESA 2023), used k² polylog(n,m) space. 3) Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert only algorithms in terms of the update time: polylog(m,n) update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in k.more » « less
An official website of the United States government
