skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Random Cuts are Optimal for Explainable k-Medians
We show that the RandomCoordinateCut algorithm gives the optimal competitive ratio for explainable k-medians in l1. The problem of explainable k-medians was introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian in 2020. Several groups of authors independently proposed a simple polynomial-time randomized algorithm for the problem and showed that this algorithm is O(log k loglog k) competitive. We provide a tight analysis of the algorithm and prove that its competitive ratio is upper bounded by 2ln k +2. This bound matches the Omega(log k) lower bound by Dasgupta et al (2020).  more » « less
Award ID(s):
1955351
PAR ID:
10519127
Author(s) / Creator(s):
;
Publisher / Repository:
NeurIPS 2023
Date Published:
Format(s):
Medium: X
Location:
New Orleans, Louisiana, United States of America
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We consider the problem of explainable k-medians and k-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into k clusters and minimizes the k-medians or k-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is O(log k) competitive with k-medians with ℓ1 norm and O(k) competitive with k-means. This is an improvement over the previous guarantees of O(k) and O(k^2) by Dasgupta et al (2020). We also provide a new algorithm which is O(log^{3}{2}k) competitive for k-medians with ℓ2 norm. Our first algorithm is near-optimal: Dasgupta et al (2020) showed a lower bound of Ω(log k) for k-medians; in this work, we prove a lower bound of Ω(k) for k-means. We also provide a lower bound of Ω(log k) for k-medians with ℓ2 norm. 
    more » « less
  2. We provide a new bi-criteria O(log2k) competitive algorithm for explainable k-means clustering. Explainable k-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). It is described by an easy to interpret and understand (threshold) decision tree or diagram. The cost of the explainable k-means clustering equals to the sum of costs of its clusters; and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non bi-criteria algorithm for explainable clustering O(k) competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into (1+δ)k clusters (where δ∈(0,1) is a parameter of the algorithm). The cost of this clustering is at most O(1/δ⋅log2k) times the cost of the optimal unconstrained k-means clustering. We show that this bound is almost optimal. 
    more » « less
  3. Kumar, Amit; Ron-Zewi, Noga (Ed.)
    We consider the problem in which n points arrive online over time, and upon arrival must be irrevocably assigned to one of k clusters where the objective is the standard k-median objective. Lower-bound instances show that for this problem no online algorithm can achieve a competitive ratio bounded by any function of n. Thus we turn to a beyond worst-case analysis approach, namely we assume that the online algorithm is a priori provided with a predicted budget B that is an upper bound to the optimal objective value (e.g., obtained from past instances). Our main result is an online algorithm whose competitive ratio (measured against B) is solely a function of k. We also give a lower bound showing that the competitive ratio of every algorithm must depend on k. 
    more » « less
  4. null (Ed.)
    We study an online hypergraph matching problem with delays, motivated by ridesharing applications. In this model, users enter a marketplace sequentially, and are willing to wait up to $$d$$ timesteps to be matched, after which they will leave the system in favor of an outside option. A platform can match groups of up to $$k$$ users together, indicating that they will share a ride. Each group of users yields a match value depending on how compatible they are with one another. As an example, in ridesharing, $$k$$ is the capacity of the service vehicles, and $$d$$ is the amount of time a user is willing to wait for a driver to be matched to them. We present results for both the utility maximization and cost minimization variants of the problem. In the utility maximization setting, the optimal competitive ratio is $$\frac{1}{d}$$ whenever $$k \geq 3$$, and is achievable in polynomial-time for any fixed $$k$$. In the cost minimization variation, when $k = 2$, the optimal competitive ratio for deterministic algorithms is $$\frac{3}{2}$$ and is achieved by a polynomial-time thresholding algorithm. When $k>2$, we show that a polynomial-time randomized batching algorithm is $$(2 - \frac{1}{d}) \log k$$-competitive, and it is NP-hard to achieve a competitive ratio better than $$\log k - O (\log \log k)$$. 
    more » « less
  5. Guruswami, Venkatesan (Ed.)
    The problem of sorting with priced information was introduced by [Charikar, Fagin, Guruswami, Kleinberg, Raghavan, Sahai (CFGKRS), STOC 2000]. In this setting, different comparisons have different (potentially infinite) costs. The goal is to find a sorting algorithm with small competitive ratio, defined as the (worst-case) ratio of the algorithm’s cost to the cost of the cheapest proof of the sorted order. The simple case of bichromatic sorting posed by [CFGKRS] remains open: We are given two sets A and B of total size N, and the cost of an A-A comparison or a B-B comparison is higher than an A-B comparison. The goal is to sort A ∪ B. An Ω(log N) lower bound on competitive ratio follows from unit-cost sorting. Note that this is a generalization of the famous nuts and bolts problem, where A-A and B-B comparisons have infinite cost, and elements of A and B are guaranteed to alternate in the final sorted order. In this paper we give a randomized algorithm InversionSort with an almost-optimal w.h.p. competitive ratio of O(log³ N). This is the first algorithm for bichromatic sorting with a o(N) competitive ratio. 
    more » « less