We show that the RandomCoordinateCut algorithm gives the optimal competitive ratio for explainable k-medians in l1. The problem of explainable k-medians was introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian in 2020. Several groups of authors independently proposed a simple polynomial-time randomized algorithm for the problem and showed that this algorithm is O(log k loglog k) competitive. We provide a tight analysis of the algorithm and prove that its competitive ratio is upper bounded by 2ln k +2. This bound matches the Omega(log k) lower bound by Dasgupta et al (2020). 
                        more » 
                        « less   
                    
                            
                            Near-optimal Algorithms for Explainable k-Medians and k-Means
                        
                    
    
            We consider the problem of explainable k-medians and k-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into k clusters and minimizes the k-medians or k-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is O(log k) competitive with k-medians with ℓ1 norm and O(k) competitive with k-means. This is an improvement over the previous guarantees of O(k) and O(k^2) by Dasgupta et al (2020). We also provide a new algorithm which is O(log^{3}{2}k) competitive for k-medians with ℓ2 norm. Our first algorithm is near-optimal: Dasgupta et al (2020) showed a lower bound of Ω(log k) for k-medians; in this work, we prove a lower bound of Ω(k) for k-means. We also provide a lower bound of Ω(log k) for k-medians with ℓ2 norm. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1955351
- PAR ID:
- 10281998
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Volume:
- 38
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            We provide a new bi-criteria O(log2k) competitive algorithm for explainable k-means clustering. Explainable k-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). It is described by an easy to interpret and understand (threshold) decision tree or diagram. The cost of the explainable k-means clustering equals to the sum of costs of its clusters; and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non bi-criteria algorithm for explainable clustering O(k) competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into (1+δ)k clusters (where δ∈(0,1) is a parameter of the algorithm). The cost of this clustering is at most O(1/δ⋅log2k) times the cost of the optimal unconstrained k-means clustering. We show that this bound is almost optimal.more » « less
- 
            Chakrabarti, Amit; Swamy, Chaitanya (Ed.)A Boolean maximum constraint satisfaction problem, Max-CSP(f), is specified by a predicate f:{-1,1}^k → {0,1}. An n-variable instance of Max-CSP(f) consists of a list of constraints, each of which applies f to k distinct literals drawn from the n variables. For k = 2, Chou, Golovnev, and Velusamy [Chou et al., 2020] obtained explicit ratios characterizing the √ n-space streaming approximability of every predicate. For k ≥ 3, Chou, Golovnev, Sudan, and Velusamy [Chou et al., 2022] proved a general dichotomy theorem for √ n-space sketching algorithms: For every f, there exists α(f) ∈ (0,1] such that for every ε > 0, Max-CSP(f) is (α(f)-ε)-approximable by an O(log n)-space linear sketching algorithm, but (α(f)+ε)-approximation sketching algorithms require Ω(√n) space. In this work, we give closed-form expressions for the sketching approximation ratios of multiple families of symmetric Boolean functions. Letting α'_k = 2^{-(k-1)} (1-k^{-2})^{(k-1)/2}, we show that for odd k ≥ 3, α(kAND) = α'_k, and for even k ≥ 2, α(kAND) = 2α'_{k+1}. Thus, for every k, kAND can be (2-o(1))2^{-k}-approximated by O(log n)-space sketching algorithms; we contrast this with a lower bound of Chou, Golovnev, Sudan, Velingker, and Velusamy [Chou et al., 2022] implying that streaming (2+ε)2^{-k}-approximations require Ω(n) space! We also resolve the ratio for the "at-least-(k-1)-1’s" function for all even k; the "exactly-(k+1)/2-1’s" function for odd k ∈ {3,…,51}; and fifteen other functions. We stress here that for general f, the dichotomy theorem in [Chou et al., 2022] only implies that α(f) can be computed to arbitrary precision in PSPACE, and thus closed-form expressions need not have existed a priori. Our analyses involve identifying and exploiting structural "saddle-point" properties of this dichotomy. Separately, for all threshold functions, we give optimal "bias-based" approximation algorithms generalizing [Chou et al., 2020] while simplifying [Chou et al., 2022]. Finally, we investigate the √ n-space streaming lower bounds in [Chou et al., 2022], and show that they are incomplete for 3AND, i.e., they fail to rule out (α(3AND})-ε)-approximations in o(√ n) space.more » « less
- 
            Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k /ε) / ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.more » « less
- 
            Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)In the Directed Steiner Tree (DST) problem the input is a directed edge-weighted graph G = (V,E), a root vertex r and a set S ⊆ V of k terminals. The goal is to find a min-cost subgraph that connects r to each of the terminals. DST admits an O(log² k/log log k)-approximation in quasi-polynomial time [Grandoni et al., 2022; Rohan Ghuge and Viswanath Nagarajan, 2022], and an O(k^{ε})-approximation for any fixed ε > 0 in polynomial-time [Alexander Zelikovsky, 1997; Moses Charikar et al., 1999]. Resolving the existence of a polynomial-time poly-logarithmic approximation is a major open problem in approximation algorithms. In a recent work, Friggstad and Mousavi [Zachary Friggstad and Ramin Mousavi, 2023] obtained a simple and elegant polynomial-time O(log k)-approximation for DST in planar digraphs via Thorup’s shortest path separator theorem [Thorup, 2004]. We build on their work and obtain several new results on DST and related problems. - We develop a tree embedding technique for rooted problems in planar digraphs via an interpretation of the recursion in [Zachary Friggstad and Ramin Mousavi, 2023]. Using this we obtain polynomial-time poly-logarithmic approximations for Group Steiner Tree [Naveen Garg et al., 2000], Covering Steiner Tree [Goran Konjevod et al., 2002] and the Polymatroid Steiner Tree [Gruia Călinescu and Alexander Zelikovsky, 2005] problems in planar digraphs. All these problems are hard to approximate to within a factor of Ω(log² n/log log n) even in trees [Eran Halperin and Robert Krauthgamer, 2003; Grandoni et al., 2022]. - We prove that the natural cut-based LP relaxation for DST has an integrality gap of O(log² k) in planar digraphs. This is in contrast to general graphs where the integrality gap of this LP is known to be Ω(√k) [Leonid Zosin and Samir Khuller, 2002] and Ω(n^{δ}) for some fixed δ > 0 [Shi Li and Bundit Laekhanukit, 2022]. - We combine the preceding results with density based arguments to obtain poly-logarithmic approximations for the multi-rooted versions of the problems in planar digraphs. For DST our result improves the O(R + log k) approximation of [Zachary Friggstad and Ramin Mousavi, 2023] when R = ω(log² k).more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    