Motivated by applications to classification problems on metric data, we study Weighted Metric Clustering problem: given a metric d over n points and a k x k symmetric matrix A with nonnegative entries, the goal is to find a kpartition of these points into clusters C1,...,Ck, while minimizing the sum of A[i,j] * d(u,v) over all pairs of clusters Ci and Cj and all pairs of points u from Ci and v from Cj. Specific choices of A lead to Weighted Metric Clustering capturing wellstudied graph partitioning problems in metric spaces, such as MinUncut, MinkSum, MinkCut, and more.Our main result is that Weighted Metric Clustering admits a polynomialtime approximation scheme (PTAS). Our algorithm handles all the above problems using the SheraliAdams linear programming relaxation. This subsumes several prior works, unifies many of the techniques for various metric clustering objectives, and yields a PTAS for several new problems, including metric clustering on manifolds and a new family of hierarchical clustering objectives. Our experiments on the hierarchical clustering objective show that it better captures the groundtruth structural information compared to the popular Dasgupta's objective.
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Free, publiclyaccessible full text available March 25, 2025

Free, publiclyaccessible full text available January 1, 2025

We study the natural problem of Triplet Reconstruction (also known as Rooted Triplets Consistency or Triplet Clustering), originally motivated by applications in computational biology and relational databases (Aho, Sagiv, Szymanski, and Ullman, 1981) [2]: given n datapoints, we want to embed them onto the n leaves of a rooted binary tree (also known as a hierarchical clustering, or ultrametric embedding) such that a given set of m triplet constraints is satisfied. A triplet constraint i j · k for points i, j, k indicates that 'i, j are more closely related to each other than to k,' (in terms of distances d(i, j) ≤ d(i, k) and d(i, j) ≤ d(j, k)) and we say that a tree satisfies the triplet i j · k if the distance in the tree between i, j is smaller than the distance between i, k (or j, k). Among all possible trees with n leaves, can we efficiently find one that satisfies a large fraction of the m given triplets? Aho et al. (1981) [2] studied the decision version and gave an elegant polynomialtime algorithm that determines whether or not there exists a tree that satisfies all of the m constraints. Moreover, it is straightforward to see that a random binary tree achieves a constant 13approximation, since there are only 3 distinct triplets i jk, i k j, j k · i (each will be satisfied w.p. 13). Unfortunately, despite more than four decades of research by various communities, there is no better approximation algorithm for this basic Triplet Reconstruction problem.Our main theoremwhich captures Triplet Reconstruction as a special caseis a general hardness of approximation result about Constraint Satisfaction Problems (CSPs) over infinite domains (CSPs where instead of boolean values {0,1} or a fixedsize domain, the variables can be mapped to any of the n leaves of a tree). Specifically, we prove that assuming the Unique Games Conjecture [57], Triplet Reconstruction and more generally, every Constraint Satisfaction Problem (CSP) over hierarchies is approximation resistant, i.e., there is no polynomialtime algorithm that does asymptotically better than a biased random assignment.Our result settles the approximability not only for Triplet Reconstruction, but for many interesting problems that have been studied by various scientific communities such as the popular Quartet Reconstruction and Subtree/Supertree Aggregation Problems. More broadly, our result significantly extends the list of approximation resistant predicates by pointing to a large new family of hard problems over hierarchies. Our main theorem is a generalization of Guruswami, Håstad, Manokaran, Raghavendra, and Charikar (2011) [36], who showed that ordering CSPs (CSPs over permutations of n elements, e.g., Max Acyclic Subgraph, Betweenness, NonBetweenness) are approximation resistant. The main challenge in our analyses stems from the fact that trees have topology (in contrast to permutations and ordering CSPs) and it is the tree topology that determines whether a given constraint on the variables is satisfied or not. As a byproduct, we also present some of the first CSPs where their approximation resistance is proved against biased random assignments, instead of uniformly random assignments.more » « lessFree, publiclyaccessible full text available November 6, 2024