DAG-Structured Clustering by Nearest Neighbors

Monath, Nicholas; Zaheer, Manzil; Dubey, Kumar Avinava; Ahmed, Amr; McCallum, Andrew


Hierarchical clusterings compactly encode multiple granularities of clusters within a tree structure. Hierarchies, by definition, fail to capture different flat partitions that are not subsumed in one another. In this paper, we advocate for an alternative structure for representing multiple clusterings, a directed acyclic graph (DAG). By allowing nodes to have multiple parents, DAG structures are not only more flexible than trees, but also allow for points to be members of multiple clusters. We describe a scalable algorithm, Llama, which simply merges nearest neighbor substructures to form a DAG structure. Llama discovers structures that are more accurate than state-of-the-art tree-based techniques while remaining scalable to large-scale clustering benchmarks. Additionally, we support the proposed algorithm with theoretical guarantees on separated data, including types of data that cannot be correctly clustered by tree-based algorithms.

Award ID(s): 1763618

PAR ID: 10290562

Author(s) / Creator(s): Monath, Nicholas; Zaheer, Manzil; Dubey, Kumar Avinava; Ahmed, Amr; McCallum, Andrew

Date Published: 2021-01-01

Journal Name: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
