
Title: Exact and Approximate Hierarchical Clustering with A*
Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with 10^12 trees to 10^15 trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than 10^1000 trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
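To make the combinatorial-optimization view concrete, the following is a minimal, illustrative Python sketch of A* over the space of binary hierarchies. It is not the paper's trellis-based algorithm: the cost function (sum over internal nodes of pairwise distances among the leaves below each node) and the simple admissible heuristic are assumptions chosen for readability. A state is the set of clusters not yet split; expanding a state commits one two-way split and charges the cost of the newly created internal nodes.

    import heapq
    from itertools import combinations

    def node_cost(cluster, dist):
        # cost of an internal node: sum of pairwise distances among its leaves
        return sum(dist[a][b] for a, b in combinations(sorted(cluster), 2))

    def lower_bound(clusters, dist):
        # admissible heuristic: any unsplit cluster with >= 3 points must still
        # create at least one internal node below it, costing at least its closest pair
        return sum(min(dist[a][b] for a, b in combinations(sorted(c), 2))
                   for c in clusters if len(c) >= 3)

    def astar_tree_cost(points, dist):
        # best-first search over partial hierarchies; a state is the frozenset of
        # clusters that have not been split yet, g is the cost already committed
        root = frozenset(points)
        start = frozenset([root])
        g0 = node_cost(root, dist)
        frontier = [(g0 + lower_bound(start, dist), 0, g0, start)]
        best_g, tie = {start: g0}, 0
        while frontier:
            _, _, g, clusters = heapq.heappop(frontier)
            if all(len(c) == 1 for c in clusters):
                return g                                  # every cluster is a leaf
            if g > best_g.get(clusters, float("inf")):
                continue                                  # stale queue entry
            cluster = next(c for c in clusters if len(c) >= 2)
            rest = clusters - {cluster}
            anchor, *others = sorted(cluster)
            for mask in range(2 ** len(others)):          # every two-way split of `cluster`
                left = frozenset({anchor} | {o for i, o in enumerate(others) if mask >> i & 1})
                right = cluster - left
                if not right:
                    continue
                new_g = g + sum(node_cost(s, dist) for s in (left, right) if len(s) >= 2)
                state = rest | {left, right}
                if new_g < best_g.get(state, float("inf")):
                    best_g[state] = new_g
                    tie += 1
                    heapq.heappush(frontier, (new_g + lower_bound(state, dist), tie, new_g, state))

    # tiny demo: four points on a line, indexed 0..3, with a distance matrix
    pts = [0, 1, 2, 3]
    D = [[abs(i - j) for j in pts] for i in pts]
    print(astar_tree_cost(pts, D))

Even this toy version makes the scaling problem visible: the number of states grows super-exponentially in the number of points, which is what the trellis data structure in the paper is designed to tame.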
Authors:
Award ID(s):
1763618
Publication Date:
NSF-PAR ID:
10290571
Journal Name:
The Conference on Uncertainty in Artificial Intelligence (UAI)
Sponsoring Org:
National Science Foundation
More Like this
  1. The applicability of agglomerative clustering, for inferring both hierarchical and flat clustering, is limited by its scalability. Existing scalable hierarchical clustering methods sacrifice quality for speed and often lead to over-merging of clusters. In this paper, we present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points. We perform a detailed theoretical analysis, showing that under mild separability conditions our algorithm can not only recover the optimal flat partition but also provide a two-approximation to the non-parametric DP-Means objective. This introduces a novel application of hierarchical clustering as an approximation algorithm for the non-parametric clustering objective. We additionally relate our algorithm to the classic hierarchical agglomerative clustering method. We perform extensive empirical experiments in both hierarchical and flat clustering settings and show that our proposed approach achieves state-of-the-art results on publicly available clustering benchmarks. Finally, we demonstrate our method's scalability by applying it to a dataset of 30 billion queries. Human evaluation of the discovered clusters shows that our method finds higher-quality clusters than the current state of the art. (See the agglomerative-clustering sketch after this list.)
  2. We present Descending from Stochastic Clustering Variance Regression (DiSCoVeR) (https://www.github.com/sparks-baird/mat_discover), a Python tool for identifying and assessing high-performing, chemically unique compositions relative to existing compounds using a combination of a chemical distance metric, density-aware dimensionality reduction, clustering, and a regression model. In this work, we create pairwise distance matrices between compounds via Element Mover's Distance (ElMD) and use these to create 2D density-aware embeddings for chemical compositions via Density-preserving Uniform Manifold Approximation and Projection (DensMAP). Because ElMD assigns distances between compounds that are more chemically intuitive than Euclidean-based distances, the compounds can then be clustered into chemically homogeneous clusters via Hierarchical Density-based Spatial Clustering of Applications with Noise (HDBSCAN*). In combination with performance predictions via Compositionally-Restricted Attention-Based Network (CrabNet), we introduce several new metrics for materials discovery and validate DiSCoVeR on Materials Project bulk moduli using compound-wise and cluster-wise validation methods. We visualize these via multi-objective Pareto front plots and assign a weighted score to each composition that encompasses the trade-off between performance and density-based chemical uniqueness. In addition to density-based metrics, we explore an additional uniqueness proxy related to property gradients in DensMAP space. As a validation study, we use DiSCoVeR to screen materials for both performance and uniqueness to extrapolate to new chemical spaces. Top-10 rankings are provided for the compound-wise density and property gradient uniqueness proxies. Top-ranked compounds can be further curated via literature searches, physics-based simulations, and/or experimental synthesis. Finally, we compare DiSCoVeR against the naive baseline of random search for several parameter combinations in an adaptive design scheme. To our knowledge, this is the first time automated screening has been performed with explicit emphasis on discovering high-performing, novel materials.
  3. When searching for mathematical content, accurate measures of formula similarity can help with tasks such as document ranking, query recommendation, and result set clustering. While there have been many attempts at embedding words and graphs, formula embedding is in its early stages. We introduce a new formula embedding model that we use with two hierarchical representations, (1) Symbol Layout Trees (SLTs) for appearance, and (2) Operator Trees (OPTs) for mathematical content. Following the approach of graph embeddings such as DeepWalk, we generate tuples representing paths between pairs of symbols depth-first, embed tuples using the fastText n-gram embedding model, and then represent an SLT or OPT by its average tuple embedding vector. We then combine SLT and OPT embeddings, leading to state-of-the-art results for the NTCIR-12 formula retrieval task. Our fine-grained holistic vector representations allow us to retrieve many more partially similar formulas than methods using structural matching in trees. Combining our embedding model with structural matching in the Approach0 formula search engine produces state-of-the-art results for both fully and partially relevant results on the NTCIR-12 benchmark. Source code for our system is publicly available. (See the embedding sketch after this list.)
  4. Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search-efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations using a stochastic gradient descent-based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE. (See the Hamming-search sketch after this list.)
  5. At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of the frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull. (See the Pareto-frontier sketch after this list.)
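For item 1 above, the classic agglomerative baseline the abstract refers to can be written in a few lines. This is a naive O(n^3) textbook sketch with single or average linkage, not the scalable algorithm that paper proposes:

    import numpy as np

    def naive_agglomerative(X, linkage="average"):
        # X: (n, d) array of points; returns the merge order as
        # (members of cluster a, members of cluster b, linkage distance)
        clusters = {i: [i] for i in range(len(X))}
        merges = []

        def linkage_dist(a, b):
            # all pairwise Euclidean distances between the two clusters' points
            d = np.linalg.norm(X[np.array(a)][:, None, :] - X[np.array(b)][None, :, :], axis=-1)
            return d.min() if linkage == "single" else d.mean()

        while len(clusters) > 1:
            keys = sorted(clusters)
            best_pair, best_d = None, float("inf")
            for u in range(len(keys)):
                for v in range(u + 1, len(keys)):
                    d = linkage_dist(clusters[keys[u]], clusters[keys[v]])
                    if d < best_d:
                        best_pair, best_d = (keys[u], keys[v]), d
            i, j = best_pair
            merges.append((list(clusters[i]), list(clusters[j]), best_d))
            clusters[i] = clusters[i] + clusters[j]
            del clusters[j]
        return merges

The quadratic number of candidate merges per round is exactly the scalability bottleneck that paper's method is built to avoid.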
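For item 3, the averaging step is easy to illustrate. The sketch below uses gensim's FastText to embed path-tuple tokens and represents a formula by the mean of its tuple vectors; the tuple strings are invented placeholders, not the paper's actual SLT/OPT tuple encoding:

    import numpy as np
    from gensim.models import FastText

    # placeholder corpus: each "sentence" is one formula's list of path-tuple strings
    corpus = [
        ["plus|x|n", "plus|times|nn", "times|y|n"],
        ["eq|frac|n", "frac|x|nn", "frac|y|nn"],
    ]

    model = FastText(sentences=corpus, vector_size=16, window=2, min_count=1, epochs=50)

    def formula_vector(tuples, model):
        # represent a formula (SLT or OPT) by the average of its tuple embeddings;
        # fastText's character n-grams also give vectors for unseen tuples
        return np.mean([model.wv[t] for t in tuples], axis=0)

    v1, v2 = formula_vector(corpus[0], model), formula_vector(corpus[1], model)
    cosine = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))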
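For item 4, the speed advantage of binary codes comes down to bit-wise operations. A minimal NumPy sketch of Hamming-distance nearest-neighbour search over binary codes (random placeholder codes here, not BinaryNE outputs):

    import numpy as np

    def hamming_search(query_code, codes, k=10):
        # codes: (n, n_bits) array of 0/1 entries; query_code: (n_bits,) array of 0/1
        packed = np.packbits(codes.astype(np.uint8), axis=1)   # 8 code bits per byte
        q = np.packbits(query_code.astype(np.uint8))
        diff = np.bitwise_xor(packed, q)                       # differing bits
        dists = np.unpackbits(diff, axis=1).sum(axis=1)        # Hamming distances
        order = np.argsort(dists)[:k]
        return order, dists[order]

    rng = np.random.default_rng(0)
    codes = rng.integers(0, 2, size=(1000, 128))               # placeholder binary codes
    ids, dists = hamming_search(codes[0], codes, k=5)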
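For item 5, the two-objective structure (representation size versus retained relevance) makes the frontier extraction itself simple; mapping out which clusterings land on it is the hard part that paper addresses. A small sketch of the standard sweep, with invented placeholder values:

    def pareto_frontier(points):
        # points: list of (size, relevance) pairs, e.g. (H(T), I(T;Y)) per clustering;
        # keep points not dominated by any point that is no larger and at least as relevant
        frontier, best_relevance = [], float("-inf")
        for size, relevance in sorted(points, key=lambda p: (p[0], -p[1])):
            if relevance > best_relevance:
                frontier.append((size, relevance))
                best_relevance = relevance
        return frontier

    # placeholder values for a handful of candidate hard clusterings
    candidates = [(0.5, 0.2), (1.0, 0.6), (1.1, 0.5), (2.0, 0.9), (2.5, 0.9)]
    print(pareto_frontier(candidates))   # -> [(0.5, 0.2), (1.0, 0.6), (2.0, 0.9)]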