Search for: All records

Creators/Authors contains: "Greenberg, Craig"

Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering

Greenberg, Craig ; Macaluso, Sebastian ; Monath, Nicholas ; Lee, Ji Ah ; Flaherty, Patrick ; Cranmer, Kyle ; McGregor, Andrew ; McCallum, Andrew ( January 2021 , AISTATS)
Full Text Available
Exact and Approximate Hierarchical Clustering with A*

Greenberg, Craig ; Macaluso, Sebastian ; Monath, Nicholas ; Dubey, Avinava ; Flaherty, Patrick ; Zaheer, Manzil ; Ahmed, Amr ; Cranmer, Kyle ; McCallum, Andrew ( January 2021 , The Conference on Uncertainty in Artificial Intelligence (UAI))
Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with 10^12 trees to 10^15 trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than 10^1000 trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
Full Text Available
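
To give a sense of the search-space sizes quoted in the abstract above, the following sketch counts rooted binary trees over N labeled leaves, which is (2N-3)!!. Treating the search space as exactly this set of binary hierarchies is an assumption made here for illustration, and the function name `num_binary_trees` is hypothetical rather than taken from the paper.

```python
def num_binary_trees(n_leaves: int) -> int:
    """Count rooted binary trees over n_leaves labeled leaves: (2n-3)!!.

    Assumption (not stated in the abstract): the hierarchical clusterings
    being searched are exactly these binary trees.
    """
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3 (empty product for n <= 2).
    for odd in range(3, 2 * n_leaves - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (10, 14, 16):
        print(n, num_binary_trees(n))
```

Under this assumption, 14 leaves already give about 7.9 × 10^12 trees and 16 leaves about 6.2 × 10^15, which is why exhaustive enumeration is impractical even for small inputs.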
Exact and Approximate Hierarchical Clustering Using A*

Greenberg, Craig ; Macaluso, Sebastian ; Monath, Nicholas ; Dubey, Avinava ; Flaherty, Patrick ; Zaheer, Manzil ; Ahmed, Amr ; Cranmer, Kyle ; McCallum, Andrew ( January 2021 , UAI 2021)

Full Text Available
Exact and Approximate Hierarchical Clustering Using A*

Greenberg, Craig ; Macaluso, Sebastian ; Monath, Nicholas ; Dubey, Avinava ; Flaherty, Patrick ; Zaheer, Manzil ; Ahmed, Amr ; Cranmer, Kyle ; McCallum, Andrew ( January 2021 , UAI)
Full Text Available
Compact Representation of Uncertainty in Clustering

Greenberg, Craig ; Monath, Nicholas ; Kobren, Ari ; Flaherty, Patrick ; McGregor, Andrew ; McCallum, Andrew ( January 2018 , Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018)

For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, we know of no previous work studying efficient representations of exact distributions over clusterings. This paper presents definitions and proofs for a dynamic-programming inference procedure that computes the partition function, the marginal probability of a cluster, and the MAP clustering, all exactly. Rather than the Nth Bell number, these exact solutions take time and space proportional to the substantially smaller powerset of N. Indeed, we improve upon the time complexity of the algorithm introduced by Kohonen and Corander [11] for this problem by a factor of N. While still large, this previously unknown result is intellectually interesting in its own right, makes feasible exact inference for important real-world small data applications (such as medicine), and provides a natural stepping stone towards sparse-trellis approximations that enable further scalability (which we also explore). In experiments, we demonstrate the superiority of our approach over approximate methods in analyzing real-world gene expression data used in cancer treatment.
Full Text Available
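
The idea of computing a partition function over clusterings by dynamic programming over subsets, as described in the abstract above, can be illustrated with a minimal sketch. This is not the paper's trellis algorithm: it is a plain memoized subset recursion, and it assumes that the unnormalized score of a clustering factorizes as a product of per-cluster potentials. The names `partition_function` and `potential` are hypothetical. Subsets are encoded as bitmasks, and each recursive step picks the cluster containing the lowest-indexed remaining element, so every partition is counted exactly once.

```python
from functools import lru_cache
from typing import Callable


def partition_function(n: int, potential: Callable[[int], float]) -> float:
    """Sum, over all partitions of n elements, of the product of cluster potentials.

    Assumption: a clustering's unnormalized score is the product of
    potential(cluster) over its clusters, each cluster given as a bitmask.
    """
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def z(mask: int) -> float:
        if mask == 0:
            return 1.0  # the empty set has exactly one (empty) partition
        low = mask & -mask   # lowest-indexed element still unassigned
        rest = mask ^ low    # the other unassigned elements
        total = 0.0
        sub = rest
        while True:
            cluster = sub | low  # candidate cluster containing `low`
            total += potential(cluster) * z(mask ^ cluster)
            if sub == 0:
                break
            sub = (sub - 1) & rest  # next smaller submask of `rest`
        return total

    return z(full)


if __name__ == "__main__":
    # With a constant potential of 1, the sum simply counts partitions,
    # so the result is the Bell number of n.
    print(partition_function(5, lambda cluster: 1.0))
```

With a constant potential the recursion counts partitions, which gives a convenient sanity check against the Bell numbers. The memo table has at most 2^N entries, whereas direct enumeration would visit every one of the Bell-number-many partitions.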