

Title: Compact Representation of Uncertainty in Clustering
For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, we know of no previous work studying efficient representations of exact distributions over clusterings. This paper presents definitions and proofs for a dynamic-programming inference procedure that computes the partition function, the marginal probability of a cluster, and the MAP clustering—all exactly. Rather than the Nth Bell number, these exact solutions take time and space proportional to the substantially smaller powerset of N. Indeed, we improve upon the time complexity of the algorithm introduced by Kohonen and Corander [11] for this problem by a factor of N. While still large, this previously unknown result is intellectually interesting in its own right, makes feasible exact inference for important real-world small data applications (such as medicine), and provides a natural stepping stone towards sparse-trellis approximations that enable further scalability (which we also explore). In experiments, we demonstrate the superiority of our approach over approximate methods in analyzing real-world gene expression data used in cancer treatment.
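To make the recurrence concrete, here is a minimal Python sketch of a subset-trellis dynamic program for the partition function; it is our illustrative reconstruction under an assumed per-cluster energy function, not the authors' code. Every clustering places a fixed anchor element in exactly one cluster C, giving Z(S) = sum over C containing the anchor of E(C) * Z(S \ C), memoized over the 2^N subsets rather than the Bell(N) clusterings.

```python
from itertools import combinations
from functools import lru_cache

def clustering_partition_function(elements, cluster_energy):
    """Sum over all clusterings (set partitions) of `elements` of the
    product of per-cluster energies, memoized over the 2^N nodes of the
    subset trellis instead of enumerating all Bell(N) clusterings."""
    @lru_cache(maxsize=None)
    def Z(subset):
        if not subset:
            return 1.0
        anchor = min(subset)                  # appears in exactly one cluster
        rest = sorted(subset - {anchor})
        total = 0.0
        for r in range(len(rest) + 1):        # enumerate clusters containing anchor
            for combo in combinations(rest, r):
                cluster = frozenset(combo) | {anchor}
                total += cluster_energy(cluster) * Z(subset - cluster)
        return total
    return Z(frozenset(elements))

# With unit energies, Z counts set partitions: Bell(4) = 15.
print(clustering_partition_function(range(4), lambda c: 1.0))  # 15.0
```

With a real energy function (e.g., exponentiated within-cluster similarity), the same memoized recursion yields cluster marginals and the MAP clustering by swapping the sum for a max, which is the style of reuse the trellis enables.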
Award ID(s):
1637536
NSF-PAR ID:
10112136
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018
Page Range / eLocation ID:
8639--8649
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recursive calls over recursive data are useful for generating probability distributions, and probabilistic programming allows computations over these distributions to be expressed in a modular and intuitive way. Exact inference is also useful, but unfortunately, existing probabilistic programming languages do not perform exact inference on recursive calls over recursive data, forcing programmers to code many applications manually. We introduce a probabilistic language in which a wide variety of recursion can be expressed naturally, and inference carried out exactly. For instance, probabilistic pushdown automata and their generalizations are easy to express, and polynomial-time parsing algorithms for them are derived automatically. We eliminate recursive data types using program transformations related to defunctionalization and refunctionalization. These transformations are assured correct by a linear type system, and a successful choice of transformations, if there is one, is guaranteed to be found by a greedy algorithm. 
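The paper's language derives such algorithms automatically; purely for orientation, here is a hand-written Python sketch (ours, not the paper's) of exact polynomial-time inference for a probabilistic context-free grammar, the classic inside algorithm, which is the kind of parsing computation the abstract says is derived for probabilistic pushdown automata and their generalizations.

```python
from collections import defaultdict

def inside_probability(words, rules, start="S"):
    """Exact P(words | grammar) for a PCFG in Chomsky normal form.
    `rules` maps a nonterminal to a list of (rhs, prob) pairs, where rhs
    is either a 1-tuple (terminal,) or a 2-tuple (NT1, NT2)."""
    n = len(words)
    beta = defaultdict(float)       # beta[i, j, A] = P(A derives words[i:j])
    for i, w in enumerate(words):   # width-1 spans: lexical rules
        for A, prods in rules.items():
            for rhs, p in prods:
                if rhs == (w,):
                    beta[i, i + 1, A] += p
    for width in range(2, n + 1):   # wider spans: binary rules
        for i in range(n - width + 1):
            j = i + width
            for A, prods in rules.items():
                for rhs, p in prods:
                    if len(rhs) == 2:
                        B, C = rhs
                        for k in range(i + 1, j):
                            beta[i, j, A] += p * beta[i, k, B] * beta[k, j, C]
    return beta[0, n, start]

# Tiny grammar: S -> S S (0.4) | 'a' (0.6); P("a a") = 0.4 * 0.6 * 0.6.
rules = {"S": [(("S", "S"), 0.4), (("a",), 0.6)]}
print(inside_probability(["a", "a"], rules))  # 0.144
```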
  2. We generalize the spatial and subset scan statistics from the single to the multiple subset case. The two main approaches to defining the log-likelihood ratio statistic in the single subset case—the population-based and expectation-based scan statistics—are considered, leading to risk partitioning and multiple cluster detection scan statistics, respectively. We show that, for distributions in a separable exponential family, the risk partitioning scan statistic can be expressed as a scaled f-divergence of the normalized count and baseline vectors, and the multiple cluster detection scan statistic as a sum of scaled Bregman divergences. In either case, however, maximization of the scan statistic by exhaustive search over all partitionings of the data requires exponential time. To make this optimization computationally feasible, we prove sufficient conditions under which the optimal partitioning is guaranteed to be consecutive. This Consecutive Partitions Property generalizes the linear-time subset scanning property from two partitions (the detected subset and the remaining data elements) to the multiple partition case. While the number of consecutive partitionings of n elements into t partitions scales as O(n^(t−1)), making it computationally expensive for large t, we present a dynamic programming approach which identifies the optimal consecutive partitioning in O(n^2 t) time, thus allowing for the exact and efficient solution of large-scale risk partitioning and multiple cluster detection problems. Finally, we demonstrate the detection performance and practical utility of partition scan statistics using simulated and real-world data. Supplementary materials for this article are available online. 
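Assuming the objective is additive over parts, as in the sum-of-scaled-Bregman-divergences form above (that additivity is the assumption this sketch makes explicit), the O(n^2 t) dynamic program can be sketched as follows; segment_score is a hypothetical stand-in for the per-part statistic.

```python
def best_consecutive_partition(n, t, segment_score):
    """O(n^2 * t) dynamic program for the best split of elements 0..n-1
    into t consecutive parts, assuming the objective is a sum of
    per-part scores; segment_score(i, j) scores elements i..j-1."""
    NEG = float("-inf")
    best = [[NEG] * (n + 1) for _ in range(t + 1)]  # best[k][j]: k parts, first j elems
    back = [[-1] * (n + 1) for _ in range(t + 1)]
    best[0][0] = 0.0
    for k in range(1, t + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):          # i = start of the k-th part
                cand = best[k - 1][i] + segment_score(i, j)
                if cand > best[k][j]:
                    best[k][j], back[k][j] = cand, i
    parts, j = [], n                            # walk back pointers
    for k in range(t, 0, -1):
        parts.append((back[k][j], j))
        j = back[k][j]
    return best[t][n], parts[::-1]
```

The inner loop over i is what replaces the O(n^(t-1)) exhaustive enumeration of consecutive partitionings: each (k, j) cell is filled once, in time O(n).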
  3. One of the primary challenges in graphical models is inference, or reconstructing a marginal probability from the graphical model’s factorized representation. While tractable for some graphs, the cost of inference grows exponentially with the graphical model’s complexity, necessitating approximation for more complex graphs. For interactive applications, latency is the dominant concern, making approximate inference the only feasible option. Unfortunately, approximate inference can be wasteful for interactive applications, as exact inference can still converge faster, even for moderately complex inference problems. In this paper, we propose a new family of convergent inference algorithms (CIAs) that bridge the gap between approximations and exact solutions, providing early, incrementally improving approximations that become exact after a finite period of time. We describe two specific CIAs based on a cryptographic technique called linear congruential generators, including a novel incremental join algorithm for dense relations called Leaky Joins. We conclude with experiments that demonstrate the utility of Leaky Joins for convergent inference: On both synthetic and real-world probabilistic graphical models, Leaky Joins converge to exact marginal probabilities almost as fast as state of the art exact inference algorithms, while simultaneously achieving approximations that are almost as good as state of the art approximation algorithms. 
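The Leaky Joins algorithm itself is too involved to sketch here, but the primitive the abstract names is simple: a full-period linear congruential generator visits every index in [0, m) exactly once, which is what lets an incremental join sweep a dense relation in shuffled order while still guaranteeing exact convergence. A minimal sketch, with conventional constants (our choice, not the paper's):

```python
def lcg_visit_order(m, a=1664525, c=1013904223):
    """Full-period linear congruential generator: yields every value in
    [0, m) exactly once per period, provided the Hull-Dobell conditions
    hold (c coprime to m; a - 1 divisible by every prime factor of m,
    and by 4 if 4 divides m). For m a power of two, any odd c with
    a % 4 == 1 suffices, as here."""
    x = 0
    for _ in range(m):
        yield x
        x = (a * x + c) % m

# Every tuple index is visited exactly once, in a shuffled order.
assert sorted(lcg_visit_order(2 ** 10)) == list(range(2 ** 10))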
  4. Background: Identification of motifs and quantification of their occurrences are important for the study of genetic diseases, gene evolution, transcription sites, and other biological mechanisms. Exact formulae for estimating count distributions of motifs under Markovian assumptions have high computational complexity and are impractical to use on large motif sets. Approximated formulae, e.g. based on compound Poisson, are faster, but reliable p-value calculation remains challenging. Here, we introduce ‘motif_prob’, a fast implementation of an exact formula for motif count distribution through progressive approximation with arbitrary precision. Our implementation speeds up the exact calculation, usually impractical, making it feasible and positioning it to replace currently employed heuristics. Results: We implement motif_prob in both Perl and C++, using an efficient error-bound iterative process for the exact formula, and provide a comparison with state-of-the-art tools (e.g. MoSDi) in terms of precision and run time benchmarks, along with a real-world use case on bacterial motif characterization. Our software is able to process a million motifs (13–31 bases) over genome lengths of 5 million bases within a minute on a regular laptop, and the run times for both the Perl and C++ code are several orders of magnitude smaller (50–1000× faster) than MoSDi, even when using their fast compound Poisson approximation (60–120× faster). In the real-world use cases, we first show the consistency of motif_prob with MoSDi, and then how the p-value quantification is crucial for enrichment quantification when bacteria have different GC content, using motifs found in antimicrobial resistance genes. The software and the code sources are available under the MIT license at https://github.com/DataIntellSystLab/motif_prob . Conclusions: The motif_prob software is a multi-platform and efficient open-source solution for calculating exact frequency distributions of motifs. It can be integrated with motif discovery/characterization tools for quantifying enrichment and deviation from expected frequency ranges with exact p-values, without loss in data processing efficiency. 
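The exact formula behind motif_prob is not reproduced in the abstract; for orientation only, here is the simple Poisson baseline that compound-Poisson-style heuristics refine, under an i.i.d. background model (an assumption of this sketch, not a description of motif_prob):

```python
import math

def poisson_motif_pvalue(observed, motif, seq_len, base_probs):
    """P(X >= observed) for the count of a motif in a sequence of length
    seq_len, under an i.i.d. background with per-base probabilities
    base_probs and a Poisson approximation to the occurrence count."""
    p_occ = 1.0
    for base in motif:                      # probability of a match at one position
        p_occ *= base_probs[base]
    lam = (seq_len - len(motif) + 1) * p_occ  # expected number of occurrences
    term, cdf = math.exp(-lam), 0.0
    for k in range(observed):               # accumulate P(X < observed) stably
        cdf += term
        term *= lam / (k + 1)
    return max(0.0, 1.0 - cdf)

uniform = {b: 0.25 for b in "ACGT"}
print(poisson_motif_pvalue(100, "GATCGATC", 5_000_000, uniform))
```

This is exactly where GC content matters: with non-uniform base_probs, the same observed count can be strongly enriched in one genome and expected by chance in another, which is the point the use case above makes.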
  5. Failure time data of fielded systems are usually obtained from the actual users of the systems. Due to various operational preferences and/or technical obstacles, a large proportion of field data are collected as aggregate data instead of the exact failure times of individual units. The challenge of using such data is that the obtained information is more concise but less precise than individual failure times. The most significant needs in modeling aggregate failure time data are the selection of an appropriate probability distribution and the development of a statistical inference procedure capable of handling data aggregation. Although some probability distributions, such as the Gamma and Inverse Gaussian distributions, have well-known closed-form expressions for the probability density function of aggregate data, restricting modeling to such distributions limits applications in field reliability estimation. For reliability practitioners, it would be invaluable to have a robust approach for handling aggregate failure time data without being limited to a small number of probability distributions. This paper studies the application of the phase-type (PH) distribution as a candidate for modeling aggregate failure time data. An expectation-maximization algorithm is developed to obtain the maximum likelihood estimates of model parameters, and the confidence interval for the reliability estimate is also obtained. Simulation and numerical studies show that the proposed approach is quite powerful because of the high capability of PH distributions to mimic a variety of probability distributions. In the area of reliability engineering, there is limited work on modeling aggregate data for field reliability estimation. The analytical and statistical inference methods described in this work provide a robust tool for analyzing aggregate failure time data for the first time. 
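The EM algorithm is beyond a short sketch, but evaluating a phase-type model is direct: with initial distribution alpha over transient phases and sub-generator matrix T, the density is f(t) = alpha exp(Tt) t0 with exit vector t0 = -T 1, and reliability is R(t) = alpha exp(Tt) 1. A minimal illustration (our notation and toy example, not the paper's):

```python
import numpy as np
from scipy.linalg import expm

def ph_density(t, alpha, T):
    """Phase-type density f(t) = alpha @ expm(T t) @ t0,
    with exit rate vector t0 = -T @ 1."""
    t0 = -T @ np.ones(T.shape[0])
    return float(alpha @ expm(T * t) @ t0)

def ph_reliability(t, alpha, T):
    """Reliability (survival) function R(t) = alpha @ expm(T t) @ 1."""
    return float(alpha @ expm(T * t) @ np.ones(T.shape[0]))

# Example: a 2-phase hypoexponential failure model.
alpha = np.array([1.0, 0.0])       # unit starts in phase 1
T = np.array([[-2.0, 2.0],         # leaves phase 1 at rate 2
              [0.0, -1.0]])        # fails from phase 2 at rate 1
print(ph_reliability(1.0, alpha, T))
```

The flexibility the abstract describes comes from the matrix structure: adding phases to alpha and T lets the PH family approximate a wide range of positive-valued distributions while keeping likelihood evaluation in closed matrix-exponential form.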