%AGreenberg, Craig%AMonath, Nicholas%AKobren, Ari%AFlaherty, Patrick%AMcGregor, Andrew%AMcCallum, Andrew%D2018%I
%K
%MOSTI ID: 10112136
%PMedium: X
%TCompact Representation of Uncertainty in Clustering
%XFor many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, we know of no previ- ous work studying efficient representations of exact distributions over clusterings. This paper presents definitions and proofs for a dynamic-programming inference procedure that computes the partition function, the marginal probability of a cluster, and the MAP clusteringâ€”all exactly. Rather than the N th Bell number, these exact solutions take time and space proportional to the substantially smaller powerset of N . Indeed, we improve upon the time complexity of the algorithm introduced by Kohonen and Corander [11] for this problem by a factor of N. While still large, this previously unknown result is intellectually interesting in its own right, makes feasible exact inference for important real-world small data applications (such as medicine), and provides a natural stepping stone towards sparse-trellis approximations that enable further scalability (which we also explore). In experi- ments, we demonstrate the superiority of our approach over approximate methods in analyzing real-world gene expression data used in cancer treatment.
Country unknown/Code not availableOSTI-MSA