Algorithms for Generalized Topic Modeling

Blum, Avrim; Haghtalab, Nika

Citation Details

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In this work we consider a broad generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that, given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. More generally, our model can be viewed as a generalization of the multi-view or co-training setting in machine learning.

Award ID(s): 1525971, 1800317

NSF-PAR ID: 10057349

Author(s) / Creator(s): Blum, Avrim; Haghtalab, Nika

Date Published: 2018-02-01

Journal Name: Proceedings of the ... AAAI Conference on Artificial Intelligence

ISSN: 2159-5399

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
