Semi-Supervised Aggregation of Dependent Weak Supervision Sources With Performance Guarantees

Mazzetto, Alessio; Sam, Dylan; Park, Andrew; Upfal, Eli; Bach, Stephen

We develop a novel method that provides theoretical guarantees for learning from weak labelers without the (mostly unrealistic) assumption that the errors of the weak labelers are independent or come from a particular family of distributions. We present a rigorous technique for efficiently selecting small subsets of the labelers so that a majority vote from such subsets has a provably low error rate. We explore several extensions of this method and provide experimental results over a range of labeled data set sizes on 45 image classification tasks. Our performance-guaranteed methods consistently match the best-performing alternative, which varies based on problem difficulty. On tasks with accurate weak labelers, our methods are on average 3 percentage points more accurate than the state-of-the-art adversarial method. On tasks with inaccurate weak labelers, our methods are on average 15 percentage points more accurate than the semi-supervised Dawid-Skene model (which assumes independence).
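The abstract describes selecting small subsets of weak labelers whose majority vote has a provably low error rate. As a rough, hypothetical illustration of that general idea (not the authors' algorithm, which is semi-supervised and also exploits unlabeled data), the Python sketch below uses only a small labeled set, assumed to be drawn i.i.d. from the target distribution. It scores every labeler subset of size up to a hypothetical `max_size` by its empirical majority-vote error and attaches a one-sided Hoeffding bound, with a union bound over all candidate subsets. The function names and the `max_size` and `delta` parameters are assumptions made for this sketch.

```python
import itertools
import math
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Most common label among `votes`; ties go to the smallest label."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def subset_error(preds: List[List[int]], labels: List[int],
                 subset: Tuple[int, ...]) -> float:
    """Empirical error of the majority vote of `subset` on the labeled set.

    preds[j][i] is weak labeler j's vote on labeled example i.
    """
    mistakes = sum(
        majority_vote([preds[j][i] for j in subset]) != y
        for i, y in enumerate(labels)
    )
    return mistakes / len(labels)


def select_subset(preds: List[List[int]], labels: List[int],
                  max_size: int = 3, delta: float = 0.05
                  ) -> Tuple[Tuple[int, ...], float]:
    """Return (subset, bound): the subset of at most `max_size` labelers with
    the lowest empirical majority-vote error, plus an upper bound on its true
    error that holds with probability >= 1 - delta (one-sided Hoeffding
    inequality combined with a union bound over all candidate subsets)."""
    if not labels:
        raise ValueError("need at least one labeled example")
    m, n = len(preds), len(labels)
    candidates = [s for k in range(1, min(max_size, m) + 1)
                  for s in itertools.combinations(range(m), k)]
    # The Hoeffding slack is identical for every candidate, so ranking by the
    # bound is the same as ranking by empirical error.
    slack = math.sqrt(math.log(len(candidates) / delta) / (2 * n))
    best = min(candidates, key=lambda s: subset_error(preds, labels, s))
    return best, min(1.0, subset_error(preds, labels, best) + slack)
```

Because the confidence term is shared by all candidates, it does not change which subset is chosen; it only certifies the chosen subset's error. Exhaustive enumeration grows combinatorially with the number of labelers, so this naive version is practical only for small `max_size`.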

Award ID(s): 1813444

PAR ID: 10273522

Author(s) / Creator(s): Mazzetto, Alessio; Sam, Dylan; Park, Andrew; Upfal, Eli; Bach, Stephen

Editor(s): Banerjee, Arindam; Fukumizu, Kenji

Date Published: 2021-04-06

Journal Name: International Conference on Artificial Intelligence and Statistics, 13-15

Volume: 30

Page Range / eLocation ID: 3196-3204

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.

More Like this