Exploiting Proximity Search and Easy Examples to Select Rare Events

Kang, Daniel; Derhacobian, Alex; Tsuji, Kaoru; Hebert, Trevor; Bailis, Peter; Fukami, Tadashi; Hashimoto, Tatsunori; Sun, Yi; Zaharia, Matei

Practitioners often need to select rare events from a large dataset. Unfortunately, standard techniques, ranging from pre-trained models to active learning, do not leverage the proximity structure present in many datasets and can produce worse-than-random results. To address this, we propose EZMODE, an algorithm for iteratively selecting rare events in large, unlabeled datasets. EZMODE uses active learning to iteratively train classifiers but, in contrast to standard uncertainty-based techniques, chooses the easiest positive examples to label. EZMODE also leverages proximity structure (e.g., temporal sampling) to find difficult positive examples. We show that EZMODE can outperform baselines by up to 130× on a novel, real-world, 9,000 GB video dataset.
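The two ideas in the abstract, labeling the easiest candidates first and then probing nearby frames, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function names, the `oracle` labeling callback, the `radius` parameter, and the toy scores below are all hypothetical, and the classifier retraining between rounds is omitted.

```python
from typing import Callable, List, Sequence, Set


def pick_easiest(scores: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Return the k unlabeled indices with the highest positive-class score."""
    candidates = [i for i in range(len(scores)) if i not in labeled]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]


def pick_most_uncertain(scores: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Uncertainty sampling, shown only for contrast: scores closest to 0.5."""
    candidates = [i for i in range(len(scores)) if i not in labeled]
    candidates.sort(key=lambda i: abs(scores[i] - 0.5))
    return candidates[:k]


def temporal_neighbors(index: int, n: int, radius: int) -> List[int]:
    """Indices within `radius` frames of `index`, excluding `index` itself."""
    lo, hi = max(0, index - radius), min(n - 1, index + radius)
    return [j for j in range(lo, hi + 1) if j != index]


def selection_round(
    scores: Sequence[float],
    labeled: Set[int],
    oracle: Callable[[int], bool],
    k: int,
    radius: int,
) -> List[int]:
    """One round: label the easiest candidates, then probe around each positive.

    Mutates `labeled` in place and returns the positives found this round.
    """
    n = len(scores)
    found: List[int] = []
    for i in pick_easiest(scores, labeled, k):
        if i in labeled:  # already labeled as a neighbor earlier in this round
            continue
        labeled.add(i)
        if not oracle(i):
            continue
        found.append(i)
        # Proximity step: neighbors are labeled regardless of their score,
        # which is how low-scoring (difficult) positives can surface.
        for j in temporal_neighbors(i, n, radius):
            if j not in labeled:
                labeled.add(j)
                if oracle(j):
                    found.append(j)
    return found


# Toy data (hypothetical): frame 3 is a positive the classifier scores low.
scores = [0.1, 0.2, 0.9, 0.3, 0.5, 0.1, 0.05, 0.8]
truth = {2, 3, 7}
labeled: Set[int] = set()
found = selection_round(scores, labeled, lambda i: i in truth, k=1, radius=1)
```

In this toy run, the highest-scoring frame is labeled first, and its low-scoring neighbor is then found through the proximity step rather than through its classifier score. A fuller loop would retrain the classifier on `labeled` after each round and recompute `scores`.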

Award ID(s): 1737758

PAR ID: 10316595

Date Published: 2021-12-14

Journal Name: NeurIPS Data-Centric AI Workshop 2021

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
