This content will become publicly available on January 1, 2023

Latent Outlier Exposure for Anomaly Detection with Contaminated Data
Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, which is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer binary labels for each datum (normal vs. anomalous) while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018), which considers synthetically created, labeled anomalies, we use a combination of two losses that share parameters: one for the normal and one for the anomalous data. We then iteratively proceed with block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular datasets, and a video anomaly detection benchmark showed consistent and significant improvements over the baselines.
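The alternating scheme described in the abstract can be illustrated with a toy example. The sketch below is a hypothetical illustration in Python/NumPy, not the authors' implementation. The backbone is reduced to a single center vector c. The normal loss ||x - c||^2 and the anomalous loss 1/(||x - c||^2 + eps) are assumptions, chosen only because they share the parameter c and pull it in opposite directions. The contamination ratio alpha is also assumed to be known. Each round runs in two steps. First, c is held fixed, and the anomalous label goes to the alpha * N points whose relabeling most reduces the joint loss, that is, the points with the largest L_n - L_a. Second, the labels are held fixed, and c takes a few gradient steps on the joint loss.

```python
import numpy as np


def loe_hard(X, alpha, n_rounds=10, n_steps=50, lr=0.1, eps=1.0):
    """Toy latent-outlier-exposure loop with hard labels (illustrative only).

    Hypothetical losses sharing the single parameter c (a center vector):
      normal loss    L_n(x; c) = ||x - c||^2             -> pulls c toward x
      anomalous loss L_a(x; c) = 1 / (||x - c||^2 + eps)  -> pushes c away from x
    alpha is the assumed-known fraction of anomalies in X.
    """
    n = len(X)
    k = int(round(alpha * n))  # number of points labeled anomalous
    c = X.mean(axis=0)         # initialize on the contaminated data
    y = np.zeros(n)
    for _ in range(n_rounds):
        # Label step (c fixed): setting y_i = 1 changes the joint loss by
        # L_a - L_n, so label the k points with the largest L_n - L_a.
        d = ((X - c) ** 2).sum(axis=1)
        gain = d - 1.0 / (d + eps)
        y = np.zeros(n)
        y[np.argsort(-gain)[:k]] = 1.0
        # Parameter step (y fixed): gradient descent on the mean joint loss
        # (1/n) * sum_i [(1 - y_i) * L_n(x_i) + y_i * L_a(x_i)].
        for _ in range(n_steps):
            diff = c - X                      # shape (n, dim)
            d = (diff ** 2).sum(axis=1)
            # d/dc L_n = 2 (c - x);  d/dc L_a = -2 (c - x) / (d + eps)^2
            w = 2.0 * (1.0 - y) - 2.0 * y / (d + eps) ** 2
            grad = (w[:, None] * diff).mean(axis=0)
            c = c - lr * grad
    return c, y


# Usage: 180 normal points near the origin, 20 anomalies near (6, 6).
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(180, 2))
anomalous = rng.normal(6.0, 1.0, size=(20, 2))
X = np.vstack([normal, anomalous])
c, y = loe_hard(X, alpha=0.1)
print("center:", np.round(c, 2))
print("fraction of true anomalies labeled anomalous:", y[180:].mean())
```

A plain center fit on the contaminated data would be biased toward the anomalies. Here, the inferred anomalies enter only through the repelling loss, so c should settle near the normal cluster. Swapping in a different backbone would change only the two loss functions and their gradients. The label step would stay the same sort-and-threshold.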
NSF-PAR ID: 10347068
Journal Name: Proceedings of Machine Learning Research
Volume: 162
ISSN: 2640-3498

We present the Swimmy (Subaru WIde-field Machine-learning anoMalY) survey program, a deep-learning-based search for unique sources using multicolor (grizy) imaging data from the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP). The program aims to detect unexpected, novel, and rare populations and phenomena by exploiting the deep imaging data acquired over the wide-field coverage of the HSC-SSP. This article, the first paper in the Swimmy series, describes an anomaly detection technique that selects unique populations as "outliers" from the dataset. The model was tested on known extreme emission-line galaxies (XELGs) and quasars, confirming that the proposed method selected ~60%–70% of the quasars and 60% of the XELGs without labeled training data. Using spectral information on local galaxies at z = 0.05–0.2 from the Sloan Digital Sky Survey, we investigated the physical properties of the selected anomalies and compared them according to the significance of their outlier values. The results revealed that XELGs constitute notable fractions of the most anomalous galaxies and that certain galaxies exhibit unique morphological features. In summary, deep anomaly detection is an effective tool for finding rare objects, and ultimately unknown unknowns, in large datasets. Further development of the …