Detection of unknown galaxy types in large databases of galaxy images

Modern digital sky surveys use robotic telescopes that collect extremely large, multi-PB astronomical databases. While these databases can contain billions of galaxies, most of them are “regular” galaxies of known types. A small fraction, however, are rare “peculiar” galaxies of types that are not yet known. These unknown galaxies are of paramount scientific interest, but because of the enormous size of astronomical databases they are practically impossible to find without automation. Since these novelty galaxies are, by definition, not known, machine learning models cannot be trained to detect them. In this paper, an unsupervised machine learning method for automatic detection of novelty galaxies in large databases is proposed. The method is based on a large and comprehensive set of numerical image content descriptors weighted by their entropy, and the farthest neighbors are rank-ordered to handle self-similar peculiar galaxies, which are expected in very large datasets. Experimental results using data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) show that the method detects novelty galaxies better than other shallow learning methods such as one-class SVM, Local Outlier Factor, and K-Means, and also better than newer deep learning-based methods such as auto-encoders. The dataset …
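The idea of entropy-weighted descriptors combined with rank-ordered neighbor distances can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighting formula or the ranking rule, so everything below is an assumption made for illustration. In particular, the function names `entropy_weights` and `rank_ordered_scores`, the histogram bin count, the choice to give lower-entropy features higher weight, and the use of the distance to the r-th nearest neighbor as the outlier score are all hypothetical.

```python
import numpy as np

def entropy_weights(X, bins=10):
    """Assumed weighting: 1 minus the normalised Shannon entropy of each
    feature's histogram. The direction of the weighting is an assumption."""
    n, d = X.shape
    w = np.empty(d)
    for j in range(d):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / n
        h = -np.sum(p * np.log(p))
        w[j] = 1.0 - h / np.log(bins)
    return w

def rank_ordered_scores(X, r=3, bins=10):
    """Score each row by its weighted distance to its r-th nearest neighbor.

    With r > 1, a peculiar object that has up to r - 1 close look-alikes
    still receives a high score (assumed reading of "rank-ordered").
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant features
    Z = (X - X.mean(axis=0)) / std           # standardise each descriptor
    w = entropy_weights(Z, bins=bins)
    diff = Z[:, None, :] - Z[None, :, :]     # pairwise differences, O(n^2 d) memory
    D = np.sqrt((w * diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)              # ignore self-distance
    return np.sort(D, axis=1)[:, r - 1]

# Toy data: 200 "regular" objects plus two nearly identical outliers.
rng = np.random.default_rng(0)
regular = rng.normal(size=(200, 5))
peculiar = np.array([[8.0] * 5, [8.01] * 5])
X = np.vstack([regular, peculiar])

scores_r1 = rank_ordered_scores(X, r=1)      # nearest neighbor only
scores_r3 = rank_ordered_scores(X, r=3)      # third-nearest neighbor
top_two = set(np.argsort(scores_r3)[-2:].tolist())
```

In this toy setup the two outliers sit next to each other, so a plain nearest-neighbor score (`r=1`) treats them as unremarkable, while a larger `r` looks past the look-alike. The all-pairs distance matrix is only practical for small samples; a survey-scale catalog would need an approximate or tree-based neighbor search.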

NSF-PAR ID: 10268489
Journal Name: EPiC Series in Computing
Volume: 76
Page Range or eLocation-ID: 29 to 18
ISSN: 2398-7340
We present the Swimmy (Subaru WIde-field Machine-learning anoMalY) survey program, a deep-learning-based search for unique sources using multicolored (grizy) imaging data from the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP). This program aims to detect unexpected, novel, and rare populations and phenomena by utilizing the deep imaging data acquired from the wide-field coverage of the HSC-SSP. This article, the first paper in the Swimmy series, describes an anomaly detection technique that selects unique populations as “outliers” from the dataset. The model was tested with known extreme emission-line galaxies (XELGs) and quasars, which confirmed that the proposed method successfully selected $\sim\!\! 60\%$–$70\%$ of the quasars and $60\%$ of the XELGs without labeled training data. Using spectral information for local galaxies at z = 0.05–0.2 obtained from the Sloan Digital Sky Survey, we investigated the physical properties of the selected anomalies and compared them according to the significance of their outlier values. The results revealed that XELGs constitute notable fractions of the most anomalous galaxies, and that certain galaxies manifest unique morphological features. In summary, deep anomaly detection is an effective tool that can search for rare objects and, ultimately, unknown unknowns in large datasets. Further development of the …