Title: Detection of unknown galaxy types in large databases of galaxy images
Modern digital sky surveys utilize robotic telescopes that collect extremely large multi-PB astronomical databases. While these databases can contain billions of galaxies, most of the galaxies are “regular” galaxies of known galaxy types. However, a small portion of the galaxies are rare “peculiar” galaxies of types that are not yet known. These unknown galaxies are of paramount scientific interest, but due to the enormous size of astronomical databases they are practically impossible to find without automation. Since these novelty galaxies are, by definition, not known, machine learning models cannot be trained to detect them. In this paper, an unsupervised machine learning method for automatic detection of novelty galaxies in large databases is proposed. The method is based on a large and comprehensive set of numerical image content descriptors weighted by their entropy, and the farthest neighbors are rank-ordered to handle self-similar peculiar galaxies that are expected in very large datasets. Experimental results using data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) show that the method detects novelty galaxies better than other shallow learning methods such as one-class SVM, Local Outlier Factor, and K-Means, and also better than newer deep learning-based methods such as auto-encoders. The dataset used to evaluate the method is publicly available and can be used as a benchmark to test future algorithms for automatic detection of peculiar galaxies.
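The abstract above gives only an outline of the method, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes each galaxy is already represented by a row of numerical descriptors, that a feature's weight is the Shannon entropy of its histogram, and that the novelty score is the weighted distance to the r-th nearest neighbor. The function names `entropy_weights` and `novelty_scores`, the bin count, and the default rank are hypothetical choices made for illustration.

```python
import numpy as np


def entropy_weights(features, bins=10):
    """Shannon entropy (in bits) of each feature column's histogram.

    Assumption: the entropy is used directly as the feature weight.
    The abstract does not state the exact weighting scheme.
    """
    n_samples, n_features = features.shape
    weights = np.empty(n_features)
    for j in range(n_features):
        counts, _ = np.histogram(features[:, j], bins=bins)
        p = counts[counts > 0] / n_samples
        weights[j] = -np.sum(p * np.log2(p))
    return weights


def novelty_scores(features, rank=3, bins=10):
    """Score each row by its weighted distance to its rank-th nearest neighbor.

    Using rank > 1 keeps a small group of mutually similar peculiar
    objects from hiding each other, since their closest neighbors are
    then skipped. Memory use is O(n^2 * d), so this is only suitable
    for small samples.
    """
    # Standardize each feature so that no descriptor dominates by scale.
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - features.mean(axis=0)) / std

    w = entropy_weights(z, bins=bins)

    # Pairwise entropy-weighted Euclidean distances.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt(np.sum(w * diff ** 2, axis=2))
    np.fill_diagonal(dist, np.inf)  # ignore self-distance

    # Rank-order the neighbors of each row and take the rank-th closest.
    return np.sort(dist, axis=1)[:, rank - 1]
```

Rows with the highest scores would be the candidates passed on for manual inspection. With `rank=1`, two near-identical peculiar objects score low because each is the other's nearest neighbor. A larger rank avoids this, at the cost of missing groups of similar objects larger than the rank.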
Award ID(s):
1903823
PAR ID:
10268489
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
EPiC Series in Computing
Volume:
76
ISSN:
2398-7340
Page Range / eLocation ID:
29 to 18
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Galaxy images on the order of multiple PB are collected as part of modern digital sky surveys using robotic telescopes. While there is a plethora of imaging data available, the majority of the images that are captured resemble galaxies that are “regular”, i.e., galaxy types that are already known and probed. However, “novelty” galaxy types, i.e., little-known galaxy types, are encountered on occasion. The astronomy community shows paramount interest in the novelty galaxy types since they hold potential for scientific discovery. However, since these galaxies are rare, the identification of such novelty galaxies is not trivial and requires automation techniques. Since these novelty galaxies are, by definition, not known, supervised machine learning models cannot be trained to detect them. In this paper, an unsupervised machine learning method for automatic detection of novelty galaxies in large databases is proposed. The method uses a large set of image features weighted by their entropy. To handle the impact of self-similar novelty galaxies, the most similar galaxies are rank-ordered. In addition, Bag of Visual Words (BOVW) is applied to the problem of detecting novelty galaxies. Each image in the dataset is represented as a set of features made up of key-points and descriptors. A histogram of the features is constructed and is used to identify the neighbors of each of the images. Experimental results using data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) show that the performance of the methods in detecting novelty galaxies is superior to other shallow learning methods such as one-class SVM, Local Outlier Factor, and K-Means, and also to newer deep learning-based methods such as auto-encoders. The dataset used to evaluate the method is publicly available and can be used as a benchmark to test future algorithms for automatic detection of peculiar galaxies.
  2. Rare extragalactic objects can carry substantial information about the past, present, and future universe. Given the size of astronomical databases in the information era, it can be assumed that a large number of outlier galaxies are included in existing and future astronomical databases. However, manual search for these objects is impractical due to the required labour, and therefore the ability to detect such objects largely depends on computer algorithms. This paper describes an unsupervised machine learning algorithm for automatic detection of outlier galaxy images, and its application to several Hubble Space Telescope fields. The algorithm does not require training, and therefore is not dependent on the preparation of clean training sets. The application of the algorithm to a large collection of galaxies detected a variety of outlier galaxy images. The algorithm is not perfect in the sense that not all objects detected by the algorithm are indeed considered outliers, but it reduces the data set by two orders of magnitude to allow practical manual identification. The catalogue contains 147 objects that would be very difficult to identify without using automation.
  3. A full ring is a form of galaxy morphology that is not associated with a specific stage on the Hubble sequence. Digital sky surveys can collect many millions of galaxy images, and therefore even rare forms of galaxies are expected to be present in relatively large numbers in image databases created by digital sky surveys. Sloan Digital Sky Survey (SDSS) data release (DR) 14 contains ∼2.6 × 10^6 objects with spectra identified as galaxies. The method described in this paper applied automatic detection to identify a set of 443 ring galaxy candidates. Of these, 104 were already included in the Buta+17 catalogue of ring galaxies in SDSS, but the majority of the galaxies are not included in previous catalogues. Machine analysis cannot yet match the superior pattern recognition abilities of the human brain, and even a small false positive rate makes automatic analysis impractical when scanning through millions of galaxies. Reducing the false positive rate also increases the true negative rate, and therefore the catalogue of ring galaxy candidates is not exhaustive. However, due to its clear advantage in speed, it can provide a large collection of galaxies that can be used for follow-up observations of objects with ring morphology.
  4. The Arecibo Pisces-Perseus Supercluster Survey (APPSS) attempts to detect the infall of galaxies onto the Pisces-Perseus Supercluster (PPS). The ALFALFA survey has greatly augmented the known redshifts across the region. APPSS sources will complement the ALFALFA sources, with the goal of building a large enough sample to make a high-confidence measurement of infall and backflow onto the PPS filament via peculiar velocity estimates from the Tully-Fisher (TFR) and Baryonic Tully-Fisher (BTFR) relations. APPSS galaxies are selected using photometric data from the Sloan Digital Sky Survey (SDSS), with the aim of detecting low-mass, nearby, gas-rich objects below the ALFALFA detection limit. The L-band wide receiver at Arecibo Observatory in Puerto Rico is used to obtain a five-minute ON-OFF measurement for each galaxy. Since the candidate galaxy redshifts are unknown, the receiver and spectrograph system are used in a search mode that spans the expected frequencies of HI emission from PPS galaxies. We will describe the goals, target selection, and data reduction process for the survey. Our collaboration has divided the PPS into two-degree-wide declination strips for data reduction; we report preliminary results for strips 23 and 33. We have made the initial data reduction on more than 200 targets, and determined the systemic velocity, line width, integrated flux density, and HI mass for each candidate detection. We will compare results on our two declination strips, and point out interesting detections found along the way as examples of the data reduction process. This work has been supported by NSF grants AST-1211005 and AST-1637339. Publication: American Astronomical Society, AAS Meeting #233, id.356.07. Pub Date: January 2019. Bibcode: 2019AAS...23335607L
  5. Frey, Sandor (Ed.)
    The ability to collect unprecedented amounts of astronomical data has enabled the study of scientific questions that were impractical to address in the pre-information era. This study uses large datasets collected by four different robotic telescopes to profile the large-scale distribution of the spin directions of spiral galaxies. These datasets cover the Northern and Southern hemispheres, in addition to data acquired from space by the Hubble Space Telescope. The data were annotated automatically by a fully symmetric algorithm, as well as manually through a long labor-intensive process, leading to a dataset of nearly 10^6 galaxies. The data show possible patterns of asymmetric distribution of the spin directions, and the patterns agree between the different telescopes. The profiles also agree when using automatic or manual annotation of the galaxies, showing very similar large-scale patterns. Combining all data from all telescopes allows the most comprehensive analysis of its kind to date in terms of both the number of galaxies and the footprint size. The results show a statistically significant profile that is consistent across all telescopes. The instruments used in this study are DECam, HST, SDSS, and Pan-STARRS. The paper also discusses possible sources of bias and analyzes the design of previous work that showed different results. Further research will be required to understand and validate these preliminary observations.