Title: Automatic identification of outliers in Hubble Space Telescope galaxy images
ABSTRACT Rare extragalactic objects can carry substantial information about the past, present, and future universe. Given the size of astronomical data bases in the information era, it can be assumed that very many outlier galaxies are included in existing and future astronomical data bases. However, manual search for these objects is impractical due to the required labour, and therefore the ability to detect such objects largely depends on computer algorithms. This paper describes an unsupervised machine learning algorithm for automatic detection of outlier galaxy images, and its application to several Hubble Space Telescope fields. The algorithm does not require training, and therefore is not dependent on the preparation of clean training sets. The application of the algorithm to a large collection of galaxies detected a variety of outlier galaxy images. The algorithm is not perfect in the sense that not all objects detected by the algorithm are indeed considered outliers, but it reduces the data set by two orders of magnitude to allow practical manual identification. The catalogue contains 147 objects that would be very difficult to identify without using automation.
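The abstract does not say how the algorithm scores images. What follows is only a minimal sketch of unsupervised outlier ranking in general. It assumes each galaxy image has already been reduced to a numerical feature vector, and it swaps in a generic k-nearest-neighbour distance score, which is not the authors' algorithm. The function names `knn_outlier_scores` and `shortlist`, and the parameters `k` and `fraction`, are hypothetical. Keeping the top 1 per cent of scores mirrors the two-orders-of-magnitude reduction described above, which leaves a shortlist for manual inspection.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(features: Sequence[Sequence[float]], k: int = 5) -> List[float]:
    """Hypothetical helper: score each sample by its mean Euclidean distance
    to its k nearest neighbours (a generic k-NN outlier score, not the paper's
    method). Larger scores mean the sample lies farther from the bulk of the data."""
    n = len(features)
    if n <= k:
        raise ValueError("need more samples than k")
    scores = []
    for i in range(n):
        # Brute-force distances from sample i to every other sample
        dists = sorted(math.dist(features[i], features[j]) for j in range(n) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def shortlist(
    features: Sequence[Sequence[float]], fraction: float = 0.01, k: int = 5
) -> List[int]:
    """Hypothetical helper: indices of the top `fraction` most outlying samples,
    most outlying first, to be passed on for manual inspection."""
    scores = knn_outlier_scores(features, k)
    n_keep = max(1, round(len(features) * fraction))
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return order[:n_keep]


if __name__ == "__main__":
    # Toy feature vectors: 99 "regular" galaxies on a tight grid, one far-away outlier
    regular = [[(i % 10) * 0.1, (i // 10) * 0.1] for i in range(99)]
    data = regular + [[5.0, 5.0]]
    print(shortlist(data, fraction=0.01, k=5))  # [99]
```

The brute-force pairwise loop costs O(n²) distance evaluations and is meant only for illustration. A survey-scale catalogue would need an indexed nearest-neighbour search.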
Award ID(s):
1903823
NSF-PAR ID:
10268485
Author(s) / Creator(s):
Date Published:
Journal Name:
Monthly Notices of the Royal Astronomical Society
Volume:
501
Issue:
4
ISSN:
0035-8711
Page Range / eLocation ID:
5229 to 5238
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Strongly lensed quadruply imaged quasars (quads) are extraordinary objects. They are very rare in the sky, and yet they provide unique information about a wide range of topics, including the expansion history and the composition of the Universe, the distribution of stars and dark matter in galaxies, the host galaxies of quasars, and the stellar initial mass function. Finding them in astronomical images is a classic ‘needle in a haystack’ problem, as they are outnumbered by other (contaminant) sources by many orders of magnitude. To solve this problem, we develop state-of-the-art deep learning methods and train them on realistic simulated quads based on real images of galaxies taken from the Dark Energy Survey, with realistic source and deflector models, including the chromatic effects of microlensing. The performance of the best methods on a mixture of simulated and real objects is excellent, yielding an area under the receiver operating characteristic (ROC) curve in the range 0.86–0.89. Recall is close to 100 per cent down to total magnitude i ∼ 21, indicating high completeness, while precision declines from 85 per cent to 70 per cent over the range i ∼ 17–21. The methods are extremely fast: training on 2 million samples takes 20 h on a GPU machine, and 10⁸ multiband cut-outs can be evaluated per GPU-hour. The speed and performance of the method pave the way to apply it to large samples of astronomical sources, bypassing the need for photometric pre-selection, which is likely to be a major cause of incompleteness in current samples of known quads.
  2. Modern digital sky surveys use robotic telescopes that collect extremely large, multi-PB astronomical databases. While these databases can contain billions of galaxies, most of them are “regular” galaxies of known types. However, a small portion are rare “peculiar” galaxies that are not yet known. These unknown galaxies are of paramount scientific interest, but due to the enormous size of astronomical databases they are practically impossible to find without automation. Since these novelty galaxies are, by definition, not known, machine learning models cannot be trained to detect them. In this paper, an unsupervised machine learning method for automatic detection of novelty galaxies in large databases is proposed. The method is based on a large and comprehensive set of numerical image content descriptors weighted by their entropy, and the farthest neighbors are rank-ordered to handle self-similar peculiar galaxies that are expected in very large datasets. Experimental results using data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) show that the method detects novelty galaxies better than other shallow learning methods such as one-class SVM, Local Outlier Factor, and K-Means, as well as newer deep learning-based methods such as auto-encoders. The dataset used to evaluate the method is publicly available and can be used as a benchmark for testing future algorithms for automatic detection of peculiar galaxies.
  3. ABSTRACT A full ring is a form of galaxy morphology that is not associated with a specific stage on the Hubble sequence. Digital sky surveys can collect many millions of galaxy images, and therefore even rare forms of galaxies are expected to be present in relatively large numbers in the image data bases these surveys create. Sloan Digital Sky Survey (SDSS) data release (DR) 14 contains ∼2.6 × 10⁶ objects with spectra identified as galaxies. The method described in this paper applied automatic detection to identify a set of 443 ring galaxy candidates; 104 of them were already included in the Buta+17 catalogue of ring galaxies in SDSS, but the majority are not included in previous catalogues. Machine analysis cannot yet match the superior pattern recognition abilities of the human brain, and even a small false positive rate makes automatic analysis impractical when scanning through millions of galaxies. Reducing the false positive rate also reduces the true positive rate, and therefore the catalogue of ring galaxy candidates is not exhaustive. However, due to its clear advantage in speed, the method can provide a large collection of galaxies for follow-up observations of objects with ring morphology.
  4. ABSTRACT This paper aims to quantify how the lowest halo mass that can be detected with galaxy-galaxy strong gravitational lensing depends on the quality of the observations and the characteristics of the observed lens systems. Using simulated data, we measure the lowest detectable NFW mass at each location of the lens plane, in the form of detailed sensitivity maps. In summary, we find that: (i) the lowest detectable mass M_low decreases linearly as the signal-to-noise ratio (SNR) increases, and the sensitive area grows as the noise decreases; (ii) a moderate gain in angular resolution (0.07″ versus 0.09″) and pixel scale (0.01″ versus 0.04″) improves the sensitivity by 0.25 dex in halo mass on average, with more significant improvement around the most sensitive regions; (iii) the sensitivity to low-mass objects is largest for bright and complex lensed galaxies located inside the caustic curves and lensed into larger Einstein rings (i.e. r_E ≥ 1.0″). We find that for the sensitive mock images considered in this work, the minimum mass that we can detect at the redshift of the lens lies between 1.5 × 10⁸ and 3 × 10⁹ M⊙. We derive analytic relations between M_low, the SNR and the resolution, and discuss the impact of the lensing configuration and source structure. Our results start to fill the gap between approximate predictions and real data and demonstrate the challenging nature of calculating precise forecasts for gravitational imaging. In light of our findings, we discuss possible strategies for designing strong lensing surveys and the prospects for HST, Keck, ALMA, Euclid and other future observations.
  5. Abstract We present morphological classifications of ∼27 million galaxies from the Dark Energy Survey (DES) Data Release 1 (DR1) using a supervised deep learning algorithm. The classification scheme separates (a) early-type galaxies (ETGs) from late-types (LTGs), and (b) face-on galaxies from edge-on ones. Our Convolutional Neural Networks (CNNs) are trained on a small subset of DES objects with previously known classifications. These typically have m_r ≲ 17.7 mag; we model fainter objects to m_r < 21.5 mag by simulating what the brighter objects with well-determined classifications would look like if they were at higher redshifts. The CNNs reach 97% accuracy to m_r < 21.5 on their training sets, suggesting that they are able to recover features more accurately than the human eye. We then used the trained CNNs to classify the vast majority of the other DES images. The final catalog comprises five independent CNN predictions for each classification scheme, helping to determine whether the CNN predictions are robust. We obtain secure classifications for ∼87% and 73% of the catalog for the ETG vs. LTG and edge-on vs. face-on models, respectively. Combining the two classifications (a) and (b) helps to increase the purity of the ETG sample and to identify edge-on lenticular galaxies (as ETGs with high ellipticity). Where a comparison is possible, our classifications correlate very well with Sérsic index (n), ellipticity (ε) and spectral type, even for the fainter galaxies. This is the largest multi-band catalog of automated galaxy morphologies to date.
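Item 1 reports its results as an area under the ROC curve, recall, and precision. The following minimal sketch shows how these metrics are defined for a classifier that assigns each cut-out a score, where a higher score means "more likely a quad". The function names and the toy labels and scores are hypothetical and are not taken from that paper. The AUC is computed with the pairwise (Mann–Whitney) formulation rather than by integrating the curve.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is called a positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0  # purity of the selected sample
    recall = tp / (tp + fn) if tp + fn else 0.0  # completeness
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: 1 = true quad, 0 = contaminant
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
    print(round(roc_auc(labels, scores), 3))  # 0.889
    print(precision_recall(labels, scores, threshold=0.5))
```

With this toy data, two of the three true positives score at or above 0.5, and one contaminant does too, so precision and recall are both 2/3. Raising the threshold trades recall (completeness) for precision (purity), which is the trade-off item 1 describes across magnitudes.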