Title: Automating galaxy morphology classification using k-nearest neighbours and non-parametric statistics
ABSTRACT

Morphology is a fundamental property of any galaxy population. It is a major indicator of the physical processes that drive galaxy evolution and in turn the evolution of the entire Universe. Historically, galaxy images were visually classified by trained experts. However, in the era of big data, more efficient techniques are required. In this work, we present a k-nearest neighbours based approach that utilizes non-parametric morphological quantities to classify galaxy morphology in Sloan Digital Sky Survey images. Most previous studies used only a handful of morphological parameters to identify galaxy types. In contrast, we explore 1023 morphological spaces (defined by up to 10 non-parametric statistics) to find the best combination of morphological parameters. Additionally, while most previous studies broadly classified galaxies into early types and late types or ellipticals, spirals, and irregular galaxies, we classify galaxies into 11 morphological types with an average accuracy of ∼80–90 per cent per T-type. Our method is simple, easy to implement, and is robust to varying sizes and compositions of the training and test samples. Preliminary results on the performance of our technique on deeper images from the Hyper Suprime-Cam Subaru Strategic Survey reveal that an extension of our method to modern surveys with better imaging capabilities might be possible.
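
The figure of 1023 morphological spaces matches the number of non-empty subsets of 10 statistics (2^10 - 1). The sketch below illustrates that kind of search with a plain k-nearest-neighbours majority vote. It is only an illustration under stated assumptions, not the authors' pipeline: the statistic names, the Euclidean distance, the value of k, and the helper functions are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical placeholder names: the abstract does not list the 10 statistics.
STATS = [f"stat_{i}" for i in range(10)]


def feature_subsets(names):
    """Yield every non-empty subset of `names` (2**n - 1 subsets in total)."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x.

    Euclidean distance is an assumption made for this sketch.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy_in_space(train, test, subset, k=3):
    """Fraction of test galaxies classified correctly using only `subset`.

    `train` and `test` are lists of (feature_row, label) pairs, where each
    feature_row holds one value per entry of STATS, in the same order.
    """
    cols = [STATS.index(name) for name in subset]

    def project(row):
        return [row[c] for c in cols]

    train_X = [project(row) for row, _ in train]
    train_y = [label for _, label in train]
    hits = sum(
        knn_predict(train_X, train_y, project(row), k) == label
        for row, label in test
    )
    return hits / len(test)


def best_space(train, test, k=3):
    """Return the feature subset with the highest accuracy on `test`."""
    return max(
        feature_subsets(STATS),
        key=lambda subset: accuracy_in_space(train, test, subset, k),
    )
```

In practice the subset would be chosen on a held-out validation sample rather than on the final test sample, and the statistics would usually be rescaled to comparable ranges before distances are computed.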

 
NSF-PAR ID:
10532121
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Monthly Notices of the Royal Astronomical Society
Volume:
533
Issue:
1
ISSN:
0035-8711
Format(s):
Medium: X Size: p. 292-312
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT

    We use Bayesian convolutional neural networks and a novel generative model of Galaxy Zoo volunteer responses to infer posteriors for the visual morphology of galaxies. Bayesian CNNs can learn from galaxy images with uncertain labels and then, for previously unlabelled galaxies, predict the probability of each possible label. Our posteriors are well-calibrated (e.g. for predicting bars, we achieve coverage errors of 11.8 per cent within a vote fraction deviation of 0.2) and hence are reliable for practical use. Further, using our posteriors, we apply the active learning strategy BALD to request volunteer responses for the subset of galaxies which, if labelled, would be most informative for training our network. We show that training our Bayesian CNNs using active learning requires up to 35–60 per cent fewer labelled galaxies, depending on the morphological feature being classified. By combining human and machine intelligence, Galaxy Zoo will be able to classify surveys of any conceivable scale on a time-scale of weeks, providing massive and detailed morphology catalogues to support research into galaxy evolution.

     
  2. ABSTRACT We present Galaxy Zoo DECaLS: detailed visual morphological classifications for Dark Energy Camera Legacy Survey images of galaxies within the SDSS DR8 footprint. Deeper DECaLS images (r = 23.6 versus r = 22.2 from SDSS) reveal spiral arms, weak bars, and tidal features not previously visible in SDSS imaging. To best exploit the greater depth of DECaLS images, volunteers select from a new set of answers designed to improve our sensitivity to mergers and bars. Galaxy Zoo volunteers provide 7.5 million individual classifications over 314 000 galaxies. 140 000 galaxies receive at least 30 classifications, sufficient to accurately measure detailed morphology like bars, and the remainder receive approximately 5. All classifications are used to train an ensemble of Bayesian convolutional neural networks (a state-of-the-art deep learning method) to predict posteriors for the detailed morphology of all 314 000 galaxies. We use active learning to focus our volunteer effort on the galaxies which, if labelled, would be most informative for training our ensemble. When measured against confident volunteer classifications, the trained networks are approximately 99 per cent accurate on every question. Morphology is a fundamental feature of every galaxy; our human and machine classifications are an accurate and detailed resource for understanding how galaxies evolve. 
  3. ABSTRACT

    In this work, we explore the possibility of applying machine learning methods designed for 1D problems to the task of galaxy image classification. The algorithms used for image classification typically rely on multiple costly steps, such as the point spread function deconvolution and the training and application of complex Convolutional Neural Networks of thousands or even millions of parameters. In our approach, we extract features from the galaxy images by analysing the elliptical isophotes in their light distribution and collect the information in a sequence. The sequences obtained with this method present definite features allowing a direct distinction between galaxy types. Then, we train and classify the sequences with machine learning algorithms, designed through the platform Modulos AutoML. As a demonstration of this method, we use the second public release of the Dark Energy Survey (DES DR2). We show that we are able to successfully distinguish between early-type and late-type galaxies, for images with signal-to-noise ratio greater than 300. This yields an accuracy of 86 per cent for the early-type galaxies and 93 per cent for the late-type galaxies, which is on par with most contemporary automated image classification approaches. The data dimensionality reduction of our novel method implies a significant lowering in computational cost of classification. With a view to future data sets obtained with e.g. Euclid and the Vera Rubin Observatory, this work represents a path towards using a well-tested and widely used platform from industry in efficiently tackling galaxy classification problems at the petabyte scale.

     
  4. Spiral galaxies can spin clockwise or counterclockwise, and the spin direction of a spiral galaxy is a clear visual characteristic. Since the Universe is expected to be symmetric on sufficiently large scales, the spin direction of a galaxy is merely a matter of the observer's perspective, and therefore galaxies that spin clockwise are expected to have the same characteristics as galaxies that spin counterclockwise. Here, machine learning is applied to study the possible morphological differences between galaxies that spin in opposite directions. The dataset used in this study contains 77,840 spiral galaxies classified by their spin direction, together with a smaller dataset of galaxies classified manually. A machine learning algorithm was applied to classify between images of clockwise galaxies and counterclockwise galaxies. The results show that the classifier was able to predict the spin direction of a galaxy from its image with accuracy higher than mere chance, even when the images in one of the classes were mirrored to create a dataset with consistent spin directions. That suggests that galaxies that seem to spin clockwise to an Earth-based observer are not necessarily fully symmetric to galaxies that spin counterclockwise; while further research is required, these results are aligned with previous observations of differences between galaxies based on their spin directions.
  5. ABSTRACT

    Post-starburst galaxies (PSBs) are defined as having experienced a recent burst of star formation, followed by a prompt truncation in further activity. Identifying the mechanism(s) causing a galaxy to experience a post-starburst phase therefore provides integral insight into the causes of rapid quenching. Galaxy mergers have long been proposed as a possible post-starburst trigger. Effectively testing this hypothesis requires a large spectroscopic galaxy survey to identify the rare PSBs as well as high-quality imaging and robust morphology metrics to identify mergers. We bring together these critical elements by selecting PSBs from the overlap of the Sloan Digital Sky Survey and the Canada–France Imaging Survey and applying a suite of classification methods: non-parametric morphology metrics such as asymmetry and Gini-M20, a convolutional neural network trained to identify post-merger galaxies, and visual classification. This work is therefore the largest and most comprehensive assessment of the merger fraction of PSBs to date. We find that the merger fraction of PSBs ranges from 19 per cent to 42 per cent depending on the merger identification method and details of the PSB sample selection. These merger fractions represent an excess of 3–46× relative to non-PSB control samples. Our results demonstrate that mergers play a significant role in generating PSBs, but that other mechanisms are also required. However, applying our merger identification metrics to known post-mergers in the IllustrisTNG simulation shows that 70 per cent of recent post-mergers (≲200 Myr) would not be detected. Thus, we cannot exclude the possibility that nearly all PSBs have undergone a merger in their recent past.

     