
Title: Beyond the Hubble sequence – exploring galaxy morphology with unsupervised machine learning
ABSTRACT We explore unsupervised machine learning for galaxy morphology analyses using a combination of feature extraction with a vector-quantized variational autoencoder (VQ-VAE) and hierarchical clustering (HC). We propose a new methodology that includes: (1) consideration of the clustering performance simultaneously when learning features from images; (2) allowing for various distance thresholds within the HC algorithm; (3) using the galaxy orientation to determine the number of clusters. This set-up provides 27 clusters created with this unsupervised learning that we show are well separated based on galaxy shape and structure (e.g. Sérsic index, concentration, asymmetry, Gini coefficient). These resulting clusters also correlate well with physical properties such as the colour–magnitude diagram, and span the range of scaling relations such as mass versus size amongst the different machine-defined clusters. When we merge these multiple clusters into two large preliminary clusters to provide a binary classification, an accuracy of ∼87 per cent is reached using an imbalanced data set, matching real galaxy distributions, which includes 22.7 per cent early-type galaxies and 77.3 per cent late-type galaxies. Comparing the given clusters with classic Hubble types (ellipticals, lenticulars, early spirals, late spirals, and irregulars), we show that there is an intrinsic vagueness in visual classification systems, in particular for galaxies with transitional features such as lenticulars and early spirals. Based on this, the main result in this work is not how well our unsupervised method matches visual classifications and physical properties, but that the method provides an independent classification that may be more physically meaningful than any visually based ones.
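As an illustration of point (2) in the abstract above, the following minimal sketch shows how the distance threshold in hierarchical clustering controls the number of clusters. It is not the authors' implementation: the average-linkage rule, the function name `agglomerate`, and the toy 2D `points` are all assumptions made for this example, and the VQ-VAE features used in the paper are replaced by hand-picked coordinates.

```python
import math

def agglomerate(points, threshold):
    """Average-linkage agglomerative clustering (illustrative only).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns a list of clusters,
    each given as a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Average pairwise Euclidean distance between two clusters
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters
        d, i, j = min(
            ((dist(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda item: item[0],
        )
        if d > threshold:
            break  # every remaining pair is farther apart than the threshold
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical stand-in for learned feature vectors: three loose groups
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (30.0, 0.0), (30.0, 1.0),
]

for threshold in (0.5, 2.0, 20.0, 100.0):
    print(threshold, len(agglomerate(points, threshold)))
# prints:
# 0.5 8
# 2.0 3
# 20.0 2
# 100.0 1
```

A small threshold leaves every point in its own cluster, while a large one merges everything, so the choice of threshold sets the granularity of the resulting classification.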
Journal Name:
Monthly Notices of the Royal Astronomical Society
Medium: X
Sponsoring Org:
National Science Foundation
More Like this

    In this work, we explore the possibility of applying machine learning methods designed for 1D problems to the task of galaxy image classification. The algorithms used for image classification typically rely on multiple costly steps, such as the point spread function deconvolution and the training and application of complex Convolutional Neural Networks of thousands or even millions of parameters. In our approach, we extract features from the galaxy images by analysing the elliptical isophotes in their light distribution and collect the information in a sequence. The sequences obtained with this method present definite features allowing a direct distinction between galaxy types. Then, we train and classify the sequences with machine learning algorithms, designed through the platform Modulos AutoML. As a demonstration of this method, we use the second public release of the Dark Energy Survey (DES DR2). We show that we are able to successfully distinguish between early-type and late-type galaxies, for images with signal-to-noise ratio greater than 300. This yields an accuracy of 86 per cent for the early-type galaxies and 93 per cent for the late-type galaxies, which is on par with most contemporary automated image classification approaches. The data dimensionality reduction of our novel method implies a significant lowering in computational cost of classification. With a view to future data sets obtained with e.g. Euclid and the Vera Rubin Observatory, this work represents a path towards using a well-tested and widely used platform from industry in efficiently tackling galaxy classification problems at the petabyte scale.


    Galaxy morphology is a fundamental quantity, which is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology (e.g. as a prior for photometric-redshift measurements and as contextual data for transient light-curve classifications). While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled, in some cases, with the short cadences of forthcoming ‘Big-Data’ surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such data sets intractable for visual inspection (even via massively distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short time-scales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations, in order to group image patches with similar visual properties and objects constructed from those patches, like galaxies. We implement the algorithm on the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey, to autonomously reduce the galaxy population to a small number (160) of ‘morphological clusters’, populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity, and reproduce known trends in key galaxy properties as a function of morphological type at z < 1 (e.g. stellar-mass functions, rest-frame colours, and the position of galaxies on the star-formation main sequence). Our study demonstrates the power of unsupervised machine learning in performing accurate morphological analysis, which will become indispensable in this new era of deep-wide surveys.


    As the earliest relics of star formation episodes of the Universe, the most massive galaxies are the key to our understanding of the stellar population, cosmic structure, and supermassive black hole (SMBH) evolution. However, the details of their formation histories remain uncertain. We address these problems by planning a large survey sample of 101 ultramassive galaxies (z ≤ 0.3, |δ + 24°| < 45°, |b| > 8°), including 76 per cent ellipticals, 17 per cent lenticulars, and 7 per cent spirals brighter than M_K ≤ −27 mag (stellar mass 2 × 10¹² ≲ M⋆ ≲ 5 × 10¹² M⊙) with ELT/HARMONI. Our sample comprises diverse galaxy environments ranging from isolated to dense-cluster galaxies. The primary goals of the project are to (1) explore the stellar dynamics inside galaxy nuclei and weigh SMBHs, (2) constrain the black hole scaling relations at the highest mass, and (3) probe the late-time assembly of these most massive galaxies through the stellar population and kinematical gradients. We describe the survey, discuss the distinct demographics and environmental properties of the sample, and simulate their HARMONI Iz-, Iz + J-, and H + K-band observations by combining the inferred stellar-mass models from Pan-STARRS observations, an assumed synthetic spectrum of stars, and SMBHs with masses estimated based on different black hole scaling relations. Our simulations produce excellent state-of-the-art integral field spectrography and stellar kinematics (ΔV_rms ≲ 1.5 per cent) in a relatively short exposure time. We use these stellar kinematics in combination with the Jeans anisotropic model to reconstruct the SMBH mass and its error using a Markov chain Monte Carlo simulation. Thus, these simulations and modellings can be benchmarks to evaluate the instrument models and pipelines dedicated to HARMONI to exploit the unprecedented capabilities of ELT.


    This paper presents a thousand passive spiral galaxy samples at z = 0.01–0.3 based on a combined analysis of the Third Public Data Release of the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP PDR3) and the GALEX–SDSS–WISE Legacy Catalog (GSWLC-2). Among 54871 gri galaxy cutouts taken from the HSC-SSP PDR3 over 1072 deg², we conducted a search with deep-learning morphological classification for candidates of passive spirals below the star-forming main sequence derived by ultraviolet to mid-infrared spectral energy distribution fitting in the GSWLC-2. We then classified the candidates into 1100 passive spirals and 1141 secondary samples based on visual inspections. Most of the latter cases are considered to be passive ringed S0 or pseudo-ringed galaxies. The remaining secondary samples have ambiguous morphologies, including two peculiar objects with diamond-shaped stellar wings. The selected passive spirals have a similar distribution to the general quiescent galaxies on the EW_Hδ–D_n4000 diagram and concentration indices. Moreover, we detected an enhanced passive fraction of spiral galaxies in X-ray clusters. Passive spirals in galaxy clusters are preferentially located in the midterm or late infall phase on the phase–space diagram, supporting the ram pressure scenario, which has been widely advocated in previous studies. The source catalog and gri-composite images are available on the HSC-SSP PDR3 website 〈〉. Future updates, including integration with a citizen science project dedicated to the HSC data, will achieve more effective and comprehensive classifications.


    We compare the two largest galaxy morphology catalogues, which separate early- and late-type galaxies at intermediate redshift. The two catalogues were built by applying supervised deep learning (convolutional neural networks, CNNs) to the Dark Energy Survey data down to a magnitude limit of ∼21 mag. The methodologies used for the construction of the catalogues include differences such as the cutout sizes, the labels used for training, and the input to the CNN – monochromatic images versus gri-band normalized images. In addition, one catalogue is trained using bright galaxies observed with DES (i < 18), while the other is trained with bright galaxies (r < 17.5) and ‘emulated’ galaxies up to r-band magnitude 22.5. Despite the different approaches, the agreement between the two catalogues is excellent up to i < 19, demonstrating that CNN predictions are reliable for samples at least one magnitude fainter than the training sample limit. It also shows that morphological classifications based on monochromatic images are comparable to those based on gri-band images, at least in the bright regime. At fainter magnitudes, i > 19, the overall agreement is good (∼95 per cent), but is mostly driven by the large spiral fraction in the two catalogues. In contrast, the agreement within the elliptical population is not as good, especially at faint magnitudes. By studying the mismatched cases, we are able to identify lenticular galaxies (at least up to i < 19), which are difficult to distinguish using standard classification approaches. The synergy of both catalogues provides a unique opportunity to select a population of unusual galaxies.
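    The remark that a high overall agreement can be driven by the dominant spiral class can be made concrete with a small worked example. The sketch below uses hypothetical counts chosen only so that the overall agreement comes out at 95 per cent; they are not taken from either catalogue, and the names `pair_counts`, `overall_agreement`, and `agreement_given_a` are assumptions introduced for this illustration.

```python
from collections import Counter

# Hypothetical counts of (catalogue A label, catalogue B label) pairs.
# These numbers are invented for illustration, not measured values.
pair_counts = Counter({
    ("spiral", "spiral"): 900,
    ("elliptical", "elliptical"): 50,
    ("elliptical", "spiral"): 25,
    ("spiral", "elliptical"): 25,
})

def overall_agreement(counts):
    """Fraction of all galaxies given the same label by both catalogues."""
    total = sum(counts.values())
    same = sum(n for (a, b), n in counts.items() if a == b)
    return same / total

def agreement_given_a(counts, label):
    """Fraction of galaxies labelled `label` in A that B labels the same way."""
    in_a = sum(n for (a, _), n in counts.items() if a == label)
    return counts[(label, label)] / in_a

print(f"overall:    {overall_agreement(pair_counts):.3f}")
print(f"spiral:     {agreement_given_a(pair_counts, 'spiral'):.3f}")
print(f"elliptical: {agreement_given_a(pair_counts, 'elliptical'):.3f}")
# prints:
# overall:    0.950
# spiral:     0.973
# elliptical: 0.667
```

    With these toy counts the two catalogues agree on 95 per cent of all objects, yet only two thirds of the ellipticals in one catalogue are confirmed by the other, which is why per-class agreement is worth reporting alongside the overall figure.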
