Beyond the Hubble sequence – exploring galaxy morphology with unsupervised machine learning
ABSTRACT We explore unsupervised machine learning for galaxy morphology analyses using a combination of feature extraction with a vector-quantized variational autoencoder (VQ-VAE) and hierarchical clustering (HC). We propose a new methodology that includes: (1) consideration of the clustering performance simultaneously when learning features from images; (2) allowing for various distance thresholds within the HC algorithm; (3) using the galaxy orientation to determine the number of clusters. This set-up provides 27 clusters created with this unsupervised learning that we show are well separated based on galaxy shape and structure (e.g. Sérsic index, concentration, asymmetry, Gini coefficient). These resulting clusters also correlate well with physical properties such as the colour–magnitude diagram, and span the range of scaling relations such as mass versus size amongst the different machine-defined clusters. When we merge these multiple clusters into two large preliminary clusters to provide a binary classification, an accuracy of ∼87 per cent is reached using an imbalanced data set, matching real galaxy distributions, which includes 22.7 per cent early-type galaxies and 77.3 per cent late-type galaxies. Comparing the given clusters with classic Hubble types (ellipticals, lenticulars, early spirals, late spirals, and irregulars), we show that there is an intrinsic vagueness in visual classification systems, in particular …
NSF-PAR ID:
10313054
Journal Name:
Monthly Notices of the Royal Astronomical Society
Volume:
503
Issue:
3
ISSN:
0035-8711
National Science Foundation
##### More Like this
1. ABSTRACT

In this work, we explore the possibility of applying machine learning methods designed for 1D problems to the task of galaxy image classification. The algorithms used for image classification typically rely on multiple costly steps, such as the point spread function deconvolution and the training and application of complex Convolutional Neural Networks of thousands or even millions of parameters. In our approach, we extract features from the galaxy images by analysing the elliptical isophotes in their light distribution and collect the information in a sequence. The sequences obtained with this method present definite features allowing a direct distinction between galaxy types. Then, we train and classify the sequences with machine learning algorithms, designed through the platform Modulos AutoML. As a demonstration of this method, we use the second public release of the Dark Energy Survey (DES DR2). We show that we are able to successfully distinguish between early-type and late-type galaxies, for images with signal-to-noise ratio greater than 300. This yields an accuracy of 86 per cent for the early-type galaxies and 93 per cent for the late-type galaxies, which is on par with most contemporary automated image classification approaches. The data dimensionality reduction of our novel …

2. ABSTRACT

Galaxy morphology is a fundamental quantity, which is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology (e.g. as a prior for photometric-redshift measurements and as contextual data for transient light-curve classifications). While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled, in some cases, with the short cadences of forthcoming ‘Big-Data’ surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such data sets intractable for visual inspection (even via massively distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short time-scales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations, in order to group image patches with similar visual properties and objects constructed from those patches, like galaxies. We implement the algorithm on the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey, to autonomously reduce the galaxy population to a small number (160) of ‘morphological clusters’, populated by galaxies …

3. ABSTRACT

Misalignments between the rotation axis of stars and gas are an indication of external processes shaping galaxies throughout their evolution. Using observations of 3068 galaxies from the SAMI Galaxy Survey, we compute global kinematic position angles for 1445 objects with reliable kinematics and identify 169 (12 per cent) galaxies which show stellar-gas misalignments. Kinematically decoupled features are more prevalent in early-type/passive galaxies compared to late-type/star-forming systems. Star formation is the main source of gas ionization in only 22 per cent of misaligned galaxies; 17 per cent are Seyfert objects, while 61 per cent show Low-Ionization Nuclear Emission-line Region features. We identify the most probable physical cause of the kinematic decoupling and find that, while accretion-driven cases are dominant, for up to 8 per cent of our sample, the misalignment may be tracing outflowing gas. When considering only misalignments driven by accretion, the acquired gas is feeding active star formation in only ∼1/4 of cases. As a population, misaligned galaxies have higher Sérsic indices and lower stellar spin and specific star formation rates than appropriately matched samples of aligned systems. These results suggest that both morphology and star formation/gas content are significantly correlated with the prevalence and timescales of misalignments. Specifically, torques on misaligned gas discs are smaller for more centrally …

4. Introduction: Vaso-occlusive crises (VOCs) are a leading cause of morbidity and early mortality in individuals with sickle cell disease (SCD). These crises are triggered by sickle red blood cell (sRBC) aggregation in blood vessels and are influenced by factors such as enhanced sRBC and white blood cell (WBC) adhesion to inflamed endothelium. Advances in microfluidic biomarker assays (i.e., SCD Biochip systems) have led to clinical studies of blood cell adhesion onto endothelial proteins, including fibronectin, laminin, P-selectin, and ICAM-1, functionalized in microchannels. These microfluidic assays allow mimicking the physiological aspects of human microvasculature and help characterize biomechanical properties of adhered sRBCs under flow. However, analysis of the microfluidic biomarker assay data has so far relied on manual cell counting and exhaustive visual morphological characterization of cells by trained personnel. Integrating deep learning algorithms with microscopic imaging of adhesion protein functionalized microfluidic channels can accelerate and standardize accurate classification of blood cells in microfluidic biomarker assays. Here we present a deep learning approach into a general-purpose analytical tool covering a wide range of conditions: channels functionalized with different proteins (laminin or P-selectin), with varying degrees of adhesion by both sRBCs and WBCs, and in both normoxic and hypoxic environments. Methods: Our neural …
5. Galaxy images on the order of multi-PB are collected as part of modern digital sky surveys using robotic telescopes. While there is a plethora of imaging data available, the majority of the images that are captured resemble galaxies that are “regular”, i.e., galaxy types that are already known and probed. However, “novelty” galaxy types, i.e., little-known galaxy types, are encountered on occasion. The astronomy community shows paramount interest in the novelty galaxy types since they contain the potential for scientific discovery. However, since these galaxies are rare, the identification of such novelty galaxies is not trivial and requires automation techniques. Since these novelty galaxies are, by definition, not known, supervised machine learning models cannot be trained to detect them. In this paper, an unsupervised machine learning method for automatic detection of novelty galaxies in large databases is proposed. The method uses a large set of image features weighted by their entropy. To handle the impact of self-similar novelty galaxies, the most similar galaxies are rank-ordered. In addition, Bag of Visual Words (BOVW) is assimilated to the problem of detecting novelty galaxies. Each image in the dataset is represented as a set of features made up of key-points and descriptors. A …