skip to main content

Title: Galaxy Zoo DECaLS: Detailed visual morphology measurements from volunteers and deep learning for 314 000 galaxies
ABSTRACT We present Galaxy Zoo DECaLS: detailed visual morphological classifications for Dark Energy Camera Legacy Survey images of galaxies within the SDSS DR8 footprint. Deeper DECaLS images (r = 23.6 versus r = 22.2 from SDSS) reveal spiral arms, weak bars, and tidal features not previously visible in SDSS imaging. To best exploit the greater depth of DECaLS images, volunteers select from a new set of answers designed to improve our sensitivity to mergers and bars. Galaxy Zoo volunteers provide 7.5 million individual classifications over 314 000 galaxies. 140 000 galaxies receive at least 30 classifications, sufficient to accurately measure detailed morphology like bars, and the remainder receive approximately 5. All classifications are used to train an ensemble of Bayesian convolutional neural networks (a state-of-the-art deep learning method) to predict posteriors for the detailed morphology of all 314 000 galaxies. We use active learning to focus our volunteer effort on the galaxies which, if labelled, would be most informative for training our ensemble. When measured against confident volunteer classifications, the trained networks are approximately 99 per cent accurate on every question. Morphology is a fundamental feature of every galaxy; our human and machine classifications are an accurate and detailed resource for understanding how galaxies evolve.  more » « less
Award ID(s):
1835530 1835272
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
Monthly Notices of the Royal Astronomical Society
Medium: X
Sponsoring Org:
National Science Foundation
More Like this

    We present detailed morphology measurements for 8.67 million galaxies in the DESI Legacy Imaging Surveys (DECaLS, MzLS, and BASS, plus DES). These are automated measurements made by deep learning models trained on Galaxy Zoo volunteer votes. Our models typically predict the fraction of volunteers selecting each answer to within 5–10 per cent for every answer to every GZ question. The models are trained on newly collected votes for DESI-LS DR8 images as well as historical votes from GZ DECaLS. We also release the newly collected votes. Extending our morphology measurements outside of the previously released DECaLS/SDSS intersection increases our sky coverage by a factor of 4 (5000–19 000 deg2) and allows for full overlap with complementary surveys including ALFALFA and MaNGA.

    more » « less
  2. ABSTRACT Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free text tag by humans (e.g. ‘#diffuse’), we can find galaxies matching that tag for most tags. The second task is identifying the most interesting anomalies to a particular researcher. Our approach is 100 per cent accurate at identifying the most interesting 100 anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels; either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled data sets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code zoobot. Zoobot is accessible to researchers with no prior experience in deep learning. 
    more » « less

    We use Bayesian convolutional neural networks and a novel generative model of Galaxy Zoo volunteer responses to infer posteriors for the visual morphology of galaxies. Bayesian CNN can learn from galaxy images with uncertain labels and then, for previously unlabelled galaxies, predict the probability of each possible label. Our posteriors are well-calibrated (e.g. for predicting bars, we achieve coverage errors of 11.8 per cent within a vote fraction deviation of 0.2) and hence are reliable for practical use. Further, using our posteriors, we apply the active learning strategy BALD to request volunteer responses for the subset of galaxies which, if labelled, would be most informative for training our network. We show that training our Bayesian CNNs using active learning requires up to 35–60 per cent fewer labelled galaxies, depending on the morphological feature being classified. By combining human and machine intelligence, Galaxy zoo will be able to classify surveys of any conceivable scale on a time-scale of weeks, providing massive and detailed morphology catalogues to support research into galaxy evolution.

    more » « less

    Studies of cosmology, galaxy evolution, and astronomical transients with current and next-generation wide-field imaging surveys like the Rubin Observatory Legacy Survey of Space and Time are all critically dependent on estimates of photometric redshifts. Capsule networks are a new type of neural network architecture that is better suited for identifying morphological features of the input images than traditional convolutional neural networks. We use a deep capsule network trained on ugriz images, spectroscopic redshifts, and Galaxy Zoo spiral/elliptical classifications of ∼400 000 Sloan Digital Sky Survey galaxies to do photometric redshift estimation. We achieve a photometric redshift prediction accuracy and a fraction of catastrophic outliers that are comparable to or better than current methods for SDSS main galaxy sample-like data sets (r ≤ 17.8 and zspec ≤ 0.4) while requiring less data and fewer trainable parameters. Furthermore, the decision-making of our capsule network is much more easily interpretable as capsules act as a low-dimensional encoding of the image. When the capsules are projected on a two-dimensional manifold, they form a single redshift sequence with the fraction of spirals in a region exhibiting a gradient roughly perpendicular to the redshift sequence. We perturb encodings of real galaxy images in this low-dimensional space to create synthetic galaxy images that demonstrate the image properties (e.g. size, orientation, and surface brightness) encoded by each dimension. We also measure correlations between galaxy properties (e.g. magnitudes, colours, and stellar mass) and each capsule dimension. We publicly release our code, estimated redshifts, and additional catalogues at

    more » « less

    We present a detailed visual morphological classification for the 4614 MaNGA galaxies in SDSS Data Release 15, using image mosaics generated from a combination of r band (SDSS and deeper DESI Legacy Surveys) images and their digital post-processing. We distinguish 13 Hubble types and identify the presence of bars and bright tidal debris. After correcting the MaNGA sample for volume completeness, we calculate the morphological fractions, the bi-variate distribution of type and stellar mass M* – where we recognize a morphological transition ‘valley’ around S0a-Sa types – and the variations of the g − i colour and luminosity-weighted age over this distribution. We identified bars in 46.8 per cent of galaxies, present in all Hubble types later than S0. This fraction amounts to a factor ∼2 larger when compared with other works for samples in common. We detected 14 per cent of galaxies with tidal features, with the fraction changing with M* and morphology. For 355 galaxies, the classification was uncertain; they are visually faint, mostly of low/intermediate masses, low concentrations, and discy in nature. Our morphological classification agrees well with other works for samples in common, though some particular differences emerge, showing that our image procedures allow us to identify a wealth of added value information as compared to SDSS-based previous estimates. Based on our classification, we also propose an alternative criteria for the E–S0 separation, in the structural semimajor to semiminor axis versus bulge to total light ratio (b/a − B/T) and concentration versus semimajor to semiminor axis (C − b/a) space.

    more » « less