

Title: Deep learning methods for obtaining photometric redshift estimations from images
ABSTRACT

Knowing the redshift of galaxies is one of the first requirements of many cosmological experiments, and as it is impossible to perform spectroscopy for every galaxy being observed, photometric redshift (photo-z) estimations are still of particular interest. Here, we investigate different deep learning methods for obtaining photo-z estimates directly from images, comparing these with 'traditional' machine learning algorithms which make use of magnitudes retrieved through photometry. As well as testing a convolutional neural network (CNN) and an inception-module CNN, we introduce a novel mixed-input model that allows both images and magnitude data to be used in the same model as a way of further improving the estimated redshifts. We also perform benchmarking to demonstrate the performance and scalability of the different algorithms. The data used in the study come entirely from the Sloan Digital Sky Survey (SDSS), from which 1 million galaxies were used, each having 5-filter (ugriz) images with complete photometry and a spectroscopic redshift taken as the ground truth. The mixed-input inception CNN achieved a mean squared error (MSE) of 0.009, a significant improvement (30 per cent) over the traditional random forest (RF), and the model performed even better at lower redshifts, achieving an MSE of 0.0007 (a 50 per cent improvement over the RF) in the range z < 0.3. This method could be hugely beneficial to upcoming surveys, such as Euclid and the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), which will require vast numbers of photo-z estimates produced as quickly and accurately as possible.
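The mixed-input idea (fusing an image branch with a magnitude branch before a shared regression head) can be sketched as a minimal numpy forward pass. The kernel count, image size, and single dense head below are illustrative assumptions, not the architecture from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def conv_branch(image, kernels):
    """Tiny CNN branch: valid 3x3 cross-correlation per kernel,
    ReLU, then global average pooling to one feature per kernel."""
    feats = []
    for k in kernels:
        kh, kw = k.shape
        h, w = image.shape
        out = np.empty((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
        feats.append(np.maximum(out, 0.0).mean())
    return np.array(feats)

def mixed_input_predict(image, magnitudes, kernels, w_head, b_head):
    """Concatenate image-branch features with the raw magnitude
    vector, then apply one dense layer to regress a single photo-z."""
    fused = np.concatenate([conv_branch(image, kernels), magnitudes])
    return float(fused @ w_head + b_head)

# Toy inputs: one 8x8 single-band cutout and 5 ugriz magnitudes.
image = rng.normal(size=(8, 8))
mags = rng.normal(loc=20.0, scale=1.0, size=5)
kernels = rng.normal(size=(4, 3, 3))
w_head = rng.normal(size=4 + 5) * 0.01  # 4 image feats + 5 magnitudes
z_hat = mixed_input_predict(image, mags, kernels, w_head, 0.1)
```

In a real implementation each branch would be a trained sub-network and the fusion would feed several dense layers; the point here is only the concatenation step that lets images and photometry inform the same prediction.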

 
NSF-PAR ID: 10364226
Publisher / Repository: Oxford University Press
Journal Name: Monthly Notices of the Royal Astronomical Society
Volume: 512
Issue: 2
ISSN: 0035-8711
Page Range / eLocation ID: p. 1696-1709
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    A reliable estimate of the redshift distribution n(z) is crucial for using weak gravitational lensing and large-scale structures of galaxy catalogs to study cosmology. Spectroscopic redshifts for the dim and numerous galaxies of next-generation weak-lensing surveys are expected to be unavailable, making photometric redshift (photo-z) probability density functions (PDFs) the next best alternative for comprehensively encapsulating the nontrivial systematics affecting photo-z point estimation. The established stacked estimator of n(z) avoids reducing photo-z PDFs to point estimates but yields a systematically biased estimate of n(z) that worsens with a decreasing signal-to-noise ratio, the very regime where photo-z PDFs are most necessary. We introduce Cosmological Hierarchical Inference with Probabilistic Photometric Redshifts (CHIPPR), a statistically rigorous probabilistic graphical model of redshift-dependent photometry that correctly propagates the redshift uncertainty information beyond the best-fit estimator of n(z) produced by traditional procedures and is provably the only self-consistent way to recover n(z) from photo-z PDFs. We present the chippr prototype code, noting that the mathematically justifiable approach incurs computational cost. The CHIPPR approach is applicable to any one-point statistic of any random variable, provided the prior probability density used to produce the posteriors is explicitly known; if the prior is implicit, as may be the case for popular photo-z techniques, then the resulting posterior PDFs cannot be used for scientific inference. We therefore recommend that the photo-z community focus on developing methodologies that enable the recovery of photo-z likelihoods with support over all redshifts, either directly or via a known prior probability density.
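    The "stacked estimator" this abstract contrasts against is simply the average of the per-galaxy photo-z PDFs on a common redshift grid. A minimal sketch with Gaussian toy PDFs (the grid, centres, and widths are invented for illustration):

```python
import numpy as np

z_grid = np.linspace(0.0, 2.0, 401)

def gaussian_pdf(z, mu, sigma):
    """Normal density evaluated on the redshift grid."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Toy per-galaxy photo-z PDFs with different centres and widths.
pdfs = np.array([gaussian_pdf(z_grid, mu, 0.05 * (1 + mu))
                 for mu in (0.3, 0.5, 0.7, 0.9)])

# Stacked estimator: the mean of the individual PDFs.
n_z = pdfs.mean(axis=0)
```

    The hierarchical approach described above replaces this simple average with inference over a model of the population, which is why it can remove the systematic bias that stacking inherits from the individual posteriors.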

     
  2. ABSTRACT

    We introduce a probabilistic approach to select 6 ≤ z ≤ 8 quasar candidates for spectroscopic follow-up, based on density estimation in the high-dimensional space inhabited by the optical and near-infrared photometry. Densities are modelled as Gaussian mixtures with principled accounting of errors using the extreme deconvolution (XD) technique, generalizing an approach successfully used to select lower redshift (z ≤ 3) quasars. We train the probability density of contaminants on 1 902 071 7-dimensional flux measurements from the 1076 deg² overlapping area of the Dark Energy Camera Legacy Survey (DECaLS) (z), VIKING (YJHKs), and unWISE (W1W2) imaging surveys, after requiring that they drop out of DECaLS g and r, whereas the distribution of high-z quasars is trained on synthetic model photometry. Extensive simulations based on these density distributions and current estimates of the quasar luminosity function indicate that this method achieves a completeness of ≥56 per cent and an efficiency of ≥5 per cent for selecting quasars at 6 < z < 8 with JAB < 21.5. Among the classified sources are 8 known 6 < z < 7 quasars, of which 2/8 are selected, suggesting a completeness of ≃25 per cent, whereas classifying the 6 known (JAB < 21.5) quasars at z > 7 from the entire sky, we select 5/6, a completeness of ≃80 per cent. The failure to select the majority of 6 < z < 7 quasars arises because our quasar density model is based on an empirical quasar spectral energy distribution model that underestimates the scatter in the distribution of fluxes. This new approach to quasar selection paves the way for efficient spectroscopic follow-up of Euclid quasar candidates with ground-based telescopes and the James Webb Space Telescope.
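    The selection statistic in this kind of density-estimation classifier is the posterior probability that a source is a quasar given its fluxes, p(q|f) = π p_q(f) / (π p_q(f) + (1 − π) p_c(f)). A toy two-band version with single Gaussians standing in for the XD mixtures (the means, covariances, and prior π are invented for illustration):

```python
import numpy as np

def gauss_density(x, mean, cov):
    """Multivariate normal density at point x."""
    d = np.asarray(x) - np.asarray(mean)
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** len(d)) * np.linalg.det(cov))
    return float(norm * np.exp(-0.5 * d @ inv @ d))

def quasar_posterior(flux, prior_q, q_params, c_params):
    """Posterior probability of the quasar class for one flux vector."""
    p_q = gauss_density(flux, *q_params)
    p_c = gauss_density(flux, *c_params)
    return prior_q * p_q / (prior_q * p_q + (1.0 - prior_q) * p_c)

# Toy 2-band flux space: quasars and contaminants occupy different
# (hypothetical) regions; the tiny prior reflects quasar rarity.
q_params = (np.array([2.0, 0.5]), np.array([[0.5, 0.1], [0.1, 0.3]]))
c_params = (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]))
prob = quasar_posterior([1.8, 0.6], prior_q=1e-3,
                        q_params=q_params, c_params=c_params)
```

    The real method replaces each single Gaussian with an XD-fitted mixture convolved with per-source flux errors, but the ranking of candidates by posterior probability works the same way.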
  3. ABSTRACT

    We present a cosmic density field reconstruction method that augments the traditional reconstruction algorithms with a convolutional neural network (CNN). Following previous work, the key component of our method is to use the reconstructed density field as the input to the neural network. We extend this previous work by exploring how the performance of these reconstruction ideas depends on the input reconstruction algorithm, the reconstruction parameters, and the shot noise of the density field, as well as the robustness of the method. We build an eight-layer CNN and train the network with reconstructed density fields computed from the Quijote suite of simulations. The reconstructed density fields are generated by both the standard algorithm and a new iterative algorithm. In real space at z = 0, we find that the reconstructed field is 90 per cent correlated with the true initial density out to k ∼ 0.5 h Mpc⁻¹, a significant improvement over the k ∼ 0.2 h Mpc⁻¹ achieved by the input reconstruction algorithms. We find similar improvements in redshift space, including an improved removal of redshift space distortions at small scales. We also find that the method is robust across changes in cosmology. Additionally, the CNN removes much of the variance from the choice of different reconstruction algorithms and reconstruction parameters. However, the effectiveness decreases with increasing shot noise, suggesting that such an approach is best suited to high density samples. This work highlights the additional information in the density field beyond linear scales as well as the power of complementing traditional analysis approaches with machine learning techniques.
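    The "90 per cent correlated" figure refers to a cross-correlation coefficient between the reconstructed and true fields, r = ⟨AB⟩ / √(⟨AA⟩⟨BB⟩). The sketch below reduces this to a single global number on toy random fields rather than the per-k statistic used in such analyses:

```python
import numpy as np

def correlation_coefficient(field_a, field_b):
    """Global cross-correlation coefficient of two density fields,
    computed from their Fourier modes."""
    fa = np.fft.rfftn(field_a)
    fb = np.fft.rfftn(field_b)
    cross = np.sum(np.real(fa * np.conj(fb)))
    auto_a = np.sum(np.abs(fa) ** 2)
    auto_b = np.sum(np.abs(fb) ** 2)
    return cross / np.sqrt(auto_a * auto_b)

rng = np.random.default_rng(1)
true_field = rng.normal(size=(16, 16, 16))
# A mock 'reconstruction' = truth plus noise: correlated, not perfect.
recon_field = true_field + 0.5 * rng.normal(size=(16, 16, 16))
r = correlation_coefficient(true_field, recon_field)
```

    Binning the same ratio by wavenumber |k| instead of summing over all modes gives the scale-dependent r(k) whose crossing of 0.9 defines the quoted k ∼ 0.5 h Mpc⁻¹.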

     
  4. ABSTRACT

    We use two independent galaxy-formation simulations, flares, a cosmological hydrodynamical simulation, and shark, a semi-analytic model, to explore how well the JWST will be able to uncover the existence and parameters (shape, scatter, normalization) of the star-forming main sequence (SFS) at z = 5 → 10. Using two independent simulations allows us to isolate predictions (e.g. stellar mass, star formation rate (SFR), and luminosity functions) that are robust to or highly dependent on the implementation of the physics of galaxy formation. Both simulations predict that JWST can observe ≥70–90 per cent (for shark and flares, respectively) of galaxies up to z ∼ 10 (down to stellar masses of ≈10^8.3 M⊙ and SFRs of ≈10^0.5 M⊙ yr⁻¹) in modest integration times and given current proposed survey areas (e.g. the Webb COSMOS 0.6 deg²) to accurately constrain the parameters of the SFS. Although both simulations predict qualitatively similar distributions of stellar mass and SFR, there are important quantitative differences, such as the abundance of massive, star-forming galaxies, with flares predicting a higher abundance than shark; the early onset of quenching as a result of black hole growth in flares (at z ≈ 8), not seen in shark until much lower redshifts; and the implementation of synthetic photometry, with flares predicting more JWST-detected galaxies (∼90 per cent) than shark (∼70 per cent) at z = 10. JWST observations will distinguish between these models, leading to a significant improvement upon our understanding of the formation of the very first galaxies.
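    Constraining the "shape, scatter, normalization" of the SFS typically reduces to fitting log SFR as a linear function of log M* and measuring the residual scatter. A toy least-squares sketch (the slope, normalization, and scatter of the mock galaxies are invented numbers, not values from either simulation):

```python
import numpy as np

rng = np.random.default_rng(7)

# Mock 'galaxies': log stellar masses, and a main sequence with
# slope 0.9, normalization -8.0, and 0.3 dex intrinsic scatter.
log_mstar = rng.uniform(8.3, 11.0, size=2000)
log_sfr = 0.9 * log_mstar - 8.0 + rng.normal(0.0, 0.3, size=2000)

# Fit the SFS: slope and normalization from linear least squares,
# scatter from the standard deviation of the residuals.
slope, norm = np.polyfit(log_mstar, log_sfr, 1)
scatter = np.std(log_sfr - (slope * log_mstar + norm))
```

    In practice the fit is done per redshift bin on the detected (flux-limited) sample, which is why the detectable fraction quoted above matters for how well the parameters can be recovered.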

     
  5. ABSTRACT

    We present a mock image catalogue of ∼100 000 MUV ≃ −22.5 to −19.6 mag galaxies at z = 7–12 from the bluetides cosmological simulation. We create mock images of each galaxy with the James Webb Space Telescope (JWST), Hubble, Roman, and Euclid Space Telescopes, as well as Subaru and VISTA, with a range of near- and mid-infrared filters. We perform photometry on the mock images to estimate the success of these instruments for detecting high-z galaxies. We predict that JWST will have unprecedented power in detecting high-z galaxies, with a 95 per cent completeness limit at least 2.5 mag fainter than VISTA and Subaru, 1.1 mag fainter than Hubble, and 0.9 mag fainter than Roman, for the same wavelength and exposure time. Focusing on JWST, we consider a range of exposure times and filters, and find that the NIRCam F356W and F277W filters will detect the faintest galaxies, with 95 per cent completeness at m ≃ 27.4 mag in 10-ks exposures. We also predict the number of high-z galaxies that will be discovered by upcoming JWST imaging surveys. We predict that the COSMOS-Web survey will detect ∼1000 M1500 Å < −20.1 mag galaxies at 6.5 < z < 7.5, by virtue of its large survey area. JADES-Medium will detect almost 100 per cent of M1500 Å ≲ −20 mag galaxies at z < 8.5 due to its significant depth; however, with its smaller survey area it will detect only ∼100 of these galaxies at 6.5 < z < 7.5. Cosmic variance results in a large range in the number of predicted galaxies each survey will detect, which is more evident in smaller surveys such as CEERS and the PEARLS NEP and GOODS-S fields.
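    A 95 per cent completeness limit of the kind quoted here is usually measured by source injection and recovery: inject mock galaxies over a range of magnitudes, record the recovered fraction in each bin, and interpolate where that fraction crosses 0.95. A toy sketch, with a hypothetical tanh fall-off standing in for a real injection-recovery curve:

```python
import numpy as np

def completeness_limit(mags, recovered_frac, threshold=0.95):
    """Interpolate the magnitude at which the recovered fraction
    drops to the given threshold (fractions assumed monotonic)."""
    # np.interp requires increasing x, so interpolate on the
    # (decreasing) recovered fractions reversed.
    return float(np.interp(threshold, recovered_frac[::-1], mags[::-1]))

# Toy injection-recovery curve: completeness falls off smoothly
# around a hypothetical detection limit near 27.9 mag.
mags = np.linspace(24.0, 30.0, 61)
recovered = 0.5 * (1.0 - np.tanh((mags - 27.9) / 0.3))
limit = completeness_limit(mags, recovered)
```

    Comparing these interpolated limits across instruments at matched wavelength and exposure time is what yields depth differences like the 0.9 and 1.1 mag figures above.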

     