Title: Re-calibrating Photometric Redshift Probability Distributions Using Feature-space Regression
Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift -- i.e., the fraction of times the true redshift falls between two limits z1 and z2 should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs. Though we focus on an example from astrophysics, our method can produce PDFs which are calibrated at all locations in feature space for any use case.
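The calibration criterion above is exactly the Probability Integral Transform: if each predictive CDF F_i is calibrated, the values F_i(z_i) at the true redshifts are uniform on [0, 1]. As a concrete illustration, here is a minimal NumPy sketch of the *global* PIT re-calibration that the abstract contrasts with (composing each predictive CDF with the empirical CDF of the PIT sample). It does not implement the paper's local, feature-space-regression variant, and all function names and grids are illustrative.

```python
import numpy as np

def pit_values(cdfs, z_grid, z_true):
    # PIT_i = F_i(z_i): each object's predictive CDF evaluated at its
    # true (spectroscopic) redshift; calibrated PDFs give uniform PITs.
    return np.array([np.interp(zt, z_grid, F) for F, zt in zip(cdfs, z_true)])

def recalibrate(pdf, z_grid, pit_sample):
    # Empirical CDF G of the PIT sample (the identity if already calibrated).
    u = np.sort(pit_sample)
    G = np.arange(1, u.size + 1) / u.size
    # Composing gives a calibrated CDF: P(G(F(Z)) <= t) = P(F(Z) <= G^{-1}(t)) = t.
    F = np.concatenate(([0.0], np.cumsum(np.diff(z_grid) * 0.5 * (pdf[:-1] + pdf[1:]))))
    F /= F[-1]
    F_cal = np.interp(F, u, G, left=0.0, right=1.0)
    pdf_cal = np.gradient(F_cal, z_grid)
    return pdf_cal / np.trapz(pdf_cal, z_grid)
```

For a local variant in the spirit of the paper, `pit_sample` would instead hold the PIT values of the object's neighbourhood in feature space (e.g., from a nearest-neighbour query), so that each region is re-calibrated against its own PIT distribution.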
Award ID(s):
2009251
NSF-PAR ID:
10332320
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in Neural Information Processing Systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift — i.e., the fraction of times the true redshift falls between two limits z1 and z2 should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs, resulting in calibrated predictive distributions. Though we focus on an example from astrophysics, our method can produce predictive distributions which are calibrated at all locations in feature space for any use case.
  2. Abstract The accurate estimation of photometric redshifts is crucial to many upcoming galaxy surveys, for example, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). Almost all Rubin extragalactic and cosmological science requires accurate and precise calculation of photometric redshifts; many diverse approaches to this problem are currently being developed, validated, and tested. In this work, we use the photometric redshift code GPz to examine two realistically complex training-set imperfection scenarios for machine-learning-based photometric redshift estimation: (i) where the spectroscopic training set has a very different distribution in color–magnitude space from the test set, and (ii) where emission-line confusion causes a fraction of the spectroscopic training sample to be assigned incorrect redshifts. By evaluating the sensitivity of GPz to a range of increasingly severe imperfections, using a range of metrics (for both photo-z point estimates and posterior probability density functions, PDFs), we quantify the degree to which predictions degrade with the severity of the imperfection. In particular, we find a substantial drop-off in photo-z quality when line confusion exceeds ∼1%, and when the training sample is incomplete below a redshift of 1.5, for an experimental setup using data from the Buzzard Flock synthetic sky catalogs.
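One of the two degradation modes above, emission-line confusion, has a simple geometric form: a line emitted at rest wavelength λ_a but identified as λ_b maps the true redshift to z_wrong = (λ_a/λ_b)(1 + z_true) − 1. The toy sketch below injects such contamination into a training catalogue; the [OII]/Hα pair, the rates, and the outlier metric are illustrative stand-ins, not the paper's exact GPz-on-Buzzard setup.

```python
import numpy as np

rng = np.random.default_rng(42)

# Rest wavelengths (Angstroms) of two commonly confused emission lines.
# The specific [OII]/H-alpha pair is an illustrative choice, not taken
# from the paper.
LAM_OII, LAM_HA = 3727.0, 6563.0

def inject_line_confusion(z_true, rate):
    """Assign a random fraction `rate` of sources the redshift implied by
    misidentifying [OII] as H-alpha: z_wrong = (LAM_OII/LAM_HA)(1+z) - 1."""
    z = z_true.copy()
    confused = rng.random(z.size) < rate
    z[confused] = (LAM_OII / LAM_HA) * (1.0 + z[confused]) - 1.0
    return np.clip(z, 0.0, None)

# Toy spectroscopic sample standing in for the training catalogue.
z_spec = rng.uniform(0.0, 2.0, size=10_000)

for rate in (0.0, 0.01, 0.02, 0.05):
    z_degraded = inject_line_confusion(z_spec, rate)
    # Fraction of labels that became catastrophic outliers; after retraining
    # the photo-z model on z_degraded one would track analogous metrics.
    frac = np.mean(np.abs(z_degraded - z_spec) / (1 + z_spec) > 0.15)
    print(f"confusion rate {rate:.2f}: corrupted-label fraction {frac:.3f}")
```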
  3. Abstract

    A reliable estimate of the redshift distribution n(z) is crucial for using weak gravitational lensing and large-scale structures of galaxy catalogs to study cosmology. Spectroscopic redshifts for the dim and numerous galaxies of next-generation weak-lensing surveys are expected to be unavailable, making photometric redshift (photo-z) probability density functions (PDFs) the next best alternative for comprehensively encapsulating the nontrivial systematics affecting photo-z point estimation. The established stacked estimator of n(z) avoids reducing photo-z PDFs to point estimates but yields a systematically biased estimate of n(z) that worsens with a decreasing signal-to-noise ratio, the very regime where photo-z PDFs are most necessary. We introduce Cosmological Hierarchical Inference with Probabilistic Photometric Redshifts (CHIPPR), a statistically rigorous probabilistic graphical model of redshift-dependent photometry that correctly propagates the redshift uncertainty information beyond the best-fit estimator of n(z) produced by traditional procedures and is provably the only self-consistent way to recover n(z) from photo-z PDFs. We present the chippr prototype code, noting that the mathematically justifiable approach incurs computational cost. The CHIPPR approach is applicable to any one-point statistic of any random variable, provided the prior probability density used to produce the posteriors is explicitly known; if the prior is implicit, as may be the case for popular photo-z techniques, then the resulting posterior PDFs cannot be used for scientific inference. We therefore recommend that the photo-z community focus on developing methodologies that enable the recovery of photo-z likelihoods with support over all redshifts, either directly or via a known prior probability density.
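To make the contrast concrete: the stacked estimator simply averages the per-galaxy photo-z PDFs, while a CHIPPR-style hierarchical model treats n(z) as a hyperparameter and multiplies per-galaxy likelihoods with the interim prior divided out. Below is a minimal sketch under toy Gaussian posteriors and a flat interim prior; all names and numbers are illustrative and this is not the chippr code itself.

```python
import numpy as np

rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 3.0, 61)
dz = z_grid[1] - z_grid[0]

# Toy catalogue: broad Gaussian posteriors around true redshifts drawn
# from a known n(z); a flat interim prior is assumed for simplicity.
z_true = np.clip(rng.gamma(2.0, 0.4, size=2000), 0.0, 3.0)
pdfs = np.exp(-0.5 * ((z_grid - z_true[:, None]) / 0.25) ** 2)
pdfs /= np.trapz(pdfs, z_grid, axis=1)[:, None]

def stacked(pdfs):
    # The established estimator: average the PDFs. Biased because each
    # broad posterior smears probability across the whole z range.
    nz = pdfs.mean(axis=0)
    return nz / np.trapz(nz, z_grid)

def hierarchical_log_posterior(log_nz, pdfs, interim=1.0):
    # CHIPPR-style log-posterior over histogram heights of n(z):
    #   sum_i log  int [p_i(z) / p_int(z)] n(z) dz,
    # with the interim prior p_int divided out so it is not double-counted.
    nz = np.exp(log_nz)
    nz /= np.trapz(nz, z_grid)
    return np.log((pdfs / interim) @ (nz * dz) + 1e-300).sum()

# Full CHIPPR samples this posterior (at real computational cost, as the
# abstract notes); even maximizing it is less biased than stacking.
print("stacked-n(z) mean z:", np.trapz(z_grid * stacked(pdfs), z_grid))
print("true mean z:        ", z_true.mean())
```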
  4. The relative velocities and positions of monodisperse high-inertia particle pairs in isotropic turbulence are studied using direct numerical simulations (DNS), as well as Langevin simulations (LS) based on a probability density function (PDF) kinetic model for pair relative motion. In a prior study (Rani et al., J. Fluid Mech., vol. 756, 2014, pp. 870–902), the authors developed a stochastic theory that involved deriving closures in the limit of high Stokes number for the diffusivity tensor in the PDF equation for monodisperse particle pairs. The diffusivity contained the time integral of the Eulerian two-time correlation of fluid relative velocities seen by pairs that are nearly stationary. The two-time correlation was analytically resolved through the approximation that the temporal change in the fluid relative velocities seen by a pair occurs principally due to the advection of smaller eddies past the pair by large-scale eddies. Accordingly, two diffusivity expressions were obtained based on whether the pair centre of mass remained fixed during flow time scales, or moved in response to integral-scale eddies. In the current study, a quantitative analysis of the Rani et al. (2014) stochastic theory is performed through a comparison of the pair statistics obtained using LS with those from DNS. LS consist of evolving the Langevin equations for pair separation and relative velocity, which is statistically equivalent to solving the classical Fokker–Planck form of the pair PDF equation. Langevin simulations of particle-pair dispersion were performed using three closure forms of the diffusivity, i.e. the one containing the time integral of the Eulerian two-time correlation of the seen fluid relative velocities and the two analytical diffusivity expressions. In the first closure form, the two-time correlation was computed using DNS of forced isotropic turbulence laden with stationary particles. The two analytical closure forms have the advantage that they can be evaluated using a model for the turbulence energy spectrum that closely matched the DNS spectrum. The three diffusivities are analysed to quantify the effects of the approximations made in deriving them. Pair relative-motion statistics obtained from the three sets of Langevin simulations are compared with the results from the DNS of (moving) particle-laden forced isotropic turbulence for $St_\eta = 10, 20, 40, 80$ and $Re_\lambda = 76, 131$. Here, $St_\eta$ is the particle Stokes number based on the Kolmogorov time scale and $Re_\lambda$ is the Taylor micro-scale Reynolds number. Statistics such as the radial distribution function (RDF), the variance and kurtosis of particle-pair relative velocities and the particle collision kernel were computed using both Langevin and DNS runs, and compared. The RDFs from the stochastic runs were in good agreement with those from the DNS. Also computed were the PDFs $\Omega(U|r)$ and $\Omega(U_r|r)$ of relative velocity $U$ and of the radial component of relative velocity $U_r$, respectively, both PDFs conditioned on separation $r$. The first closure form, involving the Eulerian two-time correlation of fluid relative velocities, showed the best agreement with the DNS results for the PDFs.
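The Langevin simulations referenced above evolve pair separation and relative velocity as a diffusion process: the drift is set by the particle response time and the noise amplitude by the closed diffusivity. A one-dimensional Euler–Maruyama toy follows; the power-law D(r), time step, and parameters are illustrative placeholders, not the paper's closure forms.

```python
import numpy as np

rng = np.random.default_rng(1)

# One-dimensional Euler-Maruyama toy of the pair Langevin model:
#   dr = U dt,   dU = -(U / tau_v) dt + sqrt(2 D(r)) dW.
# tau_v is the particle response time; D(r) stands in for the closed
# diffusivity (in the paper, a tensor built from two-time correlations
# of seen fluid relative velocities). All values are illustrative.
tau_v, dt, n_steps, n_pairs = 1.0, 1e-3, 20_000, 5_000

def D(r):
    # Toy separation-dependent diffusivity with inertial-range-like
    # r^(2/3) growth; not one of the paper's closure forms.
    return 0.1 * np.abs(r) ** (2.0 / 3.0) + 1e-4

r = rng.uniform(0.5, 1.5, n_pairs)   # initial separations
U = np.zeros(n_pairs)                # initial relative velocities
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_pairs)
    U += -(U / tau_v) * dt + np.sqrt(2.0 * D(r)) * dW
    r += U * dt

# Statistics of the kind compared against DNS in the paper.
var = U.var()
print("relative-velocity variance:", var)
print("relative-velocity kurtosis:", ((U - U.mean()) ** 4).mean() / var**2)
```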
  5.
    ABSTRACT Cosmological analyses of galaxy surveys rely on knowledge of the redshift distribution of their galaxy sample. This is usually derived from a spectroscopic and/or many-band photometric calibrator survey of a small patch of sky. The uncertainties in the redshift distribution of the calibrator sample include a contribution from shot noise, or Poisson sampling errors, but, given the small volume they probe, they are dominated by sample variance introduced by large-scale structures. Redshift uncertainties have been shown to constitute one of the leading contributions to systematic uncertainties in cosmological inferences from weak lensing and galaxy clustering, and hence they must be propagated through the analyses. In this work, we study the effects of sample variance on small-area redshift surveys, from theory to simulations to the COSMOS2015 data set. We present a three-step Dirichlet method of resampling a given survey-based redshift calibration distribution to enable the propagation of both shot noise and sample variance uncertainties. The method can accommodate different levels of prior confidence on different redshift sources. This method can be applied to any calibration sample with known redshifts and phenotypes (i.e. cells in a self-organizing map, or some other way of discretizing photometric space), and provides a simple way of propagating prior redshift uncertainties into cosmological analyses. As a worked example, we apply the full scheme to the COSMOS2015 data set, for which we also present a new, principled SOM algorithm designed to handle noisy photometric data. We make available a catalogue of the resulting resamplings of the COSMOS2015 galaxies. 
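The Dirichlet resampling step above has a compact form: within each phenotype cell, redshift-bin probabilities are drawn from a Dirichlet distribution whose concentration is the observed counts plus a prior-confidence term, and the draws are mixed using the target sample's cell occupancies. A toy sketch follows; the cell counts, weights, and the alpha0 knob are all illustrative, and the paper's full three-step scheme additionally injects sample variance, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy calibration sample: counts of galaxies per (phenotype cell, z bin).
# In practice phenotypes would be SOM cells; these numbers are made up.
n_cells, n_zbins = 20, 30
counts = rng.poisson(5.0, size=(n_cells, n_zbins))

# Fraction of the *target* (wide-field) sample landing in each cell.
cell_weights = rng.dirichlet(np.ones(n_cells))

def resample_nz(counts, cell_weights, alpha0=0.1, n_draws=500):
    """Dirichlet resampling of n(z): per cell, draw bin probabilities from
    Dirichlet(counts + alpha0), which propagates shot noise; alpha0 plays
    the role of the per-source prior-confidence knob the abstract mentions."""
    draws = np.empty((n_draws, counts.shape[1]))
    for k in range(n_draws):
        per_cell = np.array([rng.dirichlet(c + alpha0) for c in counts])
        draws[k] = cell_weights @ per_cell   # mix cells by target occupancy
    return draws

samples = resample_nz(counts, cell_weights)
print("mean n(z), first bins:   ", np.round(samples.mean(axis=0)[:5], 4))
print("per-bin scatter, first 5:", np.round(samples.std(axis=0)[:5], 4))
```

Each row of `samples` is one plausible n(z), so downstream cosmological analyses can be rerun per draw to propagate the calibration uncertainty.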