Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift -- i.e., the fraction of times the true redshift falls between two limits z1 and z2 should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs. Though we focus on an example from astrophysics, our method can produce PDFs which are calibrated at all locations in feature space for any use case.
more »
« less
Calibrated Predictive Distributions for Photometric Redshifts
Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift — i.e., the fraction of times the true redshift falls between two limits z1 and z2 should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs resulting in calibrated predictive distributions. Though we focus on an example from astrophysics, our method can produce predictive distributions which are calibrated at all locations in feature space for any use case.
more »
« less
- Award ID(s):
- 2009251
- PAR ID:
- 10457174
- Date Published:
- Journal Name:
- Machine Learning for Astrophysics, proceedings of the Thirty-ninth International Conference on Machine Learning (ICML 2022)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variableYgiven complex inputsX. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for allx, and when needed, to reshape the densities ofytoward ‘instance-wise’ calibration. This paper introduces the local amortized diagnostics and reshaping of conditional densities (LADaR) framework and proposes a new computationally efficient algorithm (Cal-PIT) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs).Cal-PITlearns a single interpretable local probability–probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs such that they are well-calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, whereCal-PITachieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyzes99Code available as a Python package here:https://github.com/lee-group-cmu/Cal-PIT..more » « less
-
ABSTRACT We present a method for mapping variations between probability distribution functions and apply this method within the context of measuring galaxy redshift distributions from imaging survey data. This method, which we name PITPZ for the probability integral transformations it relies on, uses a difference in curves between distribution functions in an ensemble as a transformation to apply to another distribution function, thus transferring the variation in the ensemble to the latter distribution function. This procedure is broadly applicable to the problem of uncertainty propagation. In the context of redshift distributions, for example, the uncertainty contribution due to certain effects can be studied effectively only in simulations, thus necessitating a transfer of variation measured in simulations to the redshift distributions measured from data. We illustrate the use of PITPZ by using the method to propagate photometric calibration uncertainty to redshift distributions of the Dark Energy Survey Year 3 weak lensing source galaxies. For this test case, we find that PITPZ yields a lensing amplitude uncertainty estimate due to photometric calibration error within 1 per cent of the truth, compared to as much as a 30 per cent underestimate when using traditional methods.more » « less
-
Abstract The accurate estimation of photometric redshifts is crucial to many upcoming galaxy surveys, for example, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). Almost all Rubin extragalactic and cosmological science requires accurate and precise calculation of photometric redshifts; many diverse approaches to this problem are currently in the process of being developed, validated, and tested. In this work, we use the photometric redshift code GPz to examine two realistically complex training set imperfections scenarios for machine learning based photometric redshift calculation: (i) where the spectroscopic training set has a very different distribution in color–magnitude space to the test set, and (ii) where the effect of emission line confusion causes a fraction of the training spectroscopic sample to not have the true redshift. By evaluating the sensitivity of GPz to a range of increasingly severe imperfections, with a range of metrics (both of photo- z point estimates as well as posterior probability distribution functions, PDFs), we quantify the degree to which predictions get worse with higher degrees of degradation. In particular, we find that there is a substantial drop-off in photo- z quality when line-confusion goes above ∼1%, and sample incompleteness below a redshift of 1.5, for an experimental setup using data from the Buzzard Flock synthetic sky catalogs.more » « less
-
Abstract Upcoming photometric surveys will discover tens of thousands of Type Ia supernovae (SNe Ia), vastly outpacing the capacity of our spectroscopic resources. In order to maximize the scientific return of these observations in the absence of spectroscopic information, we must accurately extract key parameters, such as SN redshifts, with photometric information alone. We present Photo-zSNthesis, a convolutional neural network-based method for predicting full redshift probability distributions from multi-band supernova lightcurves, tested on both simulated Sloan Digital Sky Survey (SDSS) and Vera C. Rubin Legacy Survey of Space and Time data as well as observed SDSS SNe. We show major improvements over predictions from existing methods on both simulations and real observations as well as minimal redshift-dependent bias, which is a challenge due to selection effects, e.g., Malmquist bias. Specifically, we show a 61× improvement in prediction bias 〈Δz〉 on PLAsTiCC simulations and 5× improvement on real SDSS data compared to results from a widely used photometric redshift estimator, LCFIT+Z. The PDFs produced by this method are well constrained and will maximize the cosmological constraining power of photometric SNe Ia samples.more » « less
An official website of the United States government

