<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Galaxy–Galaxy lensing in HSC: Validation tests and the impact of heterogeneous spectroscopic training sets</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>12/01/2019</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10191394</idno>
					<idno type="doi">10.1093/mnras/stz2968</idno>
					<title level='j'>Monthly Notices of the Royal Astronomical Society</title>
<idno>0035-8711</idno>
<biblScope unit="volume">490</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Joshua S Speagle</author><author>Alexie Leauthaud</author><author>Song Huang</author><author>Christopher P Bradshaw</author><author>Felipe Ardila</author><author>Peter L Capak</author><author>Daniel J Eisenstein</author><author>Daniel C Masters</author><author>Rachel Mandelbaum</author><author>Surhud More</author><author>Melanie Simet</author><author>Cristóbal Sifón</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Although photometric redshifts (photo-z’s) are crucial ingredients for current and upcoming large-scale surveys, the high-quality spectroscopic redshifts currently available to train, validate, and test them are substantially non-representative in both magnitude and colour. We investigate the nature and structure of this bias by tracking how objects from a heterogeneous training sample contribute to photo-z predictions as a function of magnitude and colour, and illustrate that the underlying redshift distribution at fixed colour can evolve strongly as a function of magnitude. We then test the robustness of the galaxy–galaxy lensing signal in 120 deg² of HSC–SSP DR1 data to spectroscopic completeness and photo-z biases, and find that their impacts are sub-dominant to current statistical uncertainties. Our methodology provides a framework to investigate how spectroscopic incompleteness can impact photo-z-based weak lensing predictions in future surveys such as LSST and WFIRST.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Between the surface of last scattering (z &#8764; 1100) and the present day (z = 0), the paths of all observed photons have been gravitationally influenced by the intervening "cosmic web" of matter. This gravitational lensing, and particularly weak lensing, is sensitive to the growth of structure and expansion history of the Universe and serves as a key probe of cosmology (e.g., see review by <ref type="bibr">Mandelbaum 2018)</ref>. In addition, weak lensing serves as an effective complementary technique to other cosmological probes (e.g., the Cosmic Microwave Background or Type Ia supernovae) by helping to break degeneracies between cosmological parameters and providing constraints on the growth of large-scale structure (e.g., <ref type="bibr">DES Collaboration et al. 2017;</ref><ref type="bibr">Hildebrandt et al. 2018;</ref><ref type="bibr">Hikage et al. 2018</ref>, for some recent cosmic shear results).<note place="foot">NSF Graduate Research Fellow. E-mail: jspeagle@cfa.harvard.edu</note></p><p>The determination of accurate photometric redshifts (photo-z's) is a key challenge for deep lensing surveys. While shallow surveys (i &lt; 24) can obtain spectroscopic follow-up for representative samples, deeper surveys face tougher challenges. In this paper, we focus on the challenges of deriving photo-z's for the first-year source catalog of the HSC<hi rend="sup">1</hi> survey, which reaches an i-band depth of &#8764; 26 AB magnitudes. As a precursor to LSST<hi rend="sup">2</hi>, the HSC-SSP survey is a crucial testing ground for photo-z methods that will be applied in future precision cosmology analyses.</p><p>At the depths probed by HSC, there is a lack of adequately representative spectroscopic redshifts (spec-z's) available for training, validating, and testing photo-z methods <ref type="bibr">(Masters et al. 2015;</ref><ref type="bibr">Tanaka et al. 2018)</ref>. 
As a result, the HSC photo-z team instead supplements spec-z's taken from a variety of public surveys with grism/prism-based redshifts (g/prism-z's) along with photo-z's derived from deep, many-band photometry when training various photo-z algorithms and validating their performance. These unavoidable choices lead to a heterogeneous training set spanning a wide range of redshift "quality".</p><p>Although mixing spec-z's and high-quality alternatives will likely occur in future surveys, the impact of using such a heterogeneous mixture on weak lensing has not yet been extensively explored. The lack of high-quality spec-z's in some regions of color and magnitude space makes it difficult to validate photo-z performance in those regions independently of the assumptions used to generate them, and supplementing spec-z's in these regions with other methods that rely more heavily on these assumptions (see, e.g., <ref type="bibr">Bezanson et al. 2016)</ref> does not alleviate this problem. Performance in these regions therefore remains a "known unknown" that is difficult to validate directly. This problem is particularly acute for future cosmology surveys hoping to derive photo-z's that are unbiased at the sub-percent level for the majority of their faint photometric samples.</p><p>There are currently two broad approaches in the literature to resolving this issue. The first is to efficiently collect spec-z's to "fill in" regions of color space that currently lack data. The largest systematic effort is the C3R2 survey <ref type="bibr">(Masters et al. 2017)</ref>, which has so far collected &#8764; 1300 high-quality spectra in under-populated regions of color space. The second is to assume that we can use cross-correlations of ensembles of galaxies that span the relevant redshift range, regardless of their color and/or magnitude, to characterize photo-z accuracy for a population of galaxies. 
This has proven to be promising but is not without challenges <ref type="bibr">(M&#233;nard et al. 2013;</ref><ref type="bibr">Newman et al. 2015;</ref><ref type="bibr">Hoyle &amp; Rau 2018)</ref>. Importantly, both of these methods assume that we can use ensembles of galaxies in specific regions of color and/or magnitude space to calibrate photo-z biases and uncertainties.</p><p>In this paper, we investigate how the use of heterogeneous training samples affects photo-z performance and galaxy-galaxy (gg) lensing analyses (e.g., <ref type="bibr">Kwan et al. 2017;</ref><ref type="bibr">Leauthaud et al. 2017;</ref><ref type="bibr">Prat et al. 2018</ref>) using HSC-SSP data. In &#167;2, we describe the photometry, shear, and redshift data used in this paper. In &#167;3, we investigate the representativeness of current spec-z samples and examine the dependence on color and magnitude. We find strong evidence for evolution in the redshift distribution of galaxies at fixed color as a function of magnitude. This leads us to develop a new framework, described in &#167;4, for computing photo-z's from heterogeneous data that incorporates magnitude dependence and allows us to track how specific training objects contribute to individual photo-z predictions. We investigate the accuracy of photo-z probability density functions (PDFs) computed using this method in &#167;5.<note place="foot" n="1">The Hyper Suprime-Cam (HSC) Subaru Strategic Program (SSP) Survey <ref type="bibr">(Aihara et al. 2018</ref>). See hsc.mtk.nao.ac.jp/ssp.</note><note place="foot" n="2">The Large Synoptic Survey Telescope <ref type="bibr">(Ivezic et al. 2008</ref>). See lsst.org.</note></p><p>After discussing our photo-z tests, in &#167;6 we outline the framework used for our lensing analysis. In &#167;7, we then test whether our gg lensing measurements are robust to a variety of estimators, quality cuts, and spectroscopic incompleteness. 
We conclude in &#167;8.</p><p>We assume a flat &#923;CDM cosmology whenever appropriate with &#8486;&#923; = 0.7, &#8486;m = 0.3, and h = 0.7. All magnitudes in the paper are AB magnitudes <ref type="bibr">(Oke &amp; Gunn 1983)</ref>. For our lensing calculations (see &#167;7), we assume physical coordinates to compute &#8710;&#931; in 10 logarithmically spaced bins from 0.05 Mpc to 15 Mpc.</p></div>
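<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal illustration of the radial binning quoted above (10 logarithmically spaced bins between 0.05 and 15 physical Mpc), the bin edges can be generated as follows. This is only a sketch: the variable names are illustrative, and the use of the geometric mean as a bin centre is an assumption rather than something stated in the text.</p><ab><![CDATA[
```python
import numpy as np

# Radial binning quoted in the text: 10 logarithmically spaced bins
# between 0.05 and 15 physical Mpc.
R_MIN_MPC, R_MAX_MPC, N_BINS = 0.05, 15.0, 10

# N_BINS bins require N_BINS + 1 edges, evenly spaced in log10(R).
bin_edges = np.logspace(np.log10(R_MIN_MPC), np.log10(R_MAX_MPC), N_BINS + 1)

# Geometric mean of adjacent edges (an assumed choice of bin centre).
bin_centers = np.sqrt(bin_edges[:-1] * bin_edges[1:])
```
]]></ab></div>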
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">DATA</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">The HSC Survey</head><p>The Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) Survey <ref type="bibr">(Aihara et al. 2018</ref>) utilizes the Hyper Suprime-Cam prime-focus camera <ref type="bibr">(Miyazaki et al. 2018;</ref><ref type="bibr">Komiyama et al. 2018;</ref><ref type="bibr">Kawanomoto et al. 2018;</ref><ref type="bibr">Furusawa et al. 2018</ref>) on the 8.2 m Subaru telescope at Mauna Kea. The survey has a "wedding cake" construction, with three different area/depth combinations to optimize a variety of science goals: the Wide survey will cover 1400 deg&#178; in grizy to a limiting depth of i &#8764; 26 mag, the Deep survey will cover 26 deg&#178; to i &#8764; 27 mag, and the UltraDeep survey will cover 3.5 deg&#178; to a depth of i &#8764; 28 mag. This work is based on the S16A HSC-SSP internal data release, which covers 136.9 deg&#178; to full Wide depth in all five bands. For more information on the HSC-SSP survey, please see <ref type="bibr">Aihara et al. (2018)</ref>. For more information on the data processing and pipeline, please see <ref type="bibr">Bosch et al. (2018)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">The weak lensing source sample</head><p>Our sample of galaxy sources is selected using the weak lensing cuts outlined in <ref type="bibr">Mandelbaum et al. (2018)</ref>. In brief, these are a series of quality cuts to ensure that composite model (i.e. CModel) photometry, point spread functions (PSFs), and measured object shapes are reliable. Observations are restricted to i &lt; 24.5 mag to avoid using data with possibly unreliable shape measurements and to ensure "reasonable" spec-z coverage (although see &#167;3). A "full-depth, full-color" (FDFC) cut was also imposed to eliminate sources that were not observed in all bands to full depth. Objects near bright stars were removed using the updated Arcturus bright star masks described in <ref type="bibr">Coupon et al. (2018)</ref>, as opposed to the original Sirius masks used in <ref type="bibr">Mandelbaum et al. (2018)</ref>, because the updated masks preserve more galaxies around bright stars. See <ref type="bibr">Mandelbaum et al. (2018)</ref> for additional details regarding the construction and validation of the HSC-SSP S16A weak lensing shear catalog.</p><p>In addition to these weak lensing cuts, the photo-z's used in this work were only computed for objects with PSF-matched 1.1 arcsec aperture photometry available in all five bands. This effectively imposes an additional cut on the seeing in all five bands. A variety of internal tests have found that this does not introduce a meaningful bias in weak lensing analyses <ref type="bibr">(More et al., in prep.)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Redshift Training Data</head><p>The HSC-SSP Wide survey footprint is designed to maximize the overlap with other photometric and spectroscopic surveys while keeping the survey geometry simple. This allows HSC-SSP to exploit a large number of public spec-z's when constructing a redshift training set. In addition, (Ultra)Deep data taken in heavily observed fields such as COSMOS <ref type="bibr">(Scoville et al. 2007</ref>) further allow HSC-SSP to include large numbers of fainter objects observed at higher signal-to-noise (S/N). These observations allow for more detailed modeling of the general population observed in the Wide survey, and are especially helpful for fainter sources.</p><p>A detailed description of the training sample can be found in <ref type="bibr">Tanaka et al. (2018)</ref>. We briefly summarize it here.</p><p>The training set contains spectroscopic, grism, and prism redshifts from a variety of overlapping public surveys including:</p><p>&#8226; zCOSMOS DR3 <ref type="bibr">(Lilly et al. 2009</ref>),</p><p>&#8226; UDSz <ref type="bibr">(Bradshaw et al. 2013;</ref><ref type="bibr">McLure et al. 2013)</ref>,</p><p>&#8226; 3D-HST <ref type="bibr">(Skelton et al. 2014;</ref><ref type="bibr">Momcheva et al. 2016</ref>),</p><p>&#8226; FMOS-COSMOS <ref type="bibr">(Silverman et al. 2015)</ref>,</p><p>&#8226; VVDS (Le <ref type="bibr">F&#232;vre et al. 2013</ref>),</p><p>&#8226; VIPERS PDR1 <ref type="bibr">(Garilli et al. 2014</ref>),</p><p>&#8226; SDSS DR12 <ref type="bibr">(Alam et al. 2015)</ref>,</p><p>&#8226; GAMA DR2 <ref type="bibr">(Liske et al. 2015)</ref>,</p><p>&#8226; WiggleZ DR1 <ref type="bibr">(Parkinson et al. 2012</ref>),</p><p>&#8226; DEEP2 DR4 <ref type="bibr">(Newman et al. 2013)</ref>, and</p><p>&#8226; PRIMUS DR1 <ref type="bibr">(Coil et al. 2011;</ref><ref type="bibr">Cool et al. 2013)</ref>.</p>
<p>As each survey has its own flagging scheme to indicate redshift confidence, the different schemes were homogenized and used to select "secure" redshifts using the criteria outlined in <ref type="bibr">Tanaka et al. (2018)</ref>. In addition to these public surveys, a collection of private COSMOS spec-z's was also included exclusively for photo-z training <ref type="bibr">(Mara Salvato &amp; Peter Capak, private communication)</ref>.</p><p>In addition to these spec-z's, grism-z's, and prism-z's, the training set was supplemented with a set of high-quality, many-band photo-z's taken from 3D-HST and COSMOS2015 <ref type="bibr">(Laigle et al. 2016</ref>) in order to maintain sufficient magnitude and color coverage down to i &#8764; 24.5 (see &#167;3). Without these photo-z's, the magnitude and color coverage of the training set fails to adequately span the relevant parameter space of the HSC-SSP data used in this analysis. This heterogeneity in the spec-z's, grism-z's, prism-z's, and many-band photo-z's is one of the motivating reasons for the analysis presented in this work.</p><p>Objects were iteratively matched to this catalog within 1 arcsec at (1) UltraDeep, (2) Deep, and (3) Wide HSC-SSP depths in order to take advantage of higher-S/N data when available while avoiding possible duplicates. Our training data ultimately consist of &#8764; 170k, 37k, and 170k high-quality spec-z's, g/prism-z's, and many-band photo-z's, respectively.</p><p>As described in <ref type="bibr">Tanaka et al. (2018)</ref>, to perform accurate cross-validation at HSC-SSP Wide depths, each object was assigned an "emulated" Wide-depth flux error based on the error properties of similar objects observed in the HSC-SSP Wide survey. Objects were also assigned an associated color-magnitude weight using a nearest-neighbor approach based on a representative subset of the weak lensing source sample to account for domain mismatches following the methodology described in &#167;4.4. 
The original and reweighted redshift distributions of the HSC-SSP S16A training sample are shown in Figure <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">HOW REPRESENTATIVE ARE EXISTING SPECTROSCOPIC REDSHIFT SAMPLES?</head><p>As discussed in &#167;2.3, spectroscopic "completeness" within our training set varies strongly across magnitude and color. In other words, in a given color-magnitude "bin" the fraction of objects that come from more reliable sources such as spec-z's versus less reliable sources such as many-band photo-z's can change rapidly. This behavior is concerning for several reasons. First, spec-z's generally have much smaller (often negligible) errors in redshift measurements compared to photo-z's, so our underlying knowledge of the redshift distribution at fixed magnitude and/or color degrades as the number and/or fraction of spec-z's decreases. Second, it is a well-known issue that photo-z PDFs can be mis-calibrated (see, e.g., <ref type="bibr">Tanaka et al. 2018)</ref>. Third, there may be systematic biases in the redshift distribution of spec-z's relative to photo-z's in a given color-magnitude bin arising from selection effects during data collection. These involve choices that often generate the mismatch in the first place, from the prioritization of spectroscopic targets (which often imposes magnitude and color biases) to the treatment of non-detections (which correlates strongly with redshift).</p><p>In particular, many studies assume that these pathological behaviors can be "calibrated out" by matching objects explicitly in terms of color (not magnitude) to obtain a representative spec-z sample (see, e.g., <ref type="bibr">Lima et al. 2008)</ref>. Surveys such as the Complete Calibration of the Color-Redshift Relation (C3R2) Survey <ref type="bibr">(Masters et al. 
2017</ref>) have expanded upon this strategy, explicitly sorting possible targets into bins in color space and then pursuing them assuming that spec-z's obtained at fixed color are representative of the entire photometric population in that given color bin.</p><p>While this strategy is efficient, it assumes that the intrinsic redshift distribution P(z|c) at fixed color c is representative over all relevant magnitudes m. This is a strong assumption given that the population of galaxies evolves as a function of redshift and that we expect brighter objects of fixed color to be (on average) at lower redshifts, all else being fixed. We begin by investigating to what degree P(m<hi rend="sub">spec</hi>|c) differs from P(m|c) and whether or not these differences are of importance to current lensing surveys.</p><p>More formally, this assumption implies that <formula notation="tex" n="1">P(z|c) = \int P(z|c,m)\,P(m|c)\,\mathrm{d}m</formula> <formula notation="tex" n="2">\approx \frac{1}{n}\sum_{i=1}^{n} K(z|z_{i,\mathrm{spec}})</formula> where K(z|z<hi rend="sub">i,spec</hi>) is a kernel density estimate (i.e. smoothing scale) for each of the n spec-z's and z<hi rend="sub">i,spec</hi> is a spectroscopic redshift drawn from a particular spec-z distribution P(m<hi rend="sub">spec</hi>|c) &#8800; P(m|c) based on how the data were collected. Note that, in general, (2) is only guaranteed to be approximately valid if the spec-z's comprise a representative sample from the underlying magnitude distribution at fixed color P(m|c).</p><p>To investigate these potential issues in our training sample, we use manifold learning to sort our training galaxies into regions of color space and look for possible trends in P(z|c, m). In &#167;3.1, we describe the particular algorithm and procedure used to construct the manifold. In &#167;3.2, we examine several examples of P(z|c, m) and find that there can be significant evolution as a function of magnitude in particular regions of color. Our results imply that current spec-z follow-up programs should be cognizant of these effects in order to avoid biasing photo-z predictions at fixed color. 
Since the redshift success rate for a given spectrograph may depend on redshift, even surveys that are relatively homogeneously selected may be subject to these subtle biases within a given magnitude and color bin.</p></div>
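<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of a non-representative magnitude distribution on the marginalization P(z|c) = &#8747; P(z|c, m) P(m|c) dm can be illustrated with a toy calculation. The sketch below is purely hypothetical: the Gaussian form of P(z|c, m), the drift of 0.1 in redshift per magnitude, and both magnitude distributions are assumptions chosen for illustration and are not taken from the HSC-SSP data.</p><ab><![CDATA[
```python
import numpy as np

z = np.linspace(0.0, 3.0, 601)    # redshift grid
m = np.linspace(20.0, 24.5, 91)   # magnitude grid


def p_z_given_cm(z, m, dzdm=0.1, z0=0.8, width=0.1):
    """Toy P(z|c,m): a Gaussian whose mean drifts with magnitude (assumed)."""
    mean = z0 + dzdm * (m[:, None] - 22.0)
    pdf = np.exp(-0.5 * ((z[None, :] - mean) / width) ** 2)
    return pdf / pdf.sum(axis=1, keepdims=True)  # each row sums to 1


def marginalize(p_zm, p_m):
    """Discrete analogue of P(z|c) = int P(z|c,m) P(m|c) dm."""
    weights = p_m / p_m.sum()
    return weights @ p_zm


p_zm = p_z_given_cm(z, m)

# Hypothetical magnitude distributions at fixed colour:
p_m_phot = np.exp(0.8 * (m - 24.5))                  # photometric sample, mostly faint
p_m_spec = np.exp(-0.5 * ((m - 21.0) / 0.7) ** 2)    # spec-z sample, mostly bright

pz_phot = marginalize(p_zm, p_m_phot)
pz_spec = marginalize(p_zm, p_m_spec)

mean_phot = np.sum(z * pz_phot)
mean_spec = np.sum(z * pz_spec)
print(f"mean z (photometric weights): {mean_phot:.3f}")
print(f"mean z (spec-z weights):      {mean_spec:.3f}")
```
]]></ab><p>In this toy setup the two estimates of P(z|c) agree only when the drift term dzdm is set to zero, which is the situation assumed when spec-z's are matched in color alone.</p></div>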
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Manifold Learning and Self-Organizing Maps</head><p>For this study, we use a Self-Organizing Map (SOM; <ref type="bibr">Kohonen 1982;</ref><ref type="bibr">Kohonen 2001</ref>) to both cluster our data in color space and learn a lower-dimensional 2-D projection that can be used for visualization purposes. We summarize the main features of our specific implementation below, and direct the reader to Carrasco <ref type="bibr">Kind &amp; Brunner (2014a)</ref>, <ref type="bibr">Masters et al. (2015, 2017)</ref>, and <ref type="bibr">Speagle &amp; Eisenstein (2017)</ref> for more details concerning their applications to photo-z's.</p><p>[Figure caption fragment] As with Figure <ref type="figure">1</ref>, we see that large portions of color/magnitude space do not have sufficient coverage within our dataset, necessitating the use of many-band photo-z's from surveys such as COSMOS to "bridge the gap" when computing photo-z's for HSC-SSP.</p><p>The SOM is an unsupervised machine learning algorithm that projects high-dimensional data onto a lower-dimensional space using competitive training of a (large) set of "nodes" in a way that attempts to preserve general topological features and correlations present in the higher-dimensional data. It consists of a fixed number of nodes <formula notation="tex">N_{\mathrm{nodes}} = \prod_{i=1}^{D} N^{i}_{\mathrm{nodes}}</formula>, where the product over i is taken over all dimensions D of the SOM, arranged on an arbitrary D-dimensional grid with <formula notation="tex">N^{i}_{\mathrm{nodes}}</formula> nodes in each dimension. Each node in the grid is assigned a position x on the SOM and is represented by a particular model F(x), where F is the set of observed features. 
In this paper, these are the set of grizy photometric flux densities comprising a particular galaxy Spectral Energy Distribution (SED) in the HSC filters.</p><p>Once the dimensions/shape of the SOM are chosen, training then proceeds as follows:</p><p>(i) Initialize the node models (most often randomly) and set the current iteration t = 0.</p><p>(ii) Draw (with replacement) a random object F and its associated errors &#963; from the input dataset.</p><p>(iii) Compute <formula notation="tex" n="3">\chi^2(\mathbf{x}) = \sum_b \frac{\left[F_b - s(\mathbf{x})\,F_b(\mathbf{x})\right]^2}{\sigma_b^2}</formula> across all nodes in the SOM over the available bands indexed by b, where the scale factor s(x) renormalizes the model SED so that we are fitting in terms of flux density ratios (i.e. colors) rather than flux densities (i.e. magnitudes) directly.</p><p>(iv) Select the best-matching node <formula notation="tex" n="5">\mathbf{x}_{\mathrm{best}} = \operatorname{argmin}_{\mathbf{x}}\,\chi^2(\mathbf{x})</formula> based on the minimum &#967;<hi rend="sup">2</hi>(x) value.</p><p>(v) Update the node models F(x) &#8594; F&#8242;(x) based on a learning rate A(t) and neighborhood function H(x, x<hi rend="sub">best</hi>|t) such that</p><p>(vi) If t &lt; N<hi rend="sub">iter</hi>, increment t and repeat starting from (ii).</p><p>[Figure caption fragment] ... and plotted based on their relative likelihood of being associated with the node. We see a clear trend towards higher redshift as a function of magnitude. While some of the evolution at fainter magnitudes is due to intra-bin scatter from photometric uncertainty, this effect is minimal at brighter magnitudes where the trend is clearest.</p><p>After training, objects are typically "mapped" onto the SOM by repeating steps (iii) and (iv) for every object in the input dataset, assigning each object to its best-matching node. We use a modified version of this approach where each object F<hi rend="sub">j</hi> is instead assigned to a set of nodes along with its corresponding weight <formula notation="tex">w_j(\mathbf{x}) \propto e^{-\chi^2(\mathbf{x})/2}</formula> for all nodes with <formula notation="tex">w_j(\mathbf{x}) &gt; f_{\min}\,\max_{\mathbf{x}} w_j(\mathbf{x})</formula>. We take f<hi rend="sub">min</hi> = 10<hi rend="sup">&#8722;3</hi>, which approximately corresponds to thresholding galaxies that are &#8764; 2.5&#963; away from the best-fit. 
This "probabilistic mapping" allows us to better capture the uncertainty in an individual object's position on the SOM based on its photometric errors, resulting in smoother maps that are less sensitive to sampling noise and photometric errors relative to, e.g., <ref type="bibr">Masters et al. (2017)</ref>. A detailed discussion of these differences is beyond the scope of this paper and will be explored in future work. We choose our SOM to be 2-D with a 50 &#215; 50 grid of nodes, and train it on the weak lensing source catalog photometry following the steps above for N<hi rend="sub">iter</hi> = 10<hi rend="sup">5</hi> iterations, which we find is enough to ensure the median &#967;<hi rend="sup">2</hi>(x<hi rend="sub">best</hi>) value for objects across our map is approximately the number of colors (four) available. We choose our learning rate to be the weighted harmonic mean</p><p>for A<hi rend="sub">0</hi> = 0.5 and A<hi rend="sub">1</hi> = 0.1 and the neighborhood function to be a Gaussian kernel</p><p>with a standard deviation that goes as the weighted harmonic mean</p><p>with &#963;<hi rend="sub">0</hi> = 35 and &#963;<hi rend="sub">1</hi> = 1. Our final SOM is shown in Figure <ref type="figure">2</ref>.</p></div>
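<div xmlns="http://www.tei-c.org/ns/1.0"><p>The training loop and probabilistic mapping described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the frankenz implementation: the analytic least-squares form of the scale factor, the update rule, the interpolation weights 1 &#8722; t/N and t/N in the harmonic-mean schedules, and the unnormalized Gaussian neighborhood are all assumed forms, and every function name is hypothetical. Fluxes are assumed to be positive.</p><ab><![CDATA[
```python
import numpy as np


def harmonic_schedule(t, n_iter, start, end):
    """Weighted harmonic mean of `start` and `end`.

    The weights (1 - t/n_iter, t/n_iter) are an assumption."""
    f = t / n_iter
    return 1.0 / ((1.0 - f) / start + f / end)


def chi2_to_nodes(flux, err, models):
    """Scale-free chi2 between one object and every node model.

    `models` has shape (n_nodes, n_bands). The scale factor s is the
    analytic least-squares amplitude (an assumed form)."""
    ivar = 1.0 / err**2
    s = (models * flux * ivar).sum(axis=1) / (models**2 * ivar).sum(axis=1)
    resid = flux - s[:, None] * models
    return (resid**2 * ivar).sum(axis=1), s


def train_som(fluxes, errs, shape=(10, 10), n_iter=2000,
              a0=0.5, a1=0.1, sig0=None, sig1=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n_obj = fluxes.shape[0]
    ny, nx = shape
    grid = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    pos = np.stack(grid, axis=-1).reshape(-1, 2).astype(float)
    if sig0 is None:
        sig0 = 0.7 * max(shape)  # gives 35 for a 50 x 50 map (assumed scaling)
    # (i) initialise node models from randomly chosen objects
    models = fluxes[rng.integers(n_obj, size=ny * nx)].copy()
    for t in range(n_iter):
        j = rng.integers(n_obj)                              # (ii) draw an object
        chi2, s = chi2_to_nodes(fluxes[j], errs[j], models)  # (iii) chi2 per node
        best = np.argmin(chi2)                               # (iv) best-matching node
        a = harmonic_schedule(t, n_iter, a0, a1)
        sig = harmonic_schedule(t, n_iter, sig0, sig1)
        d2 = ((pos - pos[best]) ** 2).sum(axis=1)
        h = np.exp(-0.5 * d2 / sig**2)                       # Gaussian neighbourhood
        # (v) pull each model toward the object, rescaled to that model's
        # own normalisation so that only colours are learned (assumed rule)
        target = fluxes[j][None, :] / s[:, None]
        models += (a * h)[:, None] * (target - models)
    return models, pos


def map_object(flux, err, models, f_min=1e-3):
    """Probabilistic mapping: keep nodes with w > f_min * max(w)."""
    chi2, _ = chi2_to_nodes(flux, err, models)
    w = np.exp(-0.5 * (chi2 - chi2.min()))
    idx = np.flatnonzero(w > f_min * w.max())
    return idx, w[idx] / w[idx].sum()
```
]]></ab><p>A call such as train_som(fluxes, errs, shape=(50, 50), n_iter=100000) would mirror the map size and iteration count quoted above, with fluxes and errs being arrays of shape (number of objects, number of bands).</p></div>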
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Redshift Evolution at Fixed Color</head><p>Using our SOM, we can now investigate the questions outlined in the beginning of this section. In particular, we want to examine whether the intrinsic redshift distribution at fixed color P(z|c) is insensitive to magnitude within our training data.</p><p>Although not common across our SOM, we do find some regions where there is evolution in P(z|c, m) at brighter magnitudes within spec-z-dominated samples. One such example is shown in Figure <ref type="figure">3</ref>. A more "typical" node is shown for contrast in Figure <ref type="figure">4</ref>. This confirms our basic intuition, formalized in Bayesian photo-z approaches such as BPZ <ref type="bibr">(Ben&#237;tez 2000)</ref>, that the complicated evolution of galaxy SEDs and number densities as a function of time can lead to P(z|c, m) evolution as a function of magnitude if the underlying SED cannot be uniquely constrained. While uniquely constraining the SED is likely possible in future multiwavelength datasets with full optical to near-infrared coverage (see, e.g., <ref type="bibr">Hemmati et al. 2018)</ref>, this likely remains a problem for current/planned weak lensing-oriented surveys such as HSC-SSP, DES, KiDS, and LSST.</p><p>Note that we do expect that noisy observations will naturally lead to a broadening of the redshift distribution at fainter magnitudes due to intra-bin scatter (i.e. 
an object's PDF gets "smeared" across multiple nodes on the SOM), and possibly to one whose mean evolves strongly with magnitude, mimicking a shift in the intrinsic P(z|c, m) distribution as a function of m.<ref type="foot">foot_1</ref> This effect, however, should not impact the redshift distribution at brighter magnitudes (where measurement errors are small), which is where most of our spec-z's lie and where the trend seen in Figure <ref type="figure">3</ref> is the most apparent.</p><p>To quantify the extent to which possible redshift evolution can impact our redshift results, we focus on evolution in spec-z observations at i-band magnitudes brighter than m = 22.5 to mitigate redshift errors caused by photometric measurement errors and incorrect redshift solutions. We compute the median redshift in bins of 0.5 mag, and fit linear trends for all SOM nodes where we could compute medians for at least 3 bins using at least 10 spec-z observations in each bin. We find that of the 709 nodes that meet these criteria (&#8776; 30% of the SOM), around 40% (287) display significant redshift evolution with dz/dm &gt; 0.05. This trend is robust to different choices of the dz/dm threshold and the required number of bins used in the fit, and is substantially higher than the few percent expected due to random variation. While this test is limited in scope, it highlights that the trend shown in Figure <ref type="figure">3</ref> is not an isolated case and needs to be taken seriously.</p><p>These results indicate that it can be dangerous to use re-weighted spec-z samples based on only a few broadband colors and expect to get the correct P(z|c) distribution, a danger which is indeed recognized by other weak lensing analyses (e.g., <ref type="bibr">Troxel et al. 2017;</ref><ref type="bibr">K&#246;hlinger et al. 2017;</ref><ref type="bibr">Hikage et al. 2018)</ref>. 
This implies that spec-z samples may need to be representative in magnitude as well as color.<ref type="foot">foot_2</ref> </p></div>
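<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-node test described above (median redshift in 0.5 mag bins brighter than m = 22.5, followed by a linear fit) can be sketched as follows. The function name, the way bins are anchored at the bright limit, and the use of an unweighted least-squares fit are assumptions made for illustration; the defaults for the minimum number of bins and objects per bin follow the values quoted in the text.</p><ab><![CDATA[
```python
import numpy as np


def dz_dm_slope(mag, z, m_max=22.5, bin_width=0.5, min_bins=3, min_per_bin=10):
    """Slope dz/dm of the median spec-z versus magnitude for one SOM node.

    Returns None when fewer than `min_bins` bins contain at least
    `min_per_bin` objects."""
    mag = np.asarray(mag, dtype=float)
    z = np.asarray(z, dtype=float)
    keep = mag < m_max
    mag, z = mag[keep], z[keep]

    # Bin index counted downward from the limit m_max (assumed anchoring).
    k = np.floor((m_max - mag) / bin_width).astype(int)
    centers, medians = [], []
    for b in np.unique(k):
        in_bin = k == b
        if in_bin.sum() >= min_per_bin:
            centers.append(m_max - (b + 0.5) * bin_width)
            medians.append(np.median(z[in_bin]))
    if len(centers) < min_bins:
        return None

    # Unweighted straight-line fit to the binned medians.
    slope, _ = np.polyfit(centers, medians, 1)
    return slope
```
]]></ab><p>A node would then be flagged as showing significant evolution when the returned slope exceeds the chosen threshold, for example 0.05 as quoted above.</p></div>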
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">PHOTOMETRIC REDSHIFT FRAMEWORK</head><p>Based on the results in &#167;3, we aim to develop a framework that allows us to explicitly incorporate magnitude dependence into our photo-z predictions to probe P(z|c, m) and alleviate possible mismatches at fixed color between spec-z's and many-band photo-z's present within our training set. At fainter magnitudes, however, almost all objects that make up our estimates for P(z|c, m) come from many-band photo-z's. As a result, we also want to track how individual objects in the training set propagate forward to our eventual photo-z predictions to investigate how much our many-band photo-z's are contributing to redshift predictions in different regions of color-magnitude space.</p><p>We adopt a Bayesian-oriented nearest-neighbors (NN)-based approach that attempts to properly account for measurement errors within both training and testing sets when making photo-z predictions based explicitly on observed flux densities (magnitudes). In &#167;4.1, we discuss the Bayesian underpinning of our approach. We describe our likelihood in &#167;4.2 and our NN-based approximations to the likelihood/posterior in &#167;4.3. We discuss our priors in &#167;4.4.</p><p>All photometric redshifts (and SOMs) in this study were computed using an early development version (v0.1.5) of the Python photo-z package frankenz<ref type="foot">foot_3</ref>  <ref type="bibr">(Speagle et al. in prep.)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Bayesian Inference</head><p>Deriving photometric redshifts ultimately relies on modeling the continuous mapping between a set of observables F (e.g. flux densities) within some number of bands and redshift z. The central idea of our approach is that in the "big data" limit this mapping can instead be approximated as a set of discrete comparisons between individual objects. The redshift PDF for a target galaxy indexed by g out of N galaxies given a set of M training galaxies indexed by h can then be written as <formula notation="tex" n="10">P(z|g) = \sum_{h=1}^{M} P(z|h)\,P(h|g) = \sum_{h=1}^{M} P(z|h)\,\frac{P(g|h)\,P(h)}{P(g)}</formula> where P(z|h) is the redshift PDF for galaxy h, P(h|g) is the posterior, P(g|h) is the likelihood, P(h) is the prior for h, and <formula notation="tex">P(g) = \sum_{h=1}^{M} P(g|h)\,P(h)</formula> is the evidence (marginal likelihood). In other words, we are trying to find the probability that our observed galaxy g is actually a realization of our training galaxy h (i.e. whether g and h are a photometric "match"). We then assign it the corresponding redshift kernel P(z|h) for galaxy h with a weight proportional to the posterior probability P(h|g). P(z|g) then becomes a posterior-weighted mixture of our P(z|h)'s.</p></div>
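<div xmlns="http://www.tei-c.org/ns/1.0"><p>The posterior-weighted mixture described above can be written compactly in code. The sketch below is illustrative only: it assumes Gaussian redshift kernels for P(z|h) on a uniform redshift grid, and the function and argument names are hypothetical rather than part of any published package.</p><ab><![CDATA[
```python
import numpy as np


def photoz_pdf(z_grid, lnlike, lnprior, z_train, z_width):
    """P(z|g) = sum_h P(z|h) P(h|g) for a single target galaxy g.

    lnlike, lnprior : ln P(g|h) and ln P(h) for each training object h
    z_train, z_width: centre and width of an assumed Gaussian kernel P(z|h)
    z_grid          : uniformly spaced redshift grid
    """
    # Posterior P(h|g); subtracting the maximum avoids overflow in exp().
    lnpost = lnlike + lnprior
    post = np.exp(lnpost - lnpost.max())
    post /= post.sum()  # normalising here is equivalent to dividing by P(g)

    # Redshift kernels P(z|h), each normalised to integrate to 1 on the grid.
    dz = z_grid[1] - z_grid[0]
    kernels = np.exp(-0.5 * ((z_grid[None, :] - z_train[:, None])
                             / z_width[:, None]) ** 2)
    kernels /= kernels.sum(axis=1, keepdims=True) * dz

    return post @ kernels, post
```
]]></ab><p>Because the posterior weights are returned alongside the PDF, the contribution of each training object (for example, spec-z's versus many-band photo-z's) to a given prediction can be inspected directly.</p></div>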
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Photometric Likelihood</head><p>Assuming the errors on the measured fluxes F<hi rend="sub">g</hi> and F<hi rend="sub">h</hi> are independent and Normal (i.e. Gaussian) and ignoring the impact of selection effects (see, e.g., <ref type="bibr">Leistedt et al. 2016)</ref>, the log-likelihood for P(g|h) from our set of B bands indexed by b can be naively written as <formula notation="tex" n="11">-2\ln P(g|h) = \sum_{b=1}^{B} \frac{(F_{g,b} - F_{h,b})^2}{\sigma_{g,b}^2 + \sigma_{h,b}^2} + \sum_{b=1}^{B} \ln\left(\sigma_{g,b}^2 + \sigma_{h,b}^2\right) + B \ln 2\pi</formula> This represents the standard &#967;<hi rend="sup">2</hi> statistic often used in template-fitting methods (see &#167;3.1) but with error contributions from both the target (g) and training (h) objects and including the relevant normalization term.</p><p>Unlike most template-fitting codes and contrary to the approach taken in &#167;3, we deliberately chose not to include a free scaling parameter s to try to account for normalization offsets between g and h (i.e. we fit in magnitudes instead of colors). There are two reasons for this. The first is that the conditional prior P(s|h) we would want to impose over s when computing <formula notation="tex">P(g|h) = \int P(g|h,s)\,P(s|h)\,\mathrm{d}s</formula> is unclear. For instance, a fixed prior such as the uniform P(s|h) = P(s) = 1 prior used by most template-fitting codes is equivalent to assuming a fixed (and unphysical) luminosity function, which can create biases in inference. Trying to specify a color-dependent luminosity function P(s|h) directly, however, is extremely challenging because the integral over s often cannot be evaluated analytically. Fitting flux densities F directly avoids these complications.</p><p>More important for our purposes, however, is that our results from &#167;3 show that there can be strong evolution in the underlying redshift distribution P(z|c, m) at fixed color c as a function of magnitude m. To mitigate this effect without attempting to deal with complicated priors, we opt for simplicity to keep the likelihood "close to the data" and fit directly in F.</p></div>
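<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of the likelihood described above, the function below evaluates ln P(g|h) for one target object against one or many training objects, assuming independent Gaussian errors whose variances add and including the normalization term. The function name and array layout are illustrative assumptions.</p><ab><![CDATA[
```python
import numpy as np


def lnlike_flux(flux_g, err_g, flux_h, err_h):
    """ln P(g|h) for independent Gaussian flux errors.

    flux_g, err_g: arrays of shape (B,) for the target object
    flux_h, err_h: arrays of shape (B,) or (M, B) for training objects
    No free scale factor is fit, so overall brightness (magnitude)
    information is retained.
    """
    var = np.asarray(err_g) ** 2 + np.asarray(err_h) ** 2  # combined variance
    chi2 = ((np.asarray(flux_g) - np.asarray(flux_h)) ** 2 / var).sum(axis=-1)
    n_bands = var.shape[-1]
    norm = np.log(var).sum(axis=-1) + n_bands * np.log(2.0 * np.pi)
    return -0.5 * (chi2 + norm)
```
]]></ab><p>Passing a training array of shape (M, B) returns M log-likelihoods at once, which is the quantity stored for each target object in the neighbor-based approximation of the next section.</p></div>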
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Nearest-Neighbors Approximation</head><p>To avoid running over all M objects in the training set, we use a modified nearest-neighbors approach to preferentially select objects that have similar flux densities with respect to their errors. As the combined errors $\sigma^2_g + \sigma^2_h$ of any two training/target objects g and h will differ, the relevant distance metric will be different for every pairwise training-target object combination. As nearest neighbor searches are typically done with respect to a fixed distance metric (often the Euclidean distance), this pairwise distance dependence poses a problem.</p><p>We deal with this by using Monte Carlo methods to search for neighbors across multiple realizations of the observed flux densities. We first generate a Monte Carlo realization Fg and Fh of the photometry for all objects in our target set and training set, respectively. We then determine the k nearest neighbors of a given target galaxy g based on the squared Euclidean distance between these Monte Carlo realizations using a k-d tree <ref type="bibr">(Bentley 1975)</ref>, defining a set of k indices h(g). After repeating this process K times, we define an object's set of "photometric neighbors" as the union of the k nearest neighbors from each of the K Monte Carlo realizations:</p><p>$$H(g) = \bigcup_{i=1}^{K} h_i(g) \qquad (12)$$</p><p>where $h_i(g)$ denotes the set of k indices found in the i-th realization. Using our K Monte Carlo k nearest neighbors (KMCkNN) approximation, we only compute photometric likelihoods to a small fraction of the data preferentially selected to contain the highest likelihoods. This procedure is most robust when the set of neighbors is roughly complete out to 3-5 standard deviations (relative to all possible pairwise galaxy combinations). 
An object can have at most k &#215; K possible neighbors, with the exact number a strong function of the signal-to-noise of the target object g and the density of training objects h in its local region of color-magnitude space.</p><p>The sparse ($kK \ll N_h$) nature of this KMCkNN approximation enables us to keep track of the individual log-likelihoods ln P(g|h) and their corresponding indices H(g) across all target objects. Because these have been computed exclusively using observables, we can subsequently use them to construct any associated ancillary quantities "after the fact". The most relevant example is the photo-z PDFs following equation ( <ref type="formula">10</ref>), but this may include a whole range of other useful quantities such as those detailed in &#167;4.5. A schematic diagram of our KMCkNN approach is shown in Figure <ref type="figure">5</ref>.</p><p>One significant drawback of using a nearest-neighbors approach is that it is difficult to accommodate missing data. However, since the weak lensing source catalog used in this work is only defined in regions with full depth and full color coverage <ref type="bibr">(Mandelbaum et al. 2018</ref>), this restriction does not impact our results.</p></div>
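<div xmlns="http://www.tei-c.org/ns/1.0"><p>The neighbor search described in this section can be sketched as follows. This is a simplified illustration, not the implementation used in the paper: it replaces the k-d tree with a brute-force search so that it depends only on the standard library, and the function name kmcknn and the toy catalog are assumptions made here.</p>

```python
import random

def kmcknn(target_flux, target_err, train_flux, train_err, k=10, K=25, seed=0):
    """Union of the k nearest training objects over K Monte Carlo realizations."""
    rng = random.Random(seed)
    neighbors = set()
    for _ in range(K):
        # Perturb the target and every training object by their own errors.
        tgt = [rng.gauss(f, s) for f, s in zip(target_flux, target_err)]
        dists = []
        for h, (flux, err) in enumerate(zip(train_flux, train_err)):
            trn = [rng.gauss(f, s) for f, s in zip(flux, err)]
            d2 = sum((a - b) ** 2 for a, b in zip(tgt, trn))  # squared Euclidean
            dists.append((d2, h))
        dists.sort()
        neighbors.update(h for _, h in dists[:k])
    return sorted(neighbors)

# Toy training set: 200 objects in two bands with identical errors.
toy_rng = random.Random(1)
train_flux = [[toy_rng.uniform(0, 10), toy_rng.uniform(0, 10)] for _ in range(200)]
train_err = [[0.2, 0.2] for _ in range(200)]
H_g = kmcknn([5.0, 5.0], [0.2, 0.2], train_flux, train_err, k=5, K=10)
```

<p>The size of H_g lies between k and k times K; noisier targets scatter further between realizations and therefore tend to accumulate more unique neighbors.</p></div>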
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Photometric Priors</head><p>We incorporate the KMCkNN approximation from &#167;4.3 into our photometric prior P(h) by defining a new "sparse" prior</p><p>$$P_g(h) = P(h)\ \text{for}\ h \in H(g), \qquad P_g(h) = 0\ \text{otherwise} \qquad (13)$$</p><p>Our photo-z PDFs can then be written as</p><p>$$P(z|g) \approx \frac{\sum_{h} P(z|h)\,P(g|h)\,P_g(h)}{\sum_{h} P(g|h)\,P_g(h)} \qquad (14)$$</p><p>Typically, the prior P(h) is defined to adjust for "domain mismatch" between the training and target datasets. Since spec-z training sets are significantly biased in both color and magnitude relative to most target photometric galaxy populations, this "reweighting" via P(h) traditionally substantially improves photo-z accuracy relative to cases where P(h) is assumed to be uniform <ref type="bibr">(Lima et al. 2008</ref>).</p><p>We compute a photometric "prior" using the magnitude-based KMCkNN approach described in &#167;4.2 and &#167;4.3 by computing the approximate Bayesian evidence under a uniform prior, with the roles of g and h switched: we now treat our set of training galaxies indexed by h as "target" objects and a subsample of N ref reference objects indexed by g as "training" objects. We found in internal testing that the impact of the prior on our redshift predictions was mostly unchanged for values of N ref Ntrain, and so opt to use N ref = 5 &#215; 10^5 here.</p><p>To put this procedure another way, we determine P(h) by stacking the likelihoods of neighbors in the target data around individual training objects. This procedure, while not entirely proper (we are using a subset of galaxies in our sample to determine the prior), is sufficient for our purposes and represents the first step to a proper hierarchical model <ref type="bibr">(Leistedt et al. 2016)</ref>.</p><p>We find our prior-weighted training data are able to reproduce the B-dimensional distribution of our target data quite well as long as K and k are sufficiently large. See <ref type="bibr">Tanaka et al. (2018)</ref> for additional details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">New Quality Indicators</head><p>Unlike other machine learning-oriented approaches, we are able to compute (approximate) posterior quantities to every training-target object pair. This enables us to utilize a variety of Bayesian-oriented indicators to determine the quality of our fits. We will discuss two new quality indicators here: metrics related to basic goodness-of-fit tests ( &#167;4.5.1) and those describing the "information content" used in our predictions ( &#167;4.5.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.1">Goodness-of-Fit</head><p>By computing posterior quantities to every pair of training-target objects, we can exploit goodness-of-fit tests used in a broad set of Bayesian model fitting applications. We will examine the two most basic indicators here: the maximum a posteriori (MAP) result $P_{\rm max}(g) \equiv \max\{\ldots, P(h|g), \ldots\}$ and the evidence $P(g) \equiv \sum_{h \in \mathbf{h}} P(g|h)P(h)$.</p><p>The MAP quantifies how good our best-fit result is, enabling us to determine if a given set of observables is represented in our training data. This is extremely useful when trying to remove objects with unreliable predictions that lie outside the parameter space spanned by our training data.</p><p>The evidence quantifies how well an object is represented across the entirety of our training data. This is useful when trying to identify objects which do not have meaningful "coverage" since objects that are only similar to a handful of training examples might have unreliable predictions.</p><p>In general, we find that the MAP and the evidence are highly correlated among our data: objects that are well-fit by at least one training example are very likely to be well-fit by others, and vice versa. Based on internal testing, we find that instituting a cut explicitly on the best-fit result based on the fitted $\chi^2$ values removes the majority of poorly-fit objects from our sample. Our final cut is based on the 95th percentile $P(\chi^2_5 \le X) = 0.95$, where $\chi^2_5$ is the $\chi^2$ distribution with 5 degrees of freedom, which is conceptually roughly equivalent to 2-sigma clipping.</p></div>
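<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the 95th-percentile threshold of a chi-square distribution with 5 degrees of freedom can be computed with the standard library alone. The sketch below uses the closed-form CDF for 5 degrees of freedom and inverts it by bisection; the function names and the helper passes_cut are hypothetical.</p>

```python
import math

def chi2_cdf_5dof(x):
    """Closed-form CDF of the chi-square distribution with 5 degrees of freedom."""
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))

def chi2_quantile_5dof(prob, lo=0.0, hi=100.0, tol=1e-10):
    """Invert the CDF by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_5dof(mid) > prob:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

threshold = chi2_quantile_5dof(0.95)   # about 11.07

def passes_cut(chi2_best):
    """Keep objects whose best-fit chi-square does not exceed the threshold."""
    return threshold >= chi2_best
```

<p>With five bands, an object whose best-fit chi-square exceeds roughly 11.07 would be flagged by such a cut.</p></div>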
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.2">Information Content</head><p>As mentioned in &#167;4.3, our sparse KMCkNN approximation allows us to keep track of individual log-likelihoods computed between sets of training-target object pairs. It is then straightforward to transform these results into photo-z PDFs via equation ( <ref type="formula">14</ref>).</p><p>More generally, however, keeping track of the relevant individual posterior predictions</p><p>$$P_g(h|g) = \frac{P(g|h)\,P_g(h)}{\sum_{h} P(g|h)\,P_g(h)}$$</p><p>allows us to compute almost any posterior-dependent result. This flexibility enables us to investigate auxiliary properties of interest.</p><p>In this work, we explore the impact of photo-z systematics on weak lensing using our heterogeneous HSC-SSP training data. In particular, we are worried about the impact many-band photo-z's might have on our results. In order to do this, we introduce two quantities to keep track of where the information content in a given photo-z prediction P(z|g) actually originates:</p><p>&#8226; Fphot: the fraction of neighbors in the training set with many-band photo-z.</p><p>&#8226; Pphot: the posterior-weighted fraction of neighbors in the training set with many-band photo-z.</p><p>Fphot and Pphot will help us to determine what kind of redshift (e.g. photo-z, spec-z, etc.) any given object has been trained on. This will be an important ingredient for our lensing tests in Section 7.</p></div>
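<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two quantities defined above differ only in how neighbors are weighted, which a short sketch makes concrete. The function name information_content and the toy posterior values are assumptions for illustration.</p>

```python
def information_content(posterior, is_photoz):
    """Return (F_phot, P_phot) for one target object.

    posterior[h] : posterior weight of each neighbor h in H(g), summing to 1
    is_photoz[h] : True if neighbor h has a many-band photo-z
    """
    n = len(posterior)
    f_phot = sum(1 for flag in is_photoz if flag) / n                 # plain fraction
    p_phot = sum(p for p, flag in zip(posterior, is_photoz) if flag)  # weighted
    return f_phot, p_phot

# Toy example: half of the neighbors are photo-z's, but with little weight.
f_phot, p_phot = information_content(
    posterior=[0.70, 0.25, 0.04, 0.01],
    is_photoz=[False, False, True, True],
)
```

<p>Here Fphot is 0.5 while Pphot is only 0.05, illustrating how many-band photo-z neighbors can be numerous yet contribute little to the prediction.</p></div>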
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">PHOTOMETRIC REDSHIFT VALIDATION</head><p>In this section we outline the implementation ( &#167;5.1), validation ( &#167;5.2), and application ( &#167;5.3) of the photo-z framework outlined in &#167;4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Tuning: Feature Selection and (Hyper-)parameter Choices</head><p>The HSC-SSP catalog contains a variety of features that can be used for photo-z predictions, including a range of photometry measurements and size information. In addition, the KMCkNN framework described in &#167;4.3 involves several hyper-parameters that can impact performance. We conduct a variety of internal cross-validation and hold-out tests following &#167;5.2 to determine the subset of features and hyper-parameters that give the best performance at a reasonable computational cost. Our results are summarized below:</p><p>&#8226; Our chosen flux density measurements were PSF-matched 1.1 aperture photometry among objects with successful forced photometry in all five bands. Adding additional features such as size or using different combinations of other photometry products (e.g., cmodel) gave comparable or worse results.</p><p>&#8226; We introduce a photometric smoothing kernel $\sigma_{g,b} = f_b F_{g,b}$ for each object g in each band b to account for systematic uncertainties in measured photometric errors and to serve as a smoothing scale when computing likelihoods. We find that $f_b = 0.02$ gives good photo-z PDFs in aggregate and constitutes an effective zero-point calibration uncertainty of &#8776; 0.02 mag (although see <ref type="bibr">Tanaka et al. 2018</ref>).</p><p>&#8226; To ensure optimal runtime, we want to make K and k only as large as necessary to obtain good magnitude/color-space coverage for each object. The lower limit on K is set by the number of Monte Carlo realizations needed to roughly marginalize over the measurement uncertainties when searching for neighbors, while the lower limit on k is to ensure a reasonably large collection of neighbors. Based on internal testing, we find K = 25 Monte Carlo realizations and k = 10 neighbors selected at each iteration works as a reasonable compromise. 
The worst case performance ($N_{\rm nghbr} \sim 10$) generally only occurs for bright and rare objects, which usually also have poor likelihoods. The typical number of unique neighbors is $N_{\rm nghbr} \sim 100$-$200$.</p><p>&#8226; We take our redshift kernels to be Normal distributions $\mathcal{N}\left(z \,|\, \mu = \hat{z}_h, \sigma^2 = (\Delta z)^2 + \sigma^2_{h,z}\right)$ centered on the measured redshift $\hat{z}_h$ with a variance set by a combination of an intrinsic width $\Delta z = 0.01$, similar to the redshift spacing used when storing most photo-z PDFs, and the associated redshift measurement error $\sigma_{h,z}$. This allows us to propagate uncertainties from the many-band photo-z's to our final predictions. Following <ref type="bibr">Tanaka et al. (2018)</ref>, our redshift PDFs P(z|g) are evaluated over a redshift grid ranging from $0 \le z \le 6$ with $\Delta z = 0.01$ spacing. All redshift-based quantities described later in the text are derived from these discretized PDFs.</p></div>
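<div xmlns="http://www.tei-c.org/ns/1.0"><p>The redshift kernels and grid described in the last bullet can be sketched as below. The names DZ, Z_GRID and redshift_kernel are hypothetical, and the example redshift and error are toy values.</p>

```python
import math

DZ = 0.01                                # intrinsic width and grid spacing
Z_GRID = [i * DZ for i in range(601)]    # redshift grid from 0 to 6 inclusive

def redshift_kernel(z_h, sigma_h):
    """Normal kernel with variance DZ**2 + sigma_h**2, tabulated on Z_GRID."""
    var = DZ ** 2 + sigma_h ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return [norm * math.exp(-0.5 * (z - z_h) ** 2 / var) for z in Z_GRID]

spec_kernel = redshift_kernel(z_h=0.80, sigma_h=0.0)    # width set by DZ alone
phot_kernel = redshift_kernel(z_h=0.80, sigma_h=0.05)   # broadened by its error
```

<p>A training object with a precise redshift contributes a narrow spike of width DZ, while one with a larger redshift error contributes a correspondingly broader kernel.</p></div>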
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Calibration: Characterizing Behavior with Cross-Validation</head><p>As discussed in &#167;2.3, to account for inhomogeneity and domain mismatch in the training set all objects are assigned an HSC-SSP Wide-depth emulated error following the procedure described in <ref type="bibr">Tanaka et al. (2018)</ref> and an associated color-magnitude weight following &#167;4.4. Unless stated otherwise, we utilize both quantities when computing any of the performance estimates reported here. We first randomly divide our training data into validation and hold-out testing sets comprising fractions (1 - f) and f of the data, respectively, for some hold-out fraction f. We then use two strategies to select our hyper-parameters and evaluate our performance within the validation set: k = 5-fold cross-validation and internal leave-one-out tests. For k = 5-fold cross-validation, we randomly divide our validation set into k = 5 subsets. We then train on k - 1 of these subsets to compute photo-z predictions to the remaining subset, cycling through each of the subsets until we have obtained predictions to the entire validation sample. For leave-one-out tests, we instead train on the entire validation set. However, when computing predictions to each object, we "mask out" its possible contribution within the selected group of neighboring objects used to compute the photo-z prediction. Both of these procedures, along with the final hold-out test set, attempt to mitigate over-fitting and ensure realistic performance estimates.</p><p>We find that the results from &#167;5.1 are mostly insensitive to the chosen hold-out fraction f when f 0.5 (i.e. when our validation set consists of more than &#8764; 150k objects). 
In addition, we also find that the performance on the hold-out test set is essentially identical to performance estimates within the validation set using both strategies when f 0.8, confirming that our approaches avoid over-fitting and that the information content appears to roughly saturate as our validation set exceeds &#8764; 250k objects. Based on these results, we find that it is reasonable to treat our features and hyper-parameters from &#167;5.1 as essentially fixed. Our reported performance is then estimated by applying the more conservative k = 5-fold cross-validation tests across the entire training sample (i.e. without the (1 - f)/f validation/testing split).</p><p>The 2-D stacked photo-z PDFs versus the input true redshifts (smoothed by their intrinsic uncertainties) along with the associated dispersion in $\Delta z/(1 + z)$ as a function of magnitude are shown in Figure <ref type="figure">6</ref>. We see that our performance over the weak lensing sample is relatively robust, with an overall $\Delta z/(1 + z) \approx -0.3\%$ bias and with 68% of the PDFs contained within $\Delta z/(1 + z) = [-7.8\%, 6.9\%]$.</p><p>In addition to tests on the overall accuracy of our predictions, we also test the reliability of our individual PDFs. We opt to use the empirical cumulative distribution function (eCDF), which is constructed by evaluating the predicted photo-z CDF of each cross-validation object $i \in \mathbf{i}$ at its true redshift $z_i$,</p><p>$$\hat{u}_i \equiv \int_0^{z_i} P(z|i)\,dz \qquad (16)$$</p><p>and computing</p><p>$$\hat{U}(x) = \sum_{i \in \mathbf{i}} I(\hat{u}_i \le x) \qquad (17)$$</p><p>normalized by the number of objects, where I(&#8226;) is the indicator function which returns 1 if the condition is true and 0 if it is false. In the case where our PDFs are properly calibrated and have the expected coverage provided by any associated confidence interval, each CDF draw $\hat{u}_i \sim {\rm Unif}(0, 1)$ will be uniformly distributed from 0 to 1 and $\hat{U}(x)$ will approximately define a straight line from 0 to 1. 
We show the eCDF results for our photo-z PDFs in Figure <ref type="figure">7</ref>. These confirm that our PDFs are relatively robust and internally well-calibrated.</p></div>
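<div xmlns="http://www.tei-c.org/ns/1.0"><p>The calibration test described above can be sketched for gridded PDFs as follows. This is an illustrative sketch only: the function names cdf_at_truth and ecdf are hypothetical, the integral is approximated by a simple sum over grid points, and the eCDF is normalized by the number of objects.</p>

```python
def cdf_at_truth(pdf, z_grid, z_true):
    """CDF of a gridded PDF evaluated at the true redshift (simple grid sum)."""
    total = sum(pdf)
    below = sum(p for p, z in zip(pdf, z_grid) if z_true >= z)
    return below / total

def ecdf(u_values, x):
    """Fraction of CDF values u_i that do not exceed x."""
    return sum(1 for u in u_values if x >= u) / len(u_values)

# A flat PDF on four grid points, with the true redshift between points 2 and 3.
u_example = cdf_at_truth([1, 1, 1, 1], [0.0, 1.0, 2.0, 3.0], z_true=1.5)

# Perfectly calibrated toy case: CDF values evenly spread over (0, 1).
u_uniform = [(i + 0.5) / 100 for i in range(100)]
line = [ecdf(u_uniform, x) for x in (0.25, 0.5, 1.0)]
```

<p>For well-calibrated PDFs the eCDF tracks the diagonal, as in the toy case above; PDFs that are too narrow or too broad bend the curve away from it.</p></div>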
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Application: Estimating Spectroscopic Incompleteness</head><p>We now turn our attention to the motivating issue behind the development of the photo-z framework outlined in &#167;4 by investigating the distribution of Fphot and Pphot (see &#167;4.5.2) within our HSC-SSP photo-z's.</p><p>We show the distribution of Fphot and Pphot as a function of magnitude in Figure <ref type="figure">8</ref>. The results are as expected: many-band photo-z's in our training sample make up an increasingly large fraction of neighbors and contribute an increasing amount to the photo-z PDFs at fainter magnitudes. We will use Fphot and Pphot to investigate the robustness of weak lensing measurements as a function of spectroscopic incompleteness (see &#167;7). The photo-z's of objects at i &gt; 23 are trained almost entirely on the many-band photo-z's in our training sample, while those at brighter magnitudes i &lt; 23 tend to be trained on spec-z's and g/prism-z's. This transition happens more smoothly in Fphot than Pphot because the exponential nature of the likelihood tends to strongly favor a few photometric neighbors over others (see &#167;4.2). This behavior is most apparent at brighter magnitudes, where even though &#8764; 15% of neighbors come from many-band photo-z's, they tend to contribute very little to the overall photo-z prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">LENSING METHODOLOGY</head><p>We now outline the methodology we will use to stack, compute, and compare our gg lensing signals based on the photo-z PDFs illustrated in &#167;5. We describe our basic computation of &#8710;&#931; in &#167;6.1 and our treatment of the bias/dilution factors in &#167;6.2. We outline the approach used to compare gg lensing signals between two samples in &#167;6.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Computing the Galaxy-Galaxy Lensing Signal</head><p>Our computation of the lensing observable, &#8710;&#931;, follows the methodology of <ref type="bibr">Singh et al. (2017)</ref>. We use the code dsigma<ref type="foot">foot_4</ref> , which was specifically written for computing gg lensing signals for HSC-SSP.</p><p>We compute &#8710;&#931; as a function of physical radius R as</p><p>where &#8710;&#931;L is the stacked signal around lens galaxies, &#8710;&#931;R is the stacked profile around a much larger number of random positions, and f bias is a correction factor (see &#167;6.2). The &#8710;&#931; profile for both lenses and randoms are computed as follows:</p><p>where R Ls indicates a sum over all lens-source pairs with separation R. When computing &#8710;&#931;R(R), we replace R Ls with R Rs since we instead sum over all random-source pairs. In Equation <ref type="formula">19</ref>, &#947;t is the tangential shear of a source galaxy:</p><p>where e1, e2, and &#966; are the two shear components and the angle from the direction of right ascension to the lenssource direction in sky coordinates measured by the HSC-SSP pipeline <ref type="bibr">(Mandelbaum 2018;</ref><ref type="bibr">Bosch et al. 2018)</ref>. &#931;crit is the critical surface mass density:</p><p>which is computed using the angular diameter distance between the source and observer DA(zs), lens and observer DA(z l ), and source and lens DA(zL, zs). Each source galaxy is weighted by:</p><p>where &#963;rms is the intrinsic shape dispersion per component and &#963;e,Ls is the per-component shape measurement error (see <ref type="bibr">Mandelbaum et al. 2018)</ref>. 
R(R) is the shear responsivity factor<ref type="foot">foot_5</ref> that describes the response of galaxy ellipticity to a small amount of shear:</p><p>We compute and apply R independently for each radial bin.</p><p>The [1+K(R)] term is a correction for the multiplicative shear bias m:</p><p>where ms is a per-source value that is calibrated using simulations. Please see <ref type="bibr">Mandelbaum et al. (2018)</ref> for details about the calibration of the HSC-SSP weak lensing catalog.</p><p>In this work, we use $10^5$ random points to compute &#8710;&#931;R, sampled following the HSC-SSP S16A survey geometry. Random points are assigned redshifts following the redshift distribution of lenses. Although <ref type="bibr">Singh et al. (2017)</ref> use the boost factor (&#931; R Ls wLs/&#931; R Rs wRs) to correct for dilution effects (see also <ref type="bibr">Mandelbaum et al. 2005)</ref>, we do not apply any boost factor corrections here. Instead, in Section 7.4 we test whether or not the signal varies as we impose more stringent lens-source separation cuts.</p><p>We assume physical coordinates and compute &#8710;&#931; in 10 logarithmically spaced bins from 0.05 Mpc to 15 Mpc. Errors on all &#8710;&#931;-related quantities are computed via bootstrap resampling. The dsigma code divides lenses and randoms into roughly equal-area regions. Here we use 40 regions with typical sizes of &#8764; 2.5 deg and compute errors with NBs = 5000 bootstraps.</p></div>
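<div xmlns="http://www.tei-c.org/ns/1.0"><p>The region-based bootstrap mentioned above can be illustrated with a deliberately simplified sketch. It is not the dsigma implementation: it assumes one scalar measurement per region with equal weight, whereas a real estimator would recombine per-region weighted sums. The function name bootstrap_error and the toy values are hypothetical.</p>

```python
import random

def bootstrap_error(region_values, n_boot=5000, seed=0):
    """Standard deviation of the mean over bootstrap resamples of regions."""
    rng = random.Random(seed)
    n = len(region_values)
    means = []
    for _ in range(n_boot):
        # Draw n regions with replacement and average their values.
        sample = [region_values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    centre = sum(means) / n_boot
    var = sum((m - centre) ** 2 for m in means) / (n_boot - 1)
    return var ** 0.5

# Toy example: 40 regions, each with a noisy measurement of the same signal.
toy_rng = random.Random(1)
region_values = [toy_rng.gauss(10.0, 2.0) for _ in range(40)]
err = bootstrap_error(region_values)
```

<p>For 40 regions with a scatter of about 2, the bootstrap error on the mean comes out near 2 divided by the square root of 40, as expected.</p></div>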
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Corrections for Photometric Redshift Bias and Dilution Factors</head><p>Our procedure for estimating the bias on &#8710;&#931; arising from photo-z's partially follows that of <ref type="bibr">Mandelbaum et al. (2008)</ref>, <ref type="bibr">Nakajima et al. (2012), and</ref><ref type="bibr">Leauthaud et al. (2017)</ref>. We summarize our approach here.</p><p>To correct for biases in &#8710;&#931; arising from photo-z errors, a common procedure is to use a set of galaxies with spectroscopic redshifts that have been re-weighted with appropriate color-magnitude weights (see &#167;4.4) to match the source distribution. However, as shown in Figures <ref type="figure">2</ref> and <ref type="figure">8</ref>, the currently available spec-z's in our training set are so underrepresented in some regions of color-magnitude space that it is impossible to properly re-weight them to match the source sample.</p><p>For this reason, instead of a spectroscopic redshift catalog, we build a calibration catalog based on the set of many-band photo-z's in the COSMOS field matched with observations taken at HSC-SSP Wide depths with weak lensing cuts applied (see &#167;2.2). Our assumption here is that the COSMOS many-band photo-z's have narrow enough PDFs that they can be used to compute biases on &#8710;&#931; for HSC. We refer to this catalog hereafter as the COSMOS calibration sample.</p><p>We compute the bias on &#8710;&#931; as follows. Let &#8710;&#931;P (&#931;crit,P) represent the value of &#8710;&#931; measured with photo-z's and &#8710;&#931;T (&#931;crit,T) represent the true value of &#8710;&#931;. We define f bias &#8801; &#8710;&#931;T/&#8710;&#931;P and estimate it via:</p><p>$$\frac{\sum_{Ls} w_{Ls}\,\left(\Sigma_{{\rm crit,T},Ls}/\Sigma_{{\rm crit,P},Ls}\right)}{\sum_{Ls} w_{Ls}}$$</p><p>where the sum is performed over source galaxies drawn from the COSMOS calibration sample. Unlike other versions of this equation (e.g. Equation A3 in <ref type="bibr">Leauthaud et al. 
2017)</ref> there is no re-weighting factor to account for color mismatches between the source sample and the calibration sample because our COSMOS calibration sample is already representative.</p><p>For a given lens sample, we estimate f bias using Monte Carlo methods by randomly drawing sources from our COSMOS calibration catalog and lens redshifts from the lens sample. We correct all &#8710;&#931; values reported hereafter using f bias . This accounts for the dilution from sources that scatter above zL but which are actually located at redshifts below zL.</p><p>More explicitly, there are three issues: the impact of photo-z scatter and bias for sources that are above the lens redshift, dilution due to sources that are below the lens redshift but get scattered above it due to photo-z error, and dilution due to physically-associated sources. Our approach corrects for the first two of these, but not the third.</p><p>For our signals, typical values for f bias are around a few percent (&#8764; 2-5%).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Comparing Lensing Signals</head><p>One of the primary concerns in this work is the robustness of the gg lensing signal with respect to possible photo-z biases. Since an absolute calibration does not exist, we instead aim to demonstrate the robustness of the signal to various cuts and choices for lens-source separation. We quantify this by considering the ratio of &#8710;&#931; for two different computations i and j:</p><p>$$f_{i,j} \equiv \Delta\Sigma_i / \Delta\Sigma_j. \qquad (26)$$</p><p>This ratio test assumes that when we change how we calculate the gg lensing signals (by, e.g., tweaking the source sample selection or the redshift estimator), the true &#8710;&#931;(R) should be the same (i.e. &#8710;&#931;i(R) = &#8710;&#931;j(R) for all R). This relies on the assumption that &#8710;&#931;(R) does not vary much across the sample within the lens redshift bins. In other words, we assume that changing the source sample in a way that emphasizes different redshifts within the lens sample does not meaningfully change &#8710;&#931;(R) given the same photo-z quality across both source samples.</p><p>While it is straightforward to take the ratio between two lensing signals with different source cuts, &#8710;&#931;i and &#8710;&#931;j will be highly correlated. To deal with this effect, we derive the covariance matrix for f via bootstrap resampling using the same bootstrap regions as described previously.</p><p>We assume that fi,j is a constant (we only consider amplitude changes) and solve for the maximum-likelihood estimate (MLE). We fit for amplitude shifts over our full radial range (denoted f all ) and also over the radial range R = [0.1, 1] Mpc (denoted finner) and R = [1, 10] Mpc (denoted fouter).</p></div>
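<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked illustration of fitting a constant amplitude, suppose (as an assumption made here, not a statement of the exact procedure used) that the binned ratios $\mathbf{f}$ have Gaussian errors described by the bootstrap covariance matrix $\mathbf{C}$. Maximizing the likelihood of a single constant $f$ then gives the standard generalized least-squares result, where $\mathbf{1}$ is a vector of ones over the radial bins included in the fit:</p>

```latex
\hat{f} = \frac{\mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{f}}
               {\mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{1}},
\qquad
\sigma_{\hat{f}}^{2} = \frac{1}{\mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{1}}
```

<p>For a diagonal covariance this reduces to the familiar inverse-variance weighted mean of the ratios in each radial bin.</p></div>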
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">RESULTS: HOW ROBUST IS THE GALAXY-GALAXY LENSING SIGNAL?</head><p>We now investigate how robust gg lensing signals are to various photo-z estimators and quality cuts. After exploring changes in the gg lensing sensitivity to a variety of choices ( &#167;7.2- &#167;7.6), we subsequently use the results of those choices to define a "fiducial" sample (see &#167;7.7). Our results are presented in terms of the stability of the gg lensing signal based on other possible choices with respect to our fiducial sample.</p><p>The various cuts that we test comprise 15 unique lens-source samples. These are described in detail below and summarized in Table <ref type="table">1</ref>.</p><p>Table <ref type="table">1</ref>. Redshift estimates and selection criteria used to construct robustness tests for galaxy-galaxy lensing. V indicates that the quantity is varied for a particular test, while F indicates that it is kept fixed. These comprise a total of 15 unique lens-source samples. Columns: the tests of &#167;7.2, &#167;7.3, &#167;7.4, &#167;7.5, and &#167;7.6.</p><p>Since we perform a large number of tests (3 &#215; 3 &#215; 14 = 126) in the subsequent sections, many often correlated, we might expect by pure statistical chance that some of the results reported might have deviated from the expected null result by a large amount. We attempt to control against these in two ways. First, since for each lens sample we compute f all , finner, and fouter under 14 configurations, we expect at most one test to display outliers at &gt; 3&#963; significance. We thus adopt a 3&#963; threshold as reasonably indicative of a significant deviation. In addition, we also compare the distribution of our error-normalized f /&#963; f values to those expected under a Gaussian distribution using a Kolmogorov-Smirnov (KS) test. While the sample size is small, we find that for all cases our results are inconsistent with a Gaussian distribution, with the results driven primarily by outliers.</p></div>
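<div xmlns="http://www.tei-c.org/ns/1.0"><p>The expected number of chance outliers behind the 3-sigma threshold can be checked with a few lines of arithmetic. The sketch below assumes independent, Gaussian-distributed tests, which is an idealization since many of the tests are correlated; the variable names are hypothetical.</p>

```python
import math

def two_sided_tail(n_sigma):
    """Probability that a Gaussian variable deviates by more than n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

p3 = two_sided_tail(3.0)        # about 0.0027
n_per_lens = 3 * 14             # three radial fits times 14 configurations
n_total = 3 * n_per_lens        # three lens redshift bins
expected_per_lens = n_per_lens * p3
expected_total = n_total * p3
```

<p>Under these assumptions the expected number of 3-sigma outliers is well below one, both per lens sample and across all 126 tests.</p></div>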
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Lens Sample</head><p>We use all galaxies with spectroscopic redshifts (both the LOW-Z and CMASS samples) from the Sloan Digital Sky Survey (SDSS) II and III Baryon Acoustic Oscillation Survey (BOSS) <ref type="bibr">(Abazajian et al. 2009;</ref><ref type="bibr">Eisenstein et al. 2005</ref>) that overlap with the HSC-SSP survey footprint. We apply the same geometric masks to the lens sample that were used when constructing the source sample (see &#167;2.2).</p><p>To explore the stability of the lensing signal as a function of redshift, we group the lens population into three separate redshift bins:</p><p>They contain &#8776; 4000, 12000, and 4000 lenses, respectively. All the tests described below and summarized in Table <ref type="table">1</ref> were computed for each redshift bin, leading to a total of 45 &#8710;&#931;(R) measurements. The gg lensing signal for our fiducial sample (see &#167;7.7) in each redshift bin is shown in Figure <ref type="figure">9</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Photometric Redshift Point Estimates</head><p>In lensing analyses, &#8710;&#931; is often computed with respect to fixed point estimates derived from the photo-z PDFs to avoid having to integrate over all photo-z PDFs P(z|g).</p><p>We study whether or not the particular choice of a point estimate impacts the gg lensing signal. We compare five point estimates in this paper:</p><p>&#8226; zmean, the first moment (mean) of the photo-z PDF,</p><p>&#8226; z med , the 50th percentile (median) of the photo-z PDF,</p><p>&#8226; z mode , the redshift corresponding to the maximum value of the photo-z PDF,</p><p>&#8226; z best , the redshift estimator that minimizes the loss assuming a Lorentzian kernel in &#8710;z/(1 + z) with a width of &#963; = 0.15 (see <ref type="bibr">Tanaka et al. 2018</ref> for additional details), and</p><p>&#8226; zmc, a Monte Carlo draw from the photo-z PDF.</p><p>A comparison with integrating over the PDF is beyond the scope of this work and is discussed further in <ref type="bibr">More et al. (in prep.)</ref>. Most of these point estimates have been used to varying degrees in weak lensing analyses in the literature, each with various benefits and drawbacks. Here we will briefly outline the arguments for each estimator (see also <ref type="bibr">Tanaka et al. 2018)</ref>.</p><p>Although a derivation is beyond the scope of this paper, it is well known that the mean estimate zmean is the optimal point estimate for a PDF assuming "squared error" (L2) loss. In other words, if we introduce a penalty proportional to $(z_{\rm est} - z_{\rm true})^2$ and assume ztrue follows our PDF, then zest = zmean is the estimator that is "best" given the PDF. This particular result is optimal for Gaussian distributions.</p><p>In general, however, most photo-z PDFs are not Gaussian, but instead can have asymmetric tails and/or extended shapes. 
The mean zmean is particularly sensitive to these tails, and so estimates that are more "robust" are sometimes preferred. As with the mean, it can likewise be shown that the median z med is the optimal point estimate under "absolute" (L1) loss where the penalty is proportional to |zest - ztrue|. This reduced penalty makes z med less sensitive to the tails. The mode z mode can likewise be shown to be the optimal point estimate under "unforgiving" loss (L0) where the penalty is maximized and constant for all $z_{\rm est} \neq z_{\rm true}$. This penalty makes z mode only sensitive to the peak of the PDF where the probability is maximized.</p><p>While these various estimators are optimal under different assumptions for how much we want to penalize "incorrect" guesses, none of them are specifically tuned for photo-z estimation. In particular, most PDFs and photo-z applications tend to have a dependence on |zest - ztrue|/(1 + ztrue) rather than just |zest - ztrue|, and also care about being accurate relative to a given "tolerance" &#963;. z best is the point estimate that minimizes the loss relative to these conditions.</p><p>Finally, we may want a point estimate that "explores" the entire PDF, rather than attempting to "summarize" it. Assuming "uniform" loss (i.e. a flat penalty everywhere), any Monte Carlo sample zmc from the PDF serves as a reasonable point estimate. These may better capture the behavior of PDFs by allowing us to probe, e.g., the tails of the distribution but lead to some (additional) amount of random noise being introduced.</p><p>The &#8710;&#931; ratio estimates computed based on each of these various redshift point estimates with respect to our fiducial sample in each redshift bin are shown in the top two rows of <ref type="bibr">Figures 10,</ref><ref type="bibr">11,</ref><ref type="bibr">and 12</ref>. 
We find that, with the exception of zmc, all of these choices result in negligible (&#8818; 1%), albeit sometimes statistically significant (at 3&#963;), differences in the computed &#8710;&#931;. This is likely due to the general quality of our PDFs, which are reasonably well-constrained and unimodal for the majority of objects (see Figure <ref type="figure">6</ref>) and also well-calibrated against the expected underlying redshift distribution (Figure <ref type="figure">7</ref>), leading to very similar point estimates.</p><p>In general, using Monte Carlo redshifts zmc tends to lead to an underestimate of the &#8710;&#931; signal by an amount that increases with the lens redshift. This is due to the (exponentially) increasing sensitivity of the &#8710;&#931; signal at close lens-source separations as well as increasing photo-z uncertainties at higher redshifts. Since Monte Carlo redshifts scatter sources around based on their PDFs, they tend to dilute the computed signals relative to the more "stable" point estimates above.</p><p>Since there is no (relevant) statistical difference between the photo-z point estimates excluding zmc, we decide to use the z best estimate due to its superior performance relative to the other estimators when predicting redshifts for individual objects within the full HSC-SSP S16A Wide sample. For additional comparisons between the per-object accuracy of these photo-z point estimates, see <ref type="bibr">Tanaka et al. (2018)</ref>.</p></div>
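<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how the five point estimates can differ, the following minimal sketch evaluates them for a toy PDF tabulated on a uniform redshift grid. It is not the pipeline used in this work: the function name point_estimates, the grid, and the bimodal toy PDF are hypothetical, and the Lorentzian loss is assumed to take the form 1 - 1/(1 + (dz/width)^2) with dz = (zest - ztrue)/(1 + ztrue), which should be checked against Tanaka et al. (2018).</p>
<p><![CDATA[
```python
import numpy as np


def point_estimates(zgrid, pdf, width=0.15, rng=None):
    """Summarize a gridded photo-z PDF with five point estimates.

    Assumes a uniform grid, so normalized PDF values act as probability masses.
    """
    rng = np.random.default_rng() if rng is None else rng
    zgrid = np.asarray(zgrid, dtype=float)
    prob = np.asarray(pdf, dtype=float)
    prob = prob / prob.sum()
    cdf = np.cumsum(prob)

    zmean = float(np.sum(zgrid * prob))             # optimal under L2 loss
    zmed = float(zgrid[np.searchsorted(cdf, 0.5)])  # optimal under L1 loss
    zmode = float(zgrid[np.argmax(prob)])           # peak of the PDF

    # Assumed Lorentzian loss in dz = (z_est - z_true) / (1 + z_true).
    dz = (zgrid[:, None] - zgrid[None, :]) / (1.0 + zgrid[None, :])
    loss = 1.0 - 1.0 / (1.0 + (dz / width) ** 2)
    risk = loss @ prob          # expected loss for each candidate z_est
    ibest = int(np.argmin(risk))

    zmc = float(rng.choice(zgrid, p=prob))          # one Monte Carlo draw
    return {"mean": zmean, "median": zmed, "mode": zmode,
            "best": float(zgrid[ibest]), "risk": float(risk[ibest]),
            "mc": zmc}


# Toy bimodal PDF: 80% of the mass near z = 0.8, 20% near z = 2.5.
zgrid = np.linspace(0.0, 4.0, 801)
pdf = (0.8 * np.exp(-0.5 * ((zgrid - 0.8) / 0.05) ** 2) / 0.05
       + 0.2 * np.exp(-0.5 * ((zgrid - 2.5) / 0.10) ** 2) / 0.10)
est = point_estimates(zgrid, pdf, rng=np.random.default_rng(0))
print(est)
```
]]></p>
<p>For this toy PDF the mean is pulled toward the secondary peak, while the median, mode, and risk-minimizing estimate all stay close to the primary peak, which mirrors the sensitivity to tails discussed above.</p></div>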
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3">Photometric Redshift Uncertainties</head><p>Although our photo-z PDFs are well-calibrated with respect to the underlying redshift distribution (&#167;5.2), in general using point estimates to summarize broad PDFs can lead to distortions in the underlying redshift population <ref type="bibr">(Carrasco Kind &amp; Brunner 2014b)</ref>. In addition, broad photo-z PDFs where the probability density spans a large redshift range are generally seen as more unreliable, with more potential for miscalibrations that can lead to under/overestimated uncertainties on the prediction compared to narrower PDFs. As a result, it is common in many gg lensing analyses to remove "unreliable" photo-z's based on the width of their PDFs.</p><p>We define two main sources of uncertainty that contribute to unreliable PDFs. The first is systemic uncertainty: having a poor understanding of the object in question and therefore an unreliable redshift prediction. This can occur if the object is not well-represented within the training set, which leads to a large &#967;&#178; value when the object is compared to its closest color-magnitude neighbors. We can exploit this fact to flag and remove these sources explicitly.</p><p>The second source of uncertainty is statistical uncertainty: utilizing a point estimate that does not accurately represent the PDF. This can occur if the redshift PDF is overly broad or multi-modal with several possible redshift solutions. We quantify this source of uncertainty by defining the "risk" <ref type="bibr">(Tanaka et al. 2018</ref>) that the point estimate is incorrect as the integral over the PDF with respect to the associated loss</p><p>where the particular loss function</p><p>is taken to be a Lorentzian kernel with a width of &#947; = 0.15. The best redshift estimate and the associated risk z risk are then defined jointly as</p><p>Sources with higher z risk generally have broader PDFs with multiple peaks. 
See <ref type="bibr">Tanaka et al. (2018)</ref> for additional discussion.</p><p>We divide our sample into a number of sub-samples based on a range of photo-z quality cuts. These are:</p><p>&#8226; basic: &#967;&#178;&#8325; &#8804; 6. This is the &#967;&#178;&#8325; value corresponding to the 95% cut discussed in &#167;4.5 since P(&#967;&#178;&#8325; &#8804; 6) &#8776; 0.95 for a chi-square distributed random variable with 5 degrees of freedom. As expected, this removes &#8764; 5% of sources. The majority of these sources are at brighter magnitudes and lower redshifts and do not contribute significantly to the gg lensing signal.</p><p>&#8226; medium: In addition to basic, this selection also imposes a cut on the "risk" of a particular photo-z point estimate, z risk &lt; 0.25. This generally removes overly broad PDFs and leaves &#8764; 75% of the sample.</p><p>&#8226; strict: In addition to basic, this selection imposes a stricter cut of z risk &lt; 0.15, restricting our estimates to even narrower PDFs than medium. This leaves &#8764; 60% of the sample.</p><p>[Figure caption fragment: &#8230; fouter is displayed in the bottom-right corner along with a Gaussian distribution for reference. Most of the computed ratios are consistent with the expected null result at 3&#963;; those that disagree are highlighted in bold. Although these disagreements are statistically significant, some (e.g., with respect to z med) are negligible in practice since their impact is &#8818; 1%. In general, fouter is less biased for all samples than f and f inner. See &#167;7 for additional discussion and details.]</p><p>The &#8710;&#931; ratio estimates computed for each of these global photo-z quality cuts with respect to our fiducial sample are shown in the second row of Figures 10, 11, and 12. We find that the computed &#8710;&#931; signals appear insensitive to the global photo-z quality cut chosen, and are statistically consistent with the null result (at 3&#963;). 
As with &#167;7.2, this is likely due to the fact that our PDFs are both relatively well-constrained and well-calibrated for the majority of our sample, especially since any outlying PDFs are removed by our initial basic quality cuts.</p><p>Since our performance is similar across different global photo-z quality cuts, we opt to use our medium cut for our fiducial sample to compromise between sample size and PDF quality. </p></div>
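<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three nested quality selections can be expressed as boolean masks over per-object arrays. The sketch below is only illustrative: the function name photoz_quality_masks and the toy values are hypothetical, the thresholds are the ones quoted above (6 for the chi-square statistic, 0.25 and 0.15 for the risk), and treating the chi-square cut as inclusive is an assumption.</p>
<p><![CDATA[
```python
import numpy as np


def photoz_quality_masks(chi2, risk, chi2_max=6.0,
                         risk_medium=0.25, risk_strict=0.15):
    """Return boolean masks for the basic, medium, and strict selections."""
    chi2 = np.asarray(chi2, dtype=float)
    risk = np.asarray(risk, dtype=float)
    basic = chi2 <= chi2_max                 # goodness-of-fit cut (assumed inclusive)
    medium = basic & (risk < risk_medium)    # basic plus a cut on the risk
    strict = basic & (risk < risk_strict)    # basic plus a tighter risk cut
    return {"basic": basic, "medium": medium, "strict": strict}


# Toy catalog of five sources (made-up values for illustration only).
chi2 = np.array([1.2, 7.5, 3.0, 5.9, 0.4])
risk = np.array([0.10, 0.05, 0.20, 0.30, 0.14])
masks = photoz_quality_masks(chi2, risk)
kept = {name: int(mask.sum()) for name, mask in masks.items()}
print(kept)
```
]]></p>
<p>Because medium and strict are both built on top of basic, every source passing strict also passes medium, so the selections are nested by construction.</p></div>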
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.4">Lens-Source Separation</head><p>In addition to possible biases based on how the photo-z point estimates trace the underlying source population, it is also imperative to ensure that any possible differences we observe are not dominated by dilution effects or contamination from correlated objects (see &#167;6.2). This is often done by imposing cuts that aim to ensure that the bulk of any source galaxy PDF lies behind the lens population (e.g., <ref type="bibr">Medezinski et al. 2018</ref>; see also &#167;7.2).</p><p>We parameterize this cut using two parameters. The first is a summary statistic, z lowX, giving the redshift above which X% of the source galaxy PDF lies, which we use to establish how confidently we can place a source galaxy behind a given lens. In other words, X% of the photo-z PDF is above a redshift of z lowX, which should be greater than the redshift of the lens z lens. The second is a "buffer" &#8710;z lens that establishes a minimum separation threshold between the source z lowX and the lens z lens. This term is used to avoid being extremely sensitive to photo-z biases and possible miscalibrations in the corresponding PDFs, since &#931;crit depends highly non-linearly on redshift when zsource and z lens are very close together.</p><p>We test four lens-source separation cuts:</p><p>for X = 68% and 95% (roughly 1 and 2-sigma) and &#8710;z lens = 0.1 and 0.2, listed roughly in order from most aggressive to most conservative.</p><p>Our results are shown in the third and fourth rows of Figures 10, 11, and 12. We find that all cases are consistent (at 3&#963;) with the null result with the exception of the z low95 + 0.2 &gt; z lens cut for the zbin24 lens sample, which is smaller by &#8776; 1.5%. As the most conservative cut in the lowest redshift bin (which should be least sensitive to photo-z issues), it is somewhat surprising that we see a noticeable suppression. 
While it is possible that this is just statistical noise, it might also be the case that the photo-z's in the training set have systematic discrepancies at intermediate redshifts that are accentuated when only the low-z sources are removed.</p><p>In general, however, these results support the gg lensing signal being mostly insensitive to these specific combinations of lens-source separation cuts for our HSC-SSP S16A data. This implies that the majority of our sources are correctly selected to be behind the bulk of the lens sample, providing additional (indirect) support that our photo-z PDFs are well-calibrated. These results also suggest that our gg lensing signals do not require any boost factor corrections.</p><p>Given that these lens-source separation cuts perform comparably, we again opt to use a compromise for our fiducial sample by choosing X = 95% and &#8710;z lens = 0.1.</p></div>
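<div xmlns="http://www.tei-c.org/ns/1.0"><p>The summary statistic z lowX can be read off the cumulative distribution of a gridded PDF: if X% of the probability lies above z lowX, then (100 - X)% lies below it. The sketch below illustrates this under stated assumptions; the function name z_low, the uniform grid, and the Gaussian toy PDF are hypothetical and are not taken from the analysis code. The comparison against z lens with the buffer then proceeds as described in the text.</p>
<p><![CDATA[
```python
import numpy as np


def z_low(zgrid, pdf, frac):
    """Redshift above which a fraction `frac` of the gridded PDF lies."""
    zgrid = np.asarray(zgrid, dtype=float)
    prob = np.asarray(pdf, dtype=float)
    prob = prob / prob.sum()
    cdf = np.cumsum(prob)
    # A fraction `frac` above z_low means (1 - frac) of the mass below it.
    idx = np.searchsorted(cdf, 1.0 - frac)
    return float(zgrid[min(idx, len(zgrid) - 1)])


# Toy Gaussian PDF centred on z = 1 with a width of 0.1.
zgrid = np.linspace(0.0, 4.0, 801)
pdf = np.exp(-0.5 * ((zgrid - 1.0) / 0.1) ** 2)
z_low68 = z_low(zgrid, pdf, 0.68)
z_low95 = z_low(zgrid, pdf, 0.95)
print(z_low68, z_low95)
```
]]></p>
<p>Requiring a larger fraction of the PDF to lie above the threshold pushes z lowX to lower redshift, so for a fixed lens redshift the X = 95% statistic is harder to satisfy than the X = 68% one.</p></div>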
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.5">High and Low Redshift Sources</head><p>One additional concern is that our gg lensing analysis may be sensitive to degrading photo-z quality as a function of redshift. This is in general due to a combination of spectroscopic incompleteness at higher redshifts and fainter magnitudes as well as broader PDFs arising from noisier photometry (see &#167;3). This is a particularly acute concern for this work due to our reliance on many-band photo-z's at the magnitudes and redshifts probed by a significant majority of our weak lensing source galaxies.</p><p>To investigate this effect, we divide our source sample into low-redshift and high-redshift samples defined by:</p><p>&#8226; zlow: z best &#8804; 1, which leaves &#8764; 50% of the sample.</p><p>&#8226; zhigh: z best &gt; 1, which leaves &#8764; 50% of the sample.</p><p>The results for our high and low-redshift samples are shown in the fourth row of Figures 10, 11, and 12. Although we find the &#8710;&#931; signals from zlow to be systematically higher than those from zhigh, the effect is not statistically significant and both agree with the null result (at 3&#963;). This provides us with confidence that we can utilize photo-z's for source galaxies at all redshifts when constructing our fiducial sample.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.6">Origin of Training Redshifts</head><p>One benefit of the KMCkNN framework outlined in &#167;4 is that we actually have a direct proxy of spectroscopic incompleteness through metrics such as Fphot and Pphot, in addition to indirect proxies such as the high/low redshift split used in &#167;7.5. This allows us to examine how robust our gg lensing signals are depending on the information content used to estimate the photo-z's of individual source galaxies.</p><p>One complication of using Pphot to select source galaxies directly is that it tends to be strongly correlated with magnitude and redshift, with sources at lower redshifts and brighter magnitudes tending to also have lower Pphot. We attempt to alleviate this issue by limiting our analysis to the zhigh subset of galaxies (z best &gt; 1). While this substantially reduces the sample (by &#8764; 50%), it mitigates some of the extreme differences that can arise due to these effects.</p><p>As a compromise between preserving number density and maximizing differences between sources that are many-band photo-z-dominated versus those that are not, we ultimately split our source galaxies into three subsamples based on spec-z and g/prism-z information content:</p><p>&#8226; "Great": Pphot &lt; 0.5 (i.e. &gt; 50% of information comes from spec-z's and g/prism-z's), which leaves &#8764; 10% of the sample.</p><p>&#8226; "Moderate": 0.5 &#8804; Pphot &lt; 0.85 (moderately photo-z-dominated), which leaves &#8764; 15% of the sample.</p><p>&#8226; "Poorest": Pphot &#8805; 0.85 (completely photo-z-dominated), which leaves &#8764; 25% of the sample.</p><p>The results for these subsamples are shown in the bottom two rows of <ref type="bibr">Figures 10,</ref><ref type="bibr">11,</ref><ref type="bibr">and 12</ref>. We find these signals are entirely consistent with the null result (at 3&#963;). This demonstrates that our gg lensing signals are stable to the origin of the training redshifts (e.g. 
spec-z, photo-z, etc.) used to compute redshifts for source galaxies. We note, however, that the spec-z samples used to train our photo-z's tend to have targeted very specific populations of galaxies even at higher redshift compared with the broader photometric sample (see &#167;3). This can lead to low Pphot serving as a proxy for selecting galaxy samples in particular regions of color-magnitude space (and thus having different intrinsic properties). These changes in the underlying galaxy population could mask some of the expected impacts from many-band photo-z's alone and make it difficult to extrapolate conclusions beyond this current work.</p><p>In general, as we find that the computed &#8710;&#931; signals are consistent with null results across all lens redshift and Pphot subsamples, we opt to include all sources when constructing our fiducial catalog.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.7">Fiducial Lensing Cuts</head><p>Based on the results above, we now define the fiducial sample that all other samples are compared to in Figures 10, 11, and 12. Our reasoning is as follows:</p><p>&#8226; All point estimates (excluding zmc) investigated in this work give gg lensing signals with similar amplitudes. We thus opt to use the z best point estimates (&#167;7.2) given their improved performance across the broader photometric sample as outlined in <ref type="bibr">Tanaka et al. (2018)</ref>.</p><p>&#8226; Since the three global photo-z quality cuts (basic, medium, and strict) give similar &#8710;&#931; estimates, we select the medium photo-z quality cuts (&#167;7.3) as a fiducial choice. This represents a compromise between retaining a larger sample size and removing overly broad photo-z PDFs.</p><p>&#8226; All four combinations of lens-source separation cuts give gg lensing signals that are consistent with each other. As with the global photo-z cuts, we again compromise by selecting z low95 + 0.1 &gt; z lens (&#167;7.4), which guarantees the vast majority of the PDF is located behind the lens while being slightly less conservative about the enforced &#8710;z lens separation.</p><p>&#8226; Our tests over high (z best &#8805; 1) and low (z best &lt; 1) subsamples of source galaxies do not show any sign of distortion by photo-z biases arising from changing populations of objects in our training data (&#167;7.5). To maximize sample size, we thus opt to use galaxies at all available redshifts.</p><p>&#8226; Finally, our tests using subsamples binned by Pphot at z best &gt; 1 also do not find evidence for differences in &#8710;&#931; among the varying subsamples (&#167;7.6). As a result, we opt to include all photo-z's regardless of their spectroscopic information content.</p><p>These cuts are implemented as defaults in dsigma.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">CONCLUSION</head><p>Determining accurate photometric redshifts (photo-z's) remains a key challenge for deep lensing surveys such as HSC-SSP and LSST. At the depths probed by HSC-SSP, there remains a dearth of spectroscopic redshifts available for training, validating, and testing photo-z methods across the colors and magnitudes covered by weak lensing photometric samples. To reach the coverage required to compute photo-z's for these objects, the HSC-SSP photo-z team constructed a heterogeneous training set derived from an amalgamation of public spec-z and g/prism-z surveys along with photo-z's derived from deep, many-band COSMOS data.</p><p>Since mixing spec-z's and high-quality alternatives (g/prism-z's, photo-z's) will likely occur in future surveys, in this paper we sought to thoroughly investigate their impact on gg lensing analyses through a variety of methods. Our conclusions are as follows:</p><p>(i) Using Self-Organizing Maps (SOMs), we examine the color/magnitude-space coverage of our HSC-SSP training data relative to the HSC-SSP S16A weak lensing photometric sample (&#167;3). We find that, as expected, our spec-z coverage is highly non-representative relative to the overall sample, with the majority of our redshift information for "typical" galaxies in the weak lensing photometric sample coming from many-band photo-z's.</p><p>(ii) We then investigated whether current spectroscopic survey strategies, which seek to systematically fill in underpopulated regions of color space, can resolve this problem (&#167;3.2). We find that the assumption that the intrinsic redshift distribution at fixed color is constant as a function of magnitude does not always hold. This mismatch implies that certain regions of color space will likely require spec-z's that also probe the magnitude distribution of future weak lensing samples, complicating current efforts. 
This effect is in addition to redshift-dependent success rates at fixed color and magnitude.</p><p>(iii) Based on these results, in &#167;4 we develop a hybrid machine learning/Bayesian framework for tracking how subsets of galaxies in our training set contribute to individual photo-z predictions explicitly as a function of magnitude. We show in &#167;5 that our approach gives reasonable photo-z predictions and well-calibrated PDFs.</p><p>(iv) Using our fits, we are able to define metrics to reject objects that are poorly represented by the training data and further identify how "reliable" results are based on how significantly many-band photo-z's contribute to the derived PDFs (&#167;5.2). The results imply that photo-z's computed for objects with i &#8818; 23 tend to be spec-z-dominated, while those at i &#8819; 23 tend to be photo-z-dominated.</p><p>(v) Finally, using the full sample of LOWZ and CMASS BOSS galaxies, we investigate the impact that various photo-z estimators, quality cuts, lens-source separation constraints, redshift subsamples, and spectroscopic information content can have on gg lensing signals. We find that most cases give results that are consistent with a fiducial baseline sample, indicating that biases in the gg lensing signal due to photo-z bias and scatter are sub-dominant to statistical uncertainties in the HSC-SSP S16A weak lensing data.</p><p>Although we do not find cause for concern in the analysis presented here, we hope that these methods can be used in future work to investigate similar issues when dealing with larger, deeper, and more complex samples leading up to future precision cosmology-oriented surveys such as LSST.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>MNRAS 000, 1-21 (2018)</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1"><p>The shift in the mean can be due to asymmetric scattering caused by secondary redshift solutions and changing number densities in color space.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2"><p>This does not address the secondary issue of redshift-dependent failure rates at a fixed magnitude and color, which will also bias the resulting sample.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3"><p>Available online at github.com/joshspeagle/frankenz.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4"><p>Available online at https://github.com/dr-guangtou/dsigma.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5"><p>For &#8710;&#931;, we have verified that the weighting applied to R should include the &#931;crit&#8315;&#178; factor as defined in Equation 22.</p></note>
		</body>
		</text>
</TEI>
