

Title: The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data
ABSTRACT

Several recent works have focused on the search for bright, high-z quasars (QSOs) in the South. Among them, the QUasars as BRIght beacons for Cosmology in the Southern hemisphere (QUBRICS) survey has now delivered hundreds of new spectroscopically confirmed QSOs selected by means of machine learning algorithms. Building upon the results obtained by introducing the probabilistic random forest (PRF) for the QUBRICS selection, we explore in this work the feasibility of training the algorithm on synthetic data to improve the completeness in the higher redshift bins. We also compare the performance of the algorithm when colours are used as primary features instead of magnitudes. We generate synthetic data based on a composite QSO spectral energy distribution. We first train the PRF to identify QSOs among stars and galaxies, then to separate high-z quasars from low-z contaminants. We apply the algorithm to an updated data set, based on SkyMapper DR3, combined with Gaia eDR3, 2MASS, and WISE magnitudes. We find that employing colours as features slightly improves the results with respect to the algorithm trained on magnitude data. Adding synthetic data to the training set provides significantly better results with respect to the PRF trained only on spectroscopically confirmed QSOs. We estimate, on a testing data set, a completeness of ∼86 per cent and a contamination of ∼36 per cent. Finally, 206 PRF-selected candidates were observed: 149/206 turned out to be genuine QSOs with z > 2.5, 41 were QSOs with z < 2.5, 3 were galaxies, and 13 were stars. The result confirms the ability of the PRF to select high-z quasars in large data sets.
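The follow-up counts quoted in the abstract can be turned into an observed success rate with a few lines of arithmetic. The sketch below is illustrative only: it assumes the conventional definitions of completeness (the fraction of true targets that are recovered) and contamination (the fraction of selected candidates that are not targets), which may differ in detail from the definitions adopted in the paper. The function and variable names are hypothetical. Only the four counts come from the abstract.

```python
# Illustrative sketch; the definitions below are the conventional ones and are
# an assumption, not necessarily those used by the authors.

def completeness(true_pos: int, false_neg: int) -> float:
    """Fraction of genuine targets that the selection recovers."""
    return true_pos / (true_pos + false_neg)


def contamination(true_pos: int, false_pos: int) -> float:
    """Fraction of selected candidates that are not genuine targets."""
    return false_pos / (true_pos + false_pos)


# Spectroscopic follow-up counts quoted in the abstract.
followup = {
    "qso_z_gt_2p5": 149,  # genuine high-z QSOs (the targets)
    "qso_z_lt_2p5": 41,   # QSOs below the redshift threshold
    "galaxy": 3,
    "star": 13,
}

n_observed = sum(followup.values())
n_target = followup["qso_z_gt_2p5"]
n_contaminant = n_observed - n_target

success_rate = n_target / n_observed
observed_contamination = contamination(n_target, n_contaminant)

print(f"observed candidates:    {n_observed}")
print(f"success rate:           {success_rate:.1%}")
print(f"observed contamination: {observed_contamination:.1%}")
```

Completeness cannot be derived from these counts alone, because the follow-up only observes selected candidates and so gives no handle on the targets that were missed. That is why the abstract quotes completeness from a separate testing data set.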

 
NSF-PAR ID:
10376052
Publisher / Repository:
Oxford University Press
Journal Name:
Monthly Notices of the Royal Astronomical Society
Volume:
517
Issue:
2
ISSN:
0035-8711
Page Range / eLocation ID:
p. 2436-2453
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT We present two catalogues of active galactic nucleus (AGN) candidates selected from the latest data of two all-sky surveys – Data Release 2 of the Gaia mission and the unWISE catalogue of the Wide-field Infrared Survey Explorer (WISE). We train a random forest classifier to predict the probability of each source in the Gaia–unWISE joint sample being an AGN, $P_{\rm RF}$, based on Gaia astrometric and photometric measurements and unWISE photometry. The two catalogues, which we designate C75 and R85, are constructed by applying different $P_{\rm RF}$ threshold cuts to achieve an overall completeness of 75 per cent (≈90 per cent at Gaia G ≤ 20 mag) and reliability of 85 per cent, respectively. The C75 (R85) catalogue contains 2 734 464 (2 182 193) AGN candidates across the effective 36 000 deg$^2$ sky, of which ≈0.91 (0.52) million are new discoveries. Photometric redshifts of the AGN candidates are derived by a random forest regressor using Gaia and WISE magnitudes and colours. The estimated overall photometric redshift accuracy is 0.11. Cross-matching the AGN candidates with a sample of known bright cluster galaxies, we identify a high-probability strongly lensed AGN candidate system, SDSS J1326+4806, with a large image separation of 21.06 arcsec. All the AGN candidates in our catalogues will have ∼5-yr long light curves from Gaia by the end of the mission, and thus will be a great resource for AGN variability studies. Our AGN catalogues will also be helpful in AGN target selections for future spectroscopic surveys, especially those in the Southern hemisphere. The C75 catalogue can be downloaded at https://www.ast.cam.ac.uk/~ypshu/AGN_Catalogues.html.
  2. ABSTRACT We introduce a probabilistic approach to select 6 ≤ $z$ ≤ 8 quasar candidates for spectroscopic follow-up, which is based on density estimation in the high-dimensional space inhabited by the optical and near-infrared photometry. Densities are modelled as Gaussian mixtures with principled accounting of errors using the extreme deconvolution (XD) technique, generalizing an approach successfully used to select lower redshift ($z$ ≤ 3) quasars. We train the probability density of contaminants on 1 902 071 7-d flux measurements from the 1076 deg$^2$ overlapping area from the Dark Energy Camera Legacy Survey (DECaLS) ($z$), VIKING ($YJHK_s$), and unWISE (W1, W2) imaging surveys, after requiring that they drop out of DECaLS g and r, whereas the distribution of high-$z$ quasars is trained on synthetic model photometry. Extensive simulations based on these density distributions and current estimates of the quasar luminosity function indicate that this method achieves a completeness of ≥56 per cent and an efficiency of ≥5 per cent for selecting quasars at 6 < $z$ < 8 with $J_{\rm AB}$ < 21.5. Among the classified sources are 8 known 6 < $z$ < 7 quasars, of which 2/8 are selected, suggesting a completeness of ≃25 per cent, whereas classifying the 6 known ($J_{\rm AB}$ < 21.5) quasars at $z$ > 7 from the entire sky, we select 5/6, or a completeness of ≃80 per cent. The failure to select the majority of 6 < $z$ < 7 quasars arises because our quasar density model is based on an empirical quasar spectral energy distribution model that underestimates the scatter in the distribution of fluxes. This new approach to quasar selection paves the way for efficient spectroscopic follow-up of Euclid quasar candidates with ground-based telescopes and the James Webb Space Telescope.
  3. ABSTRACT We present improved results of the measurement of the correlation between galaxies and the intergalactic medium transmission at the end of reionization. We have gathered a sample of 13 spectroscopically confirmed Lyman-break galaxies (LBGs) and 21 Lyman-α emitters (LAEs) at angular separations 20 arcsec ≲ θ ≲ 10 arcmin (∼0.1–4 pMpc at z ∼ 6) from the sightlines to eight background z ≳ 6 quasars. We report for the first time the detection of an excess of Lyman-α transmission spikes at ∼10–60 cMpc from LAEs (3.2σ) and LBGs (1.9σ). We interpret the data with an improved model of the galaxy–Lyman-α transmission and two-point cross-correlations, which includes the enhanced photoionization due to clustered faint sources, enhanced gas densities around the central bright objects and spatial variations of the mean free path. The observed LAE(LBG)–Lyman-α transmission spike two-point cross-correlation function (2PCCF) constrains the luminosity-averaged escape fraction of all galaxies contributing to reionization to $\langle f_{\rm esc} \rangle _{M_{\rm UV}\lt -12} = 0.14_{-0.05}^{+0.28}\, (0.23_{-0.12}^{+0.46})$. We investigate if the 2PCCF measurement can determine whether bright or faint galaxies are the dominant contributors to reionization. Our results show that a contribution from faint galaxies ($M_{\rm UV} \gt -20 \, (2\sigma)$) is necessary to reproduce the observed 2PCCF and that reionization might be driven by different sub-populations around LBGs and LAEs at z ∼ 6. 
  4. ABSTRACT Planck data provide precise constraints on cosmological parameters when assuming the base ΛCDM model, including a 0.17 per cent measurement of the age of the Universe, $t_0=13.797 \pm 0.023\, {\rm Gyr}$. However, the persistence of the ‘Hubble tension’ calls the base ΛCDM model’s completeness into question and has spurred interest in models such as early dark energy (EDE) that modify the assumed expansion history of the Universe. We investigate the effect of EDE on the redshift–time relation z↔t and find that it differs from the base ΛCDM model by at least ≈4 per cent at all t and z. As long as EDE remains observationally viable, any inferred t ← z or z ← t quoted to a higher level of precision do not reflect the current status of our understanding of cosmology. This uncertainty has important astrophysical implications: the reionization epoch – 10 > z > 6 – corresponds to disjoint lookback time periods in the base ΛCDM and EDE models, and the EDE value of $t_0 = 13.25 \pm 0.17$ Gyr is in tension with published ages of some stars, star clusters, and ultrafaint dwarf galaxies. However, most published stellar ages do not include an uncertainty in accuracy (due to, e.g. uncertain distances and stellar physics) that is estimated to be ∼7–10 per cent, potentially reconciling stellar ages with $t_{0,\rm EDE}$. We discuss how the big data era for stars is providing extremely precise ages (<1 per cent) and how improved distances and treatment of stellar physics such as convection could result in ages accurate to 4–5 per cent, comparable to the current accuracy of t↔z. Such precise and accurate stellar ages can provide detailed insight into the high-redshift Universe independent of a cosmological model.
  5. ABSTRACT This paper presents a survey of Mg ii absorbing gas in the vicinity of 380 random galaxies, using 156 background quasi-stellar objects (QSOs) as absorption-line probes. The sample comprises 211 isolated (73 quiescent and 138 star-forming galaxies) and 43 non-isolated galaxies with sensitive constraints for both Mg ii absorption and H α emission. The projected distances span a range from d = 9 to 497 kpc, redshifts of the galaxies range from z = 0.10 to 0.48, and rest-frame absolute B-band magnitudes range from $M_B$ = −16.7 to −22.8. Our analysis shows that the rest-frame equivalent width of Mg ii, $W_r(2796)$, depends on halo radius ($R_{\rm h}$), B-band luminosity ($L_B$), and stellar mass ($M_{\rm star}$) of the host galaxies, and declines steeply with increasing d for isolated, star-forming galaxies. At the same time, $W_r(2796)$ exhibits no clear trend for either isolated, quiescent galaxies or non-isolated galaxies. In addition, the covering fraction of Mg ii absorbing gas 〈κ〉 is high with 〈κ〉 ≳ 60 per cent at d < 40 kpc for isolated galaxies and declines rapidly to 〈κ〉 ≈ 0 at d ≳ 100 kpc. Within the gaseous radius, the incidence of Mg ii gas depends sensitively on both $M_{\rm star}$ and the specific star formation rate inferred from H α. Different from what is known for massive quiescent haloes, the observed velocity dispersion of Mg ii absorbing gas around star-forming galaxies is consistent with expectations from virial motion, which constrains individual clump mass to $m_{\rm cl} \gtrsim 10^5 \, \mathrm{M}_\odot$ and cool gas accretion rate of $\sim 0.7\!-\!2 \, \mathrm{M}_\odot \, \mathrm{yr}^{-1}$. Finally, we find no strong azimuthal dependence of Mg ii absorption for either star-forming or quiescent galaxies. Our results demonstrate that multiple parameters affect the properties of gaseous haloes around galaxies and highlight the need for a homogeneous, absorption-blind sample for establishing a holistic description of chemically enriched gas in the circumgalactic space.