Title: The scatter in the galaxy–halo connection: a machine learning analysis
ABSTRACT

We apply machine learning (ML), a powerful method for uncovering complex correlations in high-dimensional data, to the galaxy–halo connection in cosmological hydrodynamical simulations. The mapping between galaxy and halo variables is stochastic in the absence of perfect information, but conventional ML models are deterministic and hence cannot capture its intrinsic scatter. To overcome this limitation, we design an ensemble of neural networks with a Gaussian loss function that predict probability distributions, allowing us to model statistical uncertainties in the galaxy–halo connection as well as its best-fitting trends. We extract a number of galaxy and halo variables from the Horizon-AGN and IllustrisTNG100-1 simulations and quantify the extent to which knowledge of some subset of one enables prediction of the other. This allows us to identify the key features of the galaxy–halo connection and investigate the origin of its scatter in various projections. We find that while halo properties beyond mass account for up to 50 per cent of the scatter in the halo-to-stellar mass relation, the prediction of stellar half-mass radius or total gas mass is not substantially improved by adding further halo properties. We also use these results to investigate semi-analytic models for galaxy size in the two simulations, finding that assumptions relating galaxy size to halo size or spin are not successful.
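The abstract describes networks that output a probability distribution rather than a single value, so that both the best-fitting trend and the intrinsic scatter are predicted. A minimal sketch of that idea follows, assuming each network outputs a Gaussian mean and variance per target and that the ensemble is combined as an equally weighted mixture whose mean and variance are moment-matched. The function names and this combination rule are illustrative assumptions, not the authors' code; plain Python is used so the sketch has no dependencies.

```python
import math


def gaussian_nll(ys, mus, variances, eps=1e-6):
    """Mean Gaussian negative log-likelihood of targets ys under N(mu, var).

    Training on this loss makes a network predict a mean (the trend) and a
    variance (the scatter) instead of a single deterministic value.
    """
    total = 0.0
    for y, mu, var in zip(ys, mus, variances):
        var = max(var, eps)  # guard against zero or negative variance
        total += 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)
    return total / len(ys)


def combine_ensemble(member_mus, member_vars):
    """Merge per-member Gaussians into one Gaussian per input sample.

    member_mus and member_vars are lists of shape [n_members][n_samples].
    The ensemble is treated as an equally weighted mixture: its mean is the
    average member mean, and its variance is the mixture's second moment
    minus the squared mean, so disagreement between members adds variance.
    """
    n_members = len(member_mus)
    n_samples = len(member_mus[0])
    means, variances = [], []
    for j in range(n_samples):
        mus_j = [member_mus[k][j] for k in range(n_members)]
        vars_j = [member_vars[k][j] for k in range(n_members)]
        mean = sum(mus_j) / n_members
        second_moment = sum(v + m * m for m, v in zip(mus_j, vars_j)) / n_members
        means.append(mean)
        variances.append(second_moment - mean * mean)
    return means, variances


# Two hypothetical members: they agree on the second input, disagree on the first.
means, variances = combine_ensemble([[1.0, 2.0], [3.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
print(means, variances)
```

Here both members predict unit variance, yet the combined variance is larger for the first input because the members' means differ there; this is how an ensemble can express uncertainty beyond what any single network reports.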

NSF-PAR ID: 10368215
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Monthly Notices of the Royal Astronomical Society
Volume: 514
Issue: 3
ISSN: 0035-8711
Page Range / eLocation ID: pp. 4026–4045
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. ABSTRACT

    Galaxy sizes correlate closely with the sizes of their parent dark matter haloes, suggesting a link between halo formation and galaxy growth. However, the precise nature of this relation and its scatter remain to be fully understood, especially for low-mass galaxies. We analyse the galaxy–halo size relation (GHSR) for low-mass ($M_\star \sim 10^{7-9}\, {\rm M}_\odot$) central galaxies over the past 12.5 billion years with the help of cosmological volume simulations (FIREbox) from the Feedback in Realistic Environments (FIRE) project. We find a nearly linear relationship between the half-stellar-mass galaxy size $R_{1/2}$ and the parent dark matter halo virial radius $R_{\rm vir}$. This relation evolves only weakly since redshift $z = 5$: $R_{1/2}\, [{\rm kpc}] = (0.053\pm 0.002)(R_{\rm vir}/35\, {\rm kpc})^{0.934\pm 0.054}$, with a nearly constant scatter $\langle \sigma \rangle = 0.084\, [{\rm dex}]$. While this ratio is similar to what is expected from models in which galaxy disc sizes are set by halo angular momentum, the low-mass galaxies in our sample are not angular-momentum supported, with stellar rotational-to-circular velocity ratios $v_{\rm rot}/v_{\rm circ} \sim 0.15$. Introducing redshift as an additional parameter in the GHSR does not decrease the scatter. Furthermore, this scatter does not correlate with any of the halo properties we investigate, including spin and concentration, suggesting that baryonic processes and feedback physics are instead critical in setting the scatter in the GHSR. Given the relatively small scatter and the weak dependence of the GHSR on redshift and halo properties for these low-mass central galaxies, we propose using galaxy sizes, as a method independent of stellar masses, to infer halo masses.
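The abstract proposes reading halo properties off galaxy sizes. A minimal sketch, using only the quoted slope (0.934) and mean scatter (0.084 dex) and ignoring their uncertainties, shows how a size ratio maps onto a virial-radius ratio and then onto a halo-mass ratio. It assumes two haloes at the same redshift with the same virial overdensity, so that $M_{\rm vir} \propto R_{\rm vir}^3$, and it naively inverts the forward fit, which is only approximate for a relation with scatter. The function names are hypothetical.

```python
SLOPE = 0.934        # GHSR power-law slope quoted in the abstract (uncertainty ignored)
SCATTER_DEX = 0.084  # mean GHSR scatter in dex, as quoted


def size_ratio(rvir_ratio, slope=SLOPE):
    """Ratio of half-stellar-mass radii implied by a ratio of virial radii."""
    return rvir_ratio ** slope


def halo_mass_ratio_from_sizes(r_half_ratio, slope=SLOPE):
    """Naively invert the GHSR: size ratio -> virial-radius ratio -> mass ratio.

    Assumes equal redshift and virial overdensity, so M_vir scales as R_vir**3.
    """
    rvir_ratio = r_half_ratio ** (1.0 / slope)
    return rvir_ratio ** 3


def inferred_log_mass_scatter(scatter_dex=SCATTER_DEX, slope=SLOPE):
    """Scatter in log10 M_vir from propagating the size scatter through the inversion."""
    return 3.0 * scatter_dex / slope


print(size_ratio(2.0))                 # size change for a doubled virial radius
print(10 ** SCATTER_DEX)               # scatter as a multiplicative factor in size
print(inferred_log_mass_scatter())     # dex of scatter in inferred halo mass
```

With these values, doubling $R_{\rm vir}$ raises $R_{1/2}$ by a factor of about 1.91, the 0.084 dex scatter corresponds to a multiplicative factor of about 1.21 in size, and propagating it through the naive inversion gives roughly 0.27 dex of scatter in inferred log halo mass. Working with ratios keeps the sketch independent of the fit's normalization.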

  2. Abstract

    We investigate the group-scale environment of 15 luminous quasars (luminosity $L_{3000} > 10^{46}$ erg s$^{-1}$) from the Cosmic Ultraviolet Baryon Survey (CUBS) at redshift $z \approx 1$. Using the Multi Unit Spectroscopic Explorer integral field spectrograph on the Very Large Telescope, we conduct a deep galaxy redshift survey in the CUBS quasar fields to identify group members and measure the physical properties of individual galaxies and galaxy groups. We find that the CUBS quasars reside in diverse environments. The majority of the CUBS quasars (11 of 15) reside in overdense environments with typical halo masses exceeding $10^{13}\,{\rm M}_\odot$, while the remaining quasars reside in moderate-size galaxy groups. No correlation is observed between overdensity and redshift, black hole (BH) mass, or luminosity. Radio-loud quasars (5 of the 15 CUBS quasars) are more likely to be in overdense environments than their radio-quiet counterparts in the sample, consistent with the mean trends from previous statistical observations and clustering analyses. Nonetheless, we also observe radio-loud quasars in moderate groups and radio-quiet quasars in overdense environments, indicating a large scatter in the connection between radio properties and environment. We find that the most UV-luminous quasars might be outliers in the stellar mass-to-halo mass relations or may represent departures from the standard single-epoch BH relations.

  3. Abstract

    We present our determination of the baryon budget for an X-ray-selected XXL sample of 136 galaxy groups and clusters spanning nearly two orders of magnitude in mass ($M_{500} \sim 10^{13}$–$10^{15}\,{\rm M}_\odot$) and the redshift range $0 \lesssim z \lesssim 1$. Our joint analysis is based on the combination of Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) weak-lensing mass measurements, XXL X-ray gas mass measurements, and HSC and Sloan Digital Sky Survey multiband photometry. We carry out a Bayesian analysis of multivariate mass-scaling relations of gas mass, galaxy stellar mass, stellar mass of brightest cluster galaxies (BCGs), and soft-band X-ray luminosity, taking into account the intrinsic covariance between cluster properties, selection effects, weak-lensing mass calibration, and the observational error covariance matrix. The slope of the gas mass–total mass ($M_{500}$) relation is found to be $1.29_{-0.10}^{+0.16}$, which is steeper than the self-similar prediction of unity, whereas the slope of the stellar mass–total mass relation, $0.85_{-0.09}^{+0.12}$, is shallower than unity. The BCG stellar mass depends only weakly on cluster mass, with a slope of $0.49_{-0.10}^{+0.11}$. The baryon, gas mass, and stellar mass fractions as a function of $M_{500}$ agree with the results from numerical simulations and previous observations. We successfully constrain the full intrinsic covariance of the baryonic contents. The BCG stellar mass shows the largest intrinsic scatter at a given halo total mass, followed by the stellar mass and then the gas mass. We find a significant positive intrinsic correlation coefficient between total (and satellite) stellar mass and BCG stellar mass, and no evidence for an intrinsic correlation between gas mass and stellar mass. None of the baryonic components shows redshift evolution.
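Because each component scales as $M_X \propto M_{500}^{\,\alpha}$, the corresponding mass fraction $f_X = M_X / M_{500}$ scales as $M_{500}^{\,\alpha - 1}$, so the slopes alone fix how the fractions change across the sample. A minimal sketch using the central slope values quoted above (asymmetric errors and normalizations ignored; the function name is hypothetical) compares the fractions at the two ends of the $10^{13}$–$10^{15}\,{\rm M}_\odot$ range.

```python
def fraction_ratio(slope, m_low, m_high):
    """Ratio f(m_high) / f(m_low) for a mass fraction f = M_X / M500.

    If M_X is proportional to M500**slope, then f is proportional to
    M500**(slope - 1), so only the slope is needed, not the normalization.
    """
    return (m_high / m_low) ** (slope - 1.0)


# Central slope values quoted in the abstract (asymmetric errors ignored).
SLOPES = {"gas mass": 1.29, "stellar mass": 0.85, "BCG stellar mass": 0.49}

for component, slope in SLOPES.items():
    ratio = fraction_ratio(slope, 1e13, 1e15)
    print(f"{component}: f(1e15) / f(1e13) = {ratio:.2f}")
```

With these central values the gas fraction rises by a factor of about 3.8 across the mass range, the stellar fraction roughly halves, and the BCG stellar fraction drops by about a factor of 10; a self-similar slope of unity would leave the fraction unchanged.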

  4. ABSTRACT

    We introduce the THESAN project, a suite of large-volume ($L_\mathrm{box} = 95.5 \, \mathrm{cMpc}$) radiation-magnetohydrodynamic simulations that simultaneously model the large-scale statistical properties of the intergalactic medium during reionization and the resolved characteristics of the galaxies responsible for it. The flagship simulation has dark matter and baryonic mass resolutions of $3.1 \times 10^6\, {\rm M_\odot }$ and $5.8 \times 10^5\, {\rm M_\odot }$, respectively. The gravitational forces are softened on scales of 2.2 ckpc, with the smallest cell sizes reaching 10 pc at z = 5.5, enabling predictions down to the atomic cooling limit. The simulations use an efficient radiation hydrodynamics solver (AREPO-RT) that precisely captures the interaction between ionizing photons and gas, coupled to the well-tested IllustrisTNG galaxy formation model and dust models to accurately predict the properties of galaxies. Through a complementary set of medium-resolution simulations, we investigate the changes to reionization introduced by different assumptions for ionizing escape fractions, varying dark matter models, and numerical convergence. The fiducial simulation and model variations are calibrated to produce realistic reionization histories that match the observed evolution of the global neutral hydrogen fraction and the electron scattering optical depth to reionization. They also match a wealth of high-redshift observationally inferred data, including the stellar-to-halo-mass relation, galaxy stellar mass function, star formation rate density, and mass–metallicity relation, despite the galaxy formation model being calibrated mainly at z = 0. We demonstrate that different reionization models give rise to varied bubble size distributions that imprint unique signatures on the 21 cm emission, especially on the slope of the power spectrum at large spatial scales, enabling current and upcoming 21 cm experiments to accurately characterize the sources that dominate the ionizing photon budget.

  5. ABSTRACT

    We present a machine learning (ML) approach for the prediction of galaxies’ dark matter halo masses which achieves an improved performance over conventional methods. We train three ML algorithms (XGBoost, random forests, and a neural network) to predict halo masses using a set of synthetic galaxy catalogues that are built by populating dark matter haloes in N-body simulations with galaxies and that match both the clustering and the joint distributions of properties of galaxies in the Sloan Digital Sky Survey (SDSS). We explore the correlation of different galaxy- and group-related properties with halo mass, and extract the set of nine features that contribute the most to the prediction of halo mass. We find that mass predictions from the ML algorithms are more accurate than those from halo abundance matching (HAM) or dynamical mass estimates (DYN). Since the danger of this approach is that our training data might not accurately represent the real Universe, we explore the effect of testing the model on synthetic catalogues built with different assumptions from those used in the training phase. We test a variety of models with different ways of populating dark matter haloes, such as adding velocity bias for satellite galaxies. We find that, although training and testing on different data can lead to systematic errors in the predicted masses, the ML approach still yields substantially better masses than either HAM or DYN. Finally, we apply the trained model to a galaxy and group catalogue from the SDSS DR7 and present the resulting halo masses.
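The abstract ranks the ML, HAM, and DYN estimates by accuracy but does not state the metric. One common choice, assumed for this sketch, is the bias and scatter (in dex) of the logarithmic residual $\log_{10}(M_{\rm pred}/M_{\rm true})$ on a test catalogue; a systematic error from training and testing on differently built catalogues would show up as a nonzero bias. The function name and the sample masses are hypothetical and for illustration only.

```python
import math
import statistics


def log_residual_stats(m_true, m_pred):
    """Return (bias, scatter) in dex of log10(m_pred / m_true).

    bias is the mean residual (a systematic offset); scatter is the
    population standard deviation of the residuals.
    """
    residuals = [math.log10(p / t) for t, p in zip(m_true, m_pred)]
    return statistics.mean(residuals), statistics.pstdev(residuals)


# Hypothetical halo masses in solar masses, for illustration only.
m_true = [1e12, 3e12, 1e13, 5e13]
m_pred = [1.2e12, 2.5e12, 1.1e13, 4.0e13]
bias, scatter = log_residual_stats(m_true, m_pred)
print(f"bias = {bias:+.3f} dex, scatter = {scatter:.3f} dex")
```

Computing the same two numbers for each estimator on the same test catalogue gives a like-for-like comparison; a lower scatter indicates more precise masses, while the bias isolates systematic offsets.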
