Feedback from active galactic nuclei and stellar processes changes the matter distribution on small scales, leading to significant systematic uncertainty in weak lensing constraints on cosmology. We investigate how the observable properties of group-scale haloes can constrain feedback’s impact on the matter distribution using the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS). Extending the results of previous work to smaller halo masses and higher wavenumber, k, we find that the baryon fraction in haloes contains significant information about the impact of feedback on the matter power spectrum. We explore how the thermal Sunyaev–Zel’dovich (tSZ) signal from group-scale haloes contains similar information. Using recent Dark Energy Survey weak lensing and Atacama Cosmology Telescope tSZ cross-correlation measurements and models trained on CAMELS, we obtain 10 per cent constraints on feedback effects on the power spectrum at $k \sim 5\, h\, {\rm Mpc}^{-1}$. We show that with future surveys, it will be possible to constrain baryonic effects on the power spectrum to $\mathcal{O}(<1\ {\rm per\ cent})$ at $k = 1\, h\, {\rm Mpc}^{-1}$ and $\mathcal{O}(3\ {\rm per\ cent})$ at $k = 5\, h\, {\rm Mpc}^{-1}$ using the methods that we introduce here. Finally, we investigate the impact of feedback on the matter bispectrum, finding that tSZ observables are highly informative in this case.

Abstract As the next generation of large galaxy surveys comes online, it is becoming increasingly important to develop and understand the machine-learning tools that analyze big astronomical data. Neural networks are powerful and capable of probing deep patterns in data, but they must be trained carefully on large and representative data sets. We present a new “hump” of the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project: CAMELS-SAM, encompassing one thousand dark-matter-only simulations of $(100\, h^{-1}\, {\rm cMpc})^{3}$ with different cosmological parameters ($\Omega_{\rm m}$ and $\sigma_8$) and run through the Santa Cruz semi-analytic model (SC-SAM) for galaxy formation over a broad range of astrophysical parameters. As a proof of concept for the power of this vast suite of simulated galaxies in a large volume and broad parameter space, we probe the power of simple clustering summary statistics to marginalize over astrophysics and constrain cosmology using neural networks. We use the two-point correlation, count-in-cells, and void probability functions, and we probe nonlinear and linear scales across $0.68 < R < 27\, h^{-1}\, {\rm cMpc}$. We find our neural networks can marginalize over the uncertainties in astrophysics to constrain cosmology to 3%–8% error across various types of galaxy selections, while simultaneously learning about the SC-SAM astrophysical parameters. This work encompasses vital first steps toward creating algorithms able to marginalize over the uncertainties in our galaxy formation models and measure the underlying cosmology of our Universe. CAMELS-SAM has been publicly released alongside the rest of CAMELS, and it offers great potential to many applications of machine learning in astrophysics: https://camels-sam.readthedocs.io .
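As a rough illustration of two of the clustering statistics named above, the sketch below measures count-in-cells and the void probability function (VPF) on a purely random point set. It is not the CAMELS-SAM pipeline: the box size, number of points, sphere radius, and all function names are assumptions chosen for illustration, and unclustered points are used so that the result can be compared with the Poisson expectation, VPF = exp(-nV).

```python
import math
import random


def counts_in_spheres(points, box, radius, n_spheres, rng):
    """Count points inside randomly placed spheres in a periodic box."""
    r2 = radius * radius
    counts = []
    for _ in range(n_spheres):
        centre = [rng.uniform(0.0, box) for _ in range(3)]
        n_inside = 0
        for p in points:
            d2 = 0.0
            for a, b in zip(p, centre):
                d = abs(a - b)
                d = min(d, box - d)  # periodic wrap-around
                d2 += d * d
            if d2 <= r2:
                n_inside += 1
        counts.append(n_inside)
    return counts


def void_probability(counts):
    """Fraction of spheres that contain no points at all."""
    return sum(1 for n in counts if n == 0) / len(counts)


# Hypothetical toy set-up: unclustered points in a (100 length-unit)^3 box.
rng = random.Random(0)
box = 100.0
n_points = 1000
radius = 6.0
points = [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_points)]

counts = counts_in_spheres(points, box, radius, 400, rng)
mean_count = sum(counts) / len(counts)
vpf = void_probability(counts)

# Poisson expectation for unclustered points: mean = n V, VPF = exp(-n V).
expected_mean = n_points / box**3 * (4.0 / 3.0) * math.pi * radius**3
poisson_vpf = math.exp(-expected_mean)
print(f"mean count {mean_count:.3f} (expected {expected_mean:.3f})")
print(f"VPF {vpf:.3f} (Poisson {poisson_vpf:.3f})")
```

For a clustered galaxy sample the measured VPF would generally depart from the Poisson value, and that kind of departure is part of what makes these statistics informative.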
ABSTRACT Feedback from active galactic nuclei (AGNs) and supernovae can affect measurements of the integrated Sunyaev–Zeldovich (SZ) flux of haloes ($Y_{\rm SZ}$) from cosmic microwave background (CMB) surveys, and cause its relation with the halo mass ($Y_{\rm SZ}$–$M$) to deviate from the self-similar power-law prediction of the virial theorem. We perform a comprehensive study of such deviations using CAMELS, a suite of hydrodynamic simulations with extensive variations in feedback prescriptions. We use a combination of two machine learning tools (random forest and symbolic regression) to search for analogues of the $Y$–$M$ relation which are more robust to feedback processes for low masses ($M\lesssim 10^{14}\, h^{-1} \, \mathrm{M}_\odot$); we find that simply replacing $Y \rightarrow Y(1 + M_*/M_{\rm gas})$ in the relation makes it remarkably self-similar. This could serve as a robust multi-wavelength mass proxy for low-mass clusters and galaxy groups. Our methodology can also be generally useful to improve the domain of validity of other astrophysical scaling relations. We also forecast that measurements of the $Y$–$M$ relation could provide per cent-level constraints on certain combinations of feedback parameters and/or rule out a major part of the parameter space of supernova and AGN feedback models used in current state-of-the-art hydrodynamic simulations. Our results can be useful for using upcoming SZ surveys (e.g. SO, CMB-S4) and galaxy surveys (e.g. DESI and Rubin) to constrain the nature of baryonic feedback. Finally, we find that the alternative relation, $Y$–$M_*$, provides information on feedback that is complementary to that from $Y$–$M$.
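To make the substitution $Y \rightarrow Y(1 + M_*/M_{\rm gas})$ concrete, here is a minimal toy sketch, not the paper's analysis. It assumes $Y \propto M_{\rm gas} T$ with a virial temperature $T \propto M^{2/3}$, and it assumes (purely for illustration) that gas plus stars always add up to a fixed baryon fraction, with a stellar fraction that rises toward low mass. Under those assumptions the raw $Y$–$M$ slope is steeper than the self-similar 5/3, while the rescaled quantity recovers 5/3. Every number and function name below is hypothetical.

```python
import math

F_BARYON = 0.16  # toy baryon fraction (assumption)


def toy_halo(mass):
    """Return (Y, M_star, M_gas) for a toy halo, in arbitrary units."""
    # Assumed toy trend: stellar fraction increases toward low halo mass.
    f_star = 0.02 * (mass / 1e14) ** -0.3
    m_star = f_star * mass
    m_gas = (F_BARYON - f_star) * mass  # gas + stars = F_BARYON * mass
    y = m_gas * mass ** (2.0 / 3.0)     # Y ~ M_gas * T with T ~ M^(2/3)
    return y, m_star, m_gas


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


masses = [10 ** (12.5 + 0.1 * i) for i in range(16)]  # 10^12.5 to 10^14
log_m, log_y_raw, log_y_corr = [], [], []
for m in masses:
    y, m_star, m_gas = toy_halo(m)
    log_m.append(math.log10(m))
    log_y_raw.append(math.log10(y))
    log_y_corr.append(math.log10(y * (1.0 + m_star / m_gas)))

slope_raw = fit_slope(log_m, log_y_raw)
slope_corr = fit_slope(log_m, log_y_corr)
print(f"raw slope {slope_raw:.3f}, rescaled slope {slope_corr:.3f}, self-similar {5 / 3:.3f}")
```

In this toy the rescaling works exactly only because gas and stars are assumed to sum to a constant fraction of the halo mass; the simulations in the abstract above are of course far less idealized.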

Abstract In a novel approach employing implicit likelihood inference (ILI), also known as likelihood-free inference, we calibrate the parameters of cosmological hydrodynamic simulations against observations, which has previously been unfeasible due to the high computational cost of these simulations. For computational efficiency, we train neural networks as emulators on ∼1000 cosmological simulations from the CAMELS project to estimate simulated observables, taking as input the cosmological and astrophysical parameters, and use these emulators as surrogates for the cosmological simulations. Using the cosmic star formation rate density (SFRD) and, separately, the stellar mass functions (SMFs) at different redshifts, we perform ILI on selected cosmological and astrophysical parameters ($\Omega_{\rm m}$, $\sigma_8$, stellar wind feedback, and kinetic black hole feedback) and obtain full six-dimensional posterior distributions. In the performance test, the ILI from the emulated SFRD (SMFs) can recover the target observables with a relative error of 0.17% (0.4%). We find that degeneracies exist between the parameters inferred from the emulated SFRD, confirmed with new full cosmological simulations. We also find that the SMFs can break the degeneracy in the SFRD, which indicates that the SMFs provide complementary constraints for the parameters. Further, we find that a parameter combination inferred from an observationally inferred SFRD reproduces the target observed SFRD very well, whereas, in the case of the SMFs, the inferred and observed SMFs show significant discrepancies that indicate potential limitations of the current galaxy formation modeling and calibration framework, and/or systematic differences and inconsistencies between observations of the SMFs.
Abstract Active galactic nucleus (AGN) feedback models are generally calibrated to reproduce galaxy observables such as the stellar mass function and the bimodality in galaxy colors. We use variations of the AGN feedback implementations in the IllustrisTNG (TNG) and Simba cosmological hydrodynamic simulations to show that the low-redshift Lyα forest can provide constraints on the impact of AGN feedback. We show that TNG overpredicts the number density of absorbers at column densities $N_{\rm HI} < 10^{14}\, {\rm cm}^{-2}$ compared to data from the Cosmic Origins Spectrograph (in agreement with previous work), and we demonstrate explicitly that its kinetic feedback mode, which is primarily responsible for galaxy quenching, has a negligible impact on the column density distribution (CDD) of absorbers. In contrast, we show that the fiducial Simba model, which includes AGN jet feedback, is the preferred fit to the observed CDD of the z = 0.1 Lyα forest across 5 orders of magnitude in column density. We show that the Simba results with jets produce a quantitatively better fit to the observational data than the Simba results without jets, even when the ultraviolet background is left as a free parameter. AGN jets in Simba are high speed, collimated, weakly interacting with the interstellar medium (via brief hydrodynamic decoupling), and heated to the halo virial temperature. Collectively these properties result in stronger long-range impacts on the intergalactic medium when compared to TNG’s kinetic feedback mode, which drives isotropic winds with lower velocities at the galactic radius. Our results suggest that the low-redshift Lyα forest provides plausible evidence for long-range AGN jet feedback.
Abstract From 1000 hydrodynamic simulations of the CAMELS project, each with a different value of the cosmological and astrophysical parameters, we generate 15,000 gas temperature maps. We use a state-of-the-art deep convolutional neural network to recover missing data from those maps. We mimic the missing data by applying regular and irregular binary masks that cover either 15% or 30% of the area. We quantify the reliability of our results using two summary statistics: (1) the distance between the probability density functions, estimated using the Kolmogorov–Smirnov (KS) test, and (2) the 2D power spectrum. We find excellent agreement between the model prediction and the unmasked maps when using the power spectrum: better than 1% for $k < 20\, h\, {\rm Mpc}^{-1}$ for any irregular mask. For regular masks, we observe a systematic offset of ∼5% when covering 15% of the maps, while the results become unreliable when 30% of the data is missing. The observed KS test p-values favor the null hypothesis that the reconstructed and the ground-truth maps are drawn from the same underlying distribution when irregular masks are used. For regular-shaped masks, on the other hand, we find strong evidence that the two distributions do not match each other. Finally, we use the model, trained on gas temperature maps, to inpaint maps from fields not used during model training. We find that, visually, our model is able to reconstruct the missing pixels from the maps of those fields with great accuracy, although its performance using summary statistics depends strongly on the considered field.
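The masking and KS comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the map is synthetic lognormal noise rather than a CAMELS temperature map, the "irregular" mask is simply randomly scattered pixels, and the reconstruction is a naive mean fill instead of a convolutional network. All names and numbers are hypothetical.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    dist = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        dist = max(dist, abs(i / len(a) - j / len(b)))
    return dist


def regular_mask(n, fraction):
    """Centred square mask on an n x n map; True marks a missing pixel."""
    side = round(math.sqrt(fraction) * n)
    lo = (n - side) // 2
    return [[lo <= i < lo + side and lo <= j < lo + side for j in range(n)]
            for i in range(n)]


def irregular_mask(n, fraction, rng):
    """Randomly scattered missing pixels (a crude stand-in for irregular masks)."""
    hidden = set(rng.sample(range(n * n), int(fraction * n * n)))
    return [[(i * n + j) in hidden for j in range(n)] for i in range(n)]


def covered_fraction(mask):
    """Fraction of pixels flagged as missing."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


rng = random.Random(1)
n = 64
# Synthetic stand-in for a map: independent lognormal pixel values.
truth = [[rng.lognormvariate(0.0, 1.0) for _ in range(n)] for _ in range(n)]
mask = irregular_mask(n, 0.15, rng)

# Naive baseline "reconstruction": fill missing pixels with the visible mean.
visible = [truth[i][j] for i in range(n) for j in range(n) if not mask[i][j]]
fill = sum(visible) / len(visible)
naive = [[fill if mask[i][j] else truth[i][j] for j in range(n)] for i in range(n)]

flat_truth = [v for row in truth for v in row]
flat_naive = [v for row in naive for v in row]
ks_perfect = ks_statistic(flat_truth, flat_truth)  # identical maps
ks_naive = ks_statistic(flat_truth, flat_naive)    # mean-filled map
print(f"mask covers {covered_fraction(mask):.3f} of the map")
print(f"KS distance: perfect {ks_perfect:.3f}, naive fill {ks_naive:.3f}")
```

A reconstruction that reproduces the pixel distribution well would give a KS distance close to zero, whereas the mean fill leaves a visible step in the empirical CDF. Converting the distance into a p-value requires the KS null distribution, which this sketch leaves out.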
Abstract It is important to understand the cycle of baryons through the circumgalactic medium (CGM) in the context of galaxy formation and evolution. In this study, we forecast constraints on the feedback processes heating the CGM with current and future Sunyaev–Zeldovich (SZ) observations. To constrain these processes, we use a suite of cosmological simulations, the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS). CAMELS varies four different feedback parameters of two previously existing hydrodynamical simulations, IllustrisTNG and SIMBA. We capture the dependences of SZ radial profiles on these feedback parameters with an emulator, calculate their derivatives, and forecast future constraints on these feedback parameters from upcoming experiments. We find that for a galaxy sample similar to what would be obtained with the Dark Energy Spectroscopic Instrument at the Simons Observatory, all four feedback parameters can be constrained (some within the 10% level), indicating that future observations will be able to further restrict the parameter space for these subgrid models. Given the modeled galaxy sample and forecasted errors in this work, we find that the inner SZ profiles contribute more to the constraining power than the outer profiles. Finally, we find that, despite the wide range of parameter variation in active galactic nucleus feedback in the CAMELS simulation suite, we cannot reproduce the thermal SZ signal of galaxies selected by the Baryon Oscillation Spectroscopic Survey as measured by the Atacama Cosmology Telescope.

ABSTRACT The James Webb Space Telescope (JWST) will have the power to characterize high-redshift quasars at z ≥ 6 with unprecedented depth and spatial resolution. While the brightest quasars at such redshift (i.e. with bolometric luminosity $L_{\rm bol}\geqslant 10^{46}\, \rm erg\,s^{-1}$) provide us with key information on the most extreme objects in the Universe, measuring the black hole (BH) mass and Eddington ratios of fainter quasars with $L_{\rm bol}= 10^{45}$–$10^{46}\, \rm erg\,s^{-1}$ opens a path to understand the build-up of more normal BHs at z ≥ 6. In this paper, we show that the Illustris, TNG100, TNG300, Horizon-AGN, EAGLE, and SIMBA large-scale cosmological simulations do not agree on whether BHs at z ≥ 4 are overmassive or undermassive at fixed galaxy stellar mass with respect to the $M_{\rm BH}$–$M_\star$ scaling relation at z = 0 (BH mass offsets). Our conclusions are unchanged when using the local scaling relation produced by each simulation or empirical relations. We find that the BH mass offsets of the simulated faint quasar population at z ≥ 4, unlike those of bright quasars, represent the BH mass offsets of the entire BH population, for all the simulations. Thus, a population of faint quasars with $L_{\rm bol}= 10^{45}$–$10^{46}\, \rm erg\,s^{-1}$ observed by JWST can provide key constraints on the assembly of BHs at high redshift. Moreover, this will help constrain the high-redshift regime of cosmological simulations, including BH seeding, early growth, and co-evolution with the host galaxies. Our results also motivate the need for simulations of larger cosmological volumes down to z ∼ 6, with the same diversity of subgrid physics, in order to gain statistics on the most extreme objects at high redshift.

Abstract Recent work has pointed out the potential existence of a tight relation between the cosmological parameter $\Omega_{\rm m}$, at fixed $\Omega_{\rm b}$, and the properties of individual galaxies in state-of-the-art cosmological hydrodynamic simulations. In this paper, we investigate whether such a relation also holds for galaxies from simulations run with a different code that makes use of distinct subgrid physics: Astrid. We find that in this case too, neural networks are able to infer the value of $\Omega_{\rm m}$ with ∼10% precision from the properties of individual galaxies, while accounting for astrophysics uncertainties, as modeled in the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS). This tight relationship is present at all considered redshifts, z ≤ 3, and the stellar mass, the stellar metallicity, and the maximum circular velocity are among the most important galaxy properties behind the relation. In order to use this method with real galaxies, one needs to quantify its robustness: the accuracy of the model when tested on galaxies generated by codes different from the one used for training. We quantify the robustness of the models by testing them on galaxies from four different codes: IllustrisTNG, SIMBA, Astrid, and Magneticum. We show that the models perform well on a large fraction of the galaxies, but fail dramatically on a small fraction of them. Removing these outliers significantly improves the accuracy of the models across simulation codes.
Free, publicly accessible full text available August 29, 2024

Abstract We train graph neural networks to perform field-level likelihood-free inference using galaxy catalogs from state-of-the-art hydrodynamic simulations of the CAMELS project. Our models are rotational, translational, and permutation invariant and do not impose any cut on scale. From galaxy catalogs that only contain 3D positions and radial velocities of ∼1000 galaxies in tiny $(25\, h^{-1}\, {\rm Mpc})^{3}$ volumes, our models can infer the value of $\Omega_{\rm m}$ with approximately 12% precision. More importantly, by testing the models on galaxy catalogs from thousands of hydrodynamic simulations, each having a different efficiency of supernova and active galactic nucleus feedback, run with five different codes and subgrid models (IllustrisTNG, SIMBA, Astrid, Magneticum, SWIFT-EAGLE), we find that our models are robust to changes in astrophysics, subgrid physics, and subhalo/galaxy finder. Furthermore, we test our models on 1024 simulations that cover a vast region in parameter space (variations in five cosmological and 23 astrophysical parameters), finding that the model extrapolates really well. Our results indicate that the key to building a robust model is the use of both galaxy positions and velocities, suggesting that the network has likely learned an underlying physical relation that does not depend on galaxy formation and is valid on scales larger than $\sim 10\, h^{-1}\, {\rm kpc}$.