-
Abstract The rapid advancement of large-scale cosmological simulations has opened new avenues for cosmological and astrophysical research. However, the increasing diversity among cosmological simulation models presents a challenge to robustness. In this work, we develop the Model-Insensitive ESTimator (Miest), a machine that can robustly estimate the cosmological parameters Ω_m and σ_8 from neutral hydrogen maps of the simulation models in the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project: IllustrisTNG, SIMBA, Astrid, and SWIFT-Eagle. An estimator is considered robust if it possesses a consistent predictive power across all simulations, including those used during the training phase. We train our machine using multiple simulation models and ensure that it only extracts common features between the models while disregarding the model-specific features. This allows us to develop a novel model that is capable of accurately estimating parameters across a range of simulation models, without being biased toward any particular model. Upon investigating the latent space (a set of summary statistics), we find that the implementation of robustness leads to the blending of latent variables across different models, demonstrating the removal of model-specific features. In comparison to a standard machine lacking robustness, the average performance of Miest on simulations unseen during the training phase improves by ∼17% for Ω_m and 38% for σ_8. By using a machine learning approach that can extract robust, yet physical features, we hope to improve our understanding of galaxy formation and evolution in a (subgrid) model-insensitive manner, and ultimately, gain insight into the underlying physical processes responsible for robustness. Free, publicly accessible full text available September 19, 2026.
-
Abstract Cosmological simulations like CAMELS and IllustrisTNG characterize hundreds of thousands of galaxies using various internal properties. Previous studies have demonstrated that machine learning can be used to infer the cosmological parameter Ω_m from the internal properties of even a single randomly selected simulated galaxy. This ability was hypothesized to originate from galaxies occupying a low-dimensional manifold within a higher-dimensional galaxy property space, which shifts with variations in Ω_m. In this work, we investigate how galaxies occupy the high-dimensional galaxy property space, particularly the effect of Ω_m and other cosmological and astrophysical parameters on the putative manifold. We achieve this by using an autoencoder with an information-ordered bottleneck, a neural layer with adaptive compression, to perform dimensionality reduction on individual galaxy properties from CAMELS simulations, which are run with various combinations of cosmological and astrophysical parameters. We find that for an autoencoder trained on the fiducial set of parameters, the reconstruction error increases significantly when the test set deviates from fiducial values of Ω_m and A_SN1, indicating that these parameters shift galaxies off the fiducial manifold. In contrast, variations in other parameters such as σ_8 cause negligible error changes, suggesting galaxies shift along the manifold. These findings provide direct evidence that the ability to infer Ω_m from individual galaxies is tied to the way Ω_m shifts the manifold. Physically, this implies that parameters like σ_8 produce galaxy property changes resembling natural scatter, while parameters like Ω_m and A_SN1 create unsampled properties, extending beyond the natural scatter in the fiducial model. Free, publicly accessible full text available June 12, 2026.
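The reconstruction-error diagnostic described in the abstract above can be illustrated with a toy linear stand-in. This is only a sketch under stated assumptions: a straight line fitted to 2D points replaces the autoencoder with an information-ordered bottleneck, the data are synthetic, and the names `fit_line` and `reconstruction_error` are hypothetical. It shows that a shift along the fitted manifold leaves the error near zero, while a shift off the manifold raises it.

```python
import math

def fit_line(points):
    """Fit the principal axis of 2D points; return (mean, unit direction)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def reconstruction_error(points, mean, direction):
    """Mean squared distance between points and their projection onto the line."""
    total = 0.0
    for x, y in points:
        dx, dy = x - mean[0], y - mean[1]
        t = dx * direction[0] + dy * direction[1]  # 1D "latent" coordinate
        rx, ry = dx - t * direction[0], dy - t * direction[1]
        total += rx * rx + ry * ry
    return total / len(points)

# Synthetic "fiducial" sample lying on the line y = 2x (an assumption for illustration).
fiducial = [(0.1 * i, 0.2 * i) for i in range(20)]
mean, direction = fit_line(fiducial)

# Shift along the line (direction (1, 2)) versus perpendicular to it (direction (-2, 1)).
along = [(x + 1.0, y + 2.0) for x, y in fiducial]
off = [(x - 1.0, y + 0.5) for x, y in fiducial]

err_fiducial = reconstruction_error(fiducial, mean, direction)
err_along = reconstruction_error(along, mean, direction)
err_off = reconstruction_error(off, mean, direction)
```

In this toy setup the "along" shift plays the role the abstract assigns to σ_8 and the "off" shift the role of Ω_m or A_SN1; the mapping is illustrative only and says nothing about the actual magnitudes found in the paper.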
-
Abstract We present a study on the inference of cosmological and astrophysical parameters using stacked galaxy cluster profiles. Utilizing the CAMELS-zoomGZ simulations, we explore how various cluster properties (such as X-ray surface brightness, gas density, temperature, metallicity, and Compton-y profiles) can be used to predict parameters within the 28-dimensional parameter space of the IllustrisTNG model. Through neural networks, we achieve a high correlation coefficient of 0.97 or above for all cosmological parameters, including Ω_m, H_0, and σ_8, and over 0.90 for the remaining astrophysical parameters, showcasing the effectiveness of these profiles for parameter inference. We investigate the impact of different radial cuts, with bins ranging from 0.1 R_200c to 0.7 R_200c, to simulate current observational constraints. Additionally, we perform a noise sensitivity analysis, adding up to 40% Gaussian noise (corresponding to signal-to-noise ratios as low as 2.5), revealing that key parameters such as Ω_m, H_0, and the initial mass function slope remain robust even under extreme noise conditions. We also compare the performance of full radial profiles against integrated quantities, finding that profiles generally lead to more accurate parameter inferences. Our results demonstrate that stacked galaxy cluster profiles contain crucial information on both astrophysical processes within groups and clusters and the underlying cosmology of the Universe. This underscores their significance for interpreting the complex data expected from next-generation surveys and reveals, for the first time, their potential as a powerful tool for parameter inference. Free, publicly accessible full text available March 6, 2026.
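The correspondence between 40% Gaussian noise and a signal-to-noise ratio of 2.5 in the abstract above follows if the noise standard deviation is a fixed fraction of the signal, so that S/N = 1/fraction. The sketch below assumes exactly that noise model (the abstract does not spell it out); the function names and the toy profile values are hypothetical.

```python
import random

def signal_to_noise(noise_fraction):
    """S/N when the noise standard deviation equals noise_fraction times the signal."""
    if noise_fraction <= 0.0:
        raise ValueError("noise_fraction must be positive")
    return 1.0 / noise_fraction

def add_fractional_noise(profile, noise_fraction, rng):
    """Perturb each bin with Gaussian noise whose sigma scales with the bin value."""
    return [v + rng.gauss(0.0, noise_fraction * abs(v)) for v in profile]

rng = random.Random(0)                 # fixed seed for reproducibility
toy_profile = [10.0, 6.0, 3.0, 1.5]    # made-up radial profile values
noisy_profile = add_fractional_noise(toy_profile, 0.4, rng)
snr_at_40_percent = signal_to_noise(0.4)
```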
-
Abstract Galaxy formation models within cosmological hydrodynamical simulations contain numerous parameters with nontrivial influences over the resulting properties of simulated cosmic structures and galaxy populations. It is computationally challenging to sample these high-dimensional parameter spaces with simulations, in particular for halos in the high-mass end of the mass function. In this work, we develop a novel sampling and reduced variance regression method, CARPoolGP, which leverages built-in correlations between samples in different locations of high-dimensional parameter spaces to provide an efficient way to explore parameter space and generate low-variance emulations of summary statistics. We use this method to extend the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) to include a set of 768 zoom-in simulations of halos in the mass range of 10^13–10^14.5 M⊙ h^−1 that span a 28-dimensional parameter space in the IllustrisTNG model. With these simulations and the CARPoolGP emulation method, we explore parameter trends in the Compton Y–M, black hole mass–halo mass, and metallicity–mass relations, as well as thermodynamic profiles and quenched fractions of satellite galaxies. We use these emulations to provide a physical picture of the complex interplay between supernova and active galactic nuclei feedback. We then use emulations of the Y–M relation of massive halos to perform Fisher forecasts on astrophysical parameters for future Sunyaev–Zeldovich observations and find a significant improvement in forecasted constraints. We publicly release both the simulation suite and CARPoolGP software package.
-
Abstract We quantify the cosmological spread of baryons relative to their initial neighbouring dark matter distribution using thousands of state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. We show that dark matter particles spread relative to their initial neighbouring distribution owing to chaotic gravitational dynamics on spatial scales comparable to their host dark matter halo. In contrast, gas in hydrodynamic simulations spreads much further from the initial neighbouring dark matter owing to feedback from supernovae (SNe) and active galactic nuclei (AGN). We show that large-scale baryon spread is very sensitive to model implementation details, with the fiducial SIMBA model spreading ∼40 per cent of baryons >1 Mpc away compared to ∼10 per cent for the IllustrisTNG and ASTRID models. Increasing the efficiency of AGN-driven outflows greatly increases baryon spread, while increasing the strength of SNe-driven winds can decrease spreading due to non-linear coupling of stellar and AGN feedback. We compare total matter power spectra between hydrodynamic and paired N-body simulations and demonstrate that the baryonic spread metric broadly captures the global impact of feedback on matter clustering over variations of cosmological and astrophysical parameters, initial conditions, and (to a lesser extent) galaxy formation models. Using symbolic regression, we find a function that reproduces the suppression of power by feedback as a function of wave number (k) and baryonic spread up to $$k \sim 10\, h\, \mathrm{Mpc}^{-1}$$ in SIMBA while highlighting the challenge of developing models robust to variations in galaxy formation physics implementation.
-
Abstract Extracting information from the total matter power spectrum with the precision needed for upcoming cosmological surveys requires unraveling the complex effects of galaxy formation processes on the distribution of matter. We investigate the impact of baryonic physics on matter clustering at z = 0 using a library of power spectra from the Cosmology and Astrophysics with MachinE Learning Simulations project, containing thousands of $$(25\, h^{-1}\, {\rm Mpc})^3$$ volume realizations with varying cosmology, initial random field, stellar and active galactic nucleus (AGN) feedback strength and subgrid model implementation methods. We show that baryonic physics affects matter clustering on scales $$k \gtrsim 0.4\, h\, \mathrm{Mpc}^{-1}$$ and the magnitude of this effect is dependent on the details of the galaxy formation implementation and variations of cosmological and astrophysical parameters. Increasing AGN feedback strength decreases halo baryon fractions and yields stronger suppression of power relative to N-body simulations, while stronger stellar feedback often results in weaker effects by suppressing black hole growth and therefore the impact of AGN feedback. We find a broad correlation between mean baryon fraction of massive haloes ($$M_{200c} > 10^{13.5}\, \mathrm{M}_\odot$$) and suppression of matter clustering but with significant scatter compared to previous work owing to wider exploration of feedback parameters and cosmic variance effects. We show that a random forest regressor trained on the baryon content and abundance of haloes across the full mass range $$10^{10} \le M_{\rm halo}/\mathrm{M}_\odot < 10^{15}$$ can predict the effect of galaxy formation on the matter power spectrum on scales k = 1.0–20.0 $$h\, \mathrm{Mpc}^{-1}$$.
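The "suppression of power relative to N-body simulations" in the abstract above is commonly expressed as the fractional difference P_hydro(k)/P_nbody(k) − 1 between a hydrodynamic run and its paired N-body run. The sketch below assumes that definition, which the abstract itself does not state; the function names are hypothetical and the power spectrum values are made up for illustration, not CAMELS data.

```python
def power_suppression(p_hydro, p_nbody):
    """Fractional change in power per k bin: P_hydro / P_nbody - 1."""
    if len(p_hydro) != len(p_nbody):
        raise ValueError("spectra must share the same k bins")
    return [h / n - 1.0 for h, n in zip(p_hydro, p_nbody)]

def first_k_exceeding(k_values, suppression, threshold):
    """Smallest k whose |suppression| exceeds threshold, or None if none does."""
    for k, s in zip(k_values, suppression):
        if abs(s) > threshold:
            return k
    return None

# Made-up example spectra on four k bins (units of h/Mpc assumed for k).
k_bins = [0.1, 0.4, 1.0, 10.0]
p_nbody = [100.0, 100.0, 100.0, 100.0]
p_hydro = [100.0, 99.5, 95.0, 80.0]

suppression = power_suppression(p_hydro, p_nbody)
k_onset = first_k_exceeding(k_bins, suppression, 0.01)  # first bin with >1% effect
```

Negative values indicate that baryonic feedback has removed power relative to the gravity-only run.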
-
Abstract We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model following the ASTRID simulation and contains 2124 hydrodynamic simulation runs that vary three cosmological parameters (Ω_m, σ_8, Ω_b) and four parameters controlling stellar and active galactic nucleus (AGN) feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the least baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in the galaxy populations and the baryonic impact on the matter power spectrum compared to its TNG and SIMBA counterparts, which can make machine-learning models trained on the ASTRID suite exhibit better extrapolation performance when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex nonlinear interplay between cosmology and astrophysical processes. With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.
-
Abstract In a novel approach employing implicit likelihood inference (ILI), also known as likelihood-free inference, we calibrate the parameters of cosmological hydrodynamic simulations against observations, which has previously been unfeasible due to the high computational cost of these simulations. For computational efficiency, we train neural networks as emulators on ∼1000 cosmological simulations from the CAMELS project to estimate simulated observables, taking as input the cosmological and astrophysical parameters, and use these emulators as surrogates for the cosmological simulations. Using the cosmic star formation rate density (SFRD) and, separately, the stellar mass functions (SMFs) at different redshifts, we perform ILI on selected cosmological and astrophysical parameters (Ω_m, σ_8, stellar wind feedback, and kinetic black hole feedback) and obtain full six-dimensional posterior distributions. In the performance test, the ILI from the emulated SFRD (SMFs) can recover the target observables with a relative error of 0.17% (0.4%). We find that degeneracies exist between the parameters inferred from the emulated SFRD, confirmed with new full cosmological simulations. We also find that the SMFs can break the degeneracy in the SFRD, which indicates that the SMFs provide complementary constraints for the parameters. Further, we find that a parameter combination inferred from an observationally inferred SFRD reproduces the target observed SFRD very well, whereas, in the case of the SMFs, the inferred and observed SMFs show significant discrepancies that indicate potential limitations of the current galaxy formation modeling and calibration framework, and/or systematic differences and inconsistencies between observations of the SMFs.
-
Abstract The Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4233 cosmological simulations: 2049 N-body simulations and 2184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper, we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogs, power spectra, bispectra, Lyα spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over 1000 catalogs that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz semianalytic model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.
