Title: Constraining Cosmology with Machine Learning and Galaxy Clustering: The CAMELS-SAM Suite
Abstract

As the next generation of large galaxy surveys comes online, it is becoming increasingly important to develop and understand the machine-learning tools that analyze big astronomical data. Neural networks are powerful and capable of probing deep patterns in data, but they must be trained carefully on large and representative data sets. We present a new “hump” of the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project: CAMELS-SAM, encompassing one thousand dark-matter-only simulations of (100 h⁻¹ cMpc)³ with different cosmological parameters (Ωm and σ8) and run through the Santa Cruz semi-analytic model (SC-SAM) for galaxy formation over a broad range of astrophysical parameters. As a proof of concept for the power of this vast suite of simulated galaxies in a large volume and broad parameter space, we probe the power of simple clustering summary statistics to marginalize over astrophysics and constrain cosmology using neural networks. We use the two-point correlation, count-in-cells, and void probability functions, and we probe nonlinear and linear scales across 0.68 < R < 27 h⁻¹ cMpc. We find our neural networks can marginalize over the uncertainties in astrophysics to constrain cosmology to 3%–8% error across various types of galaxy selections, while simultaneously learning about the SC-SAM astrophysical parameters. This work encompasses vital first steps toward creating algorithms able to marginalize over the uncertainties in our galaxy formation models and measure the underlying cosmology of our Universe. CAMELS-SAM has been publicly released alongside the rest of CAMELS, and it offers great potential to many applications of machine learning in astrophysics: https://camels-sam.readthedocs.io.
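
For readers unfamiliar with the clustering statistics named above, the following is a minimal sketch of how count-in-cells and the void probability function (VPF) could be measured on a toy catalog. It is not the pipeline used in the paper: the function names, the uniform random "galaxies", and the choice of randomly placed spheres in a periodic box are all assumptions made for illustration.

```python
import random


def counts_in_cells(points, box_size, radius, n_spheres, seed=0):
    """Count points inside randomly placed spheres in a periodic cubic box.

    Hypothetical helper for illustration; returns one count per sphere.
    """
    rng = random.Random(seed)
    r2 = radius * radius
    counts = []
    for _ in range(n_spheres):
        cx, cy, cz = (rng.uniform(0.0, box_size) for _ in range(3))
        n = 0
        for x, y, z in points:
            # Minimum-image separation along each axis (periodic boundaries).
            dx = abs(x - cx)
            dx = min(dx, box_size - dx)
            dy = abs(y - cy)
            dy = min(dy, box_size - dy)
            dz = abs(z - cz)
            dz = min(dz, box_size - dz)
            if dx * dx + dy * dy + dz * dz <= r2:
                n += 1
        counts.append(n)
    return counts


def void_probability(counts):
    """Fraction of cells that contain no points (the VPF at this radius)."""
    return sum(1 for n in counts if n == 0) / len(counts)


# Toy catalog: uniform random points, standing in for simulated galaxies.
box = 100.0  # box side, in the same length units as the radius
catalog_rng = random.Random(42)
catalog = [
    tuple(catalog_rng.uniform(0.0, box) for _ in range(3)) for _ in range(500)
]

cell_counts = counts_in_cells(catalog, box, radius=5.0, n_spheres=200)
print("mean count per cell:", sum(cell_counts) / len(cell_counts))
print("void probability:", void_probability(cell_counts))
```

Repeating the measurement over a range of radii gives the count distribution and VPF as functions of scale, which is the kind of summary vector a neural network could take as input. A real catalog is clustered, so its VPF will differ from the uniform toy case shown here.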

 
Award ID(s):
2108944
NSF-PAR ID:
10442824
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
DOI PREFIX: 10.3847
Date Published:
Journal Name:
The Astrophysical Journal
Volume:
954
Issue:
1
ISSN:
0004-637X
Format(s):
Medium: X Size: Article No. 11
Size(s):
Article No. 11
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The circum-galactic medium (CGM) can feasibly be mapped by multiwavelength surveys covering broad swaths of the sky. With multiple large data sets becoming available in the near future, we develop a likelihood-free Deep Learning technique using convolutional neural networks (CNNs) to infer broad-scale physical properties of a galaxy’s CGM and its halo mass for the first time. Using CAMELS (Cosmology and Astrophysics with MachinE Learning Simulations) data, including IllustrisTNG, SIMBA, and Astrid models, we train CNNs on Soft X-ray and 21-cm (H i) radio two-dimensional maps to trace hot and cool gas, respectively, around galaxies, groups, and clusters. Our CNNs offer the unique ability to train and test on ‘multifield’ data sets composed of both H i and X-ray maps, providing complementary information about physical CGM properties and improved inferences. Applying eRASS:4 survey limits shows that X-ray is not powerful enough to infer individual haloes with masses log(Mhalo/M⊙) < 12.5. The multifield improves the inference for all halo masses. Generally, the CNN trained and tested on Astrid (SIMBA) can most (least) accurately infer CGM properties. Cross-simulation analysis (training on one galaxy formation model and testing on another) highlights the challenges of developing CNNs trained on a single model to marginalize over astrophysical uncertainties and perform robust inferences on real data. The next crucial step in improving the resulting inferences on the physical properties of CGM depends on our ability to interpret these deep-learning models.

     
  2. Abstract

    We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model following the ASTRID simulation and contains 2124 hydrodynamic simulation runs that vary three cosmological parameters (Ωm, σ8, Ωb) and four parameters controlling stellar and active galactic nucleus (AGN) feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the least baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in the galaxy populations and the baryonic impact on the matter power spectrum compared to its TNG and SIMBA counterparts, which can make machine-learning models trained on the ASTRID suite exhibit better extrapolation performance when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex nonlinear interplay between cosmology and astrophysical processes. With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.

     
  3. Abstract Galaxies can be characterized by many internal properties such as stellar mass, gas metallicity, and star formation rate. We quantify the amount of cosmological and astrophysical information that the internal properties of individual galaxies and their host dark matter halos contain. We train neural networks using hundreds of thousands of galaxies from 2000 state-of-the-art hydrodynamic simulations with different cosmologies and astrophysical models of the CAMELS project to perform likelihood-free inference on the value of the cosmological and astrophysical parameters. We find that knowing the internal properties of a single galaxy allows our models to infer the value of Ωm, at fixed Ωb, with a ∼10% precision, while no constraint can be placed on σ8. Our results hold for any type of galaxy, central or satellite, massive or dwarf, at all considered redshifts, z ≤ 3, and they incorporate uncertainties in astrophysics as modeled in CAMELS. However, our models are not robust to changes in subgrid physics due to the large intrinsic differences the two considered models imprint on galaxy properties. We find that the stellar mass, stellar metallicity, and maximum circular velocity are among the most important galaxy properties to determine the value of Ωm. We believe that our results can be explained by considering that changes in the value of Ωm, or potentially Ωb/Ωm, affect the dark matter content of galaxies, which leaves a signature in galaxy properties distinct from the one induced by galactic processes. Our results suggest that the low-dimensional manifold hosting galaxy properties provides a tight direct link between cosmology and astrophysics.
  4. Abstract A wealth of cosmological and astrophysical information is expected from many ongoing and upcoming large-scale surveys. It is crucial to prepare for these surveys now and develop tools that can efficiently extract most information. We present HIFlow: a fast generative model of the neutral hydrogen (H i) maps that is conditioned only on cosmology (Ωm and σ8) and designed using a class of normalizing flow models, the masked autoregressive flow. HIFlow is trained on the state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. HIFlow has the ability to generate realistic diverse maps without explicitly incorporating the expected two-dimensional maps structure into the flow as an inductive bias. We find that HIFlow is able to reproduce the CAMELS average and standard deviation H i power spectrum within a factor of ≲2, scoring a very high R² > 90%. By inverting the flow, HIFlow provides a tractable high-dimensional likelihood for efficient parameter inference. We show that the conditional HIFlow on cosmology is successfully able to marginalize over astrophysics at the field level, regardless of the stellar and AGN feedback strengths. This new tool represents a first step toward a more powerful parameter inference, maximizing the scientific return of future H i surveys, and opening a new avenue to minimize the loss of complex information due to data compression down to summary statistics.
  5. Abstract The Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4233 cosmological simulations: 2049 N-body simulations and 2184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper, we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogs, power spectra, bispectra, Lyα spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over 1000 catalogs that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz semianalytic model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.