

Title: HIFlow: Generating Diverse H I Maps and Inferring Cosmology while Marginalizing over Astrophysics Using Normalizing Flows
Abstract A wealth of cosmological and astrophysical information is expected from many ongoing and upcoming large-scale surveys. It is crucial to prepare for these surveys now and to develop tools that can efficiently extract most of this information. We present HIFlow: a fast generative model of neutral hydrogen (H I) maps that is conditioned only on cosmology (Ωm and σ8) and designed using a class of normalizing flow models, the masked autoregressive flow. HIFlow is trained on state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. HIFlow can generate realistic, diverse maps without explicitly incorporating the expected two-dimensional map structure into the flow as an inductive bias. We find that HIFlow reproduces the CAMELS average and standard deviation of the H I power spectrum within a factor of ≲2, scoring a very high R² > 90%. By inverting the flow, HIFlow provides a tractable high-dimensional likelihood for efficient parameter inference. We show that HIFlow, conditioned on cosmology, successfully marginalizes over astrophysics at the field level, regardless of the stellar and AGN feedback strengths. This new tool represents a first step toward more powerful parameter inference, maximizing the scientific return of future H I surveys and opening a new avenue to minimize the loss of complex information due to data compression down to summary statistics.
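The key mechanism in the abstract is that a masked autoregressive flow is invertible, so the same model that generates maps also returns an exact likelihood. The sketch below illustrates that mechanism for a single affine autoregressive layer in plain Python. Everything specific here is an assumption for illustration: the functions `conditioner`, `sample` and `log_likelihood` are hypothetical, the hand-written AR(1) conditioner stands in for a trained masked network, and the "map" is just a flattened list of pixel values. It is not HIFlow's architecture or code.

```python
import math
import random


def conditioner(x, i, theta):
    """Toy stand-in for the masked (MADE) network inside one MAF layer.

    Returns the shift mu_i and log-scale alpha_i for pixel i. It may read only
    pixels x[:i] (the autoregressive property) plus the conditioning vector
    theta = (omega_m, sigma_8). This hypothetical AR(1) form is hand-written
    for illustration; in a real flow the mapping is learned.
    """
    omega_m, sigma_8 = theta
    prev = x[i - 1] if i > 0 else 0.0
    mu = 0.5 * prev + omega_m   # shift depends on the previous pixel and omega_m
    alpha = math.log(sigma_8)   # toy choice: sigma_8 sets the per-pixel scatter
    return mu, alpha


def sample(theta, n_pixels, rng):
    """Generate a flattened map: x_i = u_i * exp(alpha_i) + mu_i, u_i ~ N(0, 1).

    Sampling is sequential because pixel i needs pixels 0..i-1.
    """
    x = []
    for i in range(n_pixels):
        mu, alpha = conditioner(x, i, theta)
        x.append(rng.gauss(0.0, 1.0) * math.exp(alpha) + mu)
    return x


def log_likelihood(x, theta):
    """Exact log p(x | theta) via the inverse pass and change of variables.

    u_i = (x_i - mu_i) * exp(-alpha_i), so log|det du/dx| = -sum_i alpha_i.
    All (mu_i, alpha_i) depend only on the observed x, so a real masked network
    evaluates every pixel in one parallel pass; the loop here is for clarity.
    """
    logp = 0.0
    for i, xi in enumerate(x):
        mu, alpha = conditioner(x, i, theta)
        u = (xi - mu) * math.exp(-alpha)
        logp += -0.5 * u * u - 0.5 * math.log(2.0 * math.pi) - alpha
    return logp


# Usage: score one mock map against a coarse grid of Omega_m at fixed sigma_8.
rng = random.Random(42)
mock_map = sample((0.3, 0.8), 256, rng)
grid = [0.1, 0.2, 0.3, 0.4, 0.5]
best = max(grid, key=lambda om: log_likelihood(mock_map, (om, 0.8)))
print("maximum-likelihood Omega_m on the grid:", best)
```

A real MAF stacks several such layers (often reversing or permuting the pixel order between them) and learns the conditioner from simulations. The inference step follows the same idea: evaluate log p(x | θ) for an observed map over many θ values, or inside a sampler.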
Award ID(s):
2108944
PAR ID:
10459003
Journal Name:
The Astrophysical Journal
Volume:
937
Issue:
2
ISSN:
0004-637X
Page Range / eLocation ID:
83
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract As the next generation of large galaxy surveys comes online, it is becoming increasingly important to develop and understand the machine-learning tools that analyze big astronomical data. Neural networks are powerful and capable of probing deep patterns in data, but they must be trained carefully on large and representative data sets. We present a new “hump” of the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project: CAMELS-SAM, encompassing one thousand dark-matter-only simulations of (100 h⁻¹ cMpc)³ with different cosmological parameters (Ωm and σ8), run through the Santa Cruz semi-analytic model for galaxy formation over a broad range of astrophysical parameters. As a proof of concept for the power of this vast suite of simulated galaxies in a large volume and broad parameter space, we probe the power of simple clustering summary statistics to marginalize over astrophysics and constrain cosmology using neural networks. We use the two-point correlation, count-in-cells, and void probability functions, and we probe nonlinear and linear scales across 0.68 < R < 27 h⁻¹ cMpc. We find that our neural networks can both marginalize over the uncertainties in astrophysics to constrain cosmology to 3%–8% error across various types of galaxy selections and simultaneously learn about the SC-SAM astrophysical parameters. This work encompasses vital first steps toward creating algorithms able to marginalize over the uncertainties in our galaxy formation models and measure the underlying cosmology of our Universe. CAMELS-SAM has been publicly released alongside the rest of CAMELS, and it offers great potential for many applications of machine learning in astrophysics: https://camels-sam.readthedocs.io.
  2. Abstract The circum-galactic medium (CGM) can feasibly be mapped by multiwavelength surveys covering broad swaths of the sky. With multiple large data sets becoming available in the near future, we develop a likelihood-free deep-learning technique using convolutional neural networks (CNNs) to infer broad-scale physical properties of a galaxy’s CGM and its halo mass for the first time. Using CAMELS (Cosmology and Astrophysics with MachinE Learning Simulations) data, including the IllustrisTNG, SIMBA, and Astrid models, we train CNNs on soft X-ray and 21-cm (H I) radio two-dimensional maps to trace hot and cool gas, respectively, around galaxies, groups, and clusters. Our CNNs offer the unique ability to train and test on ‘multifield’ data sets composed of both H I and X-ray maps, providing complementary information about physical CGM properties and improved inferences. Applying eRASS:4 survey limits shows that X-ray is not powerful enough to infer individual haloes with masses log(Mhalo/M⊙) < 12.5. The multifield data set improves the inference for all halo masses. Generally, the CNN trained and tested on Astrid (SIMBA) can most (least) accurately infer CGM properties. Cross-simulation analysis – training on one galaxy formation model and testing on another – highlights the challenges of developing CNNs trained on a single model to marginalize over astrophysical uncertainties and perform robust inferences on real data. The next crucial step in improving the resulting inferences on the physical properties of CGM depends on our ability to interpret these deep-learning models.
  3. The circum-galactic medium (CGM) can feasibly be mapped by multiwavelength surveys covering broad swaths of the sky. With multiple large data sets becoming available in the near future, we develop a likelihood-free deep-learning technique using convolutional neural networks (CNNs) to infer broad-scale physical properties of a galaxy’s CGM and its halo mass for the first time. Using CAMELS (Cosmology and Astrophysics with MachinE Learning Simulations) data, including the IllustrisTNG, SIMBA, and Astrid models, we train CNNs on soft X-ray and 21-cm (H I) radio two-dimensional maps to trace hot and cool gas, respectively, around galaxies, groups, and clusters. Our CNNs offer the unique ability to train and test on ‘multifield’ data sets composed of both H I and X-ray maps, providing complementary information about physical CGM properties and improved inferences. Applying eRASS:4 survey limits shows that X-ray is not powerful enough to infer individual haloes with masses log(Mhalo/M⊙) < 12.5. The multifield data set improves the inference for all halo masses. Generally, the CNN trained and tested on Astrid (SIMBA) can most (least) accurately infer CGM properties. Cross-simulation analysis – training on one galaxy formation model and testing on another – highlights the challenges of developing CNNs trained on a single model to marginalize over astrophysical uncertainties and perform robust inferences on real data. The next crucial step in improving the resulting inferences on the physical properties of CGM depends on our ability to interpret these deep-learning models.
  4. Abstract We present the Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) Multifield Data set (CMD), a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from more than 2000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span ∼100 million light-years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine-learning models, CMD is the largest data set of its kind, containing more than 70 TB of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.
  5. Abstract We present a study on the inference of cosmological and astrophysical parameters using stacked galaxy cluster profiles. Utilizing the CAMELS-zoomGZ simulations, we explore how various cluster properties (X-ray surface brightness, gas density, temperature, metallicity, and Compton-y profiles) can be used to predict parameters within the 28-dimensional parameter space of the IllustrisTNG model. Through neural networks, we achieve a high correlation coefficient of 0.97 or above for all cosmological parameters, including Ωm, H0, and σ8, and over 0.90 for the remaining astrophysical parameters, showcasing the effectiveness of these profiles for parameter inference. We investigate the impact of different radial cuts, with bins ranging from 0.1 R200c to 0.7 R200c, to simulate current observational constraints. Additionally, we perform a noise sensitivity analysis, adding up to 40% Gaussian noise (corresponding to signal-to-noise ratios as low as 2.5), revealing that key parameters such as Ωm, H0, and the initial mass function slope remain robust even under extreme noise conditions. We also compare the performance of full radial profiles against integrated quantities, finding that profiles generally lead to more accurate parameter inferences. Our results demonstrate that stacked galaxy cluster profiles contain crucial information on both astrophysical processes within groups and clusters and the underlying cosmology of the Universe. This underscores their significance for interpreting the complex data expected from next-generation surveys and reveals, for the first time, their potential as a powerful tool for parameter inference.
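The first related abstract uses count-in-cells and void probability functions as clustering summary statistics. As a hedged illustration of what those two statistics measure, here is a brute-force Python sketch of the generic estimators for a point catalog in a periodic cube. The spherical cells, the random sphere placement and all function names (`counts_in_spheres`, `count_distribution`, `void_probability`) are assumptions made for this example; it is not the CAMELS-SAM analysis code.

```python
import math
import random


def counts_in_spheres(points, box, radius, n_spheres, rng):
    """Count catalog points inside randomly placed spheres in a periodic cube.

    points: list of (x, y, z) positions in [0, box).
    Returns one count per sphere. Brute force, O(n_spheres * len(points)).
    """
    r2 = radius * radius
    counts = []
    for _ in range(n_spheres):
        centre = [rng.uniform(0.0, box) for _ in range(3)]
        n = 0
        for p in points:
            d2 = 0.0
            for k in range(3):
                d = abs(p[k] - centre[k])
                d = min(d, box - d)  # minimum-image distance in a periodic box
                d2 += d * d
            if d2 <= r2:
                n += 1
        counts.append(n)
    return counts


def count_distribution(counts):
    """Counts-in-cells distribution P_N: fraction of spheres holding exactly N points."""
    total = len(counts)
    tally = {}
    for n in counts:
        tally[n] = tally.get(n, 0) + 1
    return {n: k / total for n, k in sorted(tally.items())}


def void_probability(counts):
    """Void probability function P_0(R): fraction of spheres that are empty."""
    return sum(1 for n in counts if n == 0) / len(counts)


# Usage: an unclustered (uniform random) mock catalog.
rng = random.Random(7)
box, n_gal, radius = 100.0, 400, 8.0
catalog = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n_gal)]
counts = counts_in_spheres(catalog, box, radius, 1000, rng)
vpf = void_probability(counts)
nbar = n_gal / box ** 3
volume = 4.0 / 3.0 * math.pi * radius ** 3
print("measured VPF:", vpf, " Poisson exp(-nbar*V):", math.exp(-nbar * volume))
```

For an unclustered catalog the void probability should sit near the Poisson value exp(−n̄V), with n̄ the number density and V the sphere volume, which makes a convenient sanity check; clustering empties more space and raises the VPF above that value at fixed n̄. Repeating the measurement over a range of radii gives the scale dependence, as in the 0.68 < R < 27 h⁻¹ cMpc range quoted in that abstract.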