As the next generation of large galaxy surveys comes online, it is becoming increasingly important to develop and understand the machine-learning tools that analyze big astronomical data. Neural networks are powerful and capable of probing deep patterns in data, but they must be trained carefully on large and representative data sets. We present a new “hump” of the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project: CAMELS-SAM, encompassing one thousand dark-matter-only simulations of (100
This content will become publicly available on July 3, 2025
- PAR ID: 10526434
- Publisher / Repository: Maynooth Academic Publishing
- Date Published:
- Journal Name: The Open Journal of Astrophysics
- Volume: 7
- ISSN: 2565-6120
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract h⁻¹ cMpc)³ with different cosmological parameters (Ωm and σ8) and run through the Santa Cruz semi-analytic model (SC-SAM) for galaxy formation over a broad range of astrophysical parameters. As a proof of concept for the power of this vast suite of simulated galaxies in a large volume and broad parameter space, we probe the power of simple clustering summary statistics to marginalize over astrophysics and constrain cosmology using neural networks. We use the two-point correlation, count-in-cells, and void probability functions, and we probe nonlinear and linear scales across 0.68 < R < 27 h⁻¹ cMpc. We find our neural networks can marginalize over the uncertainties in astrophysics to constrain cosmology to 3%–8% error across various types of galaxy selections, while simultaneously learning about the SC-SAM astrophysical parameters. This work encompasses vital first steps toward creating algorithms able to marginalize over the uncertainties in our galaxy formation models and measure the underlying cosmology of our Universe. CAMELS-SAM has been publicly released alongside the rest of CAMELS, and it offers great potential to many applications of machine learning in astrophysics: https://camels-sam.readthedocs.io.
-
The performance of inference with machine learning (ML) models and its integration with analytical query processing have become critical bottlenecks for data analysis in many organizations. An ML inference pipeline typically consists of a preprocessing workflow followed by prediction with an ML model. Current approaches for in-database inference implement preprocessing operators and ML algorithms in the database either natively, by transpiling code to SQL, or by executing user-defined functions in guest languages such as Python. In this work, we present a radically different approach that approximates an end-to-end inference pipeline (preprocessing plus prediction) using a light-weight embedding that discretizes a carefully selected subset of the input features and an index that maps data points in the embedding space to aggregated predictions of an ML model. We replace a complex preprocessing workflow and model-based inference with a simple feature transformation and an index lookup. Our framework improves inference latency by several orders of magnitude while maintaining similar prediction accuracy compared to the pipeline it approximates.
-
Abstract The Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4233 cosmological simulations: 2049 N-body simulations and 2184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper, we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogs, power spectra, bispectra, Lyα spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over 1000 catalogs that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz semianalytic model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.
-
ABSTRACT The physics of baryons in haloes, and their subsequent influence on the total matter phase space, has a rich phenomenology and must be well understood in order to pursue a vast set of questions in both cosmology and astrophysics. We use the Cosmology and Astrophysics with MachinE Learning Simulation (Camels) suite to quantify the impact of four different galaxy formation parameters/processes (as well as two cosmological parameters) on the concentration–mass relation, $c_{\rm vir}-M_{\rm vir}$. We construct a simulation-informed non-linear model for concentration as a function of halo mass, redshift, and six cosmological/astrophysical parameters. This is done for two galaxy formation models, IllustrisTNG and Simba, using 1000 simulations of each. We extract the imprints of galaxy formation across a wide range in mass $M_{\rm vir}\in [10^{11}, 10^{14.5}] \, {\rm M}_\odot \, h^{-1}$ and in redshift $z \in [0, 6]$, finding many strong mass- and redshift-dependent features. Comparisons between the IllustrisTNG and Simba results show the astrophysical model choices cause significant differences in the mass and redshift dependence of these baryon imprints. Finally, we use existing observational measurements of $c_{\rm vir}-M_{\rm vir}$ to provide rough limits on the four astrophysical parameters. Our non-linear model is made publicly available and can be used to include Camels-based baryon imprints in any halo model-based analysis.
-
Abstract Galaxies can be characterized by many internal properties such as stellar mass, gas metallicity, and star formation rate. We quantify the amount of cosmological and astrophysical information that the internal properties of individual galaxies and their host dark matter halos contain. We train neural networks using hundreds of thousands of galaxies from 2000 state-of-the-art hydrodynamic simulations with different cosmologies and astrophysical models of the CAMELS project to perform likelihood-free inference on the value of the cosmological and astrophysical parameters. We find that knowing the internal properties of a single galaxy allows our models to infer the value of Ωm, at fixed Ωb, with a ∼10% precision, while no constraint can be placed on σ8. Our results hold for any type of galaxy, central or satellite, massive or dwarf, at all considered redshifts, z ≤ 3, and they incorporate uncertainties in astrophysics as modeled in CAMELS. However, our models are not robust to changes in subgrid physics due to the large intrinsic differences the two considered models imprint on galaxy properties. We find that the stellar mass, stellar metallicity, and maximum circular velocity are among the most important galaxy properties to determine the value of Ωm. We believe that our results can be explained by considering that changes in the value of Ωm, or potentially Ωb/Ωm, affect the dark matter content of galaxies, which leaves a signature in galaxy properties distinct from the one induced by galactic processes. Our results suggest that the low-dimensional manifold hosting galaxy properties provides a tight direct link between cosmology and astrophysics.
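
The CAMELS-SAM abstract at the top of this record names count-in-cells and the void probability function as two of its clustering summary statistics. As a rough illustration only, the sketch below estimates both for a toy point set in a periodic cubic box: it drops randomly placed spheres of radius R, counts the points inside each, and reports the fraction of empty spheres as the void probability. The function names, the random-sphere sampling, and the uniform mock "galaxies" are all assumptions made for this example; this is not the CAMELS-SAM pipeline or its data format.

```python
import random
from collections import Counter


def counts_in_spheres(points, box_size, radius, n_spheres, seed=0):
    """Count points inside randomly placed spheres in a periodic cubic box.

    Hypothetical helper for illustration; points are [x, y, z] lists with
    coordinates in [0, box_size].
    """
    rng = random.Random(seed)
    radius_sq = radius * radius
    counts = []
    for _ in range(n_spheres):
        center = [rng.uniform(0.0, box_size) for _ in range(3)]
        inside = 0
        for point in points:
            dist_sq = 0.0
            for coord, c in zip(point, center):
                delta = abs(coord - c)
                # Periodic box: use the shorter of the direct and wrapped separation.
                delta = min(delta, box_size - delta)
                dist_sq += delta * delta
            if dist_sq <= radius_sq:
                inside += 1
        counts.append(inside)
    return counts


def count_in_cells_distribution(counts):
    """Return P(N): the fraction of spheres that contain exactly N points."""
    total = len(counts)
    return {n: k / total for n, k in sorted(Counter(counts).items())}


def void_probability(counts):
    """Return P(N = 0): the fraction of spheres that contain no points."""
    return sum(1 for n in counts if n == 0) / len(counts)


if __name__ == "__main__":
    # Toy example: uniformly random points standing in for a galaxy catalog.
    rng = random.Random(42)
    box_size = 100.0
    galaxies = [[rng.uniform(0.0, box_size) for _ in range(3)] for _ in range(500)]
    for radius in (2.0, 5.0, 10.0):
        counts = counts_in_spheres(galaxies, box_size, radius, n_spheres=200)
        print(radius, void_probability(counts))
```

The void probability is simply the N = 0 entry of the count-in-cells distribution, so both statistics come from the same list of counts. The brute-force loop is quadratic in the number of points and spheres; a real catalog would need a spatial index or gridding.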