Abstract We present a proof-of-concept simulation-based inference on Ωm and σ8 from the Sloan Digital Sky Survey (SDSS) Baryon Oscillation Spectroscopic Survey (BOSS) LOWZ Northern Galactic Cap (NGC) catalog using neural networks and domain generalization techniques without the need of summary statistics. Using rapid light-cone simulations L-PICOLA, mock galaxy catalogs are produced that fully incorporate the observational effects. The collection of galaxies is fed as input to a point cloud-based network, Minkowski-PointNet. We also add relatively more accurate Gadget mocks to obtain robust and generalizable neural networks. By explicitly learning the representations that reduce the discrepancies between the two different data sets via the semantic alignment loss term, we show that the latent space configuration aligns into a single plane in which the two cosmological parameters form clear axes. Consequently, during inference, the SDSS BOSS LOWZ NGC catalog maps onto the plane, demonstrating effective generalization and improving prediction accuracy compared to non-generalized models. Results from the ensemble of 25 independently trained machines find Ωm = 0.339 ± 0.056 and σ8 = 0.801 ± 0.061, inferred only from the distribution of galaxies in the light-cone slices without relying on any indirect summary statistics. A single machine that best adapts to the Gadget mocks yields a tighter prediction of Ωm = 0.282 ± 0.014 and σ8 = 0.786 ± 0.036. We emphasize that adaptation across multiple domains can enhance the robustness of the neural networks in observational data.
                    This content will become publicly available on September 19, 2026
                            
                            Toward Robustness across Cosmological Simulation Models IllustrisTNG, SIMBA, Astrid, and Swift-Eagle
                        
                    
    
            Abstract The rapid advancement of large-scale cosmological simulations has opened new avenues for cosmological and astrophysical research. However, the increasing diversity among cosmological simulation models presents a challenge to the robustness. In this work, we develop the Model-Insensitive ESTimator (Miest), a machine that can robustly estimate the cosmological parameters, Ωm and σ8, from neutral hydrogen maps of simulation models in the Cosmology and Astrophysics with MachinE Learning Simulations project: IllustrisTNG, SIMBA, Astrid, and SWIFT-Eagle. An estimator is considered robust if it possesses a consistent predictive power across all simulations, including those used during the training phase. We train our machine using multiple simulation models and ensure that it only extracts common features between the models while disregarding the model-specific features. This allows us to develop a novel model that is capable of accurately estimating parameters across a range of simulation models, without being biased toward any particular model. Upon investigating the latent space (a set of summary statistics), we find that the implementation of robustness leads to the blending of latent variables across different models, demonstrating the removal of model-specific features. In comparison to a standard machine lacking robustness, the average performance of Miest on the unseen simulations during the training phase has been improved by ∼17% for Ωm and 38% for σ8. By using a machine learning approach that can extract robust, yet physical features, we hope to improve our understanding of galaxy formation and evolution in a (subgrid) model-insensitive manner, and ultimately, gain insight into the underlying physical processes responsible for robustness.
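As an illustration of the robustness criterion in the abstract above (consistent predictive power across simulation suites), the following minimal Python sketch compares per-suite errors of two sets of predictions. The abstract does not say which error metric underlies the quoted percentages, so the use of RMSE here is an assumption, and the function names, suite labels, and numbers are hypothetical toy values rather than results from the paper.

```python
import numpy as np

def per_suite_rmse(y_true, y_pred, suite_labels):
    """Return {suite: RMSE} for predictions grouped by simulation suite."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    suite_labels = np.asarray(suite_labels)
    result = {}
    for suite in np.unique(suite_labels):
        mask = suite_labels == suite
        result[str(suite)] = float(np.sqrt(np.mean((y_pred[mask] - y_true[mask]) ** 2)))
    return result

def relative_improvement(baseline_err, robust_err):
    """Fractional error reduction of the robust model relative to the baseline."""
    return (baseline_err - robust_err) / baseline_err

# Hypothetical toy data: suite "A" was seen in training, suite "B" was not.
suites = np.array(["A", "A", "B", "B"])
truth = np.array([0.30, 0.40, 0.30, 0.40])          # true parameter values
baseline_pred = np.array([0.30, 0.40, 0.40, 0.50])  # exact on A, biased on B
robust_pred = np.array([0.32, 0.38, 0.32, 0.38])    # small, similar error on both

baseline = per_suite_rmse(truth, baseline_pred, suites)
robust = per_suite_rmse(truth, robust_pred, suites)

# A robust estimator should show a small spread of error across suites.
baseline_spread = max(baseline.values()) - min(baseline.values())
robust_spread = max(robust.values()) - min(robust.values())
gain_on_unseen = relative_improvement(baseline["B"], robust["B"])
```

In this toy case the baseline is perfect on the suite it was trained on but degrades on the unseen one, while the robust predictions trade a little accuracy on the seen suite for a uniform error across both.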
        
    
- Award ID(s): 2108678
- PAR ID: 10643540
- Publisher / Repository: American Astronomical Society
- Date Published:
- Journal Name: The Astrophysical Journal
- Volume: 991
- Issue: 1
- ISSN: 0004-637X
- Page Range / eLocation ID: 120
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model following the ASTRID simulation and contains 2124 hydrodynamic simulation runs that vary three cosmological parameters (Ωm, σ8, Ωb) and four parameters controlling stellar and active galactic nucleus (AGN) feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the least baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in the galaxy populations and the baryonic impact on the matter power spectrum compared to its TNG and SIMBA counterparts, which can make machine-learning models trained on the ASTRID suite exhibit better extrapolation performance when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex nonlinear interplay between cosmology and astrophysical processes. With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.
- 
            Abstract Cosmological simulations like CAMELS and IllustrisTNG characterize hundreds of thousands of galaxies using various internal properties. Previous studies have demonstrated that machine learning can be used to infer the cosmological parameter Ωm from the internal properties of even a single randomly selected simulated galaxy. This ability was hypothesized to originate from galaxies occupying a low-dimensional manifold within a higher-dimensional galaxy property space, which shifts with variations in Ωm. In this work, we investigate how galaxies occupy the high-dimensional galaxy property space, particularly the effect of Ωm and other cosmological and astrophysical parameters on the putative manifold. We achieve this by using an autoencoder with an information-ordered bottleneck, a neural layer with adaptive compression, to perform dimensionality reduction on individual galaxy properties from CAMELS simulations, which are run with various combinations of cosmological and astrophysical parameters. We find that for an autoencoder trained on the fiducial set of parameters, the reconstruction error increases significantly when the test set deviates from fiducial values of Ωm and A_SN1, indicating that these parameters shift galaxies off the fiducial manifold. In contrast, variations in other parameters such as σ8 cause negligible error changes, suggesting galaxies shift along the manifold. These findings provide direct evidence that the ability to infer Ωm from individual galaxies is tied to the way Ωm shifts the manifold. Physically, this implies that parameters like σ8 produce galaxy property changes resembling natural scatter, while parameters like Ωm and A_SN1 create unsampled properties, extending beyond the natural scatter in the fiducial model.
- 
            Abstract There is untapped cosmological information in galaxy redshift surveys in the nonlinear regime. In this work, we use the Aemulus suite of cosmological N-body simulations to construct Gaussian process emulators of galaxy clustering statistics at small scales (0.1–50 h⁻¹ Mpc) in order to constrain cosmological and galaxy bias parameters. In addition to standard statistics (the projected correlation function w_p(r_p), the redshift-space monopole of the correlation function ξ_0(s), and the quadrupole ξ_2(s)), we emulate statistics that include information about the local environment, namely the underdensity probability function P_U(s) and the density-marked correlation function M(s). This extends the model of Aemulus III for redshift-space distortions by including new statistics sensitive to galaxy assembly bias. In recovery tests, we find that the beyond-standard statistics significantly increase the constraining power on cosmological parameters of interest: including P_U(s) and M(s) improves the precision of our constraints on Ωm by 27%, σ8 by 19%, and the growth of structure parameter, fσ8, by 12% compared to standard statistics. We additionally find that scales below ∼6 h⁻¹ Mpc contain as much information as larger scales. The density-sensitive statistics also contribute to constraining halo occupation distribution parameters and a flexible environment-dependent assembly bias model, which is important for extracting the small-scale cosmological information as well as understanding the galaxy–halo connection. This analysis demonstrates the potential of emulating beyond-standard clustering statistics at small scales to constrain the growth of structure as a test of cosmic acceleration.
- 
            Abstract Recent works have discovered a relatively tight correlation between Ωm and the properties of individual simulated galaxies. Because of this, it has been shown that constraints on Ωm can be placed using the properties of individual galaxies while accounting for uncertainties in astrophysical processes such as feedback from supernovae and active galactic nuclei. In this work, we quantify whether using the properties of multiple galaxies simultaneously can tighten those constraints. For this, we train neural networks to perform likelihood-free inference on the value of two cosmological parameters (Ωm and σ8) and four astrophysical parameters using the properties of several galaxies from thousands of hydrodynamic simulations of the CAMELS project. We find that using properties of more than one galaxy increases the precision of the Ωm inference. Furthermore, using multiple galaxies enables the inference of other parameters that were poorly constrained with one single galaxy. We show that the same subset of galaxy properties are responsible for the constraints on Ωm from one and multiple galaxies. Finally, we quantify the robustness of the model and find that without identifying the model range of validity, the model does not perform well when tested on galaxies from other galaxy formation models.
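The reconstruction-error argument in the autoencoder abstract listed above (points shifted off a learned manifold reconstruct poorly, points shifted along it do not) can be illustrated with a minimal Python sketch. This is not the information-ordered bottleneck autoencoder from that work: it swaps in PCA as a linear stand-in, and the 5-dimensional toy data, the 2-dimensional plane, and all names and shift sizes are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy manifold: a 2-D plane embedded in a 5-D "property space".
basis = np.linalg.qr(rng.normal(size=(5, 5)))[0]   # orthonormal columns
on_plane, off_plane = basis[:, :2], basis[:, 2]    # in-plane and out-of-plane directions

def sample(n, shift_along=0.0, shift_off=0.0):
    """Draw n points near the plane, optionally shifted along it or off it."""
    coords = rng.normal(size=(n, 2)) + shift_along  # position within the plane
    noise = 0.01 * rng.normal(size=(n, 5))          # small isotropic scatter
    return coords @ on_plane.T + shift_off * off_plane + noise

# "Train" the linear stand-in on fiducial data: keep the top two principal axes.
fiducial = sample(2000)
mean = fiducial.mean(axis=0)
_, _, vt = np.linalg.svd(fiducial - mean, full_matrices=False)
components = vt[:2]                                 # shape (2, 5)

def reconstruction_error(x):
    """Mean squared distance between points and their 2-D reconstruction."""
    centered = x - mean
    recon = centered @ components.T @ components
    return float(np.mean(np.sum((centered - recon) ** 2, axis=1)))

err_fiducial = reconstruction_error(sample(500))
err_along = reconstruction_error(sample(500, shift_along=2.0))  # moves within the plane
err_off = reconstruction_error(sample(500, shift_off=2.0))      # leaves the plane
```

Under these assumptions, `err_along` stays close to `err_fiducial` while `err_off` is far larger, which mirrors the distinction the abstract draws between parameters that move galaxies along the manifold and those that move them off it.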
 An official website of the United States government
