  Free, publicly-accessible full text available April 1, 2023
  Abstract We present the Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) Multifield Data set (CMD), a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from more than 2000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span ∼100 million light-years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine-learning models, CMD is the largest data set of its kind, containing more than 70 TB of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at .
    We investigate the sensitivity to the effects of lensing magnification on large-scale structure analyses combining photometric cosmic shear and galaxy clustering data (i.e. the now commonly called ‘3 × 2-point’ analysis). Using a Fisher matrix bias formalism, we disentangle the contribution to the bias on cosmological parameters caused by ignoring the effects of magnification in a theory fit from individual elements in the data vector, for Stage-III and Stage-IV surveys. We show that the removal of elements of the data vectors that are dominated by magnification does not guarantee a reduction in the cosmological bias due to the magnification signal, but can instead increase the sensitivity to magnification. We find that the most sensitive elements of the data vector come from the shear-clustering cross-correlations, particularly between the highest redshift shear bin and any lower redshift lens sample, and that the parameters $\Omega_\mathrm{M}$, $S_8=\sigma_8\sqrt{\Omega_\mathrm{M}/0.3}$, and $w_0$ show the most significant biases for both survey models. Our forecasts predict that current analyses are not significantly biased by magnification, but this bias will become highly significant with the continued increase of statistical power in the near future. We therefore conclude that future surveys should measure and model the magnification as part of their flagship ‘3 × 2-point’ analysis.
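
    For readers unfamiliar with the Fisher matrix bias formalism mentioned above, the following is a minimal sketch of the standard linearized estimate, not the authors' pipeline. It assumes a Gaussian likelihood with a fixed data covariance, and every name in it (`fisher_bias`, `s8`, `jacobian`, `cov`, `delta_d`) is hypothetical and chosen for illustration. Given the derivatives of the model data vector with respect to the parameters and an unmodelled shift in the data vector (here standing in for a neglected magnification signal), the estimated parameter bias is the inverse Fisher matrix applied to the projection of that shift onto the derivatives. The toy numbers are made up and carry no physical meaning.

```python
import numpy as np


def s8(sigma8, omega_m):
    """S_8 = sigma_8 * sqrt(Omega_M / 0.3), as defined in the abstract."""
    return sigma8 * np.sqrt(omega_m / 0.3)


def fisher_bias(jacobian, cov, delta_d):
    """Linearized parameter bias from an unmodelled data-vector shift.

    jacobian : (n_data, n_params) derivatives of the model data vector
    cov      : (n_data, n_data) data covariance (assumed parameter independent)
    delta_d  : (n_data,) signal left out of the model
    """
    cov_inv = np.linalg.inv(cov)
    fisher = jacobian.T @ cov_inv @ jacobian       # F_ij
    projection = jacobian.T @ cov_inv @ delta_d    # B_i
    return np.linalg.solve(fisher, projection)     # F^{-1} B


# Toy setup: 3 data points, 2 parameters (illustrative values only).
toy_jacobian = np.array([[1.0, 0.5],
                         [0.2, 1.0],
                         [0.7, 0.3]])
toy_cov = np.diag([0.04, 0.09, 0.01])

# If the neglected signal is exactly what a parameter shift would produce,
# the formalism recovers that shift.
true_shift = np.array([0.1, -0.2])
toy_delta_d = toy_jacobian @ true_shift
estimated_bias = fisher_bias(toy_jacobian, toy_cov, toy_delta_d)
```

    In a real forecast the rows of the Jacobian would correspond to elements of the 3 × 2-point data vector, so dropping rows (and the matching rows and columns of the covariance) changes both the Fisher matrix and the projection. That is why, in this linear picture, removing elements need not reduce the resulting bias.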

