Abstract As the next generation of large galaxy surveys comes online, it is becoming increasingly important to develop and understand the machine-learning tools that analyze big astronomical data. Neural networks are powerful and capable of probing deep patterns in data, but they must be trained carefully on large and representative data sets. We present a new “hump” of the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project: CAMELS-SAM, encompassing one thousand dark-matter-only simulations of (100 h⁻¹ cMpc)³ with different cosmological parameters (Ωₘ and σ₈) and run through the Santa Cruz semi-analytic model (SC-SAM) for galaxy formation over a broad range of astrophysical parameters. As a proof of concept for the power of this vast suite of simulated galaxies in a large volume and broad parameter space, we probe the power of simple clustering summary statistics to marginalize over astrophysics and constrain cosmology using neural networks. We use the two-point correlation, count-in-cells, and void probability functions, and we probe nonlinear and linear scales across 0.68 < R < 27 h⁻¹ cMpc. We find that our neural networks can marginalize over the uncertainties in astrophysics to constrain cosmology to 3%–8% error across various types of galaxy selections, while simultaneously learning about the SC-SAM astrophysical parameters. This work encompasses vital first steps toward creating algorithms able to marginalize over the uncertainties in our galaxy formation models and measure the underlying cosmology of our Universe. CAMELS-SAM has been publicly released alongside the rest of CAMELS, and it offers great potential to many applications of machine learning in astrophysics: https://camels-sam.readthedocs.io.
Debiasing with Diffusion: Probabilistic Reconstruction of Dark Matter Fields from Galaxies with CAMELS
Abstract Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter (DM) components that cannot be directly observed. Galaxy formation simulations can be used to study the relationship between DM density fields and galaxy distributions. However, this relationship can be sensitive to assumptions in cosmology and astrophysical processes embedded in galaxy formation models, which remain uncertain in many aspects. In this work, we develop a diffusion generative model to reconstruct DM fields from galaxies. The diffusion model is trained on the CAMELS simulation suite that contains thousands of state-of-the-art galaxy formation simulations with varying cosmological parameters and subgrid astrophysics. We demonstrate that the diffusion model can predict the unbiased posterior distribution of the underlying DM fields from the given stellar density fields while being able to marginalize over uncertainties in cosmological and astrophysical models. Interestingly, the model generalizes to simulation volumes ≈500 times larger than those it was trained on and across different galaxy formation models. The code for reproducing these results can be found at https://github.com/victoriaono/variational-diffusion-cdm.
- Award ID(s): 2019786
- PAR ID: 10528355
- Publisher / Repository: DOI PREFIX: 10.3847
- Date Published:
- Journal Name: The Astrophysical Journal
- Volume: 970
- Issue: 2
- ISSN: 0004-637X
- Format(s): Medium: X Size: Article No. 174
- Size(s): Article No. 174
- Sponsoring Org: National Science Foundation
More Like this
Abstract While space-borne optical and near-infrared facilities have succeeded in delivering a precise and spatially resolved picture of our Universe, their small survey area is known to underrepresent the true diversity of galaxy populations. Ground-based surveys have reached comparable depths but at lower spatial resolution, resulting in source confusion that hampers accurate photometry extractions. What once was limited to the infrared regime has now begun to challenge ground-based ultradeep surveys, affecting detection and photometry alike. Failing to address these challenges will mean forfeiting a representative view into the distant Universe. We introduce The Farmer: an automated, reproducible profile-fitting photometry package that pairs a library of smooth parametric models from The Tractor with a decision tree that determines the best-fit model in concert with neighboring sources. Photometry is measured by fitting the models on other bands leaving brightness free to vary. The resulting photometric measurements are naturally total, and no aperture corrections are required. Supporting diagnostics (e.g., χ²) enable measurement validation. As fitting models is relatively time intensive, The Farmer is built with high-performance computing routines. We benchmark The Farmer on a set of realistic COSMOS-like images and find accurate photometry, number counts, and galaxy shapes. The Farmer is already being utilized to produce catalogs for several large-area deep extragalactic surveys where it has been shown to tackle some of the most challenging optical and near-infrared data available, with the promise of extending to other ultradeep surveys expected in the near future. The Farmer is available to download from GitHub (https://github.com/astroweaver/the_farmer) and Zenodo (https://doi.org/10.5281/zenodo.8205817).
Abstract New observational facilities are probing astrophysical transients such as stellar explosions and gravitational-wave sources at ever-increasing redshifts, while also revealing new features in source property distributions. To interpret these observations, we need to compare them to predictions from stellar population models. Such models require the metallicity-dependent cosmic star formation history as an input. Large uncertainties remain in the shape and evolution of this function. In this work, we propose a simple analytical function for the metallicity-dependent cosmic star formation history. Variations of this function can be easily interpreted because the parameters link to its shape in an intuitive way. We fit our analytical function to the star-forming gas of the cosmological TNG100 simulation and find that it is able to capture the main behavior well. As an example application, we investigate the effect of systematic variations in the parameters on the predicted mass distribution of locally merging binary black holes. Our main findings are that (i) the locations of features are remarkably robust against variations in the metallicity-dependent cosmic star formation history, and (ii) the low-mass end is least affected by these variations. This is promising as it increases our chances of constraining the physics that govern the formation of these objects (https://github.com/LiekeVanSon/SFRD_fit/tree/7348a1ad0d2ed6b78c70d5100fb3cd2515493f02/).
dadi-cli: Automated and distributed population genetic model inference from allele frequency spectra
Abstract Summary: dadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. But using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and also enable straightforward distributed computing. Availability and Implementation: dadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available at https://github.com/xin-huang/dadi-cli. dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2 (https://cacao.jetstream-cloud.org/).
Abstract The abundance of faint dwarf galaxies is determined by the underlying population of low-mass dark matter (DM) halos and the efficiency of galaxy formation in these systems. Here, we quantify potential galaxy formation and DM constraints from future dwarf satellite galaxy surveys. We generate satellite populations using a suite of Milky Way (MW)–mass cosmological zoom-in simulations and an empirical galaxy–halo connection model, and assess sensitivity to galaxy formation and DM signals when marginalizing over galaxy–halo connection uncertainties. We find that a survey of all satellites around one MW-mass host can constrain a galaxy formation cutoff at peak virial masses of at the 1σ level; however, a tail toward low prevents a 2σ measurement. In this scenario, combining hosts with differing bright satellite abundances significantly reduces uncertainties on at the 1σ level, but the 2σ tail toward low persists. We project that observations of one (two) complete satellite populations can constrain warm DM models with m_WDM ≈ 10 keV (20 keV). Subhalo mass function (SHMF) suppression can be constrained to ≈70%, 60%, and 50% that in cold dark matter (CDM) at peak virial masses of 10⁸, 10⁹, and 10¹⁰ M⊙, respectively; SHMF enhancement constraints are weaker (≈20, 4, and 2 times that in CDM, respectively) due to galaxy–halo connection degeneracies. These results motivate searches for faint dwarf galaxies beyond the MW and indicate that ongoing missions like Euclid and upcoming facilities including the Vera C. Rubin Observatory and Nancy Grace Roman Space Telescope will probe new galaxy formation and DM physics.