
Title: Inferring Halo Masses with Graph Neural Networks
Abstract: Understanding the halo–galaxy connection is fundamental to improving our knowledge of the nature and properties of dark matter. In this work, we build a model that infers the mass of a halo given the positions, velocities, stellar masses, and radii of the galaxies it hosts. To capture information from correlations among galaxy properties and their phase space, we use Graph Neural Networks (GNNs), which are designed to work with irregular and sparse data. We train our models on galaxies from more than 2000 state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. Our model, which accounts for cosmological and astrophysical uncertainties, is able to constrain the masses of the halos with a ∼0.2 dex accuracy. Furthermore, a GNN trained on one suite of simulations preserves part of its accuracy when tested on simulations run with a different code that uses a distinct subgrid physics model, showing the robustness of our method. The PyTorch Geometric implementation of the GNN is publicly available on GitHub.
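As a rough illustration of the graph representation the abstract describes, the sketch below turns the galaxies of a single halo into a graph using plain Python. Each galaxy becomes a node carrying its position, velocity, stellar mass, and radius, and pairs of galaxies closer than a linking radius are joined by edges. The function names, the linking-radius criterion, the centroid-relative positions, the log scaling, and the toy values are all assumptions made for illustration; none of them is taken from the published PyTorch Geometric code.

```python
import math


def build_edges(positions, linking_radius):
    """Return directed edges (i, j) for galaxy pairs closer than linking_radius.

    The linking-radius rule is an assumed graph-construction choice.
    """
    edges = []
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= linking_radius:
                edges.append((i, j))
    return edges


def node_features(positions, velocities, stellar_masses, radii):
    """Build one 8-component feature vector per galaxy.

    Positions are expressed relative to the centroid of the galaxies, and
    stellar mass and radius are log-scaled (both are assumed preprocessing steps).
    """
    n = len(positions)
    centroid = [sum(p[k] for p in positions) / n for k in range(3)]
    features = []
    for p, v, m, r in zip(positions, velocities, stellar_masses, radii):
        relative = [p[k] - centroid[k] for k in range(3)]
        features.append(relative + list(v) + [math.log10(m), math.log10(r)])
    return features


def dex_to_factor(dex):
    """Convert a scatter in dex (log10 units) into a multiplicative factor."""
    return 10.0 ** dex


# Toy halo with three galaxies (hypothetical values in arbitrary units).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
velocities = [(10.0, 0.0, 0.0), (-5.0, 2.0, 0.0), (0.0, 0.0, 1.0)]
stellar_masses = [1e10, 3e9, 5e8]
radii = [5.0, 3.0, 1.5]

edges = build_edges(positions, linking_radius=1.5)
features = node_features(positions, velocities, stellar_masses, radii)
```

In this toy halo only the first two galaxies lie within the linking radius of each other, so the third remains an isolated node. For scale, `dex_to_factor(0.2)` is about 1.58, so a ∼0.2 dex accuracy corresponds to halo masses recovered to within a factor of roughly 1.6.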
Journal Name:
The Astrophysical Journal
Medium: X
Sponsoring Org:
National Science Foundation
More Like this

    Cosmological simulations are reaching the resolution necessary to study ultra-faint dwarf galaxies. Observations indicate that in small populations, the stellar initial mass function (IMF) is not fully populated; rather, stars are sampled in a way that can be approximated as coming from an underlying probability density function. To ensure the accuracy of cosmological simulations in the ultra-faint regime, we present an improved treatment of the IMF. We implement a self-consistent, stochastically populated IMF in cosmological hydrodynamic simulations. We test our method using high-resolution simulations of a Milky Way halo, run to z = 6, yielding a sample of nearly 100 galaxies. We also use an isolated dwarf galaxy to investigate the resulting systematic differences in galaxy properties. We find that a stochastic IMF in simulations makes feedback burstier, strengthening feedback, and quenching star formation earlier in small dwarf galaxies. For galaxies in haloes with mass ≲ 10^8.5 M⊙, a stochastic IMF typically leads to lower stellar mass compared to a continuous IMF, sometimes by more than an order of magnitude. We show that existing methods of ensuring discrete supernovae incorrectly determine the mass of the star particle and its associated feedback. This leads to overcooling of surrounding gas, with at least ∼10 per cent higher star formation and ∼30 per cent higher cold gas content. Going forwards, to accurately model dwarf galaxies and compare to observations, it will be necessary to incorporate a stochastically populated IMF that samples the full spectrum of stellar masses.
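To make the idea of a stochastically populated IMF concrete, here is a minimal sketch that draws individual stellar masses from a probability density function until a star particle's mass budget is filled. It assumes a single power law dN/dm ∝ m^(-alpha) with a Salpeter-like slope of 2.35, mass limits of 0.1 and 100 M⊙, an 8 M⊙ threshold for massive stars, and a simple "draw until the target is reached" stopping rule. All of these, along with the function names, are illustrative assumptions and not the scheme implemented in the simulations described above.

```python
import random


def sample_powerlaw_mass(rng, m_min, m_max, alpha):
    """Draw one stellar mass from dN/dm ∝ m^(-alpha) on [m_min, m_max].

    Uses inverse-transform sampling of the power-law CDF; requires alpha != 1.
    """
    k = 1.0 - alpha
    u = rng.random()
    return (m_min**k + u * (m_max**k - m_min**k)) ** (1.0 / k)


def populate_star_particle(target_mass, rng, m_min=0.1, m_max=100.0, alpha=2.35):
    """Draw stars one at a time until their summed mass reaches target_mass.

    The stopping rule is an assumed choice; the final draw may overshoot
    the target slightly.
    """
    stars = []
    total = 0.0
    while total < target_mass:
        mass = sample_powerlaw_mass(rng, m_min, m_max, alpha)
        stars.append(mass)
        total += mass
    return stars


def count_massive(stars, threshold=8.0):
    """Count stars above an assumed mass threshold (in solar masses)."""
    return sum(1 for mass in stars if mass > threshold)


# Two star particles with the same mass budget but different random draws.
particle_a = populate_star_particle(500.0, random.Random(1))
particle_b = populate_star_particle(500.0, random.Random(2))
massive_a = count_massive(particle_a)
massive_b = count_massive(particle_b)
```

Because each particle is filled by independent draws, two particles with the same mass budget can host different numbers of massive stars, which is the kind of particle-to-particle variation that a continuous, fully populated IMF averages away.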


    In order to prepare for the upcoming wide-field cosmological surveys, large simulations of the Universe with realistic galaxy populations are required. In particular, the tendency of galaxies to naturally align towards overdensities, an effect called intrinsic alignments (IA), can be a major source of systematics in the weak lensing analysis. As the details of galaxy formation and evolution relevant to IA cannot be simulated in practice on such volumes, we propose as an alternative a Deep Generative Model. This model is trained on the IllustrisTNG-100 simulation and is capable of sampling the orientations of a population of galaxies so as to recover the correct alignments. In our approach, we model the cosmic web as a set of graphs, where the graphs are constructed for each halo, and galaxy orientations as a signal on those graphs. The generative model is implemented on a Generative Adversarial Network architecture and uses specifically designed Graph-Convolutional Networks sensitive to the relative 3D positions of the vertices. Given (sub)halo masses and tidal fields, the model is able to learn and predict scalar features such as galaxy and dark matter subhalo shapes; and more importantly, vector features such as the 3D orientation of the major axis of the ellipsoid and the complex 2D ellipticities. For correlations of 3D orientations the model is in good quantitative agreement with the measured values from the simulation, except for at very small and transition scales. For correlations of 2D ellipticities, the model is in good quantitative agreement with the measured values from the simulation on all scales. Additionally, the model is able to capture the dependence of IA on mass, morphological type, and central/satellite type.


    The physical origin of the seeds of supermassive black holes (SMBHs), with postulated initial masses ranging from ∼10^5 M⊙ to as low as ∼10^2 M⊙, is currently unknown. Most existing cosmological hydrodynamic simulations adopt very simple, ad hoc prescriptions for BH seeding and seed at unphysically high masses ∼10^5–10^6 M⊙. In this work, we introduce a novel sub-grid BH seeding model for cosmological simulations that is directly calibrated to high-resolution zoom simulations that explicitly resolve ∼10^3 M⊙ seeds forming within haloes with pristine, dense gas. We trace the BH growth along galaxy merger trees until their descendants reach masses of ∼10^4 or 10^5 M⊙. The results are used to build a new stochastic seeding model that directly seeds these descendants in lower resolution versions of our zoom region. Remarkably, we find that by seeding the descendants simply based on total galaxy mass, redshift and an environmental richness parameter, we can reproduce the results of the detailed gas-based seeding model. The baryonic properties of the host galaxies are well reproduced by the mass-based seeding criterion. The redshift-dependence of the mass-based criterion captures the combined influence of halo growth, dense gas formation, and metal enrichment on the formation of ∼10^3 M⊙ seeds. The environment-based seeding criterion seeds the descendants in rich environments with higher numbers of neighbouring galaxies. This accounts for the impact of unresolved merger dominated growth of BHs, which produces faster growth of descendants in richer environments with more extensive BH merger history. Our new seed model will be useful for representing a variety of low-mass seeding channels within next-generation larger volume uniform cosmological simulations.


    We are entering an era in which we will be able to detect and characterize hundreds of dwarf galaxies within the Local Volume. It is already known that a strong dichotomy exists in the gas content and star formation properties of field dwarf galaxies versus satellite dwarfs of larger galaxies. In this work, we study the more subtle differences that may be detectable in galaxies as a function of distance from a massive galaxy, such as the Milky Way. We compare smoothed particle hydrodynamic simulations of dwarf galaxies formed in a Local Volume-like environment (several megaparsecs away from a massive galaxy) to those formed nearer to Milky Way–mass halos. We find that the impact of environment on dwarf galaxies extends even beyond the immediate region surrounding Milky Way–mass halos. Even before being accreted as satellites, dwarf galaxies near a Milky Way–mass halo tend to have higher stellar masses for their halo mass than more isolated galaxies. Dwarf galaxies in high-density environments also tend to grow faster and form their stars earlier. We show observational predictions that demonstrate how these trends manifest in lower quenching rates, higher H I fractions, and bluer colors for more isolated dwarf galaxies.


    We present a machine learning (ML) approach for the prediction of galaxies’ dark matter halo masses which achieves an improved performance over conventional methods. We train three ML algorithms (XGBoost, random forests, and neural network) to predict halo masses using a set of synthetic galaxy catalogues that are built by populating dark matter haloes in N-body simulations with galaxies and that match both the clustering and the joint distributions of properties of galaxies in the Sloan Digital Sky Survey (SDSS). We explore the correlation of different galaxy- and group-related properties with halo mass, and extract the set of nine features that contribute the most to the prediction of halo mass. We find that mass predictions from the ML algorithms are more accurate than those from halo abundance matching (HAM) or dynamical mass estimates (DYN). Since the danger of this approach is that our training data might not accurately represent the real Universe, we explore the effect of testing the model on synthetic catalogues built with different assumptions than the ones used in the training phase. We test a variety of models with different ways of populating dark matter haloes, such as adding velocity bias for satellite galaxies. We determine that, though training and testing on different data can lead to systematic errors in predicted masses, the ML approach still yields substantially better masses than either HAM or DYN. Finally, we apply the trained model to a galaxy and group catalogue from the SDSS DR7 and present the resulting halo masses.
