Title: Neural Networks as Optimal Estimators to Marginalize Over Baryonic Effects
Abstract Many studies have shown that a wealth of cosmological information resides on small, nonlinear scales. Unfortunately, two challenges must be overcome to utilize that information. First, we do not know the optimal estimator that would allow us to retrieve the maximum information. Second, baryonic effects impact that regime significantly and in a poorly understood manner. Ideally, we would like an estimator that extracts the maximum cosmological information while marginalizing over baryonic effects. In this work we show that neural networks can achieve this in some simple scenarios. We make use of data for which the maximum amount of cosmological information is known: power spectra and 2D Gaussian density fields. We contaminate the data with simplified baryonic effects and train neural networks to predict the values of the cosmological parameters. For these data, we show that neural networks can (1) extract the maximum available cosmological information, (2) marginalize over baryonic effects, and (3) extract cosmological information that is buried in the regime dominated by baryonic physics. We also show that neural networks learn the priors of the data they are trained on, which affects their extrapolation properties. We conclude that a promising strategy to maximize the scientific return of cosmological experiments is to train neural networks on state-of-the-art numerical simulations with different strengths and implementations of baryonic effects.
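The marginalization idea above can be sketched in a toy setting. The following is an illustrative stand-in for the paper's pipeline, not its actual code: mock "power spectra" P(k) = A·k⁻¹·(1 + b·k²) are generated with a target parameter A and a nuisance "baryonic" amplitude b that contaminates small scales (high k); drawing b at random during training forces a small network to marginalize over it. All functional forms and hyperparameters are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
k = np.linspace(0.1, 1.0, 20)  # mock wavenumber bins

def make_set(n):
    A = rng.uniform(0.5, 1.5, n)              # target "cosmological" parameter
    b = rng.uniform(0.0, 1.0, n)              # nuisance amplitude, varied per sample
    X = A[:, None] * k**-1 * (1.0 + b[:, None] * k**2)
    return X, A

X, y = make_set(2000)
mu, sd = X.mean(0), X.std(0)                  # standardize inputs
Xn = (X - mu) / sd

# One-hidden-layer network trained with full-batch gradient descent on MSE.
W1 = rng.normal(0, 0.1, (20, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, 16); b2 = 0.0
lr = 0.02
for _ in range(3000):
    h = np.tanh(Xn @ W1 + b1)
    err = h @ W2 + b2 - y                     # gradient of MSE w.r.t. predictions
    W2 -= lr * (h.T @ err) / len(y); b2 -= lr * err.mean()
    dh = np.outer(err, W2) * (1 - h**2)       # backprop through tanh
    W1 -= lr * (Xn.T @ dh) / len(y); b1 -= lr * dh.mean(0)

# Evaluate on fresh spectra with unseen nuisance draws.
Xt, yt = make_set(500)
pred = np.tanh(((Xt - mu) / sd) @ W1 + b1) @ W2 + b2
rel_err = float(np.mean(np.abs(pred - yt) / yt))
print(round(rel_err, 3))                      # mean relative error on held-out data
```

Because b varies sample-to-sample during training, the network cannot rely on the contaminated high-k bins alone and learns a b-insensitive estimate of A, which is the marginalization behavior the abstract describes.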
Award ID(s):
2108944
PAR ID:
10331327
Author(s) / Creator(s):
Date Published:
Journal Name:
The Astrophysical Journal
Volume:
928
Issue:
1
ISSN:
0004-637X
Page Range / eLocation ID:
44
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We train graph neural networks on halo catalogs from Gadget N-body simulations to perform field-level likelihood-free inference of cosmological parameters. The catalogs contain ≲5000 halos with masses ≳10^10 h^-1 M_⊙ in a periodic volume of (25 h^-1 Mpc)^3; every halo in the catalog is characterized by several properties such as position, mass, velocity, concentration, and maximum circular velocity. Our models, built to be permutationally, translationally, and rotationally invariant, do not impose a minimum scale on which to extract information and are able to infer the values of Ω_m and σ_8 with a mean relative error of ∼6% when using positions plus velocities and positions plus masses, respectively. More importantly, we find that our models are very robust: they can infer the values of Ω_m and σ_8 when tested on halo catalogs from thousands of N-body simulations run with five different N-body codes: Abacus, CUBEP³M, Enzo, PKDGrav3, and Ramses. Surprisingly, the model trained to infer Ω_m also works when tested on thousands of state-of-the-art CAMELS hydrodynamic simulations run with four different codes and subgrid physics implementations. Using halo properties such as concentration and maximum circular velocity allows our models to extract more information, at the expense of breaking their robustness. This may happen because the different N-body codes are not converged on the relevant scales corresponding to these properties.
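The permutation invariance this abstract describes can be illustrated with a minimal deep-sets-style readout (a simplified stand-in for the paper's graph neural network, with hypothetical weights and feature names): each halo is encoded independently, and a symmetric sum pools the encodings, so the prediction cannot depend on the order of halos in the catalog.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(3, 8))   # hypothetical per-halo encoder weights
w_out = rng.normal(size=8)        # hypothetical readout weights

def predict(halos):
    """halos: (n_halos, 3) array of e.g. mass, velocity, concentration."""
    h = np.tanh(halos @ W_enc)    # encode each halo independently
    pooled = h.sum(axis=0)        # symmetric pooling: order cannot matter
    return float(pooled @ w_out)

catalog = rng.normal(size=(100, 3))
shuffled = catalog[rng.permutation(100)]
print(np.isclose(predict(catalog), predict(shuffled)))  # same prediction either way
```

Translational and rotational invariance in the paper's models require additional structure on the position and velocity inputs; this sketch only demonstrates the permutation part.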
  2. Abstract Upcoming photometric surveys will discover tens of thousands of Type Ia supernovae (SNe Ia), vastly outpacing the capacity of our spectroscopic resources. To maximize the scientific return of these observations in the absence of spectroscopic information, we must accurately extract key parameters, such as SN redshifts, from photometric information alone. We present Photo-zSNthesis, a convolutional neural network-based method for predicting full redshift probability distributions from multi-band supernova light curves, tested on simulated Sloan Digital Sky Survey (SDSS) and Vera C. Rubin Observatory Legacy Survey of Space and Time data as well as observed SDSS SNe. We show major improvements over predictions from existing methods on both simulations and real observations, as well as minimal redshift-dependent bias, which is a challenge due to selection effects, e.g., Malmquist bias. Specifically, we show a 61× improvement in prediction bias 〈Δz〉 on PLAsTiCC simulations and a 5× improvement on real SDSS data compared to results from LCFIT+Z, a widely used photometric redshift estimator. The PDFs produced by this method are well constrained and will maximize the cosmological constraining power of photometric SNe Ia samples.
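One common way to have a network output a full redshift probability distribution rather than a point estimate, as this abstract describes, is to discretize redshift into narrow bins and apply a softmax over them. The sketch below illustrates only that output-head idea; the bin grid and the random logits standing in for a CNN's final layer are assumptions, not Photo-zSNthesis itself.

```python
import numpy as np

z_bins = np.linspace(0.01, 1.0, 50)      # hypothetical redshift bin centers

def softmax(v):
    e = np.exp(v - v.max())              # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=50)             # stand-in for a CNN's final-layer outputs
pdf = softmax(logits)

print(round(float(pdf.sum()), 6))        # sums to 1: a valid distribution over bins
z_mean = float(pdf @ z_bins)             # a point estimate, if one is needed
```

The full PDF preserves multimodality and uncertainty information that a single predicted redshift would discard, which is what makes such outputs useful for downstream cosmological constraints.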
  3. Multiview analysis aims to extract common information from data entities across different domains (e.g., acoustic, visual, text). Canonical correlation analysis (CCA) is one of the classic tools for this problem; it estimates the shared latent information by linearly transforming the different views of the data. CCA has also been generalized to the nonlinear regime, where kernel methods and neural networks replace the linear transforms. While the theoretical aspects of linear CCA are relatively well understood, nonlinear multiview analysis is still largely intuition-driven. In this work, our interest lies in the identifiability of the shared latent information under a nonlinear multiview analysis framework. We propose a model identification criterion for learning latent information from multiview data under a reasonable data-generating model. We show that minimizing this criterion identifies the shared latent information up to certain indeterminacies. We also propose a neural network-based implementation and an efficient algorithm to realize the criterion. Our analysis is backed by experiments on both synthetic and real data.
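The linear CCA baseline this abstract builds on can be computed in a few lines: whiten each view, then take the SVD of the whitened cross-covariance. The synthetic two-view data below (a shared 1-D latent observed through random linear maps plus noise) are an illustrative assumption, not the paper's data-generating model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                                 # shared latent variable
view1 = z[:, None] * rng.normal(size=3) + 0.3 * rng.normal(size=(n, 3))
view2 = z[:, None] * rng.normal(size=4) + 0.3 * rng.normal(size=(n, 4))

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w**-0.5) @ V.T

def cca_top(X, Y):
    """Top canonical correlation and the corresponding projection directions."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    Cxx = X.T @ X / n; Cyy = Y.T @ Y / n; Cxy = X.T @ Y / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)            # whitened cross-covariance
    return s[0], Kx @ U[:, 0], Ky @ Vt[0]

rho, a, b = cca_top(view1, view2)
u = (view1 - view1.mean(0)) @ a                        # projection of view 1
print(round(float(rho), 2))                            # top canonical correlation
print(abs(np.corrcoef(u, z)[0, 1]) > 0.8)              # projection tracks the latent
```

The nonlinear setting the paper studies replaces the linear projections with neural networks, which is precisely where the identifiability question it addresses becomes nontrivial.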
  4. Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that the straight-through estimator, a method introduced to train binary neural networks, can effectively be applied to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss with gradient descent via the straight-through estimator updates the neural network's weights in the direction in which the binarized outputs satisfy the logical constraints. The experimental results show that, by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation to compute gradients. We also demonstrate that our method applies to different types of neural networks, such as MLPs, CNNs, and GNNs, making them learn with no or fewer labeled data by learning directly from known constraints.
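The core mechanism can be shown on a toy constraint. Below, a minimal sketch enforces "exactly one of three binary outputs is true" directly on raw logits: the forward pass uses a hard threshold, and the backward pass pretends the threshold is the identity (the straight-through estimator), so the constraint loss on the binarized outputs still produces gradients for the real-valued logits. The starting logits, loss, and learning rate are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np

logits = np.array([0.7, 0.4, -0.9])       # real-valued pre-binarization outputs

def binarize(x):
    return (x > 0).astype(float)          # non-differentiable forward pass

lr = 0.05
for _ in range(200):
    b = binarize(logits)                  # discrete outputs used in the loss
    g_b = 2.0 * (b.sum() - 1.0)           # dL/db_i for L = (sum(b) - 1)^2
    logits -= lr * np.full(3, g_b)        # STE: gradient passes straight through

result = binarize(logits)
print(result)                             # exactly one output remains true
```

Once the binarized outputs satisfy the constraint, the loss and its straight-through gradient both vanish, so training is stable at a constraint-satisfying solution; this is the fixed-point behavior the loss construction in the paper relies on.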
  5. Abstract Artificial neural networks are increasingly used for geophysical modeling to extract complex nonlinear patterns from geospatial data. However, it is difficult to understand how networks make predictions, limiting trust in the model, debugging capacity, and physical insight. EXplainable Artificial Intelligence (XAI) techniques expose how models make predictions, but XAI results may be influenced by correlated features. Geospatial data typically exhibit substantial autocorrelation. With correlated input features, learning methods can produce many networks that achieve very similar performance (e.g., arising from different initializations). Since these networks capture different relationships, their attributions can vary. Correlated features may also cause inaccurate attributions because XAI methods typically evaluate isolated features, whereas networks learn multifeature patterns. Few studies have quantitatively analyzed the influence of correlated features on XAI attributions. We use a benchmark framework of synthetic data with increasingly strong correlation, for which the ground-truth attribution is known. For each dataset, we train multiple networks and compare XAI-derived attributions to the ground truth. We show that correlation may dramatically increase the variance of the derived attributions, and we investigate the cause of this high variance: is it because different trained networks learn highly different functions, or because XAI methods become less faithful in the presence of correlation? Finally, we show that applying XAI to superpixels, instead of single grid cells, substantially decreases attribution variance. Our study is the first to quantify the effects of strong correlation on XAI, to investigate the reasons that underlie these effects, and to offer a promising way to address them.
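The attribution-variance phenomenon this abstract studies can be reproduced in a deliberately simple setting. In the sketch below (an illustrative construction, not the paper's benchmark), the target depends only on feature 0, but feature 1 is a near-perfect copy of it; gradient descent from different random initializations then stops at different splits of credit between the two features, so the learned coefficients, which here play the role of attributions, vary widely even though the models predict almost identically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x0 = rng.normal(size=n)
x1 = x0 + 1e-3 * rng.normal(size=n)        # strongly correlated duplicate feature
X = np.stack([x0, x1], axis=1)
y = x0                                      # ground truth: all credit to feature 0

coefs = []
for seed in range(20):                      # many "equally good" models
    w = np.random.default_rng(seed).normal(size=2)
    for _ in range(200):                    # plain gradient descent on MSE
        grad = X.T @ (X @ w - y) / n
        w -= 0.1 * grad
    coefs.append(w)
coefs = np.array(coefs)

attr_spread = float(coefs[:, 0].std())      # variance in credit given to feature 0
pred_spread = float((X @ coefs.T - y[:, None]).std())  # predictions barely differ
print(round(attr_spread, 2), round(pred_spread, 4))
```

The near-duplicate feature creates a nearly flat direction in the loss, which gradient descent never resolves; this is a linear analogue of the many similarly performing networks with differing attributions that the abstract describes.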