Many domains of science have developed complex simulations to describe phenomena of interest. While these simulations provide high-fidelity models, they are poorly suited for inference and lead to challenging inverse problems. We review the rapidly developing field of simulation-based inference and identify the forces giving additional momentum to the field. Finally, we describe how the frontier is expanding so that a broad audience can appreciate the profound influence these developments may have on science.
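One of the simplest simulation-based inference techniques the field builds on is rejection ABC (approximate Bayesian computation): draw parameters from the prior, run the simulator, and keep only draws whose simulated data resemble the observation. A minimal sketch, using an invented toy Gaussian simulator rather than any real scientific model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=100):
    # Toy simulator: data are Gaussian with unknown mean theta.
    return rng.normal(theta, 1.0, size=n)

observed = simulator(2.0)  # stand-in for experimental data

def rejection_abc(observed, n_draws=20000, eps=0.1):
    # Rejection ABC: sample from the prior, keep parameters whose
    # simulated summary statistic lies within eps of the observed one.
    s_obs = observed.mean()
    prior_draws = rng.uniform(-5.0, 5.0, size=n_draws)
    accepted = [t for t in prior_draws
                if abs(simulator(t).mean() - s_obs) < eps]
    return np.array(accepted)

posterior = rejection_abc(observed)
print(posterior.mean())  # should be near the true mean of 2.0
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is exactly the setting where high-fidelity simulators make classical inference intractable.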
- NSF-PAR ID: 10157149
- Publisher / Repository: Proceedings of the National Academy of Sciences
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- ISSN: 0027-8424
- Page Range / eLocation ID: Article No. 201912789
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In this position paper, we describe research on knowledge graph-empowered materials science prediction and discovery. The research consists of several key components, including ontology mapping, materials data annotation, and information extraction from unstructured scholarly articles. We argue that although big data generated by simulations and experiments have motivated and accelerated data-driven science, the distribution and heterogeneity of materials science-related big data hinder major advancements in the field. Knowledge graphs, acting as semantic hubs, integrate disparate data and provide a feasible way to address this challenge. We design a knowledge-graph-based approach for data discovery, extraction, and integration in materials science.
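The "semantic hub" role can be illustrated with a toy triple store: once facts from disparate sources share entities, a single traversal answers questions that no one source answers alone. All entities, predicates, and values below are invented for illustration and are not from the paper:

```python
# Facts from two hypothetical sources, linked through shared entities.
triples = [
    # from a simulation database (hypothetical values)
    ("GaN", "hasBandGapEV", 3.4),
    ("GaN", "crystallizesIn", "wurtzite"),
    # extracted from an unstructured scholarly article (hypothetical)
    ("wurtzite", "hasSpaceGroup", "P6_3mc"),
]

def objects(subject, predicate):
    # Return every object linked to `subject` by `predicate`.
    return [o for s, p, o in triples if s == subject and p == predicate]

# Cross-source query: what is the space group of GaN's structure?
structure = objects("GaN", "crystallizesIn")[0]
print(objects(structure, "hasSpaceGroup"))  # ['P6_3mc']
```

Neither source alone relates GaN to a space group; the join through the shared "wurtzite" node is what the knowledge-graph integration provides.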
-
We present geometrical and physical optics simulation results for the Simons Observatory Large Aperture Telescope. This work was developed as part of the general design process for the telescope, allowing us to evaluate the impact of various design choices on performance metrics and potential systematic effects. The primary goal of the simulations was to evaluate the final design of the reflectors and the cold optics that are now being built. We describe nonsequential ray tracing used to inform the design of the cold optics, including absorbers internal to each optics tube. We discuss ray tracing simulations of the telescope structure that allow us to determine geometries that minimize detector loading and mitigate spurious near-field effects that have not been resolved by the internal baffling. We also describe physical optics simulations, performed over a range of frequencies and field locations, that produce estimates of monochromatic far-field beam patterns, which in turn are used to gauge general optical performance. Finally, we describe simulations that shed light on beam sidelobes from panel gap diffraction.
-
Developing methods of automated inference that can provide users with compelling human-readable justifications for why the answer to a question is correct is critical for domains such as science and medicine, where user trust and detecting costly errors are limiting factors to adoption. One of the central barriers to training question answering models on explainable inference tasks is the lack of gold explanations to serve as training data. In this paper we present a corpus of explanations for standardized science exams, a recent challenge task for question answering. We manually construct a corpus of detailed explanations for nearly all publicly available standardized elementary science questions (approximately 1,680 3rd through 5th grade questions) and represent these as "explanation graphs": sets of lexically overlapping sentences that describe how to arrive at the correct answer to a question through a combination of domain and world knowledge. We also provide an explanation-centered tablestore, a collection of semi-structured tables that contain the knowledge to construct these elementary science explanations. Together, these two knowledge resources map out a substantial portion of the knowledge required for answering and explaining elementary science exams, and provide both structured and free-text training data for the explainable inference task.
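The notion of an explanation graph built from lexical overlap can be sketched directly: sentences become nodes, and an edge links any two sentences that share a content word. The sentences below are invented examples in the spirit of elementary-science explanations, not items from the released corpus:

```python
import itertools

STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in"}

sentences = [
    "a magnet attracts iron",
    "a nail is made of iron",
    "magnets are used in compasses",
]

def content_words(sentence):
    # Content words = lowercase tokens minus a small stopword list.
    return {w for w in sentence.lower().split() if w not in STOPWORDS}

# Edge between two sentences iff their content words overlap.
edges = [(i, j)
         for i, j in itertools.combinations(range(len(sentences)), 2)
         if content_words(sentences[i]) & content_words(sentences[j])]
print(edges)
```

Real explanation graphs additionally need lemmatization (so "magnet" links to "magnets") and the structured tablestore rows as nodes, but the overlap criterion is the connective tissue.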
-
ABSTRACT We investigate the accuracy requirements for field-level inference of cluster and void masses using data from galaxy surveys. We introduce a two-step framework that takes advantage of the fact that cluster masses are determined by flows on larger scales than the clusters themselves. First, we determine the integration accuracy required to perform field-level inference of cosmic initial conditions on these large scales by fitting to late-time galaxy counts using the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm. A 20-step COLA integrator is able to accurately describe the density field surrounding the most massive clusters in the local super-volume (< 135 h⁻¹ Mpc), but does not by itself lead to converged virial mass estimates. Therefore, we carry out 'posterior resimulations', using full N-body dynamics while sampling from the inferred initial conditions, and thereby obtain estimates of masses for nearby massive clusters. We show that these are in broad agreement with existing estimates, and find that mass functions in the local super-volume are compatible with ΛCDM.
-
Abstract Background Genetic barcoding provides a high-throughput way to simultaneously track the frequencies of large numbers of competing and evolving microbial lineages. However, making inferences about the nature of the evolution that is taking place remains a difficult task.
Results Here we describe an algorithm for the inference of fitness effects and establishment times of beneficial mutations from barcode sequencing data, which builds upon a Bayesian inference method by enforcing self-consistency between the population mean fitness and the individual effects of mutations within lineages. By testing our inference method on a simulation of 40,000 barcoded lineages evolving in serial batch culture, we find that this new method outperforms its predecessor, identifying more adaptive mutations and more accurately inferring their mutational parameters.
Conclusion Our new algorithm is particularly suited to inference of mutational parameters when read depth is low. We have made Python code for our serial dilution evolution simulations, as well as both the old and new inference methods, available on GitHub (https://github.com/FangfeiLi05/FitMut2), in the hope that it can find broader use by the microbial evolution community.
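The self-consistency idea can be sketched on noiseless toy data: a lineage's log-frequency slope estimates (s_i − mean fitness), while the mean fitness is in turn the frequency-weighted average of the s_i, so the two are updated in alternation, with known-neutral barcodes anchoring the fitness scale. This is a simplified illustration of the coupling, not the paper's actual Bayesian algorithm, and all numbers are simulated:

```python
import numpy as np

true_s = np.array([0.0, 0.0, 0.05, 0.10])  # per-generation fitness effects
neutral = [0, 1]                           # barcodes known to be neutral
dt, steps = 10, 4                          # generations between samplings

# Simulate noiseless barcode frequency trajectories under selection.
freqs, traj = np.full(4, 0.25), []
for _ in range(steps + 1):
    traj.append(freqs.copy())
    freqs = freqs * np.exp(true_s * dt)
    freqs /= freqs.sum()
traj = np.array(traj)

# Alternate between per-lineage fitness estimates and the population
# mean fitness until the two are mutually consistent.
xbar = np.zeros(steps)                     # mean fitness per interval
for _ in range(20):
    slopes = np.diff(np.log(traj), axis=0) / dt    # = s_i - mean fitness
    s_hat = (slopes + xbar[:, None]).mean(axis=0)  # add mean fitness back
    s_hat -= s_hat[neutral].mean()                 # anchor neutrals at 0
    xbar = traj[:-1] @ s_hat                       # frequency-weighted mean

print(np.round(s_hat, 3))  # recovers [0, 0, 0.05, 0.1]
```

With noisy, finite-read-depth counts this alternation is where the hard inference problem lives, which is why enforcing the consistency inside a Bayesian method pays off at low read depth.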