In this position paper, we describe research on knowledge graph-empowered materials science prediction and discovery. The research consists of several key components, including ontology mapping, materials data annotation, and information extraction from unstructured scholarly articles. We argue that although big data generated by simulations and experiments have motivated and accelerated data-driven science, the distribution and heterogeneity of materials science-related big data hinder major advancements in the field. Knowledge graphs, as semantic hubs, integrate disparate data and provide a feasible solution to this challenge. We design a knowledge graph-based approach for data discovery, extraction, and integration in materials science.
The frontier of simulation-based inference
Many domains of science have developed complex simulations to describe phenomena of interest. While these simulations provide high-fidelity models, they are poorly suited for inference and lead to challenging inverse problems. We review the rapidly developing field of simulation-based inference and identify the forces giving additional momentum to the field. Finally, we describe how the frontier is expanding so that a broad audience can appreciate the profound influence these developments may have on science.
- PAR ID: 10157149
- Publisher / Repository: Proceedings of the National Academy of Sciences
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- ISSN: 0027-8424
- Page Range / eLocation ID: Article No. 201912789
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Ecology often seeks to answer causal questions, and while ecologists have a rich history of experimental approaches, novel observational data streams and the need to apply insights across naturally occurring conditions pose opportunities and challenges. Other fields have developed causal inference approaches that can enhance and expand our ability to answer ecological causal questions using observational or experimental data. However, the lack of comprehensive resources applying causal inference to ecological settings and jargon from multiple disciplines create barriers. We introduce approaches for causal inference, discussing the main frameworks for counterfactual causal inference, how causal inference differs from other research aims and key challenges; the application of causal inference in experimental and quasi‐experimental study designs; appropriate interpretation of the results of causal inference approaches given their assumptions and biases; foundational papers; and the data requirements and trade‐offs between internal and external validity posed by different designs. We highlight that these designs generally prioritise internal validity over generalisability. Finally, we identify opportunities and considerations for ecologists to further integrate causal inference with synthesis science and meta‐analysis and expand the spatiotemporal scales at which causal inference is possible. We advocate for ecology as a field to collectively define best practices for causal inference.
-
Developing methods of automated inference that are able to provide users with compelling human-readable justifications for why the answer to a question is correct is critical for domains such as science and medicine, where user trust and detecting costly errors are limiting factors to adoption. One of the central barriers to training question answering models on explainable inference tasks is the lack of gold explanations to serve as training data. In this paper we present a corpus of explanations for standardized science exams, a recent challenge task for question answering. We manually construct a corpus of detailed explanations for nearly all publicly available standardized elementary science questions (approximately 1,680 3rd through 5th grade questions) and represent these as “explanation graphs”: sets of lexically overlapping sentences that describe how to arrive at the correct answer to a question through a combination of domain and world knowledge. We also provide an explanation-centered tablestore, a collection of semi-structured tables that contain the knowledge to construct these elementary science explanations. Together, these two knowledge resources map out a substantial portion of the knowledge required for answering and explaining elementary science exams, and provide both structured and free-text training data for the explainable inference task.
-
Biologists routinely fit novel and complex statistical models to push the limits of our understanding. Examples include, but are not limited to, flexible Bayesian approaches (e.g. BUGS, stan), frequentist and likelihood‐based approaches (e.g. the lme4 package) and machine learning methods. These software and programs afford the user greater control and flexibility in tailoring complex hierarchical models. However, this level of control and flexibility places a higher degree of responsibility on the user to evaluate the robustness of their statistical inference. To determine how often biologists are running model diagnostics on hierarchical models, we reviewed 50 papers published in 2021 in the journal Nature Ecology & Evolution, and we found that the majority of published papers did not report any validation of their hierarchical models, making it difficult for the reader to assess the robustness of their inference. This lack of reporting likely stems from a lack of standardized guidance for best practices and standard methods. Here, we provide a guide to understanding and validating complex models using data simulations. To determine how often biologists use data simulation techniques, we also reviewed 50 papers published in 2021 in the journal Methods Ecology & Evolution. We found that 78% of the papers that proposed a new estimation technique, package or model used simulations or generated data in some capacity (18 of 23 papers); but very few of those papers (5 of 23 papers) included either a demonstration that the code could recover realistic estimates for a dataset with known parameters or a demonstration of the statistical properties of the approach. To distil the variety of simulation techniques and their uses, we provide a taxonomy of simulation studies based on the intended inference. We also encourage authors to include a basic validation study whenever novel statistical models are used, which, in general, is easy to implement. Simulating data helps a researcher gain a deeper understanding of the models and their assumptions and establish the reliability of their estimation approaches. Wider adoption of data simulations by biologists can improve statistical inference, reliability and open science practices.
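The kind of basic validation study this abstract recommends, checking that an estimator recovers known parameters from simulated data, might look roughly like the following. This is only a minimal sketch under stated assumptions: the simple linear model, the parameter values, and the function names `simulate`, `fit_ols`, and `recovery_study` are all hypothetical illustrations, not taken from the paper, and a real study would use the hierarchical model under evaluation.

```python
import random
import statistics


def simulate(n, intercept, slope, sigma, rng):
    """Generate one dataset from a known linear model (assumed for illustration)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Ordinary least squares for a single predictor; returns (intercept, slope)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recovery_study(n_reps=200, n=100, intercept=1.0, slope=0.5, sigma=1.0, seed=42):
    """Fit many simulated datasets and average the estimates.

    If the fitting code is correct, the averages should sit close to the
    true values used to simulate the data.
    """
    rng = random.Random(seed)  # fixed seed so the study is reproducible
    fits = [fit_ols(*simulate(n, intercept, slope, sigma, rng)) for _ in range(n_reps)]
    mean_intercept = statistics.fmean(f[0] for f in fits)
    mean_slope = statistics.fmean(f[1] for f in fits)
    return mean_intercept, mean_slope


if __name__ == "__main__":
    est_intercept, est_slope = recovery_study()
    print(f"true intercept 1.0, mean estimate {est_intercept:.3f}")
    print(f"true slope 0.5, mean estimate {est_slope:.3f}")
```

Comparing the averaged estimates with the true values gives a quick check for bias; the spread of the individual fits could be inspected the same way to examine other statistical properties.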
-
Schmidt, Dirk; Vernet, Elise; Jackson, Kathryn J (Ed.) We present progress on a conceptual design for a new Keck multi-conjugate adaptive optics system capable of visible light correction with a near-diffraction-limited spatial resolution. The KOLA (Keck Optical LGS AO) system will utilize a planned adaptive secondary mirror (ASM), 2 additional high-altitude deformable mirrors (DMs), and ≳ 8 laser guide stars (LGS) to sense and correct atmospheric turbulence. The field of regard for selecting guide stars will be 2’ and the corrected science field of view will be 60”. We describe science cases, system requirements, and performance simulations for the system performed with error budget spreadsheet tools and MAOS physical optics simulations. We will also present results from trade studies for the actuator count on the ASM. KOLA will feed a new optical imager and IFU spectrograph in addition to the planned Liger optical + infrared (λ > 850 nm) imager and IFU spectrograph. Performance simulations show KOLA will deliver a Strehl of 12% at g’, 21% at r’, 53% at Y, and 87% at K bands on axis with nearly uniform image quality over a 40”×40” field of view in the optical and over 60”×60” beyond 1 μm. Ultimately, the system will deliver spatial resolutions superior to HST and JWST (∼17 mas at r’-band) and comparable to the planned first-generation infrared AO systems for the ELTs.
