

Search for: All records

Creators/Authors contains: "Morgan, Dane"


  1. Abstract

    One compelling vision of the future of materials discovery and design involves the use of machine learning (ML) models to predict materials properties and then rapidly find materials tailored for specific applications. However, realizing this vision requires both providing detailed uncertainty quantification (model prediction errors and domain of applicability) and making models readily usable. At present, it is common practice in the community to assess ML model performance only in terms of prediction accuracy (e.g. mean absolute error), while neglecting detailed uncertainty quantification and robust model accessibility and usability. Here, we demonstrate a practical method for realizing both uncertainty and accessibility features with a large set of models. We develop random forest ML models for 33 materials properties spanning an array of data sources (computational and experimental) and property types (electrical, mechanical, thermodynamic, etc). All models have calibrated ensemble error bars to quantify prediction uncertainty and domain of applicability guidance enabled by kernel-density-estimate-based feature distance measures. All data and models are publicly hosted on the Garden-AI infrastructure, which provides an easy-to-use, persistent interface for model dissemination that permits models to be invoked with only a few lines of Python code. We demonstrate the power of this approach by using our models to conduct a fully ML-based materials discovery exercise to search for new stable, highly active perovskite oxide catalyst materials.

     
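    The calibrated ensemble error bars described above can be illustrated with a minimal, hypothetical sketch (placeholder data, scikit-learn rather than the authors' published pipeline or the Garden-AI interface): the spread of per-tree predictions from a random forest gives a raw uncertainty estimate, which is then rescaled against held-out residuals.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Placeholder features and target; the published models cover 33 real properties.
rng = np.random.default_rng(0)
X, y = rng.random((500, 10)), rng.random(500)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Raw error bar: standard deviation of the per-tree predictions.
tree_preds = np.stack([tree.predict(X_cal) for tree in model.estimators_])
raw_sigma = tree_preds.std(axis=0)

# Simple recalibration: scale the raw error bars so that, on held-out data,
# their average magnitude matches the average absolute residual.
residuals = np.abs(model.predict(X_cal) - y_cal)
calibrated_sigma = raw_sigma * residuals.mean() / raw_sigma.mean()
print(calibrated_sigma[:5])
```

    The published models additionally provide kernel-density-estimate feature-distance measures to flag out-of-domain inputs; that step is omitted from this sketch.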
  2. The rapid development and large body of literature on machine learning potentials (MLPs) can make it difficult for researchers who are not experts but wish to use these tools to know how to proceed. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state of the art in MLPs. This review covers a broad range of topics related to MLPs, including (i) central aspects of how and why MLPs are enablers of many exciting advancements in molecular modeling, (ii) the main underpinnings of different types of MLPs, including their basic structure and formalism, (iii) the potentially transformative impact of universal MLPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLPs, (iv) a practical guide for estimating and understanding the execution speed of MLPs, including guidance for users based on hardware availability, type of MLP used, and prospective simulation size and time, (v) a guide to which MLP a user should choose for a given application by considering hardware resources, speed requirements, and energy and force accuracy requirements, as well as guidance for choosing pre-trained potentials or fitting a new potential from scratch, (vi) a discussion of MLP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training, (vii) a summary of some key limitations of present MLPs and current approaches to mitigate them, including methods for including long-range interactions, handling magnetic systems, and treating excited states, and finally (viii) some more speculative thoughts on what the future holds for the development and application of MLPs over the next 3-10+ years.
    Free, publicly-accessible full text available January 13, 2026
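    As a small, hypothetical illustration of the execution-speed estimates discussed in point (iv) above, the sketch below times repeated energy-and-force calls through ASE. ASE's built-in EMT calculator is used purely as a stand-in; a user would swap in their chosen MLP calculator, and the resulting cost depends on the hardware, model type, and system size the review discusses.

```python
import time
from ase.build import bulk
from ase.calculators.emt import EMT

# 256-atom Cu cell as a test system; EMT is only a stand-in for an MLP calculator.
atoms = bulk("Cu", "fcc", a=3.6, cubic=True).repeat((4, 4, 4))
atoms.calc = EMT()   # replace with the MLP calculator being benchmarked

n_calls = 20
start = time.perf_counter()
for _ in range(n_calls):
    atoms.rattle(stdev=1e-4)        # perturb positions so nothing is cached
    atoms.get_potential_energy()
    atoms.get_forces()
elapsed = time.perf_counter() - start
print(f"{elapsed / n_calls * 1e3:.2f} ms per energy+force call for {len(atoms)} atoms")
```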
  3. Accurate and comprehensive material databases extracted from research papers are crucial for materials science and engineering, but their development requires significant human effort. With large language models (LLMs) transforming the way humans interact with text, LLMs provide an opportunity to revolutionize data extraction. In this study, we demonstrate a simple and efficient method for extracting materials data from full-text research papers leveraging the capabilities of LLMs combined with human supervision. This approach is particularly suitable for mid-sized databases and requires minimal to no coding or prior knowledge about the extracted property. It offers high recall and nearly perfect precision in the resulting database. The method is easily adaptable to new and superior language models, ensuring continued utility. We show this by evaluating and comparing its performance on GPT-3 and GPT-3.5/4 (which underlie ChatGPT), as well as free alternatives such as BART and DeBERTaV3. We provide a detailed analysis of the method’s performance in extracting sentences containing bulk modulus data, achieving up to 90% precision at 96% recall, depending on the amount of human effort involved. We further demonstrate the method’s broader effectiveness by developing a database of critical cooling rates for metallic glasses over twice the size of previous human-curated databases.
    Free, publicly-accessible full text available June 12, 2025
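    The free-model route mentioned above (BART, DeBERTaV3) can be sketched, hypothetically, as a zero-shot sentence screen: candidate sentences are scored for whether they report a bulk modulus value, and the hits are passed to a human for verification. The model name and labels below are illustrative choices, not the paper's exact setup.

```python
from transformers import pipeline

# BART-MNLI zero-shot classifier as a free screening model (illustrative choice).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentences = [
    "The bulk modulus of the glass was measured to be 41.2 GPa.",
    "Samples were annealed at 550 K for two hours before testing.",
]
labels = ["reports a bulk modulus value", "does not report a bulk modulus value"]

for sentence in sentences:
    result = classifier(sentence, candidate_labels=labels)
    # Top-ranked label comes first; flagged sentences would go to a human reviewer.
    print(result["labels"][0], "|", sentence)
```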
  4. Abstract

    There has been a growing effort to replace manual extraction of data from research papers with automated data extraction based on natural language processing, language models, and recently, large language models (LLMs). Although these methods enable efficient extraction of data from large sets of research papers, they require a significant amount of up-front effort, expertise, and coding. In this work, we propose a method that can fully automate very accurate data extraction with minimal initial effort and background, using an advanced conversational LLM. The method consists of a set of engineered prompts applied to a conversational LLM that identify sentences with data, extract that data, and assure the data’s correctness through a series of follow-up questions. These follow-up questions largely overcome known issues with LLMs providing factually inaccurate responses. The method can be applied with any conversational LLM and yields very high quality data extraction. In tests on materials data, we find precision and recall both close to 90% from the best conversational LLMs, like GPT-4. We demonstrate that the exceptional performance is enabled by the information retention in a conversational model combined with purposeful redundancy and introducing uncertainty through follow-up prompts. These results suggest that approaches like this, due to their simplicity, transferability, and accuracy, are likely to become powerful tools for data extraction in the near future. Finally, databases for critical cooling rates of metallic glasses and yield strengths of high entropy alloys are developed using the method.

     
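    The engineered prompts themselves are not reproduced in the abstract, so the sketch below is only a hypothetical illustration of the conversational pattern it describes (identify a data-bearing sentence, extract the value, then challenge it with follow-up questions that allow a negative answer), written against the OpenAI Python client with GPT-4.

```python
from openai import OpenAI

client = OpenAI()   # assumes the OPENAI_API_KEY environment variable is set

sentence = ("The critical cooling rate of the Zr-based glass "
            "was estimated to be about 1.4 K/s.")

def ask(messages, prompt):
    """Append a user prompt, query the model, and record the reply in the chat."""
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

messages = []
# Step 1: does the sentence contain the target quantity at all?
print(ask(messages, "Does this sentence report a critical cooling rate value? "
                    "Answer Yes or No.\n\n" + sentence))
# Step 2: extract the value, keeping the earlier context in the conversation.
print(ask(messages, "Extract the material, the value, and the units as "
                    "'material, value, units'."))
# Step 3: a follow-up that deliberately allows a negative answer, which is what
# suppresses fabricated values in this kind of prompt chain.
print(ask(messages, "Are you certain the value appears verbatim in the sentence? "
                    "If not, reply exactly 'not present'."))
```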
  5. In this work, we propose a linear machine learning force matching approach that can directly extract pair atomic interactions from ab initio calculations in amorphous structures. The local feature representation is specifically chosen to make the linear weights a force field as a force/potential function of the atom pair distance. Consequently, this set of functions is the closest representation of the ab initio forces, given the two-body approximation and finite scanning in the configurational space. We validate this approach in amorphous silica. Potentials in the new force field (consisting of tabulated Si–Si, Si–O, and O–O potentials) are significantly different than existing potentials that are commonly used for silica, even though all of them produce the tetrahedral network structure and roughly similar glass properties. This suggests that the commonly used classical force fields do not offer fundamentally accurate representations of the atomic interaction in silica. The new force field furthermore produces a lower glass transition temperature (Tg ∼ 1800 K) and a positive liquid thermal expansion coefficient, suggesting the extraordinarily high Tg and negative liquid thermal expansion of simulated silica could be artifacts of previously developed classical potentials. Overall, the proposed approach provides a fundamental yet intuitive way to evaluate two-body potentials against ab initio calculations, thereby offering an efficient way to guide the development of classical force fields.

     
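    The linear force-matching idea can be written down compactly: with a piecewise-constant pair force f(r) on radial bins, each ab initio force component is a linear combination of binned unit bond vectors, and the bin weights follow from least squares. The sketch below makes strong simplifications relative to the paper (single species, no periodic images, random placeholder data) and is only meant to show the structure of the fit.

```python
import numpy as np

def pair_force_design_matrix(positions, r_cut, n_bins):
    """Design matrix A such that model forces are A @ w, where w[b] is the
    tabulated pair force on radial bin b (single species, no periodic images)."""
    n_atoms = len(positions)
    bin_edges = np.linspace(0.0, r_cut, n_bins + 1)
    A = np.zeros((3 * n_atoms, n_bins))
    for i in range(n_atoms):
        for j in range(n_atoms):
            if i == j:
                continue
            rij = positions[i] - positions[j]
            r = np.linalg.norm(rij)
            if r >= r_cut:
                continue
            b = np.searchsorted(bin_edges, r) - 1
            A[3 * i:3 * i + 3, b] += rij / r   # unit bond vector (repulsive positive)
    return A

# Placeholder snapshot; in practice positions and forces come from ab initio MD.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(50, 3))
dft_forces = rng.normal(size=(50, 3))

A = pair_force_design_matrix(positions, r_cut=6.0, n_bins=40)
w, *_ = np.linalg.lstsq(A, dft_forces.reshape(-1), rcond=None)   # w[b] ~ f(r) on bin b
```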
  6. In laser powder bed fusion processes, keyholes are the gaseous cavities formed where the laser interacts with the metal, and their morphologies play an important role in defect formation and the final product quality. The in-situ X-ray imaging technique can monitor the keyhole dynamics from the side and capture keyhole shapes in the X-ray image stream. Keyhole shapes in X-ray images are then often labeled by humans for analysis, which increasingly involves attempting to correlate keyhole shapes with defects using machine learning. However, such labeling is tedious, time-consuming, error-prone, and cannot be scaled to large data sets. To use keyhole shapes more readily as the input to machine learning methods, an automatic tool to identify keyhole regions is desirable. In this paper, a deep-learning-based computer vision tool that can automatically segment keyhole shapes out of X-ray images is presented. The pipeline contains a filtering method and an implementation of the BASNet deep learning model to semantically segment the keyhole morphologies out of X-ray images. The presented tool shows a promising average accuracy of 91.24% for keyhole area and 92.81% for boundary shape over a range of test dataset conditions in Al6061 (and one AlSi10Mg) alloys, with 300 training images/labels and 100 testing images for each trial. Prospective users may apply the presently trained tool, or a retrained version following the approach used here, to automatically label keyhole shapes in large image sets.

     
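    The abstract reports separate accuracies for keyhole area and boundary shape but does not define those metrics, so the sketch below only shows plausible mask-comparison scores (relative area error and intersection-over-union) for a predicted keyhole mask against a human label; it is not the paper's evaluation code, and the BASNet model itself is not included.

```python
# Hypothetical evaluation of a predicted keyhole mask against a labeled mask.
# Metric definitions are illustrative, not those used in the paper.
import numpy as np

def area_accuracy(pred, label):
    """1 - relative error in segmented keyhole area (masks are boolean arrays)."""
    return 1.0 - abs(pred.sum() - label.sum()) / label.sum()

def iou(pred, label):
    """Intersection over union of the two masks, a common shape-agreement score."""
    inter = np.logical_and(pred, label).sum()
    union = np.logical_or(pred, label).sum()
    return inter / union

# Placeholder masks; in practice `pred` comes from the segmentation model
# and `label` from a human annotation of the X-ray frame.
label = np.zeros((128, 128), dtype=bool)
label[40:90, 50:80] = True
pred = np.zeros_like(label)
pred[42:92, 50:82] = True

print(f"area accuracy: {area_accuracy(pred, label):.3f}, IoU: {iou(pred, label):.3f}")
```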
  7. Abstract

    Electron counting can be performed algorithmically for monolithic active pixel sensor direct electron detectors to eliminate readout noise and Landau noise arising from the variability in the amount of deposited energy for each electron. Errors in existing counting algorithms include mistakenly counting a multielectron strike as a single electron event, and inaccurately locating the incident position of the electron due to lateral spread of deposited energy and dark noise. Here, we report a supervised deep learning (DL) approach based on the Faster region-based convolutional neural network (Faster R-CNN) to recognize single electron events at varying electron doses and voltages. The DL approach shows high accuracy according to the near-ideal modulation transfer function (MTF) and detector quantum efficiency for sparse images. It predicts, on average, 0.47 pixel deviation from the incident positions for 200 kV electrons versus 0.59 pixel using the conventional counting method. The DL approach also shows better robustness against coincidence loss as the electron dose increases, maintaining the MTF at half Nyquist frequency above 0.83 as the electron density increases to 0.06 e−/pixel. Thus, the DL model extends the advantages of counting analysis to higher dose rates than conventional methods.

     
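    The detection-based counting idea can be sketched, hypothetically, with torchvision's stock Faster R-CNN standing in for the trained network: single electron events are returned as boxes on a sparse frame, and their centers serve as sub-pixel incident-position estimates. The model here is untrained and the frame is random; the sketch only shows the shape of the workflow, not the paper's trained model.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Two classes: background + single-electron event. No pretrained weights downloaded.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)
model.eval()

# Placeholder sparse frame; a real grayscale detector frame would be replicated
# to three channels to match the model's expected input.
frame = torch.rand(3, 64, 64)
with torch.no_grad():
    detections = model([frame])[0]   # dict with 'boxes', 'labels', 'scores'

boxes = detections["boxes"]          # (N, 4) tensor of x1, y1, x2, y2
centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                       (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
print(centers)                       # estimated incident positions (pixel units)
```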