
Title: Machine learning for automated experimentation in scanning transmission electron microscopy

Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centric experiment workflow design and optimization. Here, we discuss the challenges associated with the transition to active ML, including sequential data analysis and out-of-distribution drift effects, the requirements for edge operation, local and cloud data storage, and theory-in-the-loop operations. Specifically, we discuss the relative contributions of human scientists and ML agents in the ideation, orchestration, and execution of experimental workflows, as well as the need to develop universal hyper languages that can apply across multiple platforms. These considerations will collectively inform the operationalization of ML in next-generation experimentation.

Publisher / Repository:
Nature Publishing Group
Journal Name:
npj Computational Materials
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We develop the first molecular dynamics model of airway mucus based on the detailed physical properties and chemical structure of the predominant gel‐forming mucin MUC5B. Our airway mucus model leverages the LAMMPS open‐source code, based on the statistical physics of polymers, from single molecules to networks. On top of the LAMMPS platform, the chemical structure of MUC5B is used to superimpose proximity‐based, noncovalent, transient interactions within and between the specific domains of MUC5B polymers. We explore feasible ranges of hydrophobic and electrostatic interaction strengths between MUC5B domains with 9 nm spatial and 1 ns temporal resolution. Our goal here is to propose and test a mechanistic hypothesis for a striking clinical observation with respect to airway mucus: a 10‐fold increase in nonswellable, dense structures called flakes during progression of cystic fibrosis disease. Among the myriad possible effects that might promote self‐organization of MUC5B networks into flake structures, we hypothesize and confirm that the clinically confirmed increase in mucin concentration, from 1.5 to 5 mg/ml, alone is sufficient to drive the structure changes observed with scanning electron microscopy images from experimental samples. We postprocess the LAMMPS simulated data sets at 1.5 and 5 mg/ml, both to image the structure transition and compare with scanning electron micrographs and to show that the 3.33‐fold increase in concentration induces closer proximity of interacting electrostatic and hydrophobic domains, thereby amplifying the proximity‐based strength of the interactions.

  2. Abstract

    As machine learning (ML) has matured, it has opened a new frontier in theoretical and computational chemistry by offering the promise of simultaneous paradigm shifts in accuracy and efficiency. Nowhere is this advance more needed, but also more challenging to achieve, than in the discovery of open‐shell transition metal complexes. Here, localized d or f electrons exhibit variable bonding that is challenging to capture even with the most computationally demanding methods. Thus, despite great promise, clear obstacles remain in constructing ML models that can supplement or even replace explicit electronic structure calculations. In this article, I outline the recent advances in building ML models in transition metal chemistry, including the ability to approach sub‐kcal/mol accuracy on a range of properties with tailored representations, to discover and enumerate complexes in large chemical spaces, and to reveal opportunities for design through analysis of feature importance. I discuss unique considerations that have been essential to enabling ML in open‐shell transition metal chemistry, including (a) the relationship of data set size/diversity, model complexity, and representation choice, (b) the importance of quantitative assessments of both theory and model domain of applicability, and (c) the need to enable autonomous generation of reliable, large data sets both for ML model training and in active learning or discovery contexts. Finally, I summarize the next steps toward making ML a mainstream tool in the accelerated discovery of transition metal complexes.

    This article is categorized under:

    Electronic Structure Theory > Density Functional Theory

    Software > Molecular Modeling

    Computer and Information Science > Chemoinformatics

  3. Abstract

    Identifying point defects and other structural anomalies using scanning transmission electron microscopy (STEM) is important to understand a material's properties caused by the disruption of the regular pattern of the crystal lattice. Due to improvements in instrumentation stability and electron optics, atomic‐resolution images with a field of view of several hundred nanometers can now be routinely acquired at 1–10 Hz frame rates and such data, which often contain thousands of atomic columns, need to be analyzed. To date, image analysis is performed largely manually, but recent developments in computer vision (CV) and machine learning (ML) now enable automated analysis of atomic structures and associated defects. Here, the authors report on how a Convolutional Variational Autoencoder (CVAE) can be utilized to detect structural anomalies in atomic‐resolution STEM images. Specifically, the training set is limited to perfect crystal images, and the performance of a CVAE in differentiating between single‐crystal bulk data and point defects is demonstrated. It is found that the CVAE can reproduce the perfect crystal data but not the defect input data. The disagreements between the input and the CVAE‐predicted data for defects allow for a clear and automatic distinction and differentiation of several point defect types.

  4. Abstract

    The rise of automation and machine learning (ML) in electron microscopy has the potential to revolutionize materials research through autonomous data collection and processing. A significant challenge lies in developing ML models that rapidly generalize to large data sets under varying experimental conditions. We address this by employing a cycle generative adversarial network (CycleGAN) with a reciprocal space discriminator, which augments simulated data with realistic spatial frequency information. This allows the CycleGAN to generate images nearly indistinguishable from real data and provide labels for ML applications. We showcase our approach by training a fully convolutional network (FCN) to identify single atom defects in a 4.5 million atom data set, collected using automated acquisition in an aberration-corrected scanning transmission electron microscope (STEM). Our method produces adaptable FCNs that can adjust to dynamically changing experimental variables with minimal intervention, marking a crucial step towards fully autonomous harnessing of microscopy big data.

  5. Abstract

    Machine learning (ML) has become a valuable tool to assist and improve materials characterization, enabling automated interpretation of experimental results with techniques such as X-ray diffraction (XRD) and electron microscopy. Because ML models are fast once trained, there is a key opportunity to bring interpretation in-line with experiments and make on-the-fly decisions to achieve optimal measurement effectiveness, which creates broad opportunities for rapid learning and information extraction from experiments. Here, we demonstrate such a capability with the development of autonomous and adaptive XRD. By coupling an ML algorithm with a physical diffractometer, this method integrates diffraction and analysis such that early experimental information is leveraged to steer measurements toward features that improve the confidence of a model trained to identify crystalline phases. We validate the effectiveness of an adaptive approach by showing that ML-driven XRD can accurately detect trace amounts of materials in multi-phase mixtures with short measurement times. The improved speed of phase detection also enables in situ identification of short-lived intermediate phases formed during solid-state reactions using a standard in-house diffractometer. Our findings showcase the advantages of in-line ML for materials characterization and point to the possibility of more general approaches for adaptive experimentation.
