Abstract As machine learning (ML) has matured, it has opened a new frontier in theoretical and computational chemistry by offering the promise of simultaneous paradigm shifts in accuracy and efficiency. Nowhere is this advance more needed, but also more challenging to achieve, than in the discovery of open‐shell transition metal complexes. Here, localized d or f electrons exhibit variable bonding that is challenging to capture even with the most computationally demanding methods. Thus, despite great promise, clear obstacles remain in constructing ML models that can supplement or even replace explicit electronic structure calculations. In this article, I outline the recent advances in building ML models in transition metal chemistry, including the ability to approach sub‐kcal/mol accuracy on a range of properties with tailored representations, to discover and enumerate complexes in large chemical spaces, and to reveal opportunities for design through analysis of feature importance. I discuss unique considerations that have been essential to enabling ML in open‐shell transition metal chemistry, including (a) the relationship of data set size/diversity, model complexity, and representation choice, (b) the importance of quantitative assessments of both theory and model domain of applicability, and (c) the need to enable autonomous generation of reliable, large data sets both for ML model training and in active learning or discovery contexts. Finally, I summarize the next steps toward making ML a mainstream tool in the accelerated discovery of transition metal complexes. This article is categorized under: Electronic Structure Theory > Density Functional Theory; Software > Molecular Modeling; Computer and Information Science > Chemoinformatics
Machine learning for automated experimentation in scanning transmission electron microscopy
Abstract Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centric experiment workflow design and optimization. Here, we discuss the associated challenges with the transition to active ML, including sequential data analysis and out-of-distribution drift effects, the requirements for edge operation, local and cloud data storage, and theory in the loop operations. Specifically, we discuss the relative contributions of human scientists and ML agents in the ideation, orchestration, and execution of experimental workflows, as well as the need to develop universal hyper languages that can apply across multiple platforms. These considerations will collectively inform the operationalization of ML in next-generation experimentation.
- Award ID(s): 2215789
- PAR ID: 10481094
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: npj Computational Materials
- Volume: 9
- Issue: 1
- ISSN: 2057-3960
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Understanding chromatin organization requires integrating measurements of genome connectivity and physical structure. It is well established that cohesin is essential for TAD and loop connectivity features in Hi-C, but the corresponding change in physical structure has not been studied using electron microscopy. Pairing chromatin scanning transmission electron tomography with multiomic analysis and single-molecule localization microscopy, we study the role of cohesin in regulating the conformationally defined chromatin nanoscopic packing domains. Our results indicate that packing domains are not physical manifestations of TADs. Using electron microscopy, we found that only 20% of packing domains are lost upon RAD21 depletion. The effect of RAD21 depletion is restricted to small, poorly packed (nascent) packing domains. In addition, we present evidence that cohesin-mediated loop extrusion generates nascent domains that undergo maturation through nucleosome posttranslational modifications. Our results demonstrate that a 3D genomic structure, composed of packing domains, is generated through cohesin activity and nucleosome modifications.
ABSTRACT CRISPR-Cas12a is widely used for genome editing and biomarker detection since it can create targeted double-stranded DNA breaks and promote non-specific DNA cleavage after identifying specific DNA. To mitigate the off-target DNA cleavage of Cas12a, we previously developed a Francisella novicida Cas12a variant (FnoCas12aKD2P) by introducing double proline substitutions (K969P/D970P) in a conserved helix called the bridge helix (BH). In this work, we used cryogenic electron microscopy (cryoEM) to understand the molecular mechanisms of BH-mediated activation of Cas12a. We captured five structures of FnoCas12aKD2P at different states of conformational activation. Comparison with wild-type (FnoCas12aWT) structures unravels a mechanism where BH acts as a trigger that allosterically activates REC lobe movements by tracking the number of base pairs in the growing RNA-DNA hybrid to undergo a loop-to-helical transition and bending to latch onto the hybrid. The transition of the BH is coupled to the previously reported loop-to-helix transition of the “lid”, essential for opening RuvC endonuclease, through direct interactions of residues of the BH and the lid. We also observe structural details of cooperativity of BH and “helix-1” of RuvC for activation, a previously proposed interaction. Overall, our study enables development of high-fidelity Cas12a and Cas9 variants by BH-modifications.
Abstract We discuss the emerging advances and opportunities at the intersection of machine learning (ML) and climate physics, highlighting the use of ML techniques, including supervised, unsupervised, and equation discovery, to accelerate climate knowledge discoveries and simulations. We delineate two distinct yet complementary aspects: (a) ML for climate physics and (b) ML for climate simulations. Although physics-free ML-based models, such as ML-based weather forecasting, have demonstrated success when data are abundant and stationary, the physics knowledge and interpretability of ML models become crucial in the small-data/nonstationary regime to ensure generalizability. Given the absence of observations, the long-term future climate falls into the small-data regime. Therefore, ML for climate physics holds a critical role in addressing the challenges of ML for climate simulations. We emphasize the need for collaboration among climate physics, ML theory, and numerical analysis to achieve reliable ML-based models for climate applications.
Abstract Piezoresponse force microscopy (PFM) is routinely used to probe the nanoscale electromechanical response of ferroelectric and piezoelectric materials. However, many challenges remain in the interpretation of the recovered signal. Specifically, many non‐ferroelectric contributions affect the measured response, ranging from electrostatics, to charge injection and trapping, and topographic cross‐talk. Recently, machine learning (ML) has been utilized to identify multiple contributors within complex data systems, such as PFM response. A substantial advancement in ML approaches for PFM techniques is offered by dimensional stacking, enabling encoding of physical and/or chemical correlations within the materials' response across different data dimensions spanning varying ranges. However, dimensional stacking requires appropriate scaling for each dimension (before ML analysis) to minimize undesired information loss. Here, the impact of clustering globally and locally scaled parameters in polarization switching experiments via resonant PFM (RPFM) is discussed. Specifically, dimensional stacking of scaled parameters can mask or enhance ferroelectric and non‐ferroelectric behaviors, and aid identification of various physical phenomena contributing to the measured RPFM response. This study highlights the importance of data curation for ML, and its role in identifying signal contributors to scanning probe microscopy (SPM)‐based techniques with multidimensional data, such as resonant and/or spectroscopic SPM.
