Abstract
While the impact of machine learning (ML) has been felt everywhere, its effect has been most transformative where large, high-quality datasets are available. For promising materials spaces, such as transition metal coordination complexes and metal–organic frameworks, the large chemical diversity has not yet been matched by similarly large datasets, and computational datasets (e.g., from density functional theory) may not be predictive. Extracting experimental data from the literature offers an alternative route to the data-driven design of materials. This perspective describes efforts in (i) extracting experimental data; (ii) associating extracted data with known chemical structures; (iii) leveraging the data in ML and screening; (iv) designing materials with enhanced stability; and (v) using experimental data to improve high-throughput workflows. I will summarize some of the outstanding challenges and opportunities for data enrichment with high-throughput experimentation and large language models.
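Step (i), extracting experimental data from text, can be illustrated with a minimal rule-based sketch. The regular expression, the function name `extract_decomposition_temps`, and the example sentence are hypothetical and not taken from the perspective; practical pipelines need much richer parsing (or language models) to cope with varied phrasing, ranges, and units.

```python
import re

# Hypothetical pattern: a stability property, up to 30 non-digit characters,
# then a number and a unit (°C or K).
PATTERN = re.compile(
    r"(?P<prop>decomposition|degradation)\s+temperature"
    r"[^0-9]{0,30}?"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>°C|K)\b"
)


def extract_decomposition_temps(text):
    """Return one record per matched temperature, converted to degrees Celsius."""
    records = []
    for match in PATTERN.finditer(text):
        value = float(match.group("value"))
        if match.group("unit") == "K":
            value -= 273.15  # normalize Kelvin to Celsius
        records.append({"property": match.group("prop"), "T_celsius": value})
    return records


sentence = ("TGA showed a decomposition temperature of 350 °C for the framework, "
            "while the analogue had a degradation temperature near 623.15 K.")
for record in extract_decomposition_temps(sentence):
    print(record)
```

Step (ii) would then link each record to the structure named in the same passage, which a regex alone cannot do reliably.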
Automated crystal system identification from electron diffraction patterns using multiview opinion fusion machine learning
A bottleneck in high-throughput nanomaterials discovery is the pace at which new materials can be structurally characterized. Although current machine learning (ML) methods show promise for the automated processing of electron diffraction patterns (DPs), they fail in high-throughput experiments where DPs are collected from crystals with random orientations. Inspired by the human decision-making process, a framework for automated crystal system classification from DPs with arbitrary orientations was developed. A convolutional neural network was trained using evidential deep learning, and its predictive uncertainties were quantified and used to fuse predictions from multiple views. Using vector map representations of DPs, the framework achieves a testing accuracy of 0.94 in the examples considered, is robust to noise, and retains high accuracy on experimental data. This work highlights how ML can accelerate the analysis of experimental high-throughput materials data.
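Uncertainty-aware fusion of per-view predictions can be sketched with subjective-logic opinions derived from Dirichlet evidence and merged by a reduced Dempster-Shafer combination rule, as used in trusted multi-view classification. This is an assumed scheme for illustration, not necessarily the fusion rule used in this work; the evidence values and function names are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    Dirichlet parameters are alpha_k = e_k + 1 with strength S = sum(alpha);
    belief b_k = e_k / S and uncertainty u = K / S, so sum(b) + u = 1.
    """
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    beliefs = [e / strength for e in evidence]
    return beliefs, num_classes / strength


def combine(op1, op2):
    """Reduced Dempster-Shafer rule for two opinions over the same classes."""
    b1, u1 = op1
    b2, u2 = op2
    n = len(b1)
    # Conflict: belief mass the two views assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(n) for j in range(n) if i != j)
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) for k in range(n)]
    return beliefs, scale * u1 * u2


def fuse_views(evidence_per_view):
    """Fold the per-view opinions together one view at a time."""
    fused = opinion_from_evidence(evidence_per_view[0])
    for evidence in evidence_per_view[1:]:
        fused = combine(fused, opinion_from_evidence(evidence))
    return fused


CRYSTAL_SYSTEMS = ["cubic", "tetragonal", "orthorhombic", "hexagonal",
                   "trigonal", "monoclinic", "triclinic"]
views = [
    [20, 0, 0, 0, 0, 0, 0],  # confident view
    [1, 1, 0, 0, 0, 0, 0],   # ambiguous view
    [5, 0, 0, 0, 0, 0, 0],
]
beliefs, uncertainty = fuse_views(views)
best = max(range(len(beliefs)), key=beliefs.__getitem__)
print(CRYSTAL_SYSTEMS[best], uncertainty)
```

Because the fused uncertainty is a scaled product of the per-view uncertainties, it never exceeds the smallest single-view uncertainty, and an ambiguous view contributes little belief to the result.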
- Award ID(s):
- 2219489
- PAR ID:
- 10524898
- Publisher / Repository:
- PNAS
- Date Published:
- Journal Name:
- Proceedings of the National Academy of Sciences
- Volume:
- 120
- Issue:
- 46
- ISSN:
- 0027-8424
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract
Block copolymers play a vital role in materials science due to their diverse self‐assembly behavior. Traditionally, exploring block copolymer self‐assembly and the associated structure–property relationships involves iterative synthesis, characterization, and theory, which is labor‐intensive both experimentally and computationally. Here, we introduce a versatile, high‐throughput workflow for materials discovery that integrates controlled polymerization and automated chromatographic separation with a novel physics‐informed machine‐learning algorithm for the rapid analysis of small‐angle X‐ray scattering (SAXS) data. Leveraging the large, high‐quality experimental data sets generated by fractionating polymers with automated chromatography, this machine‐learning method reduces data dimensionality by extracting chemistry‐independent features from the SAXS data. The approach allows rapid and accurate prediction of morphologies without repetitive, time‐consuming manual analysis, achieving an out‐of‐sample predictive accuracy of around 95% for both novel materials and materials already represented in the training data set. Because inspection can focus on the subset of samples with large predictive uncertainty, only a small fraction of the samples needs to be inspected manually to further improve accuracy. Collectively, the synergistic combination of controlled synthesis, automated chromatography, and data‐driven analysis creates a powerful workflow that markedly expedites the discovery of structure–property relationships in advanced soft materials.
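One familiar way to obtain chemistry-independent SAXS features is to divide each Bragg peak position by the primary peak q*, since the ratios q/q* depend on lattice symmetry rather than on chemistry or domain size. The sketch below is a simple template-matching stand-in, not the physics-informed algorithm of this work: the ideal ratios are the standard first three reflections of each lattice, while the margin threshold and function names are hypothetical. Samples whose best and second-best matches nearly tie are flagged for manual inspection, echoing the idea of inspecting only uncertain samples.

```python
import math

# Ideal peak-position ratios q_n / q* (first three allowed reflections).
IDEAL_RATIOS = {
    "lamellae": [1.0, 2.0, 3.0],
    "hexagonal cylinders": [1.0, math.sqrt(3), 2.0],
    "BCC spheres": [1.0, math.sqrt(2), math.sqrt(3)],
}


def ratio_features(peak_q):
    """Normalize peak positions by the primary (lowest-q) peak."""
    q = sorted(peak_q)
    return [x / q[0] for x in q]


def classify_morphology(peak_q):
    """Return (best morphology, error margin to the runner-up)."""
    feats = ratio_features(peak_q)
    errors = {}
    for name, ideal in IDEAL_RATIOS.items():
        m = min(len(feats), len(ideal))  # compare only peaks both lists have
        errors[name] = sum((f - r) ** 2 for f, r in zip(feats[:m], ideal[:m])) / m
    ranked = sorted(errors, key=errors.get)
    return ranked[0], errors[ranked[1]] - errors[ranked[0]]


def flag_for_inspection(samples, min_margin=0.01):
    """Return IDs of samples whose top two assignments are nearly tied."""
    return [sid for sid, peaks in samples.items()
            if classify_morphology(peaks)[1] < min_margin]


print(classify_morphology([0.21, 0.42, 0.63])[0])  # lamellae
print(flag_for_inspection({"A": [0.21, 0.42, 0.63], "B": [0.30]}))
```

A sample with a single resolved peak carries no ratio information, so every template fits equally well and it is always flagged.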
Supramolecular polymer blends (SPBs) represent a versatile class of polymers whose morphology directly determines their macroscopic properties. However, rational design of SPBs remains hindered by the lack of predictive models describing how molecular features and intermolecular interactions determine morphology. Here, we report a data-driven high-throughput workflow integrating modular synthesis, robotic sample formulation and processing, automated morphology characterization, and machine learning (ML) for SPB discovery. Using a plug-and-play modular synthetic strategy, 33 hydrogen-bonding end-functional homopolymer precursors were prepared and orthogonally paired to fabricate 260 SPBs within one day. A custom automated atomic force microscopy (AFM) protocol enabled systematic morphological characterization, producing 2340 images with little human intervention. Average phase-separation sizes (e.g., domain spacings) were extracted from the processed AFM data using multiple complementary approaches. Leveraging high-throughput sample formulation and characterization, a high-quality SPB database was curated and used to train ML models. Guided by a support vector regression (SVR) model, target morphologies with characteristic sizes of 50, 100, and 150 nm were successfully predicted and experimentally validated. This work demonstrates the potential of coupling high-throughput experimentation with ML to accelerate the discovery of polymer blend phases, and provides one of the first large-scale, experimentally derived datasets designed specifically for supramolecular polymer research.
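A common way to estimate a characteristic domain spacing is to locate the dominant spatial frequency of a height or phase signal. The sketch below applies a plain discrete Fourier transform to a single 1D line profile; it is an illustrative assumption, not one of the specific approaches used in this work, and full AFM images would normally be analyzed with a radially averaged 2D spectrum. The function name and the synthetic profile are hypothetical.

```python
import cmath
import math


def dominant_period(profile, pixel_size_nm):
    """Return the period (nm) of the strongest non-zero Fourier component.

    A plain O(n^2) DFT is used for clarity; numpy.fft would be used in practice.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [h - mean for h in profile]  # remove the DC offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        coeff = sum(centered[idx] * cmath.exp(-2j * math.pi * k * idx / n)
                    for idx in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # Index k means k full cycles over the scan length n * pixel_size_nm.
    return n * pixel_size_nm / best_k


# Synthetic line profile: 8 cycles over 64 pixels of 2 nm, i.e., a 16 nm period.
profile = [math.sin(2 * math.pi * x / 8) for x in range(64)]
print(dominant_period(profile, pixel_size_nm=2.0))  # 16.0
```

The frequency resolution is one cycle per scan length, so longer scans resolve the spacing more finely.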
Abstract
Computational methods and machine learning (ML) are reshaping materials science by accelerating the discovery, design, and optimization of materials. Traditional approaches such as density functional theory and molecular dynamics have been instrumental in studying materials at the atomic level. However, their high computational cost and, in certain cases, limited accuracy can restrict the scope of in silico exploration. ML promises to accelerate material property prediction and design, yet in many areas the volume and fidelity of the data are critical barriers. Active learning can reduce the reliance on large data sets, and simulation has emerged as a critical tool for generating data on the fly. Despite these advances, challenges remain, particularly in data quality, model interpretability, and bridging the gap between computational predictions and experimental validation. Future research should develop automated frameworks capable of designing and testing materials for specific applications; integrating ML with traditional simulations and experiments can contribute to this goal.
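The pairing of active learning with on-the-fly simulation can be sketched as a query-by-committee loop: a committee of bootstrapped k-nearest-neighbor regressors estimates uncertainty, and the most uncertain candidate is passed to a simulator that generates its label. The simulator below is a toy stand-in for an expensive calculation such as DFT, and all names and parameters are hypothetical.

```python
import random


def simulate(x):
    """Toy stand-in for an expensive simulation (hypothetical)."""
    return (x - 0.3) ** 2


def knn_predict(train, x, k=2):
    """Average the labels of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def committee_std(train, x, rng, n_models=8):
    """Spread of predictions from a committee of bootstrapped k-NN models."""
    preds = []
    for _ in range(n_models):
        bootstrap = [rng.choice(train) for _ in train]
        preds.append(knn_predict(bootstrap, x))
    mean = sum(preds) / n_models
    return (sum((p - mean) ** 2 for p in preds) / n_models) ** 0.5


def active_learning(pool, n_init=3, budget=5, seed=0):
    """Label n_init random candidates, then query the most uncertain ones."""
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)
    labeled = [(x, simulate(x)) for x in candidates[:n_init]]
    unlabeled = candidates[n_init:]
    for _ in range(budget):
        if not unlabeled:
            break
        x_next = max(unlabeled, key=lambda x: committee_std(labeled, x, rng))
        unlabeled.remove(x_next)
        labeled.append((x_next, simulate(x_next)))  # data generated on the fly
    return labeled


pool = [i / 20 for i in range(21)]
for x, y in active_learning(pool):
    print(f"x = {x:.2f}, simulated y = {y:.4f}")
```

Only n_init + budget simulations are run instead of one per candidate, which is the sense in which active learning reduces reliance on large data sets.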
Contamination by per- and polyfluoroalkyl substances (PFAS) poses a significant environmental and public health challenge because of their ubiquity. Adsorption has emerged as a promising remediation technique, yet optimizing adsorption efficiency remains complex because of the diverse physicochemical properties of PFAS and the wide range of adsorbent materials. Traditional modeling approaches, such as response surface methodology (RSM), struggle to capture nonlinear interactions, while standalone machine learning (ML) models require extensive datasets. This study addressed these limitations by developing hybrid RSM-ML models to improve the prediction and optimization of PFAS adsorption. A comprehensive dataset was constructed from experimental adsorption data, integrating key parameters such as pH, pHpzc, surface area, temperature, and PFAS molecular properties. RSM was employed to model adsorption behavior, while gradient boosting (GB), random forest (RF), and extreme gradient boosting (XGB) were used to enhance predictive performance. Four hybrid models (linear, RMSE-based, multiplicative, and meta-learning) were developed and evaluated. The meta-learning HOP-RSM-GB model achieved near-perfect accuracy (R² = 1.00, RMSE = 10.59), outperforming all other models. Surface plots revealed that low pH and high pHpzc maximized adsorption, while increasing log Kow consistently enhanced PFAS adsorption. These findings establish hybrid RSM-ML modeling as a powerful framework for optimizing PFAS remediation strategies. The integration of statistical and machine learning approaches significantly improves predictive accuracy, reduces experimental costs, and provides deeper insights into adsorption mechanisms. This study underscores the importance of data-driven approaches in environmental engineering and highlights future opportunities for integrating ML-driven modeling with experimental adsorption research.
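An RMSE-based hybrid could, for example, weight each model's predictions in inverse proportion to its validation RMSE. This is an assumed reading of "RMSE-based", not the study's exact formulation; the function names and numbers below are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def inverse_rmse_weights(val_preds, y_val, eps=1e-12):
    """Weight each model by 1/RMSE on a validation set, normalized to sum to 1."""
    inverse = {name: 1.0 / max(rmse(y_val, preds), eps)  # eps avoids 1/0
               for name, preds in val_preds.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(weights, preds):
    """Weighted sum of per-model predictions for each sample."""
    n = len(next(iter(preds.values())))
    return [sum(weights[name] * preds[name][i] for name in weights) for i in range(n)]


y_val = [1.0, 2.0, 3.0]
val_preds = {"RSM": [1.5, 2.5, 3.5], "GB": [1.25, 2.25, 3.25]}
weights = inverse_rmse_weights(val_preds, y_val)
print(weights)
print(blend(weights, {"RSM": [4.0], "GB": [4.3]}))
```

Here GB has half the validation RMSE of RSM and therefore receives twice its weight (2/3 versus 1/3). A meta-learning hybrid would instead fit a second-level model on the base models' predictions.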

