Title: Transfer Learning Facilitates the Prediction of Polymer–Surface Adhesion Strength
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Applied Materials & Interfaces 2022, 14, 37161−37169], ML models were applied to predict the adhesive free energy of polymer–surface interactions with high accuracy from sequence data, demonstrating success in the inverse design of polymer sequences for known surface compositions. Although that method succeeded in designing polymers for a known surface, it required an extensive data set for each specific surface in order to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without having to regenerate a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network to improve the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The knowledge shared from the pretrained model significantly improves prediction accuracy on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problems of inverse materials design.
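The pretrain-then-fine-tune workflow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code or model: the synthetic data, the one-hidden-layer network, the helper names (`init_params`, `train`, `make_data`), and all sizes and hyperparameters are assumptions chosen only to show the mechanics of pretraining on a large related data set and then updating the weights of all layers on a small one.

```python
import numpy as np

def init_params(rng, n_in, n_hidden):
    """Small one-hidden-layer regression network (illustrative size)."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1)),
        "b2": np.zeros(1),
    }

def predict(params, X):
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return (hidden @ params["W2"] + params["b2"]).ravel()

def loss(params, X, y):
    """Half mean-squared error."""
    return 0.5 * np.mean((predict(params, X) - y) ** 2)

def train(params, X, y, steps, lr):
    """Full-batch gradient descent on every layer; returns new parameters."""
    p = {k: v.copy() for k, v in params.items()}  # leave the input untouched
    n = len(y)
    for _ in range(steps):
        hidden = np.tanh(X @ p["W1"] + p["b1"])
        err = (hidden @ p["W2"] + p["b2"]).ravel() - y
        # Backpropagate the half-MSE loss through both layers.
        g_out = err[:, None] / n
        g_hidden = (g_out @ p["W2"].T) * (1.0 - hidden ** 2)
        grads = {
            "W2": hidden.T @ g_out,
            "b2": g_out.sum(axis=0),
            "W1": X.T @ g_hidden,
            "b1": g_hidden.sum(axis=0),
        }
        for k in p:
            p[k] -= lr * grads[k]
    return p

# Synthetic stand-ins for two related systems (assumption: not polymer data).
rng = np.random.default_rng(0)
n_features = 8
w_source = rng.normal(size=n_features)
w_target = w_source + 0.3 * rng.normal(size=n_features)  # related, not identical

def make_data(n, w, shift):
    X = rng.uniform(-1.0, 1.0, (n, n_features))
    return X, np.tanh(X @ w) + shift

X_src, y_src = make_data(2000, w_source, 0.0)   # large source data set
X_tgt, y_tgt = make_data(20, w_target, 0.5)     # small target data set
X_test, y_test = make_data(500, w_target, 0.5)  # held-out target data

# 1) Pretrain on the large related data set.
pretrained = train(init_params(rng, n_features, 16), X_src, y_src, steps=2000, lr=0.05)
# 2) Fine-tune all layers on the small target data set.
fine_tuned = train(pretrained, X_tgt, y_tgt, steps=200, lr=0.05)
# Baseline: same small data set, random initialization.
scratch = train(init_params(rng, n_features, 16), X_tgt, y_tgt, steps=200, lr=0.05)

for name, p in [("pretrained only", pretrained), ("fine-tuned", fine_tuned), ("from scratch", scratch)]:
    print(f"{name:16s} held-out loss: {loss(p, X_test, y_test):.4f}")
```

Comparing the held-out loss of the fine-tuned network with the from-scratch baseline is the kind of check the abstract refers to. How large the gap is in this toy setting depends on how closely the two synthetic tasks are related and on the size of the target set.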
Award ID(s):
2048285 2143346
PAR ID:
10423098
Date Published:
Journal Name:
Journal of Chemical Theory and Computation
ISSN:
1549-9618
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Polymeric membranes have become essential for energy-efficient gas separations such as natural gas sweetening, hydrogen separation, and carbon dioxide capture. However, they face challenges such as permeability-selectivity tradeoffs, plasticization, and physical aging, which limit their broader applicability. Machine learning (ML) techniques are increasingly used to address these challenges. This review covers current ML applications in polymeric gas separation membrane design, focusing on three key components: polymer data, representation methods, and ML algorithms. Diverse polymer datasets related to gas separation, encompassing experimental, computational, and synthetic data, form the foundation of ML applications. Various polymer representation methods are discussed, ranging from traditional descriptors and fingerprints to deep learning-based embeddings. Furthermore, we examine diverse ML algorithms applied to gas separation polymers and provide insights into fundamental concepts such as supervised and unsupervised learning, emphasizing their applications in the context of polymer membranes. The review also extends to advanced ML techniques, including data-centric and model-centric methods, aimed at addressing challenges unique to polymer membranes, focusing on accurate screening and inverse design.
  2. In the past decade, academia and industry have embraced machine learning (ML) for database management system (DBMS) automation. These efforts have focused on designing ML models that predict DBMS behavior to support picking actions (e.g., building indexes) that improve the system's performance. Recent developments in ML have created automated methods for finding good models. Such advances shift the bottleneck from DBMS model design to obtaining the training data necessary for building these models. But generating good training data is challenging and requires encoding subject matter expertise into DBMS instrumentation. Existing methods for training data collection are bespoke to individual DBMS components and do not account for (1) how workload trends affect the system and (2) the subtle interactions between internal system components. Consequently, the models created from this data do not support holistic tuning across subsystems and require frequent retraining to boost their accuracy. This paper presents the architecture of a database gym, an integrated environment that provides a unified API of pluggable components for obtaining high-quality training data. The goal of a database gym is to simplify ML model training and evaluation to accelerate autonomous DBMS research. But unlike gyms in other domains that rely on custom simulators, a database gym uses the DBMS itself to create simulation environments for ML training. Thus, we discuss and prescribe methods for overcoming challenges in DBMS simulation, which include demanding requirements for performance, simulation fidelity, and DBMS-generated hints for guiding training processes. 
  3. Design optimization, and particularly adjoint-based multi-physics shape and topology optimization, is time-consuming and often requires expensive iterations to converge to desired designs. In response, researchers have developed Machine Learning (ML) approaches, often referred to as Inverse Design methods, to either replace or accelerate tools like Topology Optimization (TO). However, these methods have their own hidden, non-trivial costs, including those of data generation, training, and refinement of ML-produced designs. This raises the question: when is it actually worth learning Inverse Design, compared to just optimizing designs without ML assistance? This paper quantitatively addresses this question by comparing the costs and benefits of three different Inverse Design ML model families on a TO task against running the optimizer by itself. We explore the relationship between the size of training data and the predictive power of each ML model, as well as the computational and training costs of the models and the extent to which they accelerate or hinder TO convergence. The results demonstrate that simpler models, such as K-Nearest Neighbors and Random Forests, are more effective for TO warmstarting with limited training data, while more complex models, such as Deconvolutional Neural Networks, are preferable with more data. We also emphasize the need to balance the benefits of using larger training sets with the costs of data generation when selecting the appropriate Inverse Design model. Finally, the paper addresses some challenges that arise when using ML predictions to warmstart optimization, and provides some suggestions for budget and resource management.
  4. Surface tension is a critical property that influences polymer behavior at interfaces and affects applications ranging from coatings to biomedical devices. Traditional experimental methods for measuring polymer surface tension are time-consuming, costly, and sensitive to environmental conditions. Computational approaches such as molecular dynamics (MD) simulations are valuable but computationally intensive, especially for polymers with long chains. This study investigates the use of machine learning (ML) techniques to predict polymer surface tension using different levels of molecular representation, focusing on multilinear regression (MLR), random forest (RF), and graph neural networks (GNNs). A data set of 317 homopolymers collected from the PolyInfo database is used to train and evaluate these models. Descriptors are derived at various levels of complexity, ranging from manually calculated features to graph-based representations. The GNN approach captures the intrinsic connectivity of polymer structures, while the MLR and RF models rely on manually crafted descriptors. The performance of these models is compared with experimental data, with the GNN model demonstrating superior accuracy due to its ability to directly learn from molecular graphs. Our results show that GNNs can better capture complex nonlinear relationships in polymer structures than traditional descriptor-based methods, suggesting their significant potential for accelerating polymer design and development. The study also includes validation of model predictions against MD simulations, highlighting the potential of GNNs to accurately model polymer interfacial properties.
  5. The ever-increasing demand for novel polymers with superior properties requires a deeper understanding and exploration of the chemical space. Recently, data-driven approaches to explore the chemical space for polymer design have emerged. Among them, inverse design strategies for designing polymers with specific properties have evolved into a significant materials informatics platform by learning hidden knowledge from materials data and by navigating the chemical space in an optimized way. In this review, we first summarize the progress in the representation of polymers, a prerequisite step for the inverse design of polymers. Then, we systematically introduce three data-driven strategies implemented for the inverse design of polymers, i.e., high-throughput virtual screening, global optimization, and generative models. Finally, we discuss the challenges and opportunities of the data-driven strategies as well as optimization algorithms employed in the inverse design of polymers.