We propose a chemical language processing model that predicts polymers' glass transition temperature (Tg) using a polymer-language (SMILES, Simplified Molecular Input Line Entry System) embedding and a recurrent neural network. The model receives only the SMILES strings of a polymer's repeat units as input and treats them as sequential data at the character level. This approach requires no additional molecular descriptors or fingerprints and is therefore computationally efficient. More importantly, it avoids the difficulty of generating molecular descriptors for repeat units that contain the polymerization point '*'. Results show that the trained model achieves reasonable prediction performance on the Tg of unseen polymers. The model is further applied to high-throughput screening of an unlabeled polymer database to identify high-temperature polymers desired for applications in extreme environments. Our work demonstrates that the SMILES strings of polymer repeat units can serve as an effective feature representation for a chemical language processing model that predicts polymer Tg. The framework is general and can be used to construct structure–property relationships for other polymer properties.
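The preprocessing the abstract describes, treating each repeat-unit SMILES string as a character sequence, can be sketched as below. This is a minimal illustration, not the authors' exact pipeline: the vocabulary construction, padding scheme, and maximum length are assumptions, and the integer sequences would then feed an embedding plus RNN layer.

```python
# Hedged sketch: character-level encoding of polymer repeat-unit SMILES.
# Vocabulary and padding choices here are illustrative assumptions.

def build_vocab(smiles_list):
    """Map every character seen in the corpus (including the
    polymerization point '*') to an integer id; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(smiles, vocab, max_len):
    """Turn a SMILES string into a fixed-length list of integer ids,
    right-padded with 0, ready for an embedding + RNN model."""
    ids = [vocab[ch] for ch in smiles]
    return (ids + [0] * max_len)[:max_len]

corpus = ["*CC(*)c1ccccc1",   # polystyrene repeat unit
          "*CC(*)C"]          # polypropylene repeat unit
vocab = build_vocab(corpus)
encoded = [encode(s, vocab, 16) for s in corpus]
```

Note that the polymerization point '*' is just another vocabulary character here, which is precisely why no special descriptor handling is needed.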
This content will become publicly available on March 1, 2026
Data-Driven Modeling and Design of Sustainable High Tg Polymers
This paper develops a machine learning methodology for the rapid and robust prediction of the glass transition temperature (Tg) of polymers, targeted at sustainable high-temperature polymers. The machine learning framework combines multiple techniques to develop a feature set encompassing all relevant aspects of polymer chemistry, to extract and explain correlations between features and Tg, and to develop and apply a high-throughput predictive model. In this work, we identify aspects of the chemistry that most impact Tg, including a parameter related to rotational degrees of freedom and a backbone index based on a steric hindrance parameter. Building on this scientific understanding, models are developed on different types of data to ensure robustness, and experimental validation is obtained by testing a new polymer chemistry with a remarkable Tg. The ability of our model to predict Tg shows that the relevant information is contained within the topological descriptors, while the need for a non-linear manifold transformation of the data shows that the relationships are complex and cannot be captured through traditional regression approaches. Building on the scientific understanding obtained from the correlation analyses, coupled with the model performance, it is shown that the rigidity and interaction dynamics of the polymer structure are key tuning targets for achieving the desired performance. This work has implications for the future rapid optimization of chemistries.
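The feature-to-Tg correlation screen this abstract describes can be illustrated with a plain Pearson correlation between one descriptor and Tg. The descriptor values and Tg data below are invented for illustration; the paper's actual descriptors (e.g., its steric-hindrance-based backbone index) would be computed from polymer structures.

```python
import math

# Hedged sketch of a feature-Tg correlation screen: Pearson correlation
# between one topological descriptor and Tg. All values below are made up.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical backbone-index values vs. measured Tg (degrees C)
backbone_index = [0.2, 0.5, 0.7, 0.9, 1.1]
tg_celsius = [80, 130, 160, 210, 240]
r = pearson(backbone_index, tg_celsius)
```

A correlation this strong would flag the descriptor for the model; the paper's point is that such linear screens inform feature selection even though the final model needs a non-linear transformation.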
- Award ID(s): 2113695
- PAR ID: 10600339
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: International Journal of Molecular Sciences
- Volume: 26
- Issue: 6
- ISSN: 1422-0067
- Page Range / eLocation ID: 2743
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The discovery of high-performance shape memory polymers (SMPs) with enhanced glass transition temperatures (Tg) is of paramount importance in fields such as geothermal energy, oil and gas, aerospace, and other high-temperature applications, where materials are required to exhibit the shape memory effect under extremely high-temperature conditions. Here, we employ a novel machine learning framework that integrates transfer learning with variational autoencoders (VAE) to efficiently explore the chemical design space of SMPs and identify new candidates with high Tg values. We systematically investigate the effect of different latent space dimensions on the VAE model performance. Several machine learning models are then trained to predict Tg. We find that the SVM model demonstrates the highest predictive accuracy, with R² values exceeding 0.87 and a mean absolute percentage error as low as 6.43% on the test set. Through systematic molar ratio adjustments and VAE-based fingerprinting, we discover novel SMP candidates with Tg values between 190°C and 200°C, suitable for high-temperature applications. These findings underscore the effectiveness of combining VAEs and SVM for SMP discovery, offering a scalable and efficient method for identifying polymers with tailored thermal properties.
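The mean absolute percentage error (MAPE) metric this abstract reports (6.43% on the test set) is straightforward to sketch. The predicted and true Tg pairs below are invented for illustration only, not data from the paper.

```python
# Hedged sketch: mean absolute percentage error (MAPE), the metric the
# abstract cites for the SVM model. All Tg values below are hypothetical.

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)

tg_true = [190.0, 195.0, 200.0]   # degrees C, hypothetical test set
tg_pred = [185.0, 200.0, 194.0]   # hypothetical SVM predictions
err = mape(tg_true, tg_pred)
```

MAPE is a natural choice here because Tg spans a wide range across polymer families, so a relative error is easier to interpret than an absolute one.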
-
The emergence of data-intensive scientific discovery and machine learning has dramatically changed the way in which scientists and engineers approach materials design. Nevertheless, for designing macromolecules or polymers, one limitation is the lack of appropriate methods or standards for converting systems into chemically informed, machine-readable representations. This featurization process is critical to building predictive models that can guide polymer discovery. Although standard molecular featurization techniques have been deployed on homopolymers, such approaches capture neither the multiscale nature nor topological complexity of copolymers, and they have limited application to systems that cannot be characterized by a single repeat unit. Herein, we present, evaluate, and analyze a series of featurization strategies suitable for copolymer systems. These strategies are systematically examined in diverse prediction tasks sourced from four distinct datasets that enable understanding of how featurization can impact copolymer property prediction. Based on this comparative analysis, we suggest directly encoding polymer size in polymer representations when possible, adopting topological descriptors or convolutional neural networks when the precise polymer sequence is known, and using chemically informed unit representations when developing extrapolative models. These results provide guidance and future directions regarding polymer featurization for copolymer design by machine learning.
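One simple copolymer featurization strategy in the spirit of this abstract is composition-weighted averaging of per-monomer descriptor vectors, which yields a single fixed-length vector regardless of how many comonomers are present. The descriptor values below are toy numbers, not descriptors from the paper.

```python
# Hedged sketch: composition-weighted copolymer featurization.
# Each monomer contributes its descriptor vector scaled by molar fraction.
# Descriptor values here are invented; real workflows would compute them
# from the monomer structures.

def copolymer_features(monomer_features, molar_fractions):
    """Sum each monomer's descriptor vector weighted by its molar
    fraction, giving one fixed-length vector for the copolymer."""
    dim = len(next(iter(monomer_features.values())))
    out = [0.0] * dim
    for name, frac in molar_fractions.items():
        for i, v in enumerate(monomer_features[name]):
            out[i] += frac * v
    return out

feats = {"styrene": [1.0, 0.0, 104.15],        # toy one-hot + molar mass
         "acrylonitrile": [0.0, 1.0, 53.06]}
vec = copolymer_features(feats, {"styrene": 0.75, "acrylonitrile": 0.25})
```

Note that this averaging deliberately discards sequence information, which is exactly the limitation that motivates the abstract's recommendation of topological descriptors or convolutional networks when the monomer sequence is known.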
-
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf™ is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications, driven by the MLCommons™ Association. We present the results from the first submission round, including a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, enabling overall >10× (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behaviour to parameterize extended roofline performance models in future rounds.
-
In machine learning (ML) and deep learning (DL), hyperparameter tuning is the process of selecting the combination of hyperparameters that gives the best performance; the behavior of many ML and DL algorithms depends strongly on their hyperparameters. While the application of ML and DL algorithms to additive manufacturing (AM) has grown rapidly, little to no attention has been paid to carefully selecting and optimizing the hyperparameters of these algorithms in order to investigate their influence and achieve the best possible model performance. In this work, we demonstrate the effect of a grid-search hyperparameter tuning technique on a multilayer perceptron (MLP) model using datasets obtained from a fused filament fabrication (FFF) AM process. The FFF dataset was extracted from a MakerBot MethodX 3D printer using Internet of Things (IoT) sensors. Three hyperparameters were considered: the number of neurons in the hidden layer, the learning rate, and the number of epochs. In addition, two different train-to-test ratios were considered to investigate their effects on the AM process data. The dataset consisted of five dominant input parameters (layer thickness, build orientation, extrusion temperature, build temperature, and print speed) and three output parameters (dimensional accuracy, porosity, and tensile strength). RMSE and computational time (CT) were selected as the hyperparameter performance metrics. The experimental results reveal the optimal configuration of hyperparameters that contributed to the best performance of the MLP model.
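The exhaustive grid search this abstract describes can be sketched as a loop over every hyperparameter combination, keeping the configuration with the lowest RMSE. The grid values and the scoring function below are stand-ins; in the paper, each evaluation would train and validate the MLP on the FFF dataset.

```python
import itertools

# Hedged sketch of grid-search tuning over the three hyperparameters the
# abstract names. Grid values are illustrative; evaluate() is a placeholder
# for training/validating an MLP and returning its RMSE.

grid = {
    "hidden_neurons": [16, 32, 64],
    "learning_rate": [0.001, 0.01],
    "epochs": [100, 200],
}

def evaluate(hidden_neurons, learning_rate, epochs):
    """Placeholder RMSE; a real run trains an MLP with these settings."""
    return 1.0 / hidden_neurons + learning_rate + 10.0 / epochs

best_cfg, best_rmse = None, float("inf")
keys = list(grid)
for values in itertools.product(*(grid[k] for k in keys)):
    cfg = dict(zip(keys, values))
    rmse = evaluate(**cfg)
    if rmse < best_rmse:
        best_cfg, best_rmse = cfg, rmse
```

Grid search scales multiplicatively with the number of values per hyperparameter (here 3 × 2 × 2 = 12 evaluations), which is why the abstract's restriction to three hyperparameters keeps the search tractable.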
