Molecular Representation Learning (MRL) has proven impactful in numerous biochemical applications such as drug discovery and enzyme design. While Graph Neural Networks (GNNs) are effective at learning molecular representations from a 2D molecular graph or a single 3D structure, existing works often overlook the flexible nature of molecules, which continuously interconvert across conformations via chemical bond rotations and minor vibrational perturbations. To better account for molecular flexibility, some recent works formulate MRL as an ensemble learning problem, focusing on explicitly learning from a set of conformer structures. However, most of these studies have limited datasets, tasks, and models. In this work, we introduce the first MoleculAR Conformer Ensemble Learning (MARCEL) benchmark to thoroughly evaluate the potential of learning on conformer ensembles and suggest promising research directions. MARCEL includes four datasets covering diverse molecule- and reaction-level properties of chemically diverse molecules including organocatalysts and transition-metal catalysts, extending beyond the scope of common GNN benchmarks that are confined to drug-like molecules. In addition, we conduct a comprehensive empirical study, which benchmarks representative 1D, 2D, and 3D MRL models, along with two strategies that explicitly incorporate conformer ensembles into 3D models. Our findings reveal that direct learning from an accessible conformer space can improve performance on a variety of tasks and models.
Generative BigSMILES: an extension for polymer informatics, computer simulations & ML/AI
The BigSMILES notation, a concise tool for polymer ensemble representation, is augmented here by introducing an enhanced version called generative BigSMILES. G-BigSMILES is designed for generative workflows, and is complemented by tailored software tools for ease of use. This extension integrates additional data, including reactivity ratios (or connection probabilities among repeat units), molecular weight distributions, and ensemble size. An algorithm, interpretable as a generative graph, is devised that utilizes these data, enabling molecule generation from defined polymer ensembles. Consequently, the G-BigSMILES notation allows for efficient specification of complex molecular ensembles via a streamlined line notation, thereby providing a foundational tool for automated polymeric materials design. In addition, the graph interpretation of the G-BigSMILES notation sets the stage for robust machine learning methods capable of encapsulating intricate polymeric ensembles. The combination of G-BigSMILES with advanced machine learning techniques will facilitate straightforward property determination and in silico polymeric material synthesis automation. This integration has the potential to significantly accelerate materials design processes and advance the field of polymer science.
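To make the idea of generating molecules from connection probabilities and a molecular weight distribution concrete, the following is a minimal sketch in plain Python. It is not the G-BigSMILES syntax or the authors' software: the repeat-unit names, molar masses, transition probabilities, and the Gaussian target-mass distribution are all hypothetical choices made for illustration.

```python
import random

# Hypothetical repeat units with molar masses (g/mol); illustrative values only.
MONOMER_MASS = {"A": 104.15, "B": 100.12}

# Hypothetical connection probabilities: TRANSITION[x][y] is the chance
# that unit y follows unit x. Each row sums to 1.
TRANSITION = {
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}


def sample_chain(target_mass, rng):
    """Grow one chain unit by unit until its mass reaches target_mass."""
    unit = rng.choice(sorted(MONOMER_MASS))
    chain = [unit]
    mass = MONOMER_MASS[unit]
    while mass < target_mass:
        options = TRANSITION[unit]
        unit = rng.choices(list(options), weights=list(options.values()))[0]
        chain.append(unit)
        mass += MONOMER_MASS[unit]
    return chain, mass


def sample_ensemble(n_chains, mean_mass, std_mass, seed=0):
    """Draw n_chains chains whose target masses follow a Gaussian (an assumption)."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_chains):
        # Clamp so every chain contains at least one repeat unit.
        target = max(rng.gauss(mean_mass, std_mass), min(MONOMER_MASS.values()))
        ensemble.append(sample_chain(target, rng))
    return ensemble
```

Calling `sample_ensemble(5, 2000.0, 300.0)` returns five `(chain, mass)` pairs. The transition table plays the role of a small generative graph: nodes are repeat units and weighted edges are connection probabilities.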
- Award ID(s): 2134795
- PAR ID: 10479766
- Publisher / Repository: Publishing
- Date Published:
- Journal Name: Digital Discovery
- ISSN: 2635-098X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Quantifying the differences between networks is a challenging and ever-present problem in network science. In recent years, a multitude of diverse, ad hoc solutions to this problem have been introduced. Here, we propose that simple and well-understood ensembles of random networks (such as Erdős–Rényi graphs, random geometric graphs, Watts–Strogatz graphs, the configuration model and preferential attachment networks) are natural benchmarks for network comparison methods. Moreover, we show that the expected distance between two networks independently sampled from a generative model is a useful property that encapsulates many key features of that model. To illustrate our results, we calculate this within-ensemble graph distance and related quantities for classic network models (and several parameterizations thereof) using 20 distance measures commonly used to compare graphs. The within-ensemble graph distance provides a new framework for developers of graph distances to better understand their creations and for practitioners to better choose an appropriate tool for their particular task.
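The within-ensemble graph distance described above can be estimated by simple Monte Carlo sampling. The sketch below does this for Erdős–Rényi graphs using a normalized edge-set Hamming distance, which is chosen here only because it is easy to write down; the function names, sample count, and seed are assumptions for illustration and do not reproduce the authors' code.

```python
import itertools
import random


def erdos_renyi_edges(n, p, rng):
    """Sample the edge set of a G(n, p) graph on nodes 0..n-1."""
    return {pair for pair in itertools.combinations(range(n), 2) if rng.random() < p}


def hamming_distance(edges_a, edges_b, n):
    """Fraction of node pairs whose edge status differs between two graphs."""
    n_pairs = n * (n - 1) // 2
    return len(edges_a ^ edges_b) / n_pairs


def within_ensemble_distance(n, p, n_samples=200, seed=0):
    """Monte Carlo estimate of the mean distance between independent G(n, p) pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        g1 = erdos_renyi_edges(n, p, rng)
        g2 = erdos_renyi_edges(n, p, rng)
        total += hamming_distance(g1, g2, n)
    return total / n_samples
```

For this particular distance the expectation has a closed form: each node pair differs with probability 2p(1 − p), so `within_ensemble_distance(30, 0.3)` should land close to 0.42. Swapping in another generative model or distance measure only requires replacing the two helper functions.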
-
Polymers play an integral role in various applications, from everyday use to advanced technologies. In the era of machine learning (ML), polymer informatics has become a vital field for efficiently designing and developing polymeric materials. However, the focus of polymer informatics has predominantly centered on single-component polymers, leaving the vast chemical space of polymer blends relatively unexplored. This study employs a high-throughput molecular dynamics (MD) simulation combined with active learning (AL) to uncover polymer blends with enhanced thermal conductivity (TC) compared to the constituent single-component polymers. Initially, the TC of about 600 amorphous single-component polymers and 200 amorphous polymer blends with varying blending ratios are determined through MD simulations. The optimal representation method for polymer blends is identified, which involves a weighted sum approach that extends existing polymer representation from single-component polymers to polymer blends. An AL framework, combining MD simulation and ML, is employed to explore the TC of approximately 550,000 unlabeled polymer blends. The AL framework proves highly effective in accelerating the discovery of high-performance polymer blends for thermal transport. Additionally, we delve into the relationship between TC, radius of gyration (Rg), and hydrogen bonding, highlighting the roles of inter- and intra-chain interactions in thermal transport in amorphous polymer blends. A significant positive association between TC and Rg improvement and an indirect contribution from H-bond interaction to TC enhancement are revealed through a log-linear model and an odds ratio calculation, emphasizing the impact of increasing Rg and H-bond interactions on enhancing polymer blend TC.
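A weighted-sum blend representation of the kind mentioned above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation: the function name, the use of plain Python lists as feature vectors, and the requirement that blending fractions sum to 1 are all assumptions.

```python
import math


def blend_representation(component_vectors, fractions):
    """Return the fraction-weighted sum of per-component feature vectors."""
    if len(component_vectors) != len(fractions):
        raise ValueError("need one fraction per component")
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError("blending fractions must sum to 1")
    length = len(component_vectors[0])
    if any(len(vec) != length for vec in component_vectors):
        raise ValueError("all component vectors must have the same length")
    # Combine feature i across components, weighted by blending fraction.
    return [
        sum(frac * vec[i] for vec, frac in zip(component_vectors, fractions))
        for i in range(length)
    ]
```

With a single component and a fraction of 1.0 the function returns that component's vector unchanged, so the same model input format covers both single-component polymers and blends.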
-
Despite growing interest in polymers under extreme conditions, most atomistic molecular dynamics simulations cannot describe the bond scission events underlying failure modes in polymer networks undergoing large strains. In this work, we propose a physics-based machine learning approach that can detect and perform bond breaking with near quantum-chemical accuracy on-the-fly in atomistic simulations. Particularly, we demonstrate that by coarse-graining highly correlated neighboring bonds, the prediction accuracy can be dramatically improved. By comparing with existing quantum mechanics/molecular mechanics methods, our approach is approximately two orders of magnitude more efficient and exhibits improved sensitivity toward rare bond breaking events at low strain. The proposed bond breaking molecular dynamics scheme enables fast and accurate modeling of strain hardening and material failure in polymer networks and can accelerate the design of polymeric materials under extreme conditions.
-
The field of polymer membrane design is primarily based on empirical observation, which limits discovery of new materials optimized for separating a given gas pair. Instead of relying on exhaustive experimental investigations, we trained a machine learning (ML) algorithm, using a topological, path-based hash of the polymer repeating unit. We used a limited set of experimental gas permeability data for six different gases in ~700 polymeric constructs that have been measured to date to predict the gas-separation behavior of over 11,000 homopolymers not previously tested for these properties. To test the algorithm’s accuracy, we synthesized two of the most promising polymer membranes predicted by this approach and found that they exceeded the upper bound for CO2/CH4 separation performance. This ML technique, which is trained using a relatively small body of experimental data (and no simulation data), evidently represents an innovative means of exploring the vast phase space available for polymer membrane design.

