Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Although these approaches have been successfully applied to discover many important materials, they face significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool for enabling breakthroughs in many areas of materials science and engineering. With ML-assisted approaches, quantitative structure-property/activity relationships for material property prediction can be established more accurately and efficiently. In addition, ML-enabled molecular generation and inverse molecular design can accelerate materials design considerably. In this perspective, we review recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the challenges that must be solved to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which together serve as a tutorial for researchers who have little prior experience with ML and want to apply it in various applications. Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future in meeting the tremendous demand for new materials with tailored properties in different fields.
Machine Learning Applied to Single-Molecule Activity Prediction
Catalytic processes are used in about one third of US manufacturing, from chemical engineering to renewable energy. Assessing the activity of single molecules (that is, individual molecules) is necessary for the development of efficient catalysts. Their heterogeneous structure leads to particle-specific catalytic activity. Experimentation with single molecules can be time-consuming and difficult. We propose a machine learning (ML) model that allows chemical researchers to run shorter single-molecule experiments while obtaining the same level of results. We use common and widely understood ML methods to reduce complexity and make the approach accessible to the chemical engineering community. We reduce the experiment time by up to 83%. Our evaluation shows that a small data set is sufficient to train an acceptable model: 300 experiments are needed, including the validation set. We use a well-understood multilayer perceptron (MLP) model. We show that more complex models are not necessary and simpler methods are not sufficient.
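As a rough illustration of the kind of model the abstract names, the following is a minimal sketch of a one-hidden-layer MLP regressor trained on 300 samples. Everything beyond "MLP" and "300 experiments including the validation set" is an assumption made for illustration: the two input features, the synthetic target, the 240/60 train/validation split, the hidden width, and the learning rate are hypothetical and are not taken from the paper.

```python
import math
import random


def make_synthetic_data(n, rng):
    """Hypothetical stand-in data: two features -> one 'activity' value."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 0.8 * x[0] - 0.5 * x[1] + 0.3 * x[0] * x[1] + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


class TinyMLP:
    """One tanh hidden layer with a linear output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def sgd_step(self, x, y, lr):
        h, y_hat = self.forward(x)
        err = y_hat - y  # gradient of 0.5 * err**2 with respect to y_hat
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the pre-update output weight.
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
            self.b1[j] -= lr * grad_pre
        self.b2 -= lr * err


def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def run_demo(seed=0, epochs=100, lr=0.05):
    """Train on 240 samples, validate on 60; return (MSE before, MSE after)."""
    rng = random.Random(seed)
    data = make_synthetic_data(300, rng)
    train, val = data[:240], data[240:]
    model = TinyMLP(n_in=2, n_hidden=8, rng=rng)
    before = mse(model, val)
    for _ in range(epochs):
        rng.shuffle(train)
        for x, y in train:
            model.sgd_step(x, y, lr)
    return before, mse(model, val)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"validation MSE before training: {before:.4f}, after: {after:.4f}")
```

The sketch only shows that a small network can be fitted and validated on a few hundred samples; it does not reproduce the paper's features, its comparison against simpler or more complex models, or the reported reduction in experiment time.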
- PAR ID: 10507765
- Publisher / Repository: ACM
- Date Published:
- Journal Name: SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing
- ISBN: 9798400707858
- Page Range / eLocation ID: 66 to 72
- Subject(s) / Keyword(s): data sets, neural networks, chemistry, single-molecule, machine learning
- Format(s): Medium: X
- Location: Denver CO USA
- Sponsoring Org: National Science Foundation
More Like this
-
Traditional studies of enzymatic activity rely on the combined kinetics of millions of enzyme molecules to produce a product, an experimental approach that may wash out heterogeneities that exist between individual enzymes. Evaluating these properties on an enzyme-by-enzyme basis represents an unambiguous means of elucidating heterogeneities; however, the quantification of enzymatic activity at the single-enzyme level is fundamentally limited by the maximum catalytic rate, k_cat, inherent to a given enzyme. For electrochemical methods measuring current, single enzymes must turn over greater than 10⁷ molecules per second to produce a measurable signal on the order of 10⁻¹² A. Enzymes with this capability are extremely rare in nature, with typical k_cat values for biologically relevant enzymes falling between 1 and 10 000 s⁻¹. Thus, clever amplification strategies are necessary to electrochemically detect the vast majority of enzymes. This review details the progress toward the electroanalytical detection and evaluation of single-enzyme kinetics, largely focused on the nanoimpact method, a chronoamperometric detection strategy that monitors the change in the current-time profile associated with stochastic collisions of freely diffusing entities (e.g., enzymes) onto a microelectrode or nanoelectrode surface. We discuss the experimental setups and methods developed in the last decade toward the quantification of single-molecule enzymatic rates. Special emphasis is given to the limitations of measurement science in the observation of single-enzyme activity and feasible methods of signal amplification with reasonable bandwidth.
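The detection threshold quoted in this abstract can be checked with a one-line estimate: a single enzyme turning over at rate k_cat and transferring n electrons per turnover produces a current of roughly n · e · k_cat. The sketch below assumes one electron per turnover, which is an illustrative choice and not a figure from the review; the function name is likewise hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron (exact SI value)


def single_enzyme_current(k_cat_per_s, electrons_per_turnover=1):
    """Steady current in amperes from one enzyme turning over at k_cat.

    Assumes every turnover transfers `electrons_per_turnover` electrons
    to the electrode (an illustrative assumption).
    """
    return electrons_per_turnover * ELEMENTARY_CHARGE_C * k_cat_per_s


if __name__ == "__main__":
    for k_cat in (1.0, 1.0e4, 1.0e7):
        current = single_enzyme_current(k_cat)
        print(f"k_cat = {k_cat:.0e} per second -> {current:.2e} A")
```

Under this assumption, 10⁷ turnovers per second corresponds to about 1.6 × 10⁻¹² A, whereas an enzyme at the upper end of the typical range (10⁴ per second) yields only about 1.6 × 10⁻¹⁵ A, which is why amplification is needed.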
-
Modern chemical science and industries critically depend on the application of various catalytic methods. However, the underlying molecular mechanisms of these processes remain incompletely understood. Recent experimental advances that produced highly efficient nanoparticle catalysts have allowed researchers to obtain more quantitative descriptions, opening the way to clarifying the microscopic picture of catalysis. Stimulated by these developments, we present a minimal theoretical model that investigates the effect of heterogeneity in catalytic processes at the single-particle level. Using a discrete-state stochastic framework that accounts for the most relevant chemical transitions, we explicitly evaluate the dynamics of chemical reactions on single heterogeneous nanocatalysts with different types of active sites. We find that the degree of stochastic noise in nanoparticle catalytic systems depends on several factors, including the heterogeneity of catalytic efficiencies of active sites and distinctions between chemical mechanisms on different active sites. The proposed theoretical approach provides a single-molecule view of heterogeneous catalysis and also suggests possible quantitative routes to clarify some important molecular details of nanocatalysts.
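To give a feel for what a discrete-state stochastic description of a nanocatalyst can look like, here is a generic Gillespie-style simulation. It is not the authors' model: the two-state site (empty or occupied), the shared `binding_rate`, the per-site catalytic rates in `site_rates`, and all numerical values are assumptions chosen for illustration.

```python
import random


def simulate_turnovers(site_rates, binding_rate, t_end, rng):
    """Count product molecules released by one particle up to time t_end.

    Each active site alternates between two states:
      empty    -> occupied          at rate binding_rate
      occupied -> empty + product   at that site's rate in site_rates
    """
    occupied = [False] * len(site_rates)
    t, products = 0.0, 0
    while True:
        # Propensity of the single transition currently open to each site.
        props = [k if occ else binding_rate
                 for k, occ in zip(site_rates, occupied)]
        total = sum(props)
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            return products
        # Choose the site that fires, with probability proportional to propensity.
        r = rng.uniform(0.0, total)
        acc = 0.0
        for i, p in enumerate(props):
            acc += p
            if r <= acc:
                break
        if occupied[i]:
            products += 1
        occupied[i] = not occupied[i]


def expected_rate(site_rates, binding_rate):
    """Long-time mean turnover rate: each site completes one cycle per 1/u + 1/k."""
    return sum(binding_rate * k / (binding_rate + k) for k in site_rates)


if __name__ == "__main__":
    rng = random.Random(1)
    rates = [1.0, 10.0]  # hypothetical heterogeneous pair of active sites
    count = simulate_turnovers(rates, binding_rate=5.0, t_end=2000.0, rng=rng)
    print(count, 2000.0 * expected_rate(rates, 5.0))
```

Repeating the simulation over many windows and comparing the variance of the product count for equal versus unequal `site_rates` is one simple way to probe how site heterogeneity affects fluctuations in a model of this kind.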
-
Ab-initio molecular dynamics enables following the dynamics of biological systems from first principles, describing the electronic structure and offering the opportunity to “watch” the evolution of biochemical processes with unique resolution, beyond the capabilities of state-of-the-art experimental techniques. This article reports the role of first-principles (ab-initio) molecular dynamics (MD) in the CRISPR-Cas9 genome editing revolution, achieving a profound understanding of the enzymatic function and offering valuable insights for enzyme engineering. We introduce the methodologies and explain the use of ab-initio MD simulations to establish the two-metal dependent mechanism of DNA cleavage in the RuvC domain of the Cas9 enzyme, and how a second catalytic domain, HNH, cleaves the target DNA with the aid of a single metal ion. A detailed description is given of how ab-initio MD is combined with free-energy methods (i.e., thermodynamic integration and metadynamics) to break and form chemical bonds, explaining the use of these methods to determine the chemical landscape and establish the catalytic mechanism in CRISPR-Cas9. The critical role of classical methods is also discussed, explaining the theory and application of constant pH MD simulations, used to accurately predict the catalytic residues’ protonation states. Overall, first-principles methods are shown to unravel the electronic structure and reveal the catalytic mechanism of the Cas9 enzyme, providing valuable insights that can serve for the design of genome editing tools with improved catalytic efficiency or controllable activity.
-
Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms the complementarity. Feature attribution methods like LIME and SHAP and counterfactual explanation methods like Wachter et al. and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified for generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.

