Abstract
Motivation: Despite its great success in physical modeling, differential geometry (DG) has rarely been used as a versatile tool for analyzing large, diverse, and complex molecular and biomolecular datasets, because its potential for dimensionality reduction and its ability to encode essential chemical and biological information in differentiable manifolds remain poorly understood.
Results: We put forward a differential geometry‐based geometric learning (DG‐GL) hypothesis that the intrinsic physics of three‐dimensional (3D) molecular structures lies on a family of low‐dimensional manifolds embedded in a high‐dimensional data space. We encode crucial chemical, physical, and biological information into 2D element interactive manifolds, extracted from a high‐dimensional structural data space via a multiscale discrete‐to‐continuum mapping using differentiable density estimators. Differential geometry tools are used to construct element interactive curvatures in analytical form for certain analytically differentiable density estimators. These low‐dimensional differential geometry representations are paired with a robust machine learning algorithm to showcase their descriptive and predictive power for large, diverse, and complex molecular and biomolecular datasets. Extensive numerical experiments demonstrate that the proposed DG‐GL strategy outperforms other advanced methods in predicting drug discovery‐related protein‐ligand binding affinity, drug toxicity, and molecular solvation free energy.
Availability and implementation: http://weilab.math.msu.edu/DG‐GL/
Contact: wei@math.msu.edu
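The discrete-to-continuum mapping mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the density estimator, so the generalized exponential kernel used here, the function name `exponential_density`, and the parameters `eta` (scale) and `kappa` (power) are assumptions made only for illustration. The sketch turns a discrete set of atomic coordinates into a smooth, differentiable density that can be evaluated at any point in space, and varying `eta` gives a multiscale view of the same structure.

```python
import math

def exponential_density(point, atoms, eta=1.0, kappa=2.0):
    """Sum generalized exponential kernels centered on each atom.

    Assumed form (not taken from the abstract):
        rho(r) = sum_j exp(-(|r - r_j| / eta) ** kappa)
    """
    total = 0.0
    for atom in atoms:
        dist = math.dist(point, atom)  # Euclidean distance, Python 3.8+
        total += math.exp(-((dist / eta) ** kappa))
    return total

# Two hypothetical atoms 1.5 units apart; evaluate the density at the midpoint.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
for eta in (0.5, 1.0, 2.0):
    value = exponential_density((0.75, 0.0, 0.0), atoms, eta=eta)
    print(f"eta={eta}: density at midpoint = {value:.4f}")
```

A larger `eta` spreads each atom's contribution over a wider region, so the midpoint density grows with `eta`. Curvatures of the resulting level surfaces would be obtained from derivatives of such a density, which this sketch does not compute.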
A review of mathematical representations of biomolecular data
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithms, efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify biomolecular structural complexity and reduce ML dimensionality have emerged as top performers in D3R Grand Challenges. This review is devoted to recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches: algebraic topology, differential geometry, and graph theory. We elucidate how physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein–ligand binding predictions, although these methods have also been highly successful in many other applications, such as protein classification, virtual screening, and the prediction of solubility, solvation free energies, toxicity, partition coefficients, and protein folding stability changes upon mutation.
- PAR ID: 10170687
- Date Published:
- Journal Name: Physical Chemistry Chemical Physics
- Volume: 22
- Issue: 8
- ISSN: 1463-9076
- Page Range / eLocation ID: 4343 to 4367
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Polymeric membranes have become essential for energy-efficient gas separations such as natural gas sweetening, hydrogen separation, and carbon dioxide capture. However, they face challenges such as permeability-selectivity tradeoffs, plasticization, and physical aging, which limit their broader applicability. Machine learning (ML) techniques are increasingly used to address these challenges. This review covers current ML applications in polymeric gas separation membrane design, focusing on three key components: polymer data, representation methods, and ML algorithms. Diverse polymer datasets related to gas separation, encompassing experimental, computational, and synthetic data, form the foundation of ML applications. Various polymer representation methods are discussed, ranging from traditional descriptors and fingerprints to deep learning-based embeddings. Furthermore, we examine diverse ML algorithms applied to gas separation polymers. The review provides insights into fundamental concepts such as supervised and unsupervised learning, emphasizing their applications in the context of polymer membranes. It also extends to advanced ML techniques, including data-centric and model-centric methods, aimed at addressing challenges unique to polymer membranes, with a focus on accurate screening and inverse design.
- Peptide misfolding and aberrant assembly in membranous micro-environments have been associated with numerous neurodegenerative diseases. The biomolecular mechanisms and biophysical implications of these amyloid membrane interactions have been under extensive research and can assist in understanding disease pathogenesis and the potential development of rational therapeutics. However, the complex nature and diversity of biomolecular interactions, structural transitions, and dependence on local environmental conditions have made accurate microscopic characterization challenging. In this review, using cases of Alzheimer's disease (amyloid-beta peptide), Parkinson's disease (alpha-synuclein peptide), and Huntington's disease (huntingtin protein), we illustrate existing challenges in experimental investigations and summarize recent relevant numerical simulation studies of amyloidogenic peptide–membrane interactions. In addition, we project directions for future in silico studies and discuss shortcomings of current computational approaches.
- Glasses have been an integral part of human life for more than 2000 years. Despite several years of research and analysis, some fundamental and practical questions on glasses still remain unanswered. While most of the earlier approaches were based on (i) expert knowledge and intuition, (ii) Edisonian trial and error, or (iii) physics-driven modeling and analysis, recent studies suggest that data-driven techniques, such as artificial intelligence (AI) and machine learning (ML), can provide fresh perspectives to tackle some of these questions. In this article, we identify 21 grand challenges in glass science, the solutions of which are either enabling AI and ML or enabled by AI and ML to accelerate the field of glass science. The challenges presented here range from fundamental questions related to glass formation and composition–processing–property relationships to industrial problems such as automated flaw detection in glass manufacturing. We believe that the present article will instill enthusiasm among the readers to explore some of the grand challenges outlined here and to discover many more challenges that can advance the field of glass science, engineering, and technology.
- Machine learning (ML) is becoming an effective tool for studying 2D materials. Taking as input computed or experimental materials data, ML algorithms predict the structural, electronic, mechanical, and chemical properties of 2D materials that have yet to be discovered. Such predictions expand investigations on how to synthesize 2D materials and use them in various applications, as well as greatly reduce the time and cost to discover and understand 2D materials. This tutorial review focuses on the understanding, discovery, and synthesis of 2D materials enabled by or benefiting from various ML techniques. We introduce the most recent efforts to adopt ML in various fields of study regarding 2D materials and provide an outlook for future research opportunities. The adoption of ML is anticipated to accelerate and transform the study of 2D materials and their heterostructures.

