Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory, have been paired with efficient machine learning, especially deep learning algorithms, to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprints are still valuable in computer-aided drug discovery. This work considers 23 datasets associated with four typical problems, namely protein–ligand binding, toxicity, solubility, and partition coefficient, to assess the performance of eight 2D fingerprints. Advanced machine learning algorithms, including random forest, gradient boosted decision tree, single-task deep neural network, and multitask deep neural network, are employed to construct efficient 2D-fingerprint-based models. Additionally, appropriate consensus models are built to further enhance the performance of 2D-fingerprint-based methods. It is demonstrated that 2D-fingerprint-based models perform as well as the state-of-the-art 3D structure-based models for the predictions of toxicity, solubility, partition coefficient, and protein–ligand binding affinity based on only ligand information. However, 3D structure-based models outperform 2D-fingerprint-based methods in complex-based protein–ligand binding affinity predictions.
GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery. Existing generative models have several drawbacks, including a lack of modeling of important molecular geometry elements (e.g., torsion angles), separate optimization stages prone to error accumulation, and the need for structure fine-tuning based on approximate classical force fields or computationally expensive methods such as metadynamics with approximate quantum mechanics calculations at each geometry. We propose GeoMol, an end-to-end, non-autoregressive, and SE(3)-invariant machine learning approach to generate distributions of low-energy molecular 3D conformers. Leveraging the power of message passing neural networks (MPNNs) to capture local and global graph information, we predict local atomic 3D structures and torsion angles, avoiding unnecessary over-parameterization of the geometric degrees of freedom (e.g., one angle per non-terminal bond). Such local predictions suffice both for the training loss computation and for the full deterministic conformer assembly (at test time). We devise a non-adversarial optimal-transport-based loss function to promote diverse conformer generation. GeoMol predominantly outperforms popular open-source, commercial, or state-of-the-art machine learning (ML) models, while achieving significant speed-ups. We expect such differentiable 3D structure generators to significantly impact molecular modeling and related applications.
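To make the role of torsion angles concrete, the following is a minimal sketch of how a torsion (dihedral) angle about a bond can be computed from four atom positions, and why such an angle is unaffected by rigid translation of the molecule. It is not taken from GeoMol; the function name `torsion_angle`, the helper functions, and the example coordinates are illustrative assumptions, and the sign of the returned angle depends on the convention chosen here.

```python
import math

# Small 3D vector helpers (hypothetical names, plain tuples of three numbers).
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle in radians about the p1-p2 bond.

    The angle is measured between the plane (p0, p1, p2) and the
    plane (p1, p2, p3). The sign convention is an assumption of this sketch.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)          # the central bond
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)        # normal of the first plane
    n2 = _cross(b1, b2)        # normal of the second plane
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = tuple(c / b1_len for c in b1)
    m1 = _cross(n1, b1_unit)   # in-plane vector orthogonal to n1 and b1
    return math.atan2(_dot(m1, n2), _dot(n1, n2))

# Four atoms with the central bond along the z axis.
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
angle = torsion_angle(*atoms)

# Translating every atom by the same vector leaves the angle unchanged.
shift = (3.0, -2.0, 5.0)
moved = [tuple(c + s for c, s in zip(p, shift)) for p in atoms]
angle_moved = torsion_angle(*moved)
```

Because the angle depends only on differences between atom positions and on dot and cross products of those differences, it is also unchanged by rigid rotations, which is the kind of invariance a conformer generator can exploit when it predicts one angle per non-terminal bond.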
 Award ID(s):
 1918839
 Publication Date:
 NSFPAR ID:
 10320124
 Journal Name:
 Advances in neural information processing systems
 ISSN:
 1049-5258
 Sponsoring Org:
 National Science Foundation
More Like this


With the recent advancement of deep learning, molecular representation learning, which automates the discovery of feature representations of molecular structure, has attracted significant attention from both chemists and machine learning researchers. Deep learning can facilitate a variety of downstream applications, including bio-property prediction, chemical reaction prediction, etc. Although current representation learning algorithms based on SMILES strings or molecular graphs (via sequence modeling and graph neural networks, respectively) have achieved promising results, no existing work integrates the capabilities of both approaches in preserving molecular characteristics (e.g., atomic clusters, chemical bonds) for further improvement. In this paper, we propose GraSeq, a joint graph and sequence representation learning model for molecular property prediction. Specifically, GraSeq makes a complementary combination of graph neural networks and recurrent neural networks for modeling the two types of molecular inputs, respectively. In addition, it is trained with a multitask loss of unsupervised reconstruction and various downstream tasks, using labeled datasets of limited size. In a variety of chemical property prediction tests, we demonstrate that our GraSeq model achieves better performance than state-of-the-art approaches.

Embedding properties of network realizations of dissipative reduced order models. Jörn Zimmerling, Mikhail Zaslavsky, Rob Remis, Shasri Moskow, Alexander Mamonov, Murthy Guddati, Vladimir Druskin, and Liliana Borcea. Mathematical Sciences Department, Worcester Polytechnic Institute, https://www.wpi.edu/people/vdruskin. Abstract: Realizations of reduced order models (ROMs) of passive SISO or MIMO LTI problems can be transformed to tridiagonal and block-tridiagonal forms, respectively, via different modifications of the Lanczos algorithm. Generally, such realizations can be interpreted as ladder resistor-capacitor-inductor (RCL) networks. They gave rise to network synthesis in the first half of the 20th century, which was at the base of modern electronics design, and subsequently to model order reduction (MOR), which tremendously impacted many areas of engineering (electrical, mechanical, aerospace, etc.) by enabling efficient compression of the underlying dynamical systems. In his seminal 1950s works, Krein realized that, in addition to their compressing properties, network realizations can be used to embed the data back into the state space of the underlying continuum problems. In more recent works of the authors, Krein's ideas gave rise to so-called finite-difference Gaussian quadrature rules (FDGQR), which allow the ROM state-space representation to be approximately mapped to its full-order continuum counterpart on a judiciously chosen grid. Thus, the state variables can be accessed directly from the ...

Observational estimates of Antarctic ice loss have accelerated in recent decades, and worst-case scenarios of modeling studies have suggested potentially catastrophic sea level rise (~2 meters) by the end of the century. However, modeled contributions to global mean sea level from the Antarctic ice sheet (AIS) in the 21st century are highly uncertain, in part because ice-sheet model parameters are poorly constrained. Individual ice-sheet model runs are also deterministic and not computationally efficient enough to generate the continuous probability distributions required for incorporation into a holistic framework of probabilistic sea-level projections. To address these shortfalls, we statistically emulate an ice-sheet model using Gaussian Process (GP) regression. GP modeling is a non-parametric machine-learning technique which maps inputs (e.g., forcing or model parameters) to target outputs (e.g., sea-level contributions from the Antarctic ice sheet) and has the inherent and important advantage that emulator uncertainty is explicitly quantified. We construct emulators for the last interglacial period and an RCP8.5 scenario, and separately for the western, eastern, and total AIS. Separate emulation of western and eastern AIS is important because their evolutions and physical responses to climate forcing are distinct. The emulators are trained on 196 ensemble members for each scenario, composed by varying the parameters ...
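As an illustration of the Gaussian Process regression idea mentioned in this abstract, here is a minimal one-dimensional sketch that returns both a predictive mean and a predictive variance, which is what makes emulator uncertainty explicit. It is not the authors' emulator: the squared-exponential kernel, its hyperparameters, the `noise` jitter, the function names, and the toy sine-curve training data are all assumptions chosen for brevity.

```python
import math

def rbf(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve low @ y = b."""
    y = []
    for i in range(len(b)):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((b[i] - s) / low[i][i])
    return y

def solve_lower_transpose(low, y):
    """Back substitution: solve low.T @ x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Zero-mean GP posterior mean and variance at a single input x_new."""
    n = len(x_train)
    k = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_lower_transpose(low, solve_lower(low, y_train))
    k_star = [rbf(x_new, xi) for xi in x_train]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

# Toy data standing in for (model parameter, simulated output) pairs.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [math.sin(x) for x in x_train]
mean_near, var_near = gp_predict(x_train, y_train, 1.0)   # at a training input
mean_far, var_far = gp_predict(x_train, y_train, 10.0)    # far from all training inputs
```

In this toy setting the predictive variance is close to zero at a training input and returns to the prior variance far from the data, which shows the behaviour that lets an emulator report where its predictions are poorly constrained.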

Despite its potential to overcome the design and processing barriers of traditional subtractive and formative manufacturing techniques, the use of laser powder bed fusion (LPBF) metal additive manufacturing is currently limited due to its tendency to create flaws. A multitude of LPBF-related flaws, such as part-level deformation, cracking, and porosity, are linked to the spatiotemporal temperature distribution in the part during the process. The temperature distribution, also called the thermal history, is a function of several factors encompassing material properties, part geometry and orientation, processing parameters, and placement of supports, among others. This broad range of factors is difficult and expensive to optimize through empirical testing alone. Consequently, fast and accurate models to predict the thermal history are valuable for mitigating flaw formation in LPBF-processed parts. In our prior works, we developed a graph-theory-based approach for predicting the temperature distribution in LPBF parts. This mesh-free approach was compared with both non-proprietary and commercial finite element packages, and the thermal history predictions were experimentally validated with in situ infrared thermal imaging data. It was found that the graph-theory-derived thermal history predictions converged within 30–50% of the time of non-proprietary finite element analysis for a similar level of prediction error. However, ...