Title: ClassicalGSG: Prediction of log P using classical molecular force fields and geometric scattering for graphs

This work examines methods for predicting the partition coefficient (logP) for a dataset of small molecules. Here, we use atomic attributes such as radius and partial charge, which are typically used as force field parameters in classical molecular dynamics simulations. These atomic attributes are transformed into index‐invariant molecular features using a recently developed method called geometric scattering for graphs (GSG). We call this approach “ClassicalGSG” and examine its performance under a broad range of conditions and hyperparameters. We train ClassicalGSG logP predictors with neural networks using 10,722 molecules from the OpenChem dataset and apply them to predict the logP values from four independent test sets. The ClassicalGSG method's performance is compared to a baseline model that employs graph convolutional networks. Our results show that the best prediction accuracies are obtained using atomic attributes generated with the CHARMM generalized force field and 2D molecular structures.
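The transformation from per-atom attributes to index-invariant molecular features can be illustrated with a minimal sketch of graph scattering moments. This is not the authors' implementation: it assumes a lazy random walk on the bond graph, dyadic diffusion wavelets, and unnormalized moments of order q = 1, 2, 3. The function names (`lazy_walk`, `wavelets`, `scattering_features`) and the example attribute values are hypothetical.

```python
import numpy as np

def lazy_walk(adj):
    """Lazy random walk P = (I + A D^-1) / 2; assumes no isolated atoms."""
    deg = adj.sum(axis=0)
    return 0.5 * (np.eye(len(adj)) + adj / deg)  # column j is divided by deg[j]

def wavelets(P, J):
    """Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j) for j = 1..J."""
    powers = [P]
    for _ in range(J):
        powers.append(powers[-1] @ powers[-1])  # powers[k] = P^(2^k)
    psi = [np.eye(len(P)) - P]
    psi += [powers[j - 1] - powers[j] for j in range(1, J + 1)]
    return psi

def scattering_features(adj, x, J=2, Q=(1, 2, 3)):
    """Sum-pooled zeroth-, first- and second-order scattering moments.

    adj: (n_atoms, n_atoms) symmetric 0/1 bond matrix
    x:   (n_atoms, n_attrs) atomic attributes (e.g. radius, partial charge)
    """
    psi = wavelets(lazy_walk(adj), J)
    feats = [np.sum(x ** q, axis=0) for q in Q]           # zeroth order
    first = [np.abs(w @ x) for w in psi]                  # |Psi_j x|
    for u in first:
        feats += [np.sum(u ** q, axis=0) for q in Q]      # first order
    for j, u in enumerate(first):
        for w in psi[j + 1:]:                             # pairs with j < j'
            v = np.abs(w @ u)                             # |Psi_j' |Psi_j x||
            feats += [np.sum(v ** q, axis=0) for q in Q]  # second order
    return np.concatenate(feats)

# Toy example: the three heavy atoms of ethanol (C-C-O) as a path graph.
# The attribute values are made up for illustration, not force field output.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.9, -0.27],
              [1.9,  0.05],
              [1.7, -0.65]])
feats = scattering_features(adj, x)

# Relabeling the atoms leaves the feature vector unchanged.
perm = [2, 0, 1]
feats_perm = scattering_features(adj[perm][:, perm], x[perm])
```

Because every moment is a sum over atoms, the feature vector has a fixed length regardless of molecule size and does not depend on atom ordering, which is what allows it to feed a standard neural network regressor.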

Award ID(s):
1845856 1761320
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Journal of Computational Chemistry
Page Range / eLocation ID:
p. 1006-1017
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The logarithm of the n‐octanol–water partition coefficient (logP) is frequently used as an indicator of lipophilicity in drug discovery, which has substantial impacts on the absorption, distribution, metabolism, excretion, and toxicity of a drug candidate. Considering that the experimental measurement of the property is costly and time‐consuming, it is of great importance to develop reliable prediction models for logP. In this study, we developed a transfer free energy‐based logP prediction model, FElogP. FElogP is based on the simple principle that logP is determined by the free energy change of transferring a molecule from water to n‐octanol. The underlying physical method to calculate the transfer free energy is the molecular mechanics‐Poisson Boltzmann surface area (MM‐PBSA) method, hence the name free energy‐based logP (FElogP). The superiority of the FElogP model was validated on a large set of 707 structurally diverse molecules in the ZINC database for which the measurement was of high quality. Encouragingly, FElogP outperformed several commonly used QSPR or machine learning‐based logP models, as well as some continuum solvation model‐based methods. The root‐mean‐square error (RMSE) and Pearson correlation coefficient (R) between the predicted and measured values are 0.91 log units and 0.71, respectively, while the runner‐up, the logP model implemented in OpenBabel, had an RMSE of 1.13 log units and R of 0.67. Given that FElogP was not parameterized against experimental logP directly, its excellent performance is likely to extend to arbitrary organic molecules covered by the general AMBER force fields.

  2. Abstract

    The chemical stability and the low‐lying singlet and triplet excited states of BN‐n‐acenes (n = 1–7) were studied using single reference and multireference methodologies. From the calculations, descriptors such as the singlet‐triplet splitting, the natural orbital (NO) occupations and aromaticity indexes are used to provide structural and energetic analysis. The boron and nitrogen atoms form an isoelectronic pair of two carbon atoms, which was used for the complete substitution of these units in the acene series. The structural analysis confirms the effects originating from the insertion of a uniform pattern of electronegativity difference within the molecular systems. The covalent bonds tend to be strongly polarized, which does not happen in the case of a carbon‐only framework. This effect leads to a charge transfer between neighboring atoms, resulting in a strengthened structure and keeping the aromaticity roughly constant along the chain. The singlet‐triplet splitting also agrees with this stability trend, maintaining a consistent gap value for all molecules. The BN‐n‐acene molecules possess a ground state with monoconfigurational character, indicating their electronic stability. The low‐lying singlet excited states have charge transfer character, which proceeds from nitrogen to boron.

  3. Abstract

    A next‐generation protocol (Poltype 2) has been developed which automatically generates AMOEBA polarizable force field parameters for small molecules. Both features and computational efficiency have been drastically improved. Notable advances include improved database transferability using SMILES, robust torsion fitting, non‐aromatic ring torsion parameterization, coupled torsion‐torsion parameterization, van der Waals parameter refinement using ab initio dimer data and an intelligent fragmentation scheme that produces parameters with dramatically reduced ab initio computational cost. Additional improvements include better local frame assignment for atomic multipoles, automated formal charge assignment, zwitterion detection, smart memory resource defaults, parallelized fragment job submission, incorporation of the Psi4 quantum package, ab initio error handling, ionization state enumeration, hydration free energy prediction and binding free energy prediction. For validation, we have applied Poltype 2 to ~1000 FDA‐approved drug molecules from DrugBank. The ab initio molecular dipole moments and electrostatic potential values were compared with their Poltype 2‐derived AMOEBA counterparts. Parameters were further substantiated by calculating the hydration free energy (HFE) of 40 small organic molecules and comparing with experimental data, resulting in an RMSE of 0.59 kcal/mol. The torsion database has expanded to include 3543 fragments derived from FDA‐approved drugs. Poltype 2 provides a convenient utility for applications including binding free energy prediction for computational drug discovery. Further improvement will focus on automated parameter refinement by experimental liquid properties, expansion of the van der Waals parameter database and automated parametrization of modified bio‐fragments such as amino and nucleic acids.

  4. Abstract

    Identification of the molecular networks that facilitated the evolution of multicellular animals from their unicellular ancestors is a fundamental problem in evolutionary cellular biology. Choanoflagellates are recognized as the closest extant nonmetazoan ancestors to animals. These unicellular eukaryotes can adopt a multicellular‐like “rosette” state. Therefore, they are compelling models for the study of early multicellularity. Comparative studies revealed that a number of putative human orthologs are present in choanoflagellate genomes, suggesting that a subset of these genes were necessary for the emergence of multicellularity. However, previous work is largely based on sequence alignments alone, which do not confirm structural or functional similarity. Here, we focus on the PDZ domain, a peptide‐binding domain which plays critical roles in myriad cellular signaling networks and which underwent a gene family expansion in metazoan lineages. Using a customized sequence similarity search algorithm, we identified 178 PDZ domains in the Monosiga brevicollis proteome. This includes 11 previously unidentified sequences, which we analyzed using Rosetta and homology modeling. To assess conservation of protein structure, we solved high‐resolution crystal structures of representative M. brevicollis PDZ domains that are homologous to human Dlg1 PDZ2, Dlg1 PDZ3, GIPC, and SHANK1 PDZ domains. To assess functional conservation, we calculated binding affinities for mbGIPC, mbSHANK1, mbSNX27, and mbDLG‐3 PDZ domains from M. brevicollis. Overall, we find that peptide selectivity is generally conserved between these two disparate organisms, with one possible exception, mbDLG‐3. Our results provide novel insight into signaling pathways in a choanoflagellate model of primitive multicellularity.

  5. Abstract

    The use of direct CH arylation cross‐coupling polymerization was evaluated for the synthesis of donor–acceptor conjugated co‐polymers using the novel donor 1,6‐didecylnaphtho[1,2‐b:5,6‐b']difuran and either thieno[3,4‐c]pyrrole‐4,6‐dione (TPD) or 1,4‐diketopyrrolo[3,4‐c]pyrrole (DPP) as the acceptor. Thiophene and furan moieties were used to flank the DPP group and the impact of these heterocycles on the polymers' properties was evaluated. The alkyl chains on the diketopyrrolopyrrole monomers were varied to engineer the solubility and morphology of the materials. All of the polymers have similar optoelectronic properties with narrow optical band gaps around 1.3 eV, which is ideal for solar energy harvesting. Unfortunately, these polymers also had high‐lying highest occupied molecular orbital levels of −4.8 to −5.1, and as a result bulk‐heterojunction photovoltaic cells fabricated using the soluble fullerene derivative PC71BM as the electron‐acceptor and these polymers as donor materials exhibited poor performance due to limited Vocvalues. An examination of the films from these blends indicates that film‐thickness and morphology were also a major hindrance to performance and a potential point of improvement for future materials.
