Title: ClassicalGSG: Prediction of log P using classical molecular force fields and geometric scattering for graphs

This work examines methods for predicting the partition coefficient (logP) for a dataset of small molecules. Here, we use atomic attributes such as radius and partial charge, which are typically used as force field parameters in classical molecular dynamics simulations. These atomic attributes are transformed into index‐invariant molecular features using a recently developed method called geometric scattering for graphs (GSG). We call this approach “ClassicalGSG” and examine its performance under a broad range of conditions and hyperparameters. We train ClassicalGSG logP predictors with neural networks using 10,722 molecules from the OpenChem dataset and apply them to predict the logP values from four independent test sets. The ClassicalGSG method's performance is compared to a baseline model that employs graph convolutional networks. Our results show that the best prediction accuracies are obtained using atomic attributes generated with the CHARMM generalized force field and 2D molecular structures.
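To illustrate how per-atom attributes can become index-invariant molecular features, the sketch below follows the general geometric scattering recipe: a lazy random walk on the molecular graph, dyadic diffusion wavelets, and statistical moments summed over atoms. It is a minimal NumPy sketch, not the ClassicalGSG implementation. The function name `scattering_features`, the three-atom toy graph, the attribute values, and the restriction to zeroth- and first-order moments are all assumptions made for illustration.

```python
import numpy as np

def scattering_features(adj, attrs, num_scales=3, moments=(1, 2)):
    """Toy geometric-scattering features (zeroth and first order only).

    adj   : (n, n) symmetric 0/1 adjacency matrix, no isolated atoms
    attrs : (n, d) per-atom attributes (e.g. partial charge, radius)
    Returns a vector of length d * len(moments) * (1 + num_scales).
    """
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    # Lazy random walk P = 1/2 (I + A D^-1); dividing by deg scales columns.
    lazy_walk = 0.5 * (np.eye(n) + adj / deg)

    # Dyadic wavelets Psi_j = P^(2^(j-1)) - P^(2^j), built by repeated squaring.
    wavelets = []
    p_low = lazy_walk
    for _ in range(num_scales):
        p_high = p_low @ p_low
        wavelets.append(p_low - p_high)
        p_low = p_high

    # Zeroth order: moments of the raw attributes, summed over atoms.
    feats = [np.sum(attrs ** q, axis=0) for q in moments]
    # First order: moments of |Psi_j x|, summed over atoms.
    for psi in wavelets:
        coeffs = np.abs(psi @ attrs)
        feats.extend(np.sum(coeffs ** q, axis=0) for q in moments)
    return np.concatenate(feats)

# Hypothetical three-atom chain with made-up (charge, radius) attributes.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
attrs = np.array([[0.4, 1.0],
                  [-0.8, 1.5],
                  [0.4, 1.0]])
features = scattering_features(adj, attrs)
```

Because every feature is a sum over atoms, relabeling the atoms (permuting the rows and columns of `adj` and the rows of `attrs` consistently) leaves the feature vector unchanged. A fixed-length vector of this kind can be passed to a standard neural network regardless of atom ordering.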

Award ID(s):
1845856 1761320
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Journal of Computational Chemistry
Page Range / eLocation ID:
p. 1006-1017
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The logarithm of the n‐octanol–water partition coefficient (logP) is frequently used as an indicator of lipophilicity in drug discovery, which has substantial impacts on the absorption, distribution, metabolism, excretion, and toxicity of a drug candidate. Considering that the experimental measurement of the property is costly and time‐consuming, it is of great importance to develop reliable prediction models for logP. In this study, we developed a transfer free energy‐based logP prediction model, FElogP. FElogP is based on the simple principle that logP is determined by the free energy change of transferring a molecule from water to n‐octanol. The underlying physical method to calculate transfer free energy is the molecular mechanics‐Poisson Boltzmann surface area (MM‐PBSA) approach; thus, this method is named free energy‐based logP (FElogP). The superiority of the FElogP model was validated by a large set of 707 structurally diverse molecules in the ZINC database for which the measurement was of high quality. Encouragingly, FElogP outperformed several commonly‐used QSPR or machine learning‐based logP models, as well as some continuum solvation model‐based methods. The root‐mean‐square error (RMSE) and Pearson correlation coefficient (R) between the predicted and measured values are 0.91 log units and 0.71, respectively, while the runner‐up, the logP model implemented in OpenBabel, had an RMSE of 1.13 log units and R of 0.67. Given the fact that FElogP was not parameterized against experimental logP directly, its excellent performance is likely to be expanded to arbitrary organic molecules covered by the general AMBER force fields.
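The principle stated above corresponds to the standard thermodynamic relation logP = -ΔG(water → n-octanol) / (RT ln 10). The sketch below applies that relation and computes the two reported error metrics. It is an illustration under stated assumptions, not the FElogP code: the function names, the default temperature of 298.15 K, and the use of kcal/mol are choices made here, and the MM-PBSA step that would produce the transfer free energy is not shown.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)

def logp_from_transfer_free_energy(dg_transfer, temperature=298.15):
    """logP from the water -> n-octanol transfer free energy (kcal/mol).

    A negative dg_transfer (molecule prefers octanol) gives a positive logP.
    """
    return -dg_transfer / (GAS_CONSTANT * temperature * math.log(10))

def rmse(pred, ref):
    """Root-mean-square error between predicted and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def pearson_r(pred, ref):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(ref)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)
```

At 298.15 K, RT ln 10 is about 1.36 kcal/mol, so each 1.36 kcal/mol of favorable transfer free energy shifts logP by roughly one log unit.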

  2. This data set for the manuscript entitled "Design of Peptides that Fold and Self-Assemble on Graphite" includes all files needed to run and analyze the simulations described in this manuscript in the molecular dynamics software NAMD, as well as the output of the simulations. The files are organized into directories corresponding to the figures of the main text and supporting information. They include molecular model structure files (NAMD psf or Amber prmtop format), force field parameter files (in CHARMM format), initial atomic coordinates (pdb format), NAMD configuration files, Colvars configuration files, NAMD log files, and NAMD output including restart files (in binary NAMD format) and trajectories in dcd format (downsampled to 10 ns per frame). Analysis is controlled by shell scripts (Bash-compatible) that call VMD Tcl scripts or Python scripts. These scripts and their output are also included.

    Version: 2.0

    Changes versus version 1.0 are the addition of the free energy of folding, adsorption, and pairing calculations (Sim_Figure-7) and shifting of the figure numbers to accommodate this addition.

    Conventions Used in These Files

    Structure Files
    - graph_*.psf or sol_*.psf (original NAMD (XPLOR?) format psf file including atom details (type, charge, mass), as well as definitions of bonds, angles, dihedrals, and impropers for each dipeptide.)

    - graph_*.pdb or sol_*.pdb (initial coordinates before equilibration)
    - repart_*.psf (same as the above psf files, but the masses of non-water hydrogen atoms have been repartitioned by VMD script repartitionMass.tcl)
    - freeTop_*.pdb (same as the above pdb files, but the carbons of the lower graphene layer have been placed at a single z value and marked for restraints in NAMD)
    - amber_*.prmtop (combined topology and parameter files for Amber force field simulations)
    - repart_amber_*.prmtop (same as the above prmtop files, but the masses of non-water hydrogen atoms have been repartitioned by ParmEd)

    Force Field Parameters
    CHARMM format parameter files:
    - par_all36m_prot.prm (CHARMM36m FF for proteins)
    - par_all36_cgenff_no_nbfix.prm (CGenFF v4.4 for graphene) The NBFIX parameters are commented out since they are only needed for aromatic halogens and we use only the CG2R61 type for graphene.
    - toppar_water_ions_prot_cgenff.str (CHARMM water and ions with NBFIX parameters needed for protein and CGenFF included and others commented out)

    Template NAMD Configuration Files
    These contain the most commonly used simulation parameters. They are called by the other NAMD configuration files (which are in the namd/ subdirectory):
    - template_min.namd (minimization)
    - template_eq.namd (NPT equilibration with lower graphene fixed)
    - template_abf.namd (for adaptive biasing force)

    - namd/min_*.0.namd

    - namd/eq_*.0.namd

    Adaptive biasing force calculations
    - namd/eabfZRest7_graph_chp1404.0.namd
    - namd/eabfZRest7_graph_chp1404.1.namd (continuation of eabfZRest7_graph_chp1404.0.namd)

    Log Files
    For each NAMD configuration file given in the last two sections, there is a log file with the same prefix, which gives the text output of NAMD. For instance, the output of namd/eabfZRest7_graph_chp1404.0.namd is eabfZRest7_graph_chp1404.0.log.

    Simulation Output
    The simulation output files (which match the names of the NAMD configuration files) are in the output/ directory. Files with the extensions .coor, .vel, and .xsc are coordinates in NAMD binary format, velocities in NAMD binary format, and extended system information (including cell size) in text format. Files with the extension .dcd give the trajectory of the atomic coordinates over time (and also include system cell information). Due to storage limitations, large DCD files have been omitted or replaced with new DCD files having the prefix stride50_ that include only every 50th frame. The time between frames in these files is 50 * 50000 steps/frame * 4 fs/step = 10 ns. The system cell trajectory for the NPT runs is also included as output/eq_*.xst.
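The frame-spacing arithmetic above can be checked, and used to convert a frame index in a stride50_ file into simulation time, with a few lines of Python. The numbers come from the description above; the constant and function names are hypothetical and are not part of the data set's scripts.

```python
STRIDE = 50               # every 50th frame of the original DCD is kept
STEPS_PER_FRAME = 50_000  # MD steps between frames in the original DCD
TIMESTEP_FS = 4           # femtoseconds per MD step

def frame_spacing_ns(stride=STRIDE, steps_per_frame=STEPS_PER_FRAME,
                     timestep_fs=TIMESTEP_FS):
    """Time between consecutive frames of a strided DCD, in nanoseconds."""
    return stride * steps_per_frame * timestep_fs / 1e6  # 1 ns = 1e6 fs

def frame_time_ns(frame_index):
    """Elapsed time at a frame index, measured from frame 0 of the strided file."""
    return frame_index * frame_spacing_ns()
```

With the default values, `frame_spacing_ns()` reproduces the 10 ns spacing stated above.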

    Files with the .sh extension can be found throughout. These usually provide the highest level control for submission of simulations and analysis. Look to these as a guide to what is happening. If there are scripts with step1_*.sh and step2_*.sh, they are intended to be run in order, with step1_*.sh first.


    The directory contents are as follows. The directories Sim_Figure-1 and Sim_Figure-8 include README.txt files that describe the files and naming conventions used throughout this data set.

    Sim_Figure-1: Simulations of N-acetylated C-amidated amino acids (Ac-X-NHMe) at the graphite–water interface.

    Sim_Figure-2: Simulations of different peptide designs (including acyclic, disulfide cyclized, and N-to-C cyclized) at the graphite–water interface.

    Sim_Figure-3: MM-GBSA calculations of different peptide sequences for a folded conformation and 5 misfolded/unfolded conformations.

    Sim_Figure-4: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-5: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 295 K.

    Sim_Figure-5_replica: Temperature replica exchange molecular dynamics simulations for the peptide cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) with 20 replicas for temperatures from 295 to 454 K.

    Sim_Figure-6: Simulation of the peptide molecule cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) in free solution (no graphite).

    Sim_Figure-7: Free energy calculations for folding, adsorption, and pairing for the peptide CHP1404 (sequence: cyc(GTGSGTG-GPGG-GCGTGTG-SGPG)). For folding, we calculate the PMF as a function of RMSD by replica-exchange umbrella sampling (in the subdirectory Folding_CHP1404_Graphene/). We make the same calculation in solution, which required 3 separate replica-exchange umbrella sampling calculations (in the subdirectory Folding_CHP1404_Solution/). Both PMF of RMSD calculations for the scrambled peptide are in Folding_scram1404/. For adsorption, calculation of the PMF for the orientational restraints and the calculation of the PMF along z (the distance between the graphene sheet and the center of mass of the peptide) are in Adsorption_CHP1404/ and Adsorption_scram1404/. The actual calculation of the free energy is done by a shell script ("") in the 1_free_energy/ subsubdirectory. Processing of the PMFs must be done first in the 0_pmf/ subsubdirectory. Finally, files for free energy calculations of pair formation for CHP1404 are found in the Pair/ subdirectory.

    Sim_Figure-8: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) where the peptides are far above the graphene–water interface in the initial configuration.

    Sim_Figure-9: Two replicates of a simulation of nine peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-9_scrambled: Two replicates of a simulation of nine peptide molecules with the control sequence cyc(GGTPTTGGGGGGSGGPSGTGGC) at the graphite–water interface at 370 K.

    Sim_Figure-10: Adaptive biasing for calculation of the free energy of the folded peptide as a function of the angle between its long axis and the zigzag directions of the underlying graphene sheet.


    This material is based upon work supported by the US National Science Foundation under grant no. DMR-1945589. A majority of the computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CHE-1726332, CNS-1006860, EPS-1006860, and EPS-0919443. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, through allocation BIO200030. 
  3. Abstract

    A next‐generation protocol (Poltype 2) has been developed which automatically generates AMOEBA polarizable force field parameters for small molecules. Both features and computational efficiency have been drastically improved. Notable advances include improved database transferability using SMILES, robust torsion fitting, non‐aromatic ring torsion parameterization, coupled torsion‐torsion parameterization, van der Waals parameter refinement using ab initio dimer data and an intelligent fragmentation scheme that produces parameters with dramatically reduced ab initio computational cost. Additional improvements include better local frame assignment for atomic multipoles, automated formal charge assignment, zwitterion detection, smart memory resource defaults, parallelized fragment job submission, incorporation of the Psi4 quantum package, ab initio error handling, ionization state enumeration, hydration free energy prediction and binding free energy prediction. For validation, we have applied Poltype 2 to ~1000 FDA approved drug molecules from DrugBank. The ab initio molecular dipole moments and electrostatic potential values were compared with Poltype 2 derived AMOEBA counterparts. Parameters were further substantiated by calculating hydration free energy (HFE) on 40 small organic molecules and were compared with experimental data, resulting in an RMSE of 0.59 kcal/mol. The torsion database has expanded to include 3543 fragments derived from FDA approved drugs. Poltype 2 provides a convenient utility for applications including binding free energy prediction for computational drug discovery. Further improvement will focus on automated parameter refinement by experimental liquid properties, expansion of the van der Waals parameter database and automated parametrization of modified bio‐fragments such as amino and nucleic acids.

  4. Abstract

    The chemical stability and the low‐lying singlet and triplet excited states of BN‐n‐acenes (n = 1–7) were studied using single reference and multireference methodologies. From the calculations, descriptors such as the singlet‐triplet splitting, the natural orbital (NO) occupations and aromaticity indexes are used to provide structural and energetic analysis. The boron and nitrogen atoms form an isoelectronic pair of two carbon atoms, which was used for the complete substitution of these units in the acene series. The structural analysis confirms the effects originating from the insertion of a uniform pattern of electronegativity difference within the molecular systems. The covalent bonds tend to be strongly polarized, which does not happen in the case of a carbon‐only framework. This effect leads to a charge transfer between neighboring atoms resulting in a more strengthened structure, keeping the aromaticity roughly constant along the chain. The singlet‐triplet splitting also agrees with this stability trend, maintaining a consistent gap value for all molecules. The BN‐n‐acenes molecules possess a ground state with monoconfigurational character indicating their electronic stability. The low‐lying singlet excited states have charge transfer character, which proceeds from nitrogen to boron.

  5. Abstract

    Bite force is a performance metric commonly used to link cranial morphology with dietary ecology, as the strength of forces produced by the feeding apparatus largely constrains the foods an individual can consume. At a macroevolutionary scale, there is evidence that evolutionary changes in the anatomical elements involved in producing bite force have contributed to dietary diversification in mammals. Much less is known about how these elements change over postnatal ontogeny. Mammalian diets drastically shift over ontogeny, from drinking mother's milk to feeding on adult foods, presumably with equally drastic changes in the morphology of the feeding apparatus and bite performance. Here, we investigate ontogenetic morphological changes in the insectivorous big brown bat (Eptesicus fuscus), which has an extreme, positive allometric increase in bite force during development. Using contrast‐enhanced micro‐computed tomography scans of a developmental series from birth to adult morphology, we quantified skull shape and measured skeletal and muscular parameters directly related to bite force production. We found pronounced changes in the skull over ontogeny, including a large increase in the volume of the temporalis and masseter muscles, and an expansion of the skull dome and sagittal crest that would serve to increase the temporalis attachment area. These changes indicate that development of the jaw adductors plays an important role in the development of biting performance of these bats. Notably, static bite force increases with positive allometry with respect to all anatomical measures examined, suggesting that modifications in biting dynamics and/or improved motor coordination also contribute to increases in biting performance.
