Title: TopP-S: Persistent homology-based multi-task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility
Award ID(s):
1721024
NSF-PAR ID:
10056283
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Journal of Computational Chemistry
Volume:
39
Issue:
20
ISSN:
0192-8651
Page Range / eLocation ID:
1444 to 1454
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The logarithm of the n‐octanol–water partition coefficient (logP) is frequently used as an indicator of lipophilicity in drug discovery, and lipophilicity has a substantial impact on the absorption, distribution, metabolism, excretion, and toxicity of a drug candidate. Because experimental measurement of this property is costly and time‐consuming, reliable prediction models for logP are of great importance. In this study, we developed a transfer free energy‐based logP prediction model, FElogP. FElogP is based on the simple principle that logP is determined by the free energy change of transferring a molecule from water to n‐octanol. The underlying physical method used to calculate the transfer free energy is molecular mechanics‐Poisson Boltzmann surface area (MM‐PBSA); the method is therefore named free energy‐based logP (FElogP). The superiority of the FElogP model was validated on a large set of 707 structurally diverse molecules from the ZINC database for which high‐quality measurements were available. Encouragingly, FElogP outperformed several commonly used QSPR or machine learning‐based logP models, as well as some continuum solvation model‐based methods. The root‐mean‐square error (RMSE) and Pearson correlation coefficient (R) between the predicted and measured values are 0.91 log units and 0.71, respectively, whereas the runner‐up, the logP model implemented in OpenBabel, had an RMSE of 1.13 log units and an R of 0.67. Because FElogP was not parameterized directly against experimental logP, its excellent performance is likely to extend to arbitrary organic molecules covered by the general AMBER force fields.

     
    more » « less
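The principle stated in the abstract above, that logP follows from the water-to-octanol transfer free energy, corresponds to the standard thermodynamic relation logP = -ΔG_transfer / (RT ln 10). The sketch below illustrates that relation along with the two reported metrics (RMSE and Pearson R). It is not the FElogP implementation: the function names, the kcal/mol units, the temperature of 298.15 K, and the sign convention (ΔG_transfer is the free energy of moving the solute from water into n-octanol) are all assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K); the choice of kcal/mol units is an assumption.
R_KCAL = 1.987204e-3


def logp_from_transfer_free_energy(dg_transfer_kcal, temperature_k=298.15):
    """Convert a water -> n-octanol transfer free energy to logP.

    Assumed sign convention: a negative dg_transfer_kcal means the solute
    prefers n-octanol, which gives a positive logP.
    """
    return -dg_transfer_kcal / (R_KCAL * temperature_k * math.log(10))


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

At 298.15 K, RT ln 10 is about 1.364 kcal/mol, so under these assumptions each 1.364 kcal/mol of favorable transfer free energy shifts logP by roughly one log unit.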
  2. The synthesis of a low-molecular-weight, neutral porphyrin, meso-tetra(dioxan-2-yl)porphyrin, with significant solubility in aqueous solution is described using 4 × 1 or 2 + 2-type approaches. The key intermediate, dioxan-2-carbaldehyde, is accessible in either racemic or stereo-pure form from commercially available starting materials in three steps, which also allows the preparation of chiral porphyrins.
  3. Fugacity is a fundamental thermodynamic property of gases and gas mixtures that determines their behavior and dynamics in complex systems. Fugacity can be deduced experimentally from measurements of volume as a function of pressure at constant temperature, or calculated iteratively using analytical equations of state (EOS). Experimental measurement is time-consuming, and analytical models based on EOS are computationally demanding, especially when an approximate but quick estimate is desired. In this work, machine learning (ML) is employed as a viable alternative to analytical EOSs for quick and accurate approximation of CO2 fugacity coefficients. Five different ML algorithms are used to estimate the fugacity coefficients of pure CO2 as a function of pressure (≤ 2000 bar) and temperature (≤ 1000 °C). A combination of experimental and pseudo-experimental (obtained from an analytical EOS) CO2 fugacity coefficient data is used to train, validate, and test the models. The best results were obtained with the Extreme Gradient Boosting algorithm, which showed a mean squared error of only 0.0002 on the validation data and an average deviation of only 1.3% on the test data (pure prediction). To quantify the effectiveness of the machine learning techniques, results from the best-performing model are compared with two state-of-the-art analytical models. The ML model, with significantly less computational complexity, showed accuracy similar to that of the analytical models. The estimated fugacity data are then used to compute the CO2 solubility in aqueous NaCl solutions of different concentrations, and a maximum deviation of only 3.2% from the experimental data is observed.