

Search for: All records

Award ID contains: 1955260


  1. Abstract Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives. 
    Free, publicly-accessible full text available May 1, 2026
  2. Abstract The catalytic constant (Kcat) describes the efficiency of an enzyme-catalyzed reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturating substrate into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we proposed a novel protocol for constructing Kcat prediction models that introduces an intermediate step in which substrate and protein processors are developed separately. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance the prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) in comparison with Li et al.’s model using the same dataset. Additionally, our modeling protocol enables the general-purpose Kcat model to be fine-tuned for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved the best R2 value of 0.64 for the focused model. The strong performance and extensibility of the model support its broad application in enzyme engineering and drug research & development.
  3. Abstract Intranasal diamorphine (IND), approved for managing breakthrough pain in the UK, has been identified as an acceptable alternative offering effective, expedient, and less traumatic analgesia for children. However, the current dose regimen in pediatric populations relies on clinical expertise, while the pharmacokinetic properties are poorly understood. This study aimed to develop diamorphine population pharmacokinetic (pop‐PK) models and simulate IND dosing in virtual pediatric subjects. An integrated four‐compartment pop‐PK model with first‐order absorption and elimination provided an appropriate fit to 385 publicly available concentration measurements of diamorphine, 6‐monoacetylmorphine, and morphine collected from adults. Body weight allometry and renal function maturation (age) were incorporated into the final model as two covariates. The estimated IND relative bioavailability was around 52% compared with intramuscularly injected diamorphine. Using this final model, the plasma concentrations of morphine, the active metabolite for pain relief, were simulated in virtual subjects. The utility of model extrapolation was supported by external verification with acceptable average fold errors of 1.06 ± 0.30 and 0.83 ± 0.07 for morphine maximum concentration and exposure, respectively. Meanwhile, the simulated morphine concentration–time profiles could recover the PK profiles observed in children after a single dose of IND. The model‐based dosing simulations were therefore assessed in four pediatric age groups to match the therapeutic window of morphine concentrations at steady state (10–20 μg/L). Our study demonstrates that, from a PK perspective, a dose regimen of a 0.3 mg/kg loading dose plus a 0.1 mg/kg hourly maintenance dose is generally appropriate for multiple pediatric populations with breakthrough pain.
    Free, publicly-accessible full text available March 1, 2026
  4. ABSTRACT Accurate prediction of protein–peptide complex structures plays a critical role in structure‐based drug design, including antibody design. Most peptide‐docking benchmark studies were conducted using crystal structures of protein–peptide complexes; as such, the performance of current peptide docking tools in a practical setting is unknown. Here, a practical setting means that no crystal or other experimental structure is available for the complex, the receptor, or the peptide. In this work, we have developed a practical docking protocol that incorporates two well‐known machine learning models, AlphaFold 2 for structure prediction and ANI‐2x for ab initio potential prediction, to achieve a high success rate in modeling protein–peptide complex structures. The docking protocol consists of three major stages. In the first stage, the 3D structure of the receptor is predicted by AlphaFold 2 in monomer mode, and that of the peptide is predicted by AlphaFold 2 in multimer mode. We found that it is essential to include the receptor information to generate a high‐quality 3D structure of the peptide. In the second stage, rigid protein–peptide docking is performed using the ZDOCK software. In the last stage, the top 10 docking poses are relaxed and refined by ANI‐2x in conjunction with our in‐house geometry optimization algorithm, conjugate gradient with backtracking line search (CG‐BS). We developed CG‐BS to perform geometry optimization more efficiently; it takes the potential and forces directly from the ANI‐2x machine learning models. For a set of 62 very challenging protein–peptide systems, the docking protocol achieved an encouraging overall success rate of 34% when only the top 1 docking pose was considered, rising to 45% when the top 3 docking poses were considered. It is emphasized that this protein–peptide docking performance was achieved without using any crystal or experimental structures.
  5. Abstract The logarithm of the n‐octanol–water partition coefficient (logP) is frequently used as an indicator of lipophilicity in drug discovery, a property that has substantial impacts on the absorption, distribution, metabolism, excretion, and toxicity of a drug candidate. Considering that experimental measurement of this property is costly and time‐consuming, it is of great importance to develop reliable prediction models for logP. In this study, we developed a transfer free energy‐based logP prediction model, FElogP. FElogP is based on the simple principle that logP is determined by the free energy change of transferring a molecule from water to n‐octanol. The underlying physical method used to calculate the transfer free energy is molecular mechanics‐Poisson Boltzmann surface area (MM‐PBSA); hence the method is named free energy‐based logP (FElogP). The FElogP model was validated on a large set of 707 structurally diverse molecules in the ZINC database for which the measurements were of high quality. Encouragingly, FElogP outperformed several commonly used QSPR or machine learning‐based logP models, as well as some continuum solvation model‐based methods. The root‐mean‐square error (RMSE) and Pearson correlation coefficient (R) between the predicted and measured values are 0.91 log units and 0.71, respectively, while the runner‐up, the logP model implemented in OpenBabel, had an RMSE of 1.13 log units and an R of 0.67. Given that FElogP was not parameterized against experimental logP directly, its excellent performance is likely to extend to arbitrary organic molecules covered by the general AMBER force fields.
  6. Abstract Accurate estimation of solvation free energy (SFE) lays the foundation for accurate prediction of binding free energy. The Poisson‐Boltzmann (PB) and generalized Born (GB) continuum solvation methods combined with surface area (SA) terms (PBSA and GBSA) have been widely used in SFE calculations because they achieve a good balance between accuracy and efficiency. However, the accuracy of these methods can be affected by several factors, such as the charge model, the polar and nonpolar SFE calculation methods, and the atom radii used in the calculation. In this work, we evaluated the performance of the ABCG2 (AM1‐BCC‐GAFF2) charge model, as well as two other charge models, RESP (Restrained Electrostatic Potential) and AM1‐BCC (Austin Model 1‐bond charge corrections), on the SFE prediction of 544 small molecules in water by PBSA/GBSA. To improve the performance of PBSA prediction based on the ABCG2 charges, we further explored the influence of atom radii on prediction accuracy and derived a set of atom radius parameters for more accurate SFE prediction using PBSA with ABCG2/GAFF2 by reproducing thermodynamic integration (TI) calculation results. The PB radius parameters of carbon, oxygen, sulfur, phosphorus, chloride, bromide and iodine were adjusted. New atom types (on, oi, hn1, hn2, and hn3) were introduced to further improve the fitting performance. Then, we tuned the parameters in the nonpolar SFE model using the experimental SFE data and the PB calculation results. By adopting the new radius parameters and the new nonpolar SFE model, the root mean square error (RMSE) of the SFE calculation for the 544 molecules decreased from 2.38 to 1.05 kcal/mol. Finally, the new radius parameters were applied to the prediction of protein‐ligand binding free energies using the MM‐PBSA method. For the eight systems tested, we observed higher correlation between the experimental data and the calculated results and smaller prediction errors for the absolute binding free energies, demonstrating that our new radius parameters can improve free energy calculations using the MM‐PBSA method.
  7. Structure-based virtual screening utilizes molecular docking to explore and analyze ligand–macromolecule interactions, which is crucial for identifying and developing potential drug candidates. Although several widely used docking programs are available, the accurate prediction of binding affinity and binding mode still presents challenges. In this study, we introduced a novel protocol that combines our in-house geometry optimization algorithm, the conjugate gradient with backtracking line search (CG-BS), which is capable of restraining and constraining rotatable torsional angles and other geometric parameters, with a highly accurate machine learning potential, ANI-2x, renowned for molecular energy predictions that closely resemble those of the wB97X/6-31G(d) model. By integrating this protocol with binding pose prediction using Glide, we conducted additional structural optimization and potential energy prediction on 11 small molecule–macromolecule and 12 peptide–macromolecule systems. We observed that ANI-2x/CG-BS greatly improved the docking power, not only optimizing binding poses more effectively, particularly when the RMSD of the binding pose predicted by Glide exceeded around 5 Å, but also achieving a 26% higher success rate than Glide docking in identifying native-like binding poses at the top rank. As for the scoring and ranking powers, ANI-2x/CG-BS demonstrated enhanced performance over Glide docking in predicting and ranking hundreds or thousands of ligands. For example, Pearson’s and Spearman’s correlation coefficients increased markedly from 0.24 and 0.14 with Glide docking to 0.85 and 0.69, respectively, with the addition of ANI-2x/CG-BS for optimizing and ranking small molecules binding to the bacterial ribosomal aminoacyl-tRNA receptor. These results suggest that ANI-2x/CG-BS holds considerable potential for integration into virtual screening pipelines due to its enhanced docking performance.
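
Records 4 and 7 above both rely on a conjugate gradient optimizer with backtracking line search (CG-BS). The sketch below illustrates only the general idea and is not the authors' implementation: it assumes a Polak-Ribiere+ update with an Armijo sufficient-decrease test, and it uses a toy quadratic `energy` in place of the ANI-2x potential. All function names and parameter values (`backtracking_step`, `cg_backtracking`, `shrink`, `c1`) are hypothetical choices made for illustration, and the restraint and constraint handling described in record 7 is not shown.

```python
import numpy as np


def backtracking_step(energy, x, e0, grad, direction, step=1.0, shrink=0.5, c1=1e-4):
    """Shrink the trial step until the Armijo sufficient-decrease condition holds."""
    slope = float(grad @ direction)  # directional derivative, negative for a descent direction
    while step > 1e-12:
        if energy(x + step * direction) <= e0 + c1 * step * slope:
            return step
        step *= shrink
    return 0.0  # no acceptable step found


def cg_backtracking(energy, gradient, x0, tol=1e-6, max_iter=500):
    """Nonlinear conjugate gradient (Polak-Ribiere+) with backtracking line search."""
    x = np.asarray(x0, dtype=float).copy()
    g = gradient(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if float(g @ d) >= 0.0:
            d = -g  # not a descent direction: restart from steepest descent
        step = backtracking_step(energy, x, energy(x), g, d)
        if step == 0.0:
            break
        x = x + step * d
        g_new = gradient(x)
        # Polak-Ribiere+ coefficient; clipping at zero acts as an automatic restart
        beta = max(0.0, float(g_new @ (g_new - g)) / float(g @ g))
        d = -g_new + beta * d
        g = g_new
    return x


# Toy stand-in for a molecular potential: E(x) = 0.5 x^T A x - b^T x (assumption)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])


def energy(x):
    return 0.5 * float(x @ A @ x) - float(b @ x)


def gradient(x):
    return A @ x - b


x_start = np.array([2.0, -1.0])
x_min = cg_backtracking(energy, gradient, x_start)
```

In a molecular setting, `energy` and `gradient` would instead be supplied by the potential model (with the gradient being the negative of the forces), and `x` would hold the flattened atomic coordinates. The backtracking search only ever evaluates the energy, which keeps each iteration cheap when gradient evaluations are the expensive part.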