NSF PAR Search | NSF Public Access Repository

Advancing promiscuous aggregating inhibitor analysis with intelligent machine learning classification

https://doi.org/10.1093/bib/bbaf205

Wang, Luxuan; Ji, Beihong; Zhai, Jingchen; Wang, Junmei (May 2025, Briefings in Bioinformatics)

Abstract Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives.
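The abstract reports accuracy, sensitivity, and specificity for a binary aggregator/nonaggregator classifier. As a minimal sketch of how these three metrics follow from predicted and true labels, the snippet below uses hypothetical toy labels (1 = aggregator, 0 = nonaggregator); the function name and data are assumptions for illustration, not part of the published workflow.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = aggregator)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # aggregators correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonaggregators correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonaggregators wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # aggregators missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels, one aggregator is missed and one nonaggregator is wrongly flagged, so all three metrics come out to 0.75.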
Full Text Available
AI‐Assisted Protein–Peptide Complex Prediction in a Practical Setting

https://doi.org/10.1002/jcc.70137

Wang, Darren Y; Wang, Luxuan; Mi, Andrew; Wang, Junmei (May 2025, Journal of Computational Chemistry)

ABSTRACT Accurate prediction of protein–peptide complex structures plays a critical role in structure‐based drug design, including antibody design. Most peptide‐docking benchmark studies were conducted using crystal structures of protein–peptide complexes; as such, the performance of the current peptide docking tools in the practical setting is unknown. Here, the practical setting implies there are no crystal or other experimental structures for the complex, nor for the receptor and peptide. In this work, we have developed a practical docking protocol that incorporates two well‐known machine learning models, AlphaFold 2 for structural prediction and ANI‐2x for ab initio potential prediction, to achieve a high success rate in modeling protein–peptide complex structures. The docking protocol consists of three major stages. In the first stage, the 3D structure of the receptor is predicted by AlphaFold 2 using the monomer mode, and that of the peptide is predicted by AlphaFold 2 using the multimer mode. We found that it is essential to include the receptor information to generate a high‐quality 3D structure of the peptide. In the second stage, rigid protein–peptide docking is performed using ZDOCK software. In the last stage, the top 10 docking poses are relaxed and refined by ANI‐2x in conjunction with our in‐house geometry optimization algorithm, conjugate gradient with backtracking line search (CG‐BS). We developed CG‐BS to perform geometry optimization more efficiently; it takes the potential and forces directly from the ANI‐2x machine learning models. On a set of 62 very challenging protein–peptide systems, the docking protocol achieved an encouraging overall success rate of 34% when only the top 1 docking pose was considered. This success rate increased to 45% when the top 3 docking poses were considered. We emphasize that this protein–peptide docking performance was achieved without using any crystal or experimental structures.
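The refinement stage pairs a conjugate gradient optimizer with a backtracking line search. The sketch below illustrates that general technique only and is not the authors' CG‐BS code: a toy quadratic function stands in for the ANI‐2x potential, and the Polak-Ribière update, the Armijo sufficient-decrease test, and all parameter values are assumptions chosen for illustration.

```python
import math


def energy_and_gradient(x):
    """Toy stand-in for a potential: E = (x0 - 1)^2 + 10 * (x1 + 2)^2."""
    e = (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
    g = [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
    return e, g


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def backtracking_step(x, e, g, d, step=1.0, shrink=0.5, c1=1e-4):
    """Halve the step until the Armijo sufficient-decrease condition holds."""
    slope = dot(g, d)  # directional derivative along d (negative for descent)
    while True:
        x_new = [xi + step * di for xi, di in zip(x, d)]
        e_new, g_new = energy_and_gradient(x_new)
        if e_new <= e + c1 * step * slope:
            return x_new, e_new, g_new
        step *= shrink


def minimize_cg(x0, tol=1e-6, max_iter=1000):
    """Nonlinear conjugate gradient (Polak-Ribiere+) with backtracking line search."""
    x = list(x0)
    e, g = energy_and_gradient(x)
    d = [-gi for gi in g]  # start along steepest descent
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        if dot(g, d) >= 0.0:  # not a descent direction: restart
            d = [-gi for gi in g]
        x, e, g_new = backtracking_step(x, e, g, d)
        diff = [gn - go for gn, go in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))  # Polak-Ribiere+, clipped at 0
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, e


x_min, e_min = minimize_cg([0.0, 0.0])
print(x_min, e_min)
```

In a real refinement, `energy_and_gradient` would be replaced by a call that returns the energy and forces of the docked complex from the machine learning potential.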
Full Text Available
NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling

https://doi.org/10.1093/bib/bbaf212

Zhai, Jingchen; Qi, Xiguang; Cai, Lianjin; Liu, Yue; Tang, Haocheng; Xie, Lei; Wang, Junmei (May 2025, Briefings in Bioinformatics)

Abstract The catalytic constant (Kcat) describes the efficiency of an enzyme-catalyzed reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturating substrate into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most of the existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we proposed a novel protocol to construct Kcat prediction models, introducing an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance the prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R² value of 0.54 versus 0.50) in comparison with Li et al.’s model using the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved the best R² value of 0.64 for the focused model. The high-quality performance and expandability of the model guarantee its broad applications in enzyme engineering and drug research & development.
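To make the meaning of Kcat concrete: under the standard Michaelis-Menten rate law, the reaction rate approaches Kcat times the enzyme concentration as the substrate becomes saturating. The sketch below shows this limit numerically. The rate law is textbook enzyme kinetics rather than anything specific to NNKcat, and the parameter values are hypothetical.

```python
def reaction_rate(kcat, enzyme_conc, substrate_conc, km):
    """Michaelis-Menten rate: v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


# Hypothetical values: kcat in 1/s, concentrations and Km in mol/L.
kcat, enzyme, km = 100.0, 1e-6, 1e-4
v_max = kcat * enzyme  # limiting rate at saturating substrate

half_rate = reaction_rate(kcat, enzyme, km, km)        # [S] = Km
saturated_rate = reaction_rate(kcat, enzyme, 1.0, km)  # [S] >> Km

print(half_rate / v_max, saturated_rate / v_max)
```

At a substrate concentration equal to Km the rate is half of the limiting rate, and at a substrate concentration far above Km it approaches the limit set by Kcat.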
Full Text Available
Intranasal diamorphine population pharmacokinetics modeling and simulation in pediatric breakthrough pain

https://doi.org/10.1002/psp4.13186

Cai, Lianjin; Zhai, Jingchen; Ji, Beihong; Han, Fengyang; Niu, Taoyu; Wang, Luxuan; Wang, Junmei (March 2025, CPT: Pharmacometrics & Systems Pharmacology)

Abstract Intranasal diamorphine (IND), approved for managing breakthrough pain in the UK, has been identified as an acceptable alternative offering effective, expedient, and less traumatic analgesia for children. However, the current dose regimen in pediatric populations relies on clinical expertise while the pharmacokinetic properties are poorly understood. This study aimed to develop diamorphine population pharmacokinetics (pop‐PK) models and simulate the IND dosing in virtual pediatric subjects. An integrated four‐compartment pop‐PK model with first‐order absorption and elimination provided an appropriate fit and characterized 385 publicly available concentration measurements of diamorphine, 6‐monoacetylmorphine, and morphine collected from adults. Body weight allometry and renal function maturation (age) were incorporated into the final model, serving as two covariates. The estimated IND relative bioavailability was around 52% compared with intramuscularly injected diamorphine. Using this final model, the plasma concentrations of morphine, the active metabolite for pain relief, were simulated in virtual subjects. The utility of model extrapolation was supported by external verification with acceptable average fold errors of 1.06 ± 0.30 and 0.83 ± 0.07 for morphine maximum concentration and exposures. Meanwhile, the simulated morphine concentration–time profiles could recover the PK profiles observed in children after a single dose of IND. The model‐based dosing simulations were therefore assessed in four pediatric age groups to match the therapeutic window of morphine concentrations at steady state (10–20 μg/L). Our study demonstrates that the dose regimen of 0.3 mg/kg loading dose plus 0.1 mg/kg hourly maintenance dose is generally appropriate for multiple pediatric populations with breakthrough pain, from a PK perspective.
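The weight-based regimen in the abstract (0.3 mg/kg loading dose plus 0.1 mg/kg hourly maintenance dose) can be written out as simple arithmetic. The sketch below only illustrates that calculation; the function name, the 20 kg body weight, and the 3-hour window are hypothetical, and it does not implement the pop‐PK model.

```python
def ind_dose_schedule(weight_kg, hours, loading_mg_per_kg=0.3, maintenance_mg_per_kg=0.1):
    """Return (time_h, dose_mg) pairs: loading dose at t = 0, then hourly maintenance."""
    schedule = [(0, loading_mg_per_kg * weight_kg)]
    for t in range(1, hours + 1):
        schedule.append((t, maintenance_mg_per_kg * weight_kg))
    return schedule


# Hypothetical subject: 20 kg body weight, followed for 3 hours.
schedule = ind_dose_schedule(weight_kg=20.0, hours=3)
total_mg = sum(dose for _, dose in schedule)
for time_h, dose_mg in schedule:
    print(f"t = {time_h} h: {dose_mg:.1f} mg")
```

For this hypothetical 20 kg subject the loading dose is 6 mg, each hourly maintenance dose is 2 mg, and the cumulative amount over the 3-hour window is 12 mg.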
Full Text Available
Accurate Free Energy Calculation via Multiscale Simulations Driven by Hybrid Machine Learning and Molecular Mechanics Potentials

https://doi.org/10.1021/acs.jctc.5c00598

Wang, Xujian; Wu, Xiongwu; Brooks, Bernard R; Wang, Junmei (July 2025, Journal of Chemical Theory and Computation)

Full Text Available
ABCG2: A Milestone Charge Model for Accurate Solvation Free Energy Calculation

https://doi.org/10.1021/acs.jctc.5c00038

He, Xibing; Man, Viet H; Yang, Wei; Lee, Tai-Sung; Wang, Junmei (March 2025, Journal of Chemical Theory and Computation)

Full Text Available
Effects of All-Atom and Coarse-Grained Molecular Mechanics Force Fields on Amyloid Peptide Assembly: The Case of a Tau K18 Monomer

https://doi.org/10.1021/acs.jcim.4c01448

He, Xibing; Man, Viet Hoang; Gao, Jie; Wang, Junmei (December 2024, Journal of Chemical Information and Modeling)

Full Text Available
Geometry Optimization Algorithms in Conjunction with the Machine Learning Potential ANI-2x Facilitate the Structure-Based Virtual Screening and Binding Mode Prediction

https://doi.org/10.3390/biom14060648

Wang, Luxuan; He, Xibing; Ji, Beihong; Han, Fengyang; Niu, Taoyu; Cai, Lianjin; Zhai, Jingchen; Hao, Dongxiao; Wang, Junmei (June 2024, Biomolecules)

Structure-based virtual screening utilizes molecular docking to explore and analyze ligand–macromolecule interactions, crucial for identifying and developing potential drug candidates. Although several widely used docking programs are available, the accurate prediction of binding affinity and binding mode still presents challenges. In this study, we introduced a novel protocol that combines our in-house geometry optimization algorithm, the conjugate gradient with backtracking line search (CG-BS), which is capable of restraining and constraining rotatable torsional angles and other geometric parameters, with a highly accurate machine learning potential, ANI-2x, renowned for its precise molecular energy predictions resembling the wB97X/6-31G(d) model. By integrating this protocol with binding pose prediction using Glide, we conducted additional structural optimization and potential energy prediction on 11 small molecule–macromolecule and 12 peptide–macromolecule systems. We observed that ANI-2x/CG-BS greatly improved the docking power, not only optimizing binding poses more effectively, particularly when the RMSD of the predicted binding pose by Glide exceeded around 5 Å, but also achieving a 26% higher success rate in identifying those native-like binding poses at the top rank compared to Glide docking. As for the scoring and ranking powers, ANI-2x/CG-BS demonstrated an enhanced performance in predicting and ranking hundreds or thousands of ligands over Glide docking. For example, Pearson’s and Spearman’s correlation coefficients remarkably increased from 0.24 and 0.14 with Glide docking to 0.85 and 0.69, respectively, with the addition of ANI-2x/CG-BS for optimizing and ranking small molecules binding to the bacterial ribosomal aminoacyl-tRNA receptor. These results suggest that ANI-2x/CG-BS holds considerable potential for being integrated into virtual screening pipelines due to its enhanced docking performance.
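Scoring power and ranking power are summarized in the abstract by Pearson's and Spearman's correlation coefficients. As a minimal sketch of how the two differ, the snippet below computes both in plain Python on hypothetical predicted and reference values; the data are invented for illustration and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted scores and reference values for five ligands.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(predicted, reference)
r_spearman = spearman(predicted, reference)
print(r_pearson, r_spearman)
```

Because the toy reference values increase monotonically but not linearly with the predictions, the rank-based Spearman coefficient is 1 while the Pearson coefficient is slightly lower.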
Full Text Available
Editorial: In silico gating mechanism studies and modulator discovery for MscL

https://doi.org/10.3389/fchem.2024.1376617

Wang, Junmei; Blount, Paul; Hou, Tingjun; Sokabe, Masahiro (February 2024, Frontiers in Chemistry)

Full Text Available
In Silico Screening of Natural Flavonoids against 3-Chymotrypsin-like Protease of SARS-CoV-2 Using Machine Learning and Molecular Modeling

https://doi.org/10.3390/molecules28248034

Cai, Lianjin; Han, Fengyang; Ji, Beihong; He, Xibing; Wang, Luxuan; Niu, Taoyu; Zhai, Jingchen; Wang, Junmei (December 2023, Molecules)

The “Long-COVID syndrome” has posed significant challenges due to a lack of validated therapeutic options. We developed a novel multi-step virtual screening strategy to reliably identify inhibitors against 3-chymotrypsin-like protease of SARS-CoV-2 from abundant flavonoids, which represent a promising source of antiviral and immune-boosting nutrients. We identified 57 interacting residues as contributors to the protein-ligand binding pocket. Their energy interaction profiles constituted the input features for Machine Learning (ML) models. The consensus of 25 classifiers trained using various ML algorithms attained 93.9% accuracy and a 6.4% false-positive rate. The consensus of 10 regression models for binding energy prediction also achieved a low root-mean-square error of 1.18 kcal/mol. We first identified 120 flavonoid hits and retained 50 drug-like hits after predefined ADMET filtering to ensure bioavailability and safety profiles. Furthermore, molecular dynamics simulations prioritized nine bioactive flavonoids as promising anti-SARS-CoV-2 agents exhibiting both high structural stability (root-mean-square deviation < 5 Å for 218 ns) and low MM/PBSA binding free energy (<−6 kcal/mol). Among them, KB-2 (PubChem-CID, 14630497) and 9-O-Methylglyceofuran (PubChem-CID, 44257401) displayed excellent binding affinity and desirable pharmacokinetic capabilities. These compounds have great potential to serve as oral nutraceuticals with therapeutic and prophylactic properties as care strategies for patients with long-COVID syndrome.
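The abstract reports results for a consensus of 25 classifiers but does not say how the consensus is formed. Assuming a simple majority vote, which is one common choice and not necessarily the authors' scheme, the sketch below combines per-model predictions and computes the false-positive rate of the consensus. The three models and five compounds are hypothetical.

```python
def consensus_vote(predictions):
    """Majority vote per compound across classifiers (1 = predicted inhibitor)."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def false_positive_rate(y_true, y_pred):
    """Fraction of true negatives that were predicted positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)


# Hypothetical predictions from three classifiers for five compounds.
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]
y_true = [1, 0, 0, 1, 0]

consensus = consensus_vote([model_a, model_b, model_c])
fpr = false_positive_rate(y_true, consensus)
print(consensus, fpr)
```

In this toy case the consensus labels are `[1, 0, 1, 1, 0]`, so one of the three true negatives is misclassified and the false-positive rate is one third.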
Full Text Available
