Molecular docking is a computational technique used to predict ligand binding potential, conformation, and location for a given receptor, and it is regarded as an attractive method in drug design because of its relatively low computational and monetary cost. However, molecular docking programs tend not to be accessible to novice users. To increase general access to molecular docking, basil_dock uses a series of easy-to-use Jupyter notebooks that do not assume familiarity with molecular docking procedures and concepts and that require little command-line usage or software installation. The notebooks, organized around the different steps of the molecular docking process, emphasize user customization and flexibility while teaching users the principles behind molecular docking. The first version of basil_dock allows users to choose receptors deposited in the Protein Data Bank and to add ligands as desired. Users can then select between the Vina and Smina docking engines and change ligand functional groups to see how substituting atom groups affects binding affinity and ligand conformation. Machine learning algorithms can then be used to identify receptor residues and ligand atom groups that are likely integral to forming the ligand-protein complex, and to discern which ligands are likely to be orally bioactive based on Lipinski's Rule of Five.
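The Rule of Five check named above can be illustrated with a plain rule-based filter. The sketch below is a minimal, standard-library-only example and is not basil_dock's own method; the class `LigandDescriptors`, the function names, the descriptor values, and the common convention of allowing at most one violation are all assumptions for illustration. In practice the four descriptors could be computed with RDKit's `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors`, and `Lipinski.NumHAcceptors`.

```python
from dataclasses import dataclass


@dataclass
class LigandDescriptors:
    """Precomputed descriptors for one ligand (hypothetical container)."""
    name: str
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient (e.g. Crippen logP)
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: LigandDescriptors) -> int:
    """Count how many of Lipinski's four Rule of Five criteria are violated."""
    checks = [
        d.mol_weight > 500,   # MW should not exceed 500 Da
        d.logp > 5,           # logP should not exceed 5
        d.h_donors > 5,       # at most 5 H-bond donors
        d.h_acceptors > 10,   # at most 10 H-bond acceptors
    ]
    return sum(checks)


def likely_orally_active(d: LigandDescriptors, max_violations: int = 1) -> bool:
    """Assumed convention: a ligand passes if it has at most one violation."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    ligands = [
        LigandDescriptors("ligand_A", 350.4, 2.1, 2, 5),
        LigandDescriptors("ligand_B", 720.0, 6.2, 4, 12),
    ]
    for lig in ligands:
        print(lig.name, lipinski_violations(lig), likely_orally_active(lig))
```

A rule-based filter like this is transparent and cheap, which makes it a useful baseline to compare against any learned classifier.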
                    This content will become publicly available on April 13, 2026
                            
                            Molecular docking with Python in Jupyter Notebooks: Towards the development of accessible docking procedures
                        
                    
    
Molecular docking is a computational technique used to predict ligand binding potential, conformation, and location for a given receptor, and it is regarded as an attractive method in drug design because of its relatively low computational and monetary cost. However, molecular docking programs tend not to be accessible to novice users. Most docking programs require at least a basic knowledge of the command line and computer programming to install and configure. Additionally, tutorials for the most commonly used programs tend to be inflexible: they require a specific molecule or set of molecules to be docked to a specific receptor, and they rely on other programs or websites to download and prepare structures. To increase general access to molecular docking, basil_dock uses a series of easy-to-use Jupyter notebooks that do not assume user familiarity with molecular docking procedures and concepts and that require little command-line usage or software installation. The series includes four notebooks that reflect the steps of the molecular docking process: (1) preparing ligand and protein files prior to docking, (2) docking ligands to a protein receptor, (3) analyzing the resulting data and determining how different functional groups in the ligand can affect protein-ligand binding, and (4) identifying locations within the ligand and protein that are essential for binding. The notebooks give novice users flexibility and customization in exploring docking procedures and systems, and they teach the principles behind molecular docking without requiring users to leave the environment to obtain information and materials from other applications. The first version of basil_dock allows users to choose receptors deposited in the Protein Data Bank and to add ligands as desired.
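As an illustration of step (3), the sketch below compares how functional-group substitutions change the best predicted affinity. It assumes each result is an AutoDock Vina output PDBQT, in which Vina writes one `REMARK VINA RESULT:` line per pose giving the affinity in kcal/mol followed by two RMSD bounds. The function names and the variant label `OH_to_NH2` are hypothetical, and this is not basil_dock's own analysis code. Smina's SDF output reports scores in a `minimizedAffinity` data field instead, so it would need a different parser.

```python
def vina_affinities(pdbqt_text: str) -> list[float]:
    """Return the predicted affinity (kcal/mol) of every pose in a Vina output PDBQT."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Tokens: REMARK, VINA, RESULT:, affinity, RMSD lower bound, RMSD upper bound.
            affinities.append(float(line.split()[3]))
    return affinities


def substitution_effects(parent_pdbqt: str, variants: dict[str, str]) -> dict[str, float]:
    """Change in best-pose affinity (variant minus parent) for each substituted ligand.

    More negative values mean the substitution is predicted to bind more strongly.
    """
    best_parent = min(vina_affinities(parent_pdbqt))
    return {
        name: round(min(vina_affinities(text)) - best_parent, 2)
        for name, text in variants.items()
    }


if __name__ == "__main__":
    # Hypothetical two-pose parent result and one-pose variant result.
    parent = (
        "MODEL 1\nREMARK VINA RESULT:    -7.2      0.000      0.000\nENDMDL\n"
        "MODEL 2\nREMARK VINA RESULT:    -6.8      1.912      3.004\nENDMDL\n"
    )
    variant = "MODEL 1\nREMARK VINA RESULT:    -8.0      0.000      0.000\nENDMDL\n"
    print(substitution_effects(parent, {"OH_to_NH2": variant}))
```

Comparing only the best pose keeps the comparison simple; averaging over the top few poses is a reasonable alternative when scores are close.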
Users can then select between the Vina and Smina docking engines and change ligand functional groups to see how substituting atom groups affects binding affinity and ligand conformation. The data can then be analyzed to identify receptor residues and ligand atom groups that are likely integral to forming the ligand-protein complex, and to discern which ligands are likely to be orally bioactive based on Lipinski's Rule of Five. From this work, a package of Python scripts has been created to streamline generating, splitting, and writing ligand files, greatly reducing the errors that arise when a comprehensive ligand file is split manually. Libraries used in basil_dock include Vina, Smina, RDKit, Open Babel, and MDAnalysis. Although the package was designed around the needs of basil_dock, it is intended to be extensible. Support for this project was provided by NSF 2142033.
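A minimal sketch of the kind of splitting step such a package automates, assuming the ligands arrive as a single multi-molecule SDF file. The function names `split_sdf_text` and `write_ligands` and the output naming scheme are hypothetical, not basil_dock's API; the only format facts relied on are that each SD record begins with a title line and ends with a line containing `$$$$`. RDKit's `Chem.SDMolSupplier` and `Chem.SDWriter`, or Open Babel's `-m` output option, are library-based alternatives.

```python
from pathlib import Path


def split_sdf_text(sdf_text: str) -> list[str]:
    """Split the contents of a multi-molecule SDF file into single-record strings.

    The '$$$$' delimiter is kept so each returned string is a valid one-molecule SDF.
    """
    records: list[str] = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    # Tolerate a final record that is missing its terminating '$$$$'.
    if any(ln.strip() for ln in current):
        records.append("\n".join(current) + "\n$$$$\n")
    return records


def write_ligands(sdf_path: str, out_dir: str) -> list[Path]:
    """Write each record of `sdf_path` to its own numbered file in `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, record in enumerate(split_sdf_text(Path(sdf_path).read_text()), start=1):
        # The first line of an SD record is the molecule name; it may be blank.
        title = record.splitlines()[0].strip()
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in title)
        path = out / f"{i:03d}_{safe or 'ligand'}.sdf"
        path.write_text(record)
        written.append(path)
    return written
```

Numbering the output files preserves the original order even when titles are blank or duplicated.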
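A simple geometric stand-in for identifying receptor residues likely to be integral to the complex is to count close contacts between each residue and the docked ligand. This is a swapped-in heuristic, not the analysis method basil_dock uses; the 4.0 Å cutoff, the input layout (a residue label plus coordinates for each receptor atom), the residue labels, and the function name `contact_residues` are assumptions. In a real workflow the coordinates would be read from the receptor and docked-pose files, for example with MDAnalysis, which the abstract lists among its libraries.

```python
import math
from collections import Counter

Coord = tuple[float, float, float]


def contact_residues(
    receptor_atoms: list[tuple[str, Coord]],
    ligand_atoms: list[Coord],
    cutoff: float = 4.0,
) -> list[tuple[str, int]]:
    """Return (residue, n_contacts) pairs, most contacts first.

    A contact is any receptor-atom/ligand-atom pair closer than `cutoff` angstroms.
    Residues with no contacts are omitted.
    """
    counts: Counter = Counter()
    for residue, r_xyz in receptor_atoms:
        for l_xyz in ligand_atoms:
            if math.dist(r_xyz, l_xyz) < cutoff:
                counts[residue] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical toy coordinates (angstroms).
    receptor = [
        ("ASP25", (0.0, 0.0, 0.0)),
        ("ASP25", (1.0, 0.0, 0.0)),
        ("ILE50", (3.0, 0.0, 0.0)),
        ("GLY100", (20.0, 0.0, 0.0)),
    ]
    ligand = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(contact_residues(receptor, ligand))
```

The nested loop is fine for a single pocket; for whole proteins a spatial index (for example a k-d tree) would scale better.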
        
    
- Award ID(s): 2142033
- PAR ID: 10592069
- Publisher / Repository: ASBMB
- Location: ASBMB National Meeting, Chicago, IL
- Sponsoring Org: National Science Foundation
More Like this
- Virtual screening is a cost- and time-effective alternative to traditional high-throughput screening in the drug discovery process. Both virtual screening approaches, structure-based molecular docking and ligand-based cheminformatics, suffer from high computational cost, low accuracy, and/or reliance on prior knowledge of a ligand that binds to a given target. Here, we propose a neural network framework, NeuralDock, which accelerates high-quality computational docking by a factor of 10⁶ and does not require prior knowledge of a ligand that binds to a given target. By approximating both protein-small molecule conformational sampling and energy-based scoring, NeuralDock accurately predicts the binding energy and affinity of a protein-small molecule pair based on the 3D structure of the protein pocket and the topology of the small molecule. We use NeuralDock and 25 GPUs to dock 937 million molecules from the ZINC database against superoxide dismutase-1 in 21 h, and we validate the results with physical docking using MedusaDock. Because of its speed and accuracy, NeuralDock may be useful for brute-force virtual screening of massive chemical libraries and for training generative drug models.
- Metabotropic glutamate receptors (mGluRs) play an important role in regulating glutamate signaling pathways, which are involved in neuropathy and peripheral homeostasis. mGluR4, which belongs to Group III mGluRs, is the most widely distributed mGluR in the periphery. Regulation of this receptor has been shown to be involved in diabetes, colorectal carcinoma, and many other diseases. However, the application of structure-based drug design to identify small molecules that regulate mGluR4 is limited by the absence of a resolved mGluR4 protein structure. In this work, we first built a homology model of mGluR4 based on a crystal structure of mGluR8 and then conducted hierarchical virtual screening (HVS) to identify possible active ligands for mGluR4. The HVS protocol consists of three hierarchical filters: Glide docking, molecular dynamics (MD) simulation, and binding free energy calculation. We successfully prioritized active ligands of mGluR4 from a set of screening compounds using HVS. The active ligands predicted from binding affinities cover almost all of the experimentally determined active ligands, missing only one. The correlation between measured and predicted binding affinities is significantly improved for the MM-PB/GBSA-WSAS methods compared with the Glide docking method. More importantly, we identified hotspots for ligand binding: SER157 and GLY158 tend to contribute to the selectivity of mGluR4 ligands, while ALA154 and ALA155 could account for ligand selectivity toward mGluR8. We also identified five other key residues that are critical for ligand potency. The differences between the binding profiles of mGluR4 and mGluR8 can guide the development of more potent and selective modulators.
Moreover, we evaluated the performance of IPSF, a novel type of scoring function trained by a machine learning algorithm on residue-ligand interaction profiles, in guiding drug lead optimization. The cross-validation root-mean-square errors (RMSEs) are much smaller than those of the endpoint methods, and the correlation coefficients are comparable to those of the best endpoint methods for both mGluRs. Thus, machine learning-based IPSF can be applied to guide lead optimization even when the total number of actives and inactives is small, a typical scenario in drug discovery projects.
- Predicting the binding structure of a small-molecule ligand to a protein, a task known as molecular docking, is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared with traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space. Empirically, DiffDock obtains a 38% top-1 success rate (RMSD < 2 Å) on PDBBind, significantly outperforming the previous state of the art of traditional docking (23%) and deep learning (20%) methods. Moreover, while previous methods are not able to dock on computationally folded structures (maximum accuracy 10.4%), DiffDock maintains significantly higher precision (21.7%). Finally, DiffDock has fast inference times and provides confidence estimates with high selective accuracy.
- In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template-based ligand docking program ClusPro ligTBM, also implemented as a public server available at https://ligtbm.cluspro.org/. Because many targets had multiple chains and a number of ligands, several templates and some manual intervention were required. In a few cases, no templates were found, and we had to use direct docking with the Glide program. Nevertheless, ligTBM proved to be a very useful tool, and by any ranking criterion our group was ranked among the top five best-performing teams. In fact, all the best groups used template-based docking methods. Thus, it appears that AlphaFold2-generated models, despite the high accuracy of the predicted backbone, have local differences from the X-ray structures that make direct docking methods more challenging to use. The results of CASP15 confirm that this limitation can frequently be overcome by homology-based docking.
 An official website of the United States government