NSF PAR Search | NSF Public Access Repository

RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning

https://doi.org/10.1093/nar/gkac1077

Burley, Stephen K.; Bhikadiya, Charmi; Bi, Chunxiao; Bittrich, Sebastian; Chao, Henry; Chen, Li; Craig, Paul A.; Crichlow, Gregg V.; Dalenberg, Kenneth; Duarte, Jose M.; et al. (November 2022, Nucleic Acids Research)

Abstract The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a ‘living data resource.’ Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
Advanced Searches of the Protein Data Bank using Python in Jupyter Notebooks

Greever, Victoria; Rack, Anna; Pick, Murphy; Schoneman, Lee; Craig, Paul A (April 2025, ASBMB)

The Protein Data Bank (PDB) holds an extensive amount of information, and can be a vital tool when performing background research for biochemical work. In an attempt to make the information in the PDB more accessible, the RCSB Search API was employed within Jupyter Notebooks to create more customizable and user-friendly tools with Python code. Areas of focus include searches targeting ligands with specific characteristics, searches for FDA Approved Drugs, as well as sequence searches, used to search for entries based on different sequence characteristics. This code has been built into Jupyter Notebook templates that include examples of these searches as well as annotated code that users can customize to more efficiently run advanced searches on the PDB and download structure and small molecule files returned by the search. These notebooks also walk users through different ways to organize or utilize the returns from advanced searches. Future plans include increasing the amount and type of information available from a search, improved ease of access for visualizing and downloading search results, and expanding the scope of our notebooks to cover more types of searches. This research was supported by NSF-IUSE award number 2142033.
Free, publicly-accessible full text available April 14, 2026
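As a rough illustration of the kind of request such notebooks might issue, the sketch below builds a full-text query body in Python. It is not taken from the authors' notebooks: the endpoint URL, the JSON field names, and the helper `build_full_text_query` are assumptions based on the publicly documented RCSB Search API and should be checked against the current API reference before use.

```python
import json

# Assumed endpoint for the RCSB Search API (verify against the official docs).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(text: str, rows: int = 10) -> dict:
    """Build a full-text search request body (hypothetical helper).

    The field names below follow the publicly documented query schema as
    understood here; treat them as assumptions, not as the notebooks' code.
    """
    if rows < 1:
        raise ValueError("rows must be at least 1")
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        # Ask for PDB entry identifiers rather than, e.g., polymer entities.
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


if __name__ == "__main__":
    payload = build_full_text_query("thymidine kinase", rows=5)
    # Print the JSON body that would be sent in an HTTP POST to SEARCH_URL.
    print(json.dumps(payload, indent=2))
```

To run a search, the JSON body would be sent in an HTTP POST to `SEARCH_URL`; the network call is left out here so that the sketch runs offline.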
Molecular docking with Python in Jupyter Notebooks: Towards the development of accessible docking procedures

Schoneman, Lee; Craig, Paul A (April 2025, ASBMB)

Molecular docking is a computational technique used to predict ligand binding potential, conformation, and location for a given receptor, and is regarded as an attractive method to use in drug design due to its relatively low computational and monetary cost. However, molecular docking programs tend not to be accessible to novice users. Most docking programs require at least a basic knowledge of the command line and computer programming to install and configure the program. Additionally, tutorials for the most commonly used programs tend to be inflexible, requiring a specific molecule or set of molecules to be bound to a specific receptor, and require the installation and use of other programs or websites to download and prepare structures. To increase general access to molecular docking, basil_dock utilizes a series of easy-to-use Jupyter notebooks that do not assume user familiarity with molecular docking procedures and concepts, requiring little command line usage and software installation. The series includes four notebooks that were created to reflect the different steps in the molecular docking process: (1) the preparation of ligand and protein files prior to docking, (2) the docking of ligands to a protein receptor, (3) the analysis of the resulting data to determine how different functional groups in the ligand can affect protein-ligand binding, and (4) the identification of essential locations for binding within the ligand and protein. The notebooks give novice users flexibility and customization in exploring docking procedures and systems, and teach users the basis behind molecular docking without having to leave the environment to obtain information and materials from other applications. The first version of basil_dock allows users to choose from receptors uploaded to the Protein Data Bank and to add additional ligands as desired. Users can then select between the Vina and Smina docking engines and change ligand functional groups to see how the substitution of atom groups affects binding affinity and ligand conformation. The data can then be analyzed to determine residues in the receptor and atom groups in the ligand that are likely to be integral to forming the ligand-protein complex and to discern which ligands are likely to be orally bioactive based on Lipinski’s Rule of Five. From this work, a package of Python scripts has been created to streamline the generating, splitting, and writing of ligand files, greatly reducing the number of errors arising from attempting to split a comprehensive ligand file manually. Libraries used in basil_dock include Vina, Smina, RDKit, openbabel, and MDAnalysis. While the package has been designed based on the needs of basil_dock, it has been created to be extensible. Support for this project was provided by NSF 2142033.
Free, publicly-accessible full text available April 13, 2026
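The Lipinski's Rule of Five check mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not basil_dock code: the `Descriptors` class and both functions are hypothetical names, the descriptor values are assumed to have been computed beforehand (for example with a cheminformatics library such as RDKit), and the example numbers are illustrative rather than measured data. The thresholds are the commonly cited ones: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors, with one violation usually tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed ligand descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of Five criteria that the ligand breaks."""
    checks = {
        "mol_weight > 500": d.mol_weight > 500,
        "log_p > 5": d.log_p > 5,
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """True if the ligand breaks no more than `max_violations` criteria."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    small = Descriptors(mol_weight=180.2, log_p=1.2,
                        h_bond_donors=1, h_bond_acceptors=4)
    bulky = Descriptors(mol_weight=812.0, log_p=6.3,
                        h_bond_donors=7, h_bond_acceptors=12)
    for name, ligand in (("small", small), ("bulky", bulky)):
        print(name, passes_rule_of_five(ligand), lipinski_violations(ligand))
```

Returning the list of broken criteria, rather than only a pass/fail flag, makes it easier to see which substitution of a functional group pushed a ligand outside the rule.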
BASIL, RCSB Protein Data Bank, and the NSF

Craig, Paul A; Hall, Bonnie L (March 2025, RCSB Protein Data Bank; https://pdb101.rcsb.org/train/education-corner)

This year, the National Science Foundation (NSF) is celebrating its 75th anniversary. NSF support was essential in the original development of BASIL (Biochemistry Authentic Scientific Inquiry Lab). Ongoing NSF support over the past ten years has enabled the BASIL community to grow in numbers and in collaboration with other teacher/scholar teams who are seeking to change undergraduate biochemistry education. At the same time, NSF support has also provided support for our most critical online resource, the RCSB Protein Data Bank, which has always provided us with the structures that we study and, increasingly, is providing us with the tools that our students use to explore these structures and predict their function.
Free, publicly-accessible full text available March 31, 2026
Advanced searches of the Protein Data Bank in Jupyter notebooks

Rack, Anna; Pick, Murphy; Greever, Victoria; Craig, Paul A (March 2025, American Chemical Society)

The Protein Data Bank (PDB) holds an extensive amount of information, and can be a vital tool when performing background research for biochemical work. In an attempt to make the information in the PDB more accessible, the RCSB Search API was employed within Jupyter Notebooks to create more customizable and user-friendly tools with simple Python code. Areas of focus include structure motif searches used to predict the function of proteins based on the 3-dimensional shape of their active sites, searches for FDA Approved Drugs, as well as searches targeting ligands with specific characteristics. This code has been built into Jupyter Notebook templates that include both examples of these searches as well as annotated code that users can customize to more efficiently run advanced searches on the PDB and download structure and small molecule files returned by the search. Future plans include increasing the amount and type of information available from a search, as well as expanding the scope of our notebooks to cover more types of searches.
Free, publicly-accessible full text available March 24, 2026
Molecular docking with Python in Jupyter Notebooks: Towards the development of accessible docking procedures

Schoneman, Lee; Craig, Paul A (March 2025, American Chemical Society)

Molecular docking is a computational technique used to predict ligand binding potential, conformation, and location for a given receptor, and is regarded as an attractive method to use in drug design due to its relatively low computational and monetary cost. However, molecular docking programs tend not to be accessible to novice users. To increase general access to molecular docking, basil_dock utilizes a series of easy-to-use Jupyter notebooks that do not assume familiarity with molecular docking procedures and concepts, requiring little command-line usage and software installation. The notebooks, divided based on the different steps in the molecular docking process, focus on user customization and flexibility as well as teaching users the basis behind molecular docking. The first version of basil_dock allows users to choose from receptors uploaded to the Protein Data Bank and to add additional ligands as desired. Users can then select between the Vina and Smina docking engines and change ligand functional groups to see how the substitution of atom groups affects binding affinity and ligand conformation. Machine learning algorithms can then be utilized to determine residues in the receptor and atom groups in the ligand that are likely to be integral to forming the ligand-protein complex and to discern which ligands are likely to be orally bioactive based on Lipinski’s Rule of Five.
Free, publicly-accessible full text available March 23, 2026
Expanding the BASIL CURE

https://doi.org/10.35459/tbp.2024.000273

Koeppe, Julia R; Dattelbaum, Jonathan D; Hall, Bonnie L; Mills, Stephen A; Offerdahl, Erika G; Pikaart, Michael J; Roberts, Rebecca; Sikora, Arthur; Craig, Paul A (January 2025, The Biophysicist)

In the Biochemistry Authentic Scientific Inquiry Lab (BASIL) course-based undergraduate research experience, students use a series of computational (sequence and structure comparison, docking) and wet lab (protein expression, purification, and concentration; sodium dodecyl sulfate-polyacrylamide gel electrophoresis [SDS-PAGE]; enzyme activity and kinetics) modules to predict and test the function of protein structures of unknown function found in the Protein Data Bank and UniProt. BASIL was established in 2015 with a core of 10 faculty members on six campuses, with the support of an educational researcher and doctoral student on a seventh campus. Since that time, the number of participating faculty members and campuses has grown, and we have adapted our curriculum to improve access for all who are interested. We have also expanded our curriculum to include new developments that are appearing in computational approaches to life science research. In this article, we provide a history of BASIL, explain our current approach, describe how we have addressed challenges that have appeared, and describe our curriculum development pipeline and our plans for moving forward in a sustainable and equitable fashion.
Free, publicly-accessible full text available January 31, 2026
Incorporating Coding into the Classroom: An Important Component of Modern Bioinformatics Instruction

https://doi.org/10.1080/0047231X.2024.2405593

Orench-Rivera, Nichole; Bednarski, April; Craig, Paul; Talbot, Austin (January 2025, Journal of College Science Teaching)

Advancements in computation and machine learning have revolutionized science, enabling researchers to address once insurmountable challenges. Bioinformatics, a field that heavily relies on computer-driven analysis of biological data, has greatly benefited from these developments. However, traditional bioinformatics instruction frequently does not teach the necessary coding skills. This article explores the transformation of a bioinformatics course in which feedback from students revealed limitations in traditional web application interfaces and the absence of coding automated pipelines for real-world applications. To address these shortcomings, the authors redesigned the project to incorporate computer programming using Google Colaboratory, where students access databases and websites by coding. The curriculum outlined the integration of modern programming skills with essential bioinformatics concepts. This article evaluates the effectiveness of this redesign by analyzing a self-response survey completed by course participants. Results show a positive impact on students’ perception of science and scientific research. Bayesian statistical analysis reveals that the programming component significantly predicts students’ career clarity in science and their pursuit of graduate education. Integrating coding exercises in bioinformatics education enhances students’ preparedness for real-world applications. The freely available GitHub repository will facilitate adoption. By embracing computational tools, students can become adept researchers capable of tackling complex biological questions.
Full Text Available
The BASIL CURE Initiative: Transforming How Students Learn Biochemistry Through Real Research

https://doi.org/10.33548/SCIENTIA1168

Craig, Paul A; Hall, Bonnie L; Koeppe, Julia R; Roberts, Rebecca (January 2025, Scientia)

Teaching students how to think like scientists is a critical but challenging goal in biochemistry education. The Biochemistry Authentic Scientific Inquiry Lab (BASIL) initiative was conceived by Dr Paul Craig from the Rochester Institute of Technology and is led by colleagues across multiple institutions. They have developed an innovative curriculum that transforms traditional cookbook-style laboratory courses into authentic research experiences, also known as a Course-based Undergraduate Research Experience (CURE). By investigating real proteins with unknown functions, students learn essential scientific skills while expanding our knowledge of protein biochemistry.
Full Text Available
The elephant in the room—Should we be teaching coding to basic science students?

https://doi.org/10.1002/bmb.21873

Novak, Walter; Craig, Paul; Foster, Michael (January 2025, Biochemistry and Molecular Biology Education)

Editorial in Biochemistry & Molecular Biology Education
Full Text Available