The Protein Data Bank (PDB) holds an extensive amount of information and can be a vital tool for background research in biochemistry. To make the information in the PDB more accessible, the RCSB Search API was used within Jupyter notebooks to create customizable, user-friendly tools written in Python. Areas of focus include searches for ligands with specific characteristics, searches for FDA-approved drugs, and sequence searches that retrieve entries based on different sequence characteristics. This code has been built into Jupyter Notebook templates that include examples of these searches as well as annotated code that users can customize to run advanced searches on the PDB more efficiently and to download the structure and small-molecule files returned by a search. These notebooks also walk users through different ways to organize or use the results of advanced searches. Future plans include increasing the amount and type of information available from a search, making search results easier to visualize and download, and expanding our notebooks to cover more types of searches. This research was supported by NSF-IUSE award number 2142033.
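The sequence searches and result organization described above might look roughly like the following minimal sketch, which uses only the Python standard library and is not the notebooks' own code. The helper names (`sequence_query`, `run_search`, `hits_by_score`) are hypothetical, and the endpoint URL, the `sequence` service parameters, and the assumption that the API answers HTTP 204 when nothing matches reflect a reading of the RCSB Search API (v2) documentation that should be checked against the current docs.

```python
import json
import urllib.request

# ASSUMPTION: v2 endpoint of the RCSB Search API; verify against current RCSB docs.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def sequence_query(sequence, identity_cutoff=0.9, evalue_cutoff=1.0,
                   sequence_type="protein", rows=25):
    """Build a sequence-similarity search request (hypothetical helper).

    Parameter names follow the assumed layout of the 'sequence' service.
    """
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "evalue_cutoff": evalue_cutoff,
                "identity_cutoff": identity_cutoff,
                "sequence_type": sequence_type,
                "value": sequence,
            },
        },
        "request_options": {
            "scoring_strategy": "sequence",
            "paginate": {"start": 0, "rows": rows},  # first page of results
        },
        "return_type": "polymer_entity",
    }


def run_search(request):
    """POST a request as JSON; return the decoded response, or None if no hits."""
    body = json.dumps(request).encode("utf-8")
    req = urllib.request.Request(
        SEARCH_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # ASSUMPTION: an empty result set comes back as 204 No Content.
        if resp.status == 204:
            return None
        return json.loads(resp.read().decode("utf-8"))


def hits_by_score(response, min_score=0.0):
    """Organize a response into (identifier, score) pairs, best score first."""
    if not response:
        return []
    hits = [(r["identifier"], r["score"]) for r in response.get("result_set", [])]
    kept = [h for h in hits if h[1] >= min_score]
    return sorted(kept, key=lambda h: h[1], reverse=True)
```

Calling `hits_by_score(run_search(sequence_query(seq)))` with a protein sequence `seq` would return `(identifier, score)` pairs for matching polymer entities, highest score first; only `run_search` needs network access, so the query building and sorting can be tested offline.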
This content will become publicly available on March 24, 2026
Advanced searches of the Protein Data Bank in Jupyter notebooks
The Protein Data Bank (PDB) holds an extensive amount of information and can be a vital tool for background research in biochemistry. To make the information in the PDB more accessible, the RCSB Search API was used within Jupyter notebooks to create customizable, user-friendly tools written in simple Python code. Areas of focus include structure motif searches, which help predict the function of proteins from the three-dimensional shape of their active sites; searches for FDA-approved drugs; and searches for ligands with specific characteristics. This code has been built into Jupyter Notebook templates that include both examples of these searches and annotated code that users can customize to run advanced searches on the PDB more efficiently and to download the structure and small-molecule files returned by a search. Future plans include increasing the amount and type of information available from a search and expanding the scope of our notebooks to cover more types of searches.
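The ligand, FDA-approved-drug, structure-motif, and download tasks described above can be sketched as request builders for the RCSB Search API plus file-download URLs. This is a minimal, hypothetical sketch rather than the notebooks' code: the helper names are invented, and the attribute names (`drugbank_info.drug_groups`, `chem_comp.formula_weight`), the `strucmotif` parameter layout, and the file-server URL patterns are assumptions to verify against the RCSB Search API and file-download documentation. Each dictionary is intended to be sent as the JSON body of a POST request to the Search API endpoint.

```python
import urllib.request


def attribute_node(attribute, operator, value, service="text_chem"):
    """One terminal attribute-search node (hypothetical helper)."""
    return {
        "type": "terminal",
        "service": service,
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def approved_drug_ligands(min_weight, max_weight, rows=100):
    """Chemical components marked 'approved' in DrugBank within a weight range.

    ASSUMPTION: attribute names below must be checked against the RCSB schema.
    """
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",  # both conditions must hold
            "nodes": [
                attribute_node("drugbank_info.drug_groups", "exact_match", "approved"),
                attribute_node(
                    "chem_comp.formula_weight", "range",
                    {"from": min_weight, "to": max_weight,
                     "include_lower": True, "include_upper": True},
                ),
            ],
        },
        "request_options": {"paginate": {"start": 0, "rows": rows}},
        "return_type": "mol_definition",  # chemical component definitions
    }


def motif_query(entry_id, residues, rmsd_cutoff=2.0):
    """Structure motif search seeded by residues of a reference entry.

    residues: list of (label_asym_id, label_seq_id) tuples.
    ASSUMPTION: the service may need extra fields (e.g. struct_oper_id).
    """
    return {
        "query": {
            "type": "terminal",
            "service": "strucmotif",
            "parameters": {
                "value": {
                    "entry_id": entry_id.upper(),
                    "residue_ids": [
                        {"label_asym_id": chain, "label_seq_id": seq}
                        for chain, seq in residues
                    ],
                },
                "rmsd_cutoff": rmsd_cutoff,
            },
        },
        "return_type": "assembly",
    }


def structure_url(entry_id, fmt="cif"):
    """ASSUMED file-server pattern for a coordinate file (mmCIF by default)."""
    return f"https://files.rcsb.org/download/{entry_id.upper()}.{fmt}"


def ligand_sdf_url(comp_id):
    """ASSUMED file-server pattern for a ligand's ideal-geometry SDF file."""
    return f"https://files.rcsb.org/ligands/download/{comp_id.upper()}_ideal.sdf"


def download(url, path):
    """Save a file returned by a search to disk (requires network access)."""
    urllib.request.urlretrieve(url, path)
```

Grouping terminal nodes under `"logical_operator": "and"` is how this sketch combines the approved-drug filter with a ligand characteristic (here a formula-weight range); further characteristics would be added as extra nodes in the same group.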
- Award ID(s): 2142033
- PAR ID: 10592066
- Publisher / Repository: American Chemical Society
- Date Published:
- Format(s): Medium: X
- Location: ACS National Meeting, San Diego, CA
- Sponsoring Org: National Science Foundation
More Like this
- We present in this paper an automated method to assess the quality of Jupyter notebooks in terms of reproducibility and executability. Specifically, we automatically extract a number of expert-defined features for each notebook, perform a feature selection step, and then train supervised binary classifiers to predict whether a notebook is reproducible and executable, respectively. We also experimented with semantic code embeddings to capture the notebooks' semantics. We evaluated these methods on a dataset of 306,539 notebooks and achieved an F1 score of 0.87 for reproducibility and 0.96 for executability (using expert-defined features) and an F1 score of 0.81 for reproducibility and 0.78 for executability (using code embeddings). Our results suggest that semantic code embeddings can determine the reproducibility and executability of Jupyter notebooks with good performance, and because they can be derived automatically, they have the advantage of requiring no expert involvement to define features.
- Marschall, Tobias (Ed.). Motivation: JBrowse Jupyter is a package that aims to close the gap between Python programming and genomic visualization. Web-based genome browsers are routinely used for publishing and inspecting genome annotations. Historically they have been deployed at the end of bioinformatics pipelines, typically decoupled from the analysis itself. However, emerging technologies such as Jupyter notebooks enable a more rapid iterative cycle of development, analysis and visualization. Results: We have developed a package that provides a Python interface to JBrowse 2’s suite of embeddable components, including the primary Linear Genome View. The package enables users to quickly set up, launch and customize JBrowse views from Jupyter notebooks. In addition, users can share their data via Google’s Colab notebooks, providing reproducible interactive views. Availability and implementation: JBrowse Jupyter is released under the Apache License and is available for download on PyPI. Source code and demos are available on GitHub at https://github.com/GMOD/jbrowse-jupyter.
- Computational notebooks promote exploration by structuring code, output, and explanatory text into cells. The input code and rich outputs help users iteratively investigate ideas as they explore or analyze data. The links between these cells (how the cells depend on each other) are important for understanding how analyses have been developed and how the results can be reproduced. Specifically, a code cell that uses a particular identifier depends on the cell where that identifier is defined or mutated. Because notebooks promote fluid editing, where cells can be moved and run in any order, cell dependencies are not always clear or easy to follow. We examine different tools that seek to address this problem by extending Jupyter notebooks and evaluate how well they support users in accomplishing tasks that require understanding dependencies. We also evaluate visualization techniques that provide views of the dependencies to help users navigate cell dependencies.
- Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or assertion failures, and explore open-ended queries such as "why is x null?". To handle these queries, ChatDBG grants the LLM autonomy to take the wheel: it can act as an independent agent capable of querying and controlling the debugger to navigate through stacks and inspect program state. It then reports its findings and yields control back to the programmer. By leveraging the real-world knowledge embedded in LLMs, ChatDBG can diagnose issues identifiable only through domain-specific reasoning. Our ChatDBG prototype integrates with standard debuggers, including LLDB and GDB for native code and Pdb for Python. Our evaluation across a diverse set of code, including C/C++ code with known bugs and a suite of Python code including standalone scripts and Jupyter notebooks, demonstrates that ChatDBG can successfully analyze root causes, explain bugs, and generate accurate fixes for a wide range of real-world errors. For the Python programs, a single query led to an actionable bug fix 67% of the time; one additional follow-up query increased the success rate to 85%. ChatDBG has seen rapid uptake; it has already been downloaded more than 75,000 times.
