Title: PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design
ABSTRACT Biomolecular structure drives function, and computational capabilities have progressed such that the prediction and computational design of biomolecular structures is increasingly feasible. Because computational biophysics attracts students from many different backgrounds and with different levels of resources, teaching the subject can be challenging. One strategy to teach diverse learners is with interactive multimedia material that promotes self-paced, active learning. We have created a hands-on education strategy with a set of 16 modules that teach topics in biomolecular structure and design, from fundamentals of conformational sampling and energy evaluation to applications, such as protein docking, antibody design, and RNA structure prediction. Our modules are based on PyRosetta, a Python library that encapsulates all computational modules and methods in the Rosetta software package. The workshop-style modules are implemented as Jupyter Notebooks that can be executed in the Google Colaboratory, allowing learners access with just a Web browser. The digital format of Jupyter Notebooks allows us to embed images, molecular visualization movies, and interactive coding exercises. This multimodal approach may better reach students from different disciplines and experience levels, as well as attract more researchers from smaller labs and cognate backgrounds to leverage PyRosetta in science and engineering research. All materials are freely available.
Journal Name: The Biophysicist
Sponsoring Org: National Science Foundation
More Like this
  1. Ensuring the public has a fundamental understanding of human–microbe interactions, immune responses, and vaccines is a critical challenge in the midst of a pandemic. These topics are commonly taught in undergraduate- and graduate-level microbiology and immunology courses; however, creating engaging methods of teaching these complex concepts to students of all ages is necessary to keep younger students interested when science seems hard. Building on the Tactile Teaching Tools with Guided Inquiry Learning (TTT-GIL) method we used to create an interactive lac operon molecular puzzle, we report here two TTT-GIL activities designed to engage diverse learners from middle schoolers to master's students in exploring molecular interactions within the immune system. By pairing physical models with structured activities built on the constructivist framework of Process-Oriented Guided Inquiry Learning (POGIL), TTT-GIL activities guide learners through their interaction with the model, using the Learning Cycle to facilitate construction of new concepts. Moreover, TTT-GIL activities are designed utilizing Universal Design for Learning (UDL) principles to include all learners through multiple means of engagement, representation, and action. The TTT-GIL activities reported here include a web-enhanced activity designed to teach concepts related to antibody–epitope binding and specificity to deaf and hard-of-hearing middle and high school students in a remote setting and a team-based activity that simulates the evolution of the Major Histocompatibility Complex (MHC) haplotype of a population exposed to pathogens. These activities incorporate TTT-GIL to engage learners in the exploration of fundamental immunology concepts and can be adapted for use with learners of different levels and educational backgrounds.
  2. Responding to the need to teach remotely due to COVID-19, we used readily available computational approaches (and developed associated tutorials) to teach virtual Course-Based Undergraduate Research Experience (CURE) laboratories that fulfil generally accepted main components of CUREs or Undergraduate Research Experiences (UREs): Scientific Background, Hypothesis Development, Proposal, Experiments, Teamwork, Data Analysis, Conclusions, and Presentation [1]. We then developed and taught remotely, in three phases, protein-centric CURE activities that are adaptable to virtually any protein, emphasizing contributions of noncovalent interactions to structure, binding and catalysis (an ASBMB learning framework [2] foundational concept). The courses had five learning goals (unchanged in the virtual format), focused on i) use of primary literature and bioinformatics, ii) the roles of noncovalent interactions, iii) keeping accurate laboratory notebooks, iv) hypothesis development and research proposal writing, and v) presenting the project and drawing evidence-based conclusions. The first phase, Developing a Research Proposal, contains three modules and develops hallmarks of a good student-developed hypothesis using available literature (PubMed [3]) and preliminary observations obtained using bioinformatics: Module 1: Using Primary Literature and Data Bases (Protein Data Bank [4], Blast [5] and Clustal Omega [6]) and Module 2: Molecular Visualization (PyMol [7] and Chimera [8]), culminating in a research proposal (Module 3). Provided rubrics guide student expectations. In the second phase, Preparing the Proteins, students prepared necessary proteins and mutants using Module 4: Creating and Validating Models, which leads users through creating mutants with PyMol, homology modeling with Phyre2 [9] or Missense [10], energy minimization using RefineD [11] or ModRefiner [12], and structure validation using MolProbity [13]. In the third phase, Computational Experimental Approaches to Explore the Questions developed from the Hypothesis, students selected appropriate tools to perform their experiments, chosen from computational techniques suitable for a CURE laboratory class taught remotely. Questions, paired with computational approaches, were selected from Module 5: Exploring Titratable Groups in a Protein using H++ [14], Module 6: Exploring Small Molecule Ligand Binding (with SwissDock [15]), Module 7: Exploring Protein-Protein Interaction (with HawkDock [16]), Module 8: Detecting and Exploring Potential Binding Sites on a Protein (with POCASA [17] and SwissDock), and Module 9: Structure-Activity Relationships of Ligand Binding & Drug Design (with SwissDock, Open Eye [18] or the Molecular Operating Environment (MOE) [19]). All involve freely available computational approaches on publicly accessible web-based servers around the world (with the exception of MOE). Original literature/journal club activities on approaches helped students suggest tie-ins to wet lab experiments they could conduct in the future to complement their computational approaches. This approach allowed us to continue using high-impact CURE teaching without changing our course learning goals. Quantitative data (including replicates) was collected and analyzed during regular class periods. Students developed evidence-based conclusions and related them to their research questions and hypotheses. Projects culminated in a presentation where faculty feedback was facilitated with the Virtual Presentation platform from QUBES [20]. These computational approaches are readily adaptable for topics accessible for first to senior year classes and individual research projects (UREs). We used them in both partial and full semester CUREs in various institutional settings. We believe this format can benefit faculty and students from a wide variety of teaching institutions under conditions where remote teaching is necessary.
  3. This paper introduces a web-based interactive educational platform for 3D/polyhedral graphic statics (PGS) [1]. The Block Research Group (BRG) at ETH Zürich developed a dynamic learning and teaching platform for structural design. This tool is based on traditional graphic statics. It uses interactive 2D drawings to help designers and engineers of all skill levels understand and utilize the methods [2]. However, polyhedral graphic statics is not easy to learn because of its three-dimensional nature. All the existing computational design tools are heavily dependent on modeling software such as Rhino or Python-based computational frameworks like Compass [3]. In this research, we start with the procedural approach, developing libraries using JavaScript, Three.js, and WebGL to facilitate the construction by making it independent of any software. This framework is based on mathematical and computational algorithms that derive the global equilibrium of the structure, optimize the balanced relationship between the external magnitudes and the internal forces, and visualize the dynamic reciprocal polyhedral diagrams with corresponding topological data. This instant open-source application and its visualization interface provide a more operative, interactive platform for students, educators, practitioners, and designers, allowing them not only to learn the topological relationship but also to deepen their knowledge and understanding of structures through the steps for constructing and analyzing the form and force diagrams. In the simplified single-node example, the multi-step geometric procedures intuitively illustrate 3D structural reciprocity concepts. With the intuitive control panel, the user can move the constraint point's location through the inserted gumball function, and the force direction of the form diagram changes dynamically from compression-only to combined tension and compression. Users can also explore and design innovative, efficient spatial structures with changeable boundary conditions and constraints by manipulating both force distribution and geometric form in real time, such as by adding supports or subdividing the global equilibrium in the force diagram. Finally, there is an option to export the satisfactory geometry in a suitable format to share with other fabrication tools. As an online educational environment with different types of geometric examples, the platform is valuable for teaching structural form through graphical approaches in an exploratory manner.
  4. AQME, automated quantum mechanical environments, is a free and open‐source Python package for the rapid deployment of automated workflows using cheminformatics and quantum chemistry. AQME workflows integrate tasks performed across multiple computational chemistry packages and data formats, preserving all computational protocols, data, and metadata for machine and human users to access and reuse. AQME has a modular structure of independent modules that can be implemented in any sequence, allowing users to use all or only the desired parts of the program. The code has been developed for researchers with basic familiarity with the Python programming language. The CSEARCH module interfaces to molecular mechanics and semi‐empirical QM (SQM) conformer generation tools (e.g., RDKit and Conformer–Rotamer Ensemble Sampling Tool, CREST) starting from various initial structure formats. The CMIN module enables geometry refinement with SQM and neural network potentials, such as ANI. The QPREP module interfaces with multiple QM programs, such as Gaussian, ORCA, and PySCF. The QCORR module processes QM results, storing structural, energetic, and property data while also enabling automated error handling (e.g., convergence errors, wrong number of imaginary frequencies, and isomerization) and job resubmission. The QDESCP module provides easy access to QM ensemble‐averaged molecular descriptors and computed properties, such as NMR spectra. Overall, AQME provides automated, transparent, and reproducible workflows to produce, analyze and archive computational chemistry results. SMILES inputs can be used, and many aspects of tedious human manipulation can be avoided. Installation and execution on Windows, macOS, and Linux platforms have been tested, and the code has been developed to support access through Jupyter Notebooks, the command line, and job submission (e.g., Slurm) scripts. Examples of pre‐configured workflows are available in various formats, and hands‐on video tutorials illustrate their use.

    This article is categorized under:
    Data Science > Chemoinformatics
    Data Science > Computer Algorithms and Programming
    Software > Quantum Chemistry
  5. The computing landscape is evolving rapidly. Exascale computers have arrived, which can perform 10^18 mathematical operations per second. At the same time, quantum supremacy has been demonstrated, where quantum computers have outperformed these fastest supercomputers for certain problems. Meanwhile, artificial intelligence (AI) is transforming every aspect of science and engineering. A highly anticipated application of the emerging nexus of exascale computing, quantum computing and AI is computational design of new materials with desired functionalities, which has been the elusive goal of the federal materials genome initiative. The rapid change in the computing landscape resulting from these developments has not been matched by the pedagogical developments needed to train the next generation of the materials engineering cyberworkforce. This gap in curricula across colleges and universities offers a unique opportunity to create educational tools, enabling decentralized training of the cyberworkforce. To achieve this, we have developed training modules for a new generation of quantum materials simulator, named AIQ-XMaS (AI and quantum-computing enabled exascale materials simulator), which integrates exascalable quantum, reactive and neural-network molecular dynamics simulations with unique AI and quantum-computing capabilities to study a wide range of materials and devices of high societal impact such as optoelectronics and health. As a single-entry access point to these training modules, we have also built a CyberMAGICS (cyber training on materials genome innovation for computational software) portal, which includes step-by-step instructions in Jupyter notebooks and associated tutorials, while providing an online cloud service for those who do not have access to an adequate computing platform. The modules are incorporated into our open-source AIQ-XMaS software suite as tutorial examples and are piloted in classroom and workshop settings to directly train many users at the University of Southern California (USC) and Howard University, one of the largest historically black colleges and universities (HBCUs), with a strong focus on underrepresented groups. In this paper, we summarize these educational developments, including findings from the first CyberMAGICS Workshop for Underrepresented Groups, along with an introduction to the AIQ-XMaS software suite. Our training modules also include a new generation of open programming languages for exascale computing (e.g., OpenMP target) and quantum computing (e.g., Qiskit) used in our scalable simulation and AI engines that underlie AIQ-XMaS. Our training modules support unique dual-degree opportunities at USC in the emerging exa-quantum-AI era: Ph.D. in science or engineering, concurrently with MS in computer science specialized in high-performance computing and simulations, MS in quantum information science or MS in materials engineering with machine learning. The developed modular cyber-training pedagogy is applicable to engineering education at large.