This content will become publicly available on December 1, 2024
- Award ID(s):
- NSF-PAR ID:
- Date Published:
- Journal Name:
- BMC Bioinformatics
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Schneidman-Duhovny, Dina (Ed.)Gmxapi provides an integrated, native Python API for both standard and advanced molecular dynamics simulations in GROMACS. The Python interface permits multiple levels of integration with the core GROMACS libraries, and legacy support is provided via an interface that mimics the command-line syntax, so that all GROMACS commands are fully available. Gmxapi has been officially supported since the GROMACS 2019 release and is enabled by default in current versions of the software. Here we describe gmxapi 0.3 and later. Beyond simply wrapping GROMACS library operations, the API permits several advanced operations that are not feasible using the prior command-line interface. First, the API allows custom user plugin code within the molecular dynamics force calculations, so users can execute custom algorithms without modifying the GROMACS source. Second, the Python interface allows tasks to be dynamically defined, so high-level algorithms for molecular dynamics simulation and analysis can be coordinated with loop and conditional operations. Gmxapi makes GROMACS more accessible to custom Python scripting while also providing support for high-level data-flow simulation algorithms that were previously feasible only in external packages.more » « less
AQME, automated quantum mechanical environments, is a free and open‐source Python package for the rapid deployment of automated workflows using cheminformatics and quantum chemistry. AQME workflows integrate tasks performed across multiple computational chemistry packages and data formats, preserving all computational protocols, data, and metadata for machine and human users to access and reuse. AQME has a modular structure of independent modules that can be implemented in any sequence, allowing the users to use all or only the desired parts of the program. The code has been developed for researchers with basic familiarity with the Python programming language. The CSEARCH module interfaces to molecular mechanics and semi‐empirical QM (SQM) conformer generation tools (e.g., RDKit and Conformer–Rotamer Ensemble Sampling Tool, CREST) starting from various initial structure formats. The CMIN module enables geometry refinement with SQM and neural network potentials, such as ANI. The QPREP module interfaces with multiple QM programs, such as Gaussian, ORCA, and PySCF. The QCORR module processes QM results, storing structural, energetic, and property data while also enabling automated error handling (i.e., convergence errors, wrong number of imaginary frequencies, isomerization, etc.) and job resubmission. The QDESCP module provides easy access to QM ensemble‐averaged molecular descriptors and computed properties, such as NMR spectra. Overall, AQME provides automated, transparent, and reproducible workflows to produce, analyze and archive computational chemistry results. SMILES inputs can be used, and many aspects of tedious human manipulation can be avoided. Installation and execution on Windows, macOS, and Linux platforms have been tested, and the code has been developed to support access through Jupyter Notebooks, the command line, and job submission (e.g., Slurm) scripts. Examples of pre‐configured workflows are available in various formats, and hands‐on video tutorials illustrate their use.
This article is categorized under:
Data Science > Chemoinformatics
Data Science > Computer Algorithms and Programming
Software > Quantum Chemistry
Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the
binder.sqlitefile. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the
eventstable, which has the following schema.
CREATE TABLE events( version INTEGER, timestamp TEXT, provider TEXT, spec TEXT, origin TEXT, ref TEXT, guessed_ref TEXT ); CREATE INDEX idx_timestamp ON events(timestamp);
versionindicates the version of the record as assigned by Binder. The
originfield became available with version 3, and the
reffield with version 4. Older records where this information was not recorded will have the corresponding fields set to null.
timestampis the ISO timestamp of the launch
providergives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub, other providers may differ.
specgives the particular branch/release/commit being built. It consists of
originindicates which backend was used. Each has its own storage, compute, etc. so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.
refspecifies the git commit that was actually used, rather than the named branch referenced by
spec. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.
- For records where
refis not available, we attempted to clone the named reference given by
specrather than the specific commit (see below). The
guessed_reffield records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-null
ref) will exclude these guessed commits. May be null.
The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files in the repos which were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted/made private at some point, and were thus skipped. This is indicated by the absence of any row for the given commit (or absence of both
eventstable). The schema is as follows.
CREATE TABLE spec_files ( ref TEXT NOT NULL PRIMARY KEY, ls TEXT, runtime BLOB, apt BLOB, conda BLOB, pip BLOB, pipfile BLOB, julia BLOB, r BLOB, nix BLOB, docker BLOB, setup BLOB, postbuild BLOB, start BLOB );
eventstable. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.
lsfield gives a metadata listing of the repo contents (excluding the
.gitdirectory). This field is JSON encoded with the following structure based on JSON types:
- Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
- String: symlink. The string value gives the link target.
- Number: regular file. The number value gives the file size in bytes.
CREATE TABLE clean_specs ( ref TEXT NOT NULL PRIMARY KEY, conda_channels TEXT, conda_packages TEXT, pip_packages TEXT, apt_packages TEXT );
clean_specstable provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using the
conda.models.match_spec.MatchSpecclass included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.
missingtable gives the repos that were not accessible, and
event_logsrecords which log files have already been added. These tables are used for updating the dataset and should not be of interest to users.
PmagPy Online: Jupyter Notebooks, the PmagPy Software Package and the Magnetics Information Consortium (MagIC) Database Lisa Tauxe$^1$, Rupert Minnett$^2$, Nick Jarboe$^1$, Catherine Constable$^1$, Anthony Koppers$^2$, Lori Jonestrask$^1$, Nick Swanson-Hysell$^3$ $^1$Scripps Institution of Oceanography, United States of America; $^2$ Oregon State University; $^3$ University of California, Berkely; firstname.lastname@example.org The Magnetics Information Consortium (MagIC), hosted at http://earthref.org/MagIC is a database that serves as a Findable, Accessible, Interoperable, Reusable (FAIR) archive for paleomagnetic and rock magnetic data. It has a flexible, comprehensive data model that can accomodate most kinds of paleomagnetic data. The PmagPy software package is a cross-platform and open-source set of tools written in Python for the analysis of paleomagnetic data that serves as one interface to MagIC, accommodating various levels of user expertise. It is available through github.com/PmagPy. Because PmagPy requires installation of Python, several non-standard Python modules, and the PmagPy software package, there is a speed bump for many practitioners on beginning to use the software. In order to make the software and MagIC more accessible to the broad spectrum of scientists interested in paleo and rock magnetism, we have prepared a set of Jupyter notebooks, hosted on jupyterhub.earthref.org which serve a set of purposes. 1) There is a complete course in Python for Earth Scientists, 2) a set of notebooks that introduce PmagPy (pulling the software package from the github repository) and illustrate how it can be used to create data products and figures for typical papers, and 3) show how to prepare data from the laboratory to upload into the MagIC database. The latter will satisfy expectations from NSF for data archiving and for example the AGU publication data archiving requirements. Getting started To use the PmagPy notebooks online, go to website at https://jupyterhub.earthref.org/. Create an Earthref account using your ORCID and log on. [This allows you to keep files in a private work space.] Open the PmagPy Online - Setup notebook and execute the two cells. Then click on File = > Open and click on the PmagPy_Online folder. Open the PmagPy_online notebook and work through the examples. There are other notebooks that are useful for the working paleomagnetist. Alternatively, you can install Python and the PmagPy software package on your computer (see https://earthref.org/PmagPy/cookbook for instructions). Follow the instructions for "Full PmagPy install and update" through section 1.4 (Quickstart with PmagPy notebooks). This notebook is in the collection of PmagPy notebooks. Overview of MagIC The Magnetics Information Consortium (MagIC), hosted at http://earthref.org/MagIC is a database that serves as a Findable, Accessible, Interoperable, Reusable (FAIR) archive for paleomagnetic and rock magnetic data. Its datamodel is fully described here: https://www2.earthref.org/MagIC/data-models/3.0. Each contribution is associated with a publication via the DOI. There are nine data tables: contribution: metadata of the associated publication. locations: metadata for locations, which are groups of sites (e.g., stratigraphic section, region, etc.) sites: metadata and derived data at the site level (units with a common expectation) samples: metadata and derived data at the sample level. specimens: metadata and derived data at the specimen level. criteria: criteria by which data are deemed acceptable ages: ages and metadata for sites/samples/specimens images: associated images and plots. Overview of PmagPy The functionality of PmagPy is demonstrated within notebooks in the PmagPy repository: PmagPy_online.ipynb: serves as an introdution to PmagPy and MagIC (this conference). It highlights the link between PmagPy and the Findable Accessible Interoperable Reusabe (FAIR) database maintained by the Magnetics Information Consortium (MagIC) at https://earthref.org/MagIC. Other notebooks of interest are: PmagPy_calculations.ipynb: demonstrates many of the PmagPy calculation functions such as those that rotate directions, return statistical parameters, and simulate data from specified distributions. PmagPy_plots_analysis.ipynb: demonstrates PmagPy functions that can be used to visual data as well as those that conduct statistical tests that have associated visualizations. PmagPy_MagIC.ipynb: demonstrates how PmagPy can be used to read and write data to and from the MagIC database format including conversion from many individual lab measurement file formats. Please see also our YouTube channel with more presentations from the 2020 MagIC workshop here: https://www.youtube.com/playlist?list=PLirL2unikKCgUkHQ3m8nT29tMCJNBj4kjmore » « less
We present an open-source eigenmode expansion (EME) software package entirely implemented in the Python programming language. Eigenmode expansion Python (EMEPy) utilizes artificial neural networks to reproduce electromagnetic eigenmode field profiles to accelerate the EME process by a factor of 3. EMEPy provides an intuitive scripting interface, is easily compatible with a number of other Python packages, and is useful for educators and new designers.