Abstract
Background: The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the data stewardship principles of being findable, accessible, interoperable, and reusable (FAIR) by providing RESTful access to its database entries via its web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, no existing software provides extensive command-line support for KEGG access and utilization.
Results: We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. Not only does kegg_pull include an application programming interface (API) for Python programming, it also provides a command line interface (CLI) that enables utilization of KEGG for a wide range of shell scripting and data analysis pipeline use cases. As kegg_pull’s name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary (user-defined) number of database entries from the KEGG API. This functionality is also implemented to efficiently utilize multiple central processing unit cores, as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across single or multiple processes, with recommendations based on extensive testing and practical network considerations.
Conclusions: The new kegg_pull package enables flexible KEGG retrieval use cases not available in previous software packages.
The most notable new feature that kegg_pull provides is its ability to robustly pull an arbitrary number of KEGG entries with a single API method or CLI command, including pulling an entire KEGG database. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances. 
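The general batching-and-retry approach to pulling an arbitrary number of entries can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions. It is not kegg_pull's API, and every function name below is hypothetical. The sketch relies on three KEGG REST conventions: the `get` operation accepts up to 10 entries joined with `+`, each flat-file entry ends with a `///` line, and requests go to `https://rest.kegg.jp`. It also swaps in a thread pool (the work is network-bound) where kegg_pull describes using multiple processes. The retry count and back-off delay are arbitrary choices.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator

KEGG_REST_BASE = "https://rest.kegg.jp"
MAX_ENTRIES_PER_GET = 10  # KEGG REST "get" accepts at most 10 entries per request


def batch_ids(entry_ids: Iterable[str], size: int = MAX_ENTRIES_PER_GET) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` entry IDs (hypothetical helper)."""
    batch: list[str] = []
    for entry_id in entry_ids:
        batch.append(entry_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def get_url(batch: list[str]) -> str:
    """Build a KEGG REST 'get' URL; multiple entries are joined with '+'."""
    return f"{KEGG_REST_BASE}/get/{'+'.join(batch)}"


def split_flat_file(text: str) -> list[str]:
    """Split a multi-entry flat-file response on the '///' line that ends each entry."""
    entries: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip() == "///":
            entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return entries


def http_fetch(url: str, timeout: float = 60.0) -> str:
    """Download one URL using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_retry(url: str, fetch: Callable[[str], str], attempts: int = 3, delay: float = 1.0) -> str:
    """Call fetch(url), retrying network errors with a linearly growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
    raise ValueError("attempts must be at least 1")


def pull_entries(entry_ids: Iterable[str], fetch: Callable[[str], str] = http_fetch, workers: int = 4) -> list[str]:
    """Batch the IDs, fetch the batches concurrently, and split responses into entries."""
    urls = [get_url(batch) for batch in batch_ids(entry_ids)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so entries come back in request order
        responses = list(pool.map(lambda url: fetch_with_retry(url, fetch), urls))
    return [entry for text in responses for entry in split_flat_file(text)]
```

Calling `pull_entries(["cpd:C00001", "cpd:C00002"])` would contact the live KEGG server, so use it sparingly. The `fetch` parameter lets you substitute a stub for offline testing. The sketch is simplified in two ways. It retries every `OSError`, including permanent errors such as an HTTP 404. It also does not check which requested IDs were actually returned in each batch. A robust tool would need to handle both cases.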
gmxapi: A GROMACS-native Python interface for molecular dynamics with ensemble and plugin support
            Gmxapi provides an integrated, native Python API for both standard and advanced molecular dynamics simulations in GROMACS. The Python interface permits multiple levels of integration with the core GROMACS libraries, and legacy support is provided via an interface that mimics the command-line syntax, so that all GROMACS commands are fully available. Gmxapi has been officially supported since the GROMACS 2019 release and is enabled by default in current versions of the software. Here we describe gmxapi 0.3 and later. Beyond simply wrapping GROMACS library operations, the API permits several advanced operations that are not feasible using the prior command-line interface. First, the API allows custom user plugin code within the molecular dynamics force calculations, so users can execute custom algorithms without modifying the GROMACS source. Second, the Python interface allows tasks to be dynamically defined, so high-level algorithms for molecular dynamics simulation and analysis can be coordinated with loop and conditional operations. Gmxapi makes GROMACS more accessible to custom Python scripting while also providing support for high-level data-flow simulation algorithms that were previously feasible only in external packages. 
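A restraint plugin of the kind described above contributes an extra energy term and its forces to the MD force calculation. The following pure-Python sketch shows one such term: a harmonic pair restraint, U = ½k(|x₂ − x₁| − r₀)², together with the equal and opposite forces it applies to the two sites. It does not use gmxapi or the GROMACS plugin interface. The function name is hypothetical, and the units are whatever the caller uses consistently (GROMACS works in nm and kJ/mol).

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]


def harmonic_pair_restraint(
    x1: Sequence[float], x2: Sequence[float], k: float, r0: float
) -> tuple[float, Vec3, Vec3]:
    """Return (energy, force_on_site1, force_on_site2) for U = 0.5 * k * (|x2 - x1| - r0)**2.

    Hypothetical illustration only; not part of gmxapi.
    """
    r = [b - a for a, b in zip(x1, x2)]   # separation vector from site 1 to site 2
    d = math.sqrt(sum(c * c for c in r))  # current distance |r|
    if d == 0.0:
        raise ValueError("coincident sites: restraint direction is undefined")
    stretch = d - r0
    energy = 0.5 * k * stretch * stretch
    scale = -k * stretch / d              # F2 = -dU/dx2 = -k (d - r0) r / d
    f2 = (scale * r[0], scale * r[1], scale * r[2])
    f1 = (-f2[0], -f2[1], -f2[2])         # Newton's third law: equal and opposite
    return energy, f1, f2
```

For example, sites at (0, 0, 0) and (3, 4, 0) with k = 2 and r₀ = 4 are stretched by 1. This gives an energy of 1 and pulls the second site back toward the first with force (−1.2, −1.6, 0).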
- Award ID(s): 1835780
- PAR ID: 10390050
- Editor(s): Schneidman-Duhovny, Dina
- Date Published:
- Journal Name: PLOS Computational Biology
- Volume: 18
- Issue: 2
- ISSN: 1553-7358
- Page Range / eLocation ID: e1009835
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Kemp, Melissa L. (Ed.) Tissue Forge is an open-source interactive environment for particle-based physics, chemistry and biology modeling and simulation. Tissue Forge allows users to create, simulate and explore models and virtual experiments based on soft condensed matter physics at multiple scales, from the molecular to the multicellular, using a simple, consistent interface. While Tissue Forge is designed to simplify solving problems in complex subcellular, cellular and tissue biophysics, it supports applications ranging from classic molecular dynamics to agent-based multicellular systems with dynamic populations. Tissue Forge users can build and interact with models and simulations in real time and change simulation details during execution, or execute simulations off-screen and/or remotely in high-performance computing environments. Tissue Forge provides a growing library of built-in model components along with support for user-specified models during the development and application of custom, agent-based models. Tissue Forge includes an extensive Python API for model and simulation specification via Python scripts, an IPython console and a Jupyter Notebook, as well as C and C++ APIs for integrated applications with other software tools. Tissue Forge supports installation on 64-bit Windows, Linux and macOS systems and is available for local installation via conda.
- This release makes a new command line interface (CLI) available for CaltechDATA. It allows you to create and edit CaltechDATA records through the API entirely from the command line. You can enter metadata directly on the command line, use a README template https://github.com/caltechlibrary/caltechdata_api/blob/main/templates/README.md, or use the previously supported JSON file https://github.com/caltechlibrary/caltechdata_api/blob/main/example.json. As new storage options for Caltech come online, the CLI will be updated and enhanced.
- In today’s Big Data era, data scientists require modern workflows to quickly analyze large-scale datasets using complex codes in order to maintain the rate of scientific progress. These scientists often rely on available campus resources or off-the-shelf computational systems for their applications. Unified infrastructure or over-provisioned servers can quickly become bottlenecks for specific tasks, wasting time and resources. Composable infrastructure helps solve these problems by providing users with new ways to increase resource utilization. Composable infrastructure disaggregates a computer’s components (CPU, GPU accelerators, storage, and networking) into fluid pools of resources, but it typically relies on infrastructure engineers to architect individual machines. Such infrastructure is managed with specialized command-line utilities, user interfaces, or specification files. These management models are cumbersome and difficult to incorporate into data-science workflows. We developed a high-level software API, Composastructure, which, when integrated into modern workflows, can be used by infrastructure engineers as well as data scientists to reorganize composable resources on demand. Composastructure makes infrastructure programmable, secure, persistent, and reproducible. Our API composes machines, frees resources, supports multi-rack operations, and includes a Python module for Jupyter Notebooks.
- The study of neuron morphology requires robust and comprehensive methods to quantify the differences between neurons of different subtypes and animal species. Several software packages have been developed for the analysis of neuron tracing results stored in the standard SWC format. These packages, however, provide relatively simple quantifications, and their non-extendable architecture prohibits their use for advanced data analysis and visualization. We developed nGauge, a Python toolkit to support the parsing and analysis of neuron morphology data. As an application programming interface (API), nGauge can be referenced by other popular open-source software to create custom informatics analysis pipelines and advanced visualizations. nGauge defines an extendable data structure that handles volumetric reconstructions (e.g., the soma) in addition to SWC linear reconstructions, while remaining lightweight. This greatly extends nGauge’s data compatibility.
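The SWC format mentioned in the nGauge summary stores a linear neuron reconstruction as one node per line: `id type x y z radius parent`, with `parent = -1` marking a root and `#` starting comment lines. The following minimal sketch parses such text and computes three simple quantifications of the kind these packages provide: total cable length, branch points, and tips. It is not nGauge's API. The class and function names are hypothetical, and it ignores the volumetric soma handling that nGauge adds.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class SwcNode:
    """One SWC sample point (hypothetical container, not nGauge's data structure)."""
    node_id: int
    type_code: int
    x: float
    y: float
    z: float
    radius: float
    parent: int  # -1 marks a root


def parse_swc(text: str) -> dict[int, SwcNode]:
    """Parse SWC text into a dict keyed by node id, skipping blanks and '#' comments."""
    nodes: dict[int, SwcNode] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        node = SwcNode(int(f[0]), int(f[1]), float(f[2]), float(f[3]),
                       float(f[4]), float(f[5]), int(f[6]))
        nodes[node.node_id] = node
    return nodes


def morphology_summary(nodes: dict[int, SwcNode]) -> dict[str, float]:
    """Total cable length, branch points (>1 child) and tips (no children)."""
    total_length = 0.0
    child_counts: Counter = Counter()
    for node in nodes.values():
        if node.parent == -1:
            continue
        parent = nodes[node.parent]
        # each non-root node contributes the straight segment back to its parent
        total_length += math.dist((node.x, node.y, node.z), (parent.x, parent.y, parent.z))
        child_counts[node.parent] += 1
    branch_points = sum(1 for count in child_counts.values() if count > 1)
    tips = sum(1 for node_id in nodes if child_counts[node_id] == 0)
    return {"total_length": total_length, "branch_points": branch_points, "tips": tips}
```

For a root at the origin with one child at (0, 3, 0), which in turn branches to (0, 3, 4) and (4, 3, 0), the summary reports a total length of 11, one branch point, and two tips.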
 An official website of the United States government