Abstract AQME, automated quantum mechanical environments, is a free and open‐source Python package for the rapid deployment of automated workflows using cheminformatics and quantum chemistry. AQME workflows integrate tasks performed across multiple computational chemistry packages and data formats, preserving all computational protocols, data, and metadata for machine and human users to access and reuse. AQME is built from independent modules that can be run in any sequence, so users can use all or only the desired parts of the program. The code has been developed for researchers with basic familiarity with the Python programming language. The CSEARCH module interfaces to molecular mechanics and semi‐empirical QM (SQM) conformer generation tools (e.g., RDKit and Conformer–Rotamer Ensemble Sampling Tool, CREST) starting from various initial structure formats. The CMIN module enables geometry refinement with SQM and neural network potentials, such as ANI. The QPREP module interfaces with multiple QM programs, such as Gaussian, ORCA, and PySCF. The QCORR module processes QM results, storing structural, energetic, and property data while also enabling automated error handling (e.g., convergence errors, wrong number of imaginary frequencies, and isomerization) and job resubmission. The QDESCP module provides easy access to QM ensemble‐averaged molecular descriptors and computed properties, such as NMR spectra. Overall, AQME provides automated, transparent, and reproducible workflows to produce, analyze, and archive computational chemistry results. SMILES inputs can be used, and many aspects of tedious human manipulation can be avoided. Installation and execution on Windows, macOS, and Linux platforms have been tested, and the code has been developed to support access through Jupyter Notebooks, the command line, and job submission (e.g., Slurm) scripts. 
Examples of pre‐configured workflows are available in various formats, and hands‐on video tutorials illustrate their use. This article is categorized under: Data Science > Chemoinformatics; Data Science > Computer Algorithms and Programming; Software > Quantum Chemistry.
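The ensemble-averaged descriptors mentioned for the QDESCP module can be illustrated with a Boltzmann-weighted average over conformers. The following is a minimal sketch and not AQME's API: it assumes Boltzmann weighting at 298.15 K, and the function names, relative energies, and dipole values are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Return normalized Boltzmann populations for conformer energies (kcal/mol)."""
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    factors = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal
    ]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(values, energies_kcal, temperature=298.15):
    """Boltzmann-weighted average of a per-conformer descriptor."""
    weights = boltzmann_weights(energies_kcal, temperature)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical three-conformer ensemble (illustrative numbers only)
energies = [0.0, 0.5, 1.2]  # relative energies in kcal/mol
dipoles = [1.8, 2.4, 3.1]   # per-conformer descriptor values
weights = boltzmann_weights(energies)
avg_dipole = ensemble_average(dipoles, energies)
print(weights, avg_dipole)
```

Because the lowest-energy conformer carries the largest weight, the averaged value is pulled toward that conformer's descriptor rather than toward the plain arithmetic mean.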
molli: A General-Purpose Python Toolkit for Combinatorial Small Molecule Library Generation, Manipulation, and Feature Extraction.
The management and analysis of large in silico molecular libraries is pivotal in many areas of modern chemistry. The adoption and success of data-oriented approaches to chemical research depend on the ease of handling large collections of in silico molecular structures in a programmatic way. Herein, we introduce the MOLecular LIbrary toolkit, “molli”, which is a Python 3 chemoinformatics module that provides a streamlined interface for manipulating large in silico libraries. Three-dimensional, combinatorial molecule libraries can be expanded directly from two-dimensional chemical structure fragments stored in CDXML files with high stereochemical fidelity. Geometry optimization, property calculation, and conformer generation are executed by interfacing with widely used computational chemistry programs such as OpenBabel, RDKit, ORCA, and xTB/CREST. Conformer-dependent grid-based feature calculators provide numerical representations suitable for diversity analysis, and interfaces to robust three-dimensional visualization tools provide comprehensive images to enhance human understanding of libraries with thousands of members. The package includes a command-line interface in addition to Python classes to streamline frequently used workflows. This work describes the development and implementation of molli 1.0 and highlights the available functionality. Parallel performance is benchmarked on various hardware platforms, and common workflows are demonstrated for different tasks ranging from optimized grid-based descriptor calculation on catalyst libraries to an NMR prediction workflow from CDXML files.
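The combinatorial expansion described above can be pictured as taking every combination of a core with a set of substituents at its attachment sites. The following is a minimal sketch of that enumeration step only, and it does not use molli's API: the function name, the fragment labels, and the naming scheme are all hypothetical, and no 3D structures or stereochemistry are handled.

```python
from itertools import product


def enumerate_library(cores, substituents, n_sites):
    """Yield (name, core, substituent tuple) for every core/substituent combination."""
    for core in cores:
        # Each attachment site can independently carry any substituent.
        for combo in product(substituents, repeat=n_sites):
            name = "_".join((core,) + combo)
            yield name, core, combo


# Hypothetical fragment labels (placeholders for fragments drawn in CDXML files)
cores = ["coreA", "coreB"]
substituents = ["Me", "Ph", "tBu"]

library = list(enumerate_library(cores, substituents, n_sites=2))
print(len(library))  # 2 cores * 3**2 substituent pairs = 18 members
```

The library size grows as the number of cores times the number of substituents raised to the number of sites, which is why a few drawn fragments can yield thousands of members.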
- Award ID(s): 2154237
- PAR ID: 10522711
- Publisher / Repository: ChemRxiv
- Date Published:
- Format(s): Medium: X
- Institution: University of Illinois at Urbana-Champaign
- Sponsoring Org: National Science Foundation
More Like this
Abstract We introduce a new Python interface for the Cassandra Monte Carlo software, molecular simulation design framework (MoSDeF) Cassandra. MoSDeF Cassandra provides a simplified user interface, offers broader interoperability with other molecular simulation codes, enables the construction of programmatic and reproducible molecular simulation workflows, and builds the infrastructure necessary for high‐throughput Monte Carlo studies. Many of the capabilities of MoSDeF Cassandra are enabled via tight integration with MoSDeF. We discuss the motivation and design of MoSDeF Cassandra and proceed to demonstrate both simple use‐cases and more complex workflows, including adsorption in porous media and a combined molecular dynamics – Monte Carlo workflow for computing lateral diffusivity in graphene slit pores. The examples presented herein demonstrate how even relatively complex simulation workflows can be reduced to, at most, a few files of Python code that can be version‐controlled and shared with other researchers. We believe this paradigm will enable more rapid research advances and represents the future of molecular simulations.
-
This is the simulation data set for the manuscript: Arvelo DM, Comer J, Schmit J, Garcia R (2024) Interfacial water is separated from a hydrophobic silica surface by a gap of 1.2 nm. ACS Nano 18:18683–18692. https://doi.org/10.1021/acsnano.4c05689 This data set includes all files needed to run and analyze the simulations described in this manuscript in the molecular dynamics software NAMD, as well as the output of the simulations. LAMMPS input files for the ReaxFF simulations are also included. The files are organized into directories corresponding to the figures of the main text and supporting information. They include molecular model structure files (NAMD psf or LAMMPS data), force field parameter files (in CHARMM format or ReaxFF format), initial atomic coordinates (pdb format), NAMD or LAMMPS configuration files, Colvars configuration files, NAMD or LAMMPS log files, and output including restart files (in binary NAMD format) and trajectories in dcd format (downsampled with a stride of 100 to 20 ns per frame). Analysis is controlled by shell scripts (Bash-compatible) that call VMD Tcl scripts or Python scripts. These scripts and their output are also included. Version: 1.0. Figure5AC: Simulation of pentadecane on a 5 chains/nm^2 OTS layer. Figure5B_FigureS7: Calculation of force profile for an SiO2 tip asperity model using adaptive biasing force. Systems: octane with 5 chains/nm^2 OTS, octane with 4 chains/nm^2 OTS, decane with 5 chains/nm^2 OTS, water with 5 chains/nm^2 OTS. FigureS6: Simulations showing the effect of octadecane on the structure of the OTS layer for 3 and 5 chains/nm^2 densities. FigureS8: Calculation of the adsorption free energy of tetracosane (C24) at the OTS–water interface using ABF. FigureS9: Python script for estimating the critical concentration to form an alkane layer at the OTS–water interface using the mean-field Ising model. 
FigureS10: ReaxFF simulation and modeling to create the silanol-terminated amorphous silica model of an AFM tip asperity. FigureS11: Molecular dynamics simulations showing spontaneous assembly of twelve or twenty-four tetracosane (C24) molecules at the interface between water and the alkyl groups of an OTS-conjugated silica surface.
-
Quantitative structure–activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with “in silico” results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs using descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for “in silico environmental chemical science” are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.
-
Abstract Multicomponent reactions enable the synthesis of large molecular libraries from relatively few inputs. This scalability has led to the broad adoption of these reactions by the pharmaceutical industry. Here, we employ the four-component Ugi reaction to demonstrate that multicomponent reactions can provide a basis for large-scale molecular data storage. Using this combinatorial chemistry we encode more than 1.8 million bits of art historical images, including a Cubist drawing by Picasso. Digital data is written using robotically synthesized libraries of Ugi products, and the files are read back using mass spectrometry. We combine sparse mixture mapping with supervised learning to achieve bit error rates as low as 0.11% for single reads, without library purification. In addition to improved scaling of non-biological molecular data storage, these demonstrations offer an information-centric perspective on the high-throughput synthesis and screening of small-molecule libraries.