

Title: The MolSSI QCArchive project: An open‐source platform to compute, organize, and share quantum chemistry data
Abstract

The Molecular Sciences Software Institute's (MolSSI) Quantum Chemistry Archive (QCArchive) project is an umbrella name that covers both a central server hosted by MolSSI for community data and the Python‐based software infrastructure that powers automated computation and storage of quantum chemistry (QC) results. Through a Python interface, the MolSSI‐hosted central server gives the computational molecular sciences community free access to tens of millions of QC computations for machine learning, methodology assessment, force‐field fitting, and more. The centrally archived quantum chemical data can also be mined easily through web applications found at https://qcarchive.molssi.org. The software infrastructure can be used as a standalone platform to compute, structure, and distribute hundreds of millions of QC computations for individuals or groups of researchers at any scale. The QCArchive Infrastructure is open‐source (BSD‐3C); code repositories can be found at https://github.com/MolSSI, and releases can be downloaded via PyPI and Conda.
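As a rough illustration of what "structuring" QC results can involve, the sketch below indexes each result by a hash of the molecule together with a computation specification, so that a repeated request maps to the record already stored. Everything in it (Specification, molecule_hash, ResultArchive, the rounding tolerance) is hypothetical and in-memory; it is not the QCArchive data model or its Python interface.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Specification:
    """Hypothetical computation specification (not the QCArchive schema)."""
    program: str
    method: str
    basis: str
    driver: str = "energy"


def molecule_hash(symbols, geometry, decimals=6):
    """Hash element symbols plus flattened Cartesian coordinates.

    Coordinates are rounded first so that geometries differing only by
    floating-point noise map to the same key.
    """
    # "+ 0.0" turns a rounded -0.0 into 0.0 so both serialize identically
    coords = [round(float(x), decimals) + 0.0 for x in geometry]
    payload = json.dumps({"symbols": list(symbols), "geometry": coords}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ResultArchive:
    """Toy in-memory store holding one result per (molecule, specification)."""
    results: dict = field(default_factory=dict)

    def add(self, symbols, geometry, spec, value):
        """Store value; return False if this computation is already archived."""
        key = (molecule_hash(symbols, geometry), spec)
        if key in self.results:
            return False  # duplicate request: keep the existing record
        self.results[key] = value
        return True

    def get(self, symbols, geometry, spec):
        """Return the stored value, or None if it was never computed."""
        return self.results.get((molecule_hash(symbols, geometry), spec))
```

Rounding before hashing trades exactness for robustness to floating-point noise; a production system would need a more careful notion of molecular identity (atom ordering, orientation, charge, multiplicity).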

This article is categorized under:

Electronic Structure Theory > Ab Initio Electronic Structure Methods

Software > Quantum Chemistry

Data Science > Computer Algorithms and Programming

 
NSF-PAR ID:
10360762
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
WIREs Computational Molecular Science
Volume:
11
Issue:
2
ISSN:
1759-0876
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    As the tools of computational quantum chemistry have continued to mature, larger and more complex molecular systems have become amenable to computational study. However, studies of these complex systems often require the execution of enormous numbers of computations, which can be a tedious and error‐prone process if done manually. We have developed a suite of free, open‐source tools to facilitate the automation of quantum chemistry workflows. These tools are collected under the organization QChASM (Quantum Chemistry Automation and Structure Manipulation) and include functionality for building and manipulating complex molecular structures and performing routine tasks (AaronTools), a toolkit for automating transition state (TS) optimizations and predicting the outcomes of selective homogeneous catalytic reactions, and a plug‐in for UCSF ChimeraX that provides a graphical interface for building complex molecular structures and representing output from quantum chemistry computations. These tools are described below, with a focus on the recent Python implementation of AaronTools.

    This article is categorized under:

    Structure and Mechanism > Reaction Mechanisms and Catalysis

    Software > Quantum Chemistry
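Rotating a substituent about a bond is a typical example of the routine structure manipulation described in this abstract. The sketch below assumes NumPy and applies the Rodrigues rotation formula; rotate_about_bond is a hypothetical helper written for illustration, not a function from AaronTools.

```python
import numpy as np


def rotate_about_bond(coords, axis_start, axis_end, moving, angle_deg):
    """Rotate the atoms listed in `moving` about the bond axis_start -> axis_end.

    coords: (N, 3) array of Cartesian coordinates.
    axis_start, axis_end: atom indices defining the rotation axis.
    moving: indices of the atoms to rotate (e.g. a substituent).
    Returns a new array; the input is left unchanged.
    """
    coords = np.asarray(coords, dtype=float)
    origin = coords[axis_start]
    k = coords[axis_end] - origin
    k = k / np.linalg.norm(k)  # unit vector along the bond
    theta = np.radians(angle_deg)
    # Cross-product matrix of k, used in R = I + sin(t) K + (1 - cos(t)) K^2
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    out = coords.copy()
    idx = list(moving)
    # Shift to the axis origin, rotate (row vectors, hence R.T), shift back
    out[idx] = (coords[idx] - origin) @ R.T + origin
    return out
```

Only the listed atoms move, so bond lengths to the axis atoms are preserved and the rest of the structure is untouched.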

     
  2. Abstract

    In silico materials design is hampered by the computational complexity of Kohn–Sham DFT, which scales cubically with the system size. Owing to the development of new‐generation kinetic energy density functionals (KEDFs), orbital‐free DFT (OFDFT) can now be successfully applied to a large class of semiconductors and to finite systems such as quantum dots and metal clusters. In this work, we present DFTpy, an open‐source software package implementing OFDFT, written entirely in Python 3, that outsources the computationally expensive operations to third‐party modules such as NumPy and SciPy. When fast simulations are needed, DFTpy exploits the fast Fourier transforms from PyFFTW. New‐generation, nonlocal and density‐dependent‐kernel KEDFs are made computationally efficient by employing linear splines and other methods for fast kernel builds. We showcase DFTpy by solving for the electronic structure of a million‐atom system of aluminum metal, computed on a single CPU. The Python 3 implementation is object‐oriented, opening the door to easy implementation of new features. As an example, we present a time‐dependent OFDFT implementation (hydrodynamic DFT), which we use to compute the spectra of small metal clusters, qualitatively recovering the time‐dependent Kohn–Sham DFT result. The Python codebase allows for easy implementation of application programming interfaces. We showcase the combination of DFTpy and ASE for molecular dynamics simulations of liquid metals. DFTpy is released under the MIT license.

    This article is categorized under:

    Software > Quantum Chemistry

    Electronic Structure Theory > Density Functional Theory

    Data Science > Computer Algorithms and Programming
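The simplest kinetic energy density functional, Thomas–Fermi, shows why OFDFT avoids orbitals entirely: T_TF[ρ] = C_TF ∫ ρ(r)^(5/3) dr with C_TF = (3/10)(3π²)^(2/3), and its potential is (5/3) C_TF ρ^(2/3), so on a uniform grid both are a single pass over the density values. This is a sketch assuming NumPy and Hartree atomic units; thomas_fermi is a hypothetical function, not part of DFTpy, and the nonlocal new‐generation KEDFs described above add kernel terms that are omitted here.

```python
import numpy as np

# Thomas-Fermi constant in Hartree atomic units: (3/10) * (3*pi^2)^(2/3)
C_TF = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)


def thomas_fermi(rho, cell_volume):
    """Return (energy, potential) of the Thomas-Fermi KEDF on a uniform grid.

    rho: array of electron density values (bohr^-3) sampled on a uniform grid.
    cell_volume: volume of the simulation cell (bohr^3).
    """
    rho = np.asarray(rho, dtype=float)
    dv = cell_volume / rho.size                          # volume per grid point
    energy = C_TF * np.sum(rho ** (5.0 / 3.0)) * dv      # quadrature of rho^(5/3)
    potential = (5.0 / 3.0) * C_TF * rho ** (2.0 / 3.0)  # functional derivative dT/drho
    return energy, potential
```

The cost of this local term grows linearly with the number of grid points, in contrast to the cubic scaling of Kohn–Sham DFT mentioned above.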

     
  3. Summary

    This work revisits a publication by Bean et al. (2018) reporting that seven amino acid substitutions are essential for the evolution of L‐DOPA 4,5‐dioxygenase (DODA) activity in Caryophyllales. In this study, we explore several concerns that led us to replicate the analyses of Bean et al. (2018).

    Our comparative analyses, with structural modelling, implicate numerous residues in addition to those identified by Bean et al. (2018), with many of these additional residues occurring around the active site of BvDODAα1. We therefore replicated the analyses of Bean et al. (2018) to re‐observe the effect of their original seven residue substitutions in a BvDODAα2 background, that is, the BvDODAα2‐mut3 variant.

    Multiple in vivo assays, in both Saccharomyces cerevisiae and Nicotiana benthamiana, did not result in visible DODA activity in BvDODAα2‐mut3, with betalain production always 10‐fold below that of BvDODAα1. In vitro assays also revealed substantial differences in both catalytic activity and pH optima among the BvDODAα1, BvDODAα2 and BvDODAα2‐mut3 proteins, explaining their differing performance in vivo.

    In summary, we were unable to replicate the in vivo analyses of Bean et al. (2018), and our quantitative in vivo and in vitro analyses suggest a minimal effect of these seven residues in altering the catalytic activity of BvDODAα2. We conclude that the evolutionary pathway to high DODA activity is substantially more complex than implied by Bean et al. (2018).

     
  4. Abstract

    Environmental DNA (eDNA) metabarcoding is a promising method to monitor species and community diversity that is rapid, affordable and non‐invasive. The longstanding needs of the eDNA community are modular informatics tools, comprehensive and customizable reference databases, flexibility across high‐throughput sequencing platforms, fast multilocus metabarcode processing and accurate taxonomic assignment. Improvements in bioinformatics tools make addressing each of these demands within a single toolkit a reality.

    The new modular metabarcode sequence toolkit Anacapa (https://github.com/limey-bean/Anacapa/) addresses the above needs, allowing users to build comprehensive reference databases and assign taxonomy to raw multilocus metabarcode sequence data. A novel aspect of Anacapa is its database building module, “Creating Reference libraries Using eXisting tools” (CRUX), which generates comprehensive reference databases for specific user‐defined metabarcoding loci. The Quality Control and ASV Parsing module sorts and processes multiple metabarcoding loci and processes merged, unmerged and unpaired reads, maximizing recovered diversity. DADA2 then detects amplicon sequence variants (ASVs), and the Anacapa Classifier module aligns these ASVs to CRUX‐generated reference databases using Bowtie2. Lastly, taxonomy is assigned to ASVs with confidence scores using a Bayesian Lowest Common Ancestor (BLCA) method. The Anacapa Toolkit also includes an R package, ranacapa, for automated results exploration through standard biodiversity statistical analysis.

    Benchmarking tests verify that the Anacapa Toolkit effectively and efficiently generates comprehensive reference databases that capture taxonomic diversity, and that it can assign taxonomy to both MiSeq‐ and HiSeq‐length sequence data. We demonstrate the value of the Anacapa Toolkit in assigning taxonomy to seawater eDNA samples collected in southern California.

    The Anacapa Toolkit improves the functionality of eDNA and streamlines biodiversity assessment and management by generating metabarcode‐specific databases, processing multilocus data, retaining a larger proportion of sequencing reads and expanding non‐traditional eDNA targets. All the components of the Anacapa Toolkit are open and available in a virtual container to ease installation.
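Assigning taxonomy from several reference hits is often framed as finding a lowest common ancestor: walk down the ranks and keep a taxon only while enough hits agree on it. The sketch below is a plain rank‐wise consensus with a support threshold, a simplified stand‐in for, not an implementation of, the Bayesian LCA (BLCA) method used by Anacapa; consensus_lineage, its input format and its default threshold are assumptions.

```python
from collections import Counter


def consensus_lineage(hit_lineages, min_support=0.8):
    """Rank-wise consensus over the lineages of alignment hits (simplified LCA).

    hit_lineages: list of lineages, each a list of taxon names ordered from
        the highest rank to the lowest.
    min_support: fraction of all hits that must share a taxon at a rank for
        that rank to be assigned.
    Returns a list of (taxon, support) pairs from the top rank down, stopping
    at the first rank where agreement drops below min_support.
    """
    if not hit_lineages:
        return []
    n_hits = len(hit_lineages)
    depth = max(len(lineage) for lineage in hit_lineages)
    assigned = []
    for rank in range(depth):
        # Hits whose lineage stops above this rank count as unknown (None)
        names = [lin[rank] if rank < len(lin) else None for lin in hit_lineages]
        taxon, count = Counter(names).most_common(1)[0]
        support = count / n_hits
        if taxon is None or support < min_support:
            break
        assigned.append((taxon, support))
    return assigned
```

Raising min_support makes assignments more conservative: they stop at a higher rank whenever hits disagree at finer levels.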

     
  5. Abstract

    New X‐ray crystallography and cryo‐electron microscopy (cryo‐EM) approaches yield vast amounts of structural data from dynamic proteins and their complexes. Modeling the full conformational ensemble can provide important biological insights, but identifying and modeling an internally consistent set of alternate conformations remains a formidable challenge. qFit efficiently automates this process by generating a parsimonious multiconformer model. We refactored qFit from a distributed application into software that runs efficiently on a small server, desktop, or laptop. We describe the new qFit 3 software and provide some examples. qFit 3 is open‐source under the MIT license and is available at https://github.com/ExcitedStates/qfit-3.0.
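The idea of a parsimonious multiconformer model can be illustrated with a toy selection problem: given densities computed from candidate conformers and an observed density on the same grid, find the fewest conformers whose non‐negative occupancies (summing to at most 1) reproduce the data. The sketch below assumes NumPy and uses exhaustive subset search with least squares; it is a simplified stand‐in, not qFit's actual optimization, and select_conformers and its tolerance are hypothetical.

```python
from itertools import combinations

import numpy as np


def select_conformers(candidate_densities, observed, tol=1e-3, max_confs=3):
    """Pick the smallest set of conformers whose weighted densities fit the data.

    candidate_densities: (n_candidates, n_points) array; row i is the density
        computed from candidate conformer i on a common grid.
    observed: (n_points,) array of observed density on the same grid.
    tol: relative residual below which a fit is accepted.
    max_confs: largest number of conformers to consider.
    Returns (indices, occupancies), or (None, None) if nothing fits.
    """
    D = np.asarray(candidate_densities, dtype=float)
    y = np.asarray(observed, dtype=float)
    norm_y = np.linalg.norm(y)
    for k in range(1, max_confs + 1):  # try smaller models first (parsimony)
        best = None
        for subset in combinations(range(D.shape[0]), k):
            A = D[list(subset)].T  # (n_points, k) design matrix
            w, *_ = np.linalg.lstsq(A, y, rcond=None)
            # Occupancies must be non-negative and sum to at most 1
            if np.any(w < 0) or w.sum() > 1.0 + 1e-6:
                continue
            resid = np.linalg.norm(A @ w - y) / norm_y
            if resid < tol and (best is None or resid < best[0]):
                best = (resid, subset, w)
        if best is not None:
            return list(best[1]), best[2]
    return None, None
```

Exhaustive search grows combinatorially with the number of candidates, so this form only suits small toy problems.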

     