Title: A benchmark dataset for Hydrogen Combustion
Abstract The generation of reference data for deep learning models is challenging for reactive systems, and more so for combustion reactions due to the extreme conditions that create radical species and alternative spin states during the combustion process. Here, we extend intrinsic reaction coordinate (IRC) calculations with ab initio MD simulations and normal mode displacement calculations to more extensively cover the potential energy surface for 19 reaction channels for hydrogen combustion. A total of ∼290,000 potential energies and ∼1,270,000 nuclear force vectors are evaluated with a high-quality range-separated hybrid density functional, ωB97X-V, to construct the reference data set, including transition state ensembles, for deep learning models that study hydrogen combustion reactions.
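The abstract lists normal mode displacement alongside IRC and AIMD sampling as a way to reach configurations off the minimum-energy path. The sketch below illustrates only the general idea. It uses NumPy to draw random displacements along the vibrational normal modes of a reference geometry. Several choices are assumptions made for this example and are not the dataset's actual protocol:

- the function name `normal_mode_samples`;
- the uniform random amplitudes and the `max_amp` value;
- the mass-weighted amplitude units;
- the toy diatomic Hessian.

```python
import numpy as np


def normal_mode_samples(coords, masses, hessian, n_samples, max_amp=0.1,
                        n_rigid=6, seed=0):
    """Randomly displace a geometry along its vibrational normal modes.

    coords  : (N, 3) Cartesian coordinates of the reference geometry
    masses  : (N,) atomic masses
    hessian : (3N, 3N) Cartesian Hessian of the energy at `coords`
    max_amp : largest displacement per mode, in mass-weighted units (assumed)
    n_rigid : number of rigid-body modes to discard (6 nonlinear, 5 linear)
    """
    rng = np.random.default_rng(seed)
    inv_sqrt_m = 1.0 / np.sqrt(np.repeat(masses, 3))  # one entry per Cartesian DOF

    # Mass-weighted Hessian: H_mw[i, j] = H[i, j] / sqrt(m_i * m_j)
    h_mw = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigvals, eigvecs = np.linalg.eigh(h_mw)

    # Rigid-body translations/rotations have (near-)zero eigenvalues; drop them.
    order = np.argsort(np.abs(eigvals))
    vib_modes = eigvecs[:, order[n_rigid:]]  # columns are vibrational modes

    samples = []
    for _ in range(n_samples):
        amps = rng.uniform(-max_amp, max_amp, size=vib_modes.shape[1])
        dq = vib_modes @ amps   # displacement in mass-weighted coordinates
        dx = dq * inv_sqrt_m    # convert back to a Cartesian displacement
        samples.append(coords + dx.reshape(-1, 3))
    return np.array(samples)


# Toy example (hypothetical): a harmonic H2-like diatomic along x; linear, so 5 rigid modes.
k = 0.5
hessian = np.zeros((6, 6))
hessian[0, 0] = hessian[3, 3] = k
hessian[0, 3] = hessian[3, 0] = -k
coords = np.array([[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]])
masses = np.array([1.008, 1.008])
geoms = normal_mode_samples(coords, masses, hessian, n_samples=5, n_rigid=5)
```

Because the vibrational modes are orthogonal to the rigid-body modes in mass-weighted coordinates, these displacements leave the center of mass fixed. In a workflow like the one described, each sampled geometry would then be sent to the electronic-structure calculation (ωB97X-V here) for its energy and forces. That step is outside this sketch.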
Award ID(s):
1955643
PAR ID:
10367440
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Data
Volume:
9
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. We report a new deep learning message-passing network, NewtonNet, that takes inspiration from Newton's equations of motion to learn interatomic potentials and forces. With directional information from trainable force vectors and physics-infused operators inspired by Newtonian physics, the entire model remains rotationally equivariant, and many-body interactions are inferred from more interpretable physical features. We test NewtonNet on the prediction of several reactive and non-reactive high-quality ab initio data sets, including single small molecules, a large set of chemically diverse molecules, and methane and hydrogen combustion reactions, achieving state-of-the-art test performance on energies and forces with far greater data and computational efficiency than other deep learning models.
  2. Abstract The hitherto elusive monobridged Ge(μ‐H)GeH (X¹A′) molecule was prepared in the gas phase by the bimolecular reaction of atomic germanium with germane (GeH₄). Electronic structure calculations revealed that this reaction commenced on the triplet surface with the formation of a van der Waals complex, followed by insertion of germanium into a germanium‐hydrogen bond over a submerged barrier to form the triplet digermanylidene intermediate (HGeGeH₃); the latter underwent intersystem crossing from the triplet to the singlet surface. On the singlet surface, HGeGeH₃ predominantly isomerized through two successive hydrogen shifts prior to unimolecular decomposition to the Ge(μ‐H)GeH isomer, which is in equilibrium with the vinylidene‐type (H₂GeGe) and dibridged (Ge(μ‐H₂)Ge) isomers. This reaction leads to the formation of cyclic dinuclear germanium molecules, which do not exist on the isovalent C₂H₂ surface, thus deepening our understanding of the role of nonadiabatic reaction dynamics in preparing nonclassical, hydrogen‐bridged isomers carrying main group XIV elements.
  3. Abstract Hybrid Knowledge‐Guided Machine Learning (KGML) models, which are deep learning models that utilize scientific theory and process‐based model simulations, have shown improved performance over their process‐based counterparts for the simulation of water temperature and hydrodynamics. We highlight the modular compositional learning (MCL) methodology as a novel design choice for the development of hybrid KGML models in which the model is decomposed into modular sub‐components that can be process‐based models and/or deep learning models. We develop a hybrid MCL model that integrates a deep learning model into a modularized, process‐based model. To achieve this, we first train individual deep learning models with the output of the process‐based models. In a second step, we fine‐tune one deep learning model with observed field data. In this study, we replaced process‐based calculations of vertical diffusive transport with deep learning. Finally, this fine‐tuned deep learning model is integrated into the process‐based model, creating the hybrid MCL model with improved overall projections for water temperature dynamics compared to the original process‐based model. We further compare the performance of the hybrid MCL model with the process‐based model and two alternative deep learning models and highlight how the hybrid MCL model has the best performance for projecting water temperature, Schmidt stability, buoyancy frequency, and depths of different isotherms. Modular compositional learning can be applied to existing modularized, process‐based model structures to make the projections more robust and improve model performance by letting deep learning estimate uncertain process calculations. 
  4. Abstract Structured RNA lies at the heart of many central biological processes, from gene expression to catalysis. RNA structure prediction is not yet possible due to a lack of high-quality reference data associated with organismal phenotypes that could inform RNA function. We present GARNET (Gtdb Acquired RNa with Environmental Temperatures), a new database for RNA structural and functional analysis anchored to the Genome Taxonomy Database (GTDB). GARNET links RNA sequences to experimental and predicted optimal growth temperatures of GTDB reference organisms. Using GARNET, we develop sequence- and structure-aware RNA generative models, with overlapping triplet tokenization providing optimal encoding for a GPT-like model. Leveraging hyperthermophilic RNAs in GARNET and these RNA generative models, we identify mutations in ribosomal RNA that confer increased thermostability to the Escherichia coli ribosome. The GTDB-derived data and deep learning models presented here provide a foundation for understanding the connections between RNA sequence, structure, and function.
  5. Abstract Objective. UNet-based deep-learning (DL) architectures are promising dose engines for traditional linear accelerator (Linac) models. Current UNet-based engines, however, were designed with differing strategies, making it difficult to compare results across studies fairly. The objective of this study is to thoroughly evaluate the performance of UNet-based models for magnetic-resonance (MR)-Linac-based intensity-modulated radiation therapy (IMRT) dose calculations.
    Approach. The UNet-based models, including the standard-UNet, cascaded-UNet, dense-dilated-UNet, residual-UNet, HD-UNet, and attention-aware-UNet, were implemented. The model inputs are the patient CT and the IMRT field dose in water, and the output is the patient dose calculated by the DL model. The reference dose was calculated by the Monaco Monte Carlo module. Twenty training cases and ten test cases of prostate patients were included. The accuracy of the DL-calculated doses was measured using gamma analysis, and the calculation efficiency was evaluated by inference time.
    Results. All the studied models effectively corrected low-accuracy doses in water to high-accuracy patient doses in a magnetic field. The gamma passing rates between reference and DL-calculated doses were over 86% (1%/1 mm), 98% (2%/2 mm), and 99% (3%/3 mm) for all the models. Inference times ranged from 0.03 s (graphics processing unit) to 7.5 s (central processing unit). Each model demonstrated different strengths in calculation accuracy and efficiency:
      - Res-UNet achieved the highest accuracy.
      - HD-UNet offered high accuracy with the fewest parameters but the longest inference.
      - Dense-dilated-UNet was consistently accurate regardless of model levels.
      - Standard-UNet had the shortest inference but relatively lower accuracy.
      - The others showed average performance.
    Therefore, the best-performing model would depend on the specific clinical needs and available computational resources.
    Significance. The feasibility of using common UNet-based models for MR-Linac-based dose calculations has been explored in this study. By using the same model input type, patient training data, and computing environment, the study provides a fair assessment of the models' performance.
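Item 1 in the list above states that NewtonNet stays rotationally equivariant because its updates are built from directional (vector) information. The following NumPy sketch is not NewtonNet. It is a minimal illustration of why such constructions are equivariant, under one assumption: an atom's vector feature is a sum of pairwise unit vectors, each weighted by a scalar function of distance. Rotating the input coordinates then rotates the output vectors by the same matrix. The filter `phi` is a hypothetical fixed stand-in for a learned network.

```python
import numpy as np


def phi(d):
    # Hypothetical scalar filter of distance; a trained model would learn this.
    return np.exp(-d) / (1.0 + d)


def directional_features(positions):
    """For each atom i, sum phi(|r_ij|) * r_ij / |r_ij| over all other atoms j."""
    n = len(positions)
    out = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i != j:
                r_ij = positions[j] - positions[i]
                d = np.linalg.norm(r_ij)
                out[i] += phi(d) * r_ij / d
    return out


def random_rotation(rng):
    """Random proper rotation matrix (orthogonal, det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # flip one axis to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # 5 atoms, one row per atom
R = random_rotation(rng)

# Equivariance: features of rotated atoms equal rotated features of the original atoms.
rotated_then_featurized = directional_features(x @ R.T)
featurized_then_rotated = directional_features(x) @ R.T
print(np.allclose(rotated_then_featurized, featurized_then_rotated))  # True
```

Each pair (i, j) contributes equal and opposite vectors to atoms i and j. As a result, the features also sum to zero over all atoms, a discrete analogue of Newton's third law.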
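Item 4 above reports that overlapping triplet tokenization gave the best encoding for GARNET's GPT-like model. The following is a minimal sketch of what overlapping (stride-1) triplet tokenization means. It assumes an uppercase A/C/G/U alphabet and a hypothetical 64-entry vocabulary. GARNET's actual tokenizer, special tokens, and handling of ambiguous nucleotides are not described here.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Hypothetical vocabulary: each of the 4**3 = 64 possible triplets gets an integer id.
VOCAB = {"".join(t): i for i, t in enumerate(product(NUCLEOTIDES, repeat=3))}


def overlapping_triplets(seq):
    """Split an RNA sequence into overlapping (stride-1) triplets."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def encode(seq):
    """Map each overlapping triplet to its vocabulary id (KeyError on non-ACGU symbols)."""
    return [VOCAB[t] for t in overlapping_triplets(seq)]


print(overlapping_triplets("GGAUCC"))  # ['GGA', 'GAU', 'AUC', 'UCC']
print(encode("AAAC"))                  # [0, 1]
```

A sequence of length L yields L - 2 tokens, and neighboring tokens share two nucleotides. By contrast, non-overlapping codon-style chunks would give roughly L/3 tokens with no shared context between neighbors.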
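Item 5 above measures dose accuracy with gamma analysis at 1%/1 mm, 2%/2 mm, and 3%/3 mm. The sketch below is a minimal 1D version of the common global gamma index in pure Python. For each reference point, the squared gamma is the minimum over evaluated points of (Δr/DTA)² + (ΔD/ΔD_crit)². The point passes if gamma ≤ 1. This sketch makes three assumptions:

- the dose criterion is a percentage of the maximum reference dose (global normalization);
- evaluated doses are searched only at grid points, without interpolation;
- a 10% low-dose threshold is applied.

The study's actual 3D gamma tool and settings are not given here.

```python
import math


def gamma_pass_rate(ref, evl, spacing_mm, dose_pct, dta_mm, threshold_pct=10.0):
    """Global 1D gamma passing rate (%) of `evl` against `ref` on the same grid."""
    d_max = max(ref)
    dose_crit = dose_pct / 100.0 * d_max      # global dose-difference criterion
    cutoff = threshold_pct / 100.0 * d_max    # ignore low-dose reference points
    passed = evaluated = 0
    for i, d_ref in enumerate(ref):
        if d_ref < cutoff:
            continue
        evaluated += 1
        # Squared gamma: minimum combined distance/dose metric over evaluated points.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_eval - d_ref) / dose_crit) ** 2
            for j, d_eval in enumerate(evl)
        )
        if math.sqrt(gamma_sq) <= 1.0:
            passed += 1
    if evaluated == 0:
        raise ValueError("no reference points above the dose threshold")
    return 100.0 * passed / evaluated


ref = [1.0, 5.0, 10.0, 10.0, 5.0, 1.0]
scaled = [d * 1.05 for d in ref]               # a uniform 5% overdose
print(gamma_pass_rate(ref, ref, 1.0, 1.0, 1.0))               # 100.0
print(round(gamma_pass_rate(ref, scaled, 1.0, 1.0, 1.0), 1))  # 33.3
```

The usage lines show that a uniform 5% overdose fails 1%/1 mm at most points. A looser dose criterion such as 10% would pass every point.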