Title: A Fast, Low-Cost and Simple Method for Predicting Atomic/Inter-Atomic Properties by Combining a Low Dimensional Deep Learning Model with a Fragment Based Graph Convolutional Network
Highly accurate calculations of atomic and inter-atomic properties, such as nuclear magnetic resonance (NMR) spectroscopy and bond dissociation energies (BDEs), are valuable for pharmaceutical molecule structural analysis, drug exploration, and screening. These calculations should include relativistic effects, which are computationally expensive to treat. Non-relativistic calculations are less expensive, but their results are less accurate. In this study, we present a computational framework that uses machine learning to predict atomic and inter-atomic properties accurately and inexpensively from non-relativistic calculations. The accurate atomic and inter-atomic properties are obtained with a low-dimensional deep neural network (DNN) embedded in a fragment-based graph convolutional neural network (F-GCN). The F-GCN acts as an atomic fingerprint generator that converts the atomistic local environments into input data for the DNN, which improves the learning ability and yields results that agree closely with experiment. Using this framework, the ¹³C/¹H NMR chemical shifts of Nevirapine and phenol O–H BDEs are predicted to be in good agreement with experimental measurements.
Award ID(s):
2102317
PAR ID:
10500364
Publisher / Repository:
MDPI
Journal Name:
Crystals
Volume:
12
Issue:
12
ISSN:
2073-4352
Page Range / eLocation ID:
1740
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. UV absorption is widely used for characterizing protein structures. Mapping UV spectra to the atomic structure of proteins relies on expensive theoretical simulations. Circumventing this heavy computational cost, which arises from repeated quantum-mechanical simulations of excited-state properties of many fluctuating protein geometries, has been a long-standing challenge. Here we show that a neural network machine-learning technique can predict electronic absorption spectra of N-methylacetamide (NMA), which is a widely used model system for the peptide bond. Using ground-state geometric parameters and charge information as descriptors, we employed a neural network to predict transition energies, ground-state dipole moments, and transition dipole moments of many molecular-dynamics conformations at different temperatures, in agreement with time-dependent density-functional theory calculations. The neural network simulations are nearly 3,000× faster than comparable quantum calculations. Machine learning should provide a cost-effective tool for simulating optical properties of proteins.
  2. High Entropy Alloys (HEAs) are composed of more than one principal element and constitute a major paradigm in metals research. The HEA space is vast, and an exhaustive exploration is improbable. Therefore, a thorough estimation of the phases present in an HEA is of paramount importance for alloy design. Machine learning presents a feasible and inexpensive method for predicting possible new HEAs on the fly. A deep neural network (DNN) model for the elemental system Mn, Ni, Fe, Al, Cr, Nb, and Co is developed using a dataset generated by high-throughput computational thermodynamic calculations using Thermo-Calc. The feature list used for the neural network is developed based on the literature and freely available databases. A feature significance analysis matches the reported trends of HEA phase constitution with elemental properties and further expands on them by identifying features that have so far been overlooked. The final regressor has a coefficient of determination (r²) greater than 0.96 for identifying the most recurrent phases, and its functionality is tested by running optimization tasks that simulate those required in alloy design. The DNN developed constitutes an example of an emulator that can be used in fast, real-time materials discovery and design tasks.
  3. Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist’s toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network on DFT data and then using transfer learning techniques to retrain it on a dataset of gold-standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and is billions of times faster than CCSD(T)/CBS calculations.
  4. Background: The recent development of high-throughput sequencing has created a large collection of multi-omics data, which enables researchers to better investigate cancer molecular profiles and cancer taxonomy based on molecular subtypes. Integrating multi-omics data has been proven to be effective for building more precise classification models. Most current multi-omics integrative models, which are mainly based on deep neural networks, use either early fusion in the form of concatenation or late fusion with a separate feature extractor for each omic. Due to the nature of biological systems, graphs are a better structural representation of biomedical data. Although a few graph neural network (GNN) based multi-omics integrative methods have been proposed, they suffer from three common disadvantages. First, most of them use only one type of connection, either inter-omics or intra-omic. Second, they consider only one kind of GNN layer, either graph convolution network (GCN) or graph attention network (GAT). Third, most of these methods have not been tested on a more complex classification task, such as cancer molecular subtypes. Results: In this study, we propose a novel end-to-end multi-omics GNN framework for accurate and robust cancer subtype classification. The proposed model utilizes multi-omics data in the form of heterogeneous multi-layer graphs, which combine both inter-omics and intra-omic connections from established biological knowledge. The proposed model incorporates learned graph features and global genome features for accurate classification. We tested the proposed model on the Cancer Genome Atlas (TCGA) Pan-cancer dataset and the TCGA breast invasive carcinoma (BRCA) dataset for molecular subtype and cancer subtype classification, respectively. The proposed model shows superior performance compared to four current state-of-the-art baseline models in terms of accuracy, F1 score, precision, and recall. The comparative analysis of GAT-based and GCN-based models reveals that GAT-based models are preferred for smaller graphs with less information and GCN-based models are preferred for larger graphs with extra information.
  5. An ultra-low-power gesture and gait classification SoC is presented for rehabilitation applications, featuring (1) mixed-signal feature extraction and an integrated low-noise amplifier, eliminating the expensive ADC and digital feature extraction, (2) an integrated distributed deep neural network (DNN) ASIC supporting a scalable multi-chip neural network for sensor fusion with distortion resiliency for low-cost front-end modules, and (3) on-chip learning in the DNN engine, allowing in-situ training of user-specific operations. A 12-channel 65 nm CMOS test chip was fabricated with 1 μW power per channel, less than 3 ms computation latency, on-chip training for user-specific DNN models, and multi-chip networking capability.