Title: Retrospective on a decade of machine learning for chemical discovery
Over the last decade, we have witnessed the emergence of ever more machine learning applications in all aspects of the chemical sciences. Here, we highlight specific achievements of machine learning models in the field of computational chemistry by considering selected studies of electronic structure, interatomic potentials, and chemical compound space in chronological order.
Award ID(s):
1856165
PAR ID:
10220446
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Nature Communications
Issue:
11
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We report a comprehensive computational study of unsupervised machine learning for extraction of chemically relevant information in X-ray absorption near edge structure (XANES) and in valence-to-core X-ray emission spectra (VtC-XES) for classification of a broad ensemble of sulphorganic molecules. By progressively decreasing the constraining assumptions of the unsupervised machine learning algorithm, moving from principal component analysis (PCA) to a variational autoencoder (VAE) to t-distributed stochastic neighbour embedding (t-SNE), we find improved sensitivity to steadily more refined chemical information. Surprisingly, when embedding the ensemble of spectra in merely two dimensions, t-SNE distinguishes not just oxidation state and general sulphur bonding environment but also the aromaticity of the bonding radical group with 87% accuracy as well as identifying even finer details in electronic structure within aromatic or aliphatic sub-classes. We find that the chemical information in XANES and VtC-XES is very similar in character and content, although they unexpectedly have different sensitivity within a given molecular class. We also discuss likely benefits from further effort with unsupervised machine learning and from the interplay between supervised and unsupervised machine learning for X-ray spectroscopies. Our overall results, i.e., the ability to reliably classify without user bias and to discover unexpected chemical signatures for XANES and VtC-XES, likely generalize to other systems as well as to other one-dimensional chemical spectroscopies.
  2. Process control and optimization have been widely used to solve decision-making problems in chemical engineering applications. However, identifying and tuning the best solution algorithm is challenging and time-consuming. Machine learning tools can be used to automate these steps by learning the behavior of a numerical solver from data. In this paper, we discuss recent advances in (i) the representation of decision-making problems for machine learning tasks, (ii) algorithm selection, and (iii) algorithm configuration for monolithic and decomposition-based algorithms. Finally, we discuss open problems related to the application of machine learning for accelerating process optimization and control.
  3. The accurate prediction of suitable chiral stationary phases (CSPs) for resolving the enantiomers of a given compound poses a significant challenge in chiral chromatography. Previous attempts at developing machine learning models for structure-based CSP prediction have primarily relied on 1D SMILES strings (the simplified molecular-input line-entry system, a line notation that describes the structure of chemical species using short ASCII strings) or 2D graphical representations of molecular structures, and have met with only limited success. In this study, we apply the recently developed 3D molecular conformation representation learning algorithm, which uses rapid conformational analysis and point clouds of atom positions in 3D space, enabling efficient chemical structure-based machine learning. By combining rapid 3D molecular representation learning with a dataset comprising over 300,000 chromatographic enantioseparation records sourced from the literature, our models afford notable improvements for the chemical structure-based choice of appropriate CSP for enantioseparation, paving the way for more efficient and informed decision-making in the field of chiral chromatography.
  4. The accurate detection of chemical agents promotes many national security and public safety goals, and robust chemical detection methods can prevent disasters and support effective response to incidents. Mass spectrometry is an important tool in detecting and identifying chemical agents. However, there are high costs and logistical challenges associated with acquiring sufficient lab-generated mass spectrometry data for training machine learning algorithms, including the skilled personnel, sample preparation, and analysis required for data generation. These high costs of mass spectrometry data collection hinder the development of machine learning and deep learning models to detect and identify chemical agents. Accordingly, the primary objective of our research is to create a mass spectrometry data generation model whose output (synthetic mass spectrometry data) would enhance the performance of downstream machine learning chemical classification models. Such a synthetic data generation model would reduce the need to generate costly real-world data, and provide additional training data to use in combination with lab-generated mass spectrometry data when training classifiers. Our approach is a novel combination of autoencoder-based synthetic data generation with a fixed, a priori defined hidden layer geometry. In particular, we train pairs of encoders and decoders with an additional loss term that enforces that the hidden layer passed from the encoder to the decoder match the embedding provided by an external deep learning model designed to predict functional properties of chemicals. We have verified that incorporating our synthetic spectra into a lab-generated dataset enhances the performance of classification algorithms compared to using only the real data. Our synthetic spectra have been successfully matched to lab-generated spectra for their respective chemicals using library matching software, further demonstrating the validity of our work.
  5. The successful recent application of machine learning methods to scientific problems includes the learning of flexible and accurate atomic-level force-fields for materials and biomolecules from quantum chemical data. In parallel, the machine learning of force-fields at coarser resolutions is rapidly gaining relevance as an efficient way to represent the higher-body interactions needed in coarse-grained force-fields to compensate for the omitted degrees of freedom. Coarse-grained models are important for the study of systems at time and length scales exceeding those of atomistic simulations. However, the development of transferable coarse-grained models via machine learning still presents significant challenges. Here, we discuss recent developments in this field and current efforts to address the remaining challenges.