Title: Machine learning enhanced spectroscopic analysis: towards autonomous chemical mixture characterization for rapid process optimization
Autonomous chemical process development and optimization methods use algorithms to explore the operating parameter space based on feedback from experimentally determined exit stream compositions. Measuring the compositions of multicomponent streams is challenging, requiring multiple analytical techniques to differentiate between similar chemical components in the mixture and determine their concentrations. Herein, we describe a universal analytical methodology based on multitarget regression machine learning (ML) models to rapidly determine chemical mixtures' compositions from Fourier transform infrared (FTIR) absorption spectra. Specifically, we used simulated FTIR spectra for up to 6 components in water and tested seven different ML algorithms to develop the methodology. All algorithms resulted in regression models with mean absolute errors (MAE) between 0 and 0.27 wt%. We validated the methodology with experimental data obtained on mixtures prepared using a network of programmable pumps in line with an FTIR transmission flow cell. ML models were trained using experimental data and evaluated for mixtures of up to 4 components with similar chemical structures, including alcohols (i.e., glycerol, isopropanol, and 1-butanol) and nitriles (i.e., acrylonitrile, adiponitrile, and propionitrile). Linear regression models predicted concentrations with coefficients of determination, R², between 0.955 and 0.986, while artificial neural network models showed a slightly lower accuracy, with R² between 0.854 and 0.977. These R² values correspond to MAEs of 0.28–0.52 wt% for mixtures with component concentrations between 4 and 10 wt%. Thus, we demonstrate that ML models can accurately determine the compositions of multicomponent mixtures of similar species, enhancing spectroscopic chemical quantification for use in autonomous, fast process development and optimization.
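The multitarget regression idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the Gaussian band shapes, band positions, absorptivity scale, noise level, and every function name below are assumptions chosen for illustration, and ordinary least squares in NumPy stands in for the linear regression model mentioned above. Synthetic spectra are built as linear (Beer-Lambert-like) combinations of three overlapping bands, and one model predicts all three concentrations at once.

```python
import numpy as np


def gaussian_band(wavenumbers, center, width):
    """Unit-height Gaussian band (a hypothetical stand-in for a real FTIR band)."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


def add_intercept(X):
    """Append a column of ones so the linear model has an intercept term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_multitarget_linear(X, Y):
    """Least-squares fit of all targets at once; returns (n_features + 1, n_targets)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), Y, rcond=None)
    return coef


def predict(coef, X):
    """Predict every component concentration from the spectra X."""
    return add_intercept(X) @ coef


def mae_per_component(Y_true, Y_pred):
    """Mean absolute error for each target column."""
    return np.mean(np.abs(Y_true - Y_pred), axis=0)


def r2_per_component(Y_true, Y_pred):
    """Coefficient of determination for each target column."""
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


def run_demo(seed=0, n_train=300, n_test=100):
    """Train and evaluate on synthetic spectra; all numeric settings are assumed."""
    rng = np.random.default_rng(seed)
    wavenumbers = np.linspace(800.0, 1800.0, 60)  # cm^-1 grid (assumed)

    # Three overlapping bands mimic chemically similar components (assumed).
    centers = (1050.0, 1100.0, 1150.0)
    pure = 0.05 * np.stack([gaussian_band(wavenumbers, c, 40.0) for c in centers])

    # Concentrations drawn from 4-10 wt%, the range quoted in the abstract.
    n_total = n_train + n_test
    conc = rng.uniform(4.0, 10.0, size=(n_total, len(centers)))

    # Absorbance is linear in concentration, plus Gaussian measurement noise.
    noise = rng.normal(0.0, 0.005, size=(n_total, wavenumbers.size))
    spectra = conc @ pure + noise

    coef = fit_multitarget_linear(spectra[:n_train], conc[:n_train])
    pred = predict(coef, spectra[n_train:])
    truth = conc[n_train:]
    return mae_per_component(truth, pred), r2_per_component(truth, pred)


if __name__ == "__main__":
    mae, r2 = run_demo()
    print("MAE per component (wt%):", np.round(mae, 3))
    print("R^2 per component:", np.round(r2, 3))
```

Because the synthetic spectra are exactly linear in concentration, a linear model is well matched to this toy problem. Real spectra may deviate from linearity, so the error levels here should not be compared with the values reported in the abstract.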
Award ID(s):
1943972
PAR ID:
10319787
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Digital Discovery
Volume:
1
Issue:
1
ISSN:
2635-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fugacity is a fundamental thermodynamic property of gases and gas mixtures that determines their behavior and dynamics in complex systems. Fugacity can be deduced experimentally from measurements of volume as a function of pressure at constant temperature or calculated iteratively using analytical equations of state (EOS). Experimental measurement is time-consuming, and analytical models based on EOS are computationally demanding, especially when an approximate but quick estimation is desired. In this work, machine learning (ML) is employed as a viable alternative to analytical EOSs for quick and accurate approximation of CO2 fugacity coefficients. Five different ML algorithms are used to estimate the fugacity coefficients of pure CO2 as a function of pressure (≤ 2000 bar) and temperature (≤ 1000 °C). A combination of experimental and pseudo-experimental (obtained from an analytical EOS) data of CO2 fugacity coefficients is used to train, validate, and test the models. The best results were found using the Extreme Gradient Boosting algorithm, which showed a mean square error of only 0.0002 in the validation data and an average deviation of only 1.3% in the test data (pure prediction). To quantify the effectiveness of the machine learning techniques, results from the best-performing model are compared with two state-of-the-art analytical models. The ML model, with significantly less computational complexity, showed similar accuracy to the analytical models. The estimated fugacity data are then used to compute the CO2 solubility in aqueous NaCl solutions of different concentrations, and a maximum deviation of only 3.2% from the experimental data is observed.
  2. We introduce a method for solving the “inverse” phase equilibria problem: How should the interactions among a collection of molecular species be designed in order to achieve a target phase diagram? Using techniques from convex optimization theory, we show how to solve this problem for phase diagrams containing a large number of components and many coexisting phases with prescribed compositions. We apply our approach to commonly used mean-field models of multicomponent fluids and then use molecular simulations to verify that the designed interactions result in the target phase diagrams. Our approach enables the rational design of “programmable” fluids, such as biopolymer and colloidal mixtures, with complex phase behavior. 
  3. Multicomponent refractory alloys have the potential to operate in high-temperature environments. Alloys with heterogeneous/composite microstructure exhibit an optimal combination of high strength and ductility. The present work generates designed compositions using high-throughput computational and machine-learning (ML) models based on elements Mo-Nb-Ti-V-W-Zr manufactured utilizing vacuum arc melting. The experimentally observed phases were consistent with CALPHAD and Scheil simulations. ML models were used to predict the room temperature mechanical properties of the alloy and were validated with experimental mechanical data obtained from the three-point bending and compression tests. This work collectively showcases a data-driven, inverse design methodology that can effectively identify new promising multicomponent refractory alloys. 
  4. The Flory–Huggins theory describes the phase separation of solutions containing polymers. Although it finds widespread application from polymer physics to materials science to biology, the concentrations that coexist in separate phases at equilibrium have not been determined analytically, and numerical techniques are required that restrict the theory’s ease of application. In this work, we derive an implicit analytical solution to the Flory–Huggins theory of one polymer in a solvent by applying a procedure that we call the implicit substitution method. While the solutions are implicit and in the form of composite variables, they can be mapped explicitly to a phase diagram in composition space. We apply the same formalism to multicomponent polymeric systems, where we find analytical solutions for polydisperse mixtures of polymers of one type. Finally, while complete analytical solutions are not possible for arbitrary mixtures, we propose computationally efficient strategies to map out coexistence curves for systems with many components of different polymer types. 
  5. Preharvest yield estimates can be used for harvest planning, marketing, and prescribing in-season fertilizer and pesticide applications. One approach that is being widely tested is the use of machine learning (ML) or artificial intelligence (AI) algorithms to estimate yields. However, one barrier to the adoption of this approach is that ML/AI algorithms behave as a black box. An alternative approach is to create an algorithm using Bayesian statistics. In Bayesian statistics, prior information is used to help create the algorithm. However, algorithms based on Bayesian statistics are often not computationally efficient. The objective of the current study was to compare the accuracy and computational efficiency of four Bayesian models that used different assumptions to reduce the execution time. In this paper, the Bayesian multiple linear regression (BLR), Bayesian spatial, Bayesian skewed spatial regression, and Bayesian nearest neighbor Gaussian process (NNGP) models were compared with an ML non-Bayesian random forest model. In this analysis, soybean (Glycine max) yields were the response variable (y), and space-based blue, green, red, and near-infrared reflectance values measured with the PlanetScope satellite were the predictors (x). Among the models tested, the Bayesian NNGP model (R²-testing = 0.485), which captures the short-range correlation, outperformed the BLR (R²-testing = 0.02), Bayesian spatial regression (SRM; R²-testing = 0.087), and Bayesian skewed spatial regression (sSRM; R²-testing = 0.236) models. However, associated with improved accuracy was an increase in run time from 534 s for the BLR model to 2047 s for the NNGP model. These data show that relatively accurate within-field yield estimates can be obtained without sacrificing computational efficiency and that the coefficients have biological meaning. However, all Bayesian models had lower R² values and higher execution times than the random forest model.