

This content will become publicly available on April 17, 2025

Title: Predictive machine learning models trained on experimental datasets for electrochemical nitrogen reduction
When machine learning models are trained on experimental datasets collected across different research groups, the small size and heterogeneity of the data can make it challenging to obtain insights that improve the sustainability of chemical processes. Here we show that shallow learning models such as decision trees and random forests can be effective tools for guiding experimental research in sustainable chemistry. This study trained four machine learning algorithms (linear regression, decision tree, random forest, and multilayer perceptron) on datasets of different sizes containing up to 520 unique reaction conditions for the nitrogen reduction reaction (NRR) on heterogeneous electrocatalysts. Using catalyst properties and experimental conditions as features, we evaluated how well each model regresses the ammonia production rate and the faradaic efficiency. The shallow decision tree and random forest models had predictive power equal to or better than that of both the deep learning multilayer perceptron models and the simple linear regression models. Moreover, decision tree and random forest models enable the extraction of feature importance, which is a powerful tool for guiding experimental research. Analysis of the models revealed a complex interaction between the applied potential and the catalyst in determining the effective NRR rate. We also suggest some underexplored catalyst–electrolyte combinations to experimental researchers looking to improve both the rate and the efficiency of the NRR.
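The abstract describes fitting four regressors to catalyst properties and experimental conditions, comparing their predictive power, and reading feature importances from the tree-based models. A minimal sketch of that workflow with scikit-learn follows, under loudly stated assumptions: the four feature names, the synthetic 520-row dataset (sized only to match the stated maximum), the hyperparameters, and the use of 5-fold cross-validated R² as the comparison metric are all hypothetical and are not the authors' data, features, or protocol. It also uses scikit-learn's impurity-based `feature_importances_`, which may differ from the importance measure used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical feature names; the paper's actual features are not listed here.
FEATURES = ["applied_potential_V", "catalyst_descriptor", "electrolyte_pH", "temperature_C"]


def make_synthetic_nrr_data(n_samples=520, seed=0):
    """Generate SYNTHETIC stand-in data; this is not real NRR data."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(-1.2, 0.0, n_samples),   # applied potential (V), hypothetical range
        rng.uniform(0.0, 1.0, n_samples),    # hypothetical scalar catalyst descriptor
        rng.uniform(1.0, 13.0, n_samples),   # electrolyte pH (irrelevant to y by construction)
        rng.uniform(20.0, 80.0, n_samples),  # temperature (deg C)
    ])
    # Synthetic "rate" with a potential x catalyst interaction plus a linear temperature term.
    y = (
        5.0 * X[:, 1] * np.exp(-((X[:, 0] + 0.4) ** 2) / 0.05)
        + 0.05 * X[:, 3]
        + rng.normal(0.0, 0.2, n_samples)
    )
    return X, y


def build_models(seed=0):
    """The four model families named in the abstract, with assumed hyperparameters."""
    return {
        "linear regression": LinearRegression(),
        "decision tree": DecisionTreeRegressor(max_depth=6, random_state=seed),
        "random forest": RandomForestRegressor(n_estimators=200, random_state=seed),
        # MLPs are sensitive to feature scale, so standardize inputs first.
        "multilayer perceptron": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=seed),
        ),
    }


def compare_models(X, y, cv=5):
    """Mean k-fold cross-validated R^2 for each model."""
    scores = {}
    for name, model in build_models().items():
        fold_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
        scores[name] = float(fold_r2.mean())
    return scores


def forest_feature_importance(X, y, seed=0):
    """Impurity-based importances from a random forest fit on all rows."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    return dict(zip(FEATURES, forest.feature_importances_))


if __name__ == "__main__":
    X, y = make_synthetic_nrr_data()
    for name, score in compare_models(X, y).items():
        print(f"{name:>22s}: mean CV R^2 = {score:.3f}")
    ranked = sorted(forest_feature_importance(X, y).items(), key=lambda kv: -kv[1])
    for feature, importance in ranked:
        print(f"{feature:>20s}: {importance:.3f}")
```

Running the script prints a mean cross-validated R² per model and the forest's ranked importances. Because the target is synthetic, the numbers only illustrate the mechanics and say nothing about real NRR catalysts. Tree models need no feature scaling, which is one practical reason they suit small, heterogeneous tabular datasets like the one described.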
Award ID(s):
1922649
PAR ID:
10536949
Author(s) / Creator(s):
; ;
Publisher / Repository:
Royal Society of Chemistry
Date Published:
Journal Name:
Digital Discovery
Volume:
3
Issue:
4
ISSN:
2635-098X
Page Range / eLocation ID:
667 to 673
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning models can streamline the filtering of YouTube video comments into legitimate comments (ham) and spam. To integrate such models into regular use on media-sharing platforms, recent approaches have trained models on YouTube comments; these models have emerged as valuable classification tools that identify spam content and enhance the user experience. In this paper, eight machine learning approaches are applied to spam detection for YouTube comments: Gaussian Naive Bayes, logistic regression, a K-nearest neighbors (KNN) classifier, a multi-layer perceptron (MLP), a support vector machine (SVM) classifier, a random forest classifier, a decision tree classifier, and a voting classifier. All eight models perform very well; in particular, the random forest approach achieves almost perfect performance, with an average precision of 100% and an AUC-ROC of 0.9841. The computational complexity of the eight approaches is also compared.
  2. Abstract

    Electrochemistry of surface‐bound molecules is highly important for numerous electronic and sensor applications. Extracting the electron transfer rate helps in understanding surface‐bound processes, but doing so requires experimental or computational rigor. We evaluate machine learning methods, using decision tree ensembles, neural networks, and Gaussian process regression models, for determining electron transfer rates from large sets of experimental voltammetry data. We applied these models to reproduce electron transfer rates computed from first-principles models. The best machine learning models were a random forest with 80 decision trees and a neural network with Bayesian regularization, with root mean squared errors of 0.37 and 0.49 s⁻¹, respectively, corresponding to mean percent errors of 0.38% and 0.52%. This work establishes machine learning methods for rapidly acquiring electron transfer rates across large datasets for widespread applications.
  3. The objective of this study is to develop data-driven models that predict the seismic energy dissipation of rocking shallow foundations during earthquake loading, using decision tree-based ensemble machine learning algorithms and supervised learning. Data from a rocking foundation database of dynamic base-shaking experiments conducted on centrifuges and shaking tables were used to develop a base decision tree regression (DTR) model and four ensemble models: bagging, random forest, adaptive boosting, and gradient boosting. Based on k-fold cross-validation tests and the mean absolute percentage errors of the predictions, the overall average accuracy of the four ensemble models is improved by about 25%–37% relative to the base DTR model. Among the four ensemble models, the gradient boosting and adaptive boosting models outperform the other two in both accuracy and variance of predictions for the problem considered.
  4. Compressional velocity (Vp) and bulk density (ρb) logs are essential for characterizing gas hydrates and near-seafloor sediments; however, these logs can be difficult to acquire because of poor borehole conditions, safety concerns, or cost. We present a machine learning approach that predicts either Vp or ρb logs with high accuracy and low error in near-seafloor sediments within water-saturated intervals, intervals where hydrate fills fractures, and intervals where hydrate occupies the primary pore space. We use scientific-quality logging-while-drilling (LWD) well logs (gamma ray, ρb, Vp, and resistivity) to train the machine learning model to predict Vp or ρb logs. Of the six machine learning algorithms tested (multilinear regression, polynomial regression, polynomial regression with ridge regularization, K-nearest neighbors, random forest, and multilayer perceptron), we find that the random forest and K-nearest neighbors algorithms are best suited to predicting Vp and ρb logs, with coefficients of determination (R²) greater than 70% and mean absolute percentage errors less than 4%. Given the high accuracy and low error of the Vp and ρb predictions in both hydrate-bearing and water-saturated sediments, we argue that our model can be applied in most LWD wells to predict Vp or ρb logs in near-seafloor siliciclastic sediments on continental slopes, irrespective of the presence or absence of gas hydrate.
  5. GPS spoofing attacks are a severe threat to unmanned aerial vehicles (UAVs). These attacks manipulate the true state of a UAV, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. However, most studies applied machine learning models without identifying the best hyperparameters, without using feature selection and importance techniques, and without ensuring that the dataset used is unbiased and balanced, and no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models. This paper fills that gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six of the most commonly used machine learning techniques: Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used in this investigation. Performance is evaluated with four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model’s performance. The results also show that, after correlated features are removed and tuned parameters are used on a balanced dataset, the Classification and Regression Decision Tree classifier achieves an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005%. Under similar conditions, Random Forest achieves an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01%.
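The related abstracts above compare models using a handful of standard metrics: root mean squared error, mean absolute percentage error, the coefficient of determination (R²), and, for spoofing detection, accuracy and detection/misdetection/false-alarm probabilities. A minimal NumPy sketch of the usual textbook definitions follows. It is an assumption that the individual papers use exactly these formulas; they may, for example, average over cross-validation folds or define the detection probabilities differently.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined if any y_true is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def detection_rates(y_true, y_pred):
    """Binary detection metrics, assuming 1 = attack (positive), 0 = legitimate."""
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)    # attacks correctly flagged
    fn = np.sum(y_true & ~y_pred)   # attacks missed
    fp = np.sum(~y_true & y_pred)   # legitimate signals flagged as attacks
    tn = np.sum(~y_true & ~y_pred)  # legitimate signals correctly passed
    return {
        "accuracy": float((tp + tn) / y_true.size),
        "p_detection": float(tp / (tp + fn)),
        "p_misdetection": float(fn / (tp + fn)),
        "p_false_alarm": float(fp / (fp + tn)),
    }
```

For instance, with true values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, these definitions give an RMSE of 0.5, a MAPE of 6.25%, and an R² of 0.8. Under these definitions the probabilities of detection and misdetection always sum to 1.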