This study developed a hybrid deep learning model for predicting dissolved oxygen (DO) in water from real-time sensor data covering thirteen parameters. The model integrates a one-dimensional convolutional neural network (CNN) with long short-term memory (LSTM) to improve DO prediction accuracy. The hybrid CNN-LSTM model predicted DO concentration using soft-sensor data; its primary input parameters were temperature, pH, specific conductivity, salinity, density, chlorophyll, and blue-green algae. A total of 38,681 water quality records were used to train and test the hybrid deep learning network. Training was successful: the training and test losses were both near zero and of similar magnitude. With a coefficient of determination (R²) of 0.94 and a mean squared error (MSE) of 0.12, the hybrid model outperformed the classical models. The normal distribution of the residual errors confirmed the reliability of the DO predictions by the hybrid CNN-LSTM model. Feature importance analysis based on extreme gradient boosting (XGBoost) identified pH as the most important predictor (importance score 0.76) and temperature as the second most important (score 0.12). These results indicate that the hybrid model can outperform classical machine learning models in the real-time prediction of DO concentration.
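A minimal PyTorch sketch of this kind of hybrid 1-D CNN + LSTM regressor. The layer widths, the 24-step window, and the seven-feature input are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Hypothetical hybrid regressor: 1-D conv front end + LSTM + linear head."""
    def __init__(self, n_features=7, hidden=64):
        super().__init__()
        # 1-D convolution extracts local temporal patterns from the sensor window
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # LSTM models longer-range dependencies in the convolutional features
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # scalar DO prediction

    def forward(self, x):                    # x: (batch, window, n_features)
        z = self.conv(x.transpose(1, 2))     # -> (batch, 32, window // 2)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.head(out[:, -1])         # last time step -> (batch, 1)

model = CNNLSTM()
batch = torch.randn(8, 24, 7)  # 8 samples, 24 time steps, 7 sensor features
pred = model(batch)
print(pred.shape)  # torch.Size([8, 1])
```

In practice the window length and channel counts would be tuned to the sampling rate of the soft sensors.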
This content will become publicly available on December 1, 2025
Elemental diffusion coefficient prediction in conventional alloys using machine learning
This paper presents the Machine Learned Diffusion Coefficient Estimator, a comprehensive machine learning framework for predicting diffusion coefficients in impure metallic (IM) and multi-component alloy (MCA) media. The framework comprises five machine learning models, each tailored to a specific diffusion mode: (1) impurity and (2) self-diffusion in IM media, and (3) self, (4) impurity, and (5) chemical diffusion in MCA media. These models use statistical aggregations of atomic descriptors for both the diffusing elements and the diffusion media, along with the temperature of the diffusion process, as features. Models are trained using random forest and deep neural network algorithms, with performance evaluated through the coefficient of determination (R²), mean squared error (MSE), and uncertainty estimates. The models within this framework achieve an impressive R² score above 0.90 with MSE below 10⁻¹⁶ m²/s, demonstrating high predictive accuracy and reliability for diffusion coefficient prediction.
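A toy scikit-learn sketch of the random-forest variant of such a framework on synthetic Arrhenius-like data. The two descriptor features and the functional form of the target are invented for illustration; they are not the paper's actual descriptors or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# hypothetical aggregated atomic descriptors for the element and the medium
desc = rng.normal(size=(n, 2))
T = rng.uniform(600.0, 1500.0, size=n)  # temperature in K
# synthetic Arrhenius-like target: log10(D) decreases with 1/T
log_d = (-4.0 + desc[:, 0]
         - 5000.0 * (1.0 + 0.2 * desc[:, 1]) / T
         + rng.normal(0.0, 0.1, size=n))
X = np.column_stack([desc, T])
X_tr, X_te, y_tr, y_te = train_test_split(X, log_d, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print(f"R2={r2_score(y_te, pred):.2f}, MSE={mean_squared_error(y_te, pred):.3f}")
```

Fitting in log space is a common choice for diffusion coefficients, since D spans many orders of magnitude over a modest temperature range.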
- PAR ID: 10577541
- Publisher / Repository: American Institute of Physics
- Date Published:
- Journal Name: Chemical Physics Reviews
- Volume: 5
- Issue: 4
- ISSN: 2688-4070
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Improving the fireproof performance of polymers is crucial for ensuring human safety and enabling future space colonization. However, the complexity of flame-retardant mechanisms and the need for customized material design pose significant challenges. To address these issues, we propose a machine learning (ML) framework based on substructure fingerprinting and self-enforcing deep neural networks (SDNN) to predict the fireproof performance of flame-retardant epoxy resins. Our model is grounded in a comprehensive understanding of the physical mechanisms of the materials and predicts fireproof performance without requiring property descriptors, making it more convenient than previous ML models. With a dataset of only 163 samples, our SDNN models show an average prediction error of 3% for the limiting oxygen index (LOI). They also provide satisfactory predictions for the peak heat release rate (PHR) and total heat release (THR), with coefficient of determination (R²) values of 0.87 and 0.85, respectively, and average prediction errors below 17%. Our model outperforms the support vector machine (SVM) model on all three indices, making it a state-of-the-art study in the field of flame retardancy. We believe that our framework will be a valuable tool for the design and virtual screening of flame retardants and will contribute to the development of safer and more efficient polymer materials.
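A minimal analogue of fingerprint-based property regression, using a plain scikit-learn MLP on synthetic binary substructure fingerprints. The fingerprints, target values, and network size are invented for illustration, and the SDNN's self-enforcing mechanism is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# hypothetical substructure fingerprints: each bit marks a substructure's presence
X = rng.integers(0, 2, size=(163, 16)).astype(float)
w = rng.normal(size=16)
# synthetic LOI-like target centered near 25 (invented, not measured data)
loi = 25.0 + X @ w + rng.normal(0.0, 0.3, size=163)

# train on 130 samples, hold out the rest for validation
mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=5000,
                   random_state=0).fit(X[:130], loi[:130])
rel_err = np.abs(mlp.predict(X[130:]) - loi[130:]) / loi[130:]
print(f"mean relative LOI error: {rel_err.mean():.1%}")
```

With such a small dataset, the main practical concern is overfitting, which is one motivation for structure-aware regularization schemes like the SDNN described above.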
-
In recent years, the utilization of machine learning algorithms and advancements in unmanned aerial vehicle (UAV) technology have caused significant shifts in remote sensing practices. In particular, the integration of machine learning with physical models and their application in UAV–satellite data fusion have emerged as two prominent approaches for the estimation of vegetation biochemistry. This study evaluates the performance of five machine learning regression algorithms (MLRAs) for the mapping of crop canopy chlorophyll at the Kellogg Biological Station (KBS) in Michigan, USA, across three scenarios: (1) application to Landsat 7, RapidEye, and PlanetScope satellite images; (2) application to UAV–satellite data fusion; and (3) integration with the PROSAIL radiative transfer model (hybrid methods PROSAIL + MLRAs). The results indicate that the majority of the five MLRAs utilized in UAV–satellite data fusion perform better than the five PROSAIL + MLRAs. The general trend suggests that the integration of satellite data with UAV-derived information, including the normalized difference red-edge index (NDRE), canopy height model, and leaf area index (LAI), significantly enhances the performance of MLRAs. The UAV–RapidEye dataset exhibits the highest coefficient of determination (R²) and the lowest root mean square errors (RMSE) when employing kernel ridge regression (KRR) and Gaussian process regression (GPR) (R² = 0.89 and 0.89 and RMSE = 8.99 µg/cm² and 9.65 µg/cm², respectively). Similar performance is observed for the UAV–Landsat and UAV–PlanetScope datasets (R² = 0.86 and 0.87 for KRR, respectively). For the hybrid models, the maximum performance is attained with the Landsat data using KRR and GPR (R² = 0.77 and 0.51 and RMSE = 33.10 µg/cm² and 42.91 µg/cm², respectively), followed by R² = 0.75 and RMSE = 39.78 µg/cm² for the PlanetScope data upon integrating partial least squares regression (PLSR) into the hybrid model. Across all hybrid models, the RapidEye data yield the most stable performance, with R² ranging from 0.45 to 0.71 and RMSE ranging from 19.16 µg/cm² to 33.07 µg/cm². The study highlights the importance of synergizing UAV and satellite data, which enables the effective monitoring of canopy chlorophyll in small agricultural lands.
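A small sketch of one of the MLRAs discussed (kernel ridge regression) applied to synthetic fusion-style features. The feature set (NDRE, canopy height, LAI) follows the text, but the data, target relationship, and hyperparameters are invented for illustration.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(2)
n = 200
# hypothetical fused predictors: NDRE, canopy height, LAI (all scaled to [0, 1])
X = rng.uniform(0.0, 1.0, size=(n, 3))
# synthetic canopy chlorophyll in µg/cm², driven mostly by the red-edge index
chl = 60.0 * X[:, 0] + 10.0 * X[:, 2] + rng.normal(0.0, 2.0, size=n)

krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0).fit(X[:150], chl[:150])
pred = krr.predict(X[150:])
rmse = mean_squared_error(chl[150:], pred) ** 0.5
print(f"R2={r2_score(chl[150:], pred):.2f}, RMSE={rmse:.2f} µg/cm²")
```

Swapping `KernelRidge` for `GaussianProcessRegressor` gives the GPR variant with the added benefit of per-pixel predictive uncertainty.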
-
The field of tissue engineering has made significant advancements with extrusion-based bioprinting, which uses shear forces to create intricate tissue structures. However, the success of this method relies heavily on the rheological properties of bioinks, most of which are shear-thinning. While a few component-based efforts have been reported to predict the viscosity of bioinks, the impact of shear rate has been largely ignored. To address this gap, our research presents predictive models using machine learning (ML) algorithms, including polynomial fit (PF), decision tree (DT), and random forest (RF), to estimate bioink viscosity from component weights and shear rate. We utilized novel bioinks composed of varying percentages of alginate (2–5.25%), gelatin (2–5.25%), and TEMPO-nanofibrillated cellulose (0.5–1%) at shear rates from 0.1 to 100 s⁻¹. Our study analyzed 169 rheological measurements using an 80% training and 20% validation split. The results, based on the coefficient of determination (R²) and mean absolute error (MAE), showed that the RF algorithm-based model performed best: [(R², MAE) RF = (0.99, 0.09), (R², MAE) PF = (0.95, 0.28), (R², MAE) DT = (0.98, 0.13)]. These predictive models serve as valuable tools for bioink formulation optimization, allowing researchers to determine effective viscosities without extensive experimental trials, thereby accelerating tissue engineering.
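A minimal sketch of the random-forest variant on synthetic power-law (shear-thinning) viscosity data. The composition ranges and shear rates follow the abstract, but the power-law target and the exact train/validation handling are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(3)
n = 169
# hypothetical compositions within the abstract's ranges (wt%)
alginate = rng.uniform(2.0, 5.25, size=n)
gelatin = rng.uniform(2.0, 5.25, size=n)
tnfc = rng.uniform(0.5, 1.0, size=n)
shear = 10.0 ** rng.uniform(-1.0, 2.0, size=n)  # 0.1 to 100 1/s, log-spaced
# synthetic shear-thinning viscosity: eta = K(composition) * gamma^(n-1)
K = 0.5 * (alginate + 2.0 * gelatin + 10.0 * tnfc)
visc = K * shear ** (-0.6)

X = np.column_stack([alginate, gelatin, tnfc, shear])
y = np.log10(visc)  # fit in log space given the wide viscosity range
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[:135], y[:135])
pred = rf.predict(X[135:])
print(f"R2={r2_score(y[135:], pred):.2f}, "
      f"MAE={mean_absolute_error(y[135:], pred):.2f} (log10 units)")
```

The 135/34 split mirrors the abstract's 80/20 training/validation proportion for 169 measurements.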
-
Influence Maximization (IM) is a crucial problem in data science. The goal is to find a fixed-size set of highly influential seed vertices in a network that maximizes the influence spread along the edges. While IM is NP-hard on commonly used diffusion models, a greedy algorithm achieves a (1 - 1/e)-approximation by repeatedly selecting the vertex with the highest marginal gain in influence as the next seed. However, we observe two performance issues in existing work that prevent it from scaling to today's large-scale graphs: space-inefficient memoization for estimating marginal gain, and a time-inefficient seed selection process due to a lack of parallelism. This paper significantly improves the scalability of IM using two key techniques. The first is a sketch-compression technique for the independent cascade model on undirected graphs, which combines the simulation and sketching approaches to achieve a time-space tradeoff. The second is a set of new data structures for parallel seed selection. Using these approaches, we implemented PaC-IM: Parallel and Compressed IM. We compare PaC-IM with state-of-the-art parallel IM systems on a 96-core machine with 1.5 TB of memory. PaC-IM can process the ClueWeb graph, with 978M vertices and 75B edges, in about 2 hours. On average across all tested graphs, our uncompressed version is 5–18x faster and about 1.4x more space-efficient than existing parallel IM systems, and compression further saves 3.8x space with only 70% time overhead on average.
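A toy sketch of the underlying greedy IM loop with Monte Carlo spread estimation under the independent cascade model. The graph and activation probability are invented, and PaC-IM's sketch compression and parallel seed selection are deliberately omitted.

```python
import random

def simulate_ic(adj, seeds, p=0.2, rng=random):
    """One independent-cascade simulation: return the number of activated vertices."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, []):
                # each live edge fires independently with probability p
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(adj, k, sims=200, seed=0):
    """Greedy (1 - 1/e)-approximate seed selection via repeated simulation."""
    rng = random.Random(seed)
    vertices = set(adj) | {v for vs in adj.values() for v in vs}
    chosen = []
    for _ in range(k):
        # pick the vertex with the highest estimated marginal gain
        best = max(
            (v for v in vertices if v not in chosen),
            key=lambda v: sum(simulate_ic(adj, chosen + [v], rng=rng)
                              for _ in range(sims)),
        )
        chosen.append(best)
    return chosen

# tiny undirected example graph (symmetric adjacency lists)
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5], 3: [0], 4: [1], 5: [2]}
print(greedy_im(adj, k=2))
```

This naive loop re-simulates from scratch for every candidate, which is exactly the cost that sketching, memoization, and parallel seed selection are designed to avoid at scale.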
