Contamination by per- and polyfluoroalkyl substances (PFAS) has posed a significant environmental and public health challenge because these compounds are ubiquitous. Adsorption has emerged as a promising remediation technique, yet optimizing adsorption efficiency remains complex because of the diverse physicochemical properties of PFAS and the wide range of adsorbent materials. Traditional modeling approaches, such as response surface methodology (RSM), struggled to capture nonlinear interactions, while standalone machine learning (ML) models required extensive datasets. This study addressed these limitations by developing hybrid RSM-ML models to improve the prediction and optimization of PFAS adsorption. A comprehensive dataset was constructed from experimental adsorption data, integrating key parameters such as pH, pHpzc, surface area, temperature, and PFAS molecular properties. RSM was employed to model adsorption behavior, while gradient boosting (GB), random forest (RF), and extreme gradient boosting (XGB) were used to enhance predictive performance. Four types of hybrid model (linear, RMSE-based, multiplicative, and meta-learning) were developed and evaluated. The meta-learning HOP-RSM-GB model achieved near-perfect accuracy (R² = 1.00, RMSE = 10.59), outperforming all other models. Surface plots revealed that low pH and high pHpzc maximized adsorption, while increasing log Kow consistently enhanced it. These findings establish hybrid RSM-ML modeling as a powerful framework for optimizing PFAS remediation strategies. The integration of statistical and machine learning approaches significantly improves predictive accuracy, reduces experimental costs, and provides deeper insights into adsorption mechanisms. This study underscores the importance of data-driven approaches in environmental engineering and highlights future opportunities for integrating ML-driven modeling with experimental adsorption research.
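The abstract names an "RMSE-based" hybrid but does not define it. As a minimal sketch, assuming it means blending the predictions of two models with weights inversely proportional to each model's validation RMSE (an assumption, not the authors' confirmed formula), the idea could look like this. All function names and the toy numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def inverse_rmse_weights(y_true, preds_by_model):
    """Assumed weighting rule: weight each model by 1 / RMSE, then normalize to sum to 1."""
    inv = {name: 1.0 / max(rmse(y_true, p), 1e-12) for name, p in preds_by_model.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def blend(preds_by_model, weights):
    """Weighted average of the per-model predictions, point by point."""
    n = len(next(iter(preds_by_model.values())))
    return [sum(weights[m] * preds_by_model[m][i] for m in preds_by_model) for i in range(n)]


# Hypothetical validation targets and predictions (not data from the study).
y_val = [10.0, 20.0, 30.0, 40.0]
preds = {
    "rsm": [12.0, 18.0, 33.0, 37.0],  # stand-in for response-surface predictions
    "gb": [10.5, 19.5, 30.5, 39.5],   # stand-in for gradient-boosting predictions
}

weights = inverse_rmse_weights(y_val, preds)
hybrid = blend(preds, weights)
print(weights)
print(rmse(y_val, hybrid))
```

The model with the lower validation error receives the larger weight. A blend of this kind is not guaranteed to beat the best single model, which is one reason a learned combiner such as the meta-learning variant may be preferred.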
Interpreting random forest analysis of ecological models to move from prediction to explanation
Abstract: As modeling tools and approaches become more advanced, ecological models are becoming more complex. Traditional sensitivity analyses can struggle to identify the nonlinearities and interactions that emerge from such complexity, especially across broad swaths of parameter space. This limits understanding of the ecological mechanisms underlying model behavior. Machine learning approaches are a potential answer to this issue, given their predictive ability when applied to complex large datasets. While perceptions that machine learning is a “black box” linger, we seek to illuminate its interpretive potential in ecological modeling. To do so, we detail our process of applying random forests to complex model dynamics both to achieve high predictive accuracy and to elucidate the ecological mechanisms driving our predictions. Specifically, we employ an empirically rooted, ontogenetically stage-structured consumer-resource simulation model. Using simulation parameters as feature inputs and simulation output as dependent variables in our random forests, we extended feature analyses into a simple graphical analysis from which we reduced model behavior to three core ecological mechanisms. These ecological mechanisms reveal the complex interactions between internal plant demography and trophic allocation driving community dynamics while preserving the predictive accuracy achieved by our random forests.
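The abstract does not say which feature analysis was used. One common, model-agnostic option is permutation importance: shuffle one input column at a time and measure how much the prediction error grows. The sketch below is an assumption for illustration, not the authors' procedure. To stay within the standard library, a hypothetical toy function `simulate` stands in for a fitted random forest's predict method, and the three parameter names are invented.

```python
import random


def simulate(params):
    """Hypothetical toy simulation: output depends on an interaction between the
    first two parameters and ignores the third."""
    growth, allocation, unused = params
    return growth * allocation + 0.5 * growth


def mse(model, X, y):
    """Mean squared error of model predictions against targets."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Sample the parameter space, run the toy simulation, then score each parameter.
rng = random.Random(42)
X = [(rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
y = [simulate(x) for x in X]
model = simulate  # stand-in for the predict function of a fitted random forest

importances = permutation_importance(model, X, y)
print(importances)
```

In this toy setup the ignored parameter scores zero, while the two interacting parameters score above zero. With a real forest, the same loop would call the fitted model's prediction function instead of `simulate`.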
- Award ID(s): 2129757
- PAR ID: 10493058
- Publisher / Repository: Nature
- Date Published:
- Journal Name: Scientific Reports
- Volume: 13
- Issue: 1
- ISSN: 2045-2322
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Summary: Forests are a critical carbon sink, and widespread tree mortality resulting from climate‐induced drought stress has the potential to alter forests from a carbon sink to a source, causing a positive feedback on climate change. Process‐based vegetation models aim to represent the current understanding of the underlying mechanisms governing plant physiological and ecological responses to climate. Yet model accuracy varies across scales, and regional‐scale model predictive skill is frequently poor when compared with observations of drought‐driven mortality. I propose a framework that leverages differences in model predictive skill across spatial scales, mismatches between model predictions and observations, and differences in the mechanisms included and absent across models to advance the understanding of the physiological and ecological processes driving observed patterns of drought‐driven mortality.
- Abstract: Spatial models for occupancy data are used to estimate and map the true presence of a species, which may depend on biotic and abiotic factors as well as spatial autocorrelation. Traditionally, researchers have accounted for spatial autocorrelation in occupancy data by using a correlated, normally distributed site‐level random effect, which might be incapable of modeling nontraditional spatial dependence such as discontinuities and abrupt transitions. Machine learning approaches have the potential to model nontraditional spatial dependence, but these approaches do not account for observer errors such as false absences. By combining the flexibility of Bayesian hierarchical modeling and machine learning approaches, we present a general framework to model occupancy data that accounts for both traditional and nontraditional spatial dependence as well as false absences. We demonstrate our framework using six synthetic occupancy data sets and two real data sets. Our results demonstrate how to model both traditional and nontraditional spatial dependence in occupancy data, which enables a broader class of spatial occupancy models that can be used to improve predictive accuracy and model adequacy.
- Abstract: Adhesive bonding of composite materials has become increasingly crucial for advanced engineering applications, offering unique advantages for lightweight and high-performance designs. This study presents a novel framework, the physics-informed failure mode proportion prediction (PIFMP) model, for predicting failure mode proportions in composite adhesive joints, addressing critical gaps in understanding mixed-mode failure behaviors. In contrast to conventional approaches that focus solely on force or stress prediction, this research integrates important parameters from multistage manufacturing processes (MMPs) and simulation data into a physics-informed machine learning (PIML) framework, enabling proactive failure prediction and design optimization. The proposed framework unifies data-driven machine learning models with features derived from finite element analysis (FEA), incorporating cohesive zone modeling (CZM) to capture the physical dynamics of adhesive behavior under lap shearing. By embedding FEA-based physics features into the machine learning process and leveraging a time-series transformer model to analyze the temporal progression of interfacial damage and separation, the framework ensures predictive accuracy and physics-informed consistency, enabling precise analysis of failure mechanisms. The empirical study validates the effectiveness and reliability of the framework, demonstrating enhanced predictive performance through cross-validation. The work establishes a foundational approach for failure analysis and provides a robust basis for future advancements.
- We study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests. This problem is motivated by fast-emerging studies on the safety evaluation of intelligent systems, robustness quantification of learning models, and other potential applications to large-scale simulation in which machine learning tools can be used to approximate complex rare-event set boundaries. We investigate an importance sampling scheme that integrates the dominating point machinery in large deviations and sequential mixed integer programming to locate the underlying dominating points. Our approach works for a range of neural network architectures including fully connected layers, rectified linear units, normalization, pooling and convolutional layers, and random forests built from standard decision trees. We provide efficiency guarantees and a numerical demonstration of our approach using a classification model in the UCI Machine Learning Repository.

