

Title: Data-Driven Modeling of Seismic Energy Dissipation of Rocking Foundations Using Decision Tree-Based Ensemble Machine Learning Algorithms
The objective of this study is to develop data-driven predictive models for seismic energy dissipation of rocking shallow foundations during earthquake loading using decision tree-based ensemble machine learning algorithms and a supervised learning technique. Data from a rocking foundation database consisting of dynamic base shaking experiments conducted on centrifuges and shaking tables were used to develop a base decision tree regression (DTR) model and four ensemble models: bagging, random forest, adaptive boosting, and gradient boosting. Based on k-fold cross-validation tests of the models and the mean absolute percentage errors in their predictions, the overall average accuracy of all four ensemble models improves by about 25%–37% compared with the base DTR model. Among the four ensemble models, gradient boosting and adaptive boosting perform better than the other two in terms of accuracy and variance in predictions for the problem considered.
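The evaluation described in the abstract (k-fold cross-validation scored by mean absolute percentage error) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `mape`, `k_fold_indices`, `cross_val_mape`, and `relative_improvement` are hypothetical, and reading the reported 25%–37% gain as a relative reduction in MAPE is an assumption, since the abstract does not state how the improvement was computed.

```python
import random


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.25 means 25%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mape(fit, X, y, k=5, seed=0):
    """Average held-out MAPE over k folds.

    `fit(X_train, y_train)` must return a function that predicts y for one x.
    """
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(mape([y[i] for i in test], [predict(X[i]) for i in test]))
    return sum(scores) / k


def relative_improvement(base_error, model_error):
    """Assumed definition: fractional reduction in error relative to the base model."""
    return (base_error - model_error) / base_error
```

Any regression model can be plugged in through `fit`. Comparing `cross_val_mape` for a base model and an ensemble, then passing the two scores to `relative_improvement`, gives one plausible version of the comparison the abstract reports.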
Award ID(s):
2138631
NSF-PAR ID:
10486042
Author(s) / Creator(s):
; ;
Publisher / Repository:
American Society of Civil Engineers
Date Published:
Journal Name:
Geo-Congress 2023
Page Range / eLocation ID:
298 to 308
Format(s):
Medium: X
Location:
Los Angeles, California
Sponsoring Org:
National Science Foundation
More Like this
  1. Experimental results reveal that rocking shallow foundations reduce earthquake-induced force and flexural displacement demands transmitted to structures and can be used as an effective geotechnical seismic isolation mechanism. This paper presents data-driven predictive models for the maximum acceleration transmitted to structures founded on rocking shallow foundations during earthquake loading. Results from base-shaking experiments on rocking foundations were used to develop artificial neural network (ANN) regression, k-nearest neighbors regression, support vector regression, random forest regression, adaptive boosting regression, and gradient boosting regression models. The acceleration amplification ratio, defined as the maximum acceleration at the center of gravity of a structure divided by the peak ground acceleration of the earthquake, is the prediction parameter. For five of the six models developed in this study, the overall mean absolute percentage error in predictions in repeated k-fold cross-validation tests varies between 0.128 and 0.145, with the ANN model being the most accurate and most consistent. The cross-validation mean absolute error in predictions of all six models varies between 0.08 and 0.1, indicating that the maximum acceleration of structures supported by rocking foundations can be predicted within an average error limit of 8% to 10% of the peak ground acceleration of the earthquake.

     
  2. The objective of this study is to develop data-driven predictive models for permanent settlement of rocking shallow foundations during seismic loading using multiple machine learning algorithms and a supervised learning technique. Data from a rocking foundation database consisting of dynamic base shaking experiments conducted on centrifuges and shaking tables were used to develop k-nearest neighbors regression, support vector regression, and random forest regression models. Based on repeated k-fold cross-validation tests of the models and the mean absolute percentage errors in their predictions, all three models perform better than a baseline multivariate linear regression model in terms of accuracy and variance in predictions. The average mean absolute errors in predictions of all three models are around 0.005 to 0.006, indicating that the rocking-induced permanent settlement can be predicted within an average error limit of 0.5% to 0.6% of the width of the footing.
  3. Learning nonlinear functions from input-output data pairs is one of the most fundamental problems in machine learning. Recent work has formulated the problem of learning a general nonlinear multivariate function of discrete inputs as a tensor completion problem with smooth latent factors. We build upon this idea and use two ensemble learning techniques to enhance its prediction accuracy. Ensemble methods can be divided into two main groups: parallel and sequential. Bagging, also known as bootstrap aggregation, is a parallel ensemble method in which multiple base models are trained in parallel on different subsets of the data, chosen randomly with replacement from the original training data. The outputs of these models are usually combined into a single prediction by averaging. One of the most popular bagging techniques is random forests. Boosting is a sequential ensemble method in which a sequence of base models is fit to successively modified versions of the data. Popular boosting algorithms include AdaBoost and Gradient Boosting. We develop two approaches based on these ensemble learning techniques for learning multivariate functions using the Canonical Polyadic Decomposition. We showcase the effectiveness of the proposed ensemble models on several regression tasks and report significant improvements compared to the single model.
  4. Introduction: Alzheimer’s disease (AD) causes progressive irreversible cognitive decline and is the leading cause of dementia. Therefore, a timely diagnosis is imperative to maximize neurological preservation. However, current treatments are either too costly or limited in availability. In this project, we explored using retinal vasculature as a potential biomarker for early AD diagnosis. This project focuses on stage 3 of a three-stage modular machine learning pipeline consisting of image quality selection, vessel map generation, and classification [1]. The previous model used only a support vector machine (SVM) to classify AD labels, which limited its accuracy to 82%. In this project, random forest and gradient boosting were added and, along with SVM, combined into an ensemble classifier, raising the classification accuracy to 89%. Materials and Methods: Subjects classified as AD were those who were diagnosed with dementia in “Dementia Outcome: Alzheimer’s disease” from the UK Biobank Electronic Health Records. Five control groups were chosen with a 5:1 ratio of control to AD patients, where the control patients had the same age, gender, and eye side image as the AD patient. In total, 122 vessel images from each group (AD and control) were used. The vessel maps were then segmented from fundus images through U-net. A t-test feature selection with a p-value threshold of 0.01 was first performed on the training folds, and the selected features were fed into the classifiers. Next, 20 repetitions of 5-fold cross-validation were performed, with the hyperparameters tuned solely on the training data. An ensemble classifier consisting of SVM, gradient boosting tree, and random forests was built, and the final prediction was made through majority voting and evaluated on the test set.
    Results and Discussion: Through ensemble classification, accuracy increased by 4-12% relative to the individual classifiers, precision by 9-15%, sensitivity by 2-9%, specificity by at least 9-16%, and F1 score by 7-12%. Conclusions: Overall, a relatively high classification accuracy was achieved using machine learning ensemble classification with SVM, random forest, and gradient boosting. Although the results are very promising, a limitation of this study is that the need for images of sufficient quality decreased the number of control parameters that could be implemented. However, through retinal vasculature analysis, this project shows machine learning’s high potential to be an efficient, more cost-effective alternative for diagnosing Alzheimer’s disease. Clinical Application: Using machine learning for AD diagnosis through retinal images will make screening available to a broader population by being more accessible and cost-efficient. Mobile device-based screening can also be enabled for primary screening in resource-deprived regions. It can provide a pathway for future understanding of the association between biomarkers in the eye and brain.
  5. Background

    Clinical prediction models suffer from performance drift as the patient population shifts over time. There is a great need for model updating approaches or modeling frameworks that can effectively use the old and new data.

    Objective

    Based on the paradigm of transfer learning, we aimed to develop a novel modeling framework that transfers old knowledge to the new environment for prediction tasks, and contributes to performance drift correction.

    Methods

    The proposed predictive modeling framework maintains a logistic regression–based stacking ensemble of 2 gradient boosting machine (GBM) models representing old and new knowledge learned from old and new data, respectively (referred to as transfer learning gradient boosting machine [TransferGBM]). The ensemble learning procedure can dynamically balance the old and new knowledge. Using 2010-2017 electronic health record data on a retrospective cohort of 141,696 patients, we validated TransferGBM for hospital-acquired acute kidney injury prediction.

    Results

    The baseline models (ie, transported models) that were trained on 2010 and 2011 data showed significant performance drift in the temporal validation with 2012-2017 data. Refitting these models using updated samples resulted in performance gains in nearly all cases. The proposed TransferGBM model succeeded in achieving uniformly better performance than the refitted models.

    Conclusions

    Under the scenario of population shift, incorporating new knowledge while preserving old knowledge is essential for maintaining stable performance. Transfer learning combined with stacking ensemble learning can help achieve a balance of old and new knowledge in a flexible and adaptive way, even in the case of insufficient new data.

     
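The bagging procedure described in item 3 (base models trained on bootstrap samples drawn with replacement, outputs combined by averaging) can be illustrated with a small plain-Python sketch. It is not taken from any of the listed papers: the one-split regression tree ("stump") used as the base model, the one-dimensional inputs, and the names `fit_stump` and `fit_bagging` are all assumptions chosen to keep the example short.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimizing squared error."""
    overall = mean(ys)
    best = (float("inf"), None, overall, overall)
    # Candidate thresholds skip the smallest x so both sides are non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    if t is None:
        # All x values identical: fall back to predicting the mean.
        return lambda x: overall
    return lambda x: lm if x < t else rm


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement from the training set.
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: mean(m(x) for m in models)
```

Because each stump sees a different resample, the averaged prediction is typically smoother and less sensitive to individual data points than a single stump. Boosting differs in that the base models are fit one after another rather than independently.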