The objective of this study is to develop data-driven predictive models for the seismic energy
dissipation of rocking shallow foundations during earthquake loading, using supervised,
decision-tree-based ensemble machine learning algorithms. Data from a rocking
foundations database, consisting of dynamic base-shaking experiments conducted on centrifuges
and shaking tables, were used to develop a base decision tree regression (DTR)
model and four ensemble models: bagging, random forest, adaptive boosting, and gradient
boosting. Based on k-fold cross-validation tests of the models and mean absolute percentage errors
in their predictions, the overall average accuracy of all four ensemble models
improves by about 25%–37% over the base DTR model. Among the four ensemble
models, the gradient boosting and adaptive boosting models perform better than the other two
in terms of both accuracy and variance in predictions for the problem considered.
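As a rough illustration of the evaluation protocol described above, the sketch below compares a single decision tree regressor against a gradient-boosting ensemble using 5-fold cross-validation scored by mean absolute percentage error. It runs on synthetic data, not the rocking foundations database, and uses scikit-learn stand-ins rather than the study's own models.

```python
# Illustrative sketch only: synthetic data stand in for the rocking
# foundations database; scikit-learn models stand in for the study's.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)
y = np.abs(y) + 100.0  # keep targets well away from zero so MAPE is stable

models = {"base DTR": DecisionTreeRegressor(random_state=0),
          "gradient boosting": GradientBoostingRegressor(random_state=0)}
for name, model in models.items():
    # 5-fold CV; sklearn reports MAPE as a negated score, so flip the sign
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_percentage_error")
    print(f"{name}: mean MAPE = {-scores.mean():.3f}")
```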
Supervised Learning via Ensemble Tensor Completion
Learning nonlinear functions from input-output data pairs is one of the most fundamental problems in machine learning. Recent work has formulated the problem of learning a general nonlinear multivariate function of discrete inputs as a tensor completion problem with smooth latent factors. We build upon this idea and utilize two ensemble learning techniques to enhance its prediction accuracy. Ensemble methods can be divided into two main groups, parallel and sequential. Bagging, also known as bootstrap aggregation, is a parallel ensemble method in which multiple base models are trained in parallel on different subsets of the data, chosen randomly with replacement from the original training data. The outputs of these models are then combined, usually by averaging, into a single prediction. One of the most popular bagging techniques is random forests. Boosting is a sequential ensemble method in which a sequence of base models is fit to successively modified versions of the data. Popular boosting algorithms include AdaBoost and Gradient Boosting. We develop two approaches based on these ensemble learning techniques for learning multivariate functions using the Canonical Polyadic Decomposition. We showcase the effectiveness of the proposed ensemble models on several regression tasks and report significant improvements compared to the single-model baseline.
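The parallel/sequential split described above can be sketched with off-the-shelf scikit-learn regressors on synthetic data; the paper's tensor-based base learners are not reproduced here.

```python
# Hedged sketch of the two ensemble families: scikit-learn stand-ins on
# a synthetic benchmark, not the paper's tensor-completion learners.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Parallel (bagging) ensembles: base models fit on bootstrap resamples,
# predictions combined by averaging.
parallel = {"bagging": BaggingRegressor(random_state=0),
            "random forest": RandomForestRegressor(random_state=0)}
# Sequential (boosting) ensembles: each base model is fit to a reweighted
# or residual version of the data produced by its predecessors.
sequential = {"AdaBoost": AdaBoostRegressor(random_state=0),
              "gradient boosting": GradientBoostingRegressor(random_state=0)}

for name, model in {**parallel, **sequential}.items():
    print(name, round(model.fit(Xtr, ytr).score(Xte, yte), 3))
```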
- Award ID(s): 1704074
- PAR ID: 10287999
- Journal Name: 2020 Asilomar Conference on Signals, Systems, and Computers
- Page Range / eLocation ID: 196 to 199
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Function approximation from input-output data pairs constitutes a fundamental problem in supervised learning. Deep neural networks are currently the most popular method for learning to mimic the input-output relationship of a general nonlinear system, as they have proven to be very effective in approximating complex, highly nonlinear functions. In this work, we show that identifying a general nonlinear function y = ƒ(x1,…,xN) from input-output examples can be formulated as a tensor completion problem, and that under certain conditions provably correct nonlinear system identification is possible. Specifically, we model the interactions between the N input variables and the scalar output of a system by a single N-way tensor, and set up a weighted low-rank tensor completion problem with smoothness regularization, which we tackle using a block coordinate descent algorithm. We extend our method to the multi-output setting and to the case of partially observed data, which cannot be readily handled by neural networks. Finally, we demonstrate the effectiveness of the approach using several regression tasks, including some standard benchmarks and a challenging student grade prediction task.
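A minimal two-way analogue of this formulation can be sketched as follows: treat y = f(x1, x2) over discrete inputs as a matrix, observe a subset of its entries, and recover the rest with a low-rank alternating least squares fit. This is an illustrative toy, assuming a noiseless rank-2 table; the paper's N-way tensors, weighting, and smoothness regularization are omitted.

```python
# Toy two-way analogue of function learning as completion: a noiseless
# rank-2 "function table" with half its entries observed, recovered by
# plain alternating least squares (no smoothness regularization).
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 2
A_true, B_true = rng.normal(size=(n, r)), rng.normal(size=(n, r))
Y = A_true @ B_true.T                   # ground-truth table of f(x1, x2)
mask = rng.random((n, n)) < 0.5         # which entries are observed

A, B = rng.normal(size=(n, r)), rng.normal(size=(n, r))
for _ in range(50):                     # alternating least squares
    for i in range(n):                  # update row i of A from observed row
        obs = mask[i]
        A[i] = np.linalg.lstsq(B[obs], Y[i, obs], rcond=None)[0]
    for j in range(n):                  # update row j of B from observed col
        obs = mask[:, j]
        B[j] = np.linalg.lstsq(A[obs], Y[obs, j], rcond=None)[0]

err = np.linalg.norm(A @ B.T - Y) / np.linalg.norm(Y)
print(f"relative completion error: {err:.2e}")
```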
-
This paper develops an ensemble learning-based linearization approach for power flow with reactive power modeled. Polynomial regression (PR) is first used as a base learner to capture the linear relationships between the bus voltages, as the independent variables, and the active or reactive power, as the dependent variable, in rectangular coordinates. Gradient boosting (GB) and bagging are then introduced as ensemble learning methods to combine the base learners and boost model performance. The inferred linear power flow model is applied to solve the well-known optimal power flow (OPF) problem. Simulation results on IEEE standard power systems indicate that (1) ensemble learning methods can significantly improve the efficiency of PR, and GB works better than bagging; and (2) for solving OPF, the data-driven model outperforms the DC model and the SDP relaxation in both accuracy and computational efficiency.
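The idea of bagging polynomial regression base learners can be sketched generically: fit each PR learner on a bootstrap resample and average the predictions. This sketch uses a synthetic benchmark rather than power-flow data, and omits the paper's gradient-boosted variant and rectangular-coordinate setup.

```python
# Hedged sketch: bagging with polynomial regression as the base learner.
# Synthetic data; not the paper's power-flow formulation.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_friedman1(n_samples=400, noise=0.0, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

rng = np.random.default_rng(1)
preds = []
for _ in range(20):  # bagging: bootstrap resample, fit PR, then average
    idx = rng.integers(0, len(Xtr), len(Xtr))
    pr = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    preds.append(pr.fit(Xtr[idx], ytr[idx]).predict(Xte))

print("bagged PR R^2:", round(r2_score(yte, np.mean(preds, axis=0)), 3))
```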
-
Ensemble models (bagging and gradient boosting) of relational decision trees have proved to be some of the most effective learning methods in the area of probabilistic logic models (PLMs). While effective, they lose one of the most important benefits of PLMs: interpretability. In this paper we consider the problem of compressing a large set of learned trees into a single explainable model. To this effect, we propose CoTE (Compression of Tree Ensembles), which produces a single small decision list as a compressed representation. CoTE first converts the trees to decision lists and then performs the combination and compression with the aid of the original training set. An experimental evaluation demonstrates the effectiveness of CoTE on several benchmark relational data sets.
-
Unmanned aerial vehicles (UAVs) have been widely used in military and civilian areas. The positioning and return-to-home tasks of UAVs depend heavily on the Global Positioning System (GPS). However, civilian GPS signals are not encrypted, which invites numerous cyber-attacks on UAVs, including GPS spoofing attacks, in which a malicious user transmits counterfeit GPS signals. Numerous studies have proposed techniques to detect these attacks; however, these techniques have limitations, including a low probability of detection, a high probability of misdetection, and a high probability of false alarm. In this paper, we investigate and compare the performance of three ensemble-based machine learning techniques, namely bagging, stacking, and boosting, in detecting GPS spoofing attacks. The evaluation metrics are accuracy, probability of detection, probability of misdetection, probability of false alarm, memory size, processing time, and prediction time per sample. The results show that the stacking model has the best performance among the three ensemble models in terms of all the considered evaluation metrics.
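The three-way comparison described above can be sketched with scikit-learn's generic ensemble classifiers. The data here are synthetic, not GPS signal features, and only accuracy is scored; the paper's full metric suite is not reproduced.

```python
# Illustrative sketch: bagging vs. boosting vs. stacking classifiers on
# synthetic data (stand-in for GPS spoofing detection features).
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

models = {
    "bagging": BaggingClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: a meta-learner combines the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression()),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: accuracy = {acc:.3f}")
```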