Abstract Previous studies have identified environmental characteristics that skillfully discriminate between severe and significant-severe weather events, but they have largely been limited by sample size and/or population of predictor variables. Given the heightened societal impacts of significant-severe weather, this topic was revisited using over 150 000 ERA5 reanalysis-derived vertical profiles extracted at the grid-point nearest—and just prior to—tornado and hail reports during the period 1996–2019. Profiles were quality-controlled and used to calculate 84 variables. Several machine learning classification algorithms were trained, tested, and cross-validated on these data to assess skill in predicting severe or significant-severe reports for tornadoes and hail. Random forest classification outperformed all tested methods as measured by cross-validated critical success index scores and area under the receiver operating characteristic curve values. In addition, random forest classification was found to be more reliable than other methods and exhibited negligible frequency bias. The top three most important random forest classification variables for tornadoes were wind speed at 500 hPa, wind speed at 850 hPa, and 0–500-m storm-relative helicity. For hail, storm-relative helicity in the 3–6 km and -10 to -30 °C layers, along with 0–6-km bulk wind shear, were found to be most important. A game theoretic approach was used to help explain the output of the random forest classifiers and establish critical feature thresholds for operational nowcasting and forecasting. A use case of spatial applicability of the random forest model is also presented, demonstrating the potential utility for operational forecasting. Overall, this research supports a growing number of weather and climate studies finding admirable skill in random forest classification applications.
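The two verification measures named in this abstract, critical success index and frequency bias, can be illustrated with a minimal sketch using their standard contingency-table definitions. The function names and the example counts are hypothetical and do not come from the study.

```python
def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms); 1.0 is a perfect score."""
    denom = hits + misses + false_alarms
    if denom == 0:
        raise ValueError("CSI is undefined when there are no events or forecasts")
    return hits / denom


def frequency_bias(hits, misses, false_alarms):
    """Bias = forecast 'yes' count / observed 'yes' count; 1.0 means unbiased."""
    observed_yes = hits + misses
    if observed_yes == 0:
        raise ValueError("frequency bias is undefined when no events were observed")
    return (hits + false_alarms) / observed_yes


# Hypothetical counts for a significant-severe vs. severe classifier.
csi = critical_success_index(hits=50, misses=25, false_alarms=25)
bias = frequency_bias(hits=50, misses=25, false_alarms=25)
print(f"CSI={csi:.2f}, bias={bias:.2f}")
```

A frequency bias near 1 corresponds to the "negligible frequency bias" described above: the classifier predicts the significant-severe class about as often as it is observed.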
Analysing Traffic Accidents in Terms of Driver Violation Behaviour Types: Machine Learning and Sensitivity Analysis Approaches
ABSTRACT Traffic accidents have become a major concern for governments, organizations and individuals worldwide due to the material and moral losses they cause. It is possible to reduce this concern by taking into account the research conducted by relevant institutions and organizations in this field. The main objective of this study is to categorize traffic accidents according to driver violation types and analyse them using machine learning algorithms and feature sensitivity to identify the most influential variables in each category. For this purpose, reports of traffic accidents that occurred in Erzurum province over the preceding year were used to categorize and classify driver violation behaviour types. Five machine learning algorithms, namely k‐nearest neighbour, support vector machines, naive Bayes, multilayer perceptron and random forest, were compared on classification performance. Among these, the random forest algorithm performed best, classifying 91% of cases correctly. Based on the classification obtained from this algorithm, sensitivity analysis was used to reveal the variables that most affect each violation category. The results of the analysis revealed that driver age and vehicle type were the most influential variables for many types of violations. By examining driver violation behaviours in detail, the study clearly identified the underlying problems. At the end of the study, measures to reduce driver violation behaviours were proposed. If transportation authorities and policy makers take into consideration the recommendations for reducing driver violation behaviours, traffic accidents can be significantly reduced.
- Award ID(s): 2330565
- PAR ID: 10645890
- Publisher / Repository: IET Intelligent Transport Systems
- Date Published:
- Journal Name: IET Intelligent Transport Systems
- Volume: 19
- Issue: 1
- ISSN: 1751-956X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Driver reliance on automated vehicles (AV) is a critical component of safety, particularly during high-risk traffic scenarios. Factors that influence reliance, including trust, situation awareness, fatigue, and demographics, have been independently explored; however, few analyses have investigated predicting AV reliance and compared factors comprehensively. The goals of this study were to develop a random forest (RF) model to predict reliance and to analyze the importance of factors for reliance decisions. We leveraged data from a driving simulation study where participants encountered four traffic events: responding to an illegal vehicle crossing, managing construction zones, stopping at a vandalized stop sign, and a pedestrian detection task. The dataset included reliance decisions and subjective assessments of dispositional trust, situational trust, fatigue, and workload. An RF model fit to the dataset using cross validation achieved an average AUC of 0.81 and accuracy of 0.77, and situational trust emerged as the most influential predictor.
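The AUC reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are hypothetical, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical reliance decisions (1 = relied on the AV) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The quadratic loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large datasets.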
-
Abstract Vehicle behaviour prediction provides important information for decision‐making in modern intelligent transportation systems. People with different driving styles have considerably different driving behaviours and hence exhibit different behaviour tendencies. However, most existing prediction methods do not consider the different tendencies in driving styles and apply the same model to all vehicles. Furthermore, most of the existing driver classification methods rely on offline learning that requires a long observation of driving history and hence are not suitable for real‐time driving behaviour analysis. To facilitate personalised models that can potentially improve vehicle behaviour prediction, the authors propose an algorithm that classifies drivers into different driving styles. The algorithm requires data from only a short observation window and is therefore more applicable to real‐time online applications than existing methods that require long‐term observation. Experimental results demonstrate that the proposed algorithm can achieve consistent classification results and provide intuitive interpretation and statistical characteristics of different driving styles, which can be further used for vehicle behaviour prediction.
-
Predictive modeling often ignores interaction effects among predictors in high-dimensional data because of analytical and computational challenges. Research in interaction selection has been galvanized along with methodological and computational advances. In this study, we aim to investigate the performance of two types of predictive algorithms that can perform interaction selection. Specifically, we compare the predictive performance and interaction selection accuracy of both penalty-based and tree-based predictive algorithms. Penalty-based algorithms included in our comparative study are the regularization path algorithm under the marginality principle (RAMP), the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD), and the minimax concave penalty (MCP). The tree-based algorithms considered are random forest (RF) and iterative random forest (iRF). We evaluate the effectiveness of these algorithms under various regression and classification models with varying structures and dimensions. We assess predictive performance using the mean squared error for regression and accuracy, sensitivity, specificity, balanced accuracy, and F1 score for classification. We use interaction coverage to judge the algorithm’s efficacy for interaction selection. Our findings reveal that the effectiveness of the selected algorithms varies depending on the number of predictors (data dimension) and the structure of the data-generating model, i.e., linear or nonlinear, hierarchical or non-hierarchical. At least one scenario favored each of the algorithms included in this study. However, from the general pattern, we are able to recommend one or more specific algorithm(s) for some specific scenarios. Our analysis helps clarify each algorithm’s strengths and limitations, offering guidance to researchers and data analysts in choosing an appropriate algorithm for their predictive modeling task based on their data structure.
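The classification measures listed in this abstract all derive from the four cells of a binary confusion matrix. A minimal sketch using the standard definitions follows; the function name and the example counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return standard binary classification metrics from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the reference labels")
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    balanced_accuracy = (sensitivity + specificity) / 2
    # F1 is the harmonic mean of precision and recall, written in count form.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical counts: 50 positives and 50 negatives in the test set.
for name, value in binary_metrics(tp=40, fn=10, fp=20, tn=30).items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages sensitivity and specificity, so unlike plain accuracy it is not inflated by a dominant class.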
-
In this empirical study, a framework was developed for binary and multi-class classification of Twitter data. We first introduce a manually built gold standard dataset of 4000 tweets related to environmental health hazards in Barbados for the period 2014–2018. Binary classification was then used to categorize each tweet as relevant or irrelevant. Next, multi-class classification was used to further classify relevant tweets into four types of community engagement: reporting information, expressing negative engagement, expressing positive engagement, and asking for information. Results indicate that, for binary classification, a combination of TF-IDF, psychometric, linguistic, sentiment and Twitter-specific features with a Random Forest algorithm performed best (87% F1 score). For multi-class classification, TF-IDF features with a Decision Tree algorithm performed best (74% F1 score).

