Title: Non-Parametric Machine Learning Modeling of Tree-Caused Power Outage Risk to Overhead Distribution Powerlines
Trees in proximity to power lines can cause significant damage to utility infrastructure during storms, leading to substantial economic and societal costs. This study investigated the effectiveness of non-parametric machine learning algorithms in modeling tree-related outage risks to distribution power lines at a finer spatial scale. We used a vegetation risk model (VRM) comprising 15 predictor variables derived from roadside tree data, landscape information, vegetation management records, and utility infrastructure data. We evaluated the VRM’s performance using decision tree (DT), random forest (RF), k-Nearest Neighbor (k-NN), extreme gradient boosting (XGBoost), and support vector machine (SVM) techniques. The RF algorithm demonstrated the highest performance with an accuracy of 0.753, an AUC-ROC of 0.746, precision of 0.671, and an F1-score of 0.693. The SVM achieved the highest recall value of 0.727. Based on the overall performance, the RF emerged as the best machine learning algorithm, whereas the DT was the least suitable. The DT reported the lowest run times for both hyperparameter optimization (3.93 s) and model evaluation (0.41 s). XGBoost and the SVM exhibited the highest run times for hyperparameter tuning (9438.54 s) and model evaluation (112 s), respectively. The findings of this study are valuable for enhancing the resilience and reliability of the electric grid.
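The metrics quoted in the abstract (accuracy, precision, recall, F1-score) are all functions of a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the `Confusion` class, the function names, and the example counts are hypothetical. It also shows how a recall value can be backed out from a precision and an F1-score, assuming both refer to the same class and decision threshold.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical container for this sketch)."""
    tp: int  # outage segments predicted as outage
    fp: int  # non-outage segments predicted as outage
    fn: int  # outage segments predicted as non-outage
    tn: int  # non-outage segments predicted as non-outage


def metrics(c: Confusion) -> dict:
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("precision and F1 are inconsistent")
    return f1 * precision / denom


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    print(metrics(Confusion(tp=40, fp=10, fn=20, tn=30)))
    # Precision and F1 as reported for RF in the abstract.
    print(round(implied_recall(0.671, 0.693), 3))
```

Under that assumption, the reported RF precision (0.671) and F1-score (0.693) imply a recall of roughly 0.72, which is consistent with the SVM's 0.727 being the highest recall. Because the published figures are rounded, the third decimal place of the implied value is not reliable.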
Award ID(s):
2022036
PAR ID:
10534992
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
MDPI AG
Date Published:
Journal Name:
Applied Sciences
Volume:
14
Issue:
12
ISSN:
2076-3417
Page Range / eLocation ID:
4991
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Due to the difficulties and complications in the quantitative assessment of traumatic brain injury (TBI) and its increasing relevance in today’s world, robust detection of TBI has become more significant than ever. In this work, we investigate several machine learning approaches to assess their performance in classifying electroencephalogram (EEG) data of TBI in a mouse model. Algorithms such as decision trees (DT), random forest (RF), neural network (NN), support vector machine (SVM), K-nearest neighbors (KNN) and convolutional neural network (CNN) were analyzed based on their performance to classify mild TBI (mTBI) data from those of the control group in wake stages for different epoch lengths. Average power in different frequency sub-bands and alpha:theta power ratio in EEG were used as input features for machine learning approaches. Results in this mouse model were promising, suggesting similar approaches may be applicable to detect TBI in humans in practical scenarios.
  2. Background and Objectives: Sepsis is a leading cause of mortality in intensive care units (ICUs). The development of a robust prognostic model utilizing patients’ clinical data could significantly enhance clinicians’ ability to make informed treatment decisions, potentially improving outcomes for septic patients. This study aims to create a novel machine-learning framework for constructing prognostic tools capable of predicting patient survival or mortality outcome. Methods: A novel dataset is created using concatenated triples of static data, temporal data, and clinical outcomes to expand data size. This structured input trains five machine learning classifiers (KNN, Logistic Regression, SVM, RF, and XGBoost) with advanced feature engineering. Models are evaluated on an independent cohort using AUROC and a new metric, 𝛾, which incorporates the F1 score, to assess discriminative power and generalizability. Results: We developed five prognostic models using the concatenated triple dataset with 10 dynamic features from patient medical records. Our analysis shows that the Extreme Gradient Boosting (XGBoost) model (AUROC = 0.777, F1 score = 0.694) and the Random Forest (RF) model (AUROC = 0.769, F1 score = 0.647), when paired with an ensemble under-sampling strategy, outperform other models. The RF model improves AUROC by 6.66% and reduces overfitting by 54.96%, while the XGBoost model shows a 0.52% increase in AUROC and a 77.72% reduction in overfitting. These results highlight our framework’s ability to enhance predictive accuracy and generalizability, particularly in sepsis prognosis. Conclusion: This study presents a novel modeling framework for predicting treatment outcomes in septic patients, designed for small, imbalanced, and high-dimensional datasets. By using temporal feature encoding, advanced sampling, and dimension reduction techniques, our approach enhances standard classifier performance. The resulting models show improved accuracy with limited data, offering valuable prognostic tools for sepsis management. This framework demonstrates the potential of machine learning in small medical datasets.
  3. Quantitative analysis of brain disorders such as Autism Spectrum Disorder (ASD) is an ongoing field of research. Machine learning and deep learning techniques have been playing an important role in automating the diagnosis of brain disorders by extracting discriminative features from the brain data. In this study, we propose a model called Auto-ASD-Network in order to classify subjects with Autism disorder from healthy subjects using only fMRI data. Our model consists of a multilayer perceptron (MLP) with two hidden layers. We use an algorithm called SMOTE for performing data augmentation in order to generate artificial data and avoid overfitting, which helps increase the classification accuracy. We further investigate the discriminative power of features extracted using MLP by feeding them to an SVM classifier. In order to optimize the hyperparameters of SVM, we use a technique called Auto Tune Models (ATM) which searches over the hyperparameter space to find the best values of SVM hyperparameters. Our model achieves more than 70% classification accuracy for 4 fMRI datasets with the highest accuracy of 80%. It improves the performance of SVM by 26%, the stand-alone MLP by 16% and the state of the art method in ASD classification by 14%. The implemented code will be available as GPL license on GitHub portal of our lab (https://github.com/PCDS). 
  4. Latifi, S. (Ed.)
    As the popularity of the internet continues to grow, along with the use of web browsers and browser extensions, the threat of malicious browser extensions has increased and therefore demands an effective way to detect and in turn prevent the installation of these malicious extensions. These extensions compromise private user information (including usernames and passwords) and are also able to compromise the user’s computer in the form of Trojans and other malicious software. This paper presents a method which combines machine learning and feature engineering to detect malicious browser extensions. By analyzing the static code of browser extensions and looking for features in the static code, the method predicts whether a browser extension is malicious or benign with a machine learning algorithm. Four machine learning algorithms (SVM, RF, KNN, and XGBoost) were tested with a dataset collected by ourselves in this study. Their detection performance in terms of different performance metrics are discussed. 
  5. Water sustainability in the built environment requires an accurate estimation of residential water end uses (e.g., showers, toilets, faucets, etc.). In this study, we evaluate the performance of four models (Random Forest, RF; Support Vector Machines, SVM; Logistic Regression, Log‐reg; and Neural Networks, NN) for residential water end‐use classification using actual (measured) and synthetic labeled data sets. We generated synthetic labeled data using Conditional Tabular Generative Adversarial Networks. We then utilized grid search to train each model on its respective optimized hyperparameters. The RF model exhibited the best model performance overall, while the Log‐reg model had the shortest execution times under different balanced and imbalanced (based on number of events per class) synthetic data scenarios, demonstrating a computationally efficient alternative to RF for specific end uses. The NN model exhibited high performance with the tradeoff of longer execution times compared to the other classification models. In the balanced data set scenario, all models achieved closely aligned F1‐scores, ranging from 0.83 to 0.90. However, when faced with imbalanced data reflective of actual conditions, both the SVM and Log‐reg models showed inferior performance compared to the RF and NN models. Overall, we concluded that decision tree‐based models emerge as the optimal choice for classification tasks in the context of water end‐use data. Our study advances residential smart water metering systems through creating synthetic labeled end‐use data and providing insight into the strengths and weaknesses of various supervised machine learning classifiers for end‐use identification.