Title: Development of category-based scoring support vector regression (CBS-SVR) for drought prediction
Abstract

Using the existing measures for training numerical (non-categorical) prediction models can cause misclassification of droughts. Thus, developing a drought category-based measure is critical. Moreover, the existing fixed drought category thresholds need to be improved. The objective of this research is to develop a category-based scoring support vector regression (CBS-SVR) model based on an improved drought categorization method to overcome misclassification in drought prediction. To derive variable threshold levels for drought categorization, K-means (KM) and Gaussian mixture (GM) clustering are compared with the traditional drought categorization. For drought prediction, CBS-SVR is then performed using the best-performing categorization method. The new drought model was applied to the Red River of the North Basin (RRB) in the USA. In the model training and testing, precipitation, temperature, and actual evapotranspiration were selected as the predictors, and the target variables consisted of multivariate drought indices, as well as bivariate and univariate standardized drought indices. Results indicated that the drought categorization method, variable threshold levels, and the type of drought index were the major factors that influenced the accuracy of drought prediction. CBS-SVR outperformed support vector classification and traditional SVR by avoiding overfitting and miscategorization in drought prediction.
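The abstract does not give implementation details, but the two building blocks it names, clustering-based derivation of variable category thresholds and an SVR predictor of a standardized drought index, can be sketched roughly as below. This is a minimal illustration assuming scikit-learn and synthetic data; the variable names and the mapping of predictions to categories are hypothetical, and the authors' category-based scoring objective is not reproduced.

```python
# Minimal sketch (assumptions: scikit-learn available; synthetic SPI-like index;
# five drought categories; not the paper's actual CBS scoring scheme).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 600
# Hypothetical monthly predictors: precipitation, temperature, actual ET.
X = rng.normal(size=(n, 3))
# Hypothetical standardized drought index as the regression target.
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] - 0.2 * X[:, 2] + rng.normal(scale=0.3, size=n)

# Derive variable category thresholds by clustering the index values
# instead of using fixed cutoffs (e.g., -0.5, -1.0, -1.5, -2.0).
k = 5
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(y.reshape(-1, 1))
gm = GaussianMixture(n_components=k, random_state=0).fit(y.reshape(-1, 1))

def thresholds_from_centers(centers):
    """Category boundaries as midpoints between sorted cluster centers."""
    c = np.sort(centers.ravel())
    return (c[:-1] + c[1:]) / 2

km_thresholds = thresholds_from_centers(km.cluster_centers_)
gm_thresholds = thresholds_from_centers(gm.means_)

# Plain SVR predictor of the drought index.
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X[:480], y[:480])
pred = model.predict(X[480:])

# Map continuous predictions to drought categories via the learned thresholds.
pred_category = np.digitize(pred, km_thresholds)
print("KM thresholds:", np.round(km_thresholds, 2))
print("GM thresholds:", np.round(gm_thresholds, 2))
```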

 
NSF-PAR ID: 10362364
Author(s) / Creator(s):
Publisher / Repository: DOI PREFIX: 10.2166
Date Published:
Journal Name: Journal of Hydroinformatics
Volume: 24
Issue: 1
ISSN: 1464-7141
Format(s): Medium: X; Size: p. 202-222
Sponsoring Org: National Science Foundation
More Like this
  1. The US Drought Monitor (USDM) is a hallmark in real-time drought monitoring and assessment, developed by multiple agencies to provide an accurate and timely assessment of drought conditions in the US on a weekly basis. The map is built from multiple physical indicators as well as reported observations from local contributors, after which human analysts combine the information and produce the drought map using their best judgment. Because human subjectivity enters into the production of the USDM maps, the procedure is not fully quantitative and is difficult for other entities to reproduce. In this study, we developed a framework to automatically generate the maps through a machine learning approach by predicting the drought categories across the domain of study. A persistence model served as the baseline model for comparison. Three machine learning algorithms, logistic regression, random forests, and support vector machines, with four different groups of input data, forming 12 configurations in total, were used to predict the drought categories. Finally, all configurations were evaluated against the baseline model to select the best-performing option. The results showed that our proposed framework could reproduce the drought maps to a near-perfect level with the support vector machines algorithm and the group 4 data. The remaining findings of this study can be highlighted as follows: 1) employing the past week's drought data as a predictor in the models played an important role in achieving high prediction scores, 2) the nonlinear models, random forests and support vector machines, had better overall performance than the logistic regression models, and 3) by borrowing information from neighboring grid cells, we could compensate for the lack of training data in grid cells with insufficient historical USDM data, particularly for extreme and exceptional drought conditions.
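As a rough illustration of the classification setup this abstract describes, the sketch below compares the three algorithms against a persistence baseline on synthetic per-grid-cell data; it assumes scikit-learn, and the hypothetical features stand in for the study's actual "group 4" inputs and spatial neighborhood scheme.

```python
# Minimal sketch (assumptions: scikit-learn; synthetic features; real USDM data not used).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 8))                       # hypothetical physical indicators
last_week = rng.integers(0, 6, size=n)            # previous week's USDM class (None, D0..D4)
y = np.clip(last_week + rng.integers(-1, 2, size=n), 0, 5)   # this week's class
X_full = np.column_stack([X, last_week])          # past-week class as a predictor

split = int(0.8 * n)
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(kernel="rbf", C=10.0),
}
# Persistence baseline: predict last week's category unchanged.
baseline = accuracy_score(y[split:], last_week[split:])
print(f"persistence baseline: {baseline:.3f}")
for name, clf in models.items():
    clf.fit(X_full[:split], y[:split])
    acc = accuracy_score(y[split:], clf.predict(X_full[split:]))
    print(f"{name}: {acc:.3f}")
```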
  2. In recent years, reciprocal link prediction has received attention from data mining and social network analysis researchers, who have treated it as a binary classification task. However, it is also important to predict the interval of time until a reciprocal link is created. This is a challenging problem for two reasons: first, there is a lack of effective features, because well-known link prediction features are designed for undirected networks and for the binary classification task, so they do not work well for interval time prediction; second, the presence of censored data instances makes traditional supervised regression methods unsuitable for this problem. In this paper, we propose a solution for the reciprocal link interval time prediction task. We map the problem into a survival analysis framework and show, through extensive experiments on real-world datasets, that survival analysis methods perform better than traditional regression, a neural network based model, and support vector regression (SVR).
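The key point of this abstract, that censored instances call for survival models rather than ordinary regression, can be sketched as follows. This assumes the `lifelines` package as one possible survival-analysis toolkit; the edge features and column names are hypothetical and the paper's own models are not reproduced.

```python
# Minimal sketch (assumptions: lifelines installed; synthetic edge features).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "common_neighbors": rng.poisson(3, n),
    "sender_out_degree": rng.poisson(10, n),
    "receiver_in_degree": rng.poisson(10, n),
})
# Interval time until the reciprocal link appears (days), plus a censoring flag:
# observed=0 means the reciprocal link had not appeared by the end of the data window.
df["duration"] = rng.exponential(scale=30, size=n).round(1) + 1
df["observed"] = rng.integers(0, 2, size=n)

# A Cox proportional-hazards model handles the censored instances directly,
# which ordinary regression (e.g., SVR on observed durations only) cannot.
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="observed")
cph.print_summary()

# Predicted median time to reciprocation for new candidate links.
print(cph.predict_median(df.drop(columns=["duration", "observed"]).head()))
```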
  3. Manufacturers have faced an increasing need for predictive models that anticipate mechanical failures and the remaining useful life (RUL) of manufacturing systems or components. Classical model-based or physics-based prognostics often require an in-depth physical understanding of the system of interest to develop closed-form mathematical models. However, prior knowledge of system behavior is not always available, especially for complex manufacturing systems and processes. To complement model-based prognostics, data-driven methods have been increasingly applied to machinery prognostics and maintenance management, transforming legacy manufacturing systems into smart manufacturing systems with artificial intelligence. While previous research has demonstrated the effectiveness of data-driven methods, most of these prognostic methods are based on classical machine learning techniques, such as artificial neural networks (ANNs) and support vector regression (SVR). With the rapid advancement of artificial intelligence, various machine learning algorithms have been developed and widely applied in many engineering fields. The objective of this research is to introduce a random forests (RFs)-based prognostic method for tool wear prediction and to compare the performance of RFs with feed-forward back propagation (FFBP) ANNs and SVR. Specifically, the performance of FFBP ANNs, SVR, and RFs is compared using experimental data collected from 315 milling tests. Experimental results show that RFs can generate more accurate predictions than FFBP ANNs with a single hidden layer and SVR.
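The three-way comparison this abstract describes can be sketched as below, assuming scikit-learn and synthetic milling features; the actual 315-test dataset and its signal-derived features are not reproduced.

```python
# Minimal sketch (assumptions: scikit-learn; synthetic tool-wear data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n = 315
X = rng.normal(size=(n, 6))          # hypothetical cutting-force / vibration features
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] ** 2 + rng.normal(scale=2, size=n)   # tool wear

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "ffbp_ann": MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0),
    "svr": SVR(C=10.0, epsilon=0.5),
}
for name, reg in models.items():
    reg.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, reg.predict(X_te)) ** 0.5
    print(f"{name}: RMSE = {rmse:.2f}")
```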
  4. Our goal is to understand and optimize human concept learning by predicting the ease of learning of a particular exemplar or category. We propose a method for estimating ease values, quantitative measures of ease of learning, as an alternative to conducting costly empirical training studies. Our method combines a psychological embedding of domain exemplars with a pragmatic categorization model. The two components are integrated using a radial basis function network (RBFN) that predicts ease values. The free parameters of the RBFN are fit using human similarity judgments, circumventing the need to collect human training data to fit more complex models of human categorization. We conduct two category-training experiments to validate predictions of the RBFN. We demonstrate that an instance-based RBFN outperforms both a prototype-based RBFN and an empirical approach using the raw data. Although the human data were collected across diverse experimental conditions, the predicted ease values strongly correlate with human learning performance. Training can be sequenced by (predicted) ease, achieving what is known as fading in the psychology literature and curriculum learning in the machine-learning literature, both of which have been shown to facilitate learning.
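A very rough sketch of an instance-based RBFN of the kind this abstract mentions is given below: one Gaussian basis function per embedded exemplar and a linear readout predicting ease values. It uses only NumPy with a synthesized embedding and stand-in ease targets; the actual psychological embedding, categorization model, and fitting procedure are not reproduced.

```python
# Minimal sketch (assumptions: NumPy only; synthetic 2-D embedding and ease values).
import numpy as np

rng = np.random.default_rng(4)
n, d = 40, 2
embedding = rng.normal(size=(n, d))                  # exemplar coordinates in the embedding
ease = 1 / (1 + np.linalg.norm(embedding, axis=1))   # stand-in ease-of-learning values

def rbf_design(X, centers, gamma):
    """Gaussian radial basis activations of X with respect to the centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Instance-based RBFN: one basis function per exemplar, linear readout by least squares.
gamma = 1.0
Phi = rbf_design(embedding, embedding, gamma)
weights, *_ = np.linalg.lstsq(Phi, ease, rcond=None)

# Predicted ease values for new exemplars can then drive easy-to-hard
# (fading / curriculum) ordering of training items.
new_items = rng.normal(size=(5, d))
pred_ease = rbf_design(new_items, embedding, gamma) @ weights
print(np.argsort(-pred_ease))   # present items in decreasing predicted ease
```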
  5. A machine learning-based detection framework is proposed to detect a class of cyber-attacks that redistribute loads by modifying measurements. The detection framework consists of a multi-output support vector regression (SVR) load predictor and a subsequent support vector machine (SVM) attack detector to determine the existence of load redistribution (LR) attacks utilizing loads predicted by the SVR predictor. Historical load data for training the SVR are obtained from the publicly available PJM zonal loads and are mapped to the IEEE 30-bus system. The features to predict loads are carefully extracted from the historical load data capturing both temporal and spatial correlations. The SVM attack detector is trained using normal data and randomly created LR attacks, so that it can maximally explore the attack space. An algorithm to create random LR attacks is introduced. The results show that the SVM detector trained merely using random attacks can effectively detect not only random attacks, but also intelligently designed attacks. Moreover, using the SVR predicted loads to re-dispatch generation when attacks are detected can significantly mitigate the attack consequences. 
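The two-stage pipeline this abstract describes, a multi-output SVR load predictor followed by an SVM detector operating on its outputs, can be sketched as below. This assumes scikit-learn and synthetic zonal loads; the PJM data, the IEEE 30-bus mapping, and the paper's random-attack generation algorithm are not reproduced, and the simple load shift used here is only a stand-in for an LR attack.

```python
# Minimal sketch (assumptions: scikit-learn; synthetic loads; toy attack injection).
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR, SVC

rng = np.random.default_rng(5)
n_hours, n_zones = 1000, 5
hour = np.arange(n_hours)
# Hypothetical zonal loads with a daily cycle; features are previous-hour loads per zone.
loads = 100 + 20 * np.sin(2 * np.pi * hour / 24)[:, None] + rng.normal(0, 2, (n_hours, n_zones))
X = loads[:-1]            # previous-hour loads as predictor features
Y = loads[1:]             # next-hour loads as the multi-output target

predictor = MultiOutputRegressor(SVR(C=10.0, epsilon=0.5)).fit(X[:800], Y[:800])
predicted = predictor.predict(X[800:])
measured = Y[800:].copy()

# Inject simple random load-redistribution attacks: shift load between zones
# while keeping the total unchanged, then label those hours as attacked.
labels = np.zeros(len(measured), dtype=int)
attack_rows = rng.choice(len(measured), size=len(measured) // 4, replace=False)
for i in attack_rows:
    shift = 0.1 * measured[i].sum() / n_zones
    measured[i, 0] += shift
    measured[i, 1] -= shift
    labels[i] = 1

# SVM detector: classify hours as attacked/normal from prediction residuals.
residuals = measured - predicted
detector = SVC(kernel="rbf", C=10.0).fit(residuals[:150], labels[:150])
print("detection accuracy:", (detector.predict(residuals[150:]) == labels[150:]).mean())
```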