<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Drifting Streaming Peaks-over-Threshold-Enhanced Self-Evolving Neural Networks for Short-Term Wind Farm Generation Forecast</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10389652</idno>
					<idno type="doi">10.3390/fi15010017</idno>
					<title level='j'>Future Internet</title>
<idno>1999-5903</idno>
<biblScope unit="volume">15</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Yunchuan Liu</author><author>Amir Ghasemkhani</author><author>Lei Yang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper investigates the short-term wind farm generation forecast. It is observed from the real wind farm generation measurements that wind farm generation exhibits distinct features, such as the non-stationarity and the heterogeneous dynamics of ramp and non-ramp events across different classes of wind turbines. To account for the distinct features of wind farm generation, we propose a Drifting Streaming Peaks-over-Threshold (DSPOT)-enhanced self-evolving neural networks-based short-term wind farm generation forecast. Using DSPOT, the proposed method first classifies the wind farm generation data into ramp and non-ramp datasets, where time-varying dynamics are taken into account by utilizing dynamic ramp thresholds to separate the ramp and non-ramp events. We then train different neural networks based on each dataset to learn the different dynamics of wind farm generation by the NeuroEvolution of Augmenting Topologies (NEAT), which can obtain the best network topology and weighting parameters. As the efficacy of the neural networks relies on the quality of the training datasets (i.e., the classification accuracy of the ramp and non-ramp events), a Bayesian optimization-based approach is developed to optimize the parameters of DSPOT to enhance the quality of the training datasets and the corresponding performance of the neural networks. Based on the developed self-evolving neural networks, both distributional and point forecasts are developed. The experimental results show that compared with other forecast approaches, the proposed forecast approach can substantially improve the forecast accuracy, especially for ramp events. The experiment results indicate that the accuracy improvement in a 60 min horizon forecast in terms of the mean absolute error (MAE) is at least 33.6% for the whole year data and at least 37% for the ramp events. 
Moreover, the distributional forecast in terms of the continuous ranked probability score (CRPS) is improved by at least 35.8% for the whole year data and at least 35.2% for the ramp events.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>To reduce the environmental impacts of the electricity system, much progress has been made in integrating renewable energy resources, such as solar and wind. Indeed, a substantial percentage of this renewable integration <ref type="bibr">[1]</ref> comes from wind energy. Large-scale wind power integration has raised new challenges in power system operations, particularly during wind power ramps. Large ramps have a significant influence on system economics and reliability. For instance, the unexpected wind power ramp events that occurred in Texas <ref type="bibr">[2]</ref> caused a significant economic loss, and such cases were also reported in many other countries <ref type="bibr">[3]</ref>.</p><p>In this paper, we aim to develop accurate short-term wind power forecast approaches that account for wind power ramps.</p><p>There are many studies on short-term wind power forecast using time-series models (e.g., the autoregressive model <ref type="bibr">[4]</ref>, autoregressive moving average model <ref type="bibr">[5]</ref>, Gaussian process (GP) <ref type="bibr">[6]</ref>, Kalman filtering (KF) <ref type="bibr">[7]</ref>, and Markov chains <ref type="bibr">[8]</ref>). However, these models cannot effectively capture the non-stationarity and the heterogeneous dynamics of wind farm generation. 
To address the problem of non-stationary wind generation, the empirical mode decomposition (EMD), complementary ensemble empirical mode decomposition (CEEMD) <ref type="bibr">[9]</ref>, improved complete ensemble empirical mode decomposition (iCEEMDAN) <ref type="bibr">[10]</ref>, hybrid model of long short-term memory (LSTM) and variational mode decomposition (VMD) <ref type="bibr">[11,</ref><ref type="bibr">12]</ref>, and ensemble empirical mode decomposition (EEMD)-based hybrid methods <ref type="bibr">[13]</ref> have been proposed, which use Intrinsic Mode Functions (IMFs) as a pre-processing measure and the product of the decomposition components as input for the prediction. However, finding an appropriate number of components or modes is challenging. Recently, artificial intelligence (AI)-based approaches have been applied successfully in many domains (e.g., Computer Vision (CV) <ref type="bibr">[14]</ref>, Natural Language Processing (NLP) <ref type="bibr">[15]</ref>, and Chess Playing <ref type="bibr">[16]</ref>). Different neural network (NN)-based frameworks <ref type="bibr">[11,</ref><ref type="bibr">[17]</ref><ref type="bibr">[18]</ref><ref type="bibr">[19]</ref><ref type="bibr">[20]</ref><ref type="bibr">[21]</ref><ref type="bibr">[22]</ref><ref type="bibr">[23]</ref><ref type="bibr">[24]</ref><ref type="bibr">[25]</ref><ref type="bibr">[26]</ref><ref type="bibr">[27]</ref><ref type="bibr">[28]</ref><ref type="bibr">[29]</ref><ref type="bibr">[30]</ref><ref type="bibr">[31]</ref><ref type="bibr">[32]</ref><ref type="bibr">[33]</ref><ref type="bibr">[34]</ref> have been proposed for the wind generation forecast, e.g., artificial neural networks (ANN) <ref type="bibr">[17]</ref>, a wavelet neural network (WNN) <ref type="bibr">[19]</ref>, an adaptive neuro-fuzzy neural network (ANFIS) <ref type="bibr">[18]</ref>, the long short-term memory (LSTM) model <ref type="bibr">[25]</ref>, a convolutional neural network (CNN) <ref type="bibr">[21,</ref><ref type="bibr">22,</ref><ref type="bibr">26]</ref>, 
radial neural networks <ref type="bibr">[23]</ref>, a fuzzy wavelet neural network <ref type="bibr">[24]</ref>, a deep echo state network <ref type="bibr">[20]</ref>, a genetic LSTM <ref type="bibr">[27]</ref>, a K-shape- and K-means-guided deep convolutional recurrent network <ref type="bibr">[28]</ref>, a dynamic elastic NET (DELNET) <ref type="bibr">[29]</ref>, an attention temporal convolutional network (ATCN) <ref type="bibr">[30]</ref>, a spatio-temporal correlation model (STCM) based on convolutional neural networks long short-term memory (CNN-LSTM) <ref type="bibr">[32]</ref>, the extended deep sequence-to-sequence long short-term memory regression (STSR-LSTM) <ref type="bibr">[33]</ref>, a hybrid model with attention mechanism and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) <ref type="bibr">[34]</ref>, etc.</p><p>Although neural network (NN)-based methods may enhance the forecast accuracy to a certain degree, the existing NN-based approaches may perform poorly during ramp events, simply because the ramp and non-ramp events are not separated when training the NNs. It has been shown that NNs may perform poorly if extreme (or ramp) events are overlooked <ref type="bibr">[35]</ref>. Previous studies <ref type="bibr">[36,</ref><ref type="bibr">37]</ref> have revealed (1) the non-stationary and seasonal dynamics of wind farm generation and (2) the heterogeneous dynamics of non-ramp and ramp events. Moreover, as different classes of wind turbines are deployed in wind farms, we observe that the dynamics of the wind generation of different classes of wind turbines can be different (see Section 2). Thus, if NNs are employed without considering these distinct features, wind farm generation cannot be accurately forecast, especially for ramp events (see Section 4). 
In our previous work, seasonal self-evolving neural networks <ref type="bibr">[38]</ref> were built for different seasons, and ramps were defined using fixed thresholds. However, the dynamics of wind ramps may change within each season, and because of these time-varying dynamics, it is challenging to capture the wind ramps accurately with fixed thresholds. To address this challenge, this paper proposes a dynamic threshold-based approach that can adapt to the time-varying dynamics of wind ramps.</p><p>Specifically, we propose Drifting Streaming Peaks-over-Threshold (DSPOT)-enhanced self-evolving neural networks that account for the time-varying dynamics of different wind turbines' power outputs during non-ramp and ramp events in order to achieve a better wind farm generation prediction. First, the proposed DSPOT approach leverages dynamic ramp thresholds to classify the wind generation data of each class of wind turbines into ramp and non-ramp datasets, which can account for the time-varying dynamics of the ramp and non-ramp events across different classes of wind turbines. Then, different NNs are trained for each dataset to learn the heterogeneous dynamics of the different classes of wind turbines' generation, in which the NeuroEvolution of Augmenting Topologies (NEAT) <ref type="bibr">[39]</ref> is adopted to evolve the NNs in order to obtain the best network topology and weighting parameters. As the efficacy of NNs depends on the quality of the training datasets (i.e., the classification accuracy of the ramp and non-ramp events), a Bayesian optimization-based approach is developed to optimize the parameters of DSPOT to enhance the quality of the training datasets and the corresponding performance of the NNs. Ultimately, the proposed DSPOT-enhanced self-evolving neural networks (see Figure <ref type="figure">1</ref>) form a closed loop for optimizing the performance of the wind generation forecast purely based on the data.  
Real-world wind farm generation measurements often exhibit distinct features, such as the non-stationarity and the heterogeneous dynamics for ramp and non-ramp events across different classes of wind turbines. Employing existing machine learning approaches without considering these features means wind farm generation cannot be accurately forecast, especially for ramp events. The contributions of this paper can be summarized as follows:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>We propose a Drifting Streaming Peaks-over-Threshold (DSPOT)-enhanced self-evolving neural networks-based short-term wind farm generation forecast, an adaptive machine learning approach to wind farm generation forecasting. The proposed framework addresses the challenges of the non-stationarity and the ramp dynamics of wind farm generation and can greatly facilitate the integration of wind generation in the real world.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>The proposed method first classifies the wind farm generation data into the ramp and non-ramp datasets, where time-varying dynamics are captured by utilizing an adaptive thresholding framework to separate the ramp and non-ramp events, based on which different neural networks are trained to learn the dynamics of wind farm generation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>As the efficacy of the neural networks relies on the quality of the training datasets (i.e., the classification accuracy of the ramp and non-ramp events), a Bayesian optimization-based approach is developed to optimize the parameters of the DSPOT algorithm to enhance the quality of the training datasets and the corresponding performance of the neural networks, which enables the model parameters to be adjusted automatically.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>The experimental results show that compared with other forecast approaches, the proposed forecast approach can substantially improve the forecast accuracy, especially for ramp events.</p><p>The remaining parts of this paper are organized as follows. Section 2 elaborates the distinct features of wind farm generation. Section 3 introduces the proposed wind farm generation forecast approach. Section 4 validates the performance of the proposed approach by using the real wind farm generation data. Section 5 summarizes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Data Description and Key Observations</head><p>This paper uses the same real wind generation data from a large wind farm as our previous works <ref type="bibr">[36]</ref><ref type="bibr">[37]</ref><ref type="bibr">[38]</ref><ref type="bibr">40]</ref>. The wind farm has a rated capacity of 300.5 MW, where two classes of wind turbines are installed: Mitsubishi and GE turbines. There are 221 Mitsubishi turbines with a rated capacity of 1 MW each and 53 GE turbines with a rated capacity of 1.5 MW each (see Figure <ref type="figure">2</ref>). Each class of wind turbines has its own power curve as well as cut-in and cut-off speeds. For each class, a meteorological tower (MET), collocated with a wind turbine, is deployed to collect weather information. The instantaneous power outputs of each turbine together with the weather information are saved every 10 min for the years 2009 and 2010. In this paper, we use the power outputs of the Mitsubishi turbines P mit (t) and GE turbines P ge (t), the wind speed W s (t), and the wind direction W dir (t) to develop the proposed NNs.</p><p>From the measurements of the power outputs, we find (1) the non-stationarity of the power measurements and (2) the heterogeneous dynamics of the wind non-ramp and ramp events across each class of turbines, as illustrated in Figure <ref type="figure">3</ref>, where the cumulative distribution functions (CDFs) of the wind power measurements of the two classes of turbines over different seasons of a year and different ramp events are presented. In addition, Figure <ref type="figure">4</ref> shows that the distributions of the ramps differ across time windows l and time periods and follow the generalized Pareto distribution (GPD). In the previous work <ref type="bibr">[38]</ref>, the non-stationarity was considered by developing seasonal self-evolving neural networks, where the ramp events were defined using fixed thresholds. 
As observed from Figure <ref type="figure">4</ref>, fixed thresholds cannot fully capture the dynamics of wind ramp events. To address this challenge, we redefine the ramps by using dynamic thresholds, which change over time based on the dynamics of the ramp events, in order to reduce the forecast error of the wind farm generation, especially for ramp events.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">DSPOT-Enhanced Self-Evolving Neural Networks</head><p>Motivated by the observations in Section 2, we seek to design a short-term wind farm generation forecast method that accounts for not only the heterogeneous dynamics of each class of wind turbines but also the time-varying dynamics of ramp and non-ramp events. Inspired by the success of artificial intelligence (AI) in a wide range of fields, our goal is to use neural networks (NNs) to learn these different dynamics of power outputs. Although there have been several attempts along this line (e.g., ANNs <ref type="bibr">[41]</ref> and LSTM <ref type="bibr">[42]</ref>), these approaches use a single model and overlook the extreme ramp events, which leads to a poor forecast performance, especially for ramp events. Additionally, to train good NNs, it is critical to have high-quality training datasets (i.e., the ramp and non-ramp datasets should be well separated), which is a challenging task due to the time-varying dynamics of ramp and non-ramp events. Further, when training NNs, it is challenging to find the optimal topology as well as the hyperparameters of NNs.</p><p>To tackle these challenges, we propose DSPOT-enhanced self-evolving neural networks, namely the DSN, for the short-term wind farm generation forecast. The idea is to (1) first classify non-ramp and ramp events using DSPOT, which uses dynamic ramp thresholds to account for the time-varying dynamics of non-ramp and ramp events, and (2) then train different NNs for each dataset to learn the heterogeneous generation dynamics of the different classes of wind turbines, where these NNs can self-evolve based on the data, in order to account for the non-stationarity and reduce the overhead of tuning the topology and hyperparameters of NNs.</p><p>The design of our model is illustrated in Figure <ref type="figure">1</ref>. 
The historical data are first classified into non-ramp, ramp-up, and ramp-down datasets by DSPOT, in which dynamic thresholds are determined based on recent observations in a moving window with size d, in order to appropriately define ramp and non-ramp events over time.</p><p>Then, we use NeuroEvolution of Augmenting Topologies <ref type="bibr">[39]</ref> to train NNs using the classified datasets, in which the NNs evolve based on a genetic algorithm to obtain the best topology and hyperparameters of NNs. As a result, six NNs (three for Mitsubishi and three for GE) are built (see Figure <ref type="figure">1</ref>). As the efficacy of NNs relies on the quality of training datasets, i.e., how well different ramp events are labeled, a Bayesian optimization-based method is proposed to optimize the parameters of DSPOT to enhance the quality of the training datasets and the corresponding performance of the NNs. Ultimately, the proposed DSPOT-enhanced self-evolving neural networks form a closed loop for optimizing the performance of wind farm generation forecast purely based on the data. In what follows, the design of each component of the model is described in detail.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">DSPOT-Based Ramp Classifier</head><p>Based on extreme value theory, it is likely that extreme events follow a generalized Pareto distribution (GPD) <ref type="bibr">[43]</ref>, which is observed in wind power ramps in Figure <ref type="figure">4</ref>. Thus motivated, we will develop a data-fitting technique using the GPD model to determine the dynamic threshold z q cat (t) for different ramp events, where the index cat &#8712; {up, down} denotes the category of ramp events and q cat is the quantile of the corresponding ramp event distribution used to determine the threshold z q cat (t). The idea is to first estimate the parameters of the GPD and then use the estimated GPD to find z q cat (t) based on the quantile q cat . To account for the time-varying dynamics of ramp events, the parameters of the GPD will be updated using the recent observed wind power in a moving window with size d.</p><p>Specifically, let P class (t) denote the wind power output at time t, where the index class &#8712; {GE, Mitsubishi} represents the class of wind turbines. In a specified time period l, ramp-up and ramp-down events can be separately expressed as:</p><p>where l and q cat are parameters to be tuned by BO (see Section 3.3) to determine the ramp events. Based on the above definitions of ramp events, we classify the original dataset into ramp-up, ramp-down, and non-ramp datasets, i.e., 3 different datasets for each class of wind turbine. Let X class i , i &#8712; {up, down, non} denote these 3 datasets, where X class up denotes the ramp-up dataset, X class down the ramp-down dataset, and X class non the non-ramp dataset. These datasets will be used to train NNs in Section 3.2. Clearly, the quality of these datasets (i.e., how well different ramp events can be separated) depends on the values of z q up (t) and z q down (t). In this section, we determine z q up (t) and z q down (t) using the GPD model. 
For ease of presentation, we present how to calculate the dynamic threshold z q (t) for ramp-up events by omitting the index cat in the following. Correspondingly, the dynamic threshold for ramp-down events can be determined using the same procedure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Calculating z q (t)</head><p>We derive the log-likelihood of the GPD using the recent observations {&#8710;P l class (t)} d in a moving window with size d:</p><p>where &#947; and &#958; are the parameters of the GPD (&#947; &#8800; 0). To estimate the parameters of the GPD, we find a solution (&#947; * , &#958; * ) of L by solving the following two equations:</p><p>Grimshaw <ref type="bibr">[43]</ref> has shown that if a solution (&#947; * , &#958; * ) of these equations is obtained, the argument &#946; * = &#947; * /&#958; * is a solution to the scalar equation u(&#946;)v(&#946;) = 1, where</p><p>Here, a set Y q = {Y i } is defined for a given quantile q, i.e., Prob(&#8710;P l class (i) &gt; P th q ) = q, where P th q &gt; 0 is the threshold associated with the quantile q. Y q contains all &#8710;P l class (i) larger than P th q with Y i = &#8710;P l class (i) &#8722; P th q &gt; 0. |Y q | denotes the cardinality of Y q . Based on the Grimshaw trick <ref type="bibr">[43]</ref>, &#958; * and &#947; * can be obtained using &#946; * by</p><p>As there are multiple possible solutions of &#946; * , we need to find all the solutions in order to best estimate the GPD parameters (&#947;, &#958;) to fit the distribution of ramp events. Note that 1 + &#946;Y i must be strictly positive. As Y i is positive, we have &#946; * &#8712; (&#8722;1/Y max , +&#8734;). Grimshaw also shows an upper bound &#946; * max :</p><p>where &#562;, Y max , and Y min are the mean, the maximum, and the minimum of Y q , respectively. Therefore, we can perform a numerical root search to find all possible solutions in (&#8722;1/Y max , &#946; * max ), among which we choose the solution that maximizes the likelihood L.</p><p>Based on the estimated GPD, we can calculate z q (t) from the tail probability Prob(&#8710;P l class (i) &gt; z q (t)). 
Based on <ref type="bibr">[44]</ref>, we leverage the probability of the exceedances of &#8710;P l class (i) over the threshold P th q :</p><p>$$\mathrm{Prob}\{\Delta P^{l}_{class}(i) &gt; z_q(t) \mid \Delta P^{l}_{class}(i) &gt; P^{th}_{q}\} = \left(1 + \gamma\,\frac{z_q(t) - P^{th}_{q}}{\xi}\right)^{-1/\gamma}.$$</p><p>As Prob(&#8710;P l class (i) &gt; P th q ) = q, we can solve</p><p>$$\mathrm{Prob}(\Delta P^{l}_{class}(i) &gt; z_q(t)) = q\left(1 + \gamma\,\frac{z_q(t) - P^{th}_{q}}{\xi}\right)^{-1/\gamma} \qquad (11)$$</p><p>based on Bayes' theorem. Using (11), we can obtain z q (t) by solving (11) for z q (t).</p></div>
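<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the role of Equation (11) concrete, the following is a minimal Python sketch, not the implementation used in this paper. It evaluates the tail probability in (11) for given GPD parameters and inverts it algebraically for the threshold. The function names and the argument p_target (the tail probability that the threshold should correspond to) are hypothetical and are introduced only for illustration; the sketch assumes a nonzero shape parameter.</p>
<p><![CDATA[
```python
def gpd_tail_prob(z, p_th, q, gamma, xi):
    """Tail probability of Eq. (11): q * (1 + gamma * (z - p_th) / xi) ** (-1 / gamma).

    gamma (shape, nonzero) and xi (scale) are the fitted GPD parameters,
    p_th is the initial threshold with Prob(dP > p_th) = q.
    """
    base = 1.0 + gamma * (z - p_th) / xi
    if base <= 0.0:
        # z lies beyond the upper end of the GPD support (only possible for gamma < 0)
        return 0.0
    return q * base ** (-1.0 / gamma)


def threshold_for_tail_prob(p_target, p_th, q, gamma, xi):
    """Algebraic inverse of Eq. (11) for z.

    p_target is a hypothetical target tail probability (0 < p_target <= q);
    it is an assumption of this sketch, not a symbol from the text.
    """
    return p_th + (xi / gamma) * ((p_target / q) ** (-gamma) - 1.0)


# Illustrative (made-up) parameter values
z = threshold_for_tail_prob(p_target=0.01, p_th=20.0, q=0.05, gamma=0.2, xi=5.0)
check = gpd_tail_prob(z, p_th=20.0, q=0.05, gamma=0.2, xi=5.0)
```
]]></p>
<p>With p_target equal to q, the inverse returns P th q itself, which is a quick sanity check on the algebra.</p></div>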
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">DSPOT Algorithm</head><p>Given a quantile q cat , the DSPOT algorithm determines the dynamic threshold z q cat (t) using the recent observations. Based on z q cat (t), the wind generation difference &#8710;P l class (t) is labeled as a ramp-up, ramp-down, or non-ramp event, and the most recent wind power measurement P class (t) is added to the corresponding dataset X class i . The details of the DSPOT algorithm are provided in Algorithm 1.</p><p>Specifically, Algorithm 1 first initializes the thresholds z q up (t) and z q down (t) using the first d + l wind power measurements. Then, Algorithm 1 updates z q up (t) and z q down (t) online using the new wind power measurement in the moving window with size d, based on which the new wind power measurement is added to the corresponding dataset X class i . Algorithm 1 is run on the wind power measurements of each class of wind turbines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1 DSPOT</head><p>Input: {P class (t)}, d, l, q up , and q down . Output: X class up , X class down , and X class non .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Initialization:</head><p>(1) Calculate initial thresholds z q up , z q down based on Section 3.1.1 using {P class (t)|t = 1, . . . , d + l}.</p><p>(2) Initialize X class up , X class down , and X class non based on z q up and z q down .</p><p>End Initialization</p><p>For every t &gt; d + l in {P class (t)}:</p><p>(1) Update z q up (t) and z q down (t) based on Section 3.1.1 using the recent observations {&#8710;P l class (t)} d .</p><p>(2) Classify &#8710;P l class (t) based on z q up (t) and z q down (t), and add P class (t) into the corresponding dataset X class i .</p></div>
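<div xmlns="http://www.tei-c.org/ns/1.0"><p>The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this paper: the ramp measure is assumed to be the difference P class (t) minus P class (t - l), the GPD-based threshold of Section 3.1.1 is replaced by a plain empirical quantile (the hypothetical helper empirical_threshold), and the thresholds are computed from the window before the newest difference is added. All function and variable names are hypothetical.</p>
<p><![CDATA[
```python
from collections import deque


def empirical_threshold(values, q):
    """Stand-in for the GPD-based threshold: the (1 - q) empirical quantile.

    This is NOT the GPD fit of Section 3.1.1; it only keeps the sketch self-contained.
    """
    ordered = sorted(values)
    k = min(len(ordered) - 1, int((1.0 - q) * len(ordered)))
    return ordered[k]


def dspot_classify(power, d, l, q_up, q_down, threshold_fn=empirical_threshold):
    """Split measurements into ramp-up, ramp-down and non-ramp sets with moving-window thresholds."""
    datasets = {"up": [], "down": [], "non": []}
    # Assumed ramp measure: diffs[i] = P(i + l) - P(i), i.e. the change over l steps
    diffs = [power[t] - power[t - l] for t in range(l, len(power))]
    # Initialization: the first d differences come from the first d + l measurements
    window = deque(diffs[:d], maxlen=d)
    for idx in range(d, len(diffs)):
        dp = diffs[idx]
        # Update both thresholds from the most recent d differences
        z_up = threshold_fn([max(v, 0.0) for v in window], q_up)
        z_down = threshold_fn([max(-v, 0.0) for v in window], q_down)
        if dp > z_up:
            label = "up"
        elif -dp > z_down:
            label = "down"
        else:
            label = "non"
        t = idx + l  # time index of the newest measurement
        datasets[label].append((t, power[t]))
        window.append(dp)  # slide the window forward
    return datasets


# Toy series: flat at 10, a jump to 50, then a drop back to 10
series = [10.0] * 20 + [50.0] * 6 + [10.0] * 6
result = dspot_classify(series, d=10, l=1, q_up=0.1, q_down=0.1)
```
]]></p>
<p>Passing a different threshold_fn, for example one built on a fitted GPD, changes only how the thresholds are computed and leaves the streaming loop unchanged.</p></div>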
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Self-Evolving Neural Network</head><p>A self-evolving neural network (SEN) will be built for each dataset, X class up , X class down , and X class non . When training the neural networks (NNs), each element P class (t + 1) in X class i is treated as the label and the corresponding features contain the wind speed W s (t), the change in wind direction degree W dir (t), and the current and lagged power measurements {P class (t), P class (t &#8722; 1), . . . , P class (t &#8722; Lag)}, where Lag depends on the measurements (see the discussion in Section 4.1). As demonstrated in Figure <ref type="figure">5</ref>, NEAT <ref type="bibr">[39]</ref> is used to train an NN. NEAT leverages a genetic algorithm (GA) to evolve the NN. It obtains the best network topology and the best weighting parameters by minimizing the forecast error, i.e., $\min \sum_t (\hat{P}_{class}(t) - P_{class}(t))^2$, where $\hat{P}_{class}(t)$ denotes the forecast from the NN.</p><p>As demonstrated in Figure <ref type="figure">5</ref>, the workflow of NEAT contains random population generation, crossover, mutation, speciation, and evaluation by the fitness function. In this paper, the fitness function is defined in terms of the forecast accuracy. Each gene in the population set corresponds to a neural network. We aim to find the best gene with the largest fitness value (i.e., the lowest prediction error). In NEAT, the topology of an NN is directly encoded into the gene by a direct encoding scheme <ref type="bibr">[45]</ref> in order to avoid the Permutations Problem <ref type="bibr">[46]</ref> and the Competing Conventions Problem <ref type="bibr">[47]</ref>. Specifically, connections and nodes (the lists of inputs, hidden nodes, and outputs) are encoded. Every connection gene unit describes the connection weight (W), output node (O), input node (I), enable gate (E), and the innovation number (N), which corresponds to the consecutive order in which new nodes are generated. 
The workflow of NEAT is elaborated in the following.</p><p>First, an initial population (i.e., a set of genes) is generated randomly. Each gene represents an NN. Note that under this random generation, a neural network might contain no route from inputs to outputs, and we remove such NNs from the initial population. For example, Figure <ref type="figure">6</ref> shows an NN containing 3 inputs (P class (t), W s (t), W dir (t)) and 1 output ($\hat{P}_{class}(t + 1)$), where in the first connection gene unit, I:1 O:5 W:0.5 indicates a connection from Node 1 to Node 5 with a weight of 0.5, and E:1 means that this is an enabled connection. After generating the initial population, NEAT iteratively optimizes the topology and connection weights of NNs using crossover and mutation. Specifically, nodes and connections of NNs are inserted or removed randomly based on the Poisson distribution <ref type="bibr">[39]</ref>. For example, Figures <ref type="figure">7</ref> and <ref type="figure">8</ref> show possible mutations by appending a connection and a node to a neural network, respectively. After crossover and mutation, topologically homogeneous genes are grouped into one species according to the compatibility distance <ref type="bibr">[39]</ref>.</p><p>Then, the fitness of the species is evaluated. If the highest fitness of the species does not increase or the maximum number of generations is reached, NEAT outputs the species with the highest fitness value, which will be used for the wind generation forecast.   </p><p>where F class </p></div>
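<div xmlns="http://www.tei-c.org/ns/1.0"><p>The connection gene fields (I, O, W, E, N) and the removal of networks without a route from inputs to outputs can be illustrated with the following Python sketch. It is a simplified illustration, not the NEAT implementation used in this paper: the class ConnectionGene, the function has_route, and the node numbering (inputs 1 to 3, output 4, hidden node 5) are assumptions made for the example.</p>
<p><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class ConnectionGene:
    in_node: int     # I: source node of the connection
    out_node: int    # O: target node of the connection
    weight: float    # W: connection weight
    enabled: bool    # E: whether the connection is active
    innovation: int  # N: innovation number of the gene


def has_route(genes, input_nodes, output_nodes):
    """Return True if enabled connections link at least one input to at least one output."""
    adjacency = {}
    for gene in genes:
        if gene.enabled:
            adjacency.setdefault(gene.in_node, []).append(gene.out_node)
    frontier = list(input_nodes)
    seen = set(frontier)
    while frontier:  # depth-first search over enabled connections
        node = frontier.pop()
        if node in output_nodes:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Hypothetical genome: Node 1 -> Node 5 (weight 0.5, as in the text) and Node 5 -> Node 4
genome = [
    ConnectionGene(in_node=1, out_node=5, weight=0.5, enabled=True, innovation=1),
    ConnectionGene(in_node=5, out_node=4, weight=0.3, enabled=True, innovation=2),
]
keep = has_route(genome, input_nodes={1, 2, 3}, output_nodes={4})
```
]]></p>
<p>A genome for which has_route returns False would be dropped from the initial population in this sketch, since its output cannot depend on any input.</p></div>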
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Short-Term Wind Farm Generation Forecast</head><p>The proposed DSPOT-enhanced self-evolving neural networks (DSN) will train multiple NNs, which capture different dynamics of wind farm generation. When forecasting wind farm generation, we first leverage the DSPOT-based ramp classifier to determine whether the current state of wind farm generation is ramp-up, ramp-down, or non-ramp. Based on the classified state, we choose the corresponding NNs to forecast the wind farm generation.</p><p>Specifically </p><p>Based on the results of the ramp classifier, we pick the corresponding NNs (i.e., the best gene) for each class of wind turbines. Therefore, the wind farm generation forecast $\hat{P}_{ag}(t + 1)$ can be obtained by $\hat{P}_{ag}(t + 1) = \hat{P}_{mit}(t + 1) + \hat{P}_{ge}(t + 1)$. (<ref type="formula">16</ref>)</p><p>Equation (<ref type="formula">16</ref>) is the point forecast of wind farm generation. Distributional forecasts are often needed to manage the uncertainty <ref type="bibr">[49]</ref>. To this end, we leverage the collection of genes generated in NEAT and use the forecasts by these genes to develop distributional forecasts. Let {$\hat{P}^{(j)}_{ag}(t)$} represent the set of forecasts offered by each gene j. It is assumed that the forecast error of the point forecasts follows a normal distribution with the mean &#181; t and the variance &#963; t 2 as follows:</p><p>where J is the number of genes. Under this assumption, we calculate the (1 &#8722; &#945;) confidence interval of the point forecasts (<ref type="formula">16</ref>) as follows:</p><p>where Z(1 &#8722; &#945;/2) represents the point where the cumulative distribution function of the standard normal distribution equals 1 &#8722; &#945;/2.</p><p>Remark 1. The proposed SENs can be trained offline. As the learning process of each SEN is based on different datasets, we can train these SENs in parallel. This can significantly reduce the training time of these SENs. 
Furthermore, the learning of SENs needs no AI experts to manually tune the topology and the hyperparameters; SENs can automatically adapt to the changing dynamics of wind farm generation purely based on the data. This can greatly facilitate the implementation of the proposed method in practice.</p><p>The data used in the case studies are described in Section 2. Specifically, we use the data of year 2009 to train the proposed SENs and the data of year 2010 to validate the forecast performance of the proposed approach.</p></div>
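<div xmlns="http://www.tei-c.org/ns/1.0"><p>The point and distributional forecasts of Section 3.4 can be illustrated with a short Python sketch. It is a sketch under assumptions, not the implementation used in this paper: the mean and variance are taken as the sample mean and the population variance (1/J normalization) of the forecasts of the J genes, and the interval is centered on that mean. The function names and the numbers in the example are hypothetical.</p>
<p><![CDATA[
```python
from statistics import NormalDist, fmean, pstdev


def farm_point_forecast(forecast_mit, forecast_ge):
    """Eq. (16): the farm-level forecast is the sum of the two turbine-class forecasts."""
    return forecast_mit + forecast_ge


def prediction_interval(gene_forecasts, alpha=0.05):
    """(1 - alpha) interval from the spread of the forecasts of the individual genes.

    Assumptions of this sketch: normality, 1/J variance normalization,
    and an interval centered on the mean of the gene forecasts.
    """
    mu = fmean(gene_forecasts)
    sigma = pstdev(gene_forecasts)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z(1 - alpha/2)
    return mu - z * sigma, mu + z * sigma


# Made-up forecasts in MW
point = farm_point_forecast(62.0, 48.0)
low, high = prediction_interval([100.0, 110.0, 120.0], alpha=0.05)
```
]]></p>
<p>Because each gene is evaluated independently, the forecasts fed to prediction_interval can be collected from the final NEAT population without retraining.</p></div>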
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">Evaluation Metrics</head><p>Mean absolute error (MAE) and root mean square error (RMSE) are employed to evaluate the forecast performance, i.e.,</p><p>$$\mathrm{MAE} = \frac{1}{N_t}\sum_{t=1}^{N_t}\left|\hat{P}_{ag}(t) - P_{ag}(t)\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N_t}\sum_{t=1}^{N_t}\left(\hat{P}_{ag}(t) - P_{ag}(t)\right)^{2}},$$</p><p>where N t is the number of data points in the test dataset.</p></div>
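<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the two metrics can be computed as in the following Python sketch, which uses the standard definitions of MAE and RMSE. The optional normalization by the 300.5 MW rated capacity mirrors how the results are reported in Section 4.2; the function names and the sample values are hypothetical.</p>
<p><![CDATA[
```python
import math


def mae(forecast, actual):
    """Mean absolute error over paired forecast and measurement sequences."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def rmse(forecast, actual):
    """Root mean square error over paired forecast and measurement sequences."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


def normalize(error_mw, capacity_mw=300.5):
    """Express an error in MW as a fraction of the rated capacity of the wind farm."""
    return error_mw / capacity_mw


# Made-up values in MW
forecast = [100.0, 200.0, 300.0]
actual = [110.0, 190.0, 300.0]
scores = (normalize(mae(forecast, actual)), normalize(rmse(forecast, actual)))
```
]]></p></div>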
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3.">Parameter Tuning</head><p>As discussed in Section 3, the forecast performance of NNs greatly depends on the quality of training datasets, which hinges on the parameters (l, d, q up , q down ) and Lag. To find the best (l, d, q up , q down ), Algorithm 2 is run with 200 attempts.</p><p>To optimize Lag, we evaluate MAE under different values of Lag (see Figure <ref type="figure">9</ref>) and pick the one with the lowest MAE. It is observed that the lowest MAE is achieved when the feature dimension is 9 (i.e., Lag = 7). The seasonal self-evolving neural networks (SSEN) model <ref type="bibr">[38]</ref>.</p><p>The seasonal NEAT model considers four seasons, but it does not split ramp and non-ramp events in the training process, which would lead to a poor performance when ramp events occur. We use a prevailing structure of three layers to build the LSTM with the same configuration in <ref type="bibr">[38]</ref>. The fully connected ANN is used, which includes three layers, and each layer contains 30 nodes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experimental Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">10 min ahead Forecast</head><p>In Tables <ref type="table">1</ref> and <ref type="table">2</ref>, we compare the 10 min ahead forecast under different models for the whole year data and ramp events in the year 2010, respectively. The forecast results in terms of the MAE and RMSE are normalized using the nominal capacity of 300.5 MW of the wind farm. From Tables <ref type="table">1</ref> and <ref type="table">2</ref>, we observe that the proposed approach (DSN) outperforms the benchmarks. Compared with the non-NN-based benchmarks (the AR, MC, and SVM-MC), the proposed approach improves the MAE by at least 24.9% for the whole year data and by at least 13.8% for the ramp events. Compared with the NN-based benchmarks, the improvement of the proposed approach (DSN) in terms of the MAE is at least 2.5% for the whole year data and at least 1.3% for the ramp events. These improvements result from splitting the non-ramp and ramp events, which enables the DSN to more effectively learn the different dynamics of the GE and Mitsubishi turbine measurements under non-ramp and ramp events.</p><p>Figures 10-12 illustrate the prediction intervals for three representative ramp events. The first event, on 5 January 2010, was chosen because of a wind power ramp-up event from 4 a.m. to 5 a.m. with a ramp-up rate of 85 megawatts per hour (MW/h). The second event, on 19 March 2010, was chosen because of the significant wind power fluctuation from 7 p.m. to 9 p.m., with both ramp-up and ramp-down events at an average ramp rate of around 100 MW/h. The final event, on 9 October 2010, was chosen because of a remarkable ramp-down event from 3 a.m. to 5 a.m. with an average ramp rate of 66.5 MW/h. As shown in these figures, the actual wind farm generation mostly stays within the prediction interval obtained from <ref type="bibr">(19)</ref>, regardless of the sharp ramps.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Other Forecasting Horizons</head><p>In Tables <ref type="table">3</ref> and <ref type="table">4</ref>, we compare the forecasts of different models under different horizons using the whole year data and ramp events in the year 2010, respectively. From Tables <ref type="table">3</ref> and <ref type="table">4</ref>, we observe that the proposed approach outperforms the benchmarks under these forecasting horizons. In most cases, seasonal NEAT performs worse than NEAT (trained using the entire year data). This is because the amount of data in a single season is not enough to train a good NN compared with the entire year data.</p><p>For the 30 min ahead forecast, compared with the non-NN-based benchmarks (the AR, MC, and SVM-MC), the proposed approach improves the MAE by at least 20.6% for the whole year data and by at least 17.7% for the ramp events. Compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the MAE by at least 19.4% for the whole year data and by at least 22.8% for the ramp events. For the 40 min ahead forecast, compared with the non-NN-based benchmarks (the AR, MC, and SVM-MC), the proposed approach improves the MAE by at least 32.2% for the whole year data and by at least 33.5% for the ramp events. Compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the MAE by at least 27.8% for the whole year data and by at least 31.6% for the ramp events.</p><p>For the 50 min ahead forecast, compared with the non-NN-based benchmarks (the AR, MC, and SVM-MC), the proposed approach improves the MAE by at least 37.1% for the whole year data and by at least 34% for the ramp events. 
Compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the MAE by at least 31.6% for the whole year data and by at least 27.7% for the ramp events.</p><p>For the 60 min ahead forecast, compared with the non-NN-based benchmarks (the AR, MC, and SVM-MC), the proposed approach improves the MAE by at least 41.9% for the whole year data and by at least 44.1% for the ramp events. Compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the MAE by at least 33.6% for the whole year data and by at least 37% for the ramp events.</p></div>
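<div xmlns="http://www.tei-c.org/ns/1.0"><p>The "at least" figures quoted above can be read as the smallest relative MAE reduction over a group of benchmarks. The text does not spell out the formula, so the sketch below assumes the usual relative reduction, (benchmark MAE minus proposed MAE) divided by benchmark MAE. The function names and the numbers are hypothetical and are not values from the tables.</p><p><![CDATA[
```python
def improvement_pct(benchmark_mae, proposed_mae):
    """Relative MAE reduction of the proposed model versus one benchmark, in percent."""
    return 100.0 * (benchmark_mae - proposed_mae) / benchmark_mae


def at_least_improvement(benchmark_maes, proposed_mae):
    """Smallest improvement over a group of benchmarks (the 'at least' figure)."""
    return min(improvement_pct(m, proposed_mae) for m in benchmark_maes.values())


# Hypothetical normalized MAEs; these are NOT values from the paper's tables.
benchmarks = {"benchmark_a": 0.10, "benchmark_b": 0.08}
proposed = 0.06
for name, value in benchmarks.items():
    print(name, round(improvement_pct(value, proposed), 1))
print("at least", round(at_least_improvement(benchmarks, proposed), 1))
```
]]></p><p>The minimum is always set by the strongest benchmark in the group, so the "at least" figure is the conservative one to quote.</p></div>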
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.3.">Distributional Forecast</head><p>The continuous ranked probability score (CRPS) is used to evaluate the performance of the proposed distributional forecasts. The CRPS is defined as:</p><p>$$\mathrm{CRPS} = \frac{1}{N_t} \sum_{t=1}^{N_t} \int_{-\infty}^{\infty} \left( F_t(x) - U\big(x - P_{ag}(t)\big) \right)^2 dx,$$</p><p>where $F_t(x)$ is the cumulative distribution function (CDF) obtained by using the distributional forecast and $N_t$ is the number of data points in the test dataset. In addition, $U(\cdot)$ is a unit step function that equals 1 if $x &gt; P_{ag}(t)$ and 0 otherwise. Generally, the lower the CRPS, the more accurate the distributional forecast is. In Tables <ref type="table">5</ref> and <ref type="table">6</ref>, we compare the forecasts of different NN-based models under different horizons using the whole year data and ramp events in the year 2010, respectively. The results of the non-NN models can be found in <ref type="bibr">[37]</ref>. We observe that our model performs much better than the other benchmarks for longer prediction horizons (normally longer than 30 min), where the wind ramps are large, while the performance is similar for the 10 min forecast. This indicates the superior performance of the proposed method in handling the uncertainty of the wind. 
For the 30 min ahead forecast, compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the CRPS by at least 24.9% for the whole year data and by at least 21.7% for the ramp events.</p><p>For the 40 min ahead forecast, compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the CRPS by at least 26% for the whole year data and by at least 30% for the ramp events.</p><p>For the 50 min ahead forecast, compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the CRPS by at least 33.3% for the whole year data and by at least 38.8% for the ramp events.</p><p>For the 60 min ahead forecast, compared with the NN-based benchmarks (NEAT, SNEAT, LSTM, ANN, and SSEN), the proposed approach improves the CRPS by at least 35.8% for the whole year data and by at least 35.2% for the ramp events.</p></div>
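<div xmlns="http://www.tei-c.org/ns/1.0"><p>For a normal distributional forecast, the CRPS integral has a well-known closed form, which the sketch below compares against a direct numerical evaluation of the integral of the squared difference between the forecast CDF and the unit step. Averaging over time steps, the function names, and the sample values are assumptions made for illustration. Whether the case study used the closed form or numerical integration is not stated.</p><p><![CDATA[
```python
import math
from statistics import NormalDist


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast against observation y."""
    std = NormalDist()
    z = (y - mu) / sigma
    return sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z)
                    - 1 / math.sqrt(math.pi))


def crps_numeric(mu, sigma, y, steps=20000):
    """Midpoint-rule evaluation of the integral of (F(x) - U(x - y))^2."""
    dist = NormalDist(mu, sigma)
    lo = min(mu, y) - 10 * sigma
    hi = max(mu, y) + 10 * sigma
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        step = 1.0 if x > y else 0.0  # unit step U(x - y)
        total += (dist.cdf(x) - step) ** 2 * dx
    return total


def mean_crps(mus, sigmas, observations):
    """Average the per-step CRPS over all time steps (assumed aggregation)."""
    scores = [crps_gaussian(m, s, y) for m, s, y in zip(mus, sigmas, observations)]
    return sum(scores) / len(scores)


# Hypothetical forecast mean and standard deviation (MW) and one observation.
print(round(crps_gaussian(100.0, 15.0, 120.0), 3))
print(round(crps_numeric(100.0, 15.0, 120.0), 3))
```
]]></p><p>The two printed values should agree up to the discretization error of the midpoint rule. The closed form is far cheaper when a full year of forecasts has to be scored.</p></div>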
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.4.">Model Updating</head><p>The training time for the self-evolving NN depends on the number of training samples. In our case, the number of ramp-up and ramp-down events in the training datasets is less than 4000, and updating the corresponding models takes only about 3-5 min on a machine with dual-socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz processors. The updating time is much less than the forecasting horizons, and therefore our model can work well in practice.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.5.">Discussions</head><p>Based on the experimental results, we observe that the NN-based models outperform the non-NN-based models. By breaking the training datasets into ramp and non-ramp training datasets for distinct classes of wind turbines, the performance of the NNs can be improved.</p><p>Further, the proposed DSPOT-based ramp classifier can better split the ramp and non-ramp events using dynamic thresholds and therefore better capture the heterogeneous dynamics of wind farm generation. Moreover, the proposed DSN can automatically adapt to the changing dynamics of wind farm generation over time, and the model updating time for the DSN is low. Specifically, the number of ramp-up and ramp-down events in the training datasets is less than 4000, and updating the corresponding models takes only about 3-5 min on a machine with dual-socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz processors. The updating time is much less than the forecasting horizons, and therefore our model can work well in practice.</p><p>As shown in the experiments, the performance improvement for the point forecast and the distributional forecast is smaller in the 10 min horizon and larger in the 60 min horizon. This might be because some baseline models (e.g., the AR, SNEAT, LSTM, and ANN) do not consider ramp events, which leads to greater forecast degradation at longer prediction horizons. Although the SSEN in our previous work <ref type="bibr">[38]</ref> considered the ramp events, it uses a fixed threshold to distinguish them. Because our proposed framework adjusts the ramp thresholds dynamically, its accuracy is superior to that of the existing benchmarks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>We develop the DSPOT-enhanced self-evolving neural networks for the short-term wind power forecast. Specifically, the proposed approach first classifies the wind farm generation data into ramp and non-ramp datasets using DSPOT, which leverages dynamic ramp thresholds to account for the time-varying dynamics of the ramp and non-ramp events. We then train different NNs on each dataset to learn the different dynamics of wind farm generation using NEAT, which is able to obtain the best network topology and weighting parameters. As the efficacy of the neural networks relies on the quality of the training datasets (i.e., the classification accuracy of the ramp and non-ramp events), a Bayesian optimization-based approach is developed to optimize the parameters of DSPOT to enhance the quality of the training datasets and the corresponding performance of the neural networks. The experimental results show that the proposed approach outperforms other forecast approaches.</p><p>In future work, we plan to leverage generative adversarial network (GAN)-based models to better classify the ramp events, which in turn would improve the quality of the training datasets.</p></div></body>
		</text>
</TEI>
