A combined model for short-term wind speed forecasting based on empirical mode decomposition, feature selection, support vector regression and cross-validated lasso
 Academic Editor
 Zhiwei Gao
 Subject Areas
 Data Mining and Machine Learning, Data Science
 Keywords
 Wind speed forecasting, Empirical mode decomposition, Feature selection, Support vector regression, Cross-validated lasso, Multi-step wind speed forecasting
 Copyright
 © 2021 Wang
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
 Cite this article
 2021. A combined model for short-term wind speed forecasting based on empirical mode decomposition, feature selection, support vector regression and cross-validated lasso. PeerJ Computer Science 7:e732 https://doi.org/10.7717/peerj-cs.732
Abstract
Background
The planning and control of wind power production rely heavily on short-term wind speed forecasting. Because wind is nonlinear and nonstationary, traditional wind speed forecasting models struggle to model and predict it accurately.
Methods
In this paper, we combine empirical mode decomposition (EMD), feature selection (FS), support vector regression (SVR) and cross-validated lasso (LassoCV) into a new wind speed forecasting model, with the aim of improving prediction performance. EMD extracts the intrinsic mode functions (IMFs) from the original wind speed time series to remove its nonstationarity. FS and SVR are combined to predict the high-frequency IMF obtained by EMD. LassoCV predicts the low-frequency IMFs and the trend.
Results
Data collected from two wind stations in Michigan, USA are used to test the proposed combined model. Experimental results show that, in multi-step wind speed forecasting, the proposed model achieves better prediction performance than the classic individual models and the traditional EMD-based combined models.
Conclusions
Through the proposed combined model, the wind speed forecast can be effectively improved.
Introduction
As a sustainable and renewable alternative to traditional fossil fuels, wind power has attracted widespread attention and developed rapidly in recent years (Hu et al., 2018). According to the statistical report of the Global Wind Energy Council, the world capacity is about 650.8 GW (Fu et al., 2020), of which 59.7 GW was installed in 2019 (Global Wind Energy Council, 2020). However, as the share of grid-connected wind power increases, the stability of the power system will be challenged (Liu et al., 2018a), because wind power output is closely tied to the nonstationarity of wind speed. Accurate wind speed forecasting supports wind power planning and control, and can even help reduce the impact of unexpected events on the stability of the power system (Liu et al., 2018b). However, because wind is nonlinear and nonstationary, it is difficult to establish a satisfactory wind speed forecasting model. To this end, researchers have made great efforts to improve forecasting performance from different angles, including basic predictive models, preprocessing methods, and combined or hybrid strategies.
For basic predictive models, a variety of methods have been presented, mainly physical models, statistical models, and machine learning. Physical models usually use physical parameters such as temperature and pressure to predict wind speed (Heng et al., 2016). Numerical Weather Prediction (NWP) is one of the representative technologies. However, because the correlation between these physical parameters and short-term wind speed is weak, this type of model is suited to medium- and long-term wind speed forecasting rather than short-term forecasting. In short-term forecasting, wind speed is generally predicted by analyzing the inherent patterns of historical wind speed data (Chen et al., 2018; Liu et al., 2018b).
Statistical models, which use historical data to predict wind speed, are widely used in short-term wind speed forecasting. Commonly used statistical models include the autoregressive (AR) model (Lydia et al., 2016a), the autoregressive moving average (ARMA) model (Torres et al., 2005) and the autoregressive integrated moving average (ARIMA) model (Wang & Hu, 2015). Kavasseri & Seetharaman (2009) proposed an f-ARIMA model for wind speed forecasting and reported that it significantly improved prediction accuracy over the persistence model. Ait Maatallah et al. (2015) developed a Hammerstein autoregressive model to predict wind speed and showed that it achieves a lower root mean square error (RMSE) than ARIMA and ANN. Poggi et al. (2003) developed an AR-based model to predict wind speeds at three Mediterranean sites in Corsica and showed that the synthetic time series retain the statistical characteristics of the wind speeds. Lydia et al. (2016b) presented a short-term wind speed forecasting model that combines linear AR and nonlinear AR. In general, statistical models assume that the data are linear, whereas wind speed series are nonlinear, so these methods cannot effectively capture the nonlinear characteristics of wind.
To address this problem, researchers have introduced machine learning into wind speed forecasting. Machine learning is normally used either as the predictive model or for parameter optimization, and mainly includes evolutionary algorithms, the extreme learning machine (ELM), ANN and SVM. Wang (2017) presented a wind speed forecasting model that combines SVM and particle swarm optimization (PSO). Zhang et al. (2019) combined online sequential outlier robust ELM with hybrid mode decomposition (HMD) to predict wind speed. Wang, Li & Bai (2018) developed an error correction-based ELM model for short-term wind speed forecasting. Liu et al. (2020) introduced the Jaya-SVM (Jaya algorithm-based support vector machine) into wind speed forecasting. Krishnaveny et al. (Nair, Vanitha & Jisma, 2017) examined the performance of three different models, i.e., ANN, ARIMA and a hybrid model, in wind speed forecasting. Azeem et al. (2018) investigated KNN-based and ANN-based models for wind speed forecasting. Recently, deep learning, a newer branch of machine learning, has received extensive attention and has been widely used for regression and classification problems. According to the literature, deep learning can abstract the hidden structure and inherent characteristics of data better than shallow methods. Khodayar & Wang (2019) introduced a scalable graph convolutional deep learning architecture (GCDLA) for wind speed forecasting. Wang et al. (2016a) investigated a deep belief network model for wind speed forecasting. Khodayar & Wang (2019) presented a wind speed forecasting model that combines rough set theory and restricted Boltzmann machines. Hong & Satriani (2020) developed a day-ahead wind speed forecasting model based on a convolutional neural network. Although researchers claim that deep learning can achieve better performance, these methods are computationally intensive and prone to overfitting on small data sets.
In addition to these basic forecasting models, preprocessing methods such as feature selection (FS) have also been introduced into wind speed forecasting. In short-term wind speed forecasting, lagged values of the historical wind speed are usually used as features, which may introduce a certain degree of redundancy. FS selects the best inputs for the basic predictive model so that the model generalizes better (Li et al., 2018a). For example, Paramasivan & Lopez (2016) employed a ReliefF feature selection algorithm to identify key features, and then used a bagging neural network to predict the wind speed. Niu et al. (2018) presented a multi-step wind speed forecasting model using optimal FS, a modified bat algorithm and a cognition strategy. Botha & Walt (2017) combined FS with SVM to predict short-term wind speed. Kong et al. (2015) combined feature selection and reduced support vector machines (RSVM) for wind speed forecasting.
Because wind is inherently unstable, combined or hybrid models built on signal processing techniques have become the mainstream of wind speed forecasting. In these models, signal processing is usually employed to decompose the wind speed series so as to reduce or eliminate its instability. Commonly used signal processing techniques include empirical mode decomposition (EMD), variational mode decomposition (VMD) and the wavelet transform (WT). Wang et al. (2016b) decomposed wind speed into stable signals using ensemble empirical mode decomposition (EEMD). Sun & Wang (2018) developed a fast ensemble empirical mode decomposition model to improve the accuracy of wind speed forecasting. Tascikaraoglu et al. (2016) proposed a wind speed forecasting model based on WT. Hu & Wang (2015) adopted an empirical wavelet transform (EWT) to extract key information from wind speed time series. Yu, Li & Zhang (2017) explored the performance of EMD, EEMD and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) in wind speed forecasting.
In the field of wind speed forecasting, there are three main forecast scenarios: short-term, medium-term and long-term forecasting. Among them, short-term wind speed forecasting is essential for estimating power generation, yet it is difficult to do accurately because wind speed is nonlinear and unstable. Therefore, in this study, we develop a new model for short-term wind speed forecasting. Its originality lies in combining EMD, FS, SVR and cross-validated Lasso (LassoCV) for multi-step wind speed forecasting. The framework of our study is as follows: (a) EMD is used to extract the intrinsic mode functions (IMFs) from the original wind speed time series; (b) FS and SVR are combined to predict the high-frequency IMF; (c) LassoCV is used to predict the low-frequency IMFs and the trend.
The main contributions of the research are as follows:

A novel model based on EMD, FS, SVR and LassoCV is proposed to improve the accuracy of multi-step wind speed forecasting, where EMD is used to extract IMFs from the original wind speed data to reduce the nonstationarity of wind speed.

By the principle of EMD, the first IMF component contains most of the high-frequency information, and predicting it usually requires an algorithm with good generalization performance. We combine FS and SVR to predict the high-frequency IMF (i.e., the first IMF) component.

Compared with the first IMF component, the other IMF components produced by EMD have much lower frequencies and follow sine-like curves, for which linear regression usually performs better. We therefore use LassoCV to predict the low-frequency IMFs and the trend.
The paper is organized as follows: ‘Methods’ introduces the framework of the proposed model and the principles involved. ‘Results’ describes the experimental data used in the paper and the comparison with the classic individual models. ‘Discussion’ discusses the effectiveness of EMD. ‘Conclusion’ concludes the study.
Methods
The whole process of the proposed model
The architecture of our proposed model is shown in Fig. 1. The whole process is as follows:

Use EMD to decompose the wind speed into a series of IMFs. The EMD algorithm is introduced in ‘Empirical mode decomposition’.

Combine FS and SVR to predict the high-frequency IMF obtained by EMD. The FS and SVR algorithms are described in ‘Feature selection’ and ‘Support vector regression’, respectively.

Use LassoCV to predict the low-frequency IMFs and the trend. The LassoCV algorithm is described in ‘Cross-validated lasso’.

Performance evaluation. The performance indicators are introduced in ‘Prediction performance criteria’, and the experimental results and analysis are given in ‘Results’ and ‘Discussion’.
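The routing described in these steps can be illustrated with a minimal sketch. It assumes the components arrive ordered from high to low frequency with the trend last, and that each forecaster is a plain callable returning the next value; the names `combine_forecasts`, `last_value` and `linear_extrapolation` are hypothetical stand-ins, not the FS+SVR and LassoCV models used in the paper.

```python
from typing import Callable, Sequence

Forecaster = Callable[[Sequence[float]], float]


def combine_forecasts(components: Sequence[Sequence[float]],
                      high_freq_model: Forecaster,
                      low_freq_model: Forecaster) -> float:
    """Forecast each component separately and add the forecasts.

    components: IMFs ordered from high to low frequency, trend last.
    The first component goes to high_freq_model (FS + SVR in the paper),
    all remaining ones to low_freq_model (LassoCV in the paper).
    """
    if not components:
        raise ValueError("at least one component is required")
    total = 0.0
    for index, series in enumerate(components):
        model = high_freq_model if index == 0 else low_freq_model
        total += model(series)
    return total


# Toy stand-ins (assumptions for illustration only).
def last_value(series: Sequence[float]) -> float:
    return series[-1]


def linear_extrapolation(series: Sequence[float]) -> float:
    return 2 * series[-1] - series[-2]


components = [[0.1, -0.2, 0.3], [1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
next_speed = combine_forecasts(components, last_value, linear_extrapolation)
```

Because the decomposition is additive, summing the per-component forecasts yields a forecast of the original series.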
Empirical mode decomposition
Because wind speed is nonstationary and intermittent, it is difficult to predict future wind speed directly. One possible solution is to decompose the chaotic wind data into different frequency components (Bokde et al., 2019) and predict each with a separate model. Following this idea, the study uses signal processing to decompose the wind speed. Common signal decomposition algorithms include the wavelet transform, morphology filters, EMD and many others. The wavelet transform is not adaptive: it depends on the prior choice of a mother wavelet, which limits its ability to extract nonlinear and nonstationary components from the data. Similarly, morphology filters require the shape and length of the structural element to be selected, for which there is no uniform standard, so the choice depends on human experience. EMD, in contrast, has received great attention from researchers because it performs well and is easy to understand. Therefore, in this study, we use EMD to preprocess the wind speed.
EMD is essentially a nonlinear signal analysis method that can handle nonlinear and nonstationary time series (Huang et al., 1998). EMD uses the timescale characteristics of the data to decompose the signal, and does not need to set any basis functions in advance. In theory, EMD can be applied to any type of signal. Since EMD was proposed, it has been rapidly applied to many different engineering fields such as marine and atmospheric research, seismic record analysis and mechanical fault diagnosis (Gao & Liu, 2021).
The basic idea of EMD is to decompose a nonstationary time series into a series of IMFs along with a residue (Huang et al., 1998). An IMF should satisfy two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) the mean of the upper envelope and the lower envelope must be zero (Ziqiang & Puthusserypady, 2007). Let $s\left(t\right)$, $t = 1, 2, \dots, l$, be a time series. The EMD decomposition steps are as follows:
Step 1: Identify the local minima and maxima of the time series.
Step 2: Use cubic splines to interpolate the local minima and maxima to generate the lower envelope ${s}_{l}\left(t\right)$ and the upper envelope ${s}_{u}\left(t\right)$.
Step 3: Compute the average of the upper and lower envelopes: ${m}_{t}=\frac{{s}_{u}\left(t\right)+{s}_{l}\left(t\right)}{2}$
Step 4: Subtract the average envelope from the original time series: $h\left(t\right)=s\left(t\right)-{m}_{t}$
Step 5: Check whether $h\left(t\right)$ meets the two conditions of an IMF. If so, treat $h\left(t\right)$ as the new IMF $c\left(t\right)$ and calculate the residual signal by subtracting it from the signal being decomposed, $r\left(t\right)=s\left(t\right)-c\left(t\right)$. Otherwise, use $h\left(t\right)$ in place of $s\left(t\right)$ as the input and repeat steps 1 to 5.
Step 6: Set $r\left(t\right)$ as the new $s\left(t\right)$ and repeat steps 1 to 5 until all IMFs are obtained.
Through this process, a set of IMFs ordered from high to low frequency is extracted from the time series. The original time series can therefore be expressed as: $s\left(t\right)=\sum _{i=1}^{n}{c}_{i}\left(t\right)+{r}_{n}\left(t\right)$ where n is the number of IMFs, ${c}_{i}\left(t\right)$ is the i-th IMF (the IMFs are periodic and almost orthogonal to one another (Li et al., 2018b)), and ${r}_{n}\left(t\right)$ is the final residual representing the trend of $s\left(t\right)$.
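The sifting procedure can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the implementation used in the paper: it assumes piecewise-linear envelopes pinned to the signal end points instead of cubic splines, and a fixed number of sifting iterations (`n_sift`) instead of the IMF test in Step 5. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_extrema(s: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Step 1: indices of interior local maxima and minima."""
    maxima, minima = [], []
    for t in range(1, len(s) - 1):
        if s[t - 1] < s[t] >= s[t + 1]:
            maxima.append(t)
        elif s[t - 1] > s[t] <= s[t + 1]:
            minima.append(t)
    return maxima, minima


def envelope(s: Sequence[float], knots: List[int]) -> List[float]:
    """Step 2 (simplified): piecewise-linear envelope through the knots.

    Assumption: linear interpolation replaces the cubic splines of the
    text, and both end points of the signal are used as extra knots.
    """
    pts = [0] + knots + [len(s) - 1]
    env = [0.0] * len(s)
    for a, b in zip(pts, pts[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * s[a] + w * s[b]
    return env


def sift(s: Sequence[float], n_sift: int = 10) -> List[float]:
    """Steps 1-5 with a fixed iteration count as the stopping rule."""
    h = list(s)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        # Steps 3-4: subtract the mean envelope.
        h = [x - (u + l) / 2 for x, u, l in zip(h, upper, lower)]
    return h


def emd(s: Sequence[float], max_imfs: int = 8) -> Tuple[List[List[float]], List[float]]:
    """Step 6: peel off IMFs until the residue has too few extrema."""
    imfs: List[List[float]] = []
    residue = list(s)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 2:
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue
```

Whatever envelope is used, the IMFs and the final residue add back up to the input series, which is the additive identity stated above.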
Feature selection
After obtaining the IMF components of the wind speed, we need to predict them. In this study, we use the current and lagged values of each IMF component as the raw features, forecast each IMF component separately, and add all the predicted IMF components to obtain the final wind speed. Although the raw features contain sufficient information for forecasting, some irrelevant or only partially relevant features among them may degrade the model. A common strategy to avoid this is to use feature selection to remove irrelevant features. Commonly used feature selection algorithms include filter methods, wrapper methods, heuristic search algorithms and embedded methods (Chandrashekar & Sahin, 2014). In this study, we use the filter method. To score the different variables, we use the univariate linear regression test to calculate the correlation between each feature and the output (Liu et al., 2019b), which is defined as: $Co{r}_{i}=\frac{\left(X\left[:,i\right]-mean\left(X\left[:,i\right]\right)\right)\ast \left(y-mean\left(y\right)\right)}{std\left(X\left[:,i\right]\right)\ast std\left(y\right)}$ where X is an N × M matrix in which each column is a feature, and y is the N × 1 vector of the output of interest. Features are ranked by this correlation, and the irrelevant or partially relevant ones are removed.
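A minimal sketch of this filter step is given below. It builds lagged features from a single series, scores each lag with the correlation defined above, and keeps the `k` highest-scoring lags. Two points are assumptions of this sketch rather than statements from the paper: the numerator is divided by N so that the score is the Pearson correlation, and features are ranked by absolute score. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def make_lag_features(series: Sequence[float],
                      n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Row t holds the n_lags values preceding series[t]; y holds series[t]."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def correlation_scores(X: Sequence[Sequence[float]],
                       y: Sequence[float]) -> List[float]:
    """Cor_i for every column i of X (Pearson correlation with y)."""
    n = len(y)
    mean_y = sum(y) / n
    dy = [v - mean_y for v in y]
    std_y = math.sqrt(sum(d * d for d in dy) / n)
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        mean_x = sum(col) / n
        dx = [v - mean_x for v in col]
        std_x = math.sqrt(sum(d * d for d in dx) / n)
        if std_x == 0.0 or std_y == 0.0:
            scores.append(0.0)  # constant column: no linear information
            continue
        cov = sum(a * b for a, b in zip(dx, dy)) / n
        scores.append(cov / (std_x * std_y))
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k features with the largest absolute score."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])
```

The selected column indices can then be used to slice the lag matrix before it is passed to the regressor.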
Support vector regression
The support vector machine (SVM) is a learning method based on the structural risk minimization criterion, which minimizes the expected risk and generalizes well to unseen data. Support vector regression (SVR) is an extension of SVM to regression problems (Drucker et al., 1997). Because wind speed is nonlinear and nonstationary, SVR is widely used in short-term wind speed forecasting (Khosravi et al., 2018; Liu et al., 2019a; Santamaría-Bonfil, Reyes-Ballesteros & Gershenson, 2016). In this research, we use EMD to decompose the wind speed into IMF components, and the high-frequency IMF component contains the nonlinear and nonstationary part of the wind speed. To obtain better generalization performance, we follow existing research and use SVR to predict it.
The main idea of SVR is to perform linear regression in a high-dimensional feature space obtained by mapping the original input through a predefined function $\varphi \left(x\right)$, while minimizing the structural risk (Chen et al., 2018). Given a set of samples $\left\{{x}_{i},{y}_{i}\right\}$, $i = 1, 2, \dots, N$, where ${x}_{i}$ is the input and ${y}_{i}$ is the output, the objective is: $\begin{array}{c}f\left(x\right)={W}^{T}\varphi \left(x\right)+b\\ R\left[f\right]=\frac{1}{2}{\Vert W\Vert }^{2}+C\sum _{i=1}^{N}L\left({x}_{i},{y}_{i},f\left({x}_{i}\right)\right)\end{array}$ where W and b are the regression coefficient and bias, respectively, C is the penalty coefficient, $L\left({x}_{i},{y}_{i},f\left({x}_{i}\right)\right)$ is the loss function, and $R\left[f\right]$ is the structural risk. The corresponding constrained optimization problem can be expressed as: $\begin{array}{c}\mathit{min}\ \frac{1}{2}{\Vert W\Vert }^{2}+C\sum _{i=1}^{N}\left({\xi }_{i}+{\xi }_{i}^{\ast }\right)\\ \begin{array}{cc}s.t.& {y}_{i}-{W}^{T}\varphi \left({x}_{i}\right)-b\le \varepsilon +{\xi }_{i}\\ & {W}^{T}\varphi \left({x}_{i}\right)+b-{y}_{i}\le \varepsilon +{\xi }_{i}^{\ast }\\ & {\xi }_{i},{\xi }_{i}^{\ast }\ge 0,\ i=1,2,\dots ,N\end{array}\end{array}$ where ${\xi }_{i}$ and ${\xi }_{i}^{\ast }$ are the slack variables. By introducing Lagrange multipliers, the regression function can be expressed as: $f\left(x\right)=\sum _{i=1}^{N}\left({\alpha }_{i}-{\alpha }_{i}^{\ast }\right)K\left({x}_{i},x\right)+b$ where ${\alpha }_{i}$ and ${\alpha }_{i}^{\ast }$ are the Lagrange multipliers, which satisfy ${\alpha }_{i}\ge 0$, ${\alpha }_{i}^{\ast }\ge 0$ and ${\sum }_{i=1}^{N}\left({\alpha }_{i}-{\alpha }_{i}^{\ast }\right)=0$, and $K\left({x}_{i},x\right)$ is a kernel function conforming to Mercer’s theorem.
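To make the dual-form regression function concrete, the sketch below evaluates $f(x)=\sum_i(\alpha_i-\alpha_i^{\ast})K(x_i,x)+b$ for given coefficients, together with the ε-insensitive loss implied by the constraints. The RBF kernel, the `gamma` parameter and the numeric coefficients are assumptions chosen for illustration; the text above does not state which kernel was used, and the sketch does not solve the optimization problem itself.

```python
import math
from typing import Sequence


def eps_insensitive_loss(y: float, f: float, eps: float) -> float:
    """Zero inside the eps-tube around y, linear outside it."""
    return max(0.0, abs(y - f) - eps)


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """K(a, b) = exp(-gamma * ||a - b||^2), a Mercer kernel (assumed choice)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def svr_predict(x: Sequence[float],
                support_vectors: Sequence[Sequence[float]],
                dual_coef: Sequence[float],
                b: float,
                gamma: float) -> float:
    """f(x) = sum_i dual_coef[i] * K(x_i, x) + b.

    dual_coef[i] stands for (alpha_i - alpha_i^*).
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coef)) + b


# Made-up values for illustration; the coefficients sum to zero,
# as required by the constraint on the multipliers.
support_vectors = [[0.0], [1.0]]
dual_coef = [0.5, -0.5]
prediction = svr_predict([0.0], support_vectors, dual_coef, b=2.0, gamma=1.0)
```

Only samples with a non-zero coefficient contribute to the sum, which is why the fitted model depends on the support vectors alone.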
Cross-validated lasso
The Lasso algorithm is a regression method that performs feature selection and regularization at the same time. It was originally proposed by Robert Tibshirani of Stanford University to improve prediction accuracy and interpretability (Tibshirani, 1996). In regression, we normally want to find a coefficient vector $\beta =\left({\beta }_{1},\dots ,{\beta }_{p}\right)$ that satisfies: $Y=X\beta +\varepsilon ,\ E\left[\varepsilon \mid X\right]=0$ where Y is the dependent variable, $X=\left({X}_{1},\dots ,{X}_{p}\right)$ is the vector of covariates, and $\varepsilon$ is the unobserved noise. Lasso minimizes the objective function while forcing the sum of the absolute values of the coefficients to be less than a fixed value t (Hung, Yen & Li, 2016): ${min}_{{\beta }_{0},\beta }\left\{\frac{1}{N}\sum _{i=1}^{N}{\left({y}_{i}-{\beta }_{0}-{x}_{i}^{T}\beta \right)}^{2}\right\}$ $s.t.\ \sum _{j=1}^{p}\left|{\beta }_{j}\right|\le t.$
Rewritten in the Lagrangian form: ${\hat{\beta }}_{lasso}=\underset{\beta \in {R}^{p}}{\mathit{argmin}}\left\{\frac{1}{N}{\Vert y-X\beta \Vert }_{2}^{2}+\lambda {\Vert \beta \Vert }_{1}\right\}$
Lasso uses the $L_{1}$-norm instead of the $L_{2}$-norm. Since the constraint region is diamond-shaped, the solution is more likely to lie at a corner of the region. As a result, the lasso solution is sparse, with some coefficients set exactly to zero; that is, Lasso performs a straightforward feature selection.
To estimate ${\hat{\beta }}_{lasso}$, the value of the penalty parameter λ is critically important, but the optimal λ is not given automatically. If λ is chosen appropriately, Lasso converges quickly under fairly general conditions; if it is chosen inappropriately, Lasso may be inconsistent or converge more slowly. In this paper, we adopt the cross-validated Lasso algorithm, in which the penalty parameter λ is chosen by cross-validation, which is also the leading recommendation in the theoretical literature (Park & Casella, 2008).
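The idea of choosing λ by cross-validation can be sketched with a small coordinate-descent solver for the Lagrangian objective $\frac{1}{N}\Vert y-X\beta\Vert_2^2+\lambda\Vert\beta\Vert_1$. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: there is no intercept (the data are assumed centred), the folds are contiguous blocks, the iteration count is fixed, and all names and the λ grid are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_threshold(rho: float, thr: float) -> float:
    """Shrink rho towards zero by thr; exactly zero inside [-thr, thr]."""
    if rho > thr:
        return rho - thr
    if rho < -thr:
        return rho + thr
    return 0.0


def lasso_fit(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float, n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/N)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            z = sum(row[j] ** 2 for row in X) / n
            if z == 0.0:
                continue
            # Feature j against the residual with beta_j's contribution restored.
            rho = sum(row[j] * (r + row[j] * beta[j])
                      for row, r in zip(X, resid)) / n
            new = soft_threshold(rho, lam / 2.0) / z
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                beta[j] = new
    return beta


def mse(X: Sequence[Sequence[float]], y: Sequence[float],
        beta: Sequence[float]) -> float:
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y)) / len(y)


def lasso_cv(X: List[List[float]], y: List[float],
             lambdas: Sequence[float], k: int = 5) -> Tuple[float, List[float]]:
    """Pick the lambda with the lowest mean validation MSE, then refit."""
    n = len(y)
    bounds = [n * i // k for i in range(k + 1)]
    best_lam, best_err = lambdas[0], float("inf")
    for lam in lambdas:
        err = 0.0
        for a, b in zip(bounds, bounds[1:]):
            beta = lasso_fit(X[:a] + X[b:], y[:a] + y[b:], lam)
            err += mse(X[a:b], y[a:b], beta)
        err /= k
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, lasso_fit(X, y, best_lam)
```

The soft-threshold step is what sets coefficients exactly to zero, so a larger λ yields a sparser model, and cross-validation trades this sparsity against validation error.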
Prediction performance criteria
In this study, the mean absolute percentage error (MAPE), mean absolute error (MAE) and RMSE are used as performance indicators to evaluate the proposed wind speed forecasting model. They are defined as follows:
$MAPE=\frac{1}{N}\sum _{i=1}^{N}\left|\left({Y}_{i}-{\hat{Y}}_{i}\right)/{Y}_{i}\right|$ $MAE=\frac{1}{N}\sum _{i=1}^{N}\left|{Y}_{i}-{\hat{Y}}_{i}\right|$ $RMSE=\sqrt{\frac{1}{N-1}\sum _{i=1}^{N}{\left({Y}_{i}-{\hat{Y}}_{i}\right)}^{2}}$ where ${Y}_{i}$ and ${\hat{Y}}_{i}$ are the observed and predicted wind speed at data point i, respectively. For MAPE, MAE and RMSE, the smaller the value, the better the performance.
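The three criteria translate directly into code. The sketch below follows the definitions above, including the $N-1$ denominator in RMSE. Raising an error when an observed value is zero is a choice of this sketch, since MAPE is undefined there and the text does not say how such points are handled; the function names are hypothetical.

```python
import math
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (multiply by 100 for %)."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error with the N - 1 denominator used in the text."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / (n - 1))


observed = [2.0, 4.0, 5.0]
predicted = [1.0, 5.0, 5.0]
errors = (mape(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```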
Results
Wind speed data
The wind speed data used in the study were gathered from two wind stations in Michigan, USA from September 2019 to October 2019. The number of data points is 1,464. The first 50 days, from September 1, 2019 to October 20, 2019, are used for model training, and the remaining days, i.e., October 21, 2019 to October 31, 2019, are used for testing. Figure 2 shows the two wind speed time series, and the corresponding statistics are listed in Table 1.
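
Table 1. Statistical indicators of the two wind speed time series.

| Wind station | Dataset | Date | Mean (m/s) | Max (m/s) | Min (m/s) | Std. | Skew. | Kurt. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Site #1 | Training set | Sept. 1, 2019 to Oct. 20, 2019 (∼83%) | 3.2975 | 14.4 | 0 | 2.378 | 0.871 | 0.865 |
| Site #1 | Testing set | Oct. 21, 2019 to Oct. 31, 2019 (∼17%) | 3.1614 | 13.9 | 0 | 2.486 | 1.108 | 1.312 |
| Site #2 | Training set | Sept. 1, 2019 to Oct. 20, 2019 (∼83%) | 3.6919 | 11.3 | 0 | 2.183 | 0.807 | 0.353 |
| Site #2 | Testing set | Oct. 21, 2019 to Oct. 31, 2019 (∼17%) | 3.5667 | 9.3 | 0 | 2.118 | 0.500 | −0.318 |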
Experiments and result analysis
To verify the effectiveness of the proposed model, we compare it with five classic individual models, namely Persistence, ARIMA, ELM, SVR and ANN. The 1- to 3-step forecasting results of these models on time series #1 and #2 are displayed in Figs. 3–4, and the corresponding error estimates are listed in Tables 2–5. For a fair comparison, the parameters of all the models involved are selected by cross-validation. From the experimental results, we can draw the following conclusions:
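
Table 2. Forecasting errors of the individual models and the proposed model at wind station #1.

| Models | 1-step RMSE | 1-step MAE | 1-step MAPE (%) | 2-step RMSE | 2-step MAE | 2-step MAPE (%) | 3-step RMSE | 3-step MAE | 3-step MAPE (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Persistence | 1.1892 | 0.8996 | 36.20 | 1.5892 | 1.2221 | 49.65 | 1.9008 | 1.4687 | 57.64 |
| ARIMA | 1.1724 | 0.9010 | 34.25 | 1.5182 | 1.1561 | 45.25 | 1.7647 | 1.3569 | 53.79 |
| ELM | 1.2705 | 0.9724 | 36.20 | 1.5500 | 1.1729 | 46.55 | 1.8109 | 1.3603 | 55.18 |
| SVR | 1.1739 | 0.9024 | 34.87 | 1.5676 | 1.1928 | 46.71 | 1.7832 | 1.3376 | 52.78 |
| ANN | 1.1984 | 0.9354 | 36.24 | 1.5338 | 1.1615 | 45.79 | 1.8427 | 1.3906 | 55.70 |
| The proposed | 0.5859 | 0.4426 | 21.11 | 0.7531 | 0.5848 | 24.78 | 0.8528 | 0.6798 | 27.55 |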
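
Table 3. Forecasting errors of the individual models and the proposed model at wind station #2.

| Models | 1-step RMSE | 1-step MAE | 1-step MAPE (%) | 2-step RMSE | 2-step MAE | 2-step MAPE (%) | 3-step RMSE | 3-step MAE | 3-step MAPE (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Persistence | 1.2720 | 0.9739 | 35.98 | 1.4292 | 1.0947 | 41.02 | 1.6700 | 1.3073 | 47.99 |
| ARIMA | 1.1609 | 0.9302 | 38.71 | 1.3214 | 1.0430 | 45.43 | 1.5257 | 1.2188 | 53.05 |
| ELM | 1.2528 | 1.0188 | 44.81 | 1.3657 | 1.0915 | 51.40 | 1.5867 | 1.2849 | 60.12 |
| SVR | 1.1602 | 0.9218 | 36.51 | 1.3115 | 1.0360 | 43.63 | 1.5018 | 1.2008 | 49.91 |
| ANN | 1.1901 | 0.9460 | 40.62 | 1.3116 | 1.0330 | 43.72 | 1.6345 | 1.2798 | 52.18 |
| The proposed | 0.5593 | 0.4193 | 17.10 | 0.7540 | 0.5966 | 22.99 | 0.7911 | 0.6437 | 24.59 |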
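
Table 4. $P_{RMSE}$, $P_{MAE}$ and $P_{MAPE}$ for each individual model at wind station #1.

| Models | Indicator | 1-step | 2-step | 3-step |
| --- | --- | --- | --- | --- |
| Persistence | $P_{RMSE}$ (%) | 102.98 | 111.04 | 122.89 |
| Persistence | $P_{MAE}$ (%) | 103.24 | 108.97 | 116.05 |
| Persistence | $P_{MAPE}$ (%) | 71.47 | 100.34 | 109.26 |
| ARIMA | $P_{RMSE}$ (%) | 100.11 | 101.60 | 106.94 |
| ARIMA | $P_{MAE}$ (%) | 103.55 | 97.69 | 99.60 |
| ARIMA | $P_{MAPE}$ (%) | 62.20 | 82.58 | 95.26 |
| ELM | $P_{RMSE}$ (%) | 116.85 | 105.83 | 112.35 |
| ELM | $P_{MAE}$ (%) | 119.68 | 100.57 | 100.11 |
| ELM | $P_{MAPE}$ (%) | 71.44 | 87.82 | 100.31 |
| SVR | $P_{RMSE}$ (%) | 100.36 | 108.16 | 109.11 |
| SVR | $P_{MAE}$ (%) | 103.87 | 103.96 | 96.76 |
| SVR | $P_{MAPE}$ (%) | 65.15 | 88.48 | 91.62 |
| ANN | $P_{RMSE}$ (%) | 104.54 | 103.68 | 116.09 |
| ANN | $P_{MAE}$ (%) | 111.33 | 98.62 | 104.56 |
| ANN | $P_{MAPE}$ (%) | 71.63 | 84.78 | 102.23 |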
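
Table 5. $P_{RMSE}$, $P_{MAE}$ and $P_{MAPE}$ for each individual model at wind station #2.

| Models | Indicator | 1-step | 2-step | 3-step |
| --- | --- | --- | --- | --- |
| Persistence | $P_{RMSE}$ (%) | 127.42 | 89.54 | 111.11 |
| Persistence | $P_{MAE}$ (%) | 132.24 | 83.49 | 103.08 |
| Persistence | $P_{MAPE}$ (%) | 110.33 | 78.38 | 95.19 |
| ARIMA | $P_{RMSE}$ (%) | 107.55 | 75.25 | 92.86 |
| ARIMA | $P_{MAE}$ (%) | 121.83 | 74.83 | 89.35 |
| ARIMA | $P_{MAPE}$ (%) | 126.31 | 97.56 | 115.77 |
| ELM | $P_{RMSE}$ (%) | 123.99 | 81.12 | 100.59 |
| ELM | $P_{MAE}$ (%) | 142.95 | 82.96 | 99.60 |
| ELM | $P_{MAPE}$ (%) | 161.98 | 123.54 | 144.54 |
| SVR | $P_{RMSE}$ (%) | 107.43 | 73.93 | 89.84 |
| SVR | $P_{MAE}$ (%) | 119.81 | 73.66 | 86.54 |
| SVR | $P_{MAPE}$ (%) | 113.43 | 89.76 | 103.00 |
| ANN | $P_{RMSE}$ (%) | 112.78 | 73.95 | 106.62 |
| ANN | $P_{MAE}$ (%) | 125.58 | 73.16 | 98.82 |
| ANN | $P_{MAPE}$ (%) | 137.50 | 90.12 | 112.23 |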

In 1-step forecasting, for wind station #1, the proposed model obtains the best accuracy: its RMSE, MAE, and MAPE are 0.5859, 0.4426, and 21.11%, respectively. The classic individual models, ordered from highest to lowest RMSE, are ELM, ANN, Persistence, SVR, and ARIMA, with MAPE values of 36.20%, 36.24%, 36.20%, 34.87%, and 34.25%, respectively. Likewise, for wind station #2, the proposed model still outperforms the classic individual models, with a MAPE of 17.10%.

In 2-step forecasting, for wind station #1, the proposed model has the lowest errors, i.e., its RMSE, MAE, and MAPE are 0.7531, 0.5848, and 24.78%, respectively. For wind station #2, the proposed model again achieves the lowest errors. Taking MAPE as an example, its value is 22.99%, which is markedly lower than that of the other models.

In 3-step forecasting, the proposed model is still the most accurate, with MAPE values of 27.55% and 24.59% for wind stations #1 and #2, respectively. Persistence has the worst RMSE among these models, with MAPE values of 57.64% and 47.99%, respectively.
In general, in 1- to 3-step forecasting, the proposed model obtains the best prediction performance compared with the classic individual models.
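The $P_{RMSE}$, $P_{MAE}$ and $P_{MAPE}$ indicators in the tables are not defined in the text. The tabulated values are consistent, to within rounding, with the difference between a comparison model's error and the proposed model's error, expressed as a percentage of the proposed model's error. The sketch below encodes that inferred definition, which should be read as an assumption rather than the paper's stated formula; the function name is hypothetical.

```python
def improvement_pct(baseline_error: float, proposed_error: float) -> float:
    """Inferred indicator: 100 * (E_baseline - E_proposed) / E_proposed."""
    if proposed_error == 0:
        raise ValueError("proposed_error must be non-zero")
    return (baseline_error - proposed_error) / proposed_error * 100.0


# 1-step RMSE at wind station #1: Persistence 1.1892, proposed 0.5859.
p_rmse_persistence = improvement_pct(1.1892, 0.5859)
```

A positive value means the comparison model has the larger error; a negative value means it has the smaller one.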
Compared with traditional EMD methods
As a nonlinear signal analysis method for nonlinear and nonstationary time series, EMD has been widely used in time series analysis. To further verify the effectiveness of our EMD-based model, we compare it with four widely used EMD-based models, namely EMD-ELM, EMD-SVR, EMD-SVR-SP, and EMD-ANN. In this study, these methods follow the same procedure as the proposed model: EMD decomposes the wind speed, a single predictive model forecasts each IMF component separately, and all the predictions are added to obtain the final predicted wind speed. The prediction results and error estimates of these four EMD-based methods and the proposed method are displayed in Figs. 5–6 and Tables 6–9. From Figs. 5–6 and Tables 6–9, it can be observed that:
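
Table 6. Forecasting errors of the EMD-based models and the proposed model at wind station #1.

| Models | 1-step RMSE | 1-step MAE | 1-step MAPE (%) | 2-step RMSE | 2-step MAE | 2-step MAPE (%) | 3-step RMSE | 3-step MAE | 3-step MAPE (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EMD-ELM | 0.6400 | 0.5128 | 22.63 | 0.7854 | 0.6316 | 27.22 | 0.8746 | 0.6937 | 29.02 |
| EMD-SVR | 0.6379 | 0.5120 | 23.32 | 0.7768 | 0.6181 | 27.09 | 0.8583 | 0.6749 | 28.48 |
| EMD-SVR-SP | 0.6310 | 0.4867 | 23.03 | 0.7987 | 0.6141 | 26.30 | 0.8591 | 0.6762 | 28.66 |
| EMD-ANN | 0.6342 | 0.5055 | 23.55 | 0.7879 | 0.6221 | 27.67 | 0.8987 | 0.7040 | 29.31 |
| The proposed | 0.5859 | 0.4426 | 21.11 | 0.7531 | 0.5848 | 24.78 | 0.8528 | 0.6798 | 27.55 |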
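
Table 7. Forecasting errors of the EMD-based models and the proposed model at wind station #2.

| Models | 1-step RMSE | 1-step MAE | 1-step MAPE (%) | 2-step RMSE | 2-step MAE | 2-step MAPE (%) | 3-step RMSE | 3-step MAE | 3-step MAPE (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EMD-ELM | 0.6560 | 0.5283 | 21.59 | 0.8199 | 0.6669 | 27.49 | 0.8775 | 0.7096 | 27.65 |
| EMD-SVR | 0.6567 | 0.5233 | 24.88 | 0.8317 | 0.6736 | 29.85 | 0.8508 | 0.6986 | 30.52 |
| EMD-SVR-SP | 0.6437 | 0.4972 | 24.06 | 0.8211 | 0.6718 | 28.53 | 0.8894 | 0.7264 | 32.31 |
| EMD-ANN | 0.6397 | 0.5046 | 21.83 | 0.7927 | 0.6373 | 25.34 | 0.8520 | 0.6934 | 27.86 |
| The proposed | 0.5593 | 0.4193 | 17.10 | 0.7540 | 0.5966 | 22.99 | 0.7911 | 0.6437 | 24.59 |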
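
Table 8. $P_{RMSE}$, $P_{MAE}$ and $P_{MAPE}$ for each EMD-based model at wind station #1.

| Models | Indicator | 1-step | 2-step | 3-step |
| --- | --- | --- | --- | --- |
| EMD-ELM | $P_{RMSE}$ (%) | 9.23 | 4.30 | 2.56 |
| EMD-ELM | $P_{MAE}$ (%) | 15.85 | 8.00 | 2.05 |
| EMD-ELM | $P_{MAPE}$ (%) | 7.20 | 9.82 | 5.36 |
| EMD-SVR | $P_{RMSE}$ (%) | 8.88 | 3.16 | 0.65 |
| EMD-SVR | $P_{MAE}$ (%) | 15.67 | 5.69 | −0.72 |
| EMD-SVR | $P_{MAPE}$ (%) | 10.43 | 9.31 | 3.41 |
| EMD-SVR-SP | $P_{RMSE}$ (%) | 7.70 | 6.06 | 0.74 |
| EMD-SVR-SP | $P_{MAE}$ (%) | 9.95 | 5.01 | −0.53 |
| EMD-SVR-SP | $P_{MAPE}$ (%) | 9.09 | 6.10 | 4.05 |
| EMD-ANN | $P_{RMSE}$ (%) | 8.25 | 4.63 | 5.39 |
| EMD-ANN | $P_{MAE}$ (%) | 14.21 | 6.38 | 3.56 |
| EMD-ANN | $P_{MAPE}$ (%) | 11.52 | 11.67 | 6.40 |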
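
Table 9. $P_{RMSE}$, $P_{MAE}$ and $P_{MAPE}$ for each EMD-based model at wind station #2.

| Models | Indicator | 1-step | 2-step | 3-step |
| --- | --- | --- | --- | --- |
| EMD-ELM | $P_{RMSE}$ (%) | 17.29 | 8.74 | 10.93 |
| EMD-ELM | $P_{MAE}$ (%) | 25.98 | 11.78 | 10.23 |
| EMD-ELM | $P_{MAPE}$ (%) | 26.20 | 19.55 | 12.46 |
| EMD-SVR | $P_{RMSE}$ (%) | 17.41 | 10.30 | 7.56 |
| EMD-SVR | $P_{MAE}$ (%) | 24.80 | 12.91 | 8.52 |
| EMD-SVR | $P_{MAPE}$ (%) | 45.48 | 29.81 | 24.15 |
| EMD-SVR-SP | $P_{RMSE}$ (%) | 15.09 | 8.90 | 12.43 |
| EMD-SVR-SP | $P_{MAE}$ (%) | 18.58 | 12.61 | 12.84 |
| EMD-SVR-SP | $P_{MAPE}$ (%) | 40.64 | 24.08 | 31.42 |
| EMD-ANN | $P_{RMSE}$ (%) | 14.37 | 5.12 | 7.71 |
| EMD-ANN | $P_{MAE}$ (%) | 20.33 | 6.83 | 7.72 |
| EMD-ANN | $P_{MAPE}$ (%) | 27.62 | 10.22 | 13.29 |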

Compared with the above-mentioned classic individual models, the EMD-based methods perform markedly better. Taking wind station #1 as an example, in 1-step forecasting the RMSE of the EMD-based methods is around 0.60, whereas that of the classic individual models is around 1.20. Decomposing the wind speed with EMD thus roughly halves the RMSE.

For wind station #1, except for the MAE in 3-step forecasting, the performance indicators of the proposed model are better than those of the other EMD-based combined models. In 3-step forecasting, EMD-SVR and EMD-SVR-SP achieve a slightly lower MAE than the proposed combined model, but on the other evaluation indicators the proposed combined model performs better. Furthermore, EMD-ANN always has the worst MAPE among the four comparison models, with values of 23.55%, 27.67%, and 29.31% for 1- to 3-step forecasting.

For wind station #2, the proposed combined model obtains the best prediction results in 1- to 3-step wind speed forecasting. Its RMSE, MAE and MAPE in 1-step forecasting are 0.5593, 0.4193, and 17.10%, respectively. In comparison, among the other four EMD-based combined models, EMD-ELM and EMD-ANN have similar prediction performance in 1- to 3-step forecasting, with MAPE values of 21.59%, 27.49%, 27.65% and 21.83%, 25.34%, 27.86%, respectively.
In summary, the EMD-based methods have obvious advantages over the traditional methods, and the proposed method, which uses EMD, FS, SVR and LassoCV, achieves better performance still.
Discussion
Performance of SVRSP and LassoCV on different IMFs
According to the EMD principle, the IMF components are ordered from high to low frequency. The nonlinear and nonstationary information in the wind speed data is concentrated mainly in the high-frequency IMF, whereas the low-frequency IMFs follow sine-like curves. Based on these characteristics, this study uses SVRSP and LassoCV to predict IMFs of different frequencies. To verify the effectiveness of this hybrid EMD model, we take wind station #2 as an example and analyze the performance of the two methods on the different IMF components. Table 10 lists the RMSE of SVRSP and LassoCV on each component. Note that in multi-step prediction the accuracy of the first step matters more than that of the later steps, since it is of great significance for the accurate estimation of wind power. Table 10 shows that, in 1-step forecasting, SVRSP achieves a clearly lower RMSE than LassoCV on the high-frequency component (IMF1: 0.530 versus 0.594), while LassoCV performs better on the low-frequency components (IMF2 to IMF7 and the trend), with an RMSE already close to zero at IMF4. Moreover, SVRSP risks overfitting when predicting the low-frequency components, which results in poor performance there. Overall, by matching each algorithm to the EMD components it handles best, the proposed model can achieve better performance than the traditional EMD model.
Steps  Models  IMF1  IMF2  IMF3  IMF4  IMF5  IMF6  IMF7  Trend

1-step  SVRSP  0.530  0.256  0.061  0.047  0.042  0.040  0.326  0.100
1-step  LassoCV  0.594  0.178  0.033  0.002  0.001  0.000  0.000  0.000
2-step  SVRSP  0.670  0.407  0.198  0.067  0.041  0.042  0.327  0.100
2-step  LassoCV  0.662  0.369  0.121  0.009  0.001  0.000  0.001  0.000
3-step  SVRSP  0.668  0.422  0.354  0.086  0.046  0.045  0.327  0.100
3-step  LassoCV  0.663  0.401  0.262  0.023  0.002  0.001  0.001  0.000
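The per-component comparison in Table 10 can be reproduced with a few lines of Python. The following is a minimal sketch that only re-reads the published RMSE values and picks the model with the lower RMSE for each component; the names `RMSE`, `COMPONENTS` and `best_model_per_component` are illustrative and not part of the original study.

```python
# RMSE per component for wind station #2, copied from Table 10.
COMPONENTS = ["IMF1", "IMF2", "IMF3", "IMF4", "IMF5", "IMF6", "IMF7", "Trend"]
RMSE = {
    "1-step": {
        "SVRSP":   [0.530, 0.256, 0.061, 0.047, 0.042, 0.040, 0.326, 0.100],
        "LassoCV": [0.594, 0.178, 0.033, 0.002, 0.001, 0.000, 0.000, 0.000],
    },
    "2-step": {
        "SVRSP":   [0.670, 0.407, 0.198, 0.067, 0.041, 0.042, 0.327, 0.100],
        "LassoCV": [0.662, 0.369, 0.121, 0.009, 0.001, 0.000, 0.001, 0.000],
    },
    "3-step": {
        "SVRSP":   [0.668, 0.422, 0.354, 0.086, 0.046, 0.045, 0.327, 0.100],
        "LassoCV": [0.663, 0.401, 0.262, 0.023, 0.002, 0.001, 0.001, 0.000],
    },
}


def best_model_per_component(step: str) -> dict:
    """Return, for each component, the model with the lower RMSE at this step."""
    scores = RMSE[step]
    return {
        comp: min(scores, key=lambda model: scores[model][i])
        for i, comp in enumerate(COMPONENTS)
    }


for step in RMSE:
    print(step, best_model_per_component(step))
```

On these values SVRSP is selected only for IMF1 in 1-step forecasting. In 2-step and 3-step forecasting the two models are nearly tied on IMF1 (0.670 versus 0.662 and 0.668 versus 0.663), with LassoCV marginally lower.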
Comparison of different signal decomposition techniques
Besides EMD, Variational Mode Decomposition (VMD) and Ensemble Empirical Mode Decomposition (EEMD) are also widely used in short-term wind speed forecasting. Here, we analyze the impact of different signal decomposition techniques on the performance of our proposed method. Table 11 shows the prediction performance of the three signal decomposition techniques on the two wind stations. For wind station #1, EMD obtains the best RMSE in 1-step forecasting compared with VMD and EEMD. The performance of VMD in 1-step and 2-step forecasting is relatively close, but it drops significantly in 3-step forecasting. EEMD is derived from EMD and, like EMD, its performance decreases significantly as the forecasting step increases. For wind station #2, EMD also obtains the best 1-step predictive performance. VMD behaves as it does on wind station #1, with relatively close performance in 1-step and 2-step forecasting. It should be pointed out that in multi-step forecasting, the 1-step forecast is usually used for wind energy estimation and the other steps are used to assist decision-making, so more attention is paid to the performance of 1-step forecasting.
Wind station  Signal decomposition method  RMSE (1-step)  RMSE (2-step)  RMSE (3-step)

Site #1  VMD  0.6395  0.6782  0.7793
Site #1  EEMD  0.6358  0.7301  0.8277
Site #1  EMD (The proposed)  0.5859  0.7531  0.8528
Site #2  VMD  0.6664  0.6654  0.7111
Site #2  EEMD  0.5844  0.8404  0.8758
Site #2  EMD (The proposed)  0.5593  0.7540  0.7911
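The trade-off between 1-step accuracy and degradation over longer horizons can be read directly off Table 11. The sketch below is illustrative only: it copies the published RMSE values and reports, for each site, the method with the lowest 1-step RMSE and the increase in RMSE from 1-step to 3-step forecasting. The names `RMSE_BY_METHOD` and `summarize` are assumptions made for this example.

```python
# RMSE for 1-, 2- and 3-step forecasting, copied from Table 11.
RMSE_BY_METHOD = {
    "Site #1": {
        "VMD":  [0.6395, 0.6782, 0.7793],
        "EEMD": [0.6358, 0.7301, 0.8277],
        "EMD":  [0.5859, 0.7531, 0.8528],
    },
    "Site #2": {
        "VMD":  [0.6664, 0.6654, 0.7111],
        "EEMD": [0.5844, 0.8404, 0.8758],
        "EMD":  [0.5593, 0.7540, 0.7911],
    },
}


def summarize(site: str):
    """Return the best 1-step method and each method's RMSE increase from step 1 to 3."""
    methods = RMSE_BY_METHOD[site]
    best_one_step = min(methods, key=lambda m: methods[m][0])
    increase = {m: round(v[2] - v[0], 4) for m, v in methods.items()}
    return best_one_step, increase


for site in RMSE_BY_METHOD:
    best, increase = summarize(site)
    print(site, "best 1-step:", best, "| RMSE increase 1->3 step:", increase)
```

On both sites EMD has the lowest 1-step RMSE, while VMD shows the smallest increase between 1-step and 3-step forecasting. This matches the emphasis placed on the 1-step forecast when choosing EMD.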
The impact of the number of selected features on performance
Feature selection is used in this study to remove redundant features. However, the number of selected features affects the short-term wind speed forecast to some degree. To ensure stability in a complicated industrial system, we analyzed the performance of our proposed method for different numbers of selected features. Figure 7 shows how the RMSE of the proposed method varies with the number of selected features. It should be pointed out that, based on the characteristics of EMD decomposition, this study uses FS and SVR to predict the high-frequency component (i.e., IMF_{1}) and LassoCV to predict the low-frequency components, so feature selection is used mainly in the prediction of the IMF_{1} component. Fig. 7 shows that feature selection can slightly improve the performance of 1-step forecasting, but has little effect on 2-step and 3-step forecasting. Overall, as the number of selected features decreases, the generalization performance of the method improves, but when too few features are selected, the performance drops sharply because useful features are deleted. To determine the appropriate number of features, this study follows Bradley, Mangasarian & Street (1998) and Chizi, Rokach & Maimon (2009) and selects it by cross-validation.
Performance under different signaltonoise ratios
Wind speed measurements are often affected by the environment and by the anemometer itself, which introduces a certain amount of noise into the data. To verify the reliability of the method, we analyzed its prediction performance under different signal-to-noise ratios (SNRs). Figure 8 shows the 1-step to 3-step prediction performance of the method for SNRs from 30 to 60 dB. Taking wind station #1 as an example, Fig. 8 shows that the performance of the proposed method is relatively stable across these SNRs: the RMSE is about 0.6 for 1-step forecasting, about 0.75 for 2-step forecasting, and about 0.85 for 3-step forecasting. In general, the prediction performance of the proposed method improves as the SNR increases. Similar behavior is observed on site #2. These experimental results show that the proposed method can accurately predict wind speed in the presence of a certain level of noise.
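The text does not state how the noisy series were generated. One common way to build such a test is sketched below under explicit assumptions: the noise is additive white Gaussian noise, the signal power is taken as the mean square of the series, and the wind speed series is synthetic. The functions `add_noise_at_snr` and `measured_snr_db` are hypothetical helpers, not the authors' code.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the result has roughly `snr_db` dB SNR.

    Assumes SNR(dB) = 10 * log10(P_signal / P_noise), with power = mean square.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean series and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Synthetic stand-in for a wind speed series in m/s (not the Michigan data).
clean = [6.0 + 2.0 * math.sin(2 * math.pi * t / 144) for t in range(2000)]

for target in (30, 40, 50, 60):
    noisy = add_noise_at_snr(clean, target)
    print(f"target {target} dB -> measured {measured_snr_db(clean, noisy):.2f} dB")
```

Each noisy series would then be passed through the same forecasting pipeline, and the resulting RMSE compared across SNR levels as in Fig. 8.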
Conclusions
As a sustainable and renewable energy source, wind power has attracted widespread attention and developed rapidly in recent years. Reliable and accurate wind speed forecasting supports wind power planning and control. Due to the nonlinearity and nonstationarity of wind, forecasting remains a challenging problem. In this paper, we developed a new wind speed forecasting model based on EMD, FS, SVR and LassoCV. EMD is employed to extract IMFs from the original nonstationary wind speed time series. FS and SVR are combined to predict the high-frequency IMF. LassoCV is adopted to predict the low-frequency IMFs and the trend. Tests on two wind speed datasets obtained from Michigan, USA, show that in 1- to 3-step forecasting the proposed model achieves better prediction performance than the classic individual models and the traditional EMD-based combined models. Although the proposed model has achieved good performance, it still has some limitations. In particular, the model needs to be retrained whenever new data arrive. In future research, we will try to integrate online learning into our proposed method.