Effective short-term forecasts of Saudi stock price trends using technical indicators and large-scale multivariate time series

PeerJ Computer Science

Introduction

The stock market is one of the most complex financial systems, as diverse and intricate (macro and micro) factors drive the rapid fluctuation of stock prices. Stock price forecasting is the task of predicting future prices and movements by analyzing discrete-time historical data and discovering its hidden patterns using statistical and machine learning methods. Accurate stock price prediction is one of the most challenging tasks in predictive modeling due to the high sensitivity and inherent nonlinearity of stock data, making it crucial to work with well-processed time series data to yield informative and constructive forecasts.

Predicting future stock prices and movement has been widely researched in the literature. Some of these works, including Mehmood, Mehmood & Mujtaba (2012), Jarrett (2008), Gilmore & McManus (2003), Fama (1995), Shen & Shen (2016), Al-Shiab (2006), Junior, Salomon & de Oliveira Pamplona (2014), Wadi et al. (2018), Jarrett & Kyper (2011), Banerjee (2014), Almasarweh & Alwadi (2018), Ariyo, Adewumi & Ayo (2014), Mekayel Anik, Shamsul Arefin & Ali Akber Dewan (2020), Siew & Nordin (2012), Cakra & Trisedya (2015), Gharehchopogh, Bonab & Khaze (2013), have applied classical linear methods such as linear regression (Seber & Lee, 2012), random walk (Lawler & Limic, 2010), and Autoregressive Integrated Moving Average (ARIMA) (Zhang, 2003). With the recent improvement of Artificial Intelligence and increased computational capabilities, many others, including Pang et al. (2020), Bing, Hao & Zhang (2012), Yetis, Kaplan & Jamshidi (2014), Mizuno et al. (1998), Chen, Zhou & Dai (2015), Pawar, Jalem & Tiwari (2019), Selvin et al. (2017), Althelaya, El-Alfy & Mohammed (2018), Selvamuthu, Kumar & Mishra (2019), have employed cutting-edge machine learning-based methods such as artificial neural networks (ANN) (McCulloch & Pitts, 1943), Long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997), and convolutional neural network (CNN) (LeCun & Bengio, 1998). Most of these works study stock forecasting capabilities using univariate time series data by employing the close price as the target feature. Recent advances in time series-based data collection have enabled the vast availability of high-dimensional time series data, facilitating the use of multivariate time series in forecast modeling to render better and possibly more accurate forecasting results. Accordingly, this work considers multivariate-based time series data using historical stock market data featuring various attributes and related characteristics.

An accurate and reliable method for Saudi stock price forecasting that employs large-scale historical data has not yet been introduced in the literature, despite a few efforts reported by Jarrah & Salim (2019), Alamro, McCarren & Al-Rasheed (2019), Alotaibi et al. (2018) and Alsubaie, El Hindi & Alsalman (2019), which involved (a) operating on a minimal subset of the Saudi stock market data, (b) lacking appropriate representation, or (c) reporting results using incorrect metrics. First, the forecasting method presented by Jarrah & Salim (2019) is restricted to predicting the next day’s closing price using a small subset of Saudi stock historical data from 2011 to 2016. Second, the results reported by Alotaibi et al. (2018) are unreliable, as they use an incorrect evaluation metric (percentage accuracy is not a suitable metric for regression problems). Furthermore, the work by Alsubaie, El Hindi & Alsalman (2019) addresses a different problem than forecasting: the authors proposed a cost-sensitive naive Bayes-based classification model to improve investment return, applied to a subset of Saudi stock market data from 2015 to 2018. Finally, Alamro, McCarren & Al-Rasheed (2019) presented a customized methodology that combines sentiment analysis of financial news with recurrent neural network-based modeling to predict the Saudi stock market index using multivariate time series analysis. Besides reporting forecasts for the next day’s closing price only, we found this study unrepresentative and opaque in the following areas: (1) there is not enough information on how the input stock data was initially collected and used for the forecasting task, and (2) it lacks a clear explanation of how the time series stock data was preprocessed and how the forecasting results relate to a company or sector.

The main research question investigated in this article is whether recurrent neural networks (RNNs) yield robust and reliable performance in short-term closing price forecasts of the Saudi stock market. Since the related published articles on this topic are either technically restricted or incorrectly reported, this article presents a practical and comprehensive methodology for multi-step forecasts of a stock’s closing price, rather than a one-step (i.e., next day) forecast, which may not be insightful for long-term strategies and planning. The proposed methodology applies a cutting-edge RNN-based learning method, namely Gated Recurrent Units (GRUs), to construct a well-architected forecasting model using the most recently published multivariate time series data of the primary Saudi stock market index, made publicly available by the Saudi Stock Exchange (Tadawul) (TASI, 2020). The GRU model was trained to produce short-term forecasts for the next five business days and benchmarked against a Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX) model, which was also customized to yield short-term forecasts. Furthermore, a custom-tailored feature engineering pipeline is built to preprocess the raw input stock data and induce finance-related attributes, followed by a stride-based sliding window to yield trainable short-term multivariate time series data ready for the modeling phase. The empirical results, interpreted and evaluated using a set of regression-related metrics, demonstrate outstanding performance with the lowest error rates compared to existing works on this topic, manifesting the proposed methodology’s capability to produce accurate short-term closing price forecasts per company in the Saudi stock market. Simply put, the contribution of this work is threefold:

  1. Introducing a comprehensive AI-powered forecasting methodology that enables short-term closing price forecasts of the Saudi stock market index, assessed using a set of regression-based metrics to interpret the models’ prediction confidence for the stock market trends application.

  2. Proposing a carefully engineered Gated Recurrent Unit (GRU) model, constructed and calibrated to enable precise short-term forecasts.

  3. Applying large-scale multivariate time series data of a recently published Saudi stock market index dataset by TASI (2020), in which some stocks have been recorded since 2000, to improve the reliability of the trained model in producing short-term forecasts.

This work attempts to present the best possible AI-assisted method for forecasting the Saudi stock market’s closing prices to support decision-making, potentially as a deployable solution in real-world settings. The rest of this article is organized as follows: Section 2 presents a literature review of related works on stock market prediction. Section 3 presents brief preliminaries about the essential elements used in this article. Section 4 describes the datasets employed in the experiment, followed by the proposed preprocessing and modeling method. The article’s experiment is presented in Section 5, followed by a thorough discussion of the results in Section 6. The article is concluded in Section 7.

Literature review

Studies forecasting stock market movement and prices have grown rapidly in recent years, mainly due to advances in machine learning-based methods. Chen, Leung & Daouk (2003) attempted to model and predict the direction of the Taiwan Stock Exchange Index by employing the probabilistic neural network (PNN) method with investment strategies. The performance of the proposed modeling design was compared with that of parametric forecasting approaches, namely the generalized method of moments (GMM) with Kalman filter and the random walk forecasting model. The empirical results showed that PNN demonstrates a more substantial predictive capability than the GMM-Kalman filter and random walk prediction models. Guresen, Kayakutlu & Daim (2011) used a customized artificial neural network (ANN) based model to forecast the stock market index, applied to NASDAQ index forecasts. While the results were not fully effective, they outperformed the existing methods of the time. Similar research was conducted by Kara, Boyacioglu & Baykan (2011), who developed two models for predicting the direction of movement in the Istanbul Stock Exchange (ISE) Index using support vector machines and artificial neural networks, indicating that the average performance of the ANN-based model (75.74%) was significantly better than that of the SVM-based model (71.52%). Li et al. (2011) studied the effect of combining information hidden in market news with the stock price, applied to the Hong Kong stock market use case. The authors utilized the multi-kernel learning (MKL) technique, which integrates news articles and short-term price history, followed by a modeling step using the support vector machine (SVM). The results showed that the multi-kernel-based model could make better use of information in news articles and price history than a model merely concatenating features of news articles and prices. Li, Dagli & Enke (2007) proposed a reinforcement-oriented scheme for forecasting short-term stock price movements. The scheme applies actor-only and actor-critic reinforcement learning, respectively, and is compared with a supervised-only model and a classical random walk benchmark. The actor-critic-based system performed better than the other alternatives, while the actor-only system also showed efficacy.

Sun et al. (2019) introduced a data-driven approach that combines classical financial pricing models with artificial neural networks, named ARMA-GARCH-NN, incorporating unique designs of feature selection, cross-validation, and ensemble learning to capture intra-day patterns for stock market shock forecasting. The authors investigate the value of historical information in capturing unexpected market movements, defined as the innovation obtained from ARMA-GARCH models. The empirical studies applied to high-frequency data of the U.S. stock market show that (i) ARMA-GARCH-based market shocks at the intra-day level are predictable, and (ii) the predicted market shocks can serve as new trading signals to benefit the decision-making process for financial investments. Moreover, Cao, Li & Li (2019) investigated the impact of noise inherent in non-linear and non-stationary stock market data by introducing two hybrid forecasting models that combine the empirical mode decomposition (EMD) and its advanced version CEEMDAN with long short-term memory (LSTM) to predict financial time series. The idea is to decompose the original time series into several sub-series at different frequencies using the EMD and CEEMDAN methods; the sub-series are then respectively predicted as input data for the LSTM model. Finally, all predictions are reconstructed to obtain the final result. The proposed method is tested on four major global stock indices, and the results, compared with a single LSTM, support vector machine (SVM), multi-layer perceptron (MLP), and other hybrid models, demonstrate better performance in one-step-ahead forecasting of financial time series. Chen, Lin & Wang (2019) studied the problem that traditional statistical methods cannot extract relevant features for mining financial time series data. Thus, the authors proposed a dual-phase trend prediction model (TPM) based on dual features and a dual attention mechanism to predict the direction and duration of stock price changes. The proposed method outperforms state-of-the-art methods, including SVR, LSTM, CNN, LSTM-CNN, and TPM-NC, under various training and testing parameters with different stock index datasets. Finally, Javed Awan et al. (2021) address the financial big data setting, where modeling is challenging. The authors implemented several machine learning models using the Spark MLlib library in PySpark to predict market trends from the stocks of several top companies, using 10 years of historical data to predict stock price movements. Experimental results indicated that linear regression, random forest, and generalized linear regression deliver 80–98% accuracy.

Preliminaries

Before describing the proposed methodology, we need to establish an understanding of the essential elements involved in this experiment, including the definition of multivariate time series and their special properties with forecasting strategies, problem formulation, and the RNN-based modeling component.

Time series forecasting

A time series is a collection of sequential vector-valued variables of some phenomenon indexed over time. Time series forecasting is the task of predicting future values of given time series observations by analyzing historical data to identify decomposable temporal patterns, such as trends, seasonality, cycles, and regularity. Univariate time series imply that forecasting is based on a single variable of observations without considering the effect of other variables, making them desirable for computational efficiency, especially when data availability is scarce. On the other hand, multivariate time series have more than one time-dependent variable, where each variable depends not only on its past values but also tends to have some interdependency with the other variables. Although the multivariate time series setting is known for the difficulty of extracting beneficial information for the prediction process (Chayama & Hirata, 2016), multivariate time series are often assumed to convey more informative forecasts, since advanced statistical methods can leverage the dependencies between variables to improve forecasting accuracy (Yin & Shang, 2016; Zhang et al., 2017; Lang et al., 2019).

Problem definition

Multivariate time series forecasting has numerous valuable applications, such as predicting electricity consumption, solar power production, and stock trend movements. The potential interdependency among variables in the multivariate-based time series can lead to better and more coherent learning results, especially when variables are highly correlated. This research seeks to employ the multivariate-based time series analysis to enable better information extraction about the data and their underlying dynamics to improve the short-term prediction accuracy.

Problem formulation

Given a set of observed time series $X = \{x_1, x_2, \ldots, x_n\}$, where $x_t \in \mathbb{R}^d$ is a real vector of dimension $d$ representing an observed time series at time $t$, the goal of multivariate time series forecasting is to predict the future value $x_{t+h}$, where $h$ is the desired fixed-value horizon ahead, chosen with respect to the task at hand.

The horizon length $h$ is specified based on multiple parameters, including the data and environment settings. For instance, the choice of the horizon $h$ for stock market data can be anywhere from minutes to days, as this range makes sense for the stock market domain.

VARMA with exogenous regressors

The Vector Autoregressive Moving-Average (VARMA) model extends the ARMA/ARIMA model to work with time series with multiple response variables (vector time series). It is one of several statistical analyses frequently used for multivariate-based time series studies. The model includes the dynamic relationship between the multiple response variables as well as the relationship between the dependent and independent variables. It is a combination of Vector Auto-Regressive (VAR) and Vector Moving-Average (VMA) models that supports the multivariate time series modeling by considering both the lag order (p) and order of moving average (q) in the model. The VARMA model can operate as a VAR model by just setting the q parameter to 0, and it also can operate like a VMA model by just setting the p parameter to 0. Moreover, the Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX) is an extension of the VARMA model that includes modeling with exogenous variables (also called covariates). In practice, the target series is referred to as an endogenous sequence to contrast from exogenous sequences that are included directly in the model at each time step but not modeled in the same way as the target endogenous sequence. Like VARMA, VARMAX can also operate as VARX and VMAX to model the subsumed models with exogenous variables.

Formally, the endogenous vector $y_t$ is said to follow a vector autoregressive moving average with exogenous variables, $\mathrm{VARMAX}(p,q,s)$, if it satisfies an equation of the form (Östermark, 1994; Kadiyala & Kumar, 2014; SAS, 2016):

$$y_t = \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=0}^{s} \Theta^{*}_i x_{t-i} + \varepsilon_t - \sum_{i=1}^{q} \Theta_i \varepsilon_{t-i}$$

where the output variables of interest, $y_t = (y_{1t}, \ldots, y_{kt})'$, referred to as response or endogenous variables, can be influenced by other input variables, $x_t = (x_{1t}, \ldots, x_{mt})'$, referred to as regressor or exogenous variables. The unobserved noise variables, $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{kt})'$, form a vector white noise process. Therefore, the model $\mathrm{VARMAX}(p,q,s)$ can be written as:

$$\Phi(B) y_t = \Theta^{*}(B) x_t + \Theta(B) \varepsilon_t$$

where

$$\Phi(B) = I_k - \Phi_1 B - \cdots - \Phi_p B^p, \quad \Theta^{*}(B) = \Theta^{*}_0 + \Theta^{*}_1 B + \cdots + \Theta^{*}_s B^s, \quad \Theta(B) = I_k - \Theta_1 B - \cdots - \Theta_q B^q$$

are matrix polynomials in the backshift operator $B$, such that $B^i y_t = y_{t-i}$, the $\Phi_i$ and $\Theta_i$ are $k \times k$ matrices, and the $\Theta^{*}_i$ are $k \times m$ matrices. For completeness, the following assumptions are made:

  • $E(\varepsilon_t) = 0$, $E(\varepsilon_t \varepsilon_t') = \Sigma$, which is positive-definite, and $E(\varepsilon_t \varepsilon_s') = 0$ for $t \neq s$.

  • For a stationary VARMAX, the roots of $|\Phi(z)| = 0$ and $|\Theta(z)| = 0$ lie outside the unit circle.

  • The exogenous independent variables $x_t$ are uncorrelated with the residuals $\varepsilon_t$, i.e., $E(x_t \varepsilon_t') = 0$.

  • The exogenous variables can be stochastic or non-stochastic.
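To make the model concrete, the following is a minimal sketch of fitting such a model in Python. It uses the statsmodels VARMAX implementation (the experiment in this article reports pmdarima support; statsmodels is shown here purely for illustration), and the file name, column names, and the order (5, 0) mirroring Table 3 are assumptions, not the article's exact setup.

```python
# Minimal sketch: fit a VARX(5) model (VARMAX with q = 0) with one
# exogenous regressor, then forecast the next five business days.
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

# Hypothetical preprocessed per-company frame with a DatetimeIndex.
df = pd.read_csv("tasi_company.csv", index_col="date", parse_dates=True)
endog = df[["close", "open"]]     # endogenous (response) series
exog = df[["volume_traded"]]      # exogenous regressor

# order=(p, q): autoregressive and moving-average lag orders.
model = VARMAX(endog, exog=exog, order=(5, 0))
result = model.fit(maxiter=100, disp=False)

# Future exogenous values must be supplied; reusing the last observed
# values here is a placeholder, not a recommendation.
forecast = result.forecast(steps=5, exog=exog.tail(5).values)
print(forecast)
```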

Gated recurrent units

We first present a brief exposition of recurrent neural networks (RNNs). RNN, first conceived by Elman (1990), is an extension of the conventional feedforward neural network used for sequence modeling problems. RNNs can be seen as a universal approximator for dynamical systems of variable-length input sequences (Schäfer & Zimmermann, 2006). RNNs can let information persist within the network over time, making them suitable for sequence learning modeling. Formally, given a sequence $X = (x_1, x_2, \ldots, x_t)$, a default RNN updates a recurrent hidden state $h_t$ at time $t$ as (Olah, 2015):

$$h_t = f(h_{t-1}, x_t)$$

where $f$ is commonly chosen as a nonlinear activation function, such as sigmoid or $\tanh$, with an affine transformation of both $h_{t-1}$ and $x_t$. Thus, the update of the recurrent hidden state can be rewritten as:

$$h_t = \Phi(W x_t + \Theta h_{t-1} + b)$$

where $x_t$ is the $m$-dimensional input vector at time $t$, $h_t$ is the $n$-dimensional hidden state, $\Phi$ is a nonlinear activation function, $W$ is a weight matrix of size $n \times m$, $\Theta$ is a recurrent weight matrix of size $n \times n$, and $b$ is a bias vector of size $n \times 1$.

The default RNN is deficient in capturing long-term dependencies because the gradients tend to either vanish or explode with long sequences (Bengio, Simard & Frasconi, 1994). Recall the well-known phenomenon in canonical recurrent neural networks (RNNs) called long-term dependencies (Bengio, Simard & Frasconi, 1994), in which a system’s desired output at time $T$ depends on inputs presented at times $t < T$. In other words, as the sequence length increases during training, the gradients tend to vanish, leading to the problem of vanishing gradients. Vanishing gradients prevent the network from learning the underlying dynamics and hidden patterns carried by long-term dependencies, which are beneficial for producing accurate future forecasts. Vanishing gradients also arise when the network dynamics have attractors necessary to store bits of information over the long term (Chandar et al., 2019). Successful architectures, such as LSTM and GRU, alleviate this issue by introducing memory in the form of information skipped forward across transition operators, with gates bounded by activation functions (sigmoid, tanh) that determine which information to skip, decoupling the hidden states (i.e., the memory) from the output states to reduce the vanishing problem.

To overcome the shortcoming mentioned above, two RNN-based variants have been introduced, namely Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) and Gated Recurrent Units (GRU) (Chung et al., 2014). Both variants seek to keep long-term dependencies effective while alleviating the vanishing and exploding gradient problems (Chandar et al., 2019; Aseeri, 2021). As seen in Fig. 1, the GRU operates on two gates: a reset gate and an update gate. The architecture essentially combines the forget and input gates used by the LSTM into the update gate and merges the cell state and hidden state into the reset gate, resulting in a simpler design and consequently fewer trainable parameters than the LSTM variant, improving training time and performance (Chung et al., 2014). The operations performed by the GRU model are as follows (Chung et al., 2014; Dey & Salem, 2017):


Figure 1: Illustration of the Gated Recurrent Unit (GRU) architecture.

$$\begin{aligned} r_t &= \sigma(W_r [h_{t-1}, x_t]) \\ \theta_t &= \sigma(W_u [h_{t-1}, x_t]) \\ \tilde{h}_t &= \tanh(W [r_t \odot h_{t-1}, x_t]) \\ h_t &= (1 - \theta_t) \odot h_{t-1} + \theta_t \odot \tilde{h}_t \end{aligned}$$

where $r_t$ and $\theta_t$ are respectively the reset and update gates at time $t$, $\tilde{h}_t$ is the candidate hidden state, the $W$ are weight matrices, and $\odot$ denotes component-wise multiplication.
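To make these update equations concrete, the following is a minimal NumPy sketch of a single GRU step. The dimensions, random initialization, and omission of bias terms are illustrative assumptions, not the article's implementation.

```python
# One GRU update step implementing the equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x_t, W_r, W_u, W_h):
    """One GRU update over the concatenated [h_{t-1}, x_t] input (no biases)."""
    hx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ hx)                   # reset gate
    theta_t = sigmoid(W_u @ hx)               # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    return (1.0 - theta_t) * h_prev + theta_t * h_cand

rng = np.random.default_rng(0)
n, m = 4, 3                                   # hidden and input sizes (illustrative)
h = gru_step(np.zeros(n), rng.normal(size=m),
             rng.normal(size=(n, n + m)),
             rng.normal(size=(n, n + m)),
             rng.normal(size=(n, n + m)))
print(h)
```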

Methodology

Data description and exploration

To achieve the best possible forecasting results, especially in the time series domain, it is important to employ long, well-preprocessed historical time series data to help capture the hidden patterns of the domain of interest, such as stock trends and movements. In this experiment, we use a recently published multivariate time series of the primary Saudi stock market index, the Tadawul All Share Index (TASI), made freely accessible by TASI (2020). The TASI dataset represents daily closing prices of TASI index companies, distributed into ten sectors and featuring 14 attributes as shown in Table 1, nine of which are numerical: open, high, low, close, change, percentage change, volume traded, value traded, and number of trades. Since the Saudi stock market index dates back to 2001, some companies in the dataset have been recorded since 2001, while others joined the market more recently, meaning that data recording for a new company begins on its joining date. We randomly selected ten companies, one from each sector, listed in Table 2, to be modeled by our proposed method. In addition, Fig. 2 illustrates the distribution of the closing price of each of the ten chosen companies.

Table 1:
Feature descriptions of the dataset used in this experiment.
Feature Type Description
symbol Integer The symbol or the reference number of the company
name String Name of the company
trading_name String The trading name of the company
sector String The sector in which the company operates
date Date The date of the stock price
open Numeric The opening price
high Numeric The highest price of the stock at that day
low Numeric The lowest price of the stock at that day
close Numeric The closing price
change Numeric The change in price from the last day
perc_change Numeric The percentage of the change
volume_traded Numeric The volume of the trades for the day
value_traded Numeric The value of the trades for the day
no_trades Numeric The number of trades for the day
DOI: 10.7717/peerj-cs.1205/table-1
Table 2:
A list of randomly selected companies involved in this study, each representing a sector.
Sector Company name Trading name
Financials Al Rajhi Bank ALRAJHI
Energy Saudi Arabia Refineries Co. SARCO
Materials National Industrialization Co. TASNEE
Health care Saudi Pharmaceutical Industries and Medical Appliances Corp. SPIMACO
Utilities Saudi Electricity Co. SAUDI ELECTRICITY
Real estate Emaar The Economic City EMAAR EC
Consumer discretionary Fitaihi Holding Group FITAIHI GROUP
Industrials Al-Ahsa Development Co. ADC
Consumer staples Al Gassim Investment Holding Co. GACO
Communication services Saudi Telecom Co. STC
DOI: 10.7717/peerj-cs.1205/table-2

Figure 2: KDE plot illustrating the analytical distribution of the closing price of each of the 10 companies chosen randomly in this experiment.

A company’s sector is stated within the parentheses.

Temporal multivariate processing

In this study, the Saudi stock trend prediction process is treated as a regression problem where the goal is to produce future closing price forecasts over a horizon of size h, where h is set to five, representing five trading days, given a past window of multivariate time-lagged historical time series.¹ Moreover, this work explores the performance of the proposed prediction system when employing a number of financial technical indicators to improve forecasts. To the best of our knowledge, this is the only work reporting multi-step closing price forecasts of the Saudi stock TASI index. The proposed methodology is visualized in Fig. 3, which depicts the processing pipeline (top subfigure) and the windowing approach applied to the raw time series data (bottom subfigure).

The forecasting pipeline shown in the top subfigure of Fig. 3 is tailored to ensure systematic and sound preprocessing and modeling. The pipeline involves five essential modules, each performing a specific task. The process begins with initiating a streaming data reader over the raw stock data to enable retrieving multivariate time series data. When a specific company’s multivariate time series is fully retrieved, the data interpolation module cleans it by (1) removing duplicate timesteps when detected and (2) performing linear interpolation using the forward-fill technique, replacing missing values with the last valid observation propagated forward to fill consecutive gaps (the missing values correspond to weekend days, when the market is closed). Next, the feature engineering module is responsible for three major processes: (i) feature reconstruction and (ii) correlation-based feature selection, both elaborated in the subsection “Feature Engineering: Technical Indicators and Feature Selection”, plus (iii) min-max normalization. After that, the windowing and splitting module performs a customized one-stride sliding window procedure to form adequate sequences for the modeling phase, followed by splitting the overall sequences into three sets: training, validation, and testing. Here, the word stride denotes the amount of movement/shift to the right at a time, so one-stride means one movement to the right at a time. Looking closely at the left side of the windowing subfigure in Fig. 3, an iterative top-down procedure applied to the multivariate time series data generates multivariate sequences paired with a vector of future horizons, thus constructing supervised-like multistep sequences ready for training and validating the forecasting model. For instance, at a timestep t, a one-stride sliding window is rolled to specify a batch that involves the historical time-lag window of size w (in blue) and the forecast window of size h (in green). Consequently, this process can be formally expressed as:


Figure 3: A visual representation of the proposed forecasting system.

The top subfigure (A) represents the processing pipeline, while the bottom subfigure (B) illustrates the one-stride windowing approach used with the input datasets.

$$y_{t+h} = f_h(y_t, \ldots, y_{t-w})$$

where $h$ is the number of steps to forecast into the future (i.e., the desired future horizon), $w$ is the size of the time-lag window, $t$ is the timestep (i.e., the day), and $f_h$ is any arbitrary learner. Furthermore, it is well recognized that the most critical parameter in multivariate time series forecasting is the proper determination of the number of past observations, i.e., the time-lag window. Thus, we performed a trial-and-error procedure with different multivariate time-lag window sizes, including 1, 5, 10, and 15 days, to choose the window size yielding the best performance. We concluded that a time-lag window of 5 days (i.e., $w = 5$) works well in providing satisfactory forecasting results while maintaining efficient processing time. Finally, both the model construction and model evaluation modules are elaborated on in the upcoming subsection “Modeling Design”.
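As an illustration of this windowing step, the following is a small sketch that turns a multivariate series into (past window, future horizon) training pairs, assuming a NumPy feature matrix, the article's $w = 5$ and $h = 5$, and a hypothetical column index for the close price.

```python
# One-stride sliding window over a multivariate series.
import numpy as np

def make_windows(series, target, w=5, h=5):
    """series: (T, d) feature matrix; target: (T,) closing prices.
    Returns X of shape (N, w, d) and y of shape (N, h)."""
    X, y = [], []
    for t in range(w, len(series) - h + 1):
        X.append(series[t - w:t])        # time-lag window of size w
        y.append(target[t:t + h])        # forecast horizon of size h
    return np.array(X), np.array(y)

T, d = 100, 9                            # placeholder sizes
data = np.random.rand(T, d)
X, y = make_windows(data, data[:, 3])    # column 3 as the close price (illustrative)
print(X.shape, y.shape)                  # (91, 5, 9) (91, 5)
```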

Feature engineering: technical indicators and feature selection

Technical indicators (TIs) are heuristic, pattern-based indicators computed from a feature of interest, such as price or volume, and are usually utilized as exogenous features to aid in identifying hidden patterns, trends, and movements within historical time series. In practice, the success of employing technical indicators relies heavily on the availability of sufficient historical data, making them conceivably applicable in this experiment. In this study, we explore the effectiveness of employing technical indicators as additional exogenous features to the input multivariate TASI index data. We hypothesized that we would obtain enhanced forecasting results when exploiting technical indicators, as they can improve the interdependency between the exogenous features with respect to the endogenous target variable, hence better capturing hidden patterns and trends within the historical time series.

Accordingly, we perform feature reconstruction within the feature engineering module, where the original features, listed in Table 1, are extended with a set of finance-related technical indicators, computed with respect to the target feature of interest in this study, i.e., the close price. We choose five essential and widely adopted technical indicators in finance, namely the Weighted Moving Average (WMA), Double Exponential Moving Average (DEMA), Triple Exponential Moving Average (TEMA), Hull Moving Average (HMA), and Zero-Lag Exponential Moving Average (ZLEMA), resulting in fourteen features in total (nine original numerical features plus five technical-indicator features). The calculation is carried out in two steps: (1) computing, on each trading day, a new vector of attributes representing the values of the selected technical indicators, where the input lag parameter of each indicator is set to 5, and (2) extending the original exogenous features with the newly computed vector. For the sake of completeness, we show the formulations of the indicators mentioned above as follows: if $n$ is the number of periods, $w_t$ is the weighting factor, $\alpha$ is a smoothing factor, and $x_t$ is the data observation at period $t$, the well-known Weighted Moving Average (WMA) is defined as (Perry, 2010):

$$\mathrm{WMA} = \frac{\sum_{t=1}^{n} w_t x_t}{\sum_{t=1}^{n} w_t}$$

Also, the Double Exponential Moving Average (DEMA) and Triple Exponential Moving Average (TEMA) are defined as (Rozario et al., 2020):

$$\mathrm{DEMA} = 2 \times \mathrm{EMA} - \mathrm{EMA}(\mathrm{EMA})$$

$$\mathrm{TEMA} = \left(3 \times \mathrm{EMA} - 3 \times \mathrm{EMA}(\mathrm{EMA})\right) + \mathrm{EMA}(\mathrm{EMA}(\mathrm{EMA}))$$

where

$$\mathrm{EMA}_t = \begin{cases} x_0 & t = 0 \\ \alpha x_t + (1 - \alpha)\,\mathrm{EMA}_{t-1} & t > 0 \end{cases}$$

The Hull Moving Average (HMA) makes a moving average more responsive while maintaining curve smoothness. It uses a combination of three WMAs with different period sizes and is computed as (Hull, 2005):

$$\mathrm{HMA} = \mathrm{WMA}_{\sqrt{n}}\left(2 \times \mathrm{WMA}_{n/2}(X) - \mathrm{WMA}_{n}(X)\right)$$

Finally, the Zero-Lag Exponential Moving Average (ZLEMA) modifies the Exponential Moving Average (EMA) to greatly reduce lag. It is computed as (Ehlers & Way, 2010):

$$\mathrm{ZLEMA}_t = K\left(2 x_t - x_{t-\mathrm{lag}}\right) + (1 - K)\,\mathrm{ZLEMA}_{t-1}$$

where

$$K = \frac{2}{n + 1}, \qquad \mathrm{lag} = \frac{n - 1}{2}$$
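As a concrete reference, the following pandas sketch computes the five indicators from a close-price series with the article's lag of n = 5. The ZLEMA is written in its standard form as an EMA of the de-lagged price 2x_t − x_{t−lag}, which is equivalent to the recursive K-form above; the placeholder price series is illustrative.

```python
# Sketch of the five technical indicators over a close-price series.
import numpy as np
import pandas as pd

def wma(s, n):
    weights = np.arange(1, n + 1)          # linearly increasing weights
    return s.rolling(n).apply(lambda x: np.dot(x, weights) / weights.sum(), raw=True)

def ema(s, n):
    return s.ewm(span=n, adjust=False).mean()   # alpha = 2 / (n + 1)

def dema(s, n):
    e = ema(s, n)
    return 2 * e - ema(e, n)

def tema(s, n):
    e = ema(s, n)
    return 3 * e - 3 * ema(e, n) + ema(ema(e, n), n)

def hma(s, n):
    return wma(2 * wma(s, n // 2) - wma(s, n), int(np.sqrt(n)))

def zlema(s, n):
    lag = (n - 1) // 2
    return ema(2 * s - s.shift(lag), n)

close = pd.Series(np.random.rand(50), name="close")   # placeholder prices
indicators = pd.DataFrame({
    "WMA": wma(close, 5), "DEMA": dema(close, 5), "TEMA": tema(close, 5),
    "HMA": hma(close, 5), "ZLEMA": zlema(close, 5),
})
```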

Next, to ensure that the newly added technical indicators and the original exogenous features have the desired level of relevance and interdependency, we adopt Pearson’s correlation as a correlation-based feature selection method to retain the features most highly correlated with the target variable. Pearson’s correlation quantifies the strength of the linear relationship between two variables. It outputs values in the interval [−1, +1], where +1 implies total positive correlation, −1 total negative correlation, and 0 no correlation. As per the standard literature, for a pair of variables $(X, Y)$, the linear correlation coefficient $r$ is given by:

$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (Y_i - \bar{Y})^2}}$$

This study found that all correlation matrices rendered from the ten selected companies exhibit an identical strong positive correlation among only nine of the fourteen features, namely: open, high, low, close, WMA, DEMA, TEMA, HMA, and ZLEMA. This can be observed in Fig. 4, where we visualize the correlation matrix (half, diagonally, since it is symmetric) of a selected company, the Saudi Arabia Refineries Co. (a small sketch of this selection step follows Fig. 4). The correlation matrices of the remaining eight companies involved in this work are shown in Fig. A1 in the Appendix.


Figure 4: Illustration of the correlation matrix applied on the preprocessed time series of the Saudi Arabia Refineries Co.
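The sketch below illustrates the correlation-based selection step; the 0.9 threshold is an assumption for illustration, since the article reports retaining the nine features showing strong positive correlation with the close price rather than a specific cutoff.

```python
# Keep features whose absolute Pearson correlation with the close
# price exceeds a (hypothetical) threshold.
import pandas as pd

def select_features(df: pd.DataFrame, target: str = "close", threshold: float = 0.9):
    corr = df.corr(method="pearson")[target].abs()
    return corr[corr >= threshold].index.tolist()

# Example: df holds the nine numerical features plus the five indicators.
# selected = select_features(df)   # e.g., open, high, low, close, WMA, ...
```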

Modeling design

After completing the data preprocessing phase, we model the processed multivariate time series data using a carefully engineered Gated Recurrent Units (GRU) model. This work deliberately applies the gated recurrent units method for multivariate time series forecasting as one of the most effective RNN-based methods, providing accurate results with efficient computation. We spent considerable time experimenting with varying design architectures and hyperparameter tuning options to yield the best possible forecasting results; Fig. 5 demonstrates the most suitable model design and configuration given the input preprocessed multivariate time series data (a minimal code sketch is given after Fig. 5). The final architecture involves one GRU layer composed of 256 GRU processing units (see Fig. 1 for the generic GRU processing unit), followed by a dense layer of linear activation units equal to the size of the forecast horizon h (in this work, h = 5).


Figure 5: GRU-based model architecture.

In order to verify the correctness of the GRU forecasting estimates, we need to compare them with a baseline statistical method such as AutoRegressive Integrated Moving Average (ARIMA). Since we operate on the multivariate time series modeling, we employ an ARIMA variant designed for handling exogenous data, called Vector Autoregressive Moving-Average with Exogenous Regressors (VARMAX) (see subsection “VARMA with Exogenous Regressors” for more details about this variant). The comparison results will be illustrated later in the upcoming section “Results and Discussion”.

Empirical experiment

Experimental setup

The experimental environment is implemented in Python 3. The neural network modeling framework of choice is TensorFlow v2.4.1. We also implemented the VARMAX modeling with the support of the pmdarima API v2.0.0 (Smith, 2017). To ensure sound model training and evaluation, the input time series is split into three sets: the training set runs from the first-ever trading day until 31-12-2015, the validation set from 01-01-2016 to 31-12-2016, and the testing set from 01-01-2017 to 01-04-2020. The experiment was performed on a workstation with an Intel Core i7-10700KF (3.8 GHz) CPU, an Nvidia GeForce RTX 3060 Ti (8 GB) GPU, and 64 GB of system RAM. For each company involved in this experiment and listed in Table 2, the experiment was run five times, and we report only the best run.
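A minimal sketch of this chronological split, assuming the preprocessed frame carries a DatetimeIndex, could look as follows:

```python
# Chronological train/validation/test split matching the reported dates.
import pandas as pd

def split_by_date(df: pd.DataFrame):
    train = df.loc[:"2015-12-31"]              # first trading day .. end of 2015
    val = df.loc["2016-01-01":"2016-12-31"]    # calendar year 2016
    test = df.loc["2017-01-01":"2020-04-01"]   # held-out evaluation period
    return train, val, test
```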

Neural network-based methods seek to minimize the error between predicted and true values, assessed using a suitable cost function, by iteratively training and tuning the model’s parameters with a gradient-based optimization method. For training the GRU model, we found that Nesterov-accelerated Adaptive Moment Estimation (NADAM) (Dozat, 2016), with a learning rate of $1 \times 10^{-4}$, works well during the training and validation of the GRU model design. The mean squared error was appointed as the cost function of choice to estimate the model’s errors, while the batch size fed to the model during the feedforward process was fixed at 10% of the training data. Furthermore, the activation functions of the GRU gates are left at their conventional defaults: the sigmoid function for both the update and reset gates, and the hyperbolic tangent function (tanh) for the recurrent activation. For the sake of reproducibility, Table 3 reports the hyperparameter options, along with their descriptions, for each model used in this experiment.

Table 3:
Descriptions of the hyperparameters used in this experiment for reproducibility purposes.
Model Hyperparameter description
VARMAX (p, d, q) = (5, 0, 0); iterations = 100; optimizer = Limited-memory BFGS
GRU optimizer = Nesterov-accelerated Adam (Nadam); learning_rate = $1 \times 10^{-4}$; early_stopping = 20; epochs = 1,000; batch_size = 10% of train data
DOI: 10.7717/peerj-cs.1205/table-3
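Applying the Table 3 configuration to the Keras model sketched earlier might look as follows. Here X_train, y_train, X_val, and y_val are the windowed arrays from the earlier sketches, and restore_best_weights is an added assumption not stated in the article.

```python
# Training configuration: Nadam (lr = 1e-4), MSE loss, early stopping with
# patience 20, up to 1,000 epochs, batch size of 10% of the training data.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-4),
    loss="mse",
)
early_stop = tf.keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=1000,
    batch_size=max(1, int(0.1 * len(X_train))),
    callbacks=[early_stop],
)
```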

Evaluation metrics

The performance of neural network-based methods is generally evaluated in terms of metrics related to the models’ predictive power. In principle, a proper and correct selection of evaluation metrics for quantifying prediction performance enables a correct interpretation of the model’s skill and capability in making forecasts. Therefore, we choose a set of regression-based metrics that assess our models’ precision on the multi-step forecasting task. Various evaluation metrics are available for measuring the accuracy of forecasts, but the metrics shown in Table 4 are the default metrics for regression problems and are widely adopted for evaluating time series forecasts. Here, $y$, $x$, and $n$ represent the forecasted value, the actual value, and the number of samples for the period, respectively.

Table 4:
Performance evaluation metrics used in this study.
Evaluation metric Equation
Mean absolute error (MAE) $\frac{1}{n}\sum_{i=1}^{n} |y_i - x_i|$
Mean squared error (MSE) $\frac{1}{n}\sum_{i=1}^{n} (y_i - x_i)^2$
Root mean squared error (RMSE) $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - x_i)^2}$
Coefficient of determination (R²) $1 - \frac{\text{sum squared regression (SSR)}}{\text{total sum of squares (SST)}}$
DOI: 10.7717/peerj-cs.1205/table-4

The evaluation metrics shown in Table 4 are commonly used in predictive modeling. The mean absolute error (MAE) measures the average error magnitude in a set of predictions using the absolute value; this type of error measurement is proper when measuring prediction errors in the same unit as the original series. The mean squared error (MSE) replaces the absolute value with a square, giving large errors a relatively greater influence on MSE than smaller ones; MSE also has convenient mathematical properties, making it easier to compute gradients. Similarly, the root mean squared error (RMSE) measures the average error magnitude but takes the square root of the average squared differences between predictions and actual observations (MSE and RMSE are monotonically related through the square root). Finally, the coefficient of determination (R²) is a goodness-of-fit measurement of how well the regression approximates the actual observations on a 0–1 scale (or 0–100%), where a higher value is desirable, as opposed to MAE, MSE, and RMSE, where lower is desirable. The sum squared regression (SSR) is the sum of the squared residuals, and the total sum of squares (SST) is the sum of the squared deviations of the data from their mean.
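For reference, the four metrics can be computed with scikit-learn as in the following sketch; the y_true and y_pred arrays are placeholders for actual and forecasted closing prices.

```python
# Compute the four regression metrics from Table 4.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # placeholder actual values
y_pred = np.array([1.1, 1.9, 3.2, 3.8])   # placeholder forecasts

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```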

Results and discussion

This section analyzes and demonstrates the performance of our proposed methodology and modeling design discussed in “Methodology”. All results shown in this section are based on a separate test set of preprocessed multivariate-based time series used for neither training nor hyperparameter tuning. The test time series is set from 01-01-2017 to 01-03-2020. To fully evaluate the effectiveness of the resulting forecasting models, we collect the predictions of the five-day horizon period for each out-of-sample observation and evaluate them using the chosen metrics discussed in subsection “Evaluation Metrics”.

Recall that one central theme in this work is to explore the effectiveness of employing technical indicators as additional features incorporated into the multivariate-based TASI dataset to improve the accuracy of the short-term multi-step closing price forecasts (see subsection “Feature Engineering: Technical Indicators and Feature Selection”). We hypothesized that we would get enhanced forecasting results when adding technical indicators to improve the interdependency between the exogenous features, hence better capturing hidden patterns and trends within the historical time series. Afterward, the relevance between the exogenous features is inspected using Pearson’s correlation coefficient as a correlation-based feature selection method to return the most highly-correlated features with respect to the endogenous target variable.

This experiment applies an RNN-based gated recurrent units (GRU) modeling design tailored to obtain the best possible forecasting results using the TASI dataset. We compare the GRU model’s results with the VARMAX model’s results, with the latter being the baseline. To enable the most helpful comparison, we perform two experiments per model: one by operating on the original multivariate time series data, i.e., preprocessed version of the original numerical features listed in Table 1, and the other one by the selected feature set resulting from the feature engineering module’s step elaborated previously in subsection “Feature Engineering: Technical Indicators and Feature Selection”, which will be marked with the asterisk (*) symbol in the performance comparison Table 5.

Table 5:
Performance comparison between the baseline VARMAX model and the proposed GRU model applied to the selected 10 companies listed in Table 2, where the bold cells indicate the better result.
The asterisk (*) in front of a model implies that the model was trained on data preprocessed using the proposed feature engineering step discussed in “Feature Engineering: Technical Indicators and Feature Selection”. For all results, the sizes of the time-lag window and the future horizon were set to $w = 5$ and $h = 5$, respectively (i.e., five trading days; see “Temporal Multivariate Processing”).
Company (Sector) Model Evaluation metric
MAE MSE RMSE R2
AlRAJHI (Financials) VARMAX 1.0242 2.0669 1.4377 0.9806
VARMAX* 1.0253 2.07 1.4387 0.9806
GRU 3.0632 203.4262 13.9463 −0.911
GRU* 0.8097 1.3925 1.1484 0.987
SARCO Co. (Energy) VARMAX 1.0733 2.6944 1.6415 0.8957
VARMAX* 1.056 2.6497 1.6278 0.8974
GRU 1.2986 3.7538 1.9169 0.8547
GRU* 0.805 1.65 1.2403 0.9362
TASNEE Co. (Materials) VARMAX 0.4295 0.339 0.5822 0.9616
VARMAX* 0.4285 0.3302 0.5747 0.9626
GRU 0.4164 0.3201 0.5521 0.9638
GRU* 0.3479 0.2345 0.4719 0.973
SPIMACO Co. (Health Care) VARMAX 0.5103 0.5645 0.7513 0.9551
VARMAX* 0.5021 0.5525 0.7433 0.956
GRU 0.4871 0.4847 0.6822 0.9614
GRU* 0.3994 0.3678 0.5923 0.9707
Saudi Electricity Co. (Utilities) VARMAX 0.4314 0.3747 0.6121 0.959
VARMAX* 0.4223 0.3617 0.6014 0.96
GRU 0.3304 0.2419 0.4815 0.973
GRU* 0.3198 0.2262 0.4636 0.975
EMAAR EC Co. (Real Estate) VARMAX 0.3035 0.1868 0.4322 0.9793
VARMAX* 0.2967 0.1851 0.4302 0.9795
GRU 0.3651 0.2177 0.4535 0.9758
GRU* 0.2282 0.1187 0.3304 0.9868
FITAIHI GROUP (Consumer Discretionary) VARMAX 0.1852 0.0713 0.267 0.922
VARMAX* 0.1776 0.0686 0.262 0.925
GRU 0.1836 0.0615 0.2407 0.9329
GRU* 0.1341 0.0429 0.2016 0.953
ADC Co. (Industrials) VARMAX 0.2365 0.1295 0.3599 0.94
VARMAX* 0.2296 0.1231 0.3508 0.943
GRU 0.2478 0.137 0.3606 0.936
GRU* 0.1818 0.0832 0.2805 0.961
GACO Co. (Consumer Staples) VARMAX 0.2562 0.1336 0.3655 0.932
VARMAX* 0.238 0.1232 0.351 0.937
GRU 0.5177 0.3715 0.5774 0.8095
GRU* 0.2488 0.1167 0.3298 0.9401
STC Co. (Communication Services) VARMAX 1.7461 6.2713 2.5042 0.969
VARMAX* 1.7603 6.2301 2.496 0.969
GRU 6.9844 69.0844 8.2641 0.656
GRU* 1.4412 4.6063 2.0916 0.977
DOI: 10.7717/peerj-cs.1205/table-5

As seen in Table 5, all of the reported results are highly successful and effective in producing short-term forecasts. The experiments run on data preprocessed by the proposed feature engineering step (labeled with the * symbol) yield overall better forecasting results, with the best outcomes coming from the GRU model (stressed in bold). We can conclude that the proposed methodology works best when paired with the GRU-based modeling option, which aligns with our initial intuition that RNN-based methods mostly lead to better forecasts than linear methods such as ARIMA and its variants. In addition, Fig. 6 demonstrates the GRU model’s forecasting results (GRU* only) for one company. The top subfigure (A) illustrates the historical daily stock price from the first-ever trading day until December 2016, followed by the next-day predictions on the unseen testing time series, from 01-01-2017 to the end of the company’s time series record (note how the test and forecast estimates overlap due to the high precision of the predictions). The bottom subfigure (B) shows the multistep closing price forecasts for the next five days’ stock prices. We can also observe the model’s high-precision predictions for the short-term forecasts, which provide more insightful knowledge and support better future decision-making. The forecasting illustrations of the GRU and VARMAX models for the remaining nine companies selected in this experiment are shown in Figs. A2 and A3, respectively, in the Appendix.


Figure 6: An illustration of the GRU model’s forecast results for Al Rajhi Bank.

Conclusion

This article introduces an AI-enabled forecasting methodology, consisting of a problem-tailored feature engineering step and a well-architected GRU model design, to yield short-term closing price forecasts of the Saudi stock market, evaluated using a set of regression-based metrics and benchmarked against a baseline model, the VARMAX model. The experimental results based on the large-scale TASI index data are presented and visually demonstrated. The empirical analysis revealed that our proposed forecasting methodology outperforms the most relevant existing studies known to us by a significant margin. Beyond the high-precision results, this advantage stems from the ability to generate highly accurate multi-step forecasts of the Saudi stock market for the next five business days, as opposed to only the next-day forecasts conveyed by existing related studies. In addition, this study demonstrates the power of the gated recurrent units (GRU) method in modeling historical multivariate stock market data. We plan to further investigate extending this work toward other related problems using a fine-grained (hourly) time series dataset with various feature selection strategies.

Supplemental Information

Source code used for the experiments in this work.

DOI: 10.7717/peerj-cs.1205/supp-2

The dataset used in this experiment.

DOI: 10.7717/peerj-cs.1205/supp-3
¹ The selection of h = 5 is because stock trading in Saudi Arabia runs five business days per week.