Alternative prediction methods in the Stock Exchange of Thailand

This paper compares two alternative approaches to stock-index forecasting: traditional statistical methods, represented by the ARMA and Holt-Winters (HW) models, and artificial intelligence (AI) methods, represented by the k-nearest neighbors (KNN) and extreme learning machine (ELM) algorithms. To delimit the scope of the study, the SET index is collected as the main financial variable, comprising 5,472 daily observations from 9 September 1997 to 11 June 2018. Technically, cross-entropy (CE) analysis together with MSE and RMSE calculations is employed to compare the two families of computations. The empirical results indicate that AI prediction can serve as a substitute for traditional estimation, strongly confirming that machine learning (ML) algorithms continue to attract interest and are becoming a powerful tool for modern econometric forecasting.


Introduction
Undeniably, Thailand's economic system has been driven by its financial sector over the three decades since the Asian financial crisis of 1997–1998. The financial market and banking sector in Thailand have recently grown stronger, and the number of skilled investors continues to rise. These investors inevitably need sharp tools to support their predictions. However, the enormous amount of available information makes forecasting elusive, and the well-known shortcomings of the P-value can make predictive results suspect. Moreover, economists and financial forecasters have struggled to produce results for stock selection and risk classification with traditional statistical computations alone. Consequently, developing methods that predict financial-market trends more precisely is crucial in financial analysis.
Interestingly, a novel concept called "machine learning (ML)" uses a type of computational artificial intelligence (AI) that learns when exposed to new data, letting the machine deal with the data by itself. Here, the machine learning methods k-nearest neighbors (KNN) and extreme learning machine (ELM) are employed to overcome restrictions of traditional statistical forecasting. The performance of these methods is compared using several metrics, including mean square error, root mean square error, and accuracy validation. Although ML has been continuously applied to stock-market prediction, the challenging question is how to demonstrate concretely the differences between AI computations and traditional estimation, especially in the financial econometric field. Consequently, this paper is conducted to answer that question. Moreover, given the advancement and availability of forecasting technologies and software, AI and traditional computations can both be run quickly on a single machine. Historically, machine learning algorithms have frequently been used as predictors in the financial sector for several years; see, for instance, [1], [2], [3], [4], and [5]. On the other hand, prediction by traditional statistics, employing the ARMA model, has been popular for decades, especially in financial econometrics; see, for example, [6], [7], [8]. As seen in the literature, it is rare to find these two very different approaches to stock forecasting compared directly: whether one can substitute for the other, which is more sensible for real predictions, or whether both should be used.
Hence, the first purpose of this paper is to compare forecasts from an AI algorithm and a traditional statistical tool in the Stock Exchange of Thailand using both MSE and RMSE based on out-of-sample testing; the second is to apply a cross-checking technique, the "frequentist cross-entropy calculation", to suggest how a collection of econometric tools for financial forecasting might be selected.

Research methodologies
The methodology of the paper is divided into two major parts: data classification and data prediction. The former employs the data stationarity test, ADF unit-root testing, based on Bayesian inference. From Bayes' theorem, we obtain the posterior density π(θ|y), the probability distribution of the parameters θ, as [9]

π(θ|y) ∝ L(y|θ)π(θ). (1)

As shown in Equation (1), π(θ) is the prior density for θ. Bayesian modeling, as mentioned, requires a joint distribution, which is conveniently factored into a prior distribution for the parameters and the complete-data likelihood function expressed in Equation (2),
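Equation (2) is not reproduced in this version of the text; under the usual assumption of conditionally independent observations, the complete-data likelihood takes the standard form

```latex
L(y \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) \tag{2}
```

where f denotes the sampling density of a single observation.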
where y refers to the time-series data of n observations, y = (y1, y2, ..., yn), and θ stands for the estimated parameters. According to [10] and [11], Bayesian statistics evaluates hypotheses regarding multiple parameters by adapting Bayes factor comparisons. Specifically, the ADF test analyzes the null hypothesis that a time series is I(1) against the alternative that it is I(0), assuming that the dynamics in the data have an ARMA structure [12]. The ADF test relies on estimating the test regression presented in Equation (3).
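Equation (3) is likewise missing from this version; the standard form of the ADF test regression, with drift and trend terms, is

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t \tag{3}
```

where the null hypothesis of a unit root corresponds to γ = 0.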
The latter part provides the various computations of the traditional econometric models and AI algorithms. Fundamentally, the analysis of time series with ARMA(p, q) models has wide use in financial data. The ARMA model, including the measurement of errors, can be expressed as the following equation [13].
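The ARMA(p, q) equation itself does not survive in this version of the text; its standard form, with white-noise errors ε_t, is

```latex
y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
```

where the φ_i are the autoregressive coefficients and the θ_j the moving-average coefficients.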
Another traditional statistical method is Holt-Winters (HW) exponential smoothing. This application has been extended to accommodate multi-seasonal patterns [14]. The relationship of its three components can be expressed by the following equations.
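The component equations are missing here; in the standard additive form, with level ℓ_t, trend b_t, and seasonal component s_t of period m, they read

```latex
\begin{aligned}
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t    &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1} \\
s_t    &= \gamma\,(y_t - \ell_t) + (1-\gamma)\, s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + h\, b_t + s_{t-m+h}
\end{aligned}
```

with the last line giving the h-step-ahead forecast.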
It is notable that the three equations require smoothing constants, commonly denoted α, β, and γ, which take values between 0 and 1. The goal of Holt-Winters smoothing of a series is to use the same components for forecasting future values of the series [15].
Turning to machine learning (ML) methods, the analysis of machine learning algorithms is well known as computational learning theory. In this paper, R software is the major application for the computation. Essential approaches and processes in ML include the following:
- Decision tree learning.
- Artificial neural network (ANN) learning algorithms.
- Inductive logic programming (ILP).
- Support vector machines (SVMs).
- Clustering analysis, an unsupervised learning method and a common technique for statistical data analysis.
- Bayesian networks for performing algorithmic inference and learning.
- Representation learning (RL) for uncovering the underlying factors of variation that explain the observed data [16].
- Genetic algorithms (GA).
- Rule-based machine learning, a contrasting model to other machine learners that commonly identify a singular model for prediction [17].
- Feature selection approaches [18].
For the machine learning computations in this paper, k-nearest neighbors (kNN) and the extreme learning machine (ELM) are selected as the AI algorithms for the financial data. The former is a non-parametric method for both classification and regression problems: each observation is compared with its k nearest neighbors in the feature space and, for classification, is assigned to the majority class of those neighbors (for regression, to the average of their target values). The latter, the extreme learning machine (ELM), was originally developed for single-hidden-layer feedforward neural networks (SLFNs) and then extended to "generalized" SLFNs whose hidden nodes need not be neuron-like [19]. In the ELM, h(x) is the hidden-layer output corresponding to the input sample x, and β is the output weight vector between the hidden layer and the output layer [20].
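The kNN regression described above can be sketched in a few lines. The paper's computations were carried out in R; the following Python sketch (function names are illustrative, not the authors' code) only shows the idea of averaging the targets of the k nearest training points:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict y at x_query as the mean target of its k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances to the query
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return float(y_train[nearest].mean())              # average their targets

# toy usage: predict at a query point from three 1-D training points
X = np.array([[0.0], [1.0], [10.0]])
y = np.array([0.0, 1.0, 10.0])
pred = knn_regress(X, y, np.array([0.1]), k=2)  # mean of the two closest targets -> 0.5
```

For classification the mean would be replaced by a majority vote over `y_train[nearest]`.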
To present the learning algorithm for SLFNs, a simple extreme learning machine can be summarized as follows [21]. Algorithm ELM: given a training set, an activation function g(x), and hidden-node number Ñ, follow these steps.
Step 1: randomly assign input weights w_i and biases b_i, i = 1, ..., Ñ.
Step 2: calculate the hidden-layer output matrix H.
Step 3: calculate the output weight β [20], [21]. The selection between AI and traditional statistical predicting models is guided by the mean square error (MSE) and root mean square error (RMSE). Additionally, this section addresses model validation by the frequentist cross-entropy analysis, which helps to select the predicting model more easily. The formula is explained in Equation (12).
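The three ELM steps, together with the MSE/RMSE criteria used for model selection, can be sketched as follows. This is a Python sketch rather than the paper's R implementation; a sigmoid activation and the Moore-Penrose pseudoinverse for Step 3 are assumptions of the sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden=30, seed=0):
    """Steps 1-3 of the ELM algorithm for a single-hidden-layer network."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # Step 1: random input weights w_i
    b = rng.normal(size=n_hidden)                # Step 1: random biases b_i
    H = sigmoid(X @ W + b)                       # Step 2: hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ y                 # Step 3: output weights via pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# toy out-of-sample check with the MSE / RMSE selection criteria
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.4, -0.3, 0.1])               # synthetic linear target
W, b, beta = elm_fit(X[:150], y[:150])
y_hat = elm_predict(X[150:], W, b, beta)
mse = float(np.mean((y[150:] - y_hat) ** 2))
rmse = mse ** 0.5
```

Because only the output weights β are solved for (in closed form), training requires no iterative optimization, which is the main appeal of the ELM.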
where   Y X H , is cross entropy, which can measure the value of relatedness between the random variable X and Y. P(x) is the probability of occurrence of outcome x of variable X, and P(y) is the probability of occurrence of outcome y of variable Y [22].

Descriptive data
Fundamentally, the collected financial series was observed as daily data from 1997 to 2018 (5,742 observations). The data were transformed into log-return rates and confirmed to be normally distributed by Jarque-Bera normality testing. The graphical features and the normality test are shown in Figure 1.

The result of data stationary checking based on Bayesian inference
The ADF unit-root test based on Bayesian statistics is employed to guarantee that spurious results do not occur over the long-run period. The empirical value calculated from the Bayes factor comparison between the non-stationary model (Model 1) and the stationary model (Model 2) is 0.000866, which is less than 0.01. The result is also presented in Table 1.

The forecasting resolutions estimated by traditional econometric models and AI algorithms
Interestingly, the empirical result of the evaluation shows that the AI prediction can substitute for the traditional estimation. In other words, the MSE and RMSE values estimated from the KNN regression, 0.00000719 and 0.00268160 respectively, are close to those of the counterparts (see details in Table 2): the predictive results estimated by the ARMA model are 0.000007205 and 0.002684199, and those of the Holt-Winters model are 0.0000073 and 0.0027049, respectively. However, the NNT model is not suitable for prediction in this case.

The model validation estimated by the cross-entropy (CE) analysis
Another essential part of the paper is the cross-entropy (CE) analysis, which is employed to clarify the selection of predicting models. In the case of traditional model prediction, the results are presented in Table 3.

Conclusions
The complicated issue of whether AI algorithms can replace econometric prediction is successfully clarified in this paper. To answer the question of whether these artificial computations can substitute for traditional statistical applications, comparisons using MSE and RMSE based on out-of-sample testing, together with cross-entropy (CE) analysis, were employed. Experimentally, the financial trends of the Stock Exchange of Thailand (SET) were observed as daily data from 9 September 1997 to 11 June 2018, a data set large enough to be treated as big-data analysis. Empirically, the results clearly show growing interest in using AI algorithms for efficient prediction. The AI calculations can effectively replace their traditional counterparts for linear forecasting; this is supported by the very similar error values estimated by the AI algorithm (KNN) and the ARMA model. Moreover, the CE analysis simultaneously confirmed that the KNN algorithm and the ARMA model are the most efficient choices for predicting the collected data. However, a restriction of the paper is that only univariate series were estimated. A worthwhile future study would employ AI algorithms and traditional models for multi-dimensional prediction in big-data analysis.