Evaluation of generated bootstrap weights in a layer perceptron for Southeast Asian visitors during the COVID-19 outbreak

The COVID-19 pandemic has affected the global business sector, including the tourism industry. Forecasting visitor arrivals from Southeast Asia is vital for managing the economic impact on Malaysia, particularly during this outbreak. The neural network family has provided substantial approaches in tourism and economic forecasting. The layer perceptron is a neural network model that is used to produce accurate forecasts. However, the inherent bias in the perceptron algorithm could lead to underfitting, which eventually degrades forecast accuracy. The motivation of this study is to improve the accuracy of a single-layer perceptron in forecasting Southeast Asian visitor arrivals in Malaysia during COVID-19. In this study, bootstrap weights are generated at the hidden layer to reduce the bias in the output layer. The forecasts of the generated bootstrap weight model are compared with those of a conventional perceptron model in terms of the size of the estimated bias. The statistical results reveal that the generated bootstrap weights in the perceptron provide accurate forecasts of Southeast Asian visitors during COVID-19.


Introduction
In tourism applications, economic impact analysis commonly estimates the income, changes in regional spending and employment associated with tourist facilities, policies, destinations and events [1]. During the COVID-19 pandemic, both supply and demand in the tourism industry were affected by the Malaysian Movement Control Order (MCO), whose first phase began on 18 March 2020. The pandemic has also slowed economic activity and could eventually cause economic collapse if inappropriate management decisions are taken. According to the report in [2], Southeast Asia (ASEAN), the largest source region of visitors to Malaysia, recorded negative growth of -37.3%. The top ten source countries of tourist arrivals to Malaysia in the first quarter of 2020 were Singapore (1,541,591), Indonesia (701,142), China (401,067), Thailand (331,417), India (153,727), Brunei (135,412), South Korea (118,571), Japan (73,154), Australia (72,047) and the Philippines (64,257). The report shows that Singapore was the main source of tourists, ahead of Indonesia and China. However, these figures differ significantly from the 2019 statistics for ASEAN visitors.
IOP Publishing doi: 10.1088/1742-6596/1988/1/012098
The coronavirus pandemic is causing extensive disruption to the worldwide economy [3][4]. Services such as airlines, tour agents, resorts, cruises and restaurants have reduced their activities to a minimum [5]. Furthermore, during the resulting financial crisis, many employees in the tourism industry faced difficulties at work, including reduced working hours and the fear of being dismissed [6]. Therefore, accurate predictions of Southeast Asian visitor arrivals are important to help the government design tourism policy for the new normal. New policies would help meet visitors' demand for popular tourist destinations such as Malaysia.
Predicting visitor arrivals to Malaysia is important for the Malaysian tourism industry. It helps the state government and authorities in decision making, avoids surplus and disorganization of tourism resources, and thus minimizes uncertainty and risk [7]. Artificial Neural Networks (ANNs) have been applied to real-world problems in many research fields, including forecasting in the tourism industry [8][9][10][11]. According to [8], ANN models predicted tourism demand better than multiple regression and ARIMA models. Many researchers agree that nonlinear ANN methods outperform linear methods in modelling economic patterns [12][13]. However, a major problem with ANNs is high variance, which can lead to overfitting in supervised machine learning. According to [14], overfitting occurs when a model learns the training data too well, including its variance and noise, and therefore cannot generalize to unseen data.
High variance in the model can undermine the prediction estimator. Motivated by this, the study adopts a hybrid method to overcome this drawback. To produce accurate and valid model predictions, the study hybridizes an ANN with a nonparametric aggregate bootstrap approach. Problems of poor generalization, noise, variance and bias in ANNs have led to the use of hybridization techniques that improve their accuracy and the validity of their estimates [15]. The bootstrap is a computational procedure that resamples with replacement in order to reduce the uncertainty, i.e., the variance and bias, of a particular model estimator [16]. The bootstrap can be used to construct confidence intervals for predictions from small samples and to estimate the generalization error of an ANN [17]. The study has three objectives: i) to reduce the error in the output layer by generating aggregate bootstrap weights in the hidden layer, thereby overcoming the overfitting problem; ii) to investigate the accuracy and consistency of the proposed hybrid model using Monte Carlo simulation; and iii) to apply the proposed hybrid model to forecast Southeast Asian visitor arrivals for the 12 months of 2021.
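The confidence-interval use of the bootstrap mentioned above can be sketched with Python's standard library. This is a minimal illustration, not the authors' code: it uses the percentile method (one of several bootstrap interval methods) for a simple statistic, the sample mean, and the function name `bootstrap_ci`, the number of replicates and the toy sample values are all assumptions.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate resamples n observations with replacement and recomputes the statistic.
    replicates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical small sample (e.g. monthly arrivals in thousands); not data from the study.
sample = [120, 135, 128, 150, 142, 138, 131, 147]
lower, upper = bootstrap_ci(sample)
print(f"95% percentile bootstrap CI for the mean: [{lower:.1f}, {upper:.1f}]")
```

Because every replicate is built only from the observed values, the interval needs no distributional assumption, which is what makes the bootstrap attractive for small samples.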

Artificial Neural Network
An artificial neural network (ANN) is a computational model based on the structure and function of biological neural systems. One major application of ANN models is prediction [18]. An ANN normally comprises three layers: i) an input layer that receives input from external sources; ii) a hidden layer that produces its output through an activation function; and iii) an output layer. The ANN output is then compared with the desired output; any difference indicates an error, which is reduced by using backpropagation to adjust the weights [15].
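The compare-and-adjust cycle described above can be illustrated with a small three-layer network trained by backpropagation. This is a minimal sketch under stated assumptions, not the network of [15]: one hidden layer of sigmoid units, a linear output unit, squared-error loss and per-sample gradient descent; the toy data, learning rate and helper names (`forward`, `train_epoch`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    """Input -> sigmoid hidden layer -> linear output unit."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(p["W1"], p["b1"])]
    output = p["b2"] + sum(w * h for w, h in zip(p["w2"], hidden))
    return hidden, output

def mse(data, p):
    return sum((forward(x, p)[1] - y) ** 2 for x, y in data) / len(data)

def train_epoch(data, p, lr=0.1):
    """One pass of per-sample backpropagation on the squared error 0.5 * (output - y)**2."""
    for x, y in data:
        hidden, output = forward(x, p)
        err = output - y  # difference between network output and desired output
        for j, h in enumerate(hidden):
            # Error signal sent back to hidden unit j (computed before w2[j] changes).
            delta = err * p["w2"][j] * h * (1.0 - h)
            p["w2"][j] -= lr * err * h
            p["b1"][j] -= lr * delta
            p["W1"][j] = [w - lr * delta * xi for w, xi in zip(p["W1"][j], x)]
        p["b2"] -= lr * err

rng = random.Random(0)
n_in, n_hidden = 2, 3
params = {
    "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
    "b1": [0.0] * n_hidden,
    "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    "b2": 0.0,
}
# Toy target y = 0.5*x1 - 0.3*x2 on a 5 x 5 grid (illustrative data only).
grid = [i / 4 for i in range(5)]
data = [((x1, x2), 0.5 * x1 - 0.3 * x2) for x1 in grid for x2 in grid]

loss_before = mse(data, params)
for _ in range(200):
    train_epoch(data, params)
loss_after = mse(data, params)
print(f"MSE before: {loss_before:.4f}, after 200 epochs: {loss_after:.4f}")
```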

Perceptron
In this study, the layer perceptron is a feed-forward network. Because it is simple to apply and performs well, this type of ANN is widely used in prediction and forecasting [19][20]. The architecture of the perceptron is determined by the number of layers and the number of nodes in each layer [21]. Each mapping from a node in one layer to a node in the next layer is known as a weight, and its value is determined by the network's learning algorithm [22]. Figure 1 shows the mapping, which starts at the input nodes, passes to each node in the hidden layer and ends at the output layer.
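The prediction of the neural network in Figure 1 can be written statistically as follows:
$$y_t = b_2 + f_2\Big[\sum_{j=1}^{h} w_{2,j}\, f_1\Big(b_{1,j} + \sum_{i=1}^{p} w_{1,ij}\, x_{i,t}\Big)\Big] + \varepsilon_t \qquad (1)$$
where $y_t$ is the output of the equation, obtained at the output layer (i.e., $q = 1$); $w$ is the connection weight; $b$ is the bias; $\varepsilon_t$ represents the residual of the neural network model at time $t$; and $p$ and $h$ are the numbers of input and hidden neurons. Meanwhile, $f_1\big(b_1 + \sum w_1 x_t\big)$ and $b_2 + f_2\big[\sum w_2 f_1(\cdot)\big]$ are obtained from the input layer and the hidden layer, respectively. $f_1$ and $f_2$ represent the transfer process, namely the activation function, at the input layer and the hidden layer, respectively. The activation function can be written compactly in terms of the net input $z$ as follows:
$$a_m^{(l)} = f\big(z_m^{(l)}\big), \qquad z_m^{(l)} = b_m^{(l)} + \sum_{j} w_{jm}^{(l)}\, a_j^{(l-1)} \qquad (2)$$
where $m$ is a neuron in layer $l$ and $a^{(0)} = x_t$. Equation (2) can be calculated using the common sigmoid function:
$$f(z) = \frac{1}{1 + e^{-z}} \qquad (3)$$
The logistic function is often used as the hidden-layer activation function; it maps any net input to a value between 0 and 1 that can be read as a probability score.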
Based on equation (1), $w_1$ is obtained from the mapping of the input layer onto the hidden layer, while $w_2$ is obtained from the mapping of the hidden layer onto the output layer; $b_1$ and $b_2$ are the estimated bias values of the hidden layer and the output layer, respectively.
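The deterministic part of equation (1), with the logistic function of equation (3) as $f_1$, can be computed directly. This is a sketch, not the authors' implementation: the text does not state the output activation $f_2$, so the identity is assumed here, the residual $\varepsilon_t$ is omitted, and all parameter values are hypothetical.

```python
import math

def logistic(z):
    """Equation (3): f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def identity(s):
    return s

def predict(x, w1, b1, w2, b2, f1=logistic, f2=identity):
    """Deterministic part of equation (1), residual omitted:
    b2 + f2( sum_j w2[j] * f1(b1[j] + sum_i w1[j][i] * x[i]) ).
    w1 holds one row of input weights per hidden neuron j (f2 = identity is assumed)."""
    hidden = [f1(b1_j + sum(w * xi for w, xi in zip(w1_j, x)))
              for w1_j, b1_j in zip(w1, b1)]
    return b2 + f2(sum(w2_j * h for w2_j, h in zip(w2, hidden)))

# Hypothetical parameters for p = 3 inputs and h = 2 hidden neurons.
x_t = [0.2, 0.4, 0.1]
w1 = [[0.5, -0.3, 0.8], [0.1, 0.2, -0.4]]
b1 = [0.0, 0.1]
w2 = [1.2, -0.7]
b2 = 0.05
print(predict(x_t, w1, b1, w2, b2))
```

Passing `f1` and `f2` as arguments keeps the sketch close to the notation of equation (1): swapping in another activation changes only the call, not the mapping structure.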

Bootstrap Aggregating
Bootstrap aggregating, known as bagging, is an ensemble method introduced by [23]. Its main contribution is to reduce variance and thereby avoid overfitting [24]. Overfitting in an ANN means that the training data are fitted so closely that the network memorizes noise and outliers, which gives the model poor generalization. As described in [25], a bootstrap sample is obtained by resampling with replacement. Let $M$ be a population with an unknown probability distribution function $G$, and let the sample
$$Z = \{z_1, \dots, z_n\}, \qquad z_i = (x_i, y_i) \qquad (4)$$
be independent and identically distributed (IID) draws from $G$ with mean $\mu$ and variance $\sigma^2$, where $x_i$ and $y_i$ represent the input vector and output vector, respectively, for $i = 1, \dots, n$. A bootstrap sample $Z^*$ obtained from equation (4) can be written as
$$Z^* = \{z_1^*, \dots, z_n^*\}, \qquad z_i^* \sim \hat{G}$$
where $n$ is the sample size and each $z_i^*$ is drawn from the empirical distribution function $\hat{G}$, which places a mass of $1/n$ on each of $z_1, \dots, z_n$; repeating the draw gives a dataset of bootstrap samples [25][26].
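Drawing from $\hat{G}$ amounts to picking $n$ of the observed pairs with replacement, and bagging averages a model's predictions over many such samples. The sketch below shows both steps; to keep it short, a least-squares straight line stands in for the base model (an assumption, since in this study the base model is the perceptron), and the helper names and toy data are hypothetical.

```python
import random
import statistics

def bootstrap_sample(pairs, rng):
    """Draw n pairs with replacement: sampling from the empirical distribution
    G-hat, which puts mass 1/n on each observed z_i = (x_i, y_i)."""
    return rng.choices(pairs, k=len(pairs))

def fit_line(pairs):
    """Least-squares line y = a + b*x, a stand-in for the base model."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx if sxx else 0.0
    return my - b * mx, b

def bagged_predict(pairs, x_new, n_boot=200, seed=0):
    """Bootstrap aggregating: fit the base model on n_boot bootstrap samples
    and average the n_boot predictions at x_new."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_boot):
        a, b = fit_line(bootstrap_sample(pairs, rng))
        predictions.append(a + b * x_new)
    return statistics.fmean(predictions)

# Illustrative noisy data (not from the study).
noise = random.Random(7)
pairs = [(x, 2.0 + 3.0 * x + noise.gauss(0.0, 1.0)) for x in range(10)]
print(bagged_predict(pairs, 4.0))
```

Averaging over resamples is what lowers the variance: predictions that depend heavily on a few noisy points are diluted by replicates in which those points are absent.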

Bootstrap the Perceptron Layer
This research uses monthly data on Southeast Asian visitor arrivals to Malaysia from January 2000 to September 2020, and the analysis was carried out in the R language. The data are split into 70% training data and 30% test data: the data from 2000 to 2014 are used to build the neural network model, and the remaining years are used as test data to validate it. The research comprises three stages, as follows:
• ). This sample matrix is used to calculate the row means, $m_q$, which give the bootstrap weights $w_2^*$; these replace the original weights at the first hidden layer.
• Stage 3: Map the new weights $w_2^*$ and the bias $b_2$ to all neuron nodes of the hidden layer by applying the activation function of equation (2). The result of the mapping is
$$\hat{y}_t^* = b_2 + f_2\Big[\sum_{j=1}^{h} w_{2,j}^*\, f_1\Big(b_{1,j} + \sum_{i=1}^{p} w_{1,ij}\, x_{i,t}\Big)\Big]$$
where $\hat{y}_t^*$ is the model prediction of the bootstrapped perceptron layer.
The performance of the proposed hybrid model, which embeds aggregate bootstrapping in the ANN, is evaluated in terms of accuracy. In this study, accuracy is measured by the mean squared error (MSE) and the mean absolute error (MAE):
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\big(y_t - \hat{y}_t^*\big)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\big|y_t - \hat{y}_t^*\big|$$
where $\hat{y}_t^*$ is the prediction of the bootstrap ANN model and $y_t$ is the actual output, $t = 1, \dots, n$. In addition, for the simulation study, the bias of the estimate is calculated as
$$\mathrm{Bias} = \bar{\hat{y}}^* - k$$
where $\bar{\hat{y}}^*$ is the mean of the predictions $\hat{y}^*$ and $k$ is the actual value.
[27][28]. The time plot shows a consistent increase, with the highest peak in 2014, when visitor numbers rose dramatically because of a special event: Visit Malaysia Year 2014 welcomed visitors from all over the world, and the ASEAN countries remained the largest source of visitor arrivals to Malaysia. Many activities were organised throughout the year, and the investment paid off for the Malaysian economy. In 2020, Malaysia announced the start of the MCO because the COVID-19 pandemic had hit many countries, and this also affected ASEAN visitor arrivals, which declined rapidly in 2020. To handle such unexpected losses, this study aims to help the Malaysian government prepare for the uncertainty ahead by building an accurate statistical model for predicting visitor arrivals and by supporting well-organized decision making during difficult times that affect the economic time series. Table 1 shows the descriptive statistics of the ASEAN visitor volume data. The visitor arrival series is skewed to the left.
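Because Stage 1 and the start of Stage 2 are not described here, the following is only one hypothetical reading of Stages 2 and 3, not the authors' procedure. Each bootstrap replicate refits just the hidden-to-output weights $w_2$ on a resample of the training rows while the hidden layer stays fixed; the replicates fill the columns of a weight matrix whose row means give $w_2^*$, which is then mapped with the unchanged $b_2$. Note that this refit-and-average scheme is bagging of the training data; literally resampling the weight values themselves would be a different procedure. The identity output activation, all parameter values, the toy data and the helper names (`refit_w2`, `bootstrap_weights`) are assumptions. The sketch also computes the MSE, MAE and bias measures defined above.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, w1, b1):
    """Hidden-layer outputs f1(b1_j + sum_i w1_ji * x_i), kept fixed in this sketch."""
    return [logistic(b + sum(w * xi for w, xi in zip(row, x))) for row, b in zip(w1, b1)]

def output(h, w2, b2):
    """Output layer with f2 assumed to be the identity: b2 + sum_j w2_j * h_j."""
    return b2 + sum(w * hj for w, hj in zip(w2, h))

def refit_w2(rows, w2_start, b2, lr=0.2, epochs=200):
    """Batch gradient descent on the MSE over the hidden-to-output weights only."""
    w2 = list(w2_start)
    for _ in range(epochs):
        grad = [0.0] * len(w2)
        for h, y in rows:
            err = output(h, w2, b2) - y
            for j, hj in enumerate(h):
                grad[j] += 2.0 * err * hj / len(rows)
        w2 = [w - lr * g for w, g in zip(w2, grad)]
    return w2

def bootstrap_weights(rows, w2, b2, n_boot=30, seed=0):
    """Stage 2 (hypothetical reading): column r of the matrix holds w2 refitted on
    bootstrap resample r of the training rows; the row means give w2*."""
    rng = random.Random(seed)
    matrix = [[] for _ in w2]  # one row per hidden-to-output weight
    for _ in range(n_boot):
        refitted = refit_w2(rng.choices(rows, k=len(rows)), w2, b2)
        for q, w in enumerate(refitted):
            matrix[q].append(w)
    return [sum(row) / len(row) for row in matrix]

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def bias(y_hat, k):
    """Mean prediction minus the actual value k."""
    return sum(y_hat) / len(y_hat) - k

# Hypothetical "trained" parameters and toy data; none of these values come from the study.
rng = random.Random(42)
w1 = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4]]
b1 = [0.1, -0.2, 0.05]
w2, b2 = [0.7, -0.4, 0.9], 0.1
xs = [[rng.random(), rng.random()] for _ in range(40)]
ys = [0.5 * a - 0.2 * b + rng.gauss(0.0, 0.05) for a, b in xs]
rows = [(hidden_layer(x, w1, b1), y) for x, y in zip(xs, ys)]
train, test = rows[:28], rows[28:]  # first 70% for training, last 30% for testing

w2_star = bootstrap_weights(train, w2, b2)        # Stage 2
y_test = [y for _, y in test]
ann = [output(h, w2, b2) for h, _ in test]        # original weights
bann = [output(h, w2_star, b2) for h, _ in test]  # Stage 3: w2* with the same b2
print("ANN : MSE %.4f  MAE %.4f" % (mse(y_test, ann), mae(y_test, ann)))
print("BANN: MSE %.4f  MAE %.4f" % (mse(y_test, bann), mae(y_test, bann)))
```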
When the individual distribution identification test was conducted, the ASEAN visitor series was found to approximate a Gamma distribution, with an Anderson-Darling statistic of 2.031, a Kolmogorov-Smirnov statistic of 0.03721 and a Chi-Squared statistic of 36.592. The shape and scale parameters of the approximating Gamma distribution are 0.257 and 3.388E+6, respectively. The dataset of visitors from ASEAN countries from 2000 to 2020 was divided into two parts before the learning process of the neural network. The training data cover 2000 to 2014 (15 years of monthly data points), while the test data used to assess prediction accuracy cover 2015 to 2020. To achieve accurate forecasts, the best-fitting neural network shown in Figure 4 has an N(15-1-1) architecture, consisting of one input layer with 15 neurons, one hidden layer and one output. This model is complex because its many neurons make the weight-update procedure complicated. The black lines show the connections and weight values in every layer of the perceptron, while the blue lines show the bias of every neuron in all layers except the input layer. To overcome this problem, the backpropagation algorithm was used to update the weights and minimize the error of the model output.
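The text does not say which estimator produced the Gamma shape and scale values. As an illustration only, a method-of-moments fit can be computed with the standard library: since a Gamma distribution has mean $k\theta$ and variance $k\theta^2$, the shape is $k = \text{mean}^2/\text{variance}$ and the scale is $\theta = \text{variance}/\text{mean}$. Distribution-fitting software may use maximum likelihood instead, so this sketch is not expected to reproduce 0.257 and 3.388E+6; the function name and toy data are hypothetical.

```python
import statistics

def gamma_method_of_moments(data):
    """Gamma fit by matching moments: mean = shape * scale, variance = shape * scale**2,
    so shape = mean**2 / variance and scale = variance / mean."""
    mean = statistics.fmean(data)
    var = statistics.pvariance(data)
    return mean ** 2 / var, var / mean

# Toy data for illustration, not the visitor series.
shape, scale = gamma_method_of_moments([1, 2, 3, 4, 5])
print(shape, scale)
```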
Overfitting is detected in this perceptron neural network. Figure 5 plots the training loss against the test loss to diagnose the performance of the model; reviewing this plot can suggest configurations that give better performance. The model shows high variance on the training data, meaning that it cannot generalize to new data because the test data may differ from the training data. Overfitting due to high variance eventually degrades the predictions for the test sample [29]. In addition, the error on the training data is smaller than the error on the test data, which indicates that the model is likely overfitting. Aggregate bootstrapping can therefore be applied in this study as a further method to improve the model and reduce the effect of overfitting.
Figure 12 shows the test data of the neural network from 2015 to 2020, labelled x2015, x2016, x2017, x2018, x2019 and x2020 to distinguish them from the training data. This figure confirms that the neural network model suffers from overfitting, so the study proposes the ensemble technique of bootstrap aggregating, or bagging, to overcome the problem by resampling with replacement the weights from the hidden layer to the output layer. In Figure 13, bweights1, bweights2, bweights3, bweights4, bweights5, bweights6, bweights7, bweights8, bweights9, bweights10, bweights11 and bweights12 denote the bootstrap weights that serve as the improved new weights, and boutput denotes the updated output of the new neural network.
In Table 3, the errors of the ANN model and the hybrid BANN model differ by almost 0.003. This means that the mean squared difference between the actual data and the data predicted by the hybrid model indicates a small error in the model.
Thus, in terms of accuracy, the hybrid model performs better in predicting ASEAN visitor arrivals.
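The training-versus-test comparison used above to diagnose overfitting can be expressed as a simple check on the two errors. This is a sketch only: the function name `diagnose`, the ratio threshold of 2.0 and the toy numbers are hypothetical choices, not values from the study, which reads the symptom off the loss plot in Figure 5.

```python
def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def diagnose(train_actual, train_pred, test_actual, test_pred, ratio_threshold=2.0):
    """Compare training and test MSE. A test error well above the training error is the
    overfitting symptom read off a training-versus-test loss plot.
    ratio_threshold is an arbitrary illustrative cut-off, not a value from the study."""
    train_mse = mse(train_actual, train_pred)
    test_mse = mse(test_actual, test_pred)
    ratio = test_mse / train_mse if train_mse > 0 else float("inf")
    return train_mse, test_mse, ratio > ratio_threshold

# Illustrative numbers only.
train_mse, test_mse, overfit = diagnose([1.0, 2.0, 3.0], [1.1, 1.9, 3.0],
                                        [4.0, 5.0], [4.5, 4.4])
print(f"train MSE {train_mse:.4f}, test MSE {test_mse:.4f}, likely overfitting: {overfit}")
```

The same `mse` helper applied to the test predictions of two models (for example ANN and BANN) gives the kind of error difference reported in Table 3.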

Conclusion
This paper has proposed a process for modelling and predicting Southeast Asian (ASEAN) visitor arrivals to Malaysia during the COVID-19 outbreak using a hybrid model that bootstraps the hidden layer of a neural network. The data used to build the hybrid model were acquired from the Ministry of Tourism Malaysia. The study focuses on aggregate bootstrapping of the hidden layer because of the overfitting in the neural network model. Overfitting is recognized as a major problem in ANNs: the model learns the noise and details of the training data to an extent that degrades its performance on the test data. Resampling the weights in the hidden layer decreases the variation and produces an accurate prediction model. The results show that the hybrid model has a small estimated variation, indicating that bootstrapping the hidden layer mitigates overfitting in the neural network model. In addition, the hybrid model predicts the fluctuating trend of ASEAN visitor arrivals in 2021 with better accuracy.