Global Solar Radiation Forecasting using Artificial Neural Network and Support Vector Machine

Global solar radiation (GSoR) forecasting involves predicting future energy from the sun based on past and present data. The literature reveals that not all meteorological stations record solar radiation, that some equipment is faulty, and that measuring equipment is not available in every location due to its high cost. Hence the need to predict and forecast GSoR using predictors such as land surface temperature (LST). Satellite data, when used to complement ground-based stations, have yielded good results. Different artificial intelligence (AI) methods, such as Support Vector Machine (SVM) and Artificial Neural Network (ANN), present different forecasting performances. Motivated by contradictions in the existing literature on the relative performance of ANN and SVM in GSoR forecasting, the two techniques were compared based on several statistical tests. Experimental results show that ANN outperformed SVM by 2.9864% in accuracy, making it superior in the forecast of GSoR.


Introduction
Global solar radiation (GSoR) forecasting is the process of predicting future energy from the sun based on past and present data. The literature reveals that not all meteorological stations record solar radiation and that some equipment is faulty, thereby giving inaccurate measurements. Another challenge is that measuring equipment is not available in every location due to its high cost. Hence the need to predict and forecast GSoR using satellite-derived predictors such as land surface temperature (LST). As with other time series, GSoR forecasting helps to mitigate risk in the decisions made by government policymakers, energy experts, and others when planning to regulate power usage, optimise energy balance and environmental effects, and/or increase the number of photovoltaic (PV) plants [1].
Artificial intelligence (AI) techniques are preferred over statistical methods in forecasting GSoR due to their ability to extract complex nonlinear relationships in data. Common AI techniques in use include support vector machine (SVM), extreme gradient boosting (XGBoost), artificial neural network (ANN), random forest, and gradient boosted regression trees (GBRT) [1][2][3][4][5][6][7][8][9][10], among others. Despite the use of these AI techniques in forecasting GSoR, the reported literature is contradictory on which of the two popular techniques, ANN and SVM, is superior [2][11][12]. The result reported by Quej et al. [13] disagrees with that of Hamilton et al. [2]. Motivated by this contradiction, this study compares ANN and SVM in forecasting GSoR.

Study Sites
The study area comprises three locations in Queensland, Australia (Table 1). Australia was selected because of its solar potential, being characterised by high insolation, low cloud coverage, and low rainfall. The three study sites are: St. George, St. George Post Office, and St. George Airport. The model is trained on data from the first two sites and used to forecast for the third (Table 1).
Table 1. Characteristics of study sites with monthly average satellite-derived land-surface temperature (LST) and solar radiation (GSoR) for training (2008-2018) and forecasting (2019).
Land surface temperature (LST) data were obtained from a satellite built by the National Aeronautics and Space Administration (NASA), and global solar radiation (GSoR) data from Scientific Information for Land Owners (SILO), for twelve years: eleven years for model development and one year for cross-validation. The primary predictor is the LST, while the target variable is GSoR. Other accompanying site-related features were the altitude, latitude, longitude, and month (representing the solar periodicity).

Data pre-processing and normalisation
In this study, the data were checked for records containing missing or zero values. These values were replaced with the mean of the non-zero values as a reasonable substitute, because deleting such records is not recommended. Then, min-max normalisation was applied to scale the data to the range 0 to 1. Scaling ensures that variables (attributes) with large values do not dominate smaller ones and bias the result.
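The two pre-processing steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names and the sample values are hypothetical, and missing records are assumed to be represented as `None`.

```python
from statistics import mean


def impute_with_nonzero_mean(values):
    """Replace missing (None) and zero entries with the mean of the valid non-zero entries."""
    valid = [v for v in values if v is not None and v != 0]
    if not valid:
        raise ValueError("no non-zero values available to compute a fill value")
    fill = mean(valid)
    return [fill if (v is None or v == 0) else v for v in values]


def min_max_scale(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up LST-like values for illustration only (not data from the study).
raw = [30.0, None, 20.0, 0.0, 40.0]
filled = impute_with_nonzero_mean(raw)
scaled = min_max_scale(filled)
```

In practice the minimum and maximum would normally be taken from the training period only and then reused to scale the forecasting period, so that no information from 2019 leaks into training; the text does not say which convention was followed.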

Evaluation metrics
Although MAPE and accuracy are commonly regarded as the most informative performance metrics in forecasting, this study also included other indicators such as MSE, R², and RMSE. The R², RMSE, and MAPE are used at the training stage to determine the best-performing configuration. At the final forecast stage, the RMSE, MAPE, and accuracy are used to compare the models and determine the best one.
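For reference, the error metrics named above can be computed as in the sketch below. The text does not give the formulas used, so the standard textbook definitions are assumed here (in particular, R² is taken as the coefficient of determination, and MAPE is expressed in percent); the function names and sample numbers are illustrative only.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (not data from the study).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
scores = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```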

Artificial Neural Network
The ANN was configured with LST obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA satellite as the main predictor, together with study-site parameters (longitude, latitude, altitude, and solar period in months), as the model's regression inputs to forecast the target variable, GSoR (Figure 1). To obtain a number of hidden-layer neurons that gives a competent network, a rule of thumb was applied. It suggests that, to prevent over-fitting, the number of neurons in the hidden layer (Nh) be kept below the value:

$$N_h = \frac{N_s}{\alpha \, (N_i + N_o)} \qquad (1)$$

where:
Ni is the number of input neurons; No is the number of output neurons; Ns is the number of samples in the training data set; α is an arbitrary scaling factor, usually 2-10. Here, Ni = 5, No = 1, Ns = 264, and α = 2 (the most common value). Hence, Nh is 22 (using Equation 1).
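As a quick check of the arithmetic, the rule of thumb can be evaluated directly. The sketch below reads the rule as Nh = Ns / (α (Ni + No)), which is the form that reproduces the stated result of 22 from the stated inputs; the function name is hypothetical.

```python
def max_hidden_neurons(n_samples, n_inputs, n_outputs, alpha=2):
    """Upper bound on hidden neurons: Ns / (alpha * (Ni + No))."""
    return n_samples / (alpha * (n_inputs + n_outputs))


# Values stated in the text: Ns = 264, Ni = 5, No = 1, alpha = 2.
nh = max_hidden_neurons(n_samples=264, n_inputs=5, n_outputs=1, alpha=2)

# Hidden-layer sizes to try, from 2 up to the bound (inclusive).
candidate_sizes = list(range(2, int(nh) + 1))
```

With a larger scaling factor the bound shrinks quickly (for example, α = 10 gives 4.4 with the same inputs), so the choice of α largely determines how many configurations need to be trialled.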

Support Vector Machine
The same regressors were used to configure a regression version of the support vector machine (SVM), which uses a linear model to create a hyperplane that implements nonlinear class boundaries in a high-dimensional feature space through a nonlinear mapping of the input vectors (Figure 2). SVM can be applied to regression problems by introducing an alternative loss function that includes a distance measure [18]. The support vector regression (SVR) model, which is the regression form of SVM, was set up with a cross-validation value of 30. The RBF kernel was used, being common in the literature. The output provides the total MSE and R². To get the best out of the experiment, the kernel was first configured with the default parameters C, γ, and ε set to 1, 0.2, and 0.1, respectively. Subsequently, a grid search tuning technique was applied to obtain optimal values for cost and gamma (C, γ).
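To make the two ingredients of the SVR setup concrete, the sketch below implements the RBF kernel and a regression loss in plain Python. Two assumptions are made that the text does not state explicitly: the alternative loss function is taken to be the ε-insensitive loss commonly used in SVR, and the three default parameters are read as C = 1, γ = 0.2, and ε = 0.1. The function names are illustrative, and no model is trained here.

```python
import math


def rbf_kernel(x, z, gamma=0.2):
    """RBF kernel: exp(-gamma * ||x - z||^2). Equals 1 when x == z."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Two made-up, already-normalised feature vectors (five inputs each).
x1 = [0.2, 0.5, 0.1, 0.9, 0.4]
x2 = [0.3, 0.4, 0.1, 0.7, 0.6]
similarity = rbf_kernel(x1, x2)

# A prediction within 0.1 of the target incurs no loss; a larger miss does.
small_miss = epsilon_insensitive_loss(0.50, 0.55)
large_miss = epsilon_insensitive_loss(0.50, 0.80)
```

A larger γ makes the kernel decay faster with distance, so each training point influences a smaller neighbourhood, while C controls how heavily errors outside the ε tube are penalised. This is why a grid search over (C, γ) is a natural tuning step.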

Artificial Neural Network Implementation
The maximum number of neurons for the hidden layer (Nh) was found to be 22. Therefore, trials were made with the number of hidden neurons ranging from 2 to 22. Table 2 shows the two models of ANN using Tanh and logistic activation functions. In modelling, forecast performance is the most important criterion, and priority is given to the lowest value of RMSE or MAPE. The closer the RMSE is to zero, the better the model; the closer it is to 1, the worse. The RMSE of M22 (0.0883) was accepted as the lowest, supported by its MAPE (0.1431). Consequently, Figure 5 presents a one-year (2019) forecast using M22 (Table 2).

Support Vector Machine Implementation
The grid search tuning yielded the best-performing parameters for cost and gamma as 1.04 and 0.15, respectively. Table 3 shows the SVM forecasting performance using the grid-search-tuned parameters. The tuned parameter values were used to train the SVM (Figure 4). Furthermore, GSoR was forecasted for one year (2019) using this tuned RBF-SVM model. The prediction yielded an R² of 0.9112 at an RMSE of 0.1014 (Table 3, Figure 5).

Comparison between ANN and SVM
The summary of the two techniques is presented in Table 4, which puts the best models' values together using the R², RMSE, MAPE, and accuracy (in forecast). To compare the forecasted and actual monthly GSoR over the cross-validation period (2019), Figures 5 and 6 plot the models' forecasts and errors, respectively. Note that the errors were calculated by taking the absolute value of the difference between the actual and forecasted GSoR (i.e., |GSoRA - GSoRF|) for each month. Notably, the SVM model recorded the highest error magnitude (MAPE = 9.0676 MJ m⁻² month⁻¹), which led to its lower forecast accuracy. A high MAPE produces a low accuracy and vice versa. The results in Table 4, Figure 5, and Figure 6 present ANN as superior to SVM, having recorded a higher forecast accuracy of 93.9188% at a lower MAPE of 6.0812 MJ m⁻² month⁻¹.
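The inverse relationship between MAPE and accuracy can be checked against the reported figures. The sketch below assumes that accuracy is defined as 100 minus the MAPE expressed in percent; this definition is not stated in the text, but it is consistent with every reported pair of values. The function name is hypothetical.

```python
def accuracy_from_mape(mape_percent):
    """Forecast accuracy in percent, assuming accuracy = 100 - MAPE."""
    return 100.0 - mape_percent


# MAPE values reported in the text for the 2019 forecast.
ann_mape = 6.0812
svm_mape = 9.0676

ann_accuracy = round(accuracy_from_mape(ann_mape), 4)
svm_accuracy = round(accuracy_from_mape(svm_mape), 4)

# Accuracy gap between the two models, in percentage points.
accuracy_gap = round(ann_accuracy - svm_accuracy, 4)
```

Under this assumption the computed values match the reported accuracies of 93.9188% for ANN and 90.9324% for SVM, and the gap matches the reported 2.9864.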

Conclusions
Two separate tasks were carried out and discussed. The first was the extraction of data from MODIS and SILO. The second was the implementation and comparison of SVM and ANN models to forecast GSoR. The models are global in the sense that a single model is fitted across the training locations rather than one per location. The ANN forecast gave an R² of 0.9024, a MAPE of 6.0812 MJ m⁻² month⁻¹, and an accuracy of 93.9188%. The forecast from the grid-search-tuned SVM, on the other hand, gave an R² of 0.9112, a MAPE of 9.0676 MJ m⁻² month⁻¹, and an accuracy of 90.9324%. In conclusion, ANN outperformed SVM by 2.9864% in accuracy, making it superior in the forecast of GSoR.