Chaotic analysis and short-term prediction of ozone pollution in Malaysian urban area

This study focuses on the analysis and prediction of hourly ozone (O3) pollution in Shah Alam, an urban area in Malaysia, using a chaotic approach. The approach begins by detecting chaotic behavior in the O3 time series using the phase space plot and the Cao method. The local mean approximation method is then used for prediction. The O3 pollution observed at Shah Alam is found to be chaotic. Because of this chaotic behavior, only short-term prediction is possible; therefore, one-hour-ahead prediction is performed with the local mean approximation method. The correlation coefficient between the observed and predicted time series is close to one. This result shows in particular that the local mean approximation method can be used to predict O3 pollution in an urban area and, more generally, that the chaotic approach is useful for analyzing and predicting O3 pollution time series.


Introduction
Ozone (O3) is an air pollutant that endangers health [1,2]. Developing prediction models for O3 pollution is therefore important. Mathematically, the behavior of a time series can be classified as deterministic or random. A deterministic time series is predictable, while a random time series is not. Chaotic behavior lies between deterministic and random behavior [3]. A chaotic time series is predictable; however, because of its sensitive dependence on initial conditions, only short-term prediction is possible [4]. Previous studies have used various approaches to test whether an O3 time series is chaotic. Using the correlation dimension, the correlation integral, entropy and the Lyapunov exponent, studies [5][6][7] found that their observed O3 time series were chaotic. More recently, using Lyapunov exponents, the Hausdorff dimension and the phase space plot, [8] also showed that the behavior of an O3 time series is chaotic. The phase space plot and the Cao method [9] are able to classify the behavior of a time series. However, these methods have rarely been applied to O3 time series, although both have proven effective for time series such as suspended sediment concentration, traffic flow and earthquakes (see [10][11][12]). Therefore, this study applies both methods to the chosen O3 pollution time series.
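Why sensitive dependence on initial conditions limits a chaotic series to short-term prediction can be illustrated with the logistic map x_{n+1} = 4x_n(1 - x_n), a textbook chaotic system used here purely as an illustration; it is not a model of O3 and is not part of this study. The sketch below, with a hypothetical helper `logistic_orbit`, follows two orbits that start 10^-10 apart.

```python
def logistic_orbit(x0, steps=50, r=4.0):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n); r = 4 is a chaotic regime."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)        # almost identical initial condition
gaps = [abs(u - v) for u, v in zip(a, b)]
for n in (0, 10, 20, 30, 40):
    print(n, f"{gaps[n]:.1e}")         # separation between the two orbits at step n
```

For this map the separation roughly doubles at each step, so an initial error of 10^-10 makes the two orbits unrelated after a few dozen iterations: the dynamics are fully deterministic, yet a small error in the initial state confines useful prediction to a short horizon.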
Current research in Malaysia predicts O3 time series using neural networks and multiple linear regression (see [13][14][15]). These models involve meteorological factors such as water temperature, humidity, solar radiation and wind speed, as well as gaseous factors such as the precursor gases methane, carbon monoxide (CO) and nitrogen oxides (NOx). However, when information on these factors is insufficient, an alternative prediction method is needed. This study therefore uses the local approximation method, a method based on the chaotic approach. Its advantage is that prediction requires only the O3 time series itself, without data on other factors. The local approximation method was used by [16] and [17] to predict O3 time series in metropolitan areas, and both studies obtained very satisfactory results. Therefore, the prediction models of the O3 time series in this study are also built using the local approximation method. There are several variants of the local approximation method; the simplest and most commonly used is the local mean approximation method, which is applied in this study. The contributions of this study are: i) applying the phase space plot and the Cao method to detect chaotic behavior in the O3 time series, and ii) adapting the local mean approximation method to predict the observed time series. A pilot study on the O3 time series at a Malaysian background station was conducted successfully by [18]. The present study applies the same approach but focuses on predicting the O3 pollution time series in a Malaysian urban area.

Time Series Data
The observed secondary hourly data from 1st January to 31st March 2014 were obtained from the Department of Environment Malaysia. The O3 concentrations are recorded in ppb (parts per billion). The monitoring station is located at a government primary school, SK TTDI Jaya, in the Shah Alam area. Shah Alam is an urbanized area in the state of Selangor, Malaysia, with an area of approximately 55.2 km². The city is filled with residential and commercial centers. The main plant of the Malaysian car manufacturer Proton is located in the industrial suburb of Shah Alam; the city began to grow after the Proton factory was set up, which marked its beginning as an industrial city. The development of factories, a network of major highways, and commuter trains, buses and taxis contribute considerably to air pollution in the area. It is therefore important to be able to forecast the air quality in this area.

Chaotic Approach
The O3 time series is recorded in the form

$$X = \{x_1, x_2, \ldots, x_N\}, \qquad (1)$$

where $x_t$ is the O3 concentration at the $t$-th hour and $N$ is the total number of hours of observation. The data from 1st January to 17th March 2014 (1824 hours) were used as the training set $X_{\text{train}}$, and the last two weeks (336 hours) were used as the test set $X_{\text{test}}$. Thus, the total number of data used is 2160. The time series in equation (1) is reconstructed into the $m$-dimensional phase space

$$Y_t = \left(x_t, x_{t+\tau}, x_{t+2\tau}, \ldots, x_{t+(m-1)\tau}\right), \qquad t = 1, 2, \ldots, N-(m-1)\tau. \qquad (2)$$

From equation (2), the delay time $\tau$ and the embedding dimension $m$ must be determined. As the data are observed hourly, the chosen value of $\tau$ is one. To calculate $m$, the Cao method is used; [9] showed that it has several advantages: it contains no subjective parameters other than the delay time, and it does not depend strongly on the number of observed data. Further details on computing $m$ with the Cao method can be found in [9].
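The phase space reconstruction of equation (2) and the Cao method's E1(d) and E2(d) statistics can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: the function names `delay_embed` and `cao_E1_E2` are hypothetical, rows are assumed to be ordered forward in time as (x_t, x_{t+τ}, ..., x_{t+(m-1)τ}), and the nearest-neighbour search uses the maximum norm of Cao's original formulation with a brute-force distance matrix that is adequate for a series of roughly 2000 points.

```python
import numpy as np


def delay_embed(x, m, tau=1):
    """Delay-coordinate matrix: row t is (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for this m and tau")
    return np.column_stack([x[j * tau : j * tau + n_vec] for j in range(m)])


def _cao_E_and_Estar(x, d, tau):
    """Cao's E(d) and E*(d), using the maximum norm as in Cao's original paper."""
    n_pts = len(x) - d * tau              # vectors that can be extended to dimension d + 1
    Yd = delay_embed(x, d, tau)[:n_pts]
    Yd1 = delay_embed(x, d + 1, tau)      # already has exactly n_pts rows
    # Pairwise maximum-norm distances in dimension d, built one coordinate at a time.
    dist = np.zeros((n_pts, n_pts))
    for j in range(d):
        col = Yd[:, j]
        dist = np.maximum(dist, np.abs(col[:, None] - col[None, :]))
    dist[dist == 0] = np.inf              # skip the point itself and exact duplicates
    nn = np.argmin(dist, axis=1)          # index of the nearest neighbour n(i, d)
    idx = np.arange(n_pts)
    # a(i, d): distance in dimension d + 1 divided by distance in dimension d.
    a = np.max(np.abs(Yd1 - Yd1[nn]), axis=1) / dist[idx, nn]
    e_star = np.mean(np.abs(x[idx + d * tau] - x[nn + d * tau]))
    return a.mean(), e_star


def cao_E1_E2(x, max_d=10, tau=1):
    """Return E1(d) = E(d+1)/E(d) and E2(d) = E*(d+1)/E*(d) for d = 1, ..., max_d."""
    x = np.asarray(x, dtype=float)
    vals = np.array([_cao_E_and_Estar(x, d, tau) for d in range(1, max_d + 2)])
    E, E_star = vals[:, 0], vals[:, 1]
    return E[1:] / E[:-1], E_star[1:] / E_star[:-1]


# Usage on synthetic white noise (a stand-in, not the O3 data).
rng = np.random.default_rng(0)
E1, E2 = cao_E1_E2(rng.uniform(size=800), max_d=6)
print(np.round(E1, 2))
print(np.round(E2, 2))
```

For a deterministic series, E1(d) stops changing once d is large enough, and the minimum embedding dimension is read off at that point; for purely random data such as the noise above, E2(d) stays close to 1 for every d, which is how the method separates deterministic from random series.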

Prediction Models
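Prediction through the chaotic approach is based on the relation

$$Y_{t+1} = f(Y_t), \qquad (3)$$

where $f$ maps each reconstructed vector of equation (2) to the next one. In the local mean approximation method, $f$ is approximated locally: the nearest neighbours $Y_{t_1}, \ldots, Y_{t_k}$ of the current vector $Y_t$ are found in the reconstructed phase space, and the prediction of $Y_{t+1}$ is the mean of their successors,

$$\hat{Y}_{t+1} = \frac{1}{k}\sum_{i=1}^{k} Y_{t_i+1}.$$

The performance of the prediction model is measured by the correlation coefficient r.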
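The local mean approximation step can be sketched as a one-step-ahead forecaster. This is a minimal illustration under stated assumptions: the helper names `delay_embed` and `local_mean_predict` are hypothetical, the number of neighbours `k` and the use of the Euclidean distance are illustrative choices that the text does not specify, and only the last coordinate of the averaged successor vector (the new hourly value) is returned.

```python
import numpy as np


def delay_embed(x, m, tau=1):
    """Row t is the reconstructed vector (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    n_vec = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n_vec] for j in range(m)])


def local_mean_predict(history, m, tau=1, k=5):
    """One-step-ahead forecast of the next value by local mean approximation.

    k (number of neighbours) and the Euclidean distance are illustrative choices.
    """
    x = np.asarray(history, dtype=float)
    Y = delay_embed(x, m, tau)
    query = Y[-1]                 # current state Y_t; its successor is unknown
    candidates = Y[:-1]           # earlier states whose successors are observed
    dists = np.linalg.norm(candidates - query, axis=1)
    nbrs = np.argsort(dists)[: min(k, len(candidates))]
    # The last coordinate of Y_{s+1} is x_{s+(m-1)tau+1}, i.e. the value to forecast.
    return float(x[nbrs + (m - 1) * tau + 1].mean())


pattern = np.tile([1.0, 2.0, 3.0, 4.0], 25)   # toy periodic series ending in ..., 2, 3, 4
print(local_mean_predict(pattern, m=3, k=3))   # prints 1.0: every neighbour of (2, 3, 4) is followed by 1
```

Averaging the successors of several neighbours, rather than copying the successor of the single closest state, smooths out some of the noise in the individual neighbours at the cost of a small bias when the neighbours lie in different parts of the attractor.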
The correlation coefficient r summarizes the strength of the linear relationship between the observed and predicted data. It takes values between -1 and +1. A value of 0 indicates no linear relationship between the observed and predicted time series. The closer r is to +1, the more closely the predicted series follows the rises and falls of the observed series; a value close to -1 would indicate that the predictions move opposite to the observations.
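The correlation coefficient can be computed directly from its definition. The sketch below uses a hypothetical helper `correlation_coefficient` and made-up ppb values for illustration only; it gives the same result as NumPy's `np.corrcoef`.

```python
import numpy as np


def correlation_coefficient(observed, predicted):
    """Pearson r = sum((o - o_bar)(p - p_bar)) / sqrt(sum((o - o_bar)^2) * sum((p - p_bar)^2))."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if o.shape != p.shape:
        raise ValueError("observed and predicted must have the same length")
    do, dp = o - o.mean(), p - p.mean()
    return float(np.sum(do * dp) / np.sqrt(np.sum(do ** 2) * np.sum(dp ** 2)))


obs = np.array([30.0, 42.0, 55.0, 48.0, 35.0])   # made-up ppb values for illustration
print(correlation_coefficient(obs, obs + 10.0))  # prints 1.0 despite a constant 10 ppb bias
```

As the usage line shows, r is unchanged by a constant offset or a positive rescaling of the predictions, so it measures how well the prediction follows the shape of the observed series rather than how close the predicted values are in ppb.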

Chaotic Analysis
By using $\tau = 1$, the phase space plot was constructed. Figure 1 illustrates the phase space of $(x_t, x_{t+1})$.
The phase space plot represents the behavior of the data. Through the reconstructed phase space, the movement of the data from its initial position can be observed, and the overall evolution of the data can be viewed through the phase space trajectories. As Figure 1 clearly shows, there is a well-defined attractor region in the middle of the plot towards which all points converge. According to [10], the presence of such a well-defined region indicates chaotic behavior.
On the other hand, if the points fill the entire plot with no apparent attractor structure, the observed data are considered random. From Figure 1, it can be concluded that chaotic behavior is present in the observed O3 time series.

Figure 2 displays the results of the Cao method, from which the embedding dimension $m = 6$ is obtained. Recent investigations [19][20][21] show that O3 pollution in urban areas is strongly related to meteorological factors such as water temperature, sea breezes, suspended dust, relative humidity, solar radiation, solar energy, wind direction and wind speed. A sensitivity analysis of an O3 time series by [22] found that maximum and average temperature, solar radiation, maximum and average O3 pollution, relative humidity, NO pollution, wind direction, PM10 pollution and sunshine duration influenced the O3 time series. Since the Cao method gives $m = 6$, at least six factors are suggested to influence the observed O3 time series in the urban area of Shah Alam. The recent studies above identify more than six such factors, so the present finding is consistent with them, which supports the reliability of $m = 6$ obtained from the Cao method.

Because of the sensitive dependence on initial conditions, only short-term prediction is possible for a chaotic time series [4]. In this study, one-hour-ahead prediction is performed for two weeks (336 hours). From Figure 3, it can be seen that the trend of the data (its rises and falls) is predicted well. Thus, the results demonstrate that the local mean approximation method performs well in predicting the observed O3 time series in the urban area.
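The two-week evaluation can be sketched as a rolling loop over the 336 test hours, assuming (the text does not say) that each one-hour-ahead forecast uses every observation up to the preceding hour. The sketch takes $m = 6$ and $\tau = 1$ from the study and the split 2160 = 1824 + 336 hours (90 days and 14 days of hourly data); the neighbour count k = 5, the Euclidean distance, the helper names and the synthetic daily-cycle series are hypothetical stand-ins, and the printed r says nothing about the Shah Alam results.

```python
import numpy as np


def delay_embed(x, m, tau=1):
    """Row t is (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    n_vec = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n_vec] for j in range(m)])


def local_mean_predict(x, m, tau=1, k=5):
    """Forecast the value that follows x as the mean successor of the k nearest states."""
    Y = delay_embed(x, m, tau)
    nbrs = np.argsort(np.linalg.norm(Y[:-1] - Y[-1], axis=1))[:k]
    return x[nbrs + (m - 1) * tau + 1].mean()


def one_hour_ahead(x, n_test=336, m=6, tau=1, k=5):
    """Forecast each of the last n_test hours from all observations before it."""
    x = np.asarray(x, dtype=float)
    n_train = len(x) - n_test                     # 2160 - 336 = 1824 for the study's split
    preds = np.array([local_mean_predict(x[: n_train + h], m, tau, k)
                      for h in range(n_test)])    # x[: n_train + h] ends one hour before target
    observed = x[n_train:]
    return observed, preds, np.corrcoef(observed, preds)[0, 1]


# Synthetic stand-in for the record: 90 days x 24 h = 2160 hourly values with a
# daily cycle plus noise (NOT the observed O3 data).
rng = np.random.default_rng(42)
hours = np.arange(2160)
series = 20 + 15 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)
observed, preds, r = one_hour_ahead(series)
print(len(observed), round(r, 3))
```

Updating the history after every hour matches an operational setting in which the latest measurement is available before the next forecast is issued; a variant that freezes the training set at 1824 hours would only change the slice passed to `local_mean_predict`.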

Conclusion
In this study, chaotic behavior in the hourly O3 time series of the urban area of Shah Alam was detected using the phase space plot and the Cao method. A short-term prediction model was built using the local mean approximation method, a method based on the chaotic approach. The results demonstrate that the local mean approximation method predicts the observed O3 pollution in the urban area well, with an r value close to one. [23] successfully predicted a PM10 time series using the local approximation method; in the future, the chaotic approach could also be applied to time series of other pollutants such as NO and CO. Furthermore, the O3 time series used in this study is raw (not pre-processed) and contains noise due to the presence of impurities. In the future, noise reduction techniques will be explored to obtain a time series with less noise.