Time Series Analysis and Forecasting of Wastewater Inflow into Bandar Tun Razak Sewage Treatment Plant in Selangor, Malaysia

Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to ensure that wastewater is treated sufficiently before it is discharged to the environment. The main objectives of this study are to statistically analyse and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years of weekly influent data (156 weeks) was conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) were tried to select the best-fitting model, which was then used to forecast the wastewater inflow rates. Linear regression analysis was applied to test the correlation between the observed and predicted influents. The ARIMA (3, 1, 3) model was selected because it had the highest R-square and the lowest normalized Bayesian Information Criterion (BIC) value, and the wastewater inflow rates were accordingly forecasted for an additional 52 weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.


Introduction
Sewage treatment plants (STPs) are among the most valuable infrastructure assets in a country's development. The main objective of an STP is to treat the collected wastewater sufficiently to prevent negative impacts on human health, aquatic life, and the surrounding environment. The capacity and treatment processes of an STP must be designed and operated carefully so that it provides reliable treatment despite fluctuating characteristics of the influent waste stream, such as inflow and organic loading, and remains in compliance with environmental permit limits and effluent standards.
For this reason, it is important to evaluate and predict the design loads periodically (e.g. every 3-5 years), because influent hydraulic and loading parameters can vary considerably depending on the population being served; vacations and even tourist arrivals can affect the wastewater inflow rate. Forecasting and simulating STP inflow rates are valuable for defining the average and peak flow rates, which assist in the future planning of collection and treatment facilities. This can be done from inflow rate values previously observed and recorded at regular time intervals, through time series analysis of sewage inflow rates into treatment plants.
Box-Jenkins [1] or Autoregressive Integrated Moving Average (ARIMA) models are suited to this task and can give accurate predictions. An ARIMA model contains an integrated component (d), which differences the time series to make it stationary [2]. The other two components are the autoregressive component AR (p) and the moving average component MA (q): the AR component relates the current value of the time series to its past values, while the MA component captures the persistence of random shocks in the series.
These techniques are well established and have been used for predicting hydrometeorological parameters in various studies [3]-[8].
In this study, the ARIMA model was applied to a time series of sewage inflow data from the Bandar Tun Razak STP. The best-fitting ARIMA model was selected and assessed using linear regression analysis between the observed and predicted values.

Bandar Tun Razak (BTR) STP
Bandar Tun Razak STP is located on Jalan 11/118b, Desa Tun Razak, 56000 Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, Malaysia. The sewage plant is operated and managed by Indah Water Konsortium Sdn Bhd (IWK). The total area of the plant is about ten acres, while the reserved area is six acres (Figure 1).
BTR STP was built to serve part of Kuala Lumpur, with a daily design capacity of 25,000 m3, equivalent to a population of 100,000. Currently, the plant receives and treats about 11,700 m3 per day, which is equivalent to a population of 52,000. The plant is equipped with a Sequential Batch Reactor (SBR) treatment system.

Data collection
The wastewater inflow rate (Q) data were obtained from the Bandar Tun Razak STP management and cover three continuous years on a weekly basis between 2011 and 2013; the average inflow rate was about 16,711 m3. The weekly laboratory measurements of sewage inflow rate and loading parameters, such as biological oxygen demand (BOD), chemical oxygen demand (COD), suspended solids (SS) and ammoniacal nitrogen (NH3-N), are authorized by the Department of Environment (DOE) to ensure that the DOE standards for STPs are met.

Autoregressive Integrated Moving Average (ARIMA)
The ARIMA or Box-Jenkins [12] model is considered one of the most popular and effective statistical models for time series forecasting. It is based on building a linear function from the past observations of a time series in order to forecast future values [9]. The linear function consists of three parametric components: Auto-Regression (AR), Integration (d) and Moving Average (MA) [1]. This is written in the form ARIMA (p, d, q).
In the auto-regressive (AR) or ARIMA (p, 0, 0) model of order "p", the current output Zt (observed value) depends on the "p" prior outputs and the current input "et" (independent random shock). In the standard Box-Jenkins notation, omitting any constant term, the AR (p) equation can be written as:

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} + e_t$$

where \phi_1, ..., \phi_p are the autoregressive coefficients.
In the moving average (MA) or ARIMA (0, 0, q) model of order "q", the current output Zt (observed value) depends on the current input and the "q" prior inputs. MA (q) is represented as:

$$Z_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q}$$

where \theta_1, ..., \theta_q are the moving average coefficients.
The Autoregressive Moving Average (ARMA) model of order (p, q) combines both AR and MA elements. An ARIMA (p, 0, q) or ARMA (p, q) model describes a time series that depends on p past values of itself and on q past random terms et. It has the form:

$$Z_t = \phi_1 Z_{t-1} + \dots + \phi_p Z_{t-p} + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$$

The Box-Jenkins models require stationary time series data; therefore, non-stationary data are transformed to induce mean stationarity. A difference of order one subtracts from each observed value the preceding value, which gives a new time series. Hence the term "d" refers to the degree of ordinary differencing applied to achieve stationarity. For d = 1, with Wt denoting the differenced series:

$$W_t = Z_t - Z_{t-1}$$

After the ARMA model has been applied to the differenced time series, the differencing transformation is reversed ("integration", performed "d" times) to recover modelled values on the original scale. A process that involves d-th order differencing is called an integrated process of order d, denoted I (d). A combination of the AR, MA and I models is called an ARIMA model of order (p, d, q).
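The differencing and integration steps described above can be illustrated with a minimal sketch. The function names and the five inflow values below are hypothetical and chosen only for illustration; they are not taken from the BTR STP record.

```python
def difference(series, d=1):
    """Apply ordinary differencing d times: each value minus the preceding one."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def integrate(diffed, first_value):
    """Reverse one round of differencing by cumulative summation."""
    out = [first_value]
    for w in diffed:
        out.append(out[-1] + w)
    return out


# Hypothetical weekly inflow values (m3), for illustration only.
inflow = [16500, 16720, 16610, 16900, 17050]

w = difference(inflow, d=1)          # first-order differenced series
restored = integrate(w, inflow[0])   # back on the original scale

print(w)
print(restored == inflow)
```

Note that one observation is lost per round of differencing, so the first value of the original series has to be kept in order to reverse the transformation.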

Model Development
Time series analysis and forecasting of wastewater inflow into the Bandar Tun Razak sewage treatment plant (STP) were performed using a historical record of 3 years (2011-2013). Sewage inflow data were provided at weekly intervals over the study period (156 weeks).
The ARIMA model was applied in this study through the following steps: model identification and estimation, diagnostic checking, and forecasting. The identification step determines the order of differencing 'd' in ARIMA (p, d, q) as well as the orders of the AR and MA operators. The appropriate orders of the ARIMA (p, d, q) model are usually determined through the Box-Jenkins model building methodology [8]. IBM SPSS Statistics 22 software was used in this study. In addition, linear regression analysis was used to compare the observed and predicted values.
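The identification step relies on the sample autocorrelation function (ACF) to judge whether a series needs differencing. The sketch below is an illustration under stated assumptions, not the SPSS procedure used in the study: the `acf` helper is hypothetical, and the 156-point random-walk series is synthetic data standing in for the weekly inflow record.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation coefficients r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


# Synthetic random-walk series (assumption), not the BTR STP data.
random.seed(0)
level = 16000.0
series = []
for _ in range(156):
    level += random.gauss(0, 100)
    series.append(level)

# First-order differencing (d = 1).
diffed = [series[i] - series[i - 1] for i in range(1, len(series))]

# Approximate 95% bound for the ACF of a white-noise series.
bound = 1.96 / math.sqrt(len(diffed))

print("raw, lags 1-3:", [round(r, 2) for r in acf(series, 3)])
print("differenced, lags 1-3:", [round(r, 2) for r in acf(diffed, 3)])
print("approximate 95% bound:", round(bound, 3))
```

An ACF that decays slowly from values near one is a typical sign of non-stationarity, whereas coefficients that mostly fall inside the approximate bound after differencing suggest that d = 1 is sufficient.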

Model performance tests
To judge the modelling accuracy and select the best-fitting ARIMA configuration, several performance criteria were used: R-square, stationary R-square, root mean square error (RMSE), mean absolute percentage error (MAPE), and normalized Bayesian Information Criterion (BIC).
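As a rough illustration of these criteria, the sketch below computes R-square, RMSE, MAPE and a normalized BIC from paired observed and predicted values. The formulas are common textbook definitions and are an assumption here: a given statistics package may apply degrees-of-freedom adjustments that change the numbers slightly. Stationary R-square is not covered because it requires a baseline model. The function name and the sample values are hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_params):
    """Goodness-of-fit measures for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)                    # sum of squared errors
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    mse = sse / n                                       # unadjusted mean squared error
    return {
        "r_square": 1 - sse / sst,
        "rmse": math.sqrt(mse),
        "mape": 100 / n * sum(abs(e / o) for e, o in zip(errors, observed)),
        # Assumed form: ln(MSE) + k * ln(n) / n, with k model parameters.
        "normalized_bic": math.log(mse) + n_params * math.log(n) / n,
    }


# Hypothetical values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
stats = fit_statistics(observed, predicted, n_params=2)
for name, value in stats.items():
    print(name, round(value, 3))
```

The normalized BIC penalizes each additional parameter, so among models with similar error a lower value favours the simpler model.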

Model Identification and estimation
A sequence graph of the sewage inflow data (156 weeks) for Bandar Tun Razak was plotted to check the stationarity of the analysed data, as shown in Figure 1. From the autocorrelation and partial autocorrelation coefficients (ACF and PACF), the data were found to be non-stationary, as shown in Figure 2. Therefore, first-order differencing was applied to the data series (Figure 3). The differenced data were tested for stationarity using the ACF and PACF, as shown in Figure 4. After examining the ACF and PACF plots and their associated tables (Table 7 in the Appendix), it was concluded that the differenced data were stationary, so ARIMA models could be applied. ARIMA models with various orders of 'p' and 'q' and first-order differencing (d=1) were tried in order to choose the best-fitting model. Among the different configurations, the best-fitting model was chosen on the basis of a high stationary R-square value, a good R-square value, and low values of RMSE, MAPE and normalized BIC, as illustrated in Table 1. The most suitable model for the inflow rate of the Bandar Tun Razak STP was found to be ARIMA (3, 1, 3). The main parameters of the selected model are given in the following tables:

Diagnostic checking
The selected model was tested and verified by examining the residual ACF and PACF at various lags, which indicated a "good fit" of the model, as shown in Figure 5. Autocorrelations up to 24 lags were evaluated and their significance was verified with the Box-Ljung statistic, as illustrated in Table 8 in the Appendix. Almost all lags were within the acceptable limits in the residual ACF and residual PACF. This indicates that the selected ARIMA (3, 1, 3) model can be used for inflow rate analysis at the Bandar Tun Razak STP. As shown in Table 5, the correlation coefficient between the observed and predicted inflow was 0.83, which suggests a good positive linear correlation. The Normal P-P Plot of Regression Standardized Residual showed a random scatter of points with constant variance and no outliers. Since the points are close to the diagonal line (Figure 6), the residuals are approximately normally distributed.
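The Box-Ljung check on the residuals can be sketched as follows. This is a minimal illustration of the statistic Q = n(n+2) Σ r_k² / (n-k), not a reproduction of the SPSS output; the function name and the residual values are hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Box-Ljung Q statistic over lags 1 .. max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# Hypothetical residuals, for illustration only.
residuals = [0.4, -0.2, 0.1, -0.5, 0.3, 0.2, -0.1, -0.3, 0.6, -0.4]
print(round(ljung_box_q(residuals, max_lag=3), 3))
```

Under the hypothesis that the residuals are white noise, Q is compared with a chi-square distribution whose degrees of freedom are the number of lags minus the number of estimated AR and MA parameters; a Q value below the critical value supports the adequacy of the model.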

Forecasting
The best-fitting ARIMA (3, 1, 3) model was used to forecast the inflow rate up to week 208 (4 years). The forecasted values are tabulated in Table 6, while the observed and predicted values with the confidence limits are shown in Figure 7.
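How point forecasts are produced from a fitted model with d = 1 can be sketched with a simple recursion: the differenced series is forecast from the AR and MA terms, future shocks are set to zero, and the result is accumulated back onto the original scale. The function, the coefficient values and the data below are hypothetical and are not the fitted ARIMA (3, 1, 3) parameters of this study; the sign convention assumed is Z_t = Σ φ_i Z_{t-i} + e_t − Σ θ_j e_{t-j}.

```python
def forecast_arima_d1(series, residuals, phi, theta, steps):
    """Point forecasts for an ARIMA(p, 1, q) model with known coefficients.

    residuals holds the in-sample shocks aligned with the differenced series.
    """
    w_hist = [series[i] - series[i - 1] for i in range(1, len(series))]
    e_hist = list(residuals)
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        ar = sum(phi[i] * w_hist[-1 - i] for i in range(len(phi)))
        ma = sum(theta[j] * e_hist[-1 - j] for j in range(len(theta)))
        w_next = ar - ma           # expected future shock is zero
        w_hist.append(w_next)
        e_hist.append(0.0)
        level += w_next            # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# Hypothetical inputs, for illustration only.
series = [16500.0, 16720.0, 16610.0, 16900.0]
residuals = [30.0, -45.0, 60.0]
print(forecast_arima_d1(series, residuals, phi=[0.4], theta=[0.3], steps=4))
```

Because future shocks are replaced by their expected value of zero, the forecast increments shrink as the horizon grows, and the confidence limits around the forecasts widen accordingly.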

Conclusions
In this study, time series ARIMA modelling of the weekly sewage inflow into one of the main STPs in Kuala Lumpur, Malaysia was successfully conducted. The three continuous years (156 weeks) of data collected by the Bandar Tun Razak (BTR) STP management were found to be non-stationary, so first-order differencing (d=1) was applied to make the series stationary. Fifteen ARIMA models with various orders of 'p' and 'q' were applied to the transformed data to select the best-fitting model. Based on diagnostics such as a high R-square value and a low normalized Bayesian Information Criterion (BIC), ARIMA (3, 1, 3) was found to be the best-fitting model. A linear regression model was applied between the observed and predicted values, and it showed a positive linear correlation with a correlation coefficient of 0.83. This linear regression analysis indicated that there was little variation between the observed and predicted data. The best fitted ARIMA (3, 1, 3) model forecasted the inflow