Estimation of Discharge and Total Water Level at Yedgaon Dam using Data Driven Techniques

Reservoir operation planning using data driven techniques is gaining momentum in hydrology because of the good prediction and estimation capabilities of these techniques. The present work uses five years of water level data to estimate the discharge and water level at the Yedgaon dam, which acts like a pick-up weir with its own yield and storage. It receives water from Dimbhe (through DLBC), Wadaj (through MLBC), Manikdoh (through river) and Pimpalgaojoge (through river) in the Kukadi project of Maharashtra State, India. Four different models were developed to estimate the water level using the data driven techniques M5 Model Tree, Support Vector Regression, Multi Gene Genetic Programming (MGGP) and Random Forest. The accuracy of the developed models is assessed by the coefficient of correlation, coefficient of efficiency, mean absolute error and root mean squared error, and by comparing actual and predicted values. The results indicated that the MGGP model was superior to the other techniques, with a correlation coefficient of 0.86 and the advantage of providing a single equation to estimate the water level.


Introduction
Among the basic needs of human beings, water is considered the prime natural resource. The amount of water available on earth is about 1400 million km³, of which only 2.7% is available as freshwater (Sinha, 2005). Hence, only a small fraction of it can be utilized for all purposes. A reservoir operation plan is designed to achieve the maximum benefit from the storage capacity. To plan the reservoir operation, the flow characteristics of the stream, i.e. the history of its past performance, should be known. A reservoir operation is described as the amount of water to be released from storage at any time, depending upon the state of the reservoir, the level of demand and any information about the likely inflow to the reservoir. Because the monsoon season in India lasts only about three to four months, it is necessary to create large storage and to utilize runoff. The regional variation in rainfall is also extreme, from about 100 mm in Rajasthan to 11000 mm in Meghalaya in northeast India, causing drought-prone conditions in some parts and flood conditions in others at the same time (Sinha, 2005). To overcome this drought and flood situation in the country, it is necessary to plan and construct reservoirs and other water storage works for the conservation and utilization of water resources to maximum benefit. In the present study, reservoir operation is examined for the Kukadi project of Maharashtra State, India. Regression analysis can be done using data sets at five different stations, namely Dimbhe, Manikdoh, Wadaj, Pimpalgaojoge and Yedgaon. The Yedgaon dam acts like a pick-up weir with its own yield and storage. It also receives water from Dimbhe dam (through DLBC), Wadaj dam (through MLBC), Manikdoh dam (through river) and Pimpalgaojoge dam (through river) (Birajdar, 2012).
IOP Publishing, doi:10.1088/1757-899X/1197/1/012021
develop an efficient model for forecasting lake water level variations, exemplified by the Poyang Lake (China) case study. A random forests (RF) model was first applied and compared with artificial neural networks, support vector regression, and a linear model. The aim of the present work is to develop a model to estimate outflow at Yedgaon dam using outflows at the Manikdoh, Wadaj, Dimbhe and Pimpalgaojoge dams, and to analyze the results using various soft computing tools such as M5 Model Tree, Support Vector Regression, Multi Gene Genetic Programming and Random Forest. The accuracy of the models is assessed by the coefficient of correlation, coefficient of efficiency and root mean squared error, and by comparing actual and predicted values of discharge and total water level. The outline of the paper is as follows. Section 2 describes the salient features of the data used to train and test the data driven models. Section 3 provides brief information about all the soft computing tools. The model results, along with a discussion of the reliability of the predictions, are presented in the following section. Concluding remarks are given at the end.

Study Area and Data
The present study aims to predict daily discharge values and total water level at Yedgaon. The study area lies in the Western Ghats of Maharashtra, in the Sahyadri hill range, where the five dams of the Kukadi integrated system are situated. The study area also extends to the command area in the three districts of Pune, Solapur and Ahmednagar. Figure 1 shows the Google map of the Kukadi complex, which gives a general idea of the study area considered in the present work. The total number of daily discharge observations available for each station is 592, and the total number of total water level observations for each station is 1548. Table 1 shows the statistical parameters of the daily discharge data (Model 1) and Table 2 shows the statistical parameters of the total water level data (Model 2) for the study sites.

Methodology
In the current study, four models were developed, one with each of the techniques, viz. MT, MGGP, SVR and RF, to predict the discharge and total water level at Yedgaon. The objective of the present study is to correlate the upstream and downstream stations using data driven techniques (M5 Model Tree, SVR, MGGP and Random Forest). After obtaining the maximum and minimum daily values, two models were formed for predicting discharge and total water level on a daily basis by MT, SVR, MGGP and RF. The predicted values were obtained with M5 Model Tree (MT), Support Vector Regression (SVR) and Random Forest (RF) in WEKA (3.9.4), and with Multi Gene Genetic Programming (MGGP) in MATLAB R2019a. Table 3 shows the models (M) with their input (I1, I2, I3 and I4) and output parameters. Figure 2 shows the methodology used in this study with the various data driven techniques (M5 Model Tree, Support Vector Regression, Multi Gene Genetic Programming and Random Forest).
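The input and output arrangement described above can be illustrated with a minimal sketch. It is not the WEKA or MATLAB workflow used in the study; it only shows, in plain Python, how four upstream series (I1 to I4) might be paired day by day with the Yedgaon series before a model is fitted. The function names `build_dataset` and `chronological_split`, and the default `train_fraction` of 0.7, are assumptions made for illustration; the training and testing proportion used in the study is not stated here.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float]  # (I1, I2, I3, I4) for one day


def build_dataset(
    i1: Sequence[float],
    i2: Sequence[float],
    i3: Sequence[float],
    i4: Sequence[float],
    target: Sequence[float],
) -> Tuple[List[Row], List[float]]:
    """Pair the four upstream series with the downstream series, day by day."""
    n = len(target)
    if not all(len(series) == n for series in (i1, i2, i3, i4)):
        raise ValueError("all series must cover the same number of days")
    inputs = [
        (float(a), float(b), float(c), float(d))
        for a, b, c, d in zip(i1, i2, i3, i4)
    ]
    output = [float(v) for v in target]
    return inputs, output


def chronological_split(inputs, output, train_fraction=0.7):
    """Split in time order without shuffling.

    train_fraction is a hypothetical choice, not a value from the study.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(output) * train_fraction)
    train = (inputs[:cut], output[:cut])
    test = (inputs[cut:], output[cut:])
    return train, test
```

The same arrangement would apply to both models: for Model 1 the five series are daily discharges, and for Model 2 they are total water levels. Keeping the split chronological avoids testing a model on days that precede its training period.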

Results and Discussion
A consolidated summary of model performance is given in Table 4.

Table 4 Results
The scatter plots shown above for Model 1 and Model 2 differ. The input values in Model 1, i.e. the outflows at Manikdoh, Dimbhe, Wadaj and Pimpalgaojoge, are in places either low or zero, causing extreme variation in the predicted output at Yedgaon. This ultimately gives a poor relation in the scatter plot. Hence, it was decided to consider the total water level for further study. In summary, the correlation coefficient 'r' between the estimated and observed total water level lies between 0.78 (M2, RF) and 0.85 (M2, MGGP). The mean absolute error varies between 0.99 m (M2, RF) and 0.7576 m (M2, MGGP). The root mean squared error (RMSE) showed a variation of 1.2425 m (M2, RF) to 1.4152 m (M2, RF).
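The error measures reported above can be computed from paired observed and predicted values as in the following sketch. It is a plain Python illustration, not the code used in the study. The coefficient of efficiency is assumed here to take the Nash-Sutcliffe form, since the text does not define it, and all function names are chosen for illustration only.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], pred: Sequence[float]) -> None:
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")


def correlation_coefficient(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient r between observed and predicted values."""
    _check(obs, pred)
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    # Raises ZeroDivisionError if either series is constant.
    return cov / (dev_o * dev_p)


def coefficient_of_efficiency(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Assumed Nash-Sutcliffe form: 1 - sum((o - p)^2) / sum((o - mean_o)^2)."""
    _check(obs, pred)
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / spread


def mean_absolute_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Average absolute difference, in the units of the data (m for water level)."""
    _check(obs, pred)
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def root_mean_squared_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Square root of the mean squared difference, in the units of the data."""
    _check(obs, pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
```

A prediction that is offset from the observations by a constant amount still gives r = 1, while the MAE, RMSE and coefficient of efficiency all register the offset. This is one reason to report r together with the error measures rather than on its own.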

Conclusions
In the present work, models were developed for the Kukadi project of Maharashtra State, India, using data driven techniques, namely M5 Model Tree, Support Vector Regression, Multi Gene Genetic Programming and Random Forest. The work aims to estimate the outflow at Yedgaon using the outflows at Manikdoh, Wadaj, Dimbhe and Pimpalgaojoge, and to analyze the results. For the model formulation, discharge data and total water level data were used. The performance of the models was evaluated using scatter plots and error measures such as the coefficient of correlation, root mean squared error and mean absolute error. Using the results, the values are compared. The values of