A Water Quality Prediction Method Based on Deep LSTM Network

Water quality parameters are key factors affecting marine ranching. These parameters are usually complex and variable, and traditional water quality prediction methods suffer from low long-term prediction accuracy and weak generalization ability. To address these problems, this paper proposes a multivariate water quality prediction model based on WT-LSTM (wavelet transform denoising combined with a long short-term memory network). The model is compared with a BP neural network for short-term and long-term prediction of dissolved oxygen. Experimental results show that the WT-LSTM model achieves good accuracy and generalization in both cases, with a short-term prediction accuracy of up to 98.47% and a long-term prediction accuracy of up to 96.28%.


Introduction
Marine ranching is an important economic component of fisheries, and combining marine science with artificial intelligence provides an important tool for predicting water quality parameters [1].
Li et al. [2] combined an improved grey model with a BP neural network to predict water quality, improving prediction accuracy by 15%. Ren et al. [3] used a genetic algorithm to optimize a fuzzy neural network, improving the prediction accuracy of dissolved oxygen in aquaculture. Cheng et al. [4] used a time series algorithm and a T-S fuzzy neural network to predict water quality parameters in fisheries with an accuracy above 90%. Graf et al. [5], [6] used a hybrid wavelet transform and neural network model to predict water quality parameters, improving on the prediction performance of a traditional neural network. Combining traditional neural networks with optimization algorithms has produced better predictions, but seawater parameters interact with one another and change in strongly nonlinear ways, so these models perform poorly in long-term prediction [7], [8]. To predict long-term data with better accuracy and reliability, this study trains a water quality prediction model based on a WT-LSTM neural network, which can better capture the temporal characteristics and change patterns of time series data.

Monitoring system
To better understand water quality conditions in marine ranching areas, a micro marine environment water quality monitoring system was built in the laboratory to mimic the water quality environment of a marine ranch, as shown in Figure 1. The system monitors the temperature, pH, dissolved oxygen saturation and turbidity of the water body. It consists of four parts: collection nodes, an aggregation node, a cloud server and an application platform, covering data collection, transmission, visualization and prediction.
The collection node is the core of water quality data collection, processing and transmission. It uses an STM32L475RGT6 as the main controller, which exchanges data with the monitoring sensors, the LoRa module and an SD memory card. The water quality sensors communicate with the main controller over an RS-485 interface.
In the monitoring system, the collection node sends the collected water data to the aggregation node over a LoRa network. A connection to the cloud server is then established using the MQTT protocol, and the collected data are uploaded to the cloud server over the 4G network [9].
Data from the collection nodes are uploaded to a data stream on the cloud server in JSON format [10]. The IoT platform provides visualization functions, and users can also monitor changes in water quality data in real time on a mobile device.
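The paper states only that readings reach the cloud server as JSON. The sketch below shows one plausible shape for such a message and a server-side check; the field names, the UTC ISO-8601 timestamp and the helper functions are hypothetical, and the MQTT publish step itself is left out.

```python
import json
from datetime import datetime, timezone

# Hypothetical field names; the paper only says readings are sent as JSON.
REQUIRED_FIELDS = ("node_id", "timestamp", "temperature", "ph",
                   "do_saturation", "turbidity")


def build_payload(node_id: str, temperature: float, ph: float,
                  do_saturation: float, turbidity: float,
                  timestamp: datetime) -> str:
    """Serialize one set of readings from a collection node as compact JSON."""
    record = {
        "node_id": node_id,
        "timestamp": timestamp.astimezone(timezone.utc).isoformat(),
        "temperature": temperature,
        "ph": ph,
        "do_saturation": do_saturation,
        "turbidity": turbidity,
    }
    return json.dumps(record, separators=(",", ":"))


def parse_payload(payload: str) -> dict:
    """Decode a payload on the server side and reject records with missing fields."""
    record = json.loads(payload)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"payload is missing fields: {missing}")
    return record


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    ts = datetime(2023, 5, 1, 8, 0, tzinfo=timezone.utc)
    message = build_payload("node-01", 22.5, 7.9, 95.2, 3.1, ts)
    print(message)
    print(parse_payload(message)["do_saturation"])
```

Validating required fields on arrival keeps incomplete records from silently entering the training data.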

Data denoising processing
High-quality sample data are the basis for good neural network performance and generalization. The raw water quality data, however, contain noise and missing values [11]. The sample data were therefore preprocessed with wavelet threshold denoising to reduce the impact of monitoring noise on model training.
The parameters of wavelet threshold denoising include the number of decomposition levels, the wavelet basis, the threshold value and the threshold function. The threshold λ is a key parameter of the algorithm. To reduce noise while retaining the information in the original data, the number of decomposition levels is generally set to 3. To improve noise-reduction performance, an improved threshold based on the fixed threshold is used, as shown in Equation (1) [12]. In Eq. (1), λ is the improved threshold, σ is the noise standard deviation, N is the signal length, and j is the decomposition scale.
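For reference, the standard fixed (universal) threshold that the improved threshold builds on, and the usual hard and soft threshold functions, are given below. Here d_{j,k} is the k-th detail coefficient at scale j (j = 1 is the finest scale), and σ is commonly estimated from the finest-scale coefficients by the median absolute deviation. This is the textbook form only; the scale-dependent improvement of Eq. (1) is not reproduced here.

```latex
\lambda_{\mathrm{fixed}} = \sigma\sqrt{2\ln N},
\qquad
\hat{\sigma} = \frac{\operatorname{median}\left(|d_{1,k}|\right)}{0.6745}

\hat{d}_{j,k}^{\,\mathrm{hard}} =
\begin{cases}
  d_{j,k}, & |d_{j,k}| \ge \lambda \\
  0,       & |d_{j,k}| < \lambda
\end{cases}
\qquad
\hat{d}_{j,k}^{\,\mathrm{soft}} = \operatorname{sgn}(d_{j,k})\,\max\left(|d_{j,k}| - \lambda,\ 0\right)
```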
The denoising effect of the wavelet transform depends on the choice of wavelet basis. After analyzing and testing the properties of commonly used wavelet bases, the sym10 wavelet was selected because it is orthogonal, nearly symmetric, has a relatively short support and has a high number of vanishing moments. The original and denoised data are shown in Fig. 2; the curve filtered by the wavelet threshold denoising algorithm is clearly smoother and more stable.
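A minimal, dependency-free sketch of three-level wavelet threshold denoising follows. It differs from the text in three labeled assumptions: it uses the Haar wavelet instead of sym10, it applies the standard universal threshold sigma * sqrt(2 ln N) instead of the improved threshold of Eq. (1), and it uses soft thresholding. With PyWavelets installed, the same pipeline with sym10 would map onto pywt.wavedec(x, "sym10", level=3), pywt.threshold(d, lam, mode="soft") on each detail array, and pywt.waverec.

```python
import math
import random
import statistics

# Sketch assumptions: Haar wavelet instead of sym10 (no third-party dependency),
# universal threshold instead of the paper's improved threshold (Eq. 1),
# and soft thresholding of every detail level.
SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert haar_step, interleaving the reconstructed even/odd samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(c, lam):
    """Shrink a coefficient towards zero by lam; zero it if |c| <= lam."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def wavelet_denoise(signal, levels=3):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    n = len(signal)
    if n == 0 or n % (2 ** levels):
        raise ValueError("signal length must be a positive multiple of 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale
    # Noise standard deviation estimated from the finest-scale details (MAD / 0.6745).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    lam = sigma * math.sqrt(2.0 * math.log(n))  # universal (fixed) threshold
    details = [[soft_threshold(c, lam) for c in d] for d in details]
    for d in reversed(details):  # rebuild from the coarsest scale upwards
        approx = haar_inverse_step(approx, d)
    return approx


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Synthetic test signal, not the paper's data.
    random.seed(0)
    clean = [math.sin(2 * math.pi * i / 128) for i in range(512)]
    noisy = [v + random.gauss(0.0, 0.3) for v in clean]
    denoised = wavelet_denoise(noisy, levels=3)
    print(f"MSE noisy: {mse(noisy, clean):.4f}  denoised: {mse(denoised, clean):.4f}")
```

Haar keeps the sketch short and exactly invertible; a longer basis such as sym10 gives smoother reconstructions, which is the property the text cites for choosing it.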

Correlation Analysis
In aquaculture environments, water quality parameters are strongly correlated with one another, which limits the effectiveness of univariate prediction models [13]. It is therefore necessary to study the correlation between parameters and build a multivariate prediction model that exploits the relationships and interactions among variables, capturing the patterns and trends of water quality change and improving prediction accuracy and reliability. In this paper, the correlation between water quality factors was calculated with Pearson's correlation coefficient, Equation (2); the results are shown in Table 1.

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_X}\right)\left(\frac{Y_i-\bar{Y}}{s_Y}\right) \qquad (2)$$

In Eq. (2), $\frac{X_i-\bar{X}}{s_X}$, $\bar{X}$ and $s_X$ are the standardized score, sample mean and sample standard deviation of the $n$ samples, respectively (and likewise for $Y$). Table 1 shows that, except for turbidity, which is only weakly correlated with the other parameters, all water quality parameters are strongly positively or negatively correlated with one another. Fig. 2 also shows that dissolved oxygen varies the most, so a multi-feature prediction model for dissolved oxygen saturation was constructed.
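The correlation analysis can be sketched in plain Python using the standardized-score form of the Pearson coefficient: each series is converted to z-scores with the sample mean and sample standard deviation, and the products are summed and divided by n - 1. The helper names and the toy columns in the usage example are hypothetical, not the paper's data.

```python
import math


def pearson(x, y):
    """Pearson r as the sum of products of standardized scores over (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))  # sample std
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return sum(((a - mean_x) / s_x) * ((b - mean_y) / s_y)
               for a, b in zip(x, y)) / (n - 1)


def correlation_table(columns):
    """Pairwise Pearson r for a dict {parameter name: list of values}."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical toy columns, not measurements from the paper.
    data = {
        "temperature": [20.1, 20.4, 20.9, 21.3, 21.0],
        "do_saturation": [98.0, 96.5, 94.8, 93.1, 94.0],
    }
    for (a, b), r in correlation_table(data).items():
        print(f"{a:>14} vs {b:<14} r = {r:+.3f}")
```

Values near +1 or -1 indicate the strong positive or negative correlations reported in Table 1; values near 0 correspond to the weak correlation observed for turbidity.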

The Prediction Model Based on LSTM Deep Learning
The LSTM prediction model consists of three LSTM hidden layers with 32 units each, followed by one fully connected layer. A Dropout layer with a rate of 0.1 (10% of units randomly dropped) follows each hidden layer. The other hyperparameters were 100 epochs, a batch size of 32 and a learning rate of 0.0005. The four water quality parameters (temperature, pH, dissolved oxygen saturation and turbidity) were used as input features, and dissolved oxygen (DO) was the only output.
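Before a multivariate series can be fed to an LSTM, it has to be framed as supervised samples: a window of past readings of all four parameters as input and a future DO value as target. The sketch below shows that framing; the column order, the window length and the forecast horizon are hypothetical, since the paper does not state them.

```python
# Assumed column order; the paper names the four inputs but not their order.
FEATURES = ("temperature", "ph", "do_saturation", "turbidity")
TARGET = FEATURES.index("do_saturation")


def make_windows(rows, window, horizon=1):
    """Frame a multivariate time series as supervised samples.

    rows    -- time-ordered list of [temperature, pH, DO, turbidity] readings
    window  -- number of past time steps fed to the LSTM (hypothetical choice)
    horizon -- how many steps ahead the DO target lies
    Returns X with shape (samples, window, 4) and y with shape (samples,).
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be >= 1")
    X, y = [], []
    for start in range(len(rows) - window - horizon + 1):
        X.append([list(r) for r in rows[start:start + window]])
        y.append(rows[start + window + horizon - 1][TARGET])
    return X, y


if __name__ == "__main__":
    # Synthetic rows: DO column is 10 * t so the targets are easy to read off.
    rows = [[float(t), 7.0, 10.0 * t, 1.0] for t in range(10)]
    X, y = make_windows(rows, window=3)
    print(len(X), len(X[0]), len(X[0][0]))  # samples, window, features
    print(y)
```

Converted to an array, X has the (batch, timesteps, features) layout that Keras LSTM layers expect. In that setting, the architecture described above would roughly correspond to three LSTM(32) layers (return_sequences=True on the first two), each followed by Dropout(0.1), and a final Dense(1) output; this mapping is an illustration, not the authors' code.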

Results and Discussion
The experimental data were collected from the tanks by the deployed sensors and transferred through the LoRa module to the data server for storage. To validate the generalization of the LSTM prediction model, it was evaluated on both short-term and long-term prediction of dissolved oxygen.

Short-term Prediction Result
Short-term prediction covers the next 8 hours. Data were sampled every 5 minutes, giving about 500 samples in total: 400 were used for model training and the remaining 100 for validation. The results of the two prediction models are shown in Figure 3.
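The short-term setup can be checked with simple arithmetic: 100 validation samples at 5-minute intervals span 500 minutes, about 8.3 hours, consistent with the 8-hour horizon (96 samples would be exactly 8 hours). The sketch below assumes the split is chronological (training on the earlier 400 samples, validating on the last 100), which the paper implies but does not state; the helper names are hypothetical.

```python
SAMPLE_INTERVAL_MIN = 5  # short-term sampling interval stated in the text


def chronological_split(samples, n_train):
    """Split without shuffling: the first n_train samples train, the rest validate."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must leave at least one sample on each side")
    return samples[:n_train], samples[n_train:]


def span_hours(n_samples, interval_min=SAMPLE_INTERVAL_MIN):
    """Time covered by n_samples consecutive readings, in hours."""
    return n_samples * interval_min / 60


if __name__ == "__main__":
    train, val = chronological_split(list(range(500)), 400)
    print(len(train), len(val))            # 400 100
    print(round(span_hours(len(val)), 2))  # 8.33
    print(span_hours(96))                  # 8.0
```

Keeping time order matters for forecasting: a shuffled split would let the model see readings from after the validation period during training.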
Figure 3 shows that neither model's predictions match the true values exactly, but the errors are small. The deviation between predicted and actual values is about 0.25 for the LSTM-based model and 0.38 for the BP neural network, so the LSTM-based model predicts dissolved oxygen more accurately.
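The paper does not define how "deviation" or "prediction accuracy" are computed. As a hedged illustration, the sketch below implements common choices: mean absolute error, root mean squared error, and an accuracy defined as 100 * (1 - MAPE). These definitions are assumptions, not the authors' stated metrics.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def accuracy_pct(actual, predicted):
    """Hypothetical 'accuracy' as 100 * (1 - MAPE); actual values must be non-zero."""
    _check(actual, predicted)
    mape = sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * (1.0 - mape)


if __name__ == "__main__":
    # Illustrative numbers only.
    actual, predicted = [8.0, 10.0], [7.5, 10.5]
    print(mae(actual, predicted), rmse(actual, predicted), accuracy_pct(actual, predicted))
```

MAE and RMSE are in the units of the measured quantity, whereas the MAPE-based accuracy is scale-free, which is why percentage figures are easier to compare across parameters.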

Long-term Prediction Result
The trained model was then used to predict the water quality parameters for the next day. Data were sampled every minute; a total of 20,000 samples (about 14 days) were collected for model training, and an additional 1,500 samples were used for comparative validation. A total of 1,000 training sessions were conducted. The long-term prediction results for dissolved oxygen are shown in Figure 4.
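The sample counts above can be checked against the 1-minute sampling interval; the validation set slightly exceeds the one-day prediction horizon:

```latex
20\,000\ \text{samples} \times 1\ \text{min} = 20\,000\ \text{min} \approx 333\ \text{h} \approx 13.9\ \text{days}

1\ \text{day} = 24 \times 60 = 1\,440\ \text{samples},
\qquad
1\,500\ \text{samples} = 1\,500\ \text{min} = 25\ \text{h}
```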
Figure 4 shows that in long-term prediction the LSTM prediction curve is smoother overall and fails to track the real data at the spikes, but its prediction accuracy is still relatively high, at about 96.28%, compared with 95.17% for the BP model.

Conclusions
Artificial intelligence is now widely used in the marine field. To address the shortcomings of traditional water quality prediction methods, this paper proposes a multivariate water quality prediction model based on WT-LSTM. The model was used to predict dissolved oxygen in both the short and the long term, reaching an accuracy of up to 98.47% in short-term prediction and up to 96.28% in long-term prediction. Compared with the BP neural network, the proposed model achieves higher prediction accuracy and stronger generalization.

Figure 1. Marine microenvironmental water quality monitoring system

Figure 2. Sample data before and after denoising.

Table 1. Calculation results of correlation.