Online COD measurement based on multi-source spectral feature-level fusion model

To overcome the shortcomings of UV-Vis spectrometry and fluorescence spectrometry for COD detection, an online measurement method named FfCODnet is proposed and evaluated in this paper. In this method, the UV-Vis spectrum and the fluorescence emission spectrum are collected online by a spectrophotometer. De-noising Auto-Encoders (part of FfCODnet, abbreviated as DAE) then clean the two types of raw spectra separately, and the rest of FfCODnet predicts the COD value of the water sample. Experiments and result comparisons show that the proposed method performs well in both noise tolerance and measurement accuracy.


Introduction
Nowadays, water pollution has become a serious problem, especially in urban areas [1]. Consequently, there is a great need for online water quality detection systems to protect public health. Chemical oxygen demand (COD, unit: mg/L), one of the indicators used to evaluate the impact of discharged wastewater on the receiving environment, has received considerable attention from scholars [2]. It is therefore very important to measure COD accurately and quickly.
The development of water contaminant detection can be divided into three stages. The first stage is based on wet chemistry, which laboratories all over the world use as the international standard method because of its high accuracy. However, measurement based on wet chemistry remains reagent-consuming and time-consuming. Thus, although the wet chemical method is highly accurate, it can hardly meet the requirements of online measurement in practical applications [3,4].
The second stage applies electrochemical sensors or spectrophotometry to realize online measurement [5]. The theoretical basis of these techniques is that, under ideal conditions, there should be a significant correlation between the COD value and the spectral change or sensor response. For spectrophotometry-based technology, many scholars have used the UV-Vis absorbance at 254 nm as the water feature for online COD measurement, owing to the strong linear correlation between organic content and absorbance at 254 nm under ideal conditions [6]. However, the absorbance at 254 nm is easily disturbed by scattering [7]. Thus, some scholars have taken the UV-Vis absorbance at another wavelength (such as 350 nm and 465 nm) as a second feature to build an improved measurement model that compensates for the influence of scattering [8,9]. For electrochemical sensors, Gutierrez et al. [10] successfully applied them to rapid COD measurement in urban wastewater. In summary, although these methods are easy to apply and allow continuous detection, they perform poorly on surface water with low COD values and are especially susceptible to disturbance from inorganic suspended matter in the water sample [11].
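To make the two-wavelength compensation idea concrete, the sketch below fits a linear model COD ≈ k1·A254 + k2·A_ref + c by ordinary least squares, where A_ref is the absorbance at a second wavelength such as 350 nm. The linear form, the function names and the synthetic coefficients are assumptions for illustration only; the models in [8,9] may take a different form.

```python
import numpy as np

def fit_two_wavelength_model(a254, a_ref, cod):
    """Ordinary least squares for COD ~ k1*A254 + k2*A_ref + c (hypothetical linear form)."""
    design = np.column_stack([a254, a_ref, np.ones_like(a254)])
    coef, *_ = np.linalg.lstsq(design, cod, rcond=None)
    return coef  # array([k1, k2, c])

def predict_cod(coef, a254, a_ref):
    """Apply the fitted linear model to new absorbance readings."""
    return coef[0] * a254 + coef[1] * a_ref + coef[2]

# Synthetic, noise-free data generated from known coefficients (not real measurements).
rng = np.random.default_rng(42)
a254 = rng.uniform(0.05, 0.5, size=100)   # absorbance at 254 nm
a_ref = rng.uniform(0.01, 0.1, size=100)  # absorbance at a reference wavelength, e.g. 350 nm
cod = 40.0 * a254 - 25.0 * a_ref + 2.0
coef = fit_two_wavelength_model(a254, a_ref, cod)
print(np.round(coef, 3))
```

In this toy setup the negative k2 acts like subtracting a scattering baseline estimated at the reference wavelength; with real data the fitted coefficients would have to be learned from calibration samples.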
The third stage establishes an information fusion model based on multi-source spectra, which can improve both measurement accuracy and detection speed. Although this idea may provide a new approach to online COD measurement, related research and applications remain few.
Currently, in the field of online COD measurement, UV-Vis absorbance spectra and fluorescence emission spectra have been widely studied. However, the UV-Vis spectrum is susceptible to interference from organic contents, and the fluorescence spectrum is easily disturbed by Rayleigh and Raman scattering. Thus, neither spectrum alone can meet the requirement of high measurement accuracy. Meanwhile, the two detection methods can, in theory, complement each other [12]. This paper therefore focuses on a feature-level fusion measurement model that combines deep learning techniques with the sample features provided by the UV-Vis and fluorescence spectra to improve COD measurement accuracy.

Materials and methods
The proposed method for online COD measurement can be divided into four main parts: dataset construction, dataset division, spectrum preprocessing and modeling. The details of these parts and the relationships between them are shown in Figure 1.

Construct dataset
In this study, water samples were collected 50 cm below the water surface at eight water quality monitoring sites in China. A total of 972 samples were obtained between April 6, 2020 and July 11, 2020, covering both temporal and spatial variations.
For each sample, a spectrophotometer (PG2000-Pro-Ex, Ocean Optics, USA) was used to measure both kinds of spectra, which were recorded with Morpho V3.0 (Ocean Optics, USA) from 196 nm to 1100 nm at 0.43 nm resolution and a room temperature of 20-22 ℃.

Spectrum preprocessing
For online COD measurement of water samples, noise interference in the spectra is inevitable. Therefore, De-noising Auto-Encoders (abbreviated as DAE) are used to clean the raw spectra. The structure of the DAE is shown in Figure 2. As shown in Figure 2, x denotes the clean spectrum. By randomly adding noise to x (for example, setting some entries to zero or adding Gaussian white noise), the original spectrum x is turned into a corrupted spectrum $\tilde{x}$ containing a certain proportion of noise. After processing by the encoding part of the DAE, the code $y$ of $\tilde{x}$ can be expressed as (1), where $\theta = \{W, b\}$ represents the parameters of the encoding part, $W$ is a $d \times d'$ weight matrix ($d'$ is the input layer dimension and $d$ is the hidden layer dimension), $b$ is the bias vector of the hidden layer, and $s(\cdot)$ is the activation function.
$$y = f_{\theta}(\tilde{x}) = s(W\tilde{x} + b) \tag{1}$$
After processing by the decoding part of the DAE, the reconstruction $z$ of the code $y$ can be expressed as (2), where $\theta' = \{W', b'\}$ represents the parameters of the decoding part, $W'$ is a $d' \times d$ weight matrix ($d$ is the hidden layer dimension and $d'$ is the output layer dimension), and $b'$ is the bias vector of the output layer.
$$z = g_{\theta'}(y) = s(W'y + b') \tag{2}$$
In general, the reconstruction $z$ cannot exactly reproduce the original spectrum x. Therefore, the reconstruction error in (3) is used as the loss function, and the parameters $W$, $b$, $W'$ and $b'$ are adjusted by backpropagation until $z$ is as close as possible to x.
$$L(x, z) = \lVert x - z \rVert_2^2 = \sum_{k=1}^{d'} (x_k - z_k)^2 \tag{3}$$
where $L(\cdot)$ is the sum of squared differences between x and $z$ over all dimensions. This loss function is used to evaluate training performance, and stochastic gradient descent is used to update the parameters of the DAE iteratively. The closer the reconstruction $z$ is to the input x, the more detailed and accurate the features extracted by the model.
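A compact NumPy sketch of the corruption step, the encoder (1), the decoder (2), the squared reconstruction loss (3) and the stochastic gradient descent update is given below. It is illustrative only: the corruption settings (10% masking plus Gaussian noise with standard deviation 0.01), the layer sizes, the learning rate, the use of a sigmoid in both encoder and decoder (which assumes spectra scaled to [0, 1]) and the synthetic Gaussian-bump "spectra" are all assumptions, not the configuration used in FfCODnet.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, rng, mask_ratio=0.1, noise_std=0.01):
    """Zero a random fraction of entries, then add Gaussian white noise (x -> x_tilde)."""
    keep = rng.random(x.shape) >= mask_ratio
    return x * keep + rng.normal(0.0, noise_std, size=x.shape)

class DAE:
    def __init__(self, d_in, d_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(d_hidden, d_in))   # encoder weights W
        self.b = np.zeros(d_hidden)                            # hidden bias b
        self.W2 = rng.normal(0.0, 0.1, size=(d_in, d_hidden))  # decoder weights W'
        self.b2 = np.zeros(d_in)                               # output bias b'

    def encode(self, x_tilde):
        return sigmoid(self.W @ x_tilde + self.b)   # eq. (1): y = s(W x_tilde + b)

    def decode(self, y):
        return sigmoid(self.W2 @ y + self.b2)       # eq. (2): z = s(W' y + b')

    def loss(self, x):
        """Eq. (3) on a clean input: sum of squared reconstruction errors."""
        z = self.decode(self.encode(x))
        return float(np.sum((x - z) ** 2))

    def sgd_step(self, x, rng, lr=0.1):
        """One SGD step: corrupt x, reconstruct, compare with the CLEAN x."""
        x_tilde = corrupt(x, rng)
        y = self.encode(x_tilde)
        z = self.decode(y)
        # Backpropagate dL/dz = 2 (z - x) through both sigmoids.
        delta_out = 2.0 * (z - x) * z * (1.0 - z)
        delta_hid = (self.W2.T @ delta_out) * y * (1.0 - y)  # uses W' before its update
        self.W2 -= lr * np.outer(delta_out, y)
        self.b2 -= lr * delta_out
        self.W -= lr * np.outer(delta_hid, x_tilde)
        self.b -= lr * delta_hid

# Synthetic "spectra": Gaussian bumps on a 50-point grid, values in (0, 1].
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
centers = rng.uniform(0.3, 0.7, size=200)
X = np.exp(-((grid[None, :] - centers[:, None]) ** 2) / 0.02)

model = DAE(d_in=50, d_hidden=16, rng=rng)
before = np.mean([model.loss(x) for x in X])
for epoch in range(20):
    for i in rng.permutation(len(X)):
        model.sgd_step(X[i], rng)
after = np.mean([model.loss(x) for x in X])
print(f"mean reconstruction loss: {before:.3f} -> {after:.3f}")
```

The key design point of a denoising auto-encoder is visible in `sgd_step`: the network sees the corrupted input but the loss is computed against the clean spectrum, so it learns to remove the noise rather than copy it.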

Modeling
To improve COD measurement accuracy, a feature-level fusion based COD measurement network (abbreviated as FfCODnet) built on deep learning techniques is proposed. It mainly consists of two functional modules: a feature enhancement layer and a feature-level fusion layer.
is the length of the data in the feature map.
is the sigmoid activation function.

Feature-level fusion layer.
The feature-level fusion layer is composed of a concatenation operation, a convolutional layer and a global max-pooling layer. Denote the concatenated feature map as  and the final output of this layer as . The calculation process of this layer can then be expressed as (7) and (8).
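The concatenate, convolve and global-max-pool sequence described above can be sketched in NumPy as follows. Because equations (7) and (8) are not reproduced here, the kernel size, the number of filters, the absence of an activation after the convolution and the choice to concatenate along the channel axis (which requires both branches to have the same length) are assumptions; the function names are hypothetical. Concatenating along the length axis instead would be an equally plausible reading.

```python
import numpy as np

def conv1d_valid(x, w, b):
    """1-D 'valid' convolution (cross-correlation, as in most deep-learning libraries).

    x: (C_in, L) feature map, w: (C_out, C_in, K) filters, b: (C_out,) biases.
    Returns a (C_out, L - K + 1) feature map.
    """
    c_out, c_in, k = w.shape
    assert x.shape[0] == c_in, "channel mismatch"
    length = x.shape[1] - k + 1
    out = np.empty((c_out, length))
    for t in range(length):
        window = x[:, t:t + k]  # (C_in, K) slice under the kernel
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out

def fusion_layer(f_uv, f_fl, w, b):
    """Hypothetical feature-level fusion: concat -> conv -> global max pooling."""
    fused = np.concatenate([f_uv, f_fl], axis=0)  # stack the two branches along channels
    conv = conv1d_valid(fused, w, b)
    return conv.max(axis=1)  # one value per filter, independent of the map length

# Toy example: each branch has 1 channel of length 5.
f_uv = np.arange(5.0)[None, :]   # [[0, 1, 2, 3, 4]]
f_fl = np.zeros((1, 5))
w = np.ones((1, 2, 2))           # one filter spanning both channels, width 2
b = np.zeros(1)
print(fusion_layer(f_uv, f_fl, w, b))  # [7.]
```

Global max pooling reduces each filter's response to a single value, so the output size depends only on the number of filters, which is convenient when the fused map feeds a fixed-size regression head.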

Feature-level fusion model construction.
FfCODnet is constructed by coupling the DAE, the feature enhancement layer and the feature-level fusion layer. Its detailed structure is given in Table 1, which lists the layers of FfCODnet from bottom to top. FfCODnet has 36 layers with 3,502,736 trainable parameters. Its inputs are the UV-Vis spectrum from 194 nm to 700 nm and the fluorescence spectrum from 440 nm to 790 nm. Because the resolution of the selected spectrophotometer is 0.43 nm, the input dimension of the UV-Vis spectrum is 1177 and that of the fluorescence spectrum is 814.
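The quoted input dimensions follow from dividing each wavelength span by the 0.43 nm resolution and rounding to the nearest integer. The short check below reproduces that arithmetic; treating the resolution as the sampling interval and rounding (rather than, for example, counting both endpoints) are assumptions, and `input_dim` is a hypothetical helper.

```python
def input_dim(lo_nm, hi_nm, resolution_nm):
    """Number of sample points across [lo, hi] at the given sampling interval, rounded."""
    return round((hi_nm - lo_nm) / resolution_nm)

uv_dim = input_dim(194, 700, 0.43)  # 506 / 0.43 ≈ 1176.7 -> 1177
fl_dim = input_dim(440, 790, 0.43)  # 350 / 0.43 ≈ 814.0 -> 814
print(uv_dim, fl_dim)  # 1177 814
```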

Performance estimation
The proposed COD measurement model (FfCODnet) is evaluated on the validation data set. The detailed COD prediction performance is shown in Figure 3. As shown in Figure 3, for samples in the validation data set, the error between the FfCODnet predictions and the measured COD values is small.
Figure 3. Performance of the proposed method on COD prediction.
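This section does not state which error measures underlie Figure 3. As one hedged illustration, common regression metrics for comparing predicted and measured COD values can be computed as below; the choice of RMSE, MAE and R², the function name and the toy numbers are assumptions, not results from this study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE and coefficient of determination (R^2) for predicted vs. measured values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))                      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "R2": r2}

# Toy values in mg/L, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```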

Discussion
Table 2 compares the feature-level fusion modeling method proposed in this paper with current mainstream modeling methods. As shown in Table 2, although the proposed method requires many more trainable parameters and a longer training time, it outperforms the other traditional spectroscopic measurement methods.

Conclusion
In this paper, an online COD measurement method based on feature-level fusion of UV-Vis and fluorescence spectra has been developed and evaluated. In this method, De-noising Auto-Encoders (abbreviated as DAE) are used to clean the two raw spectra separately. To improve COD measurement accuracy, a feature-level fusion based COD measurement network (abbreviated as FfCODnet) has been proposed, which mainly consists of two functional modules: a feature enhancement layer and a feature-level fusion layer. Finally, the proposed method has been compared with other traditional spectroscopic measurement methods; the performance of each model on the testing data set shows that the proposed method achieves higher measurement accuracy.