Design of order statistics filters using feedforward neural networks

In recent years, significant progress has been made in the development of nonlinear data processing techniques. Such techniques are widely used in digital data filtering and image enhancement. Many of the most effective nonlinear filters are based on order statistics. The widely used median filter is the best-known order statistics filter. A generalized form of these filters can be derived from Lloyd's statistics. Filters based on order statistics have excellent robustness properties in the presence of impulsive noise. In this paper, we present a special approach to the synthesis of order statistics filters using artificial neural networks. Optimal Lloyd's statistics are used to select the initial weights of the neural network. The adaptive properties of neural networks provide opportunities to optimize order statistics filters for data with an asymmetric distribution function. Several examples demonstrate the properties and performance of the presented approach.


Introduction
In recent times, linear filtering techniques have been widely used in various fields of science and technology. At the same time, linear filtering does not provide a satisfactory solution for many practical applications. It is known, for example, that the problem of optimal filtering can be solved in the class of linear filters when the additive noise is statistically independent and has a distribution function close to the normal one. In practice, the additive noise can depend on the desired signal or have a distribution function that differs from the normal. In these cases, the optimal solution can be found in the class of nonlinear filters, for example, using artificial neural networks [1].
A linear order statistic (a linear order estimate of some parameter $\mu$) is a combination $\hat{\mu} = \sum_{i=1}^{n} a_i x_{(i)}$, where $x_{(i)}$ is the $i$-th term in the ordered (ranked) series. In particular, the arithmetic mean is obtained as an example of a linear order statistic by setting all coefficients $a_i = 1/n$ [2]. Some special cases of order statistics play a major role in various statistical studies, such as the extrema, the sample range, and the sample median.
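As a minimal illustration (not part of the paper itself), a linear order statistic is just a weighted sum over the sorted sample: with all coefficients equal to $1/n$ it reduces to the arithmetic mean, and with a single unit coefficient at the middle position it gives the sample median. The function name below is illustrative.

```python
# Sketch: a linear order statistic is a weighted sum of the ranked sample.
def linear_order_statistic(sample, coeffs):
    """Return sum(a_i * x_(i)) over the ranked series x_(1) <= ... <= x_(n)."""
    ranked = sorted(sample)
    return sum(a * x for a, x in zip(coeffs, ranked))

x = [3.0, 1.0, 4.0, 1.0, 5.0]
n = len(x)

mean_est = linear_order_statistic(x, [1.0 / n] * n)   # arithmetic mean
median_coeffs = [0.0] * n
median_coeffs[n // 2] = 1.0                           # unit weight at the middle rank
median_est = linear_order_statistic(x, median_coeffs) # sample median
```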
The problem of optimal weighting of observations (i.e., finding the optimal linear statistics) was posed in the middle of the last century and was soon successfully solved by E.H. Lloyd, who found a general solution (on the assumption that the distribution function is known) for the estimation of the two unknown parameters of the distribution function: the mean and the variance [3]. In practice, however, such linear estimates are insufficient. Therefore, in recent years nonlinear order statistics filters have been developed, especially for digital image processing. The authors of [4, 5] present filters based on nonlinear order statistics using a perceptron.
In the present work, nonlinear analogues of order statistics based on feedforward neural networks are proposed. The main idea of the proposed technique is based on Lloyd's statistics, which are used for the initialization of the neural network weights. The nonlinear activation functions of the neurons improve the quality of the filtering results in cases where the distribution differs from the normal one.

Lloyd's statistics
For the estimation of Lloyd's statistics it is assumed that the form of the density function of the observed values is known, but its parameters (the location $\mu$ and the scale $\sigma$) can change. The density function can be written as

$$f(x; \mu, \sigma) = \frac{1}{\sigma}\, g\!\left(\frac{x - \mu}{\sigma}\right), \qquad (1)$$

where $g$ is the known standardized density. If we know the density function (1), we can estimate the mathematical expectation and covariance of the standardized order statistics $z_{(i)} = (x_{(i)} - \mu)/\sigma$:

$$\alpha_i = E\,[z_{(i)}], \qquad (2)$$

$$B_{ij} = \mathrm{cov}\,(z_{(i)}, z_{(j)}). \qquad (3)$$

The theoretical equations for $\alpha_i$ and $B_{ij}$ are rather complicated, so we do not present them here. For unknown random parameters $\mu$ and $\sigma$ the mathematical expectation of the order statistics equals [2]

$$E\,[x_{(i)}] = \mu + \sigma \alpha_i, \qquad (4)$$

where $x$ is the vector of observed order statistics. To estimate the unknown parameters, one has to select values that minimize the deviation between the conditional mathematical expectations (4) and the current set of order statistics. Let us write these parameters in a simpler form for further calculations: $\theta = (\mu, \sigma)^T$ and $A = [\,\mathbf{1} \;|\; \alpha\,]$, where $\mathbf{1}$ is the vector of ones. The expression for the least-squares criterion, taking into account the correlation of the order statistics, can be written as

$$S(\theta) = (x - A\theta)^T B^{-1} (x - A\theta).$$

Varying the parameter $\theta$ and equating the result to zero, we find the unknown parameters in the following form:

$$\theta^{*} = (A^T B^{-1} A)^{-1} A^T B^{-1} x. \qquad (5)$$

The matrix of coefficients in (5) gives us the required coefficients of the linear order statistics [3].
The mathematical expectation and variance can then be found as

$$\mu^{*} = \sum_{i=1}^{n} a_i x_{(i)}, \qquad (6)$$

$$\sigma^{*} = \sum_{i=1}^{n} b_i x_{(i)}, \qquad (7)$$

where the coefficient vectors $a$ and $b$ are the rows of the matrix in (5). If we do not know the analytic form of the distribution but have a sufficiently large set of empirical data, we can calculate the average value and the covariance of the order statistics, and then use equations (6) and (7) to find approximate values of the coefficients $a_i$ and $b_i$.
Since $x$ is a set of ranked values, there are some obvious requirements for the ordered series used to estimate the mean and the variance. First of all, if we add some constant $C$ to all values, the estimate of the mean should change by the same value. This gives the first condition on the estimating function $f_{\mu}$:

$$f_{\mu}(x + C) = f_{\mu}(x) + C.$$

At the same time, for the function $f_{\sigma}$ that provides the estimate of the variance, adding a constant $C$ should not change the estimate. This gives the second condition:

$$f_{\sigma}(x + C) = f_{\sigma}(x).$$

Based on these conditions, we can write the following equations for the coefficients of the order statistics:

$$\sum_{i=1}^{n} a_i = 1, \qquad \sum_{i=1}^{n} b_i = 0. \qquad (8)$$
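The empirical route described above can be sketched numerically. The following illustrative snippet (variable names are our own, not from the paper) draws many standardized samples, sorts each one to estimate $\alpha$ and $B$, and then solves the generalized least-squares system (5); the resulting rows automatically satisfy the conditions (8).

```python
import numpy as np

# Empirical Lloyd coefficients via Monte Carlo (illustrative sketch).
rng = np.random.default_rng(0)
N, n = 200_000, 5

z = np.sort(rng.standard_normal((N, n)), axis=1)  # standardized order statistics
alpha = z.mean(axis=0)                            # alpha_i = E[z_(i)]
B = np.cov(z, rowvar=False)                       # B_ij = cov(z_(i), z_(j))

A = np.column_stack([np.ones(n), alpha])          # design matrix [1 | alpha]
Binv = np.linalg.inv(B)
G = np.linalg.inv(A.T @ Binv @ A) @ A.T @ Binv    # 2 x n coefficient matrix, as in (5)
a, b = G[0], G[1]                                 # mean and scale coefficient vectors

# Conditions (8): the mean coefficients sum to 1, the scale coefficients to 0.
```

Note that $G A = I$ by construction, so $\sum a_i = 1$ and $\sum b_i = 0$ hold exactly (up to numerical precision) regardless of the sampling distribution.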

Nonlinear order statistics filter
Unlike the case of linear statistics, for a neural network with nonlinear activation functions the conditions (8) cannot be satisfied, because for some values of the constant C the activation functions saturate and the properties of the estimates change. Suppose that we know the probability distribution of the estimated parameters μ and σ, or at least the range of their variation. Then we can create a representative set of training examples and require that the neural network give good performance for this prior distribution. In this case the exact conditions (8) are not guaranteed, but if the distribution of the estimated parameters corresponds to our expectations, we will obtain satisfactory results (figure 1). The design of order statistics based on neural networks has an obvious similarity to estimates based on the criterion of maximum a posteriori probability.

It is advantageous to use information about Lloyd's order statistics for the initial distribution of the neuron weights [7]. Consider the simplest technique for solving this problem in the case when the functions implemented by the various network layers are clearly separated. The error that occurs due to the nonlinearity of the neuron activation functions prior to network training should be small. This fact imposes certain restrictions on the choice of activation functions for the layers. For example, the hyperbolic tangent (the sigmoid) has a domain of weak nonlinearity near the origin of coordinates. When using neurons with sigmoidal activation functions, by properly scaling the training set we can make the transformations of a signal by the network neurons close to linear. Let us consider, as an example, the application of the proposed initialization algorithm to a three-layer feedforward neural network. The input layer of the network scales the data so that all values of the series are located within the linear domain of the hyperbolic tangent.
The intermediate layer performs a weakly nonlinear transformation with the help of the weight matrix that contains the coefficients of the Lloyd's statistics filter. The output layer performs the inverse data transform. By a proper choice of the scaling coefficient one can make the error caused by the nonlinearity of the activation function arbitrarily small. In this case the neural network with weights chosen in such a way estimates the current value of the order statistics in accordance with expressions (6, 7) [6]. Note that the neural network output before training nearly coincides with the Lloyd's statistics. When training the network with an algorithm with a monotonic decrease of the error (e.g., gradient and quasi-Newton algorithms, the conjugate gradients method, etc.), the result obtained at any training step is the same as, or even better than, that obtained with the help of the linear statistics. In this paper we use the Levenberg-Marquardt algorithm; it belongs to the class of quasi-Newton methods and guarantees a high convergence rate [1].
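The initialization scheme can be sketched as follows (an illustrative toy, assuming a scaling coefficient β and a single Lloyd coefficient vector `a`; the paper's actual network has full weight matrices). Shrinking the input by a small β keeps every tanh in its near-linear zone, so the untrained network output almost reproduces the linear estimate $a^T x$.

```python
import numpy as np

# Sketch: tanh network initialized so that, before training, its output
# nearly coincides with the linear order statistic a^T x.
def init_network_output(x_ranked, a, beta=1e-3):
    """Three-layer scheme: scale -> tanh with Lloyd weights -> inverse scale."""
    h = np.tanh(beta * np.asarray(x_ranked))  # input layer: shrink into linear zone
    y = np.tanh(np.dot(a, h))                 # hidden layer: Lloyd coefficients
    return y / beta                           # output layer: undo the scaling
```

For small β the error is of order β², so for e.g. `x_ranked = [1, 2, 3, 4, 5]` and uniform coefficients `a = [0.2]*5` the output is very close to the arithmetic mean 3.0.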

Results
Consider the synthesis of nonlinear order statistics on a model example with the Generalized Extreme Value (GEV) distribution with shape parameter k, whose distribution function has the standardized form

$$F(x; k) = \exp\!\left(-(1 + kx)^{-1/k}\right), \qquad 1 + kx > 0.$$

This type of distribution is often used to model the smallest and largest values among a large set of independent and identically distributed random variables representing measurements or observations. The main feature of this distribution is its asymmetry (figure 2). We modelled a set of N samples of the same type, each consisting of n counts. The purpose of this investigation is to compare the Lloyd's statistics coefficients with the neural network statistics under the assumption that the distribution type is GEV.
The procedure involves the following stages: 1) with the help of a random number generator we create an N × n matrix of random numbers distributed according to the GEV distribution, and then add to each line a constant C distributed by the normal law with zero mean (a random shift); 2) we sort the numbers in each line, so that each line forms an ordered sample of n elements, giving N test samples;
3) averaging the columns, we find estimates of the expectations of the order statistics, and calculating the covariance between the columns, we find an estimate of the covariance matrix of the order statistics; 4) according to equations (4, 5) we calculate two sets of coefficients, one for the expectation and one for the variance. After that we can train the neural network. An example of such modelling is shown in figure 3. Table 1 shows the results of the distribution parameter estimation for a sample of random variables with the GEV distribution. The table includes the standard deviations for the different statistics (Lloyd's and neural network statistics) for varying values of the distribution parameter k and the sample size n. Figure 4a shows an example of a grayscale image. The same image with GEV-distributed noise is shown in figure 4b. Figure 4c shows the result of image filtering based on Lloyd's statistics, and figure 4d shows the result of the neural network statistics filter. It is clearly seen that the nonlinear order statistics filter performs better than the linear one. However, when we work with real data, we cannot determine in advance the type of the noise distribution in the processed signal. Therefore, the filtering of a set of real data includes the following stages: first, the noise is separated from the original dataset by median filtering or a moving average; then the type of the distribution and its parameters are estimated; then test data with the given type of distribution are generated, and the coefficients of the linear and neural network order statistics are found; finally, the original data are filtered with the trained network. To assess the quality of the filtering, the procedure can be repeated starting from the first step.
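The four modelling stages can be sketched as follows (an illustrative reconstruction, not the authors' code; the sample sizes, the shape parameter, and the inverse-transform sampler for the standardized GEV law, $x = ((-\ln U)^{-k} - 1)/k$, are our own choices).

```python
import numpy as np

# Sketch of the four-stage modelling procedure for the GEV example.
rng = np.random.default_rng(1)
N, n, k = 50_000, 7, 0.2

# Stage 1: GEV samples via inverse transform of the standardized CDF,
# plus a normally distributed constant C added to every element of a row.
u = rng.random((N, n))
x = ((-np.log(u)) ** (-k) - 1.0) / k
x += rng.standard_normal((N, 1))          # random shift C per test sample

# Stage 2: sort each row -> N ordered samples of n elements.
x = np.sort(x, axis=1)

# Stage 3: empirical expectations and covariance matrix of the order statistics.
alpha = x.mean(axis=0)
B = np.cov(x, rowvar=False)

# Stage 4: coefficient sets from the least-squares solution, as in (4, 5).
A = np.column_stack([np.ones(n), alpha])
Binv = np.linalg.inv(B)
G = np.linalg.inv(A.T @ Binv @ A) @ A.T @ Binv
a_mu, b_sigma = G[0], G[1]                # location and scale coefficients
```

These coefficient vectors would then serve both as the linear Lloyd filter and as the initial weights for the neural network training described above.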