Optimal solution prediction for genetic and distribution building algorithms with binary representation

Genetic algorithms and distribution building algorithms with binary representation are analyzed, and their property of convergence to the optimal solution is discussed. A novel convergence prediction method is proposed and investigated. The method is based on the analysis of the dynamics of the gene value probability distribution, so it can predict the gene values of the optimal solution to which the algorithm converges. The results of performance investigations for the optimal solution prediction algorithm are presented.

in section III. We have tested the proposed convergence prediction method on a representative set of complex optimization problems; the results are shown in section IV.
Distribution estimation for search algorithms with binary representation.
Binary-representation GAs are conventional GAs with a population of fixed-length solutions that contain a 0 or 1 value in each position. As a solution is a binary vector and its fitness (objective function value) is a real number, we can speak of a pseudo-Boolean optimization problem statement.
One can estimate the probability vector for a population to represent the statistics collected by a search algorithm. Using a given probability distribution, the general scheme of any EDA can be written as follows:
1. Randomly generate an initial population according to the probability distribution.
2. Evaluate the current population.
3. Update the distribution using the given strategy.
4. Form a new population according to the probability distribution.
5. Until the stop criterion is satisfied, repeat steps 2-4.
As we can see, GAs use the same scheme. The main differences appear at steps 3-4, where GAs update information using such operations as selection, crossover and mutation. Step 3 is also what distinguishes the different EDA algorithms.
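The general scheme above can be sketched in Python. This is a minimal illustration, not any specific algorithm from the paper: the linear shift toward the best individual at step 3 and all parameter values are illustrative assumptions.

```python
import random

random.seed(1)  # for reproducibility of the illustration

def eda(fitness, n, pop_size, generations, learning_rate=0.1):
    """Minimal EDA sketch: keep a per-gene probability vector, sample a
    population from it, and shift it toward the best found solution."""
    p = [0.5] * n  # step 1: uniform initial distribution
    best = None
    for _ in range(generations):
        # steps 1/4: sample a population according to the distribution
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        # step 2: evaluate the current population
        pop.sort(key=fitness, reverse=True)
        if best is None or fitness(pop[0]) > fitness(best):
            best = pop[0]
        # step 3: update the distribution toward the best individual
        # (an illustrative linear rule, as in PBIL-style algorithms)
        for i in range(n):
            p[i] = (1 - learning_rate) * p[i] + learning_rate * pop[0][i]
    return best, p

# OneMax: fitness is the number of ones, the optimum is the all-ones string
best, p = eda(fitness=sum, n=20, pop_size=50, generations=100)
```

On OneMax the probability vector drifts toward all ones, mirroring the convergence behavior discussed below.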
The variable probabilities method ("MIVER") was first proposed in 1986 [4] and improved in 1987 [5]. It is a stochastic optimization procedure for pseudo-Boolean optimization problems that works with a population of binary solutions. The distribution update strategy is based on the 1-value distribution of the best and the worst solutions in the current population. There exist mathematical proofs of convergence for some classes of optimization problems.
Population-based incremental learning (PBIL) was proposed in 1994 [6]. It combines evolutionary optimization with artificial neural network learning. PBIL updates the distribution in the same way as "MIVER" does, according to the 1-value distribution of the best and the worst solutions, but uses an update procedure similar to neural network weight updates.
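A PBIL-style update of one probability component can be sketched as follows; the learning rates and the negative-learning rule applied where the best and worst solutions disagree are illustrative assumptions, not the exact settings used in [6].

```python
def pbil_update(p, best, worst, lr=0.1, neg_lr=0.075):
    """PBIL-style sketch: shift each probability toward the best
    solution's gene value; where the best and worst solutions disagree,
    additionally learn away from the worst (negative learning)."""
    new_p = []
    for pi, b, w in zip(p, best, worst):
        pi = pi * (1 - lr) + b * lr          # learn from the best solution
        if b != w:                           # negative learning vs. the worst
            pi = pi * (1 - neg_lr) + b * neg_lr
        new_p.append(pi)
    return new_p

new_p = pbil_update([0.5, 0.5, 0.5], best=[1, 1, 0], worst=[1, 0, 0])
```

Here the second component moves further toward one than the first, because the worst solution contradicts the best one in that position.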
Both PBIL and "MIVER" use a greedy linear update strategy (taking into account only the best/worst solution), so they demonstrate much more local convergence than a conventional GA. Thus the performance of conventional GAs is, on average, better for many complex optimization problems.
The probabilities-based GA was proposed in 2005 [7]. It is a further evolution of the ideas of the variable probabilities method and PBIL. In this algorithm, the genetic operators are not substituted with distribution estimation; instead, the estimated distribution is used to model the genetic operators. The probabilities-based GA uses the whole population to update the distribution, so it demonstrates the same global performance as a conventional GA. Moreover, it has fewer parameters to tune and processes additional information about the search space (the probability distribution), so the probabilities-based GA can exceed conventional GAs on average.
We have investigated the performance of the algorithms using a visual representation of the distribution. Each component of the probability vector, which represents the distribution, varies during the run, so we can show it on a diagram (figure 1). As we can see, the probability value oscillates around p = 0.5 (the initial steps, a uniform distribution over the search space) and then converges to one (or to zero in the other case) as the algorithm converges to the optimal (or a suboptimal) solution.
We can estimate and visualize the distribution of 1-values for any binary GA, even if the GA does not use the distribution explicitly. The distribution estimation can be performed in the following way:

p_i = (1/N) * Σ_{j=1..N} x_i^j, i = 1, ..., n,

where N is the population size and x_i^j is the i-th gene value of the j-th solution. As different algorithms use different distribution update strategies, we obtain different behavior of the probabilities over the generations (figures 2-5). Analyzing these diagrams for different optimization problems, we find that the components of the probability vector frequently converge to one if the optimal solution contains a 1-value in the corresponding position (or to zero if the optimal solution contains a 0-value). It means that if the probability converges to one (or zero), then the value of the corresponding gene of the optimal solution (or of a solution to which the algorithm intends to converge) is most probably equal to one (or zero). The higher the algorithm performance, the more often the probabilities converge to the correct value.
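The estimation above is simply the per-position fraction of ones in the population, which can be computed as:

```python
def estimate_distribution(population):
    """Estimate the 1-value probability vector of a binary population:
    p_i is the fraction of individuals carrying a 1 in position i."""
    n_pop = len(population)
    return [sum(ind[i] for ind in population) / n_pop
            for i in range(len(population[0]))]

probs = estimate_distribution([[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1]])
# -> [1.0, 0.25, 0.5]
```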
The optimal solution prediction method.
We can express the previously mentioned feature of stochastic binary algorithms as

p_i → x_i^opt as the algorithm's iteration number approaches infinity,

where x_i^opt is the i-th position value of the problem's optimal solution, i = 1, ..., n. So we can use this feature to predict an "optimal" solution of the given problem. The convergence prediction method is as follows:
1. Choose the binary stochastic algorithm (e.g. GA, PBIL or other) and set the iteration number.
2. Run the algorithm, collecting the probability vector statistics P_i at every iteration.
3. Determine the convergence tendency of every probability component and form the predicted "optimal" solution.
4. Add the "optimal" solution into the current population.
A simple way to determine the convergence tendency at step 3 is the following integral criterion: set the i-th gene of the "optimal" solution to one if p_i was greater than 0.5 on the majority of iterations, and to zero otherwise. The main idea is that the more often the probability value is greater than 0.5, the higher the probability that the corresponding coordinate of the "optimal" solution is equal to one.
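The integral criterion can be sketched as a majority vote over the recorded probability history; the exact form of the criterion in the original method is an assumption here.

```python
def predict_integral(history):
    """Integral-criterion sketch: history[t][i] is the estimated
    probability of gene i having value 1 at iteration t. A gene of the
    predicted "optimal" solution is set to 1 if its probability exceeded
    0.5 on more than half of the recorded iterations, else to 0."""
    T = len(history)
    n = len(history[0])
    return [1 if sum(p[i] > 0.5 for p in history) > T / 2 else 0
            for i in range(n)]

# gene 0 is above 0.5 on 2 of 3 iterations, gene 1 only on 1 of 3
prediction = predict_integral([[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]])
# -> [1, 0]
```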
In many practical problems, a situation exists when the binary stochastic algorithm has not collected enough information at the early steps, and the gene value in a certain position is equal to one (or zero) for almost every current solution. At a later stage the algorithm can find a much better solution with inverted gene values, which means that the probability vector components will change their convergence direction. But the previously described prediction method will still give the primary value, because the corresponding component of the probability vector was greater than 0.5 (or less than 0.5 for zero values) for too long.
We propose the following modification of the convergence prediction method to avoid the above-mentioned shortcoming:
1. Set the prediction step K.
2. Every K iterations, use the collected statistics P_i to evaluate the probability vector change: ΔP_i(t) = P_i(t) - P_i(t - K).
3. Set a weight for every iteration according to its number, so that later iterations contribute more: w_t = t / Σ_τ τ.
4. Evaluate the weighted probability vector change: D_i = Σ_t w_t * ΔP_i(t).
5. Set the "optimal" solution: x_i = 1 if D_i > 0, and x_i = 0 otherwise.
The strategies for using the predicted "optimal" solution may vary: use it as the final solution, add it to the population and continue the search, and so on. In this work we simply added it into the current population, without the additional heuristics which could be applied here.
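The modified method can be sketched as follows; the concrete weighting scheme (weights proportional to the iteration number) is an illustrative assumption consistent with the idea that later dynamics should dominate.

```python
def predict_weighted(history, K):
    """Modified-prediction sketch: every K-th record, compute the change
    of each probability component over the last K iterations; weight the
    changes by iteration number so that later dynamics dominate; predict
    1 for genes whose weighted change is positive (the probability is
    drifting toward one). Assumes len(history) > K."""
    steps = list(range(K, len(history), K))
    total = sum(steps)  # normalizing constant for the weights w_t = t / total
    n = len(history[0])
    solution = []
    for i in range(n):
        d = sum(t / total * (history[t][i] - history[t - K][i])
                for t in steps)
        solution.append(1 if d > 0 else 0)
    return solution

# gene 0 drifts up, gene 1 drifts down, regardless of the starting level
hist = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.9, 0.1]]
prediction = predict_weighted(hist, K=2)
# -> [1, 0]
```

Unlike the integral criterion, this sketch reacts to a late change of convergence direction even if the probability stayed on the wrong side of 0.5 for a long initial period.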

Test Problems and Numerical Experiments.
We have tested the proposed convergence prediction methods for the discussed EDAs and the conventional GA on a representative set of complex optimization problems.
We have included 12 known test problems of continuous optimization; 6 of them are from the test function set approved at the Special Session on Real-Parameter Optimization (CEC 2005) [8]. All these functions are known as complex optimization problems and represent a challenge for GAs and other evolutionary algorithms. We have not included any well-known binary problems, as after binarization they are less complex for fine-tuned GAs than real-valued problems.
To estimate the algorithms' efficiency, we have carried out a series of independent runs of each algorithm on each test problem. We used reliability as the criterion for evaluating algorithm efficiency: reliability is the proportion of successful runs (i.e. runs in which the exact solution of the problem was found) over the whole number of independent runs. Thus we have computed an estimate of the expected value of algorithm reliability.
We have set the following parameter values for the algorithms:
• Accuracy (the whole binary solution length): 40,
• Population size: 50,
• Generation number: 50,
• Number of independent runs: 100.
The settings and parameters for each algorithm (e.g. the selection type in GA, the learning rate in PBIL, and others) were chosen in advance to be efficient on average over the test function set.
The results were validated with the nonparametric Mann-Whitney-Wilcoxon rank test; all reported differences are statistically significant.
First we have estimated the efficiency of the convergence prediction algorithms using the original integral criterion. The results are shown in Table 1. The last column contains the values for the conventional GA without prediction, and the grey cells show the best value for each problem. As we can see, the results vary: the algorithms are good in some cases and less efficient in others. The variable probabilities method demonstrates the worst performance, as it is the most greedy algorithm and has local behavior. The performances of PBIL, the probabilities-based GA and the conventional GA are almost similar. So we can say that the average reliability of the integral criterion in convergence prediction is low, although the method is still useful, as the GA with the prediction method shows higher performance compared to the conventional GA.
Table 2 presents the efficiency estimation results for the modified convergence prediction algorithm. The results show that the modified convergence prediction can essentially improve the performance of the search algorithms. The best results are achieved using the probabilities-based GA, as it collects and processes more statistical information about the search space. As we can see in Table 2, the probabilities-based GA "wins" in 7 of 12 problems (GA in 5, PBIL in 4, MIVER in 0). In the cases where the probabilities-based GA wins, its reliability is essentially better; in the other cases its reliability is also high. As we can see, the modified convergence prediction algorithm demonstrates the better performance. It can also essentially improve the performance of simple algorithms like PBIL or MIVER. The GA-based techniques can increase the reliability up to 100% even on complex problems.
We have also established experimentally that deceptive problems [9,10] are not difficult for a GA with prediction, but the GA requires a slight modification to intensify mutation [11].
We have investigated the following widely used trap functions, which are known as GA-hard problems: Ackley Trap, Whitley Trap, Goldberg Trap, Trap-4, Trap-5, DEC2TRAP, and Liepins and Vose.
In this paper, we demonstrate the results only for the probabilities-based GA, as it shows the highest efficiency on average. We investigated two versions of the algorithm: the standard one and one with optimal solution prediction (the predicted solution is evaluated every 5 generations and added to the population).
The binary solution length is 100. The results are evaluated over 100 independent runs. The parameters of the algorithm (selection and mutation) were chosen in advance to be efficient on average over the test function set.
In addition to the standard mutation operation, we implemented the inversion operation, which inverts all genes in a chromosome with a very low probability. As the numerical experiments show, it can help to overcome the trap attraction.
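The inversion operation described above amounts to flipping every gene of a chromosome with a small per-chromosome probability; the probability value below is an illustrative assumption.

```python
import random

def inversion(chromosome, prob=0.01):
    """Inversion-operator sketch: with a very low probability, invert
    every gene of the chromosome, helping the search escape trap
    attractors that pull single-bit mutation toward a deceptive optimum."""
    if random.random() < prob:
        return [1 - g for g in chromosome]
    return chromosome
```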
The results are shown in Table 3.