Real effectiveness of iterative methods for solving nonlinear equations

This paper defines a strategy to analyze the practical effectiveness of given iterative methods for solving nonlinear equations, based on the CPU time each method requires to converge to the root with a given accuracy. From these times, the difference between the population means of each pair of methods is estimated. From this study the practical effectiveness of each method and its computational cost are obtained.


Introduction
One of the most important problems in numerical analysis is solving nonlinear equations, which is usually done by standard iterative methods such as Newton's method, the secant method, and bisection. Among the measures of efficiency of these methods are the order of convergence and the efficiency index, the latter being a measure of the balance between the order of convergence of an iterative method and the number of function evaluations performed in each iteration (see the references given in this paper and the citations therein).
However, either of these two classic ways of measuring the efficiency of an iterative method is biased. Indeed, the fact that one iterative method has a higher order of convergence than another does not necessarily imply that it converges in less time under an accepted tolerance.
Something similar happens with the efficiency index, because the evaluation times of a function and of its derivatives often differ, especially for nonlinear functions. Because of this, in recent years many numerical techniques have been developed that are modifications of these methods, among them the predictor-corrector methods, in which the predictor is usually Newton's method.
Although improving the order of convergence is important in terms of this notion of efficiency, and the predictor-corrector methods achieve it, this does not mean that they are efficient in practice, where an efficient iterative method is considered to be the one that uses the least computer time and has the largest radius of convergence for a particular type of problem.
In order to analyze the practical performance of the mentioned iterative methods, this paper considers the CPU time used by each method to achieve convergence to the root with a given accuracy. That is, for each method a sample mean of the time used to solve the problem is defined over a base of nonlinear functions (the population sample) and a finite number of initial conditions taken from an interval containing the root of each function. To validate these sample means and to differentiate between each pair of methods, confidence intervals are considered [1,2]. From this study we draw conclusions on the applicability of each method and its computational cost. Due to the large number of new methods discussed in the literature (see the references), to validate the proposed classification we select some methods that arise from modifications of Newton's method and that only require evaluating the first derivative. In particular, methods with a higher order of convergence and a greater efficiency index than Newton's method are selected.
The rest of the article is structured as follows. The second section briefly describes the iterative methods to be classified, together with their orders of convergence and efficiency indices, leaving the details to the reference given for each method. The proposed classification strategy is defined in the third section; then, in the fourth section, the numerical experiments and the classification obtained are presented. Finally, the results are analyzed together with related work.

Iterative methods
Let f : D ⊂ ℝ → ℝ be a scalar function, with D an open interval. The problem of solving the nonlinear equation f(x) = 0 is considered, which is assumed to have a single root α in D, that is, f(α) = 0 and f′(α) ≠ 0. Methods for the case of multiple roots will not be considered. For any iterative method for solving this nonlinear equation, its efficiency index is defined as I = p^(1/d), where p is the order of convergence of the method and d is the number of function evaluations required per iteration [3]. For example, since the order of convergence of Newton's method is quadratic and it requires two function evaluations per iteration, its efficiency index equals 2^(1/2) ≈ 1.414.
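As a quick illustration, the efficiency index can be computed directly from its definition; this is a minimal sketch, and the function name is illustrative rather than part of the paper:

```python
def efficiency_index(p, d):
    """Efficiency index I = p**(1/d), where p is the order of convergence
    and d is the number of function evaluations per iteration [3]."""
    return p ** (1.0 / d)

# Newton's method: quadratic order (p = 2), two evaluations per iteration (d = 2).
newton_index = efficiency_index(2, 2)  # ≈ 1.414
```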
Iterative methods to be classified in this work are of the predictor-corrector type, where for all of them Newton's iterative method acts as the predictor (Equation 1):

x_{n+1} = x_n − f(x_n)/f′(x_n),  n = 0, 1, 2, …  (1)

which converges quadratically in some neighborhood of α [3]. In this kind of method, the correction terms are defined as follows.
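The Newton predictor of Equation 1, combined with the stopping criterion used later in the numerical experiments, can be sketched as follows; the helper name, signature, and defaults are illustrative, not the paper's implementation:

```python
def newton(f, df, x0, tol=1e-8, max_iter=100):
    """Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n).
    Stops when the step or the residual falls below tol (illustrative sketch)."""
    x = x0
    for _ in range(max_iter):
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("f'(x) vanished; Newton step undefined")
        x_new = x - f(x) / dfx
        if abs(x_new - x) < tol or abs(f(x_new)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

# Example: root of f(x) = x**2 - 2 near x0 = 1.5.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
```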
Corrector 1 (Changbum [4], 2008): (M1)
Corrector 2 (Zhongyong et al. [5], 2011): (M2)
Corrector 3 (Zhongyong et al. [5], 2011): (M3)
…
Corrector 11 (Changbum and Beny [12], 2012): (M11)
Table 1 shows the classification of these iterative methods according to their order of convergence and efficiency index. In particular, since an exchange of positions among these methods is observed when they are classified according to their efficiency index (see the last column of the table), the question is whether that classification changes, or a different one arises, when the CPU time used by each of these same methods is considered. Answers to this question are provided in what follows.

Classification strategy
Suppose we have a set of functions f_1, …, f_n, where each function f_j : [a_j, b_j] ⊂ ℝ → ℝ has a single root in the interval (a_j, b_j). It is further assumed that the given iterative methods M_1, …, M_m are to be sorted as follows. The method M_i (i = 1, …, m) estimates the root of the function f_j (j = 1, …, n) from the initial datum x_jk (k = 1, …, l) in a time t_ijk. Under these conditions, the estimates of the roots of the functions f_1, …, f_n are made in times t_i1k, …, t_ink, respectively, and their corresponding average time is T_ik, with k fixed. Table 2 collects these time averages. In this way, for every initial datum or node x_jk in [a_j, b_j], the average times T_1k, …, T_mk (k = 1, …, l) of the respective methods M_1, …, M_m are obtained (see Table 3). From these times the methods M_1, …, M_m are classified.
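The collection of the average times T_ik described above can be sketched as follows; the function names, the one-function base, and the use of `time.perf_counter` are illustrative assumptions, not the paper's code:

```python
import time
from statistics import mean

def newton(f, df, x0, tol=1e-8, max_iter=100):
    # Minimal example method M_i (Newton's method); illustrative only.
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence")

def average_times(methods, problems, l=5):
    """For each method M_i and node index k, average the CPU times t_ijk
    over the function base f_1, ..., f_n, giving T_ik (Tables 2 and 3).
    `methods` maps a name to a solver(f, df, x0); `problems` holds (f, df, a, b)."""
    results = {}
    for name, solve in methods.items():
        per_node = []                                  # T_i1, ..., T_il
        for k in range(l):
            times = []
            for f, df, a, b in problems:
                x0 = a + (k + 1) * (b - a) / (l + 1)   # node x_jk inside (a_j, b_j)
                t0 = time.perf_counter()
                solve(f, df, x0)
                times.append(time.perf_counter() - t0)
            per_node.append(mean(times))               # average over the n functions
        results[name] = per_node
    return results

# One-function base: root of x**2 - 2 on [1, 2].
T = average_times({"M1": newton}, [(lambda x: x * x - 2, lambda x: 2 * x, 1.0, 2.0)])
```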

Statistical test
Each method M_i has a true mean μ_i, which comes from averaging all the CPU times used by the method when every point of the interval [a_j, b_j] (infinitely many points), the domain of the function f_j, is used as initial datum. Since these times cannot all be obtained, the times t_ijk for k points of [a_j, b_j] (a finite sample) are computed. From those times a sample mean μ̄_i is obtained, which is an estimate of μ_i. Because only an estimate of the true mean of each method is available, the following consequences arise. First, two methods M_i1 and M_i2 may have sample means μ̄_i1 and μ̄_i2 that are very close and that are estimates of one and the same true mean μ, that is, the true means of M_i1 and M_i2 satisfy μ_i1 = μ_i2 = μ. It may also happen that the sample means μ̄_i1 and μ̄_i2 are very close but are estimates of different true means. This uncertainty generated by estimating the true means is resolved by the theory of confidence intervals.
Estimating a confidence interval consists of determining a pair of values LI and LS that form the interval [LI, LS] in such a way that, for a given probability 1 − ρ (called the confidence level), the condition of Equation 2,

P(LI ≤ μ ≤ LS) = 1 − ρ,  (2)

is fulfilled in relation to the estimate of the true mean μ. This condition ensures that, with this probability, the true mean is contained in the interval [LI, LS] that is built [1]. For this reason, this interval is called the confidence interval.
In the case of a single true mean, the confidence interval simply provides error limits on the mean: the values in the interval must be seen as plausible values given the experimental data. In the case of a difference between two true means, the interpretation can be extended to a comparison of the two means. For example, if one has great confidence that a difference μ_i1 − μ_i2 is positive, it follows that μ_i1 > μ_i2 with little risk of error.
In our case, intervals with a confidence level of 99% are built for the difference of each pair of consecutive means, μ_{i+1} − μ_i (i = 1, …, m − 1), such that Equation 3 holds:

P(LI ≤ μ_{i+1} − μ_i ≤ LS) = 0.99,  (3)

where LI and LS are given by Equations 4 and 5, respectively:

LI = (μ̄_{i+1} − μ̄_i) − z_{ρ/2} √(s_{i+1}²/η_{i+1} + s_i²/η_i),  (4)
LS = (μ̄_{i+1} − μ̄_i) + z_{ρ/2} √(s_{i+1}²/η_{i+1} + s_i²/η_i),  (5)

where z_{ρ/2} is the critical value of the standard normal distribution, and s_{i+1} and s_i are the standard deviations of the samples of sizes η_{i+1} and η_i, respectively [8]. In our case, η_i = η_{i+1} for every method M_i.
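A minimal sketch of the large-sample interval of Equations 4 and 5 follows; the function name is illustrative, and z_{ρ/2} ≈ 2.576 is the standard normal critical value for a 99% two-sided confidence level:

```python
from math import sqrt
from statistics import mean, stdev

Z = 2.5758  # z_{rho/2} for a 99% two-sided confidence level

def diff_mean_ci(sample_next, sample_prev, z=Z):
    """Confidence interval (LI, LS) for mu_{i+1} - mu_i (Equations 4 and 5),
    built from the sample means, standard deviations s, and sizes eta."""
    eta_next, eta_prev = len(sample_next), len(sample_prev)
    d = mean(sample_next) - mean(sample_prev)
    se = sqrt(stdev(sample_next) ** 2 / eta_next + stdev(sample_prev) ** 2 / eta_prev)
    return d - z * se, d + z * se
```

If LI > 0, one concludes μ_{i+1} > μ_i with 99% confidence; if the interval contains zero, the two methods cannot be distinguished.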

Numerical experiments
For the numerical experiments, the stopping criterion of Equation 6,

|x_{n+1} − x_n| < ε  or  |f(x_{n+1})| < ε,  (6)

is implemented with a tolerance ε = 10⁻⁸. It is considered that all methods converge under one of these stopping criteria and not by reaching the maximum number of iterations.
To implement the classification strategy for the iterative methods M_i considered in the second section, a base of n = 50 functions is used. The closed interval containing the root of each of these functions is defined in such a way that all iterative methods converge to this root. In each of these intervals a partition of l = 1000 nodes is considered, and for each method M_i a sample of l computation-time values is obtained. From the sample of each method the mean and standard deviation are calculated, and a table is built in ascending order of the sample means (see Table 4).
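The construction of the sample statistics and their ascending ordering, as in Table 4, can be sketched as follows; the function name and the shape of the input are illustrative assumptions:

```python
from statistics import mean, stdev

def classification_table(samples):
    """Sort methods in ascending order of sample mean CPU time (Table 4).
    `samples` maps a method name to its list of l timing values (illustrative).
    Returns rows of (name, sample mean, sample standard deviation)."""
    rows = [(name, mean(ts), stdev(ts)) for name, ts in samples.items()]
    rows.sort(key=lambda r: r[1])  # ascending by sample mean
    return rows
```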
In Table 4, columns 6 and 7 show the lower limit LI and the upper limit LS, which are needed to construct the confidence intervals for each difference of two consecutive means (see column 8 in Table 4). For example, the third row shows Equation 7:
0.0618 < μ_4 − μ_1 < 0.1088,  (7)

which means one has a 99% confidence level that the difference of the true means of the methods M_1 and M_4 is a point in the interval (0.0618, 0.1088). Also, one can ensure with the same level of confidence that μ_4 > μ_1. At the same time, one cannot distinguish between the true means μ_3 and μ_4 (see row 4 in Table 4); that is, it cannot be concluded that μ_3 < μ_4 or μ_4 < μ_3, because μ_3 − μ_4 belongs to the interval (−0.2256, 0.2451), which contains zero. Similar conclusions can be drawn for the rest of the differences of two means shown in Table 4. It is important to note that the classification of the methods based on the estimated means differs significantly from the one given in Table 1. For example, the method M_4 is of higher order and has a greater efficiency index I_ef than the method M_1; however, the CPU time for convergence of the method M_4 is greater than that of the method M_1. This phenomenon is repeated in many cases, showing that many of the newly introduced methods, even if they have high convergence orders or greater efficiency indices, are not necessarily effective in practice.

Concluding remarks
Due to the constant appearance in the literature of new iterative methods with higher convergence orders and efficiency indices, it is important to classify them according to a real measure of effectiveness: CPU time. For this reason, a simple strategy for achieving this classification has been given. The results show that there are methods with a high order of convergence or a great efficiency index, or both, which need more CPU time than other methods with a lower order of convergence. Furthermore, outside the framework of the given classification, the numerical experiments allow us to conclude that the interval of convergence of the methods with a higher order of convergence is much smaller than that of the methods with a lower order of convergence or efficiency index, for example, Newton's method. Thus, while a method with a high order of convergence may achieve a good position in the classification according to the mean, it loses interest because of its need for an initial approximation extremely close to the root; this was the case with the methods M_7 and M_9.
Although the classification was made for iterative methods applied to scalar functions with a simple root, and built from Newton's method as predictor, the same strategy can be applied to other methods and to other configurations of the problem.
Finally, it would be vital that, when a new method is introduced and its efficiency according to the order of convergence or the efficiency index is promoted, the classification given in this paper is also considered. This would help to filter the large number of new iterative methods presented in the literature and to obtain a more realistic classification of them.