
The following strategies for evaluating the performance of a code are frequently used and are also applied by Exler et al. [36] for a comparative study. First, criteria have to be defined to decide whether the outcome of a test run is considered a success or not. The applied criteria are stated for the continuous case and can easily be adapted to mixed-integer optimization problems. Let $\epsilon_{succ} > 0$ be a tolerance defining the relative accuracy, $x_k$ be the value returned by a test run, and $x$ the supposed exact solution known from the test problem collection. Then the output is called a successful solution if the relative error in the objective function is less than $\epsilon_{succ}$ and if the maximum constraint violation is less than $\epsilon_{succ}^2$, i.e., if

$f(x_k) - f(x) < \epsilon_{succ} \, |f(x)| \,, \quad \text{if } f(x) \neq 0 \,,$   (6.7)

or

$f(x_k) < \epsilon_{succ} \,, \quad \text{if } f(x) = 0 \,,$   (6.8)

and

$\| g(x_k) \| < \epsilon_{succ}^2 \,.$   (6.9)

Note that the tolerance for the allowed constraint violation, cf. (6.9), might lead to returned solutions with an objective function value better than the best known one at $x$. These runs are also classified as successful solutions.

Another situation is taken into account. It might occur that the internal termination conditions of a code are satisfied subject to a reasonably small tolerance, but the objective function value of the returned solution is worse than the best known one. For non-convex problems this situation is not unusual. If such a solution is returned by a test run, it is also accepted and is called an acceptable solution. For an acceptable solution

$f(x_k) - f(x) \geq \epsilon_{succ} \, |f(x)| \,, \quad \text{if } f(x) \neq 0 \,,$   (6.10)

or

$f(x_k) \geq \epsilon_{succ} \,, \quad \text{if } f(x) = 0 \,,$   (6.11)

and

$\| g(x_k) \| < \epsilon_{succ}^2$   (6.12)

hold. The numerical tests are evaluated with $\epsilon_{succ} = 0.01$, i.e., a relative final accuracy of one percent is required for a run to be considered successful.
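To make the classification concrete, the following sketch applies criteria (6.7)-(6.12) to the result of a single test run. It is a minimal illustration in Python and not part of the cited implementations; the function name and its arguments are hypothetical, and max_violation is assumed to hold the constraint violation measured by the norm appearing in (6.9) and (6.12).

```python
EPS_SUCC = 0.01  # relative accuracy tolerance used in the study

def classify_run(f_k, f_best, max_violation, eps=EPS_SUCC):
    """Classify a test run as 'successful', 'acceptable', or 'failed'
    following (6.7)-(6.12); f_k stands for f(x_k), f_best for f(x),
    and max_violation for ||g(x_k)||.  Hypothetical helper."""
    # (6.9)/(6.12): the constraint violation must stay below eps**2.
    if max_violation >= eps ** 2:
        return "failed"
    # Relative accuracy in the objective, cf. (6.7) and (6.8).
    if f_best != 0.0:
        accurate = f_k - f_best < eps * abs(f_best)   # (6.7)
    else:
        accurate = f_k < eps                          # (6.8)
    # Feasible but not accurate enough: (6.10)/(6.11) hold instead.
    return "successful" if accurate else "acceptable"
```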

Different methodologies are applied to evaluate the performance of the tested implementations. Arithmetic mean values are compared for different performance criteria, e.g., the number of function evaluations or the number of gradient evaluations. But it might be misleading to simply compare arithmetic mean values, as they might be dominated by a few problems with extremely high results in the considered performance criterion. Especially if some other codes are unable to solve these problems, the average numbers might be inaccurate. On the other hand, restricting the comparison of arithmetic mean values to the set of test problems that are solved successfully by all tested codes would penalize the more reliable and efficient codes.

To overcome these difficulties, two additional techniques are applied. The first one is known under the name priority theory, see Saaty [96], and has been used for example by Schittkowski [98] and Hock and Schittkowski [62] for comparing optimization codes. Appendix B contains an outline of the theoretical background of the procedure. The basic idea can be summarized as follows. The output of this method is a unique priority value, by which the relative efficiency of one code over another is measured. This is achieved by comparing the codes pairwise with respect to a specified performance criterion over the sets of test examples that are successfully solved by both codes. Then a reciprocal $N \times N$ matrix is determined, where $N$ is the number of codes under consideration. The largest eigenvalue of this matrix is positive, and its normalized eigenvector is computed. The priority values are deduced from this eigenvector after scaling it such that the smallest coefficient becomes one.

The interpretation of these priority values is illustrated by an example. A relative priority value of 3.0 for the number of function evaluations leads to the conclusion that the corresponding code needs 3.0 times as many function calls as the best one with priority value 1.0, and twice as many as a code with priority value 1.5.
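A minimal numerical sketch of this eigenvector computation is given below. It assumes that an entry $a_{ij}$ of the reciprocal matrix estimates how many times more function evaluations code $i$ needs than code $j$; the matrix entries are invented for illustration and do not stem from the study.

```python
import numpy as np

def priority_values(A):
    """Priority values from a reciprocal N x N comparison matrix A,
    with a_ji = 1 / a_ij: take the eigenvector of the largest
    eigenvalue and scale it so the smallest coefficient equals one."""
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)          # largest eigenvalue (positive)
    v = np.abs(eigvecs[:, k].real)       # associated eigenvector
    return v / v.min()                   # smallest coefficient -> 1

# Invented pairwise comparisons of three codes:
A = np.array([[1.0,   2.0,   3.0],
              [1/2.0, 1.0,   1.5],
              [1/3.0, 1/1.5, 1.0]])
print(priority_values(A))  # approx. [3.0, 1.5, 1.0]: code 3 is best
```

In this consistent example the first code receives priority 3.0, i.e., it needs three times as many function calls as the best code, mirroring the interpretation above.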

Parameter   Value        Description
ACC         $10^{-7}$    termination accuracy, i.e., $\epsilon_{tol}$
ACCQP       $10^{-12}$   termination accuracy of the QP solver
MAXIT       3,000        maximum number of iterations

Table 6.2: Parameter settings for 306 Continuous Tests for MISQP and NLPQLP

In addition, the performance is evaluated according to an approach developed by Dolan and Moré [27]. The so-called performance profiles are frequently used in comparative numerical studies. The creation of performance profiles is explained in the following. Let the number of function evaluations be the performance criterion under consideration. It is assumed that $N$ codes $C_i$, $i = 1, \ldots, N$, are compared on a set of $M$ test problems $TP_j$, $j = 1, \ldots, M$. First, the minimum number of function evaluations needed by the codes to solve the problem successfully is determined for each test problem $TP_j$, $j = 1, \ldots, M$. The minimum number is denoted by $n_j$ and is defined as follows

$n_j := \min_{1 \leq i \leq N} \, (n_{ij}) \,,$   (6.13)

where either $n_{ij}$ corresponds to the actual number of function evaluations code $C_i$ needed to solve problem $TP_j$ successfully, or $n_{ij}$ is set to a large constant value otherwise.

Now $n_j$ is the reference value for all codes on the test problem $TP_j$, and the number of function evaluations needed by each code $C_i$, $i = 1, \ldots, N$, is compared to $n_j$ by calculating the ratio

$r_{ij} := \dfrac{n_{ij}}{n_j} \,.$   (6.14)

Thus, a value of 1 for $r_{ij}$ indicates that code $C_i$ is the best solver on problem $TP_j$. A value of 4 implies that the corresponding code $C_i$ needed four times as many function evaluations as the best solver on problem $TP_j$.

For each code $C_i$, $i = 1, \ldots, N$, let $S_i(r)$ denote the set of test problems for which the ratio $r_{ij}$, as defined by (6.14), is lower than or equal to a given upper bound $r$, that is

$S_i(r) := \{ \, j \mid r_{ij} \leq r \,, \; 1 \leq j \leq M \, \} \,.$   (6.15)

Applying (6.15), the function $\phi_i(r) : [1, \infty) \rightarrow [0, 1]$ is defined as

$\phi_i(r) := \dfrac{|S_i(r)|}{M} \,,$   (6.16)

for each code $C_i$, $i = 1, \ldots, N$, where $|S_i(r)|$ is the cardinality of the set $S_i(r)$.

The function $\phi_i(r)$ represents the fraction of test problems that are solved by code $C_i$ with at most $r$ times as many function evaluations as the particular best solver on a problem. Performance profiles display the functions $\phi_i(r)$ for all codes under consideration, where $r$ is given along the abscissa starting at 1.
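The construction of (6.13)-(6.16) can be condensed into a few lines. The sketch below uses invented data; the parameter penalty plays the role of the "large constant value" assigned when a code fails on a problem, and the function names are hypothetical.

```python
import numpy as np

def performance_ratios(n_eval, solved, penalty=1e8):
    """Ratio matrix r_ij of (6.14): n_eval[i, j] is the number of
    function evaluations code C_i spent on problem TP_j, solved[i, j]
    flags a successful run.  Failures get a large constant, cf. (6.13)."""
    n = np.where(solved, n_eval, penalty)
    n_j = n.min(axis=0)                 # best result per problem, (6.13)
    return n / n_j                      # ratios r_ij, (6.14)

def phi(ratios, r):
    """phi_i(r) of (6.16): fraction of problems with r_ij <= r."""
    return (ratios <= r).mean(axis=1)

# Invented data: two codes on three problems.
n_eval = np.array([[10, 40, 25],
                   [20, 20, 30]])
solved = np.array([[True, True, True],
                   [True, True, False]])
ratios = performance_ratios(n_eval, solved)
print(phi(ratios, 1.0))   # [0.667, 0.333]: share of problems where best
print(phi(ratios, 2.0))   # [1.0, 0.667]
```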

Performance profiles can be interpreted in the following way. A high value of $\phi_i(1)$ indicates that the solver is efficient compared to the other solvers, as this function value represents the fraction of problems on which the code performed best with respect to the performance criterion under consideration, compared to all codes.

code        n_succ  n_acc  n_err  n_func  n_grad  n_allf  time
MISQP/lag      280     25      1      25      19     266  0.75
MISQP/soc      279     25      2      70      35     433  1.64
MISQP/com      283     22      1      25      19     267  0.94
NLPQLP         284     22      0      42      21     312  0.58

Table 6.3: Performance Results for a Set of 306 Continuous Test Problems

On the other hand, the robustness and reliability of a solver correspond to a curve that approaches 1 as $r$ increases.
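For completeness, a short sketch of how such profiles might be drawn is given below, assuming an invented ratio matrix, e.g., as produced by the performance_ratios() sketch above, and matplotlib for plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented ratio matrix r_ij for two codes on three problems.
ratios = np.array([[1.0, 2.0, 1.0],
                   [2.0, 1.0, 4.0]])

r_grid = np.linspace(1.0, 5.0, 200)
for i, label in enumerate(["code A", "code B"]):
    curve = [(ratios[i] <= r).mean() for r in r_grid]  # phi_i(r), (6.16)
    plt.step(r_grid, curve, where="post", label=label)
plt.xlabel("r")            # factor over the best solver per problem
plt.ylabel("phi_i(r)")     # fraction of problems solved within factor r
plt.ylim(0.0, 1.05)
plt.legend()
plt.show()
```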