
The following strategies for evaluating the performance of a code are frequently used and are also applied by Exler et al. [36] for a comparative study. First, criteria have to be defined to decide whether the outcome of a test run is considered a success or not. The applied criteria are stated for the continuous case and can easily be adapted to mixed-integer optimization problems. Let $\epsilon_{succ} > 0$ be a tolerance defining the relative accuracy, $x_k$ be the value returned by a test run, and $x$ the supposed exact solution known from the test problem collection. Then the output is called a successful solution if the relative error in the objective function is less than $\epsilon_{succ}$ and if the maximum constraint violation is less than $\epsilon_{succ}^2$, i.e., if

$f(x_k) - f(x) < \epsilon_{succ} \, |f(x)| \,, \quad \text{if } f(x) \neq 0 \,,$   (6.7)

or

$f(x_k) < \epsilon_{succ} \,, \quad \text{if } f(x) = 0 \,,$   (6.8)

and

$\| g(x_k) \| < \epsilon_{succ}^2 \,.$   (6.9)

Note that the tolerance for the allowed constraint violation, cf. (6.9), might lead to returned solutions with an objective function value better than the best known one at $x$. These runs are also classified as successful solutions.

Another situation is taken into account. It might occur that the internal termination conditions of a code are satisfied subject to a reasonably small tolerance, but the objective function value of the returned solution is worse than the best known one. For non-convex problems this situation is not unusual. If such a solution is returned by a test run, it is also accepted and is called an acceptable solution. For an acceptable solution

$f(x_k) - f(x) \geq \epsilon_{succ} \, |f(x)| \,, \quad \text{if } f(x) \neq 0 \,,$   (6.10)

or

$f(x_k) \geq \epsilon_{succ} \,, \quad \text{if } f(x) = 0 \,,$   (6.11)

and

$\| g(x_k) \| < \epsilon_{succ}^2$   (6.12)

hold. The numerical tests are evaluated with $\epsilon_{succ} = 0.01$, i.e., a relative final accuracy of one percent is required for a run to be considered successful.
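To make the classification concrete, the following sketch applies criteria (6.7)-(6.12) to the result of a single test run. It is a minimal illustration in Python and not part of the cited implementations; the function name and its arguments are hypothetical, and max_violation is assumed to hold the constraint violation measured by the norm appearing in (6.9) and (6.12).

```python
EPS_SUCC = 0.01  # relative accuracy tolerance used in the study

def classify_run(f_k, f_best, max_violation, eps=EPS_SUCC):
    """Classify a test run as 'successful', 'acceptable', or 'failed'
    following (6.7)-(6.12); f_k stands for f(x_k), f_best for f(x),
    and max_violation for ||g(x_k)||.  Hypothetical helper."""
    # (6.9)/(6.12): the constraint violation must stay below eps**2.
    if max_violation >= eps ** 2:
        return "failed"
    # Relative accuracy in the objective, cf. (6.7) and (6.8).
    if f_best != 0.0:
        accurate = f_k - f_best < eps * abs(f_best)   # (6.7)
    else:
        accurate = f_k < eps                          # (6.8)
    # Feasible but not accurate enough: (6.10)/(6.11) hold instead.
    return "successful" if accurate else "acceptable"
```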

Different methodologies are applied to evaluate the performance of the tested implementations. Arithmetic mean values are compared for different performance criteria, e.g., the number of function evaluations or the number of gradient evaluations. But it might be misleading to simply compare arithmetic mean values, as they might be dominated by a few problems with extremely high results in the considered performance criterion. Especially if some other codes are unable to solve these problems, the average numbers might be inaccurate. On the other hand, restricting the comparison of arithmetic mean values to the set of test problems that are solved successfully by all tested codes would penalize the more reliable and efficient codes.

To overcome these difficulties, two additional techniques are applied. The first one is known under the name priority theory, see Saaty [96], and has been used for example by Schittkowski [98] and Hock and Schittkowski [62] for comparing optimization codes. Appendix B contains an outline of the theoretical background of the procedure. The basic idea can be summarized as follows. The output of this method is a unique priority value, by which the relative efficiency of one code over another is measured. This is achieved by comparing the codes pairwise with respect to a specified performance criterion over the sets of test examples that are successfully solved by both codes. Then a reciprocal $N \times N$ matrix is determined, where $N$ is the number of codes under consideration. The largest eigenvalue of this matrix is positive, and its normalized eigenvector is computed. The priority values are deduced from this eigenvector after scaling it such that the smallest coefficient becomes one.

The interpretation of these priority values is illustrated by an example. A relative priority value of 3.0 for the number of function evaluations leads to the conclusion that the corresponding code needs 3.0 times as many function calls as the best one with priority value 1.0, and twice as many as a code with priority value 1.5.
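A minimal numerical sketch of this eigenvector computation is given below. It assumes that an entry $a_{ij}$ of the reciprocal matrix estimates how many times more function evaluations code $i$ needs than code $j$; the matrix entries are invented for illustration and do not stem from the study.

```python
import numpy as np

def priority_values(A):
    """Priority values from a reciprocal N x N comparison matrix A,
    with a_ji = 1 / a_ij: take the eigenvector of the largest
    eigenvalue and scale it so the smallest coefficient equals one."""
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)          # largest eigenvalue (positive)
    v = np.abs(eigvecs[:, k].real)       # associated eigenvector
    return v / v.min()                   # smallest coefficient -> 1

# Invented pairwise comparisons of three codes:
A = np.array([[1.0,   2.0,   3.0],
              [1/2.0, 1.0,   1.5],
              [1/3.0, 1/1.5, 1.0]])
print(priority_values(A))  # approx. [3.0, 1.5, 1.0]: code 3 is best
```

In this consistent example the first code receives priority 3.0, i.e., it needs three times as many function calls as the best code, mirroring the interpretation above.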

Parameter   Value        Description
ACC         $10^{-7}$    termination accuracy, i.e., $\epsilon_{tol}$
ACCQP       $10^{-12}$   termination accuracy of the QP solver
MAXIT       3,000        maximum number of iterations

Table 6.2: Parameter settings for 306 Continuous Tests for MISQP and NLPQLP

In addition, the performance is evaluated according to an approach developed by Dolan and Moré [27]. The so-called performance profiles are frequently used in comparative numerical studies. The creation of performance profiles is explained in the following. Let the number of function evaluations be the performance criterion under consideration. It is assumed that $N$ codes $C_i$, $i = 1, \ldots, N$, are compared on a set of $M$ test problems $TP_j$, $j = 1, \ldots, M$. First, the minimum number of function evaluations needed by the codes to solve the problem successfully is determined for each test problem $TP_j$, $j = 1, \ldots, M$. The minimum number is denoted by $n_j$ and is defined as follows

$n_j := \min_{1 \leq i \leq N} \, (n_{ij}) \,,$   (6.13)

where either $n_{ij}$ corresponds to the actual number of function evaluations code $C_i$ needed to solve problem $TP_j$ successfully, or $n_{ij}$ is set to a large constant value otherwise.

Now $n_j$ is the reference value for all codes on the test problem $TP_j$, and the number of function evaluations needed by each code $C_i$, $i = 1, \ldots, N$, is compared to $n_j$ by calculating the ratio

$r_{ij} := \dfrac{n_{ij}}{n_j} \,.$   (6.14)

Thus, a value of 1 for $r_{ij}$ indicates that code $C_i$ is the best solver on problem $TP_j$. A value of 4 implies that the corresponding code $C_i$ needed four times as many function evaluations as the best solver on problem $TP_j$.

For each code $C_i$, $i = 1, \ldots, N$, let $S_i(r)$ denote the set of test problems for which the ratio $r_{ij}$, as defined by (6.14), is lower than or equal to a given upper bound $r$, that is

$S_i(r) := \{ \, j \mid r_{ij} \leq r \,, \; 1 \leq j \leq M \, \} \,.$   (6.15)

Applying (6.15), the function $\phi_i(r) : [1, \infty) \rightarrow [0, 1]$ is defined as

$\phi_i(r) := \dfrac{|S_i(r)|}{M} \,,$   (6.16)

for each code $C_i$, $i = 1, \ldots, N$, where $|S_i(r)|$ is the cardinality of the set $S_i(r)$.

The function $\phi_i(r)$ represents the fraction of test problems that are solved by code $C_i$ with at most $r$ times as many function evaluations as the particular best solver on a problem. Performance profiles display the functions $\phi_i(r)$ for all codes under consideration, where $r$ is given along the abscissa starting at 1.
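The construction of (6.13)-(6.16) can be condensed into a few lines. The sketch below uses invented data; the parameter penalty plays the role of the "large constant value" assigned when a code fails on a problem, and the function names are hypothetical.

```python
import numpy as np

def performance_ratios(n_eval, solved, penalty=1e8):
    """Ratio matrix r_ij of (6.14): n_eval[i, j] is the number of
    function evaluations code C_i spent on problem TP_j, solved[i, j]
    flags a successful run.  Failures get a large constant, cf. (6.13)."""
    n = np.where(solved, n_eval, penalty)
    n_j = n.min(axis=0)                 # best result per problem, (6.13)
    return n / n_j                      # ratios r_ij, (6.14)

def phi(ratios, r):
    """phi_i(r) of (6.16): fraction of problems with r_ij <= r."""
    return (ratios <= r).mean(axis=1)

# Invented data: two codes on three problems.
n_eval = np.array([[10, 40, 25],
                   [20, 20, 30]])
solved = np.array([[True, True, True],
                   [True, True, False]])
ratios = performance_ratios(n_eval, solved)
print(phi(ratios, 1.0))   # [0.667, 0.333]: share of problems where best
print(phi(ratios, 2.0))   # [1.0, 0.667]
```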

Performance profiles can be interpreted in the following way. A high value of $\phi_i(1)$ indicates that the solver is efficient compared to the other solvers, as this function value represents the fraction of problems on which the code performed best with respect to the performance criterion under consideration, compared to all codes.

code        n_succ  n_acc  n_err  n_func  n_grad  n_allf  time
MISQP/lag      280     25      1      25      19     266  0.75
MISQP/soc      279     25      2      70      35     433  1.64
MISQP/com      283     22      1      25      19     267  0.94
NLPQLP         284     22      0      42      21     312  0.58

Table 6.3: Performance Results for a Set of 306 Continuous Test Problems

On the other hand, the robustness and reliability of a solver correspond to a curve that approaches 1 as $r$ increases.
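For completeness, a short sketch of how such profiles might be drawn is given below, assuming an invented ratio matrix, e.g., as produced by the performance_ratios() sketch above, and matplotlib for plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented ratio matrix r_ij for two codes on three problems.
ratios = np.array([[1.0, 2.0, 1.0],
                   [2.0, 1.0, 4.0]])

r_grid = np.linspace(1.0, 5.0, 200)
for i, label in enumerate(["code A", "code B"]):
    curve = [(ratios[i] <= r).mean() for r in r_grid]  # phi_i(r), (6.16)
    plt.step(r_grid, curve, where="post", label=label)
plt.xlabel("r")            # factor over the best solver per problem
plt.ylabel("phi_i(r)")     # fraction of problems solved within factor r
plt.ylim(0.0, 1.05)
plt.legend()
plt.show()
```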