4.1.6 Experimental Evaluation and Computational Budget

We now take a look at the experimental evaluations that have been carried out in the literature so far with respect to large-scale multi-objective optimisation. With such a number of different methods published within a short time, on a topic that is still developing and in need of good methods, it is natural that the evaluation criteria differ. Therefore, this thesis provides a comparison of the experiments reported in the literature. A detailed overview of the experimental methodologies and results of each of the related articles is given in Appendix A. In the following, these findings are summarised and analysed.

Table 4.3 provides an overview of the large-scale methods that exist at the current point in time. For each method, we list which other algorithms, large-scale and conventional, it was compared with, which benchmark functions were used with how many variables and how many objective functions, and lastly the number of function evaluations used in the respective experiments. The experimental setups differ largely between works. As we see in Table 4.3, some algorithms were only compared to conventional “low-scale” algorithms, and some only used relatively “easy” benchmarks like the ZDT functions. In some works, other large-scale methods were used for comparison as well, but the numbers of variables and objective functions also vary greatly.

The large-scale algorithm used most often for comparison is the MOEA/DVA, which appears in 7 of the 11 works published after it. LMEA was used 3 times and CCGDE3 2 times. In total, 5 out of 13 articles never compared their algorithms with other large-scale methods, although one of them is the CCGDE3, which at the time of its publication was the only large-scale method. Most works do, however, compare their large-scale methods with established conventional metaheuristics. This makes sense insofar as many large-scale methods use such algorithms internally to optimise the formed subproblems, as described above, so it is of interest how their performance compares to them.

| Year | Source | Proposed Method | Compared Algorithms (large-scale) | Compared Algorithms (normal) | Benchmarks | #Variables | #Objectives | #Function Evaluations |
|---|---|---|---|---|---|---|---|---|
| 2013 | [3] | CCGDE3 | - | GDE3, NSGA-II | ZDT1-3, ZDT6 | 200-5000 | 2 | up to 10,000,000 |
| 2015 | [24] | MOEA/DVA | - | NSGA-III, SMS-EMOA, MOEA/D | UF1-10, WFG1-9, DTLZ1, DTLZ3 | 24-1000 | 2-3 | up to 3,000,000 |
| 2016 | [25] | LMEA | MOEA/DVA | MOEA/D, NSGA-III, KnEA | DTLZ1-7, WFG3, UF9, UF10, LSMOP1-9 | 100-5000 | 3-10 | 1,000,000-230,000,000 |
| 2016 | [67] | MOEA/D-RDG | MOEA/DVA | MOEA/D | UF1-10, WFG1-9 | 800-1000 | 2-3 | 10,000,000 |
| 2016 | [79] | MOEA/D2 | - | MOEA/D, GDE3 | DTLZ1-7 | 200-1200 | 3 | 100,000 |
| 2017 | [68] | DPCCMOEA | CCGDE3, MOEA/DVA | - | DTLZ1-7, WFG1-9 | 1000 | 3 | 10,000,000 |
| 2017 | [78] | ReMO | - | NSGA-II, MOEA/D | modified versions of ZDT1-3 | 10,000 | 2 | 3000 |
| 2018 | [77] | S3-CMA-ES | MOEA/DVA, LMEA | NSGA-III, RVEA, BiGE | LSMOP1-9 | 500-1500 | 5-15 | 5,000,000-15,000,000 |
| 2018 | [70] | CCLSM | - | NSGA-II, IBEA, NSGA-III | WFG2-3, UF5, LSMOP1, LSMOP5, LSMOP9 | 100-300 | 2-10 | 50,000 |
| 2018 | [92] | MOEA/D(s&ns) | - | NSGA-II | ZDT1-3, LSMOP1, LSMOP5, LSMOP9 | 200-300 | 2-3 | ? |
| 2018 | [71] | PEA | LMEA, MOEA/DVA | NSGA-III, MaOEA-R&D, BiGE | LSMOP1-3, MaF1-7 | 307-1039 | 3-10 | 3,070,000-10,390,000 |
| 2018 | [47] | DLS-MOEA | CCGDE3, ReMO, WOF-SMPSO, MOEA/DVA, LMEA | SMS-EMOA, MOEA/D, NSGA-II | ZDT4, DTLZ1, DTLZ3, DTLZ6, WFG1-9, UF1-7 | 1024-8192 | 2 | 10,000,000 |
| 2019 | [69] | LSMOF | WOF-NSGA-II, WOF-SMPSO, MOEA/DVA | NSGA-II, MOEA/D-DE, SMS-EMOA, CMOPSO | DTLZ1-7, LSMOP1-9, WFG1-9 | 200-5000 | 2-3 | 50,000 |

Table 4.3: Experimental evaluations of related work in the area of large-scale multi-objective optimisation with their compared algorithms, used benchmarks, dimensionalities and computational budget.


On the other hand, there seems to be a general lack of comparison between most of the state-of-the-art large-scale methods. One reason might of course be the lack of publicly available implementations. Another reason might be that space in scientific articles is usually limited, so that a comprehensive comparison does not fit into each of these works. However, it is also possible that the high frequency of publications made it difficult for some of these works to keep track of the current state of the art. Many of the published methods are, even at the time of writing this thesis, only a few months old, and many of the listed articles may have been written or under review simultaneously. It is therefore of great interest to the field to compare the performance of the newest members of the large-scale area with each other in the future, and some comparison with the available state of the art, especially some of the newest algorithms, is provided in this thesis in Chapter 6.

Looking at the used benchmarks and their dimensionality, we also observe great differences. In total, among all the studies, the DTLZ functions were used most often, in 7 out of 13 articles, followed by the LSMOP (6 times), WFG (6 times), UF (5 times) and ZDT (4 times) benchmarks. However, not all problems of the respective families were used in all articles. Some works only picked one or two problems of the respective family and did not report the performance for the remaining ones. In the ReMO article, only modified versions of the ZDT1-3 functions were used instead of the original ones. In summary, none of the 13 studies compared the performance on all of the most relevant benchmark problems in the DTLZ, UF, WFG and LSMOP benchmark suites (where the ZDT functions are excluded due to their relatively low complexity).

We would also like to point out that the numbers of objective functions and variables in Table 4.3 are not to be understood as exhaustive combinations that were tested with all of the used benchmark functions. In many works, only certain benchmarks were used with certain numbers of variables and objectives (for further details refer to Appendix A). Often, especially with the LSMOP benchmarks, the problems with lower numbers of objectives were also used with lower numbers of variables. This leads to the situation where the large-scale instances were at the same time also many-objective instances. While the results are in some cases impressive, this kind of evaluation makes it difficult to examine separately the performance in large search spaces and the performance in large objective spaces. This is, for instance, visible in the evaluation of the S3-CMA-ES algorithm, which was outperformed on several problem instances by the NSGA-III [23] and RVEA [21] algorithms, which are dedicated many-objective algorithms and not originally designed for large search spaces. Such results can indicate that the actual challenge of the used problem instances was related more to the high number of objective functions than to the high-dimensional decision space.
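To make this confounding explicit, the following minimal sketch (in Python, not taken from any of the cited studies; the benchmark names and value ranges are purely illustrative) shows how a full factorial design would pair every benchmark with every combination of decision-space and objective-space size, so that the influence of the two factors could be analysed separately.

```python
from itertools import product

# Illustrative, hypothetical choices -- not the settings of any cited study.
benchmarks = ["LSMOP1", "LSMOP5", "DTLZ2"]   # placeholder problem names
variable_counts = [100, 500, 1000, 5000]     # decision-space sizes
objective_counts = [2, 3, 5, 10]             # objective-space sizes

# Full factorial design: every benchmark is paired with every combination of
# #variables and #objectives, so large search spaces and large objective
# spaces are not coupled to each other in the resulting problem instances.
experiment_plan = [
    {"problem": b, "n_variables": n, "n_objectives": m}
    for b, n, m in product(benchmarks, variable_counts, objective_counts)
]

print(len(experiment_plan), "problem instances")   # 3 * 4 * 4 = 48
```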

Taking a closer look at the numbers reveals that the term “large-scale” is also not universally defined in the literature. While some works, for instance CCLSM and MOEA/D(s&ns), propose large-scale methods but tested them with at most 300 decision variables, on the other end of the scale CCGDE3, LMEA and LSMOF used up to 5000 variables, DLS-MOEA used between 1024 and 8192 variables, and ReMO performed experiments with 10,000 decision variables. If we turn this around and ask how the large-scale algorithms perform on rather low-dimensional problems, there is surprisingly little experimental evidence in the literature. Problem instances with fewer than 100 variables were only tested with the DLS-MOEA. One might argue that it is not actually necessary to develop algorithms which are universally superior for any number of variables, as there is no free lunch and in a real application the dimensions of the problem are known, so that an educated choice can be made to employ either traditional or large-scale methods. However, from a scientific point of view, it is still of interest which parts, i.e. building blocks, of large-scale methods might lead to a deterioration of performance on traditional benchmark sizes.

Regarding the number of objectives, it was mentioned above that only LMEA, S3-CMA-ES, CCLSM and PEA were tested on more than 3 objective functions. Neither of the indicator-based approaches was tested with more than 2 (DLS-MOEA) or 3 (LSMOF) objectives, respectively. On the other hand, PEA, CCLSM and LMEA were tested with up to 10 objectives, and the S3-CMA-ES with up to 15-objective problems. Since the current large-scale approaches employ a variety of different techniques, it is likely that they show different behaviours when facing many-objective problems, and an experimental evaluation in this regard might bring valuable insights for future developments in this area.

Finally, a very interesting observation that sets many of the algorithms apart is the computational budget used in the respective experiments. Since a variety of algorithms exist which are able to solve current large-scale benchmarks through different methodologies, a question of interest is the performance of the algorithms over time. The author of this thesis published a study in this regard in 2017 [6], which showed that LMEA and MOEA/DVA are unable to reach acceptable results before millions of function evaluations are used up, while the WOF algorithm (see Section 5.1) is able to deliver good approximations after just 100,000 evaluations. This is due to the large overhead these two methods need for obtaining interaction-based groups. Since that study was conducted, however, a variety of new algorithms has been proposed, and many of them no longer rely on interaction-based groups, as was discussed above.
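As an illustration of what such an evaluation over time can look like, the following minimal sketch (in Python; the toy bi-objective problem, the random-search “algorithm” and the scalar quality measure are placeholders, not the setup used in [6]) records the quality of the current approximation at fixed checkpoints of function evaluations instead of only at the final budget, which is the kind of anytime comparison discussed here.

```python
import random

def toy_biobjective(x):
    """Toy bi-objective problem: f1 = sum of x_i^2, f2 = sum of (x_i - 1)^2."""
    f1 = sum(v * v for v in x)
    f2 = sum((v - 1.0) ** 2 for v in x)
    return f1, f2

def random_search_with_checkpoints(n_vars, budget, checkpoints, rng):
    """Placeholder 'algorithm': pure random search that logs a crude scalar
    quality measure (best f1 + f2 found so far) at fixed evaluation counts."""
    best = float("inf")
    history = []
    for evals in range(1, budget + 1):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n_vars)]
        f1, f2 = toy_biobjective(x)
        best = min(best, f1 + f2)
        if evals in checkpoints:
            history.append((evals, best))   # (evaluations used, quality)
    return history

rng = random.Random(1)
for evals, quality in random_search_with_checkpoints(
        n_vars=1000, budget=10_000,
        checkpoints={1_000, 5_000, 10_000}, rng=rng):
    print(f"{evals:>6} evaluations -> best quality so far: {quality:.1f}")
```

In an actual study, the placeholder algorithm would be replaced by the compared optimisers and the scalar measure by a Pareto-front quality indicator such as the hypervolume, but the principle of logging intermediate results at identical evaluation checkpoints remains the same.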

Therefore, we now take a look at the computational budget these methods use in their original implementations by comparing the experimental evaluations in the respective articles. The last column of Table 4.3 lists the number of function evaluations used in the articles’ experiments. Out of the 13 large-scale algorithms, only ReMO, CCLSM, MOEA/D2 and LSMOF use fewer than 1 million function evaluations. Budgets of 3 to 10 million evaluations are used in most of the other studies, and in the extreme case, the experiments in LMEA used up to 230,000,000 evaluations to test the algorithm’s performance. Sometimes these high numbers may have been chosen due to a comparison with other algorithms which need a large overhead. In many cases, however, it is not clear