
Table 7.1.: Upper bounds of examples from the literature: The number of examples with a given complexity, failed, or timed out (T/O). The number of examples in which CoFloCo reports a better or worse bound than each tool, and the average (Avg.) and median (Md.) analysis times per example in seconds.

                   # Examples (Total 122)               CoFloCo is      Time(s)
Tool       O(1)  O(n)  O(n²)  O(n³)  >O(n³)  Fail  T/O  Better  Worse   Avg.   Md.
CoFloCo       3    62     34      2       1    19    1       -      -   1.44  0.48
PUBS-A        3    41     29      3       1    43    2      35      0   0.92  0.71
Loopus        2    57     28      0       2    33    0      19      4   0.05  0.02
KoAT          3    46     41      8       3    16    5      27      5   6.11  1.34
C4B           1    42      -      -       -    79    0      60      1   1.19  0.07
Rank          1    53     25      1       1    41    0      28      5   0.35  0.08

Adding input-output relations to such a constraint set is unsound if the CRS is non-terminating. On the other hand, performing the analysis without adding any input-output relations yields very weak results.

In order to obtain the best results for PUBS-A while maintaining soundness, the following approach was taken. First, CoFloCo was executed with the option -only-termination, which tries to prove termination of the cost relation system by performing the refinement. If all the non-terminating chains are discarded during the refinement, the CRS is terminating and input-output relations can be safely added to it. The input-output relations were computed using the implementation of SACO. If CoFloCo failed to prove termination, no input-output relations were added to the CRS. The analysis time of PUBS-A includes the input-output relation generation (if it takes place) and the running time of PUBS-A, but it does not include the auxiliary call to CoFloCo.
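For intuition, the following C sketch (illustrative only; not taken from the benchmarks) shows why input-output relations are sound only for terminating code: a relation such as "the result is 0" describes terminating runs, so assuming it for calls that may diverge would be unsound.

    /* Hypothetical example: count_down terminates exactly when x >= 0
       (ignoring integer wrap-around). On every terminating run the
       result is 0, so the input-output relation "result = 0" is valid
       only under a termination proof; adding it for calls that may
       pass x < 0 would summarize runs that never return. */
    int count_down(int x) {
        while (x != 0)
            x = x - 1;   /* diverges for x < 0 */
        return x;        /* always 0 here */
    }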

Rank: Two options were considered for generating the input files for Rank. The first one is a script available in KoAT's repository that translates ITS to Rank's representation. Unfortunately, Rank failed to analyze most of the examples generated this way, including most of the examples that come precisely from Rank's own experimental evaluation⁸. The second option (which is the one adopted here) consists in generating Rank's input files using C2fsm and Aspic [FG10]. However, C2fsm supports only a limited subset of C, which means that some examples had to be adapted and Rank could not be included in the remaining evaluations.

Table 7.1 contains a summary of the results of the analysis. It shows how many examples were reported in each complexity category, failed, or timed out. Note that C4B can only compute linear bounds, so its columns for non-linear bounds are empty. The right-hand side of the table contains the number of examples in which CoFloCo computed a better or worse asymptotic bound than each of the other tools. For instance, CoFloCo computed a better bound than KoAT in 27 examples, and Loopus computed a better bound than CoFloCo in 4 examples. Compared to every other tool, CoFloCo was better in more examples than it was worse. Finally, the average and median times in seconds needed per program are reported. These times do not include the translation times between formats.

The second evaluation compared CoFloCo to PUBS-M (the implementation of the analysis presented in [AGM13]) for computing lower bounds. None of the other tools can compute lower bounds. The analyzed examples include the 122 examples from the first evaluation plus the examples of PUBS's evaluation and the examples of the evaluation in [ABAG13], making a total of 192. These additional

⁸ This can be seen, for example, in the results of the experimental evaluation of [BEF+16]. Its cause is not clear, but it might be related to the way loops are encoded.


Table 7.2.: Lower bounds of examples from the literature: The number of examples with a given complexity, failed, or timed out (T/O), and the average (Avg.) and median (Md.) times per example in seconds.

                  # Examples (Total 192)                                Time(s)
Tool      Ω(1)  Ω(log(n))  Ω(n)  Ω(n log(n))  Ω(n²)  ≥Ω(n³)  T/O  Fail   Avg.   Md.
CoFloCo     64          0    97            0     26       4    1     0   1.89  1.10
PUBS-M      88          2    35            1     17       5    0    44   2.33  1.87

Table 7.3.: Replication of the real world experimental evaluation of [SZV17]: The number of examples with a given complexity, failed, or timed out (T/O). The number of examples in which CoFloCo reports a better or worse bound than each tool, and the average (Avg.) and median (Md.) analysis times per example in seconds.

                   # Examples (Total 1650)              CoFloCo is      Time(s)
Tool       O(1)  O(n)  O(n²)  O(n³)  >O(n³)  Fail  T/O  Better  Worse   Avg.   Md.
CoFloCo     211   144     39      0       0  1242   14       -      -   1.70  0.66
PUBS-A      195   138     36      0       0  1218   63      25      0   3.45  0.38
Loopus      205   486     97     12       2   839    9      18    426   0.75  0.04
KoAT        204   135     41      0       1  1144  125      21      3   7.72  0.69
Loopus *    194   138     40      0       0  1274    4      27      5   0.68  0.05

examples were not included in the previous evaluation because they are only available as cost relation systems. These CRS were already expressed only in terms of the input variables, so they could be analyzed directly by CoFloCo and PUBS-M. The input files generated from C programs were transformed in the same way as in the previous evaluation. The results can be found in Table 7.2. In this case, the table includes columns for the complexities Ω(log(n)) and Ω(n log(n)), as PUBS-M can obtain bounds of this kind. In addition, the table distinguishes between examples where only a trivial bound is obtained ("Ω(1)") and examples where the tool fails to return any bound ("Fail").
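For illustration, a cost relation system of this kind, expressed only over its input variable, might look as follows (a schematic, made-up instance; the concrete input syntax of the tools differs):

    C(n) = 0             {n <= 0}
    C(n) = 1 + C(n')     {n >= 1, n' = n - 1}

No output variables occur here, so no input-output relations are needed and the system can be fed to both tools as-is; its cost is exactly max(n, 0), so the tight upper and lower bounds are both linear.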

For lower bounds, CoFloCo was executed with the options -v 3 -compute_ubs no -conditional_lbs -stats (see Appendix A for a description of each option). It is worth pointing out that the option -conditional_lbs generates piece-wise defined lower bound functions (see Section 6.9), and the complexity of a piece-wise lower bound is defined here as the maximum complexity appearing in any of its partitions. This, together with the refinement, contributed greatly to the precision of CoFloCo. CoFloCo obtained a better result than PUBS-M (a higher complexity order) in 74 examples. In contrast, PUBS-M obtained better bounds in 4 examples. In 2 of these examples, PUBS-M obtained an exponential and an n log(n) bound, which are not yet supported by CoFloCo. The other 2 examples are instances of a class of problems that is discussed in the future work chapter (Chapter 10).
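As a made-up example of this convention (not one of the benchmark results), consider a conditional lower bound with two partitions:

    lb(n, m) = n²    {m >= n}
    lb(n, m) = n     {0 <= m < n}

Its complexity is taken to be Ω(n²), the maximum of Ω(n²) and Ω(n) over the two partitions.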

7.2.2 Loopus’s Real World Experimental Evaluation

In the recent work [SZV17], an extensive experimental evaluation was conducted. In that evaluation, 1659 functions from a compiler optimization benchmark (cBench)⁹ were analyzed. This benchmark contains a total of 1027 different C files with 211892 lines of code. Table 7.3 contains a replication of this

⁹ http://ctuning.org/wiki/index.php/CTools:CBench

Figure 7.1.: Analysis time histogram of the real world experimental evaluation: Number of examples in each time range for each tool (CoFloCo, PUBS-A, Loopus, KoAT; horizontal axis: Time(sec), logarithmic; vertical axis: #Examples).

evaluation (with 1650 examples)¹⁰ with the tools CoFloCo, KoAT, and PUBS-A. The tools C4B and Rank could not be evaluated on this benchmark, as they support only a limited subset of C. This benchmark contains bigger examples, so in this case Llvm2kittel was called with the additional parameters -multi-pred-control and -only-loop-conditions, which perform slicing and simplify the generated ITS.

CoFloCo was also executed with the additional parameter -compress_chains 2. The rest of the setup was as in the previous evaluations.

Examining the results made it evident that CoFloCo, KoAT, and PUBS-A failed to compute a bound in many examples because the translation using Llvm2kittel does not consider structs and simple pointer references, whereas these elements are handled better by Loopus. In order to isolate the effect of the translation, the examples generated by Llvm2kittel were translated back into C programs¹¹ and Loopus was executed on the resulting programs. This corresponds to the row Loopus * in Table 7.3.
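The kind of C code affected by this limitation looks roughly like the following sketch (illustrative names, not one of the benchmark functions): the loop bound lives behind a struct field accessed through a pointer, which a C-level analyzer such as Loopus can track, while a translation that abstracts such accesses loses the bound.

    struct buf {
        int len;     /* the loop bound is stored in a struct field */
        int *data;
    };

    /* The bound s->len is reached through a pointer dereference.
       If the ITS translation replaces s->len by an unknown value,
       no downstream tool can bound the loop, even though it is
       trivially linear at the C level. */
    int sum(const struct buf *s) {
        int acc = 0;
        for (int i = 0; i < s->len; i++)
            acc += s->data[i];
        return acc;
    }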

The results indicate that the translation plays a major role. Factoring out the translation, Loopus *, KoAT, and CoFloCo report similar results in terms of the number of examples analyzed successfully. CoFloCo is better in more examples, but Loopus is considerably faster.

The analysis times are not uniformly distributed. On the contrary, most of the tools worked reasonably fast for most of the examples and took a long time for a few of them. Therefore, in addition to the average and median times, a histogram of the analysis times is reported in Figure 7.1. The horizontal axis contains the analysis times on a logarithmic scale (except for the last interval, 58-60, which is used for timeouts) and the vertical axis contains the number of examples analyzed in each time range. Here the differences between the tools become more evident. Loopus analyzed most of the examples (1485) in

¹⁰ Some examples were excluded because the translation tools failed.

¹¹ Using the script available at https://github.com/s-falke/kittel-koat.


Table 7.4.: Replication of the challenging loop patterns experimental evaluation of [SZV17]: The number of examples with a given complexity, failed, or timed out (T/O). The number of examples in which each tool reports a tight bound (Tight) or a finite over-approximation (Ov). The average analysis times per example in seconds with (w TO) and without (w/o TO) timeouts, and the median analysis time (Md.).

                # Examples (Total 23)                           Time(s)
Tool      O(n)  O(n²)  O(n³)  O(n⁴)  Fail  T/O  Tight  Ov   w TO  w/o TO    Md.
CoFloCo     14      6      1      1     0    1     20   2  17.57    4.73   1.33
PUBS-A       6      3      0      1    12    1      8   2  17.30    4.45   1.89
Loopus      16      5      0      2     0    0     21   2   0.09    0.09   0.05
KoAT         7      9      3      0     1    3     10   9  62.20   26.51   9.98

less than 0.5 seconds, whereas CoFloCo took less than 2 seconds for 1512 examples and KoAT had more examples in the higher time intervals (it needed more than 2 seconds for 509 examples). PUBS-A was slightly faster than CoFloCo in many examples, but it also timed out more often (63 versus 14). Note that this distribution is likely to reflect the fact that, in practice, most functions are small and simple and only a few of them are really complex.

7.2.3 Loopus’s Challenging Loop Patterns Evaluation

The work [SZV17] also contains a selection of 23 challenging integer loop patterns, taken from the previous and other benchmarks, that exhibit amortized cost; a sketch of a typical pattern of this kind is shown below. Table 7.4 contains a replication of this evaluation with the latest version of CoFloCo. In this case, the timeout is set to 300 seconds. CoFloCo was run with the additional options -compress_chains 1 and -n_candidates 2. The rest of the tools were run with the same options as in the previous evaluation. This table does not include a column for programs with constant cost, as all examples have at least linear complexity. Additionally, Table 7.4 does not contain a comparison with CoFloCo. Instead, the columns Tight and Ov indicate the number of examples in which each tool computed an asymptotically tight bound or an over-approximation. Note that the columns Tight and Ov add up to the number of examples in which the tool returns a finite bound, that is, the cases where it neither failed nor timed out.
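The following C sketch (a classic pattern of this kind, not necessarily one of the 23 examples) shows what makes these loops challenging: the inner loop looks quadratic when the nested loops are bounded independently, but its total cost over the whole run is linear.

    /* The inner while loop runs at most n times in total across all
       outer iterations, because j never decreases: the amortized
       bound is O(n). A tool without amortized reasoning multiplies
       the two loop bounds and reports O(n²). */
    void scan(int n) {
        int j = 0;
        for (int i = 0; i < n; i++) {
            while (j < i)
                j++;
        }
    }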

CoFloCo obtained a bound for all examples except one, in which it times out. This example has many nested ifs inside loops, which result in a very high number of paths. In fact, CoFloCo timed out while performing the preprocessing (Chapter 4). Of the examples where CoFloCo obtained a bound, it over-approximated two, the same number as Loopus. However, the examples that were over-approximated by CoFloCo and Loopus are different. These results are significantly better than those of PUBS-A and KoAT.
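The structure that causes the timeout resembles the following sketch (illustrative, not the actual benchmark): k independent conditionals inside a loop body give 2^k control-flow paths per iteration, which a path-sensitive preprocessing must enumerate.

    /* With three independent ifs, each loop iteration already has
       2³ = 8 paths; the benchmark that times out has many more.
       Path-sensitive refinement enumerates combinations of these
       branches, so its work grows exponentially in the number of
       ifs. */
    void many_paths(int n, int a, int b, int c) {
        for (int i = 0; i < n; i++) {
            if (a > i) a--;
            if (b > i) b--;
            if (c > i) c--;
        }
    }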

Note that KoAT also found a bound for most examples, but it was unable to obtain amortized bounds. Consequently, it over-approximated the complexity in more examples.

In this case, the timeout is much higher (300 seconds instead of 60), so in addition to the average times, the average times without counting timeouts are included. The behavior of the tools in terms of analysis times is similar to the previous evaluations: Loopus was the fastest, KoAT the slowest, and CoFloCo and PUBS-A were in between with similar analysis times.