
Outputs of the Saturdays and Sundays samples

In the document Identifying dependencies among delays (pages 99-103)

Numerical Results


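Regional Trains - weekdays

| Corr. | α | W−195: # virtual | Mon-Wed-Fri: # virtual | Mon-Wed-Fri: % | Tue-Thu: # virtual | Tue-Thu: % | Monday: # virtual | Monday: % |
|---|---|---|---|---|---|---|---|---|
| - | 0.01 | 41 | 40 | 97.6 | 21 | 51.2 | 17 | 41.5 |
| - | 0.05 | 67 | 66 | 98.5 | 41 | 61.2 | 32 | 47.8 |
| - | 0.10 | 112 | 91 | 81.3 | 63 | 56.3 | 52 | 46.4 |
| B | 0.01 | 12 | 10 | 83.3 | 2 | 16.7 | 1 | 8.3 |
| B | 0.05 | 12 | 11 | 91.7 | 2 | 16.7 | 1 | 8.3 |
| B | 0.10 | 13 | 12 | 92.3 | 2 | 15.4 | 1 | 7.7 |
| BH | 0.01 | 17 | 16 | 84.1 | 4 | 23.5 | 1 | 5.9 |
| BH | 0.05 | 21 | 18 | 85.7 | 7 | 33.3 | 3 | 14.3 |
| BH | 0.10 | 24 | 19 | 79.2 | 9 | 37.5 | 3 | 12.5 |

Table 4.15: Comparison of virtual edges for the samples W−195, MWF−117, TT−78 and Mo−39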
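The percentages in Table 4.15 can be read as the number of virtual edges found in a reduced sample divided by the number found in the reference sample W−195. The following minimal sketch reproduces the Monday column for the rows without correction; the values are copied from the table, and the function name is only illustrative.

```python
# Virtual-edge counts from the no-correction rows of Table 4.15,
# keyed by the significance level alpha.
reference = {0.01: 41, 0.05: 67, 0.10: 112}   # sample W-195
monday = {0.01: 17, 0.05: 32, 0.10: 52}       # sample Mo-39

def recovered_share(found, ref):
    """Percentage of the reference virtual edges found in the smaller sample."""
    return round(100.0 * found / ref, 1)

shares = {alpha: recovered_share(monday[alpha], reference[alpha])
          for alpha in reference}
print(shares)
```

The same function applies unchanged to the Mon-Wed-Fri and Tue-Thu columns.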

When no statistical correction is applied, the percentage of highlighted edges is less regular than in the other cases, but it is remarkable that with just one fifth of the original data the Tri-graph is able to point out roughly half of the “reference virtual edges”. To check whether this result is general or merely due to the chosen sample, we also tested the Tri-graph on the other weekdays (Tuesday to Friday). We refer to these samples as Tu−39, We−39, Th−39 and Fr−39. This was the easiest rule for selecting other samples of the same size as the “Monday sample”.

The comparison among the different “day samples” is summarized in Table 4.13. Without statistical correction, the percentage of edges pointed out by each of the five samples is approximately fifty percent.

However, we are not interested in the sheer number of edges that the Tri-graph highlights. Principally, we want to consider the “virtual” connections that are pointed out (these are in fact the edges that will be transformed into new constraints for the timetable problem). Hence we also compared the results concerning the “virtual” activities in Table 4.14.

With the sole exception of Tuesday, which also gave the smallest percentage regarding the total number of edges, the percentage of virtual connections highlighted on the single weekdays is around 40%. This is a really good outcome, since we are using just 20% of the original data.

These results allow us to state, with cautious optimism, that the output obtained with the first strategy, i.e. W−30, is also close to half of what we expect to have.

4.4 Outputs of the Saturdays and Sundays samples

For the record, we include the outputs of the Tri-graph for the holiday timetable (Saturdays and Sundays), where we considered:

• regional trains

• traveling every holiday (39 data)

• in the Harz area.

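Regional Trains - Saturday - 39 data, edges by activity

| Correction | Quantile α | Nr. edges # | wait # | wait % | drive # | drive % | drive2 # | drive2 % | virtual # | virtual % | error # | error % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0.01 | 327 | 149 | 45.6 | 58 | 17.7 | 36 | 11.0 | 25 | 7.6 | 59 | 18.0 |
| - | 0.05 | 517 | 161 | 31.1 | 88 | 17.0 | 60 | 11.6 | 45 | 8.7 | 163 | 31.5 |
| - | 0.10 | 665 | 164 | 24.7 | 97 | 14.6 | 86 | 12.9 | 61 | 9.2 | 257 | 38.6 |
| B | 0.01 | 111 | 87 | 78.4 | 19 | 17.1 | 2 | 1.8 | 3 | 2.7 | 0 | 0.0 |
| B | 0.05 | 121 | 93 | 76.9 | 21 | 17.4 | 3 | 2.5 | 3 | 2.5 | 1 | 0.8 |
| B | 0.10 | 126 | 97 | 77.0 | 21 | 16.7 | 3 | 2.4 | 3 | 2.4 | 2 | 1.6 |
| BH | 0.01 | 146 | 106 | 72.6 | 24 | 16.4 | 4 | 2.7 | 7 | 4.8 | 5 | 3.4 |
| BH | 0.05 | 166 | 112 | 67.5 | 28 | 16.9 | 6 | 3.6 | 10 | 6.0 | 10 | 6.0 |
| BH | 0.10 | 179 | 117 | 65.4 | 31 | 17.3 | 8 | 4.5 | 11 | 6.1 | 12 | 6.7 |

Possible edges: 152 076

Table 4.16: Output of the Tri-graph for the sample Sa−39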
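Each row of Table 4.16 splits the highlighted edges into five categories, and the percentages are shares of the row total. The following sketch checks this for the first row (no correction, α = 0.01) and relates the row total to the number of possible edges; all values are copied from the table, and the variable names are only illustrative.

```python
# Row "no correction, alpha = 0.01" of Table 4.16 (sample Sa-39).
edges = 327
by_type = {"wait": 149, "drive": 58, "drive2": 36, "virtual": 25, "error": 59}

# The five categories partition the highlighted edges.
assert sum(by_type.values()) == edges

# Share of each category in percent, rounded to one decimal as in the table.
shares = {name: round(100.0 * count / edges, 1)
          for name, count in by_type.items()}

# Fraction of all possible edges that is highlighted in this row.
possible_edges = 152076
density = edges / possible_edges
print(shares, f"{density:.4%}")
```

Even without any correction, the highlighted edges are a very small fraction of the possible ones.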

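Regional Trains - Sunday - 78 data, edges by activity

| Correction | Quantile α | Nr. edges # | wait # | wait % | drive # | drive % | drive2 # | drive2 % | virtual # | virtual % | error # | error % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0.01 | 366 | 154 | 42.1 | 77 | 21.0 | 39 | 10.7 | 27 | 7.4 | 69 | 18.9 |
| - | 0.05 | 558 | 165 | 29.6 | 107 | 19.2 | 62 | 11.1 | 53 | 9.5 | 171 | 30.6 |
| - | 0.10 | 755 | 170 | 22.5 | 119 | 15.8 | 90 | 11.9 | 74 | 9.8 | 302 | 40.0 |
| B | 0.01 | 116 | 87 | 75.0 | 19 | 16.4 | 2 | 1.7 | 3 | 2.6 | 5 | 4.3 |
| B | 0.05 | 130 | 97 | 74.6 | 20 | 15.4 | 4 | 3.1 | 3 | 2.3 | 6 | 4.6 |
| B | 0.10 | 137 | 102 | 74.5 | 22 | 16.1 | 4 | 2.9 | 3 | 2.2 | 9 | 4.4 |
| BH | 0.01 | 164 | 113 | 68.9 | 28 | 17.1 | 7 | 4.3 | 7 | 4.3 | 9 | 5.5 |
| BH | 0.05 | 190 | 122 | 64.2 | 36 | 18.9 | 10 | 5.3 | 10 | 5.3 | 12 | 6.3 |
| BH | 0.10 | 200 | 125 | 62.5 | 40 | 20.0 | 11 | 5.5 | 10 | 5.0 | 14 | 7.0 |

Possible edges: 176 715

Table 4.17: Output of the Tri-graph for the sample Su−39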

We have two sets of events with cardinality:

• 551 for Saturday;

• 594 for Sunday.

The sample corresponding to Saturday (Sa−39) is characterized by:

• 206 waiting activities;

• 206 “driving activities”;

and the one corresponding to Sunday (Su−39) is characterized by:

• 221 waiting activities;

• 223 “driving activities”.

We again use quotation marks for the driving activities, since some events of the chain are missing.

| Sample | # events | # waiting | # driving |
|---|---|---|---|
| Sa−39 | 551 | 206 | 206 |
| Su−39 | 594 | 221 | 223 |

Table 4.18: Characteristics of the samples Sa−39 and Su−39
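As an arithmetic cross-check, the “possible edges” totals in Tables 4.16 and 4.17 coincide with n(n+1)/2 for the event counts n listed in Table 4.18. The text does not state how these totals were derived, so the formula in the following sketch is an assumption that merely matches the reported figures; the function name is illustrative.

```python
def pair_count_with_diagonal(n):
    """Number of unordered pairs {i, j} with i <= j among n events: n(n+1)/2."""
    return n * (n + 1) // 2

events = {"Sa-39": 551, "Su-39": 594}          # Table 4.18
reported = {"Sa-39": 152076, "Su-39": 176715}  # Tables 4.16 and 4.17

computed = {sample: pair_count_with_diagonal(n) for sample, n in events.items()}
print(computed == reported)
```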

Concerning these two samples, the Tri-graph method is able to capture, in the best cases, around 76% of the waiting activities, but not more than 50% of the driving activities.

4.5 Conclusion

In this chapter we considered different samples of data built from the files provided by Deutsche Bahn, within the framework of the DisKon project, to the AG-optimization group at NAM. We tested them with the stochastic approaches presented in Chapter 3: Contingency Table, Full Conditional Independence Graph, Covariance Graph and Tri-graph. These approaches were generally tested without a multiple-testing correction; only in the case of the Tri-graph were two possible corrections considered: Bonferroni and Benjamini-Hochberg.
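For orientation, the two corrections can be sketched in their textbook form. This is not necessarily the implementation used for the Tri-graph, whose details are given in Chapter 3; the p-values below are invented purely for illustration.

```python
def bonferroni(pvalues, alpha):
    """Reject hypothesis i when p_i <= alpha / m (family-wise error control)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha):
    """Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m
    and reject the k smallest p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical p-values of eight edge tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
alpha = 0.05
n_plain = sum(p <= alpha for p in pvals)
n_bonf = sum(bonferroni(pvals, alpha))
n_bh = sum(benjamini_hochberg(pvals, alpha))
print(n_plain, n_bonf, n_bh)
```

On this toy input the uncorrected test keeps the most edges, Benjamini-Hochberg fewer and Bonferroni the fewest, which is the same ordering as the edge counts in Tables 4.15 to 4.17.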

The outputs of the different methods (in the absence of multiple-testing corrections) were compared to get a first impression of their power.

The Contingency Table test was designed to identify independencies between variables, and it seems weak in identifying (more complex) dependencies, principally because it suffers strongly from the transitivity property. The “new” version suggested in Chapter 3 was not able to improve on the classical formulation. Consequently, a direct comparison with the Tri-graph was not possible. Only its classical formulation could be compared with the Covariance Graph.

The Full Conditional Independence Graph could not be applied either, since the correlation matrices corresponding to the samples are strongly linearly dependent. The imprecise measurement of the delays (exact only to the minute, not to the second) and the small slack times associated with the waiting activities make it impossible to evaluate the precision matrix on which the FCIG is based.
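One way such a linear dependence can arise is sketched below under simplifying assumptions: if the slack of a waiting activity is so small that the arrival and departure delays coincide once rounded to whole minutes, the two columns are identical and the correlation matrix has no inverse. The data and all names are hypothetical.

```python
from statistics import mean

def corr(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical delays in whole minutes for three events over seven days.
arrival = [0, 2, 5, 1, 3, 0, 4]
departure = list(arrival)   # tiny slack: identical after rounding to minutes
other = [1, 0, 2, 3, 0, 1, 2]

series = [arrival, departure, other]
R = [[corr(x, y) for y in series] for x in series]
det_R = det3(R)
print(det_R)  # numerically zero, so R cannot be inverted
```

Since the precision matrix is the inverse of this matrix, it cannot be evaluated in such a case.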

The Covariance Graph was easily applicable to the samples, but the procedure suffers, like the Contingency Table test, from the transitivity property (even if in a weaker form); hence its output consists of huge sets of edges. If we transformed all these edges into virtual activities for the capacity model (as explained in Section 2.11), the problem would contain a large number of dominated, i.e. unnecessary, constraints.
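The transitivity effect can be illustrated on simulated data. In the following sketch a delay is passed along a hypothetical chain X → Y → Z; the marginal correlation between X and Z is large although Z depends on X only through Y, while the partial correlation given Y is close to zero. The chain, the noise levels and the sample size are assumptions made for illustration, and the partial correlation is a standard tool shown here only to expose the effect, not the test used by any of the methods above.

```python
import random
from statistics import mean

def corr(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(r_xz, r_xy, r_yz):
    """Correlation of X and Z after removing the linear effect of Y."""
    return (r_xz - r_xy * r_yz) / ((1 - r_xy ** 2) * (1 - r_yz ** 2)) ** 0.5

rng = random.Random(0)
n = 5000
# Each delay is inherited from its predecessor plus independent noise.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [a + rng.gauss(0, 0.5) for a in x]
z = [b + rng.gauss(0, 0.5) for b in y]

r_xy, r_yz, r_xz = corr(x, y), corr(y, z), corr(x, z)
p_xz = partial_corr(r_xz, r_xy, r_yz)
print(round(r_xz, 2), round(p_xz, 2))
```

A graph built from marginal covariances alone would therefore add the edge X-Z, which is exactly the kind of dominated constraint described above.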

The Tri-graph procedure is capable of identifying edges selectively and of avoiding most of the edges that can be attributed to the transitivity property. Moreover, if the samples contain enough data (e.g. W−195), it catches almost all the waiting activities and most of the driving activities. When the amount of available data is reduced to one fifth of the original size, it is still capable of catching 40% of the original virtual activities.
