The DisKon Project - Numerical Results - Identifying dependencies among delays

Numerical Results



The value of an idea lies in the using of it.

When I have fully decided that a result is worth getting, I go ahead of it and make trial after trial until it comes.

Just because something doesn’t do what you planned it to do doesn’t mean it’s useless.

THOMAS ALVA EDISON (1847-1931) American inventor and businessman

Computational issues must be resolved in order to prove the practical value of the Tri-graph approach in identifying dependencies among train delays. The properties of the method presented so far would remain incomplete for the practical problem if the method were either too difficult to implement or unable to point out the critical points of the railway network.

4.1 The DisKon Project

For years, Deutsche Bahn (DB) has been collecting data about disturbances of the timetable, building up a large data set. This information allows the system to be studied as a whole, from both a theoretical and a practical point of view. The aim of the research is to find a way to improve the timetable by minimizing the impact of occurring delays on the average delay of the passengers. In particular, one of the projects on which DB has embarked is DisKon (Disposition und Konfliktlösungsmanagement für die beste Bahn, i.e. dispatching and conflict resolution management for the best railway). The challenge of the project is to model a system that is able to recognize conflicts and to solve them in order to save resources and avoid discrimination. Several universities, among them the Georg-August-Universität Göttingen, are collaborating on this project. The optimization group, led by Prof. Schöbel at the Institute for Numerical and Applied Mathematics (NAM), considers a macro-formulation of the problem as a linear integer program (LIP). The research focuses both on a one-criterion formulation based on the average delay of the passengers and on a bicriteria optimization based on the delay of the vehicles on one side and the total number of missed connections on the other. A micro-model is used to check the feasibility of the solutions.

Within the framework of this collaboration between DB and the NAM Institute, the contribution of this thesis to the project is the analysis of the dependencies among train delays in order to identify the critical points of the railway system. These dependencies are then transformed into constraints of the macro-formulation of the LIP in order to improve the robustness of the macro-solution. This gave us the opportunity to test the Tri-graph method on real data.

4.1.1 The raw data

In autumn 2005, Deutsche Bahn made available to the Optimization Group at NAM a set of files containing measurements of actual departure and arrival times of regional trains at the following stations:

• Bad Harzburg;

• Goslar;

• Herzberg;

• Oker;

• Salzgitter-Ringelheim;

• Seesen;

• Vienenburg;

• Wolfenbüttel.

These stations are located in the Harz region, a mountain range in northern Germany that straddles the border between the states of Lower Saxony, Saxony-Anhalt and Thuringia (see Figure 4.1 and Figure 4.2).

Figure 4.1: Harz area

The data are collected in tabular files, each of which refers to the arrivals and departures at one of the stations listed above within a three-month period. Altogether the files correspond to a time window of nine months, between Saturday 1st January and Friday 30th September 2005 (exactly 39 weeks).

Figure 4.2: Location of the considered stations

The tabular files contain information concerning:

• the identifying abbreviation of the station (e.g. HBHA for “Hauptbahnhof Bad Harzburg”);

• the kind of event (i.e. Arrival or Departure);

• the day on which the event has been measured;

• the class of the train (i.e. a number specifying the type of the train: ICE, EC, IC, RB, ...);

• the subclass of the train (i.e. a number specifying whether the train is regular or special, whether it transports passengers or is empty, ...);

• the identification number of the train;

• the scheduled time of the event;

• the measured (“real”) time at which the event took place;

• the delay, in minutes, of the event (i.e. the difference between the measured and the scheduled time of an event);

• the direction in which the train is traveling (i.e. its next scheduled station).

A peculiarity of the files is that the first half contains all the arrival events and the second half all the departure events; both groups are ordered chronologically.
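The fields listed above can be pictured as one record per event. The following Python sketch is only an illustration: the class name `Event`, the field names, the one-letter encoding of the event kind, the helper `file_order` and all sample values are assumptions, since the concrete file layout is not specified here. The day of the event is folded into the two timestamps.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    station: str         # identifying abbreviation of the station, e.g. "HBHA"
    kind: str            # "A" for arrival, "D" for departure (encoding assumed)
    train_class: int     # numeric code for the type of train
    train_subclass: int  # numeric code: regular/special, with passengers/empty
    train_id: int        # identification number of the train
    scheduled: datetime  # scheduled time of the event
    measured: datetime   # measured ("real") time of the event
    direction: str       # next scheduled station

    @property
    def delay_minutes(self) -> int:
        """Measured minus scheduled time, in whole minutes (rounded down)."""
        return int((self.measured - self.scheduled).total_seconds() // 60)


def file_order(events):
    """Sort as in the files: all arrivals first, then all departures,
    each group in chronological order of the scheduled time."""
    return sorted(events, key=lambda e: (e.kind != "A", e.scheduled))


# Hypothetical records, not taken from the data set.
departure = Event("HBHA", "D", 1, 0, 14001,
                  datetime(2005, 1, 3, 8, 15), datetime(2005, 1, 3, 8, 18), "Goslar")
arrival = Event("HBHA", "A", 1, 0, 14001,
                datetime(2005, 1, 3, 8, 13), datetime(2005, 1, 3, 8, 13), "Goslar")
ordered = file_order([departure, arrival])
```

Keeping the scheduled and measured times as full timestamps makes the delay a derived quantity, so it can be recomputed at a finer resolution than the minute value stored in the files.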

4.1.2 Working with the data

As we started to select the events that could be considered as input for the Tri-graph approach, we noticed some unexpected peculiarities of the data files.

First, we observed that the records in the files correspond to different classes of trains (mainly local and freight trains) traveling, with or without passengers or cargo, on the considered track system. Freight trains and empty local trains (i.e. local trains without passengers that have to be moved from one station to another for logistic or maintenance reasons) do not have to follow a fixed timetable: they are allowed to proceed on a track whenever it is free, or they are forced to wait longer in a station to avoid disturbing the regular traffic. The variance of their traveling times is very large, hence they are very difficult to model. Therefore, when choosing the sample for the Tri-graph approach, we decided to consider only events corresponding to passenger trains.

Next, we noticed that the measurements are precise only up to the minute. The departures and arrivals of some trains are correctly registered in the hour and minute fields, but the seconds field takes only a few of its 60 possible values. Many trains apparently always arrive or depart exactly on time, i.e. the difference in seconds between the registered and the scheduled time is constantly zero. A constant value of the seconds field is also registered in case of delay, regardless of its magnitude in minutes. Nevertheless, we decided to calculate the delays in seconds, instead of in minutes as given in the files, in order to profit from any additional information.
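A small Python sketch of this check, under stated assumptions: the function names `delay_seconds` and `seconds_field_histogram` and the three timestamps are made up for illustration. The histogram counts which values of the seconds field actually occur; a field that is precise only up to the minute shows very few distinct values.

```python
from collections import Counter
from datetime import datetime


def delay_seconds(scheduled: datetime, measured: datetime) -> int:
    """Delay as measured minus scheduled time, in seconds."""
    return int((measured - scheduled).total_seconds())


def seconds_field_histogram(measured_times):
    """Count how often each value (0..59) of the seconds field occurs."""
    return Counter(t.second for t in measured_times)


# Hypothetical observations of one departure event on three days.
scheduled = [datetime(2005, 1, 3, 8, 15, 0),
             datetime(2005, 1, 4, 8, 15, 0),
             datetime(2005, 1, 5, 8, 15, 0)]
measured = [datetime(2005, 1, 3, 8, 15, 0),
            datetime(2005, 1, 4, 8, 18, 0),
            datetime(2005, 1, 5, 8, 16, 30)]

delays = [delay_seconds(s, m) for s, m in zip(scheduled, measured)]
histogram = seconds_field_histogram(measured)
```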

Then we noted that at two stations (Oker and Wolfenbüttel) some events were registered in an unusual way: the same scheduled time was given for both the arrival and the departure of some trains. The staff of Deutsche Bahn confirmed that these trains were not scheduled to stop at these stations. We therefore treated the data related to the arrival of these trains (i.e. the ones reported in the first part of the data files) as intermediate measurements of the delays and marked these events as “special”, defining a (sort of) third kind of event.
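The relabeling can be sketched as follows in Python. This is a minimal illustration under assumptions: records are plain dictionaries, the kind codes "A", "D" and "S" (for “special”) and the function name `mark_pass_through` are hypothetical, and the matching departure record is simply left untouched, since the text does not say how it was handled.

```python
from datetime import datetime


def mark_pass_through(events):
    """Relabel an arrival as "S" when the same train has a departure
    with an identical scheduled time at the same station."""
    departures = {(e["station"], e["train_id"], e["scheduled"])
                  for e in events if e["kind"] == "D"}
    marked = []
    for e in events:
        key = (e["station"], e["train_id"], e["scheduled"])
        if e["kind"] == "A" and key in departures:
            marked.append({**e, "kind": "S"})  # copy with the new kind
        else:
            marked.append(e)
    return marked


# Hypothetical records: train 1 passes through Oker and stops in Goslar.
events = [
    {"station": "Oker", "train_id": 1, "kind": "A",
     "scheduled": datetime(2005, 1, 3, 9, 0)},
    {"station": "Oker", "train_id": 1, "kind": "D",
     "scheduled": datetime(2005, 1, 3, 9, 0)},
    {"station": "Goslar", "train_id": 1, "kind": "A",
     "scheduled": datetime(2005, 1, 3, 9, 10)},
    {"station": "Goslar", "train_id": 1, "kind": "D",
     "scheduled": datetime(2005, 1, 3, 9, 12)},
]
kinds = [e["kind"] for e in mark_pass_through(events)]
```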

Moreover, counting the occurrences of every event over the whole time window, we noticed double registrations of the same event. For example, events corresponding to trains traveling every working day (in the considered time window of 39 weeks) occurred more than 195 times, where 195 = 39 · 5 (5 working days per week). In these cases we decided to consider only the first registration of the event per day and to treat the following ones as repetitions, even if their measured delays were not exactly the same.
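The rule of keeping only the first registration per event and day can be sketched in Python as below. The tuple layout `(event_id, day, delay)`, the function name `keep_first_per_day` and the sample records are assumptions made for illustration; the records are assumed to arrive in file order.

```python
from datetime import date

EXPECTED_WEEKDAY_OCCURRENCES = 39 * 5  # 39 weeks, 5 weekdays each


def keep_first_per_day(records):
    """Keep the first record of each (event, day) pair and drop later
    repetitions, even when their measured delays differ."""
    seen = set()
    kept = []
    for event_id, day, delay in records:
        if (event_id, day) not in seen:
            seen.add((event_id, day))
            kept.append((event_id, day, delay))
    return kept


# Hypothetical records: the event "e1" is registered twice on 3 January.
records = [
    ("e1", date(2005, 1, 3), 120),
    ("e1", date(2005, 1, 3), 180),  # repetition with a different delay
    ("e1", date(2005, 1, 4), 0),
]
cleaned = keep_first_per_day(records)
```

After this step no event can occur more often than once per day, so an event observed on every weekday of the window has exactly 195 records.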

In contrast with the previous point, for other events we had to face missing registrations, either of some fields of the file or of the whole event. We decided not to generate any artificial data, to avoid possible influences on the identification of the dependencies, and we did not consider these events as suitable samples.
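A minimal Python sketch of discarding incomplete registrations instead of imputing them. The field names in `REQUIRED_FIELDS`, the dictionary representation and the sample records are hypothetical; which fields were actually required is not stated in the text.

```python
# Assumed set of fields that must be present for a record to be usable.
REQUIRED_FIELDS = ("station", "kind", "day", "train_id", "scheduled", "measured")


def is_complete(record: dict) -> bool:
    """True if every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def drop_incomplete(records):
    """Discard incomplete records; no artificial values are generated."""
    return [r for r in records if is_complete(r)]


# Hypothetical records: the second one lacks its measured time.
records = [
    {"station": "HBHA", "kind": "D", "day": "2005-01-03", "train_id": 14001,
     "scheduled": "08:15:00", "measured": "08:18:00"},
    {"station": "HBHA", "kind": "D", "day": "2005-01-04", "train_id": 14001,
     "scheduled": "08:15:00", "measured": ""},
]
usable = drop_incomplete(records)
```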

Finally, it is important to highlight that the timetable on weekends and holidays is slightly different from the one during the week, so we focused our attention mainly on the working days (i.e. from Monday to Friday), excluding all holidays (Saturdays and Sundays have been considered separately, see Section 4.4).

For the record, we considered the following days as holidays (based on the Lower Saxony calendar):


• Saturday 1st January 2005 (New Year’s Day);

• Friday 25th March 2005 (Good Friday);

• Sunday 27th March 2005 (Easter);

• Monday 28th March 2005 (Easter Monday);

• Sunday 1st May 2005 (Labour Day);

• Thursday 5th May 2005 (Ascension Day);

• Sunday 15th May 2005 (Whit Sunday);

• Monday 16th May 2005 (Whit Monday).
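The calendar described above can be reproduced with a few lines of Python. The sketch below is illustrative only: the names `START`, `END`, `HOLIDAYS` and `is_working_day` are assumptions, while the dates are the ones listed above.

```python
from datetime import date, timedelta

START, END = date(2005, 1, 1), date(2005, 9, 30)

# Holidays listed above (Lower Saxony, January to September 2005).
HOLIDAYS = {
    date(2005, 1, 1), date(2005, 3, 25), date(2005, 3, 27), date(2005, 3, 28),
    date(2005, 5, 1), date(2005, 5, 5), date(2005, 5, 15), date(2005, 5, 16),
}


def is_working_day(d: date) -> bool:
    """Monday to Friday (weekday() 0..4) and not a listed holiday."""
    return d.weekday() < 5 and d not in HOLIDAYS


all_days = [START + timedelta(days=i) for i in range((END - START).days + 1)]
weekdays = [d for d in all_days if d.weekday() < 5]
working_days = [d for d in all_days if is_working_day(d)]

# Weekday indices (0 = Monday) on which a listed holiday falls.
holiday_weekdays = sorted({d.weekday() for d in HOLIDAYS if d.weekday() < 5})
```

With these dates the window has 273 days, i.e. exactly 39 weeks, and 195 days from Monday to Friday. The listed holidays that fall on a weekday fall only on Mondays, Thursdays and Fridays, so no Tuesday or Wednesday is affected.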

In summary, we considered as possible trains for the sample:

• local trains

• transporting passengers

• traveling on weekdays

• in the Harz area

• in a time window of 9 months (January-September 2005).

In accordance with this list, we proceeded with two different strategies: on the one hand we maximized the number of events we could consider while having a reasonable amount of data for each of them; on the other hand we maximized the number of occurrences per event while keeping a reasonable chain of events.

Concerning the first strategy, we consider a data set with a cardinality of 30 per event to be reasonable. We found a sample of 928 events corresponding to 229 trains, which consists of 358 waiting activities and 339 “driving” activities (the quotation marks are used since some events in the chain are missing). Since no holidays fall on a Tuesday or Wednesday, we mainly focused our attention on those days when choosing this sample. From now on we will refer to this sample as W−30, an abbreviation of “weekdays - 30 observations”.

For the second strategy, in contrast, we wanted the set of events registered on every weekday of the available time window (i.e. 195 occurrences per event). We found a set of 440 events corresponding to 118 trains, which consists of 161 waiting activities and 152 “driving” activities (again the quotation marks refer to the absence of some events in the chain). From now on we will refer to this sample as W−195, an abbreviation of “weekdays - 195 observations”.

Sample   Events   Trains   Observations per event
W−30     928      229      30
W−195    440      118      195

Table 4.1: Characteristics of the samples W−30 and W−195
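Both strategies amount to counting the observations of each event and keeping the events that reach a threshold. The Python sketch below illustrates this idea only; the function name `select_events`, the pair layout `(event_id, day)` and the toy data are assumptions, and whether the thresholds were applied as "at least" or "exactly" is not stated in the text. The input is assumed to be already cleaned of repetitions and incomplete records.

```python
from collections import Counter


def select_events(observations, min_count):
    """Return the events with at least min_count observations.

    observations: iterable of (event_id, day) pairs, one per observation.
    """
    counts = Counter(event_id for event_id, _ in observations)
    return {event_id for event_id, n in counts.items() if n >= min_count}


# Toy data with small thresholds; the samples in Table 4.1 would correspond
# to min_count=30 and min_count=195 on the real data.
observations = [("e1", day) for day in range(5)] + [("e2", day) for day in range(2)]
many_events = select_events(observations, min_count=2)
long_series = select_events(observations, min_count=5)
```

A lower threshold admits more events with shorter series, while a higher threshold keeps fewer events with longer series, which is the trade-off between the two samples.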
