Effiziente Algorithmen für ein Fahrplanauskunftsystem


Universität Konstanz

Effiziente Algorithmen für ein Fahrplanauskunftsystem

Frank Schulz

Konstanzer Schriften in Mathematik und Informatik No. 110, January 2000

ISSN 1430–3558

© Fakultät für Mathematik und Informatik, Universität Konstanz

Fach D 188, 78457 Konstanz, Germany. Email: preprints@fmi.uni-konstanz.de

WWW: http://www.fmi.uni-konstanz.de/Schriften/

Konstanzer Online-Publikations-System (KOPS) URL: http://www.ub.uni-konstanz.de/kops/volltexte/2006/2085/

Effiziente Algorithmen für ein Fahrplanauskunftsystem

Efficient Algorithms for a Timetable Information System

Diploma thesis by Frank Schulz

Universität Konstanz

Fakultät für Mathematik und Informatik

December 1999

Abstract

Algorithms for timetable information systems are usually based on Dijkstra’s algorithm. The concrete scenario that we have in mind is a central information server in the realm of public railroad traffic on wide-area networks. Due to the large size of the underlying timetables the efficiency of a naive implementation is not acceptable in practice, so usually heuristics are used to improve the efficiency. Typically, using such heuristics means that the optimality of the solutions can no longer be guaranteed. In contrast, we investigate optimality-preserving speed-up techniques for Dijkstra’s algorithm. The basic question is whether algorithms that compute optimal solutions are competitive on contemporary computer technology. Therefore, we present the results of a computational study based on real-world data: the timetable that contains all German trains, and a snapshot of customer queries to an existing information server.

Acknowledgements

The author would like to thank Dorothea Wagner and particularly Karsten Weihe for their great support.


Chapter 1 Introduction

Imagine you want to travel by train from some station A to another station B. What is the best connection between these stations? This question is the basic motivation for the study.

The connection should be “the best.” This is a very subjective term and has to be defined. Your preference could for example be: What is, among the fastest connections leaving A between 10 and 11 o’clock, the one with the fewest train changes? This study will focus on the most fundamental kind of queries:

What is the fastest connection leaving A not earlier than a specified time?

It will soon be clear that, theoretically, the problem is not hard: Dijkstra’s algorithm solves it with a worst-case complexity of O(n log n), where n denotes the number of all departures in the timetable.

However, in practice it is not enough to have the guarantee of a good asymptotic worst-case behaviour. The absolute amount of time that is needed to solve queries is crucial.

We have a concrete scenario in mind: an information system for the timetable containing all German trains, which is operated on a central server. The ticket offices may be connected to that server, or it may be accessible through a web-interface. So, the system has to be able to answer a large number of on-line queries. The real-time restrictions are soft, which basically means that the average response time is more important than the maximum response time. Since the system is operated on a central server, space consumption is not an issue.

The average response time of exact solvers (e.g., the naive implementation of Dijkstra’s algorithm) seems to be unacceptable, so usually the problem is approached heuristically.

In contrast, we want to investigate whether optimality-preserving algorithms can cope with the requirements of a timetable information system, which basically means that the algorithms have to be efficient enough to guarantee the required low average response time.

As mentioned above we focus on one simple optimization criterion: the minimization of the traveltime.

The characteristics of the real-world data will be crucial for the success or failure of the algorithms that we will present. Hence, experiments with artificial data do not make much sense, and so our computational study will be based on real-world data (see [21] for a discussion of this problem).

Overview In Chapter 2 the problem is formally defined, and a graph model is introduced where fastest connections correspond to shortest paths. The design of that model is crucial because it is the basis for all algorithms. Then, some details of the real-world timetable and queries, which are later used for the experiments, are presented to give an impression of the characteristics of the input. The chapter ends with an overview of related work.

The next chapter describes the algorithms. It begins with a detailed description of Dijkstra’s algorithm and of two implementations of a priority queue. Fundamental extensions improving the efficiency are shown. Then, four optimality-preserving speed-up techniques are presented. The chapter concludes with considerations concerning the combination of the speed-up techniques.

In Chapter 4 the computational study is presented. The set-up of the experiments is described, and the results are shown. With that background, the four speed-up techniques are reviewed. Then, the results of the computational study are analysed in detail.

In the final remarks the results are summarized and an outlook concerning aspects of generalization is given.


Chapter 2 Foundations

2.1 Points in Time

The aspect of time is fundamental for this study. The smallest time unit is a minute. To describe points in time, one first day is determined such that all points in time that we want to describe do not occur before that first day. Then, the minutes past 0 o’clock of the first day are used to describe a point in time, mostly denoted by the letter t. Note that not all points in time that we will use refer to the same first day. Only points in time that refer to the same first day can be compared according to minutes past midnight, for example to calculate the time difference.

Sometimes, only the time of day is of interest. For example, 10 o’clock in the morning will be described by t = 600 or t = 600 + 24 · 60 = 2040. We will need a unique representation of a time of day: δ(t) denotes the smallest non-negative integer r such that r ≡ t (mod 1440).

Additionally, we define the daily-difference δ(t₁, t₂) of two points in time t₁ and t₂ to be the smallest non-negative integer r such that r ≡ t₂ − t₁ (mod 1440).

Imagine that your clock shows the current time of day t₁. The daily-difference δ(t₁, t₂) of t₁ and any other point in time t₂ is the minimum amount of time necessary to reach the time of day of t₂. For example, if t₁ = 2875 = 2·24·60 − 5 (i.e., 5 minutes to midnight, second day) and t₂ = 5 (i.e., 5 minutes past midnight, first day) then

δ(t₁, t₂) ≡ t₂ − t₁ ≡ 5 − 2875 ≡ −2870 ≡ 10 (mod 1440)

The time needed to reach the time of day of t₂ is therefore 10 minutes.
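The two operators can be sketched in a few lines of Python. The function names `time_of_day` and `daily_difference` are illustrative choices and do not appear in the thesis; the sketch relies on Python's `%` operator returning a non-negative result for a positive modulus.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def time_of_day(t: int) -> int:
    """delta(t): smallest non-negative r with r = t (mod 1440)."""
    return t % MINUTES_PER_DAY


def daily_difference(t1: int, t2: int) -> int:
    """delta(t1, t2): smallest non-negative r with r = t2 - t1 (mod 1440)."""
    return (t2 - t1) % MINUTES_PER_DAY
```

With the values from the example, `daily_difference(2875, 5)` evaluates to 10, and both 600 and 2040 map to the time of day 600.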

2.2 Problem Definition

As mentioned above, we want to focus on the most fundamental kind of queries concerning train connections. Additionally, we make another simplification for the problem definition:

Each train is assumed to operate daily. The consequences of these restrictions on the model and the algorithms will be discussed later. The restrictions make it possible to formulate the problem in a relatively easy way.

Of course, a train timetable is based on a set of stations and a set of trains. Each station has a geometric location, which can be specified by two coordinates. A train can be viewed as a sequence of departures and arrivals at all the stations the train passes.

First, the train leaves the start station, then it arrives at the next station, stays there several minutes, departs again, and so on. Finally, it arrives at the destination.

Definition

• A station is a triple (S, x, y) where S is a unique station ID, and (x, y) represents the geometric location of the station.

• A train tr consists of a list of pairs ((S₁, t₁), (S₂, t₂), . . . , (S_n, t_n)). A pair (S_i, t_i) represents a certain time t_i (in minutes past midnight of the first day) at a station with ID S_i. The pair with index i is denoted tr[i], the elements of that pair station(tr[i]) and time(tr[i]). The value l(tr) := n is called the trainlength of train tr. For a valid train the following statements must hold:

– n is even

– for odd i, S_i ≠ S_{i+1}

– for even i, either i = n or S_i = S_{i+1}

A pair (S_i, t_i) of a trainlist represents an arrival or a departure of the train at some station S_i (a departure if i is odd, an arrival otherwise). The time t_i is defined to be in minutes past midnight of the first day. For example, if a train leaves its first station at 8 pm, then t₁ = 20·60 = 1200. If the next arrival at station S₂ = station(tr[2]) is five hours later, then t₂ = t₁ + 5·60 = 1500 or, in other words, 1 am on the second day of the train.
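As a minimal sketch, the three validity conditions can be checked as follows. The representation of a train as a Python list of `(station_id, time)` tuples and the name `is_valid_train` are assumptions made for illustration only.

```python
from typing import List, Tuple

Train = List[Tuple[str, int]]  # [(station ID, minutes past midnight of first day), ...]


def is_valid_train(tr: Train) -> bool:
    """Check the three conditions of the train definition (indices are 1-based)."""
    n = len(tr)
    if n % 2 != 0:                      # n must be even
        return False
    for i in range(1, n):               # compare tr[i] with tr[i+1]
        s_i, s_next = tr[i - 1][0], tr[i][0]
        if i % 2 == 1 and s_i == s_next:
            return False                # odd i: a departure must lead to another station
        if i % 2 == 0 and s_i != s_next:
            return False                # even i < n: arrival and next departure share a station
    return True


# The example from the text: departure at 8 pm, arrival five hours later.
example = [("S1", 1200), ("S2", 1500)]
```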

With that definition, a timetable could be described as a set of stations and a set of trains. Then, we want to find connections between two stations. A connection describes a train trip from a station A to a station B. Such a connection may use parts of several trains tr₁, . . . , tr_n. For each of these trains the part that is used is specified by an interval [i_k, j_k], where 1 ≤ i_k < j_k ≤ l(tr_k). Each of these parts begins with a departure and ends with an arrival (i.e., i_k is odd and j_k is even).

Each connection has a traveltime: The traveltime is the sum over all time differences time(tr_k[j_k])−time(tr_k[i_k]) plus the time needed for the stays at stations to change trains.

With the described modelling of the trains, where each train has its own “calendar,” it is not possible to define the traveltime to be the difference between the time of the arrival at the destination station and the time of the departure at the start station. Imagine a connection consisting of two trains. The first train is an overnight train that stops at our start station at 6 o’clock in the morning on its second day and leaves 5 minutes later (t_start = 24·60 + 6·60 + 5 = 1805). The first train stops later somewhere at 8 o’clock where we change trains and continue with a second train, which starts at that station (that means for the second train it is the first day) 5 minutes past 8 o’clock. The second train reaches our destination station at 10 o’clock on its first day (t_dest = 10·60 = 600).

The difference t_dest − t_start becomes negative because for the two points in time different days are used as first days (see Section 2.1). Therefore, we have to calculate the time differences separately for each train, where for all points in time the determined first day is the same.

Definition

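• A connection between two stations A and B is a list of triples ((tr₁, i₁, j₁), . . . , (tr_n, i_n, j_n)), where

1. tr_k are trains

2. 1 ≤ i_k < j_k ≤ l(tr_k), and i_k is odd, j_k is even, for all k with 1 ≤ k ≤ n

3. station(tr₁[i₁]) = A

4. station(tr_n[j_n]) = B

5. station(tr_k[j_k]) = station(tr_{k+1}[i_{k+1}]) for each k with 1 ≤ k < n

The traveltime of a connection is

$$\sum_{k=1}^{n} \bigl(\mathrm{time}(tr_k[j_k]) - \mathrm{time}(tr_k[i_k])\bigr) + \sum_{k=1}^{n-1} \delta\bigl(\mathrm{time}(tr_k[j_k]),\ \mathrm{time}(tr_{k+1}[i_{k+1}])\bigr)$$

The first sum covers the parts travelled by train; the second sum covers the stays between the arrival of one train and the departure of the next.

The starttime of a connection is δ(time(tr₁[i₁])).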

Finally, there are the (simple) queries: Find the fastest connection from a station A to another station B, departing at a specified time t. Naturally, the feasible connections start at A and end at B. The total amount of time for a feasible connection is then the traveltime of the connection plus the daily-difference of the departure time t and the starttime of the connection.


Definition

• A query is a triple (S, D, t), where S denotes the source station, D the destination station, and t is the departure time.

• Given a query (S, D, t), a connection ((tr₁, i₁, j₁), . . . , (tr_n, i_n, j_n)) is feasible if station(tr₁[i₁]) = S and station(tr_n[j_n]) = D.¹

• The totaltime of a feasible connection is

traveltime+δ(t,starttime)
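The following Python sketch evaluates traveltime, starttime, and totaltime for the overnight example described above. It assumes that the stay at a change is measured from the arrival of one train to the departure of the next, as the prose definition of the traveltime states. The data layout, the function names, and the station IDs "X" and "M" are hypothetical and only serve the illustration.

```python
from typing import List, Tuple

MINUTES_PER_DAY = 24 * 60

Train = List[Tuple[str, int]]               # [(station ID, time in minutes), ...]
Connection = List[Tuple[Train, int, int]]   # [(train, i_k, j_k), ...] with 1-based indices


def daily_difference(t1: int, t2: int) -> int:
    return (t2 - t1) % MINUTES_PER_DAY


def event_time(tr: Train, i: int) -> int:
    return tr[i - 1][1]


def traveltime(conn: Connection) -> int:
    # Time spent on trains, computed separately inside each train's own calendar.
    riding = sum(event_time(tr, j) - event_time(tr, i) for tr, i, j in conn)
    # Time spent at stations between the arrival of one train and the next departure.
    changing = sum(
        daily_difference(event_time(tr_a, j_a), event_time(tr_b, i_b))
        for (tr_a, _, j_a), (tr_b, i_b, _) in zip(conn, conn[1:])
    )
    return riding + changing


def starttime(conn: Connection) -> int:
    tr, i, _ = conn[0]
    return event_time(tr, i) % MINUTES_PER_DAY


def totaltime(conn: Connection, t: int) -> int:
    return traveltime(conn) + daily_difference(t, starttime(conn))


# Overnight train: reaches start station A at 6:00 on its second day, leaves at 6:05,
# and reaches the change station M at 8:00. The second train leaves M at 8:05 on its
# first day and reaches B at 10:00.
overnight = [("X", 1320), ("A", 1800), ("A", 1805), ("M", 1920)]
second = [("M", 485), ("B", 600)]
connection = [(overnight, 3, 4), (second, 1, 2)]
```

Here `traveltime(connection)` is 235 minutes (6:05 to 10:00), although the naive difference 600 − 1805 would be negative. For a query with departure time t = 360 (6:00), `totaltime(connection, 360)` adds the 5 minutes of waiting before the departure.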

Now, all necessary terms for the formal problem definition are introduced. The input for the problem consists of two different parts: On one hand, there is the railroad network and the timetable (i.e., the stations and trains). This data (called the static input in the following) is known “off-line” and could be subject to optimizations in a preprocessing step. On the other hand, the queries (called the on-line input in the following) are not known before they are asked, and have to be processed “on-line”, that is, at the moment they are asked.

Problem Definition

Input: Static Input: The railroad network together with the timetable.

Formally, this input consists of a set of stations and a set of valid trains.

On-Line Input: A (possibly infinite) sequence of queries.

These are not known in advance and have to be processed “on-line”.

Output: For each query (S, D, t) the totaltime of a fastest connection.

A fastest connection is a feasible connection with minimum totaltime.

Note that the problem definition implies that an optimal solution for each query is requested, where optimal means minimum totaltime.

Other technical requirements concerning algorithms for the problem have been discussed in the introduction. The main requirement has been that the algorithm has to be fast to get a low average response time. In contrast, we assume the space consumption not to be very important.

¹ The departure time t has no influence on whether a connection is feasible. The totaltime of a feasible connection, however, depends on the departure time. The goal is to find a feasible connection with minimum totaltime, for which the departure time is crucial.


2.3 Model

For the given problem we now have to find a fast algorithm. First of all, we need a model that allows us to formulate the algorithm.

At first glance, we have a network consisting of stations which are connected by trains. In other words, we have a graph with one node for each station and an edge between two stations A and B if there is a train connecting station A and station B. For graphs, there are well-known algorithms to find shortest paths. It seems to be the most natural approach to construct a similar graph, where fastest connections correspond to shortest paths. Therefore, we define a so-called traingraph on the basis of the static input data.

The traingraph should be a directed graph, because the trains are directed.

2.3.1 Definition of the Traingraph

Let the static input data be the stations st₁, . . . , st_n and the valid trains tr₁, . . . , tr_m. For the definition of the graph, we need additional terminology. An event represents an arrival or a departure of a train at a station. In the formal description, the elements of the trains tr_i (pairs tr_i[j] = (S_j, t_j)) are the events. These pairs cannot be taken directly as a formal definition of events, because if two trains tr₁ and tr₂ start at the same station with ID S at the same time t, then tr₁[1] = tr₂[1] = (S, t), but they should be different events. Therefore, we take the train index and the index of the trainlist-element to identify the events uniquely.

Definition

• Each element of the set {(i, j) | 1 ≤ i ≤ m, 1 ≤ j ≤ l(tr_i)} is called an event. An event (i, j) represents an arrival or a departure of a train tr_i.

• The terms station and time are defined naturally for an event e = (i, j):

station(e) = station(tr_i[j]) and time(e) = time(tr_i[j]).

• An event e = (i, j) is called arriving if j is even, leaving otherwise.

The traingraph contains one node for every event. For every station with ID S there is a set of nodes, namely all events (i, j) of that station (i.e., station(tr_i[j]) =S).

An edge in the traingraph should represent one “elementary” part of a connection. On one hand, there are direct connections. Between each event representing the departure of a train and the event representing the very next arrival of that train there is an edge in the graph, called a direct-edge. On the other hand, a stay at a station is also a part of a connection. Imagine that all events of a station are sorted by their time, and if the times of two events are equal, the event representing an arrival precedes the event representing a departure. Then between two successive events there is an edge, as well as between the last event and the first event of that station. The latter represents a stay at a station over midnight. All edges representing stays are called stay-edges. A stay at a station between two nodes u and v is the (uniquely determined) part of the cycle that begins at u and ends at v: the staypath from u to v.

For our particular problem, we have to find the fastest connection between two stations. Therefore, the length of a direct-edge is the difference between the times of the events represented by the endnodes of the edge (the real difference). If the edge is a stay-edge, the length has to be the minimum amount of time necessary to reach the time of day of the head event: the daily-difference between the times of the two endnodes (see Section 2.1). It follows that the length of the staypath from u to v is equal to the daily-difference δ(time(u), time(v)).

Let us summarize the definition of the traingraph:

Definition of the Traingraph

• Nodes: One node for every event. We identify a node v with the corresponding event e_v = (i, j).

• Edges:

– Direct-Edges: There is an edge (v, w) if v = (i, j) is a leaving event, and w= (i, j+ 1).

– Stay-Edges: Let e₁, . . . , e_l be all events belonging to one station. These events are sorted such that

∗ δ(time(e_i)) < δ(time(e_j)) ⇒ i < j, and

∗ δ(time(e_i)) = δ(time(e_j)) and e_i arriving and e_j leaving ⇒ i < j

For each k with 1 ≤ k < l there is an edge (e_k, e_{k+1}), and there is also an edge (e_l, e₁).

• Edgelength: The length of an edge (e₁, e₂) is

l(e₁, e₂) = δ(time(e₁), time(e₂)) if the edge is a stay-edge,

l(e₁, e₂) = time(e₂) − time(e₁) if the edge is a direct-edge.
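The definition can be turned into a short construction routine. The following Python sketch is one possible reading of it, assuming an adjacency-list representation as a dictionary; `build_traingraph` and the three example trains are hypothetical and are not taken from the thesis data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MINUTES_PER_DAY = 24 * 60

Train = List[Tuple[str, int]]   # [(station ID, time in minutes), ...]
Event = Tuple[int, int]         # (train index i, list index j), both 1-based
Graph = Dict[Event, List[Tuple[Event, int]]]  # node -> [(successor, edge length), ...]


def build_traingraph(trains: List[Train]) -> Graph:
    graph = defaultdict(list)
    events_at = defaultdict(list)

    def t_of(e: Event) -> int:
        return trains[e[0] - 1][e[1] - 1][1]

    for i, tr in enumerate(trains, start=1):
        for j, (station, t) in enumerate(tr, start=1):
            events_at[station].append((i, j))
            if j % 2 == 1:
                # Direct-edge from a leaving event to the next arrival of the same train;
                # tr[j] is the 0-based position of event (i, j + 1).
                graph[(i, j)].append(((i, j + 1), tr[j][1] - t))

    for events in events_at.values():
        # Sort by time of day; on ties, arriving events (even j) come first.
        events.sort(key=lambda e: (t_of(e) % MINUTES_PER_DAY, e[1] % 2))
        for k, e in enumerate(events):
            nxt = events[(k + 1) % len(events)]   # the last event wraps to the first
            graph[e].append((nxt, (t_of(nxt) - t_of(e)) % MINUTES_PER_DAY))
    return dict(graph)


# Three hypothetical trains on stations A, B and C.
example_trains = [
    [("A", 360), ("B", 420), ("B", 425), ("C", 480)],
    [("B", 600), ("A", 660)],
    [("C", 420), ("A", 540)],
]
graph = build_traingraph(example_trains)
```

Every node has exactly one outgoing stay-edge and every leaving event has one additional direct-edge, so a traingraph with n nodes has 3n/2 edges; the stay-edges of one station form a cycle of total length 1440 whenever the events of that station do not all share the same time of day.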

Figure 2.1 shows an example of a very simple traingraph. The input data consists of three stations A, B and C, and three trains. For each column of the train lists (i.e., for each event) the traingraph has one node. Each station is represented in the traingraph by a cycle that contains all events belonging to that station. The edges of these cycles are the so-called stay-edges. Then, for each successive pair of leaving and arriving events of a train there is a direct-edge in the traingraph, connecting different stations.

Figure 2.1: Example of a simple traingraph. The input data consists of three stations and three trains.

2.3.2 Shortest Paths

We want to solve the queries and have to find fastest connections. How could such a query be solved using the traingraph? Obviously, paths in the traingraph correspond to connections:

• Given a connection, the corresponding path is uniquely determined: The connection naturally defines a sequence of direct-edges; each pair of successive leaving and arriving events inside the index range of every used train defines one direct-edge. Then, stay-edges have to be inserted: For two successive direct-edges, the endnode u of the first edge and the startnode v of the second edge belong to the same station. If the staypath from u to v is inserted in the sequence of direct-edges between each successive pair of direct-edges we get a path from the start station to the destination station.

The traveltime of the connection and the length of the corresponding path are equal.

• On the other hand, a path in the traingraph defines a connection from the station of the startnode (startevent) to the station of the endnode. The connection is the sequence of all direct-edges belonging to the path, in the order they appear: Let the startnodes of the direct-edges belonging to the path be (v₁ = (i₁, j₁), . . . , v_n = (i_n, j_n)). Then, the connection is

c := ((tr_{i₁}, j₁, j₁ + 1), . . . , (tr_{i_n}, j_n, j_n + 1))

The starttime is equal to the time of day of the startnode of the first direct-edge.

If we want to compare the length of the path and the traveltime of the connection, there is a problem: At the beginning and at the end of the path there may be a sequence of stay-edges. These are not used in the above definition of the connection. So, the traveltime of the connection and the length of the path differ only in the length of the leading staypath.

Let the query be (S, D, t): Find the fastest connection from station S to station D leaving S at time t (i.e., t minutes after midnight). Then, we compute a shortest path in the traingraph. Again, assume that all events e₁, . . . , e_l belonging to the station S are sorted in the way described in the definition of the traingraph. Let k be the largest index of an event with a time value smaller than t (if there is no such event, k = 0). If k < l, then e_{k+1} is the start node v, otherwise e₁. In other words, v is the first event occurring at or after the time t. The destination node w is the event belonging to the destination station D with the smallest distance (i.e., length of a shortest path) to v. In particular we can assume that the last edge of a shortest path from v to w is a direct-edge.
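The selection of the start node is a binary search in the sorted event list of the source station. A minimal sketch, assuming that the events of station S and their times of day are available as two parallel, already sorted Python lists (both the layout and the name `start_node` are illustrative):

```python
import bisect
from typing import List, Tuple

MINUTES_PER_DAY = 24 * 60
Event = Tuple[int, int]  # (train index, list index)


def start_node(events: List[Event], times_of_day: List[int], t: int) -> Event:
    """Return the first event of the station not earlier than time of day t.

    `events` must be sorted as in the traingraph definition and
    `times_of_day[k]` must be the time of day of `events[k]`.
    """
    # k = number of events with a time of day strictly smaller than t.
    k = bisect.bisect_left(times_of_day, t % MINUTES_PER_DAY)
    # If all events are earlier than t, wrap around to the first event.
    return events[k % len(events)]


# Hypothetical station with events at 6:00, 9:00 and 11:00.
events_S = [(1, 1), (3, 2), (2, 2)]
times_S = [360, 540, 660]
```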

Definition

• A shortest path p from a startnode s to a station D is a path from s to an arbitrary node belonging to D with

length(p) = min{length(p′) | p′ is a path from s to any node of D}

From a shortest path p from v to w a connection c can be defined as described above, with traveltime t_t and starttime t_s. Since this connection ignores the leading staypath, the length of the path is equal to the traveltime of the connection plus the length of the leading staypath:

l(p) = t_t + δ(time(v), t_s)

Obviously, the connection is feasible. The claim is now that this connection is of minimum totaltime.

Proof: Assume that there is a connection c′ with less totaltime, with traveltime t′_t and starttime t′_s. That means

t′_t + δ(t, t′_s) < t_t + δ(t, t_s)

Since there is no node at the source station S with a time between t and time(v), for any node u belonging to S it holds that δ(t, time(u)) = δ(t, time(v)) + δ(time(v), time(u)). Applying this to both sides and subtracting δ(t, time(v)), we have

t′_t + δ(time(v), t′_s) < t_t + δ(time(v), t_s)

We switch now back into the traingraph: The connection c′ determines a path p′. The length of that path is equal to t′_t and the startnode has the time t′_s. This path can be extended to a path from v to the destination station by prepending the staypath from v to the startnode of p′. The extended path p″ has the length

l(p″) = t′_t + δ(time(v), t′_s) < l(p)

This contradicts the precondition that p is a shortest path.

We have proved that the connection that we get from a shortest path in the traingraph is a fastest connection. The problem is reduced to finding a shortest path starting at a given node v. The destination node is not determined in advance, but there is a set of destination nodes: the nodes belonging to the destination station. To summarize that result, we give an equivalent problem definition:

Problem Definition, Formulation II

Input: Static Input: a traingraph

On-Line Input: a (possibly infinite) sequence of queries.

These are not known in advance and have to be processed “on-line”.

Output: For each query (S, D, t) the length of a shortest path from the node belonging to S occurring immediately after t to D.

2.4 Input Data

The characteristics of the data play an important role in this study. The goal is to use the special structure and additional information (e.g., coordinates of stations) of real-world data to get fast algorithms.

Static data The basis for the experimental study and for the statistics is the timetable data (winter period 1996/97) of all German trains.² The data contains 45,848 trains operating on 6,961 stations. The traingraph for that timetable has 933,066 nodes. Consequently, there are 933,066 · 3/2 = 1,399,599 edges in the traingraph. Since every node represents an event (i.e., an arrival or a departure), there were on average 134 events per station and day during the winter period 1996/97. Figure 2.2 shows a detailed frequency distribution.

² By courtesy of the TLC Transport-, Informatik- und Logistik-Consulting GmbH/EVA-Fahrplanzentrum, a subsidiary of the Deutsche Bahn AG.

Figure 2.2: The frequency distribution histogram of the stations according to the total number of events at that station (granularity: 5 events). Horizontal axis: number of events; vertical axis: number of stations.

On-line data Without realistic on-line queries the algorithms could not be optimized for a real-world scenario and no realistic statistics could be compiled. The processed queries are a “snapshot” of the central Hafas³ server of the Deutsche Bahn AG, the national railroad and train company of Germany. All queries of customers of all ticket offices in Germany have been recorded over several hours. The snapshot consists of more than half a million queries, which might be representative for that special service. The Euclidean distance between the two stations is on average 220 km, whereas a fastest connection takes 276 minutes on average. Figure 2.3 shows the distribution of the queries according to the Euclidean distance between the source and destination station. It is not surprising that for most of the queries, the source and destination station are relatively close.

2.5 Related Work

The shortest path problem is one of the most elementary graph problems. There are several different variants, and there are simple and efficient solvers available. Many textbooks devote whole chapters to shortest path problems. We will discuss which variant of the problem we have here and which algorithms can be used as a basis to solve our problem efficiently. After that, we look at work concerning the general problem itself, that is, work on information systems for public transportation or similar systems.

³ Hafas is a trademark of the Hacon Ingenieurgesellschaft mbH, Hannover, Germany.

Figure 2.3: The frequency distribution histogram of the snapshot of customer queries according to the Euclidean distance between the start station and the destination (granularity: 15 km).

Figure 2.4: Like Figure 2.3 except that the abscissa now denotes the totaltime of a fastest connection in minutes (granularity: 20 minutes).

2.5.1 Theoretical Work

Problem Classification In the literature [1, 5, 15] the following different shortest-path problems are investigated:

1. Single-pair: Find the shortest path from one node to another node

2. Single-source: Find shortest paths from one node to all other nodes

3. All-pairs: Find shortest paths from every node to every other node

Obviously, from 1 to 3 the problem gets more general. The single-pair problem gets solved along with the single-source problem, and every single-source problem gets solved along with the all-pairs problem.

Algorithm Classification The algorithmic approaches for solving shortest path problems could be classified according to [1] into two groups: label-setting and label-correcting.

Both approaches are iterative. They assign tentative distance labels to nodes at each step; the distance labels are upper bounds on the shortest path distances. Label-setting algorithms designate one label as permanent (optimal) at each iteration. In contrast, label-correcting algorithms consider all labels as temporary until the final step, when they all become permanent. Label-setting algorithms often require special graphs, for example acyclic graphs with arbitrary edge lengths or graphs with nonnegative edge lengths.

The label-correcting algorithms are more general and apply to all cases of problems, but the label-setting algorithms are much more efficient.

Suitable Algorithms Since for our problem the source node is given, we have a single- source problem, and because of the set of destination nodes, it is not a single-pair problem.

But, with a small modification of the traingraph, we get a single-pair problem: An additional node t is introduced, and for each node belonging to the destination station, an edge with zero edge length is added from that node to t. This node t is then the single destination node for the single-pair problem. Therefore, concerning the problem class, all known shortest-path algorithms are suitable.
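The modification can be sketched as follows, assuming the graph is stored as a dictionary that maps each node to a list of `(successor, length)` pairs. The function name `add_super_sink` and the default node name `"t"` are illustrative.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, int]]]


def add_super_sink(graph: Graph, destination_nodes: Iterable[Hashable],
                   sink: Hashable = "t") -> Graph:
    """Return a copy of the graph with an extra node `sink` that is reached by a
    zero-length edge from every node of the destination station."""
    extended = {v: list(edges) for v, edges in graph.items()}
    for v in destination_nodes:
        extended.setdefault(v, []).append((sink, 0))
    extended.setdefault(sink, [])   # the sink itself has no outgoing edges
    return extended


# Tiny hypothetical graph: d1 and d2 belong to the destination station.
small = {"s": [("d1", 4), ("d2", 3)], "d1": [], "d2": [("d1", 2)]}
with_sink = add_super_sink(small, ["d1", "d2"])
```

A shortest path from the source to the new node in the extended graph has the same length as a shortest path from the source to the nearest node of the destination station.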

Because the traingraph has nonnegative edge lengths a label-setting algorithm could be used. One reason to do that is the better asymptotic behaviour of label-setting approaches (see the paragraph concerning the algorithm classification). Another reason is that label-setting algorithms could be terminated when the destination node is designated to be permanent; for label-correcting algorithms this could not be done. It turns out that this technique reduces the number of iterations drastically when real-world queries are applied to the algorithm.

The fastest label-setting algorithm solves the problem with an asymptotic worst-case running time linear in the number of edges (see [1], Section 4.4). It relies on a topological ordering of the nodes; this requires an acyclic graph. Unfortunately, the traingraph contains cycles, so this algorithm is not suitable for our problem. What remains is Dijkstra’s algorithm, a label-setting algorithm for graphs with nonnegative edge lengths, which was suggested first by Dijkstra [9] and, independently, by Dantzig [7] and Hillier [22].

Dijkstra’s algorithm solves the single-source shortest path problem. There are several variants of the algorithm, including modifications to solve the single-pair problem more efficiently in practice (in his original paper [9], Dijkstra suggested for single-pair problems to terminate the algorithm when the destination node is designated to be permanent).

Dijkstra’s Algorithm The set of all nodes is divided into three parts: the set of permanent, touched and untouched nodes. Every node is assigned a distance label, initialized with an infinite value. In the beginning, the source node gets the distance label 0 and is designated to be touched; all other nodes are untouched. Then, the iteration begins.

In each iteration, a touched node with smallest distance label is taken. It is designated to be permanent. Let the distance label of that node be d. Further, all outgoing edges of the node are considered. If the distance label of the other endnode of a considered edge is greater than the edge length plus d, that distance label is updated by the latter value, and the node is designated to be touched. If there are no more touched nodes, the algorithm terminates.

The distance labels of all permanent nodes contain the length of a shortest path from the source node. To reconstruct the shortest paths that have been found for the permanent nodes, each node has to be assigned an edge: the last edge of the shortest path. To get these edges, the updating step of the algorithm has to be extended: If the distance label of an endnode of a considered edge has to be updated properly, that edge becomes the last edge for that endnode.
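The description above can be sketched directly in Python. The sketch selects the smallest touched node by a linear scan, which corresponds to the formulation without a priority queue; the dictionary-based graph representation and the names `dijkstra_basic` and `path_to` are assumptions made for illustration.

```python
import math
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, int]]]  # node -> [(successor, length >= 0), ...]


def dijkstra_basic(graph: Graph, source: Hashable):
    dist = {source: 0}        # untouched nodes implicitly have the label "infinity"
    last_edge = {}            # node -> last edge (v, w) of the best path found so far
    touched = {source}
    permanent = set()
    while touched:
        v = min(touched, key=dist.__getitem__)   # linear scan over all touched nodes
        touched.remove(v)
        permanent.add(v)
        d = dist[v]
        for w, length in graph.get(v, []):
            if dist.get(w, math.inf) > d + length:
                dist[w] = d + length
                last_edge[w] = (v, w)
                touched.add(w)
    return dist, last_edge


def path_to(last_edge, source, node) -> list:
    """Follow the stored last edges backwards from `node` to `source`."""
    path = [node]
    while node != source:
        node = last_edge[node][0]
        path.append(node)
    return path[::-1]


example_graph = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("d", 7)], "b": [("d", 3)], "d": []}
dist, last_edge = dijkstra_basic(example_graph, "s")
```

Because all edge lengths are nonnegative, the label of a permanent node is never greater than d, so the strict comparison never modifies a permanent node.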

As usual, n denotes the number of nodes and m the number of edges. The worst- case complexity of the algorithm as it is formulated above is O(n²). This is the optimal running time for fully dense graphs, because then m ∈ Θ(n²) and, of course, every edge has to be examined.

Selection of the smallest touched node The algorithm does not determine how a smallest node is selected from all touched nodes. Usually, the touched nodes are stored in a priority queue. For sparse graphs, the worst-case bound of the algorithm can be improved using an efficient priority queue. If a d-heap is used with d = max{2, m/n} the algorithm runs in O(m log_d n) time [1]. The best known strongly-polynomial time bound is O(m + n log n) and is achieved using a Fibonacci heap [11].
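As an illustration of the priority-queue idea, the following sketch uses Python's `heapq` module, which is a binary heap (that is, neither the d-heap with d = max{2, m/n} nor a Fibonacci heap), so its bound is O(m log n). Outdated heap entries are skipped instead of being decreased in place. The search stops as soon as a node of a given target set becomes permanent; the name `dijkstra_to_targets` and the graph representation are assumptions.

```python
import heapq
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, int]]]


def dijkstra_to_targets(graph: Graph, source: Hashable, targets: Set[Hashable]):
    """Return (distance, node) for the closest target node, or None if none is reachable.

    Nodes must be mutually comparable (e.g., all strings or all tuples), because
    heap entries with equal distance are ordered by the node itself.
    """
    dist = {source: 0}
    heap = [(0, source)]
    permanent = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in permanent:
            continue                  # outdated entry of an already permanent node
        permanent.add(v)
        if v in targets:
            return d, v               # early termination
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return None


pq_graph = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("d", 7)], "b": [("d", 3)], "d": []}
```

Passing all nodes of the destination station as `targets` has the same effect as adding a single destination node that is reached by zero-length edges.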

Another approach is the Dial variant (see [10], [1] Section 4.6). The running time depends on the largest edge length C of the graph and therefore is pseudo-polynomial (O(m+nC) worst-case complexity, see Section 3.2.2 for details). There are several extensions of Dial’s variant with better worst-case running time [2, 8]. Computational results [13] suggest that it might be the fastest label-setting algorithm for suitable applications.

Speed-up techniques for single-pair problems The following techniques allow us to solve single-pair problems faster in practice, but not in the worst case.

The bidirectional Dijkstra’s algorithm (see [1] Section 4.5, [15] Section 3.8.5.2) simultaneously applies the “normal” or forward variant of the algorithm starting at the source node and a so-called reverse variant of Dijkstra’s algorithm starting at the destination node. The algorithm can be terminated when one node on a shortest path from the source to the destination has been designated to be permanent by both the forward and the reverse algorithm.

The goal-directed unidirectional search (see [15] Section 3.8.5.1, [12]) uses lower bounds on distances to the destination node. The idea is to modify the edge lengths in order to direct the search (by applying Dijkstra’s algorithm) towards the destination, so that the algorithm can terminate earlier. Let λ(v) be a lower bound on the distance from a node v to the destination. The edge lengths are modified as follows:

length(v, w) := length(v, w) − λ(v) + λ(w)

To avoid negative edge lengths, the bound has to fulfill a consistency condition:

length(v, w) + λ(w) ≥ λ(v) for every edge (v, w)

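For every path p = (s, v₁, v₂, …, d) from the source s to the destination d, the length of the path is changed only by the constant value λ(s). Writing length′(p) for the length of p with respect to the modified edge lengths, and using λ(d) = 0 (the destination has distance 0 to itself), the intermediate terms cancel:

\[
\mathrm{length}'(p) = \mathrm{length}(p) - \lambda(s) + \underbrace{\lambda(v_1) - \lambda(v_1) + \lambda(v_2) - \lambda(v_2) + \dots + \lambda(d)}_{=\,0} = \mathrm{length}(p) - \lambda(s)
\]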

Thus, a shortest path in the modified graph is a shortest path in the original graph, and vice versa.

For example, assume that the graph is embedded in a metric space, and the edge length is the distance between the two endnodes. Because of the triangle inequality, the distance between a node and the destination is a valid lower bound on the length of a shortest path. An edge that points directly towards the destination has a modified edge length of zero, while the modified length of an edge that points in the opposite direction is twice its original length. For many Euclidean graph models it is shown that the average running time is linear in the number of nodes [18].
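The claim about edges that point towards or away from the destination can be checked numerically. The following Python sketch is only an illustration: the node names, the coordinates and the helper `modified_length` are hypothetical, and the edge length is assumed to be the Euclidean distance between the endnodes.

```python
from math import hypot


def euclid(p, q):
    """Euclidean distance between two points in the plane."""
    return hypot(p[0] - q[0], p[1] - q[1])


def modified_length(pos, v, w, dest):
    """Modified length of the edge (v, w): length - lambda(v) + lambda(w).

    Assumption: the lower bound lambda(u) is the Euclidean distance
    from u to the destination node `dest`.
    """
    lam_v = euclid(pos[v], pos[dest])
    lam_w = euclid(pos[w], pos[dest])
    return euclid(pos[v], pos[w]) - lam_v + lam_w


# Three hypothetical nodes on a straight line; "d" is the destination.
pos = {"a": (0.0, 0.0), "b": (3.0, 0.0), "d": (10.0, 0.0)}

towards = modified_length(pos, "a", "b", "d")  # 3 - 10 + 7
away = modified_length(pos, "b", "a", "d")     # 3 - 7 + 10
print(towards, away)
```

The edge (a, b) points directly towards the destination and gets the modified length zero, whereas the reverse edge (b, a) gets twice its original length of 3.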

2.5.2 Application-Oriented Work

Public railroad transport Most application-oriented work in this field is commercial, not scientific. Concerning wide area networks, there are for example publications about traffic planning [4], but algorithms for optimal connections are published very rarely.

Algorithms based on a “hierarchical” goal-directed variant of Dijkstra’s algorithm are presented in [16] (see also Section 3.6.1, page 32).

Local public transport is quite different from wide-area public transport, because the timetables are very regular, and the most powerful speed-up techniques are based on the strict periodicity of the buses, trains, etc. In contrast, experience shows that the timetables of the national European train companies are not regular enough to profit significantly from these techniques.

Private transport In view of wide-area networks, private transport has been extensively investigated. This mainly concerns “routing planners” for cars on city and country maps. The problem differs from ours in that it is two-dimensional: for a journey by car, the departure time is not important as long as temporal aspects like “rush hours” are ignored. In contrast, the departure time is significant in our scenario due to the fixed departure and arrival times of trains and the lack of periodicity. So it is not surprising that research on private transport has focused on purely geometric techniques.

One example showing an approach for car navigation is [14]: A graph representing a topographical map is hierarchically partitioned by regional boundaries into components. From one hierarchical level to the next higher one, only the boundary nodes of the components are considered. In Chapter 3 we investigate similar ideas.

Chapter 3 Algorithms

Overview This chapter describes algorithms solving the problem defined in Chapter 2. The algorithms find a shortest path among all paths in the traingraph from a certain startnode to a set of destination nodes. In the last chapter we have seen that Dijkstra’s algorithm is a good basis. First, this algorithm is introduced in detail. The data structure needed by the algorithm is a priority queue, and two different implementations of priority queues are discussed. Then, in addition to the fundamental extensions of the algorithm, four speed-up techniques are presented. They all reduce the size of the search space, that is, the number of nodes that have to be touched while searching for the shortest path.

All of these variants of Dijkstra’s algorithm have been implemented in C++. A by-product of that implementation is the visualization of the individual algorithms.

Visualization The traingraph is naturally embedded in the plane, as every node belongs to a station and every station is embedded in the plane by its two coordinates. All edges are straight lines. Note that normally more than one edge of the traingraph will map to one visible edge in the embedding. The structure of a traingraph suggests that there are crossing edges in that embedding of the traingraph. For example, a long direct-edge (i.e., a direct connection between two stations with a large traveltime) will normally connect two stations that are far apart with respect to the Euclidean distance. With high probability, such an edge will cross shorter edges.

For the visualization of the algorithms, the whole railroad network is indicated by drawing all “relatively small” edges of the traingraph in the background. The timetable described in Section 2.4 (page 11) is used as static data. Then, one algorithm, that is a certain combination of strategies, is applied to a sample query: Find the fastest connection from Berlin Main East Station to Frankfurt/Main Main Station, with the departure time 10 o’clock. All edges that are touched by the algorithm are drawn highlighted in the foreground of the picture. See Figure 3.1 on page 20 for a first example of the visualization of algorithms.


3.1 Dijkstra’s Algorithm

3.1.1 The Algorithm

The original algorithm for shortest paths proposed by Dijkstra [9] has been described in Section 2.5.1. We have seen that a priority queue Q can be used to store the touched nodes. A priority queue is an abstract data type for maintaining a set of elements in which each element has an associated key. Here, the keys are integer values. The priority queue supports the following operations:

• insert(Q, x) inserts the element x, or relocates it if already x ∈ Q

• minimum(Q) returns the element with the minimum key

• extract-minimum(Q) removes and returns the element with the minimum key

• empty(Q) returns true iff there is no element in the queue

The following section shows two different implementations of a priority queue. With that data type we are able to reformulate Dijkstra’s algorithm (the single-pair variant) in a short form. The input is a directed graph (V, E) with nonnegative edge lengths l(e). The source node is s and the destination node is t. Each node v ∈ V is assigned a distance label d(v), which serves as a key for the priority queue.

Dijkstra’s algorithm

1. for all v ∈ V: d(v) ← ∞
2. d(s) ← 0 and insert(Q, s)
3. while not empty(Q) do
4.     v ← extract-minimum(Q)
5.     if v = t then terminate the algorithm
6.     for all outgoing edges e = (v, w) of v do
7.         if d(w) > d(v) + l(e) then
8.             d(w) ← d(v) + l(e)
9.             insert(Q, w)
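The nine steps can be sketched in Python as follows. This is a minimal illustration, not the C++ implementation used for this study. It assumes that the graph is a dictionary mapping each node to a list of (neighbor, length) pairs. Since Python's `heapq` module offers no relocate operation, an updated node is simply inserted again and outdated queue entries are skipped when they are extracted. The names `dijkstra`, `pred` and the example graph are hypothetical.

```python
import heapq
from math import inf


def dijkstra(graph, s, t):
    """Single-pair Dijkstra; returns (distance, path) or (inf, [])."""
    dist = {s: 0}          # missing entries count as infinite (Step 1)
    pred = {}              # predecessor on the shortest path found so far
    permanent = set()
    queue = [(0, s)]       # Step 2
    while queue:           # Step 3
        d_v, v = heapq.heappop(queue)            # Step 4
        if v in permanent:
            continue       # outdated entry left over from a relocation
        permanent.add(v)
        if v == t:         # Step 5: restriction of the search horizon
            break
        for w, length in graph.get(v, []):       # Step 6
            if dist.get(w, inf) > d_v + length:  # Step 7
                dist[w] = d_v + length           # Step 8
                pred[w] = v
                heapq.heappush(queue, (dist[w], w))  # Step 9
    if t not in permanent:
        return inf, []
    path = [t]
    while path[-1] != s:
        path.append(pred[path[-1]])
    path.reverse()
    return dist[t], path


# Hypothetical example graph with edge lengths in minutes.
example = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 7)],
    "b": [("t", 2)],
    "t": [],
}
print(dijkstra(example, "s", "t"))
```

In the example the shortest path is s, a, b, t with length 2 + 1 + 2 = 5.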

Figure 3.1: The single-pair variant of Dijkstra’s algorithm with restriction of search horizon.

We want to use this algorithm to solve queries. For each query we are given the source node s at the source station S, and the destination station D. In Section 2.5.1, the paragraph about suitable algorithms (page 14), it was shown how to modify the traingraph in order to apply a single-pair algorithm: A single destination node is needed instead of a destination station. We have to add a new node, which becomes the destination node. It belongs to the destination station (which means basically that it inherits the coordinates). All nodes of the destination station are connected by additional edges of length 0 with the new destination node. What we want is a shortest path from s to the destination station, that is, a shortest path from s to the node of the destination station with the smallest shortest-path length. From a shortest path from s to the additional node we can obtain the desired shortest path by omitting the last edge: Since the length of a path is not changed by omitting or adding one of the additional edges, a path from s to the destination station with smaller length would immediately lead (by adding the edge of length zero) to a path from s to the additional node that is shorter than the path with which we started, and that was a shortest path.

Hence, the above algorithm could be used as it is to solve the query.
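The construction of the additional destination node can be illustrated with a small Python sketch. It is an assumption-laden example rather than the implementation of this study: the graph representation (a dictionary of (neighbor, length) lists), the helper names `add_destination_node` and `shortest_path`, the node label `"t*"` and the example data are all hypothetical.

```python
import heapq
from math import inf


def add_destination_node(graph, station_nodes, new_node="t*"):
    """Return a copy of the graph with an extra node that every node of the
    destination station reaches via an edge of length 0."""
    extended = {v: list(edges) for v, edges in graph.items()}
    for v in station_nodes:
        extended.setdefault(v, []).append((new_node, 0))
    extended[new_node] = []
    return extended


def shortest_path(graph, s, t):
    """Compact single-pair Dijkstra returning (length, path) or (inf, [])."""
    dist, pred, done = {s: 0}, {}, set()
    queue = [(0, s)]
    while queue:
        d_v, v = heapq.heappop(queue)
        if v in done:
            continue  # outdated queue entry
        done.add(v)
        if v == t:
            path = [t]
            while path[-1] != s:
                path.append(pred[path[-1]])
            return d_v, path[::-1]
        for w, length in graph.get(v, []):
            if dist.get(w, inf) > d_v + length:
                dist[w], pred[w] = d_v + length, v
                heapq.heappush(queue, (dist[w], w))
    return inf, []


# "d1" and "d2" are two hypothetical nodes of the destination station.
graph = {"s": [("x", 3), ("d1", 10)], "x": [("d2", 4)], "d1": [], "d2": []}
extended = add_destination_node(graph, ["d1", "d2"])
length, path = shortest_path(extended, "s", "t*")
station_path = path[:-1]  # omit the additional zero-length edge
print(length, station_path)
```

Omitting the last edge leaves the path s, x, d2 of length 7, which is shorter than the direct edge to d1 of length 10.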

3.1.2 Fundamental Extensions

Restriction of search horizon Note that in Step 5 the algorithm is terminated when the destination node is designated to be permanent. This is one of the advantages of label-setting algorithms (see Section 2.5.1, page 14). In other words, the search horizon of the algorithm is restricted. The algorithm traverses the nodes of the graph starting at s in the order of increasing shortest-path length. Informally speaking, a wavefront (the nodes in the priority queue) starts at the source node and spreads throughout the graph. When it has passed the destination node, the algorithm terminates.

Figure 3.1 shows the visualization of Dijkstra’s algorithm with restricted search horizon applied to the sample query.

Avoiding Initialization of Nodes The analysis of the snapshot of realistic on-line queries has shown that the totaltime of a fastest connection between the source station and the destination station is small on average (see Figure 2.4 on page 13). This strongly suggests that, with the restriction of the search horizon, only a small part of the graph has to be touched by the algorithm in Steps 2 - 6 for most of the queries.

The only obstacle to sublinearity is Step 1: the initialization of the distance labels. If the algorithm is applied only once, Step 1 is inevitable. But we want to apply the algorithm again and again with the same traingraph. In that case, an amortization technique as it is discussed in [20] can be used to avoid the initialization step:

Every node is given an additional time stamp, which stores the number of the query in which it was last designated to be touched or permanent. Before the first query, these time stamps have to be initialized with zero, but this initialization is done once and for all.¹ If a node is reached by the algorithm, the following two cases have to be distinguished:

• If the time stamp of the node is smaller than the current query number, the distance label is considered to be infinite. The time stamp is updated.

• If the time stamp and the current query number are equal, the distance label is taken as is.

With this technique, the expensive initialization step does not have to be repeated for each query. Nodes that are outside the search horizon are not touched by the algorithm at all.
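The time stamp technique can be sketched in Python as follows. This is only an illustration under assumptions: the class name `TimestampedDijkstra`, its attributes and the example graph are hypothetical, every node is assumed to appear as a key of the graph dictionary, and relocation in the queue is replaced by re-insertion because `heapq` has no relocate operation.

```python
import heapq
from math import inf


class TimestampedDijkstra:
    """Answers many single-pair queries on one graph without
    re-initializing the distance labels before each query."""

    def __init__(self, graph):
        self.graph = graph                    # node -> list of (neighbor, length)
        self.dist = {v: inf for v in graph}   # initialized once and for all
        self.stamp = {v: 0 for v in graph}    # number of the last query touching v
        self.query_no = 0

    def _label(self, v):
        # A label from an earlier query is considered to be infinite.
        return self.dist[v] if self.stamp[v] == self.query_no else inf

    def _set_label(self, v, value):
        self.dist[v] = value
        self.stamp[v] = self.query_no         # the time stamp is updated

    def query(self, s, t):
        self.query_no += 1
        self._set_label(s, 0)
        queue = [(0, s)]
        while queue:
            d_v, v = heapq.heappop(queue)
            if d_v > self._label(v):
                continue                      # outdated queue entry
            if v == t:
                return d_v
            for w, length in self.graph[v]:
                if self._label(w) > d_v + length:
                    self._set_label(w, d_v + length)
                    heapq.heappush(queue, (d_v + length, w))
        return inf


# Hypothetical graph; node "d" cannot be reached and is never touched.
search = TimestampedDijkstra(
    {"a": [("b", 4)], "b": [("c", 1)], "c": [("a", 2)], "d": []}
)
first = search.query("a", "c")
second = search.query("c", "b")
print(first, second, search.stamp["d"])
```

The second query reuses the label storage of the first one: the label 4 that node b received in the first query is treated as infinite because its time stamp is older than the current query number.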

3.2 Priority Queue

We now discuss how to implement a priority queue that manages the set of touched nodes. Regarding the worst-case running time of Dijkstra’s algorithm, the priority queue is the heart of the matter. Nevertheless, the statistics will reveal that the choice of the priority queue is not that important for the actual performance of the algorithm.

There are several approaches to implement a priority queue (see Section 2.5.1, page 15). Most of them are based on some kind of heap, for example a binary heap, a d-heap or a Fibonacci heap. In the traingraph, each node has three adjacent edges. That means that the number of edges m is linear in the number of nodes n. Consequently, the worst-case running time of Dijkstra’s algorithm with each of the heap implementations (we mentioned the complexities in Section 2.5.1) becomes O(n log n). That is the best known worst-case complexity for Dijkstra’s algorithm on a graph like the traingraph. We choose a binary heap as representative of the heap implementations.

¹ In practice, the time stamp is an integer variable which has a maximum value. If more queries than that maximum value have to be solved, the initialization has to be done more than once.

In practice, Dial’s implementation of Dijkstra’s algorithm often is the fastest one. Therefore, Dial’s variant is investigated, too. We get Dial’s implementation by using a special priority queue with the above definition of Dijkstra’s algorithm. That queue will be called Dial’s queue. There are extensions of Dial’s implementation, for example the radix heap implementation [2], but our experience has shown that Dial’s implementation works very well with the traingraph since the maximum edge length of the traingraph is relatively small compared with the number of nodes.

The following two sections will present the two priority queues.

3.2.1 Binary Heap

In a binary heap the elements are stored as nodes in a rooted tree whose directed edges represent a predecessor-successor relationship: The key of the tail element is smaller than or equal to the key of the head element. The head is called successor of the tail, and the tail is called predecessor of the head. Every node except the root has exactly one predecessor. The heap is called binary because every node has at most two successors.

[Figure: example of a binary heap with the keys 3, 4, 5, 5, 7 and 9 and the root marked.]

Clearly, the empty operation and the minimum operation require constant time, since the root of the tree is an element with minimum key value. The other operations insert (including relocate) and extract-minimum can be implemented such that they require O(log n) time, if n denotes the number of elements in the heap. See Appendix A of [1] for details.
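A binary heap that supports the relocate case of insert can be sketched in Python as follows. The sketch is an assumption, not the implementation used for this study: it stores the tree in an array (the successors of position i are at positions 2i + 1 and 2i + 2) and keeps a dictionary with the position of every element, so that an element whose key changes can be found in constant time. The class and method names are hypothetical.

```python
class BinaryHeap:
    """Array-based binary heap with insert (including relocate),
    minimum, extract_minimum and empty."""

    def __init__(self):
        self.items = []  # heap array of elements
        self.key = {}    # element -> key
        self.pos = {}    # element -> index in self.items

    def empty(self):
        return not self.items

    def minimum(self):
        return self.items[0]  # the root has a minimum key

    def insert(self, x, k):
        if x not in self.pos:          # new element: append as a leaf
            self.pos[x] = len(self.items)
            self.items.append(x)
        self.key[x] = k                # new or changed key
        self._sift_up(self.pos[x])     # restore the heap order upwards
        self._sift_down(self.pos[x])   # and downwards (key may have grown)

    def extract_minimum(self):
        top = self.items[0]
        last = self.items.pop()
        del self.pos[top], self.key[top]
        if self.items:                 # move the last leaf to the root
            self.items[0] = last
            self.pos[last] = 0
            self._sift_down(0)
        return top

    def _swap(self, i, j):
        a, b = self.items[i], self.items[j]
        self.items[i], self.items[j] = b, a
        self.pos[a], self.pos[b] = j, i

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.key[self.items[parent]] <= self.key[self.items[i]]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.key[self.items[child]] < self.key[self.items[smallest]]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest


heap = BinaryHeap()
for element, k in [("a", 9), ("b", 5), ("c", 7), ("d", 4), ("e", 5), ("f", 3)]:
    heap.insert(element, k)
heap.insert("a", 1)  # relocate: the key of "a" drops from 9 to 1
print(heap.minimum())
```

Both sift operations follow one path between a leaf and the root, so insert and extract_minimum need O(log n) steps for a heap with n elements.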

3.2.2 Dial’s queue

Restricted priority queue The priority queue used in Dial’s implementation is not as general as the binary heap. There are two constraints concerning the keys of the elements:

1. The keys have to be integers.

2. Let the width of the queue be the difference between the maximum and the minimum of the keys that are in the queue at a certain time. That width must not exceed a fixed threshold C at any time.

Dijkstra’s algorithm complies with these constraints: (1) The keys are integers because they are distance labels and therefore minutes. (2) If the threshold C is set to the maximum edge length in the graph, the second constraint is also kept: It is sufficient to show that after every insertion of a node into the priority queue the constraint is not violated by that newly inserted node, that is, the difference between the new key and the minimum of all existing keys must not exceed the threshold C. Insertions into the queue are made only in Step 9 of the algorithm. Assume that a node u is going to be inserted. Before the insertion step, a node v has been removed from the queue in Step 4. After that removal the minimum key in the queue is greater than or equal to the distance label of v. Note that u and v are adjacent and that the distance label of u is the distance label of v plus the length of the edge from v to u. Hence, when the node u is inserted, it does not violate the second constraint because

key(u) − minimum(Q) ≤ key(u) − key(v) = length(v, u) ≤ C

Implementation In Dial’s queue the elements are stored in C + 1 buckets b₀, …, b_C. An element with key k is stored in the bucket with index k modulo (C + 1). From constraint (2) it follows that this bucket only contains elements with key k. Additionally, we have a pointer m holding the index of the bucket that contains the element with the currently minimum key, and a counter that holds the current size of the queue.

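[Figure: Dial’s queue with eight buckets b0, …, b7 holding elements with the keys 10, 10, 11, 14, 14 and 17; the pointer m marks the bucket with the minimum key.]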

Each bucket can be realized as a doubly linked list. The operations of the priority queue are very simple:

• The queue is empty if the size-counter is zero.

• An element with key k is inserted by appending it to the bucket with index k modulo (C + 1). If the key is smaller than the key of an element in b_m, m is assigned the value k modulo (C + 1). The size-counter has to be increased.

• An element is relocated by first removing it from its bucket and then re-inserting it in the queue.

• The element with minimum key is one of the elements of the bucket b_m.

• The element with minimum key is extracted by removing an element from the bucket b_m. If b_m is empty, m has to be increased modulo (C + 1) until a bucket is reached that contains elements. The size-counter has to be decreased.

The first four operations take constant time; only the operation extract may take time O(C). Thus, the running time of Dijkstra’s algorithm using Dial’s queue becomes O(m + nC), if n denotes the number of nodes and m the number of edges. In the special case of m ∈ O(n) the running time becomes O(nC).
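The bucket structure can be sketched in Python as follows. The sketch deviates from the description above in a few points, all of which are assumptions made for brevity: each bucket is a Python set instead of a doubly linked list, the size of the queue is taken from a dictionary of current keys instead of a separate counter, and the key value belonging to the pointer m is stored in a hypothetical attribute `floor`. As in Dijkstra's algorithm, every inserted key is assumed to lie between `floor` and `floor + C` unless it becomes the new minimum.

```python
class DialQueue:
    """Bucket queue for integer keys whose width never exceeds C."""

    def __init__(self, max_width):
        self.num_buckets = max_width + 1           # C + 1 buckets
        self.buckets = [set() for _ in range(self.num_buckets)]
        self.key = {}      # element -> current key
        self.m = 0         # index of the bucket where the search starts
        self.floor = None  # key value that bucket m stands for

    def empty(self):
        return not self.key

    def insert(self, x, k):
        if x in self.key:                          # relocate: remove first
            self.buckets[self.key[x] % self.num_buckets].discard(x)
        self.key[x] = k
        self.buckets[k % self.num_buckets].add(x)
        if self.floor is None or k < self.floor:   # new minimum key
            self.floor, self.m = k, k % self.num_buckets

    def extract_minimum(self):
        if self.empty():
            raise IndexError("extract from an empty queue")
        while not self.buckets[self.m]:            # at most C steps
            self.m = (self.m + 1) % self.num_buckets
            self.floor += 1
        x = self.buckets[self.m].pop()
        del self.key[x]
        return x, self.floor


queue = DialQueue(7)                               # C = 7, eight buckets
for element, k in [("a", 10), ("b", 17), ("c", 11), ("d", 14)]:
    queue.insert(element, k)
queue.insert("b", 12)                              # relocate "b" from key 17 to 12
extracted = [queue.extract_minimum() for _ in range(4)]
print(extracted)
```

Only the loop in `extract_minimum` depends on C; all other operations touch a single bucket.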

3.3 Goal-Directed Search

In the last chapter the general outline of this speed-up technique was described (see page 16). The “classical” applications are graphs which are embedded in a Euclidean space and whose edge lengths are the Euclidean distances. The traingraph is embedded in the two-dimensional Euclidean space, but the edge lengths that we use (i.e., the traveltime between two events) are not the Euclidean distances. We can nevertheless use the Euclidean distance between two nodes u₁ and u₂ in the traingraph, denoted by d_e(u₁, u₂), to define a valid lower bound for the goal-directed variant of Dijkstra’s algorithm, but only if the edge lengths comply with the following condition:

For each edge e= (u₁, u₂) in the traingraph:

length(e) = 0 ⇔ d_e(u₁, u₂) = 0

With our edge lengths a violation of the condition would mean that the distance between two stations with different geometric locations can be covered without a loss of time. That would imply an infinite speed, so the condition seems to be fulfilled for all edges. The problem is that the smallest time unit is one minute, so it may be necessary to give a connection between very near stations, which takes less than half a minute, a traveltime of zero. However, in the input data that we used for computation all edges fulfill the condition above, so we can go ahead with the definition of the lower bound: For every node in the traingraph a lower bound on the length of a path to the destination is needed. That bound has to fulfill a consistency condition (see Section 2.5.1).

For the definition of the lower bound we need to know the maximum (average) “speed” over all edges:

\[
v_{\max} = \max \left\{ \frac{d_e(u_1, u_2)}{\mathrm{length}(e)} \;\middle|\; e = (u_1, u_2) \text{ edge of the traingraph with } \mathrm{length}(e) \neq 0 \right\}
\]

With that we can define the lower bound λ(u) for a node u of the traingraph (t is the destination node):

\[
\lambda(u) := \frac{d_e(u, t)}{v_{\max}}
\]

It remains to prove that every edge e = (u₁, u₂) complies with the consistency condition length(e) + λ(u₂) ≥ λ(u₁).

Proof:

• If length(e) = 0 then

d_e(u₁, u₂) = 0 ⇒ d_e(u₁, t) = d_e(u₂, t) ⇒ λ(u₁) = λ(u₂).

• Otherwise, with length(e) ≥ d_e(u₁, u₂)/v_max and the triangle inequality d_e(u₁, t) ≤ d_e(u₁, u₂) + d_e(u₂, t), it follows that

\[
\mathrm{length}(e) + \lambda(u_2) = \mathrm{length}(e) + \frac{d_e(u_2, t)}{v_{\max}} \geq \frac{d_e(u_1, u_2)}{v_{\max}} + \frac{d_e(u_2, t)}{v_{\max}} = \frac{d_e(u_1, u_2) + d_e(u_2, t)}{v_{\max}} \geq \frac{d_e(u_1, t)}{v_{\max}} = \lambda(u_1)
\]

Now the edge length of every edge e = (u₁, u₂) can be modified as described in Section 2.5.1: length(e) := length(e) − λ(u₁) + λ(u₂). The lower bounds and new edge lengths do not have to be computed and stored in advance; they can be computed “on demand.”

If the maximum speed v_max is known, the modified edge length can be computed efficiently from the original edge length, the coordinates of the endnodes and v_max using the above rule. Then, any other variant of Dijkstra’s algorithm can be used to find a shortest path in the modified traingraph. We have seen already that shortest paths in the modified graph are also shortest paths in the original graph, so there is nothing more to show.
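The on-demand computation can be sketched in Python as follows. The sketch is an illustration under assumptions: edges are given as (u₁, u₂, length) triples with lengths in minutes, `pos` maps every node to the coordinates of its station, and the function names and the example data are hypothetical.

```python
from math import hypot


def euclid(p, q):
    """Euclidean distance d_e between two points in the plane."""
    return hypot(p[0] - q[0], p[1] - q[1])


def max_speed(edges, pos):
    """v_max: maximum of d_e(u1, u2) / length(e) over all edges with length != 0."""
    return max(euclid(pos[u1], pos[u2]) / length
               for u1, u2, length in edges if length != 0)


def lower_bound(u, t, pos, v_max):
    """lambda(u) = d_e(u, t) / v_max for the destination node t."""
    return euclid(pos[u], pos[t]) / v_max


def modified_length(edge, t, pos, v_max):
    """length(e) - lambda(u1) + lambda(u2), computed on demand."""
    u1, u2, length = edge
    return length - lower_bound(u1, t, pos, v_max) + lower_bound(u2, t, pos, v_max)


# Hypothetical data: "a1" and "a2" are two events at the same station.
pos = {"a1": (0.0, 0.0), "a2": (0.0, 0.0), "b1": (30.0, 40.0), "c1": (60.0, 80.0)}
edges = [("a1", "a2", 5), ("a1", "b1", 25), ("b1", "c1", 50)]

v_max = max_speed(edges, pos)  # fastest edge: distance 50 in 25 minutes
modified = [modified_length(e, "c1", pos, v_max) for e in edges]
print(v_max, modified)
```

In this example the edge (a1, b1) attains the maximum speed and points straight at the destination c1, so its modified length is zero, while the other two edges keep a positive modified length. No modified length is negative, as the consistency condition guarantees.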

Figure 3.2 visualizes Dijkstra’s algorithm with restriction of the search horizon applied to the traingraph with modified edge lengths.

3.4 Bidirected Search

The bidirectional variant of Dijkstra’s algorithm has also been outlined in Section 2.5.1 (page 16). Two instances of Dijkstra’s algorithm are run simultaneously, one “forward” instance starting at the source node, and one “backward” instance starting at the destination node. The “backward” instance inverts all edges, which means that in Step 6 of Dijkstra’s algorithm the incoming edges are considered instead of the outgoing edges.

Figure 3.2: Dijkstra’s algorithm applied to the traingraph with goal-directed edge lengths.

In contrast to the other speed-up techniques that we describe in this chapter, this one is not directly applicable to our problem: We have a set of destination nodes, and not one single destination node. We managed to define an additional node which is used as single destination node, but it is connected by edges of length zero with every node of the destination station. If the “backward” search of the bidirectional algorithm started at the additional destination node, all nodes belonging to the destination station would be in the priority queue after the first step. That would mean that the backward search starts at every arrival at the destination station simultaneously, while the forward search only starts at the very next departure after the departure time. With high probability such a naive bidirectional variant would slow the algorithm down rather than speed it up.

However, it is conceivable to estimate the arrival at the destination station somehow and to start the backward search not at all nodes of the destination station, but only at a few nodes. In any case there is one thing that we know for sure: The bidirected search will be optimal if we start the backward search at a destination node which is the endnode of a shortest path from the start node to the destination station. We can determine this destination node with some other variant of Dijkstra’s algorithm, and then examine how good the bidirectional variant of Dijkstra’s algorithm is in the best case. Note that this does not lead to a real speed-up technique using the bidirected search, but it makes it possible to estimate the efficiency of the bidirected search for our problem.


Figure 3.3: The idealized bidirected variant of Dijkstra’s algorithm.

Idealized Bidirected Search We assume that we have a startnode s, a destination node t at the destination station D, and that there is a shortest path from s to D containing the node t as the last node.

As described above we simultaneously run one forward instance of Dijkstra’s algorithm starting at s and one backward instance starting at t with inverted edges. The order in which the steps of the two main loops are executed is not yet determined. There are two obvious approaches: (1) The instance with the smaller head element of the priority queue is executed, or (2) both instances are executed alternately, first the forward instance, then the backward instance, again the forward instance, and so on. We call the first approach min-bidirected search and the latter approach alternate-bidirected search.

Dijkstra’s algorithm has to be extended slightly in order to terminate the algorithm as soon as possible: With the indices f and b added to the distance labels d and the priority queue Q we distinguish between the forward and the backward instance. If a node v has been touched by both instances then we know a path from s to v with length d_f(v), and a path from v to t with length d_b(v). The concatenation of these paths is a path from s to t with length d_f(v) + d_b(v). We say that v induces the latter path. What we do now is to remember the node which induces the path with smallest length that has occurred during the execution of the two instances, and the length of this path. We denote this node by v_λ and the length of the path it induces by λ. At the beginning we set λ := ∞. Every time the distance label of a node v is set in Step 8 of one of the instances the following is done: If d(v) of the other instance is smaller than infinity (i.e., v has already been touched by the other instance) and d_f(v) + d_b(v) < λ, we set λ := d_f(v) + d_b(v) and v_λ := v.

We claim that at the time when there is one node v_p that has been labeled permanent by both instances, the path induced by v_λ is a shortest path from s to t. In general the nodes v_p and v_λ are different nodes.

Proof: Assume that there is a path p = (s = v₁, v₂, …, v_n = t) with length l(p) < λ. Dijkstra’s algorithm expands along such a shortest path. That means there are indices i and j such that for the forward instance v₁, …, v_{i−1} are permanent and v_i is touched (i.e., in the priority queue), and for the backward instance v_{j+1}, …, v_n are permanent and v_j is touched. From the definition of λ it follows that i < j, otherwise λ would have been set to the length of p. For the forward instance of Dijkstra’s algorithm, v_i is labeled touched, and for the backward instance v_i has not yet been reached. Since Dijkstra’s algorithm labels the nodes permanent with increasing shortest-path length, a shortest path p₁ from s to v_i has a length l(p₁) ≥ minimum(Q_f) and a shortest path p₂ from v_i to t has a length l(p₂) ≥ minimum(Q_b).

The node v_p is labeled permanent by both instances. Again by the order in which Dijkstra’s algorithm labels nodes permanent, a shortest path from s to v_p has the length d_f(v_p) ≤ minimum(Q_f) and a shortest path from v_p to t has the length d_b(v_p) ≤ minimum(Q_b). Because the node v_p has been touched by both instances, d_f(v_p) + d_b(v_p) ≥ λ, and we get the following contradiction to the assumption l(p) < λ:

l(p) = l(p₁) + l(p₂) ≥ minimum(Q_f) + minimum(Q_b) ≥ d_f(v_p) + d_b(v_p) ≥ λ

The idealized bidirectional Dijkstra’s algorithm was also applied to the sample query, see Figure 3.3.
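The termination rule of the idealized bidirected search can be sketched in Python as follows. The sketch implements the alternate-bidirected strategy and returns only the length λ of the best induced path, not the path itself. It is an illustration under assumptions: the graph is a dictionary of (neighbor, length) lists, relocation is replaced by re-insertion with `heapq`, and the function name `bidirected_distance` and the example graph are hypothetical.

```python
import heapq
from math import inf


def bidirected_distance(graph, s, t):
    """Alternate-bidirected Dijkstra; returns the shortest-path length
    from s to t, or inf if t cannot be reached."""
    reverse = {v: [] for v in graph}               # inverted edges
    for v, edges in graph.items():
        for w, length in edges:
            reverse.setdefault(w, []).append((v, length))
    adj = (graph, reverse)                         # index 0: forward, 1: backward
    dist = ({s: 0}, {t: 0})                        # d_f and d_b
    done = (set(), set())                          # permanent nodes
    queues = ([(0, s)], [(0, t)])                  # Q_f and Q_b
    best = 0 if s == t else inf                    # lambda
    side = 0
    while queues[0] and queues[1]:
        d_v, v = heapq.heappop(queues[side])
        if v not in done[side]:                    # skip outdated entries
            done[side].add(v)
            if v in done[1 - side]:                # permanent in both instances
                return best
            for w, length in adj[side].get(v, []):
                if dist[side].get(w, inf) > d_v + length:
                    dist[side][w] = d_v + length
                    heapq.heappush(queues[side], (d_v + length, w))
                    # Update lambda if w was touched by the other instance.
                    best = min(best, dist[side][w] + dist[1 - side].get(w, inf))
        side = 1 - side                            # alternate the instances
    return best


# Hypothetical example graph with edge lengths in minutes.
example = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 7)],
    "b": [("t", 2)],
    "t": [],
}
print(bidirected_distance(example, "s", "t"))
```

In the example the search stops when node b becomes permanent in both instances; at that time λ already holds the length 5 of the path s, a, b, t.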

3.5 Angle Restriction

This technique relies on the coordinates associated with the individual stations. The idea is to direct the search towards the destination, similar to the goal-directed search presented in Section 3.3. In contrast to the latter technique, the data has to be prepared.

Figure 3.4: The spanning tree of shortest paths of a single-source Dijkstra starting at v: The subtree with root w (all nodes with e as the first edge of the shortest path) is inside the grey circle sector.

Idea In the traingraph, as nodes represent departures and arrivals, there are two different kinds of edges: stay-edges and direct-edges. The endnodes of a stay-edge belong to the same station, so a stay-edge has no “direction” in the plane. In contrast, a direct-edge e = (v, w) is a vector in the two-dimensional plane. Since v has two outgoing edges (besides e the stay-edge pointing to the very next event at the same station), Dijkstra’s single-source variant starting at v will divide the set N of all nodes into two parts:

• the nodes where the shortest path from v starts with the edge e; the set of all these nodes is denoted N_e

• the other nodes

A circle sector with centre at the leaving node of a direct-edge e can be determined that contains all nodes N_e. Detours in the geographical sense obviously imply a loss of traveltime, so there are reasons for the assumption that the destinations of shortest paths which use the edge e are roughly in the direction of e, and that the circle sector is relatively small. If for every direct-edge such a circle sector is known, the single-pair variant of Dijkstra’s algorithm could be improved: A direct-edge can be ignored (i.e., the edge is not considered in Step 6) if the destination station does not belong to the circle sector of that edge.
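The test whether a direct-edge can be ignored can be sketched in Python as follows. How the circle sectors are actually represented is not specified at this point, so the sketch makes an assumption: a sector is stored as a pair (start, width) of angles in radians, covering all directions from `start` counterclockwise up to `start + width`. The function names and the example coordinates are hypothetical.

```python
from math import atan2, pi


def direction(origin, point):
    """Angle of the vector from origin to point, in radians."""
    return atan2(point[1] - origin[1], point[0] - origin[0])


def in_sector(origin, sector, destination):
    """True if the destination lies inside the circle sector.

    Assumption: sector = (start, width) in radians, measured
    counterclockwise from the angle `start`.
    """
    start, width = sector
    if destination == origin:
        return True                      # the centre belongs to every sector
    offset = (direction(origin, destination) - start) % (2 * pi)
    return offset <= width


# Hypothetical sector of a direct-edge leaving (0, 0): from -45 to +45 degrees.
origin = (0.0, 0.0)
sector = (-pi / 4, pi / 2)

keep_edge = in_sector(origin, sector, (10.0, 1.0))     # roughly in edge direction
ignore_edge = not in_sector(origin, sector, (-5.0, 2.0))  # behind the edge
print(keep_edge, ignore_edge)
```

A direct-edge whose sector does not contain the destination station would then be skipped in Step 6 of the single-pair algorithm.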

Preprocessing In a preprocessing step, the single-source variant of Dijkstra’s algorithm is applied to each leaving node to compute shortest paths from this node to all other stations. The results are not stored (this is out of question - the space requirement is too

