Graph Kernels on Shortest Path Distances

2.2 Graph Kernels based on Shortest Path Distances

2.2.3 Graph Kernels on Shortest Path Distances

In algorithmic graph theory, the information about the endnodes and the length of shortest paths is commonly represented by a matrix called the shortest path distance matrix.

Definition 7 (Shortest Path Distance Matrix) Let G = (V, E) be a graph of size |G| = n. Let d(v_i, v_j) be the length of the shortest path between v_i and v_j. The shortest path distance matrix D of G is then an n×n matrix defined as

$$
D_{ij} =
\begin{cases}
d(v_i, v_j) & \text{if } v_i \text{ and } v_j \text{ are connected,} \\
\infty & \text{otherwise}
\end{cases}
\tag{2.24}
$$

To define a graph kernel comparing all pairs of shortest paths from two graphs G and G′, we have to compare all pairs of entries from D and D′ that are finite (as only finite entries in D and D′ indicate the existence of shortest paths). This can be achieved most easily if we think of D and D′ as adjacency matrices defining corresponding graphs, the shortest-path graphs.

Definition 8 (Shortest-Path Graph) Let G = (V, E) be a graph, and let D be its shortest path distance matrix. Then the shortest-path graph S of G has the same set of nodes V as G, and its set of edges is defined via the adjacency matrix A(S)

$$
A(S)_{ij} =
\begin{cases}
1 & \text{if } D(v_i, v_j) < \infty, \\
0 & \text{otherwise}
\end{cases}
\tag{2.25}
$$

where D(v_i, v_j) is the edge label of edge (v_i, v_j) in S.

Hence a shortest-path graph S contains the same set of nodes as the original graph G.

Unlike in the input graph, there exists an edge between all nodes in S which are connected by a walk in G. Every edge in S between nodes v_i and v_j is labeled by the shortest distance between these two nodes in G.
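Definition 8 can be illustrated with a minimal Python sketch. It assumes that D is given as a list of lists with `math.inf` marking unconnected pairs; the function name `shortest_path_graph` and the dictionary of edge labels are illustrative choices, not part of the text above.

```python
import math

def shortest_path_graph(D):
    """Build the adjacency matrix A(S) and the edge labels of S from D.

    D is assumed to be a square list of lists; math.inf marks pairs of
    nodes that are not connected.
    """
    n = len(D)
    # Equation (2.25): an edge exists wherever the distance is finite.
    # Read literally, this also marks the diagonal, where D[i][i] = 0.
    A = [[1 if D[i][j] < math.inf else 0 for j in range(n)] for i in range(n)]
    # Each edge (i, j) of S is labeled with the shortest distance D[i][j].
    labels = {(i, j): D[i][j]
              for i in range(n) for j in range(n) if A[i][j] == 1}
    return A, labels
```

For a graph in which nodes 0 and 1 are adjacent and node 2 is isolated, S keeps the edge between 0 and 1 and has no edge to node 2.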

Based on this concept of a shortest-path graph, we are now in a position to present our graph kernel on shortest-path distances. The essential first step in its computation is to transform the original graphs into shortest-path graphs. Any algorithm which solves the all-pairs-shortest-paths problem can be applied to determine all shortest distances in G, which then become edge labels in S. We propose to use Floyd’s algorithm (see Algorithm 1).

This algorithm has a runtime of O(n³) and is applicable to graphs with negative edge weights, provided that they contain no negative-weight cycles. Furthermore, it is easy to implement. In the following, we will refer to the process of transforming a graph G into S via Floyd’s algorithm as Floyd-transformation.
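The Floyd-transformation can be sketched in Python as follows. This is a direct transcription of the Floyd-Warshall scheme under the assumption that the adjacency matrix A and the weights w are given as lists of lists; the function name `floyd_transform` is hypothetical.

```python
import math

def floyd_transform(A, w):
    """Return the shortest path distance matrix D of a graph.

    A is a 0/1 adjacency matrix and w holds the edge weights (both are
    assumed to be square lists of lists). Negative weights are allowed
    as long as the graph has no negative-weight cycle.
    """
    n = len(A)
    # Initialisation: 0 on the diagonal, w[i][j] for edges, infinity otherwise.
    D = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                D[i][j] = 0
            elif A[i][j] == 1:
                D[i][j] = w[i][j]
    # Relaxation: allow node k as an intermediate node, for k = 0, ..., n-1.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D
```

The finite entries of the returned matrix are the edge labels of S; entries that remain `math.inf` correspond to pairs of nodes without an edge in S.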

After Floyd-transformation of our input graphs, we can now define a shortest-path kernel.

Definition 9 (Shortest-path graph kernel) Let G and G′ be two graphs that are Floyd-transformed into S and S′. We can then define our shortest-path graph kernel on S = (V, E) and S′ = (V′, E′) as

$$
k_{\text{shortest paths}}(S, S') = \sum_{e \in E} \sum_{e' \in E'} k^{(1)}_{\text{walk}}(e, e'),
\tag{2.26}
$$

where k_walk⁽¹⁾ is a positive definite kernel on walks of length 1, i.e., a kernel on edges.
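A minimal sketch of the double sum in Equation (2.26) is given below. It makes several assumptions that are not fixed by the definition: node labels are ignored, the edge kernel is a simple delta kernel on distance labels (`delta_edge_kernel`, a hypothetical choice), edges are read from the off-diagonal finite entries of the distance matrices, and each ordered pair (i, j) counts as one edge.

```python
import math

def delta_edge_kernel(d1, d2):
    """Hypothetical edge kernel: 1 if both distance labels agree, else 0."""
    return 1.0 if d1 == d2 else 0.0

def edge_labels(D):
    """Distance labels of all edges of S, skipping the diagonal (assumption)."""
    return [d for i, row in enumerate(D) for j, d in enumerate(row)
            if i != j and d < math.inf]

def shortest_path_kernel(D1, D2, edge_kernel=delta_edge_kernel):
    """Sum the edge kernel over all pairs of edges from S and S'."""
    E1, E2 = edge_labels(D1), edge_labels(D2)
    return sum(edge_kernel(e, f) for e in E1 for f in E2)
```

Any other positive definite kernel on edge labels, for instance one that also compares the labels of the end nodes, could be passed as `edge_kernel` instead.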

In the following, we will prove the validity of our shortest-path kernel.

Lemma 10 The shortest-path graph kernel is positive definite.


Algorithm 1 Pseudocode for Floyd-Warshall’s algorithm [Floyd, 1962] for determining all-pairs shortest paths.

Input: Graph G with n nodes, adjacency matrix A, and edge weights w

for i := 1 to n
    for j := 1 to n
        if (A[i, j] == 1) and (i ≠ j)
            D[i, j] := w[i, j];
        else
            if (i == j)
                D[i, j] := 0;
            else
                D[i, j] := ∞;
            end
        end
    end
end

for k := 1 to n
    for i := 1 to n
        for j := 1 to n
            if (D[i, k] + D[k, j] < D[i, j])
                D[i, j] := D[i, k] + D[k, j];
            end
        end
    end
end

Output: Shortest path distance matrix D

Proof The shortest-path kernel is simply a walk kernel run on a Floyd-transformed graph considering walks of length 1 only. We follow the proofs in [Kashima et al., 2003] and [Borgwardt et al., 2005]. First, we choose a positive definite kernel on nodes and a positive definite kernel on edges. We then define a kernel on pairs of walks of length 1, k_walk⁽¹⁾, as the product of kernels on nodes and edges encountered along the walk. As a tensor product of node and edge kernels [Schölkopf and Smola, 2002], k_walk⁽¹⁾ is positive definite. We then zero-extend k_walk⁽¹⁾ to the whole set of pairs of walks, setting kernel values for all walks with length ≠ 1 to zero. This zero-extension preserves positive definiteness [Haussler, 1999].

The positive definiteness of the shortest-path kernel follows directly from its definition as a convolution kernel, proven to be positive definite by [Haussler, 1999].
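As a reading aid, the product construction used in the proof can be written out explicitly. The symbols k_node and k_edge for the chosen node and edge kernels, and the endpoint names v_1, v_2 and v'_1, v'_2, are notation assumed here and do not appear in the text above.

```latex
% Walk of length 1: e = (v_1, v_2) in S and e' = (v'_1, v'_2) in S'
k^{(1)}_{\text{walk}}(e, e') =
  k_{\text{node}}(v_1, v'_1) \cdot k_{\text{edge}}(e, e') \cdot k_{\text{node}}(v_2, v'_2)
```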

Runtime Complexity

The shortest-path kernel avoids tottering and halting, yet it remains an interesting question how it compares to the known random walk kernels in terms of runtime complexity.

The shortest-path kernel requires a Floyd-transformation which can be done in O(n³) when using the Floyd-Warshall algorithm. The number of edges in the transformed graph is n², if the original graph is connected. Pairwise comparison of all edges in both transformed graphs is then necessary to determine the kernel value. We have to consider n² * n² pairs of edges, resulting in a total runtime of O(n⁴).
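The runtime argument can be summarised in one short derivation, assuming for simplicity that both graphs have n nodes; the names T_Floyd, T_compare, and T_total are introduced here only for illustration.

```latex
\begin{align*}
T_{\text{Floyd}}   &= O(n^3), \\
T_{\text{compare}} &= O(n^2 \cdot n^2) = O(n^4), \\
T_{\text{total}}   &= O(n^3) + O(n^4) = O(n^4).
\end{align*}
```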

Equal Length Shortest-Path Kernel

Label enrichment, in the spirit of [Mahé et al., 2004], can also be applied to our Floyd-transformed graphs to speed up kernel computation. Both edges and nodes can be enriched by additional attributes. When performing the Floyd-Warshall algorithm, one is usually interested in the shortest distance between all nodes. However, if we store information about the shortest paths, i.e., the number of edges or the average edge length in these shortest paths, then we can exploit this extra information to reduce computational cost.

For instance, this can be achieved by setting kernels to zero for all pairs of shortest paths whose number of edges is not identical, i.e.,

$$
k_{\text{steps}}(p, p') =
\begin{cases}
1 & \text{if } \text{steps}(p) = \text{steps}(p'), \\
0 & \text{otherwise}
\end{cases}
\tag{2.27}
$$

where p and p′ are shortest paths and steps(p) and steps(p′) are the number of edges in path p and p′, respectively. If the steps kernel is zero for a pair of paths, we do not have to evaluate the node and edge kernel.
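The short-circuit described above might look as follows in Python. The representation of a shortest path as a `(distance, steps)` pair and the names `enriched_path_kernel` and `base_kernel` are assumptions made for this sketch.

```python
def k_steps(steps_p, steps_q):
    """Equation (2.27): 1 if both paths use the same number of edges, else 0."""
    return 1 if steps_p == steps_q else 0

def enriched_path_kernel(p, q, base_kernel):
    """Compare two shortest paths given as hypothetical (distance, steps) pairs.

    base_kernel stands for the combined node and edge kernel; it is only
    evaluated when the steps kernel is non-zero.
    """
    if k_steps(p[1], q[1]) == 0:
        return 0.0  # skip the more expensive node and edge kernels
    return base_kernel(p, q)
```

The saving comes entirely from the early return: pairs of paths with different edge counts never reach `base_kernel`.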

Note again that shortest paths need not be unique. Thus some extra criterion might be required to select one particular path out of a set of shortest paths. For instance, one could decide only to consider the shortest paths with minimum number of edges for computing k_steps.
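One way to realise the minimum-edge criterion is to let the Floyd-Warshall update compare pairs (distance, number of edges) lexicographically. The sketch below is an assumption about how this could be done, not a procedure given in the text; the function name `floyd_with_min_steps` is hypothetical, and integer weights are assumed so that ties between distances are detected exactly.

```python
import math

def floyd_with_min_steps(A, w):
    """Floyd-Warshall that also records the edge count of a shortest path.

    Among all shortest paths between two nodes, the recorded edge count is
    the minimum one. A is a 0/1 adjacency matrix, w the edge weights.
    """
    n = len(A)
    D = [[math.inf] * n for _ in range(n)]
    steps = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                D[i][j], steps[i][j] = 0, 0
            elif A[i][j] == 1:
                D[i][j], steps[i][j] = w[i][j], 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = (D[i][k] + D[k][j], steps[i][k] + steps[k][j])
                # Tuple comparison: shorter distance first, fewer edges on ties.
                if cand < (D[i][j], steps[i][j]):
                    D[i][j], steps[i][j] = cand
    return D, steps
```

The resulting `steps` matrix can then be used as the input to k_steps.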

k Shortest-Path Kernel

Even more valuable information for our kernel could be to know not just the shortest path between two nodes, but the k shortest paths. For each of the k shortest paths, one edge could then be created in the Floyd-transformed graph. Note that in this case, unlike our general convention in this thesis, we would be dealing with graphs with multiple edges, i.e., several edges between the same pair of nodes.

Finding k shortest walks and paths in a graph is a well-studied topic in graph theory and applied sciences [Yen, 1971, Lawler, 1972]. Many of the algorithms proposed for solving this problem, however, determine k shortest walks, not k shortest paths. Applying these algorithms would reintroduce the problem of tottering into our path-based kernel. It is therefore essential to choose an algorithm for finding "k loopless shortest paths" in a graph, as this is the term commonly used in the literature. Such algorithms were proposed over 30 years ago [Yen, 1971, Lawler, 1972], and any of them can be run on our input graphs, as long as the graphs contain no cycles with negative weights. The drawback of this method is the increased runtime complexity for determining k shortest loopless paths. Yen’s algorithm [Yen, 1971] requires O(kn(m + n log n)) time to find k shortest loopless paths between a pair of nodes, where n is the number of nodes and m is the number of edges. Consequently, the theoretical complexity would be O(kn⁵) for determining k shortest loopless paths for all pairs of nodes in a fully connected graph, and pairwise comparison of all k shortest paths in two graphs would be of complexity O((kn²) ∗ (kn²)) = O(k²n⁴). As a result, the preprocessing step has a higher runtime complexity than the kernel computation in this case (for k < n).
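The two bounds can be derived step by step, using m = O(n²) for a fully connected graph and n² pairs of nodes; the names T_all pairs and T_compare are introduced here only for illustration.

```latex
\begin{align*}
T_{\text{all pairs}} &= n^2 \cdot O\bigl(kn(m + n\log n)\bigr) \\
  &= O\bigl(kn^3(n^2 + n\log n)\bigr) && \text{with } m = O(n^2) \\
  &= O(kn^5), \\
T_{\text{compare}} &= O\bigl((kn^2)(kn^2)\bigr) = O(k^2 n^4).
\end{align*}
```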

A simple way to determine k shortest disjoint paths between two nodes, where no pair of paths shares an edge, is to iteratively apply Dijkstra’s algorithm to the same graph and to remove all edges that belong to the current shortest path. Still, applying this procedure to all pairs of nodes would be of runtime complexity O(n²k(m + n log n)), which could become O(kn⁴) in a fully connected graph.
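The iterative procedure for a single pair of nodes can be sketched as follows. The sketch assumes an undirected graph with non-negative weights stored as a dictionary of dictionaries; the function names are hypothetical. It uses Python's binary heap from `heapq`, so the cost of each Dijkstra call differs from the Fibonacci-heap bound O(m + n log n) quoted above.

```python
import heapq

def dijkstra_path(adj, source, target):
    """Return (distance, node list) of a shortest path, or None if unreachable."""
    dist, prev, done = {source: 0}, {}, set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, weight in adj.get(u, {}).items():
            nd = d + weight
            if v not in dist or nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

def k_edge_disjoint_shortest_paths(adj, source, target, k):
    """Repeat Dijkstra up to k times, deleting the edges of each path found."""
    g = {u: dict(nbrs) for u, nbrs in adj.items()}  # work on a copy
    results = []
    for _ in range(k):
        found = dijkstra_path(g, source, target)
        if found is None:
            break  # fewer than k edge-disjoint paths exist along this greedy route
        results.append(found)
        path = found[1]
        for u, v in zip(path, path[1:]):
            g[u].pop(v, None)
            g.get(v, {}).pop(u, None)  # undirected: remove both directions
    return results
```

Because the edges of each path are removed greedily, the procedure may return fewer than k paths even when k edge-disjoint paths exist in the graph.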
