Since our structural node descriptors are location-invariant and thus transferable among different graphs, we can compare whole graphs by comparing their respective sets of node descriptors. However, simply collecting the node descriptors of each graph in an (ordered) set is not a viable solution, because graphs with different numbers of nodes would then yield representations of different lengths. To this end, we propose two histogram-based aggregation schemes that discretize the notion of roles. The first aggregation scheme is based on classic histograms in which all bins have the same width, while the second can be interpreted as an adaptive histogram whose bins adapt to the distribution of the role descriptors. For both aggregation schemes, the node roles can be aggregated in a local or a global fashion: the representation of a single graph relies either solely on the node roles that appear in that graph, or on global role notions that are defined over all graphs. The intuition behind defining roles from a local view is to be more robust against outlier roles. Recall that role descriptors are continuous values. On a global view, outliers may stretch the value range of all role descriptors such that some of the equi-sized histogram bins defined over this entire range become meaningless, e.g., because most descriptors are squeezed into a few bins while the remaining bins stay empty. On the other hand, the local view approach rests on the underlying assumption that role notions do not differ substantially across the entire body of graphs.
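To make the outlier argument concrete, the following minimal sketch compares equi-width bins defined globally (over the pooled descriptors of all graphs) and locally (per graph). It assumes scalar role descriptors and uses made-up values for two hypothetical graphs; the helper names `equi_width_edges` and `bin_counts` are illustrative only.

```python
import numpy as np

def equi_width_edges(values, k):
    """k + 1 equally spaced bin edges spanning [min(values), max(values)]."""
    return np.linspace(np.min(values), np.max(values), k + 1)

def bin_counts(values, edges):
    """Per-bin counts; np.histogram treats the last bin as closed, so the maximum is counted."""
    counts, _ = np.histogram(values, bins=edges)
    return counts

# Hypothetical scalar role descriptors of two graphs; graph B contains one outlier role.
graph_a = np.array([0.1, 0.2, 0.3, 0.5, 0.7, 0.9])
graph_b = np.array([0.2, 0.4, 0.6, 0.8, 50.0])
k = 5

# Global view: one set of bins over the pooled value range, stretched by B's outlier.
global_edges = equi_width_edges(np.concatenate([graph_a, graph_b]), k)
global_a = bin_counts(graph_a, global_edges)  # [6 0 0 0 0]: all of A lands in one bin

# Local view: graph A's bins depend only on A's own descriptors.
local_a = bin_counts(graph_a, equi_width_edges(graph_a, k))  # [2 1 1 1 1]
```

Under the global view, the single outlier in graph B collapses every descriptor of graph A into the first bin, whereas the local bins of graph A still resolve its roles.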

Somewhat more formally, let $\mathcal{G} = \{G_1, \dots, G_N\}$ be a set of graphs, where $G_i = (V_i, E_i)$ denotes a single graph, $V = \bigcup_{i=1}^{N} V_i$ the set of all vertices, and $E = \bigcup_{i=1}^{N} E_i$ the set of all edges. Furthermore, let $f(v_j) \in \mathbb{R}^{d}$ denote the continuous role descriptor of node $v_j \in V$, as described in the previous chapter. To derive graph embeddings that follow the baseline aggregation scheme, i.e., the scheme that uses equi-sized bins for the histogram representations, we discretize the value range of the role descriptors into equi-sized bins and count the occurrences of role descriptors per graph and bin. Hence, the graph representation $F_i$ for graph $G_i$ is defined as

$$F_i = \big[\, |\{ v \in V_i \mid b_j \le f(v) < b_{j+1} \}| \;:\; j = 0, \dots, k-1 \,\big]^{\top} \in \mathbb{R}^{k}, \tag{12.1}$$
where $b_j$ is the lower bound of histogram bin $j$ and $k$ denotes the number of bins. Depending on how the value ranges of the bins are defined, graph representations can be generated from a global or a local perspective. To define representations on a global view, the set of bins $B$, with $|B| = k$, is defined over the value range of the role descriptors collected from all graphs in the training dataset $\mathcal{G}_{train} = \{G_0, \dots, G_{n-1}\} \subseteq \mathcal{G}$, i.e.,

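$$B = \big\{ [b_j, b_{j+1}) \;:\; j = 0, \dots, k-1 \big\}, \tag{12.2}$$

with

$$b_j = \begin{cases} \min\big(\{ f(v) \mid v \in \bigcup_{i=0}^{n-1} V_i \}\big) & \text{if } j = 0, \\[4pt] b_{j-1} + \dfrac{\max\big(\{ f(v) \mid v \in \bigcup_{i=0}^{n-1} V_i \}\big) - \min\big(\{ f(v) \mid v \in \bigcup_{i=0}^{n-1} V_i \}\big)}{k} & \text{if } 0 < j \le k. \end{cases} \tag{12.3}$$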
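A minimal sketch of the global baseline aggregation (Eqs. 12.1 to 12.3), assuming scalar descriptors stored as one NumPy array per graph. The function names are hypothetical, and clipping values outside $[b_0, b_k)$ (the training maximum itself, or unseen values of test graphs) into the first or last bin is our own assumption, since the half-open bins above leave these cases open.

```python
import numpy as np

def global_bin_edges(train_descriptors, k):
    """Edges b_0, ..., b_k over the pooled training descriptors (Eqs. 12.2 and 12.3)."""
    pooled = np.concatenate(train_descriptors)
    lo, hi = pooled.min(), pooled.max()
    width = (hi - lo) / k
    # b_j = b_{j-1} + width unrolls to b_j = b_0 + j * width
    return lo + width * np.arange(k + 1)

def role_histogram(descriptors, edges):
    """Count vector F_i of Eq. 12.1: nodes with b_j <= f(v) < b_{j+1} for each bin j."""
    k = len(edges) - 1
    # searchsorted(side="right") - 1 yields the j with b_j <= f(v) < b_{j+1}
    idx = np.searchsorted(edges, descriptors, side="right") - 1
    # assumption: values outside [b_0, b_k) (the training maximum, or unseen
    # test values) are clipped into the first or last bin
    idx = np.clip(idx, 0, k - 1)
    return np.bincount(idx, minlength=k)

# Hypothetical scalar descriptors of two training graphs
train = [np.array([0.0, 0.5, 1.0]), np.array([0.25, 0.75, 2.0])]
edges = global_bin_edges(train, k=4)  # [0.0, 0.5, 1.0, 1.5, 2.0]
F = np.stack([role_histogram(d, edges) for d in train])  # [[1 1 1 0] [1 1 0 1]]
```

Every graph, regardless of its number of nodes, is mapped to a vector of length $k$, which is what makes the representations directly comparable.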

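In contrast, to define graph representations on a local view, the sets of bins $B_i$, with $|B_i| = k$ and $0 \le i < n$, are defined for each graph $G_i \in \mathcal{G}_{train}$ individually, i.e.,

$$B_i = \big\{ [b_j, b_{j+1}) \;:\; j = 0, \dots, k-1 \big\}, \tag{12.4}$$

with

$$b_j = \begin{cases} \min\big(\{ f(v) \mid v \in V_i \}\big) & \text{if } j = 0, \\[4pt] b_{j-1} + \dfrac{\max\big(\{ f(v) \mid v \in V_i \}\big) - \min\big(\{ f(v) \mid v \in V_i \}\big)}{k} & \text{if } 0 < j \le k. \end{cases} \tag{12.5}$$

Figure 12.1: Workflow for calculating the role-based graph descriptors.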
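The local variant (Eqs. 12.4 and 12.5 combined with Eq. 12.1) only changes where the bin edges come from. The sketch below again assumes scalar descriptors; the function name and the handling of a graph whose nodes all share one descriptor value are hypothetical choices, not taken from the text.

```python
import numpy as np

def local_role_histogram(descriptors, k):
    """Equi-width histogram whose bins span only this graph's own value range."""
    lo, hi = descriptors.min(), descriptors.max()
    width = (hi - lo) / k
    if width == 0:
        # assumption: if all nodes share one descriptor value, every node goes to bin 0
        return np.bincount(np.zeros(len(descriptors), dtype=int), minlength=k)
    # j = floor((f(v) - b_0) / width) gives b_j <= f(v) < b_{j+1}
    idx = np.floor((descriptors - lo) / width).astype(int)
    # the graph's maximum yields j = k; count it in the last bin
    idx = np.clip(idx, 0, k - 1)
    return np.bincount(idx, minlength=k)

# Two hypothetical graphs with the same relative spread but disjoint value ranges
g1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
g2 = np.array([10.0, 10.5, 11.0, 11.5, 12.0])
h1 = local_role_histogram(g1, 4)  # [1 1 1 2]
h2 = local_role_histogram(g2, 4)  # [1 1 1 2]
```

Both graphs receive identical local histograms although their absolute descriptor values do not overlap. This illustrates why the local view presumes that role notions are similar across graphs: bin $j$ of one graph is only comparable to bin $j$ of another if the underlying roles are.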
Similarly, we derive graph embeddings based on adaptive histograms as follows. First, we collect the node descriptors from all graphs in the training dataset (in the global setting) or from each graph individually (in the local setting) and cluster them with k-Means [174]. The resulting cluster centers $\{\mu_j \in \mathbb{R}^{d} \mid j = 1, \dots, k\}$ can be interpreted as multi-scale role concepts appearing in the dataset. In a second step, we assign each node $v$ of a given graph $G_i$ to its nearest cluster center $\mu(v)$ and use the resulting count vector

$$F_i = \big[\, |\{ v \in V_i \mid \mu(v) = \mu_j \}| \;:\; j = 1, \dots, k \,\big]^{\top} \in \mathbb{R}^{k} \tag{12.6}$$

as the representation of that graph. An important advantage of these graph descriptors is that they can be computed very efficiently: for a fixed number of clusters $k$, descriptor dimension $d$, and number of k-Means iterations, the cost grows linearly with the total number of nodes in the dataset. Furthermore, the number of clusters $k$ can be varied flexibly to explore different numbers of roles in a graph. For a supervised objective, this hyperparameter can simply be optimized over a range of sensible values. Note that other clustering techniques may be employed to discretize the continuous role descriptors as well.
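A minimal sketch of the global adaptive variant (Eq. 12.6), assuming plain Lloyd iterations with a deterministic farthest-first initialization (our choice for reproducibility, not necessarily the k-Means setup used in the experiments), squared Euclidean distance, and made-up two-dimensional descriptors. All function names are hypothetical; for the local setting, `kmeans` would instead be run on each graph's own descriptors.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the nearest cluster center mu(v) for each descriptor (squared Euclidean distance)."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def farthest_first_init(points, k):
    """Deterministic seeding: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns the k cluster centers (role concepts)."""
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        labels = nearest_center(points, centers)
        # keep a center in place if it lost all its points
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def adaptive_role_histogram(descriptors, centers):
    """Count vector F_i of Eq. 12.6: number of nodes assigned to each role concept."""
    return np.bincount(nearest_center(descriptors, centers), minlength=len(centers))

# Global setting: cluster the pooled descriptors of all (hypothetical) training graphs
train = [
    np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]),
    np.array([[0.0, 0.1], [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]),
]
centers = kmeans(np.vstack(train), k=2)
F = np.stack([adaptive_role_histogram(g, centers) for g in train])  # [[2 1] [1 3]]
```

Each Lloyd iteration touches every descriptor once per center, so for fixed $k$, $d$, and iteration count the work is linear in the number of nodes, matching the efficiency argument above.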

| Dataset | #Graphs | #Classes | Avg. #Nodes | Avg. #Edges |
|---|---|---|---|---|
| MUTAG | 188 | 2 | 17.93 | 19.79 |
| ENZYMES | 600 | 6 | 32.63 | 62.14 |
| NCI1 | 4110 | 2 | 29.87 | 32.30 |
| NCI109 | 4127 | 2 | 29.68 | 32.13 |
| PROTEINS | 1113 | 2 | 39.06 | 72.82 |
| IMDB-BINARY | 1000 | 2 | 19.77 | 96.53 |
| IMDB-MULTI | 1500 | 3 | 13.00 | 65.94 |
| REDDIT-BINARY | 2000 | 2 | 429.63 | 497.75 |
| REDDIT-12K | 11929 | 11 | 391.41 | 456.89 |
| REDDIT-5K | 4999 | 5 | 508.52 | 594.87 |

Table 12.1: Benchmark datasets for graph classification. The upper part of the table (MUTAG to PROTEINS) contains biological networks, the lower part refers to social network datasets. The columns give the number of graphs, the number of classes, and the average numbers of nodes and edges per graph.

Figure 12.1 extends the workflow presented in the previous chapter by additionally calculating the described graph descriptors using the global, adaptive approach. The final procedure consists of two blocks. In the first block, the continuous role descriptors are calculated for each node. Given the raw network as input, which is composed of multiple, differently sized components that form graph structures on their own, we compute the stationary APPR distribution for each node. From these distributions, we derive the continuous role-based node descriptors by computing the entropy of each node's distribution. Stacking these entropy values for each component results in differently sized and thus incomparable vectors (or matrices, if multiple α values are used to calculate the APPR distributions). To enable comparisons between differently sized subgraphs, the second block of our procedure first discretizes the notion of roles by applying the k-means algorithm to the continuous role descriptors^{1}. Then, for each of the subgraph structures, we count the occurrences of each role within the corresponding network to construct equally sized graph descriptors, which can easily be used for downstream tasks such as classification. Note that the depicted example shows the procedure for a single value of α used for APPR. As we show in the experiments section, richer representations can be calculated by using multiple values for α.
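The first block of the procedure can be sketched as follows for a single component given as an adjacency matrix. This is a minimal illustration under explicit assumptions: we read APPR as (approximate) personalized PageRank with restart probability α, and instead of an approximation we compute the exact stationary distribution of every seed node with one dense linear solve; the self-loop for isolated nodes and all function names are hypothetical. Pooling the resulting descriptor matrices of all components and discretizing them with k-means, as described above, then yields the count vectors of the second block.

```python
import numpy as np

def ppr_matrix(adj, alpha):
    """Row s is the stationary personalized PageRank distribution seeded at node s.

    Solves pi_s = alpha * e_s + (1 - alpha) * pi_s @ P for all seeds at once,
    i.e. Pi = alpha * (I - (1 - alpha) * P)^{-1}, with P the random-walk
    transition matrix and alpha the restart probability (0 < alpha <= 1).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    P = np.zeros_like(adj)
    nonzero = deg > 0
    P[nonzero] = adj[nonzero] / deg[nonzero][:, None]
    # assumption: an isolated node keeps its mass via a self-loop
    isolated = np.flatnonzero(~nonzero)
    P[isolated, isolated] = 1.0
    return np.linalg.solve(np.eye(n) - (1.0 - alpha) * P, alpha * np.eye(n))

def entropy_rows(dist):
    """Shannon entropy (natural log) of each row, with 0 * log 0 := 0."""
    p = np.clip(dist, 0.0, None)  # guard against tiny negative round-off
    logp = np.log(p, out=np.zeros_like(p), where=p > 0)
    return -(p * logp).sum(axis=1)

def role_descriptors(adj, alphas):
    """Continuous role descriptors: one entropy per node and alpha, shape (n_nodes, len(alphas))."""
    return np.column_stack([entropy_rows(ppr_matrix(adj, a)) for a in alphas])

# Hypothetical component: a star with one center (node 0) and three leaves
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
desc = role_descriptors(star, alphas=[0.1, 0.5, 0.9])
```

Structurally equivalent nodes (the three leaves) obtain identical descriptors, while the center differs, and each additional α contributes one more column, which is how the vectors turn into matrices for multiple α values. The dense solve needs memory quadratic in the component size, so for larger components an approximate PPR scheme would be the practical choice.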