In this chapter, we employed the role descriptors presented in the previous chapter to generate embeddings for entire graph structures. We proposed a straightforward aggregation scheme that uses traditional histograms with equi-sized bins to aggregate the role descriptors for each graph. Further, we used a somewhat more sophisticated aggregation scheme that also relies on histograms but adapts to the distribution of all nodes’ role descriptors because it is based on k-means clustering. Our experimental evaluation showed that even these simple aggregation schemes outperform state-of-the-art techniques. Hence, we conclude that the nodes’ functions, i.e., their roles, and in particular their role descriptors (and not only the ones presented in the previous chapter) are indeed useful for representing entire graph structures as numerical vectors.
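The two aggregation schemes summarized above can be sketched roughly as follows. This is a minimal illustration and not the implementation evaluated in this chapter: the function names, the per-dimension histogram layout, the value range `[lo, hi]`, and the assumption that the k-means centroids are already available are all hypothetical choices made for the example.

```python
import math


def equi_width_histogram(values, num_bins, lo, hi):
    """Normalized histogram of scalar values over num_bins equally wide bins."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for x in values:
        idx = int((x - lo) / width)
        idx = min(max(idx, 0), num_bins - 1)  # clamp values on or outside the range
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * num_bins


def embed_equi_width(descriptors, num_bins, lo=0.0, hi=1.0):
    """Assumed layout: one histogram per descriptor dimension, concatenated."""
    dims = len(descriptors[0])
    embedding = []
    for d in range(dims):
        column = [row[d] for row in descriptors]
        embedding.extend(equi_width_histogram(column, num_bins, lo, hi))
    return embedding


def embed_codebook(descriptors, centroids):
    """Adaptive variant: count how many descriptors fall closest to each centroid.

    The centroids are assumed to come from k-means run on the role
    descriptors of all nodes; the clustering itself is not shown here.
    """
    counts = [0] * len(centroids)
    for row in descriptors:
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(row, centroids[j]))
        counts[nearest] += 1
    return [c / len(descriptors) for c in counts]


# Three nodes of one graph, each with a two-dimensional role descriptor.
graph_descriptors = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
print(embed_equi_width(graph_descriptors, num_bins=2))
print(embed_codebook(graph_descriptors, centroids=[[0.0, 1.0], [1.0, 0.0]]))
```

In both variants the resulting vector has a fixed length that is independent of the number of nodes, which is what allows graphs of different sizes to be compared.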

Figure 12.2: Accuracy scores achieved with 1-NN classification on the biological datasets: (a) MUTAG, (b) ENZYMES, (c) NCI1, (d) NCI109, (e) PROTEINS. The orange lines denote the medians; the green triangles depict the mean values and correspond to the values reported in Table 12.2. Again, the DGK results are taken from [41] and report the classification accuracy achieved when applying an SVM.

Figure 12.3: Accuracy scores achieved with 1-NN classification on the social datasets: (a) IMDB-BINARY, (b) IMDB-MULTI, (c) REDDIT-BINARY, (d) REDDIT-12K, (e) REDDIT-5K. The orange lines denote the medians; the green triangles depict the mean values and correspond to the values reported in Table 12.3. Again, the DGK results are taken from [41] and report the classification accuracy achieved when applying an SVM.

### Semi-Supervised Learning on Graphs

The work presented in this chapter has partly been published as the article “Semi-Supervised Learning on Graphs Based on Local Label Distributions” at the 14th ACM KDD International Workshop on Mining and Learning with Graphs, 2018 [90]. A preliminary version can be found on arXiv [89]. At the time of writing, a full research paper version is under review for the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020.

### 13.1 Introduction

The increasing relevance of graph-structured data has been accompanied by an increased interest in learning algorithms that can leverage the underlying graph structure to make accurate predictions about the modeled entities. In many of these applications, it is important to categorize data objects into several types such as user classes, functional types or content topics. Often, these types are available only for a portion of the nodes in the network. An important task is then to predict the types, or more generally speaking the labels, of the nodes for which they are still unknown.

Real-world data is often complex, and in order to make accurate predictions, different aspects need to be taken into account. Strong assumptions, such as the assumption that a node has the same label as most of its neighbors or that correlations between differently labeled nodes are known a priori, may lead to insufficient exploitation of more complex correlations present in the data. Homophily-based approaches [285, 263, 214, 103, 244, 260] assume that nodes which are closely connected in the graph should have similar labels.

Figure 13.1: Different node classification methods rely on different assumptions on node similarity, e.g., neighboring nodes have similar labels (13.1(a), Homophily) or exhibit specific correlations between labels (13.1(b), Heterophily). Our method adaptively learns different types of correlations (13.1(a), 13.1(b) and 13.1(c), Mixed Patterns) appearing (possibly simultaneously) in the same graph for different labels. It also detects patterns at multiple scales and uses information from the whole graph (also from different connected components).

This assumption holds, e.g., in graphs where an edge denotes similarity between two instances. However, relationships between two entities modeled by edges in a graph may in general describe any kind of interaction rather than similarity alone. Another shortcoming of homophily-based methods is that they cannot use information from distant parts of the graph, from different connected components, or from other graphs with similar structure and labels for the classification decision. These shortcomings are shared by existing approaches which support heterophily in graphs [206, 144, 97, 208]. Heterophily is the tendency to connect to nodes with different labels. An example is a heterosexual dating network with gender as the label. Additionally, existing methods either assume that all labels follow the same homophilic or heterophilic pattern, or require correlations between pairs of labels to be provided explicitly. However, in many scenarios it is not feasible to manually model correlations between labels. At the same time, many real-life graphs are characterized not only by some specific mix of heterophily and homophily, but rather show high variations of these patterns for the same labels across the same graph [209]. Another problem of current methods is that the proximity of the considered neighbors is often not taken into account explicitly. However, knowing how far away a particular neighbor is may be beneficial. For example, in real-world social graphs the friends of a node’s friends may contain more information about the node’s label than its direct friends [21].

| Method | Homophily | Heterophily | Mixed | Local Variation | Adaptive | Labels | Remote |
|---|---|---|---|---|---|---|---|
| Homophilic Node Embedding [214, 103, 244, 62, 260, 91, 43, 183, 279] | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Label Diffusion [284, 285, 281, 208, 259] | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ |
| Belief Propagation [206, 144, 97] | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ |
| Planetoid [271] | ✓ | ✗ | ✗ | ✗ | ✗ | (✓) | (✓) |
| MPNN [98] | (✓) | (✓) | (✓) | (✓) | (✓) | ✗ | ✓ |
| Ada-LLD | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Table 13.1: Comparison to related node classification methods based on whether they fulfill (✓) the desired key properties or not (✗). Parentheses indicate partial fulfillment.
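To make the distinction between homophilic, heterophilic, and mixed label patterns more concrete, the following sketch measures how often labeled neighbors share a label, both for the whole graph and separately per label. The two measures and the function names `edge_homophily` and `per_label_homophily` are illustrative assumptions introduced here; they are not part of the proposed method.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of edges (with both endpoints labeled) that join equal labels."""
    same = total = 0
    for u, v in edges:
        if u in labels and v in labels:
            total += 1
            same += labels[u] == labels[v]
    return same / total if total else float("nan")


def per_label_homophily(edges, labels):
    """For each label, the fraction of labeled neighbors carrying the same label."""
    same = defaultdict(int)
    total = defaultdict(int)
    for u, v in edges:
        if u not in labels or v not in labels:
            continue
        # An undirected edge counts once from the viewpoint of each endpoint.
        for a, b in ((u, v), (v, u)):
            total[labels[a]] += 1
            same[labels[a]] += labels[a] == labels[b]
    return {c: same[c] / total[c] for c in total}


# A small path graph in which label "a" is homophilic while "b" and "c" are not.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = {0: "a", 1: "a", 2: "b", 3: "c", 4: "b"}
print(edge_homophily(edges, labels))
print(per_label_homophily(edges, labels))
```

A value close to 1 indicates homophily and a value close to 0 indicates heterophily. Values that differ strongly between labels correspond to the mixed setting, in which a single global assumption does not fit all labels.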

In this chapter, we propose a label-based approach, Ada-LLD, which learns all possible correlations of a node’s label with labels in its local neighborhood, as depicted in Figure 13.1, at multiple scales. The main idea is as follows: If the graph structure is useful for prediction, then there should exist a correlation between the label of a node and the distribution of labels in its local neighborhood. Our approach requires this correlation neither to be predefined nor to be fixed across the graph. Therefore, our method first determines the distributions of labels in neighborhoods of different extents. To do so, we consider only the most relevant neighbors for each node, which we determine by using Approximate Personalized PageRank. Given the representations of label distributions from various parts of the graph, our method learns how to infer labels for unlabeled nodes based on these representations. An important advantage of our approach is that it is able to use information from the whole graph (or all available graphs) for classifying a node and that it does not make any assumptions about relationships of labels (it learns them instead).
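As a rough illustration of this idea, the sketch below selects the most relevant neighbors of a node with a push-style approximation of Personalized PageRank and then computes the distribution of the known labels among them. It is a simplified sketch under several assumptions: the particular push variant, the parameters `alpha`, `eps` and `k`, the unweighted counting of labels, and the function names are hypothetical and need not match the algorithm developed later in this chapter.

```python
from collections import deque


def approximate_ppr(adj, source, alpha=0.15, eps=1e-6):
    """Push-style approximate Personalized PageRank on an undirected graph.

    adj maps each node to a list of its neighbors (assumed symmetric).
    Returns a dict with approximate scores for the nodes reached from source.
    """
    p = {}               # approximate PageRank mass settled at each node
    r = {source: 1.0}    # residual mass that still has to be distributed
    queue = deque([source])
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        mass = r.get(u, 0.0)
        if deg == 0:     # isolated node: all mass stays where it is
            p[u] = p.get(u, 0.0) + mass
            r[u] = 0.0
            continue
        if mass < eps * deg:
            continue
        p[u] = p.get(u, 0.0) + alpha * mass
        r[u] = 0.0
        share = (1.0 - alpha) * mass / deg
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            # Enqueue v only when its residual crosses the push threshold.
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p


def local_label_distribution(adj, labels, classes, source, k,
                             alpha=0.15, eps=1e-6):
    """Label distribution among the k highest-scoring neighbors of source."""
    scores = approximate_ppr(adj, source, alpha, eps)
    ranked = sorted((v for v in scores if v != source),
                    key=lambda v: scores[v], reverse=True)
    counts = {c: 0.0 for c in classes}
    for v in ranked[:k]:
        if v in labels:  # unlabeled neighbors carry no label information
            counts[labels[v]] += 1.0
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else counts


# Two triangles joined by the edge 2-3; nodes 0 and 3 are unlabeled.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {1: "a", 2: "a", 4: "b", 5: "b"}
print(local_label_distribution(adj, labels, ["a", "b"], source=0, k=2))
print(local_label_distribution(adj, labels, ["a", "b"], source=0, k=5))
```

Varying `k` (or `alpha`) changes the extent of the neighborhood, so several such distributions computed for the same node can serve as a multi-scale representation from which a classifier learns the label correlations.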

We summarize our main contributions as follows:

• A novel approach to semi-supervised node classification which is able to learn different types of correlations between labels by considering local label distributions.

• Variations of our base model for detecting and combining label correlations at multiple scales.

• An efficient and scalable algorithm for computing local label distributions.

• Thorough experimental evaluation of our models and comparison with state-of-the-art methods on several real-world datasets.