measure seems difficult since even the notion of local structural neighborhood patterns is hard to grasp. In this chapter, we argue that the spread of probability mass among a node's most relevant local neighbors is a good characteristic of the node's role. Similarly to [91], we leverage Approximate Personalized PageRank (APPR) to effectively describe multiple locality structures around the vertices and use the probability distribution vectors as a basis to quantify the structural roles of the nodes. An important feature of our novel node representation is that it is very efficient to compute and thus suitable even for large data sets. Furthermore, an important difference to previously published related work, e.g., [82], is that our method operates directly in the vertex domain, though the heat kernel diffusion process resembles that implied by PPR [71]. Additionally, our method is not restricted to k-hop neighborhoods.
Our empirical evaluation demonstrates that our simple approach outperforms somewhat more advanced state-of-the-art role-based node representations. With respect to previously published work on the topic of structural node embeddings (see Section 9.1), we summarize the key contributions of the work presented in this chapter as follows:
• A novel structure-based approach to determine role representations for single nodes directly in the vertex domain as opposed to existing diffusion-based approaches which operate in the spectral domain.
• A fast-to-compute approach that retrieves continuous role representations rather than being composed of multiple, computationally rather costly structural features.
• An extensive evaluation of our proposed role representations that shows promising results when comparing them to state-of-the-art node embeddings in their respective experimental setups.
11.2 Structural Node Representations using Approximate Personalized PageRank

Algorithm 7 APPR
Input: Source node vi, teleportation probability α, approximation threshold ε
Output: APPR vector pi
1: pi = 0, ri = ei
2: while rij ≥ εdj for some vertex vj do
3:     pick any vj where rij ≥ εdj
4:     push(vj)
5: end while
6: return pi

Algorithm 8 push(vj)
1: pij = pij + (2α/(1 + α)) rij
2: for vk with (vj, vk) ∈ E do
3:     rik = rik + ((1 − α)/(1 + α)) rij/dj
4: end for
5: rij = 0
Personalized PageRank (PPR) can be viewed as a special case of the PageRank [201] algorithm, where each random walk starts at the same node vi and at each step there is a probability α of jumping back to vi. The effect of this modification is that the PageRank scores are personalized to the node vi, i.e., they represent the importance of each node from the perspective of the source node vi. Formally, the PPR vector πi of node vi is given as the solution of the linear system
\[
\pi_i = \alpha e_i + (1 - \alpha)\, \pi_i W, \tag{11.1}
\]
where $W = D^{-1}A$ is the random walk transition matrix obtained from the $n \times n$ adjacency matrix $A$ by normalizing the outgoing links of each node by its out-degree, $e_i \in \mathbb{R}^{1 \times n}$ denotes the $i$-th unit vector, and $\alpha$ is the teleportation parameter.
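For small graphs, Eq. (11.1) can also be solved directly. The following Python sketch (our illustration, not part of the method itself; it assumes a dense adjacency matrix without isolated nodes) makes the definition concrete:

import numpy as np

def exact_ppr(A, i, alpha):
    """Exact PPR vector of node i by solving Eq. (11.1) directly.

    A: dense (n x n) adjacency matrix; alpha: teleportation probability.
    An O(n^3) reference for small graphs; the push-based APPR of
    Algorithms 7 and 8 approximates this far more efficiently.
    """
    n = A.shape[0]
    W = A / A.sum(axis=1, keepdims=True)  # W = D^{-1} A, row-stochastic
    e_i = np.zeros(n)
    e_i[i] = 1.0
    # pi_i (I - (1 - alpha) W) = alpha e_i; transpose to a standard Ax = b solve
    return np.linalg.solve((np.eye(n) - (1 - alpha) * W).T, alpha * e_i)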
The probability of transitioning from a node vi to a neighbor vj is given by wij. The entry πij can then be interpreted as the probability that a random walk starting at vi stops at vj. The expected length of a random walk is determined by the teleportation probability α: with a smaller value, a larger portion of the graph is explored, while a larger value leads to stronger localization.
Intuitively, πij measures the importance of node vj for node vi, and the PPR vector πi as a whole yields a distribution of node importance in the neighborhood of vi, where the extent of the neighborhood is controlled by the parameter α. In particular, the neighborhood is not restricted to nodes within a maximum hop distance, such as the k-neighborhood, which may contain irrelevant nodes or miss important ones. Compared to the shortest-path distance, nodes with a larger shortest-path distance from vi could still be more important, e.g., if they can be reached via many different short paths. For similar reasons, nodes with a small shortest-path distance might not be equally important. Such effects are captured by PPR.

Algorithm 9 APPR-Roles
Input: Graph G, labels L, teleportation probabilities αs, approximation threshold ε
Output: Classification model m
1: role_descriptors = list()
2: for idx in range(n) do
3:     v = G.getNode(idx)
4:     embv = list()
5:     for α in αs do
6:         pαv = APPR(v, α, ε)
7:         embv.append(entropy(pαv))
8:     end for
9:     role_descriptors.append(embv)
10: end for
11: m = LogisticRegression().fit(role_descriptors, L)
12: return m
Local push-based algorithms [133, 33] compute Approximate Personalized PageRank (APPR) very efficiently and lead to sparse solutions, where only the most relevant neighbors are taken into consideration [91]. In addition to the teleportation parameter α, the approximation threshold ε controls the approximation quality and runtime. The main idea is to start with all probability mass concentrated in the source node and then repeatedly push probability mass to neighboring nodes as long as the amount of mass pushed is large enough. In this work, we consider the algorithm proposed in [22]. In particular, we use an adapted version proposed in [234], which converges faster. The procedure is formalized in Algorithm 7 and Algorithm 8.
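To make the push procedure concrete, here is a minimal Python sketch of Algorithms 7 and 8. The adjacency-list representation and function names are our own illustrative choices; the graph is assumed undirected without self-loops or isolated nodes:

from collections import defaultdict

def appr(graph, source, alpha, eps):
    """Push-based APPR sketch following Algorithms 7 and 8.

    graph: dict mapping each node to a list of its neighbors.
    Returns the approximate PPR vector of `source` as a sparse
    dict {node: probability mass}.
    """
    p = defaultdict(float)  # p_i = 0 (approximate PPR mass)
    r = defaultdict(float)  # r_i = e_i (residual mass still to be pushed)
    r[source] = 1.0
    queue = [source]        # candidates that may satisfy r_ij >= eps * d_j
    while queue:
        v = queue.pop()
        if r[v] < eps * len(graph[v]):
            continue        # residual fell below the push threshold in the meantime
        # push(v): convert part of the residual into PPR mass ...
        p[v] += (2 * alpha / (1 + alpha)) * r[v]
        # ... and spread the rest evenly over v's neighbors
        share = ((1 - alpha) / (1 + alpha)) * r[v] / len(graph[v])
        for u in graph[v]:
            r[u] += share
            if r[u] >= eps * len(graph[u]):
                queue.append(u)
        r[v] = 0.0
    return dict(p)

Since mass is only pushed while rij ≥ εdj, the result stays sparse and only a local part of the graph around the source node is ever touched, in line with the efficiency claims above.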
For a given graph Gi and teleportation probability α, we compute the APPR vector $p_j^{(\alpha)}$ of each node $v_j \in V_i$ and store it as the $j$-th row of the sparse $n \times n$ APPR matrix $P_i^{(\alpha)}$.
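As an illustrative sketch of this step for a single graph (assuming the appr() helper above and nodes labeled 0 to n−1), the rows can be collected into a SciPy CSR matrix:

from scipy.sparse import csr_matrix

def appr_matrix(graph, alpha, eps):
    """Stack each node's sparse APPR vector as a row of P^(alpha)."""
    n = len(graph)
    rows, cols, vals = [], [], []
    for j in range(n):
        for k, mass in appr(graph, j, alpha, eps).items():
            rows.append(j)
            cols.append(k)
            vals.append(mass)
    return csr_matrix((vals, (rows, cols)), shape=(n, n))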
Entropy-based Node Descriptors
The APPR vector pi of a node vi effectively models the connectivity of that node with respect to all other nodes in the graph as a probability distribution, where the probability mass is concentrated only on vi's relevant neighbors.
This way, the size of the neighborhood is determined by the parameter α.
Figure 11.2: Workflow for calculating the role-based node descriptors.
The APPR vector pi additionally focuses on the most relevant neighbors by ignoring nodes with small probabilities, thus providing a sparse neighborhood representation. In principle, we could use the APPR vectors directly as node representations. This would lead to the following feature space:
\[
\Delta^n = \left\{\, p \in \mathbb{R}^{n}_{\ge 0} \;\middle|\; \sum_{i=1}^{n} p_i = 1 \,\right\}, \tag{11.2}
\]
which is known as the n-dimensional standard simplex. However, the resulting representations model homophily rather than structural properties, since they encode the information to which individual nodes a particular source node is connected. In order to make the representations location-invariant, we need to factor out this information. Since location invariance in this case translates to permutation invariance, we consider the quotient space
\[
\Delta^n_{\sim} = \{\, [p] \mid p \in \Delta^n \,\}, \tag{11.3}
\]
which corresponds to the set of equivalence classes $[p] = \{ q \in \Delta^n \mid p \sim q \}$ of the equivalence relation $\sim$ with $p \sim q \Leftrightarrow \exists P \in \mathcal{P} : p = qP$, where
\[
\mathcal{P} = \left\{\, P \in \{0,1\}^{n \times n} \;\middle|\; P\mathbf{1} = \mathbf{1},\; P^{T}\mathbf{1} = \mathbf{1} \,\right\}
\]
is the set of permutation matrices. As a corresponding quotient map, we can define $f : \Delta^n \to \Delta^n_{\sim}$ with $f(p) = p P_p$, which maps $p$ to its equivalence class by sorting it with the permutation matrix $P_p$ such that $1 \ge f(p)_1 \ge \dots \ge f(p)_n \ge 0$.
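In code, the quotient map f simply amounts to sorting; a one-line illustrative sketch:

import numpy as np

def quotient_map(p):
    """Canonical representative of [p]: entries sorted in non-increasing order."""
    return np.sort(p)[::-1]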
Though the resulting sorted APPR vectors qualify as structural node descriptors, they are not well suited for further downstream tasks since they are high-dimensional and sparse. Furthermore, the node descriptors would not be comparable among graphs with different numbers of nodes. To address this, we need to perform some form of aggregation. Our approach is based on the observation that, in terms of APPR, the structural properties of two nodes differ mostly in the extent to which they spread their probability mass throughout the graph. For instance, a community node will spread its probability mass evenly to nodes within the same community, whereas a peripheral node will strongly concentrate its probability mass on one or very few nodes to which it is connected. This behavior can be accurately described by the Shannon entropy $H : \Delta^n_{\sim} \to \mathbb{R}$ with
\[
H(p) = -\sum_{i=1}^{n} p_i \log p_i,
\]
where we use the binary logarithm. In particular, it fulfills the following properties.
Theorem 1. For all $p \in \Delta^n_{\sim}$ it holds that
1. $H(p) \in [0, \log n]$.
2. $H(p) = 0$ if and only if $p = e_1$.
3. $H(p) = \log n$ if and only if $p = \frac{1}{n}\mathbf{1}$.
4. $H(p) = \log n - D_{\mathrm{KL}}\!\left(p \,\middle\|\, \frac{1}{n}\mathbf{1}\right)$.
Proof. The proofs are straightforward and can be found in [74].
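As an illustration, property (4) follows in one line from the definition of the Kullback-Leibler divergence (using the binary logarithm, as above):
\[
D_{\mathrm{KL}}\!\left(p \,\middle\|\, \tfrac{1}{n}\mathbf{1}\right)
= \sum_{i=1}^{n} p_i \log\left(n\, p_i\right)
= \sum_{i=1}^{n} p_i \log p_i + \log n
= \log n - H(p),
\]
which rearranges to $H(p) = \log n - D_{\mathrm{KL}}(p \,\|\, \frac{1}{n}\mathbf{1})$.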
Intuitively stated, properties (1) to (3) state that the entropy is minimized for a distribution with a single peak and maximized for the uniform distribution. Property (4) states that the entropy can be interpreted as the similarity to the uniform distribution in terms of the Kullback-Leibler divergence. Our empirical results support the usefulness of this intuition. A further advantage over other applicable dimension reduction techniques is that we can describe each node by a single scalar value, which can be visualized directly on a color map (as was done in Figure 11.1) and has a simple and intuitive meaning. Note that the entropy function is symmetric, i.e., $H(f(p)) = H(p)$ for all $p \in \Delta^n$. As a result, the APPR vectors need not be sorted, and the entropy of a single node vi can be computed in linear time with respect to the number of non-zero entries of pi.
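A corresponding Python sketch, operating directly on the sparse dict returned by the appr() helper above (the names are our own):

import math

def entropy(p):
    """Shannon entropy (base 2) of a sparse APPR vector {node: probability}.

    No sorting is required since H is permutation-invariant; the runtime
    is linear in the number of non-zero entries.
    """
    return -sum(px * math.log2(px) for px in p.values() if px > 0.0)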
Recalling that the teleportation parameter α in APPR controls the effective neighborhood size, we detect roles on multiple scales by computing APPR for multiple parameter values α ∈ {α1, . . . , αl} and concatenating for each node its corresponding l entropy values. The final descriptor of node vi is then given as
\[
f_i = \left[\, H\!\left(p_i^{(\alpha_1)}\right), \dots, H\!\left(p_i^{(\alpha_l)}\right) \right]^{T} \in \mathbb{R}^{l}. \tag{11.4}
\]
The entire procedure for calculating the role descriptors is sketched in Figure 11.2 and can be summarized as outlined in Algorithm 9. The input for the method is the entire graph dataset G, a list of node labels L, a list of teleportation parameters αs, and the approximation threshold ε. For each node v in G, the algorithm stacks the entropy-based representations of the corresponding APPR vectors, denoted as pαv, to generate the role descriptor of v, i.e., embv. The algorithm finally fits a classification model on the collection of role descriptors and returns the resulting model for node classification.
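Putting the pieces together, the following compact Python sketch mirrors Algorithm 9 for a single graph, reusing the appr() and entropy() helpers from above; the node ordering and the alignment of labels are our own illustrative assumptions:

from sklearn.linear_model import LogisticRegression

def appr_roles(graph, labels, alphas, eps):
    """Fit a role classifier on multi-scale entropy descriptors (Eq. (11.4)).

    graph: {node: [neighbors]}; labels: node labels aligned with sorted(graph);
    alphas: teleportation probabilities, one entropy feature per scale.
    """
    nodes = sorted(graph)
    # one l-dimensional descriptor per node: its APPR entropy at every scale
    descriptors = [[entropy(appr(graph, v, a, eps)) for a in alphas]
                   for v in nodes]
    return LogisticRegression().fit(descriptors, labels)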