Measures of Nodes - Identification of nodes and Networks

is the true influencer whom one can rely on to spread their information, e.g., advertising?

Further, attacking which group of proteins would effectively kill unwanted bacteria? We are not able to answer these questions in this thesis, and they are unlikely to be answered in the near future. Still, studying these problems on networks might light the way to the real answers. In what follows, a few measures are introduced to convey the basic concepts. Further methods can be found in Chapter 3.

2.2.1 Degree centrality

The degree centrality is perhaps the simplest and most straightforward way to measure a node: it ranks nodes directly by their degree. In practice, one is easily convinced that a paper is reliable if it is cited by many other papers. In a social network, celebrities, who usually have many connections, cannot post or comment as freely as users with only a few connections, because they have more influence and their opinions might lead to disastrous consequences. From the perspective of network science, a node $i$ is said to be more important than another node $j$ under the degree centrality if and only if $k_i > k_j$.
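As a minimal sketch of this ranking rule, the following Python snippet computes the degree of every node and sorts the nodes accordingly. The adjacency-list representation `adj`, the function name `degree_centrality`, and the toy graph are assumptions made for illustration only.

```python
def degree_centrality(adj):
    """Return the degree k_i of every node of an undirected graph.

    `adj` maps each node to the set of its neighbours (a hypothetical
    representation chosen for this sketch).
    """
    return {node: len(neighbours) for node, neighbours in adj.items()}


# Toy graph: node 0 is linked to 1, 2 and 3; node 3 is also linked to 4.
toy_graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}

degrees = degree_centrality(toy_graph)
# Nodes ordered from most to least important under the degree centrality.
ranking = sorted(degrees, key=degrees.get, reverse=True)
```

Here node 0 ranks first because it has the largest degree, followed by node 3.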

2.2.2 Eigenvector centrality

The main idea of the eigenvector centrality [43] is that a node connected to important nodes might itself be important, even if it has only a few connections.

Therefore, unlike the degree centrality, which counts every node equally, the eigenvector centrality measures the influence $H_i$ of a node $i$ by summing up the centralities of its neighbors,

$$H_i = \frac{1}{\lambda} \sum_{j \in \Gamma(i)} H_j, \tag{2.13}$$

where $\lambda$ is a constant [43]. The solution of Eq. (2.13) can be well approximated by the power method (see Appendix A.1.2). Hence, under the eigenvector centrality, a node $i$ is more influential than another node $j$ if and only if $H_i > H_j$.
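The power method mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions, not necessarily the procedure of Appendix A.1.2: the graph is assumed to be undirected, connected, and non-bipartite (otherwise the plain iteration may oscillate), and the names `eigenvector_centrality`, `adj`, and `toy_graph` are hypothetical.

```python
def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Approximate the solution of Eq. (2.13) by power iteration.

    `adj` maps every node to the set of its neighbours (undirected graph).
    Returns the centralities, scaled so that the largest entry is 1,
    together with an estimate of the constant lambda.
    """
    h = {v: 1.0 for v in adj}
    lam = 0.0
    for _ in range(iterations):
        # One application of the adjacency matrix: sum the neighbours' scores.
        summed = {v: sum(h[u] for u in adj[v]) for v in adj}
        # Since max(h) == 1, the largest summed score estimates lambda.
        lam = max(summed.values())
        new_h = {v: s / lam for v, s in summed.items()}
        converged = max(abs(new_h[v] - h[v]) for v in adj) < tol
        h = new_h
        if converged:
            break
    return h, lam


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
scores, lam = eigenvector_centrality(toy_graph)
```

In this toy graph, node 2 obtains the largest score, and the pendant node 3 scores lowest despite being attached to the most central node, because it has only that one connection.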

2.2.3 Katz centrality

From the way the eigenvector centrality obtains $H$, we know that a node $i$ gets its score $H_i$ by iteratively aggregating information from its nearest neighbors. In this manner, $H_i$ can contain information from the whole network. This is a good strategy, but it has one problem: under the eigenvector centrality, node $i$ weights the information from all other nodes equally, regardless of whether they are its nearest neighbors or far away from it. The Katz centrality [44] addresses this problem.


The Katz centrality uses a parameter $\alpha$ to control the magnitude of the information that it aggregates from different nodes,

$$H_i = \sum_{j \in \Gamma(i)} Z_{ji}, \qquad Z = \sum_{t=1}^{\infty} \alpha^t A^t. \tag{2.14}$$

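Assuming that $\alpha < 1/\lambda_1$, with $\lambda_1$ the largest eigenvalue of $A$, the Katz centrality can be obtained through

$$H = \alpha A^T H + \mathbf{1}, \tag{2.15}$$

where $A^T$ is the transpose of $A$ and $\mathbf{1}$ represents the vector $(1, 1, 1, \ldots)$ (for details, see Appendix A.1.3).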
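One simple way to solve Eq. (2.15) numerically is a fixed-point iteration, sketched below. The choice of fixed-point iteration is an assumption of this sketch and need not match Appendix A.1.3; it converges only if $\alpha < 1/\lambda_1$. The names `katz_centrality` and `in_neighbours` are hypothetical, with `in_neighbours[i]` holding the nodes $j$ that have an edge to $i$ (for an undirected graph this is simply the neighbor set).

```python
def katz_centrality(in_neighbours, alpha, iterations=1000, tol=1e-12):
    """Iterate H <- alpha * A^T H + 1 until the update is below `tol`.

    Assumes alpha < 1 / lambda_1, otherwise the iteration diverges.
    """
    h = {v: 1.0 for v in in_neighbours}
    for _ in range(iterations):
        # (A^T H)_i is the sum of the scores of all nodes pointing to i.
        new_h = {
            v: alpha * sum(h[u] for u in in_neighbours[v]) + 1.0
            for v in in_neighbours
        }
        converged = max(abs(new_h[v] - h[v]) for v in h) < tol
        h = new_h
        if converged:
            break
    return h


# Toy undirected graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
# Its largest eigenvalue is about 2.17, so alpha = 0.2 satisfies the condition.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
katz = katz_centrality(toy_graph, alpha=0.2)
```

For larger networks, solving the linear system $(I - \alpha A^T) H = \mathbf{1}$ directly with a sparse solver would be a common alternative to iterating.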

2.2.4 PageRank

Indeed, the Katz centrality can balance the information from nodes at different distances. It still suffers from another problem, however: a node copies its centrality to all of its nearest neighbors. For example, a very important node $i$ might be connected to many nodes in a network and thereby make all of them influential. Because they receive $i$'s centrality directly, some of them would obtain larger centralities than important nodes in the second or further layers, even though they are actually unimportant. One could perhaps mitigate this problem by adjusting $\alpha$ in Eq. (2.15), but this is usually not a preferable approach because one cannot know which $\alpha$ is best.

To overcome this, PageRank [45] was proposed. It was originally developed for ranking web pages and therefore mainly considers directed networks,

$$H_i = \frac{1-\alpha}{n} + \alpha \sum_{j \in \Gamma(i)} \frac{H_j}{k_j^{\mathrm{out}}}, \tag{2.16}$$

where $\alpha$ is a constant parameter called the residual probability, which is usually set to 0.85 (here, $1-\alpha$ can be understood as the probability that a visit starts from an arbitrary page), and $k_j^{\mathrm{out}}$ is the out-degree of node $j$. In this manner, the centrality $H_i$ of a node $i$ is divided equally among the nodes it points to. Note that the nearest-neighbor set $\Gamma(i)$ defined for undirected networks corresponds to the in-neighbors here. Eq. (2.16) has the problem that a node $i$ cannot pass on its score $H_i$ if $k_i^{\mathrm{out}} = 0$, which means that $i$ would 'absorb' centrality from other nodes and make $\sum_j H_j$ smaller and smaller as the iteration proceeds. One way to tackle this problem is to let the nodes with $k_i^{\mathrm{out}} = 0$ connect to all other nodes in the network. This yields a modified adjacency matrix, say $A'$, and the corresponding out-degree sequence $k'^{\mathrm{out}}$. Then, in matrix notation, we have

$$H = \alpha A'^T D^{-1} H + \frac{1-\alpha}{n} \mathbf{1}, \tag{2.17}$$

in which $D$ is the diagonal matrix of $k'^{\mathrm{out}}$. Rearranging it, one obtains the centrality exactly through

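$$H = \frac{1-\alpha}{n} \left(I - \alpha A'^T D^{-1}\right)^{-1} \mathbf{1}. \tag{2.18}$$

If $H$ is normalized such that $\sum_i |H_i| = 1$, then Eq. (2.17) can be rewritten as

$$H = \left(\alpha A'^T D^{-1} + \frac{1-\alpha}{n} Z\right) H, \tag{2.19}$$

where $Z$ is an $n \times n$ matrix with all entries equal to 1, i.e., $ZH = \mathbf{1}$. Therefore, we can still employ the power method to obtain the PageRank centrality.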
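The power method for PageRank can be sketched in a few lines of Python. The sketch applies the update of Eq. (2.16) repeatedly and handles nodes without out-links as described above, by letting them point to all other nodes. The names `pagerank`, `out_neighbours`, and the toy graph are hypothetical, and the network is assumed to contain at least two nodes.

```python
def pagerank(out_neighbours, alpha=0.85, iterations=1000, tol=1e-12):
    """Power iteration for PageRank on a directed graph.

    `out_neighbours[j]` is the set of nodes that j points to.
    Assumes the graph has at least two nodes.
    """
    nodes = list(out_neighbours)
    n = len(nodes)
    # Modified graph A': a node without out-links points to all other nodes.
    out = {
        v: (set(targets) if targets else set(nodes) - {v})
        for v, targets in out_neighbours.items()
    }
    h = {v: 1.0 / n for v in nodes}  # start from a vector that sums to 1
    for _ in range(iterations):
        new_h = {v: (1.0 - alpha) / n for v in nodes}
        for j in nodes:
            # Node j splits its score equally among its out-neighbours.
            share = alpha * h[j] / len(out[j])
            for i in out[j]:
                new_h[i] += share
        converged = max(abs(new_h[v] - h[v]) for v in nodes) < tol
        h = new_h
        if converged:
            break
    return h


# Toy directed graph; node 4 has neither in-links nor out-links.
toy_graph = {0: {1, 2}, 1: {2}, 2: {0}, 3: {2}, 4: set()}
ranks = pagerank(toy_graph)
```

Because every node gives away its whole score in each step, the sum of the scores stays equal to 1 throughout the iteration, which is the normalization assumed in Eq. (2.19).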

2.2.5 Closeness centrality

The basic idea of the closeness centrality is that an important node should be close to as many other nodes as possible. Thus, it calculates the centrality [46] through

$$H_i = \frac{n}{\sum_{j \in N} d_{ij}}, \tag{2.20}$$

which indicates that a node is more important if it has a smaller average shortest-path length to the other nodes. Note that, compared to the original definition in Ref. [46], Eq. (2.20) includes a normalization term, which makes it possible to compare two nodes from different networks.
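For an unweighted network, the distances $d_{ij}$ in Eq. (2.20) can be obtained by breadth-first search. The following sketch assumes an undirected, connected graph with at least two nodes; the function names and the adjacency-list format are illustrative choices, not taken from the thesis.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Evaluate H_i = n / sum_j d_ij for every node of a connected graph."""
    n = len(adj)
    result = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) < n:
            # Eq. (2.20) is not defined when some node is unreachable.
            raise ValueError("graph must be connected")
        result[i] = n / sum(dist.values())
    return result


# Toy graph: a path 0 - 1 - 2 - 3 - 4.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
closeness = closeness_centrality(path_graph)
```

On this path, the middle node 2 has distances 2, 1, 0, 1, 2 to the five nodes, giving $H_2 = 5/6$, whereas the end node 0 obtains $H_0 = 5/10$.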

2.2.6 Betweenness centrality

The betweenness centrality [47] also relies on shortest paths in a network, but it counts the number of shortest paths that pass through a node instead of averaging shortest-path lengths. Compared to the closeness centrality, the betweenness centrality is therefore usually better at identifying the importance of a node; for example, a node with large betweenness centrality might be associated with the 'bottleneck' of a communication system. Specifically, the betweenness centrality obtains the centrality $H_i$ of a node $i$ by

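$$H_i = \sum_{u, v \in N,\; u \neq i \neq v} \frac{\#\,\text{shortest paths from } u \text{ to } v \text{ containing } i}{\#\,\text{shortest paths from } u \text{ to } v}, \tag{2.21}$$

which can be further normalized through $H = \dfrac{H - \min H}{\max H - \min H}$.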
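Eq. (2.21) can be evaluated directly by counting shortest paths with breadth-first search, as in the brute-force sketch below. It assumes an unweighted graph and sums over ordered pairs $(u, v)$ with $u \neq v$, so each undirected pair is counted twice; this factor cancels after the min-max normalization. All function names are hypothetical, and for large networks a faster scheme such as Brandes' algorithm would normally be preferred.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """Return (dist, sigma): distances and numbers of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to w.
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Brute-force evaluation of Eq. (2.21) over ordered pairs (u, v)."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    h = {i: 0.0 for i in adj}
    for u in adj:
        dist_u, sigma_u = info[u]
        for v in adj:
            if v == u or v not in dist_u:
                continue
            for i in adj:
                if i == u or i == v or i not in dist_u:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest u-v path iff the distances add up.
                if v in dist_i and dist_u[i] + dist_i[v] == dist_u[v]:
                    h[i] += sigma_u[i] * sigma_i[v] / sigma_u[v]
    return h


def min_max_normalise(h):
    """Rescale the scores to the interval [0, 1]."""
    lo, hi = min(h.values()), max(h.values())
    if hi == lo:
        return {v: 0.0 for v in h}
    return {v: (x - lo) / (hi - lo) for v, x in h.items()}


# Toy graph: a path 0 - 1 - 2 - 3 - 4.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
raw = betweenness_centrality(path_graph)
normalised = min_max_normalise(raw)
```

On the path graph, the middle node lies on the shortest paths of the most pairs and therefore receives the normalized score 1, while the two end nodes receive 0.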
