
2.2 Measures of Nodes

In the document Identification of nodes and Networks (pages 37-40)

is the true influencer whom one can rely on to spread their information, e.g., advertising?

Further, attacks on which group of proteins could effectively kill unwanted bacteria? We are not able to answer these questions in this thesis, and perhaps not even in the near future, but studying these problems on networks may light the way to the real answers. In what follows, a few measures are introduced to convey the basic concepts; more methods are presented later in Chapter 3.

2.2.1 Degree centrality

The degree centrality is perhaps the simplest and most straightforward way to measure a node: it ranks nodes directly by their degree. In practice, one might easily be convinced that a paper is reliable if it is cited by many other papers. In a social network, for example, celebrities (who usually have many connections) cannot post or comment as freely as users with only a few connections, because their opinions carry more influence and could even lead to disasters. From the perspective of network science, under the degree centrality a node $i$ is said to be more important than another node $j$ if and only if $k_i > k_j$.
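As a minimal sketch, assuming the network is stored as a Python dictionary that maps each node to a list of its neighbors (an illustrative input format, not one prescribed by the text), the degree centrality and the comparison $k_i > k_j$ can be written as follows. The helper names `degree_centrality` and `more_important` are hypothetical.

```python
def degree_centrality(adj):
    """Degree centrality: H_i = k_i, the number of neighbors of node i.

    adj: hypothetical input format, a dict mapping each node to a list of its
    neighbors in an undirected network.
    """
    return {i: len(neighbors) for i, neighbors in adj.items()}


def more_important(adj, i, j):
    """True iff node i outranks node j under the degree centrality (k_i > k_j)."""
    k = degree_centrality(adj)
    return k[i] > k[j]
```

For the network `{0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}`, node 0 has degree 3 and outranks every other node, while nodes 2 and 3 tie with degree 2, so neither is more important than the other.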

2.2.2 Eigenvector centrality

The main idea of the eigenvector centrality [43] is that a node connected to important nodes may itself be important, even if it has only a few connections.

Therefore, unlike the degree centrality, which treats every neighbor equally, the eigenvector centrality measures the influence $H_i$ of a node $i$ by summing up the centralities of its neighbors,

$$H_i = \frac{1}{\lambda} \sum_{j \in \Gamma(i)} H_j, \tag{2.13}$$

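where $\lambda$ is a constant [43]. In matrix form, Eq. (2.13) reads $AH = \lambda H$, so $H$ is an eigenvector of the adjacency matrix $A$ (hence the name), and $\lambda$ is usually taken to be the largest eigenvalue $\lambda_1$ of $A$. The solution of Eq. (2.13) can be well approximated by the power method (see Appendix A.1.2). Hence, under the eigenvector centrality, a node $i$ is more influential than another node $j$ if and only if $H_i > H_j$.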
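The power method mentioned above can be sketched as follows; this is an illustration, not necessarily the procedure of Appendix A.1.2. It assumes the same hypothetical adjacency-list dictionary and a connected, non-bipartite undirected network (on bipartite networks the plain iteration may fail to converge). Each step multiplies the current scores by $A$ and rescales them so that the largest score is 1; the rescaling factor approaches $\lambda_1$.

```python
def eigenvector_centrality(adj, iters=1000, tol=1e-12):
    """Power-method sketch for Eq. (2.13): H_i = (1 / lambda) * sum_{j in Gamma(i)} H_j.

    adj: hypothetical dict mapping each node to a list of its neighbors (undirected).
    Assumes a connected, non-bipartite network so that the plain iteration converges.
    Returns (scores scaled so the largest is 1, estimate of lambda_1).
    """
    h = {v: 1.0 for v in adj}
    lam = 0.0
    for _ in range(iters):
        # One multiplication by A: every node sums the current scores of its neighbors.
        new = {v: sum(h[u] for u in adj[v]) for v in adj}
        # Rescale so the largest score is 1; at convergence this factor is lambda_1.
        lam = max(new.values())
        new = {v: x / lam for v, x in new.items()}
        if max(abs(new[v] - h[v]) for v in adj) < tol:
            return new, lam
        h = new
    return h, lam
```

On a triangle 0-1-2 with an extra node 3 attached to node 0, i.e. `{0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}`, the estimate of $\lambda_1$ is about 2.17, node 0 receives the top score, and the pendant node 3 receives a lower score than the triangle nodes 1 and 2.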

2.2.3 Katz centrality

From the way the eigenvector centrality obtains $H$, we see that a node $i$ gets its score $H_i$ by iteratively aggregating information from its nearest neighbors. In this manner, $H_i$ can eventually contain information from the whole network. This is a good strategy, but it has a drawback: under the eigenvector centrality, node $i$ treats the information from other nodes equally, regardless of whether those nodes are its nearest neighbors or lie far away from it. The Katz centrality [44] addresses this problem.


The Katz centrality uses a parameter $\alpha$ to control the weight of the information it aggregates from nodes at different distances,

$$H_i = \sum_{j \in N} Z_{ji}, \qquad Z = \sum_{t=1}^{\infty} \alpha^t A^t. \tag{2.14}$$

Assuming that $\alpha < 1/\lambda_1$, where $\lambda_1$ is the largest eigenvalue of $A$, the series in Eq. (2.14) converges and the Katz centrality can be obtained from

$$H = \alpha A^T H + \mathbf{1}, \tag{2.15}$$

where $A^T$ is the transpose of $A$ and $\mathbf{1}$ denotes the all-ones vector $(1, 1, \dots, 1)^T$ (see Appendix A.1.3 for details).
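The sketch below computes the Katz centrality in two ways, assuming an adjacency matrix given as a list of lists with `A[i][j] = 1` for an edge from $i$ to $j$: by the fixed-point iteration of Eq. (2.15), which converges when $\alpha < 1/\lambda_1$, and by truncating the series of Eq. (2.14) with the sum taken over all nodes $j$. The function names and the truncation length are illustrative choices. Because the solution of Eq. (2.15), $(I - \alpha A^T)^{-1}\mathbf{1}$, also contains the $t = 0$ term $\alpha^0 A^0 = I$, its scores exceed the series of Eq. (2.14) by exactly 1 for every node, which leaves the ranking unchanged.

```python
def katz_fixed_point(A, alpha, iters=10_000, tol=1e-12):
    """Iterate Eq. (2.15), H <- alpha * A^T H + 1, until the update is below tol.

    A: adjacency matrix as a list of lists, A[i][j] = 1 if there is an edge i -> j.
    Converges only if alpha < 1 / lambda_1 (lambda_1: largest eigenvalue of A).
    """
    n = len(A)
    h = [1.0] * n
    for _ in range(iters):
        # (A^T H)_i = sum_j A[j][i] * H_j: node i collects from nodes pointing to it.
        new = [alpha * sum(A[j][i] * h[j] for j in range(n)) + 1.0 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, h)) < tol:
            return new
        h = new
    return h


def katz_series(A, alpha, max_len=200):
    """Truncated Eq. (2.14): H_i = sum_j Z_ji with Z = sum_{t=1}^{max_len} alpha^t A^t."""
    n = len(A)
    walk = [1.0] * n  # holds (alpha A^T)^t 1, starting from t = 0
    total = [0.0] * n
    for _ in range(max_len):
        # Advance one step: weighted walks of length t ending at each node i.
        walk = [alpha * sum(A[j][i] * walk[j] for j in range(n)) for i in range(n)]
        total = [s + w for s, w in zip(total, walk)]
    return total
```

With `alpha = 0.3` on the triangle-plus-pendant graph `A = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]` (whose $\lambda_1 \approx 2.17$, so $1/\lambda_1 \approx 0.46$), both routes rank node 0 first and differ by 1 per node.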

2.2.4 PageRank

The Katz centrality thus balances the information coming from nodes at different distances. However, it still suffers from another problem: a node passes its full centrality on to every one of its nearest neighbors. For example, a very important node $i$ may connect to many nodes in a network and thereby make all of them appear influential. Because these neighbors receive $i$'s centrality directly, some of them may end up with larger centralities than genuinely important nodes two or more steps away, even though they are actually unimportant. Adjusting $\alpha$ in Eq. (2.15) might mitigate this problem, but this is usually not a preferable remedy because one cannot know which value of $\alpha$ is best.

To overcome this, PageRank [45] was proposed. Since it was originally developed for ranking web pages, it mainly addresses directed networks,

$$H_i = \frac{1-\alpha}{n} + \alpha \sum_{j \in \Gamma(i)} \frac{H_j}{k^{\mathrm{out}}_j}, \tag{2.16}$$

where $\alpha$ is a constant parameter called the residual probability⁸, usually set to 0.85, and $k^{\mathrm{out}}_j$ is the out-degree of node $j$. In this manner, the centrality $H_i$ of a node $i$ is divided equally among the nodes it points to. Note that the nearest-neighbor set $\Gamma(i)$, defined for undirected networks, corresponds here to the set of in-neighbors of $i$. Eq. (2.16) has a problem: a node $i$ with $k^{\mathrm{out}}_i = 0$ cannot pass its score $H_i$ on, so $i$ would ‘absorb’ centrality from other nodes and $\sum_j H_j$ would become smaller and smaller as the iteration proceeds. One way to tackle this problem is to connect every node with $k^{\mathrm{out}}_i = 0$ to all other nodes in the network. This yields a modified adjacency matrix, say $A$, and the corresponding out-degree sequence $k^{\mathrm{out}}$. Then, in matrix notation, we have

$$H = \alpha A^T D^{-1} H + \frac{1-\alpha}{n}\mathbf{1}, \tag{2.17}$$

⁸ Here, $1-\alpha$ can be understood as the probability that a visit starts again from a page chosen at random.

in which $D$ is the diagonal matrix of $k^{\mathrm{out}}$. Rearranging it, one obtains the centrality exactly as

$$H = \frac{1-\alpha}{n}\left(I - \alpha A^T D^{-1}\right)^{-1}\mathbf{1}. \tag{2.18}$$

If $H$ is normalized so that $\sum_i |H_i| = 1$, then Eq. (2.17) can be rewritten as

$$H = \left(\alpha A^T D^{-1} + \frac{1-\alpha}{n} Z\right) H, \tag{2.19}$$

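where $Z$ is the $n \times n$ matrix with all entries equal to 1, so that $ZH = \mathbf{1}$ (the entries of $H$ are non-negative and sum to 1). Eq. (2.19) is thus an eigenvector equation, and the PageRank centrality can again be obtained with the power method.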
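A minimal power-method sketch of Eq. (2.19), assuming out-links are given as a dictionary from each node to the list of nodes it points to (a hypothetical input format in which every node, including dangling ones, appears as a key). Dangling nodes are connected to all other nodes as described above, and the iteration starts from a vector whose entries sum to 1, so the term $\frac{1-\alpha}{n} Z H$ reduces to $\frac{1-\alpha}{n}$ for every node.

```python
def pagerank(out_links, alpha=0.85, iters=1000, tol=1e-12):
    """Power-method sketch of Eq. (2.19).

    out_links: hypothetical dict mapping every node to the list of nodes it points to.
    Dangling nodes (out-degree 0) are linked to all other nodes, as in the text.
    """
    nodes = list(out_links)
    n = len(nodes)
    # Modified adjacency: a dangling node v now points to every node except itself.
    links = {v: list(out_links[v]) or [u for u in nodes if u != v] for v in nodes}
    h = {v: 1.0 / n for v in nodes}  # start with sum_i H_i = 1
    for _ in range(iters):
        # ((1 - alpha) / n) * (Z H)_i, where (Z H)_i = 1 because sum(H) = 1.
        new = {v: (1 - alpha) / n for v in nodes}
        for j in nodes:
            # Node j splits alpha * H_j equally among its k_j^out targets.
            share = alpha * h[j] / len(links[j])
            for i in links[j]:
                new[i] += share
        if max(abs(new[v] - h[v]) for v in nodes) < tol:
            return new
        h = new
    return h
```

For the toy graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}`, page `c`, which collects links from `a`, `b`, and the dangling page `d`, ends up with the highest score, page `d` keeps only the residual share $(1-\alpha)/n$, and the scores continue to sum to 1.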

2.2.5 Closeness centrality

The basic idea of the closeness centrality is that an important node should be close to as many other nodes as possible. Accordingly, it computes the centrality [46] as

$$H_i = \frac{n}{\sum_{j \in N} d_{ij}}, \tag{2.20}$$

where $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$. Eq. (2.20) indicates that a node is more important if its average shortest-path length to the other nodes is smaller. Note that, compared with the original definition in ref. [46], Eq. (2.20) includes the normalization factor $n$, which makes it possible to compare nodes from different networks.
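For an unweighted network, the distances $d_{ij}$ in Eq. (2.20) can be obtained by breadth-first search from every node. The sketch below assumes a connected, undirected network stored as a hypothetical adjacency-list dictionary; on a disconnected network some $d_{ij}$ would be infinite and Eq. (2.20) would need adjusting. The function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths d_{source, j} via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Eq. (2.20): H_i = n / sum_j d_ij (d_ii = 0). Assumes a connected network."""
    n = len(adj)
    return {i: n / sum(bfs_distances(adj, i).values()) for i in adj}
```

On the path 0-1-2, the middle node gets $3/2 = 1.5$ and the two end nodes get $3/3 = 1.0$; on a star, the hub scores highest.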

2.2.6 Betweenness centrality

The betweenness centrality [47] also relies on shortest paths in a network, but instead of their average length it counts the number of shortest paths on which a node lies. Compared with the closeness centrality, the betweenness centrality is therefore often more capable of identifying the importance of a node; for example, a node with large betweenness centrality may correspond to a ‘bottleneck’ of a communication system. Specifically, the betweenness centrality $H_i$ of a node $i$ is obtained by

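$$H_i = \sum_{u, v \in N,\; u \neq i \neq v} \frac{\#\,\text{shortest paths from } u \text{ to } v \text{ containing } i}{\#\,\text{shortest paths from } u \text{ to } v}, \tag{2.21}$$

which can be further normalized through $H' = \frac{H - \min H}{\max H - \min H}$.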
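A brute-force sketch of Eq. (2.21) and the min-max normalization, again assuming a hypothetical adjacency-list dictionary for an unweighted network. It relies on the fact that node $i$ lies on a shortest path from $u$ to $v$ exactly when $d_{ui} + d_{iv} = d_{uv}$, in which case $\sigma_{ui}\sigma_{iv}$ of the $\sigma_{uv}$ shortest paths pass through $i$, where $\sigma_{xy}$ (notation introduced here) counts the shortest paths from $x$ to $y$. It sums over ordered pairs as written in Eq. (2.21), so on undirected networks each pair is counted twice; for large networks, Brandes' algorithm would be the faster choice.

```python
from collections import deque


def bfs_counts(adj, s):
    """Shortest-path lengths from s and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # w reached for the first time: one level further out
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on some shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Brute-force Eq. (2.21), summing over ordered pairs (u, v) with u != i != v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    h = {i: 0.0 for i in adj}
    for i in adj:
        dist_i, sigma_i = info[i]
        for u in adj:
            dist_u, sigma_u = info[u]
            for v in adj:
                if u == i or v == i or u == v or v not in dist_u:
                    continue
                # i lies on a shortest u-v path iff d(u, i) + d(i, v) == d(u, v);
                # then sigma(u, i) * sigma(i, v) of the sigma(u, v) paths pass through i.
                if i in dist_u and v in dist_i and dist_u[i] + dist_i[v] == dist_u[v]:
                    h[i] += sigma_u[i] * sigma_i[v] / sigma_u[v]
    return h


def min_max_normalize(h):
    """Rescale scores to [0, 1]; assumes max(h) > min(h)."""
    lo, hi = min(h.values()), max(h.values())
    return {k: (x - lo) / (hi - lo) for k, x in h.items()}
```

On the path 0-1-2, the middle node gets 2 (from the ordered pairs (0, 2) and (2, 0)) and normalizes to 1, while both end nodes get 0. On a 4-cycle, every node lies on one of the two shortest paths between its two non-adjacent neighbors, giving each node $1/2 + 1/2 = 1$.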
