
Figure 2.9: Average number of shortest paths passing through a motif edge for the 1990 snapshot of the DBLP (all publications dating before or from 1990). [Plot axis: edge betweenness.]

It is clear that the box motif has a significantly stronger tendency than all other motifs for its heavy-weight occurrences to have long construction times. Thus, the heavy-weight occurrences of the box motif seem to span a bridge over time.

2.6.3 Separation in Scientific Area: Interdisciplinary Collaborations

Finally, we look at how the motifs, and in particular the box motif, are distributed across the co-authorship network. The aim is to investigate whether the box motifs lie predominantly within clusters of connected nodes or rather between such clusters, which would indicate a certain degree of interdisciplinary collaboration.

Edge betweenness is a centrality measure that estimates whether an edge lies within a cluster of nodes or connects two such clusters. The betweenness of an edge is the number of shortest paths between node pairs that go through that edge. Edges between clusters have very high betweenness, as all the shortest paths between nodes of the two clusters go through those edges.

Edge betweenness is therefore well suited to our analysis. We compute the edge betweenness of all edges in the co-authorship network and use the resulting values as edge weights.
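The definition above can be made concrete with a short sketch. The following Python function is an illustration only and not the implementation used for the experiments: it assumes an unweighted, undirected graph given as an adjacency dictionary, follows Brandes' accumulation scheme, and splits the contribution of a node pair evenly when several shortest paths tie. The names `edge_betweenness`, `adjacency` and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adjacency):
    """Return {frozenset({u, v}): betweenness} for an undirected, unweighted graph.

    adjacency maps every node to an iterable of its neighbours (symmetric).
    """
    betweenness = defaultdict(float)
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths (sigma).
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest nodes back to the source.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                betweenness[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered node pair was visited from both of its endpoints.
    return {edge: value / 2.0 for edge, value in betweenness.items()}


# Toy graph (assumption): two triangles joined by the single edge c-d.
toy_graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = edge_betweenness(toy_graph)
```

In the toy graph, the edge between the two triangles carries the shortest paths of all nine cross-cluster node pairs, whereas an edge such as a-b inside a triangle carries only the path between its own endpoints.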

Figure 2.9 shows the average number of shortest paths that use edges of occurrences of a particular motif (normalized by the number of edges in this motif). The box motif edges, together with those of motif 3, exhibit high betweenness values and hence often lie on paths between larger communities within the network.

Our results are a strong indicator that the box motif is to a certain extent related to interdisciplinary collaborations.

exclude trivial effects, such as the number of authors per publication or publications per edge, as responsible for the findings presented in Section 2.5.

2.7.1 Network Properties

To ensure that our two databases are consistent with previously investigated co-authorship networks, we compute a set of network properties usually discussed in related work. These include the degree distribution, the citation distribution and the average clustering coefficient.

None of the computed network measures shows a significant deviation from already published results on collaboration databases. The results are displayed in Figures 2.10, 2.11 and Table 2.1.

Figure 2.10: Degree distributions of DBLP and CiteSeerX. [Plot: number of nodes against node degree, one curve per database.]

Figure 2.11: Citation distributions of DBLP and CiteSeerX. [Plot: number of publications against citation index, one curve per database.]

2.7 Supporting Experiments 35

Network      Authors per Paper   Papers per Author   Clustering Coefficient
DBLP         2.74                4.04                0.658
CiteSeerX    2.69                3.26                0.667

Table 2.1: Average number of authors per paper, papers per author and clustering coefficients for the DBLP and CiteSeerX databases. All values comply with results on co-authorship networks from related work.

The average number of authors per paper and papers per author are very similar to the same quantities computed on many other co-authorship networks investigated in related work [11, 58, 65, 66]. The same holds for the high average clustering coefficient, typical for social and co-authorship networks.

All four degree and citation distributions follow fat-tailed power laws, which are characteristic of all co-authorship networks investigated so far.
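The three quantities of Table 2.1 can be derived directly from a list of publications. The sketch below is a hedged illustration rather than the code used for the thesis: it assumes each publication is given as a list of author names, builds the co-authorship graph by linking every pair of co-authors, and averages the local clustering coefficient over all nodes, counting nodes with fewer than two neighbours as zero (a convention chosen here, not stated in the text). The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_stats(papers):
    """Return (authors per paper, papers per author, average clustering)."""
    neighbours = defaultdict(set)
    paper_count = defaultdict(int)
    for author_list in papers:
        unique = set(author_list)
        for author in unique:
            paper_count[author] += 1
            neighbours[author]  # make sure single authors appear as nodes
        for a, b in combinations(unique, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    graph = dict(neighbours)

    authors_per_paper = sum(len(set(p)) for p in papers) / len(papers)
    papers_per_author = sum(paper_count.values()) / len(paper_count)

    coefficients = []
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)  # assumption: low-degree nodes count as 0
            continue
        # Number of links that exist among the neighbours of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        coefficients.append(2.0 * links / (k * (k - 1)))
    clustering = sum(coefficients) / len(coefficients)
    return authors_per_paper, papers_per_author, clustering


# Hypothetical sample: three papers by four authors.
example_papers = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
apa, ppa, cc = coauthorship_stats(example_papers)
```

For the sample data, the seven authorships spread over three papers and four authors, and only author C has neighbours that are not all mutually connected.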

2.7.2 Weight Distributions and Average Values

In Section 2.5 we presented our main finding, namely the difference in average weight per edge across the various three- and four-node motifs. To ensure that the computed average values are well defined and legitimate, we investigate the whole motif weight distributions instead of just looking at their average values. The results are displayed in Figure 2.12.

Figure 2.12: The motif edge weight distributions for all eight motifs within the DBLP database. [Plot: P(x>w) against motif weight <w>, one curve per motif (motifs 1 to 8).]

All eight distributions are monotone and dominated by the box motif, as can be seen from Figure 2.12. Therefore, the computed average values are also well defined. Furthermore, the dominance of the box motif does not come from a few motif instances with extreme values, but is rather dictated by the whole weight distribution.

Figure 2.13: The effect on the average motif edge weight when one gradually removes the heaviest instances of that motif. [Plot: motif weight against the percentage of heaviest motif instances removed, for motif 6 and motif 4.]

The motif weights are computed over the whole database and with respect to edge weight definition 2.3 from Section 2.5.
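The curves P(x>w) discussed in this section are complementary cumulative distributions. As a minimal sketch, assuming the motif weights of one motif are available as a plain list of numbers, such a curve could be computed as follows; the function name `ccdf` and the sample weights are hypothetical.

```python
def ccdf(values):
    """Return sorted (w, P(x > w)) pairs, one per distinct value in values."""
    n = len(values)
    ordered = sorted(values)
    points = []
    for i, w in enumerate(ordered):
        # Emit a point only at the last occurrence of each distinct value,
        # so that exactly n - (i + 1) observations are strictly larger.
        if i + 1 == n or ordered[i + 1] != w:
            points.append((w, (n - (i + 1)) / n))
    return points


# Hypothetical motif weights.
curve = ccdf([1, 1, 2, 5])
```

Plotting one such curve per motif on a logarithmic weight axis gives a direct comparison of the whole distributions rather than of their means alone.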

2.7.3 Most Successful Motif Instances

Our next step is to investigate how the average motif edge weight changes when one consistently disregards the heaviest motif instances when computing the mean values. The aim of our analysis is to show that it is not the top motif instances that make the box motif so successful, but rather all of them taken together.

Again we investigate the whole DBLP database under edge weight definition 2.3 and take motif 4 as a reference. The results are shown in Figure 2.13.

One observes that the average motif edge weight decreases gradually for the box motif as well as for the reference motif 4. The high average value of the box motif is not a result of a few extremely heavy instances, but is rather driven by the high number of intermediately heavy instances.
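The removal experiment can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not the original analysis code: it takes the weights of all instances of one motif as a list, drops the given percentage of heaviest instances (rounded down to a whole number of instances), and averages the rest. The function name and the sample weights are hypothetical.

```python
def mean_without_heaviest(weights, percent_removed):
    """Average of weights after dropping the heaviest percent_removed percent."""
    ordered = sorted(weights, reverse=True)
    # Number of instances to drop, rounded down.
    drop = int(len(ordered) * percent_removed / 100)
    kept = ordered[drop:]
    if not kept:
        raise ValueError("no instances left after removal")
    return sum(kept) / len(kept)


# Hypothetical weights with a single extreme instance.
sample_weights = [100, 10, 8, 6, 4, 2, 2, 2, 1, 1]
trimmed = [mean_without_heaviest(sample_weights, p) for p in (0, 10, 50)]
```

In this artificial sample, a single extreme instance causes a sharp drop after the first removal step. A gradual decline, as reported for Figure 2.13, indicates instead that the mean is carried by many instances.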

2.7.4 Eliminating Trivial Effects

Up to now we have shown that our two databases comply with related work on co-authorship networks. Furthermore, the mean values presented in Section 2.5 are justified and are not influenced by a few extreme values.


Figure 2.14: The number of papers and co-authors, respectively, per motif edge for DBLP. [Two panels: P(x>p) against the number of papers on edge p, and P(x>a) against the number of authors on edge a; one curve per motif (motifs 1 to 8).]

Finally, we address one last aspect: we want to exclude trivial effects, such as the number of papers and the number of their authors, as possible causes for the success of the box motif.

Note that the four edge weight definitions introduced in Section 2.5 implicitly address this issue. They integrate the number of papers between a pair of authors, the number of co-authors on those publications, or both effects simultaneously. Otherwise, one could suspect that the high average value of the box motif comes from one of those two effects.

Recall from Section 2.5.2 that, independently of the edge weight definition, the box motif was still the most successful one. To exclude any doubt, we have calculated the average number of publications between a pair of authors in all motifs, as well as the number of co-authors on those publications. The results are displayed in Figure 2.14.

One clearly sees that the box motif neither profits from a high number of papers running through its edges, nor do those publications have significantly fewer authors. The box motif does not dominate either of the two distributions, and in both cases there is at least one other motif with comparable values. The prevailing weight of the box motif is therefore not a result of the trivial effects one may suspect.
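The two per-edge quantities examined here can be obtained with a single pass over the publication list. The following sketch is an assumption-laden illustration: it treats each publication as a list of author names and reports, for every pair of co-authors, the number of joint papers and the mean number of authors on those papers. Whether the original analysis used the mean or another summary of the author counts is not stated, so the mean is an assumption, as are the function name and the sample data.

```python
from collections import defaultdict
from itertools import combinations


def edge_paper_statistics(papers):
    """Map each co-author pair to (joint papers, mean authors per joint paper)."""
    paper_counts = defaultdict(int)
    author_totals = defaultdict(int)
    for author_list in papers:
        unique = sorted(set(author_list))  # sorted so each pair has one key
        for pair in combinations(unique, 2):
            paper_counts[pair] += 1
            author_totals[pair] += len(unique)
    return {
        pair: (count, author_totals[pair] / count)
        for pair, count in paper_counts.items()
    }


# Hypothetical sample: three papers by four authors.
edge_stats = edge_paper_statistics([["A", "B", "C"], ["A", "B"], ["C", "D"]])
```

Restricting the resulting dictionary to the edges of a given motif yields the data behind distributions such as those in Figure 2.14.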

To conclude, we carried out a set of supporting experiments on the analyzed data. We have observed that the properties of the investigated co-authorship networks comply with related work. Furthermore, we showed that the presented results are well defined and justified, and that they do not stem from the trivial effects considered above.

Consequently, the success of the box motif revealed in Section 2.5 is robust against the alternative explanations considered here.