

and the next term of the expansion is given by 12κ2κ3κ1+ 3κ2κ412κ22κ2112κ22κ223

24κ32m(1 +ε) = 512ε+ 4ε2+ 12ε3−ε4

48ε31)m . (4.19) Further, we calculate an asymptotic expansion of the factor preceding the integral by applying Stirling’s formula (3.23). Combining these results, we get the desired asymptotic expansion,

$\#\mathcal{G}_{2m,m(1-\varepsilon)} = (2m)^{2m(1-\varepsilon)}$. Thus, the probability that a randomly selected graph does not contain a complex component equals … Since $\varepsilon' = \varepsilon + O(1/m)$ holds, we can replace $\varepsilon'$ by $\varepsilon$ in (4.22). Consequently, the same expansion holds if we consider the series $(m, m(1+\varepsilon)), (m+1, (m+1)(1+\varepsilon)), (m+2, (m+2)(1+\varepsilon)), \dots$ and the corresponding hash tables, which completes the proof of the theorem.

4.3 Bipartite random graphs and standard cuckoo hashing

Our next goal is the adaptation of these ideas to bipartite cuckoo graphs, as defined in Chapter 1. Again, we are interested in the probability that such a graph contains no complex component. More precisely, we consider bipartite multigraphs consisting of $m$ labelled nodes of each type and $n$ labelled edges. Each of the labelled edges represents a key and connects two independently and uniformly selected nodes of different type. The analysis is once again based on generating functions. However, we have to replace the univariate functions by (double exponential) bivariate generating functions to model the different types of nodes, which makes the analysis more complicated. Nonetheless, we obtain the following result.

4 Sparse random graphs

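Theorem 4.2. Suppose that $\varepsilon \in (0,1)$ is fixed. Then the probability $p(n,m)$ that a cuckoo hash of $n = \lfloor (1-\varepsilon)m \rfloor$ data points into two tables of size $m$ succeeds (that is, the corresponding cuckoo graph contains no complex component) is equal to

$$p(n,m) = 1 - \frac{(2\varepsilon^2 - 5\varepsilon + 5)(1-\varepsilon)^3}{12(2-\varepsilon)^2\varepsilon^3}\,\frac{1}{m} + O\!\left(\frac{1}{m^2}\right).$$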

A sketch of the proof can be found in Kutzelnigg [2006]. A detailed version of the proof is given in Drmota and Kutzelnigg [2008].
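As an informal sanity check of Theorem 4.2, the following Python sketch simulates the cuckoo graph directly: it inserts $n = \lfloor(1-\varepsilon)m\rfloor$ random edges into a bipartite graph with $m$ nodes per side and uses union-find to detect the first component with more edges than nodes. The function names are ours, not the thesis's. Since the $O(1/m^2)$ term is ignored, the Monte Carlo estimate should only be roughly comparable to $h(\varepsilon)/m$, and only for moderately large $m$.

```python
import random


def h(eps: float) -> float:
    """Coefficient of 1/m in the failure probability stated in Theorem 4.2."""
    return (2 * eps**2 - 5 * eps + 5) * (1 - eps) ** 3 / (12 * (2 - eps) ** 2 * eps**3)


def has_complex_component(m: int, n: int, rng: random.Random) -> bool:
    """Insert n uniformly random edges into a bipartite graph with m nodes per
    side; return True as soon as some component has more edges than nodes."""
    parent = list(range(2 * m))  # nodes 0..m-1: table 1, nodes m..2m-1: table 2
    nodes = [1] * (2 * m)        # component sizes (meaningful at roots only)
    edges = [0] * (2 * m)        # edge counts (meaningful at roots only)

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for _ in range(n):
        a = find(rng.randrange(m))
        b = find(m + rng.randrange(m))
        if a != b:
            if nodes[a] < nodes[b]:
                a, b = b, a
            parent[b] = a  # union by size
            nodes[a] += nodes[b]
            edges[a] += edges[b]
        edges[a] += 1
        if edges[a] > nodes[a]:
            return True  # a complex component stays complex under later merges
    return False


def estimate_failure(m: int, eps: float, trials: int, seed: int = 1) -> float:
    """Fraction of random cuckoo graphs that contain a complex component."""
    rng = random.Random(seed)
    n = int((1 - eps) * m)  # n = floor((1 - eps) * m)
    failures = sum(has_complex_component(m, n, rng) for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    m, eps = 1000, 0.2
    print("simulated:", estimate_failure(m, eps, trials=500))
    print("h(eps)/m :", h(eps) / m)
```

The early return relies on the fact that merging a complex component with a tree or unicyclic component again yields a complex component, so the first violation decides the outcome.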

Proof of Theorem 4.2

Once more, we start by counting all bipartite graphs without considering the type of their components. Let $\mathcal{G}_{m_1,m_2,n}$ denote the set of all node- and edge-labelled bipartite multigraphs $(V_1, V_2, E)$ with $|V_1| = m_1$, $|V_2| = m_2$, and $|E| = n$. By definition, it is clear that the number of all graphs of the family $\mathcal{G}_{m_1,m_2,n}$ equals

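$$\#\mathcal{G}_{m_1,m_2,n} = m_1^{n}\,m_2^{n}, \qquad (4.25)$$

since each of the $n$ labelled edges connects one of the $m_1$ nodes of type 1 with one of the $m_2$ nodes of type 2. In particular, we are interested in the case $m_1 = m_2 = m$ and $n = (1-\varepsilon)m$, where $\varepsilon \in (0,1)$. This means that the graph is relatively sparse.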

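Next, let $\mathcal{G}^\circ_{m_1,m_2,n}$ denote those graphs in $\mathcal{G}_{m_1,m_2,n}$ without complex components, that is, all components are either trees or unicyclic. Further,

$$g(x,y,v) = \sum_{m_1,m_2,n \ge 0} \#\mathcal{G}^\circ_{m_1,m_2,n}\,\frac{x^{m_1}}{m_1!}\,\frac{y^{m_2}}{m_2!}\,\frac{v^{n}}{n!}$$

denotes the corresponding generating function. First, we want to describe this generating function. For this purpose, we will now consider bipartite trees.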

We call a tree bipartite if the vertices are partitioned into two classes $V_1$ (“black” nodes) and $V_2$ (“white” nodes) such that no node has a neighbour of the same class. Such a tree is called labelled if the nodes of the first type, that is, the nodes in $V_1$, are labelled by $1, 2, \dots, |V_1|$ and the nodes of the second type are independently labelled by $1, 2, \dots, |V_2|$, see also Gimenez et al. [2005].

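Let $\mathcal{T}_1$ denote the set of bipartite rooted trees whose root is contained in $V_1$, similarly $\mathcal{T}_2$ the set of bipartite rooted trees whose root is contained in $V_2$, and $\tilde{\mathcal{T}}$ the class of unrooted bipartite trees. Furthermore, let $t_{1,m_1,m_2}$ resp. $t_{2,m_1,m_2}$ denote the number of trees in $\mathcal{T}_1$ resp. $\mathcal{T}_2$ with $m_1$ nodes of type 1 and $m_2$ nodes of type 2. Similarly, we define $\tilde t_{m_1,m_2}$. The corresponding generating functions are defined by

$$t_1(x,y) = \sum_{m_1,m_2 \ge 0} t_{1,m_1,m_2}\,\frac{x^{m_1}}{m_1!}\,\frac{y^{m_2}}{m_2!}, \qquad t_2(x,y) = \sum_{m_1,m_2 \ge 0} t_{2,m_1,m_2}\,\frac{x^{m_1}}{m_1!}\,\frac{y^{m_2}}{m_2!}, \qquad \tilde t(x,y) = \sum_{m_1,m_2 \ge 0} \tilde t_{m_1,m_2}\,\frac{x^{m_1}}{m_1!}\,\frac{y^{m_2}}{m_2!}.$$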

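The following lemma is a first step towards exploring these trees.

Lemma 4.2. The generating functions $t_1(x,y)$, $t_2(x,y)$ and $\tilde t(x,y)$ satisfy

$$t_1(x,y) = x\,e^{t_2(x,y)}, \qquad t_2(x,y) = y\,e^{t_1(x,y)}, \qquad (4.30)$$

$$\tilde t(x,y) = t_1(x,y) + t_2(x,y) - t_1(x,y)\,t_2(x,y). \qquad (4.31)$$

Moreover, $t_{1,m_1,m_2} = m_1^{m_2} m_2^{m_1-1}$, $t_{2,m_1,m_2} = m_2^{m_1} m_1^{m_2-1}$, and $\tilde t_{m_1,m_2} = m_1^{m_2-1} m_2^{m_1-1}$.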

The explicit formula for $\tilde t_{m_1,m_2}$ is originally due to Scoins [1962].

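Proof. The functional equations (4.30) follow directly from the recursive description of the trees: a tree in $\mathcal{T}_1$ consists of a black root to which a (possibly empty) set of trees from $\mathcal{T}_2$ is attached, and vice versa.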

Note that $t_1(x,y) = t_2(y,x)$ holds and that $t_1(x,x)$ equals the usual tree function $t(x)$ defined in (4.4). Thus, $t_1(x,y)$ and $t_2(x,y)$ are certainly analytic functions for $|x| < e^{-1}$ and $|y| < e^{-1}$. This holds because the coefficients are nonnegative, so that $|t_1(x,y)| \le t(\max(|x|,|y|))$, and the radius of convergence of $t(x)$ equals $1/e$.

In the last section, we mentioned that the generating function of usual unrooted labelled trees is given by $t(x) - t(x)^2/2$. Thus, (4.31) is a generalisation of this result, and it can be proved in a similar way. First, consider a rooted tree with a black root labelled by 1 as an unrooted tree. Next, examine an unordered pair $(t_1, t_2)$ of trees from $\mathcal{T}_1 \times \mathcal{T}_2$, and join the roots by an edge. If the black node labelled by 1 is contained in $t_1$, consider the root of $t_2$ as the new root; we obtain a tree with a white root and at least one black node. Otherwise, consider the root of $t_1$ as the new root; we obtain a tree with a black root that is not labelled by 1.

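Lagrange inversion applied to the equation $t_1(x,y) = x\exp\bigl(y\,e^{t_1(x,y)}\bigr)$ yields $t_{1,m_1,m_2} = m_1^{m_2} m_2^{m_1-1}$. Dividing by $m_1$ gives $\tilde t_{m_1,m_2}$, since there are exactly $m_1$ ways to choose the root of type 1 in an unrooted tree with $m_1$ nodes of type 1.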
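Scoins' formula can be cross-checked for small sizes: labelled bipartite unrooted trees with classes of sizes $m_1$ and $m_2$ are exactly the spanning trees of the complete bipartite graph $K_{m_1,m_2}$, and Kirchhoff's matrix-tree theorem counts those as a determinant of the reduced Laplacian. The Python sketch below (function names are ours) does this with exact rational arithmetic.

```python
from fractions import Fraction


def spanning_trees_complete_bipartite(m1: int, m2: int) -> int:
    """Count spanning trees of K_{m1,m2} with Kirchhoff's matrix-tree theorem."""
    size = m1 + m2
    # Laplacian: nodes 0..m1-1 are black (degree m2), m1..size-1 white (degree m1)
    lap = [[Fraction(0)] * size for _ in range(size)]
    for i in range(m1):
        for j in range(m1, size):
            lap[i][j] = lap[j][i] = Fraction(-1)
        lap[i][i] = Fraction(m2)
    for j in range(m1, size):
        lap[j][j] = Fraction(m1)
    # delete the last row and column, then compute the determinant by elimination
    mat = [row[:-1] for row in lap[:-1]]
    k = size - 1
    det = Fraction(1)
    for col in range(k):
        pivot = next((r for r in range(col, k) if mat[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            mat[col], mat[pivot] = mat[pivot], mat[col]
            det = -det
        det *= mat[col][col]
        for r in range(col + 1, k):
            factor = mat[r][col] / mat[col][col]
            for c in range(col, k):
                mat[r][c] -= factor * mat[col][c]
    return int(det)


def scoins(m1: int, m2: int) -> int:
    """Explicit count of unrooted labelled bipartite trees: m1^(m2-1) * m2^(m1-1)."""
    return m1 ** (m2 - 1) * m2 ** (m1 - 1)


for a in range(1, 4):
    print([(spanning_trees_complete_bipartite(a, b), scoins(a, b)) for b in range(1, 4)])
```

Multiplying by $m_1$ (the choice of a black root) then gives the rooted counts $t_{1,m_1,m_2}$.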

Later on, we will make use of the partial derivatives of these functions.

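Lemma 4.3. The partial derivatives of the functions $\tilde t(x,y)$, $t_1(x,y)$ and $t_2(x,y)$ are

$$\partial_x \tilde t = \frac{t_1}{x}, \qquad \partial_y \tilde t = \frac{t_2}{y},$$

$$\partial_x t_1 = \frac{t_1}{x(1 - t_1 t_2)}, \qquad \partial_y t_1 = \frac{t_1 t_2}{y(1 - t_1 t_2)}, \qquad \partial_x t_2 = \frac{t_1 t_2}{x(1 - t_1 t_2)}, \qquad \partial_y t_2 = \frac{t_2}{y(1 - t_1 t_2)},$$

where all functions are evaluated at $(x,y)$.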


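Proof. All these results can easily be calculated using implicit differentiation. For instance, we obtain $\partial_x t_1(x,y)$ from the equation system

$$\partial_x t_1(x,y) = \frac{\partial}{\partial x}\bigl(x\,e^{t_2(x,y)}\bigr) = e^{t_2(x,y)} + x\,e^{t_2(x,y)}\,\partial_x t_2(x,y), \qquad (4.38)$$

$$\partial_x t_2(x,y) = \frac{\partial}{\partial x}\bigl(y\,e^{t_1(x,y)}\bigr) = y\,e^{t_1(x,y)}\,\partial_x t_1(x,y). \qquad (4.39)$$

Since $x e^{t_2} = t_1$ and $y e^{t_1} = t_2$, this linear system reads $\partial_x t_1 = t_1/x + t_1\,\partial_x t_2$ and $\partial_x t_2 = t_2\,\partial_x t_1$, and solving it yields $\partial_x t_1 = t_1/\bigl(x(1 - t_1 t_2)\bigr)$.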
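Solving (4.38) and (4.39) gives $\partial_x t_1 = t_1/(x(1-t_1t_2))$ and $\partial_x t_2 = t_1t_2/(x(1-t_1t_2))$, and with (4.31) one gets $\partial_x \tilde t = t_1/x$. The Python sketch below checks these closed forms numerically against central differences; it assumes a point inside $|x|, |y| < 1/e$ and solves (4.30) by fixed-point iteration from $(0,0)$. The helper names are ours.

```python
import math


def tree_pair(x: float, y: float, tol: float = 1e-14, max_iter: int = 100_000):
    """Solve t1 = x*exp(t2), t2 = y*exp(t1) by fixed-point iteration from (0, 0)."""
    t1 = t2 = 0.0
    for _ in range(max_iter):
        n1, n2 = x * math.exp(t2), y * math.exp(t1)
        if abs(n1 - t1) < tol and abs(n2 - t2) < tol:
            return n1, n2
        t1, t2 = n1, n2
    raise RuntimeError("fixed-point iteration did not converge")


def compare_x_derivatives(x: float, y: float, step: float = 1e-6):
    """Return (closed forms, central differences) for d/dx of t1, t2 and t~."""
    def values(u, v):
        a, b = tree_pair(u, v)
        return a, b, a + b - a * b  # (t1, t2, t~) with t~ = t1 + t2 - t1*t2

    t1, t2, _ = values(x, y)
    d = 1 - t1 * t2
    closed = (t1 / (x * d), t1 * t2 / (x * d), t1 / x)
    plus, minus = values(x + step, y), values(x - step, y)
    numeric = tuple((p - q) / (2 * step) for p, q in zip(plus, minus))
    return closed, numeric


closed, numeric = compare_x_derivatives(0.2, 0.3)
for c, nm in zip(closed, numeric):
    print(f"{c:.9f}  {nm:.9f}")
```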

Next, we turn our attention to unicyclic components.

Lemma 4.4. The generating function of connected bipartite graphs with exactly one cycle is given by

$$c(x,y) = \sum_{k \ge 1} \frac{1}{2k}\,t_1(x,y)^k\,t_2(x,y)^k = \frac12 \log \frac{1}{1 - t_1(x,y)\,t_2(x,y)}. \qquad (4.40)$$

Proof. Of course, a cycle has to have an even number of nodes, say $2k$, where $k$ nodes are black and the other $k$ nodes are white. A cyclic node of black colour can be considered as the root of a rooted tree of the set $\mathcal{T}_1$, and similarly, a white cyclic node can be considered as the root of a rooted tree of the set $\mathcal{T}_2$. Note that we have to divide the product of the generating functions $t_1(x,y)^k t_2(x,y)^k$ by $2k$ to account for cyclic order and change of orientation. Hence, the generating function of unicyclic graphs with $2k$ cyclic nodes is given by

$$\frac{1}{2k}\,t_1(x,y)^k\,t_2(x,y)^k. \qquad (4.41)$$

Summing over $k \ge 1$ and using $\sum_{k \ge 1} z^k/k = \log\frac{1}{1-z}$, the claimed equation follows.

Using these functions, we can describe the generating function $g(x,y,v)$.

Lemma 4.5. The generating function $g(x,y,v)$ is given by

$$g(x,y,v) = \frac{e^{\frac{1}{v}\tilde t(xv,yv)}}{\sqrt{1 - t_1(xv,yv)\,t_2(xv,yv)}}. \qquad (4.42)$$

Proof. We have to count graphs where each component is either an unrooted tree (counted by $\tilde t(x,y)$) or a graph with exactly one cycle. Since a cyclic component of size $m_1+m_2$ possesses exactly as many edges as nodes and since there are $(m_1+m_2)!$ possible edge labels, the corresponding generating function that takes the edges into account is given by $c(xv,yv)$. Similarly, a tree of size $m_1+m_2$ has exactly $n = m_1+m_2-1$ edges. Consequently, the generating function $\tilde t(xv,yv)/v$ corresponds to a bipartite unrooted tree. Hence, the generating function $g(x,y,v)$ is given by

$$g(x,y,v) = e^{\frac{1}{v}\tilde t(xv,yv) + c(xv,yv)} = \frac{e^{\frac{1}{v}\tilde t(xv,yv)}}{\sqrt{1 - t_1(xv,yv)\,t_2(xv,yv)}}, \qquad (4.43)$$

which completes the proof of the lemma.

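Corollary 4.1. The number of graphs $\#\mathcal{G}^\circ_{m_1,m_2,n}$ is given by

$$\#\mathcal{G}^\circ_{m_1,m_2,n} = \frac{m_1!\,m_2!\,n!}{(m_1+m_2-n)!}\,[x^{m_1}y^{m_2}]\,\frac{\tilde t(x,y)^{m_1+m_2-n}}{\sqrt{1 - t_1(x,y)\,t_2(x,y)}}. \qquad (4.44)$$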
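For very small parameters, (4.44) can be compared with a brute-force count. The Python sketch below (all names are ours) extracts the coefficient exactly with truncated bivariate power series, taking $t_1$ and $t_2$ from the explicit tree counts and expanding $1/\sqrt{1-u} = \sum_j \binom{2j}{j} u^j/4^j$. It then enumerates all $(m_1 m_2)^n$ edge sequences and keeps those in which every component has at most as many edges as nodes.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def mul(a, b, d1, d2):
    """Product of truncated bivariate series {(i, j): coeff}, keeping i <= d1, j <= d2."""
    out = {}
    for (i1, j1), c1 in a.items():
        for (i2, j2), c2 in b.items():
            i, j = i1 + i2, j1 + j2
            if i <= d1 and j <= d2:
                out[(i, j)] = out.get((i, j), 0) + c1 * c2
    return out


def add(a, b, sign=1):
    out = dict(a)
    for key, c in b.items():
        out[key] = out.get(key, 0) + sign * c
    return out


def count_via_corollary(m1, m2, n):
    """Evaluate (4.44) exactly by coefficient extraction."""
    k = m1 + m2 - n
    if k < 0:
        return 0
    f = factorial
    # explicit tree counts; Python's 0 ** 0 == 1 covers single-node trees
    t1 = {(i, j): Fraction(i**j * j ** (i - 1), f(i) * f(j))
          for i in range(1, m1 + 1) for j in range(m2 + 1)}
    t2 = {(i, j): Fraction(j**i * i ** (j - 1), f(i) * f(j))
          for i in range(m1 + 1) for j in range(1, m2 + 1)}
    u = mul(t1, t2, m1, m2)              # t1 * t2
    tilde = add(add(t1, t2), u, -1)      # t~ = t1 + t2 - t1 * t2
    series, power = {(0, 0): Fraction(1)}, {(0, 0): Fraction(1)}
    for j in range(1, min(m1, m2) + 1):  # 1 / sqrt(1 - u)
        power = mul(power, u, m1, m2)
        series = add(series, {key: c * Fraction(comb(2 * j, j), 4**j) for key, c in power.items()})
    for _ in range(k):
        series = mul(series, tilde, m1, m2)
    coeff = Fraction(series.get((m1, m2), 0))
    return f(m1) * f(m2) * f(n) * coeff / f(k)


def find(parent, v):
    while parent[v] != v:
        v = parent[v]
    return v


def count_brute_force(m1, m2, n):
    """Enumerate all (m1*m2)^n edge-labelled bipartite multigraphs."""
    good = 0
    for edges in product(product(range(m1), range(m2)), repeat=n):
        parent = list(range(m1 + m2))  # black nodes 0..m1-1, white m1..m1+m2-1
        for a, b in edges:
            ra, rb = find(parent, a), find(parent, m1 + b)
            if ra != rb:
                parent[ra] = rb
        size, count = {}, {}
        for v in range(m1 + m2):
            r = find(parent, v)
            size[r] = size.get(r, 0) + 1
        for a, _ in edges:
            r = find(parent, a)
            count[r] = count.get(r, 0) + 1
        if all(count.get(r, 0) <= s for r, s in size.items()):
            good += 1
    return good


for m1, m2, n in [(1, 1, 2), (2, 2, 4), (2, 3, 4)]:
    print((m1, m2, n), count_via_corollary(m1, m2, n), count_brute_force(m1, m2, n))
```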


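We use the corollary and Cauchy’s formula and obtain

$$\#\mathcal{G}^\circ_{m,m,n} = \frac{(m!)^2\,n!}{(2m-n)!}\,\frac{1}{(2\pi i)^2}\oint\oint \frac{\tilde t(x,y)^{2m-n}}{\sqrt{1 - t_1(x,y)\,t_2(x,y)}}\,\frac{dx\,dy}{x^{m+1}\,y^{m+1}}. \qquad (4.45)$$

This is in fact an integral that can be evaluated asymptotically using a (double) saddle point method, see Theorem 3.2. Additionally, we obtain by Stirling’s formula (3.23) the asymptotic expansion of the factor $(m!)^2 n!/(2m-n)!$ …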

For our problem, it turns out that if $\varepsilon' = 1 - n/m$, the saddle point is given by $x_0 = y_0 = (1-\varepsilon')\,e^{-(1-\varepsilon')}$, where $t(x) = x e^{t(x)}$ equals the tree function. Hence we get

$$t_1(x_0,x_0) = t(x_0) = 1 - \varepsilon' = \frac{n}{m}.$$

For instance, we further obtain $\kappa_{20} = t_1(x_0,y_0)\dots$ Further cumulants can be calculated in the same way, but have been computed with the help of a computer algebra system in a semi-automatic way. The Maple source file is included on the attached CD-ROM, see also Appendix B.

We set

$$f \to \tilde t, \qquad g \to \frac{1}{\sqrt{1 - t_1 t_2}}, \qquad k \to 2m-n, \qquad m_1 \to m, \qquad m_2 \to m$$

in Theorem 3.2 and apply the saddle point method. Thus, we obtain an asymptotic expansion of the double integral of (4.45) possessing the leading coefficient

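$$\frac{\tilde t(x_0,y_0)^{2m-n}}{2\pi(2m-n)\,x_0^{m}\,y_0^{m}\sqrt{1 - t_1(x_0,y_0)\,t_2(x_0,y_0)}\,\sqrt{\kappa_{20}\kappa_{02} - \kappa_{11}^2}}.$$

Furthermore, the coefficient of $1/m$ of this asymptotic expansion is given by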

C = ε610ε5+ 21ε4327ε2+ 20ε5

12ε3(2 +ε)2(1−ε) . (4.54) With the help of these results and (4.46), we finally obtain the asymptotic expansion

$\#\mathcal{G}^\circ_{m,m,(1-\varepsilon)m} = m^{2(1-\varepsilon)m}\dots$ of the number of graphs without complex components.

We may now replace $\varepsilon'$ by $\varepsilon = \varepsilon' + O(1/m)$. Let $p(n,m)$ denote the probability that every component of the cuckoo graph is either a tree or unicyclic after the insertion of $n$ edges. So, we finally obtain

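$$p(n,m) = \frac{\#\mathcal{G}^\circ_{m,m,(1-\varepsilon)m}}{\#\mathcal{G}_{m,m,(1-\varepsilon)m}} = 1 - \frac{h(\varepsilon)}{m} + O\!\left(\frac{1}{m^2}\right).$$

This step completes the proof of Theorem 4.2. Figure 4.1 depicts the graph of $h(\varepsilon) = (2\varepsilon^2 - 5\varepsilon + 5)(1-\varepsilon)^3/\bigl(12(2-\varepsilon)^2\varepsilon^3\bigr)$. The series expansion of the function $h(\varepsilon)$ with … We want to note that it is also possible to obtain a slightly more precise asymptotic expansion for … where $\tilde h(\varepsilon)$ is again explicit. This can be done by refining the calculations related to Lemma 3.2.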
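To make the behaviour of $h(\varepsilon)$ for $\varepsilon \to 0$ explicit, one can expand numerator and denominator separately. The following is our own elementary computation from the formula for $h(\varepsilon)$ above, not a result quoted from the thesis:

```latex
\begin{aligned}
(2\varepsilon^2-5\varepsilon+5)(1-\varepsilon)^3 &= 5 - 20\varepsilon + 32\varepsilon^2 - 26\varepsilon^3 + O(\varepsilon^4),\\
\frac{1}{(2-\varepsilon)^2} &= \frac14\Bigl(1 + \varepsilon + \tfrac34\varepsilon^2 + \tfrac12\varepsilon^3 + O(\varepsilon^4)\Bigr),\\
h(\varepsilon) &= \frac{5 - 15\varepsilon + \tfrac{63}{4}\varepsilon^2 - \tfrac{13}{2}\varepsilon^3 + O(\varepsilon^4)}{48\,\varepsilon^3}
= \frac{5}{48\varepsilon^3} - \frac{5}{16\varepsilon^2} + \frac{21}{64\varepsilon} - \frac{13}{96} + O(\varepsilon).
\end{aligned}
```

The leading term suggests that the correction $h(\varepsilon)/m$ behaves like $5/(48\varepsilon^3 m)$ as the load approaches 1.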

For example, we can apply these expansions in order to obtain asymptotic representations for the probability $q(n+1,m)$ that the insertion of the $(n+1)$-st edge creates a bicyclic component, conditioned on the property that the first $n$ insertions did not create such a component.

Lemma 4.6. The probability that the insertion of the $(n+1)$-st key forces a rehash is given by

4.4 Asymmetric cuckoo hashing

As mentioned in Chapter 1, asymmetric cuckoo hashing uses tables of different size. To be precise, we choose the tables in such a way that the first table holds more memory cells than the second one. Thus, we expect the number of keys actually stored in the first table to increase, which leads to improved search and insertion performance. In this section, we adapt the previous analysis such that it also covers the asymmetric variant.

During the analysis, we make use of the factor of asymmetry $c$. This constant determines the sizes of both hash tables, which hold $m_1 = m(1+c)$ and $m_2 = 2m - m_1$ cells, respectively.

Thus, $c = 0$ corresponds to the standard, and hence symmetric, algorithm.

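Theorem 4.3. Suppose that $c \in [0,1)$ and $\varepsilon \in \bigl(1 - \sqrt{1-c^2},\, 1\bigr)$ are fixed. Then the probability that an asymmetric cuckoo hash of $n = \lfloor (1-\varepsilon)m \rfloor$ data points into two tables of size $m_1 = m(1+c)$ and $m_2 = 2m - m_1$, respectively, succeeds (that is, the corresponding cuckoo graph contains no complex component) is equal to

$$1 - \frac{(1-\varepsilon)^3\,\bigl(10 - 2\varepsilon^3 + 9\varepsilon^2 - 3c^2\varepsilon^2 + 9\varepsilon c^2 - 15\varepsilon + 2c^4 - 10c^2\bigr)}{12\,(2\varepsilon - \varepsilon^2 - c^2)^3\,(1 - c^2)}\,\frac{1}{m} + O\!\left(\frac{1}{m^2}\right).$$

See also Kutzelnigg [2008] for a sketch of the proof of this theorem.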
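A quick consistency check: for $c = 0$ the coefficient of $1/m$ in Theorem 4.3 should reduce to the coefficient $h(\varepsilon)$ of Theorem 4.2, because $10 - 2\varepsilon^3 + 9\varepsilon^2 - 15\varepsilon = (2\varepsilon^2 - 5\varepsilon + 5)(2-\varepsilon)$ and $(2\varepsilon - \varepsilon^2)^3 = \varepsilon^3(2-\varepsilon)^3$. The Python sketch below (names are ours) evaluates both coefficients as written in this section and rejects parameters outside the range of the theorem.

```python
import math


def h_sym(eps: float) -> float:
    """Coefficient of 1/m in Theorem 4.2 (standard cuckoo hashing)."""
    return (2 * eps**2 - 5 * eps + 5) * (1 - eps) ** 3 / (12 * (2 - eps) ** 2 * eps**3)


def h_asym(eps: float, c: float) -> float:
    """Coefficient of 1/m in Theorem 4.3 for the asymmetry factor c."""
    if not (0 <= c < 1 and 1 - math.sqrt(1 - c * c) < eps < 1):
        raise ValueError("(eps, c) outside the range covered by Theorem 4.3")
    poly = (10 - 2 * eps**3 + 9 * eps**2 - 3 * c**2 * eps**2
            + 9 * eps * c**2 - 15 * eps + 2 * c**4 - 10 * c**2)
    return (1 - eps) ** 3 * poly / (12 * (2 * eps - eps**2 - c**2) ** 3 * (1 - c**2))


for eps in (0.1, 0.3, 0.5):
    print(eps, h_sym(eps), h_asym(eps, 0.0), h_asym(eps, 0.3))
```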

Proof of Theorem 4.3

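Since Corollary 4.1 already covers cuckoo graphs with a different number of nodes of each type, the only difference from the proof of Theorem 4.2 is the application of the saddle point method. Now, our starting point is the generalised equation

$$\#\mathcal{G}^\circ_{m_1,m_2,n} = -\frac{m_1!\,m_2!\,n!}{4\pi^2\,(m_1+m_2-n)!}\oint\oint \frac{\tilde t(x,y)^{m_1+m_2-n}}{\sqrt{1 - t_1(x,y)\,t_2(x,y)}}\,\frac{dx\,dy}{x^{m_1+1}\,y^{m_2+1}}. \qquad (4.62)$$

According to Theorem 3.2, the saddle point $(x_0,y_0)$ is determined by the system consisting of the equations

$$\frac{x_0\,\partial_x \tilde t(x_0,y_0)}{\tilde t(x_0,y_0)} = \frac{m_1}{m_1+m_2-n}, \qquad \frac{y_0\,\partial_y \tilde t(x_0,y_0)}{\tilde t(x_0,y_0)} = \frac{m_2}{m_1+m_2-n}. \qquad (4.63)$$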


Using Lemma 4.3, the system becomes

$$\frac{m_1}{m_1+m_2-n} = \frac{t_1(x_0,y_0)}{\tilde t(x_0,y_0)} \quad\text{and}\quad \frac{m_2}{m_1+m_2-n} = \frac{t_2(x_0,y_0)}{\tilde t(x_0,y_0)}. \qquad (4.64)$$

Further, with the help of Lemma 4.2, we obtain the following equations:

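$$\frac{m_1}{m_1+m_2-n} = \frac{t_1(x_0,y_0)}{t_1(x_0,y_0) + t_2(x_0,y_0) - t_1(x_0,y_0)\,t_2(x_0,y_0)}, \qquad \frac{m_2}{m_1+m_2-n} = \frac{t_2(x_0,y_0)}{t_1(x_0,y_0) + t_2(x_0,y_0) - t_1(x_0,y_0)\,t_2(x_0,y_0)}. \qquad (4.65)$$

Solving this system for $t_1(x_0,y_0)$ and $t_2(x_0,y_0)$ yields the solution

$$t_1(x_0,y_0) = \frac{n}{m_2} \quad\text{and}\quad t_2(x_0,y_0) = \frac{n}{m_1}. \qquad (4.66)$$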

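Finally, it turns out that the saddle point is given by

$$x_0 = t_1(x_0,y_0)\,e^{-t_2(x_0,y_0)} = \frac{n}{m_2}\,e^{-n/m_1} \quad\text{and}\quad y_0 = t_2(x_0,y_0)\,e^{-t_1(x_0,y_0)} = \frac{n}{m_1}\,e^{-n/m_2}. \qquad (4.67)$$

Again, we introduce the notation

$$\varepsilon' = 1 - \frac{n}{m} = 1 - \frac{\lfloor (1-\varepsilon)m \rfloor}{m}. \qquad (4.68)$$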

We observe that, due to the singularity of the denominator, the saddle point method is only applicable if the relation $1 > t_1(x_0,y_0)\,t_2(x_0,y_0)$ holds. Hence we obtain the inequality

$$1 > \frac{n}{m_1}\cdot\frac{n}{m_2} = \frac{n}{(1+c)m}\cdot\frac{n}{(1-c)m} = \frac{(1-\varepsilon')^2}{1-c^2}, \qquad (4.69)$$

and finally the condition

$$\varepsilon' > 1 - \sqrt{1-c^2}. \qquad (4.70)$$

The cumulants can be calculated in a similar way to the saddle point, but have been computed using a computer algebra system in a semi-automatic way. The Maple source file is included on the attached CD-ROM, see also Appendix B.

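Further, we apply Theorem 3.2 using the setting

$$f \to \tilde t, \qquad g \to \frac{1}{\sqrt{1 - t_1 t_2}}, \qquad k \to m(1+\varepsilon'), \qquad m_1 \to m(1+c), \qquad m_2 \to m(1-c),$$

to obtain an asymptotic expansion of the double integral of (4.62). In particular, we obtain that this asymptotic expansion possesses a leading coefficient equal to

$$\frac{\tilde t(x_0,y_0)^{2m-n}}{2\pi(2m-n)\,x_0^{m_1}\,y_0^{m_2}\sqrt{1 - t_1(x_0,y_0)\,t_2(x_0,y_0)}\,\sqrt{\kappa_{20}\kappa_{02} - \kappa_{11}^2}} = \frac{(1-\varepsilon'^2)^{m(1+\varepsilon')}\,e^{2m(1-\varepsilon')}\,(1-c)^{m(1+c)}\,(1+c)^{m(1-c)}\,\sqrt{1+\varepsilon'}}{2\pi m\,(1-c^2)^{m(1+\varepsilon')}\,(1-\varepsilon')^{2m}\,\sqrt{(1-\varepsilon')(1-c^2)}}, \qquad (4.71)$$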