Bipartite random graphs and standard cuckoo hashing

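and the next term of the expansion is given by

$$\frac{12\kappa_2\kappa_3\kappa_1 + 3\kappa_2\kappa_4 - 12\kappa_2^2\kappa_1^2 - 12\kappa_2^2\kappa_2 - 5\kappa_3^2}{24\kappa_2^3\,m(1+\varepsilon)} = \frac{5 - 12\varepsilon + 4\varepsilon^2 + 12\varepsilon^3 - \varepsilon^4}{48\varepsilon^3(\varepsilon-1)m}. \tag{4.19}$$

Further, we calculate an asymptotic expansion of the factor preceding the integral by applying Stirling's formula (3.23). Combining these results, we get the desired asymptotic expansion,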

$$\#\mathcal{G}^{\circ}_{2m,m(1-\varepsilon)} = (2m)^{2m(1-\varepsilon)}\cdots$$

Thus, the probability that a randomly selected graph does not contain a complex component equals … Since … holds, we can replace ε′ by ε in (4.22). Consequently, the same expansion holds if we consider the sequence (m, m(1+ε)), (m+1, (m+1)(1+ε)), (m+2, (m+2)(1+ε)), … and the corresponding hash tables, which completes the proof of the theorem.

4.3 Bipartite random graphs and standard cuckoo hashing

Our next goal is to adapt these ideas to bipartite cuckoo graphs, as defined in Chapter 1. Again, we are interested in the probability that such a graph contains no complex component. More precisely, we consider bipartite multigraphs consisting of m labelled nodes of each type and n labelled edges. Each of the labelled edges represents a key and connects two independently and uniformly selected nodes of different type. The analysis is once again based on generating functions. However, we have to replace the univariate functions by (double exponential) bivariate generating functions to model the different types of nodes, which makes the analysis more complicated. Nonetheless, we obtain the following result.

4 Sparse random graphs

Theorem 4.2. Suppose that ε ∈ (0, 1) is fixed. Then the probability p(n, m) that a cuckoo hash of n = (1 − ε)m data points into two tables of size m succeeds (that is, the corresponding cuckoo graph contains no complex component) is equal to

$$p(n,m) = 1 - \frac{(2\varepsilon^2 - 5\varepsilon + 5)(1-\varepsilon)^3}{12(2-\varepsilon)^2\varepsilon^3}\,\frac{1}{m} + O\!\left(\frac{1}{m^2}\right).$$

A sketch of the proof can be found in Kutzelnigg [2006]. A detailed version of the proof is given in Drmota and Kutzelnigg [2008].
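Theorem 4.2 can also be checked empirically. The following Python sketch is a Monte Carlo experiment under simplifying assumptions (ideal, uniform hash functions and n = (1 − ε)m rounded down); the names `cuckoo_graph_ok`, `estimate_failure` and `h` are illustrative only. It tracks the number of nodes and edges of every component with a union-find structure and reports failure as soon as a component has more edges than nodes, that is, as soon as it becomes complex. The estimate is noisy and should only be expected to be roughly comparable to h(ε)/m for large m.

```python
import random

def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def cuckoo_graph_ok(m, n, rng):
    """Insert n random edges into a bipartite graph with m nodes of each type.
    Return False as soon as a component has more edges than nodes (complex)."""
    parent = list(range(2 * m))  # 0..m-1: first table, m..2m-1: second table
    nodes = [1] * (2 * m)        # component sizes, valid at the roots
    edges = [0] * (2 * m)        # edges per component, valid at the roots
    for _ in range(n):
        a = find(parent, rng.randrange(m))       # uniform node of type 1
        b = find(parent, m + rng.randrange(m))   # uniform node of type 2
        if a == b:
            edges[a] += 1                         # the edge closes a cycle
        else:
            parent[b] = a                         # merge the two components
            nodes[a] += nodes[b]
            edges[a] += edges[b] + 1
        if edges[a] > nodes[a]:
            return False
    return True

def estimate_failure(m, eps, trials, seed=1):
    """Fraction of trials whose cuckoo graph contains a complex component."""
    rng = random.Random(seed)
    n = int((1 - eps) * m)
    return sum(not cuckoo_graph_ok(m, n, rng) for _ in range(trials)) / trials

def h(eps):
    """Coefficient of 1/m in Theorem 4.2."""
    return (2 * eps**2 - 5 * eps + 5) * (1 - eps)**3 / (12 * (2 - eps)**2 * eps**3)

if __name__ == "__main__":
    m, eps = 1000, 0.2
    print("simulated failure rate:", estimate_failure(m, eps, trials=1000))
    print("h(eps) / m            :", h(eps) / m)
```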

Proof of Theorem 4.2

Once more, we start by counting all bipartite graphs without considering the type of their components. Let $\mathcal{G}_{m_1,m_2,n}$ denote the set of all node- and edge-labelled bipartite multigraphs (V₁, V₂, E) with |V₁| = m₁, |V₂| = m₂, and |E| = n. Since each of the n labelled edges connects one of the m₁ nodes of V₁ with one of the m₂ nodes of V₂, the number of all graphs of the family $\mathcal{G}_{m_1,m_2,n}$ equals

$$\#\mathcal{G}_{m_1,m_2,n} = m_1^n\,m_2^n. \tag{4.25}$$

In particular, we are interested in the case m₁ = m₂ = m and n = (1 − ε)m, where ε ∈ (0, 1). This means that the graph is relatively sparse.

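Next, let $\mathcal{G}^{\circ}_{m_1,m_2,n}$ denote those graphs in $\mathcal{G}_{m_1,m_2,n}$ without complex components, that is, all components are either trees or unicyclic. Further,

$$g^{\circ}(x,y,v) = \sum_{m_1,m_2,n\ge 0}\#\mathcal{G}^{\circ}_{m_1,m_2,n}\,\frac{x^{m_1}}{m_1!}\,\frac{y^{m_2}}{m_2!}\,\frac{v^n}{n!}$$

denotes the corresponding generating function. First, we want to describe this generating function. For this purpose, we will now consider bipartite trees.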

We call a tree bipartite if the vertices are partitioned into two classes V₁ ("black" nodes) and V₂ ("white" nodes) such that no node has a neighbour of the same class. Such a tree is called labelled if the nodes of the first type, that is, the nodes in V₁, are labelled by 1, 2, …, |V₁| and the nodes of the second type are independently labelled by 1, 2, …, |V₂|, see also Gimenez et al. [2005].

Let T₁ denote the set of bipartite rooted trees where the root is contained in V₁, similarly T₂ the set of bipartite rooted trees where the root is contained in V₂, and T̃ the class of unrooted bipartite trees. Furthermore, let $t_{1,m_1,m_2}$ resp. $t_{2,m_1,m_2}$ denote the number of trees in T₁ resp. T₂ with m₁ nodes of type 1 and m₂ nodes of type 2. Similarly, we define $\tilde t_{m_1,m_2}$. The corresponding generating functions are defined by

$$t_1(x,y) = \sum_{m_1,m_2\ge 0} t_{1,m_1,m_2}\,\frac{x^{m_1}}{m_1!}\,\frac{y^{m_2}}{m_2!},$$

and analogously for t₂(x, y) and t̃(x, y).

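The assertion of the following lemma is a first step in exploring these trees.

Lemma 4.2. The generating functions t₁(x, y), t₂(x, y) and t̃(x, y) satisfy

$$t_1(x,y) = x\,e^{t_2(x,y)}, \qquad t_2(x,y) = y\,e^{t_1(x,y)}, \tag{4.30}$$

$$\tilde t(x,y) = t_1(x,y) + t_2(x,y) - t_1(x,y)\,t_2(x,y). \tag{4.31}$$

Moreover, for m₁, m₂ ≥ 1,

$$t_{1,m_1,m_2} = m_1^{m_2}\,m_2^{m_1-1}, \qquad t_{2,m_1,m_2} = m_1^{m_2-1}\,m_2^{m_1}, \qquad \tilde t_{m_1,m_2} = m_1^{m_2-1}\,m_2^{m_1-1}.$$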

The explicit formula for $\tilde t_{m_1,m_2}$ is originally due to Scoins [1962].

Proof. The functional equations (4.30) follow directly from the recursive description of the trees.

Note that t₁(x, y) = t₂(y, x) holds and that t₁(x, x) equals the usual tree function t(x) defined in (4.4). Thus, t₁(x, y) and t₂(x, y) are analytic functions for |x| < e⁻¹ and |y| < e⁻¹: since all coefficients are nonnegative, we have |t₁(x, y)| ≤ t₁(r, r) = t(r) with r = max(|x|, |y|), and the radius of convergence of t(x) equals 1/e.

In the last section, we mentioned that the generating function of usual unrooted labelled trees is given by t(x) − t(x)²/2. Thus, (4.31) is a generalisation of this result and can be proved in a similar way. First, consider a rooted tree possessing a black root labelled by 1 as an unrooted tree. Next, take a pair (t₁, t₂) ∈ T₁ × T₂ of trees and join the roots by an edge. If the black node labelled by 1 is contained in t₁, consider the root of t₂ as the new root, and we obtain a tree possessing a white root and at least one black node. Otherwise, consider the root of t₁ as the new root, and we obtain a tree with a black root node not labelled by 1.

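Lagrange inversion applied to the equation t₁(x, y) = x exp(y e^{t₁(x,y)}), which is obtained by inserting the second equation of (4.30) into the first, yields $t_{1,m_1,m_2} = m_1^{m_2} m_2^{m_1-1}$. The formula for $\tilde t_{m_1,m_2}$ follows after dividing by m₁, the number of ways to choose the root of type 1 in an unrooted tree with m₁ nodes of type 1.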
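The statements of Lemma 4.2 lend themselves to a mechanical check. The Python sketch below uses exact rational arithmetic, and all helper names (`mul`, `exp_series`, `det`, `spanning_trees_k`) are hypothetical. It iterates the functional equations (4.30) on power series truncated at degree N in each variable, forms t̃ via (4.31), and compares the coefficients with the closed forms and with an independent count of labelled bipartite trees, namely the spanning trees of the complete bipartite graph K_{m₁,m₂} obtained from Kirchhoff's matrix-tree theorem.

```python
from fractions import Fraction
from math import factorial

N = 4  # keep only monomials x^i y^j with i <= N and j <= N

def mul(f, g):
    """Product of truncated bivariate series stored as {(i, j): coefficient}."""
    h = {}
    for (i1, j1), a in f.items():
        for (i2, j2), b in g.items():
            if i1 + i2 <= N and j1 + j2 <= N:
                key = (i1 + i2, j1 + j2)
                h[key] = h.get(key, 0) + a * b
    return h

def exp_series(f):
    """exp(f) for a series f without constant term (f^k vanishes for k > 2N)."""
    result, term = {(0, 0): Fraction(1)}, {(0, 0): Fraction(1)}
    for k in range(1, 2 * N + 1):
        term = {key: v / k for key, v in mul(term, f).items()}
        for key, v in term.items():
            result[key] = result.get(key, 0) + v
    return result

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    result = Fraction(1)
    for col in range(len(a)):
        pivot = next((r for r in range(col, len(a)) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, len(a)):
            factor = a[r][col] / a[col][col]
            for c in range(col, len(a)):
                a[r][c] -= factor * a[col][c]
    return result

def spanning_trees_k(m1, m2):
    """Spanning trees of K_{m1,m2}, i.e. labelled bipartite trees (matrix-tree theorem)."""
    size = m1 + m2
    lap = [[0] * size for _ in range(size)]
    for i in range(m1):
        for j in range(m1, size):
            lap[i][j] = lap[j][i] = -1
            lap[i][i] += 1
            lap[j][j] += 1
    return det([row[1:] for row in lap[1:]])  # delete first row and column

# Fixed-point iteration of (4.30); every round fixes one more total degree.
X, Y = {(1, 0): Fraction(1)}, {(0, 1): Fraction(1)}
t1, t2 = {}, {}
for _ in range(2 * N + 1):
    t1, t2 = mul(X, exp_series(t2)), mul(Y, exp_series(t1))

t_tilde = {key: t1.get(key, 0) + t2.get(key, 0) for key in set(t1) | set(t2)}
for key, v in mul(t1, t2).items():  # (4.31): t~ = t1 + t2 - t1 t2
    t_tilde[key] = t_tilde.get(key, 0) - v

for m1 in range(1, N + 1):
    for m2 in range(1, N + 1):
        scale = factorial(m1) * factorial(m2)
        print(m1, m2, t1.get((m1, m2), 0) * scale,
              t_tilde.get((m1, m2), 0) * scale, spanning_trees_k(m1, m2))
```

Each printed row should show $t_{1,m_1,m_2}$, $\tilde t_{m_1,m_2}$ and the spanning-tree count, the last two agreeing.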

Later on, we will make use of the partial derivatives of these functions.

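Lemma 4.3. The partial derivatives of the functions t̃(x, y), t₁(x, y) and t₂(x, y) are

$$\frac{\partial}{\partial x}\tilde t(x,y) = \frac{t_1(x,y)}{x}, \qquad \frac{\partial}{\partial y}\tilde t(x,y) = \frac{t_2(x,y)}{y},$$

$$\frac{\partial}{\partial x}t_1(x,y) = \frac{t_1(x,y)}{x\,\bigl(1-t_1(x,y)t_2(x,y)\bigr)}, \qquad \frac{\partial}{\partial y}t_1(x,y) = \frac{t_1(x,y)\,t_2(x,y)}{y\,\bigl(1-t_1(x,y)t_2(x,y)\bigr)},$$

$$\frac{\partial}{\partial x}t_2(x,y) = \frac{t_1(x,y)\,t_2(x,y)}{x\,\bigl(1-t_1(x,y)t_2(x,y)\bigr)}, \qquad \frac{\partial}{\partial y}t_2(x,y) = \frac{t_2(x,y)}{y\,\bigl(1-t_1(x,y)t_2(x,y)\bigr)}.$$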


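Proof. All these results can easily be calculated using implicit differentiation. For instance, we obtain ∂t₁(x, y)/∂x by solving the equation system

$$\frac{\partial}{\partial x}t_1(x,y) = \frac{\partial}{\partial x}\left(x\,e^{t_2(x,y)}\right) = e^{t_2(x,y)} + x\,e^{t_2(x,y)}\,\frac{\partial}{\partial x}t_2(x,y), \tag{4.38}$$

$$\frac{\partial}{\partial x}t_2(x,y) = \frac{\partial}{\partial x}\left(y\,e^{t_1(x,y)}\right) = y\,e^{t_1(x,y)}\,\frac{\partial}{\partial x}t_1(x,y). \tag{4.39}$$

Substituting e^{t₂(x,y)} = t₁(x, y)/x and y e^{t₁(x,y)} = t₂(x, y) turns this into a linear system in the two unknown derivatives.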
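The implicit differentiation can be delegated to a computer algebra system. A minimal SymPy sketch, in which d1x, d2x, d1y and d2y are hypothetical names for the four unknown partial derivatives; after differentiating, e^{t₂} = t₁/x and e^{t₁} = t₂/y are substituted, so every equation is linear in the unknowns:

```python
import sympy as sp

x, y, t1, t2 = sp.symbols("x y t1 t2", positive=True)
d1x, d2x, d1y, d2y = sp.symbols("d1x d2x d1y d2y")  # unknown partial derivatives

# Implicit differentiation of t1 = x e^{t2} and t2 = y e^{t1}, with
# e^{t2} = t1/x and e^{t1} = t2/y substituted afterwards.
equations = [
    sp.Eq(d1x, t1 / x + t1 * d2x),  # (4.38)
    sp.Eq(d2x, t2 * d1x),           # (4.39)
    sp.Eq(d1y, t1 * d2y),           # d/dy of t1 = x e^{t2}
    sp.Eq(d2y, t2 / y + t2 * d1y),  # d/dy of t2 = y e^{t1}
]
sol = sp.solve(equations, [d1x, d2x, d1y, d2y], dict=True)[0]

# t~ = t1 + t2 - t1 t2 (4.31), differentiated with the product rule.
dtilde_dx = sp.simplify(sol[d1x] * (1 - t2) + sol[d2x] * (1 - t1))
dtilde_dy = sp.simplify(sol[d1y] * (1 - t2) + sol[d2y] * (1 - t1))

for symbol in (d1x, d2x, d1y, d2y):
    print(symbol, "=", sp.factor(sol[symbol]))
print("d t~/dx =", dtilde_dx, "  d t~/dy =", dtilde_dy)
```

The simplification of the derivatives of t̃ to t₁/x and t₂/y is what later turns the saddle point equations into ratios of t₁, t₂ and t̃.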

Next, we turn our attention to unicyclic components.

Lemma 4.4. The generating function of connected graphs with exactly one cycle is given by

$$c(x,y) = \sum_{k\ge 1}\frac{1}{2k}\,t_1(x,y)^k\,t_2(x,y)^k = \frac12\log\frac{1}{1-t_1(x,y)\,t_2(x,y)}. \tag{4.40}$$

Proof. Of course, a cycle has to have an even number of nodes, say 2k, where k nodes are black and the other k nodes are white. A cyclic node of black colour can be considered as the root of a rooted tree of the set T₁, and similarly, a white cyclic node can be considered as the root of a rooted tree of the set T₂. Note that we have to divide the product of the generating functions t₁(x, y)^k t₂(x, y)^k by 2k to account for cyclic order and change of orientation. Hence, the generating function of a unicyclic graph with 2k cyclic points is given by

$$\frac{1}{2k}\,t_1(x,y)^k\,t_2(x,y)^k. \tag{4.41}$$

Consequently, summing over k ≥ 1 and using Σ_{k≥1} z^k/k = log(1/(1 − z)), the claimed equation holds.
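A quick numerical sanity check of (4.40), as a Python sketch in which z stands for the product t₁(x, y)t₂(x, y) and `cycle_part` is a hypothetical helper name: the partial sums of Σ z^k/(2k) approach ½ log(1/(1 − z)), and exponentiating gives 1/√(1 − z), the factor contributed by the unicyclic components to the generating function of Lemma 4.5.

```python
import math

def cycle_part(z, terms=400):
    """Partial sum of sum_{k>=1} z^k / (2k), the unicyclic part (4.40) with z = t1*t2."""
    return sum(z**k / (2 * k) for k in range(1, terms + 1))

for z in (0.1, 0.5, 0.9):
    closed = 0.5 * math.log(1 / (1 - z))  # right-hand side of (4.40)
    print(z, cycle_part(z), closed, math.exp(closed), 1 / math.sqrt(1 - z))
```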

Using these functions, we can describe the generating function g°(x, y, v).

Lemma 4.5. The generating function g°(x, y, v) is given by

$$g^{\circ}(x,y,v) = \frac{e^{\frac{1}{v}\tilde t(xv,yv)}}{\sqrt{1-t_1(xv,yv)\,t_2(xv,yv)}}. \tag{4.42}$$

Proof. We have to count graphs where each component is either an unrooted tree (counted by t̃(x, y)) or a graph with exactly one cycle. Since a cyclic component of size m₁ + m₂ possesses exactly as many edges as nodes, and since there are (m₁ + m₂)! possible edge labellings, the corresponding generating function that takes the edges into account is given by c(xv, yv). Similarly, a tree of size m₁ + m₂ has exactly m₁ + m₂ − 1 edges. Consequently, the generating function t̃(xv, yv)/v corresponds to a bipartite unrooted tree. Hence, the generating function g°(x, y, v) is given by

$$g^{\circ}(x,y,v) = e^{\frac{1}{v}\tilde t(xv,yv) + c(xv,yv)} = \frac{e^{\frac{1}{v}\tilde t(xv,yv)}}{\sqrt{1-t_1(xv,yv)\,t_2(xv,yv)}}, \tag{4.43}$$

which completes the proof of the lemma.

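Corollary 4.1. The number of graphs $\#\mathcal{G}^{\circ}_{m_1,m_2,n}$ is given by

$$\#\mathcal{G}^{\circ}_{m_1,m_2,n} = \frac{m_1!\,m_2!\,n!}{(m_1+m_2-n)!}\,[x^{m_1}y^{m_2}]\,\frac{\tilde t(x,y)^{m_1+m_2-n}}{\sqrt{1-t_1(x,y)\,t_2(x,y)}}. \tag{4.44}$$

(A graph without complex components that has m₁ + m₂ nodes and n edges consists of exactly m₁ + m₂ − n tree components.)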
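Since Corollary 4.1 is an exact formula, it can be compared with brute-force enumeration for very small parameters. The following Python sketch (exact rationals; all helper names are hypothetical) enumerates all (m₁m₂)ⁿ edge sequences and counts those without a complex component, then evaluates the right-hand side of (4.44) from truncated series for t₁, t₂ and t̃ together with the binomial series 1/√(1 − z) = Σ_j C(2j, j) z^j / 4^j.

```python
import itertools
from fractions import Fraction
from math import comb, factorial

def no_complex_component(m1, m2, edges):
    """True if every component of the multigraph has at most as many edges as nodes."""
    parent = list(range(m1 + m2))  # 0..m1-1 in V1, m1..m1+m2-1 in V2

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        ra, rb = find(a), find(m1 + b)
        if ra != rb:
            parent[rb] = ra
    nodes, edge_count = {}, {}
    for v in range(m1 + m2):
        nodes[find(v)] = nodes.get(find(v), 0) + 1
    for a, _ in edges:
        edge_count[find(a)] = edge_count.get(find(a), 0) + 1
    return all(edge_count.get(r, 0) <= nodes[r] for r in nodes)

def brute_force(m1, m2, n):
    """Count the edge sequences whose graph has no complex component."""
    pairs = [(a, b) for a in range(m1) for b in range(m2)]
    return sum(no_complex_component(m1, m2, e) for e in itertools.product(pairs, repeat=n))

def mul(f, g, N):
    """Product of truncated bivariate series {(i, j): coeff}, keeping i, j <= N."""
    h = {}
    for (i1, j1), a in f.items():
        for (i2, j2), b in g.items():
            if i1 + i2 <= N and j1 + j2 <= N:
                key = (i1 + i2, j1 + j2)
                h[key] = h.get(key, 0) + a * b
    return h

def power_sum(f, coeffs, N):
    """sum_j coeffs[j] * f^j for a series f without constant term."""
    result, term = {}, {(0, 0): Fraction(1)}
    for a in coeffs:
        for key, v in term.items():
            result[key] = result.get(key, 0) + a * v
        term = mul(term, f, N)
    return result

def corollary_4_1(m1, m2, n):
    k = m1 + m2 - n                  # number of tree components
    if k < 0:
        return 0
    N = max(m1, m2)
    K = 2 * N + 1                    # f^j vanishes for j > 2N after truncation
    exp_c = [Fraction(1, factorial(j)) for j in range(K)]
    inv_sqrt_c = [Fraction(comb(2 * j, j), 4**j) for j in range(K)]  # 1/sqrt(1-z)
    X, Y = {(1, 0): Fraction(1)}, {(0, 1): Fraction(1)}
    t1, t2 = {}, {}
    for _ in range(K):               # fixed-point iteration of (4.30)
        t1, t2 = mul(X, power_sum(t2, exp_c, N), N), mul(Y, power_sum(t1, exp_c, N), N)
    t12 = mul(t1, t2, N)
    t_tilde = {key: t1.get(key, 0) + t2.get(key, 0) - t12.get(key, 0)
               for key in set(t1) | set(t2) | set(t12)}
    series = power_sum(t12, inv_sqrt_c, N)
    for _ in range(k):
        series = mul(series, t_tilde, N)
    coeff = series.get((m1, m2), 0)
    return coeff * factorial(m1) * factorial(m2) * factorial(n) / factorial(k)

for m1, m2, n in [(1, 1, 2), (2, 2, 3), (2, 3, 4), (3, 3, 4)]:
    print((m1, m2, n), brute_force(m1, m2, n), corollary_4_1(m1, m2, n))
```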


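We use the corollary and Cauchy's formula and obtain

$$\#\mathcal{G}^{\circ}_{m,m,n} = -\frac{(m!)^2\,n!}{4\pi^2\,(2m-n)!}\oint\oint\frac{\tilde t(x,y)^{2m-n}}{\sqrt{1-t_1(x,y)\,t_2(x,y)}}\,\frac{dx\,dy}{x^{m+1}\,y^{m+1}}. \tag{4.45}$$

This is in fact an integral that can be asymptotically evaluated using a (double) saddle point method, see Theorem 3.2. Additionally, we obtain an asymptotic expansion of the factor preceding the integral by Stirling's formula (3.23).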

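For our problem, it turns out that, with ε′ = 1 − n/m, the saddle point is given by x₀ = y₀ = (1 − ε′)e^{−(1−ε′)}. Recall that t₁(x, x) equals t(x), where t(x) = x e^{t(x)} is the tree function. Hence we get

$$t_1(x_0,x_0) = 1-\varepsilon' = \frac{n}{m}.$$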

For instance, we further obtain $\kappa_{20} = t_1(x_0,y_0)\ldots$ Further cumulants can be calculated in the same way, but have been computed with the help of a computer algebra system in a semi-automatic way. The Maple source file is included on the attached CD-ROM, see also Appendix B.
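For the symmetric case, the saddle point can be checked numerically. This Python sketch assumes n = (1 − ε)m and evaluates x₀ = (1 − ε)e^{−(1−ε)}; solving t = x e^t by fixed-point iteration from t = 0 converges to the branch given by the power series because x₀ < 1/e, and it should return t(x₀) = 1 − ε = n/m. The helper name `tree_function` is illustrative.

```python
import math

def tree_function(x, iterations=2000):
    """Solve t = x e^t for 0 <= x < 1/e by fixed-point iteration from t = 0."""
    t = 0.0
    for _ in range(iterations):
        t = x * math.exp(t)
    return t

for eps in (0.1, 0.3, 0.5):
    x0 = (1 - eps) * math.exp(-(1 - eps))  # symmetric saddle point, x0 = y0
    print(f"eps = {eps}: x0 = {x0:.6f} (< 1/e = {1 / math.e:.6f}), t(x0) = {tree_function(x0):.12f}")
```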

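We set

$$f \to \tilde t, \qquad g \to \frac{1}{\sqrt{1-t_1t_2}}, \qquad k \to 2m-n, \qquad m_1 \to m, \qquad m_2 \to m$$

in Theorem 3.2 and apply the saddle point method. Thus, we obtain an asymptotic expansion of the double integral of (4.45) with the leading coefficient

$$\frac{\tilde t(x_0,y_0)^{2m-n}}{2\pi(2m-n)\,x_0^{m}\,y_0^{m}\,\sqrt{1-t_1(x_0,y_0)\,t_2(x_0,y_0)}\,\sqrt{\kappa_{20}\kappa_{02}-\kappa_{11}^2}}.$$

Furthermore, the coefficient of 1/m of this asymptotic expansion is given by

$$C = \frac{\varepsilon^6 - 10\varepsilon^5 + 21\varepsilon^4 - 2\varepsilon^3 - 27\varepsilon^2 + 20\varepsilon - 5}{12\varepsilon^3(-2+\varepsilon)^2(1-\varepsilon)}. \tag{4.54}$$

With the help of these results and (4.46), we finally obtain the asymptotic expansion

$$\#\mathcal{G}^{\circ}_{m,m,(1-\varepsilon)m} = m^{2(1-\varepsilon)m}\left(1 - \frac{(2\varepsilon^2-5\varepsilon+5)(1-\varepsilon)^3}{12(2-\varepsilon)^2\varepsilon^3}\,\frac{1}{m} + O\!\left(\frac{1}{m^2}\right)\right)$$

of the number of graphs without complex components.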

We may now replace ε′ by ε, since ε′ = ε + O(1/m). Let p(n, m) denote the probability that every component of the cuckoo graph is either a tree or unicyclic after the insertion of n edges. So, we finally obtain

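$$p(n,m) = \frac{\#\mathcal{G}^{\circ}_{m,m,(1-\varepsilon)m}}{\#\mathcal{G}_{m,m,(1-\varepsilon)m}} = 1 - \frac{(2\varepsilon^2-5\varepsilon+5)(1-\varepsilon)^3}{12(2-\varepsilon)^2\varepsilon^3}\,\frac{1}{m} + O\!\left(\frac{1}{m^2}\right).$$

This step completes the proof of Theorem 4.2.

Figure 4.1 depicts the graph of h(ε) = (2ε² − 5ε + 5)(1 − ε)³/(12(2 − ε)²ε³). The series expansion of the function h(ε) with …

We want to note that it is also possible to obtain a slightly more precise asymptotic expansion for … , where h̃(ε) is again explicit. This can be done by refining the calculations related to Lemma 3.2.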
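The function h(ε) can be explored symbolically. The SymPy sketch below is illustrative only: it prints the Laurent expansion at ε = 0, where h(ε) blows up like 5/(48ε³), the behaviour near ε = 1, where h(ε)/(1 − ε)³ tends to 1/6, and a few numerical values.

```python
import sympy as sp

eps = sp.symbols("epsilon", positive=True)
h = (2 * eps**2 - 5 * eps + 5) * (1 - eps)**3 / (12 * (2 - eps)**2 * eps**3)

print(sp.series(h, eps, 0, 1))               # Laurent expansion around eps = 0
print(sp.limit(h * eps**3, eps, 0))          # leading constant as eps -> 0
print(sp.limit(h / (1 - eps)**3, eps, 1))    # behaviour as eps -> 1
for value in (0.05, 0.1, 0.2, 0.4):
    print(value, float(h.subs(eps, value)))
```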

For example, we can apply these expansions in order to obtain asymptotic representations for the probability q(n + 1, m) that the insertion of the (n + 1)-st edge creates a bicyclic component, conditioned on the property that the first n insertions did not create such a component.

Lemma 4.6. The probability that the insertion of the (n + 1)-st key forces a rehash is given by

4.4 Asymmetric cuckoo hashing

As mentioned in Chapter 1, asymmetric cuckoo hashing uses tables of different size. To be precise, we choose the tables in such a way that the first table holds more memory cells than the second one. Thus, we expect that the number of keys actually stored in the first table increases, which leads to improved search and insertion performance. In this section, we adapt the previous analysis such that it covers the asymmetric variant too.

During the analysis, we make use of the factor of asymmetry c. This constant determines the sizes of both hash tables, which hold m₁ = m(1 + c) and m₂ = 2m − m₁ cells, respectively.

Thus, c = 0 corresponds to the standard, and hence symmetric, algorithm.
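To make the asymmetric variant concrete, here is a minimal Python sketch of cuckoo hashing with two tables of sizes m₁ = m(1 + c) and m₂ = 2m − m₁. It is not the implementation analysed in the text: the salted built-in `hash` stands in for the random hash functions, the bound `max_loop` on the number of evictions is an arbitrary choice, and the class and method names are hypothetical. Insertion always starts in the larger first table, and an evicted key moves to its alternative position in the other table.

```python
import random

class AsymmetricCuckooTable:
    """Sketch of cuckoo hashing with tables of sizes m1 = m(1 + c) and m2 = 2m - m1.
    Salted built-in hashing is only a stand-in for random hash functions."""

    def __init__(self, m, c, max_loop=100, seed=0):
        self.m1 = round(m * (1 + c))
        self.m2 = 2 * m - self.m1
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.tables = ([None] * self.m1, [None] * self.m2)
        self.max_loop = max_loop

    def _slot(self, which, key):
        size = self.m1 if which == 0 else self.m2
        return hash((self.salts[which], key)) % size

    def lookup(self, key):
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        """Return True on success; False means a rehash would be required
        (the key that is homeless at that moment has been dropped)."""
        if self.lookup(key):
            return True
        which = 0                          # start in the first (larger) table
        for _ in range(self.max_loop):
            slot = self._slot(which, key)
            key, self.tables[which][slot] = self.tables[which][slot], key
            if key is None:                # found an empty cell
                return True
            which = 1 - which              # the evicted key tries its other table
        return False

if __name__ == "__main__":
    table = AsymmetricCuckooTable(m=1000, c=0.3, seed=42)
    keys = random.Random(7).sample(range(10**9), 700)  # n = (1 - eps) m with eps = 0.3
    ok = all(table.insert(key) for key in keys)
    used = [sum(cell is not None for cell in t) for t in table.tables]
    print("success:", ok, "| keys in table 1 and 2:", used)
```

Counting the occupied cells of both tables after such a run shows how the keys split between them; for integer keys the built-in hash is deterministic, so runs are reproducible.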

Theorem 4.3. Suppose that c ∈ [0, 1) and ε ∈ (1 − √(1 − c²), 1) are fixed. Then the probability that an asymmetric cuckoo hash of n = (1 − ε)m data points into two tables of size m₁ = m(1 + c) and m₂ = 2m − m₁, respectively, succeeds (that is, the corresponding cuckoo graph contains no complex component) is equal to

$$1 - (1-\varepsilon)^3\left(10 - 2\varepsilon^3 + 9\varepsilon^2 - 3c^2\varepsilon^2 + 9\varepsilon c^2 - 15\varepsilon + 2c^4 - 10c^2\right)\cdots$$

See also Kutzelnigg [2008] for a sketch of the proof of this theorem.
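A small consistency check of the polynomial bracket in Theorem 4.3, as a SymPy sketch: for c = 0 it factors as (2 − ε)(2ε² − 5ε + 5), so the numerator of h(ε) from Theorem 4.2 reappears as a factor. This checks only the bracket, not the remaining part of the formula.

```python
import sympy as sp

eps, c = sp.symbols("epsilon c")
bracket = (10 - 2 * eps**3 + 9 * eps**2 - 3 * c**2 * eps**2 + 9 * eps * c**2
           - 15 * eps + 2 * c**4 - 10 * c**2)

print(sp.factor(bracket.subs(c, 0)))  # symmetric case c = 0
```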

Proof of Theorem 4.3

Since Corollary 4.1 already covers cuckoo graphs with a different number of nodes of each type, the only difference from the proof of Theorem 4.2 is the application of the saddle point method. Now, our starting point is the generalised equation

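$$\#\mathcal{G}^{\circ}_{m_1,m_2,n} = -\frac{m_1!\,m_2!\,n!}{4\pi^2\,(m_1+m_2-n)!}\oint\oint\frac{\tilde t(x,y)^{m_1+m_2-n}}{\sqrt{1-t_1(x,y)\,t_2(x,y)}}\,\frac{dx\,dy}{x^{m_1+1}\,y^{m_2+1}}. \tag{4.62}$$

According to Theorem 3.2, the saddle point is determined by the system consisting of the equations

$$\frac{x_0}{\tilde t(x_0,y_0)}\,\frac{\partial\tilde t}{\partial x}(x_0,y_0) = \frac{m_1}{m_1+m_2-n}, \qquad \frac{y_0}{\tilde t(x_0,y_0)}\,\frac{\partial\tilde t}{\partial y}(x_0,y_0) = \frac{m_2}{m_1+m_2-n}.$$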


Using Lemma 4.3, the system becomes

$$\frac{m_1}{m_1+m_2-n} = \frac{t_1(x_0,y_0)}{\tilde t(x_0,y_0)} \quad\text{and}\quad \frac{m_2}{m_1+m_2-n} = \frac{t_2(x_0,y_0)}{\tilde t(x_0,y_0)}. \tag{4.64}$$

Further, with the help of Lemma 4.2, we obtain the following equations:

$$\frac{m_1}{m_1+m_2-n} = \frac{t_1(x_0,y_0)}{t_1(x_0,y_0)+t_2(x_0,y_0)-t_1(x_0,y_0)\,t_2(x_0,y_0)}, \qquad \frac{m_2}{m_1+m_2-n} = \frac{t_2(x_0,y_0)}{t_1(x_0,y_0)+t_2(x_0,y_0)-t_1(x_0,y_0)\,t_2(x_0,y_0)}. \tag{4.65}$$

Solving this system for t₁(x₀, y₀) and t₂(x₀, y₀) yields the solution

$$t_1(x_0,y_0) = \frac{n}{m_2} \quad\text{and}\quad t_2(x_0,y_0) = \frac{n}{m_1}. \tag{4.66}$$

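Finally, it turns out that the saddle point is given by

$$x_0 = t_1(x_0,y_0)\,e^{-t_2(x_0,y_0)} = \frac{n}{m_2}\,e^{-n/m_1} \quad\text{and}\quad y_0 = t_2(x_0,y_0)\,e^{-t_1(x_0,y_0)} = \frac{n}{m_1}\,e^{-n/m_2}. \tag{4.67}$$

Again, we introduce the notation

$$\varepsilon' = 1 - \frac{n}{m} = 1 - \frac{\lfloor(1-\varepsilon)m\rfloor}{m}. \tag{4.68}$$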
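The closed-form saddle point can be cross-checked numerically. The Python sketch below (function names hypothetical) works per unit of m, since only the ratios m₁ : m₂ : n enter. It evaluates (4.66) and (4.67), recovers t₁ and t₂ from x₀ and y₀ by iterating (4.30) from (0, 0), and compares both sides of (4.64) with t̃ from (4.31). Iterating from (0, 0) converges to the branch given by the power series, and this branch passes through (4.66) as long as t₁t₂ < 1.

```python
import math

def closed_form_saddle(c, eps):
    """(4.66) and (4.67) per unit of m: m1 = 1 + c, m2 = 1 - c, n = 1 - eps."""
    m1, m2, n = 1 + c, 1 - c, 1 - eps
    t1, t2 = n / m2, n / m1
    x0, y0 = t1 * math.exp(-t2), t2 * math.exp(-t1)
    return m1, m2, n, t1, t2, x0, y0

def solve_tree_system(x, y, iterations=5000):
    """Iterate t1 = x e^{t2}, t2 = y e^{t1} (4.30) starting from (0, 0)."""
    t1 = t2 = 0.0
    for _ in range(iterations):
        t1, t2 = x * math.exp(t2), y * math.exp(t1)
    return t1, t2

c, eps = 0.3, 0.2
m1, m2, n, t1, t2, x0, y0 = closed_form_saddle(c, eps)
s1, s2 = solve_tree_system(x0, y0)
t_tilde = t1 + t2 - t1 * t2  # (4.31)
k = m1 + m2 - n
print("iterated (t1, t2):", (s1, s2), " closed form (4.66):", (t1, t2))
print("(4.64), first equation: ", m1 / k, t1 / t_tilde)
print("(4.64), second equation:", m2 / k, t2 / t_tilde)
```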

We observe that, due to the singularity of the denominator, the saddle point method is only applicable if the relation 1 > t₁(x₀, y₀)t₂(x₀, y₀) holds. Hence we obtain the inequality

$$1 > \frac{n}{m_1}\,\frac{n}{m_2} = \frac{n}{(1+c)m}\,\frac{n}{(1-c)m} = \frac{(1-\varepsilon)^2}{1-c^2}, \tag{4.69}$$

and finally the condition

$$\varepsilon > 1 - \sqrt{1-c^2}. \tag{4.70}$$
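Condition (4.70) is easy to tabulate. In the short Python sketch below, the load factor n/(m₁ + m₂) = (1 − ε)/2 follows from n = (1 − ε)m and m₁ + m₂ = 2m, so the condition is equivalent to a load factor below √(1 − c²)/2. The helper names are illustrative.

```python
import math

def min_eps(c):
    """Lower bound (4.70) on eps for an asymmetry factor 0 <= c < 1."""
    return 1 - math.sqrt(1 - c * c)

def saddle_ok(c, eps):
    """Condition (4.69): t1(x0, y0) t2(x0, y0) = (1 - eps)^2 / (1 - c^2) < 1."""
    return (1 - eps) ** 2 / (1 - c * c) < 1

for c in (0.0, 0.1, 0.2, 0.3, 0.4):
    e = min_eps(c)
    # load factor n / (m1 + m2) = (1 - eps) / 2, hence the bound sqrt(1 - c^2) / 2
    print(f"c = {c:.1f}: need eps > {e:.5f}, i.e. load factor < {(1 - e) / 2:.5f}")
```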

The cumulants can be calculated in a similar way as the saddle point, but have been computed using a computer algebra system in a semi-automatic way. The Maple source file is included on the attached CD-ROM, see also Appendix B.

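Further, we apply Theorem 3.2 using the setting

$$f \to \tilde t, \qquad g \to \frac{1}{\sqrt{1-t_1t_2}}, \qquad k \to m(1+\varepsilon), \qquad m_1 \to m(1+c), \qquad m_2 \to m(1-c),$$

to obtain an asymptotic expansion of the double integral of (4.62). In particular, we obtain that this asymptotic expansion possesses a leading coefficient equal to

$$\frac{\tilde t(x_0,y_0)^{2m-n}}{2\pi(2m-n)\,x_0^{m_1}\,y_0^{m_2}\,\sqrt{1-t_1(x_0,y_0)\,t_2(x_0,y_0)}\,\sqrt{\kappa_{20}\kappa_{02}-\kappa_{11}^2}} = \frac{(1-\varepsilon^2)^{m(1+\varepsilon)}\,e^{2m(1-\varepsilon)}\,(1-c)^{m(1+c)}\,(1+c)^{m(1-c)}\,\sqrt{1+\varepsilon}}{2\pi m\,(1-c^2)^{m(1+\varepsilon)}\,(1-\varepsilon)^{2m}\,\sqrt{(1-\varepsilon)(1-c^2)}}, \tag{4.71}$$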
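Formula (4.71) can be confirmed numerically. The Python sketch below assumes that κ₂₀, κ₁₁ and κ₀₂ are the second partial derivatives of log t̃(e^u, e^v) with respect to u = log x and v = log y at the saddle point; this is our reading of the convention of Theorem 3.2, which is not reproduced here. It approximates them by central finite differences, with t₁ and t₂ obtained by iterating (4.30), and compares the logarithm of the left-hand side with the logarithm of the closed form on the right. All function names are hypothetical.

```python
import math

def tree_pair(x, y, iterations=5000):
    """Least solution of t1 = x e^{t2}, t2 = y e^{t1} (4.30), iterating from (0, 0)."""
    t1 = t2 = 0.0
    for _ in range(iterations):
        t1, t2 = x * math.exp(t2), y * math.exp(t1)
    return t1, t2

def log_t_tilde(u, v):
    """log t~(e^u, e^v) with t~ = t1 + t2 - t1 t2 (4.31)."""
    t1, t2 = tree_pair(math.exp(u), math.exp(v))
    return math.log(t1 + t2 - t1 * t2)

def second_cumulants(u, v, h=1e-4):
    """Central finite differences for the second derivatives in u = log x, v = log y."""
    f = log_t_tilde
    k20 = (f(u + h, v) - 2 * f(u, v) + f(u - h, v)) / h**2
    k02 = (f(u, v + h) - 2 * f(u, v) + f(u, v - h)) / h**2
    k11 = (f(u + h, v + h) - f(u + h, v - h) - f(u - h, v + h) + f(u - h, v - h)) / (4 * h**2)
    return k20, k02, k11

def log_lhs(m, c, eps):
    """Log of the leading coefficient, evaluated term by term at the saddle point."""
    m1, m2, n = m * (1 + c), m * (1 - c), m * (1 - eps)
    k = m1 + m2 - n
    t1, t2 = n / m2, n / m1                          # (4.66)
    x0, y0 = t1 * math.exp(-t2), t2 * math.exp(-t1)  # (4.67)
    k20, k02, k11 = second_cumulants(math.log(x0), math.log(y0))
    return (k * math.log(t1 + t2 - t1 * t2) - m1 * math.log(x0) - m2 * math.log(y0)
            - math.log(2 * math.pi * k) - 0.5 * math.log(1 - t1 * t2)
            - 0.5 * math.log(k20 * k02 - k11**2))

def log_rhs(m, c, eps):
    """Log of the closed form on the right-hand side of (4.71)."""
    return (m * (1 + eps) * math.log(1 - eps**2) + 2 * m * (1 - eps)
            + m * (1 + c) * math.log(1 - c) + m * (1 - c) * math.log(1 + c)
            + 0.5 * math.log(1 + eps) - math.log(2 * math.pi * m)
            - m * (1 + eps) * math.log(1 - c**2) - 2 * m * math.log(1 - eps)
            - 0.5 * math.log((1 - eps) * (1 - c**2)))

for m, c, eps in [(50, 0.3, 0.2), (20, 0.0, 0.3)]:
    print((m, c, eps), log_lhs(m, c, eps), log_rhs(m, c, eps))
```

Working with logarithms avoids overflow, because both sides grow exponentially in m.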

In the document Random Bipartite Graphs and their Application to Cuckoo Hashing (pages 49-57)