The asymmetric bipartite Cuckoo Graph - Random Bipartite Graphs and their Application to Cuckoo

to infer that the probability that exactly 2k nodes are contained in cycles equals

$$\frac{1\cdot 3\cdot 5\cdots(2k-1)}{2^{k}\,k!}\,\sqrt{1-(1-\varepsilon)^{2}}\,(1-\varepsilon)^{2k} \tag{6.101}$$

in the limit.
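As a quick plausibility check, the limiting distribution (6.101) can be evaluated numerically. The following sketch is an illustration only; the function name `cycle_node_distribution` and the truncation point `kmax` are our own choices. It sums the probabilities and computes the mean number of nodes on cycles, which should equal (1−ε)²/(2ε−ε²) for the symmetric graph.

```python
import math

def cycle_node_distribution(eps, kmax=2000):
    """Return [p_0, ..., p_kmax], where p_k is the limiting probability
    that exactly 2k nodes are contained in cycles, following (6.101)."""
    q = (1.0 - eps) ** 2
    probs = []
    coeff = 1.0   # 1*3*...*(2k-1) / (2^k k!), which is 1 for k = 0
    power = 1.0   # q^k = (1 - eps)^(2k)
    for k in range(kmax + 1):
        probs.append(coeff * math.sqrt(1.0 - q) * power)
        coeff *= (2 * k + 1) / (2 * k + 2)   # step from k to k + 1
        power *= q
    return probs

eps = 0.4
probs = cycle_node_distribution(eps)
total = sum(probs)
mean_nodes = sum(2 * k * p for k, p in enumerate(probs))
print(total, mean_nodes, (1 - eps) ** 2 / (2 * eps - eps ** 2))
```

The truncated sum is numerically indistinguishable from one because the terms decay geometrically in k.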

Nodes in cyclic components

If we count the number of all nodes contained in cyclic components, the generating function changes to

$$g^{\circ}_v(x,y,v,w) = \frac{\exp\left(\frac{1}{v}\,\tilde t(xv,yv)\right)}{\sqrt{1-t_1(xvw,yvw)\,t_2(xvw,yvw)}}. \tag{6.102}$$

Thus, in contrast to the previous calculation, we take the nodes of trees attached to cycles into account. It is again straightforward to calculate the asymptotic mean and variance. This completes the proof of Theorem 6.2.

6.4 The asymmetric bipartite Cuckoo Graph

It is even possible to adapt the results of the previous section to the case of an asymmetric bipartite graph. In particular, all results related to the cyclic structure are only slightly modified. However, the calculation of the distribution of tree components is much more complicated.

Theorem 6.3. Suppose that c ∈ [0,1) and ε ∈ (1−√(1−c²), 1) are fixed and that n = (1−ε)m. Then a random labelled bipartite multigraph with m₁ = m(1+c) and m₂ = 2m−m₁ vertices, respectively, and n edges satisfies the following properties.

1. The number of unicyclic components with cycle length 2k has in the limit a Poisson distribution Po(λ_k) with parameter

$$\lambda_k = \frac{1}{2k}\left(\frac{(1-\varepsilon)^{2}}{1-c^{2}}\right)^{k}, \tag{6.103}$$

and the number of unicyclic components has in the limit a Poisson distribution Po(λ), too, with parameter

$$\lambda = -\frac{1}{2}\log\left(1-\frac{(1-\varepsilon)^{2}}{1-c^{2}}\right). \tag{6.104}$$

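2. Let T be defined by the equation

$$T = \sum_{l=0}^{k}\frac{l^{\,k-l-1}(k-l)^{\,l-1}\,x_0^{\,l}\,y_0^{\,k-l}}{l!\,(k-l)!}. \tag{6.105}$$

Similarly, we define T₁ and T₂ using the equations

$$T_1 = \sum_{l=0}^{k}\frac{l^{\,k-l}(k-l)^{\,l-1}\,x_0^{\,l}\,y_0^{\,k-l}}{l!\,(k-l)!} \quad\text{and}\quad T_2 = \sum_{l=0}^{k}\frac{l^{\,k-l-1}(k-l)^{\,l}\,x_0^{\,l}\,y_0^{\,k-l}}{l!\,(k-l)!}. \tag{6.106}$$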

6 The Structure of the Cuckoo Graph

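Hereby, (x₀, y₀) denotes the saddle point as calculated in Chapter 4 and is hence given by

$$x_0 = \frac{1-\varepsilon}{1-c}\exp\left(-\frac{1-\varepsilon}{1+c}\right) \quad\text{and}\quad y_0 = \frac{1-\varepsilon}{1+c}\exp\left(-\frac{1-\varepsilon}{1-c}\right). \tag{6.107}$$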

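Then, the number of tree components with k vertices possesses asymptotic mean

$$\frac{m(1-c^{2})}{1-\varepsilon}\,T, \tag{6.108}$$

and asymptotic variance

$$
\begin{aligned}
m\Biggl[\;&\frac{(T_2-T+T_1)^{2}}{(\varepsilon-1)^{3}}\,c^{4}-\frac{(T_2-T_1)(T_2-2T+T_1)}{(\varepsilon-1)^{2}}\,c^{3}\\
&+\frac{(T_2^{2}+T_1^{2}+T)\,\varepsilon^{2}+(-4T_1T_2-2T-3T_1^{2}-3T_2^{2}+2TT_1+2TT_2)\,\varepsilon-2T^{2}+2TT_2+2TT_1+T}{(\varepsilon-1)^{3}}\,c^{2}\\
&+\frac{(T_2-T_1)(-T_2+2\varepsilon T_2-2T-T_1+2\varepsilon T_1)}{(\varepsilon-1)^{2}}\,c\\
&+\frac{(-T+T_1^{2}+T_2^{2})\,\varepsilon^{2}+(-2TT_1+2T-T_1^{2}+4T_1T_2-2TT_2-T_2^{2})\,\varepsilon+T^{2}+T_2^{2}-T+T_1^{2}-2T_1T_2}{(\varepsilon-1)^{3}}\Biggr].
\end{aligned}
\tag{6.109}
$$

3. The number of vertices contained in cycles has in the limit the distribution with characteristic function

$$\varphi(s) = \sqrt{\frac{1-\dfrac{(1-\varepsilon)^{2}}{1-c^{2}}}{1-e^{2is}\,\dfrac{(1-\varepsilon)^{2}}{1-c^{2}}}}, \tag{6.110}$$

and, hence, its expectation is asymptotically given by

$$\frac{(1-\varepsilon)^{2}}{2\varepsilon-\varepsilon^{2}-c^{2}}, \tag{6.111}$$

and its variance by

$$\frac{2(1-\varepsilon)^{2}(1-c^{2})}{(2\varepsilon-\varepsilon^{2}-c^{2})^{2}}. \tag{6.112}$$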

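4. Furthermore, the expected value of the number of nodes in unicyclic components is asymptotically given by

$$\frac{(1-\varepsilon)^{2}(2-\varepsilon-c^{2})}{(2\varepsilon-\varepsilon^{2}-c^{2})^{2}}, \tag{6.113}$$

and its variance by

$$\frac{(1-\varepsilon)^{2}\Bigl(2c^{6}+(14+3\varepsilon^{2}-11\varepsilon)c^{4}+(-26+31\varepsilon-11\varepsilon^{2}+\varepsilon^{4}-\varepsilon^{3})c^{2}+(\varepsilon^{2}-3\varepsilon+4)(2-\varepsilon)^{2}\Bigr)}{(2\varepsilon-\varepsilon^{2}-c^{2})^{4}}. \tag{6.114}$$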


Proof. The proofs of statements 1, 3, and 4 are almost identical to the proofs of the corresponding statements of Theorem 6.2. The only difference is that the symmetric saddle point satisfying x₀ = y₀ is replaced by the asymmetric saddle point given in the theorem. However, this does not influence the actual calculation until the functions are evaluated. Hence it is sufficient to plug the modified saddle point into (6.76), (6.79), and (6.99) to obtain the claimed results.

On the other hand, the calculation concerning the number of trees of given size becomes much more complicated, although it follows the same principle. This is due to the fact that there no longer exists a simple closed formula for ˜t_k(x₀, y₀), cf. Lemma 6.7. The calculation itself is performed using Maple, see the corresponding worksheet for further information.
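The closed-form expressions of statements 1, 3, and 4 of Theorem 6.3 are easy to evaluate. The following sketch does so under the assumption that (6.103) and (6.104) depend on ε and c only through the ratio (1−ε)²/(1−c²); all function names are our own and purely illustrative. For ε = 0.4 and c = 0 it yields λ ≈ 0.223 and an expected number of about 1.41 nodes in unicyclic components, in line with the sample means reported for standard cuckoo hashing in Table 6.6.

```python
import math

def _check(eps, c):
    # Admissible range of Theorem 6.3: c in [0,1), eps in (1 - sqrt(1 - c^2), 1).
    if not (0.0 <= c < 1.0 and 1.0 - math.sqrt(1.0 - c * c) < eps < 1.0):
        raise ValueError("parameters outside the range of Theorem 6.3")

def lambda_k(k, eps, c=0.0):
    """Poisson parameter (6.103) for unicyclic components with cycle length 2k."""
    _check(eps, c)
    return ((1.0 - eps) ** 2 / (1.0 - c * c)) ** k / (2.0 * k)

def lambda_total(eps, c=0.0):
    """Poisson parameter (6.104) for the number of unicyclic components."""
    _check(eps, c)
    return -0.5 * math.log(1.0 - (1.0 - eps) ** 2 / (1.0 - c * c))

def nodes_on_cycles(eps, c=0.0):
    """Asymptotic (mean, variance) of the number of nodes on cycles, (6.111)-(6.112)."""
    _check(eps, c)
    d = 2.0 * eps - eps ** 2 - c * c
    return (1.0 - eps) ** 2 / d, 2.0 * (1.0 - eps) ** 2 * (1.0 - c * c) / d ** 2

def nodes_in_unicyclic_components(eps, c=0.0):
    """Asymptotic (mean, variance) of the number of nodes in unicyclic
    components, (6.113)-(6.114)."""
    _check(eps, c)
    d = 2.0 * eps - eps ** 2 - c * c
    mean = (1.0 - eps) ** 2 * (2.0 - eps - c * c) / d ** 2
    poly = (2.0 * c ** 6
            + (14.0 + 3.0 * eps ** 2 - 11.0 * eps) * c ** 4
            + (-26.0 + 31.0 * eps - 11.0 * eps ** 2 + eps ** 4 - eps ** 3) * c ** 2
            + (eps ** 2 - 3.0 * eps + 4.0) * (2.0 - eps) ** 2)
    return mean, (1.0 - eps) ** 2 * poly / d ** 4

for c in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(c, lambda_total(0.4, c), nodes_in_unicyclic_components(0.4, c))
```

Summing `lambda_k` over k reproduces `lambda_total`, which is a convenient consistency check between (6.103) and (6.104).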

6.5 Comparison and conclusion

In this section, we provide numerical results to support our analysis and compare the graph structures of the different variations of cuckoo hashing. See Chapter 9 for details on implementation and setup. We start by considering tree components because of their fundamental impact on the behaviour of the algorithms. Note that the tree structure of the bipartite cuckoo graph possesses the same limiting distribution as that of the usual cuckoo graph, which is related to the simplified algorithm. Furthermore, we notice that the number of isolated nodes increases as the asymmetry increases. Thus, we expect a better performance of unsuccessful search operations because of the asymmetry.

Tables 6.1 to 6.5 display the average number of trees of size one (i.e. isolated nodes), two, and five counted during 10⁵ experiments. Each of these tables provides data for one fixed load factor (respectively ε). Further, we consider several variations of cuckoo hashing, including the standard and simplified algorithm, as well as some asymmetric versions. Recall that higher asymmetry leads to a lower maximum load factor (cf. Chapter 4), hence some small values of ε are invalid for some asymmetric data structures.

From the data given in these tables, we see that our asymptotic results are good approximations for all investigated settings.
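To make the comparison concrete, the asymptotic mean (6.108) can be evaluated directly from the saddle point (6.107) and the sum (6.105). The following sketch is an illustration with function names of our own choosing; it relies on the convention 0⁰ = 1, which Python's floating-point power follows. For m = 5000 and ε = 0.4 it gives roughly 5488 isolated nodes, 904 trees with two nodes, and 67 trees with five nodes in the symmetric case, matching the sample means of Table 6.1.

```python
import math

def saddle_point(eps, c=0.0):
    """Saddle point (x0, y0) as given in (6.107)."""
    x0 = (1.0 - eps) / (1.0 - c) * math.exp(-(1.0 - eps) / (1.0 + c))
    y0 = (1.0 - eps) / (1.0 + c) * math.exp(-(1.0 - eps) / (1.0 - c))
    return x0, y0

def tree_sum(k, eps, c=0.0):
    """The quantity T of (6.105) for trees with k >= 1 vertices."""
    x0, y0 = saddle_point(eps, c)
    total = 0.0
    for l in range(k + 1):
        r = k - l
        # l^(k-l-1) * (k-l)^(l-1); for k >= 1 a zero base never meets a
        # negative exponent, and 0.0 ** 0 evaluates to 1.0.
        weight = float(l) ** (r - 1) * float(r) ** (l - 1)
        total += weight * x0 ** l * y0 ** r / (math.factorial(l) * math.factorial(r))
    return total

def expected_tree_count(m, eps, c, k):
    """Asymptotic mean (6.108) of the number of tree components with k vertices."""
    return m * (1.0 - c * c) / (1.0 - eps) * tree_sum(k, eps, c)

for k in (1, 2, 5):
    print(k, expected_tree_count(5000, 0.4, 0.0, k), expected_tree_count(5000, 0.4, 0.1, k))
```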

Furthermore, these tables provide the numerically obtained average size of the maximum tree component. This number is of interest because the size of the largest tree component is a natural bound for the maximum number of steps necessary to insert a key into a tree component. We notice that this number increases as ε decreases. The results of our experiments lead to the conjecture that this parameter again possesses the same asymptotic behaviour for both standard and simplified cuckoo hashing. Finally, we observe that asymmetry leads to a larger maximum tree component.

Next, we turn our attention to the structure of cyclic components. Note that the corresponding results of Theorems 6.1 and 6.2 are in some sense related, but not identical.

Table 6.6 provides numerical data for the number of nodes in cyclic components and the number of cycles. Our experiments, using settings from m = 5·10³ to m = 5·10⁵, show that the size of the data structure does not have significant influence on these parameters. Because of this, we do not provide data for different table sizes. From the results presented in the table, we see again that the asymptotic results obtained in this chapter provide suitable estimates. We notice that asymmetry leads to an increased number of cycles and nodes in cyclic components. Furthermore, both parameters are higher if we consider the usual cuckoo graph instead of the symmetric bipartite version.

We do not provide further numerical results that would require more information such as the length of the cycle or the number of nodes contained in the cycle itself. The reason is that these parameters are not of high relevance for cuckoo hashing and that collecting them would require a different implementation of our software, see Chapter 9. However, we are again interested in the average size of the maximum cyclic component, because the number of steps required to perform an insertion into a cyclic component is bounded by twice the size of the maximum cyclic component. It turns out that this number increases with the load factor, but usually the size of the maximum tree component is dominant, except if we consider small tables possessing a high load.


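ε = 0.4

| variant | m | isolated nodes: sample mean | rel. error | sample var. | rel. error | trees with 2 nodes: sample mean | rel. error | sample var. | rel. error | trees with 5 nodes: sample mean | rel. error | sample var. | rel. error | average max. tree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| standard cuckoo hashing | 5000 | 5488 | 0.005% | 676 | -1.114% | 904 | -0.031% | 768 | -0.755% | 67 | -0.071% | 61 | -0.307% | 26 |
| | 10000 | 10976 | 0.004% | 1339 | -0.044% | 1808 | -0.025% | 1528 | -0.250% | 134 | 0.019% | 122 | 0.376% | 30 |
| | 50000 | 54881 | 0.001% | 6681 | 0.139% | 9036 | -0.004% | 7617 | 0.048% | 672 | 0.009% | 614 | -0.426% | 39 |
| | 100000 | 109762 | 0.000% | 13341 | 0.293% | 18073 | -0.006% | 15240 | 0.008% | 1344 | 0.005% | 1233 | -0.756% | 43 |
| | 500000 | 548810 | 0.000% | 66549 | 0.526% | 90359 | -0.001% | 76154 | 0.068% | 6721 | 0.001% | 6100 | 0.305% | 54 |
| asymm. cuckoo hashing, c = 0.1 | 5000 | 5498 | 0.009% | 664 | 0.343% | 893 | -0.053% | 748 | 0.487% | 67 | -0.078% | 62 | -0.453% | 27 |
| | 10000 | 10996 | 0.004% | 1334 | -0.176% | 1786 | -0.016% | 1514 | -0.701% | 135 | 0.003% | 123 | -0.528% | 30 |
| | 50000 | 54980 | 0.001% | 6661 | -0.043% | 8927 | -0.001% | 7524 | -0.062% | 673 | -0.007% | 614 | 0.004% | 40 |
| | 100000 | 109961 | 0.000% | 13239 | 0.584% | 17854 | -0.003% | 15038 | 0.009% | 1347 | -0.001% | 1220 | 0.587% | 44 |
| | 500000 | 549804 | 0.000% | 66289 | 0.443% | 89271 | -0.002% | 75231 | -0.048% | 6734 | -0.005% | 6118 | 0.289% | 54 |
| asymm. cuckoo hashing, c = 0.2 | 5000 | 5528 | 0.006% | 651 | 0.793% | 860 | -0.033% | 715 | 0.834% | 68 | -0.067% | 61 | 0.574% | 28 |
| | 10000 | 11057 | 0.004% | 1317 | -0.388% | 1719 | -0.016% | 1443 | -0.097% | 135 | -0.050% | 124 | -0.480% | 32 |
| | 50000 | 55286 | 0.001% | 6588 | -0.455% | 8596 | -0.004% | 7218 | -0.107% | 677 | -0.007% | 616 | 0.461% | 41 |
| | 100000 | 110572 | 0.001% | 13187 | -0.538% | 17191 | -0.003% | 14496 | -0.524% | 1353 | -0.009% | 1236 | 0.075% | 46 |
| | 500000 | 552864 | 0.000% | 65793 | -0.322% | 85952 | -0.000% | 71941 | 0.220% | 6766 | 0.004% | 6215 | -0.487% | 57 |
| asymm. cuckoo hashing, c = 0.3 | 5000 | 5582 | 0.007% | 633 | 0.675% | 803 | -0.033% | 662 | 0.785% | 68 | -0.032% | 63 | -0.493% | 30 |
| | 10000 | 11164 | 0.002% | 1271 | 0.286% | 1605 | -0.006% | 1330 | 0.342% | 136 | -0.084% | 125 | 0.414% | 34 |
| | 50000 | 55824 | -0.001% | 6399 | -0.392% | 8024 | 0.002% | 6723 | -0.726% | 680 | -0.008% | 626 | -0.049% | 45 |
| | 100000 | 111646 | 0.001% | 12728 | 0.167% | 16050 | -0.004% | 13387 | -0.287% | 1361 | -0.003% | 1248 | 0.223% | 50 |
| | 500000 | 558235 | -0.000% | 63900 | -0.244% | 80245 | 0.001% | 67180 | -0.653% | 6804 | -0.003% | 6244 | 0.136% | 62 |
| asymm. cuckoo hashing, c = 0.4 | 5000 | 5663 | 0.004% | 608 | 0.059% | 719 | -0.010% | 589 | -0.103% | 68 | -0.064% | 63 | 0.260% | 33 |
| | 10000 | 11327 | 0.000% | 1216 | -0.029% | 1438 | 0.011% | 1180 | -0.229% | 136 | -0.055% | 126 | 0.060% | 38 |
| | 50000 | 56637 | 0.001% | 6063 | 0.268% | 7190 | -0.002% | 5882 | 0.114% | 682 | -0.019% | 632 | -0.127% | 51 |
| | 100000 | 113274 | 0.000% | 12198 | -0.320% | 14379 | -0.000% | 11769 | 0.071% | 1363 | 0.007% | 1265 | -0.122% | 57 |
| | 500000 | 566371 | 0.000% | 60105 | 1.139% | 71895 | 0.000% | 58312 | 0.973% | 6814 | 0.003% | 6337 | -0.331% | 72 |
| simplified cuckoo hashing | 5000 | 5488 | 0.002% | 665 | 0.627% | 904 | -0.013% | 759 | 0.451% | 67 | -0.008% | 61 | 0.424% | 26 |
| | 10000 | 10976 | 0.002% | 1334 | 0.296% | 1807 | -0.012% | 1510 | 0.895% | 134 | -0.003% | 123 | -0.649% | 30 |
| | 50000 | 54881 | 0.000% | 6655 | 0.529% | 9036 | -0.001% | 7631 | -0.130% | 672 | -0.013% | 612 | -0.037% | 39 |
| | 100000 | 109762 | 0.000% | 13320 | 0.446% | 18072 | -0.002% | 15120 | 0.799% | 1344 | 0.008% | 1229 | -0.386% | 43 |
| | 500000 | 548811 | 0.000% | 66987 | -0.128% | 90360 | -0.001% | 76377 | -0.224% | 6721 | -0.001% | 6134 | -0.248% | 53 |

Table 6.1: Number of trees of sizes one, two, and five for ε = 0.4. The table provides sample mean and sample variance of the number of trees obtained over a sample of size 10⁵. Additionally we give the relative error with respect to the asymptotic approximations of Theorem 6.1. Finally, the table depicts the numerically obtained average size of the maximum tree component.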


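ε = 0.2

| variant | m | isolated nodes: sample mean | rel. error | sample var. | rel. error | trees with 2 nodes: sample mean | rel. error | sample var. | rel. error | trees with 5 nodes: sample mean | rel. error | sample var. | rel. error | average max. tree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| standard cuckoo hashing | 5000 | 4493 | 0.007% | 856 | 0.351% | 808 | -0.023% | 695 | 0.256% | 78 | -0.061% | 75 | -0.348% | 67 |
| | 10000 | 8986 | 0.003% | 1719 | -0.028% | 1615 | -0.008% | 1395 | -0.101% | 156 | -0.017% | 150 | 0.252% | 81 |
| | 50000 | 44933 | 0.001% | 8557 | 0.398% | 8076 | -0.003% | 6933 | 0.494% | 781 | -0.002% | 755 | -0.582% | 118 |
| | 100000 | 89866 | -0.001% | 17018 | 0.960% | 16152 | 0.001% | 13873 | 0.442% | 1563 | -0.008% | 1493 | 0.568% | 135 |
| | 500000 | 449328 | 0.000% | 85933 | -0.021% | 80759 | -0.000% | 69648 | 0.033% | 7815 | -0.002% | 7507 | 0.029% | 178 |
| asymm. cuckoo hashing, c = 0.1 | 5000 | 4507 | 0.007% | 850 | 0.376% | 795 | -0.004% | 684 | -0.096% | 78 | -0.108% | 75 | -0.000% | 69 |
| | 10000 | 9015 | 0.006% | 1710 | -0.246% | 1590 | -0.018% | 1361 | 0.431% | 156 | -0.012% | 150 | -0.249% | 83 |
| | 50000 | 45077 | 0.001% | 8576 | -0.557% | 7946 | -0.001% | 6847 | -0.143% | 778 | 0.016% | 755 | -0.777% | 121 |
| | 100000 | 90155 | 0.000% | 17019 | 0.217% | 15893 | 0.001% | 13537 | 1.000% | 1557 | -0.006% | 1497 | 0.013% | 139 |
| | 500000 | 450773 | 0.000% | 85727 | -0.522% | 79465 | -0.002% | 67907 | 0.674% | 7786 | -0.001% | 7457 | 0.408% | 184 |
| asymm. cuckoo hashing, c = 0.2 | 5000 | 4552 | 0.011% | 834 | -0.017% | 756 | -0.030% | 646 | -0.173% | 77 | -0.073% | 74 | 0.158% | 74 |
| | 10000 | 9104 | 0.001% | 1676 | -0.570% | 1511 | 0.002% | 1289 | -0.011% | 154 | -0.046% | 148 | 0.019% | 90 |
| | 50000 | 45520 | 0.001% | 8309 | 0.296% | 7555 | -0.001% | 6476 | -0.473% | 769 | 0.008% | 737 | 0.541% | 133 |
| | 100000 | 91041 | -0.000% | 16653 | 0.088% | 15110 | 0.001% | 12831 | 0.457% | 1538 | -0.012% | 1478 | 0.340% | 154 |
| | 500000 | 455201 | 0.000% | 83163 | 0.209% | 75551 | -0.001% | 64425 | 0.042% | 7690 | -0.000% | 7473 | -0.790% | 205 |
| asymm. cuckoo hashing, c = 0.3 | 5000 | 4628 | 0.011% | 792 | 0.956% | 690 | -0.034% | 579 | 0.120% | 75 | -0.089% | 72 | 0.202% | 85 |
| | 10000 | 9257 | 0.007% | 1588 | 0.674% | 1379 | -0.023% | 1156 | 0.208% | 150 | -0.104% | 146 | -0.601% | 106 |
| | 50000 | 46290 | -0.000% | 7918 | 0.967% | 6894 | 0.004% | 5756 | 0.646% | 750 | 0.024% | 727 | -0.136% | 160 |
| | 100000 | 92580 | -0.000% | 15997 | -0.042% | 13788 | 0.002% | 11600 | -0.124% | 1500 | -0.004% | 1438 | 1.010% | 186 |
| | 500000 | 462899 | -0.000% | 80093 | -0.177% | 68939 | 0.000% | 57671 | 0.448% | 7500 | -0.004% | 7245 | 0.219% | 252 |
| asymm. cuckoo hashing, c = 0.4 | 5000 | 4743 | 0.007% | 750 | -0.071% | 595 | -0.010% | 484 | 1.066% | 72 | -0.050% | 70 | -0.050% | 107 |
| | 10000 | 9487 | 0.004% | 1499 | 0.003% | 1191 | -0.011% | 973 | 0.521% | 143 | -0.016% | 140 | -0.255% | 137 |
| | 50000 | 47438 | 0.001% | 7446 | 0.645% | 5955 | -0.008% | 4842 | 1.009% | 717 | -0.018% | 695 | 0.407% | 219 |
| | 100000 | 94876 | 0.001% | 14819 | 1.138% | 11909 | -0.004% | 9711 | 0.725% | 1434 | -0.004% | 1378 | 1.319% | 259 |
| | 500000 | 474382 | -0.000% | 74816 | 0.174% | 59543 | 0.001% | 49020 | -0.222% | 7169 | 0.003% | 6962 | 0.262% | 364 |
| simplified cuckoo hashing | 5000 | 4493 | 0.006% | 855 | 0.528% | 808 | -0.020% | 697 | 0.014% | 78 | -0.059% | 75 | 0.660% | 67 |
| | 10000 | 8986 | 0.001% | 1716 | 0.159% | 1615 | -0.003% | 1394 | -0.050% | 156 | -0.011% | 150 | 0.145% | 81 |
| | 50000 | 44932 | 0.001% | 8594 | -0.030% | 8076 | -0.003% | 6914 | 0.759% | 782 | -0.019% | 752 | -0.114% | 117 |
| | 100000 | 89864 | 0.002% | 17149 | 0.199% | 16152 | -0.003% | 13926 | 0.060% | 1563 | -0.020% | 1492 | 0.654% | 135 |
| | 500000 | 449327 | 0.000% | 86128 | -0.247% | 80760 | -0.002% | 69241 | 0.617% | 7815 | -0.001% | 7534 | -0.324% | 178 |

Table 6.2: Number of trees of sizes one, two, and five for ε = 0.2. The table provides sample mean and sample variance of the number of trees obtained over a sample of size 10⁵. Additionally we give the relative error with respect to the asymptotic approximations of Theorem 6.1. Finally, the table depicts the numerically obtained average size of the maximum tree component.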


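ε = 0.1

| variant | m | isolated nodes: sample mean | rel. error | sample var. | rel. error | trees with 2 nodes: sample mean | rel. error | sample var. | rel. error | trees with 5 nodes: sample mean | rel. error | sample var. | rel. error | average max. tree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| standard cuckoo hashing | 5000 | 4065 | 0.009% | 916 | 0.925% | 744 | -0.015% | 642 | 0.205% | 76 | -0.113% | 75 | -0.461% | 132 |
| | 10000 | 8131 | 0.008% | 1850 | 0.021% | 1488 | -0.013% | 1290 | -0.272% | 152 | -0.074% | 149 | -0.690% | 173 |
| | 50000 | 40657 | 0.000% | 9279 | -0.313% | 7439 | -0.004% | 6406 | 0.372% | 759 | 0.009% | 741 | 0.073% | 291 |
| | 100000 | 81314 | 0.000% | 18503 | -0.015% | 14877 | -0.001% | 12875 | -0.114% | 1519 | -0.015% | 1491 | -0.505% | 350 |
| | 500000 | 406568 | 0.000% | 92629 | -0.138% | 74385 | -0.001% | 64291 | 0.018% | 7592 | 0.004% | 7466 | -0.678% | 507 |
| asymm. cuckoo hashing, c = 0.1 | 5000 | 4082 | 0.009% | 913 | 0.455% | 731 | -0.022% | 632 | -0.338% | 75 | -0.066% | 74 | -0.085% | 137 |
| | 10000 | 8164 | 0.000% | 1841 | -0.350% | 1461 | 0.015% | 1262 | -0.207% | 151 | -0.007% | 148 | -0.667% | 180 |
| | 50000 | 40822 | 0.002% | 9154 | 0.208% | 7305 | -0.006% | 6279 | 0.260% | 755 | -0.017% | 738 | -0.062% | 306 |
| | 100000 | 81644 | 0.001% | 18393 | -0.252% | 14609 | -0.002% | 12523 | 0.534% | 1509 | 0.002% | 1476 | -0.074% | 371 |
| | 500000 | 408224 | -0.000% | 91753 | -0.019% | 73044 | 0.001% | 62936 | 0.024% | 7545 | -0.002% | 7386 | -0.160% | 541 |
| asymm. cuckoo hashing, c = 0.2 | 5000 | 4132 | 0.011% | 895 | -0.124% | 690 | -0.011% | 589 | -0.014% | 74 | -0.023% | 72 | 0.466% | 152 |
| | 10000 | 8265 | 0.006% | 1781 | 0.424% | 1380 | -0.015% | 1181 | -0.178% | 148 | -0.033% | 145 | -0.053% | 205 |
| | 50000 | 41328 | 0.001% | 8935 | 0.064% | 6901 | -0.004% | 5970 | -1.324% | 739 | -0.002% | 723 | 0.158% | 365 |
| | 100000 | 82656 | 0.001% | 17714 | 0.936% | 13802 | -0.000% | 11647 | 1.171% | 1479 | -0.007% | 1449 | -0.124% | 450 |
| | 500000 | 413280 | 0.000% | 88870 | 0.599% | 69009 | 0.000% | 58446 | 0.809% | 7393 | -0.003% | 7142 | 1.323% | 675 |
| asymm. cuckoo hashing, c = 0.3 | 5000 | 4219 | 0.020% | 853 | 0.135% | 623 | -0.031% | 523 | 0.012% | 71 | -0.060% | 70 | 0.024% | 185 |
| | 10000 | 8440 | 0.007% | 1708 | 0.034% | 1245 | -0.006% | 1049 | -0.253% | 142 | -0.046% | 140 | -0.010% | 260 |
| | 50000 | 42202 | 0.002% | 8498 | 0.528% | 6226 | -0.003% | 5206 | 0.492% | 711 | -0.016% | 699 | -0.204% | 519 |
| | 100000 | 84406 | 0.000% | 17104 | -0.108% | 12451 | -0.003% | 10441 | 0.204% | 1422 | -0.010% | 1393 | 0.155% | 663 |
| | 500000 | 422029 | 0.001% | 85273 | 0.182% | 62255 | -0.002% | 52370 | -0.111% | 7108 | 0.001% | 6899 | 1.124% | 1081 |
| asymm. cuckoo hashing, c = 0.4 | 5000 | 4348 | 0.037% | 783 | 1.793% | 528 | -0.021% | 435 | -0.223% | 67 | -0.173% | 66 | -0.322% | 248 |
| | 10000 | 8698 | 0.020% | 1580 | 0.889% | 1056 | -0.011% | 871 | -0.378% | 133 | -0.117% | 131 | 0.235% | 378 |
| | 50000 | 43498 | 0.003% | 7936 | 0.459% | 5280 | -0.003% | 4368 | -0.701% | 665 | -0.024% | 651 | 0.663% | 968 |
| | 100000 | 86997 | 0.001% | 15969 | -0.144% | 10558 | 0.003% | 8718 | -0.491% | 1331 | -0.008% | 1310 | 0.032% | 1420 |
| | 500000 | 434989 | 0.000% | 79799 | -0.088% | 52795 | -0.002% | 43558 | -0.419% | 6653 | -0.002% | 6534 | 0.280% | 3165 |
| simplified cuckoo hashing | 5000 | 4065 | 0.009% | 918 | 0.712% | 744 | -0.013% | 641 | 0.335% | 76 | -0.118% | 74 | 0.146% | 132 |
| | 10000 | 8131 | -0.000% | 1837 | 0.729% | 1488 | 0.001% | 1281 | 0.362% | 152 | -0.024% | 149 | -0.488% | 172 |
| | 50000 | 40657 | 0.000% | 9333 | -0.891% | 7439 | -0.001% | 6482 | -0.803% | 759 | -0.011% | 739 | 0.324% | 290 |
| | 100000 | 81314 | 0.000% | 18580 | -0.429% | 14877 | 0.002% | 12910 | -0.386% | 1519 | -0.016% | 1482 | 0.090% | 350 |
| | 500000 | 406571 | -0.000% | 93055 | -0.598% | 74384 | 0.000% | 64603 | -0.468% | 7592 | 0.001% | 7415 | 0.022% | 507 |

Table 6.3: Number of trees of sizes one, two, and five for ε = 0.1. The table provides sample mean and sample variance of the number of trees obtained over a sample of size 10⁵. Additionally we give the relative error with respect to the asymptotic approximations of Theorem 6.1. Finally, the table depicts the numerically obtained average size of the maximum tree component.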


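ε = 0.06

| variant | m | isolated nodes: sample mean | rel. error | sample var. | rel. error | trees with 2 nodes: sample mean | rel. error | sample var. | rel. error | trees with 5 nodes: sample mean | rel. error | sample var. | rel. error | average max. tree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| standard cuckoo hashing | 5000 | 3906 | 0.016% | 935 | 1.160% | 717 | -0.016% | 615 | 0.804% | 74 | -0.120% | 73 | -0.113% | 182 |
| | 10000 | 7812 | 0.010% | 1897 | -0.245% | 1434 | -0.003% | 1239 | 0.089% | 148 | -0.088% | 145 | 0.292% | 254 |
| | 50000 | 39062 | 0.001% | 9441 | 0.208% | 7172 | -0.002% | 6216 | -0.251% | 740 | -0.008% | 722 | 0.582% | 492 |
| | 100000 | 78125 | 0.000% | 18920 | 0.005% | 14343 | 0.001% | 12386 | 0.126% | 1479 | 0.001% | 1440 | 0.805% | 626 |
| | 500000 | 390628 | -0.000% | 94640 | -0.039% | 71718 | -0.001% | 61910 | 0.159% | 7396 | 0.008% | 7225 | 0.449% | 999 |
| asymm. cuckoo hashing, c = 0.1 | 5000 | 3923 | 0.017% | 934 | 0.455% | 704 | -0.003% | 607 | -0.099% | 74 | -0.119% | 71 | 1.180% | 189 |
| | 10000 | 7847 | 0.005% | 1859 | 0.919% | 1407 | 0.003% | 1207 | 0.524% | 147 | -0.049% | 145 | -0.429% | 266 |
| | 50000 | 39236 | 0.001% | 9333 | 0.488% | 7037 | -0.000% | 6022 | 0.708% | 734 | -0.023% | 717 | 0.509% | 530 |
| | 100000 | 78473 | -0.000% | 18799 | -0.218% | 14074 | 0.001% | 12120 | 0.087% | 1469 | 0.006% | 1444 | -0.183% | 681 |
| | 500000 | 392360 | 0.000% | 93465 | 0.349% | 70368 | 0.001% | 60402 | 0.414% | 7344 | -0.008% | 7210 | -0.019% | 1107 |
| asymm. cuckoo hashing, c = 0.2 | 5000 | 3976 | 0.026% | 904 | 1.047% | 663 | -0.042% | 568 | -0.252% | 72 | -0.103% | 70 | 0.832% | 212 |
| | 10000 | 7952 | 0.011% | 1816 | 0.580% | 1326 | -0.010% | 1131 | 0.170% | 144 | -0.077% | 141 | 0.060% | 308 |
| | 50000 | 39765 | 0.001% | 9198 | -0.712% | 6631 | 0.000% | 5723 | -1.071% | 717 | 0.000% | 704 | 0.190% | 671 |
| | 100000 | 79530 | 0.001% | 18310 | -0.235% | 13263 | -0.003% | 11350 | -0.228% | 1435 | -0.014% | 1414 | -0.273% | 895 |
| | 500000 | 397656 | -0.000% | 91033 | 0.329% | 66313 | 0.001% | 56521 | 0.178% | 7173 | -0.007% | 7005 | 0.651% | 1575 |
| asymm. cuckoo hashing, c = 0.3 | 5000 | 4066 | 0.039% | 857 | 1.640% | 596 | -0.015% | 500 | 0.126% | 69 | -0.147% | 68 | -0.091% | 259 |
| | 10000 | 8134 | 0.021% | 1730 | 0.741% | 1191 | 0.007% | 1011 | -1.048% | 137 | -0.063% | 134 | 0.658% | 398 |
| | 50000 | 40678 | 0.006% | 8757 | -0.466% | 5955 | -0.002% | 5028 | -0.478% | 686 | -0.020% | 670 | 0.794% | 1038 |
| | 100000 | 81359 | 0.002% | 17336 | 0.564% | 11909 | 0.003% | 9992 | 0.162% | 1372 | -0.025% | 1346 | 0.426% | 1549 |
| | 500000 | 406801 | 0.000% | 87321 | -0.174% | 59550 | -0.001% | 50013 | 0.056% | 6860 | -0.002% | 6735 | 0.328% | 3606 |
| simplified cuckoo hashing | 5000 | 3906 | 0.014% | 950 | -0.379% | 717 | -0.006% | 623 | -0.444% | 74 | -0.066% | 73 | -0.632% | 181 |
| | 10000 | 7812 | 0.007% | 1888 | 0.202% | 1434 | -0.009% | 1245 | -0.419% | 148 | -0.065% | 145 | 0.140% | 254 |
| | 50000 | 39063 | -0.000% | 9375 | 0.907% | 7172 | 0.003% | 6188 | 0.203% | 740 | 0.012% | 727 | -0.212% | 492 |
| | 100000 | 78125 | 0.001% | 18784 | 0.724% | 14344 | -0.001% | 12424 | -0.182% | 1479 | -0.006% | 1453 | -0.097% | 627 |
| | 500000 | 390628 | -0.000% | 93826 | 0.821% | 71718 | -0.001% | 61855 | 0.248% | 7397 | 0.002% | 7202 | 0.772% | 1002 |

Table 6.4: Number of trees of sizes one, two, and five for ε = 0.06. The table provides sample mean and sample variance of the number of trees obtained over a sample of size 10⁵. Additionally we give the relative error with respect to the asymptotic approximations of Theorem 6.1. Finally, the table depicts the numerically obtained average size of the maximum tree component.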


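ε = 0.04

| variant | m | isolated nodes: sample mean | rel. error | sample var. | rel. error | trees with 2 nodes: sample mean | rel. error | sample var. | rel. error | trees with 5 nodes: sample mean | rel. error | sample var. | rel. error | average max. tree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| standard cuckoo hashing | 5000 | 3828 | 0.028% | 955 | 0.030% | 704 | -0.025% | 608 | 0.125% | 73 | -0.099% | 72 | -0.011% | 216 |
| | 10000 | 7657 | 0.007% | 1898 | 0.665% | 1407 | 0.008% | 1226 | -0.768% | 146 | -0.060% | 142 | 0.844% | 313 |
| | 50000 | 38289 | 0.002% | 9591 | -0.383% | 7037 | -0.001% | 6120 | -0.574% | 728 | -0.012% | 723 | -1.003% | 681 |
| | 100000 | 76578 | 0.001% | 19038 | 0.367% | 14074 | 0.000% | 12138 | 0.262% | 1456 | -0.003% | 1430 | 0.090% | 907 |
| | 500000 | 382892 | 0.000% | 95216 | 0.343% | 70371 | -0.000% | 60441 | 0.667% | 7281 | -0.001% | 7214 | -0.793% | 1602 |
| asymm. cuckoo hashing, c = 0.1 | 5000 | 3846 | 0.024% | 943 | 0.432% | 690 | -0.011% | 594 | 0.139% | 72 | -0.105% | 72 | -0.807% | 223 |
| | 10000 | 7692 | 0.015% | 1890 | 0.222% | 1381 | -0.014% | 1184 | 0.514% | 145 | -0.044% | 143 | -0.386% | 327 |
| | 50000 | 38466 | 0.001% | 9491 | -0.214% | 6902 | -0.003% | 5939 | 0.164% | 723 | -0.022% | 712 | -0.254% | 737 |
| | 100000 | 76932 | 0.002% | 18974 | -0.166% | 13804 | -0.003% | 11953 | -0.457% | 1445 | -0.009% | 1414 | 0.507% | 999 |
| | 500000 | 384665 | -0.000% | 95304 | -0.626% | 69019 | 0.000% | 59716 | -0.376% | 7224 | 0.003% | 7093 | 0.166% | 1831 |
| asymm. cuckoo hashing, c = 0.2 | 5000 | 3899 | 0.037% | 912 | 1.117% | 650 | -0.049% | 552 | 0.407% | 71 | -0.201% | 70 | -0.600% | 250 |
| | 10000 | 7800 | 0.023% | 1819 | 1.353% | 1299 | -0.018% | 1102 | 0.700% | 141 | -0.127% | 139 | -0.031% | 381 |
| | 50000 | 39006 | 0.004% | 9219 | 0.004% | 6496 | 0.001% | 5548 | -0.016% | 705 | -0.020% | 694 | -0.070% | 951 |
| | 100000 | 78013 | 0.002% | 18393 | 0.248% | 12993 | -0.004% | 11077 | 0.144% | 1409 | 0.011% | 1392 | -0.320% | 1375 |
| | 500000 | 390075 | 0.000% | 91638 | 0.604% | 64959 | 0.003% | 54912 | 0.999% | 7047 | -0.000% | 7004 | -0.956% | 2955 |
| simplified cuckoo hashing | 5000 | 3828 | 0.021% | 950 | 0.553% | 704 | -0.020% | 612 | -0.600% | 73 | -0.152% | 71 | 0.460% | 215 |
| | 10000 | 7657 | 0.010% | 1897 | 0.707% | 1407 | 0.003% | 1215 | 0.198% | 146 | -0.081% | 143 | 0.279% | 311 |
| | 50000 | 38289 | 0.001% | 9583 | -0.304% | 7037 | -0.000% | 6094 | -0.152% | 728 | 0.001% | 717 | -0.176% | 677 |
| | 100000 | 76579 | -0.000% | 19141 | -0.168% | 14074 | 0.005% | 12181 | -0.094% | 1456 | 0.014% | 1436 | -0.327% | 904 |
| | 500000 | 382894 | -0.000% | 96114 | -0.597% | 70371 | 0.001% | 60605 | 0.397% | 7281 | 0.004% | 7172 | -0.204% | 1600 |

Table 6.5: Number of trees of sizes one, two, and five for ε = 0.04. The table provides sample mean and sample variance of the number of trees obtained over a sample of size 10⁵. Additionally we give the relative error with respect to the asymptotic approximations of Theorem 6.1. Finally, the table depicts the numerically obtained average size of the maximum tree component.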


| ε | variant | nodes in cyclic components: sample mean | rel. error | sample var. | rel. error | number of cycles: sample mean | rel. error | sample var. | rel. error | average max. cyclic comp. |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4 | standard cuckoo hashing | 1.41 | -0.032% | 16.5 | -1.672% | 0.223 | 0.109% | 0.224 | -0.204% | 1.32 |
| 0.4 | asymmetric c.h., c = 0.1 | 1.44 | 0.064% | 17.0 | -0.376% | 0.226 | 0.160% | 0.226 | -0.093% | 1.35 |
| 0.4 | asymmetric c.h., c = 0.2 | 1.56 | 0.301% | 19.7 | -1.385% | 0.234 | 0.537% | 0.234 | 0.312% | 1.46 |
| 0.4 | asymmetric c.h., c = 0.3 | 1.80 | -0.332% | 24.4 | 1.211% | 0.254 | -1.075% | 0.253 | -0.586% | 1.68 |
| 0.4 | asymmetric c.h., c = 0.4 | 2.23 | 0.802% | 35.7 | 1.858% | 0.279 | 0.131% | 0.280 | 0.033% | 2.08 |
| 0.4 | simplified cuckoo hashing | 1.85 | 1.316% | 18.3 | 2.641% | 0.456 | 0.438% | 0.455 | 0.656% | 1.67 |
| 0.2 | standard cuckoo hashing | 8.86 | 0.326% | 420.9 | 0.898% | 0.509 | 0.381% | 0.507 | 0.732% | 7.99 |
| 0.2 | asymmetric c.h., c = 0.1 | 9.32 | 0.329% | 475.3 | -1.828% | 0.517 | 0.605% | 0.522 | -0.331% | 8.39 |
| 0.2 | asymmetric c.h., c = 0.2 | 11.01 | -0.050% | 619.7 | 1.941% | 0.549 | 0.043% | 0.548 | 0.311% | 9.89 |
| 0.2 | asymmetric c.h., c = 0.3 | 15.00 | 0.052% | 1132.7 | 0.135% | 0.608 | -0.041% | 0.610 | -0.465% | 13.37 |
| 0.2 | asymmetric c.h., c = 0.4 | 26.31 | -0.283% | 3324.6 | -1.320% | 0.717 | 0.053% | 0.716 | 0.152% | 23.30 |
| 0.2 | simplified cuckoo hashing | 9.95 | 0.507% | 445.3 | 1.048% | 0.801 | 0.496% | 0.802 | 0.377% | 8.75 |
| 0.1 | standard cuckoo hashing | 42.36 | 0.649% | 8191.8 | 1.593% | 0.830 | 0.079% | 0.827 | 0.415% | 37.36 |
| 0.1 | asymmetric c.h., c = 0.1 | 46.75 | 1.056% | 9775.5 | 3.762% | 0.857 | -0.511% | 0.856 | -0.453% | 41.02 |
| 0.1 | asymmetric c.h., c = 0.2 | 65.66 | 1.940% | 19211.2 | 3.892% | 0.924 | 0.394% | 0.917 | 1.234% | 57.52 |
| 0.1 | asymmetric c.h., c = 0.3 | 143.88 | 1.865% | 86487.5 | 6.546% | 1.106 | -0.144% | 1.100 | 0.409% | 125.48 |
| 0.1 | asymmetric c.h., c = 0.4 | 1059.13 | 32.367% | 3394840.0 | 66.167% | 1.610 | 3.343% | 1.575 | 5.465% | 913.83 |
| 0.1 | simplified cuckoo hashing | 44.81 | 0.422% | 8375.5 | 2.041% | 1.149 | 0.241% | 1.150 | 0.137% | 38.89 |
| 0.06 | standard cuckoo hashing | 123.54 | 2.350% | 64162.5 | 7.368% | 1.069 | 0.625% | 1.066 | 0.876% | 107.83 |
| 0.06 | asymmetric c.h., c = 0.1 | 146.52 | 2.734% | 90427.4 | 7.306% | 1.110 | 0.426% | 1.113 | 0.204% | 127.33 |
| 0.06 | asymmetric c.h., c = 0.2 | 272.48 | 5.264% | 292639.0 | 16.073% | 1.257 | 0.680% | 1.258 | 0.561% | 236.17 |
| 0.06 | asymmetric c.h., c = 0.3 | 1353.49 | 42.292% | 5116340.0 | 77.172% | 1.689 | 4.558% | 1.650 | 6.758% | 1167.11 |
| 0.06 | simplified cuckoo hashing | 127.24 | 2.543% | 65382.4 | 7.068% | 1.406 | 0.046% | 1.406 | 0.041% | 109.82 |
| 0.04 | standard cuckoo hashing | 279.76 | 4.803% | 307422.0 | 15.485% | 1.268 | 0.367% | 1.270 | 0.269% | 242.42 |
| 0.04 | asymmetric c.h., c = 0.1 | 357.38 | 6.960% | 490130.0 | 20.621% | 1.330 | 0.456% | 1.333 | 0.235% | 309.49 |
| 0.04 | asymmetric c.h., c = 0.2 | 897.51 | 25.207% | 2590900.0 | 56.156% | 1.568 | 2.576% | 1.551 | 3.660% | 773.91 |
| 0.04 | simplified cuckoo hashing | 284.33 | 5.223% | 305917.0 | 16.757% | 1.612 | -0.140% | 1.603 | 0.380% | 244.60 |

Table 6.6: The table shows sample mean and sample variance of the number of nodes contained in cyclic components and the number of cycles. We provide numerically obtained results using a sample of size 10⁵ and give the relative error with respect to the asymptotic approximations of Theorem 6.1. Furthermore, we provide the numerically obtained average size of the largest cyclic component. All depicted values are obtained using a fixed table size determined by the parameter m = 5·10⁵, but our numerical data obtained using different table sizes are almost identical.

Chapter 7 Construction Cost

7.1 Introduction

So far, we analysed the failure probability of the various cuckoo hash algorithms and obtained some information on the structure of the related cuckoo graph. In this chapter, we investigate the average case behaviour of insertion operations. The cost of a single insertion is thereby measured by the number of moved keys during this procedure, hence it equals one plus the number of kick-out operations. Unfortunately, the exact behaviour is very complex to describe and no exact result is known so far. Hence, we cannot give an exact result, but we are looking for a suitable upper bound.
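The cost measure can be made concrete with a small simulation. The following sketch of simplified cuckoo hashing is an illustration only and not the software described in Chapter 9: the two hash functions are replaced by two independent uniformly random cells per key, an insertion is abandoned after 2n steps, and a failed attempt triggers a complete rebuild with fresh random cells. All names are our own.

```python
import random

def try_build(n, table_size, rng):
    """One construction attempt. Returns (table, locations, steps_per_key),
    or None if some insertion needs more than 2n steps."""
    # Two random cells per key stand in for the two hash functions.
    locations = [(rng.randrange(table_size), rng.randrange(table_size))
                 for _ in range(n)]
    table = [None] * table_size
    steps_per_key = []
    for key in range(n):
        current, pos, steps = key, locations[key][0], 0
        while current is not None:
            if steps >= 2 * n:
                return None                  # treated as an endless loop
            steps += 1                       # one key is moved into cell pos
            current, table[pos] = table[pos], current
            if current is not None:          # kicked-out key goes to its other cell
                first, second = locations[current]
                pos = second if pos == first else first
        steps_per_key.append(steps)
    return table, locations, steps_per_key

def build(n, table_size, seed=0):
    """Repeat construction attempts (rehashes) until one succeeds."""
    rng = random.Random(seed)
    attempts = 0
    while True:
        attempts += 1
        result = try_build(n, table_size, rng)
        if result is not None:
            return result + (attempts,)

m, eps = 1000, 0.4
n = round((1 - eps) * m)
table, locations, steps_per_key, attempts = build(n, 2 * m)
print("average steps per key:", sum(steps_per_key) / n, "attempts:", attempts)
```

Each entry of `steps_per_key` equals one plus the number of kick-out operations of the corresponding insertion, which is exactly the cost measure used in this chapter.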

7.2 Simpliﬁed cuckoo hashing

As usual, we start with the analysis of the simpliﬁed algorithm.

Theorem 7.1. Suppose that ε ∈ (0,1) is fixed. Then, an upper bound for the expected number of steps to construct a simplified cuckoo hash data structure possessing a table of size 2m with n = (1−ε)m keys is

$$\min\left(C,\ \frac{-\log\varepsilon}{1-\varepsilon}\right) n + O(1), \tag{7.1}$$

where the constant implied by O(1) depends on ε. Numerical calculations show that this bound holds for C = 4.
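For orientation, the leading term of the bound is straightforward to evaluate. The sketch below is illustrative only (the function names are ours, and the O(1) term is ignored); it shows that the logarithmic expression is the smaller one for moderate ε, whereas the constant C = 4 takes over for very small ε.

```python
import math

def per_key_bound(eps, C=4.0):
    """Leading coefficient min(C, -log(eps) / (1 - eps)) of the bound (7.1)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return min(C, -math.log(eps) / (1.0 - eps))

def construction_bound(m, eps, C=4.0):
    """Leading term of (7.1) for n = (1 - eps) * m keys, without the O(1) term."""
    return per_key_bound(eps, C) * (1.0 - eps) * m

for eps in (0.5, 0.2, 0.1, 0.04, 0.01):
    print(eps, per_key_bound(eps), construction_bound(10 ** 5, eps))
```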

Proof of Theorem 7.1

Denote the failure probability of a simpliﬁed cuckoo hashing attempt by p. Clearly, the expected number of attempts to construct the data structure is hence given by 1/(1−p).

We have already shown that p = O(1/m) holds. This implies that the expected number of rehashes needed to build the hash table, which we denote by N, is in O(1/m).

Furthermore, the additional complexity of a failed attempt is O(n), because we detect an endless loop in the insertion procedure after at most 2n steps. Therefore, it only remains to show the claimed bound for the situation where cuckoo hashing succeeds, i.e. the cuckoo graph contains only trees and cyclic components. Using this result, we conclude that C_i, the number of steps required during the i-th unsuccessful construction, is in O(m); hence the equation

$$\mathbb{E}\sum_{i=1}^{N} C_i = \mathbb{E}N\,\mathbb{E}C_1 = O(1) \tag{7.2}$$

holds, cf. Devroye and Morin [2003].

Consider the graph just before the insertion of the l-th edge (resp. key) and denote the node addressed by the first hash function by x_l and the second by y_l. Recall that a new key is always placed in x_l by the standard insertion procedure. The number of steps needed to perform this insertion is fully determined by the component containing x_l, and not affected by the component containing y_l, unless x_l belongs to a cyclic component or both nodes belong to the same component (including the situation where x_l = y_l holds).

In the latter case, the new edge creates a cycle. Hence we end up with a component containing a cycle anyway. But this is a very rare event, because we know from Theorem 6.1 that the expected number of nodes contained in cyclic components is finite.

More precisely, Lemma 7.1 shows that the expected cost caused by cyclic components is constant.

Lemma 7.1. The expected number of all steps performed while inserting elements in cyclic components is constant.

Proof. Let η denote the random variable that counts the number of nodes contained in cyclic components. Assume that η equals k. Then the insertion of each of the k corresponding keys takes at most 2k steps, because during an insertion no node is visited more than twice (cf. Figure 1.6) and each cyclic component holds at most k keys. The expected total number of steps is therefore bounded by

$$\sum_{k} 2k^{2}\,P(\eta = k) = 2\,\mathbb{E}\eta^{2} = 2\left(\mathbb{V}\eta + (\mathbb{E}\eta)^{2}\right) = O(1), \tag{7.3}$$

which is constant for every fixed ε because of the results of Theorem 6.1.

The cuckoo graph contains 2m−l+1 trees before the insertion of the l-th edge. Denote the number of steps needed for the insertion in tree T by ν(T) and the number of its nodes by m(T). The probability that a tree possessing m(T) nodes is chosen equals m(T)/(2m). Using ˜t(x), the generating function of unrooted trees defined in (4.5), we obtain the generating function H(x)/(2m) that counts the maximum insertion cost of a tree component, weighted by the probability that this component is chosen, as follows:

$$\tilde t(x) := \sum_{T}\frac{x^{m(T)}}{m(T)!} = \sum_{k}\tilde t_k\,\frac{x^{k}}{k!}, \tag{7.4}$$

$$H(x) := \sum_{T} m(T)\,\nu(T)\,\frac{x^{m(T)}}{m(T)!}. \tag{7.5}$$


Next, we extend this to a set consisting of k such trees:

Neglecting costs caused by cyclic components, we obtain a result indicating the average complexity for the insertion of the l-th key, cf. (4.9). We proceed similarly to former calculations, using the saddle point method to extract the coefficient. A slight difference is the newly occurring function H(x), but it behaves as a constant factor, so we only need to know H(x₀), which we consider as follows.

We give a first estimate using the tree size m(T) as an upper bound of the parameter ν(T), and obtain for real x the inequality

H(x)≤

hold. Altogether, this gives us C(l) = (2m−l+1)H(x₀), and further summation over all l leads to

1 as m goes to inﬁnity. Together with Lemma 7.1, this completes the proof of the ﬁrst bound of Theorem 7.2.
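The summation step can be illustrated numerically. The sketch below rests on an assumption that is not spelled out at this point: we take the per-insertion bound to be 1/(1−a) with a = (l−1)/m, i.e. the reciprocal of the momentary value of ε. Under this assumption the sum over all n insertions approaches −m log ε = n·(−log ε)/(1−ε), which is the first bound of the theorem. The function name is our own.

```python
import math

def first_bound_total(m, eps):
    """Sum of the assumed per-insertion bounds 1 / (1 - (l - 1) / m), l = 1..n."""
    n = round((1.0 - eps) * m)
    return sum(1.0 / (1.0 - (l - 1) / m) for l in range(1, n + 1))

m, eps = 100000, 0.4
n = round((1.0 - eps) * m)
total = first_bound_total(m, eps)
limit = -m * math.log(eps)              # equals n * (-log eps) / (1 - eps)
print(total / n, limit / n)
```

The two printed per-key values agree up to a relative error of order 1/m, as expected when a Riemann sum is replaced by the corresponding integral.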

Next, we try to obtain a better bound using a more suitable estimate for ν(T). Note that the selection of the node x_l in a tree component transforms this component into a rooted tree. The insertion procedure starts at the root and the number of required steps is bounded by the height of this tree. We introduce the notations

• t_n for the number of rooted trees with n nodes,

• t_n^[k] for the number of rooted trees with n nodes and height less than or equal to k,

• and h_n for the sum of the heights of all rooted trees with n nodes.

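Moreover, we introduce the corresponding generating functions:

$$t(x) = \sum_{n\ge 1} t_n\frac{x^{n}}{n!},\qquad t^{[k]}(x) = \sum_{n\ge 1} t^{[k]}_n\frac{x^{n}}{n!},\qquad h(x) = \sum_{n\ge 1} h_n\frac{x^{n}}{n!}.$$

Due to Flajolet and Odlyzko [1982], we know that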

t(x)−t^[k](x)∼2δ(x) (1−δ(x))ⁿ

in a Δ-domain around its singularity e⁻¹, cf. Flajolet and Sedgewick [2008].

Now, we use the asymptotic approximation of h(x) as an upper bound of H(x) and obtain, similarly to (7.9), the upper bound

C(l)≤ − m

for the construction time. This is of course valid only near the singularity, that is, for 1−(l−1)/m close to zero. Nevertheless, this result is suitable to prove the second bound stated in Theorem 7.2. This is because the integral

$$\int_{0}^{1/2}\frac{-\log\bigl(2\,(1-(1-a)e^{a})\bigr)}{1-a}\,da \tag{7.15}$$

is obviously bounded for ε → 0, in contrast to the corresponding integral of (7.10). See Figure 7.1 for a plot of the two bounds. It is easy to see that the second bound is not valid if ε is not close to 0. Finally, we compute a numerical value for the constant C by combining these two bounds.

To be on the safe side, we may for instance set x equal to 0.025 and obtain a bound approximately equal to 4.
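One possible reading of this last step can be sketched numerically. We assume, as our own interpretation, that the height-based bound with the integrand of (7.15) is used while the momentary value of ε lies in (0, x), that the size-based bound 1/a is used on (x, 1), and that the constant C is the sum of the two integrals in the limit ε → 0. The function names and the midpoint rule are illustrative choices.

```python
import math

def height_bound(a):
    """Integrand of (7.15); a is the momentary value of eps."""
    # 1 - (1 - a) e^a, rewritten as a e^a - (e^a - 1) to reduce cancellation for small a.
    inner = a * math.exp(a) - math.expm1(a)
    return -math.log(2.0 * inner) / (1.0 - a)

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint rule; it never evaluates f at the singular endpoint a = 0."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def combined_constant(x):
    """Height-based bound on (0, x) plus size-based bound 1/a on (x, 1)."""
    size_part = -math.log(x)            # integral of 1/a from x to 1
    return midpoint_integral(height_bound, 0.0, x) + size_part

C_estimate = combined_constant(0.025)
print(C_estimate)
```

Under these assumptions the value for x = 0.025 comes out slightly below 4, which is consistent with the constant C = 4 stated in the theorem.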

7.3 Standard cuckoo hashing

It is straightforward to generalise the ideas that led us to the previous result. Surprisingly, it turns out that the same bound holds for the classical cuckoo hash algorithm too.


Figure 7.1 (plot; horizontal axis: ε, vertical axis: steps): Bounds for the expected number of steps per insertion, depending on the momentary value of ε. The continuous line corresponds to the bound based on the size of a tree component. The dashed line indicates the bound obtained using the diameter as estimator, which is accurate only for ε close to 0.

Theorem 7.2. Suppose that ε ∈ (0,1) is fixed. Then, an upper bound for the expected number of steps to construct a cuckoo hash data structure possessing two tables of size m with n = (1−ε)m keys is

$$\min\left(C,\ \frac{-\log\varepsilon}{1-\varepsilon}\right) n + O(1), \tag{7.17}$$

where the constant implied by O(1) depends on ε. Numerical calculations show that this bound holds for C = 4.

Proof of Theorem 7.2

Note that all initial considerations stated in the proof for the simplified version can easily be adapted. For instance, the assertion of Lemma 7.1 holds, although the proof is now based on Theorem 6.2. The remaining proof continues in a similar way, though we have to use bivariate generating functions once more.

Consider the bipartite cuckoo graph before the insertion of the l-th edge. At this moment, it contains exactly 2m−l+1 tree components. Denote the number of steps needed for the insertion in tree T by ν(T) and its number of nodes of first resp. second type by m₁(T) resp. m₂(T). The probability that a tree possessing m₁ nodes of first kind is chosen equals m₁/m. Using the bivariate generating function ˜t(x, y) of unrooted bipartite trees, we obtain the generating function H(x, y)/m counting the maximum insertion cost of a tree component times its selection probability as follows:

$$\tilde t(x,y) := \sum_{T}\frac{x^{m_1(T)}\,y^{m_2(T)}}{m_1(T)!\,m_2(T)!} = \sum_{m_1,m_2}\tilde t_{m_1,m_2}\,\frac{x^{m_1}y^{m_2}}{m_1!\,m_2!}, \tag{7.18}$$

$$H(x,y) := \sum_{T} m_1(T)\,\nu(T)\,\frac{x^{m_1(T)}\,y^{m_2(T)}}{m_1(T)!\,m_2(T)!}. \tag{7.19}$$


Again, we extend this to a set consisting of k such trees:

By neglecting costs caused by cyclic components, we get a result indicating the average complexity for the insertion of the l-th key. As usual, we proceed using the saddle point method to extract the coefficient. The function H(x, y) behaves as a constant factor, so we only need to calculate, or rather estimate, H(x₀, x₀), which can be done as follows.

The first estimate is again based on using the size of the tree component, which now equals m₁(T) + m₂(T), as an upper bound of the parameter ν(T). Hence we infer for real-valued x and y the inequality

$$H(x,y) \le \sum_{T} m_1(T)\,\bigl(m_1(T)+m_2(T)\bigr)\,\frac{x^{m_1(T)}\,y^{m_2(T)}}{m_1(T)!\,m_2(T)!}. \tag{7.22}$$

Further, note that the relation

$$\sum_{T} m_1(T)\,\frac{x^{m_1(T)}\,y^{m_2(T)}}{m_1(T)!\,m_2(T)!} = x\,\frac{\partial}{\partial x}\tilde t(x,y) = t_1(x,y) \tag{7.23}$$

holds. Recall that t₁(x, x) equals t(x), so we establish

H(x, x) ≤ x ∂

Altogether, this provides us the bound

C(l) = (2m−l+1)H(x₀, x₀)

and hence we obtain the same first bound as given for the simplified algorithm, see (7.10).

Again, we try to obtain a better result using a more suitable estimate for ν(T). The selection of the node x_l belonging to a tree component transforms this component into a rooted bipartite tree. The insertion procedure starts at the root and the number of required steps is bounded by the height of this tree. Further, recall that we are only interested in the special case x = y. Because of this, we can once more consider usual (non-bipartite) rooted trees instead.

Hence we use the asymptotic approximation of h(x) as upper bound of H(x, x) and obtain the upper bound

Note that this second bound also equals the bound obtained analysing the simpliﬁed algorithm. Thus the remaining details of this proof can be shown as before.

7.4 Asymmetric cuckoo hashing

Finally, we consider the generalised asymmetric data structure. We are still able to provide an estimate based on the component sizes, similar to the previous results. On the other hand, the derivation of the second bound is strongly based on the symmetry. Hence, it is not possible to adapt this result without further knowledge of the height of rooted bipartite trees.

Theorem 7.3. Suppose that c ∈ [0,1) and ε ∈ (1−√(1−c²), 1) are fixed. Then, an upper bound for the expected number of steps to construct an asymmetric cuckoo hash data structure possessing two tables of size m₁ = m(1+c) respectively m₂ = 2m−m₁ where the constant implied by O(1) depends on ε.

Proof. As mentioned above, the proof is related to the proofs of Theorems 7.1 and 7.2.

The expected number of steps performed in cyclic components is still in O(1) due to Theorem 6.3. Further, note that the generating function H(x, y) counting the maximum insertion cost of a tree component still satisfies the inequality (7.22), that is,

$$H(x,y) \le \sum_{T} m_1(T)\,\bigl(m_1(T)+m_2(T)\bigr)\,\frac{x^{m_1(T)}\,y^{m_2(T)}}{m_1(T)!\,m_2(T)!}. \tag{7.28}$$

Recall that multiplying the generating function by m₁(T) corresponds to marking a node of first kind. Hence we obtain

H(x, y) ≤ x ∂


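Similar to (7.9), we hence obtain the integral

$$\frac{1}{1-\varepsilon}\int_{\varepsilon}^{1}\frac{(1-c)(2+c-a)}{(2-a)a-c^{2}}\,da = \frac{1-c}{2(1-\varepsilon)}\left(\log\frac{1-c^{2}}{\varepsilon(2-\varepsilon)-c^{2}} - \frac{2(1+c)}{\sqrt{1-c^{2}}}\,\operatorname{artanh}\frac{\varepsilon-1}{\sqrt{1-c^{2}}}\right), \tag{7.31}$$

which completes the proof.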
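The closed form in (7.31) can be cross-checked by numerical integration. The sketch below assumes that the integral runs from ε to 1 and that the denominator inside the logarithm is ε(2−ε)−c², i.e. the denominator of the integrand evaluated at a = ε; the function names and the use of Simpson's rule are our own choices. For c = 0 the expression should reduce to −log ε/(1−ε), the first bound of Theorems 7.1 and 7.2.

```python
import math

def integrand(a, c):
    """Integrand of (7.31)."""
    return (1.0 - c) * (2.0 + c - a) / ((2.0 - a) * a - c * c)

def closed_form_bound(eps, c):
    """Right-hand side of (7.31)."""
    s = math.sqrt(1.0 - c * c)
    log_term = math.log((1.0 - c * c) / (eps * (2.0 - eps) - c * c))
    artanh_term = 2.0 * (1.0 + c) / s * math.atanh((eps - 1.0) / s)
    return (1.0 - c) / (2.0 * (1.0 - eps)) * (log_term - artanh_term)

def simpson(f, lo, hi, intervals=2000):
    """Composite Simpson rule with an even number of intervals."""
    if intervals % 2:
        intervals += 1
    h = (hi - lo) / intervals
    total = f(lo) + f(hi)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def numeric_bound(eps, c):
    """Left-hand side of (7.31), evaluated numerically."""
    return simpson(lambda a: integrand(a, c), eps, 1.0) / (1.0 - eps)

for c in (0.0, 0.1, 0.2, 0.3):
    print(c, numeric_bound(0.4, c), closed_form_bound(0.4, c))
```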

In document: Random Bipartite Graphs and their Application to Cuckoo Hashing (pages 87-104)