deposit_hagen
Publikationsserver der Universitätsbibliothek
Mathematik und
Informatik
Informatik-Berichte 05 – 04/1980
Inserting a new element into a heap
Ernst-Erich Doberkat*
April 1980
* A (proper) subset of this research has been sponsored by the Institute of Educational Informatics of the Forschungs- und Entwicklungszentrum für objektivierte Lehr- und Lernverfahren GmbH, Paderborn, FRG.
Abstract
This paper analyzes a well-known algorithm which inserts a new element into a heap in such a way that the heap condition is not violated. The analysis is done with respect to the average number of comparisons which are needed to find the correct position for the new element. In contrast to Porter and Simon, who have analyzed this algorithm, too (IEEE Trans. Softw. Eng., vol. SE-1, 1975, 292-298), the present paper gives an explicit formula for the quantities in question (in terms of the binary representation of the number of elements involved). This is done under the assumption that all possible heaps are equally probable and that the element to be inserted is equidistributed, too. Since the components of the heap are assumed to be real numbers, the assumption on the distribution must be formulated in terms of Probability Theory rather than in a combinatorial manner, and consequently methods of Analysis are used rather heavily. Moreover, it is shown that the usual finite discrete model for analyzing the average performance of sorting algorithms can be embedded into the model via a functor which preserves probabilities and expectations.
Key words and phrases: Heapsort, priority queues, sorting, analysis of algorithms.
CR categories: 5.25, 5.31, 4.34.
1. Introduction
In this paper the expected performance of the following Insertion Algorithm is considered.
Input: a heap a[1], ..., a[N] of real numbers, a real x
Output: a heap a[1], ..., a[N+1] with x inserted in its correct position
Method:
1. {Initialization}
   j := N+1; i := j div 2;
2. {Search for the correct position}
   while (i>0) and (x<a[i]) do begin a[j] := a[i]; j := i; i := i div 2 end;
   a[j] := x.
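The Insertion Algorithm can be sketched in Python as follows. This is a minimal illustration and not part of the original report: the function name `insert_into_heap`, the 1-based list layout with an unused slot `a[0]`, and the returned comparison counter are assumptions made for the example. The counter records how often the test x < a[i] is executed, which is the quantity analyzed in Section 3.

```python
def insert_into_heap(a, x):
    """Insert x into the min-heap a[1..N] (a[0] is an unused placeholder).

    Returns the number of times the comparison x < a[i] was executed.
    """
    a.append(x)              # make room for position N+1
    j = len(a) - 1           # j := N+1
    i = j // 2               # i := j div 2
    comparisons = 0
    while i > 0:
        comparisons += 1     # one execution of the test x < a[i]
        if not x < a[i]:
            break
        a[j] = a[i]          # move the father down one level
        j = i
        i //= 2
    a[j] = x
    return comparisons


heap = [None, 1, 3, 2]
print(insert_into_heap(heap, 0), heap)
```

If the new element reaches the root, the loop ends because i becomes 0, and no further comparison of labels is counted.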
This algorithm has been suggested by Knuth ([3], Exercise 5.2.3.16, Solution, p.617) and is easily seen to be correct by keeping track of the path the index j goes in the tree in finding the correct position for x. An analysis of some aspects of the average performance of this algorithm has been undertaken by Porter and Simon [5]. They consider two probabilistic models:
Model I assumes that every heap has the same probability to occur, and that the element to be inserted is equally likely to be in one of the N+1 intervals which are determined by the elements of the heap.
This assumption is justified in case the heap has been constructed with Floyd's version of Heapsort ([3], p.154 ff) and in case it is assumed that every permutation of {1, ..., N} has the same probability to occur, since in this case the heap construction does preserve equidistribution ([3], Theorem 5.2.3.H; [1], Theorem 2). If, however, the original method to construct a heap due to Williams [7] is used, this Model is shown to be inadequate by Porter and Simon, since the latter method violates equidistribution.
Their Model II assumes consequently that the distribution of all possible heaps is generated by N-fold application of the algorithm to the heap a[1], ..., a[i-1] and the element a[i] which is to be inserted, $1 \le i \le N$, when initially all permutations of {1, ..., N} are equally probable.
Denote by A(N+1) the average number of levels the new element is sifted up, then Porter and Simon derive under Model I a recurrence formula for A, from which they deduce the following limiting behavior:
$$\lim_{L\to\infty} A(2^L) = \sum_{k=1}^{\infty} \frac{1}{2^k-1}, \qquad \lim_{L\to\infty} A(2^L-1) = 1.$$
Moreover they give empirical values for A under Model II.
In contrast to this, the present paper derives an explicit formula for A(N+1) in terms of the binary representation of N+1 under a slight generalization of Model I. It is assumed that the heap in question and the element to be inserted are real numbers (for simplicity taken from the open interval ]0,1[) which are chosen stochastically independently according to an equidistribution on an appropriately given probability space. Hence the model dealt with in this note makes use of measure theoretic (rather than combinatorial) arguments. Moreover, the method of working with random variables and their distributions which has been developed in the analysis of heapsort [1] turns out to be useful in the situation considered here, too. By a rather straightforward construction it is shown that Model I sketched above is contained in the assumptions here, and that consequently the results obtained here generalize those in [5].
This paper is organized as follows: in Section 2 some notations are fixed and some auxiliary results from Mathematical Analysis are cited for the reader's convenience. In Section 3 the average number of comparisons, i.e. the number of times the comparison (x<a[i]) is executed in order to control the while-loop, is investigated and the expectation of this random variable is computed. From this the number A(N+1) mentioned above is obtained. The formulae deduced for the average number of comparisons are rather compact. Hence in Section 4 an explicit representation is derived in case N+1 is a cornerstone, i.e. N+1 equals a power of 2 (in this case N+1 is the leftmost leaf on level 0 in the tree representing {1, ..., N+1}) or $N+1 = 2^{m+1}-1$ for some m (then N+1 is the rightmost leaf on level 0). This makes use of a finite version of Euler's Partition Formula ([4], p.10).
2. Some Preparations
Let a be a K-dimensional real array, then a is said to be a heap iff $a[\lfloor i/2 \rfloor] < a[i]$ holds for $2 \le i \le K$. Representing the finite set {1, ..., K} as a binary tree with root 1 such that $\lfloor i/2 \rfloor$ is the father of i, and assigning a[i] as a label to node i, this labeling is a heap iff no offspring has a smaller label than its father; in particular a[1] equals $\min_{1 \le i \le K} a[i]$. Now let $(d_k \dots d_0)_2$ be the binary representation of K, then the node
$$\tau(K,j) := \lfloor K/2^j \rfloor = (d_k \dots d_j)_2$$
is said to be the special node on level j ([3], p.154). The special nodes will play a central rôle in the following analysis; they are geometrically distinguished from the other nodes in the tree since $\{\tau(K,j);\ 0 \le j \le k\}$ forms the path from the root 1 to K. The following Figure 1 (borrowed from [3], p.154) shows the shape of a heap; each node is labeled in binary notation corresponding to its subscript in the heap.
- Fig. 1 -
Denote by g(K,i) the number of nodes which are in the subtree of {1, ..., K} that is rooted at i (thus e.g. g(K,1) = K, g(K,j) = 1 if $j > \lfloor K/2 \rfloor$); then it is well known that
$$g(K,\tau(K,j)) = (1\,d_{j-1} \dots d_0)_2$$
holds ([3], p.154, Eq. (14)). Finally, denote by $\chi_K$ the product of all subtree sizes, i.e.
$$\chi_K := \prod_{i=1}^{K} g(K,i).$$
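The quantities of this section are easy to compute for small trees. The following sketch is an illustration only: the function names `subtree_size`, `special_node` and `chi` are hypothetical, and it assumes that the special node on level j is $\lfloor K/2^j \rfloor$, the node on the path from the root to K that lies j levels above K. It checks numerically that the subtree rooted at the special node on level j has size $(1\,d_{j-1} \dots d_0)_2$.

```python
from math import prod


def subtree_size(K, i):
    """g(K, i): number of nodes of the tree {1..K} in the subtree rooted at i."""
    size, lo, hi = 0, i, i
    while lo <= K:
        size += min(hi, K) - lo + 1    # nodes of this subtree on the current level
        lo, hi = 2 * lo, 2 * hi + 1    # leftmost and rightmost descendants one level down
    return size


def special_node(K, j):
    """Node on the path from the root 1 to K, j levels above K (assumed definition)."""
    return K >> j


def chi(K):
    """Product of all subtree sizes of the tree {1..K}."""
    return prod(subtree_size(K, i) for i in range(1, K + 1))


K = 13                                 # binary 1101
for j in range(K.bit_length()):
    low_bits = K & ((1 << j) - 1)      # d_{j-1} ... d_0
    assert subtree_size(K, special_node(K, j)) == (1 << j) | low_bits
print(chi(7))
```

For K = 7 the product of subtree sizes is 7·3·3 = 63, so that 7!/63 = 80 of the 5040 permutations of seven distinct labels are heaps.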
Now fix a positive integer N as the size of the given heap, and let N+1 be the size of the heap to be constructed by inserting the new element; N+1 has the binary representation $(c_m \dots c_0)_2$, which is referred to in the sequel. It is assumed that the components of the given heap and the element to be inserted are taken from the open interval ]0,1[ and are mutually different, i.e. we consider
$$G := \{(x_1,\dots,x_{N+1});\ x_i \in\, ]0,1[,\ x_i \ne x_j \text{ if } i \ne j,\ (x_1,\dots,x_N) \text{ is a heap}\}$$
as the set of inputs for the algorithm presented above. Now endow G with its Borel sets $\mathcal{B}_G$, that is, with the smallest class of subsets of G which is closed under countable unions and intersections, and which contains the open sets of G. $\mathcal{B}_G$ is the domain for Lebesgue measure as the generalized volume. Moreover, fix a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ as well as a measurable map $\sigma: \Omega \to G$. Then under the observation of $\omega \in \Omega$ the algorithm is to be executed with $(\sigma_1(\omega),\dots,\sigma_N(\omega))$ as the given heap, and $\sigma_{N+1}(\omega)$ as the element to be inserted, respectively ($\sigma_j(\omega)$ denotes the j-th component of $\sigma(\omega)$). Denote by $\lambda^{N+1}$ the (N+1)-dimensional Lebesgue measure, then it is assumed that the probability for all those observations $\omega \in \Omega$ for which $\sigma(\omega) \in A$ ($A \in \mathcal{B}_G$ given) is essentially the (N+1)-dimensional volume of A. But since the probability in case A = G must be 1, a suitable factor must be found. From [1], Theorem 2 it is easily derived that
$$\lambda^{N+1}(G) = \frac{1}{\chi_N},$$
i.e. that the N-dimensional volume of the set of all heaps is the reciprocal of the product of all subtree sizes. Consequently we assume that
$$\mathbb{P}(\sigma \in A) := \mathbb{P}(\{\omega \in \Omega;\ \sigma(\omega) \in A\}) = \chi_N \cdot \lambda^{N+1}(A)$$
holds. Note that this assumption implies in particular that the heap $(\sigma_1,\dots,\sigma_N)$ and the element $\sigma_{N+1}$ to be inserted are stochastically independent random variables, as one would expect intuitively.
As in [1] we need the Change of Variable Formula for manipulations of integrals: Let X and Y be open subsets of $\mathbb{R}^k$ for some k such that X is the image of Y under some differentiable homeomorphism g. Then the integral over X for every integrable map $h: X \to \mathbb{R}$ can be evaluated as follows:
$$\int_X h\, d\lambda^k = \int_Y h(g(y_1,\dots,y_k))\, |\det g'(y_1,\dots,y_k)|\, dy_1 \dots dy_k = \int_Y h \circ g \cdot |\det g'|\, d\lambda^k$$
(see [6], Theorem 8.27). This Formula is applied now in order to demonstrate that the discrete case considered by Porter and Simon [5] is implicit in the continuous case via a simple transformation.
Let H be the set of all permutations $\pi$ of {1, ..., N+1} such that the vector $(\pi(1),\dots,\pi(N))$ meets the heap condition, and map G onto H by the map $\kappa$, where
$$\kappa(x) = \pi$$
such that $\pi(i) = k$ holds iff $x_i$ is the k-th smallest component of x. Since $(x_1,\dots,x_N)$ is a heap, so is $(\pi(1),\dots,\pi(N))$; this follows immediately from the construction. Now let
$$p_\pi := \mathbb{P}(\kappa \circ \sigma = \pi)$$
be the probability that the associated permutation is $\pi \in H$, hence
$$p_\pi = \chi_N\, \lambda^{N+1}(H_\pi),$$
where $x \in H_\pi$ iff $\kappa(x) = \pi$. It will be shown that $p_{\pi_1} = p_{\pi_2}$ holds whenever $\pi_1, \pi_2 \in H$.
This is done in the following way: let K be the uniquely determined permutation of {1, ..., N+1} such that
$$\pi_1 \circ K = \pi_2$$
holds, and associate with K the homeomorphism h which is defined by
$$h(x_1,\dots,x_{N+1}) := (x_{K(1)},\dots,x_{K(N+1)}),$$
then it is readily seen that
$$(\kappa \circ h)(x) = \kappa(x) \circ K$$
holds for any $x \in G$. Consequently we have
$$h[H_{\pi_1}] = H_{\pi_2}$$
and
$$\forall x \in G:\ |\det h'(x)| = 1.$$
Now the Change of Variable Formula applies:
$$p_{\pi_1} = \chi_N \int_{H_{\pi_1}} 1\, d\lambda^{N+1} = \chi_N \int_{h[H_{\pi_1}]} 1\, d\lambda^{N+1} = p_{\pi_2}$$
(considering 1 as the constant function 1). Consequently the discrete counterpart of the assumption on the equidistribution of $\sigma$ reads as follows: every heap with N elements which can be constructed from permutations of {1, ..., N+1} is equally likely. This is exactly the probabilistic assumption of Model I in [5], hence the results obtained here generalize those of Porter and Simon for their Model I.
Since we will obtain other random variables from $\sigma$ for which the distribution must be computed, we need another tool from Analysis, the Uniqueness Theorem for probability measures. Let U be an open set in $\mathbb{R}^k$ for some k, endowed with the Borel sets $\mathcal{B}_U$, and let $\mu_1, \mu_2$ be probability measures on $\mathcal{B}_U$. Then $\mu_1 = \mu_2$ holds iff
$$\int_U g\, d\mu_1 = \int_U g\, d\mu_2$$
holds for any bounded and continuous function $g: U \to \mathbb{R}$ ([6], Theorem 2.14).
3. The Analysis
Since in the representation of {1, ..., N+1} as a tree only the special path $\{\tau(N+1,j);\ 0 \le j \le m\}$ is of interest (the label to be inserted moves upward along the special path), we concentrate our efforts on this path and fade it out. This is done by means of a map F, and the composed random variable $F \circ \sigma$ will be investigated further. For this, define
$$Y := \{(y_0,\dots,y_m);\ y_i \in\, ]0,1[ \text{ are mutually different, and } y_1 > \dots > y_m\},$$
and endow Y with its Borel sets $\mathcal{B}_Y$. Then
$$F: G \to Y, \qquad (x_1,\dots,x_{N+1}) \mapsto (x_{\tau(N+1,0)},\dots,x_{\tau(N+1,m)})$$
is measurable, and onto. Now let
$$\mu(A) := \mathbb{P}(F \circ \sigma \in A)$$
be the probability that the special path in $(\sigma_1,\dots,\sigma_{N+1})$ is an element of the Borel set $A \subset Y$, then $\mu$ is of course a probability measure on $\mathcal{B}_Y$, and an explicit formula for $\mu$ will be derived now:
Theorem 1:
Let $\alpha(j) := 2^{j+c_j}-1$ and denote by $\Sigma_{N+1}$ the product of the sizes of all those subtrees which are rooted at nonspecial nodes. Then we have for any Borel set $A \subset Y$
$$\mu(A) = \frac{\chi_N}{\Sigma_{N+1}} \int_A \prod_{i=0}^{m-1} (1-y_{i+1})^{\alpha(i)}\, dy_0 \dots dy_m.$$
The proof of this Theorem requires some technical preparations (note that the integrand is independent of $y_0$). For this, let
$$H(s,a) := \{(x_1,\dots,x_{2^s-1});\ x_i \in\, ]a,1[ \text{ are mutually different},\ (x_1,\dots,x_{2^s-1}) \text{ forms a heap}\}$$
be the set of all those heaps with $2^s-1$ members which are bounded below by a. Elements of H(s,a) arise in a natural manner: consider $x \in G$, then the labels of all those nodes which are in the subtree rooted at the brother of $\tau(N+1,j)$ for some j with 0 < j < m form a heap from $H(j+c_j,\ x_{\tau(N+1,j+1)})$.
Lemma 1:
Given $0 \le a \le 1$,
$$\lambda^{2^s-1}(H(s,a)) = \frac{(1-a)^{2^s-1}}{\chi_{2^s-1}}$$
holds.
Proof: 1) Consider first the case a = 0; then the assertion is easily deduced from [1], Theorem 2. The general case will now be deduced from the special case a = 0 by means of the Change of Variable Formula. Before doing this note that if
$$A(s) := \{(x_1,\dots,x_{2^s-1});\ (1-x_1,\dots,1-x_{2^s-1}) \in H(s,0)\},$$
then the Lebesgue measures of A(s) and H(s,0) are equal.
2) Now let
$$\Phi: H(s,a) \to A(s), \qquad (x_1,\dots,x_{2^s-1}) \mapsto \frac{1}{1-a}\,(1-x_1,\dots,1-x_{2^s-1}),$$
then $\Phi$ is a continuously differentiable homeomorphism such that for the Jacobian
$$|\det \Phi'(x)| = \Bigl(\frac{1}{1-a}\Bigr)^{2^s-1}$$
holds for any $x \in H(s,a)$. Consequently we have
$$\frac{1}{\chi_{2^s-1}} = \lambda^{2^s-1}(A(s)) = \int_{H(s,a)} |\det \Phi'|\, d\lambda^{2^s-1} = \frac{\lambda^{2^s-1}(H(s,a))}{(1-a)^{2^s-1}}.$$
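Now take $y \in Y$ and consider the set $C_y$ of all $x \in G$ such that F(x) = y holds. From $x \in C_y$ an (N-m)-tuple x' is obtained by considering only the labels at the nonspecial nodes. Denote by G(y) the set of x' obtained from $x \in C_y$ in this manner. In terms of set theory, G(y) is the y-section of G, cp. [6], p.146. A formal definition reads as follows: let $A \subset \mathbb{R}^k$, and $(x_{i_1},\dots,x_{i_\ell}) \in \mathbb{R}^\ell$ for some $\ell < k$; then the $(x_{i_1},\dots,x_{i_\ell})$-section of A,
$$A(x_{i_1},\dots,x_{i_\ell}) \subset \mathbb{R}^{k-\ell},$$
is defined by
$$(x_1,\dots,x_{k-\ell}) \in A(x_{i_1},\dots,x_{i_\ell})$$
iff the vector in $\mathbb{R}^k$ which carries $x_{i_1},\dots,x_{i_\ell}$ in the positions $i_1,\dots,i_\ell$ and the components of $(x_1,\dots,x_{k-\ell})$, in their given order, in the remaining positions is an element of A.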
Lemma 2:
Given $(y_0,\dots,y_m) \in Y$, we have
$$\lambda^{N-m}(G(y)) = \frac{1}{\Sigma_{N+1}} \prod_{i=0}^{m-1} (1-y_{i+1})^{\alpha(i)}.$$
Proof: 0) If $B \subset \mathbb{R}^k$ for some k, denote by $\tilde B$ the set $B \cup \{(x_1,\dots,x_k);\ x_i = x_j \text{ for some } i \ne j\}$, then $\lambda^k(B) = \lambda^k(\tilde B)$ is easily seen. Sets of type $\tilde B$ are convenient to work with in what follows, since one need not keep an eye on mutually different components in the vectors under consideration, while the Lebesgue measure remains unchanged.
1) Let N+1 be even, and let $x(i) := (x_{i,1},\dots,x_{i,\alpha(i)})$ be an element of $H(i+c_i, y_{i+1})$ which will be recruited for a labeling of the tree representing {1, ..., N+1}. If $\beta(i)$ is the brother of $\tau(N+1,i)$, let x(i) be the vector of labels for the tree rooted at $\beta(i)$; more formally, the node $2^a \cdot \beta(i) + b$ has $x_{i,2^a+b}$ as its label (where $0 \le a \le i+c_i-1$, $0 \le b \le 2^a-1$). In this manner all nonspecial nodes get a label, and if the special path is labeled by y, then an element of $\tilde G$ is generated. This is so since N+1 is even, and consequently any nonspecial node has some $\beta(i)$ as its ancestor. Working in this manner, a homeomorphism
$$T: \Bigl(\prod_{i=1}^{m-1} H(i+c_i, y_{i+1})\Bigr)^{\sim} \to \widetilde{G(y)}$$
is obtained, and the Change of Variable Formula asserts that
$$\lambda^{N-m}(G(y)) = \prod_{i=1}^{m-1} \frac{1}{\chi_{\alpha(i)}} \cdot \prod_{i=0}^{m-1} (1-y_{i+1})^{\alpha(i)} \qquad (1)$$
holds by Lemma 1. Note that in the case considered $\alpha(0) = 0$.
2) In case N+1 is odd, label the tree as above. But since N is in no subtree which has $\beta(k)$ for some k as its root, this node has not yet been assigned a label. This is done with an element of the interval $]y_1,1[$, and we get a map
$$T: \Bigl(\,]y_1,1[\ \times \prod_{i=1}^{m-1} H(i+c_i, y_{i+1})\Bigr)^{\sim} \to \widetilde{G(y)},$$
thus Eq. (1) is established again, since $\alpha(0) = 1$ holds in this case.
3) It is now an easily proved observation that
$$\prod_{i=1}^{m-1} \chi_{\alpha(i)} = \Sigma_{N+1}$$
holds, since for any node j which has more than one offspring there exists exactly one k such that $\beta(k)$ is the ancestor of j. This completes the proof.
Now a proof for Theorem 1 can be accomplished; for this, let $\varphi$ be a bounded and continuous function $Y \to \mathbb{R}$, then we have
$$\int_Y \varphi\, d\mu = \chi_N \int_G \varphi(F(x_1,\dots,x_{N+1}))\, dx_1 \dots dx_{N+1}$$
$$\overset{*}{=} \chi_N \int_Y \Bigl[\int_{G(y_0,\dots,y_m)} \varphi(y_0,\dots,y_m)\, dx_1 \dots dx_{N-m}\Bigr] dy_0 \dots dy_m$$
$$\overset{*}{=} \chi_N \int_Y \varphi(y_0,\dots,y_m) \Bigl[\int_{G(y_0,\dots,y_m)} 1\, dx_1 \dots dx_{N-m}\Bigr] dy_0 \dots dy_m$$
$$= \frac{\chi_N}{\Sigma_{N+1}} \int_Y \varphi(y_0,\dots,y_m) \prod_{i=0}^{m-1} (1-y_{i+1})^{\alpha(i)}\, dy_0 \dots dy_m;$$
equalities marked with an asterisk derive from Fubini's Theorem, see [6], Theorem 7.8, and the last equality is Lemma 2. This chain of equalities proves Theorem 1, having the Uniqueness Theorem for probabilities in mind.
Now let us abbreviate $F \circ \sigma$ by $\zeta$, then $\zeta$ is a random variable the distribution of which is described in Theorem 1. If $\omega \in \Omega$ is observed such that for $y := \zeta(\omega)$ the relations
$$y_1 > \dots > y_i > y_0 > y_{i+1} > \dots > y_m$$
hold, then $y_0$ is inserted between $y_i$ and $y_{i+1}$. In this situation the algorithm considered needs i+1 comparisons. This motivates the split of Y into subsets according to the position of $y_0$ in the chain $(y_1,\dots,y_m)$. If $y \in Y$ and 0 < i < m are given, let $y \in Y_i$ iff $y_0$ is in the interval $]y_{i+1}, y_i[$. $Y_0$ and $Y_m$ are defined as the sets of all vectors $y \in Y$ in which the first component $y_0$ is the greatest, and the smallest component, respectively. Now define the random variable $T: \Omega \to \{0,\dots,m\}$ by
$$T(\omega) := \begin{cases} i+1, & \text{if } \zeta(\omega) \in Y_i \text{ and } i < m, \\ m, & \text{if } \zeta(\omega) \in Y_m, \end{cases}$$
then the expectation $\mathbb{E}(T)$ of T is the looked-for quantity which will be calculated now. It is easy to see that
$$\mathbb{E}(T) = \sum_{i=0}^{m-1} (i+1) \cdot \mathbb{P}(\zeta \in Y_i) + m \cdot \mathbb{P}(\zeta \in Y_m).$$
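Consequently, $\mathbb{P}(\zeta \in Y_i)$ should be known. The following auxiliary result makes the task of computing this probability easier. Let $\bar\varepsilon_i := \varepsilon_i + 1$.
Lemma 3:
$$\int_{Y_0} \prod_{i=0}^{m} (1-y_i)^{\varepsilon_i}\, dy_0 \dots dy_m = \frac{1}{\bar\varepsilon_0 (\bar\varepsilon_0 + \bar\varepsilon_1) \cdots (\bar\varepsilon_0 + \bar\varepsilon_1 + \dots + \bar\varepsilon_m)},$$
provided $\varepsilon_i \ge 0$.
Proof: By the definition of $Y_0$ the integral in question equals
$$\int_0^1 (1-y_m)^{\varepsilon_m} \int_{y_m}^1 (1-y_{m-1})^{\varepsilon_{m-1}} \cdots \int_{y_1}^1 (1-y_0)^{\varepsilon_0}\, dy_0 \dots dy_m.$$
Solving the innermost integral reduces the number of integrals by one, and proceeding in this manner the asserted equation is obtained.
This result enables us to compute $\mathbb{P}(\zeta \in Y_i)$.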
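As a worked instance of the iterated integration used for Lemma 3, consider the smallest case m = 1. The computation below is an illustration added for the reader; it assumes that $Y_0$ for m = 1 is the set of pairs with $1 > y_0 > y_1 > 0$ and that $\bar\varepsilon_i$ abbreviates $\varepsilon_i + 1$.

$$\int_0^1 (1-y_1)^{\varepsilon_1} \int_{y_1}^1 (1-y_0)^{\varepsilon_0}\, dy_0\, dy_1 = \int_0^1 (1-y_1)^{\varepsilon_1}\, \frac{(1-y_1)^{\varepsilon_0+1}}{\varepsilon_0+1}\, dy_1 = \frac{1}{(\varepsilon_0+1)(\varepsilon_0+\varepsilon_1+2)} = \frac{1}{\bar\varepsilon_0 (\bar\varepsilon_0+\bar\varepsilon_1)}.$$

Each integration raises the exponent of the next factor by the accumulated sum of the $\bar\varepsilon_i$, which is why the partial sums appear in the denominator.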
Proposition:
In case i > 0, we have
$$\mathbb{P}(\zeta \in Y_i) = \frac{1}{(1c_{i-1} \dots c_0)_2} \prod_{j=i+1}^{m} \Bigl[1 - \frac{1}{(1c_{j-1} \dots c_0)_2}\Bigr];$$
if however i = 0,
$$\mathbb{P}(\zeta \in Y_0) = \prod_{j=1}^{m} \Bigl[1 - \frac{1}{(1c_{j-1} \dots c_0)_2}\Bigr].$$
Proof: 1) An argument analogous to that in the proof of Lemma 3 demonstrates that
$$\int_Y \prod_{i=0}^{m-1} (1-y_{i+1})^{\alpha(i)}\, dy_0 \dots dy_m = \frac{1}{\bar\alpha(0) (\bar\alpha(0) + \bar\alpha(1)) \cdots (\bar\alpha(0) + \dots + \bar\alpha(m-1))}$$
holds. But $\bar\alpha(i) = \alpha(i) + 1 = 2^i \cdot 2^{c_i}$, and $2^{c_i} - c_i$ is always equal to 1, hence
$$\sum_{i=0}^{k} \bar\alpha(i) = 2^{k+1} - 1 + \sum_{i=0}^{k} c_i 2^i = (1c_k \dots c_0)_2 - 1 = g(N+1,\tau(N+1,k+1)) - 1.$$
Letting A = Y in Theorem 1, it is seen that
$$1 = \mu(Y) = \frac{\chi_N}{\Sigma_{N+1}} \cdot \frac{1}{\prod_{i=1}^{m} (g(N+1,\tau(N+1,i)) - 1)},$$
hence there holds for any Borel set $A \subset Y$
$$\mu(A) = \prod_{i=1}^{m} (g(N+1,\tau(N+1,i)) - 1) \cdot \int_A \prod_{i=0}^{m-1} (1-y_{i+1})^{\alpha(i)}\, dy_0 \dots dy_m.$$
2) Fix i > 0, and let
$$\varphi_i: Y_0 \to Y_i$$
be the inverse sorting function, then $\varphi_i$ is evidently a homeomorphism with its Jacobian $|\det \varphi_i'|$ identical to unity. Consequently we have
$$\mathbb{P}(\zeta \in Y_i) = \prod_{j=1}^{m} \{g(N+1,\tau(N+1,j)) - 1\} \cdot \int_{Y_i} \prod_{j=0}^{m-1} (1-y_{j+1})^{\alpha(j)}\, dy_0 \dots dy_m$$
$$= \prod_{j=1}^{m} \{\dots\} \cdot \int_{Y_i} h(y_0,\dots,y_m)\, dy_0 \dots dy_m$$
$$= \prod_{j=1}^{m} \{\dots\} \cdot \int_{Y_0} h(\varphi_i(y_0,\dots,y_m))\, dy_0 \dots dy_m$$
$$= \prod_{j=1}^{m} \{\dots\} \cdot \int_{Y_0} (1-y_0)^{\alpha(0)} \cdots (1-y_{i-1})^{\alpha(i-1)} (1-y_i)^{0} (1-y_{i+1})^{\alpha(i)} \cdots (1-y_m)^{\alpha(m-1)}\, dy_0 \dots dy_m$$
by the Change of Variable Formula, where, of course,
$$h(y_0,\dots,y_m) := \prod_{j=0}^{m-1} (1-y_{j+1})^{\alpha(j)}$$
is written for abbreviation. Together with Lemma 3 this proves the first part of the Proposition. On considering
$$\int_{Y_0} (1-y_0)^{0} (1-y_1)^{\alpha(0)} \cdots (1-y_m)^{\alpha(m-1)}\, dy_0 \dots dy_m$$
the second part is proved, and this proves the Proposition.
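On collecting the above considerations, the following is proved.
Theorem 2:
The expected number $\mathbb{E}(T)$ of times the comparison (x<a[i]) is executed in the Insertion Algorithm equals
$$\mathbb{E}(T) = \frac{N+m}{N+1} + \frac{N}{N+1} \sum_{i=1}^{m-1} \frac{i}{(1c_{i-1} \dots c_0)_2} \prod_{j=i+1}^{m-1} \Bigl(1 - \frac{1}{(1c_{j-1} \dots c_0)_2}\Bigr).$$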
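The expectation can be checked numerically for small heaps. The sketch below is an illustration, not the author's computation: the function names are hypothetical, the probabilities are taken from the Proposition (with the case i = 0 covered by reading the empty binary string as size 1), and the comparison is made against the discrete Model I, i.e. an average over all permutations of {1, ..., N+1} whose first N entries form a heap.

```python
from fractions import Fraction
from itertools import permutations


def level_probabilities(n1):
    """P(zeta in Y_i) for i = 0..m, where n1 = N+1 is the size of the enlarged heap."""
    m = n1.bit_length() - 1
    # s[j] = (1 c_{j-1} ... c_0)_2; s[0] = 1 handles the case i = 0 uniformly
    s = [(1 << j) | (n1 & ((1 << j) - 1)) for j in range(m + 1)]
    probs = []
    for i in range(m + 1):
        p = Fraction(1, s[i])
        for j in range(i + 1, m + 1):
            p *= 1 - Fraction(1, s[j])
        probs.append(p)
    return probs


def expected_comparisons(n1):
    """E(T) = sum_{i<m} (i+1) P(Y_i) + m P(Y_m), evaluated exactly."""
    m = n1.bit_length() - 1
    p = level_probabilities(n1)
    return sum((i + 1) * p[i] for i in range(m)) + m * p[m]


def brute_force(n1):
    """Average number of comparisons under the discrete Model I (small n1 only)."""
    total = count = 0
    for perm in permutations(range(1, n1 + 1)):
        a = (None,) + perm
        if any(a[i // 2] > a[i] for i in range(2, n1)):
            continue                      # the first N entries are not a heap
        x, i, t = a[n1], n1 // 2, 0
        while i > 0:
            t += 1                        # one execution of x < a[i]
            if not x < a[i]:
                break
            i //= 2
        total += t
        count += 1
    return Fraction(total, count)


for n1 in range(2, 8):
    assert expected_comparisons(n1) == brute_force(n1)
print(expected_comparisons(4))
```

For N+1 = 4 the three probabilities are 3/8, 3/8 and 1/4, which gives 13/8 comparisons on average.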
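Now let L be the number of levels the new element moves up in the tree (and indeed it is this quantity which is investigated by Porter and Simon in [5]). Since $L(\omega) = i$ iff $\zeta(\omega) \in Y_i$ for $0 \le i \le m$, we have for the expectation of L:
$$\mathbb{E}(L) = \mathbb{E}(T) - \frac{N}{N+1} = \frac{m}{N+1} + \frac{N}{N+1} \sum_{i=1}^{m-1} \frac{i}{(1c_{i-1} \dots c_0)_2} \prod_{j=i+1}^{m-1} \Bigl(1 - \frac{1}{(1c_{j-1} \dots c_0)_2}\Bigr).$$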
This is the explicit representation of the expected value investigated in [5], which is called A(N+1) there. Porter and Simon obtain a recurrence formula for A by bringing A(N+1) in relation to the value of A when the tree rooted at $\tau(N+1,m-1)$, i.e. rooted at the special node which has 1 as its father, is considered ([5], Theorem 1 and the first paragraph of its proof). This recurrence relation reads here
$$A(N+1) = \frac{m}{N+1} + \frac{N}{N+1}\, A\bigl((1c_{m-2} \dots c_0)_2\bigr) \qquad (2)$$
and can be obtained from $\mathbb{E}(L)$ now without combinatorial reasonings.
By the way, it might be interesting to know whether one can deduce the explicit form of $\mathbb{E}(L)$ from Eq. (2) by using techniques from the Calculus of Finite Differences ([2], Chapter XI). By some tricky manipulations Porter and Simon derive
$$\lim_{m\to\infty} A(2^m) = \sum_{k=1}^{\infty} \frac{1}{2^k-1}$$
and
$$\lim_{m\to\infty} A(2^{m+1}-1) = 1$$
([5], Theorems 3, 4), and this yields
Corollary:
In case N+1 is a power of 2, $\mathbb{E}(T)$ equals asymptotically
$$1 + \sum_{k=1}^{\infty} \frac{1}{2^k-1} = 2.60669515\ldots,$$
in case N+2 is a power of 2, $\mathbb{E}(T)$ equals asymptotically 2.
This statement gives the asymptotic bounds for $\mathbb{E}(T)$ in case N+1 is the left- or the rightmost node in the tree representing {1, ..., N+1}. The next Section deals with a simple asymptotic expansion of $\mathbb{E}(T)$ in these cases.
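The constant of the Corollary can be reproduced numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, floating point arithmetic is used, and for $N+1 = 2^m$ the probability that the new element moves up exactly i levels is taken as $2^{-i} \prod_{j=i+1}^{m} (1 - 2^{-j})$, which is the Proposition specialized to $c_0 = \dots = c_{m-1} = 0$.

```python
def limit_constant(terms=60):
    """Partial sum of 1 + sum_{k>=1} 1/(2^k - 1)."""
    return 1.0 + sum(1.0 / (2**k - 1) for k in range(1, terms + 1))


def expected_comparisons_power_of_two(m):
    """E(T) for N+1 = 2^m; the special subtree on level j has size 2^j."""
    tail = 1.0    # running product of (1 - 2^-j) for j = i+1 .. m
    total = 0.0
    for i in range(m, -1, -1):
        p = tail / 2**i                          # probability of moving up i levels
        total += (m if i == m else i + 1) * p    # comparisons needed in that case
        tail *= 1.0 - 1.0 / 2**i                 # extend the product to include j = i
    return total


print(limit_constant())
print(expected_comparisons_power_of_two(40))
```

The second value approaches the first as m grows, with a gap of order $m/2^m$.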
4. Cornerstones
Let $N+1 = 2^{m+1}-1$, then $c_i = 1$ for $0 \le i \le m$, hence
$$\sum_{i=1}^{m-1} \frac{i}{(1c_{i-1} \dots c_0)_2} \prod_{j=i+1}^{m-1} \Bigl(1 - \frac{1}{(1c_{j-1} \dots c_0)_2}\Bigr) = \frac{N-2m}{N}.$$
This implies
$$\mathbb{E}(T) = \frac{2N-m}{N+1}$$
as the average number of comparisons, and
$$\mathbb{E}(L) = \frac{N-m}{N+1}$$
as the average number of levels the new element moves up the tree.
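A small worked case may help to see these closed forms at work. The example is added for illustration and uses the probabilities of the Proposition in Section 3. Take m = 2, i.e. N+1 = 7 and N = 6, so that the special subtrees on levels 1 and 2 have sizes $(11)_2 = 3$ and $(111)_2 = 7$:

$$\mathbb{P}(\zeta \in Y_0) = \frac{2}{3} \cdot \frac{6}{7} = \frac{4}{7}, \qquad \mathbb{P}(\zeta \in Y_1) = \frac{1}{3} \cdot \frac{6}{7} = \frac{2}{7}, \qquad \mathbb{P}(\zeta \in Y_2) = \frac{1}{7},$$

$$\mathbb{E}(T) = 1 \cdot \frac{4}{7} + 2 \cdot \frac{2}{7} + 2 \cdot \frac{1}{7} = \frac{10}{7} = \frac{2N-m}{N+1}, \qquad \mathbb{E}(L) = 1 \cdot \frac{2}{7} + 2 \cdot \frac{1}{7} = \frac{4}{7} = \frac{N-m}{N+1}.$$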
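Let us have a look at the case $N+1 = 2^m$, then $c_i = 0$ for $0 \le i \le m-1$, and
$$\sum_{i=1}^{m-1} \frac{i}{2^i} \prod_{j=i+1}^{m-1} \Bigl(1 - \frac{1}{2^j}\Bigr) = \sum_{i=1}^{m-1} \frac{i}{2^i} \prod_{j=1}^{m-i-1} \Bigl(1 - \frac{1}{2^i} \cdot \frac{1}{2^j}\Bigr)$$
has to be computed. From the Theory of Partitions it is seen that ([4], p.10, Article 246)
$$\prod_{j=1}^{n} (1 - a x^j) = 1 + \sum_{r=1}^{n} (-1)^r a^r x^{r(r+1)/2}\, \frac{(1-x^{n-r+1}) \cdots (1-x^n)}{(1-x) \cdots (1-x^r)}$$
$$= 1 + \sum_{r=1}^{n} (-1)^r a^r\, \frac{2^{r(r-1)/2}}{1 \cdot 3 \cdots (2^r-1)} \cdot \frac{(2^{n-r+1}-1) \cdots (2^n-1)}{2^{r \cdot n}} \qquad \text{if } x = \tfrac{1}{2}.$$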
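The finite partition formula for x = 1/2 can be verified with exact rational arithmetic. The following sketch is an illustration only; the function names `product_side` and `sum_side` are hypothetical, and $1 \cdot 3 \cdots (2^r-1)$ is read as the product of $2^k - 1$ for k = 1, ..., r.

```python
from fractions import Fraction


def product_side(n, a):
    """prod_{j=1}^{n} (1 - a / 2^j)."""
    p = Fraction(1)
    for j in range(1, n + 1):
        p *= 1 - a * Fraction(1, 2**j)
    return p


def sum_side(n, a):
    """1 + sum_{r=1}^{n} (-1)^r a^r 2^{r(r-1)/2} (2^{n-r+1}-1)...(2^n-1) / (1*3*...*(2^r-1) * 2^{rn})."""
    total = Fraction(1)
    for r in range(1, n + 1):
        odd = 1    # 1 * 3 * 7 * ... * (2^r - 1)
        top = 1    # (2^(n-r+1) - 1) * ... * (2^n - 1)
        for k in range(1, r + 1):
            odd *= 2**k - 1
            top *= 2**(n - r + k) - 1
        total += (-1)**r * a**r * Fraction(2**(r * (r - 1) // 2) * top, odd * 2**(r * n))
    return total


a = Fraction(1, 4)
print(product_side(5, a), sum_side(5, a))
```

In the application above the parameter is $a = 2^{-i}$ and $n = m-i-1$.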
Now an easy inductive reasoning shows that
$$\frac{2^{r(r-1)/2}}{1 \cdot 3 \cdots (2^r-1)} \cdot \frac{(2^{n-r+1}-1) \cdots (2^n-1)}{2^{r \cdot n}} = \frac{1}{1 \cdot 3 \cdots (2^r-1)} + \frac{(-1)^r\, 2^{r(r-1)/2}}{1 \cdot 3 \cdots (2^r-1)} \cdot \frac{1}{2^{r \cdot n}} + \sum_{t=1}^{r-1} \frac{(-1)^t\, 2^{t(t-1)/2}}{1 \cdot 3 \cdots (2^t-1)} \cdot \frac{1}{1 \cdot 3 \cdots (2^{r-t}-1)} \cdot \frac{1}{2^{t \cdot n}}. \qquad (3)$$
Before evaluating this somewhat clumsy sum let us consider the case in which the summation in the partition formula above is terminated before n:
$$\sum_{j=R+1}^{n} \frac{a^j}{1 \cdot 3 \cdots (2^j-1)} < \frac{a^{R+1}}{2^{R(R-1)/2}}.$$
In case $a \le \frac{1}{2}$ we have an error which is smaller than $\frac{1}{2} \cdot \frac{1}{2^{R(R-1)/2}}$. Now letting R = 4 and expanding by means of (3), it is seen that
$$\sum_{i=1}^{m-1} \frac{i}{2^i} \prod_{j=i+1}^{m-1} \Bigl(1 - \frac{1}{2^j}\Bigr) = \alpha + \frac{-66150\,m + 39533}{33075\,(N+1)} + \frac{883\,m - 1201}{2646\,(N+1)^2} + \frac{315\,m + 199}{2646\,(N+1)^3} + \varepsilon$$
with $0 \le \varepsilon < 10^{-3}$. From this
$$\mathbb{E}(T) = \frac{N(1+\alpha)+m}{N+1} + \frac{N}{N+1} \Bigl[\frac{-66150\,m + 39533}{33075\,(N+1)} + \frac{883\,m - 1201}{2646\,(N+1)^2} + \frac{315\,m + 199}{2646\,(N+1)^3} + \varepsilon\Bigr] \qquad (0 \le \varepsilon \le 10^{-3})$$
is computed in case $N+1 = 2^m$. In principle the exactness can be as great as we want (but the effort to compute the corresponding values grows somewhat exponentially).
- Fig. 2 -
Using well-known identities it is possible to give an explicit representation for the expectation of T in case N+1 is no cornerstone at all. But this representation seems not to give additional information, and instead of showing it here Figure 2 displays the behavior of $\mathbb{E}(T)$ in an interval.
Acknowledgement:
Help from Uwe Manthey for computing the values depicted in Figure 2 is gratefully acknowledged.
References:
[1] E.-E. Doberkat: A Note on the Average Behavior of Heapsort. Informatik-Berichte Nr. 3, Department of Mathematics, University of Hagen, February 1980; submitted for publication
[2] Ch. Jordan: Calculus of Finite Differences. Budapest, 1939 (reprinted by Chelsea, New York, 1965, Third Edition)
[3] D.E. Knuth: The Art of Computer Programming, vol. III - Sorting and Searching. Addison-Wesley, Reading, Mass., 1973
[4] Maj. P.A. MacMahon: Combinatory Analysis, vol. II. Cambridge University Press, 1916 (reprinted by Chelsea, New York, 1960)
[5] Th. Porter and I. Simon: Random Insertion into a Priority Queue Structure. IEEE Trans. Softw. Engineering SE-1 (1975), 292-298
[6] W. Rudin: Real and Complex Analysis, Second Edition. Tata McGraw-Hill Publishing Co., New Delhi, 1974
[7] J.W.J. Williams: Algorithm 232: HEAPSORT. Comm. ACM 7 (1964), 347-348.