
The Search Problem


Academic year: 2021


4. Searching

Linear Search, Binary Search, Interpolation Search, Lower Bounds [Ottman/Widmayer, Ch. 3.2; Cormen et al., Ch. 2: Problems 2.1-3, 2.2-3, 2.3-5]


The Search Problem

Provided

A set of data sets.

Examples: telephone book, dictionary, symbol table.

Each data set has a key $k$.

Keys are comparable: there is a unique answer to the question $k_1 \le k_2$ for keys $k_1, k_2$.

Task: find a data set by its key $k$.


The Selection Problem

Provided

A set of data sets with comparable keys $k$.

Wanted: the data set with the smallest, largest, or median key value. In general:

find the data set with the $i$-th smallest key.

Search in Array

Provided

Array $A$ with $n$ elements $(A[1], \dots, A[n])$.

Key $b$.

Wanted: index $k$, $1 \le k \le n$, with $A[k] = b$, or "not found".

Example array (index: value):

1: 22, 2: 20, 3: 32, 4: 10, 5: 35, 6: 24, 7: 42, 8: 38, 9: 28, 10: 41

Linear Search

Traverse the array from $A[1]$ to $A[n]$.

Best case: 1 comparison.

Worst case: $n$ comparisons.

Assumption: each permutation of the $n$ keys has the same probability. Expected number of comparisons for a successful search:

$$\frac{1}{n} \sum_{i=1}^{n} i = \frac{n+1}{2}.$$
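The comparison counts above can be checked with a minimal Python sketch. It assumes 0-based indices and returns -1 for "not found" (the slides use 1-based indices and 0); the name `linear_search` and the example keys are illustrative only.

```python
def linear_search(a, b):
    """Return (index, comparisons); index is -1 if b does not occur in a."""
    comparisons = 0
    for k, key in enumerate(a):
        comparisons += 1          # one key comparison per visited element
        if key == b:
            return k, comparisons
    return -1, comparisons


# Illustrative unsorted keys (an assumption, not prescribed by the slides).
keys = [22, 20, 32, 10, 35, 24, 42, 38, 28, 41]

# Averaging over all successful searches gives (n + 1) / 2 comparisons.
average = sum(linear_search(keys, b)[1] for b in keys) / len(keys)
```

For these ten keys the average is $(10 + 1)/2 = 5.5$ comparisons, while an unsuccessful search always inspects all ten elements.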


Search in a Sorted Array

Provided

Sorted array $A$ with $n$ elements $(A[1], \dots, A[n])$, $A[1] \le A[2] \le \dots \le A[n]$.

Key $b$.

Wanted: index $k$, $1 \le k \le n$, with $A[k] = b$, or "not found".

Example array (index: value):

1: 10, 2: 20, 3: 22, 4: 24, 5: 28, 6: 32, 7: 35, 8: 38, 9: 41, 10: 42

Divide and Conquer!

Search for $b = 23$ in the sorted example array (10, 20, 22, 24, 28, 32, 35, 38, 41, 42):

1. Compare with $A[5] = 28$: $b < 28$, continue in $A[1..4]$.
2. Compare with $A[2] = 20$: $b > 20$, continue in $A[3..4]$.
3. Compare with $A[3] = 22$: $b > 22$, continue in $A[4..4]$.
4. Compare with $A[4] = 24$: $b < 24$, the remaining range is empty.

Result: unsuccessful search.

Binary Search Algorithm BSearch(A, b, l, r)

Input: Sorted array A of n keys. Key b. Bounds 1 ≤ l ≤ r ≤ n, or l > r arbitrary.

Output: Index of the found element. 0, if not found.

m ← ⌊(l + r)/2⌋
if l > r then                // unsuccessful search
    return 0
else if b = A[m] then        // found
    return m
else if b < A[m] then        // element to the left
    return BSearch(A, b, l, m - 1)
else                         // b > A[m]: element to the right
    return BSearch(A, b, m + 1, r)
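The recursive procedure translates almost line by line into Python. The following is a sketch under the assumption of 0-based, inclusive bounds and -1 as the "not found" value (the slides use 1-based indices and 0); the function name `bsearch` is illustrative.

```python
def bsearch(a, b, l, r):
    """Recursive binary search for b in the sorted range a[l..r] (inclusive)."""
    if l > r:                      # empty range: unsuccessful search
        return -1
    m = (l + r) // 2               # middle position, rounded down
    if b == a[m]:                  # found
        return m
    if b < a[m]:                   # element can only be to the left
        return bsearch(a, b, l, m - 1)
    return bsearch(a, b, m + 1, r) # element can only be to the right


sorted_keys = [10, 20, 22, 24, 28, 32, 35, 38, 41, 42]
```

A call such as `bsearch(sorted_keys, 23, 0, len(sorted_keys) - 1)` probes the values 28, 20, 22 and 24 and then reports an unsuccessful search.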


Analysis (worst case)

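Recurrence ($n = 2^k$):

$$T(n) = \begin{cases} d & \text{if } n = 1, \\ T(n/2) + c & \text{if } n > 1. \end{cases}$$

Compute:

$$T(n) = T\left(\frac{n}{2}\right) + c = T\left(\frac{n}{4}\right) + 2c = \dots = T\left(\frac{n}{2^i}\right) + i \cdot c = \dots = T\left(\frac{n}{n}\right) + \log_2 n \cdot c.$$

$\Rightarrow$ Conjecture: $T(n) = d + c \log_2 n$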


Analysis (worst case)

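$$T(n) = \begin{cases} d & \text{if } n = 1, \\ T(n/2) + c & \text{if } n > 1. \end{cases}$$

Guess: $T(n) = d + c \cdot \log_2 n$

Proof by induction:

Base case: $T(1) = d$.

Hypothesis: $T(n/2) = d + c \cdot \log_2 (n/2)$

Step ($n/2 \to n$):

$$T(n) = T(n/2) + c = d + c \cdot (\log_2 n - 1) + c = d + c \log_2 n.$$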


Result

Theorem

Binary search on a sorted array requires $\Theta(\log n)$ elementary operations in the worst case.

Iterative Binary Search Algorithm

Input: Sorted array A of n keys. Key b.

Output: Index of the found element. 0, if unsuccessful.

l ← 1; r ← n
while l ≤ r do
    m ← ⌊(l + r)/2⌋
    if A[m] = b then
        return m
    else if A[m] < b then
        l ← m + 1
    else
        r ← m - 1
return 0
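The iterative variant can be sketched in Python as follows, again assuming 0-based indices and -1 for an unsuccessful search; the name `binary_search` is illustrative.

```python
def binary_search(a, b):
    """Iterative binary search in the sorted list a; returns an index or -1."""
    l, r = 0, len(a) - 1
    while l <= r:                  # the range a[l..r] is still non-empty
        m = (l + r) // 2
        if a[m] == b:
            return m
        elif a[m] < b:
            l = m + 1              # continue in the right half
        else:
            r = m - 1              # continue in the left half
    return -1                      # range became empty: not found


sorted_keys = [10, 20, 22, 24, 28, 32, 35, 38, 41, 42]
```

In everyday Python code the standard library module `bisect` (for example `bisect.bisect_left`) covers the same task; the explicit loop is shown here only to mirror the pseudocode.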


Correctness

The algorithm terminates only if the search range $A[l, \dots, r]$ is empty or $b$ is found.

Invariant: if $b$ is in $A$, then $b$ is in the range $A[l, \dots, r]$.

Proof by induction

Base case: $b \in A[1, \dots, n]$ (or not).

Hypothesis: the invariant holds after $i$ steps.

Step:

$b < A[m] \Rightarrow b \in A[l, \dots, m-1]$

$b > A[m] \Rightarrow b \in A[m+1, \dots, r]$


Can this be improved?

Assumption: the values of the array are uniformly distributed.

Example

Search for "Becker" near the very beginning of a telephone book, but for "Wawrinka" rather close to the end.

Binary search always starts in the middle.

Binary search always takes $m = \left\lfloor l + \frac{r-l}{2} \right\rfloor$.


Interpolation search

Expected relative position of $b$ in the search interval $[l, r]$:

$$\rho = \frac{b - A[l]}{A[r] - A[l]} \in [0, 1].$$

New "middle": $l + \rho \cdot (r - l)$

Expected number of comparisons: $O(\log \log n)$ (without proof).

? Would you always prefer interpolation search?

! No: the worst-case number of comparisons is $\Omega(n)$.
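A possible Python sketch of interpolation search is shown below. It assumes numeric keys, 0-based indices, rounding the interpolated position down, and -1 for "not found"; the name `interpolation_search` and the handling of equal boundary keys are choices made for this example, not taken from the slides.

```python
def interpolation_search(a, b):
    """Interpolation search for the number b in the sorted list a."""
    l, r = 0, len(a) - 1
    # b can only occur in a[l..r] if it lies between the boundary keys.
    while l <= r and a[l] <= b <= a[r]:
        if a[l] == a[r]:
            return l               # all keys in range are equal, hence equal to b
        rho = (b - a[l]) / (a[r] - a[l])   # expected relative position in [0, 1]
        m = l + int(rho * (r - l))         # interpolated "middle"
        if a[m] == b:
            return m
        elif a[m] < b:
            l = m + 1
        else:
            r = m - 1
    return -1


sorted_keys = [10, 20, 22, 24, 28, 32, 35, 38, 41, 42]
```

With strongly skewed keys (for instance one very large outlier at the end) the interpolated position advances by only one element per step, which illustrates the linear worst case.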

Exponential search

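Assumption: key $b$ is located somewhere near the beginning of the array $A$, and $n$ is very large.

Exponential procedure:

1 Determine the search domain, starting with $l = r = 1$.

2 Double $r$ until $r > n$ or $A[r] > b$.

3 Set $r \leftarrow \min(r, n)$.

4 Conduct a binary search on $A[l, \dots, r]$ with $l \leftarrow \max(1, \lfloor r/2 \rfloor)$.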
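The four steps can be sketched in Python as follows. The sketch keeps the 1-based bounds `l` and `r` of the slides internally and converts to 0-based list indices when accessing the list; it returns a 0-based index or -1. The name `exponential_search` and the clamping of `l` to at least 1 are assumptions of this example.

```python
def exponential_search(a, b):
    """Exponential search for b in the sorted list a; returns an index or -1."""
    n = len(a)
    if n == 0:
        return -1
    r = 1                                  # step 1: start with r = 1
    while r <= n and a[r - 1] <= b:        # step 2: double until r > n or A[r] > b
        r *= 2
    r = min(r, n)                          # step 3
    l = max(1, r // 2)                     # step 4: left bound of the binary search
    lo, hi = l - 1, r - 1                  # switch to 0-based list indices
    while lo <= hi:
        m = (lo + hi) // 2
        if a[m] == b:
            return m
        elif a[m] < b:
            lo = m + 1
        else:
            hi = m - 1
    return -1


sorted_keys = [10, 20, 22, 24, 28, 32, 35, 38, 41, 42]
```

For a key near the front, such as 20, the doubling phase stops after very few steps, so the following binary search only has to look at a short prefix of the array.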


Analysis of the Exponential Search

Let $m$ be the wanted index.

Number of steps for the doubling of $r$: $O(\log m)$ (about $\log_2 m$ doublings).

The binary search then also takes $O(\log m)$ steps.

Worst-case number of steps overall: $O(\log n)$.

? When does this procedure make sense?

! If $m \ll n$. For example, for positive, pairwise distinct integer keys and $b \ll N$ ($N$: largest key value).


Lower Bounds

Binary and exponential search (worst case): $\Theta(\log n)$ comparisons.

Does every search algorithm on a sorted array need $\Omega(\log n)$ comparisons in the worst case?


Decision tree

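[Figure: decision tree for $n = 6$. Root 3 with children 1 (if $b < A[3]$) and 5 (if $b > A[3]$); node 1 has child 2 (if $b > A[1]$); node 5 has children 4 (if $b < A[5]$) and 6 (if $b > A[5]$).]

For any input $b = A[i]$ the algorithm must succeed $\Rightarrow$ the decision tree comprises at least $n$ nodes.

Number of comparisons in the worst case = height of the tree = maximum number of nodes on a path from the root to a leaf.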

Decision Tree

A binary tree with height $h$ has at most $2^0 + 2^1 + \dots + 2^{h-1} = 2^h - 1 < 2^h$ nodes.

A decision tree of height $h$ has at least $n$ nodes, hence $n < 2^h \Rightarrow h > \log_2 n$.

Number of decisions $= \Omega(\log n)$.

Theorem

Any search algorithm on sorted data of length $n$ requires $\Omega(\log n)$ comparisons in the worst case.
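The bound $h > \log_2 n$ means $h \ge \lfloor \log_2 n \rfloor + 1$ nodes on the longest path. As a small experiment (a sketch, not part of the proof), the following Python code counts how many array positions binary search inspects in the worst case and compares this with $\lfloor \log_2 n \rfloor + 1$, which Python exposes as `n.bit_length()`. The helper name `worst_case_probes` is illustrative.

```python
def worst_case_probes(n):
    """Maximum number of inspected positions over all successful binary searches."""
    a = list(range(n))
    worst = 0
    for b in a:
        l, r, probes = 0, n - 1, 0
        while l <= r:
            m = (l + r) // 2
            probes += 1            # one node of the decision tree is visited
            if a[m] == b:
                break
            elif a[m] < b:
                l = m + 1
            else:
                r = m - 1
        worst = max(worst, probes)
    return worst


# Binary search meets the lower bound exactly: floor(log2 n) + 1 probes.
matches_bound = all(worst_case_probes(n) == n.bit_length() for n in range(1, 200))
```

So, counted in visited tree nodes, binary search is optimal: no comparison-based search on sorted data can have a shallower decision tree.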


Lower bound for Search in Unsorted Array

Theorem

Any search algorithm on unsorted data of length $n$ requires $\Omega(n)$ comparisons in the worst case.


Attempt

? Correct?

"Proof": to find $b$ in $A$, $b$ must be compared with each of the $n$ elements $A[i]$ ($1 \le i \le n$).

! Wrong argument! It is still possible to compare elements within $A$.


Better Argument

Consider $i$ comparisons without $b$ and $e$ comparisons with $b$.

The comparisons without $b$ partition the elements into $g$ groups of connected elements. Initially $g = n$.

To connect two groups at least one comparison is needed: $n - g \le i$.

At least one element per group must be compared with $b$: $e \ge g$.

Number of comparisons: $i + e \ge n - g + g = n$.

5. Selection

The Selection Problem, Randomised Selection, Linear Worst-Case Selection [Ottman/Widmayer, Ch. 3.1; Cormen et al., Ch. 9]


Min and Max

? To separately find the minimum and the maximum in $(A[1], \dots, A[n])$, about $2n$ comparisons are required. (How) can an algorithm with fewer than $2n$ comparisons for both values together be found?

! Possible with about $\frac{3}{2} n$ comparisons: compare the elements in pairs, then compare the smaller one of each pair with the current minimum and the greater one with the current maximum.
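The pairwise idea can be sketched in Python as follows. The function name `min_max` and the returned comparison counter are additions for illustration; the input is assumed to be non-empty.

```python
def min_max(a):
    """Return (minimum, maximum, comparisons) of a non-empty sequence."""
    n = len(a)
    comparisons = 0
    if n % 2 == 1:                 # odd length: first element starts as both
        lo = hi = a[0]
        start = 1
    else:                          # even length: one comparison for the first pair
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for i in range(start, n, 2):
        x, y = a[i], a[i + 1]
        comparisons += 3           # pair, smaller vs. minimum, larger vs. maximum
        small, large = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


keys = [22, 20, 32, 10, 35, 24, 42, 38, 28, 41]
```

For these $n = 10$ keys the sketch uses $3n/2 - 2 = 13$ comparisons, compared with $2(n-1) = 18$ for two separate scans.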


The Problem of Selection

Input

Unsorted array $A = (A_1, \dots, A_n)$ with pairwise different values.

Number $1 \le k \le n$.

Output: $A[i]$ with $|\{j : A[j] < A[i]\}| = k - 1$

Special cases

$k = 1$: minimum. An algorithm with $n$ comparison operations is trivial.

$k = n$: maximum. An algorithm with $n$ comparison operations is trivial.

$k = \lfloor n/2 \rfloor$: median.


Approaches

Repeatedly find and remove the minimum: $O(k \cdot n)$. Median: $O(n^2)$.

Sorting (covered soon): $O(n \log n)$.

Use a pivot: $O(n)$!

Use a pivot

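1 Choose a pivot $p$.

2 Partition $A$ into two parts, thereby determining the rank $r$ of $p$.

3 Recursion on the relevant part. If $k = r$, then found.

[Figure: array positions 1 to $n$; after partitioning, the elements $\le p$ lie to the left of the pivot $p$ at position $r$ and the elements $> p$ to its right.]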


Algorithm Partition(A[l..r], p)

Input: Array A that contains the sentinel p in the interval [l, r] at least once.

Output: Array A partitioned in [l..r] around p. Returns the position of p.

while l < r do
    while A[l] < p do
        l ← l + 1
    while A[r] > p do
        r ← r - 1
    swap(A[l], A[r])
    if A[l] = A[r] then
        l ← l + 1
return l - 1
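A Python sketch of this partitioning step is given below, assuming 0-based inclusive bounds. The early return for a one-element range is an addition of this example: with `l == r` the loop body would never run and `l - 1` would be returned, which is not the position of `p`.

```python
def partition(a, l, r, p):
    """Partition a[l..r] (inclusive) in place around the value p.

    p must occur in a[l..r]. Returns an index m with a[m] == p such that
    no element left of m is larger than p and none right of m is smaller.
    """
    if l == r:                     # added guard for a one-element range
        return l
    while l < r:
        while a[l] < p:            # skip elements already on the correct left side
            l += 1
        while a[r] > p:            # skip elements already on the correct right side
            r -= 1
        a[l], a[r] = a[r], a[l]
        if a[l] == a[r]:           # both equal p (or l == r): advance l
            l += 1
    return l - 1


keys = [22, 20, 32, 10, 35, 24, 42, 38, 28, 41]
position = partition(keys, 0, len(keys) - 1, 28)
```

After the call, `keys[position]` holds the pivot 28; the four smaller keys stand to its left and the five larger keys to its right.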


Correctness: Invariant

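Invariant I: A_i ≤ p ∀ i ∈ [0, l), A_i > p ∀ i ∈ (r, n], ∃ k ∈ [l, r] : A_k = p.

while l < r do                  // I
    while A[l] < p do
        l ← l + 1               // afterwards: I and A[l] ≥ p
    while A[r] > p do
        r ← r - 1               // afterwards: I and A[r] ≤ p
    swap(A[l], A[r])            // I and A[l] ≤ p ≤ A[r]
    if A[l] = A[r] then
        l ← l + 1               // I
return l - 1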


Correctness: progress

while l < r do
    while A[l] < p do
        l ← l + 1               // progress if A[l] < p
    while A[r] > p do
        r ← r - 1               // progress if A[r] > p
    swap(A[l], A[r])            // progress if A[l] > p or A[r] < p
    if A[l] = A[r] then
        l ← l + 1               // progress if A[l] = A[r] = p
return l - 1

Choice of the pivot.

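The minimum is a bad pivot: worst case $\Theta(n^2)$.

[Figure: successive pivots $p_1, p_2, p_3, p_4, p_5$, each chosen as the minimum of the remaining part.]

A good pivot has a linear number of elements on both sides.

[Figure: pivot $p$ with a constant fraction of the $n$ elements on each side.]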


Analysis

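Partitioning with factor $q$ ($0 < q < 1$): two groups with $q \cdot n$ and $(1-q) \cdot n$ elements (without loss of generality $q \ge 1 - q$).

$$\begin{aligned}
T(n) &\le T(q \cdot n) + c \cdot n \\
&\le c \cdot n + q \cdot c \cdot n + T(q^2 \cdot n) \le \dots \le c \cdot n \sum_{i=0}^{\log_{1/q}(n) - 1} q^i + T(1) \\
&\le c \cdot n \underbrace{\sum_{i=0}^{\infty} q^i}_{\text{geometric series}} + T(1) = c \cdot n \cdot \frac{1}{1-q} + T(1) = O(n)
\end{aligned}$$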


How can we achieve this?

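Randomness to our rescue (Tony Hoare, 1961). In each step choose a random pivot.

[Figure: sorted order split into a first quarter (bad pivots), a middle half (good pivots) and a last quarter (bad pivots).]

Probability for a good pivot in one trial: $\frac{1}{2} =: \rho$.

Probability that the first good pivot is found in the $k$-th trial: $(1 - \rho)^{k-1} \cdot \rho$.

Expected number of trials (expected value of the geometric distribution): $1/\rho = 2$.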


[Expected value of the Geometric Distribution]

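Random variable $X \in \mathbb{N}^+$ with $P(X = k) = (1-p)^{k-1} \cdot p$. With $q := 1 - p$, the expected value is

$$\begin{aligned}
E(X) &= \sum_{k=1}^{\infty} k \cdot (1-p)^{k-1} \cdot p = \sum_{k=1}^{\infty} k \cdot q^{k-1} \cdot (1-q) \\
&= \sum_{k=1}^{\infty} \left( k \cdot q^{k-1} - k \cdot q^k \right) = \sum_{k=0}^{\infty} \left( (k+1) \cdot q^k - k \cdot q^k \right) \\
&= \sum_{k=0}^{\infty} q^k = \frac{1}{1-q} = \frac{1}{p}.
\end{aligned}$$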

Algorithm Quickselect(A[l..r], i)

Input: Array A with length n. Indices 1 ≤ l ≤ i ≤ r ≤ n, such that for all x ∈ A[l..r] it holds that |{j | A[j] ≤ x}| ≥ l and |{j | A[j] ≤ x}| ≤ r.

Output: Partitioned array A, such that |{j | A[j] ≤ A[i]}| = i

if l = r then return
repeat
    choose a random pivot x ∈ A[l..r]
    p ← l
    for j = l to r do
        if A[j] ≤ x then p ← p + 1
until (l + r)/4 ≤ p ≤ 3(l + r)/4
m ← Partition(A[l..r], x)
if i < m then
    quickselect(A[l..m], i)
else
    (recursion on the right part of A[l..r])
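The following Python sketch shows randomised selection in a simplified form. It assumes 0-based ranks (`k = 0` selects the minimum), works iteratively instead of recursively, and omits the "repeat until the pivot is good" loop: every random pivot is used directly, which keeps the expected running time linear. The helper `partition` mirrors the Partition pseudocode; all names are illustrative.

```python
import random


def partition(a, l, r, p):
    """Partition a[l..r] in place around the value p; return an index of p."""
    if l == r:
        return l
    while l < r:
        while a[l] < p:
            l += 1
        while a[r] > p:
            r -= 1
        a[l], a[r] = a[r], a[l]
        if a[l] == a[r]:
            l += 1
    return l - 1


def quickselect(a, k):
    """Return the element of rank k (0-based) of the list a; reorders a."""
    l, r = 0, len(a) - 1
    while l < r:
        pivot = a[random.randint(l, r)]    # random pivot from the current range
        m = partition(a, l, r, pivot)
        if k == m:
            return a[m]
        if k < m:
            r = m - 1                      # wanted element lies left of the pivot
        else:
            l = m + 1                      # wanted element lies right of the pivot
    return a[l]
```

For example, `quickselect([22, 20, 32, 10, 35], 2)` returns the median 22, whatever pivots happen to be drawn.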


Median of medians

Goal: find an algorithm that requires only linearly many steps even in the worst case.

Algorithm Select ($k$-smallest)

Consider groups of five elements.

Compute the median of each group (straightforward).

Apply Select recursively to the group medians.

Partition the array around the found median of medians. Result: $i$.

If $i = k$, then this is the result. Otherwise: select recursively on the proper side.
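The steps of Select can be sketched in Python as follows. For readability the sketch partitions by building new lists instead of rearranging the array in place, uses 0-based ranks (`k = 0` is the minimum), and falls back to sorting for at most five elements; these are choices made for this example, and the name `select` is illustrative.

```python
def select(a, k):
    """Return the element of rank k (0-based) of a using median of medians."""
    a = list(a)
    while True:
        if len(a) <= 5:                        # base case: sort a tiny list
            return sorted(a)[k]
        # Medians of the groups of five (the last group may be smaller).
        groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Recursive call: the median of the medians serves as pivot.
        pivot = select(medians, len(medians) // 2)
        smaller = [x for x in a if x < pivot]
        equal = sum(1 for x in a if x == pivot)
        if k < len(smaller):                   # continue on the left side
            a = smaller
        elif k < len(smaller) + equal:         # the pivot itself has rank k
            return pivot
        else:                                  # continue on the right side
            k -= len(smaller) + equal
            a = [x for x in a if x > pivot]


# Illustrative input: a permutation of 0..100, so rank k holds the value k.
data = [(37 * i) % 101 for i in range(101)]
```

Because `data` is a permutation of 0 to 100, `select(data, 50)` yields the median 50.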


Median of medians

1 groups of five

2 medians

3 recursion for pivot

4 base case

5 pivot (level 1)

6 partition (level 1)

7 median = pivot (level 0)

8 second recursion starts

How good is this?

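[Figure: the elements arranged in groups of five around the median of medians $m$; one block of elements is $\le m$, the opposite block is $\ge m$.]

Number of points left / right of the median of medians (without the median group and the rest group): $\ge 3 \cdot \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2 \right) \ge \frac{3n}{10} - 6$.

Second call with at most $\left\lceil \frac{7n}{10} + 6 \right\rceil$ elements.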

Analysis

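Recursion inequality:

$$T(n) \le T\left( \left\lceil \frac{n}{5} \right\rceil \right) + T\left( \left\lceil \frac{7n}{10} + 6 \right\rceil \right) + d \cdot n$$

with some constant $d$.

Claim:

$$T(n) = O(n).$$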


Proof

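Base case: choose $c$ large enough such that $T(n) \le c \cdot n$ for all $n \le n_0$.

Induction hypothesis: $T(i) \le c \cdot i$ for all $i < n$.

Induction step:

$$T(n) \le T\left( \left\lceil \frac{n}{5} \right\rceil \right) + T\left( \left\lceil \frac{7n}{10} + 6 \right\rceil \right) + d \cdot n \le c \cdot \left\lceil \frac{n}{5} \right\rceil + c \cdot \left\lceil \frac{7n}{10} + 6 \right\rceil + d \cdot n.$$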


Proof

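Induction step:

$$\begin{aligned}
T(n) &\le c \cdot \left\lceil \frac{n}{5} \right\rceil + c \cdot \left\lceil \frac{7n}{10} + 6 \right\rceil + d \cdot n \\
&\le c \cdot \frac{n}{5} + c + c \cdot \frac{7n}{10} + 6c + c + d \cdot n = \frac{9}{10} \cdot c \cdot n + 8c + d \cdot n.
\end{aligned}$$

Choose $c \ge 80 \cdot d$ and $n_0 = 91$. Then

$$T(n) \le \frac{72}{80} \cdot c \cdot n + 8c + \frac{1}{80} \cdot c \cdot n = c \cdot \underbrace{\left( \frac{73}{80} n + 8 \right)}_{\le n \text{ for } n > n_0} \le c \cdot n.$$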


Result

Theorem

The $k$-th smallest element of a sequence of $n$ elements can be found in $O(n)$ steps.

Overview

1. Repeatedly find the minimum: $O(n^2)$
2. Sorting and choosing $A[i]$: $O(n \log n)$
3. Quickselect with random pivot: $O(n)$ expected
4. Median of medians (Blum): $O(n)$ worst case
