21. Greedy Algorithms
Activity Selection, Fractional Knapsack Problem, Huffman Coding. Cormen et al., Chapters 16.1 and 16.3
Activity Selection
Coordination of activities that use a common resource exclusively.
Activities S = {a1, a2, . . . , an} with start and finishing times 0 ≤ si ≤ fi < ∞, sorted increasingly by finishing times.
a1 = (1, 4)     a2 = (3, 5)     a3 = (0, 6)
a4 = (5, 7)     a5 = (3, 9)     a6 = (5, 9)
a7 = (6, 9)     a8 = (8, 11)    a9 = (8, 12)
a10 = (2, 14)   a11 = (12, 16)
Activity Selection Problem: Find a maximal (maximum-cardinality) subset of compatible (non-overlapping) activities.
Dynamic Programming Approach?
Let Sij = {ak : fi ≤ sk ∧ fk ≤ sj}. Let Aij be a maximal subset of compatible activities from Sij. Moreover, let ak ∈ Aij and
Aik = Sik ∩ Aij, Akj = Skj ∩ Aij, thus Aij = Aik ∪ {ak} ∪ Akj.
(Diagram: the activities of Aik precede ak, which precedes those of Akj, all within the interval from fi to sj.)
Dynamic Programming Approach?
Let cij = |Aij|. Then the recursion cij = cik + ckj + 1 holds, therefore

cij = 0                                 if Sij = ∅,
cij = max_{ak ∈ Sij} {cik + ckj + 1}    if Sij ≠ ∅.
Could now try dynamic programming.
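The recursion above can be tried directly. A minimal memoized sketch (the function name max_compatible and the two sentinel activities a0 = (−∞, 0) and a_{n+1} = (∞, ∞), which make S_{0,n+1} cover all activities, are assumptions for illustration):

```python
from functools import lru_cache

def max_compatible(activities):
    """activities: list of (start, finish) pairs. Returns |A_{0,n+1}|."""
    # Sentinels a_0 and a_{n+1} so that S_{0,n+1} contains every activity.
    acts = ([(float("-inf"), 0.0)]
            + sorted(activities, key=lambda a: a[1])
            + [(float("inf"), float("inf"))])
    n = len(acts)

    @lru_cache(maxsize=None)
    def c(i, j):
        # c_ij = max over a_k in S_ij of c_ik + c_kj + 1, else 0.
        best = 0
        for k in range(i + 1, j):
            # a_k in S_ij  iff  f_i <= s_k and f_k <= s_j
            if acts[k][0] >= acts[i][1] and acts[k][1] <= acts[j][0]:
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n - 1)
```

On the example activities above this yields 4 (e.g. a1, a4, a8, a11) — but in Θ(n³) time, which the greedy approach below improves on.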
Greedy
Intuition: choose the activity that provides the earliest end time (a1).
That leaves maximal space for other activities.
Remaining problem: activities that start after a1 ends. (There are no activities that can end before a1 starts.)
Greedy
Theorem
Given: a subproblem Sk, and am an activity from Sk with the earliest end time.
Then am is contained in a maximal subset of compatible activities from Sk.
Let Ak be a maximal subset of compatible activities from Sk and let aj be an activity from Ak with the earliest end time. If aj = am ⇒ done.
If aj ≠ am, consider A'k = Ak − {aj} ∪ {am}. Since am ends no later than aj, A'k consists of compatible activities, and it is also maximal because |A'k| = |Ak|.
Algorithm RecursiveActivitySelect( s, f, k, n )
Input : Sequence of start and end points (si, fi), 1 ≤ i ≤ n, si < fi, fi ≤ fi+1 for all i, 1 ≤ k ≤ n.
Output : Maximal set of compatible activities.
m ← k + 1
while m ≤ n and sm < fk do
    m ← m + 1
if m ≤ n then
    return {am} ∪ RecursiveActivitySelect(s, f, m, n)
else
    return ∅
Algorithm IterativeActivitySelect( s, f, n )
Input : Sequence of start and end points (si, fi), 1 ≤ i ≤ n, si < fi, fi ≤ fi+1 for all i.
Output : Maximal set of compatible activities.
A ← {a1}
k ← 1
for m ← 2 to n do
    if sm ≥ fk then
        A ← A ∪ {am}
        k ← m
return A
Runtime of both algorithms: Θ(n)
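The iterative variant can be sketched directly in runnable form (the name iterative_activity_select is hypothetical; it returns 0-based indices rather than the activities themselves):

```python
def iterative_activity_select(s, f):
    """s, f: start/finish times with f sorted increasingly (f[i] <= f[i+1]).
    Returns indices of a maximal set of compatible activities."""
    n = len(s)
    A = [0]            # a_1 (index 0) has the earliest finish time
    k = 0
    for m in range(1, n):
        if s[m] >= f[k]:   # a_m starts no earlier than a_k ends -> compatible
            A.append(m)
            k = m
    return A
```

On the example data (s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12], f = [4, 5, 6, 7, 9, 9, 9, 11, 12, 14, 16]) this selects indices 0, 3, 7, 10, i.e. a1, a4, a8, a11, in a single Θ(n) pass.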
The Fractional Knapsack Problem
Given: a set of n ∈ ℕ items {1, . . . , n}. Each item i has value vi ∈ ℕ and weight wi ∈ ℕ. The maximum weight is given as W ∈ ℕ. The input is denoted as E = (vi, wi)_{i=1,...,n}.
Wanted: fractions 0 ≤ qi ≤ 1 (1 ≤ i ≤ n) that maximize Σ_{i=1}^n qi·vi subject to Σ_{i=1}^n qi·wi ≤ W.
Greedy heuristics
Sort the items decreasingly by value per weight vi/wi; assume vi/wi ≥ vi+1/wi+1 for all i.
Let j = max{0 ≤ k ≤ n : Σ_{i=1}^k wi ≤ W}.
Set qi = 1 for all 1 ≤ i ≤ j,
q_{j+1} = (W − Σ_{i=1}^j wi) / w_{j+1},
and qi = 0 for all i > j + 1.
That is fast: Θ(n log n) for sorting and Θ(n) for the computation of the qi.
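The greedy heuristic above can be sketched as follows (the function name fractional_knapsack is an assumption for illustration):

```python
def fractional_knapsack(values, weights, W):
    """Greedy fractional knapsack: returns (fractions q, total value)."""
    n = len(values)
    # Sort item indices decreasingly by value per weight v_i / w_i.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    q = [0.0] * n
    remaining = W
    for i in order:
        if weights[i] <= remaining:     # items 1..j: take them whole
            q[i] = 1.0
            remaining -= weights[i]
        else:                           # item j+1: take the fitting fraction
            q[i] = remaining / weights[i]
            break                       # q_i = 0 for all later items
    total = sum(q[i] * values[i] for i in range(n))
    return q, total
```

For example, with values (60, 100, 120), weights (10, 20, 30) and W = 50, the first two items are taken whole and 2/3 of the third, for a total value of 240.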
Correctness
Assumption: (ri) (1 ≤ i ≤ n) is an optimal solution.
The knapsack is full: Σi ri·wi = Σi qi·wi = W.
Consider the smallest k with rk ≠ qk. By the definition of the greedy choice, qk > rk. Let x = qk − rk > 0.
Construct a new solution (r'i): r'i = ri for all i < k, and r'k = qk. Remove weight Σ_{i=k+1}^n δi = x·wk from items k + 1 to n. This works because Σ_{i=k}^n ri·wi = Σ_{i=k}^n qi·wi.
Correctness
Σ_{i=k}^n r'i·vi = rk·vk + x·wk·(vk/wk) + Σ_{i=k+1}^n (ri·wi − δi)·(vi/wi)
              ≥ rk·vk + x·wk·(vk/wk) + Σ_{i=k+1}^n (ri·wi·(vi/wi) − δi·(vk/wk))
              = rk·vk + x·wk·(vk/wk) − x·wk·(vk/wk) + Σ_{i=k+1}^n ri·wi·(vi/wi)
              = Σ_{i=k}^n ri·vi,

using δi ≥ 0, vi/wi ≤ vk/wk for i > k, and Σ_{i=k+1}^n δi = x·wk.
Thus (r'i) is also optimal. Iterative application of this idea generates the solution (qi).
Huffman-Codes
Goal: memory-efficient storage of a sequence of characters using a binary code with code words.
Example
File consisting of 100,000 characters from the alphabet {a, . . . , f}.
Character                    a    b    c    d    e     f
Frequency (thousands)        45   13   12   16   9     5
Code word, fixed length      000  001  010  011  100   101
Code word, variable length   0    101  100  111  1101  1100
Huffman-Codes
Consider prefix codes: no code word is a prefix of another code word.
Compared with other codes, prefix codes can achieve optimal data compression (without proof here).
Encoding: concatenation of the code words without a stop character (in contrast to Morse code).
affe → 0·1100·1100·1101 → 0110011001101
Decoding is simple because the code is a prefix code:
0110011001101 → 0·1100·1100·1101 → affe
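The prefix property is what makes greedy left-to-right decoding unambiguous. A minimal sketch using the variable-length code from the table above (the names encode and decode are assumptions for illustration):

```python
# Variable-length prefix code from the example table.
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def encode(text):
    """Concatenate code words, no stop character needed."""
    return ''.join(code[ch] for ch in text)

def decode(bits):
    """Greedy left-to-right decoding; unambiguous for a prefix code,
    since no partial match can be extended into a different code word."""
    inverse = {w: ch for ch, w in code.items()}
    out, word = [], ''
    for b in bits:
        word += b
        if word in inverse:        # a complete code word has been read
            out.append(inverse[word])
            word = ''
    return ''.join(out)
```

For the slide's example, encode("affe") yields "0110011001101" and decoding it recovers "affe".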
Code trees
(Figure: two code trees for the example. Left, the fixed-length code: root 100 with children 86 and 14; 86 splits into 58 = a:45 + b:13 and 28 = c:12 + d:16; below 14, the leaves e:9 and f:5. Right, the variable-length code: root 100 with children a:45 and 55; 55 splits into 25 = c:12 + b:13 and 30 = 14 + d:16, where 14 = e:9 + f:5. Edges to left children are labelled 0, edges to right children 1; inner nodes carry the sums of the leaf frequencies below them.)
Properties of the Code Trees
An optimal coding of a file is always represented by a full binary tree: every inner node has two children.
Let C be the set of all code words, f(c) the frequency of a code word c, and dT(c) the depth of code word c in tree T. Define the cost of a tree as

B(T) = Σ_{c∈C} f(c)·dT(c).

(cost = number of bits of the encoded file)
In the following, a code tree is called optimal if it minimizes the cost.
Algorithm Idea
Tree construction bottom-up:
Start with the set C of code words.
Iteratively replace the two nodes with smallest frequency by a new parent node.
(Figure: starting from the leaves a:45, b:13, c:12, d:16, e:9, f:5, the merges create the inner nodes 14, 25, 30, 55 and finally the root 100.)
Algorithm Huffman( C )
Input : code words c ∈ C with frequencies
Output : Root of an optimal code tree
n ← |C|
Q ← C
for i = 1 to n − 1 do
    allocate a new node z
    z.left ← ExtractMin(Q) // extract word with minimal frequency
    z.right ← ExtractMin(Q)
    z.freq ← z.left.freq + z.right.freq
    Insert(Q, z)
return ExtractMin(Q)
Analysis
Use a heap: build the heap in O(n); Extract-Min in O(log n) for n elements. This yields a runtime of O(n log n).
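The algorithm above maps directly onto a binary heap as the priority queue. A minimal sketch (the function name huffman and the tuple-based node representation are implementation choices for illustration, not part of the slides):

```python
import heapq
from itertools import count

def huffman(freq):
    """freq: dict symbol -> frequency. Returns dict symbol -> code word."""
    tiebreak = count()  # avoids comparing tree nodes on equal frequencies
    # Queue entries: (frequency, tiebreaker, tree); a leaf is its symbol,
    # an inner node is a (left, right) pair.
    Q = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(Q)                      # build heap in O(n)
    for _ in range(len(freq) - 1):        # n - 1 merges
        f1, _, left = heapq.heappop(Q)    # ExtractMin, O(log n)
        f2, _, right = heapq.heappop(Q)
        heapq.heappush(Q, (f1 + f2, next(tiebreak), (left, right)))
    _, _, root = Q[0]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')   # left edge labelled 0
            walk(node[1], prefix + '1')   # right edge labelled 1
        else:
            codes[node] = prefix or '0'   # single-symbol edge case
    walk(root, '')
    return codes
```

With the example frequencies (a:45, b:13, c:12, d:16, e:9, f:5) this produces code word lengths 1, 3, 3, 3, 4, 4 and total cost B(T) = 224 (thousand bits); the exact bit patterns may differ from the slide depending on tie-breaking.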
The greedy approach is correct
Theorem
Let x, y be two symbols with smallest frequencies in C and let T'(C') be an optimal code tree for the alphabet C' = C − {x, y} ∪ {z} with a new symbol z with f(z) = f(x) + f(y). Then the tree T(C) that is constructed from T'(C') by replacing the node z by an inner node with children x and y is an optimal code tree for the alphabet C.
Proof
It holds that f(x)·dT(x) + f(y)·dT(y) = (f(x) + f(y))·(dT'(z) + 1) = f(z)·dT'(z) + f(x) + f(y). Thus B(T') = B(T) − f(x) − f(y).
Assumption: T is not optimal. Then there is an optimal tree T'' with B(T'') < B(T). We may assume (w.l.o.g., by an exchange argument) that x and y are siblings in T''. Let T''' be the tree in which the inner node with children x and y is replaced by z. Then it holds that
B(T''') = B(T'') − f(x) − f(y) < B(T) − f(x) − f(y) = B(T').
Contradiction to the optimality of T'.