Chapter 6 Distributed DBS


Bayer, Elhardt, Kießling, Killar: Verteilte DBSe. Informatik Spektrum, 1984, 7, 1-19

Chapter 6.1 Goals and Characteristics

1. HW and communication costs ⇒ distributed systems become attractive (e.g. Client / Server)
2. reliability, availability
3. flexibility, extensibility
4. performance becomes scalable
5. local autonomy
6. safety, e.g. mirroring of servers

Structures of distributed DBS

1. data partitioning
2. full redundancy

Choice corresponding to update frequency:
- frequently read data replicated, e.g. catalogue
- frequently changed data partitioned
⇒ goal conflicts

Generally, high redundancy is
- good for reads: little communication
- bad for writes: communication and coordination are expensive

Maintaining consistency: task of the distributed DBS

Fragments: relations, horizontal, vertical, reconstruction
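The two fragment types and their reconstruction can be sketched as follows. This is a minimal illustration, not a prescribed method: the relation `EMP`, its attributes, and all function names are hypothetical, and it assumes that every vertical fragment carries the key `id` so that the fragments can be joined again.

```python
# Hypothetical relation EMP(id, name, dept); not taken from the slides.
EMP = [
    {"id": 1, "name": "Ada", "dept": "A"},
    {"id": 2, "name": "Bob", "dept": "B"},
    {"id": 3, "name": "Cy", "dept": "A"},
]

def horizontal(rel, pred):
    """Horizontal fragment: the rows of rel that satisfy pred."""
    return [row for row in rel if pred(row)]

def vertical(rel, attrs, key="id"):
    """Vertical fragment: the listed columns plus the key (assumed present)."""
    return [{a: row[a] for a in (key, *attrs)} for row in rel]

def reconstruct_horizontal(*frags):
    """Reconstruction by union of disjoint horizontal fragments."""
    return [row for frag in frags for row in frag]

def reconstruct_vertical(left, right, key="id"):
    """Reconstruction by joining two vertical fragments on the key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]

def canon(rel):
    """Order rows by key so that relations can be compared as lists."""
    return sorted(rel, key=lambda row: row["id"])

h1 = horizontal(EMP, lambda row: row["dept"] == "A")
h2 = horizontal(EMP, lambda row: row["dept"] != "A")
v1 = vertical(EMP, ["name"])
v2 = vertical(EMP, ["dept"])

print(canon(reconstruct_horizontal(h1, h2)) == canon(EMP))
print(canon(reconstruct_vertical(v1, v2)) == canon(EMP))
```

Horizontal reconstruction is a union; vertical reconstruction is a join, which is why vertical partitioning costs additional joins at query time.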


Chapter 6.2 Catalogue Management

1. Full Address Names:
Direct addressing, i.e. the address is part of the file name. Problems?

2. Indirect Addressing:
Via catalogues:
- invariant names for relations
- address only in catalogue
- additional info in catalogue: schema, statistics, access rights
⇒ this info is needed for queries anyway, i.e. every query requires catalogue access, so indirect addressing causes no extra overhead.

3. Aspects for Design Decision

3.1 Central catalogue:
- slow, lots of communication
- simple updates
- single point of failure

3.2 Fully redundant catalogue:
- fast access (local)
- expensive updates in many places

3.3 Local catalogues:
- for local files
- for global files: names known, addresses unknown
  ⇒ lookup via catalogue server or via broadcast (no single point of failure)

3.4 Probable location catalogue:
- cache last known address (small, only for locally used files)
- if no success, use method 3.3
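The probable location catalogue with its fallback to method 3.3 can be sketched as a small cache in front of a lookup service. This is only an illustration under assumptions: the class name, the dictionary standing in for the catalogue server, and the `nodes` map (node name to the set of files it currently stores) are all hypothetical.

```python
class ProbableLocationCatalogue:
    """Caches the last known node of each file; falls back to a lookup service."""

    def __init__(self, catalogue_server):
        # catalogue_server: dict name -> node, a stand-in for method 3.3
        # (catalogue server or broadcast); an assumption for this sketch.
        self.catalogue_server = catalogue_server
        self.cache = {}  # name -> last known node, only for files used locally

    def locate(self, name, nodes):
        """Return (node, source), where source is 'cache' or 'server'."""
        node = self.cache.get(name)
        if node is not None and name in nodes.get(node, set()):
            return node, "cache"  # the probable location was still correct
        node = self.catalogue_server[name]  # no success: use method 3.3
        self.cache[name] = node
        return node, "server"

nodes = {"K1": set(), "K2": {"orders"}, "K3": set()}
server = {"orders": "K2"}
catalogue = ProbableLocationCatalogue(server)

first = catalogue.locate("orders", nodes)   # cache is empty, so the server is asked
second = catalogue.locate("orders", nodes)  # answered from the cache
print(first, second)
```

If the file later moves to another node, the cached address fails the check at the old node and the next lookup goes through the fallback again, which also refreshes the cache.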


Chapter 6.3 Distributed Query Evaluation

Tradeoff: communication cost versus I/O cost is shifting!

Chapter 6.3.1 Horizontal Partitioning

$R = R_1 \cup R_2 \cup \dots \cup R_n$, disjoint

K1 ... Kn computer nodes

$\sigma(R) = \sigma(R_1) \cup \dots \cup \sigma(R_n) = \bigcup_{i=1}^{n} \sigma(R_i)$

$\pi(R) = \bigcup_{i=1}^{n} \pi(R_i)$

Question: is the result required distributed or centrally?

Duplicates for $\pi$?
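The following sketch illustrates why selection and projection can be evaluated per fragment, and why projection raises the duplicate question. The two fragments and their contents are made-up sample data, and relations are modelled as Python sets of tuples.

```python
# Two disjoint horizontal fragments of a hypothetical relation R(id, colour).
R1 = {(1, "x"), (2, "y")}
R2 = {(3, "x"), (4, "z")}
fragments = [R1, R2]
R = set().union(*fragments)

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}

def project(rel, cols):
    """Projection on the given column positions, duplicate-free."""
    return {tuple(t[c] for c in cols) for t in rel}

def is_x(t):
    return t[1] == "x"

# Selection: the union of the local results equals the global result.
sigma_parts = [select(f, is_x) for f in fragments]
sigma_ok = set().union(*sigma_parts) == select(R, is_x)

# Projection: local results are duplicate-free, but ("x",) occurs in both,
# so duplicates must still be removed where the parts are merged.
pi_parts = [project(f, [1]) for f in fragments]
shipped = sum(len(p) for p in pi_parts)  # tuples sent by the nodes
merged = set().union(*pi_parts)          # tuples left after the merge

print(sigma_ok, shipped, len(merged))
```

Although the fragments are disjoint, their projections need not be, so the merging node cannot simply concatenate the partial results.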


Goal Conflict: Parallelism ↔ communication cost

[Figure: nodes K1, K2, ..., Kn, each with an intermediate result ZER1, ZER2, ..., that it sends on.]

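Aggregate Functions: let $R = R_1 \cup R_2 \cup \dots \cup R_n$

$\mathrm{MIN}(R, A) = \mathrm{MIN}\{\mathrm{MIN}(R_1, A), \dots, \mathrm{MIN}(R_n, A)\}$

$\mathrm{SUM}(R, A) = \sum_{i=1}^{n} \mathrm{SUM}(R_i, A)$

$\mathrm{COUNT}(R, A) = \sum_{i=1}^{n} \mathrm{COUNT}(R_i, A)$

$\mathrm{AVG}(R, A) = \mathrm{SUM}(R, A) / \mathrm{COUNT}(R, A)$

[Figure: union computed at K1 from the results on R1, R2, ..., Rn; intermediate result ZERn.]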
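A short sketch of how these aggregates could be combined from per-node partial results. The fragment contents and function names are made up for illustration, and the sketch assumes that the fragments are disjoint and non-empty.

```python
def local_summary(values):
    """Computed on each node Ki over attribute A of its fragment Ri."""
    return {"min": min(values), "sum": sum(values), "count": len(values)}

def combine(summaries):
    """Computed on the collecting node from the small per-node summaries."""
    total_sum = sum(s["sum"] for s in summaries)
    total_count = sum(s["count"] for s in summaries)
    return {
        "min": min(s["min"] for s in summaries),
        "sum": total_sum,
        "count": total_count,
        "avg": total_sum / total_count,  # AVG = SUM / COUNT
    }

# Values of attribute A in three hypothetical fragments.
fragments = {"K1": [3, 9, 4], "K2": [7, 1], "K3": [5, 5, 8]}
summaries = [local_summary(values) for values in fragments.values()]
total = combine(summaries)
print(total)
```

Each node ships only three numbers regardless of fragment size. AVG is derived from the global SUM and COUNT because averaging the local averages gives a different value whenever the fragments differ in size.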


Vertical Partitioning:

Results in additional joins

Replication of R as copies R1 ... Rn on K1 ... Kn:

Optimize the query for all variants of replication and choose the cheapest


Example: R on KR, S on KS

R (on KR)      S (on KS)
r1  r2         s1  s2  s3
 1   7          1   6   4
 2   2          2   5   1
 3   7          1   4   2
 4   8          5   3   3
 5   6          5   2   5
 6   3          6   1   8
 7   9

Result in K: R[r2 = s1]S

r1  r2  s1  s2  s3
 2   2   2   5   1
 5   6   6   1   8

a) tuples (1,6,4), (1,4,2), (5,3,3), (5,2,5) of S are not in result
b) tuples (1,7), (3,7), (4,8), (6,3), (7,9) of R are not in result
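The example can be checked with a few lines of code. The sketch below uses the tuples of R and S from the example; the nested-loop join and the variable names are illustrative choices only.

```python
R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]            # (r1, r2)
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]  # (s1, s2, s3)

# Join R[r2 = s1]S by nested loops: r2 is position 1 of R, s1 is position 0 of S.
result = [r + s for r in R for s in S if r[1] == s[0]]

# Tuples without a join partner do not contribute to the result.
unused_S = [s for s in S if all(r[1] != s[0] for r in R)]
unused_R = [r for r in R if all(r[1] != s[0] for s in S)]

print(result)
print(unused_S)
print(unused_R)
```

Only 2 of the 7 tuples of R and 2 of the 6 tuples of S take part in the join, so shipping either relation completely moves mostly data that is never used.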


- transport complete relations, compute join centrally
- request single tuples (join partners)

Idea: Semi-Join

A semi-join of R on S, written R<r = s]S, is defined as

R<r = s]S = $\{\, t \mid t \in R \land t.r \in \pi_s(S) \,\}$

Send $\pi_s(S)$ to KR; on KR compute the join partners for S
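A minimal sketch of the semi-join, using the tuples of R and S from the example. The function name and the use of column positions instead of attribute names are assumptions made for brevity.

```python
R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]            # (r1, r2)
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]  # (s1, s2, s3)

def semi_join(left, right, left_col, right_col):
    """Tuples of left whose join attribute occurs in the projection of right."""
    # The projection is the only data that has to travel to the node of left.
    projection = {t[right_col] for t in right}
    return [t for t in left if t[left_col] in projection]

R_reduced = semi_join(R, S, 1, 0)  # R<r2 = s1]S, computed on KR
S_reduced = semi_join(S, R, 0, 1)  # S<s1 = r2]R, computed on KS

print(R_reduced)
print(S_reduced)
```

The semi-join keeps exactly the tuples that have at least one join partner, without attaching the partner's attributes.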

Plan A:

Step 1: compute $\pi_{s1}(S)$ in KS and send $\pi_{s1}(S)$ to KR

$\pi_{s1}(S)$: 1, 2, 5, 6

Step 2: compute R' = R<r2 = s1]S in KR

R'
r1  r2
 2   2
 5   6

Step 3: compute $\pi_{r2}(R')$ in KR; send $\pi_{r2}(R')$ to KS and send R' to K

$\pi_{r2}(R')$: 2, 6

Step 4: compute S' = S<s1 = r2]R' in KS and send S' to K

S'
s1  s2  s3
 2   5   1
 6   1   8

Step 5: compute R'[r2 = s1]S' in K

r1  r2  s1  s2  s3
 2   2   2   5   1
 5   6   6   1   8

Plan B:

Step 1: compute filter $\pi_{r2}(R)$ in KR; send $\pi_{r2}(R)$ to KS and send R to K

$\pi_{r2}(R)$: 7, 2, 8, 6, 3, 9

Step 2: compute S' = S<s1 = r2]R in KS and send S' to K

S'
s1  s2  s3
 2   5   1
 6   1   8

Step 3: compute R[r2 = s1]S' in K

r1  r2  s1  s2  s3
 2   2   2   5   1
 5   6   6   1   8

Plan C:

Step 1: compute filter $\pi_{r2}(R)$ in KR and send it to KS
Step 2: compute S' = S<s1 = r2]R in KS and send S' to KR
Step 3: compute join R[r2 = s1]S' in KR and send it to K

Transport volume = 6 + 6 + 10 = 22 values
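The transport volumes of the plans can be compared with a small script. It assumes the counting used for Plan C, where every transmitted attribute value counts as one. Only the figure of 22 values for Plan C is stated in the text; the figures for the naive strategy (ship R and S completely to K) and for Plans A and B come from applying the same counting to the listed steps. All function names are illustrative.

```python
R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]            # (r1, r2)
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]  # (s1, s2, s3)

def size(rel):
    """Number of attribute values in a list of tuples."""
    return sum(len(t) for t in rel)

def project(rel, col):
    """Duplicate-free projection on one column, as a list of 1-tuples."""
    return [(v,) for v in dict.fromkeys(t[col] for t in rel)]

def semi_join(rel, col, filter_values):
    """Tuples of rel whose value in column col occurs in filter_values."""
    keep = {v for (v,) in filter_values}
    return [t for t in rel if t[col] in keep]

def join(left, right):
    """left[r2 = s1]right by nested loops."""
    return [r + s for r in left for s in right if r[1] == s[0]]

result = join(R, S)

# Plan A: filter from S reduces R, filter from R' reduces S, join in K.
filter_s = project(S, 0)
R_a = semi_join(R, 1, filter_s)
filter_r_a = project(R_a, 1)
S_a = semi_join(S, 0, filter_r_a)
plan_a = size(filter_s) + size(filter_r_a) + size(R_a) + size(S_a)

# Plans B and C: filter from the complete R reduces S.
filter_r = project(R, 1)
S_b = semi_join(S, 0, filter_r)
plan_b = size(filter_r) + size(R) + size(S_b)       # R and S' go to K
plan_c = size(filter_r) + size(S_b) + size(result)  # S' goes to KR, result to K

volumes = {"naive": size(R) + size(S), "A": plan_a, "B": plan_b, "C": plan_c}
print(volumes)
```

On this small example the ranking depends heavily on how much the filters reduce each relation, so the numbers illustrate the counting rather than a general ordering of the plans.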


Hash-Filter

So far:
- local data reduction: push $\sigma$, $\pi$
- semi-join

Idea: make $\pi_A(S)$ coarser by a hash table:

dom(A) = value domain of join attribute A
$H_A[1:f]$ = hash table of bits
$h : \mathrm{dom}(A) \to [1:f]$

In KS:
(1) $H_A[i] := 0$ for $i = 1, \dots, f$
(2) $\forall a \in \pi_A(S)$: $H_A[h(a)] := 1$
without resolving collisions!

Instead of $\pi_A(S)$, send $H_A$ from KS to KR

Instead of the exact semi-join R' = R<R.A = S.A]S with $\pi_A(S)$, compute $R''_{H_A}$, the same semi-join evaluated with $H_A$:

Tuple $t \in R$ is in $R''_{H_A}$ $\Leftrightarrow$ $H_A[h(t.A)] = 1$

$R' \subseteq R''_{H_A}$
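A minimal sketch of the hash filter on the tuples of R and S from the join example. The table size f = 4 and the hash function h(a) = a mod f are arbitrary assumptions for illustration, and the bit table is indexed from 0 rather than from 1 as in the text.

```python
R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]            # (r1, r2)
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]  # (s1, s2, s3)

F = 4  # number of bits in the hash table (assumed)

def h(a):
    """Assumed hash function from the join attribute domain to 0..F-1."""
    return a % F

def build_hash_filter(rel, col):
    """On KS: set the bit of every join attribute value, ignoring collisions."""
    bits = [0] * F
    for t in rel:
        bits[h(t[col])] = 1
    return bits

def apply_hash_filter(rel, col, bits):
    """On KR: keep every tuple whose join attribute value hashes to a set bit."""
    return [t for t in rel if bits[h(t[col])] == 1]

H = build_hash_filter(S, 0)                 # sent from KS to KR: F bits
R_hash = apply_hash_filter(R, 1, H)         # approximate semi-join result
s1_values = {t[0] for t in S}
R_exact = [t for t in R if t[1] in s1_values]  # exact semi-join for comparison

print(H)
print(R_hash)
print(R_exact)
```

Here the tuple (7, 9) passes the filter although 9 does not occur in S, because 9 collides with the values 1 and 5. The filter may therefore keep too many tuples but never drops a join partner, which is the containment stated above.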