Chapter 6 Distributed DBS
Bayer, Elhardt, Kießling, Killar: Verteilte DBSe.
Informatik Spektrum, 1984, 7, 1-19
Chapter 6.1 Goals and Characteristics
1. HW and communication costs: distributed systems become attractive (e.g. client/server)
2. reliability, availability
3. flexibility, extensibility
4. performance becomes scalable
5. local autonomy
6. safety, e.g. mirroring of servers
Structures of distributed DBS
1. data partitioning
2. full redundancy
3. partial redundancy, corresponding to update frequency:
- frequently read data replicated, e.g. the catalogue
- frequently changed data partitioned
Goal conflict: in general, high redundancy is
- good for reads: little communication
- bad for writes: communication and coordination are expensive
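The read/write trade-off can be made concrete with a toy message-count model. Everything in it is a hypothetical assumption for illustration, not taken from the notes: n nodes, one data item with copies on k of them, operations issued at a uniformly random node, a remote read costing 2 messages (request + reply), and a write costing 2 messages (update + acknowledgement) per remote copy.

```python
from fractions import Fraction

# Toy cost model (hypothetical assumptions, see lead-in):
# n nodes, one data item with copies on k nodes, operations issued at a random node.

def read_cost(k: int, n: int) -> Fraction:
    """Expected messages per read: 0 if the reader holds a copy, else request + reply."""
    p_local = Fraction(k, n)
    return (1 - p_local) * 2

def write_cost(k: int, n: int) -> Fraction:
    """Expected messages per write: update + acknowledgement for every remote copy."""
    p_local = Fraction(k, n)
    # remote copies: k - 1 if the writer holds a copy, otherwise k
    remote_copies = p_local * (k - 1) + (1 - p_local) * k
    return 2 * remote_copies

def total_cost(k: int, n: int, reads: int, writes: int) -> Fraction:
    return reads * read_cost(k, n) + writes * write_cost(k, n)

if __name__ == "__main__":
    n = 10
    for reads, writes in [(99, 1), (50, 50)]:
        for k in (1, n):  # k = 1: partitioned, k = n: full redundancy
            print(f"reads={reads} writes={writes} k={k}: {total_cost(k, n, reads, writes)}")
```

With n = 10, a read-heavy mix (99 reads per write) costs 180 messages with a single copy but only 18 with full redundancy; a 50/50 mix flips this to 180 versus 900. Exact fractions are used so that the totals are not blurred by floating-point rounding.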
Maintaining consistency is a task of the distributed DBS.
Fragments: whole relations, horizontal fragments, vertical fragments; every relation must be reconstructable from its fragments.
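As a minimal sketch (not from the notes), the two fragmentation kinds and their reconstruction can be written over relations modelled as Python lists of dicts; the relation emp and all function names are hypothetical. Horizontal fragments are reassembled by a union, vertical fragments by a join on a key that is kept in every fragment.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]   # a tuple modelled as attribute -> value
Relation = List[Row]

def split_horizontal(rel: Relation, pred: Callable[[Row], bool]) -> Tuple[Relation, Relation]:
    """Two disjoint horizontal fragments: tuples satisfying pred, and the rest."""
    return [t for t in rel if pred(t)], [t for t in rel if not pred(t)]

def union(*fragments: Relation) -> Relation:
    """Reconstruction of horizontal fragments: R = R1 u R2 u ... (fragments are disjoint)."""
    return [t for frag in fragments for t in frag]

def split_vertical(rel: Relation, key: str, attrs1: Sequence[str],
                   attrs2: Sequence[str]) -> Tuple[Relation, Relation]:
    """Two vertical fragments; the key is kept in both so that R can be reconstructed."""
    f1 = [{a: t[a] for a in (key, *attrs1)} for t in rel]
    f2 = [{a: t[a] for a in (key, *attrs2)} for t in rel]
    return f1, f2

def join_on_key(f1: Relation, f2: Relation, key: str) -> Relation:
    """Reconstruction of vertical fragments: an additional join on the key."""
    by_key = {t[key]: t for t in f2}
    return [{**t, **by_key[t[key]]} for t in f1 if t[key] in by_key]

# Hypothetical example relation
emp = [
    {"id": 1, "name": "Ann", "city": "Munich"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 3, "name": "Eve", "city": "Munich"},
]
munich, others = split_horizontal(emp, lambda t: t["city"] == "Munich")
names, cities = split_vertical(emp, "id", ["name"], ["city"])
```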
Chapter 6.2 Catalogue Management
1. Full address names (direct addressing): the address is part of the file name.
Problem: moving a file changes its name, so there is no location transparency.
2. Indirect addressing via catalogues:
- invariant names for relations
- the address is stored only in the catalogue
- additional info in the catalogue: schema, statistics, access rights
This additional info is needed for query processing anyway, so every query requires catalogue access regardless; indirect addressing therefore causes no extra overhead.
3. Aspects for the Design Decision
3.1 Central catalogue:
- slow, lots of communication
- simple updates
- single point of failure
3.2 Fully redundant catalogue:
- fast access (local)
- expensive updates in many places
3.3 Local catalogues:
- for local files
- for global files: names known, addresses unknown
- addresses are found via a catalogue server or via broadcast (broadcast: no single point of failure)
3.4 Probable location catalogue:
- cache the last known address (small, only for locally used files)
- if no success, use method 3.3
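Methods 3.3 and 3.4 combine into a simple lookup procedure. The sketch below is an illustration under assumptions of its own: the classes Node and Network, the function locate and the message counting (one request plus one reply = 2 messages) are hypothetical, and the broadcast variant of 3.3 is used.

```python
from typing import Dict, List, Optional, Set

class Node:
    """A node with a local catalogue (3.3) and a probable-location cache (3.4)."""
    def __init__(self, name: str, local_relations: Set[str]):
        self.name = name
        self.local_relations = local_relations          # local catalogue
        self.probable_location: Dict[str, str] = {}     # relation -> last known node

class Network:
    """Counts messages; one request plus one reply = 2 messages."""
    def __init__(self, nodes: List[Node]):
        self.nodes = {n.name: n for n in nodes}
        self.messages = 0

    def ask(self, target: str, rel: str) -> bool:
        """Ask a single node whether it currently stores rel."""
        self.messages += 2
        return rel in self.nodes[target].local_relations

    def broadcast(self, sender: str, rel: str) -> Optional[str]:
        """Method 3.3 via broadcast: ask every other node (no central server)."""
        holders = [name for name in self.nodes
                   if name != sender and self.ask(name, rel)]
        return holders[0] if holders else None

def locate(net: Network, node: Node, rel: str) -> Optional[str]:
    if rel in node.local_relations:                     # local file
        return node.name
    guess = node.probable_location.get(rel)
    if guess is not None and net.ask(guess, rel):       # 3.4: cached address still valid
        return guess
    address = net.broadcast(node.name, rel)             # no success: fall back to 3.3
    if address is not None:
        node.probable_location[rel] = address           # remember for the next lookup
    return address

# Hypothetical setup: relation S later moves from node B to node C
a, b, c = Node("A", {"R"}), Node("B", {"S"}), Node("C", {"T"})
net = Network([a, b, c])
first = locate(net, a, "S")        # broadcast, result is cached
second = locate(net, a, "S")       # answered via the cache
b.local_relations.discard("S")
c.local_relations.add("S")
after_move = locate(net, a, "S")   # stale cache entry, broadcast again
```

The cache makes the common case cost one exchange with the probable holder; a stale entry costs one wasted exchange before the broadcast fallback.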
Chapter 6.3 Distributed Query Evaluation
Tradeoff: communication cost versus I/O cost is shifting!
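Chapter 6.3.1 Horizontal Partitioning
$R = R_1 \cup R_2 \cup \dots \cup R_n$, fragments pairwise disjoint; fragment $R_i$ is stored on computer node $K_i$, $i = 1, \dots, n$.
Selection with predicate P:
$\sigma_P(R) = \sigma_P(R_1) \cup \dots \cup \sigma_P(R_n) = \bigcup_{i=1}^{n} \sigma_P(R_i)$
Projection on attributes A:
$\pi_A(R) = \bigcup_{i=1}^{n} \pi_A(R_i)$
Question: is the result required in distributed form or at a central node?
Duplicates for $\pi$? The local results $\pi_A(R_i)$ can contain the same values, so duplicate elimination needs a step across nodes.
Goal conflict: parallelism vs. communication cost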
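A small sketch of the two rules above, assuming fragments held as Python lists of dicts keyed by node name (the fragments and function names are hypothetical): selection needs no coordination beyond shipping the hits, while projection ships local results that may overlap and therefore needs a central duplicate elimination.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]

# Hypothetical horizontal fragments R1 (on K1) and R2 (on K2)
fragments: Dict[str, List[Row]] = {
    "K1": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}],
    "K2": [{"a": 3, "b": "x"}, {"a": 4, "b": "z"}],
}

def distributed_select(frags: Dict[str, List[Row]], pred: Callable[[Row], bool]) -> List[Row]:
    """sigma_P(R) = union of sigma_P(Ri): each node filters locally, only hits are shipped."""
    return [t for frag in frags.values() for t in frag if pred(t)]

def distributed_project(frags: Dict[str, List[Row]], attrs: List[str]) -> Tuple[Set[tuple], int]:
    """pi_A(R) = union of pi_A(Ri): local duplicate elimination, then a central one."""
    partials = [{tuple(t[a] for a in attrs) for t in frag} for frag in frags.values()]
    shipped = sum(len(p) for p in partials)   # tuples sent to the central node
    return set().union(*partials), shipped    # duplicates across nodes removed here
```

Projecting on b ships 4 tuples for a result with 3 distinct values, because ("x",) occurs on both nodes.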
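[Figure: nodes K1, K2, ..., Kn compute intermediate results ZER1, ZER2, ..., ZERn from their fragments R1, R2, ..., Rn and send them to one node, which forms the union]
Aggregate functions: let $R = R_1 \cup R_2 \cup \dots \cup R_n$
$\mathrm{MIN}(R, A) = \mathrm{MIN}\{\mathrm{MIN}(R_1, A), \dots, \mathrm{MIN}(R_n, A)\}$
$\mathrm{SUM}(R, A) = \sum_{i=1}^{n} \mathrm{SUM}(R_i, A)$
$\mathrm{COUNT}(R, A) = \sum_{i=1}^{n} \mathrm{COUNT}(R_i, A)$
$\mathrm{AVG}(R, A) = \mathrm{SUM}(R, A) / \mathrm{COUNT}(R, A)$
i.e. AVG is computed from the global SUM and COUNT, not as the average of the local averages.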
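The aggregate rules can be sketched as a local step on each node plus a central combine step that sees only the shipped partial results. The attribute values and function names are hypothetical; the sketch assumes non-empty fragments.

```python
from fractions import Fraction
from typing import Dict, List

def local_aggregates(values: List[int]) -> Dict[str, int]:
    """Computed on node K_i over attribute A of its fragment R_i (assumed non-empty)."""
    return {"min": min(values), "sum": sum(values), "count": len(values)}

def combine(partials: List[Dict[str, int]]) -> Dict[str, object]:
    """Computed centrally from the shipped partial results only."""
    total_sum = sum(p["sum"] for p in partials)
    total_count = sum(p["count"] for p in partials)
    return {
        "min": min(p["min"] for p in partials),
        "sum": total_sum,
        "count": total_count,
        "avg": Fraction(total_sum, total_count),   # AVG = SUM / COUNT
    }

r1_a = [4, 8]         # hypothetical values of A in R1
r2_a = [1, 2, 3, 6]   # hypothetical values of A in R2
result = combine([local_aggregates(r1_a), local_aggregates(r2_a)])
# Wrong alternative: the average of the local averages ignores the fragment sizes
avg_of_avgs = (Fraction(sum(r1_a), len(r1_a)) + Fraction(sum(r2_a), len(r2_a))) / 2
```

Here the correct average is 24 / 6 = 4, whereas averaging the local averages 6 and 3 would give 4.5.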
Vertical partitioning: reconstructing R requires additional joins.
Replication of R as copies R1, ..., Rn on K1, ..., Kn: optimize the query for every choice of copy and choose the cheapest plan.
Example: R on node KR, S on node KS, result required on node K
R (on KR)    S (on KS)
r1 r2        s1 s2 s3
1  7         1  6  4
2  2         2  5  1
3  7         1  4  2
4  8         5  3  3
5  6         5  2  5
6  3         6  1  8
7  9
Result in K:
R[r2 = s1]S  r1 r2 s1 s2 s3
             2  2  2  5  1
             5  6  6  1  8
a) tuples (1,6,4), (1,4,2), (5,3,3), (5,2,5) of S are not in the result
b) tuples (1,7), (3,7), (4,8), (6,3), (7,9) of R are not in the result
Naive strategies:
- transport the complete relations to K and compute the join centrally
- request single tuples (join partners) from the other node
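The example can be checked mechanically. A minimal sketch, with relations as Python lists of tuples (an assumption of this sketch); naive_volume counts the attribute values shipped when both complete relations are sent to K.

```python
R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]              # (r1, r2) on KR
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]    # (s1, s2, s3) on KS

def join(R, S):
    """R[r2 = s1]S as a nested-loop equi-join (r2 = column 1 of R, s1 = column 0 of S)."""
    return [r + s for r in R for s in S if r[1] == s[0]]

result = join(R, S)
r_unmatched = [r for r in R if all(r[1] != s[0] for s in S)]   # case b)
s_unmatched = [s for s in S if all(s[0] != r[1] for r in R)]   # case a)
naive_volume = 2 * len(R) + 3 * len(S)   # values shipped if both relations go to K
```

Shipping both relations moves 32 values to produce a 10-value result; most of the transported tuples never find a join partner.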
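Idea: semi-join
The semi-join of R with S on r = s, written R<r = s]S, is defined as
$R{<}r = s]S = \{\, t \mid t \in R \wedge t.r \in \pi_s(S) \,\}$
i.e. the tuples of R that have a join partner in S.
Send $\pi_s(S)$ to KR; on KR compute the tuples of R that have a join partner in S.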
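A minimal sketch of the semi-join on the example data (relations as lists of tuples; function names are hypothetical): KR needs only the projected join column of S, not S itself.

```python
R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]              # (r1, r2) on KR
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]    # (s1, s2, s3) on KS

def project(rel, i):
    """pi on column i as a set (duplicates removed)."""
    return {t[i] for t in rel}

def semi_join(R, r_idx, s_values):
    """R<r = s]S = {t in R | t.r in pi_s(S)}; only pi_s(S) is needed, not S itself."""
    return [t for t in R if t[r_idx] in s_values]

ps = project(S, 0)               # computed in KS and sent to KR
r_prime = semi_join(R, 1, ps)    # computed in KR
```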
Plan A:
Step 1: compute $\pi_{s1}(S)$ in KS
$\pi_{s1}(S)$ = {1, 2, 5, 6}; send $\pi_{s1}(S)$ to KR
Step 2: compute R' = R<r2 = s1]S in KR
R'  r1 r2
    2  2
    5  6
Step 3: compute $\pi_{r2}(R')$ in KR
$\pi_{r2}(R')$ = {2, 6}; send it to KS, and send R' to K
Step 4: compute S' = S<s1 = r2]R' in KS
S'  s1 s2 s3
    2  5  1
    6  1  8
and send S' to K
Step 5: compute R'[r2 = s1]S' in K
R'[r2 = s1]S'  r1 r2 s1 s2 s3
               2  2  2  5  1
               5  6  6  1  8
Plan B:
Step 1: compute the filter $\pi_{r2}(R)$ in KR
$\pi_{r2}(R)$ = {7, 2, 8, 6, 3, 9}; send $\pi_{r2}(R)$ to KS, and send R to K
Step 2: compute S' = S<s1 = r2]R in KS
S'  s1 s2 s3
    2  5  1
    6  1  8
and send S' to K
Step 3: compute R[r2 = s1]S' in K
R[r2 = s1]S'  r1 r2 s1 s2 s3
              2  2  2  5  1
              5  6  6  1  8
Plan C:
Step 1: compute the filter $\pi_{r2}(R)$ in KR and send it to KS
Step 2: compute S' = S<s1 = r2]R in KS and send S' to KR
Step 3: compute the join R[r2 = s1]S' in KR and send it to K
Transport volume = 6 + 6 + 10 = 22 values
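The three plans can be replayed on the example data to check their results and transport volumes. This is a sketch under the same counting rule as Plan C (one unit per transported attribute value, message overhead ignored); the function names are hypothetical. Under that rule it reproduces 22 values for Plan C and gives 16 for Plan A and 26 for Plan B.

```python
from typing import Iterable, List

R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]              # (r1, r2) on KR
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]    # (s1, s2, s3) on KS

def join(R: List[tuple], S: List[tuple]) -> List[tuple]:
    """R[r2 = s1]S as a nested-loop equi-join."""
    return [r + s for r in R for s in S if r[1] == s[0]]

def volume(x: Iterable) -> int:
    """Transported attribute values: a tuple counts its arity, a projected value counts 1."""
    return sum(len(t) if isinstance(t, tuple) else 1 for t in x)

def plan_a(R, S):
    ps = {s[0] for s in S}                      # Step 1 (KS): pi_s1(S), sent to KR
    r_prime = [r for r in R if r[1] in ps]      # Step 2 (KR): R' = R<r2 = s1]S
    pr = {r[1] for r in r_prime}                # Step 3 (KR): pi_r2(R') to KS, R' to K
    s_prime = [s for s in S if s[0] in pr]      # Step 4 (KS): S' = S<s1 = r2]R', sent to K
    shipped = volume(ps) + volume(pr) + volume(r_prime) + volume(s_prime)
    return join(r_prime, s_prime), shipped      # Step 5 (K)

def plan_b(R, S):
    pr = {r[1] for r in R}                      # Step 1 (KR): pi_r2(R) to KS, R to K
    s_prime = [s for s in S if s[0] in pr]      # Step 2 (KS): S' = S<s1 = r2]R, sent to K
    shipped = volume(pr) + volume(R) + volume(s_prime)
    return join(R, s_prime), shipped            # Step 3 (K)

def plan_c(R, S):
    pr = {r[1] for r in R}                      # Step 1 (KR): pi_r2(R), sent to KS
    s_prime = [s for s in S if s[0] in pr]      # Step 2 (KS): S', sent back to KR
    result = join(R, s_prime)                   # Step 3 (KR): join, result sent to K
    shipped = volume(pr) + volume(s_prime) + volume(result)
    return result, shipped

for plan in (plan_a, plan_b, plan_c):
    print(plan.__name__, plan(R, S))
```

All three plans produce the same two result tuples; they differ only in where the reduction happens and therefore in how many values cross the network.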
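Hash filter
So far:
- local data reduction: push $\sigma$ and $\pi$ down to the fragments
- semi-join
Idea: make $\pi_A(S)$ coarser by a hash table:
dom(A) = value domain of the join attribute A
$H_A[1{:}f]$ = hash table of bits
$h: \mathrm{dom}(A) \to [1{:}f]$
In KS: (1) $H_A[i] := 0$ for $i = 1, \dots, f$
(2) $\forall a \in \pi_A(S)$: $H_A[h(a)] := 1$
Collisions are not resolved!
Instead of $\pi_A(S)$, send $H_A$ from KS to KR.
Instead of the exact semi-join R' = R<R.A = S.A]S with $\pi_A(S)$, compute the approximate semi-join $R''_{H_A}$ with $H_A$:
$t \in R''_{H_A} \iff t \in R \wedge H_A[h(t.A)] = 1$
Hence $R' \subseteq R''_{H_A}$; tuples that pass only because of a collision have no join partner and are eliminated by the final join.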
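A minimal sketch of the hash filter on the example data, with A = s1 and R joined on r2. The table size F = 4 and the hash function h(a) = a mod F are hypothetical choices, and slots are numbered 0 to F - 1 instead of 1 to f.

```python
from typing import List

R = [(1, 7), (2, 2), (3, 7), (4, 8), (5, 6), (6, 3), (7, 9)]              # (r1, r2) on KR
S = [(1, 6, 4), (2, 5, 1), (1, 4, 2), (5, 3, 3), (5, 2, 5), (6, 1, 8)]    # (s1, s2, s3) on KS

F = 4  # hypothetical table size f

def h(a: int) -> int:
    """Hypothetical hash function dom(A) -> [0, F)."""
    return a % F

def build_filter(S: List[tuple]) -> List[int]:
    """In KS: bit vector H_A over pi_A(S), here A = s1 (column 0)."""
    H = [0] * F                    # (1) H_A[i] := 0 for all i
    for a in {s[0] for s in S}:    # (2) for all a in pi_A(S)
        H[h(a)] = 1                #     H_A[h(a)] := 1, collisions not resolved
    return H

def hash_semi_join(R: List[tuple], H: List[int]) -> List[tuple]:
    """In KR: tuples t of R with H_A[h(t.r2)] = 1, a superset of the exact semi-join."""
    return [r for r in R if H[h(r[1])] == 1]

H = build_filter(S)                                     # F bits sent from KS to KR
r_hash = hash_semi_join(R, H)
r_exact = [r for r in R if r[1] in {s[0] for s in S}]   # exact R' for comparison
final = [r + s for r in r_hash for s in S if r[1] == s[0]]
```

Here 9 collides with 1 and 5, so (7, 9) passes the filter although it has no join partner; the final join drops it, and the result matches the exact plans.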