1
DWH, Prof. Bayer, SS 2002 1
Caller
Prefix smallint 100 Number integer 107 Name string Adress... string ...
Callee
Prefix smallint 100 Number integer 107 Name string Adress... string ...
TimeOfCall
Year smallint 10 Month string 12 Day smallint 31 Hour smallint 24 Minute smallint 60 RateCents decimal
LocCaller
XCoord integer 104 YCoord integer 104 LocCalleee
XCoord integer 104 YCoord integer 104 CallsFacts
Prefix...YCoord (13 key components, 7 dimensions) DurationSec integer
Solution proposal for exercise 1 on sheet 1
DWH, Prof. Bayer, SS 2002 2
Solution proposal for exercise 2 on sheet 1
108 calls/day * 365 days/year * 49 B/call = 1.7885 * 1012B/year ~ 2 TB/year
The size of the space spanned by the dimensions is:
100*107 *100 *107*10 * 12 *31*24*60* 104* 104* 104* 104
= 5,4 * 1040
The number of Tuples is 108 calls/day * 365 days/year =
3.65 * 1010 Sparsity
1- (3.65 * 1010/ 5,4 * 1040) ~ 1-10-30 =0.999999999999999999999999999999 i.e. extremely sparse, but not unusual for datamining
2
DWH, Prof. Bayer, SS 2002 3
Solution proposal for exercise 3 on sheet 1
The partially aggregated cube has 100 * 100 * 10 * 12 * 31 * 24 * 60 * 1016
= 5,4 * 1026 tuples, it cannot be computed or stored with forseeable technology.
DWH, Prof. Bayer, SS 2002 4
Solution proposal for exercise 4 on sheet 1
1. Bit Vectors of length 3.65*1010Bits = 36.5*109Bits = 5*109B = 5 GB uncompressed.
Bit vectors for TimeOfCall : 137 vectors of 5 GB ~ 685 GB
Bit vectors for LocCaller : 10.000 vectors of 5 GB ~ 50 TB per coordinate à Bit vectors usable at most for TimeOfCall, but not for other dimensions
2. Compound B-Trees: 45B/compound key + 4B/fact
relation size ~ 2 TB/8KB/page = 0.25*109pages = 225.000.000 pages Height of B-Tree: at 45 B/key + 4 B/pointer
à branching degree ~ 160 for 8 KB pages àheight 5
3
DWH, Prof. Bayer, SS 2002 5
Solution proposal for exercise 4 on sheet 1 continued
Example Query1:
select Name, Prefix, Number, Year, Month, sum(duration) fromCaller C, CallsFacts F
whereC.Name = ‘Rudolf Bayer’
and C.Prefix = F.Prefix and C.Number = F.Number group byYear, Month
Time estimate: ~ 10 calls/day = 3650 calls/year = 3650 calls/year* 49 B/call = 182.500 B/year/ 8000 B/page = 23 pages*10 ms/page ~ ¼ second with B-Tree index
Example Query2: ... from Callee C ...
àB-Tree not usable àrelation scan at 10 MB/s à~ 2 days