MISTRAL
Performance of the TPC-D Benchmark and Data Warehouses
Prof. R. Bayer, Ph.D.
Dr. Volker Markl
Dept. of Computer Science, Technical University of Munich, and Bavarian Research Center for Knowledge-Based Systems (FORWISS)
Test Bed for Performance Measurements
Hardware
– Compaq ProLiant 5000
– 4 × Pentium II, 200 MHz
– 512 MB RAM
– hard disks: 7 × 4 GB = 28 GB
Operating System
– Windows NT 4.0
RDBMS
– Oracle 8
– 8 kB pages
Access Methods
– Tetris Algorithm for UB-Trees
– Oracle IOT (clustering B*-Tree)
– Oracle FTS (full table scan)
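A UB-Tree linearizes the multidimensional space by the Z-order (Morton order) of its tuples. A minimal Python sketch of the bit-interleaving that produces a Z-address (the function name and the fixed attribute width are illustrative, not from the slides):

```python
def z_address(coords, bits=8):
    """Interleave the bits of several attribute values into one
    Z-order (Morton) address, most significant bits first."""
    addr = 0
    for b in range(bits - 1, -1, -1):   # walk from MSB to LSB
        for c in coords:                # take one bit per dimension per step
            addr = (addr << 1) | ((c >> b) & 1)
    return addr

print(z_address([3, 5], bits=3))  # 2D point (3, 5) -> Z-address 27
```

Points that are close in the multidimensional space tend to get nearby Z-addresses, which is what lets the UB-Tree cluster a table symmetrically in all indexed dimensions.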
TPC-D Schema
LINEITEM (L_), SF*6000K tuples:
ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT
ORDER (O_), SF*1500K tuples:
ORDERKEY, CUSTKEY, ORDERSTATUS, TOTALPRICE, ORDERDATE, ORDERPRIORITY, CLERK, SHIPPRIORITY, COMMENT
PART (P_), SF*200K tuples:
PARTKEY, NAME, MFGR, BRAND, TYPE, SIZE, CONTAINER, RETAILPRICE, COMMENT
CUSTOMER (C_), SF*150K tuples:
CUSTKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, MKTSEGMENT, COMMENT
SUPPLIER (S_), SF*10K tuples:
SUPPKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, COMMENT
PARTSUPP (PS_), SF*800K tuples:
PARTKEY, SUPPKEY, AVAILQTY, SUPPLYCOST, COMMENT
NATION (N_), 25 tuples:
NATIONKEY, NAME, REGIONKEY, COMMENT
REGION (R_), 5 tuples:
REGIONKEY, NAME, COMMENT
Shipping Priority Query (Q3)
SELECT L_ORDERKEY, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE,
       O_ORDERDATE, O_SHIPPRIORITY
FROM CUSTOMER, ORDER, LINEITEM
WHERE C_MKTSEGMENT = 'FOOD'
  AND C_CUSTKEY = O_CUSTKEY
  AND L_ORDERKEY = O_ORDERKEY
  AND O_ORDERDATE < DATE '1998-05-01'
  AND L_SHIPDATE > DATE '1998-06-01'
GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY
ORDER BY REVENUE DESC, O_ORDERDATE
Tetris algorithm for Q3
[Diagram: the CUSTOMER, ORDER, and LINEITEM data spaces with restrictions on C_MKTSEGMENT/C_CUSTKEY, O_CUSTKEY/O_ORDERKEY, and L_ORDERKEY/L_SHIPDATE; the sort direction runs along the join attributes and the tables are swept in steps 1 and 2.]
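The Tetris algorithm reads a UB-Tree-organized table in the sort order of one attribute while caching only a thin slice of the data space. A toy Python version of that sweep (slice width and tuple layout are made up for illustration; the real algorithm reads Z-regions from disk):

```python
def tetris_sort(tuples, sort_key, slice_width):
    """Emit tuples in ascending sort_key order while holding only one
    slice in memory: sweep slice by slice along the sort dimension,
    sort each slice internally, then emit it."""
    slices = {}
    for t in tuples:  # assign each tuple to its slice along the sort dimension
        slices.setdefault(sort_key(t) // slice_width, []).append(t)
    out = []
    for s in sorted(slices):                         # sweep in sort direction
        out.extend(sorted(slices[s], key=sort_key))  # sort within the cache
    return out

rows = [(9, 'c'), (1, 'a'), (5, 'b'), (3, 'd')]
print(tetris_sort(rows, sort_key=lambda t: t[0], slice_width=4))
```

Because only one slice is ever resident, the temporary storage stays small and the first tuples can be emitted long before the whole table has been read, matching the sub-second first-response times in the measurements.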
Response times, 50% of LINEITEM (Q3)
[Chart: time in s (0-4000) vs. TPC-D scaling factor (0-1) for IOT SHIPDATE, IOT ORDERKEY, FTS, and Tetris.]
Temporary storage, 50% of LINEITEM (Q3)
[Chart: temporary storage in MB (0-100) vs. TPC-D scaling factor (0-1) for IOT SHIPDATE and FTS, IOT ORDERKEY, and Tetris.]
Sorting 50% of LINEITEM

Table size (SF):       33 MB    81 MB     131 MB   163 MB   326 MB    651 MB    1302 MB
                       (0.025)  (0.0625)  (0.1)    (0.125)  (0.25)    (0.5)     (1)
Tetris 1st response:   0.3 s    0.5 s     0.7 s    1.1 s    1.3 s     1.3 s     3.3 s
Tetris slices:         64       128       128      128      256       256       512
Time IOT ORDERKEY:     64.7 s   184.3 s   306.7 s  356.2 s  834.3 s   1753.6 s  3604.1 s
Time IOT SHIPDATE:     72.5 s   226.9 s   401.3 s  554.3 s  1223.7 s  2569.8 s  5286.4 s
Time FTS sort:         34.1 s   126.7 s   234.0 s  381.1 s  816.5 s   1479.4 s  3276.4 s
Time Tetris:           23.1 s   64.4 s    92.5 s   106.2 s  257.5 s   441.2 s   1062.2 s
Cache Tetris:          0.3 MB   0.3 MB    0.9 MB   1.1 MB   1.4 MB    2.1 MB    2.6 MB
Temp storage IOT/FTS:  17 MB    40 MB     65 MB    81 MB    183 MB    326 MB    751 MB
Forecasting Revenue Change Query (Q6)
SELECT SUM(L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE
FROM LINEITEM
WHERE L_SHIPDATE >= [date]
  AND L_SHIPDATE <= [date] + INTERVAL 1 YEAR
  AND L_DISCOUNT BETWEEN [discount] - 0.01 AND [discount] + 0.01
  AND L_QUANTITY < [quantity]
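Q6 restricts three attributes simultaneously, which is exactly the case where a multidimensional index can use every range while a one-dimensional index exploits only one. A hedged Python sketch of the predicate (the dictionary row layout and constants are illustrative, not from the slides):

```python
from datetime import date

def q6_predicate(row, ship_from, discount, quantity):
    """Q6's WHERE clause: three simultaneous range restrictions on
    L_SHIPDATE, L_DISCOUNT, and L_QUANTITY."""
    ship_to = date(ship_from.year + 1, ship_from.month, ship_from.day)
    return (ship_from <= row['shipdate'] <= ship_to
            and discount - 0.01 <= row['discount'] <= discount + 0.01
            and row['quantity'] < quantity)

row = {'shipdate': date(1994, 6, 1), 'discount': 0.06, 'quantity': 10}
print(q6_predicate(row, date(1994, 1, 1), 0.06, 24))  # True
```

A UB-Tree only visits Z-regions intersecting the small query box defined by all three ranges; each IOT must scan the full extent of two of the three attributes, which is why the measured IOT times exceed even the full table scan here.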
Forecasting Revenue Change Query (Q6)
[Diagram: the (L_SHIPDATE, L_DISCOUNT, L_QUANTITY) data space of LINEITEM under three access methods: Tetris/UB-Tree restricts all three dimensions, an IOT restricts only its leading attribute, and FTS reads the whole table.]
Performance of Q6
[Chart: time in s (0-200) vs. TPC-D scaling factor (0-1) for IOT QUANTITY, IOT DISCOUNT, IOT SHIPDATE, FTS, and Tetris.]
Table size (SF):    33 MB    81 MB     131 MB   163 MB   326 MB   651 MB   1302 MB
                    (0.025)  (0.0625)  (0.1)    (0.125)  (0.25)   (0.5)    (1)
Time IOT QUANTITY:  43.6 s   109.2 s   180.1 s  225.2 s  460.7 s  921.4 s  1842.8 s
Time IOT DISCOUNT:  31.2 s   78.3 s    126.4 s  158.2 s  339.2 s  678.4 s  1356.8 s
Time IOT SHIPDATE:  21.2 s   53.7 s    81.6 s   102.1 s  208.1 s  416.3 s  832.5 s
Time FTS:           5.2 s    12.1 s    19.2 s   23.8 s   47.7 s   93.9 s   187.6 s
Time UB-Tree:       1.1 s    2.5 s     4.5 s    5.8 s    12.0 s   21.3 s   30.5 s
Snowflake Schema GFK
[Diagram: fact table with measures Sales, Turnover, Price, surrounded by three dimension hierarchies: Time (All - Year - Month4_Period - Month2_Period), Product (e.g. Juice - Orange Juice, Apple Juice), and Segment (e.g. Asia).]
Performance Measurements GFK
DBMS
– TransBase (covering, clustering compound B*-Trees)
– UB/API on top of TransBase (UB-Tree; two ESQL statements are optimized and processed per UB-Tree page access)
– TransBase Hypercube (UB-Tree inside the DBMS Kernel)
Database
– real-world data warehouse from GFK
– 3D snowflake schema
» Time (3 years = 18 MP)
» Segment (10,500 outlets)
» Product (~500,000 items in 604 product groups)
– 42 million fact tuples (~4 GB fact table size)
Computer
– Sun ULTRA 1 Workstation (64 MB Main Memory)
MHC to encode hierarchies:
– TIME_CS (5 bits)
– SEGMENT_CS (24 bits)
– PRODUCT_CS (29 bits)
Compound on
(PRODUCT_CS, TIME_CS, SEGMENT_CS) or (TIME_CS, SEGMENT_CS, PRODUCT_CS)
UB-Tree (UB/API) on
{TIME_CS, PRODUCT_CS, SEGMENT_CS}
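The compound surrogates from the multidimensional hierarchical clustering (MHC) can be concatenated into fixed-width bit fields. A hedged sketch of such packing; only the bit widths (5/24/29) come from the slides, the field order and helper names are assumptions:

```python
# Bit widths from the slides: TIME_CS 5 bits, SEGMENT_CS 24, PRODUCT_CS 29.
FIELDS = [('TIME_CS', 5), ('SEGMENT_CS', 24), ('PRODUCT_CS', 29)]

def pack(values):
    """Concatenate the hierarchy surrogates into one compound surrogate."""
    key = 0
    for (name, width), v in zip(FIELDS, values):
        assert 0 <= v < (1 << width), f'{name} out of range'
        key = (key << width) | v
    return key

def unpack(key):
    """Split a compound surrogate back into its hierarchy fields."""
    out = []
    for _, width in reversed(FIELDS):
        out.append(key & ((1 << width) - 1))
        key >>= width
    return list(reversed(out))

print(unpack(pack([17, 4711, 123456])))  # [17, 4711, 123456]
```

Because each surrogate encodes a path through its hierarchy, a prefix range on one field translates a hierarchy restriction (e.g. one product group) into an interval restriction the index can use.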
GFK Data Warehouse Reports

Retail Audit - Germany - Segmentation report, Camcorder Panelmarket, April/May 1999:
sum(sales), avg(price), min(price), max(price), acc(turnover), share(turnover) per segment (Total, Hi8 mono, Hi8 stereo, ...)

Retail Audit - Germany - Running report, Camcorder Panelmarket, sales units:
Jun/Jul 98, Aug/Sep 98, Oct/Nov 98, Dec/Jan 99, Feb/Mar 99, Apr/May 99, total, per segment (Hi8 mono, Hi8 stereo, ...)

Retail Audit - Germany - Segmentation hitlist, Camcorder Panelmarket, April/May 1999:
sum(sales), avg(price), min(price), max(price), acc(turnover), share(turnover) for 1. <item>, 2. <item>, 3. <item>, 4. ...

selectivity << 1%
Compound: fixed 2MP, varying PG
[Chart: time in ms (0-80,000) per PRODUCTGROUP_ID (12504 through 30439).]
UB-Tree: fixed 2MP, varying PG
[Chart: time in ms (0-80,000) for the same PRODUCTGROUP_IDs.]
Response Time & Result Set Size
[Chart: time in ms (0-80,000) vs. record count (0-70,000) for Compound, UB/Kernel, and UB/API.]
Clustering of UB-Trees
[Histogram: absolute frequency of the clustering factor (0.1 through 1.0); average Ø = 0.85.]
Clustering depending on Result Set Size
[Chart: clustering factor (0-1.2) vs. row count (0-70,000).]