• Keine Ergebnisse gefunden

Performance of TPC-D Benchmark and Datawarehouses MISTRAL

N/A
N/A
Protected

Academic year: 2022

Aktie "Performance of TPC-D Benchmark and Datawarehouses MISTRAL"

Copied!
23
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

MISTRAL

Performance of TPC-D Benchmark and Datawarehouses

Prof. R. Bayer, Ph.D.

Dr. Volker Markl

Dept. of Computer Science, Technical University Munich and Bavarian Research Center for Knowledgebased Systems

(FORWISS)

(2)

Test Bed for Performance Measurements

 Hardware

– Compaq Proliant 5000 – 4 Pentium II 200 MHz – 512 MB RAM

– hard disk: 7 * 4 GB = 28 GB

 Operating System

– Windows NT 4.0

 RDBMS

– Oracle 8 – 8kB pages

 Access Methods

– Tetris Algorithm for UB-Trees

– Oracle IOT (clustering B*-Tree)

– Oracle FTS (full table scan)

(3)

TPC-D Schema

ORDERKEY LINEITEM (L_) SF*6000K Tuples

PARTKEY SUPPKEY LINENUMBER

QUANTITY EXTENDEDPRICE

DISCOUNT TAX RETURNFLAG

LINESTATUS COMMITDATE RECEIPTDATE SHIPINSTRUCT SHIPMODE COMMENT ORDERKEY

ORDER (O_) SF*1500K Tuples

CUSTKEY ORDERSTATUS

TOTALPRICE ORDERDATE ORDERPRIORITY

CLERK SHIPPRIORITY

COMMENT

PARTKEY PART (P_) SF*200K Tuples

NAME MFGR BRAND

TYPE SIZE CONTAINER RETAILPRICE

COMMENT

CUSTKEY CUSTOMER (C_)

SF*150K Tuples

NAME ADDRESS NATIONKEY

PHONE ACCTBAL MKTSEGMENT

COMMENT

SUPPKEY SUPPLIER (S_)

SF*10K Tuples

NAME ADDRESS NATIONKEY

PHONE ACCTBAL COMMENT PARTKEY PARTSUPP (PS_)

SF*800K Tuples

SUPPKEY AVAILQTY SUPPLYCOST

COMMENT

NATIONKEY NATION (N_) 25 Tuples

NAME REGIONKEY

COMMENT

REGIONKEY REGION (R_)

5 Tuples

NAME COMMENT

(4)

SELECT

L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITY

FROM CUSTOMER, ORDER, LINEITEM WHERE

C_MKTSEGMENT = 'FOOD' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE < DATE 1.5.98 AND L_SHIPDATE > DATE 1.6.98

GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITY

ORDER BY REVENUE DESC, O_ORDERDATE

(5)

Tetris algorithm Q3

O_ORDERKEY

O_CUSTKEY

C_MKTSEGMENT

C_CUSTKEY

L_ORDERKEY

L_SHIPDATE

sort direction

CUSTOMER ORDER

LINEITEM

1

2

(6)

Response times 50% LINEITEM (Q3)

0 1000 2000 3000 4000

0 0,25 0,5 0,75 1

TPC-D scaling factor

ti m e in s

IOT SHIPDATE IOT ORDERKEY FTS

Tetris

(7)

Temporary Storage 50% LINEITEM (Q3)

0 25 50 75 100

0 0,25 0,5 0,75 1

TPC-D scaling factor

te m p or ar y st or ag e in M B

IOT SHIPDATE and FTS Tetris

IOT ORDERKEY

(8)

Sorting 50% of LINEITEM

Table Size

(Scaling Factor) SF

33 MB (0.025)

81 MB (0.0625)

131 MB (0.1)

163 MB (0.125)

326 MB (0.25)

651MB (0.5)

1302MB (1) Tetris 1 st response 0.3s 0.5s 0.7s 1,1s 1,3s 1,3s 3,3s

Tetris Slices 64 128 128 128 256 256 512

Time IOT ORDERKEY 64.7s 184.3s 306.7s 356.2s 834.3s 1753.6s 3604.1s Time IOT SHIPDATE 72.5s 226.9s 401.3s 554..3s 1223.7s 2569.8s 5286.4s Time FTS-Sort 34.1s 126.7s 234.0s 381.1s 816.5s 1479.4s 3276.4s Time Tetris 23.1s 64.4s 92.5s 106.2s 257.5s 441.2s 1062.2s

Cache Tetris 0.3.MB 0.3MB 0.9MB 1.1MB 1.4MB 2.1MB 2.6MB

Temp Storage IOT/FTS 17MB 40MB 65MB 81MB 183MB 326MB 751MB

(9)

Forecasting Revenue Change Query (Q6)

SUM(L_EXTENDEDPRICE*L_DISCOUNT) AS REVENUE FROM LINEITEM

WHERE

L_SHIPDATE >= [date] AND

L_SHIPDATE <= [date] + INVERVAL 1 YEAR AND L_DISCOUNT BETWEEN [discount] -0.01

AND [discount] + 0.01 AND

L_QUANTITY < [quantity]

(10)

Forecasting Revenue Change Query (Q6)

LINEITEM Tetris

L_SHIPDATE L_DISCOUNT

L_QUANTITY

LINEITEM IOT

L_SHIPDATE L_DISCOUNT

L_QUANTITY

LINEITEM FTS

L_SHIPDATE L_DISCOUNT

L_QUANTITY

(11)

Performance of Q6

0 50 100 150 200

0 0,25 0,5 0,75 1

TPC-D scaling factor

ti m e in s IOT QUANTITY

IOT DISCOUNT IOT SHIPDATE FTS

Tetris

(12)

Table Size

Scaling Factor (SF)

33 MB (0.025)

81 MB (0.0625)

131 MB (0.1)

163 MB (0.125)

326 MB (0.25)

651MB (0.5)

1302MB (1) Time IOT QUANTITY 43,6s 109,2s 180,1s 225,2s 460,7s 921,4s 1842,8s Time IOT DISCOUNT 31,2s 78,3s 126,4s 158,2s 339,2s 678,4s 1356,8s Time IOT SHIPDATE 21,2s 53,7s 81,6s 102,1s 208,1s 416,3s 832,5s

Time FTS 5,2s 12,1s 19,2s 23,8s 47,7s 93,9s 187,6s

Time UB-Tree 1,1s 2,5s 4,5s 5,8s 12,0s 21,3s 30,5s

(13)

FORWISS

Snowflake Schema GFK

Sales Turnover Price

All Year

Month4_Period Month2_Period

All Year Month4_Period Month2_Period

(14)

Orange Juice

Asia

Apple

Juice

Asia

(15)

Performance Measurements GFK

 DBMS

– TransBase (covering, clustering compound B*-Trees)

– UB/API on top of TransBase (UB-Tree, two ESQL Statements are optimized and processed per UB-Tree page access)

– TransBase Hypercube (UB-Tree inside the DBMS Kernel)

 Database

– real world data warehouse from GFK – 3D Snowflake Schema

» Time (3 years = 18 MP)

» Segment (10500 outlets)

» Product (~ 500000 items in 604 product groups) – 42 Mio fact tuples (~ 4 GB fact table size)

 Computer

– Sun ULTRA 1 Workstation (64 MB Main Memory)

(16)

 MHC to encode hierarchies:

– TIME_CS (5 bits)

– SEGMENT_CS (24 bits) – PRODUCT_CS (29 bits)

 Compound on

(PRODUCT_CS, TIME_CS, SEGMENT_CS) or (TIME_CS, SEGMENT_CS, PRODUCT_CS)

 UB-Tree (UB/API) on

{TIME_CS, PRODUCT_CS, SEGMENT_CS}

(17)

GFK Datawarehouse Reports

R eta il A u d it - G er m an y - S eg m en ta tion r ep o rt C a m co rd er P an elm ar k e t A p ril/M a y 1 9 9 9

su m (sa les) a vg (p rice ) m in (price) m a x (pr ic e) a cc(tu rn o v er) sh ar e(tu rn o ve r) T o ta l

H i8 m o n o H i8 ste reo ...

R eta il A u d it - G er m an y - R u n n in g r ep or t C a m co rd er P an elm ark et S a les u n its

Ju n /Ju l 9 8 A u g /S ep 9 8 O ct/N ov 9 8 D ec/Ja n 9 9 F eb /M a r 9 9 A p r/M a y 9 9 to ta l

H i8 m o n o H i8 ste reo ...

R e ta il A u d it - G er m a n y - S e g m en tation h itlist C a m c or d er P an elm ar k et A p ril/M a y 1 9 9 9

su m (sa les) a vg (p rice ) m in (p ric e) m a x (p r ice) a cc(tu rn o v er) sh ar e(tu rn o ve r) 1 . < ite m >

2 . < ite m >

3 . < ite m >

4 . ...

 selectivity << 1%

(18)

Compound: fixed 2MP, varying PG

0 10.000 20.000 30.000 40.000 50.000 60.000 70.000 80.000

12504 12536 13600 13618 13636 13662 13681 14926 14949 15520 15537 16520 16536 16585 17230 17246 18005 18023 18040 18057 18074 18091 19743 19762 19789 20815 20832 20851 20867 20888 21734 22038 23213 23256 24137 25823 26039 30439

PRODUCTGROUP_ID

Time (ms)

(19)

UB-Tree: fixed 2MP, varying PG

0 10.000 20.000 30.000 40.000 50.000 60.000 70.000 80.000

12504 12536 13600 13618 13636 13662 13681 14926 14949 15520 15537 16520 16536 16585 17230 17246 18005 18023 18040 18057 18074 18091 19743 19762 19789 20815 20832 20851 20867 20888 21734 22038 23213 23256 24137 25823 26039 30439

Time (ms)

(20)

Response Time & Result Set Size

0 10.000 20.000 30.000 40.000 50.000 60.000 70.000 80.000

0 10.000 20.000 30.000 40.000 50.000 60.000 70.000 Record Count

Time (ms)

Compound UB/Kernel UB/API

(21)

Clustering of UB-Trees

26 27 20 41 43 46 67 87

190

1007

0 200 400 600 800 1000 1200

0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1

Absolute frequency

Ø = 0.85

s / d c

clustering factor

(22)

Clustering depending on Result Set Size

0 0,2 0,4 0,6 0,8 1 1,2

0 10.000 20.000 30.000 40.000 50.000 60.000 70.000

Row Count

Clustering Factor

(23)

Summary UB-Tree

 Excellent performance on large real DBs, > factor 10

 Very low storage requirement

 1st answer extremely fast, interactive use!!

 Response time proportional to size of answer

 Wide applicability: all DBs are multidimensional!!

 Easy integration into DBMS, simple DDL extension

 Very useful as middleware

Patent applications

Referenzen

ÄHNLICHE DOKUMENTE

Through an analysis of the spatial modelling within the films in light of Yuri Lotman’s cultural semiotics, particularly his ‘notion of boundary’ and the aspect of

Je mehr die Schülerinnen und Schüler mit der Me- thode Lapbook vertraut sind, desto freier können sie sich ein Thema erarbeiten, bis sie irgendwann nur noch Blankovorlagen

Liest man das Verhältnis der klas- sischen deutschen Philosophie zum Judentum aus dieser begrifflichen Perspek- tive, so zeigt sich gerade eine Vielschichtigkeit und

A comprehensive analysis of food and agricultural policy should take into account the process of policy formation (Figure 1, left-hand side), and also the links between the

Two strategies have been proposed in the literature to meet this need; first, free 25(OH)D as a marker that might be superior to total 25(OH)D in reflecting VitD status

Y Evening Reception hosted by the German Agriculture and Food Industry on 18 January 2014, starting at 19:00 hrs with the participants of the Berlin Agriculture Minister’s Summit

C_MKTSEGMENT = 'FOOD' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE &lt; DATE 1.5.98 AND L_SHIPDATE &gt; DATE 1.6.98. GROUP BY L_ORDERKEY,

Original GL30 (upper left), OSM-derived LULC map (upper right) and updated GL30 (lower left) for the Kathmandu study area..