• Keine Ergebnisse gefunden

Ch. 2.1 DWH-Concepts Ch. 2 DWH-Architectures

N/A
N/A
Protected

Academic year: 2022

Aktie "Ch. 2.1 DWH-Concepts Ch. 2 DWH-Architectures"

Copied!
7
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Ch. 2 DWH-Architectures

Ch. 2.1 DWH-Concepts

(2)

Def: by Inmon:

A datawarehouse is a subject oriented, non-volatile and time variant collection of data in support of management decisions Note: „collection“ is too narrow,

AP = analytical processing is missing, like

DB + DBS

DWH + DWHS

(3)

Steps to build a DWH

• Acquisition of data

• Data cleansing

• Storage

• Processing: AP

• Maintenance, ...

Not possible with classical DB-

(4)

OLTP versus OLAP

Thematic focus

• OLTP: many small transactions (microscopic view of business processes, individual steps at lowest level, single order, delivery)

• OLAP: finances in general, personnel in general, ...

• OLAP requires integration and unification of many detailed data into big picture

• Time orientation

• Durability: data extracted once, no updates

(5)

Technical Comparison OLTP vs OLAP

• OLTP: high rate of updates, several thousand t/s

• OLAP: read only transactions, very complex, DWH is loaded at certain time intervals, e.g. after the end of the month, quarter

– Compute intensive

– Special systems with new access methods, e.g.

multidimensional data organization and access methods

– Special OLAP systems necessary to offload

(6)

ROLAP and MOLAP

Solution 1: ROLAP relational online analytical processing, built on top of relational DBS, additional middleware or client front end Solution 2: MOLAP: multidimensional online

analytical processing

• new model

• new data organizations

• new algorithms

• new query languages

• new optimization techniques

(7)

A first DWH Example

Mining of mobile phone calls:

(Caller, Callee, Time, Duration, Geogr.

Location) ~ 100 B/tuple In BRD

107 users * 10 calls/(day*user) * 100 B/call =

= 1010 B/day ~ 3*1012 B/year = 3 TB/year Scanning data at 107 B/s takes

Referenzen

ÄHNLICHE DOKUMENTE

– Both of the two child nodes to the left and right of the deleted element have the minimum number of elements (L-1) and then can then be joined into a legal single node with

• Cost estimate: (Height of the tree or 1 for hash index) plus #pages that contain overflow lists. 5.5

• For a non-cluster index the expected number of blocks accessed is #blocks ≈ #result size. • If no index is given, #blocks

• DB administrators may provide optimization hints to override optimizer heuristics and results. – Uses explain statement‟s PLAN_TABLE

8.1 Basic join order optimization 8.2 Join cost and size estimations 8.3 Left-deep join trees.. 8.4 Dynamic programming 8.5

– A complete schedule is called serial, if it consists of any permutation of the transactions, where each transaction is fully executed before the next starts. – Often

– 2PL means that for each transaction all necessary locks are acquired before the first lock is released.

– If a transaction does not reach its commit point, no harm has been done and no undo operations are needed – It may still be necessary to redo changes of already. committed and