Ch. 2 DWH-Architectures
Ch. 2.1 DWH-Concepts
Def: by Inmon:
A datawarehouse is a subject oriented, non-volatile and time variant collection of data in support of management decisions Note: „collection“ is too narrow,
AP = analytical processing is missing, like
DB + DBS
DWH + DWHS
Steps to build a DWH
• Acquisition of data
• Data cleansing
• Storage
• Processing: AP
• Maintenance, ...
Not possible with classical DB-
OLTP versus OLAP
Thematic focus
• OLTP: many small transactions (microscopic view of business processes, individual steps at lowest level, single order, delivery)
• OLAP: finances in general, personnel in general, ...
• OLAP requires integration and unification of many detailed data into big picture
• Time orientation
• Durability: data extracted once, no updates
Technical Comparison OLTP vs OLAP
• OLTP: high rate of updates, several thousand t/s
• OLAP: read only transactions, very complex, DWH is loaded at certain time intervals, e.g. after the end of the month, quarter
– Compute intensive
– Special systems with new access methods, e.g.
multidimensional data organization and access methods
– Special OLAP systems necessary to offload
ROLAP and MOLAP
Solution 1: ROLAP relational online analytical processing, built on top of relational DBS, additional middleware or client front end Solution 2: MOLAP: multidimensional online
analytical processing
• new model
• new data organizations
• new algorithms
• new query languages
• new optimization techniques
A first DWH Example
Mining of mobile phone calls:
(Caller, Callee, Time, Duration, Geogr.
Location) ~ 100 B/tuple In BRD
107 users * 10 calls/(day*user) * 100 B/call =
= 1010 B/day ~ 3*1012 B/year = 3 TB/year Scanning data at 107 B/s takes