Data Warehousing
& Data Mining
Wolf-Tilo Balke Silviu Homoceanu
Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
4. Queries
4.1 Query processing 4.2 Queries in DW / OLAP 4.3 Physical Modeling
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2
4. Queries
• Queries are posed to the DBMS and processed before the actual evaluation
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3
4.1Query processing
Query Processor
DDL Interpreter
Query Evaluation
Engine
Applications Programs Object Code
Embedded DML Precompiler
DML Compiler
Storage Manager
Data
• How queries are answered
– Queries are usually stated in a high level declarative language as SQL
• For relational DB it can be mapped to relational algebra (RA)
– For evaluation it has to be translated to a low level execution plan
• Expressions that can be used at physical level of the file system
–E.g., for RDB physical relational algebra
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4
4.1 Query processing
4.1Query processing
Parser &
Translator
Statistics Query
Evaluation Engine
Query Optimizer
Data
Access Paths Execution
Plan Query
Result
Relational Algebra
Expression
• Parsing and translation
– Queries need to be translated
• A scanner tokenizes the query
–DB language keywords, table names, attribute names, etc.
• The parser checks syntax and verifies relations, attributes, data types, etc.
• Translate the query into its internal form
–Translated into relational algebra
4.1 Query processing
• Optimization
– Several relational algebra expressions might lead to the same result
• But different statements might also result in very different performance
– Query optimization is the heart of every database kernel
• Finding optimal plans may cost too much, but avoid crappy plans by all means
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7
4.1 Query processing
• Evaluation
– The query-execution engine takes a query-evaluation plan, executes it, and returns the answers to the query – For the result of each operator a temporary file has
to be created
• Temporary files can be input for other operators
• Storing the temporary files on the disk is expensive, but necessary if DB buffer is small
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8
4.1 Query processing
• DW queries are big queries – Imply a large portion of the data – Read only queries
•
no Updates
• Redundancy a necessity
– Materialized Views, special-purpose indexes, de-normalized schemas
• Data is refreshed periodically – E.g., Daily or weekly
• Their purpose is to analyze data – OLAP (OnLine Analytical Processing)
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9
4.2 DW Queries
• OLAP usage fields – Management Information
• Sales per product group / area / year – Government
• Population census – Scientific databases
• Geo-, Bio-Informatics – Etc.
• Goal: Response Time of seconds / few minutes
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10
4.2 DW Queries
• ODS can also run analytical queries…but they are not so good at it
• OLTP and OLAP are to each other as Water and Oil
– Lock Conflicts: OLAP blocks OLTP
• E.g., an OLAP query can block the sales activity of all the stores trying to update the DB
– Database design:
• OLTP - normalized,
• OLAP - de-normalized
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11
4.2 Why use DW
• Tuning, Optimization
– OLTP - inter-query parallelism, heuristic optimization
– OLAP - intra-query parallelism, full-fledged optimization
• Freshness of Data – OLTP - serializability – OLAP - reproducability
• Precision – OLTP - ACID
– OLAP - Sampling, Confidence Intervals
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12
4.2 OLTP vs. OLAP
• The solution is to run OLTP and OLAP separately
• DW is a special sandbox for OLAP – As input it uses OLTP systems – DW aggregates and replicates data
• Special schema
– New data is periodically uploaded to the Warehouse – Old data is deleted from Warehouse
– Archiving done by OLTP system for legal reasons
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13
4.2 Why use DW
• Comparisons
– Show me the sales per region for this year and compare it to that of the previous year to identify discrepancies
• Multidimensional ratios
– Show me the contribution to weekly profit made by all items sold in the northeast stores between 1
stof May and 7
thof May
• Ranking and statistical profile
– Show me sales, profit and average call volume per day for my 10 most profitable sales-people
• Custom consolidation
– Show me the income statement by quarter for the last four quarters for my northeast region operations
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 14
4.2 Typical analytical requests
• OLAP queries
SELECT d
1.x, d
2.y, d
3.z, sum(f.t
1), avg(f.t
2) FROM Fact f, Dim1 d
1, Dim2 d
2, Dim3 d
3WHERE a < d
1.field < b AND d
2.field = c GROUP BY d
1.x, d
2.y, d
3.z;
• The idea is to
– Select by Attributes of Dimensions
•
E.g., region = „Europe“
– Group by Attributes of Dimensions
•
E.g., region, month, quarter – Aggregate on measures
•
E.g., sum(price * volume)
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15
4.2 Typical queries
• How do we differentiate between OLAP and non- OLAP products? - OLAP rules
• Published in a controversial white paper – “Providing OLAP to the User-Analysts:
An IT Mandate. (Arbor Software, 1993)”
• Dr. Codd was accused that he allowed his name to be used, but did not put to much work into it
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16
4.2 Codd’s OLAP rules
ORACLE
Hyperions
Solutions Arbor
Software
• Rules organization
– 12 rules + 6 (extension rules) added in 1995 – 4 feature groups
• Basic features
• Special features
• Reporting features
• Dimension control
4.2 Codd’s OLAP rules
• 1) Multidimensional Conceptual View
– Data is viewed in multidimensional form in a matrix.
An enterprise becomes multidimensional
• E.g., profits could be viewed by region, product, time period or scenario (actual budget, forecasts, etc.)
– Advantages
• Multidimensional models enable more straight-forward manipulation of data
–E.g., slice, dice, etc.
4.2 Basic features
• 2) Intuitive data manipulation
– Existence of a GUI with drag-and-drop feature and other graphical facilities
– Intuition is a vague term…
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 19
4.2 Basic features
• 3) Accessibility: OLAP as a Mediator
– Middleware between heterogeneous data sources and OLAP front end
• 4) Batch extraction vs. Interpretative Extraction – OLAP has to have its own staging database – It should also allow live access to external data – Similar to what HOLAP is today
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 20
4.2 Basic features
• 5) OLAP analysis models
– Categorical: typical descriptive statistics
• Comparison of historical values
– Exegetical: what we have been doing with spreadsheets (slice, dice, drill down)
• Discovering reasons for what we found through the categorical model
– Contemplative: what if analysis
• E.g., What is the effect of closing the Alaska store, to the company
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 21
4.2 Basic features
• 5) OLAP analysis models – Formulaic: goal seeking models
• You know the outcome you want but you don’t know how to get there
• The model keeps changing parameters and doing contemplations until it gets the desired result or proves it is impossible
–E.g., How can I increase the sales of bikinis in the Alaska store? The outcome can be:
»Many solutions…
»No solutions: Bikini sales in Alaska are doomed to failure
»Unacceptable solutions: Close down all but the Alaska store
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22
4.2 Basic features
• 6) Client/Server Architecture – Allow users to share data easily
• 7) Transparency
– The client should not have to be aware of how connections to the OLAP engine or other data sources is made
• 8) Multiuser support
– OLAP is read-only therefore no need for transaction control
– New OLAP systems allow data query while data is being streamed from external sources
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 23
4.2 Basic features
• Special features
– 9) Treatment of non-normalized data
• Can load data also from non-RDBMS sources – 10) Store OLAP results
• OLAP data is expensive
• Reconstructing it over and over from the live data is not a good idea
–OLAP DB is a snapshot of the state of the data sources
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24
4.2 Special features
• 11) Extraction of missing values – 2 kinds of missing values
• NULL as in SQL meaning we don’t know the value of the attribute
• Missing value meaning that the attribute will never have a value for that entity
• 12) Treatment of missing values
• All missing values are ignored by the OLAP analyzer, regardless of their source
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 25
4.2 Special features
• 13) Flexible reporting – Vague
• 14) Uniform reporting performance
– Codd required that reporting performance would not be significantly degraded by increasing the number of dimensions or database size
– Sounds more like a goal then a rule
• 15) Automatic adjustment of physical level – OLAP systems adjust its physical storage automatically – Dynamical adjusted HOLAP
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26
4.2 Reporting features
• Dimension control – 16) Generic dimensionality
•
Each dimension must be equivalent in both its structure and operational capabilities
•
Controversial rule
– 17) Unlimited dimensions and aggregation levels
•
Unlimited…is physically impossible so we should settle with a large number
–E.g., it should support at least 15 to 20 dimensions
– 18) Unrestricted cross-dimensional operations
•
Operation is not the same as calculation
–E.g. “What is Friday divided by red?” but operation on mixed data is possible “How many red shirts did we sell on Friday?”
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27
4.2 Dimension control
• Codd’s OLAP rules turned out not to be a success
• Other attempts to define OLAP and offer OLAP guides were made by
– The OLAP council – Analytical Solutions Forum – OLAP Solutions
• Nigel Pendse’s FASMI test
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28
4.2 Codd’s OLAP rules
• The FASMI test:
– Fast
•
System is targeted to respond users within ~ 5 seconds
–Complex analysis should take no longer than 20 second•
This can be achieved with exotic hardware and lots of pre- calculated scenarios
– Analysis
•
The system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user
– Shared
•
The system should implement security requirements
•
Not all OLAP products are read-only
4.2 FASMI
• The FASMI test:
– Multidimensional
• The most important factor
• Should support multidimensional conceptual views
• Full support for hierarchies – Information
• All the data and derived information (meta-data), needed
• The question is how much input data they can handle not how much GB they use to store it
4.2 FASMI
• Typical OLAP operations – Roll-up
– Drill-down – Slice and dice – Pivot (rotate)
• Other operations – Aggregate functions – Ranking and comparing – Drill-across
– Drill-through
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 31
4.2 OLAP operations
• Roll-up (drill-up)
– Taking the current aggregation level of fact values and doing a further aggregation
– Summarize data by
• Climbing up hierarchy
• Or by dimensional reduction
• A mix of these 2 techniques
– Used for obtaining an increased generalization
• E.g., from Time.Week to Time.Year
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32
4.2 Roll-up
• Roll-up operations can be classified into – Dimensional roll-ups
• Are done solely on the fact table by dropping one or more dimensions, where the dimensions retained are represented by their keys (basic attributes of the attribute hierarchy)
–E.g., drop Client dimension
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33
4.2 Roll-up
Store City
District Region Country
Article Prod. Group Prod. Family
Prod. Categ
Week Month Day Quarter
Year
Sales Turnover Client
• Hierarchical roll-ups
– are done on the fact table and some dimension tables by climbing up the attribute hierarchies of dimensions whose hierarchies are used and having at least one attribute of each dimension
• E.g., climbed the Time hierarchy to Quarter and Article hierarchy to Prod. group
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34
4.2 Roll-up
Store City
District Region Country
Article Prod. Group Prod. Family
Prod. Categ
Week Month Day Quarter
Year
Sales Turnover Client
• Climbing above the top
– In an ultimate case, hierarchical roll-up above the top level of an attribute hierarchy (attribute “ALL”) can be viewed as converting to a dimensional roll-up
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35
4.2 Roll-up
ALL
Electronics
Video Audio
Video recorder Video recorder Camcorder
TR-34 TS-56
…
…
TV
…
Clothes
…
Article Prod. Group Prod. Family Category
• Drill-down (roll-down) – Reverse of Roll-up
– Represents a de-aggregate operation
• From higher level of summary to lower level of summary – detailed data
– Introducing new dimensions
– Requires the existence of materialized finer grained data
• You can’t drill if you don’t have the data
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36
4.2 Drill-down
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 37
4.2 Roll-up drill-down example
Jim Bob Mary
Joe’s 45 33 30
Salitos 50 36 42
Roots 38 31 40
Jim Bob Mary
133 100 112
Jim Bob Mary
Wolters 48 40 40
Becks 45 31 37
Krombacher 40 29 35
Roll-up by BAR
Drill-down by brand
€ by bar/drinker
€ by drinker
€ by brand/drinker
• Slice
– Reducing the number of dimensions by taking a projection of facts on a proper subset of dimensions for some selected values of dimensions that are being dropped
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38
4.2 Slice
π
StoreId, TimeId, Ammount( σ
ArticleId = LaptopId(Sales))
– Amounts to equality select condition – WHERE clause in SQL
• E.g., slice Laptops
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 39
4.2 Slice
818 Product
Geography Time 13.11.2008 18.12.2008 Laptops
CellP.
• Dice
– Amounts to range select condition on one dimension, or to equality select condition on more than one dimension
• E.g., Range SELECT
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40
4.2 Dice
π
StoreId, TimeId, Amount( σ
ArticleId
∈{Laptop, CellP}
(Sales))
818 Product
Geography Time 13.11.2008 18.12.2008 Laptops
CellP.
• E.g., Equality SELECT on 2 dimensions Product and Time
4.2 Dice
π
StoreId, Amount( σ
ArticleId = Laptop ∧MonthID = December
(Sales))
818 Product
Geography Time
December January Laptops
CellP.
• Pivot (rotate)
– Refers to re-arranging data for viewing purposes
• E.g., display cities down the pages and products across a page
– The simplest view of pivoting is that it selects two dimensions to aggregate the measure
• The aggregated values are often displayed in a grid where each point in the (x, y) coordinate system corresponds to an aggregated value of the measure
• The x and y coordinate values are the values of the selected two dimensions
– The result of pivoting is also called cross–tabulation
4.2 Pivot
• Consider pivoting the following data
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43
4.2 Pivot
Location
CityId City 1 Well 2 Nels
3 Auck
Time TimId Day
1 Mon
2 Tue
3 Wed
4 Thu
5 Fri
6 Sat
7 San
8 Mon
Sales
CityId PerId TimId Amnt
1 1 1 230
1 1 2 300
1 1 8 310
1 2 7 50
2 3 1 550
2 3 5 100
3 4 6 880
3 5 1 60
3 5 2 60
3 5 4 140
• Pivoting on City and Day
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44
4.2 Pivot
Mon Tue Wed Thu Fri Sat San SubTotal
Auckland 60 60 0 140 0 880 0 1140
Nelson 550 0 0 0 100 0 0 650
Wellington 540 300 0 0 0 0 50 890
SubTotal 1150 360 0 140 100 880 50 2680
Auck Nels Well SubTotal
Mon 60 550 540 1150
Tue 60 0 300 360
Wed 0 0 0 0
Thu 140 0 0 140
Fri 0 100 0 100
Sat 880 0 0 880
San 0 0 50 50
SubTotal 1140 650 890 2680
• Analytical requests are hard to express – Most analysts and decision makers
won’t enjoy it
– But wait…there are solutions
• OLAP clients allow operations to be performed through GUIs
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45
4.2 Typical analytical requests
SELECT f.region, z.month, sum(a.price * a.volume) FROM Order a, Time z, PoS f WHERE a.pos = f.name AND a.date = z.date GROUP BY f.region, z.month
• How do these operations look like for the user?
– E.g., Crystal Decisions OLAP software
• 2 dimensions … is trivial
• E.g., Products by Store
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 46
4.2 OLAP data visualization
Product dimension
Store dimension
• 3 dimensions
– We can visualize sold quantity on 3 dimensions as layers
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47
4.2 OLAP data visualization
Store dimension
Product dimension
• More dimensions are difficult to represent – If we introduce Time dimension, a data cell could be
represented by its 4 dimensions as follows:
• Abc from Supplier dimension
• Batteries from Products dimension
• Uptown from Store dimension
• And Monday from Time dimension
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48
4.2 OLAP data visualization
• OLAP products represent 3 and more dimensional data reducing it to a 2D layout
– By picking values of the dimensions which can not be displayed
• E.g., Display the number sold of Products by any of the Stores on Monday
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 49
4.2 OLAP data visualization
• Another way is by nesting on the same axis
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 50
4.2 OLAP data visualization
• OLAP reporting has to be very flexible – The IBM way of an OLAP web based report
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51
4.2 OLAP data visualization
• Drill-down operation – Can be performed easy
by going down on the hierarchy and choosing the granularity
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 52
4.2 OLAP data visualization
• Trends Visualization – With the help of charts
4.2 OLAP data visualization
• We have seen how it looks at the user level and on the conceptual side
• But…how do operations translate from user level downwards?
– Well…it depends on the physical models used
• DOLAP (Desktop OLAP)
• MOLAP (Multidimensional OLAP)
• ROLAP (Relational OLAP)
• HOLAP (Hybrid OLAP)
4.3 Physical models
O L A P R O L A P H O L A P M O L A P D O L A
T P
i m e
• DOLAP
– Developed as extension to the production system reports – The idea behind is
•
It downloads a small hypercube from a central point (data mart or DW)
•
Performs multidimensional analysis while disconnected from the data source
•
The computation occurs on the client – Requires little investment
– Most are limited to a single user
– They lack the ability to manage large data sets
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55
4.3 Physical models
• MOLAP
– Presentation layer provides the multidimensional view – The OLAP server stores data in a
multidimensional structure
• Computation occurs in this layer during the loading step (not at query)
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 56
4.3 Physical models
MOLAP Interface
MDB Client
Presentation
Server
Data
• Advantages
– Excellent performance
• Fast data retrieval
• Optimal for slicing and dicing
• Complex calculations
• All calculations are pre-generated when the cube is created
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 57
4.3 MOLAP
• All calculations are pre-generated when the cube is created
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58
4.3 MOLAP
all
time supplier
time,item time,location
time,supplier item,location
item,supplier location,supplier
time,item,location
time,item,supplier time,location,supplier
item,location,supplier
time, item, location, supplier
0-D(apex) cuboid
1-D cuboids
2-D cuboids
3-D cuboids
4-D(base) cuboid
item location
• Disadvantages
– Limited amount of data it can handle
•
Cube can be derived from large amount of data, but only
summary level information will be included in the cube– Requires additional investment
•
Cube technology are often proprietary – Enormous amount of overhead
•
An input file of 200 MB can expand to 5 GB with calculations
• Products:
– Cognos (IBM), Essbase (Oracle), Microsoft Analysis Service, Palo (open source)
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 59
4.3 MOLAP
• Things to consider when choosing MOLAP – MOLAP tools traditionally have difficulty querying
models with dimensions with very high cardinality (i.e., millions of members)
– Some MOLAP products have difficulty updating and querying models with more than 10 dimensions
• It depends on
–the complexity and cardinality of the dimensions in question –the number of facts or measures stored
– Other MOLAP products can handle hundreds of dimensions
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60
4.3 MOLAP
• ROLAP
– Presentation layer provides the multidimensional view – The ROLAP Server generates
SQL queries, from the OLAP OLAP requests, to query the RDBMS
– Data is stored in RDBs
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61
4.3 ROLAP
Server ROLAP
Server
RDBMS Client
Presentation
Data
• Special schema design: e.g., star, snowflake
• Special indexes: e.g., bitmap, R-Trees
• Advantages
– Proven technology (relational model, DBMS) – Can handle large amounts of data (VLDBs)
• Disadvantages
– Limited SQL functionalities
• Products
– Microsoft Analysis Service, Siebel Analytics (now Oracle BI), Micro Strategy, Mondrian (open source)
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 62
4.3 ROLAP
• Based on OLAP needs
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 63
4.3 ROLAP vs. MOLAP
OLAP needs MOLAP ROLAP
User Benefits
Multidimensional View √ √
Excellent Performance √ -
Analytical Flexibility √ -
Real-Time Data Access - √
High Data Capacity - √
MIS Benefits
Easy Development √ -
Low Structure Maintenance - √
Low Aggregate Maintenance √ -
• HOLAP
– Best of both worlds
• Storing detailed data in RDBs
• Storing aggregated data in MDBs – Different partitioning approaches
between MOLAP and ROLAP
• Vertical
• Horizontal
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 64
4.3 HOLAP
Server HOLAP
Server
RDBMS Presentation
Data
MDDB
• Vertical partitioning
– Aggregations are stored in MOLAP for fast query performance,
– Detailed data in ROLAP to optimize time of cube processing (loading the data from the OLTP)
• Horizontal partitioning
– HOLAP stores some slice of data, usually the more recent one (i.e. sliced by Time dimension) in MOLAP for fast query performance
– Older data in ROLAP
4.3 HOLAP
• Other approaches
– Store some cubes in MOLAP and others in ROLAP, leveraging the fact that in a large cuboid, there will be dense and sparse sub-regions
4.3 HOLAP
– ROLAP
•
RDBMS - star/snowflake schema – MOLAP
•
MDBMS - Cube structures, array based storage – ROLAP or MOLAP
•
Data models used play major role in performance differences – MOLAP
•
for summarized and relatively “small” volumes of data (50GB) – ROLAP
•
for detailed and larger volumes of data (TB)
– HOLAP is emerging as the OLAP server of choice
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 67
4.3 Conclusions
• How do these operations look like?
– As queries they can be expressed through query languages as SQL or MDX
– The original SQL/92 was not fit for OLAP
• But SQL99 has extensions for OLAP functions
–GROUP BY, CUBE operators• But since the subject is more comprising… we will discuss it in the next lecture
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 68
4.3 OLAP operations
• Queries
– OLAP query languages
– Logical modeling - implementation
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 69