• Keine Ergebnisse gefunden

Data Warehousing & Mining Techniques

N/A
N/A
Protected

Academic year: 2021

Aktie "Data Warehousing & Mining Techniques"

Copied!
12
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Data Warehousing

& Mining Techniques

Wolf-Tilo Balke Silviu Homoceanu

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

• Last week:

– What is a Data Warehouse – Applications and users – Lifecycle and phases

• Architecture and Data model – This lecture

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

2. Summary

2. Architecture 2.1 Basic Architecture 2.2 Storage Structures 2.3 Tier Architectures 2.4 Distributed DW 2.5 DW Data Modeling

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3

2. Architecture

• Architecture of a DW

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4

2.1 Basic Architecture

Summary

Data Raw Data

Metadata

Users

Analysis

Reporting

Mining Warehouse

Flat files Operational System

Operational System

Data Sources Staging Area

Inventory Purchasing

Sales Data Marts

• The Data Staging Area – Is both a storage and process

area (the ETL process) – It represents everything that

happens between the operational source system and the data presentation area

– The key architectural requirement for data staging area is that it is off-limits to business users and does not provide query and presentation services

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5

2.1 Basic Architecture

Warehouse Data

Sources Staging Area

Data Marts

• Customers aren’t invited to visit the kitchen…

– Similar to a restaurant’s kitchen, the data staging area should be accessible only to skilled professionals

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6

2.1 Basic Architecture

(2)

• The Data Presentation Area

– Is where data is organized, stored and made available for queries, report writers, and other analytical processing

– This area is the Warehouse as far as the business community is concerned

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7

2.1 Basic Architecture

Warehouse Data

Sources Staging Area

Data Marts

• Storage structure

– After extraction from the operational data, in DW information is stored in databases

– The databases are operated by a DBMS

– Different database structures can be used for a DW:

• Relational model (RDB) operated by a RDBMS

• MultiDimensional model (MDB) operated by a MDBMS

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8

2.2 Storage Structures

• RDB and MDB are complementary and do not have to exclude each other

– In the staging area some RDBMS can be used, however it must be off-limits to user queries because of performance reasons

– By default, normalized databases are excluded from the presentation area, which should be strictly multi-dimensionally (MDBMS)

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9

2.2 Storage Structures

• DB in relational model

– A database is seen as a collection of predicates over a finite set of variables

– The content of the DB is modeled as a set of relations in which all predicates are satisfied

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10

2.2 Relational DB

Books Title ISBN (PK) Price Publisher (FK)

Category (FK) BookCategory

Cat_ID (PK) Description Publisher Name ID (PK)

• A relation is defined as a set of tuples that have the same attributes

– It is usually described as a table

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11

2.2 Relational DB

Relation Attribute

Tuple

• Multidimensional DB (MDB) are optimized for DW and OLAP applications

– They are created using input from the staging area – Designed for efficient and convenient storage and

retrieval of large volumes of data

– Stored, viewed and analyzed from different perspectives called dimensions

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12

2.2 Multidimensional DB

(3)

• Example: an automobile manufacturer wants to increase sale volumes

– Evaluation requires to view historical sale volume figures from multiple dimensions

– Sales volume by model, by color, by dealer, over time

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13

2.2 MDB

• A relational structure of the given evaluation would be

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 14

2.2 MDB

Model Color Sales volume

Mini VAN Blue 324

Mini VAN Black 113

Mini VAN Red 18

Sedan Black 160

Sedan Blue 115

Sedan Red 6

Sports coupe Red 16

Sports coupe Black 16

Sports coupe Blue 12

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15

2.2 Multidim. Structure

113 324 18

160 115 6

16 12 16

Mini VAN

Coupe Sedan

Blue Red

Black

289 451 40

455

281

44

* 1560

*

• The complexity grows quickly with the number of dimensions and the number of positions

– Example: 3 dimensions with 10 values each and no indexes

– If we consider viewing information in a RDB it would result in a worst case of 10

3

=1000 records view

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16

2.2 MDB

• Now, if we consider performance

– For responding to a query when car type = Sedan, color = Blue, and dealer = Berg

• RDBMS has to search through 1000 records to find the right record

• MDB has more knowledge about where data lies

• The maximum of searches in the case of MDB is of 30 positions

• Average case 18 vs. 501 positions

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 17

2.2 MDB

• If the query is more relaxed

– Total sales across all dealers for all colors when car type = sedan

• RDBMS still has to go through the 1000 records

• MDB, however, goes only through a slice of 10x10

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 18

2.2 MDB

(4)

• Performance advantages

– MDBs are an order of magnitude faster than RDBMSs

– Performance benefits are more for queries that generate cross-tab views of data (the case of DW)

Conclusion

– The performance advantages offered by MDBs facilitates the development of interactive decision support applications like OLAP that can be impractical in a relational environment

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 19

2.2 MDB

• Any database manipulation is possible with both technologies

• MDBs however offer some advantages in the context of DW:

– Ease of data presentation – Ease of maintenance – Performance

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 20

2.2 RDB vs. MDB

• Ease of data presentation

– Data views are natural output of the MDBs

– Obtaining the same views in RDB requires a complex query

•Example with Walmart and Sybase:

–select sum(sales.quantity_sold) from sales, products, product_categories, manufacturers, stores, cities where manufacturer_name = ‘Colgate’

and product_category_name = ‘toothpaste’

and cities.population < 40 000

and trunc(sales.date_time_of_sale) = trunc(sysdate-1) and sales.product_id = products.product_id and sales.store_id = stores.store_id

and products.product_category_id = product_categories.product_category_id and products.manufacturer_id = manufacturers.manufacturer_id and stores.city_id = cities.city_id

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 21

2.2 RDB vs. MDB

• Ease of data presentation

– Top k queries cannot be expressed well in SQL

• Find the five cheapest hotels in Frankfurt

SELECT * FROM hotels h WHERE h.city = Frankfurt AND 5 >

(SELECT count(*) FROM hotels h1 WHERE h1.city = Frankfurt AND h1.price < h.price);

• Some RDBMS extended the functionality of SQL with STOP AFTER functionality

SELECT * FROM hotels WHERE city = Frankfurt Order By price STOP AFTER 5;

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22

2.2 RDB vs. MDB

• Ease of maintenance

– No additional overhead to translate user queries into requests for data

• Data is stored as it is viewed

– RDBs use indexes and sophisticated joins which require significant maintenance and storage to provide same intuitiveness

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 23

2.2 RDB vs. MDB

• Performance

– Performance of MDBs can be matched by RDBs through database tuning

– Not possible to tune the database for all possible ad- hoc queries

Aggregate navigators are helping RDBs to catch up with MDBs as far as aggregation queries are concerned

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24

2.2 RDB vs. MDB

(5)

• When MDBs are in-appropriate?

– If the dataset types are not highly related, using a MDB results in a sparse representation

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 25

2.2 MDB

113 324 18

160 115 6

16 12 16

Mini VAN

Coupe Sedan

Blue Red

Black

34 25

45

Smith

Fox James

115 3

6

34

• When MDBs are appropriate?

– In the case of highly interrelated dataset types MDBs are recommended for greatest ease of access and analysis

– Examples of applications

• Financial Analysis and Reporting

• Budgeting

• Promotion Tracking

• Quality Assurance and Quality Control

• Product Profitability

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26

2.2 MDB

• Popular DW architectures – Generic Two-Tier Architecture – Independent Data Mart

– Dependent Data Mart and Operational Data Store – Logical Data Mart and Active Warehouse

– Three-Tier Architecture

• Other

– One-Tier Architecture – N-Tier Architecture – Web-based Architecture

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27

2.3 Tier Architectures

• Generic Two-Tier Architecture – Data is not completely current in the DW – Periodic extraction

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28

2.3 Layered Architectures

• Data analysis comes in two flavors

– Depending on the execution place of the analysis

• Thin Client

Analytics are executed on the server

Client just displays

This architecture fits well for Internet/Intranet DW access

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 29

2.3 Layered Architectures

Server

Data storage Analysis

Client HTTP, IIOP

• Fat Client

The server just delivers the data

Analytics are executed on the client

Communication between client and server must be able to sustain large data transfers

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 30

2.3 Layered Architectures

Server

Data storage Analysis

Client

ODBC, JDBC, NFS

(6)

• Independent Data Mart – Mini warehouses – limited in scope

– Separate ETL for each independent Data Mart – High Data Marts access complexity

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 31

2.3 Layered Architectures

• Dependent Data Mart and Operational Data Store

– Single ETL for the DW

– Data Marts are loaded from the DW – More simple

data access than in the previous case

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32

2.3 Layered Architectures

• Logical Data Mart and Active Warehouse – The ETL is near real-time

– Data Marts are not separate databases, but logical views of the DW

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33

2.3 Layered Architectures

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34

2.3 DW vs. Data Marts

DW Data Marts

Application independent

Specific DSS application Centralized, Decentralized by user area Planned Organic, possibly not planned

DW Data Marts

Historical, detailed, summarized

Some history, detailed, summarized Lightly

denormalized

Highly denormalized Scope

DW Data Marts

Multiple subjects One central subject

DW Data Marts

Many internal and external sources

Few internal and external sources Data

Sources Subjects

Other characteristics

DW Data Marts

Flexible Restrictive

Data-oriented Project oriented

Long life Short life

Large Start small, becomes

large Single complex structure

Multiple, semi-complex structure, together complex

• Generic Three-Tier Architecture – Derived data

• Data that had been selected, formatted, and aggregated for DSS support

– Reconciled data

• Detailed, current data intended to be the single, authoritative source for all decision support

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35

2.3 Layered Architectures

Data mart metadata

DW metadata

Operational metadata Derived data

Data Mart

Reconciled data DW and ODS

Operational data Operational data

• One-Tier Architecture – Theoretically possible

– Might be interesting for mobile applications

• N-Tier Architecture

– Higher tier architecture is also possible

• But the complexity grows with the number of tier-interfaces

• Web-based Architecture – Advantages:

• Usage of existing software, reduction of costs, platform independence

– Disadvantages:

• Security issues: data encryption/user access and identification

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36

2.3 Layered Architectures

(7)

• In most cases the economics and technology greatly favor a single centralized DW

• But in some cases, distributed DW make sense

• Types of distributed DW – Geographically distributed

• Local DW/global DW

– Technologically distributed DW

• Logically one DW, physically more DW – Independently evolving distributed DW

• Uncontrolled growth

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 37

2.4 Distributed DW

• Geographically distributed

– In the case of corporations spread around the world

• Information is needed both locally and globally – A distributed DW makes sense

• When much processing occurs at the local level

• Even though local branches report to the same balance sheet, the local organizations are their own companies

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38

2.4 Distributed DW

USA HQ Local DW

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 39

2.4 Distributed DW

Europe Site A Local DW

Asia Site B Local DW

Global DW All IBM

IBM/Teradata

Sybase

Local operational

processing

Local operational processing

Local operational processing

• Technologically distributed DW

– Placing the DW on the distributed technology of a vendor

– Advantages

• The entry cost is cheap – large centralized hardware is expensive

• No theoretical limit to how much data can be placed in the DW – we can add new servers to the network

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40

2.4 Distributed DW

– As the DW starts to expand network data communication starts playing an important role

• Example: Let’s simplify and consider we have 4 nodes holding each data regarding the last 4 years

• Now let’s consider we have a query which needs to access the data from the last 4 years:

such a query arises the issue of transporting large amount of data between processors

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 41

2.4 Distributed DW

2005 2006

2007 2008

• Independently evolving distributed DW – In practice there are many cases in which independent

DW are developed concurrently and uncontrolled in the same organization

• The first step many corporations make is to build a DW for financial or marketing

• Once it is successfully set up, other parts of the organization follow independently the process resulting in the coexistence of more indepen- dent DW in the same organization

• This problem will be addressed later

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 42

2.4 Distributed DW

(8)

• Data Modeling / DB Design - Basics

– Is the process of creating a data model by analyzing the requirements needed to support the business processes of an organization

• It is sometimes called database modeling/design because a data model is eventually implemented in a database

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43

2.5 Data Modeling

• Data models

– Provide the definition and format of data

– Graphical representations of the data within a specific area of interest

• Enterprise Data Model: represents the integrated data requirements of a complete business organization

• Subject Area Data Model: Represents the data requirements of a single business area or application

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44

2.5 Data Modeling

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45

2.5 Phases

Requirement Analysis

Conceptual Design

Physical Design Functional

Analysis

Application Program Design

Transaction Implementation

Logical Design Data requirements

Conceptual schema

Logical schema DBMS Independent

DBMS Dependent

Application

• Conceptual Design

– Transforms data requirements to conceptual model

– Conceptual model describes data entities, relationships, constraints, etc. on high-level

•Does not contain any implementation details

•Independent of used software and hardware

• Logical Design (next lecture)

– Maps the conceptual data model to the logical data model used by the DBMS

•e.g. relational model, dimensional model, …

•Technology independent conceptual model is adapted to the used DBMS software

• Physical Design (next lecture)

– Creates internal structures needed to efficiently store/manage data

•Table spaces, indexes, access paths, …

•Depends on used hardware and DBMS software

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 46

2.5 Phases

• Going from one phase to the next:

• The phase must be complete

The result serves as input for the next phase

• Often automatic transition is possible with additional designer feedback

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47

2.5 Phases

Conceptual

Design Logical

Design

Physical Design ER-diagram,

UML, … Tables,

Columns, … Tablespaces,

Indexes, …

• Highest conceptual grouping of ideas – Data tends to naturally cluster with data from the

same or similar categories relevant to the organization

• The major relationships between subjects have been defined

– Least amount of detail

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48

2.5 Conceptual Model

(9)

• Conceptual design – See RDB1 course

– Entity-Relationship (ER) Modeling

• Entities - “things” in the real world

E.g. Car, Account, Product

• Attributes – property of an entity, entity type, or relationship type

E.g. color of a car, balance of an account, price of a product

• Relationships – between entities there can be relationships, which also can have attributes

E.g. Person owns Car

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 49

2.5 Conceptual Model

Conceptual Design

ER-diagram, UML, …

Car Account Product

Car ColorColor

Car Person owns

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 50

2.5 Conceptual Model

Student Professor

registration registration number

name

title credits

id

name department

Lecture Lecture

Course of Study enrolls

name part of

prereq.

curriculum semester curriculum

semester

id

attends teaches

instantiates time

day of week day of week

room semester

Lecture instance 1

N N

N N 1

N

N

1

N N N

• Conceptual design in usually done using the Unified Modeling Language (UML)

– Class Diagram, Component Diagram, Object Diagram, Package Diagram…

– For Data Modeling only Class Diagrams are used

• Entity type becomes class

• Relationships become associations

• There are special types of associations like:

aggregation, composition, or generalization

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51

2.5 Conceptual Model

Conceptual Design

ER-diagram, UML, …

CLASS NAME attribute 1 : domain attribute n : domain operation 1 operation m

• Logical design arranges data into a logical structure

– Which can be mapped into the storage objects supported by DBMS

• In the case of RDB, the storage objects are tables which store data in rows and columns

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 52

2.5 Logical Model

Logical Design

Tables, Columns,

Relation Attribute

Tuple

• Physical design specifies the physical configuration of the database on the storage media

– Detailed specification of:

data elements, data types, indexing options, and other parameters residing in the DBMS data dictionary

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 53

2.5 Physical Model

Physical Design

Tablespaces Indexes

• Managing Complex Data Relationships

– Helps keep track of the complex environment that is a DW

• Many complex relationships exist, with the ability to change over time

– Transformations and integration from various systems of record need to be worked out and maintained – Provides the means of supplying users with a

roadmap through the data and relationships

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 54

2.5 Data Model in DW

(10)

• Modeling business queries – Goal

• Define the purpose, and decide on the subject(s) for the data warehouse

• Identify questions of interest – Subject

• Who bought the products?

(customers and their structure)

• Who sold the product? (sales organization)

• What was sold? (product structure)

• When was it sold? (time structure)

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55

2.5 Conceptual Model

Time

Customers Employees

Products Business

Model

• For Conceptual design in DW conventional techniques like E/R or UML are not appropriate

– Lack of necessary semantics for modeling the multidimensional data model

– E/R are constituted to

• Remove redundancy in the data model

• Facilitate retrieval of individual records – Therefore optimize OLTP

– In the case of DW, however redundancy and Materialized Views help speed up Analytical queries

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 56

2.5 Conceptual Model

• Components

– Facts: a fact is a focus of interest for decision-making, e.g., sales, shipments..

– Measures: attributes that describe facts from different points of view, e.g. , each sale is measured by its revenue

– Dimensions: discrete attributes which determine the granularity adopted to represent facts, e.g. , product, store, date

– Hierarchies: are made up of dimension attributes

• Determine how facts may be aggregated and selected, e.g. , day – month – quarter - year

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 57

2.5 Conceptual Model in DW

• Conceptual design models for DW

– Multidimensional Entity Relationship (ME/R) Model – Multidimensional UML (mUML)

– Other methods e.g., Dimension Fact Model, Totok approach, etc.

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58

2.5 Conceptual Model

• ME/R Model

– Its purpose is to create an intuitive representation of the multidimensional data that is optimized for high-performance access

– It represents a specialization and evolution of the E/R to allow specification of multidimensional semantics

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 59

2.5 Multidim. E/R Model

• ME/R notation was influenced by the following considerations

– Specialization of the E/R model

• All new elements of the ME/R have to be specializations of the E/R elements

• In this way the flexibility and power of expression of the E/R models are not reduced

– Minimal expansion of the E/R model

• Easy to understand/learn/use: the number of additional elements should be small

– Representation of the multidimensional semantics

• Although being minimal, it should be powerful enough to be able to represent multidimensional semantics

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60

2.5 Multidim. E/R Model

(11)

• There are 3 main ME/R constructs – The fact node

– The level node

– A special binary classification edge

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61

2.5 Multidim. E/R Model

Fact Characteristics

Classification level

• Lets consider a store scenario designed in E/R – Entities bear little semantics

– E/R doesn’t support classification levels

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 62

2.5 Multidim. E/R Model

Article Store

Product group Product

group

Package District City Name

Date

Article Nr is sold Is

in Is packed

in

Belongs Belongs to

Is in 1

1

n n

n

m

• ME/R notation:

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 63

2.5 Multidim. E/R Model

Sales Characteristics Store

City District Region Country

Article Prod. Group Prod. Family

Prod. Categ

Week

Day Month

Quarter Year

• ME/R notation:

– Sales was elected as fact node

– The dimensions are product, geographical area and time

– The dimensions are represented through the so called Basic Classification Level

– Alternative paths in the classification level are also possible

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 64

2.5 Multidim. E/R Model

Week Month Day

Sales Characteristics Store

Article

Day

• UML is a general purpose modeling language

• It can be tailored to specific domains through the use of the following mechanisms

– Stereotypes: building new elements – Tagged values: new properties – Constraints: new semantics

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 65

2.5 Unified Modeling Language

• Stereotype

– Grants a special semantics to an UML construction without modifying it

– There are 4 possible representations of the stereotype in UML

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 66

2.5 mUML

Icon Decoration Label None

Fact 1

Fact 2 <<Fact>>

Fact 3

Fact 4

(12)

• Tagged value

– Define properties by using a pair of tag and data value

• Tag = Value

• E.g. formula=“UnitsSold*UnitPrice”

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 67

2.5 mUML

<<Fact-Class>>

Sales UnitsSold: Sales UnitPrice: Price /VolumeSold: Price {formula=“UnitsSold*UnitPrice”

, parameter=“UnitsSold, UnitPrice”}

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 68

2.5 mUML

<<Dimensional-Class>>

Week

<<Fact-Class>>

Sold products

<<Fact-Class>>

Sales

<<Dimensional-Class>>

Day

1..*

<<Dimension>>

Time

<<Dimensional-Class>>

Month

<<Dimensional-Class>>

Quarter

<<Dimensional-Class>>

Year

<<Dimensional-Class>>

Store

<<Dimensional-Class>>

City

<<Dimensional-Class>>

Region

<<Dimensional-Class>>

Land

<<Dimensional-Class>>

Prod. Categ

<<Dimensional-Class>>

Prod. Group

<<Dimensional-Class>>

Product

<<Dimension>>

Geography

<<Dimension>>

Product

<<Roll-up>>

Product categ

<<Roll-up>>

Product Group

<<Roll-up>>

Distributor Country

<<Roll-up>>

Country

<<Roll-up>>

Region

<<Roll-up>>

<<Roll-up>> City Week

<<Roll-up>>

Year

<<Roll-up>>

Quarter

<<Roll-up>>

Month

<<Shared -Roll-up>>

Year 1..2

• Storage structures:

– MDB are more suitable for DW

• Architectures:

– One-Tier Architecture: interesting for mobile applications

– N-Tier Architecture: complexity grows N – Web-based Architecture: reduction of costs

• Security issues: data encryption/user access and identification

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 69

Summary

• DW are usually distributed geographically and technologically

• Data Modeling – Conceptual Modeling – In conceptual modeling for DW, conventional

techniques like E/R or UML are not appropriate – Appropriate methods are:

• Multidimensional Entity Relationship (ME/R) Model

• Multidimensional UML (mUML)

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 70

Summary

• Data Modeling (continued) – Logical model

– Physical model

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 71

Next lecture

Referenzen

ÄHNLICHE DOKUMENTE

Data Warehousing &amp; Data Mining– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3?. Why should you

Data Warehousing &amp; OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing &amp; Data Mining– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3?. Why should you

Data Warehousing &amp; OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3..

Data Warehousing &amp; OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing &amp; OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing &amp; OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing &amp; OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig