• Keine Ergebnisse gefunden

Data Warehousing & Mining Techniques

N/A
N/A
Protected

Academic year: 2021

Aktie "Data Warehousing & Mining Techniques"

Copied!
17
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Data Warehousing

& Mining Techniques

Exercise Sheet 1 April 20th, 2021

(2)

A data warehouse is basically a (large) database storing copies of transactional data for analytical purposes (and not for the daily business!)

It is mostly defined by Inmon to fulfill the following characteristics w.r.t. to the data

contained in it:

Subject orientation Integration

Non-volatility Time variancy

Exercise 1.1

(3)

OLTP systems are used for day-to-day tasks like process invoices, register

incoming materials and so on.

The data stored by those systems represent the current state of the business (stored in ODS)

OLAP allows multi-dimensional analysis of the data, i.e. on historical data and not on the ODS!

So a data warehouse is an OLAP system?

Exercise 1.2

(4)

Not really!

OLAP systems are usually build upon data warehouses. The DW stores the data and OLAP uses it. However, sometimes the term OLAP is

used as a synonym for a data warehouse ….

Exerise 1.2

The Data Warehouse …

… and OLAP

More on OLAP in Lecture 6

(5)

In Data Mining we build a model of our data and try to produce insights based on the model.

Examples:

The price of gasoline usually increases during public holidays

Customers who spent more then $100 on dumb bells are likely to buy protein supplements

Exercise 1.3

(6)

Exercise 2.1

Classical SDLC Requirements gathering

Analysis Design Programming

Testing Integration Implementation

We know what we want and therefore build the system in a top down fashion!

DW SDLC

Implement warehouse Integrate data

Test for bias Program against data

Design DSS system Analyze results

Understand requirements

We don‘t know what we want but we will learn it during the implementation!

(7)

Exercise 2.2

Talk to the end users; determine the relevant ODS, KPIs and design the logical and physical model

Lecture 3

Use the input from the design phase, prototpye it and go back to the user

Deploy the solution into production Day-to-day business, esp. management of ETL

Lecture 7

Improve the system over time

(8)

The case for 𝐁𝐢𝐨𝐧𝐭𝐞𝛘

Mid-size manufacturer of antibiotics (in Germany)

Chance to register a new antibiotic

However, now they are in dire need for a new analytics solution

1. Name some advantages of a data warehouse in comparison to the legacy solution.

2. Give some examples of subjects which may be suitable for Bionteχ.

3. Would you agree, that the data integration is a moderate task based on the information you have?

4. Name and explain a few examples for analytics based on the data warehouse.

(9)

Holistic view on the company

Raw data allows for more ways to analyse the data

Data preparation is standardized (less error prone)

More complex queries and analytics possible

(how to query over multiple excel spreadsheets with reasonable effort

Exercise 3.1

(10)

Some examples for subjects:

Customers (activities, orders, aquisitions, custom antibiotica formulas etc.)

Product (batches, costs, sales, etc.)

Regions (sales, customers, potential admission regions etc.)

Suppliers (…)

Exercise 3.2

(11)

Definitively not!

Inmon: subject-oriented, integrated, …

We need to know which data we want and how to integrate it into a common schema

ERP and CRM may have different formats [classic: product numbers in production and sales are not identical]

External reports my contain different date formats or currencies

Building Data Pipelines is time consuming

Exercise 3.3

(12)

Exercise 3.4

Data Warehouse

OLAP Data Mining

Visualization Reporting

What did we sell in the last year and where?

What‘s the value for a certain KPI?

Etc.

How much units of the new antibiotic can we sell in the US if registered there next year?

Typically a collection of KPIs and emerging trends

Explain the data in a visual format

(13)

Exercise 3.5

Some DB

(14)

Exercise 3.5 Bonus

So, what is a data lake?

Basically what you want it to be

Typically the idea of storing all data in its raw

format in a central place and then run analytics on it

(15)

AWS Lake Formation

(16)

However

(17)

Data Warehouse Architecture

Basic architectures Storage models

Layers

Middleware

Next Lecture

Referenzen

ÄHNLICHE DOKUMENTE

Data Warehousing & Data Mining– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3?. Why should you

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig