Data Warehousing & Data Mining


Wolf-Tilo Balke Silviu Homoceanu

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

5. Queries

5.1 OLAP query languages

OLAP operators in SQL

MDX (MultiDimensional eXpressions)

5.2 Data modeling

Logical modeling - implementation

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2


5.1 How does OLAP work?

[Figure: OLAP client/server architecture: OLAP interface; HOLAP server and ROLAP server; MDDB and RDBMS storage with their presentation layers]

• OLAP systems

– Client/server architecture

• The client displays reports and allows interaction with the end user to perform the OLAP operations and other custom queries

• The server is responsible for providing the requested data.

How? It depends on whether it is MOLAP, ROLAP, HOLAP, etc.


• OLAP server

– High-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures

– It is optimized for

• Fast, flexible calculation and transformation of raw data based on formulaic relationships


• OLAP server may either

– Physically stage the processed multidimensional information to deliver consistent and rapid response times to end users

• MOLAP

– Populate its data structures in real-time from relational or other databases

• ROLAP

– Or offer a choice of both

• HOLAP


• We have seen that

– The best way to represent data at the presentation level is multidimensional

• Regardless of whether the storage is multidimensional (MOLAP) or relational (ROLAP)

• Optimal for analysis purposes: easy for decision makers to understand, a natural representation of business data, etc.


• Getting from OLAP operations to the data

– As in the relational model, through queries

• In OLTP we have SQL as the standard query language

– However, OLAP operations are hard to express in SQL

– There is no standard query language for OLAP

– Choices are:

• SQL-99

• MDX (Multidimensional expressions)

5.1 OLAP query languages

• SQL-99

– Prepare SQL for OLAP queries

– New SQL commands

• GROUPING SETS

• ROLLUP

• CUBE

– New aggregate functions

– Queries of type “top k”


• Shortcomings of SQL/92 with regard to OLAP queries

– Hard or impossible to express in SQL

• Multiple aggregations

• Comparisons (with aggregation)

• Reporting features

– Performance penalty

• Poor execution of queries with many AND and OR conditions

– Lack of support for statistical functions

5.1 SQL-99

• Multiple aggregations in SQL/92

– Create a 2D spreadsheet that shows sum of sales by maker as well as car model

– Each subtotal requires a separate aggregate query


[Figure: 2D spreadsheet of sales by model (SUV, Sedan, Sport) and make (BMW, Mercedes), with subtotals by model and by make and a grand total SUM]

SELECT model, make, SUM(amt) FROM sales GROUP BY model, make

UNION ALL

SELECT model, NULL, SUM(amt) FROM sales GROUP BY model

UNION ALL

SELECT NULL, make, SUM(amt) FROM sales GROUP BY make

UNION ALL

SELECT NULL, NULL, SUM(amt) FROM sales;
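To make the cost of the SQL-92 approach concrete, here is a minimal sketch that runs the four UNIONed aggregate queries against an in-memory SQLite database through Python's sqlite3 module. The table layout (make, model, amt) follows the query above; the rows themselves are hypothetical.

```python
import sqlite3

# Hypothetical sales rows (make, model, amt); the column names follow the query above.
rows = [
    ("BMW", "SUV", 10), ("BMW", "Sedan", 20),
    ("Mercedes", "SUV", 5), ("Mercedes", "Sport", 15),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (make TEXT, model TEXT, amt INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# One aggregate query per subtotal level. NULL fills the columns that a
# level does not group by, so every branch has the same number of columns.
query = """
    SELECT model, make, SUM(amt) FROM sales GROUP BY model, make
    UNION ALL
    SELECT model, NULL, SUM(amt) FROM sales GROUP BY model
    UNION ALL
    SELECT NULL, make, SUM(amt) FROM sales GROUP BY make
    UNION ALL
    SELECT NULL, NULL, SUM(amt) FROM sales
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)  # (None, None, 50) is the grand total row
```

Every branch scans the sales table again; the grouping operators discussed in this section let the DBMS produce all levels in a single statement.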

• Comparisons in SQL/92

– This year’s sales vs. last year’s sales for each product

• Requires a self-join

• CREATE VIEW v_sales AS SELECT prod_id, year, sum(qty) AS sale_sum FROM sales GROUP BY prod_id, year;

• SELECT cur.prod_id, cur.year, cur.sale_sum, last.year, last.sale_sum FROM v_sales cur, v_sales last WHERE cur.year = (last.year+1) AND cur.prod_id = last.prod_id;
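A runnable sketch of the year-over-year self-join, again assuming SQLite through Python's sqlite3 and hypothetical sales rows. The alias `last` is renamed to `prev` here only as a precaution, since LAST is a keyword in SQLite's NULLS LAST syntax.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (prod_id INTEGER, year INTEGER, qty INTEGER)")
# Hypothetical data: two products sold in 2008 and 2009.
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 2008, 10), (1, 2008, 5), (1, 2009, 20),
    (2, 2008, 7), (2, 2009, 3),
])
con.execute("""
    CREATE VIEW v_sales AS
    SELECT prod_id, year, SUM(qty) AS sale_sum FROM sales GROUP BY prod_id, year
""")
# Self-join: pair each (product, year) with the same product's previous year.
# The alias "prev" plays the role of "last" in the query above.
rows = con.execute("""
    SELECT cur.prod_id, cur.year, cur.sale_sum, prev.year, prev.sale_sum
    FROM v_sales cur, v_sales prev
    WHERE cur.year = prev.year + 1 AND cur.prod_id = prev.prod_id
    ORDER BY cur.prod_id
""").fetchall()
```

Years without a predecessor (2008 here) drop out of the result, because the inner join finds no matching row for them.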


• Reporting features in SQL/92

– Too complex to express:

• RANK (top k) and NTILE (“top X%” of all products)

• Median

• Running total, moving average, cumulative totals

– E.g., moving average over a 3-day window of total sales for each product (assuming time_id is a consecutive day number)

• CREATE OR REPLACE VIEW v_sales AS SELECT prod_id, time_id, SUM(qty) AS sale_sum FROM sales GROUP BY prod_id, time_id;

• SELECT e.prod_id, e.time_id, AVG(s.sale_sum) FROM v_sales s, v_sales e WHERE s.prod_id = e.prod_id AND s.time_id BETWEEN e.time_id - 2 AND e.time_id GROUP BY e.prod_id, e.time_id;
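A minimal sketch of the SQL-92 moving average as a self-join, assuming SQLite via Python's sqlite3 and that time_id is a consecutive day number. Each day e is joined with the days s of the same product in the range [e - 2, e]; the sales figures are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (prod_id INTEGER, time_id INTEGER, qty INTEGER)")
# Hypothetical daily sales of one product on days 1..5.
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, d, q) for d, q in enumerate([3, 6, 9, 12, 15], start=1)])
con.execute("""
    CREATE VIEW v_sales AS
    SELECT prod_id, time_id, SUM(qty) AS sale_sum FROM sales GROUP BY prod_id, time_id
""")
# Join every day e with the (up to) 3 days s of the same product ending at e.
self_join = con.execute("""
    SELECT e.prod_id, e.time_id, AVG(s.sale_sum)
    FROM v_sales s, v_sales e
    WHERE s.prod_id = e.prod_id
      AND s.time_id BETWEEN e.time_id - 2 AND e.time_id
    GROUP BY e.prod_id, e.time_id
    ORDER BY e.prod_id, e.time_id
""").fetchall()
```

The join condition works on calendar days, so missing days shrink the window; with gap-free day numbers, as in this data, it averages exactly the current and the two preceding rows.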


• Grouping operators

– Extensions to the GROUP BY operator

• GROUPING SETS

• CUBE

• ROLLUP


• GROUPING SETS

– Used for reporting purposes

– Replaces the series of UNIONed queries

• SELECT dept_name, CAST(NULL AS CHAR(10)) AS job_title, COUNT(*) FROM personnel GROUP BY dept_name

UNION ALL

SELECT CAST(NULL AS CHAR(8)) AS dept_name, job_title, COUNT(*) FROM personnel GROUP BY job_title;

• Can be rewritten as:

SELECT dept_name, job_title, COUNT(*) FROM personnel GROUP BY GROUPING SETS (dept_name, job_title);
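SQLite does not support GROUPING SETS, so the sketch below emulates the semantics in plain Python instead: each grouping set is aggregated separately, and columns outside the set are reported as None (SQL NULL). The personnel rows and the helper name are hypothetical.

```python
from collections import Counter

# Hypothetical personnel rows: (dept_name, job_title).
personnel = [("Sales", "Clerk"), ("Sales", "Manager"), ("IT", "Clerk"), ("IT", "Clerk")]
COLUMNS = ["dept_name", "job_title"]


def count_by_grouping_sets(rows, grouping_sets):
    """Emulate SELECT dept_name, job_title, COUNT(*) ... GROUP BY GROUPING SETS (...).

    Columns outside the current grouping set come out as None (SQL NULL),
    which is what the CAST(NULL ...) placeholders do in the UNION ALL version.
    """
    result = []
    for gset in grouping_sets:
        counts = Counter(
            tuple(value if col in gset else None for col, value in zip(COLUMNS, row))
            for row in rows
        )
        result.extend(key + (n,) for key, n in counts.items())
    return result


# GROUPING SETS (dept_name, job_title) means one set per column.
res = count_by_grouping_sets(personnel, [("dept_name",), ("job_title",)])
```

The result holds exactly the rows of the two UNION ALL branches: department counts with a NULL job title, and job-title counts with a NULL department.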

5.1 Grouping operators

• The issue of NULL values

– The grouping operators generate NULL values in the subtotal rows

– So we have generated NULLs as well as real NULLs from the data itself

– How do we tell the difference?

• Through the return value of the GROUPING function:

GROUPING(job_title) returns 1 for a NULL generated by the grouping operator and 0 otherwise, i.e., also for a NULL that comes from the data
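The following plain-Python sketch (hypothetical data and helper name, since SQLite has no GROUPING function) shows why the flag is needed: one employee has a real NULL job title, and without the GROUPING column its group would look like a generated subtotal row.

```python
from collections import Counter

# Hypothetical personnel rows (dept_name, job_title); one job_title is a real NULL.
personnel = [("Sales", "Clerk"), ("Sales", None), ("IT", "Clerk")]


def grouping_sets_with_flag(rows):
    """Emulate
         SELECT dept_name, job_title, GROUPING(job_title), COUNT(*)
         FROM personnel GROUP BY GROUPING SETS (dept_name, job_title)
    The flag is 1 where job_title was aggregated away (generated NULL) and
    0 where job_title is a grouping column, even if its stored value is NULL.
    """
    out = []
    for keep_dept, keep_job in [(True, False), (False, True)]:
        counts = Counter((d if keep_dept else None, j if keep_job else None) for d, j in rows)
        flag = 0 if keep_job else 1
        out.extend((d, j, flag, n) for (d, j), n in counts.items())
    return out


result = grouping_sets_with_flag(personnel)
# (None, None, 0, 1): the employee whose job_title really is NULL
# ("Sales", None, 1, 2): department subtotal, job_title generated
```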

5.1 Grouping set

• ROLLUP

– Produces a result set that contains subtotal rows in addition to regular grouped rows

– GROUP BY ROLLUP (a, b, c) is equivalent to GROUP BY GROUPING SETS ((a, b, c), (a, b), (a), ())

– N elements of the ROLLUP translate to (N+1) grouping sets

– Order is significant to ROLLUP!

• GROUP BY ROLLUP (c, b, a) is equivalent to the grouping sets (c, b, a), (c, b), (c), ()
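The ROLLUP rule can be written down directly: keep every prefix of the column list, from the full list down to the empty set. A small Python sketch (the function name is hypothetical):

```python
def rollup_grouping_sets(columns):
    """Return the grouping sets of GROUP BY ROLLUP(columns).

    ROLLUP keeps every prefix of the column list, from the full list down to
    the empty set () for the grand total, so N columns give N + 1 sets.
    """
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]


print(rollup_grouping_sets(["a", "b", "c"]))
print(rollup_grouping_sets(["c", "b", "a"]))  # different order, different sets
```

Because only prefixes are kept, reversing the column order yields different grouping sets, which is why order matters for ROLLUP.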

5.1 Grouping operators

• ROLLUP operation, e.g.,:

– SELECT year, brand, SUM(qty) FROM sales GROUP BY ROLLUP(year, brand);

5.1 ROLLUP

Year   Brand      SUM(qty)   Grouping set
2008   Mercedes   250        (year, brand)
2008   BMW        300        (year, brand)
2008   VW         450        (year, brand)
2008              1000       (year)
2009   Mercedes   50         (year, brand)
…      …          …
2009              400        (year)
                  1400       (ALL)


• CUBE operator

– Contains all the subtotal rows of a ROLLUP and, in addition, cross-tabulation rows

– Can also be thought of as a series of GROUPING SETS

– All combinations of the cubed grouping expressions are computed, along with the grand total

• N elements of a CUBE translate to $2^N$ grouping sets:

– GROUP BY CUBE (a, b, c) is equivalent to

GROUP BY GROUPING SETS ((a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), ())
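CUBE keeps every subset of the column list instead of only the prefixes, which gives $2^N$ grouping sets. A short Python sketch using itertools.combinations (the function name is hypothetical):

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return the grouping sets of GROUP BY CUBE(columns): all subsets, largest first."""
    n = len(columns)
    return [combo for k in range(n, -1, -1) for combo in combinations(columns, k)]


sets = cube_grouping_sets(["a", "b", "c"])
print(len(sets))  # 8 == 2 ** 3
```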

5.1 Grouping operators

5.1.1 CUBE operator

[Figure: the Data Cube and the sub-space aggregates: Group By (with total) by model (SUV, Sedan, Sport); Cross Tab by model and make (BMW, Mercedes); Data Cube adding the year, with aggregates by model, by make, by year, by make & model, by make & year, by model & year, and the overall Sum]

• E.g., CUBE operator

– SELECT year, brand, SUM(qty) FROM sales GROUP BY CUBE (year, brand);

5.1 CUBE

Year   Brand      SUM(qty)   Grouping set
2008   Mercedes   250        (year, brand)
2008   BMW        300        (year, brand)
2008   VW         450        (year, brand)
2008              1000       (year)
2009   Mercedes   50         (year, brand)
…      …          …
2009              400        (year)
       Mercedes   300        (brand)
       BMW        350        (brand)
       VW         650        (brand)
                  1400       (ALL)

• Diagram of OLAP function evaluation

5.1 OLAP functions

[Figure: evaluation of OLAP functions, repeated for each partition: partitioning → sorting → dynamic window → window aggregation; in SQL: OVER (PARTITION BY … ORDER BY … ROWS BETWEEN …) with RANK(), SUM(), …]

• The window clause

– Specifies that an aggregate is computed over a set of rows (a window) related to the current row

– 3 sub-clauses: partitioning, ordering and aggregation grouping

– General format:

<aggregate function> OVER ([PARTITION BY <column list>] ORDER BY <sort column list> [<aggregation grouping>])
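The three sub-clauses map one-to-one onto a running total. A minimal sketch, assuming SQLite 3.25 or newer (window function support) through Python's sqlite3, with hypothetical daily sales per product:

```python
import sqlite3  # window functions require SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (prod_id INTEGER, day INTEGER, qty INTEGER)")
# Hypothetical daily quantities for two products.
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 1, 5), (1, 2, 3), (1, 3, 2), (2, 1, 4), (2, 2, 6)])
rows = con.execute("""
    SELECT prod_id, day,
           SUM(qty) OVER (PARTITION BY prod_id        -- partitioning
                          ORDER BY day                -- ordering
                          ROWS UNBOUNDED PRECEDING)   -- aggregation grouping
             AS running_total
    FROM sales
    ORDER BY prod_id, day
""").fetchall()
```

The running total restarts at each new prod_id because PARTITION BY starts a new window per product.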


• Moving averages are hard to compute with SQL-92

– They require self-joins of the fact table

• With the window clause we can create dynamic windows, expressed in the <aggregation grouping>

• SELECT … AVG(sales) OVER (PARTITION BY region ORDER BY month ASC ROWS 2 PRECEDING) AS SMA3 …

– Moving average over 3 rows: the current row and the 2 preceding ones
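The same 3-row moving average with the window clause, as a sketch assuming SQLite 3.25+ via Python's sqlite3; the region/month data is hypothetical. No self-join is needed.

```python
import sqlite3  # window functions require SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, month INTEGER, sales REAL)")
# Hypothetical monthly sales for two regions.
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", 1, 3), ("North", 2, 6), ("North", 3, 9), ("North", 4, 12),
    ("South", 1, 10), ("South", 2, 20),
])
# ROWS 2 PRECEDING is shorthand for ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.
rows = con.execute("""
    SELECT region, month,
           AVG(sales) OVER (PARTITION BY region ORDER BY month ASC
                            ROWS 2 PRECEDING) AS sma3
    FROM sales
    ORDER BY region, month
""").fetchall()
```

At the start of each partition the window holds fewer than 3 rows, so the first values average only the rows available.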

5.1 Window clause


• Ranking operators in SQL

– Row numbering is the most basic ranking function

• ROW_NUMBER() returns a column as an expression that contains the row’s number within the result set

• E.g., SELECT SalesOrderID, CustomerID, ROW_NUMBER() OVER (ORDER BY SalesOrderID) as RunningCount FROM Sales WHERE SalesOrderID > 10000 ORDER BY SalesOrderID;

5.1 Ranking in SQL

SalesOrderID   CustomerID   RunningCount
43659          543          1
43660          234          2
43661          143          3
43662          213          4
43663          312          5
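The example above can be reproduced with the rows from the table, assuming SQLite 3.25+ via Python's sqlite3:

```python
import sqlite3  # ROW_NUMBER() requires SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (SalesOrderID INTEGER, CustomerID INTEGER)")
# Rows taken from the result table above.
con.executemany("INSERT INTO Sales VALUES (?, ?)", [
    (43659, 543), (43660, 234), (43661, 143), (43662, 213), (43663, 312),
])
rows = con.execute("""
    SELECT SalesOrderID, CustomerID,
           ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RunningCount
    FROM Sales
    WHERE SalesOrderID > 10000
    ORDER BY SalesOrderID
""").fetchall()
```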

• ROW_NUMBER does not consider ties: 2 equal values get 2 different row numbers

– The assignment among tied rows is non-deterministic

• Tied rows may have their numbers switched between executions!


SalesOrderID   RunningCount
43659          1
43659          2
43660          3
43661          4

• RANK and DENSE_RANK functions

– Allow ranking items in a group

– The difference between RANK and DENSE_RANK is that DENSE_RANK leaves no gaps in the ranking sequence when there are ties

– Syntax:

• RANK ( ) OVER ( [query_partition_clause] order_by_clause )

• DENSE_RANK ( ) OVER ( [query_partition_clause] order_by_clause )


• SQL99 Ranking e.g.,


CHANNEL        CALENDAR   SALES    RANK   DENSE_RANK
Direct sales   02.2009    10,000   1      1
Direct sales   03.2009    9,000    2      2
Internet       02.2009    6,000    3      3
Internet       03.2009    6,000    3      3
Partners       03.2009    4,000    5      4

SELECT channel, calendar, TO_CHAR(TRUNC(SUM(amount_sold),-6), '9,999,999') SALES, RANK() OVER (ORDER BY Trunc(SUM(amount_sold),-6) DESC) AS RANK, DENSE_RANK() OVER (ORDER BY TRUNC(SUM(amount_sold),-6) DESC) AS DENSE_RANK FROM sales, products …
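A simplified, runnable version of this ranking, assuming SQLite 3.25+ via Python's sqlite3. To keep it short, the sales values from the table are stored already aggregated in a hypothetical channel_sales table, so the TO_CHAR/TRUNC formatting is left out.

```python
import sqlite3  # RANK() and DENSE_RANK() require SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated table holding the values from the result above.
con.execute("CREATE TABLE channel_sales (channel TEXT, calendar TEXT, sales INTEGER)")
con.executemany("INSERT INTO channel_sales VALUES (?, ?, ?)", [
    ("Direct sales", "02.2009", 10000), ("Direct sales", "03.2009", 9000),
    ("Internet", "02.2009", 6000), ("Internet", "03.2009", 6000),
    ("Partners", "03.2009", 4000),
])
rows = con.execute("""
    SELECT channel, calendar, sales,
           RANK()       OVER (ORDER BY sales DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rnk
    FROM channel_sales
    ORDER BY sales DESC, channel, calendar
""").fetchall()
```

The two tied Internet rows share rank 3; RANK then skips to 5, while DENSE_RANK continues with 4.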

• Other flavors of ranking

– Group ranking

• The RANK function can operate within groups: the rank gets reset whenever the group changes

• A single query can contain more than one ranking function, each partitioning the data into different groups

• This is accomplished with the PARTITION BY clause

– E.g., SELECT … RANK() OVER (PARTITION BY channel ORDER BY SUM(amount_sold) DESC) AS RANK_BY_CHANNEL

5.1 Group Ranking

CHANNEL        CALENDAR   SALES    RANK_BY_CHANNEL
Direct sales   02.2009    10,000   1
Direct sales   03.2009    9,000    2
Internet       02.2009    6,000    1
Internet       03.2009    6,000    1
Partners       03.2009    4,000    1
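Group ranking on the same values, again assuming SQLite 3.25+ via Python's sqlite3 and a hypothetical pre-aggregated channel_sales table. PARTITION BY channel restarts the ranking for every channel.

```python
import sqlite3  # RANK() requires SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated table holding the values from the result above.
con.execute("CREATE TABLE channel_sales (channel TEXT, calendar TEXT, sales INTEGER)")
con.executemany("INSERT INTO channel_sales VALUES (?, ?, ?)", [
    ("Direct sales", "02.2009", 10000), ("Direct sales", "03.2009", 9000),
    ("Internet", "02.2009", 6000), ("Internet", "03.2009", 6000),
    ("Partners", "03.2009", 4000),
])
rows = con.execute("""
    SELECT channel, calendar, sales,
           RANK() OVER (PARTITION BY channel ORDER BY sales DESC) AS rank_by_channel
    FROM channel_sales
    ORDER BY channel, sales DESC, calendar
""").fetchall()
```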


• The treatment of NULL values: NULLs are treated as normal values

– A NULL value is equal to another NULL value

– They are given ranks according to

• The ASC | DESC options provided for measures

• The NULLS FIRST | NULLS LAST clause


MONTH   SOLD    ASC NULLS FIRST   ASC NULLS LAST   DESC NULLS FIRST   DESC NULLS LAST
01      34535   5                 3                3                  1
02      32123   4                 2                4                  2
03      27500   3                 1                5                  3
04      NULL    1                 4                1                  4
05      NULL    1                 4                1                  4
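The four rank columns can be reproduced from the table's data. This sketch assumes SQLite 3.30 or newer (the first release with NULLS FIRST / NULLS LAST) via Python's sqlite3.

```python
import sqlite3  # NULLS FIRST / NULLS LAST require SQLite 3.30 or newer

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly (month TEXT, sold INTEGER)")
# Data from the table above; months 04 and 05 have no value (NULL).
con.executemany("INSERT INTO monthly VALUES (?, ?)", [
    ("01", 34535), ("02", 32123), ("03", 27500), ("04", None), ("05", None),
])
rows = con.execute("""
    SELECT month, sold,
           RANK() OVER (ORDER BY sold ASC  NULLS FIRST) AS asc_nulls_first,
           RANK() OVER (ORDER BY sold ASC  NULLS LAST)  AS asc_nulls_last,
           RANK() OVER (ORDER BY sold DESC NULLS FIRST) AS desc_nulls_first,
           RANK() OVER (ORDER BY sold DESC NULLS LAST)  AS desc_nulls_last
    FROM monthly
    ORDER BY month
""").fetchall()
```

The two NULL rows are peers, so they always share a rank, exactly as in the table.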

• Top k ranking

– By enclosing the RANK function in a sub-query and then applying a filter condition outside the sub-query

SELECT * FROM (SELECT country_id, SUM(amount_sold) SALES, RANK() OVER (ORDER BY SUM(amount_sold) DESC) AS COUNTRY_RANK FROM sales, products, customers, times, channels WHERE ... GROUP BY country_id) ranked WHERE COUNTRY_RANK <= 5;
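A minimal top-k sketch, assuming SQLite 3.25+ via Python's sqlite3. The single sales table and its rows are hypothetical simplifications of the multi-table join above, and k = 2 instead of 5 only to keep the data small.

```python
import sqlite3  # RANK() requires SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (country_id TEXT, amount_sold INTEGER)")
# Hypothetical sales rows; several rows per country.
con.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("DE", 50), ("DE", 30), ("FR", 60), ("IT", 20), ("ES", 45), ("IT", 5),
])
k = 2  # hypothetical cut-off; the slide uses 5
# The rank is computed in the sub-query; the filter is applied outside it,
# because a window function cannot appear in the WHERE clause of its own query.
top_k = con.execute("""
    SELECT * FROM (
        SELECT country_id, SUM(amount_sold) AS total_sold,
               RANK() OVER (ORDER BY SUM(amount_sold) DESC) AS country_rank
        FROM sales
        GROUP BY country_id) ranked
    WHERE country_rank <= ?
    ORDER BY country_rank
""", (k,)).fetchall()
```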

• NTILE

– Not part of the SQL-99 standard, but adopted by major vendors

– Splits a set into equal groups

• It divides an ordered partition into buckets and assigns a bucket number to each row in the partition

• Buckets are calculated so that each bucket has exactly the same number of rows assigned to it or at most 1 row more than the others

5.1 NTILE

• SELECT … NTILE(3) OVER (ORDER BY sales DESC) NT_3 FROM …

– NTILE(4) yields quartiles, NTILE(100) percentiles

CHANNEL        CALENDAR   SALES    NT_3
Direct sales   02.2009    10,000   1
Direct sales   03.2009    9,000    1
Internet       02.2009    6,000    2
Internet       03.2009    6,000    2
Partners       03.2009    4,000    3
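NTILE(3) on the five values above, as a sketch assuming SQLite 3.25+ via Python's sqlite3 and a hypothetical pre-aggregated channel_sales table. Five rows split into 3 buckets of sizes 2, 2 and 1.

```python
import sqlite3  # NTILE() requires SQLite 3.25 or newer

con = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated table holding the values from the result above.
con.execute("CREATE TABLE channel_sales (channel TEXT, calendar TEXT, sales INTEGER)")
con.executemany("INSERT INTO channel_sales VALUES (?, ?, ?)", [
    ("Direct sales", "02.2009", 10000), ("Direct sales", "03.2009", 9000),
    ("Internet", "02.2009", 6000), ("Internet", "03.2009", 6000),
    ("Partners", "03.2009", 4000),
])
rows = con.execute("""
    SELECT channel, calendar, sales,
           NTILE(3) OVER (ORDER BY sales DESC) AS nt_3
    FROM channel_sales
    ORDER BY sales DESC, channel, calendar
""").fetchall()
```

Since 5 is not divisible by 3, the first 5 mod 3 = 2 buckets receive one extra row.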

• MDX (MultiDimensional eXpressions)

– Developed by Microsoft

• Not a particularly elegant language

• But adopted by major OLAP providers due to Microsoft's market-leading position

– Used in

• OLE DB for OLAP (ODBO) with API support

• XML for Analysis (XMLA): specification of web services for OLAP

5.1 MDX

• Similar to SQL syntax

– SELECT

• Axis dimensions, on columns and rows

– FROM

• Data source cube specification

• If joined, data cubes must share dimensions

– WHERE

• Slicer: restricts the data area


SELECT {Deutschland, Niedersachsen, Bayern, Frankfurt} ON COLUMNS, {Qtr1.CHILDREN, Qtr2, Qtr3} ON ROWS

FROM SalesCube

WHERE (Measures.Sales, Time.[2008], Products.[All Products]);


• Lists

– Enumeration of elementary nodes from different classification levels

• E.g., {Deutschland, Niedersachsen, [Frankfurt am Main], USA}

• Generated elements

– Methods which lead to new sets of the classification levels

• Deutschland.CHILDREN generates {Niedersachsen, Bayern, …}

• Niedersachsen.PARENT generates Deutschland

• Time.Quarter.MEMBERS generates all the elements of the classification level

• Functional generation of sets

– DESCENDANTS(USA, Cities): the descendants of USA on the Cities classification level

– GENERATE({USA, France}, DESCENDANTS(Geography.CURRENTMEMBER, Cities)): enumerates all the cities in USA and France
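To see what each set-generating method returns, they can be mimicked on a toy hierarchy. This is plain Python, not MDX: the hierarchy (with Hannover, Braunschweig and München as cities) and the helper names are hypothetical stand-ins for CHILDREN, PARENT, DESCENDANTS and GENERATE.

```python
# Hypothetical geography hierarchy: member -> (level, parent).
HIERARCHY = {
    "Deutschland": ("Country", None),
    "Niedersachsen": ("State", "Deutschland"),
    "Bayern": ("State", "Deutschland"),
    "Hannover": ("City", "Niedersachsen"),
    "Braunschweig": ("City", "Niedersachsen"),
    "München": ("City", "Bayern"),
}


def children(member):
    """Like member.CHILDREN: the members one level below."""
    return [m for m, (_, parent_) in HIERARCHY.items() if parent_ == member]


def parent(member):
    """Like member.PARENT: the member one level above."""
    return HIERARCHY[member][1]


def descendants(member, level):
    """Like DESCENDANTS(member, level): members below `member` on `level`."""
    found, frontier = [], children(member)
    while frontier:
        m = frontier.pop(0)
        if HIERARCHY[m][0] == level:
            found.append(m)
        else:
            frontier.extend(children(m))
    return found


def generate(members, fn):
    """Like GENERATE(set, expr): apply fn to each member and union the results."""
    out = []
    for m in members:
        out.extend(x for x in fn(m) if x not in out)
    return out


cities = generate(["Bayern", "Niedersachsen"], lambda m: descendants(m, "City"))
```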


• Sets nesting combines individual coordinates to reduce dimensionality

SELECT CROSSJOIN({Deutschland, Sachsen, Hannover, BS}, {Ikeea, [H&M-Möbel]}) ON COLUMNS,

{Qtr1.CHILDREN, Qtr2} ON ROWS FROM salesCube

WHERE (Measures.Sales, Time.[2008], Products.[All Products]);

[Figure: result grid; columns: Deutschland, Sachsen, Hannover and BS, each split into Ikeea and H&M-Möbel; rows: Jan 08, Feb 08, Mar 08, Qtr2]
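Conceptually, CROSSJOIN forms the Cartesian product of its two sets, so 4 locations times 2 vendors give 8 column tuples. A plain-Python sketch (the helper name is hypothetical; this is not an MDX API):

```python
from itertools import product


def crossjoin(set_a, set_b):
    """Every tuple (a, b) with a from set_a and b from set_b, in set order."""
    return list(product(set_a, set_b))


columns = crossjoin(["Deutschland", "Sachsen", "Hannover", "BS"], ["Ikeea", "H&M-Möbel"])
```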

• Relative selection

– Uses the order in the dimensional structures

• Time.[2008].LastChild: last quarter of 2008

• [2008].NextMember: {[2009]}

• [2008].[Qtr4].Nov.Lead(2): Jan 2009

• [2006]:[2009] represents [2006], …, [2009]

• Methods for hierarchy information extraction

• Deutschland.LEVEL: country

• Time.LEVELS(1): Year

• Brackets

• {}: sets, e.g., {Hannover, BS, John}

• []: needed for names that are numbers or that contain spaces or other special symbols

– E.g., [2008], [Frankfurt am Main], [H&M]

• (): tuples, e.g., WHERE (Measures.Sales, Time.[2008], Products.[All Products])


• Special functions and filters

– Special functions TOPCOUNT(), TOPPERCENT(), TOPSUM()

• E.g.,

SELECT {Time.CHILDREN} ON COLUMNS,

{TOPCOUNT(Deutschland.CHILDREN, 5, Sales.turnover)} ON ROWS FROM salesCube

WHERE (Measures.Sales, Time.[2008]);

– Filter function

• E.g.,

SELECT FILTER(Deutschland.CHILDREN, ([2008], Turnover) > ([2007], Turnover)) ON COLUMNS, Quarters.MEMBERS ON ROWS

FROM salesCube

WHERE (Measures.Sales, Time.[2008], Products.Electronics);

• Time series

– Set value expressions

• Choosing time intervals

– PERIODSTODATE(Quarter, [15-Nov-2008]): returns 1.10.2008 to 15.11.2008

– LASTPERIODS(3, [Sept-2008]): returns [July-2008], [Aug-2008], [Sept-2008] (the given member is included)

– Member value expressions

• Pre-periods

– PARALLELPERIOD(Year, 3, [Sep-2008]): returns [Sep-2005]

– Numerical functions

• COVARIANCE, CORRELATION

• LINEAR REGRESSION
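Both time functions navigate the ordered members of one level. In Microsoft's MDX reference, LASTPERIODS counts backwards from and including the given member, and PARALLELPERIOD moves back a whole number of ancestor periods. The plain-Python sketch below models a hypothetical month level (names like "Sep-2008" are illustrative) and is not an MDX API.

```python
# Hypothetical month level: ordered members of the Time dimension, 2005 to 2009.
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTHS = [f"{m}-{y}" for y in range(2005, 2010) for m in MONTH_NAMES]


def last_periods(n, member):
    """n members ending with (and including) `member`, like LASTPERIODS(n, member)."""
    i = MONTHS.index(member)
    return MONTHS[max(0, i - n + 1): i + 1]


def parallel_period_years(years_back, member):
    """Same month `years_back` years earlier, like PARALLELPERIOD(Year, n, member)."""
    j = MONTHS.index(member) - 12 * years_back
    if j < 0:
        raise ValueError("period lies before the first modeled month")
    return MONTHS[j]
```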


• XMLA (XML for Analysis)

– Most recent attempt at a standardized API for OLAP

– Allows client applications to talk to multidimensional data sources

– In XMLA, mdXML is an MDX wrapper for XML

– Underlying technologies

• XML, SOAP, HTTP

– Service primitives

• DISCOVER

– Retrieve information about available data sources, data schemas, server information…

• EXECUTE

– Transmission of a query and the corresponding result

5.1 mdXML


• Now we know

– What OLAP looks like

• From the outside

• From the inside: SQL-99, MDX

– How data is modeled

• Conceptual level: mUML, ME/R

• Logical level: cubes, dimensions

5.2 Data Modeling

[Figure: cube with the dimensions Store and Product]

• But how do we implement it…

– On a logical level

– On a physical level


[Figure: database design process: requirement analysis (data requirements) → conceptual design (conceptual schema) → logical design (logical schema) → physical design, split into DBMS-independent and DBMS-dependent phases; in parallel, functional analysis → application program design → transaction implementation → application]

• Implementation of the multidimensional data model can be:

– Relational

• Snowflake schema

• Star schema

– Multidimensional

• Array technique

5.2 Implementation

• Relational implementation

– Main goals:

• As little loss of semantic knowledge as possible, e.g., classification hierarchies

• The translation of multidimensional queries must be efficient

• The RDBMS should be able to run the translated queries efficiently

• The maintenance of the existing tables should be easy and fast, e.g., when loading new data


• Going from multidimensional to relational

– Representations for cubes, dimensions, classification hierarchies and attributes

– Implementation of cubes without the classification hierarchies is easy

• A table can be seen as a cube

• A column of a table can be considered as a dimension mapping

• A tuple in the table represents a cell in the cube

• If we interpret only a part of the columns as dimensions we can use the rest as measures

• The resulting table is called a fact table

[Figure: cube with the dimensions Product (Laptops, Mobile p.), Geography and Time (13.11.2008, 18.12.2008), next to the corresponding fact table]

Article         Store                  Day          Sales
Laptops         Hannover, Saturn       13.11.2008   6
Mobile Phones   Hannover, Saturn       18.12.2008   24
Laptops         Braunschweig, Saturn   18.12.2008   3
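The table-as-cube reading can be sketched in Python with the three rows above: the dimension columns form the cell coordinate, the remaining column is the measure, and missing coordinates are empty cells. The helper name is hypothetical.

```python
# Fact rows from the example: (article, store, day, sales).
fact_table = [
    ("Laptops", "Hannover, Saturn", "13.11.2008", 6),
    ("Mobile Phones", "Hannover, Saturn", "18.12.2008", 24),
    ("Laptops", "Braunschweig, Saturn", "18.12.2008", 3),
]

# Each tuple is one cube cell: (article, store, day) is the coordinate, sales the measure.
cube = {(article, store, day): sales for article, store, day, sales in fact_table}


def dimension_members(cube, axis):
    """Distinct members that occur on one dimension (0 = article, 1 = store, 2 = day)."""
    return sorted({coord[axis] for coord in cube})
```

Only non-empty cells are stored, which is one reason the relational fact table copes well with sparse cubes.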


• Snowflake-schema

– Simple idea: use a table for each classification level

• This table includes the ID of the classification level and other attributes

• 2 neighboring classification levels are connected by 1:n relationships, e.g., from n days to 1 month

• The measures of a cube are maintained in a fact table

• Besides measures, the fact table also contains the foreign key IDs of the finest classification levels

5.2 Snowflake Schema

• Snowflake?

– The facts/measures are in the center

– The dimensions spread out in each direction and branch out with their granularity


5.2 Snowflake Example

[Figure: snowflake schema with the fact table in the center and normalized dimension tables for product, location and time, connected by n:1 relationships]

Fact table:

Sales (Product_ID, Day_ID, Store_ID, Sales, Revenue)

Dimension tables (each → is an n:1 relationship):

Product (Product_ID, Description, Brand, Product_group_ID, …) → Product group (Product_group_ID, Description, Product_category_ID) → Product category (Product_category_ID, Description)

Location: Store (Store_ID, Description, State_ID, …) → State (State_ID, Description, Region_ID) → Region (Region_ID, Description, Country_ID) → Country (Country_ID, Description)

Time: Day (Day_ID, Description, Month_ID, Week_ID) → Month (Month_ID, Description, Quarter_ID) → Quarter (Quarter_ID, Description, Year_ID) → Year (Year_ID, Description)

Time: Day → Week (Week_ID, Description, Year_ID) → Year
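A reduced version of this snowflake can be built in SQLite via Python's sqlite3: only the product branch (Product → Product group → Product category) plus the fact table. The IDs and descriptions follow the product example later in this section; the fact rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_category (product_category_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE product_group (product_group_id INTEGER PRIMARY KEY, description TEXT,
    product_category_id INTEGER REFERENCES product_category);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT, brand TEXT,
    product_group_id INTEGER REFERENCES product_group);
CREATE TABLE sales (product_id INTEGER REFERENCES product, day_id INTEGER,
    store_id INTEGER, sales INTEGER, revenue REAL);

INSERT INTO product_category VALUES (11, 'Electronics');
INSERT INTO product_group VALUES (2, 'TV', 11), (4, 'Mobile Phones', 11);
INSERT INTO product VALUES (10, 'E71', 'Nokia', 4), (11, 'PS-42A', 'Samsung', 2);
-- Hypothetical facts: product, day, store, units, revenue.
INSERT INTO sales VALUES (10, 1, 1, 5, 1000.0), (11, 1, 1, 1, 800.0), (10, 2, 1, 2, 400.0);
""")

# Each coarser level of the product classification costs one more join.
by_category = con.execute("""
    SELECT c.description, SUM(f.revenue)
    FROM sales f
    JOIN product p          ON p.product_id = f.product_id
    JOIN product_group g    ON g.product_group_id = p.product_group_id
    JOIN product_category c ON c.product_category_id = g.product_category_id
    GROUP BY c.description
""").fetchall()
```

Reaching the category level already needs three joins for this one dimension; combining it with coarse levels of the time and location dimensions multiplies the joins, which is the disadvantage discussed below.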

• Snowflake schema – Advantages

– With a snowflake schema, the size of the dimension tables is reduced and queries can run faster

• If a dimension is very sparse (most measures corresponding to the dimension have no data)

• And/or a dimension has a long list of attributes which may be queried


• Snowflake schema – Disadvantages

– Fact tables are typically responsible for about 90% of the storage requirements

• Thus, normalizing the dimensions usually leads to insignificant savings

– Normalization of the dimension tables can reduce the performance of the DW because it leads to a large number of tables

• E.g., when connecting dimensions with coarse granularity, these tables are joined with each other during queries

• A query which connects Product category with Year and Country performs poorly: even via the shorter time path (Day → Week → Year), 11 tables have to be connected, i.e., 10 joins


• Star schema

– Basic idea: use a denormalized schema for all the dimensions

• A star schema can be obtained from the snowflake schema through the denormalization of the tables belonging to a dimension

5.2 Star Schema

5.2 Star Schema - Example

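[Figure: star schema with the fact table in the center and one denormalized table per dimension, connected by n:1 relationships]

Fact table:

Sales (Product_ID, Time_ID, Geo_ID, Sales, Revenue)

Dimension tables:

Product (Product_ID, Product group, Product category, Description, …)

Geography (Geo_ID, Store, State, Region, Country, …)

Time (Time_ID, Day, Week, Month, Quarter, Year)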

• Advantages

– Improves query performance for often-used data

– Fewer tables and a simple structure

– Efficient query processing with regard to dimensions

• Disadvantages

– In some cases, high overhead of redundant data


5.2 Snowflake vs. Star

– Snowflake: the structure of the classifications is expressed in table schemas

• The fact table and the dimension tables are normalized

– Star: the entire classification is expressed in just one table

• The fact table is normalized, while in the dimension tables the normalization is broken

• This leads to redundancy of information in the dimension tables

5.2 Examples

• Snowflake

Product_ID   Description   Brand     Prod_group_ID
10           E71           Nokia     4
11           PS-42A        Samsung   2
12           5800          Nokia     4
13           Bold          Berry     4

Prod_group_ID   Description    Prod_categ_ID
2               TV             11
4               Mobile Pho..   11

Prod_categ_ID   Description
11              Electronics

• Star

Product_ID   Description   …   Prod. group   Prod. categ
10           E71           …   Mobile Ph..   Electronics
11           PS-42A        …   TV            Electronics
12           5800          …   Mobile Ph..   Electronics
13           Bold          …   Mobile Ph..   Electronics
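Going from the snowflake tables to the star table is a single join that folds the group and category levels into the product dimension. A sketch assuming SQLite via Python's sqlite3, with the example rows; the group description is written out as "Mobile Phones", the label the fact table example uses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (product_id INTEGER, description TEXT, brand TEXT, prod_group_id INTEGER);
CREATE TABLE prod_group (prod_group_id INTEGER, description TEXT, prod_categ_id INTEGER);
CREATE TABLE prod_categ (prod_categ_id INTEGER, description TEXT);
INSERT INTO product VALUES (10, 'E71', 'Nokia', 4), (11, 'PS-42A', 'Samsung', 2),
                           (12, '5800', 'Nokia', 4), (13, 'Bold', 'Berry', 4);
INSERT INTO prod_group VALUES (2, 'TV', 11), (4, 'Mobile Phones', 11);
INSERT INTO prod_categ VALUES (11, 'Electronics');

-- Denormalize: copy the group and category descriptions into every product row.
CREATE TABLE product_star AS
SELECT p.product_id, p.description, p.brand,
       g.description AS prod_group, c.description AS prod_categ
FROM product p
JOIN prod_group g ON g.prod_group_id = p.prod_group_id
JOIN prod_categ c ON c.prod_categ_id = g.prod_categ_id;
""")
star = con.execute("SELECT * FROM product_star ORDER BY product_id").fetchall()
```

"Electronics" now appears in every row: this is the redundancy a star schema accepts in exchange for fewer joins at query time.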

• When should we go from snowflake to star?

– Heuristics-based decision

• When typical queries relate to coarser granularity (like product category)

• When the volume of data in the dimension tables is relatively low compared to the fact table

– In this case, a star schema leads to negligible overhead through redundancy, but performance is improved

• When modifications of the classifications are rare compared to insertions of fact data

– In this case, these modifications can be controlled through the data load process of the ETL, reducing the risk of data anomalies

5.2 Snowflake to Star


• Snowflake or Star?

– It depends on the necessity

• Fast query processing or efficient space usage

– However, most of the time a mixed form is used

• The Starflake schema: some dimensions stay normalized according to the snowflake schema, while others are denormalized according to the star schema

5.2 Do we have a winner?

• The Starflake schema

– The decision on how to deal with the dimensions is influenced by

• Frequency of the modifications: if the dimensions change often, normalization leads to better results

• Amount of dimension elements: the bigger the dimension tables, the more space normalization saves

• Number of classification levels in a dimension: more classification levels introduce more redundancy in the star schema

• Materialization of aggregates for the dimension levels: if the aggregates are materialized, a normalization of the dimension can bring better response times

5.2 Our forces combined

• Galaxies

– In practice, we usually have several measures described by different dimensions

• Thus, several fact tables

5.2 More Schemas

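[Figure: galaxy schema with two fact tables, Sales (Product_ID, Store_ID, Sales, Revenue) and Receipts (Product_ID, Date_ID), sharing the Product dimension; dimension tables: Store (Store_ID, …), Date (Date_ID, …), Product (Product_ID, …), Vendor (Vendor_ID, …)]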

• Other schemas

– Fact constellations

• Pre-calculated aggregates

– Factless fact tables

• Fact tables do not have non-key data

– Can be used for event tracking or to inventory the set of possible occurrences

– …
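A factless fact table for event tracking can be sketched in SQLite via Python's sqlite3. The attendance scenario, table and column names are hypothetical: the table holds only foreign keys, and queries count events instead of summing a measure.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Factless fact table: only foreign keys, no measure columns (hypothetical example).
con.execute("CREATE TABLE attendance (student_id INTEGER, course_id INTEGER, date_id INTEGER)")
con.executemany("INSERT INTO attendance VALUES (?, ?, ?)",
                [(1, 100, 1), (2, 100, 1), (1, 100, 2), (1, 200, 1)])

# With no measure to aggregate, the events themselves are counted.
events_per_course = con.execute(
    "SELECT course_id, COUNT(*) FROM attendance GROUP BY course_id ORDER BY course_id"
).fetchall()
students_per_course = con.execute(
    "SELECT course_id, COUNT(DISTINCT student_id) FROM attendance "
    "GROUP BY course_id ORDER BY course_id"
).fetchall()
```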


• Multidimensional implementation

– The representation of the multidimensional data can be implemented relationally with a finite set of transformation steps, however:

• Multidimensional queries have to be first translated to the relational representation

• A direct interaction with the relational data model is not fit for the end user

5.2 Multidimensional?

• Data structures

– The basic data structure for multidimensional data storage is the array

– The elementary data structures are the cubes and the dimensions

• $C = ((D_1, \ldots, D_n), (M_1, \ldots, M_m))$

– The storage is intuitive as arrays of arrays, physically linearized

• More about linearization and related issues will be discussed in the lecture on optimization
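The lecture defers linearization to the optimization chapter. As one common choice, assumed here and not taken from the lecture, row-major order maps a cell's coordinates to a position in one flat array, with the last dimension varying fastest. The cube size and names below are hypothetical.

```python
def linear_offset(coords, dims):
    """Row-major offset of the cell `coords` in an array with extents `dims`.

    For 3 dimensions: offset = (c1 * d2 + c2) * d3 + c3.
    """
    if len(coords) != len(dims):
        raise ValueError("need one coordinate per dimension")
    offset = 0
    for c, d in zip(coords, dims):
        if not 0 <= c < d:
            raise IndexError("coordinate out of range")
        offset = offset * d + c
    return offset


# Hypothetical 3 x 4 x 5 cube (e.g. products x stores x days) with one measure.
dims = (3, 4, 5)
cells = [0] * (3 * 4 * 5)
cells[linear_offset((1, 2, 3), dims)] = 818
```

Every cell occupies a slot whether or not it holds data, which is why sparse cubes are a central issue for array-based storage.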


• Defining the physical structures

• Setting up the database environment

• Setting up the appropriate security

• Preliminary performance tuning strategies