Extending the Multidimensional Data Model

3.2 Requirements of Comprehensive Multidimensional Analysis

The multitude of multidimensional models proposed in recent years results from the different sets of requirements that each model was designed to meet. In this section, we integrate the diverse requirements and properties defined by various authors with respect to comprehensive multidimensional analysis over complex data into a unified framework. This framework will serve as a reference for designing an extended data model.

The requirements can be subdivided into i) static properties, dealing with the structuring of the multidimensional data space, and ii) dynamic properties, dealing with the supported analysis tasks.

A set of major static multidimensional modeling properties, proposed in the research literature [2, 104, 145] including our own works [113, 114, 115, 116, 118, 120], can be summarized as follows:

1. Explicit separation of cube structure and its contents. The structure of a data cube is modeled as a fact-dimensional scheme. The actual content is crucial for refining the scheme in order to identify irregular hierarchies, partial roll-ups, etc.

2. Facts with no measures. The terms “fact” and “measure” are often used interchangeably in the data warehousing literature. However, some applications deal with multidimensional data structures without explicitly defined measures. Besides, according to one of Kimball’s laws, any many-to-many relationship should be modeled as a fact [81]. Therefore, it is necessary to refine the concept of a “fact” and enable modeling of facts with no measures.

3. Complex measures. The model should support composite and derived measures as well as the specification of each measure’s additivity, i.e., aggregation semantics.

4. Complex facts. The model should be capable of handling deviating behavior within facts, such as heterogeneity, variable granularity, and missing values.

5. Multi-fact schemes. Some application scenarios require the data to be modeled as multiple related fact schemes in order to preserve the actual relationships in the data. Inter-factual relationships take the form of shared dimensions in a unified multidimensional space.

6. Fully and partially shared dimensions. Data cubes may be compatible with one another at a non-bottom granularity level. This happens when their schemes have at least one pair of partially shared dimensions, i.e., dimensions converging or overlapping at a category that is non-bottom for at least one of them. To recognize partial sharing, it is imperative to provide a methodology for identifying scheme overlap.

7. Multiple roles of dimension categories. In multi-fact schemes, the same dimension or category may be used in multiple roles (e.g., a date dimension may be used as both the order date and the payment date of a fact). Therefore, it is imperative to distinguish between the categories and their roles.

8. Many-to-many fact-dimensional relationships. Many-to-many mappings between facts and dimensions are common in practice and, therefore, should be manageable by the model.

9. Explicit hierarchies in dimensions. Hierarchies should be presented explicitly in the multidimensional scheme, as they determine the valid aggregation paths. Furthermore, the model has to distinguish between dimension level attributes and property attributes belonging to a particular level.

10. Complete hierarchies. In a complete hierarchy, all child-level members fully roll up to one parent level and the extension of the parent level consists of those child members only [104]. The model should provide constructs to specify the completeness, i.e., non-expandability, of a hierarchy.

11. Multiple hierarchies. A dimension can have multiple aggregation paths, which may be mutually exclusive or compatible and which may or may not converge at some upper level.

12. Distinction between alternative and parallel hierarchies. Multiple alternative hierarchies refer to the same hierarchical property and thus represent mutually exclusive aggregation paths. Parallel hierarchies are based on mutually independent hierarchical properties and may be used in combination as aggregation criteria.

13. Complex dimensions. To support complex dimensions, the model should capture the causes of the complexity, such as non-covering, non-onto, and non-strict mappings, heterogeneity, etc.

14. Partial roll-up behavior. A roll-up relationship between a fact and a dimension or between dimension categories may be full (each member participates in the relationship) or partial (members are allowed not to participate in the relationship). Partial roll-up may be a result of optionality, heterogeneity, or specialization. The model should distinguish between various kinds of roll-up relationships.

15. Totally ordered hierarchies. A dimension hierarchy is normally defined in terms of partial ordering (parent-child relationships within pairs of members). However, in some hierarchies, members of the same hierarchy level may have to be ordered to reflect some semantics and enable default sorting according to this ordering.
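Two of the static properties above, complex dimensions and partial roll-up behavior, depend on the shape of the mapping between a child category and its parent category. The following Python sketch illustrates how such a mapping could be inspected. It is not part of the proposed model: the function name `classify_rollup`, the representation of a roll-up as (child, parent) pairs, and the sample members are assumptions made for illustration only. A roll-up is reported as full if every child member has a parent, and as strict if no child member has more than one parent.

```python
from collections import defaultdict


def classify_rollup(child_members, rollup_pairs):
    """Classify a child-to-parent roll-up given as (child, parent) pairs.

    Hypothetical helper for illustration; not taken from the source.
    """
    parents = defaultdict(set)
    for child, parent in rollup_pairs:
        parents[child].add(parent)

    # Partial roll-up: some child members do not participate at all.
    unmapped = [c for c in child_members if c not in parents]
    # Non-strict roll-up: some child members have several parents.
    multi_parent = [c for c in child_members if len(parents.get(c, ())) > 1]

    return {
        "full": not unmapped,
        "strict": not multi_parent,
        "unmapped": unmapped,
        "multi_parent": multi_parent,
    }


# Illustrative data: store s2 belongs to two regions, store s3 to none.
stores = ["s1", "s2", "s3"]
store_to_region = [("s1", "north"), ("s2", "north"), ("s2", "south")]
result = classify_rollup(stores, store_to_region)
```

In this example the roll-up is neither full nor strict, so a model meeting the requirements above would have to record both irregularities in the scheme instead of assuming a regular hierarchy.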

As for the dynamic properties of the extended multidimensional model, we identify the following ones:

1. Symmetric treatment of facts and dimensions. In multi-fact schemes, the duality of fact and dimension roles fades since one fact may act as a dimension of another fact, or a dimension may be turned into a fact of a specific query.

2. Symmetric treatment of measure and dimension attributes. Any attribute within a fact scheme may be converted into a measure of a specific query.

3. Measure used as dimension. Some queries may need to use some measure attribute as a dimension w.r.t. another measure within the same fact scheme.

4. Drill-across. Drill-across is a logical operator for constructing a multicube by joining a set of multiple related cubes, projected to the subset of their common dimensional characteristics, in order to explore their measures in parallel or to derive new measures.

5. Ad hoc measure derivation. Measures, not available in the static model, can be added at query time by specifying their derivation formulas.

6. Ad hoc dimension derivation. Dimensions that are derivable from the existing ones but not included in the original scheme can be added at query time by specifying their derivation formulas.

7. Ad hoc hierarchy specification. The user should be able to manipulate the existing hierarchies (e.g., merge multiple categories into one) as well as to arrange dimensional values into a new hierarchy.

8. Resolution of many-to-many mappings. In the presence of non-strict hierarchies, users should be prompted to resolve multi-parent relationships for correct aggregation (drill-aside operation).
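The drill-across and ad hoc measure derivation properties above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the operator definition used in this work: cubes are represented as lists of dictionaries, both measures are assumed to be additive under summation, the join keeps only coordinates present in both cubes, and all names (`project`, `drill_across`, `orders`, `payments`, `outstanding`) are hypothetical.

```python
from collections import defaultdict


def project(rows, common_dims, measure):
    """Sum `measure` over all rows that share the same values of `common_dims`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in common_dims)
        totals[key] += row[measure]
    return dict(totals)


def drill_across(cube_a, measure_a, cube_b, measure_b, common_dims):
    """Join two cubes on their common dimensions (inner join, assumed here)."""
    a = project(cube_a, common_dims, measure_a)
    b = project(cube_b, common_dims, measure_b)
    shared_keys = sorted(a.keys() & b.keys())
    return {k: {measure_a: a[k], measure_b: b[k]} for k in shared_keys}


orders = [
    {"date": "2024-01", "product": "p1", "amount": 100.0},
    {"date": "2024-01", "product": "p2", "amount": 50.0},
    {"date": "2024-02", "product": "p1", "amount": 70.0},
]
payments = [
    {"date": "2024-01", "method": "card", "paid": 120.0},
    {"date": "2024-02", "method": "cash", "paid": 70.0},
    {"date": "2024-03", "method": "card", "paid": 10.0},
]

# Both cubes are projected to their only common dimension, "date".
multicube = drill_across(orders, "amount", payments, "paid", ["date"])

# Ad hoc derived measure, specified at query time as a formula.
outstanding = {k: m["amount"] - m["paid"] for k, m in multicube.items()}
```

Here the order cube loses its product dimension and the payment cube its method dimension, since neither is shared. The derived measure `outstanding` exists only in the query result and not in either source cube.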

Table 3.1 summarizes the above-mentioned static and dynamic properties, grouping them according to the construct of the conceptual model to which they belong.

Table 3.1: Overview of the desirable multidimensional properties

Concept / Construct: Fact galaxy
Static properties: Fully and partially shared dimensions; Dimension category in multiple roles
Dynamic properties: Fact in dimension role; DRILL-ACROSS operation; Fact overlap scheme

Concept / Construct: Fact
Static properties: Non-measurable facts; Heterogeneous facts; Variably grained facts; Missing values
Dynamic properties: Dimension treated as a fact; Derived fact

Concept / Construct: Measure
Static properties: Complex measures; Derived measures; Aggregation semantics
Dynamic properties: Ad hoc derived measures; Combined measures; Measure from a dimension (PUSH)

Concept / Construct: Dimension
Static properties: Explicit hierarchies; Multiple hierarchies; Parallel vs. alternative hierarchies; Complex/irregular hierarchies; Heterogeneous hierarchies; Complete hierarchies; Derived categories; Totally ordered hierarchies
Dynamic properties: Ad hoc defined categories; Ad hoc defined dimensions; Dimension from a measure (PULL); Ad hoc hierarchies

Concept / Construct: Roll-up relationship
Static properties: Partial roll-up; Optional roll-up; Generalization/specialization; Many-to-many mappings
Dynamic properties: Resolution of non-strict roll-ups