
Dimensions and Hierarchies in the Multidimensional Data Model

4.4 Classification of Hierarchy Types

4.4.3 Types of Heterogeneous Hierarchies

Heterogeneous hierarchies occur whenever members of the same category roll up along different paths. The term “frozen” dimension is introduced in [61] to denote minimal homogeneous dimension instances representing different structures implicitly combined into a heterogeneous dimension. Typically, a heterogeneous hierarchy is the result of including subtypes that can be represented by a generalization/specialization relationship. In the context of multidimensional modeling, specialization is an opportunity to specify optional attributes, categories and even hierarchies and, hence, is an important mechanism for semantically correct organization of optional hierarchies.

As can be seen in the fragment of the metamodel depicted in Figure 4.14, heterogeneous hierarchies are subdivided into non-covering and generalized, based on the criterion of scheme homogeneity.

NON-COVERING HIERARCHY

A non-covering hierarchy is obtained by allowing roll-up relationships of a category to be partial, so that its members may skip levels. Non-covering hierarchies have a homogeneous hierarchy scheme, i.e., each category has at most one parent category; however, the members are allowed to skip the parent level and roll up directly to an upper level. Therefore, in the hierarchy instance, each member has a single parent member, but the path length from the bottom category to the root varies from one member to another.

DEFINITION (NON-COVERING HIERARCHY). A hierarchy $H$ is non-covering if it contains an exclusive partial roll-up relationship to a set of categories which form a hierarchy:

$\text{Non-covering}(H) \Leftarrow (\text{Heterogeneous}(H) \wedge \exists C_i, C_j, C_k \in \mathcal{C}_H : (C_i \sqsubseteq_{part} (C_j \mid C_k) \wedge (C_j \sqsubseteq C_k \vee C_j \sqsubseteq^{*} C_k)))$.

An example of a non-covering hierarchy scheme is $\text{project} \sqsubseteq_{part} (\text{office} \sqsubseteq_{full} \text{building} \sqsubseteq_{full} \text{city} \mid \text{city}) \sqsubseteq_{full} \top_{\text{project}}$, due to the mutually exclusive partial roll-up relationships of project to office and to city, where $\text{office} \sqsubseteq^{*} \text{city}$. A sample instance of this hierarchy is shown in Figure 4.15, revealing the cause of heterogeneity:

Figure 4.14: Categorization of heterogeneous hierarchy types


Figure 4.15: Project locations as a non-covering hierarchy

apparently, there exist two types of projects: internal projects located in city C1 have an office assignment, whereas for external projects only the city is specified.
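The structural part of the definition above can be illustrated with a minimal Python sketch. The edge-list representation, the function names and the member assignments (e.g., that P1 belongs to office A11) are assumptions made for illustration only; they are not read off Figure 4.15. The check covers only the second conjunct of the definition and verifies neither heterogeneity nor exclusivity of the partial roll-ups.

```python
# Scheme-level roll-up edges of the project location example: (child, parent, kind).
# "ALL" stands for the root category; the representation is an assumption.
ROLLUPS = [
    ("project", "office", "part"),
    ("project", "city", "part"),
    ("office", "building", "full"),
    ("building", "city", "full"),
    ("city", "ALL", "full"),
]


def ancestors(category, edges):
    """Return all categories reachable from `category` via roll-up edges."""
    found, stack = set(), [category]
    while stack:
        current = stack.pop()
        for child, parent, _ in edges:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_non_covering(edges):
    """True if some category rolls up partially to two categories C_j and C_k
    such that C_k is a direct or transitive ancestor of C_j."""
    partial = {}
    for child, parent, kind in edges:
        if kind == "part":
            partial.setdefault(child, []).append(parent)
    for targets in partial.values():
        for cj in targets:
            for ck in targets:
                if cj != ck and ck in ancestors(cj, edges):
                    return True
    return False


# Hypothetical instance fragment: parent member of each member.
PARENT = {"P1": "A11", "A11": "A", "A": "C1", "C1": "ALL",
          "P14": "C2", "C2": "ALL"}


def path_length(member):
    """Number of roll-up steps from `member` to the root member ALL."""
    steps = 0
    while member != "ALL":
        member = PARENT[member]
        steps += 1
    return steps
```

In this sketch, a project with an office assignment reaches the root in four steps, whereas a project assigned directly to a city needs only two, which is the varying path length described above.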

In the classification of Malinowski and Zimányi [109], non-covering hierarchies are treated as a special case of generalized hierarchies. In our classification, however, the former is not considered generalized as it does not make explicit use of a generalization/specialization relationship. As we proceed with the definition of generalized hierarchies, we will provide further insights on why we distinguish between heterogeneous hierarchies of type non-covering and generalized.

GENERALIZED HIERARCHY

Conventional multidimensional models do not support the object-oriented feature of inheritance. Instead, all subclass categories of a heterogeneous class are represented as optional aggregation paths. The disadvantages of this approach are that subtypes are not available as subdimensions of their own and that, consequently, it is not possible to conveniently navigate to a specific subtype or to compare measure values aggregated by subtype.

A generalized hierarchy contains categories that can be represented by a generalization relationship. At the scheme level, the dimension graph has multiple exclusive paths converging at some categories. Categories at which the alternative paths split and join are called splitting and joining levels, respectively. A dimension category is generalized if its individual members differ considerably, e.g., have different properties and/or roll up along different paths. Unlike non-covering hierarchies, where members simply skip hierarchy levels in the same path, generalization makes it possible to unify instances with different hierarchy schemes.

As an example, let us consider the staff hierarchy in the purchaser dimension scheme in Figure 4.4. Category staff is specialized into the subclass categories teaching staff and admin. staff, each with its own hierarchy scheme.

In the formalization, we introduce the predicates $\prec$ and $\prec^{*}$ (written here with stand-in symbols) to express direct and transitive specialization relationships, respectively, between a pair of categories or category types: $C_i \prec C_j$ states that $C_i$ is a specialization of $C_j$ and also that $C_j$ is a generalization of $C_i$. Specialization of a category type into multiple subclass category types is expressed as a set of multiple related specializations: $\{C_i, \ldots, C_n\} \prec C_j$. Technically, a specialization relationship is a degeneration of a partial roll-up (a subset of the superclass’ members belongs to a subclass) to a one-to-one cardinality. Similarly, multiple related disjoint specializations correspond to a set of exclusive partial roll-up relationships.

DEFINITION (GENERALIZED HIERARCHY). A hierarchy is generalized if it includes categories with a generalization/specialization relationship between them:

$\text{Generalized}(H) \Leftarrow (\text{Heterogeneous}(H) \wedge \exists (C_i, C_j) \in \mathcal{C}_H : C_i \prec C_j)$.


Figure 4.16: A non-covering hierarchy transformed into a generalized hierarchy

With the above definition, it becomes evident that non-covering hierarchies do not represent a subclass of generalized ones. However, the former type can be reshaped into the latter by arranging diverse roll-up behaviors into subclasses. The sample non-covering hierarchy of project locations, depicted in Figure 4.15, can be turned into a generalized one by specializing the project category into the subtypes internal and external, with the resulting instance shown in Figure 4.16 (subtype nodes are filled with grey color). The resulting generalized hierarchy consists of homogeneous subtrees of its specializations. Whether a non-covering hierarchy should be transformed into a generalized one depends on the expected query patterns. The homogeneous scheme of a non-covering hierarchy is useful for treating all members of a generalized category as the same class; however, the hierarchy is non-summarizable. The heterogeneous scheme of a generalized hierarchy separates different behaviors within a category into subtype hierarchies, thus treating them as different classes and yielding a summarizable mapping. Normalization of non-covering mappings is presented in Chapter 7.
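The idea of arranging diverse roll-up behaviors into subclasses can be sketched in Python by grouping the members of a category according to the sequence of categories they roll up through. The dictionaries and member assignments below are hypothetical, and naming the resulting groups (e.g., internal and external) remains a modeling decision that the sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical instance fragment: category and parent member of each member.
CATEGORY = {"P1": "project", "P2": "project", "P14": "project",
            "A11": "office", "A": "building",
            "C1": "city", "C2": "city", "ALL": "ALL"}
PARENT = {"P1": "A11", "P2": "A11", "P14": "C2",
          "A11": "A", "A": "C1", "C1": "ALL", "C2": "ALL"}


def category_path(member):
    """Sequence of categories visited when rolling `member` up to the root."""
    path = []
    while member in PARENT:
        member = PARENT[member]
        path.append(CATEGORY[member])
    return tuple(path)


def specialize(category):
    """Group the members of `category` by roll-up path; each group is a
    candidate subtype with a homogeneous hierarchy of its own."""
    subtypes = defaultdict(list)
    for member, member_category in CATEGORY.items():
        if member_category == category:
            subtypes[category_path(member)].append(member)
    return dict(subtypes)
```

Applied to project, the sketch yields one group per distinct path: members rolling up via office, building and city, and members rolling up directly to city.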

X-DFM provides two alternatives for modeling specialization relationships (see Table 3.4): i) a set of related partial roll-up edges enables modeling of splitting levels, and ii) a generalization/specialization edge (or edge set) and abstract categories can be used to model both splitting and joining levels and abstract superclass category types, respectively. In Figure 4.4, a specialization edge set was used to model the sets of subtype relationships of the categories purchaser and staff.

The generalization/specialization construct in X-DFM may appear somewhat counter-intuitive as it does not reveal the direction of the roll-up relationship. Naturally, a superclass represents a higher hierarchy level with respect to its subclasses, as shown in Figure 4.17 using the example of the inheritance hierarchies in the purchaser dimension. Two alternatives of subtyping the generalized category are presented: the left-hand tree is a structural hierarchy (unit vs. staff) and the right-hand tree is a functional one (admin. purchaser vs. teaching purchaser). It is not always obvious how the inheritance relationships in a dimension should be modeled. In order to be useful, the inheritance hierarchy should reflect the desired navigation structure for the analysis, i.e., provide meaningful aggregation levels. Multiple inheritance hierarchies in the same dimension may be supported similarly to multiple alternative hierarchies.

Unlike in inheritance hierarchies, superclass relationships in roll-up hierarchies appear “upside-down”, i.e., with a generalized category below its specializations, as in the purchaser hierarchy (Figure 4.4). This is due to the fact that generalized categories in dimensions are used to “homogenize” diverse classes into a single bottom category so that the latter can be used as a dimension in a fact scheme. Considering the bi-directionality (i.e., one-to-one relationship) of generalization, we suggest that the respective X-DFM construct should be resolved into roll-up edges and propose the following approach to modeling generalized hierarchies:


Figure 4.17: Examples of inheritance hierarchies in the purchaser dimension

Figure 4.18: An inheritance hierarchy with local root categories added

1. A “pure” inheritance hierarchy, i.e., without roll-up relationships, is constructed via a stepwise decomposition of the generalized category into subclasses according to the sets of mutual properties. Multiple inheritance hierarchies can be specified for the same dimension, as was shown in Figure 4.17.

2. Each subclass category type $C$ is assembled into a hierarchy of its own by adding an abstract local root node $\top_C$ on top. Figure 4.18 shows the resulting graph for the left-hand side inheritance hierarchy from Figure 4.17. Shared-target style specialization relationships are replaced by a distinct specialization edge for each subclass to improve the visibility of the scheme. The most general superclass category on top of the hierarchy is augmented by the abstract root node.

3.

4. Proceeding top-down, each non-abstract category type $C$ is augmented with the aggregation hierarchies valid for that category, placing the hierarchy scheme between $C$ and its local root $\top_C$. In the resulting scheme, subclass-specific hierarchies appear connected to the respective subclass categories, whereas common hierarchies are attached to the respective generalized category. Figure 4.19 shows the results of augmenting the dimension scheme from Figure 4.18 with aggregation paths. At the highest generalization level purchaser, the members are aggregable by location. Superclass category unit has no aggregation paths of its own, whereas admin. staff and teaching staff have their specific hierarchies as well as common ones, associated with their generalization category staff.

5. The hierarchy scheme obtained so far has multiple categories at the bottom. However, to be usable as a dimension, the scheme must originate at a single bottom category. Actually, the scheme in Figure 4.19 already contains a category qualifying as the bottom level, namely purchaser. This category has no incoming roll-up relationships and contains the union of values of all purchaser subtypes, which corresponds to the grain of the purchaser dimension. The bottom category is made evident in the scheme


Figure 4.19: An inheritance hierarchy with aggregation hierarchies added

Figure 4.20: A generalized hierarchy scheme with a common bottom level

by pushing the former below all other categories as shown in Figure 4.20. In addition, the specialization of purchaser into staff and unit is replaced by a set of exclusive partial roll-up relationships with one-to-one cardinality. This replacement is fully justified as it corresponds to a complete non-overlapping specialization (i.e., each member of the purchaser category is of type either staff or unit).

6. The representation of the generalized scheme obtained so far is further improved by “pushing down” specializations and replacing them by related partial roll-up sets in the same manner as it was done for the bottom category in the previous step. Figure 4.21 shows the final state of the generalized hierarchy scheme, in which each generalized category is represented as a child category of its specializations.
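Two of the steps above lend themselves to a small Python sketch: adding local root nodes and resolving specializations into partial roll-up edges that lead from a generalized category up to its specializations. The inheritance tree below is assembled from the category names mentioned in the text and figure labels; the edge tuples, the `"T <category>"` naming of local roots and all function names are assumptions for illustration. Augmentation with aggregation hierarchies is omitted.

```python
# Inheritance tree of the purchaser dimension: superclass -> direct subclasses.
# Assembled from category names in the text; treat it as an illustrative guess.
SUBCLASSES = {
    "purchaser": ["unit", "staff"],
    "unit": ["admin. unit", "teaching unit"],
    "staff": ["admin. staff", "teaching staff"],
    "teaching unit": ["chair", "department", "faculty"],
}


def local_root_edges(subclasses):
    """Give every category C a full roll-up edge to its local root 'T C'."""
    categories = set(subclasses)
    for subs in subclasses.values():
        categories.update(subs)
    return [(c, f"T {c}", "full") for c in sorted(categories)]


def pushed_down_edges(subclasses):
    """Replace each specialization by a partial roll-up edge leading from the
    generalized category up to one of its specializations."""
    return [(sup, sub, "part")
            for sup, subs in subclasses.items()
            for sub in subs]


def bottom_categories(subclasses):
    """Categories without incoming roll-up edges after pushing down."""
    targets = {sub for subs in subclasses.values() for sub in subs}
    return [c for c in subclasses if c not in targets]
```

Under these assumptions, purchaser is the only category without incoming edges, which matches its role as the common bottom level of the dimension.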

SPECIALIZATION HIERARCHY

In the example of the purchaser dimension, all specializations are complete (each member of a generalized category belongs to at least one subtype) and disjoint (each member of a generalized category belongs to at most


Figure 4.21: A completed generalized hierarchy scheme

one subtype). These two properties ensure correct summarization when aggregating measures at different abstraction levels, e.g., the total expenditures of all staff members are the sum of the expenditures of all admin. staff and all teaching staff members. In some scenarios, however, non-summarizable generalized hierarchies occur. To investigate the causes of non-summarizability, we propose to distinguish between specialization and generalization hierarchies. Even though these two relationships represent two sides of the same phenomenon, which is inheritance, or abstraction, there exist differences as to which of the two relationships lays the foundation of a given generalized hierarchy scheme:

A generalization hierarchy results from the necessity of abstracting multiple categories into one in order to treat their members as the same dimension of a fact scheme. Dimension purchaser is an example of a generalization hierarchy, in which the bottom level purchaser is an artificially constructed superclass, which unites the members of the unit and staff categories. Since the actual dimensional data originates from multiple category types and the superclass is superimposed as a union of those categories’ instances, the resulting hierarchy guarantees correct summarization.

A specialization hierarchy emerges when a category type, originally treated as a single class, is divided into subclasses to enable subclass-specific characteristics and/or aggregation levels. An example of a specialization hierarchy is that of the staff category, which is subdivided into teaching staff and admin. staff to aggregate the respective members along the hierarchy scheme specific to their class. Since the actual dimensional data originates from a single superclass and multiple subclasses are introduced upon it as partial roll-up targets, the resulting specialization is not guaranteed to be complete or disjoint.

The subdivision into generalization and specialization hierarchy types is a purely semantic one as it considers the origin of abstraction in a dimension. This property might be invisible in the resulting generalized scheme or manifest itself in the form of an incomplete or overlapping specialization.
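The effect of completeness and disjointness on summarization can be checked on a toy instance. The following Python sketch compares the total of a measure over a generalized category with the sum of the subtype totals; the member identifiers and expenditure values are invented for illustration.

```python
# Hypothetical expenditures per staff member.
EXPENDITURE = {"s1": 100, "s2": 250, "s3": 50}


def subtype_totals_match(measure, subclasses):
    """True if the total over all members equals the sum of subtype totals."""
    total = sum(measure.values())
    by_subtype = sum(measure[m]
                     for members in subclasses.values()
                     for m in members)
    return total == by_subtype


# Complete and disjoint: every member belongs to exactly one subtype.
COMPLETE_DISJOINT = {"admin. staff": {"s1"}, "teaching staff": {"s2", "s3"}}
# Overlapping: s2 belongs to both subtypes and would be counted twice.
OVERLAPPING = {"admin. staff": {"s1", "s2"}, "teaching staff": {"s2", "s3"}}
# Incomplete: s3 belongs to no subtype and would be lost.
INCOMPLETE = {"admin. staff": {"s1"}, "teaching staff": {"s2"}}
```

Only the complete and disjoint variant reproduces the total of 400; the overlapping variant sums to 650 and the incomplete one to 350.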

DEFINITION (COMPLETE SPECIALIZATION). A set of direct specialization relationships of a generalized category $C_i$ is complete if each of $C_i$’s members belongs to at least one subclass:

$\text{CompleteSpecialization}(C_i) \Leftarrow \forall e_m \in C_i : (\exists C_j, \exists e_n \in C_j : C_j \prec C_i \wedge e_n \prec e_m)$.


Figure 4.22: Staff hierarchy with incomplete specialization

DEFINITION (INCOMPLETE SPECIALIZATION). A set of direct specialization relationships of a generalized category $C_i$ is incomplete if $C_i$ contains members not belonging to any subclass:

$\text{IncompleteSpecialization}(C_i) \Leftarrow \exists e_m \in C_i : (\forall C_j, C_j \prec C_i : (\nexists e_n \in C_j : e_m \prec e_n))$.

As an example of an incomplete specialization, consider the staff hierarchy. In the context of the fact scheme PURCHASE, depicted in Figure 4.4, the subdivision into admin. staff and teaching staff is sufficient as only those staff categories may function as purchasers. In the university-wide staff hierarchy, however, there may exist further staff types. Figure 4.22 shows a sample instance of the resulting incomplete specialization hierarchy, in which the bottom level staff contains members (marked with red background color) covered by neither admin. staff nor teaching staff.

Incompleteness of a specialization can be expressed in X-DFM in a straightforward fashion by augmenting the set of partial roll-up edges of the specialization relationship with an additional edge, which shows the aggregation path for those members not involved in the specialization. A more elegant option, however, would be to normalize the specialization into a complete one by either adding the missing subclasses or using just a single additional subclass (e.g., others) to cover those members not participating in the specialization.

Figure 4.23 shows two variants of modeling incomplete specialization using the example of the staff dimension: the scheme in (a) preserves the original hierarchy by adding the respective partial roll-up edge, whereas in (b) the incompleteness is normalized by adding a missing specialization class.
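Normalization with a single additional subclass can be sketched in Python as follows. The member identifiers are hypothetical, and the subclass name `others` follows the example given in the text; the functions are an illustration rather than part of X-DFM.

```python
# Hypothetical staff members and their assignment to subclasses.
STAFF = {"s1", "s2", "s3", "s4"}
SUBCLASSES = {"admin. staff": {"s1"}, "teaching staff": {"s2", "s3"}}


def uncovered(superclass_members, subclasses):
    """Members of the generalized category that belong to no subclass."""
    return set(superclass_members) - set().union(*subclasses.values())


def normalize(superclass_members, subclasses, extra="others"):
    """Return a complete specialization by collecting all uncovered members
    in one additional subclass; the input is left unchanged."""
    missing = uncovered(superclass_members, subclasses)
    if not missing:
        return dict(subclasses)
    return {**subclasses, extra: missing}
```

A specialization is incomplete exactly when `uncovered` returns a non-empty set; after `normalize`, that set is empty by construction.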

(b) Normalization into a complete specialization

Figure 4.23: Handling incomplete specialization hierarchies in X-DFM


Figure 4.24: Person hierarchy with overlapping specialization

Another property of specialization is disjointness, which guarantees that each member of a generalized category belongs to at most one of its direct subclasses, yielding non-overlapping subclass instances.

DEFINITION (DISJOINT SPECIALIZATION). A set of direct specialization relationships of a generalized category $C_i$ is disjoint if each of $C_i$’s members belongs to at most one subclass:

$\text{DisjointSpecialization}(C_i) \Leftarrow \forall C_j, C_j \prec C_i, \forall C_k, C_k \prec C_i : (\nexists e_m \in C_i, \nexists e_n \in C_j, \nexists e_p \in C_k : e_n \prec e_m \wedge e_p \prec e_m)$.

DEFINITION (OVERLAPPING SPECIALIZATION). A set of direct specialization relationships of a generalized category $C_i$ is overlapping if $C_i$ contains members belonging to more than one subclass:

$\text{OverlappingSpecialization}(C_i) \Leftarrow \exists C_j, C_j \prec C_i, \exists C_k, C_k \prec C_i, \exists e_m \in C_i, \exists e_n \in C_j, \exists e_p \in C_k : (e_m \prec e_n \wedge e_m \prec e_p)$.

As an example of an overlapping, or non-exclusive, specialization, let us consider modeling the person dimension in the university context. Persons can be subdivided into two major categories, namely student and staff. While most persons fall into one of the two categories, there may exist cases of students employed as staff members. Figure 4.24 shows the scheme of the resulting generalized hierarchy. The most general class person at the bottom rolls up along age and origin hierarchies and specializes itself into staff and student. Due to its non-exclusive paths, overlapping specialization may not be replaced by a set of exclusive partial roll-ups. Therefore, it is shown using the overlapping specialization edge set construct in X-DFM.

To restore summarizability, an overlapping specialization must be transformed into a disjoint one. Various strategies are conceivable depending on the analysis requirements. For example, if the student status is more significant for the members belonging to both student and staff, specialization of those members into staff may simply be removed from the hierarchy instance. In many cases, however, it is desirable to keep the original relationships. A normalization technique for overlapping specialization is proposed in Section 7.2.2.
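The removal strategy mentioned above can be sketched in Python: members belonging to several subclasses are detected and then kept only in the subclass with the highest priority. The person identifiers and the priority list are assumptions for illustration; this is not the normalization technique of Section 7.2.2.

```python
# Hypothetical persons; p3 is a student who is also employed as staff.
SUBCLASSES = {"student": {"p1", "p2", "p3"}, "staff": {"p3", "p4"}}


def overlapping_members(subclasses):
    """Members that belong to more than one subclass."""
    seen, overlap = set(), set()
    for members in subclasses.values():
        overlap |= seen & members
        seen |= members
    return overlap


def make_disjoint(subclasses, priority):
    """Keep each member only in the first subclass of `priority`
    that contains it; the input is left unchanged."""
    assigned, result = set(), {}
    for name in priority:
        result[name] = subclasses[name] - assigned
        assigned |= subclasses[name]
    return result
```

With `priority = ["student", "staff"]`, p3 remains a student only, so that the subclass sizes again add up to the number of persons.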

A peculiar type of generalized hierarchies is a mixed, or ragged, hierarchy, marked as non-summarizable in the hierarchy categorization shown in Figure 4.14. Mixed granularity occurs due to allowing the same


(a) Hierarchy scheme with redundant paths

(b) Hierarchy scheme with shared category types

Figure 4.25: Revealing mixed grain by eliminating redundant scheme fragments

member values to represent the bottom grain, on the one hand, and serve as parents for values of other categories, on the other hand. As a result, the bottom level is a generalized category, whose direct specializations have hierarchical relationships with one another.

DEFINITION (MIXED-GRAIN HIERARCHY). A generalized category $C_i$ is mixed if there exists a direct or transitive roll-up relationship between $C_i$’s direct specializations:

$\text{Mixed}(C_i) \Leftarrow \exists C_j, C_j \prec C_i, \exists C_k, C_k \prec C_i : (C_j \sqsubseteq C_k \vee C_j \sqsubseteq^{*} C_k)$.

The mixed grain phenomenon can be observed in the teaching unit hierarchy in the purchaser dimension depicted in Figure 4.21: the bottom category teaching unit consists of the subtypes chair, department, and faculty, where chair rolls up to department and department rolls up to faculty, i.e., the three subtypes build an aggregation hierarchy of their own. The double role (i.e., bottom and non-bottom level) of the categories department and faculty in purchaser is to be understood as follows: departments and faculties may act as purchasers in their own right but also consist of smaller units, which also act as purchasers in their own right. In the dimension scheme, the mixed grain pattern can be revealed by merging conforming categories into shared category types, as shown in Figure 4.25 using the example of the generalized hierarchy teaching unit: (a) is the original scheme fragment in the purchaser dimension and (b) is its equivalent in a unified multidimensional space, i.e., with each set of conforming categories represented as one shared category type.
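The mixed-grain condition can be tested mechanically once the direct roll-up relationships among the specializations are known. The following Python sketch uses the roll-ups stated above for chair, department and faculty; the dictionary representation and function names are assumptions for illustration.

```python
# Direct roll-up relationships among the subtypes of teaching unit.
ROLLUPS = {"chair": {"department"}, "department": {"faculty"}, "faculty": set()}


def rolls_up_to(source, target, rollups):
    """True if `source` rolls up to `target` directly or transitively."""
    stack, seen = [source], set()
    while stack:
        current = stack.pop()
        for parent in rollups.get(current, ()):
            if parent == target:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def is_mixed(specializations, rollups):
    """True if some direct specialization rolls up to another one."""
    return any(rolls_up_to(a, b, rollups)
               for a in specializations
               for b in specializations
               if a != b)
```

For the subtypes of teaching unit the check succeeds, whereas for subtypes without roll-up relationships among them (such as admin. staff and teaching staff in this sketch) it does not.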

Aggregating measures along a mixed-grain hierarchy is not trivial. As an example, consider a simple query “What is the total amount spent on purchases by faculty X?”. This formulation appears ambiguous as it can be interpreted in at least three ways:

1. the total amount spent by all subdivisions of faculty X,
