• Keine Ergebnisse gefunden

Relational Implementation of the Multidimensional Data Model

7.2 Mapping Heterogeneous Hierarchy Schemes

7.2.1 Mapping Non-Covering Hierarchies

The traditional approach to handling a non-covering mapping is to normalize it into a covering one by insert-ing placeholder elements to fill the gaps of the missinsert-ing parent nodes. However, this method is criticized for

“polluting” the data with artificial entries and losing the original roll-up relationships. Besides, it is not clear

7.2 : Mapping Heterogeneous Hierarchy Schemes 137

C1

A B

A11 A12 A20 A23 B15 B16 B21

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13

B13

C2 ALL

C3

P18

P14 P15 P16 P17 project

building city

office project T

Figure 7.3: A non-covering hierarchy of project locations: instance (left) and scheme (right)

C1

A B

A11 A12 A20 A23 B15 B16 B21

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13

B13

C2 ALL

C3

project building city

office project T

P18 P14 P15 P16 P17

other

Figure 7.4: State of the non-covering hierarchy withcitynormalized to covering

how each new node should be named. We propose a slightly modified mapping-to-covering method, which partially overcomes the above shortages and provides a fully automated normalization routine.

As an example of a non-covering mapping, let us consider the location hierarchy inprojectdimension, with a sample instance and its underlying scheme shown in Figure 7.3. Member values with skipping edges are marked with red color. We use a slightly modified example than the one used previously in Figure 4.15 in order to increase the complexity of the non-covering behavior. In the current example,projectmembers roll up either toofficeor directly tocity, or have no location at all (i.e., roll up directly to the root).

Normalization of a non-covering hierarchy schemeHand its instanceHevolves as follows:

1. The hierarchy scheme is inspected top-down to identify levels at which skipping occurs. CategoryCj qualifies as anon-covering level, if it is targeted by a full roll-up edgeCk „(full) Ci, on the one hand, and there exists at least one bypassing (i.e., rolling up to an upper level of Cj) partial roll-up edge Ci „(part) Cm, whereCi „ Cj andCj „ Cm, on the other hand. In our example,city,buildingand officerepresent a set of non-covering categories.

2. The instance of each non-covering level is traversed in order to identify missing member nodes:

If a non-covering categoryCj represents the highest aggregation level (Cj„ JH), a placeholder element namedother1is added to the member set ofCj and all skipping edges are re-assigned to roll up to this new element. In our example, depicted in Figure 7.4, such a node (marked with grey filling) is added tocityas a parent ofprojectmembersP15andP16. The scheme is modified accordingly by removing the partial roll-up relationshipproject„(part)Jproject.

1Another suitable name, such asn/a,unknown,none, etc., can be used instead ofother.

138 Chapter 7 : Relational Implementation of the Multidimensional Data Model

C1

A B

A11 A12 A20 A23 B15 B16 B21

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13

A11 A12 A20 A23 B15 B16 B21

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13

Figure 7.5: Mapping to covering by normalizing non-covering levelsbuilding(top) andoffice(bottom)

A non-covering category Ci at any lower aggregation level is normalized by inserting a new member for each roll-up edge bypassingCiand adjusting that edge to roll up to the respective new member inCi. The states of the hierarchy after normalizingbuildingandoffice levels are shown in Figure 7.5. The obtained hierarchy is homogeneous and symmetric.

3. The following naming convention is applied to placeholder elements: the name of the respective parent member, supplemented by a ‘*’ symbol, is inherited2. For example, skipping edgeP14„C1 (see Figure 7.4) is resolved by creating a placeholder element named C1*inbuilding, resulting in edges P14„C1*andC1*„C1(see Figure 7.5 (top)). In case of a recursive name propagation, the number of ‘*’ symbols indicates the depth of the normalization.

The advantage of the proposed normalization approach along with the applied naming strategy is that it enables a fully automated generation of placeholder nodes with meaningful names, which contain the information about the original parent element (the propagated name) and the depth of the normalization (the number of supplement symbols).

The penalty of losing heterogeneity can be overcome by creating multiple alternative hierarchies: i)the covering hierarchy obtained by applying the above normalization technique andii)a generalization hierarchy that extracts each aggregation path from the non-covering scheme into a homogeneous subclass hierarchy, as described in Section 4.4.3 and depicted in Figure 4.16. With our modified example of a non-covering hierarchy, the transformation into a generalization would yield three specializations ofproject:i)internalfor instances that roll up tooffice,ii)externalfor instances that roll up tocity, andiii)othersfor instances with no parent location category. Figure 7.6 illustrates the process of obtaining the final normalized hierarchy scheme:

(a) is the original non-covering hierarchy scheme of project locations; (b) shows the two normalized schemes – a covering one (left) and a generalization (right); 7.6c is a final normalized scheme consisting of the two

2Any other symbol or string can be used as a supplement.

7.2 : Mapping Heterogeneous Hierarchy Schemes 139

(b) Normalization into a covering (left) and generalization (right) schemes

(c) Resulting scheme with normalized hierarchies as multiple alternatives Figure 7.6: Transformation of a non-covering hierarchy into a set of multiple alternative hierarchies

<<PK>> CityID

Figure 7.7: Logical schema ofprojectlocation hierarchies

normalizations as alternative hierarchies. The obtained scheme is summarizable and supports aggregation of projectinstances either along the uniform (covering) path or along subclass-specific hierarchies.

The actual relational mapping of a balanced hierarchy according to the galaxy schema is straightforward:

each level is stored in its own dimensional table that references the respective parent-level table. Back to our example, the four hierarchy levels are mapped to tablesPROJECT,OFFICE,BUILDING, andCITY. Figure 7.7 shows the logical schema ofprojecthierarchy. Dimension levels affected by the normalization to covering contain a boolean fieldPlaceholderfor marking placeholder entries.

The alternative specialization-based hierarchy is also present in the logical schema. There exist different approaches to the relational representation of inheritance. The actual choice depends on the capacities of the backend system (object-relational features, triggers, rules, materialized views, etc.) and storage constraints.

We propose a rather simple solution based onderived tables, which do not require any advanced database

140 Chapter 7 : Relational Implementation of the Multidimensional Data Model

procedures. A derived table is similar to a materialized view: both are created as data copies populated with the data queried from other tables. However, derived tables are superior to materialized views as the former have a full-fledged scheme definition and, therefore, support the use of integrity constraints, such as primary and foreign keys, triggers, and check clauses. Foreign keys are imperative for linking fact tables to dimensions as well as for assembling dimension tables into hierarchies. Triggers and check constraints are useful for keeping the derived data consistent and up-to-date.

Back to our example, the two project specializations intointernal andexternalare represented by the derived tables shown with a darker background color in Figure 7.7. In case of a non-covering hierarchy, its normalized covering instance is captured by a homogeneous hierarchy scheme. The alternative specialization hierarchy, however, is used to imitate the original non-covering instance and its heterogeneous hierarchy paths. Therefore, the bottom-level tablePROJECTcontains the properties of both alternatives: a foreign key OfficeIDfor the covering mapping and aProjTypeattribute for storing the belonging to a specialization class of the non-covering mapping. That is why the specialization classes inherit only those properties of the generalized level, attributing to the non-covering mapping.

Specialization tables of a non-covering hierarchy do not have subtype-specific characteristics apart from the roll-up relationships. The logical mapping of such a specialization is a table containing the key attribute of the generalized type and the subtype-specific reference to the parent level. The table is populated via a query extracting the necessary data from the underlying relations. As an example, consider the maintenance of the derived tableEXTERNAL, instantiated using the following SQL statement:

INSERT INTO EXTERNAL

SELECT PROJECT.ProjectID, BUILDING.CityID FROM PROJECT, OFFICE, BUILDING

WHERE PROJECT.ProjType = ’External’ AND PROJECT.OfficeID = OFFICE.OfficeID AND OFFICE.BuildingID = BUILDING.BuildingID

The consistency of the data copy with respect to its sources is guarded usingtriggers. A trigger is a special kind of stored procedure that automatically executes when a specified event (such as insert, update, or delete) occurs in the specified data source. In case of the tableEXTERNAL, its trigger functionUpdateExternalis defined as to react to the changes in any of the source tables, i.e., inPROJECT,OFFICEorBUILDING.