
Relational Implementation of the Multidimensional Data Model

7.3 Enforcing Summarizability in Homogeneous Hierarchies

7.3.2 Mapping to Strict

A data warehouse designer confronted with non-strict hierarchies has two major choices: i) to preserve non-strictness in the data and to provide mechanisms for correct aggregation in such hierarchies, and ii) to transform a non-strict hierarchy into a strict one.

If the analysis admits or even requires a hierarchy to be mapped to strict, the actual choice of the transformation technique depends on the semantics behind non-strict roll-up relationships. In what follows, we present two strategies for mapping to strict, namely, a manual one based on edge elimination and an algorithmic one based on fusing multi-parent elements.

EDGE ELIMINATION

If the accuracy of many-to-many relationships is not crucial for the analysis, the hierarchy can be transformed into a strict one via simple edge elimination: each set of multi-parent roll-up relationships is reduced to a single “priority” edge. Priority edges can be specified manually based on user preferences, or picked randomly if no such preferences are available. Only the resulting strict hierarchy is implemented. Figure 7.20 shows a strict mapping of the sample course hierarchy, obtained via edge elimination. Note that this technique may have the side effect of leaving some of the affected parent elements non-onto (childless), as is the case with Ethics Department in our example.
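The edge elimination step described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names `eliminate_edges` and `childless_parents` and the course and department names in the usage note are hypothetical, and the roll-up relationship is assumed to be available as a plain dictionary from each child to its list of parents.

```python
import random


def eliminate_edges(rollups, priority=None, rng=None):
    """Reduce every child's parent list to a single "priority" parent.

    rollups:  dict child -> list of parents (non-strict if longer than 1)
    priority: optional dict child -> manually preferred parent
    rng:      optional random.Random, used where no preference is given
    """
    priority = priority or {}
    rng = rng or random.Random()
    strict = {}
    for child, parents in rollups.items():
        preferred = priority.get(child)
        if preferred is not None:
            if preferred not in parents:
                raise ValueError(f"{preferred!r} is not a parent of {child!r}")
            strict[child] = preferred
        else:
            # No preference available: pick one of the parents at random.
            strict[child] = rng.choice(sorted(parents))
    return strict


def childless_parents(rollups, strict):
    """Parents that lost all children (became non-onto) by the elimination."""
    all_parents = {p for parents in rollups.values() for p in parents}
    return all_parents - set(strict.values())
```

For instance, with the hypothetical roll-ups `{"Databases": ["Computer Science"], "Bioinformatics": ["Biology", "Computer Science"], "Bioethics": ["Biology", "Ethics"]}` and the preferences `{"Bioinformatics": "Computer Science", "Bioethics": "Biology"}`, every course keeps exactly one department, and `childless_parents` reports `Ethics` as the parent left without children.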

Figure 7.20: The state of the hierarchy after eliminating multi-parent roll-ups


“FUSED” MULTI-PARENT ELEMENTS

Pedersen et al. [144] propose turning a non-strict hierarchy into a strict one by “fusing” each set of multiple parent elements into one “fused” value. The hierarchy is normalized bottom-up.

First, explicit and implied non-strict roll-ups between hierarchy levels are identified. In our sample hierarchy, depicted in Figure 7.19, overlapping subtrees exist at the department, faculty, and section levels. For each non-strict roll-up, the new parent elements are inserted as a new category between the original parent and child categories; this category is named set-of ‘p’, where ‘p’ is the name of the parent category. For example, category set-of department is inserted between course and department, resulting in a strict roll-up relationship between course and set-of department. Elements of all new categories at different levels are linked to one another to produce a strict hierarchy. Figure 7.21 shows the state of the hierarchy after inserting three additional levels.

Since the original algorithm in [144] prohibits non-onto nodes in the input hierarchy, we suggest treating them just like onto nodes within the same category at this stage. The resulting scheme course ⊑ set-of department ⊑ set-of faculty ⊑ set-of section ⊑ ⊤course is strict.
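The bottom-up fusing step can be illustrated with a small sketch. It is a simplified reading of the idea described above, not the algorithm of [144] itself: the function name `fuse_parents` and the sample data are hypothetical, fused elements are represented as `frozenset` values, and the hierarchy is assumed to be covering (every element has at least one parent at the next level). Non-onto elements and the links back to the original parent categories are left out.

```python
def fuse_parents(levels):
    """Build the strict chain of "set-of" categories, bottom-up.

    levels: list of dicts ordered bottom-up; levels[i] maps an element of
            category i to the set of its parents in category i + 1.
    Returns a list of dicts; the i-th dict maps every element of the
    (fused) category i to exactly one fused parent, a frozenset of
    original parent elements.
    """
    strict = []
    # Bottom elements are not fused; treat each as a one-element set.
    current = {child: frozenset([child]) for child in levels[0]}
    for rollup in levels:
        mapping = {}
        for elem, members in current.items():
            # The fused parent is the set of all parents of all members.
            mapping[elem] = frozenset(
                parent for m in members for parent in rollup.get(m, ())
            )
        strict.append(mapping)
        # The fused parents become the child elements of the next step.
        current = {fused: fused for fused in mapping.values()}
    return strict
```

With the hypothetical input `[{"Bioinformatics": {"Biology", "Computer Science"}, "Databases": {"Computer Science"}}, {"Biology": {"Life Sciences"}, "Computer Science": {"Engineering"}}]`, the course Bioinformatics rolls up to the single fused element {Biology, Computer Science}, which in turn rolls up to the single fused element {Life Sciences, Engineering}. Each element thus has exactly one parent, which is what makes the resulting chain strict.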

Note that the elements of the new category are also linked to the relevant values of the original parent category (e.g., set-of department rolls up to department), resulting in a non-strict mapping between the two, as reflected in the hierarchy scheme shown to the right of its instance in Figure 7.21. The authors propose to disable non-summarizable aggregation along such paths by “unlinking” the original parent category from its upward roll-up path. Thereby, upper-level aggregates can be reached only through the hierarchy of fused categories and not the original one. For our example, this means detaching department from set-of faculty, faculty from set-of section, and section from ⊤course. In Figure 7.21, the edges affected by this elimination are marked with a red cross in both the instance and the scheme.

Unlinking elements from their upward hierarchy paths results in the existence of subtrees not reaching the root node. In [144], unlinked categories are denoted “unsafe” and are exempted from the aggregation. However, the algorithm preserves unlinked subtrees in the hierarchy instance along with the corresponding scheme fragments, detached from the root category. In our model, such fragments are inadmissible by definition (each member in a dimension finally rolls up to the root) and, therefore, have to be entirely removed from the final state of the hierarchy. Figure 7.22 shows the resulting strict hierarchy after the removal of the “unsafe” categories. Note that we also marked the childless fused element Pedagogics as onto.
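The unlinking and the subsequent removal of detached fragments can be sketched at the scheme level as a reachability check. This is only an illustration under stated assumptions: the function name `prune_unsafe` is hypothetical, the scheme is given as a dictionary from each category to the set of its parent categories, and the string `"TOP"` is a placeholder for the root category.

```python
def prune_unsafe(scheme, unlinked, root):
    """Drop unlinked edges, then remove categories that no longer reach root.

    scheme:   dict category -> set of parent categories
    unlinked: iterable of (child, parent) roll-up edges to detach
    root:     name of the root category
    """
    remaining = {cat: set(parents) for cat, parents in scheme.items()}
    for child, parent in unlinked:
        remaining[child].discard(parent)

    # Fixed-point computation of all categories that still reach the root.
    safe = {root}
    changed = True
    while changed:
        changed = False
        for cat, parents in remaining.items():
            if cat not in safe and parents & safe:
                safe.add(cat)
                changed = True

    # Keep only safe categories and only edges that lead to safe categories.
    return {cat: parents & safe for cat, parents in remaining.items() if cat in safe}


scheme = {
    "course": {"set-of department"},
    "set-of department": {"department", "set-of faculty"},
    "department": {"set-of faculty"},
    "set-of faculty": {"faculty", "set-of section"},
    "faculty": {"set-of section"},
    "set-of section": {"section", "TOP"},
    "section": {"TOP"},
    "TOP": set(),
}
unlinked = [
    ("department", "set-of faculty"),
    ("faculty", "set-of section"),
    ("section", "TOP"),
]
pruned = prune_unsafe(scheme, unlinked, "TOP")
```

In this run, department, faculty, and section lose their only upward edges, no longer reach the root, and are dropped together with the edges pointing to them. What remains in `pruned` is the chain course, set-of department, set-of faculty, set-of section, and the root.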

Figure 7.21: The state of the hierarchy after adding categories with “fused” elements


Figure 7.22: The state of the sample hierarchy after unlinking non-strict roll-up relationships

BRIDGE TABLES

Another group of techniques aims at storing a non-strict mapping “as it is” or with additional hints for correct aggregate computation. A currently popular data warehouse implementation of a many-to-many relationship between a pair of categories is known as a bridge table [87]. A separate bridge table is created for each non-strict roll-up relationship in the hierarchy to compensate for the missing parent level reference in the child category’s dimension table. In addition to the child-parent pair itself, a bridge table must include information about measure distribution, i.e., how a lower-level aggregate is to be split between its multiple upper-level parent aggregates. This information can be made available in one of the following representations:

1. “Weighted” non-strictness. In the conceptual model, we introduced a summarizable hierarchy of type weighted non-strict (see Section 4.4.1): whenever an element rolls up to multiple parent elements, each of these roll-ups is assigned a degree of belonging (weight) valued between 0 and 1. The sum of one child node’s degrees of belonging should be exactly 1.0 to ensure summarizable aggregation. The sample covering hierarchy from Step 1, augmented with edge weights for resolving non-strict roll-up relationships at department and section levels, is depicted in Figure 7.23.

As for the implementation of the resulting hierarchy, the dimension tables COURSE and FACULTY have no parent level reference. Instead, two bridge tables, COURSE_DEPARTMENT and FACULTY_SECTION, have to be created to capture the respective weighted non-strict roll-up relationships.

During aggregate computation, consistent results are ensured by multiplying the measure, aggregated along a weighted non-strict hierarchy, by each enclosed element’s degree of belonging.

2. Ad hoc edge elimination. If multi-parent relationships are relatively infrequent or if a hierarchy is used in different contexts, “priority” parent values can be specified ad hoc. Whenever a roll-up operation encounters a multi-parent relationship, the user is prompted to specify interactively to which of the parent aggregates the child aggregate should be assigned. The advantage of this strategy compared to simple edge elimination is that it leaves the resolution up to the user, thus supporting various user preferences. Figure 7.24 shows a non-strict hierarchy and its ad hoc strict variant, obtained by marking the edges to be excluded when performing a roll-up. Edge elimination can be seen as a degenerate case of weighted non-strictness, in which one of the parent elements is assigned the weight of 1 and all the others are set to 0. Therefore, this technique can be implemented similarly, i.e., using bridge tables with a data field for storing the degree of belonging. Ad hoc specification of a user-defined strict hierarchy variant is done on that user’s copy of the bridge table.
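Both representations can be tried out on a small in-memory database. The following sketch uses Python's standard `sqlite3` module and rests on several assumptions: the table names COURSE, DEPARTMENT, and COURSE_DEPARTMENT follow the text, whereas the column names, the ENROLLMENT fact table with its `students` measure, the user copy USER_COURSE_DEPARTMENT, and all data values are hypothetical.

```python
import sqlite3


def build_demo():
    """Create a tiny schema with one weighted bridge table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE COURSE (course_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE DEPARTMENT (department_id INTEGER PRIMARY KEY, name TEXT);
        -- Bridge table: child-parent pair plus the degree of belonging.
        CREATE TABLE COURSE_DEPARTMENT (
            course_id INTEGER REFERENCES COURSE,
            department_id INTEGER REFERENCES DEPARTMENT,
            weight REAL NOT NULL CHECK (weight BETWEEN 0 AND 1),
            PRIMARY KEY (course_id, department_id)
        );
        CREATE TABLE ENROLLMENT (course_id INTEGER REFERENCES COURSE, students INTEGER);
    """)
    conn.executemany("INSERT INTO COURSE VALUES (?, ?)",
                     [(1, "Databases"), (2, "Bioinformatics"), (3, "Genetics")])
    conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                     [(1, "Computer Science"), (2, "Biology")])
    conn.executemany("INSERT INTO COURSE_DEPARTMENT VALUES (?, ?, ?)",
                     [(1, 1, 1.0), (2, 1, 0.75), (2, 2, 0.25), (3, 2, 1.0)])
    conn.executemany("INSERT INTO ENROLLMENT VALUES (?, ?)",
                     [(1, 40), (2, 80), (3, 30)])
    return conn


def invalid_weight_sums(conn, bridge="COURSE_DEPARTMENT"):
    """Courses whose degrees of belonging do not add up to 1."""
    rows = conn.execute(
        f"SELECT course_id FROM {bridge} "
        "GROUP BY course_id HAVING ABS(SUM(weight) - 1.0) > 1e-9").fetchall()
    return [row[0] for row in rows]


def students_per_department(conn, bridge="COURSE_DEPARTMENT"):
    """Roll the measure up to department, scaling each share by its weight."""
    rows = conn.execute(f"""
        SELECT d.name, SUM(e.students * b.weight)
        FROM ENROLLMENT e
        JOIN {bridge} b ON b.course_id = e.course_id
        JOIN DEPARTMENT d ON d.department_id = b.department_id
        GROUP BY d.name""").fetchall()
    return dict(rows)


def make_user_copy(conn, choices, name="USER_COURSE_DEPARTMENT"):
    """Ad hoc edge elimination on a user's copy: weight 1 for the chosen
    parent, 0 for all other parents of the same course."""
    conn.execute(f"CREATE TEMP TABLE {name} AS SELECT * FROM COURSE_DEPARTMENT")
    for course_id, department_id in choices.items():
        conn.execute(
            f"UPDATE {name} SET weight = "
            "CASE WHEN department_id = ? THEN 1.0 ELSE 0.0 END "
            "WHERE course_id = ?", (department_id, course_id))
    return name
```

With the weighted bridge table, the 80 Bioinformatics students are split 60 to 20 between Computer Science and Biology, so the department totals still add up to the overall 150 enrolled students. Calling `make_user_copy(conn, {2: 2})` and aggregating over the returned table name assigns all 80 students to Biology instead, without touching the shared bridge table.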


Figure 7.23: Resolving multi-parent relationships by assigning weights to roll-up edges

Figure 7.24: The state of the hierarchy after ad hoc edge elimination