
Chapter 6: Data Warehouse Design for Non-Conventional Applications

6.2 Acquisition of Multidimensional Schemes from the E/R Schemes

6.2.2 Identifying Facts and Dimensions

Once the transformation of the E/R scheme is complete, a cardinality-based transformation into a multidimensional scheme can be applied. Essentially, the task consists of determining, for each entity type, whether it maps to a fact, a bottom-level dimension category, or an upper-level dimension category.

As facts form the focus of a multidimensional scheme, the first step is to identify candidate facts in the scheme. Recall that, technically, a fact structure is a collection of properties that have a many-to-many relationship to each other and a one-to-many relationship to the fact’s grain. Therefore, exactly three structures in the E/R model satisfy this cardinality constraint:

An entity type that has n:1 relationships with multiple other entity types,

An n-ary relationship between a set of entity types,

An m:n relationship between a pair of entity types.

For the sake of simplicity, the first two cases can be unified, as any n-ary relationship can be converted into an entity type by replacing each of its branches with a binary relationship towards the respective participating entity type. Besides, the concept of an entity type is generally more expressive than that of a relationship, as the former may participate in other relationships. The third case is typical of a fact degeneration, i.e., an m:n relationship between a fact and a dimension, but may also occur in a non-strict dimension hierarchy.
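To make the unification of the first two cases concrete, the following minimal Python sketch performs this reification under assumed names: `BinaryRelationship`, `reify_nary`, and the ternary relationship “performs” with its participants are hypothetical illustrations, not part of the surgical workflow model. Each branch becomes an n:1 relationship from the new entity type, since every instance of the former relationship refers to exactly one instance of each participant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryRelationship:
    """Hypothetical binary relationship from `source` to `target`."""
    name: str
    source: str
    target: str
    cardinality: str = "n:1"


def reify_nary(rel_name: str, participants: list[str]) -> tuple[str, list[BinaryRelationship]]:
    """Replace an n-ary relationship by a new entity type and one n:1
    relationship (branch) towards each participating entity type."""
    entity = rel_name.upper()
    branches = [
        BinaryRelationship(f"{entity}-{p}", source=entity, target=p)
        for p in participants
    ]
    return entity, branches


# Hypothetical ternary relationship "performs" among three entity types.
entity, branches = reify_nary("performs", ["SURGEON", "ACTIVITY", "PATIENT"])
for b in branches:
    print(b.source, b.cardinality, b.target)
```

The new entity type PERFORMS has n:1 relationships with several entity types and thus falls under the first case.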

IDENTIFYING FACTS

Generally, a fact is given by an entity type $E_f$ involved in multiple n:1 relationships with other entity types (the existence of 1:n, m:n, or 1:1 relationships between $E_f$ and other entity types is not prohibited).

$E_f$ corresponds to the fact’s grain, whereas the set of related entity types, together with the attributes of $E_f$, defines the fact’s dimensional context. To investigate the properties of $E_f$ as a candidate fact scheme, all relationships of $E_f$ are arranged into the following mutually disjoint sets:

$E_{\mathrm{rec}}(E_f)$ is the set of recursive relationships of $E_f$ (i.e., relationships connecting the entity type to itself),

$E_{n:1}(E_f)$ is the set of $E_f$’s candidate dimensions, i.e., its non-key attributes and the entity types with which $E_f$ has an n:1 relationship,

$E_{\mathrm{super}}(E_f)$ is the set of superclasses, i.e., direct generalizations, of $E_f$,

$E_{\mathrm{sub}}(E_f)$ is the set of subclasses, i.e., direct specializations, of $E_f$,

$E_{1:1}(E_f)$ is the set of $E_f$’s identifier dimensions, i.e., the entity types and attributes with which $E_f$ has a 1:1 relationship,

$E_{1:n}(E_f)$ is the set of $E_f$’s candidate sub-facts, i.e., the entity types with which $E_f$ has a 1:n relationship,

$E_{m:n}(E_f)$ is the set of $E_f$’s candidate degenerate facts, i.e., the entity types with which $E_f$ has an m:n relationship.
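A minimal Python sketch of this partitioning, assuming a hypothetical relationship descriptor `Rel` (its fields `kind`, `card`, and `name` are illustrative) and attributes supplied together with a key flag. Following the definitions above, non-key attributes join $E_{n:1}(E_f)$ and key attributes join $E_{1:1}(E_f)$; recursive relationships are recorded by their name. The usage line reproduces the COMPONENT relationships shown in Figure 6.7.

```python
from dataclasses import dataclass

SET_KEYS = ("rec", "n:1", "super", "sub", "1:1", "1:n", "m:n")


@dataclass(frozen=True)
class Rel:
    """Hypothetical relationship descriptor, seen from the entity type E_f.

    kind: "assoc" (ordinary relationship), "super" (`other` is a direct
    superclass of E_f) or "sub" (`other` is a direct subclass of E_f).
    card: cardinality with E_f's side first, e.g. "n:1" or "m:n".
    """
    other: str
    kind: str = "assoc"
    card: str = "n:1"
    name: str = ""


def classify(ef: str, rels: list[Rel], attrs: dict[str, bool]) -> dict[str, set[str]]:
    """Partition the relationships and attributes of `ef` into disjoint sets.

    `attrs` maps an attribute name to True if it is a key (identifier).
    """
    sets: dict[str, set[str]] = {k: set() for k in SET_KEYS}
    for r in rels:
        if r.other == ef:                  # recursive: keep the relationship name
            sets["rec"].add(r.name or r.other)
        elif r.kind in ("super", "sub"):   # direct generalization/specialization
            sets[r.kind].add(r.other)
        else:                              # ordinary relationship, by cardinality
            sets[r.card].add(r.other)
    for attr, is_key in attrs.items():
        # key attributes are 1:1 with E_f, non-key attributes are n:1
        sets["1:1" if is_key else "n:1"].add(attr)
    return sets


comp = classify("COMPONENT", [Rel("DATA", card="m:n"),
                              Rel("COMPONENT", card="m:n", name="triggers")], {})
print(comp["m:n"], comp["rec"])
```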

The conversion of the E/R scheme into a multidimensional one proceeds bottom-up, starting with the entity types that qualify as terminal facts, i.e., the elements of the finest grain, and proceeding to the entity types of coarser grain.

DEFINITION (ER-TERMINAL FACT CANDIDATE). Entity type $E_f$ is a terminal fact candidate if it is not involved in any decomposition or specialization relationship, i.e., $E_{1:n}(E_f) = E_{\mathrm{sub}}(E_f) = \emptyset$.
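The definition translates into a simple predicate. The sketch below is hypothetical: it assumes that the relationship sets have already been computed for each fact candidate and are stored in a dictionary keyed by set name ("1:n", "sub", and so on); the schema excerpt is an illustrative reading of Figures 6.5 to 6.7, not the full model.

```python
def is_terminal_fact_candidate(sets: dict[str, set[str]]) -> bool:
    """Terminal iff E_1:n(E_f) and E_sub(E_f) are both empty (or absent)."""
    return not sets.get("1:n") and not sets.get("sub")


def terminal_candidates(schema: dict[str, dict[str, set[str]]]) -> list[str]:
    """Names of all terminal fact candidates, in alphabetical order."""
    return sorted(n for n, s in schema.items() if is_terminal_fact_candidate(s))


# Hypothetical excerpt of the relationship sets of the surgical workflow model.
schema = {
    "STEP": {"n:1": {"INSTRUMENT", "BODY PART", "BODY STRUCTURE", "ACTIVITY"}},
    "ACTIVITY": {"n:1": {"TIME-OFFSET", "ACTION"}, "1:n": {"STEP"}, "super": {"COMPONENT"}},
    "COMPONENT": {"sub": {"ACTIVITY"}, "m:n": {"DATA"}},
}
print(terminal_candidates(schema))  # ['STEP']
```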

A 1:n relationship between $E_f$ and some other entity type $E_k$ indicates a composition or an aggregation relationship and, thus, the existence of a fact roll-up pattern ($E_k$ rolls up to $E_f$). A specialization relationship of $E_f$ implies that the subclasses of $E_f$ represent more specialized facts than $E_f$, as they inherit all characteristics of $E_f$ and may have further characteristics of their own.

Figure 6.5: Transforming entity type STEP (left) into a fact scheme (right)

Figure 6.6: Transforming entity type ACTIVITY (left) into a fact scheme (right)

In our surgical workflow model, three entity types qualify as terminal facts, namely, STEP, EVENT, and STATE. Figure 6.5 shows the part of the E/R diagram referring to STEP and its relationship types, as well as its mapping to a 4-dimensional fact scheme. For consistency, n:1 relationships with full participation, i.e., with (1,1) and (1,*) as their structural constraints, are all renamed “rolls-up-to”. The transformation appears straightforward, as the only non-empty set of related categories, $E_{n:1}(\text{STEP}) = \{\text{INSTRUMENT}, \text{BODY PART}, \text{BODY STRUCTURE}, \text{ACTIVITY}\}$, maps directly to the set of the fact’s dimensions.

As an example of a more complex fact candidate at a coarser granularity level, let us consider the entity type ACTIVITY, depicted in Figure 6.6, with its non-empty sets $E_{n:1}(\text{ACTIVITY}) = \{\text{TIME-OFFSET}, \text{ACTION}\}$, $E_{\mathrm{super}}(\text{ACTIVITY}) = \{\text{COMPONENT}\}$, and $E_{1:n}(\text{ACTIVITY}) = \{\text{STEP}\}$. As STEP has already been mapped to a fact scheme, the 1:n relationship is interpreted as a fact roll-up. COMPONENT, as a superclass of ACTIVITY, is also represented as a fact, yielding a fact generalization pattern.
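The interpretation applied to ACTIVITY can be expressed as a small labeling function. This is a sketch under assumptions: the function name `interpret`, the label strings, and the fallback label for a 1:n partner that has not yet been converted are illustrative choices rather than part of the method described here.

```python
def interpret(sets: dict[str, set[str]], converted: set[str]) -> list[tuple[str, str]]:
    """Label the related entity types of a fact candidate.

    `sets` holds the candidate's relationship sets ("n:1", "1:n", "super");
    `converted` holds the entity types already mapped to fact schemes.
    """
    labels = [(d, "dimension") for d in sorted(sets.get("n:1", set()))]
    for sub in sorted(sets.get("1:n", set())):
        # a 1:n partner that is already a fact rolls up to the candidate
        labels.append((sub, "fact roll-up" if sub in converted else "candidate sub-fact"))
    for sup in sorted(sets.get("super", set())):
        # the superclass is represented as a fact as well
        labels.append((sup, "fact generalization"))
    return labels


activity = {"n:1": {"TIME-OFFSET", "ACTION"}, "1:n": {"STEP"}, "super": {"COMPONENT"}}
for target, label in interpret(activity, converted={"STEP"}):
    print(f"ACTIVITY -> {target}: {label}")
```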

Finally, consider an example of identifying and modeling degenerate facts. Once an entity type $E_f$ has been converted to a fact, its degenerate facts correspond to $E_f$’s relationships in $E_{m:n}(E_f)$ (satellite facts and fact associations) and in $E_{\mathrm{rec}}(E_f)$ (fact self-associations). Figure 6.7 (left) shows a fragment of the E/R diagram modeling a generalized entity type COMPONENT and its relationships. COMPONENT’s m:n relationship with DATA and its recursive relationship triggers are converted into a satellite fact COMPONENT-DATA and a self-association COMPONENT-TRIGGER, respectively, as depicted in Figure 6.7 (right).
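The following Python sketch derives the degenerate facts of an already converted fact. It assumes, for illustration, that an m:n partner which is itself a fact yields a fact association and any other m:n partner a satellite fact, and that the name of each self-association is supplied by the caller (mapping the relationship triggers to TRIGGER reproduces the name COMPONENT-TRIGGER). `DegenerateFact` and `degenerate_facts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegenerateFact:
    name: str
    kind: str     # "satellite", "fact association" or "self-association"
    base: str     # the fact the degenerate fact is attached to
    related: str  # m:n partner entity type or recursive relationship name


def degenerate_facts(ef: str, e_mn: set[str], e_rec: dict[str, str],
                     facts: frozenset[str] = frozenset()) -> list[DegenerateFact]:
    """Derive the degenerate facts of the already converted fact `ef`.

    e_mn:  entity types with an m:n relationship to `ef`.
    e_rec: recursive relationship name -> name chosen for the self-association.
    facts: entity types that are themselves facts (assumed known).
    """
    result = [DegenerateFact(f"{ef}-{other}",
                             "fact association" if other in facts else "satellite",
                             ef, other)
              for other in sorted(e_mn)]
    result += [DegenerateFact(f"{ef}-{label}", "self-association", ef, rel)
               for rel, label in sorted(e_rec.items())]
    return result


for f in degenerate_facts("COMPONENT", {"DATA"}, {"triggers": "TRIGGER"}):
    print(f.name, f.kind)
```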

Having considered various examples of identifying the parts of the E/R scheme that qualify for conversion into facts, we are ready to provide an algorithmic description of acquiring fact schemes from accurate E/R schemes. Algorithm 1 is invoked on each “terminal” entity type $E_f$ and outputs a set of fact schemes, obtained by recursively applying itself to each entity type identified as a candidate fact. The sets $E_{\mathrm{sub}}(E_f)$ and $E_{1:n}(E_f)$, used for identifying “terminal” entity types, become obsolete inside the algorithm, as it proceeds bottom-up.

In the first step, Algorithm 1 creates an empty fact type and converts the attributes of the underlying entity type into measures, identifier dimensions, and degenerate dimensions, as shown in the subroutine Algorithm 2.


Algorithm 1: ConvertToFact

Data: Entity type E_f; set ℱ of previously identified fact schemes
Result: Updated set ℱ of fact schemes

begin
    F ← ConvertAttributes(E_f)
    E_rec ← ∅
    E_super ← ∅
    E_1:1 ← ∅
    E_n:1 ← ∅
    E_1:n ← ∅
    E_m:n ← ∅
    Rel ← getRelationships(E_f)
    foreach E_f E_i ∈ Rel do
        if E_f = E_i then
            append(E_f E_i, E_rec)
        else if E_i ∈ Generalization(E_f) then
            append(E_i, E_super)
        else
            c ← Cardinality(E_f E_i)
            switch c do
                case 1:1: append(E_i, E_1:1)
                case n:1: append(E_i, E_n:1)
                case 1:n: append(E_i, E_1:n)
                otherwise: append(E_i, E_m:n)
    foreach E_i ∈ E_1:1 do
        addDimension(E_i, F, “shadow”)
    foreach E_i ∈ E_n:1 do
        addDimension(E_i, F, “normal”)
        if qualifiesAsFact(E_i) then
            ℱ ← ConvertToFact(E_i, ℱ)
    foreach E_i ∈ E_super do
        addDimension(E_i, F, “superclass”)
        ℱ ← ConvertToFact(E_i, ℱ)
    foreach E_f E_i ∈ E_rec do
        F_k ← CreateFactSelfAssociation(F, E_f E_i)
        append(F_k, ℱ)
    foreach E_i ∈ E_m:n do
        F_k ← CreateDegenerateFact(F, E_i)
        append(F_k, ℱ)
    foreach E_i ∈ E_1:n do
        addDimension(E_i, F, “normal”)
    append(F, ℱ)
    return ℱ
end


Algorithm 2: ConvertAttributes

Data: Entity type E_f
Result: Fact type F corresponding to E_f

begin
    F ← createFact(E_f)
    Attr ← getAttributes(E_f)
    foreach A ∈ Attr do
        if isMeasure(A) then
            addMeasure(A, F)
        else if isIdentifier(A) then
            addDimension(A, F, “identifier”)
        else
            addDimension(A, F, “degenerated”)
    return F
end
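For experimentation, Algorithms 1 and 2 can be transcribed into Python roughly as follows. This is a minimal sketch, not a reference implementation: the classes `Entity` and `Fact`, the role strings, and the rule in `qualifies_as_fact` (at least two n:1 relationships, one reading of “multiple” above) are assumptions. It also differs from the listing in two deliberate ways: relationships are handled in a single pass instead of first being collected into sets, and each entity type is registered in the result before recursing, so that an entity type reachable along several paths is converted only once. 1:n partners become “normal” dimensions, as in the last loop of Algorithm 1.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical E/R entity type.

    attrs: attribute -> "measure", "identifier" or "other".
    rels: related entity type -> cardinality seen from this entity
          ("1:1", "n:1", "1:n"; anything else is treated as m:n).
    recursive: names of recursive relationships; supers: direct superclasses.
    """
    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    rels: dict[str, str] = field(default_factory=dict)
    recursive: list[str] = field(default_factory=list)
    supers: list[str] = field(default_factory=list)


@dataclass
class Fact:
    name: str
    kind: str = "fact"
    measures: list[str] = field(default_factory=list)
    dims: list[tuple[str, str]] = field(default_factory=list)  # (category, kind)


def convert_attributes(e: Entity) -> Fact:
    """Algorithm 2: attributes become measures or identifier/degenerated dimensions."""
    f = Fact(e.name)
    for attr, role in e.attrs.items():
        if role == "measure":
            f.measures.append(attr)
        elif role == "identifier":
            f.dims.append((attr, "identifier"))
        else:
            f.dims.append((attr, "degenerated"))
    return f


def qualifies_as_fact(e: Entity) -> bool:
    # Assumption: "involved in multiple n:1 relationships" means at least two.
    return sum(card == "n:1" for card in e.rels.values()) >= 2


def convert_to_fact(name: str, schema: dict[str, Entity],
                    facts: dict[str, Fact]) -> dict[str, Fact]:
    """Algorithm 1: `facts` (name -> Fact) plays the role of the set of fact schemes."""
    if name in facts:                      # already converted via another path
        return facts
    e = schema[name]
    f = convert_attributes(e)
    facts[name] = f                        # registered early so cycles terminate
    for other, card in e.rels.items():
        if card == "1:1":
            f.dims.append((other, "shadow"))
        elif card in ("n:1", "1:n"):
            f.dims.append((other, "normal"))
            if card == "n:1" and other in schema and qualifies_as_fact(schema[other]):
                convert_to_fact(other, schema, facts)
        else:                              # otherwise: m:n -> degenerate fact
            facts[f"{name}-{other}"] = Fact(f"{name}-{other}", kind="degenerate")
    for sup in e.supers:                   # fact generalization
        f.dims.append((sup, "superclass"))
        convert_to_fact(sup, schema, facts)
    for rel in e.recursive:                # fact self-association
        facts[f"{name}-{rel}"] = Fact(f"{name}-{rel}", kind="self-association")
    return facts


# Simplified, assumed reading of Figures 6.5 to 6.7.
schema = {
    "STEP": Entity("STEP", rels={"INSTRUMENT": "n:1", "BODY PART": "n:1",
                                 "BODY STRUCTURE": "n:1", "ACTIVITY": "n:1"}),
    "ACTIVITY": Entity("ACTIVITY", rels={"TIME-OFFSET": "n:1", "ACTION": "n:1",
                                         "STEP": "1:n"}, supers=["COMPONENT"]),
    "COMPONENT": Entity("COMPONENT", rels={"DATA": "m:n"}, recursive=["TRIGGER"]),
}
facts = convert_to_fact("STEP", schema, {})
print(sorted(facts))
```

Starting from the terminal fact STEP, the recursion reaches ACTIVITY through an n:1 relationship and COMPONENT as its superclass, whose m:n and recursive relationships yield COMPONENT-DATA and COMPONENT-TRIGGER.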

Figure 6.7: Transforming m:n and recursive relationships of COMPONENT (left) into degenerate fact schemes (right)

IDENTIFYING DIMENSION HIERARCHIES

The fact schemes produced by Algorithm 1 are incomplete in the sense that their dimensions are defined solely in terms of their bottom categories. Therefore, the next step is to model dimension hierarchies. Once the E/R scheme has been brought into the accurate state defined in the previous subsection, dimension hierarchies become easy to identify: each category type corresponds to an entity type, and the partial order on the category types is given by the hierarchical, i.e., many-to-one, relationships between categories.

Similarly to the fact conversion procedure, dimension schemes are constructed bottom-up by rooting the dimension’s graph at the bottom category and recursively adding roll-up relationships until the top level is reached. In the presence of multiple and heterogeneous hierarchies, the resulting dimension scheme contains diverging and converging paths.
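The bottom-up construction can be sketched as a breadth-first traversal over roll-up relationships, an iterative equivalent of the recursive formulation. In this hypothetical Python sketch, `roll_ups` maps each category to the categories it has an n:1 roll-up relationship with; the CITY, STATE, and COUNTRY roll-ups in the usage line are assumed for illustration and produce one diverging and one converging path.

```python
from collections import deque


def build_dimension(bottom: str, roll_ups: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Root the dimension graph at `bottom` and follow roll-up (n:1)
    relationships upwards until categories without parents are reached.

    Returns (child, parent) edges. Each category is expanded only once, so
    converging paths show up as several edges into the same parent.
    """
    edges: list[tuple[str, str]] = []
    seen = {bottom}
    queue = deque([bottom])
    while queue:
        category = queue.popleft()
        for parent in roll_ups.get(category, []):
            edges.append((category, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return edges


# Assumed roll-ups: a city rolls up to a state or, where none exists, to a country.
city = build_dimension("CITY", {"CITY": ["STATE", "COUNTRY"], "STATE": ["COUNTRY"]})
print(city)  # [('CITY', 'STATE'), ('CITY', 'COUNTRY'), ('STATE', 'COUNTRY')]
```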

The roll-up behaviour of an entity type is determined by its relationships. As dimension categories are identified bottom-up, the set of relevant relationships is reduced to 1:1, n:1, and m:n. Let us consider the process of hierarchy modeling using the example of the phase dimension in COMPONENT. The corresponding part of the E/R diagram (simplified for presentation purposes) is given in Figure 6.8.

The possible roll-up behaviours of a candidate dimension category given by an entity type $E_d$ can be categorized based on the number of its relevant relationships, their structural constraints, and their interdependencies (detailed definitions of the presented hierarchy types are given in Chapter 4):


Figure 6.8: Fragment of the E/R scheme relevant for building the phase dimension of COMPONENT

Homogeneous (non-)hierarchy emerges when there is at most one relevant relationship:

Non-hierarchy is given if $E_d$ is not involved in any relevant relationship. As an example, consider the non-hierarchical dimension RECORDER of the fact scheme WORKFLOW in Figure 6.8.

Simple hierarchy is given by an n:1 relationship between $E_d$ and some other entity type $E_i$ with (1,1) as the structural constraint on $E_d$’s participation (full roll-up of $E_d$ to $E_i$). For instance, PHASE and WORKFLOW yield a simple hierarchy.

Non-strict hierarchy is given by an m:n relationship between $E_d$ and some other entity type. Non-strict hierarchies are not supported by conventional OLAP.

Heterogeneous hierarchy emerges in the presence of an optional roll-up or of a single set of relevant, mutually exclusive relationships:

Optional hierarchy is given by an n:1 relationship between $E_d$ and some other entity type $E_i$ with (0,1) as the structural constraint on $E_d$’s participation, as this relationship produces a partial roll-up of $E_d$ to $E_i$.

Non-covering hierarchy results from a set of related partial n:1 relationships. The partiality is given by (0,1) as the structural constraint on $E_d$’s participation in each relationship. Moreover, the diverging roll-up paths of $E_d$ ought to converge at a later stage. An example of such partial related roll-ups is the relationship between CITY, STATE, and COUNTRY in Figure 6.3.

Specialization hierarchy emerges from a specialization relationship of $E_d$ into multiple subclass categories. As an example, consider the generalized category SYSTEM in Figure 6.4.

Multiple hierarchies correspond to multiple relevant relationships that are mutually non-exclusive. Figure 6.9 shows the relationships of the category DATE as an example of multiple hierarchies.

Alternative hierarchies result from multiple roll-up relationships towards mutually related entity types. For instance, the relationships of DATE with CAL_MONTH and with CAL_WEEK are alternative, since the latter two categories have a many-to-many relationship with each other.

Parallel hierarchies correspond to multiple roll-up relationships towards unrelated entity types. For instance, the relationship of DATE with CAL_MONTH is parallel to that of DATE with WEEKDAY.
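The case distinction above can be approximated by a small classifier. The sketch is deliberately coarse and rests on assumptions: `RollUp`, `classify_hierarchy`, and `multiple_kind` are hypothetical names, the caller must state whether $E_d$ has a specialization and whether its roll-ups are mutually exclusive, and combinations of several hierarchy types for one category are not handled. The usage lines mirror the examples RECORDER, PHASE, CITY, and DATE from the text; the participation values assumed for CITY are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollUp:
    """Hypothetical relevant relationship of a candidate category E_d."""
    target: str
    card: str = "n:1"   # "n:1" or "m:n"
    min_part: int = 1   # 1: (1,1) full roll-up, 0: (0,1) partial roll-up


def classify_hierarchy(rollups: list[RollUp], specialization: bool = False,
                       exclusive: bool = False) -> str:
    """Coarse hierarchy type of E_d derived from its relevant relationships."""
    if specialization:
        return "specialization"
    if not rollups:
        return "non-hierarchy"
    if len(rollups) == 1:
        r = rollups[0]
        if r.card == "m:n":
            return "non-strict"
        return "simple" if r.min_part == 1 else "optional"
    if exclusive and all(r.card == "n:1" and r.min_part == 0 for r in rollups):
        return "non-covering"
    return "multiple"


def multiple_kind(a: str, b: str, related: set[frozenset[str]]) -> str:
    """Two roll-up targets of a multiple hierarchy: related -> alternative,
    unrelated -> parallel."""
    return "alternative" if frozenset((a, b)) in related else "parallel"


print(classify_hierarchy([]))                    # RECORDER
print(classify_hierarchy([RollUp("WORKFLOW")]))  # PHASE
print(classify_hierarchy([RollUp("STATE", min_part=0),
                          RollUp("COUNTRY", min_part=0)], exclusive=True))  # CITY
date = [RollUp("CAL_MONTH"), RollUp("CAL_WEEK"), RollUp("WEEKDAY")]
related = {frozenset(("CAL_MONTH", "CAL_WEEK"))}
print(classify_hierarchy(date), multiple_kind("CAL_MONTH", "CAL_WEEK", related),
      multiple_kind("CAL_MONTH", "WEEKDAY", related))
```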