• Keine Ergebnisse gefunden

Measures, Facts, and Galaxies in the Multidimensional Data Model

5.3 Types of Multi-Fact Schemes

SurgeryID

SURGERY start time

location end time patient

recorder discipline

diagnosis surgeon

(a)SURGERYas an event tracking fact

RecordID

DIAGNOSIS date

patient

doctor

diagnosis severity

(b)DIAGNOSISas a coverage fact related to SURGERY

Figure 5.9: Examples of non-measurable fact schemes

FACT IDENTIFIER

Whenever the fact’s grain corresponds to actual events, there may exist a dimensional attribute with identifier properties, i.e., whose values are unique for each fact entry. For example, eachSURGERYinstance has a uniqueSurgeryID. Kimball uses the concept of adegenerated dimension[81] to handle suchid-like attributes, while DFM handles them asnon-dimension attributes of a fact. In our model, afact identifierattribute is a special case of adegenerated dimension, defined as a dimension represented by a single data field. An example of a degenerated dimension in Figure 4.4 isInvoiceNumberin fact schemePURCHASE. Note that InvoiceNumberdoes not fulfill the role of a fact identifier inPURCHASEsince multiple purchasing records may appear in the same invoice.

DEFINITIONLL-DEGENERATED DIMENSION. A dimensionDisdegenerated, if it has a single cate-goryCconsisting of a single attributeAC:DegeneratedpDq ð pCD tC,JDu ^C tAC,Huq. DEFINITION LL-FACT IDENTIFIER. A degenerated dimensionDi is afact identifierof fact scheme F, if the values ofDiinFuniquely identify the fact entries, i.e.,Difunctionally determines the entire set ofF’s dimensions:FactIdentifierpDi,Fq ð pDegeneratedpDiq ^Di ÑDFq.

Since a degenerated dimension is represented by a single category consisting solely of the dimension level attribute, the functionsDegeneratedpqandFactIdentifierpqmay be invoked on a dimension, a category, or an attribute. Existence of a fact identifier is common for, but not limited to, event tracking fact schemes. In case of a measurable fact scheme, the the fact identifier also functionally determines its set of measures (implied by the transitive functional dependencyDiÑDF^DF ÑMF). As fact identifier examples, consider the attributesSurgeryIDandRecordIDin non-measurable fact schemesSURGERYandDIAGNOSIS, respectively, in Figure 5.9. Since a degenerated dimension is only valid in the context of its fact,X-DFM places it into the designated area inside the respective fact’s node. Degenerated attributes of type fact identifier are shown by double-underlining the attribute’s name. Recognition of fact identifier properties lays the foundation for modeling multi-fact schemes discussed in the following section.

5.3 Types of Multi-Fact Schemes

Multi-fact schemes emerge as sets of related fact schemes in the unified multidimensional space. In this section we investigate various patterns of semantic factual interrelationships (see Figure 5.6).

100 Chapter 5 : Measures, Facts, and Galaxies in the Multidimensional Data Model

5.3.1 Fact Degeneration

There may exist a many-to-many mapping of a fact with some of its dimensional characteristics or even with another fact. Giovinazzo proposes a concept of adegenerate fact, defined as a measure recorded in the intersection table of a many-to-many relationship between a pair of facts or a fact and a dimension [45].

We have been able to identify the following three types of fact degeneration:

A satellitefact schemeFn extracts a many-to-many relationship between a fact schemeFmand a dimension schemeDi along with the corresponding measure characteristics of this relationship as a separate fact. Thereby,Fmacts as a dimension inFn. The termsatellitereflects the accompanying nature of this degenerate fact with respect to its base fact.

Anassociationfact schemeFn extracts a many-to-many relationship between a set (typically, just a pair) of fact schemestFj, j 1, . . . , kualong with the corresponding measure characteristics of this relationship as a separate fact.

Aself-associationfact schemeFnextracts a recursive relationship within a fact schemeFm, converting the latter into two distinct dimensions ofFn.

To put the above descriptions into formal definitions, we declare a functionDegenerationpFn,tFj, j 1, . . . , kuq, which returnstrue, ifFnis a degeneration (i.e., a satellite, an association, or a self-association) of Fj. If called with just one argument, e.g.,DegenerationpFnq, the function returns true ifFnis a degeneration of any other fact scheme(s). There may be no syntactic definition of fact degeneration as it is a purely semantic property and as such, has to be explicitly specified by the data warehouse designer.

DEFINITION LL-SATELLITE FACT. A degenerate fact schemeFn is asatelliteof fact schemeFm, ifFn containsFmas a dimension scheme, with the fact identifier ofFmas that dimension’s bottom level inFn: SatellitepFn,Fmq ð pDegenerationpFn,tFmuq ^ DDi PFn,DCj PFm : KDi Cj^ FactIdentifierpCj,Fmqq.

Consider a many-to-many relationship betweenSURGERYandPARTICIPANTin the E/R diagram (Figure 5.4) of our case study. An attempt to map this relationship to a multidimensional scheme would yield a satellite factSURGERY-PARTICIPANT, shown in Figure 5.10a. Extraction of this relationship into a separate fact scheme enables handling of that relationship’s further attributes, such as fee stored as a measure in SURGERY-PARTICIPANT.

DEFINITION LL-ASSOCIATION FACT. A degenerate fact schemeFn is anassociationof a set of fact schemes tFk, k 1, . . . , pu, if Fn contains each Fk as a dimension scheme, with the fact identifier of Fk as that dimension’s bottom level in Fn: AssociationpFn,tFk, k 1, . . . , puq ð pDegenerationpFn,tFkuq ^ @Fk :pDDiPFn,DCjPFk :KDi Cj^FactIdentifierpCj,Fkqqq.

DEFINITION LL-SELF-ASSOCIATION FACT. A degenerate fact scheme Fn is a self-association of fact scheme Fm, if Fm plays the role of multiple dimension schemes in Fn, with the fact identifier of Fm as the respective dimension’s bottom level in Fn: Self-AssociationpFn,Fmq ð DegenerationpFn,tFmuq ^ DDi,Dj PFn,DCkPFm:KDi KDj Ck^Fact-IdentifierpCj,Fmq.

As an example of an association fact, consider modeling of a trigger relationship between the factsEVENT andACTIVITY (e.g., event X triggered activity Y). This many-to-many relationship is extracted into an association fact schemeEVENT-ACTIVITY, shown in Figure 5.10b. As expected, the schemes ofEVENT

5.3 : Types of Multi-Fact Schemes 101

andACTIVITYact as the corresponding dimensions of the resulting association fact, and attributeconfidence could be added as a measure of the captured trigger relationship. Similarly, a self-association ofEVENTcan be used to capture trigger relationships between events. The resulting fact scheme EVENT-EVENTis also shown in Figure 5.10b.

SurgeryID

SURGERY start time

location end time patient

recorder discipline

diagnosis

fee

SURGERY-PARTICIPANT

role participant

(a)SURGERY-PARTICIPANTas a satellite fact of SURGERY

EventID EVENT start

executor event type

ActivityID

ACTIVITY start

executor action

stop

instrument

body part treated structure

confidence

EVENT-ACTIVITY confidence

EVENT-EVENT trigger

triggered

(b)EVENT-ACTIVITYas an association andEVENT-EVENTas a self-association fact

Figure 5.10: Examples of degenerate fact schemes

5.3.2 Fact Roll-up

So far we considered roll-up relationships only between facts and dimensions and between dimension cate-gories. However, in multi-fact schemes facts may also be involved into a similar kind of roll-up relationships, i.e., be in a many-to-one relationship with each other. This behavior was already encountered in the previous section, where we observed that a degenerate fact scheme rolls-up to each of its base fact schemes by con-verting the latter into dimension hierarchies. In this section we investigate interfactual roll-up relationships that occur between non-degenerate fact schemes. Intuitively, a pair of fact schemes forms a roll-up, or a hierarchy, if those schemes represent different granularity of the same process, event, or object.

DEFINITIONLL-FACT ROLL-UP. A pair of non-degenerate fact schemesFmandFnform afact hi-erarchy, or afact roll-up, denotedFm „ Fn, ifFmhas a dimension containing the fact identifier of Fnas its category at any level of the hierarchy:

Fm „ Fn ð pDegenerationpFmq _ DegenerationpFnqq ^ pDDi P Fm,DCk P Di : FactIdentifierpCj,Fnqq.

A fact roll-up isdirect, denotedFm„Fn, if the fact identifier ofFnserves as the bottom category inF, and istransitiveotherwise, denotedFm„ Fn.

In our scenario, hierarchical relationships exist between the event-tracking fact schemes that model the surgical process itself and its vertical decomposition into phases, activities, work steps, etc. For example, there is a transitive fact roll-up ofACTIVITYtoSURGERY, as depicted in Figure 5.11a: categoryphaseof ACTIVITYrolls-up toSurgeryID, which is a fact identifier ofSURGERY.

5.3.3 Fact Generalization

An object-oriented concept ofinheritance is helpful for dealing with heterogeneity in fact records. Con-ventional multidimensional models disallow heterogeneous fact instances by enforcing decomposition of a

102 Chapter 5 : Measures, Facts, and Galaxies in the Multidimensional Data Model

(a) Fact roll-up ofACTIVITYtoSURGERY

(b) Heterogeneous fact typeCOMPONENT

Figure 5.11: Examples of hierarchical relationships between fact schemes

heterogeneous cube into a set of homogeneous subcubes. One obvious disadvantage of this normalization is that the entire set of fact entries can no longer be analyzed as the same class with respect to their common dimensions as there exists no such OLAP operator as cube union.

In many applications, it might be of benefit to provide support to heterogeneous schemes. Let us consider the example of modeling surgical workflows: each workflow is recorded as a sequence of different types of components, such as activities and events. All component subclasses have a subset of common properties, e.g., start time and executor, as well as type-specific properties, such as event category, instrument used, and treated structure, characterizing an activity.

A so-called fact generalizationis obtained, when a set of heterogeneous fact types, projected to their common characteristics, is extracted into a superclass fact type. Subclass fact schemes, denotedfact special-izations, inherit all properties of their superclass. Theoretically, there is no limitation on the depth of fact inheritance hierarchies.

In our example,EVENTandACTIVITYare made subclasses of classCOMPONENT, as shown in Figure 5.12. The superclass contains a set of dimensions shared by all subclasses. Moreover, fact generalization enables non-redundant modeling of fact degeneration, applicable to all subclasses, by elevating that relation-ship to the superclass level. In our example,COMPONENT-DATAis modeled as a satellite of the generalized fact schemeCOMPONENT.

Fact generalization offers an elegant solution to handling heterogeneity. If inheritance betwen fact schemes is not supported, the respective multi-fact structures are resolved into a set of isolated fact schemes, which can be of typehomogeneousorheterogeneous. A fact scheme is homogeneous, if it disallows partial roll-up relationships between the fact and any of its dimensions, and is heterogeneous otherwise.

Figure 5.12: Fact generalization of classesEVENTandACTIVITYas a superclassCOMPONENT

5.3 : Types of Multi-Fact Schemes 103

DEFINITIONLL-HOMOGENEOUS FACT. A fact schemeFishomogeneous, if all its fact-dimensional roll-up relationships are full:HomogeneouspFq ð p@DPDF:F„(full)KDq.

DEFINITION LL-HETEROGENEOUS FACT. A fact schemeF isheterogeneous, if it contains partial fact-dimensional roll-up relationships:HeterogeneouspFq ð pDDPDF:F„(part)KDq.

Heterogeneous fact types result from storing non-uniformly structured facts as the same type, i.e., avoid-ing specialization. Figure 5.11b shows a variant ofCOMPONENTmodeled as a heterogeneous fact scheme storing all characteristics of both subclassesEVENTandACTIVITY. Subclass specific dimensions have to be modeled as optional roll-up relationships (dashed-line edge).

All fact types considered so far are calledprimary, orbase, as they store fine-grained data that cannot be derived from other already available facts. It is a common practice in data warehousing to build additional multidimensional data views on top of the existing facts – a process known as “cube/subcube computation”.

Results of such computations are typically materialized to boost the performance and to reveal additional information hidden in the data (e.g., to compute a new measure or derive an additional dimension category).

All types of facts computed from the primary data are calledsecondary, orderived. The latter can be further categorized according to the way they were obtained. We limit ourselves to enumerating a few frequent fact derivation methods:

Aggregationfact type is obtained by aggregating its base fact type to a coarser granularity.

Drill-acrossfact type obtains new measures by combining measures from multiple related fact types.

Partitionfact type contains a subset of fact entries from its base fact type.

Transformationfact type is drawn by defining a new measure from a dimension category (push) or converting a measure into a dimension (pull).