• Keine Ergebnisse gefunden

Academic Management as the Motivating Case Study

Dimensions and Hierarchies in the Multidimensional Data Model

4.2 Academic Management as the Motivating Case Study

Conceptual extensions of the multidimensional model presented in this chapter emerged as a result of a col-laborative effort of our research team with SuperX Project1. SuperX is the name of the data warehouse system for university administration, developed specifically for the needs of German public higher education sector and patronized by the Ministry of Education as a standardized reporting environment for public univer-sities throughout Germany. Originated at the University of Karlsruhe and further developed at the University of Duisburg-Essen, SuperX has been officially taken over by HIS Ltd. 2. HIS Ltd. is the principal provider of software solutions for university management (e.g., facilities, personnel, finance, teaching and research, time-scheduling, etc.) in Germany. SuperX automatically extracts operational data from various HIS sys-tems, cleanses and transforms it into multidimensional cubes and makes the latter available for querying and reporting to decision-makers and other staff.

SuperX appeared an attractive and promising cooperation partner for our research for a number of reasons, such as the following:

SuperX is an open-source product available free of charge.

SuperX is being increasingly adopted by universities throughout Germany.

1Project homepage:http://www.superx-projekt.de/

2Higher Education Information System Ltd.,http://www.his.de/english/

4.2 : Academic Management as the Motivating Case Study 63

Figure 4.1: Original SuperX fact tableCOB_BUSA_CUBE(university expenditures) and its dimension tables

SuperX is platform-independent and does not require any commercial software to run.

SuperX is flexibly configurable with respect to the overall architecture (e.g., DBMS, web server).

SuperX has numerous open issues, predominantly due to complex and/or inconsistent data delivered by some of the operational data sources

In addition to the above issues, we intended to enrich the functionality of SuperX by integrating our own software moduleUniCap[117, 183, 184], which is a decision support system for managing the admission capacity and teaching resource utilization in universities.

The SuperX team granted us access to test data extracted from HIS systems COB (cost and activity accounting) and SOC (student admission and performance records). The data was available in the relational form as 2 fact tables with 14 (partially shared) dimension tables. The cubes are as follows:

1. COB_BUSA_CUBEcontains the household data, i.e., purchases of various administrative units.

2. STUD_ALLG_CUBE contains overall student statistics, broken down by age, sex, origin, semester, major, etc., i.e., grouped into cohorts rather than as individual records.

The original logical schemata of the above cubes are depicted in Figures 4.1 and 4.2, respectively: primary key attributes are shown with red background and arrows point to the targets of the foreign key relationships.

At the first glance, the logical schemata shown in Figures 4.1 and 4.2 seem to follow the star schema design – each dimension is stored in a single relation. Notice, however, that most of the dimensions have exactly the same set of attributes, namelytID,NAME,PARENT,SQLu, withIDbeing the primary key and PARENTbeing a self-reference foreign key. In terms of the star schema design, those dimensions appear to contain no hierarchies. However, that is not the case as SuperX stores hierarchical relationships as a reference to the parent record (foreign keyPARENT) – a well-known approach to storing recursive structures in relational databases, but not in data warehouses.

Table 4.1 shows an example of storing the hierarchy of the university’s administrative units using the parent reference. The resulting data hierarchy is shown in Figure 4.3. Since the relational representation pro-vides no clue about hierarchy levels, the node type information mapped to the nodes’ background color had to

64 Chapter 4 : Dimensions and Hierarchies in the Multidimensional Data Model

Figure 4.2: Original SuperX fact tableSTUD_ALLG_CUBE(student enrollments) and its dimension tables

Figure 4.3: Organization hierarchy (fragment) of a university stored in Table 4.1

be extracted manually from the values of the attributeNAME. Without this tediously obtained node semantics it would be impossible to lay the hierarchy out into levels and, consequently, impossible to characterize or interpret it. Back to the data tree in Figure 4.3, based on the node type assignments, we can characterize it as heterogeneous (multiple category types at the same level) and asymmetric (childless non-bottom nodes).

We denote the above approach to storing dimension hierarchies a“pseudo-star”schema: the entire hier-archy is put into a single table, as in the star schema, however not by means of de-normalization, but rather by referencing the parent element. An attempt to map the above tiny organization hierarchy to a summariz-able OLAP dimension would fail at the very bottom level as even that level consists of three node types and, therefore, may not be mapped to a single category. In the SuperX database, this kind of complex hierarchy structures is rather typical and is encountered in product and cost categorizations, personnel and organisation

4.2 : Academic Management as the Motivating Case Study 65

Table 4.1: Example of a non-structured hierarchy storage

ID NAME PARENT

1 ALL NULL

1117 Institut für Entwicklung und Frieden 1100 1211 Institut für Kulturwissenschaften 1200 1213 Institut für Fremdsprachliche Philologie 1200 1510 Abteilung für Elektrotechnik u. Informationstechnik 1500 1612 Institut für niederrheinische Kulturgeschichte 1600 1613 Institut für Verkehr und Logistik (IVL) 1600 1614 Institut für Automation und Robotik 1600

1615 Institut für Informatik 1600

1616 Zentrum für Lehrerbildung 1600

1100 Gesellschaftswissenschaften 1

1200 Geisteswissenschaften 1

1212 Institut für Germanistik 1200

1600 Zentrale wiss. Einrichtungen 1

1300 Wirtschaftswissenschaft 1

1400 Naturwissenschaften 1

1500 Ingenieurwissenschaften 1

1520 Abteilung für Maschinenbau 1500

1530 Abteilung für Informatik, Information/Medien 1500

1540 Abteilung für Materialtechnik 1500

1611 Institut für Ostasienwissenschaften 1600

hierarchies, course structures, etc. Besides, the underlying data sources (HIS applications) also employ the same relational storage approach to hierarchical data. Therefore, SuperX does not only avoid the challenges of re-modeling complex data into OLAP dimensions, but also facilitates the ETL process by simply taking over the original data hierarchies as dimensions of the respective facts.

The penalty of storing dimension hierarchies as unstructured data graphs is devastating–the OLAP tech-nology, its operators, query languages, metadata, and frontend tools become fully inapplicable. SuperX provides proprietary end-user tools for analyzing “pseudo-star” schemata. These tools are rather rudimentary in their functionality, e.g., they allow to retrieve data via pre-defined masks or interact with a cube via a pivot table. The provided pivot table interface Joolap3 allows to explore available measures along at most two dimensions (no dimension nesting) whereas other dimensions may be used as filters. Joolap avoids recursive queries by allowing to drill down only one value at a time, e.g., to drill-down from quarters into months, each quarter’s element has to be expanded. Poor query and presentation functionality is the price SuperX pays for supporting non-structured and non-summarizable hierarchies.

In our opinion, the “pseudo-star” schema solution of SuperX is too disadvantageous due to its incompat-ibility with the established OLAP technology and, hence, inability to benefit from the techniques and tools of the latter. Therefore, we reconsidered the original intention to extend the existing SuperX framework and decided to build a new data warehouse in accordance with the general guidelines of data warehouse design.

Complex hierarchies, which cannot be handled by the conventional multidimensional model, inspired the cor-responding conceptual extensions of the model and transformation techniques for inferring summarizability in those hierarchies.

3Project homepage:http://joolap.memtext.de/

66 Chapter 4 : Dimensions and Hierarchies in the Multidimensional Data Model