1 Introduction
The second major objective was to define a coherent similarity assessment model that takes the different data characteristics into account and remains flexible enough to be adjusted as required, for instance through configurable weighting factors and search pattern constraints.
The third and most comprehensive objective is the development of algorithms that execute the similarity model efficiently. The focus lies on enhanced techniques for handling different attribute semantics (such as continuous value series spanning multiple events) and on modeling a search sequence in order to constrain the search process and optimize the matching.
Finally, the work aims to make the resulting similarity search mechanisms available to business analysts in a user‐friendly way. This requires a compromise between maximum control over the search process and minimum complexity of the user interface.
A performance evaluation with respect to different use cases rounds off the thesis.
1.3 Data structure and data repository
This section describes the data representation that the presented similarity search model can handle, and explains how these data are stored in the SENACTIVE InTime™ system.
Continuous capturing and processing of events produces vast amounts of data. Efficient mass storage is required to store all events and prepare the data for later retrieval and access. This mass storage is called EventBase, a dedicated database repository for events in the SENACTIVE InTime™ system. During processing, events that are to be persisted are pushed into this repository. Information about event correlations is captured and stored as well. In addition, the events can be indexed for later retrieval with full‐text search, as described by Rozsnyai et al. [42].
1.3.1 Single events
Events represent business activities. In order to maintain information about the reflected activity, events capture attributes describing the context in which they occurred. Event attributes are items such as the agents, resources, and data associated with an event, the tangible result of an action (e.g., the placement of an order by a customer), or any other information that characterizes the specific occurrence of that type of event.
For example, Figure 2 shows some context attributes of a typical order event.
Figure 2: Event type definition of simple order event
The event type describes the underlying type of state change in a business process that is reflected by the event. The concept of event types is strongly related to the concept of a class in object‐oriented programming (OOP). Event attributes may be of various data types. The SENACTIVE InTime™ system supports all basic .NET runtime types such as Int32 or String, but also multi‐value types (lists, dictionaries) and arbitrary custom‐implemented objects. In addition, events can be nested as attributes in other events, so that an arbitrarily deep hierarchy is theoretically possible. The event model used is called the SARI event model. It was originally proposed by Schiefer and Seufert [43] and described in greater detail by Rozsnyai et al. [41].
Figure 3 illustrates the event model in UML notation. Event types can inherit from other event types and may contain various attributes of different types.
Figure 3: The SARI event model
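The following Python sketch illustrates the three properties of the event model named above: typed attributes, inheritance between event types, and events nested as attributes of other events. It is not the SARI implementation; the class names `EventType` and `Event`, the method `all_attributes`, and all example attributes (`source`, `name`, `order_id`, `products`, `customer`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class EventType:
    """Hypothetical event type definition: a name, typed attributes, optional parent."""
    name: str
    attributes: Dict[str, type] = field(default_factory=dict)
    parent: Optional["EventType"] = None

    def all_attributes(self) -> Dict[str, type]:
        """Return own attributes merged with those inherited from parent types."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}


@dataclass
class Event:
    """Hypothetical event instance: attribute values checked against its type."""
    event_type: EventType
    timestamp: datetime
    values: Dict[str, Any]

    def __post_init__(self) -> None:
        declared = self.event_type.all_attributes()
        for name, value in self.values.items():
            if name not in declared:
                raise KeyError(f"{name!r} is not declared on {self.event_type.name}")
            if not isinstance(value, declared[name]):
                raise TypeError(f"{name!r} must be of type {declared[name].__name__}")


# Assumed example types: both inherit the 'source' attribute from a base type.
base = EventType("BaseEvent", {"source": str})
customer_created = EventType("CustomerCreated", {"name": str}, parent=base)
order_received = EventType(
    "OrderReceived",
    {"order_id": int, "products": list, "customer": Event},  # 'customer' nests an event
    parent=base,
)

now = datetime(2009, 1, 1, 12, 0)
customer = Event(customer_created, now, {"source": "web", "name": "Ann"})
order = Event(
    order_received,
    now,
    {"source": "web", "order_id": 1, "products": ["book"], "customer": customer},
)
```

In this sketch, inheritance is resolved when attributes are looked up, so a change to a parent type is visible to all derived types; whether the actual system resolves inheritance at definition time or at lookup time is not stated in the text.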
1.3.2 Event correlations
In many cases, single events have a certain context and are semantically related to other events. For instance, a “task started” event is probably semantically related to a “task completed” event with the same task identifier. Correlations are sequences of semantically related events and form the basis for most of the following algorithms.
An event correlation is defined as a set of related events. A correlation set is a template definition for how correlations are identified. The correlation set defines tuples of attributes whose values must match in order for events to correlate.
Figure 4: Correlation set definition
Figure 4 provides an example of a correlation set. Several events of different event types are correlated into a coherent sequence if the value of the attribute “username” matches. Such a correlation is not limited to a single event attribute, but can be defined based on multiple attributes. The red items form a group of matching tuples, one per event type, each of which is matched against those of the other event types. The order in which the events occur is not decisive: if, for instance, a cash‐in event occurs first and a cash‐out event second, these events are correlated as well. A sequence of correlated events may contain an arbitrary number of events of each event type. Thus, an event sequence based on the above correlation set may contain, for instance, 10 “bet placed” and 2 “cash‐out” events.
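As a minimal sketch of the idea, the following Python function groups events into correlations by comparing the attribute tuples named in a correlation set. It is not the algorithm used by the system; the dictionary-based event representation, the function name `correlate`, and the event type names `BetPlaced`, `CashIn`, `CashOut`, and `Login` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Assumed representation: event type name -> attributes that form the correlation key.
CorrelationSet = Dict[str, Tuple[str, ...]]
EventRecord = Dict[str, Any]


def correlate(
    events: List[EventRecord], correlation_set: CorrelationSet
) -> Dict[Tuple[Any, ...], List[EventRecord]]:
    """Group events whose key attribute values match, regardless of arrival order."""
    groups: Dict[Tuple[Any, ...], List[EventRecord]] = defaultdict(list)
    for event in events:
        key_attributes = correlation_set.get(event["type"])
        if key_attributes is None:
            continue  # this event type does not take part in the correlation set
        key = tuple(event[name] for name in key_attributes)
        groups[key].append(event)
    return dict(groups)


# Hypothetical correlation set: all three event types correlate on 'username'.
account_activity: CorrelationSet = {
    "BetPlaced": ("username",),
    "CashIn": ("username",),
    "CashOut": ("username",),
}

events = [
    {"type": "CashIn", "username": "alice", "amount": 50},
    {"type": "BetPlaced", "username": "alice", "amount": 10},
    {"type": "BetPlaced", "username": "bob", "amount": 5},
    {"type": "CashOut", "username": "alice", "amount": 20},
    {"type": "Login", "username": "alice"},  # not part of the correlation set
]

sequences = correlate(events, account_activity)
```

Because the key is a tuple, the same function covers correlation sets defined over several attributes, and each event type may name its key attributes differently as long as the tuples have the same length.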
1.3.3 Database structure
In the EventBase, a dedicated table is created automatically for each event type when its event type definition is modelled. This table contains a separate column for each event attribute; basic .NET runtime types such as String are mapped directly to database types (e.g., varchar), while complex types such as lists or nested types are serialized to XML to ease handling. In addition, a generic event table contains the XML representation, ID, and timestamp of every event.
Correlations are also stored in the database. One database entry exists for each unique combination of correlation attribute values, and a relational table links these entries to the actual events in the generic event table.
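The table layout described above can be sketched with Python's built-in `sqlite3` module. The actual EventBase schema is not given in the text, so every table name, column name, and the helper `store_bet` below are assumptions; the sketch only mirrors the described roles: a generic event table, one table per event type, one row per unique correlation key, and a link table.

```python
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Generic event table: XML representation, id and timestamp of each event.
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        xml TEXT NOT NULL
    );
    -- Type-specific table (assumed example type): one column per attribute.
    CREATE TABLE events_bet_placed (
        event_id INTEGER PRIMARY KEY REFERENCES events(id),
        username VARCHAR(64),
        amount REAL
    );
    -- One row per unique value group of correlation attributes.
    CREATE TABLE correlations (
        id INTEGER PRIMARY KEY,
        correlation_set TEXT NOT NULL,
        key_values TEXT NOT NULL,
        UNIQUE (correlation_set, key_values)
    );
    -- Link table between correlations and events in the generic table.
    CREATE TABLE correlation_events (
        correlation_id INTEGER NOT NULL REFERENCES correlations(id),
        event_id INTEGER NOT NULL REFERENCES events(id),
        PRIMARY KEY (correlation_id, event_id)
    );
    """
)


def store_bet(conn: sqlite3.Connection, timestamp: str, username: str, amount: float) -> int:
    """Store a hypothetical 'bet placed' event and attach it to its correlation."""
    xml = f"<BetPlaced><username>{escape(username)}</username><amount>{amount}</amount></BetPlaced>"
    cur = conn.execute(
        "INSERT INTO events (event_type, timestamp, xml) VALUES (?, ?, ?)",
        ("BetPlaced", timestamp, xml),
    )
    event_id = cur.lastrowid
    conn.execute(
        "INSERT INTO events_bet_placed (event_id, username, amount) VALUES (?, ?, ?)",
        (event_id, username, amount),
    )
    # Create the correlation row only if this key value has not been seen before.
    conn.execute(
        "INSERT OR IGNORE INTO correlations (correlation_set, key_values) VALUES (?, ?)",
        ("AccountActivity", username),
    )
    (correlation_id,) = conn.execute(
        "SELECT id FROM correlations WHERE correlation_set = ? AND key_values = ?",
        ("AccountActivity", username),
    ).fetchone()
    conn.execute(
        "INSERT INTO correlation_events (correlation_id, event_id) VALUES (?, ?)",
        (correlation_id, event_id),
    )
    return event_id


store_bet(conn, "2009-01-01T10:00:00", "alice", 10.0)
store_bet(conn, "2009-01-01T10:05:00", "alice", 25.0)
store_bet(conn, "2009-01-01T10:07:00", "bob", 5.0)

# Retrieve the correlated event sequence for one key value, ordered by time.
alice_sequence = conn.execute(
    """
    SELECT e.id, e.timestamp
    FROM correlations c
    JOIN correlation_events ce ON ce.correlation_id = c.id
    JOIN events e ON e.id = ce.event_id
    WHERE c.correlation_set = ? AND c.key_values = ?
    ORDER BY e.timestamp
    """,
    ("AccountActivity", "alice"),
).fetchall()
```

Storing the correlation key as a single text column is a simplification chosen for brevity; a correlation set over several attributes would need either one column per attribute or a serialized key.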
The EventBase also contains all required metadata used during the similarity search process such as event type definitions and correlation sets.