Stream Processing Graph - Flexible processing of streamed context data in a distributed environment

the NexusDS Resource Group. Resources belong to at least one resource group and may belong to several resource groups at the same time. All components of NexusDS (all available services and operators as well as applications) belong to this group by default. That is, if an operator is not explicitly assigned to a specific resource group, it is automatically assigned to the NexusDS Resource Group. Besides this root resource group, additional resource groups can be defined. NexusDS itself defines the Node Capabilities Resource Group, which groups all available processing nodes according to their characteristics. In addition, the Core Resource Group groups all services and operators available within the system. Arbitrary further resource groups can be defined alongside these existing ones by domain or application extension developers or by system administrators. One such custom resource group is the Visualization Resource Group depicted in Figure 4.7, which might group all functionality belonging to the domain of a visualization scenario.
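The default-membership rule can be illustrated with a small registry. The following is a minimal Python sketch under stated assumptions, not the NexusDS implementation: the class `ResourceGroupRegistry` and its methods are hypothetical, and only the group names are taken from the text. Every registered resource is placed in the root NexusDS Resource Group and, optionally, in any number of additional groups.

```python
from __future__ import annotations

from collections import defaultdict

ROOT_GROUP = "NexusDS Resource Group"


class ResourceGroupRegistry:
    """Hypothetical registry mapping resource groups to their member resources."""

    def __init__(self) -> None:
        # group name -> identifiers of the resources (services, operators, ...)
        self.members: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, resource: str, groups: set[str] | None = None) -> None:
        # Every component belongs to the root group by default.
        self.members[ROOT_GROUP].add(resource)
        # A resource may additionally belong to several groups at the same time.
        for group in groups or set():
            self.members[group].add(resource)

    def groups_of(self, resource: str) -> set[str]:
        return {group for group, res in self.members.items() if resource in res}


registry = ResourceGroupRegistry()
registry.register("RenderOperator", {"Visualization Resource Group", "Core Resource Group"})
registry.register("JoinOperator")  # no group given: root group only
print(sorted(registry.groups_of("JoinOperator")))  # ['NexusDS Resource Group']
```

Because the root group is filled unconditionally in `register`, no resource can end up without a group, which mirrors the "at least one resource group" rule.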

By default, resource groups are public, meaning that all services and operators are accessible or executable within their corresponding resource groups. However, this is not always intended, as some functionality should not be accessible arbitrarily. Therefore, trusted resource groups can be created, as the Custom Resource Group example (shown as a gray box in Figure 4.7) illustrates. Entities requesting access to the resources contained in such a resource group must first provide valid credentials before they are allowed to access them.
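Public versus trusted groups can be sketched as an access check. This minimal Python sketch assumes that a trusted group simply keeps a set of accepted credential strings; the `ResourceGroup` class, the `valid_credentials` field, and the token values are hypothetical, and a real system would verify credentials properly rather than by set membership.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    resources: set[str] = field(default_factory=set)
    trusted: bool = False  # resource groups are public by default
    # Hypothetical credential store; stands in for real credential verification.
    valid_credentials: set[str] = field(default_factory=set)

    def can_access(self, resource: str, credential: str | None = None) -> bool:
        if resource not in self.resources:
            return False
        if not self.trusted:
            return True  # public group: no credentials required
        return credential in self.valid_credentials


visualization = ResourceGroup("Visualization Resource Group", {"Render"})
custom = ResourceGroup("Custom Resource Group", {"Decrypt"}, trusted=True,
                       valid_credentials={"token-42"})
print(visualization.can_access("Render"))        # True
print(custom.can_access("Decrypt"))              # False
print(custom.can_access("Decrypt", "token-42"))  # True
```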

The system-relevant constraints are modeled by the NexusDS Node Resource Group and its underlying resource groups. For example, all NexusDS nodes with an x86 CPU architecture are arranged within the corresponding resource group. This is beneficial, as the search space for finding suitable NexusDS nodes for the operators may be drastically reduced by considering only the resource groups of interest.
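The search-space reduction can be sketched by indexing the nodes by their capabilities once and then intersecting only the groups that match an operator's needs. In this Python sketch, the node catalog, the attribute names `cpu_arch` and `secure`, and the helper functions are hypothetical; each `(attribute, value)` pair plays the role of one resource group below the NexusDS Node Resource Group.

```python
from __future__ import annotations

# Hypothetical node catalog: node id -> capability attributes.
NODES: dict[str, dict[str, object]] = {
    "SNx001": {"cpu_arch": "x86", "secure": True},
    "SNx007": {"cpu_arch": "x86", "secure": False},
    "SNx010": {"cpu_arch": "arm", "secure": True},
}


def build_node_groups(nodes: dict[str, dict[str, object]]) -> dict[tuple[str, object], set[str]]:
    """Place every node into one group per (attribute, value) pair."""
    groups: dict[tuple[str, object], set[str]] = {}
    for node_id, capabilities in nodes.items():
        for attribute, value in capabilities.items():
            groups.setdefault((attribute, value), set()).add(node_id)
    return groups


def candidate_nodes(groups: dict[tuple[str, object], set[str]],
                    required: dict[str, object]) -> set[str]:
    """Intersect only the groups of interest instead of scanning all nodes."""
    selected = [groups.get(item, set()) for item in required.items()]
    if not selected:
        # No requirements: every node is a candidate.
        return set().union(*groups.values())
    return set.intersection(*selected)


groups = build_node_groups(NODES)
print(sorted(candidate_nodes(groups, {"cpu_arch": "x86"})))                  # ['SNx001', 'SNx007']
print(sorted(candidate_nodes(groups, {"cpu_arch": "x86", "secure": True})))  # ['SNx001']
```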

4.5 Stream Processing Graph

The SP graph defines the data processing: it consists of interconnected operators that form a processing pipeline. Furthermore, the SP graph is the central exchange format for propagating constraints between the individual layers. Two SP graph formats are distinguished: the Nexus Plan Graph Model (NPGM) and the Nexus Execution Graph Model (NEGM). They differ in that NEGM-formatted SP graphs provide full and unique deployment information, whereas NPGM-formatted SP graphs do not.

The main idea is that each layer augments and modifies the original SP graph by adding its respective requirement constraints. The constraint annotations originate from user and application preferences, from operator-related requirements for deployment and execution, or from domain-specific services. Constraint annotations thus represent a universal mechanism for integrating highly domain-specific knowledge and thereby influencing both the deployment and the execution of SP graphs. First the NPGM and then the NEGM SP graphs are presented in more detail.
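The layered augmentation can be sketched as a chain of functions, each of which receives the SP graph and returns a copy enriched with its own constraint annotations. This Python sketch reduces the SP graph to a mapping from box ids to constraint dictionaries (edges omitted); the layer functions and the constraint keys `Resolution` and `Environment` are hypothetical examples of an application preference and a domain-specific requirement.

```python
from __future__ import annotations

from typing import Callable

# Simplified SP graph: box id -> constraint annotations (edges omitted).
SPGraph = dict[str, dict[str, object]]
Layer = Callable[[SPGraph], SPGraph]


def annotate(graph: SPGraph, layers: list[Layer]) -> SPGraph:
    """Pass the SP graph through every layer in order."""
    for layer in layers:
        graph = layer(graph)
    return graph


def copy_graph(graph: SPGraph) -> SPGraph:
    # Copy each box's annotations so earlier versions stay unchanged.
    return {box: dict(constraints) for box, constraints in graph.items()}


def application_layer(graph: SPGraph) -> SPGraph:
    out = copy_graph(graph)
    out["Box4"]["Resolution"] = "1920x1080"  # hypothetical user preference
    return out


def domain_layer(graph: SPGraph) -> SPGraph:
    out = copy_graph(graph)
    out["Box3"]["Environment"] = "Secure"  # hypothetical domain-specific constraint
    return out


original = {"Box3": {"Operator_Type": "Link&Map"}, "Box4": {"Operator_Type": "Render"}}
annotated = annotate(original, [application_layer, domain_layer])
print(annotated["Box4"])  # {'Operator_Type': 'Render', 'Resolution': '1920x1080'}
```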

4.5.1 Nexus Plan Graph Model and Nexus Execution Graph Model

The NPGM is a flexible composition model for orchestrating data-flow graphs. As depicted in Figure 4.8, an NPGM-formatted SP graph consists of a set of interconnected boxes, each of which constitutes either a source operator, a sink operator, or an operator. NPGM boxes have an arbitrary (but well-defined) number of connection slots. Each connection slot is uniquely identified and is either an input or an output. The box-specific implementations describe the expected (input) and delivered (output) data types, respectively. Differences in data types are denoted by the different shades of gray used for the inputs and outputs in Figure 4.8. Only connection slots with the same data type (i.e., the same shade of gray) can be interconnected; slots with different data types cannot. Processing pipelines are built simply by connecting NPGM boxes. Loops are allowed and can be exploited for feedback, e.g., for operators that adapt their parametrization according to the results of subsequent operators.
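The slot-typing rule can be expressed as a check performed whenever two boxes are connected. In this minimal Python sketch, the `Slot` and `NPGMGraph` classes are hypothetical, the data-type strings stand in for the shades of gray in Figure 4.8, and connections are assumed to run from an output slot to an input slot. No cycle check is performed, since loops are permitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slot:
    box: str        # owning box, e.g. "Box3"
    name: str       # unique within the box
    direction: str  # "in" or "out"
    data_type: str  # corresponds to a shade of gray in Figure 4.8


@dataclass
class NPGMGraph:
    edges: list[tuple[Slot, Slot]] = field(default_factory=list)

    def connect(self, src: Slot, dst: Slot) -> None:
        if src.direction != "out" or dst.direction != "in":
            raise ValueError("connections run from an output slot to an input slot")
        if src.data_type != dst.data_type:
            raise TypeError(f"type mismatch: {src.data_type} -> {dst.data_type}")
        self.edges.append((src, dst))  # feedback loops are deliberately allowed


graph = NPGMGraph()
graph.connect(Slot("Box3", "out0", "out", "Image"), Slot("Box4", "in0", "in", "Image"))
try:
    graph.connect(Slot("Box4", "out0", "out", "Image"), Slot("Box5", "in0", "in", "Text"))
except TypeError as err:
    print(err)  # type mismatch: Image -> Text
```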

The purpose of the box-related deployment constraints is twofold. First, they identify possible box implementations, each being either a source operator, a sink operator, or an operator. Second, they define on which NexusDS nodes the operators are executed. Besides deployment constraints, runtime constraints can also be defined for each box. If a box is specified on a physical level, the possible runtime constraints are those defined by the concrete box implementation. If it is specified on a logical level, the possible runtime constraints are the common subset of the runtime constraints defined by all implementations of the operator type to which the operator under concern belongs. Defining a box on a physical level means that the actual implementation is specified. In contrast, defining a box on a logical level does not uniquely identify a certain box but rather defines the type to which the corresponding box implementation belongs. This is depicted in Figure 4.8: deployment constraints on a physical level are shown in bold letters (e.g., StreamNode='SNx007'), whereas deployment constraints on a logical level are shown in italic letters (e.g., Operator_Type='Render').
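For a logically defined box, the rule above amounts to a set intersection over all implementations of the operator type. In this Python sketch, the catalog, the implementation names `RenderGL` and `RenderSW`, and the constraint names are hypothetical.

```python
from __future__ import annotations

# Hypothetical catalog: operator type -> implementation -> supported runtime constraints.
CATALOG: dict[str, dict[str, set[str]]] = {
    "Render": {
        "RenderGL": {"Resolution", "FrameRate", "Shading"},
        "RenderSW": {"Resolution", "FrameRate"},
    },
}


def available_runtime_constraints(operator_type: str,
                                  implementation: str | None = None) -> set[str]:
    implementations = CATALOG[operator_type]
    if implementation is not None:
        # Physical level: the concrete implementation defines the constraints.
        return set(implementations[implementation])
    # Logical level: only constraints that every implementation of the type supports.
    return set.intersection(*implementations.values())


print(sorted(available_runtime_constraints("Render")))              # ['FrameRate', 'Resolution']
print(sorted(available_runtime_constraints("Render", "RenderGL")))  # ['FrameRate', 'Resolution', 'Shading']
```

Taking the intersection guarantees that whichever implementation is later chosen, every runtime constraint set on the logical box remains applicable.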

The logically defined deployment constraints must be mapped to deployment constraints that uniquely identify the box and the corresponding NexusDS nodes. In the following, boxes whose deployment constraints are specified only on a logical level are referred to as logical NPGM boxes. In contrast, boxes whose deployment constraints are also specified on a physical level are referred to as physical NPGM boxes. However, physical NPGM boxes do not necessarily provide full deployment information as a (physical) NEGM box does: they define the deployment constraints only partially on a physical level.
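Mapping logical deployment constraints to physical operators can be sketched as a filter over an operator repository. In this Python sketch, the repository entries and their ids are hypothetical; the attribute names `Operator_Type` and `Author` follow the constraints shown in Figure 4.8. If more than one implementation matches, the box is not yet uniquely identified and a further choice is needed.

```python
from __future__ import annotations

# Hypothetical operator repository: metadata of available implementations.
OPERATORS: list[dict[str, str]] = [
    {"id": "RenderGL", "Operator_Type": "Render", "Author": "Visual Pipe"},
    {"id": "RenderSW", "Operator_Type": "Render", "Author": "Other Vendor"},
    {"id": "LinkMap1", "Operator_Type": "Link&Map", "Author": "Visual Pipe"},
]


def resolve_logical(constraints: dict[str, str]) -> list[str]:
    """Return the physical operators that satisfy every logical constraint."""
    return [op["id"] for op in OPERATORS
            if all(op.get(key) == value for key, value in constraints.items())]


print(resolve_logical({"Operator_Type": "Render", "Author": "Visual Pipe"}))  # ['RenderGL']
print(resolve_logical({"Operator_Type": "Render"}))  # ['RenderGL', 'RenderSW']
```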

Based on Figure 4.8, the four different box types are described: logical NPGM box, physical NPGM box, logical NEGM box, and physical NEGM box.

Logical NPGM box: Box 3 represents a logical NPGM box, as its deployment constraints are logical. This logical operator is of type 'Link&Map' and its author is 'Visual Pipe'. The NexusDS nodes that are to execute the operator must provide a 'Secure' execution environment, which might be a domain-specific constraint as described in Section 4.1.


Figure 4.8: Constraint-aware NPGM SP graph

Physical NPGM box: Box 4 depicts a physical NPGM box. In contrast to a logical NPGM box, either the physical operator or the physical NexusDS node is explicitly defined. As for the previously described Box 3, this box logically defines the operator as being of type 'Render' from the author 'Visual Pipe'. Here, however, the NexusDS node that is to execute the box is already specified as 'SNx007'. Nevertheless, since the operator is only logically defined, the box is not yet deployable and executable.

Logical NEGM box: A logical NEGM box is a box in which both the physical operator and the NexusDS nodes that are to execute it are provided on a physical level. However, a logical NEGM box may provide several physical operators and physical NexusDS nodes. Figure 4.8 does not explicitly show a logical NEGM box.

Physical NEGM box: The difference between a logical and a physical NEGM box is that a physical NEGM box has exactly one physical operator and exactly one physical NexusDS node. In Figure 4.8, Box 5 represents a physical NEGM box.
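The four box types can be distinguished by which parts of the deployment are already fixed physically. This Python sketch encodes that distinction under simplifying assumptions: a box is reduced to an optional logical operator type plus sets of physical operator ids and physical node ids, and the `Box` class, the `classify` function, and the operator id `RenderGL` are hypothetical (the text does not give the concrete values of Box 5).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box:
    operator_type: str | None = None                  # logical operator constraint
    operators: set[str] = field(default_factory=set)  # physical operator ids
    nodes: set[str] = field(default_factory=set)      # physical NexusDS node ids


def classify(box: Box) -> str:
    if box.operators and box.nodes:
        # Operator and node are both given on a physical level.
        if len(box.operators) == 1 and len(box.nodes) == 1:
            return "physical NEGM box"
        return "logical NEGM box"  # several physical alternatives remain
    if box.operators or box.nodes:
        return "physical NPGM box"  # only part of the deployment is physical
    return "logical NPGM box"


box3 = Box(operator_type="Link&Map")
box4 = Box(operator_type="Render", nodes={"SNx007"})
box5 = Box(operators={"RenderGL"}, nodes={"SNx007"})  # hypothetical values
for box in (box3, box4, box5):
    print(classify(box))
```

With these inputs the loop prints the types of Box 3, Box 4, and Box 5 in the order used in the text: logical NPGM box, physical NPGM box, and physical NEGM box.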

Besides the deployment constraints shown in Figure 4.8, which are specified either logically or physically, the operator-related requirements (described by the requirement metadata presented in Section 4.2.1.2) must be added. They also constitute deployment constraints, as they influence the NexusDS node selection and thus the deployment process. These operator-related constraints originating from the operator metadata are not shown in Figure 4.8. Nevertheless, they are necessary to find matching NexusDS nodes capable of executing a particular operator.
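Matching operator requirements against node capabilities can be sketched as a predicate evaluated per node. The matching rule in this Python sketch is an assumption, not the one defined in Section 4.2.1.2: numeric requirements are treated as minimums and all other requirements must match exactly. The attribute names and node data are hypothetical.

```python
from __future__ import annotations


def satisfies(capabilities: dict[str, object], requirements: dict[str, object]) -> bool:
    """Hypothetical rule: numeric requirements are minimums, all others must match exactly."""
    for key, needed in requirements.items():
        have = capabilities.get(key)
        if have is None:
            return False  # node does not advertise this capability
        if isinstance(needed, (int, float)) and not isinstance(needed, bool):
            if not isinstance(have, (int, float)) or have < needed:
                return False
        elif have != needed:
            return False
    return True


def matching_nodes(nodes: dict[str, dict[str, object]],
                   requirements: dict[str, object]) -> list[str]:
    """Return the ids of all nodes able to execute an operator with these requirements."""
    return sorted(node for node, caps in nodes.items() if satisfies(caps, requirements))


nodes = {
    "SNx001": {"cpu_arch": "x86", "memory_mb": 4096},
    "SNx007": {"cpu_arch": "x86", "memory_mb": 1024},
    "SNx010": {"cpu_arch": "arm", "memory_mb": 8192},
}
# Hypothetical requirement metadata of a rendering operator.
print(matching_nodes(nodes, {"cpu_arch": "x86", "memory_mb": 2048}))  # ['SNx001']
```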

Besides deployment constraints, runtime constraints also exist, which influence the runtime behavior of boxes. However, defining these constraints is optional, since presets are defined for each box that guarantee correct execution. The preset parametrization can be overridden, as users may have different preferences, such as the rendering resolution for the rendering operator. The available runtime constraints also depend on the actual operator. For example, a join operator typically takes a join predicate, whereas a rendering operator typically takes a resolution.
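Presets and user overrides can be combined as a simple dictionary merge. In this Python sketch, the preset values and constraint names are hypothetical, and rejecting unknown keys assumes that an operator's presets list all runtime constraints it supports.

```python
from __future__ import annotations

# Hypothetical presets; each key is a runtime constraint the operator supports.
PRESETS: dict[str, dict[str, object]] = {
    "Render": {"Resolution": "1024x768"},
    "Join": {"Predicate": "left.id == right.id"},
}


def runtime_config(operator: str, overrides: dict[str, object] | None = None) -> dict[str, object]:
    """Start from the presets and apply user overrides for supported keys only."""
    config = dict(PRESETS[operator])  # copy, so the presets stay untouched
    for key, value in (overrides or {}).items():
        if key not in config:
            raise KeyError(f"{operator} has no runtime constraint {key!r}")
        config[key] = value  # the user preference replaces the preset
    return config


print(runtime_config("Render"))                              # {'Resolution': '1024x768'}
print(runtime_config("Render", {"Resolution": "1920x1080"}))  # {'Resolution': '1920x1080'}
```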

There are also SP graph Deployment Constraints, as depicted in Figure 4.8. Application developers must provide these deployment constraints. Together with the other deployment constraints already mentioned, these constraints are exploited by the deployment framework presented in Chapter 6 to find suitable placement decisions for the boxes of an SP graph, i.e., source operators, sink operators, and operators. First, the box-related deployment constraints are exploited to reduce the search space in which deployment mappings are sought. Then, the SP graph-related deployment constraints are used to fine-tune the deployment of the boxes according to quality aspects.
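The two phases can be sketched as follows: the box-related constraints yield a candidate node set per box, and a graph-level cost function then selects among the remaining assignments. This Python sketch uses exhaustive enumeration purely for illustration; it is not the placement algorithm of the deployment framework in Chapter 6, and the `place` function, node ids, and latency figures are hypothetical.

```python
from __future__ import annotations

from itertools import product
from typing import Callable


def place(candidates: dict[str, set[str]],
          cost: Callable[[dict[str, str]], float]) -> dict[str, str]:
    """Pick the cheapest box -> node assignment from pre-filtered candidates."""
    boxes = sorted(candidates)
    best: dict[str, str] | None = None
    best_cost = float("inf")
    # Phase 1 has already pruned each box's candidates; phase 2 ranks assignments.
    for choice in product(*(sorted(candidates[box]) for box in boxes)):
        assignment = dict(zip(boxes, choice))
        current = cost(assignment)
        if current < best_cost:
            best, best_cost = assignment, current
    if best is None:
        raise ValueError("no feasible placement")
    return best


# Hypothetical per-node latencies used as the graph-level cost.
latency = {"SNx001": 5.0, "SNx007": 2.0, "SNx010": 9.0}
candidates = {"Box3": {"SNx001", "SNx010"}, "Box4": {"SNx007"}}
print(place(candidates, lambda a: sum(latency[n] for n in a.values())))
# {'Box3': 'SNx001', 'Box4': 'SNx007'}
```

The smaller the candidate sets left after the first phase, the fewer assignments the second phase has to evaluate, which is why pruning by box-related constraints comes first.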

The SP graph-related deployment constraints are represented by a list of QoS requirements. Each QoS requirement consists of four parts. First, a QoS criterion is defined, e.g., 'Latency' or 'Bandwidth' from Figure 4.8. These criteria must be supported by the DSPS, i.e., QoS-related statistics must be collected as described in Chapter 6. Second, a case distinction states whether the QoS criterion is to be maximized ('>') or minimized ('<'). Third, a bottleneck condition defines an absolute lower or upper bound for the respective QoS criterion: if the criterion is to be minimized, the bottleneck condition represents an upper bound; otherwise, it constitutes a lower bound. Finally, a relative importance factor between 0 and 1 must be provided for each QoS criterion, e.g., '0.5' for latency. These factors weight the QoS criteria against each other and must sum up to 1.

The SP graph deployment constraints apply to the entire SP graph and must hold for each box. However, two QoS classes have to be distinguished: absolute and additive. Absolute QoS criteria must hold for each box. In contrast, additive QoS criteria must hold in sum along the SP graph's critical path. For an additive QoS criterion, it is therefore not sufficient that it holds for each box individually, since the entire path is of importance.
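The structure of a QoS requirement and the difference between absolute and additive criteria can be sketched as follows. In this Python sketch, the `QoSRequirement` class and the helper functions are hypothetical, the bottleneck values are invented, and two simplifications apply: the SP graph is assumed to be acyclic (although loops are allowed in NPGM graphs), and the critical path of an additive criterion is taken to be the path with the largest sum, which fits minimized criteria such as latency.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QoSRequirement:
    criterion: str     # e.g. "Latency"
    direction: str     # "<" = minimize, ">" = maximize
    bottleneck: float  # upper bound if minimized, lower bound if maximized
    importance: float  # relative importance in [0, 1]
    additive: bool     # additive (critical path) vs. absolute (per box)


def check_weights(reqs: list[QoSRequirement]) -> None:
    if not math.isclose(sum(r.importance for r in reqs), 1.0):
        raise ValueError("relative importance factors must sum up to 1")


def within_bound(req: QoSRequirement, value: float) -> bool:
    return value <= req.bottleneck if req.direction == "<" else value >= req.bottleneck


def critical_path_sum(succ: dict[str, list[str]], cost: dict[str, float]) -> float:
    """Largest summed cost over any path of an acyclic SP graph (memoized DFS)."""
    memo: dict[str, float] = {}

    def longest(box: str) -> float:
        if box not in memo:
            memo[box] = cost[box] + max((longest(n) for n in succ.get(box, [])), default=0.0)
        return memo[box]

    return max(longest(box) for box in cost)


def satisfied(req: QoSRequirement, per_box: dict[str, float],
              succ: dict[str, list[str]]) -> bool:
    if req.additive:
        return within_bound(req, critical_path_sum(succ, per_box))
    return all(within_bound(req, value) for value in per_box.values())


latency = QoSRequirement("Latency", "<", 10.0, 0.5, additive=True)
bandwidth = QoSRequirement("Bandwidth", ">", 100.0, 0.5, additive=False)
check_weights([latency, bandwidth])  # passes: 0.5 + 0.5 == 1

succ = {"Box3": ["Box4"], "Box4": ["Box5"], "Box5": []}
# Each box stays below 10, but the path sums to 12, so the additive bound fails.
print(satisfied(latency, {"Box3": 3.0, "Box4": 4.0, "Box5": 5.0}, succ))  # False
print(satisfied(bandwidth, {"Box3": 150.0, "Box4": 120.0, "Box5": 110.0}, succ))  # True
```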

4.6 Matching Deployment Constraints and Runtime
