Workload Rules - Universal Workload-based Graph Partitioning and Storage Adaption for Distribut

In Section 3.2.2, we classified the workload analysis used to generate resource access rates as general, specific, or generalized. The basic purpose of that analysis is to find the resource access rates under certain employment options. However, there can be multiple access functions, each derived from a different analysis. Moreover, some of these access functions target the same vertices, which yields multiple access values for a vertex. To deal with these issues systematically, each independent access function is encapsulated within an access rule. An access rule has a source, an access function, and a set of affected vertices.

Definition 3.5 (Access Rule) An access rule $ is defined as the following element:

$ = (s, V̂, a)

where a is a function that assigns an access rate value³ to each v ∈ V̂, and s is a set of pattern functions that defines a set of vertices V̂ ⊆ V as follows:

s = {s₁, s₂, ..., s_n} such that for every s_i ∈ s there exists a function f(s_i) = V_s with V_s ⊆ V, and V̂ = f(s₁) ∪ f(s₂) ∪ ... ∪ f(s_n).
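Definition 3.5 can be illustrated with a minimal sketch. It assumes that each pattern function s_i has already been evaluated, so a rule stores f(s_i) directly as a set of vertex identifiers. The class name `AccessRule`, the source names, and the access values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass
class AccessRule:
    # Assumption: each source pattern s_i is stored by name together with
    # its already evaluated vertex set f(s_i).
    sources: Dict[str, FrozenSet[str]]
    # Access function a: vertex -> access rate.
    access: Callable[[str], float]

    def vertices(self) -> FrozenSet[str]:
        """Return the affected vertices as the union of all f(s_i)."""
        return frozenset().union(*self.sources.values())


# Hypothetical rule with two overlapping sources.
rates = {"v1": 5.0, "v2": 2.0, "v3": 1.0}
rule = AccessRule(
    sources={
        "s1": frozenset({"v1", "v2"}),
        "s2": frozenset({"v2", "v3"}),
    },
    access=rates.__getitem__,
)
covered = rule.vertices()
```

Here `covered` contains v1, v2, and v3; the vertex v2 is selected by both sources but appears only once in the union.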

Mapping the workload analysis into access rules makes it possible to compare and aggregate different rules. It also makes it quite easy to plug new rules into the adaptable system. For example, in Chapter 5 we define two rules about border replication. The first includes a general access function, while the second has a specific access function. The two access rules are aggregated into one rule that gives the net access values of the data in the border region.

The access rule describes the resource access rates under given employment options. However, the cost model additionally needs the benefit and cost functions. For that, we define the operational rule. An access rule can be converted into an operational rule for storage adaption by providing a performance benefit function as well as a destination index. The performance benefit is measured relative to the cost.

Definition 3.6 (Operational Rule) An operational rule is composed by associating an access rule $ with a destination index χ and a relative performance gain function ∆:

$_op = ($, χ, ∆)

By applying Formula 3.1, a benefit function for each operational rule can be defined:

b(v) = ∆(v)·a(v)
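As a small worked sketch of this product, the benefit function can be built from a gain function ∆ and an access function a. Formula 3.1 itself is not reproduced here; the code only encodes the relation b(v) = ∆(v)·a(v) stated above. The helper name `benefit` and all numeric values are hypothetical.

```python
from typing import Callable, Dict


def benefit(
    gain: Callable[[str], float],
    access: Callable[[str], float],
) -> Callable[[str], float]:
    """Build b(v) = gain(v) * access(v) for an operational rule."""
    return lambda v: gain(v) * access(v)


# Hypothetical access rates a(v) and relative performance gains for two vertices.
access_rates: Dict[str, float] = {"v1": 10.0, "v2": 4.0}
gains: Dict[str, float] = {"v1": 0.5, "v2": 2.0}

b = benefit(gains.__getitem__, access_rates.__getitem__)
benefits = {v: b(v) for v in access_rates}
```

With these values, v2 has the higher benefit although v1 is accessed more often, because its relative gain is larger.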

³ The access rate is explained further in Section 3.2.1.

A rule targets a set of vertices that is part of the RDF graph. However, there are cases in which more than one rule targets the same vertex. This requires a method of aggregating the rules such that each vertex has a single net rule targeting it, representing its net access rate. For that purpose, we state below the aggregation properties for access and operational rules. We also state two further properties, projection and source ordering, which are used later when stating the rules about indexes, replication, and join cache.

• Property 1 (Rules Aggregation). For two access rules $₁ and $₂, if V̂₁ ∩ V̂₂ ≠ ∅, then a new rule can be defined as their aggregate, aggregate($₁, $₂, e₁, e₂), where e₁ and e₂ are weighting functions representing the effectiveness of $₁ and $₂ respectively.

We also refer to the function aggregate($₁, $₂, e₁, e₂) as aggregate($₁, $₂) to indicate e₁ = e₂ = 1.

• Property 2 (Operational Rules Aggregation). For two operational rules $_op1 and $_op2, if V̂₁ ∩ V̂₂ ≠ ∅ and they share the same destination index χ, then a new rule $_g can be defined as the aggregate of $_op1 and $_op2:

$_g = aggregate_op($_op1, $_op2) = ($, χ, ∆), such that $ = aggregate($₁, $₂), and ∆ is defined as follows:

∆(v) =

• Property 3 (Rule Projection). For an access rule $₁, if there exists a pattern function s_p that defines V̂_p ⊆ V̂₁, then a new rule $_p can be defined as the projection of $₁ on s_p: $_p = proj_$₁(s₁) = {s_p, V̂_p, a₁}

• Property 4 (Source Ordering). For an access rule $ = {s, V̂, a}, where s = {s₁, s₂, ..., s_n}, the following elements of $ can be ordered:

1. vertices, by their access values.

2. source pattern functions s, by their average access values a_avg(s). There is no loss in accuracy if, for each s_i ∈ s, the access function a assigns the same access value to each v ∈ f(s_i).

3. for an operational rule $_op = ($, χ, ∆), its sources, by their average benefit values b(s) = a_avg(s)·∆_avg(s), where ∆_avg(s) is the average performance gain for each source in the source set. There is no loss in accuracy if a(v) assigns the same access value to each v ∈ f(s_i) and ∆(v) assigns the same performance gain value to each v ∈ f(s_i), for each s_i ∈ s.

After ordering the source set s, the head source at the top of the source set is referred to as s̄.
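Properties 3 and 4 can be sketched as follows. The sketch assumes that each source is stored with its evaluated vertex set f(s_i), that every source selects at least one vertex, and that sources are ranked in descending order so that the head source comes first. The function names `project` and `order_sources` and all values are hypothetical.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, Mapping, Optional

Sources = Dict[str, FrozenSet[str]]  # source name s_i -> f(s_i)


def project(sources: Sources, name: str, subset: FrozenSet[str]) -> Sources:
    """Property 3: keep a single pattern s_p whose vertices lie inside the rule."""
    covered = frozenset().union(*sources.values())
    if not subset <= covered:
        raise ValueError("s_p must select a subset of the rule's vertices")
    # The access function of the original rule is reused unchanged.
    return {name: subset}


def order_sources(
    sources: Sources,
    access: Mapping[str, float],
    gain: Optional[Mapping[str, float]] = None,
) -> List[str]:
    """Property 4: rank sources by average access, or by average benefit."""
    def score(name: str) -> float:
        vs = sources[name]
        a_avg = mean(access[v] for v in vs)
        if gain is None:
            return a_avg
        # b(s) = a_avg(s) * gain_avg(s)
        return a_avg * mean(gain[v] for v in vs)
    return sorted(sources, key=score, reverse=True)


# Hypothetical data: two sources, three vertices.
sources = {"s1": frozenset({"v1", "v2"}), "s2": frozenset({"v3"})}
access = {"v1": 2.0, "v2": 4.0, "v3": 5.0}
gain = {"v1": 3.0, "v2": 3.0, "v3": 1.0}

by_access = order_sources(sources, access)
by_benefit = order_sources(sources, access, gain)
head = by_benefit[0]
projected = project(sources, "sp", frozenset({"v1"}))
```

In this example the ranking by access puts s2 first (average 5.0 against 3.0), while the ranking by benefit puts s1 first (9.0 against 5.0), so the head source depends on which ordering is used.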

We next explain the basis used to derive the general rules from the collected workload, and then describe the concept of the heat query map, which is used to find the access values of the specific rules.

3.4.1 Basic Measurements for The General Rules

The following points list the average measurements that form the basis for building the general rules about indexes and replication in the next two chapters.

• The average query length. Given a query q, its length q_l is defined by Definition 3.3. For a collected workload Q, we can find the average length q_lm by calculating the arithmetic mean of the lengths of all the queries it contains. This value represents the expected length of the next query the system receives.

• The average query size. Definition A.4 determines a query size in terms of its graph, evaluation, and result. Similar to the previous point, we extend the measurement from the query level to the level of a collected workload, by calculating the arithmetic mean for each of the given measurements. The mean values serve as the general expectation of the system’s next query size.

• The average index usage. The execution of a query is carried out by using indexes (Section 2.6). The system observes the execution of the collected workload at the level of each index and records, for each index χ, its usage count or access frequency. The value of this frequency relative to the system's total index usage represents the general rank of importance of that index.
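The measurements above can be sketched on a toy workload. The sketch assumes that each collected query is recorded with its length (as given by Definition 3.3) and the list of indexes its execution used. The record layout and the index names SPO, POS, and OSP are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical collected workload Q: one record per executed query.
workload = [
    {"length": 3, "indexes": ["SPO", "POS"]},
    {"length": 5, "indexes": ["SPO"]},
    {"length": 4, "indexes": ["SPO", "OSP", "POS"]},
]

# Average query length: arithmetic mean over all queries in the workload.
avg_length = mean(q["length"] for q in workload)

# Index usage: count every use of an index, then normalise by the total
# number of index uses to obtain the relative importance of each index.
usage = Counter(ix for q in workload for ix in q["indexes"])
total = sum(usage.values())
relative_usage = {ix: count / total for ix, count in usage.items()}
```

Here SPO accounts for half of all index uses, so it would rank as the most important index under this measure. The average query size could be computed in the same way, with one arithmetic mean per size measurement.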
