
c, which is the ratio of the queue length to the system processing power. If this ratio is high, we have a smaller y_cr, which means that even a very small thread-synchronization cost y would still be greater than y_cr and would cause f(m_max, 1) > f(1, m_max), so that f(1, m_max) (maximum inter-query parallelism and minimum intra-query parallelism) is more beneficial to the system.

8.3 Adaptation of the Processing Resources

The processing resources in each working node may all contribute to the processing of a single query. However, every extra thread assigned to a query incurs extra cost in the form of thread and data synchronization. If the threads working on a single query belong to different working nodes, the system additionally pays network communication cost. To avoid this type of latency, the system tries to keep the execution of each query within a single working node as long as the required data are available locally. Within that node, the system can still use more than one thread to execute a single query, and the optimizer has the task of deciding the optimal number of such threads. We have already studied this problem in Section 8.1 and showed that the number of threads processing a single query is related to the number of queries waiting in the query queue. Using Formula 8.6, the optimizer in any working node can easily estimate the number of threads to assign to each query by looking at the number of waiting queries in the queue, the average query execution time, the average thread-synchronization cost, and the number of hardware threads available in the system. Generally, the optimizer favors assigning one thread per query as long as the query arrival rate is greater than the system throughput.
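The decision rule above can be sketched in code. The following is a hypothetical illustration only: the function name and the concrete division rule are assumptions for exposition, not the thesis's actual Formula 8.6. It captures the two stated policies: one thread per query under load, and spreading idle hardware threads over the waiting queries otherwise.

```python
def threads_per_query(waiting_queries, hw_threads, arrival_rate, throughput):
    """Return how many local threads to assign to the next query.

    Illustrative heuristic, not the thesis's Formula 8.6.
    """
    # Under load (queries arrive faster than they complete), favor
    # inter-query parallelism: one thread per query.
    if arrival_rate > throughput:
        return 1
    # Otherwise spread the hardware threads over the waiting queries,
    # using at least one thread per query.
    backlog = max(waiting_queries, 1)
    return max(1, hw_threads // backlog)
```

For example, with 8 hardware threads and only 2 waiting queries in an underloaded system, this sketch would assign 4 threads to each query.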

8.4 Evaluation

In this section, we provide a practical evaluation of the adaptation of the threading and processing resources.

8.4.1 Working Threads

We practically follow the performance behavior of a query with respect to its working threads. In a distributed environment, two types of working threads might be involved in a single-query execution: local threads, which are parallel threads of the working node where the query is being executed, and remote threads, which are owned by remote nodes but still handle part of this query. In this section we consider the effect of local threads on the query execution time.

Instead of using the raw query execution time as a measure of the effect of the number of working threads, we use the speedup ratio of executing a query with n threads relative to running the same query with one thread. Having more threads should speed up the query execution by a factor that is ideally the number of threads; in practice, however, the speedup is smaller due to the cost of thread scheduling and synchronization.
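The metric is simply the one-thread execution time divided by the n-thread execution time. A minimal sketch, with illustrative (not measured) timings:

```python
def speedup(t_one_thread, t_n_threads):
    """Speedup of an n-thread run relative to the single-thread run.

    The ideal value is n; scheduling and synchronization costs
    push the practical value below that.
    """
    return t_one_thread / t_n_threads

# Illustrative example: 12.0 s with one thread, 7.5 s with two
# threads gives a speedup of 1.6, below the ideal value of 2.
```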

As in any typical parallel-processing problem, the important factor in achieving a high parallelization speedup is the ratio of the thread-maintenance cost to the query processing time. Since there is a correlation between the query type and its execution time, we consider in this evaluation the threading behavior with respect to the query type. Figure 8.1 shows the general behavior of bounded queries of three types: star, tree, and chain. The star query has one central vertex, and all the other vertices must have exactly one edge to it. The tree query has the shape of a connected directed acyclic graph (DAG). Finally, the chain query is also a DAG, but has one source and one destination. The formal definitions of these types are given in Appendix A, Section A.1.
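The three shapes can be illustrated with a simplified classifier, assuming a query is given as a list of directed edges between its vertices. This is a sketch for intuition only; it omits the connectivity and acyclicity checks of the formal definitions in Appendix A, Section A.1.

```python
from collections import Counter

def classify(edges):
    """Classify a query graph as 'star', 'chain', or 'tree' (simplified)."""
    indeg, outdeg, nodes = Counter(), Counter(), set()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))
    sources = [n for n in nodes if indeg[n] == 0]
    sinks = [n for n in nodes if outdeg[n] == 0]
    # Chain: a single path, i.e. one source, one sink, no branching.
    if (len(sources) == 1 and len(sinks) == 1
            and all(indeg[n] <= 1 and outdeg[n] <= 1 for n in nodes)):
        return "chain"
    # Star: some central vertex touches every edge, and every other
    # vertex has exactly one edge to it.
    for c in nodes:
        if (all(u == c or v == c for u, v in edges)
                and all(indeg[n] + outdeg[n] == 1 for n in nodes if n != c)):
            return "star"
    # Otherwise treat it as a general connected DAG (tree shape).
    return "tree"
```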

The three types are compared with the ideal speedup behavior, which equals the number of working threads. A clear deviation from the ideal is observable for all three types starting from the second thread, and additional threads stop delivering any clear benefit to the tree-query speedup starting from the third thread. Moreover, increasing the number of threads was harmful starting from the third thread for the star query and from the fourth thread for the chain query. The behavior of the query types correlates with the number of triples processed during their execution, as given in Table 8.1.

Figure 8.1: Speedup of bounded-queries execution with respect to working threads

The unbounded queries exhibit a different behavior, shown in Figure 8.2, where the three query types are again compared against the ideal behavior. The deviation from the ideal is very small at two threads for all three types and starts increasing slightly from the third thread. The differentiation between the three types becomes significant from the sixth thread, but all three types scale up to the seventh thread, with a speedup of 4 for the star query and up to 6 for the tree query.

The behavior difference between bounded and unbounded queries can be explained by recalling their difference in execution and processing in Section 3.3.2. In a general memory-based execution of an unbounded query, the first index call returns a set of triples of size n. The execution then proceeds by effectively executing n bounded sub-queries in a totally independent way, requiring no synchronization between them except a simple union operation on their results to form the final query result. This clearly boosts the speedup of an unbounded query's parallel execution and allows better scaling with the number of threads used. On the other hand, a bounded query is typically smaller in size and bound to at least one vertex in the RDF graph.
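This execution pattern, n independent bounded sub-queries whose only synchronization point is the final union, can be sketched as follows. The function names and the stand-in sub-query are illustrative assumptions, not the system's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subquery(binding):
    # Stand-in for evaluating one bounded sub-query derived from a
    # binding of the first index call; returns a set of result rows.
    return {(binding, binding * 2)}

def run_unbounded(bindings, workers=4):
    """Evaluate the sub-queries independently and union their results."""
    results = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each sub-query runs with no shared state between threads ...
        for partial in pool.map(run_subquery, bindings):
            # ... and the union below is the only merge step.
            results |= partial
    return results
```

Because the sub-queries share no state, adding threads adds almost no synchronization cost, which matches the near-ideal scaling of unbounded queries observed above.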

Threads            Star   Chain   Tree
1                  1      1       1
2                  1.41   1.5     1.55
3                  1.6    1.92    1.92
4                  1.7    1.94    2
5                  1.6    1.8     1.97
Processed triples  120    259     412

Table 8.1: Bounded-queries speedup with respect to working threads

Figure 8.2: Speedup of unbounded-queries execution with respect to working threads

Threads            Star   Chain   Tree
1                  1      1       1
2                  1.8    1.9     1.9
3                  2.6    2.7     2.8
4                  3.3    3.5     3.6
5                  4.1    4.1     4.3
6                  4.7    5       5.3
7                  4.9    5.6     6
Triples processed  2124   6235    5412

Table 8.2: Unbounded-queries speedup with respect to working threads