
2.3 Query Optimization


We utilize rule- and cost-based query optimization techniques to empower ad-hoc stream query processing, in Chapters 4 and 5, respectively. To enrich our work with cost-based query optimization, we adopt the Iterative Dynamic Programming (IDP) technique [25] and enhance it for streaming workloads. Below, we provide background information about the original IDP technique.

Algorithms based on dynamic programming lie at the core of query optimization. While these algorithms produce good optimization results (i.e., good query execution plans), their high complexity can be restrictive when optimizing complex queries or multiple queries. Optimization algorithms based on the IDP principle offer several advantages for dealing with highly complex queries. IDP-based algorithms combine dynamic programming with iterative, greedy techniques. Thus, these algorithms are adaptive and produce plans as good as those of dynamic programming based algorithms whenever dynamic programming is viable. If dynamic programming is not viable (e.g., the problem is too complex), IDP variants are still able to produce as-good-as-possible plans. Also, existing dynamic programming based query optimizers can easily be extended to their IDP counterparts. There are two main variants of the IDP approach: IDP1 and IDP2. In this thesis, we adopt and enhance IDP1. We explain this algorithm below and refer to it as IDP throughout the thesis.

The main idea behind IDP is to (i) break the query into subqueries containing join trees with up to k relations, (ii) calculate the cost of each tree, (iii) greedily choose the cheapest plan, (iv) replace the cheapest one with a compound relation, and (v) start the process all over again. Figure 2.2 shows an example query optimization scenario with IDP. The example join query includes 5 relations with block size k=3.

Figure 2.2: Optimizing a 5-way join query with IDP (k=3)

The first three steps are similar to classic dynamic programming: the algorithm generates access plans as well as 2-way and 3-way join plans, calculates the optimal QEP for each subset, and prunes suboptimal plans. Because we choose the block size to be 3 (k=3), the algorithm stops the enumeration at Step 4 and greedily chooses the subplan with the lowest cost (T). All other plans containing one or more tables covered by the selected plan are discarded. In Step 5, IDP starts the second iteration with C, E, and T. This process continues until the final plan is computed (Step 7 in the example).

In the special case where k is equal to the number of relations in the input query (e.g., for smaller problems), IDP calculates the optimal solution. Thus, tuning k provides a good compromise between runtime and optimality. Because the algorithm combines greedy heuristics with dynamic programming, it is able to scale to large problems.
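To make the interplay of dynamic programming and greedy plan selection concrete, the following sketch outlines the IDP1 loop described above. It is a minimal, illustrative sketch rather than the optimizer used in this thesis: the cost model is supplied by the caller, pruning is limited to keeping the cheapest plan per relation subset, and all names are hypothetical.

    from itertools import combinations

    def idp1(base_relations, join_cost, k):
        """Simplified IDP1 sketch (illustration only, not the thesis optimizer).

        base_relations : dict  name -> (access_plan, access_cost)
        join_cost      : function (left_plan, right_plan) -> estimated join cost
        k              : block size, i.e. the largest join tree built per DP round
        """
        todo = dict(base_relations)          # compound relations replace joined blocks
        while len(todo) > 1:
            block = min(k, len(todo))
            # Steps 1-3: dynamic programming over all subsets of size <= block
            best = {frozenset([name]): plan_cost for name, plan_cost in todo.items()}
            for size in range(2, block + 1):
                for subset in combinations(todo, size):
                    key = frozenset(subset)
                    for split in range(1, size):
                        for left in combinations(subset, split):
                            l_key, r_key = frozenset(left), key - frozenset(left)
                            l_plan, l_cost = best[l_key]
                            r_plan, r_cost = best[r_key]
                            cost = l_cost + r_cost + join_cost(l_plan, r_plan)
                            if key not in best or cost < best[key][1]:
                                best[key] = ((l_plan, r_plan), cost)
            # Step 4: greedily keep the cheapest plan joining exactly `block` relations
            winner_set, winner = min(
                ((s, pc) for s, pc in best.items() if len(s) == block),
                key=lambda item: item[1][1],
            )
            # Replace the joined relations by a single compound relation and iterate
            for name in winner_set:
                del todo[name]
            todo["T(" + ",".join(sorted(winner_set)) + ")"] = winner
        ((final_plan, final_cost),) = todo.values()
        return final_plan, final_cost

    # toy usage: five relations A..E, zero access cost, trivially uniform join cost
    plan, cost = idp1({r: (r, 0) for r in "ABCDE"}, lambda left, right: 1, k=3)

With a real cost model, the greedy choice after each DP round is what keeps the enumeration bounded by the block size k instead of the full number of relations.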

3 Benchmarking Distributed Stream Data Processing Engines

This Chapter contains:

3.1 Introduction
3.2 Related Work
3.2.1 Batch Processing
3.2.2 Stream Processing
3.3 Benchmark Design Decisions
3.3.1 Simplicity is Key
3.3.2 On-the-fly Data Generation vs. Message Brokers
3.3.3 Queues Between Data Generators and SUT Sources
3.3.4 Separation of Driver and the SUT
3.4 Metrics
3.4.1 Latency
3.4.1.1 Event-time vs. Processing-time Latency
3.4.1.2 Event-time Latency in Windowed Operators
3.4.1.3 Processing-time Latency in Windowed Operators
3.4.2 Throughput
3.4.2.1 Sustainable Throughput
3.5 Workload Design
3.5.1 Dataset
3.5.2 Queries
3.6 Evaluation
3.6.1 System Setup
3.6.1.1 Tuning the Systems
3.6.2 Performance Evaluation
3.6.2.1 Windowed Aggregations
3.6.2.2 Windowed Joins
3.6.2.3 Unsustainable Throughput
3.6.2.4 Queries with Large Windows
3.6.2.5 Data Skew
3.6.2.6 Fluctuating Workloads
3.6.2.7 Event-time vs. Processing-time Latency
3.6.2.8 Observing Backpressure
3.6.2.9 Throughput Graphs
3.6.2.10 Resource Usage Statistics
3.6.2.11 Multiple Stream Query Execution
3.6.3 Discussion
3.7 Conclusion

Figure 3.1: Scope of Chapter 3: Performance Analysis of modern SPEs

The need for scalable and efficient stream analysis has led to the development of many open-source SPEs with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear lack of detailed analyses of the systems' performance characteristics. In this chapter, we present a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SPEs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and the latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use-cases inspired by the online gaming industry. The contribution of this chapter is threefold. First, we decouple the SUT from the test driver in order to correctly represent the open-world model of typical stream processing deployments. This separation enables our benchmark suite to measure system performance under realistic conditions. Second, we give a definition of latency and throughput for stateful operators. Third, we propose the first benchmarking framework to define and test the sustainable performance of SPEs. Our detailed evaluation highlights the individual characteristics and use-cases of each system.

3.1 Introduction

Processing large volumes of data in batch is often not sufficient when new data have to be processed quickly. For that reason, stream data processing has gained significant attention. The most popular SPEs, with large-scale adoption in industry and the research community, are Apache Storm [2], Apache Spark [3], and Apache Flink [5]. As a measure of popularity, we consider the systems' community size, pull requests, number of contributors, and commit frequency at the source repositories, as well as the size of the industrial community adopting the respective systems in their production environments.

An important application area of stream data processing is online video games. This application area requires the fast processing of large-scale online data feeds from different sources. Windowed aggregations and windowed joins are two main operations that are used to monitor user feeds. A typical use-case is tracking the in-application purchases per application, distribution channel, or product item (in-app products). Another typical use-case is the monitoring of advertising: making sure that all campaigns and advertisement networks work flawlessly, and comparing different user feeds by joining them. For example, monitoring the in-application purchases of the same game downloaded from different distribution channels and comparing users' actions are essential in online video game monitoring.
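As a concrete illustration of the windowed aggregation use-case, the small sketch below sums in-app purchase revenue per application over tumbling event-time windows. It is a plain, single-threaded toy with a hypothetical record schema and no out-of-order handling, not the distributed implementation of any of the evaluated SPEs, which express the same grouping with keyed windowed operators.

    from collections import defaultdict

    WINDOW_MS = 60_000  # 1-minute tumbling windows

    def window_start(event_time_ms):
        # Map an event timestamp to the start of its tumbling window.
        return event_time_ms - (event_time_ms % WINDOW_MS)

    def windowed_purchase_sums(events):
        """events: iterable of dicts with 'app_id', 'event_time', 'price' (hypothetical).

        Returns {(app_id, window_start): total revenue} for the processed feed.
        """
        totals = defaultdict(float)
        for e in events:
            key = (e["app_id"], window_start(e["event_time"]))
            totals[key] += e["price"]
        return totals

    feed = [
        {"app_id": "game-1", "event_time": 10_000, "price": 0.99},
        {"app_id": "game-1", "event_time": 59_000, "price": 4.99},
        {"app_id": "game-2", "event_time": 61_000, "price": 1.99},
    ]
    print(windowed_purchase_sums(feed))
    # {('game-1', 0): 5.98, ('game-2', 60000): 1.99}

A windowed join over two such feeds (e.g., the same game from two distribution channels) groups both streams by the same key and window before comparing them.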

In this work, we propose a benchmarking framework to accurately measure the performance of SPEs. For our experimental evaluation, we test three publicly available open-source engines: Apache Storm, Apache Spark, and Apache Flink. We use latency and throughput as the two major performance indicators. Latency, in SPEs, is the time difference between the moment of data production at the source (e.g., the mobile device) and the moment that the SPE has produced an output. Throughput, in this scenario, determines the number of ingested and processed tuples per time unit.
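The following toy snippet makes these two definitions operational: latency is the gap between the timestamp attached at the data source and the time at which the corresponding output is emitted, and throughput is the number of processed tuples divided by the observation interval. The timestamps and counts used here are illustrative only.

    def event_time_latency_ms(source_creation_ms, output_emission_ms):
        # Latency: time between data production at the source (e.g., the mobile device)
        # and the moment the SPE emits the corresponding output.
        return output_emission_ms - source_creation_ms

    def throughput_per_s(processed_tuples, interval_s):
        # Throughput: number of ingested and processed tuples per time unit.
        return processed_tuples / interval_s

    # hypothetical measurements: (creation timestamp, emission timestamp) in milliseconds
    observed = [(1_000, 1_450), (1_010, 1_430), (1_020, 1_900)]
    latencies = [event_time_latency_ms(c, e) for c, e in observed]
    print(max(latencies), sum(latencies) / len(latencies))              # 880 and ~583.3 ms
    print(throughput_per_s(processed_tuples=1_500_000, interval_s=60))  # 25000.0 tuples/s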

Even though there have been several comparisons of the performance of SPEs recently [26, 27, 28], they did not measure the latency and throughput that can be achieved in a production setting. One of the recurring issues in previous work is the missing definition and inaccurate measurement of latency for stateful operators (e.g., joins). Moreover, previous work does not clearly separate the SUT from the test driver. Frequently, the performance metrics are measured and calculated within the SUT, resulting in incorrect measurements.

In this chapter, we address the above-mentioned challenges. Our proposed benchmarking framework is generic, with a clear design and well-defined metrics, and can be applied to any SPE. The main contributions of this chapter are as follows:

• We accomplish the complete separation of the test driver from the SUT.

• We introduce a technique to accurately measure the latency of stateful operators in SPEs. We apply the proposed method to various use-cases.

• We measure the maximum sustainable throughput of SPEs. Our benchmarking framework handles system-specific features like backpressure to measure the maximum sustainable throughput of a system; a minimal driver sketch follows this list.

• We use the proposed benchmarking system for an extensive evaluation of Storm, Spark, and Flink with practical use-cases.
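The sustainable-throughput idea can be pictured as a search over ingestion rates. The sketch below is purely illustrative: it assumes a hypothetical probe(rate) hook provided by the test driver and a sustainable lower bound, and it is not the procedure used in the benchmark, which is defined in Section 3.4.2.1.

    def find_sustainable_rate(probe, low, high, tolerance=1_000):
        """Binary-search the highest ingestion rate (tuples/s) the SUT can sustain.

        probe(rate) is a hypothetical driver hook: it feeds the SUT at `rate` for a
        fixed interval and returns True iff the queue between data generator and
        source did not keep growing (i.e., no lasting backpressure). `low` is
        assumed to be sustainable.
        """
        best = low
        while high - low > tolerance:
            mid = (low + high) // 2
            if probe(mid):
                best, low = mid, mid      # sustained: probe a higher rate
            else:
                high = mid                # backlog grew: back off
        return best

    # toy usage with a fake SUT whose true capacity is 180k tuples/s
    print(find_sustainable_rate(lambda r: r <= 180_000, low=10_000, high=1_000_000))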
