6.2. Heuristics on Hybrid DBS and DBMS
In HDBMSs/HDBSs, we have to consider cost-based and rule-based optimization at both the local and the global optimization level. First, we perform optimization – regardless of whether it is rule- or cost-based – on the global level; that is, we need an architecture-independent optimization approach. This is the main task of AQUA2. Second, we argue that local optimization depends on the corresponding architecture. Hence, we propose applying the existing optimization approaches (i.e., query engines with optimizers) of row stores and column stores for the architecture-dependent optimization – the local level. Due to the lack of an architecture-independent (cost-based) optimizer, we present a rule-based optimization approach for our interface. We optimize queries on the global level based on architecture-independent heuristics and rules (cf. Section 6.2), which in our case corresponds to query dispatching without query rewriting. Queries are locally rewritten by the architecture-dependent optimizer of either architecture. Consequently, our interface reduces the solution space from global to local optimization. At the local optimization level, we reuse existing functionality. Rule- and/or cost-based optimization is common in DBMSs. Hence, we state that native optimizers (local optimization) achieve the best optimization results because they are tailor-made for the corresponding DBMS and thus for the corresponding architecture. Furthermore, our approach causes the minimum possible overhead for the overall optimization, because local query optimization remains unchanged and rule-based query dispatching adds only slight computational overhead.
Finally, an approach for global cost-based optimization is also conceivable. Such an approach can be promising for global optimization goals (e.g., minimal total time). However, we argue that the solution space increases significantly for cost-based optimization whenever we apply it to two architectures. That is, the computational cost of finding cost-based query plans also increases. As cost-based optimization already causes high computational cost for a single architecture, most (commercial) DBMSs first prune the solution space by rule-based optimization [GD87, Fre87, Sel88]. We suggest that an architecture-independent optimizer for several architectures does not achieve competitive results compared to a tailor-made optimizer for a certain architecture and system. Consequently, we implement rule-based optimization in our query interface, whereas we keep cost-based query optimization at the corresponding architecture (i.e., its query optimizer).
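The growth of the solution space for two architectures can be illustrated with a rough back-of-envelope calculation; the model below (left-deep join orders multiplied by per-operator placement choices) is purely illustrative and is not AQUA2's actual cost model:

```python
from math import factorial

def plan_space(n_relations: int, n_architectures: int = 1) -> int:
    """Back-of-envelope size of the left-deep join-order space.

    With a single architecture there are n! left-deep join orders.
    In a hybrid system, each of the n-1 join operators can additionally
    be placed on either architecture, multiplying the space by a^(n-1).
    (Illustrative model only, not AQUA2's cost model.)
    """
    joins = n_relations - 1
    return factorial(n_relations) * n_architectures ** joins

# For a 6-relation query, the hybrid space is 2^5 = 32x larger
# than the single-store space.
single = plan_space(6, 1)  # 720 left-deep orders
hybrid = plan_space(6, 2)  # 23040 plans
```

Even this simplified model shows why pruning by rules before (or instead of) cost-based enumeration is attractive once a second architecture enters the picture.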
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= date '1994-01-01'
  AND l_shipdate < date '1994-01-01' + interval '1' year
  AND l_discount BETWEEN .06 - 0.01 AND .06 + 0.01
  AND l_quantity < 24;

Figure 6.3.: TPC-H query Q6 [Tra10].
with results in this section. We discuss only query-processing heuristics for OLAP and mixed workloads, due to our observation that column stores are generally not competitive with row stores for OLTP.
OLAP. In the OLAP domain, we face a huge amount of data that, in general, is not frequently updated. Column stores reduce the amount of data significantly due to aggressive compression. That is, more data can be loaded into main memory, and the overall I/O between storage and main memory is reduced. We state that I/O reduction is a major benefit of column stores. We observe that row stores perform worse on many OLAP queries because CPUs idle most of the time while waiting for I/O from storage (cf. Section 6.4). Moreover, row stores read unnecessary data for most OLAP queries (e.g., query Q6 (cf. Figure 6.3) from the TPC-H benchmark [Tra10]).
That is, we observe that only a few columns of the schema's relations are accessed (e.g., 4 out of 16 for Q6). Simultaneously, aggregate functions belong to this execution scheme, too (i.e., an aggregate usually refers to one column). We state that column stores outperform row stores for OLAP as long as only a minority of columns has to be accessed for aggregation and predicate selection. We dispatch OLAP queries to row stores only for load-balancing reasons.
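The column-selectivity argument can be made concrete with a back-of-envelope scan-volume estimate; the row count, column names, and byte widths below are hypothetical placeholders (not TPC-H's actual values), and compression is ignored for simplicity:

```python
def scan_bytes(n_rows, col_widths, accessed):
    """Bytes read by a full scan: a row store reads every column of each
    tuple, while a column store reads only the accessed columns.
    (Simplified model; compression and page overheads are ignored.)"""
    row_store = n_rows * sum(col_widths.values())
    col_store = n_rows * sum(col_widths[c] for c in accessed)
    return row_store, col_store

# lineitem sketch: 16 columns, of which Q6 touches 4.
widths = {f"filler{i}": 8 for i in range(12)}  # 12 untouched columns
widths.update({"l_extendedprice": 8, "l_discount": 8,
               "l_shipdate": 4, "l_quantity": 8})
rs, cs = scan_bytes(6_000_000, widths,
                    ["l_extendedprice", "l_discount",
                     "l_shipdate", "l_quantity"])
# The row store scans roughly 4.4x the data volume of the column store.
```

Even without compression, reading 4 of 16 columns cuts the scan volume by the ratio of accessed to total tuple width, which is the effect we observe for Q6.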
Complex OLAP Queries. OLAP analyses become more and more complex; thus, queries become more complex, too. Complex OLAP queries describe complex issues and generate large (complex) reports (e.g., query Q13 may be part of a report or of a more complex query). For complex OLAP queries, a major part of the relations has to be accessed and aggregated (i.e., nearly the whole relation has to be read). We argue that such queries imply a number of tuple reconstructions that reduce the performance of column stores significantly. Hence, row stores achieve competitive performance for these queries. For Q13 (cf. Figure 6.4), we observe another issue that causes a number of tuple reconstructions – group operations; that is, tuples have to be reconstructed prematurely. We state that selection on dependent predicates, complex joins, and sub-queries are further issues that significantly reduce performance through tuple reconstructions. We argue that these queries are candidates to be dispatched to row stores (e.g., to balance load).
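The cost of tuple reconstruction can be sketched with a simplified positional model: before a group or multi-column operation, each output tuple must be stitched back together from the individual column arrays, so the work grows with rows times columns (the column names below echo Q13 but are otherwise arbitrary):

```python
def reconstruct(columns, row_ids):
    """Stitch full tuples back together from per-column arrays - the
    operation that makes grouping and multi-column output expensive on
    a column store. Simplified positional model: row i of every column
    belongs to tuple i."""
    return [tuple(col[i] for col in columns.values()) for i in row_ids]

cols = {"c_custkey": [1, 2, 3], "c_count": [5, 0, 7]}
rows = reconstruct(cols, [0, 2])  # -> [(1, 5), (3, 7)]
```

Each reconstructed tuple touches every participating column array, which is why queries that group or join on many columns lose the column store's scan advantage.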
OLTP/OLAP. For mixed-workload environments, our first recommendation is to split the workload into two parts – OLTP and OLAP. We argue that both parts should be allocated to the corresponding part of the HDBS or HDBMS (i.e., OLTP to row
SELECT c_count, COUNT(*) AS custdist
FROM (SELECT c_custkey, COUNT(o_orderkey)
      FROM customer LEFT OUTER JOIN orders
        ON c_custkey = o_custkey
        AND o_comment NOT LIKE '%special%request%'
      GROUP BY c_custkey) AS c_orders (c_custkey, c_count)
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

Figure 6.4.: TPC-H query Q13 [Tra10].
Feature                        | Column Store Behavior                        | Background
-------------------------------|----------------------------------------------|-----------
Space consumption              | Reduced by factor ≈10 (HDD & main memory)    | Aggressive compression (e.g., optimal compression per data type)
Compressed data                | Query processing without decompression       | Does not apply to all compression types nor to all operations
Data transfer                  | Reduced and less disk swapping               | More data fits in memory due to aggressive compression
CPU consumption                | Increased                                    | Compression and tuple reconstruction cause CPU load
(Point) Lookup                 | Fast response                                | True for OLAP and OLTP workloads
Joins                          | Fast for foreign-key join                    | Processing on indexes/single column
                               | Slow for complex join (many columns)         | Tuple reconstruction needed
                               | Slow for (full) outer join                   | Tuple reconstruction needed
Predicate selection            | Fast on independent predicates               | May be processed in parallel
                               | Slow on dependent predicates                 | Highly dependent on structure and dependencies (e.g., reconstruct tuples)
Vector operations              | Fast processing (e.g., bitmap join)          | Inherently supported and easily adaptable
Aggregates and column operation| No I/O overhead                              | Occur frequently in OLAP
Parallelization                | For inter- and intra-query (e.g., parallel aggregation) | Not for ACID transactions with write operations

Table 6.1.: Insight summary of store qualities for design.
store and OLAP to column store). With this approach, we achieve competitive performance for both OLAP and OLTP. We do not recommend a more fine-grained splitting of OLAP-workload parts, because intra-query parallelization demands a separate query engine. Nevertheless, our split methodology can be extended for load balancing whenever one store is overloaded (i.e., we dispatch to the other store when resources are available).
We summarize the major insights from above and from Section 5.4.2 in Table 6.1.
We present our insights for column stores, since row stores behave contrarily. Both architectures have advantages as well as disadvantages. We emphasize the advantages of column stores (cf. Table 6.1); nevertheless, column stores still perform worse on update operations and on concurrent non-read-only data access.
That is, frequent updates and consistency checks cause tuple reconstructions on column stores due to the inherently partitioned data, and thus significant cost. In
Origin             | Statement                                          | Result
-------------------|----------------------------------------------------|-------
Transactions (DML) | Perform best on RS                                 | Execute always on RS
(Point) Lookups    | Perform well on RS and CS                          | Without overload on RS; to load balance on CS if no real-time requirement
Aggregates         | Perform best on CS                                 | Execute always on CS
Grouping           | Perform best on CS except for *-operator (e.g., TPC-H Q4) and ≥10 columns (e.g., TPC-H Q1) | Execute always on CS; to RS for load balancing
Predicates         | Distribute for parallelization only, depending on type of predicate (e.g., TPC-H Q8, Q13) | Dependent predicates to RS; independent predicates to CS
Sub-query          | Distribute for parallelization only, dependent on sub-query type | To CS: comprised of aggregates, independent predicates, foreign-key joins (e.g., TPC-H Q2); to RS: sub-queries with dependent predicates (e.g., TPC-H Q11)
Joins              | Distribute for parallelization only, depending on foreign-key, outer, or multi-column join | Dispatch foreign-key joins to CS to get a key list; dispatch outer and multi-column joins to RS for intermediates

Table 6.2.: Rules for the online query dispatcher in AQUA2.
analytical scenarios, row stores read a lot of unnecessary data, because operations are often based on single columns or on small subsets of them. Additionally, row stores do not reach compression ratios like column stores, because different data types are combined in tuples, which cannot be compressed at as high a ratio as columns that contain just one data type and, in the best case, a more uniform structure and length. That is, the data size is already larger than for a column store, which implies more I/O and more main-memory consumption. In line with others [AMH08, ZNB08], we conclude that both architectures have the right to exist in different application domains. Row stores cannot outperform column stores in the latter's domain, nor vice versa.
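The compression argument can be illustrated with run-length encoding, one of the schemes that benefits from the uniform type and ordering of a single column; the toy encoder and sample data below are illustrative only:

```python
def rle(values):
    """Run-length encode a column as [value, count] pairs. A column has
    a uniform type and is often (nearly) sorted, so runs are long; a row
    store's tuples interleave types and rarely form runs at all."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

col = ["DE"] * 1000 + ["FR"] * 1000  # a sorted country column
encoded = rle(col)                   # 2 runs instead of 2000 values
```

A sorted single-type column collapses to a handful of runs, whereas the same values spread across heterogeneous tuples offer no comparable redundancy to exploit.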
Therefore, we focus our approach on the gray areas between the domains (i.e., mixed (OLTP/OLAP) workloads and complex OLAP queries). That is, we introduce a heuristic-based decision framework (i.e., rules) that dispatches queries to the optimal store, independently of whether we apply it to an HDBMS or HDBS [LSKS12, LKS12].
We assume for our set of rules that a consistency-control approach (i.e., a replication mechanism) is available. We present our rules for the online query dispatcher in Table 6.2. Hence, we present rules for inter- as well as intra-query parallelization as an outcome of our research, even though intra-query parallelization demands a (global) query engine (e.g., as for distributed DBMSs [ED95, ÖV11]). However, our research is not focused on query-engine development; thus, our query interface currently supports inter-query parallelization. Consequently, we implement a subset of the rules in a first step (cf. Section 6.4.3). We discuss improvements to our approach in Section 6.4.5 and refer to Section 6.3 for a discussion of real-time and time-bound requirements. Nevertheless, we do not claim generality for design and query processing on row, column, and hybrid stores. We are aware that further aspects of design and query processing for a domain exist, even though these may not be technical aspects.
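A first-match rule chain in the spirit of Table 6.2 can be sketched as follows; the boolean query-feature flags are hypothetical placeholders for properties the real interface would derive from the parsed query, and only a subset of the rules is covered:

```python
def dispatch(query: dict) -> str:
    """Rule-based dispatcher sketch (subset of Table 6.2). Rules are
    checked top-down; the first matching rule decides the store.
    RS = row store, CS = column store."""
    if query.get("dml"):                    # transactions perform best on RS
        return "RS"
    if query.get("dependent_predicates"):   # dependent predicates to RS
        return "RS"
    if query.get("outer_join") or query.get("multi_column_join"):
        return "RS"                         # intermediates on RS
    if query.get("aggregates") or query.get("grouping"):
        return "CS"                         # aggregates/grouping to CS
    if query.get("foreign_key_join"):
        return "CS"                         # key list from CS
    return "RS"                             # default: no overload on RS

# e.g. a Q6-like query: aggregate over independent predicates -> CS
target = dispatch({"aggregates": True})
```

Because each rule is a cheap predicate test, dispatching adds only slight overhead per query, which matches the motivation for rule-based global optimization given above.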
Some aspects are in favor of row stores:
Tuning: Self-tuning techniques and advisors are more advanced, which simplifies administration and tuning,
User behavior: Intuitive processing of SQL (i.e., easier to predict, understand, and develop for),
Training: Most (IT-affine) people are familiar with row-store technology; thus, training costs are lower,
Abstraction: Support for a wide field of data-management tasks (i.e., mixed requirements with point and range queries, full scans for exploration).