K-ary Search - Exploiting SIMD for Query Execution


3.4 K-ary Search

The k-ary search was introduced by Schlegel et al. [SGL09] and is based on binary search. The binary search algorithm uses the divide-and-conquer paradigm. This paradigm works iteratively over a sorted list of keys by

dividing the search space equally in each iteration. The algorithm first identifies the median key of a sorted list of keys. The median key serves as a separator that divides the search space into two equally sized sets of keys (so-called partitions). The left partition only contains keys that are smaller than the median key. In contrast, keys in the right partition are larger than the median key. After partitioning, the search key v is compared to the median key. The search terminates if the search key is equal to the median key.

Otherwise, the binary search uses the left or right partition, depending on the greater-less relationship, as the input for the next iteration. If that partition is empty, the search key v is not in the list of keys and the search terminates. For a key count n, the complexity is logarithmic: binary search performs h = log₂n iterations in the worst case and h−(2^h −h−1)/n > h−2 iterations on average [SGL09]. Figure 3.5 illustrates the binary search for v = 9 on a sorted list of 26 keys. The boxed keys form one partition and the underlined keys show the separators.
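The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not code from [SGL09]; the function name `binary_search` and the key list `0..25` are assumptions chosen for this example, since the actual keys of Figure 3.5 are not reproduced in the text.

```python
def binary_search(keys, v):
    """Search sorted `keys` for `v`; return (index or -1, iterations used)."""
    lo, hi = 0, len(keys) - 1           # current partition is keys[lo..hi]
    iterations = 0
    while lo <= hi:                     # an empty partition ends the search
        iterations += 1
        mid = (lo + hi) // 2            # median key acts as the separator
        if keys[mid] == v:
            return mid, iterations
        if v < keys[mid]:
            hi = mid - 1                # continue in the left partition
        else:
            lo = mid + 1                # continue in the right partition
    return -1, iterations


# Hypothetical key list: 26 consecutive integers (an assumption of this sketch).
keys = list(range(26))
print(binary_search(keys, 9))
```

With these assumed keys, the search for v = 9 takes five iterations, and no lookup on 26 keys takes more than five.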

Figure 3.5: Binary search for key 9 and n = 26.

While binary search divides the search space into two partitions in each iteration, the k-ary search algorithm divides the search space into k partitions by using k−1 separators. We utilize our aforementioned SIMD sequence (see Section 3.3.1) to create this increased number of partitions and separators.
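As a rough illustration of the partitioning step, the following Python sketch runs a k-ary search directly on a sorted list. It is a scalar stand-in under stated assumptions: the loop over the separators takes the place of the parallel SIMD comparison, and the function name `kary_search`, the way separator positions are computed, and the key list `0..25` are choices made for this example rather than details taken from [SGL09].

```python
def kary_search(keys, v, k=3):
    """Search sorted `keys` for `v` with k-1 separators per iteration.

    Returns (index or -1, iterations used).
    """
    lo, hi = 0, len(keys)               # half-open search space [lo, hi)
    iterations = 0
    while lo < hi:
        iterations += 1
        size = hi - lo
        # Up to k-1 separator positions that split [lo, hi) into k partitions.
        seps = sorted({lo + (i * size) // k for i in range(1, k)})
        below = 0                       # number of separators smaller than v
        for s in seps:                  # SIMD would do these compares at once
            if keys[s] == v:
                return s, iterations
            if keys[s] < v:
                below += 1
        # `below` selects the partition that may still contain v.
        if below > 0:
            lo = seps[below - 1] + 1
        if below < len(seps):
            hi = seps[below]
    return -1, iterations


keys = list(range(26))                  # hypothetical keys, as an assumption
print(kary_search(keys, 9, k=3))
```

For the assumed keys and k = 3, the first iteration compares v = 9 against the keys at positions 8 and 17, and the search finishes in three iterations.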

As shown in Section 3.3.1, SIMD instructions are able to compare a list of keys with a search key in parallel. The number of parallel key comparisons depends on the data type and the available SIMD bandwidth. With parameter k, k−1 separator keys are compared in one iteration, which increases the number of partitions to k. Figure 3.6 illustrates the same search as in Figure 3.5, now using k-ary search. The binary search compares only one key at a time with a search key, thus producing two partitions. In contrast, the k-ary search with k = 3 compares two keys in parallel with a search key and divides the search space into three partitions. As a result, the k-ary search terminates after three iterations, while binary search requires five iterations to find the search key. In general, k-ary search reduces the complexity to O(log_k(n)) compared to O(log₂(n)) for binary search. Assuming a commonly available SIMD bandwidth of 128 bits and a data type of 8 bits,

Figure 3.6: K-ary search for key 9, n = 26, and k = 3.

16 values can be compared in parallel. Using our definition of k, 16 parallel comparisons result in k = 17. Therefore, the number of iterations is reduced by a factor of log₂(n) / log_k(n) = log₂(k) ≈ 4 for k = 17.

The main restriction of SIMD instructions is their requirement for a sequential load of data. This requirement presupposes that all keys that are loaded into one SIMD register with one SIMD instruction must be stored consecutively in main memory.
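The arithmetic behind k = 17 can be repeated for other key widths. The short Python sketch below assumes a 128-bit SIMD register and uses a hypothetical helper name, `kary_parameters`; it only evaluates the relations stated in the text (one comparison per lane, k−1 separators, reduction factor log₂(k)).

```python
import math


def kary_parameters(simd_bits, key_bits):
    """Return (lanes, k, reduction factor) for a register and key width."""
    lanes = simd_bits // key_bits       # keys compared by one SIMD instruction
    k = lanes + 1                       # k-1 separators produce k partitions
    factor = math.log2(k)               # log2(n) / log_k(n) = log2(k)
    return lanes, k, factor


for key_bits in (8, 16, 32, 64):
    lanes, k, factor = kary_parameters(128, key_bits)
    print(f"{key_bits:2d}-bit keys: {lanes:2d} lanes, k = {k:2d}, "
          f"factor = {factor:.2f}")
```

Wider keys leave fewer lanes per register, so k and the reduction factor both shrink as the key width grows.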

Load or store instructions using scatter and gather operations could allow a load/store of keys from distributed memory locations [Int12b]. However, only gather instructions are supported by CPUs and only by the newest micro-architectures [PRR15].

The keys in a sorted list are placed one key next to the other in linear order as shown in Figure 3.6. Therefore, keys are placed in ascending or descending order, depending on their relationship to each other. This placement strategy is sufficient for binary search, but not amenable to k-ary search. In a linear sorted list of keys, possible separator keys are not placed in consecutive memory locations because several keys fall in between. For example, keys 8 and 17 in Figure 3.6 may be chosen as separators to partition the sorted list into three equally sized partitions. After partitioning, the separators and the search key must be compared to determine the input for the next iteration. When storing the list of keys in linear order, the separator keys are not placed next to each other in main memory and thus cannot be loaded with one SIMD instruction. To overcome this restriction, Schlegel et al. [SGL09] suggest building a k-ary search tree from the sorted list of keys. They define a perfect k-ary search tree as: “[...] every node – including the root node – has precisely k−1 entries, every internal node has k successors, and every leaf node has the same depth.”
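To make the quoted definition concrete, the following Python sketch builds a perfect k-ary search tree as a nested structure from a sorted list and searches it. It is an illustration under assumptions, not the construction from [SGL09]: the names `build_kary_tree` and `tree_search`, the `(separators, children)` node layout, and the key list `0..25` are all hypothetical. The sketch relies on the fact that a perfect tree of height h holds exactly k^h − 1 keys, which is the case for 26 keys and k = 3.

```python
def build_kary_tree(keys, k):
    """Build a perfect k-ary search tree from sorted `keys`.

    A node is a pair (separators, children); an empty range yields None.
    """
    n = len(keys)
    if n == 0:
        return None
    if (n + 1) % k != 0:
        raise ValueError("key count does not form a perfect k-ary search tree")
    step = (n + 1) // k
    # The k-1 separators of this node are stored together in one list.
    seps = [keys[i * step - 1] for i in range(1, k)]
    # The keys between two neighbouring separators form one child subtree.
    children = [build_kary_tree(keys[i * step:(i + 1) * step - 1], k)
                for i in range(k)]
    return seps, children


def tree_search(node, v):
    """Return (found, iterations) for search key `v`."""
    iterations = 0
    while node is not None:
        iterations += 1
        seps, children = node
        if v in seps:
            return True, iterations
        # The number of separators smaller than v selects the child.
        node = children[sum(s < v for s in seps)]
    return False, iterations


tree = build_kary_tree(list(range(26)), 3)   # hypothetical keys 0..25
print(tree[0])                               # separators held by the root
print(tree_search(tree, 9))
```

With the assumed keys, the root node holds the separators 8 and 17 side by side, which is the property a single SIMD load needs.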

The k-ary search tree is a logical representation that must be transformed for storage in main memory or on secondary storage. For this transformation, Schlegel et al. [SGL09] propose to linearize the k-ary search tree. The linearization procedure transforms a sorted list of keys into a linearized k-ary search tree. Figure 3.7 summarizes the transformation process. As a result, both separator keys are placed side by side and thus can be loaded with one SIMD instruction. In Section 3.5.1, we present two algorithms that use depth-first search or breadth-first search for this transformation. Figure 3.8 illustrates a k-ary search for search key v = 9 on a breadth-first
