2.3 XPath Operators

2.3.1 Descendant and Ancestor Axis

Recent work [GKT03] has developed a new join algorithm, the so-called staircase join, that encapsulates “tree awareness” inside the database operators for ancestor and descendant step evaluation. For the explanation, we first consider the descendant case only and split its execution into two logical parts: an initial preprocessing of the context set, followed by the staircase join evaluation itself.

Pruning. If we choose the node sequence (b, d, i, j) in the sample document (Fig. 2.2) as the context set for a descendant step, a look at the document tree shows that all descendants of the nodes d and j are already contained in the result of (b, i)/descendant. In general, for any two nodes v, w ∈ D:

\[ w \in v/\mathrm{descendant} \;\Rightarrow\; w/\mathrm{descendant} \subset v/\mathrm{descendant}, \]
\[ w \in v/\mathrm{ancestor} \;\Rightarrow\; w/\mathrm{ancestor} \subset v/\mathrm{ancestor}. \]

For a descendant step, the result therefore does not change if the context set is reduced by excluding those nodes that are themselves descendants of other context nodes. The method that performs this preprocessing, called pruning, scans the context set once in ascending preorder. Using a marker maxpost that stores the highest postorder value visited so far, it skips every node whose postorder value is smaller than maxpost and copies all other nodes to the resulting pruned context set CSpr (Algorithm 1).

Figure 2.4: Pruning for the context sequence (b, d, i, j) builds a staircase-shaped search region.

It is important to notice that pruning not only reduces the context set, but also guarantees that all remaining context nodes relate to each other on the preceding/following axis. Together with the vertical and horizontal limits of the pre/post plane, CSpr defines exact bounds of the search region. Figure 2.4 shows such a pruned set and the region it marks. The staircase-like shape of that region gives the following algorithm its name.
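The pruning pass can be sketched in Python as follows. The ten-node document is a hypothetical stand-in (it is not the document of Fig. 2.2), chosen only so that d lies below b and j below i; the names `NAMES`, `POST` and `node` are illustrative. The sketch starts the marker at -1 rather than 0 so that a context node with postorder rank 0 is not dropped, which is a choice of this sketch and assumes ranks start at 0.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
# POST[pre] is the postorder rank of the node with preorder rank `pre`.
NAMES = "abcdefghij"
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]


def node(name):
    """Return the (pre, post) pair of a node of the hypothetical document."""
    pre = NAMES.index(name)
    return (pre, POST[pre])


def prune_context_desc(context):
    """Drop context nodes that are descendants of earlier context nodes.

    `context` is a list of (pre, post) pairs sorted by ascending pre.
    """
    result = []
    max_post = -1  # assumption: postorder ranks are non-negative
    for pre, post in context:
        if post > max_post:  # not below any context node seen so far
            result.append((pre, post))
            max_post = post
    return result


context = [node(n) for n in "bdij"]
pruned = prune_context_desc(context)
pruned_names = [NAMES[pre] for pre, _ in pruned]

# After pruning, pre and post ranks both increase strictly, so the
# remaining nodes are pairwise on the preceding/following axis.
is_staircase = all(
    a[0] < b[0] and a[1] < b[1] for a, b in zip(pruned, pruned[1:])
)
```

For this document the context (b, d, i, j) shrinks to (b, i), and the check on `is_staircase` mirrors the preceding/following guarantee described above.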

Staircase Join. After the preprocessing is done, it remains to scan the pre/post plane for all nodes within the staircase region. The basic approach is to partition the pre/post plane vertically along the preorder values of the context nodes, that is, to evaluate the staircase “step by step”. Within each partition, every node is tested as to whether its postorder value lies below or above the step boundary. Partitioning the pre/post plane does not cause additional work, however: since the pre/post table is sorted on preorder values, it suffices to scan it in ascending order while the postorder bound used in the comparison changes with each step. The algorithm can therefore be characterized as a merge join with a dynamic range predicate. The basic framework of the staircase join as described here is presented in Algorithm 2.

Algorithm 1: Context set pruning for descendant staircase join

prunecontext_desc(context: table(pre,post) sorted in ascending preorder) ≡
begin
    result ← new table(pre,post);
    maxpost ← 0;
    foreach ci in context do
        if post(ci) > maxpost then
            insert ci in result;
            maxpost ← post(ci);
    return result;
end

Algorithm 2: Staircase join algorithm for descendant axis

staircasejoin_desc(doc_prepost: table(pre,post), context: table(pre,post)) ≡
begin
    result ← new table(pre,post);
    foreach pair (ci, ci+1) in context do
        /* start behind ci: a context node is not its own descendant */
        scanpartition_desc(pre(ci) + 1, pre(ci+1) − 1, post(ci));
    c ← last node in context;
    n ← last node in doc_prepost;
    scanpartition_desc(pre(c) + 1, pre(n), post(c));
    return result;
end

scanpartition_desc(pre_from, pre_to, post_max) ≡
begin
    for i from pre_from to pre_to do
        if post(doc_prepost[i]) < post_max then
            append doc_prepost[i] to result;
        else
            break; /* skipping */
end
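A minimal Python sketch of the descendant staircase join, under the assumption that the document is stored as a list `POST` indexed by preorder rank and that the context has already been pruned. The document is a hypothetical ten-node tree, not the one of Fig. 2.2. Each partition scan starts one position behind its context node, so that the context node itself is never reported as its own descendant.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
# POST[pre] is the postorder rank of the node with preorder rank `pre`.
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]


def scan_partition_desc(post, pre_from, pre_to, post_max, result):
    """Append the descendants found in one partition to `result`."""
    for pre in range(pre_from, pre_to + 1):
        if post[pre] < post_max:
            result.append(pre)
        else:
            break  # skipping: the descendant block of this step has ended


def staircase_join_desc(post, context):
    """Descendant step for a pruned context of (pre, post) pairs.

    Returns the preorder ranks of all descendants in document order.
    """
    result = []
    if not context:
        return result
    # One partition per pair of successive context nodes.
    for (pre_c, post_c), (pre_next, _) in zip(context, context[1:]):
        scan_partition_desc(post, pre_c + 1, pre_next - 1, post_c, result)
    # The last partition extends to the end of the document.
    pre_c, post_c = context[-1]
    scan_partition_desc(post, pre_c + 1, len(post) - 1, post_c, result)
    return result


# Pruned context (b, i) of the hypothetical document.
descendants = staircase_join_desc(POST, [(1, 2), (8, 7)])
```

Here `descendants` holds the preorder ranks of c, d and j. Returning only preorder ranks instead of (pre, post) rows is a simplification of this sketch.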

Although we described the pruning of the context set as a separate preceding step, it can equally well be integrated into the main evaluation procedure. Looking ahead from context node ci to the next one, ci+1, the node ci+1 is simply skipped if it is a descendant of ci. “On the fly” pruning thus avoids writing and re-reading the intermediate set CSpr.
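One way to integrate the look-ahead into the join loop is sketched below in Python. It assumes the same list-based encoding as a stand-in for the pre/post table (`POST` indexed by preorder rank, hypothetical document) and accepts an unpruned context sorted by preorder; the function name is illustrative.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]


def staircase_join_desc_on_the_fly(post, context):
    """Descendant step with pruning folded into the main loop.

    `context` is a list of (pre, post) pairs in ascending preorder and
    may contain nodes that are descendants of other context nodes.
    """
    result = []
    last_pre = len(post) - 1
    i = 0
    while i < len(context):
        pre_c, post_c = context[i]
        # Look ahead: a later context node with a smaller postorder rank
        # is a descendant of the current one and is skipped.
        j = i + 1
        while j < len(context) and context[j][1] < post_c:
            j += 1
        pre_to = context[j][0] - 1 if j < len(context) else last_pre
        for pre in range(pre_c + 1, pre_to + 1):
            if post[pre] < post_c:
                result.append(pre)
            else:
                break  # end of the descendant block of this step
        i = j
    return result


# Unpruned context (b, d, i, j): d and j are skipped during the join.
descendants = staircase_join_desc_on_the_fly(
    POST, [(1, 2), (3, 1), (8, 7), (9, 6)]
)
```

No intermediate pruned table is materialized; the skipped context nodes d and j never start a partition of their own.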

Examination of the staircase join result reveals further advantages of the algorithm. Due to the sequential scan over the pre/post relation, the result set remains preorder sorted and duplicate free, in contrast to a “node by node” axis evaluation of the context set. Hence, no additional postprocessing is needed to meet XPath semantics.
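The contrast with a “node by node” evaluation can be illustrated with a small Python sketch. The document and both function names are hypothetical; the naive variant evaluates one region query per context node and concatenates the results, while the staircase variant prunes and then scans each partition once.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]


def naive_descendant_step(post, context_pres):
    """One region query per context node, results concatenated."""
    result = []
    for pre_c in context_pres:
        result.extend(
            pre for pre in range(pre_c + 1, len(post))
            if post[pre] < post[pre_c]
        )
    return result


def staircase_descendant_step(post, context_pres):
    """Prune the context, then scan each partition once."""
    steps, max_post = [], -1
    for pre_c in context_pres:
        if post[pre_c] > max_post:
            steps.append(pre_c)
            max_post = post[pre_c]
    result = []
    for k, pre_c in enumerate(steps):
        pre_to = steps[k + 1] - 1 if k + 1 < len(steps) else len(post) - 1
        for pre in range(pre_c + 1, pre_to + 1):
            if post[pre] >= post[pre_c]:
                break
            result.append(pre)
    return result


context = [0, 7]  # preorder ranks of a and its descendant h
naive = naive_descendant_step(POST, context)
staircase = staircase_descendant_step(POST, context)
```

With this context the naive result lists the descendants of h twice and out of document order, so it needs a sort and duplicate elimination, whereas the staircase result already equals `sorted(set(naive))`.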

The basic staircase join is already very efficient in evaluating descendant steps because it accesses all the data in single sequential scans. Nevertheless, its execution can be optimized further by introducing more “tree aware” adaptations. Figure 2.3 in the last section shows that, for preorder-sorted nodes, all descendants of a single node follow that node in a dense block. This knowledge makes it possible to apply the following techniques:

Skipping. Within a single partition of the staircase, the first node whose postorder value exceeds the staircase boundary marks the end of the descendant block belonging to the current context node ci. All further nodes within that partition lie beyond the postorder limit post(ci) and can therefore be skipped, which means that the scanning cursor on the pre/post table is moved directly to the preorder value of the next context node, pre(ci+1).

Copy without Test. Inequality (2.11) gives lower bounds for the number of nodes within descendant blocks, and these bounds are very close to the actual block sizes. To save the CPU cost of postorder comparisons, the nodes within these lower bounds can be copied to the result set without any further test.

Applying both techniques limits the number n of postorder comparisons during the whole staircase join to

\[ n \le \mathrm{height}(T_D) \cdot |CS_{pr}|. \]
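The combined effect of skipping and copying without test can be checked on a small example. The Python sketch below assumes that the lower bound referred to as (2.11) has the form post(v) − pre(v) ≤ |v/descendant|, which holds for 0-based ranks because |v/descendant| = post(v) − pre(v) + level(v); whether (2.11) is stated in exactly this form is an assumption here. The document, `HEIGHT` and the function name are hypothetical.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]
HEIGHT = 3  # the longest root-to-leaf path (a, h, i, j) has 3 edges


def staircase_join_desc_counted(post, context):
    """Descendant step with skipping and copy without test.

    `context` must be pruned. Returns (result, comparisons), where
    `comparisons` counts the postorder tests actually performed.
    """
    result, comparisons = [], 0
    last_pre = len(post) - 1
    for k, (pre_c, post_c) in enumerate(context):
        pre_to = context[k + 1][0] - 1 if k + 1 < len(context) else last_pre
        # Copy phase: the first post - pre nodes behind c are guaranteed
        # descendants (assumed form of the lower bound).
        guaranteed = max(0, post_c - pre_c)
        # Guard only; for a pruned context the guaranteed block never
        # reaches the next context node.
        copy_to = min(pre_c + guaranteed, pre_to)
        result.extend(range(pre_c + 1, copy_to + 1))
        # Scan phase: test the remaining nodes until the block ends.
        for pre in range(copy_to + 1, pre_to + 1):
            comparisons += 1
            if post[pre] < post_c:
                result.append(pre)
            else:
                break  # skipping
    return result, comparisons


context = [(1, 2), (8, 7)]  # pruned context (b, i)
result, comparisons = staircase_join_desc_counted(POST, context)
within_bound = comparisons <= HEIGHT * len(context)
```

For the pruned context (b, i) the sketch copies c untested, tests d and e for the first step and j for the second, so three comparisons are made against a bound of 3 · 2 = 6.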

Ancestor Axis. Pruning could similarly be applied to a context set for ancestor steps, in which case all nodes that are themselves ancestors of other context nodes are eliminated. The pruning algorithm, however, would have to process the context nodes in reverse order and thus could not be done “on the fly”. In contrast to the descendant axis, the tree properties of the node set ensure correct staircase join evaluation even for non-pruned context sequences: the scan of any partition [ci, ci+1[ only has to include ci itself, in order to check whether ci is an ancestor of ci+1.
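A possible reading of the non-pruned ancestor join is sketched below in Python. It assumes the same list-based encoding of a hypothetical document (`POST` indexed by preorder rank) and a context sorted by preorder. The first partition covers the nodes before the first context node; every later partition starts at the previous context node itself.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
NAMES = "abcdefghij"
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]


def staircase_join_anc(post, context):
    """Ancestor step for a context that need not be pruned.

    `context` is a list of (pre, post) pairs in ascending preorder.
    Returns the preorder ranks of all ancestors in document order.
    """
    result = []
    pre_from = 0  # the first partition starts at the document root
    for pre_c, post_c in context:
        # A node in front of c with a larger postorder rank is an
        # ancestor of c.
        for pre in range(pre_from, pre_c):
            if post[pre] > post_c:
                result.append(pre)
        # The next partition includes c itself, so that c is reported
        # if it is an ancestor of the next context node.
        pre_from = pre_c
    return result


# Unpruned context (b, d, i, j).
context = [(1, 2), (3, 1), (8, 7), (9, 6)]
ancestors = staircase_join_anc(POST, context)
ancestor_names = [NAMES[pre] for pre in ancestors]
```

Because the partitions are disjoint and scanned in ascending preorder, the result (a, b, h, i for this context) is sorted and free of duplicates although b and i are both context nodes and ancestors of other context nodes.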

As a further difference from the descendant axis, the ancestors of a single node v are not clustered together, but are located separately between blocks of preceding nodes (Fig. 2.3). Therefore, skipping and copying without test cannot be applied analogously. Nevertheless, a slightly less effective skipping of irrelevant preceding blocks is possible. If, while scanning the pre/post plane, a node x that precedes the current context node ci is encountered, all descendants of this node, i.e., x/descendant, are on the preceding axis of ci as well. Since we are able to give a lower bound for the size |x/descendant| of this block, the scanning cursor can be advanced by that number without further tests.
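The skipping of preceding blocks might look as follows in Python. As a lower bound for |x/descendant| the sketch uses max(0, post(x) − pre(x)), which is valid for 0-based ranks but is an assumption about the bound the text has in mind. The document and the function name are hypothetical, and the counter `visited` only serves to make the saving visible.

```python
# Hypothetical document a(b(c, d), e(f, g), h(i(j))), listed in preorder.
POST = [9, 2, 0, 1, 5, 3, 4, 8, 7, 6]


def staircase_join_anc_skipping(post, context):
    """Ancestor step that jumps over blocks of preceding nodes.

    `context` is a list of (pre, post) pairs in ascending preorder.
    Returns (result, visited), where `visited` counts the table rows
    that were actually inspected.
    """
    result, visited = [], 0
    pre_from = 0
    for pre_c, post_c in context:
        pre = pre_from
        while pre < pre_c:
            visited += 1
            if post[pre] > post_c:
                result.append(pre)  # ancestor of c
                pre += 1
            else:
                # x = node at `pre` precedes c, and so do all of its
                # descendants: jump over the guaranteed part of the block.
                pre += 1 + max(0, post[pre] - pre)
        pre_from = pre_c  # the next partition includes c itself
    return result, visited


# Ancestors of j: the row of f (preorder rank 5) and the row of c
# (preorder rank 2) are never inspected.
ancestors, visited = staircase_join_anc_skipping(POST, [(9, 6)])
```

The jump is safe because the guaranteed block of x never extends beyond x/descendant, and all of x/descendant lies in front of the context node.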