
Iterators for Traversal-Based Query Execution

In the document Querying a Web of Linked Data (pages 146-149)


7.1.2. Iterators for Traversal-Based Query Execution

We now adapt the iterators to define an implementation approach for our traversal-based query execution strategy. Hence, this approach focuses on executing CLD(M) queries over a Web of Linked Data. As in the static case discussed before, a logical (query execution) plan specifies an order over the BGP of the CLD(M) query that has to be executed, and a corresponding physical plan is a pipeline of iterators. However, to implement traversal-based query execution we require a different kind of iterator. In this section we introduce these iterators, which we call link traversing iterators.

All link traversing iterators in a pipelined execution plan share a data structure that represents the (currently) discovered subweb of the queried Web of Linked Data. As in the abstract procedure tbExec of our query execution model (cf. Section 6.3.7, page 125f), we denote this discovered subweb by D. During query execution D grows monotonically.
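The shared structure D can be pictured with a minimal Python sketch. It rests on assumptions that are not part of the original text: the queried Web is modelled as two plain dictionaries (`adoc` from URIs to document names, `data` from document names to sets of triples), and the names `DiscoveredSubweb`, `add_by_lookup`, `documents`, and `all_data` are hypothetical. The sketch only illustrates that documents are added and never removed, and that a snapshot of the data stays unchanged while D keeps growing.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class DiscoveredSubweb:
    """Toy model of D: the set of documents discovered so far.

    Assumption: the queried Web W is given as two plain dicts, `adoc`
    (URI -> document name) and `data` (document name -> set of triples).
    """

    def __init__(self, adoc: Dict[str, str], data: Dict[str, Set[Triple]]):
        self._adoc = adoc
        self._data = data
        self._docs: Set[str] = set()

    def add_by_lookup(self, uris: Iterable[str]) -> None:
        """Add the documents obtained by looking up the given URIs.

        Documents are only added, never removed, so D grows monotonically.
        URIs for which the toy Web has no document are skipped.
        """
        for uri in uris:
            doc = self._adoc.get(uri)
            if doc is not None:
                self._docs.add(doc)

    def documents(self) -> FrozenSet[str]:
        """Return the names of all documents currently in D."""
        return frozenset(self._docs)

    def all_data(self) -> FrozenSet[Triple]:
        """Return an immutable snapshot of all triples currently in D."""
        return frozenset(t for doc in self._docs for t in self._data[doc])
```

Because `all_data` returns a frozen copy, an iterator that works on a snapshot is not affected by expansions that other iterators (or the iterator itself) trigger afterwards.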

To initialize D at the beginning of an iterator-based query execution, we extend the Open function of root iterator I0. Listing 7.3 specifies the adjusted function for the link traversing version of I0. Functions GetNext and Close of this iterator do not require adjustments; hence, they are the same as in Listing 7.1 (cf. page 131).

Listing 7.3 Open function for the root iterator I0 in our iterator implementation of traversal-based query execution (Functions GetNext and Close for I0 are the same as in Listing 7.1 on page 131).

Require:
    S – a finite set of seed URIs (S ⊂ U)
    W – the queried Web of Linked Data
    D – the currently discovered part of W (note, all iterators have access to D)

FUNCTION Open
 1: D := Dinit(S, W)    // Dinit(S, W) is the S-seed part of W (cf. Definition 6.5, page 119)
 2: ready := true

Listing 7.4 GetNext function for an iterator in our iterator implementation of traversal-based query execution (Functions Open and Close are the same as in Listing 7.2 on page 132).

Require:
    tp – a triple pattern
    Ipred – a predecessor iterator
    W – the queried Web of Linked Data
    D – the currently discovered part of W (all iterators in the pipeline access this D)
    Ωtmp – a set that allows the iterator to keep (precomputed) partial solutions between calls of this GetNext function; Ωtmp is empty initially (cf. Listing 7.2)

FUNCTION GetNext
 1: while Ωtmp = ∅ do
 2:     µinput := Ipred.GetNext    // consume valuation from the input iterator
 3:     if µinput = EndOfFile then
 4:         return EndOfFile
 5:     end if
 6:     tp′ := µinput[tp]
 7:     Gsnap := AllData(D)
 8:     T := { t ∈ Gsnap | t is a matching triple for tp′ }
 9:     Ωtmp := { µinput ∪ µ′ | µ′ is a valuation with dom(µ′) = vars(tp′) and µ′[tp′] ∈ T }
10:     for all t ∈ T do
11:         D := EXP(D, t, W)    // EXP(D, t, W) denotes the t-expansion of D in W
12:     end for                  // (cf. Definition 6.6, page 120)
13: end while
14: µ := an element in Ωtmp
15: Ωtmp := Ωtmp \ {µ}
16: return µ

For the link traversing version of the iterators that consume and report valuations we also extend the functionality of their static counterparts. However, in this case, we have to adjust the GetNext function (Open and Close remain the same as in Listing 7.2 on page 132). Listing 7.4 specifies the adjusted GetNext function. Differences between this GetNext function and the GetNext function for the static case are highlighted in the listing. These differences are twofold:

1. While iterators for the static case compute valuations over a fixed set of RDF triples G, link traversing iterators use the set of all RDF triples in (a snapshot of) D for computing valuations (compare line 9 in Listing 7.2 to line 8 in Listing 7.4).

2. In addition to computing (and reporting) valuations, link traversing iterators also perform the incremental expansion of D that is characteristic for traversal-based query execution (and not necessary in the static case). In particular, a link traversing iterator performs expand operations (per Definition 6.6, page 120) each time it precomputes the next version of its set Ωtmp. For these operations the iterator uses the same set of matching triples that it uses for generating the valuations in Ωtmp (cf. lines 10 to 12 in Listing 7.4).
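The following Python sketch mirrors the two adjusted functions (Open of the root iterator and GetNext of a link traversing iterator). It is an illustration under stated assumptions, not the implementation described in the text: the Web is a toy pair of dictionaries (`adoc`, `data`), a t-expansion is approximated by looking up every term of a matching triple, the root iterator is assumed to report one empty valuation before EndOfFile, and all class and function names as well as the contents of the toy documents (including the URI `productX`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]
END_OF_FILE = None  # hypothetical stand-in for the EndOfFile marker


def match(tp: Triple, t: Triple) -> Optional[Valuation]:
    """Return the valuation that maps tp to t, or None if t does not match."""
    mu: Valuation = {}
    for pattern_term, term in zip(tp, t):
        if pattern_term.startswith("?"):              # variable position
            if mu.setdefault(pattern_term, term) != term:
                return None                           # one variable, two values
        elif pattern_term != term:                    # constant mismatch
            return None
    return mu


class Discovered:
    """D, shared by all iterators of one pipeline (toy dict-based Web)."""

    def __init__(self, adoc: Dict[str, str], data: Dict[str, Set[Triple]]):
        self.adoc, self.data, self.docs = adoc, data, set()

    def expand(self, uris) -> None:
        # Assumption: look up every given term; terms without a document are ignored.
        self.docs |= {self.adoc[u] for u in uris if u in self.adoc}

    def all_data(self) -> List[Triple]:
        return sorted({t for d in self.docs for t in self.data[d]})


class RootIterator:
    def __init__(self, seeds: Set[str], d: Discovered):
        self.seeds, self.d, self.ready = seeds, d, False

    def open(self) -> None:
        self.d.docs = set()
        self.d.expand(self.seeds)      # D := Dinit(S, W)
        self.ready = True

    def get_next(self) -> Optional[Valuation]:
        # Assumed behaviour: one empty valuation, then EndOfFile.
        if self.ready:
            self.ready = False
            return {}
        return END_OF_FILE


class LinkTraversingIterator:
    def __init__(self, tp: Triple, pred, d: Discovered):
        self.tp, self.pred, self.d = tp, pred, d
        self.tmp: List[Valuation] = []              # plays the role of Ωtmp

    def open(self) -> None:
        self.pred.open()
        self.tmp = []

    def get_next(self) -> Optional[Valuation]:
        while not self.tmp:
            mu_input = self.pred.get_next()
            if mu_input is END_OF_FILE:
                return END_OF_FILE
            tp_bound = tuple(mu_input.get(x, x) for x in self.tp)   # µinput[tp]
            snapshot = self.d.all_data()                            # AllData(D)
            matching = [t for t in snapshot if match(tp_bound, t) is not None]
            self.tmp = [{**mu_input, **match(tp_bound, t)} for t in matching]
            for t in matching:                                      # expand D
                self.d.expand(t)
        return self.tmp.pop(0)


# Toy data (hypothetical): only the two producedBy triples are taken from the text.
adoc = {"producer1": "dPr1", "product2": "dp2", "product3": "dp3"}
data = {
    "dPr1": {("product2", "producedBy", "producer1"),
             ("product3", "producedBy", "producer1")},
    "dp2": {("productX", "oldVersionOf", "product2")},
    "dp3": set(),
}
d = Discovered(adoc, data)
i0 = RootIterator({"producer1"}, d)
i1 = LinkTraversingIterator(("?product", "producedBy", "producer1"), i0, d)
i2 = LinkTraversingIterator(("?previous", "oldVersionOf", "?product"), i1, d)

i2.open()
solutions = []
mu = i2.get_next()
while mu is not END_OF_FILE:
    solutions.append(mu)
    mu = i2.get_next()
```

The sketch keeps the order of the listing: valuations are precomputed from the snapshot first, and only then is D expanded with the same set of matching triples.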

Example 7.2. Let QBex,Sex be a CLD(M) query with a set Sex = {producer1} of seed URIs and a BGP Bex = {tp1, tp2} that consists of the following two triple patterns:

tp1 = (?product, producedBy, producer1),
tp2 = (?previous, oldVersionOf, ?product).

For a traversal-based execution of this query over our example Web Wex (cf. Example 2.1, page 18) we assume a physical plan of link traversing iterators I0 to I2 such that I0 is the root iterator and iterators I1 and I2 are responsible for triple patterns tp1 and tp2, respectively. The sequence diagram in Figure 7.2(a) illustrates an execution of this plan; Figure 7.2(b) enumerates all valuations that the iterators report and consume during this execution, as well as all versions of the set Ωtmp precomputed by each iterator.

In contrast to the sequence diagram that illustrates our example execution for the static case (cf. Figure 7.1(a), page 133), the diagram in Figure 7.2(a) contains an additional lifeline. This lifeline represents a (hypothetical) component that manages the currently discovered subweb D of the queried Web of Linked Data. We emphasize that we do not assume (or require) an explicit existence of such a management component in any traversal-based query execution system. Instead, the purpose of this lifeline is to illustrate the points at which iterators attempt to expand D.

The example execution begins with the initialization of D by root iterator I0. Hence, when iterator I1 executes its GetNext function for the first time, D consists of a single LD document, namely dPr1 = adocex(producer1). The data in this document contains two RDF triples that match triple pattern tp1 = tp(I1). We denote these triples by t(1,1) and t(1,2) (cf. Figure 7.2(b)). Based on these matching triples, iterator I1 precomputes two valuations: µ(1,1) = {?product → product2} and µ(1,2) = {?product → product3}. Instead of immediately reporting one of these valuations to its successor I2 (as it would happen in the static case), iterator I1 first uses the two matching triples to expand D. As a result, D consists of the LD documents dPr1, dp2 = adocex(product2), and dp3 = adocex(product3) when I1 reports µ(1,1) to I2.

Based on a matching triple in dataex(dp2), iterator I2 now precomputes valuation µ(2,1) as an augmentation of µ(1,1) (cf. Figure 7.2). Thus, I2 benefits from the previous expansion of D that led to the discovery of LD document dp2. The execution proceeds as illustrated in Figure 7.2(a). □

As can be seen in the example, calling the GetNext function of a link traversing iterator may have the side effect of expanding D (as desired for an implementation of traversal-based query execution). Since all iterators share the same data structure for D, the next iterator that computes valuations may benefit immediately from this expansion (and from all previous expansions). For instance, in the previous example, the expand operations performed by iterator I1 enable iterator I2 to compute valuations.
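To make this side effect tangible, the following Python sketch runs the same two triple patterns twice over a toy Web: once with expansion switched off (D stays at the seed document) and once with expansion switched on. This is a simplified illustration and not the iterator implementation from the listings: it uses Python generators instead of Open/GetNext/Close, it approximates a t-expansion by looking up every term of a matching triple, and the function names, the `traverse` flag, the document contents, and the URI `productX` are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]


def match(tp: Triple, t: Triple) -> Optional[Valuation]:
    """Return the valuation that maps tp to t, or None if t does not match."""
    mu: Valuation = {}
    for pattern_term, term in zip(tp, t):
        if pattern_term.startswith("?"):
            if mu.setdefault(pattern_term, term) != term:
                return None
        elif pattern_term != term:
            return None
    return mu


def evaluate(bgp: List[Triple], seeds: Set[str], adoc: Dict[str, str],
             data: Dict[str, Set[Triple]], traverse: bool = True):
    """Evaluate the ordered BGP; return (solutions, discovered documents)."""
    docs = {adoc[u] for u in seeds if u in adoc}    # D, shared by all stages

    def stage(tp: Triple, inputs: Iterator[Valuation]) -> Iterator[Valuation]:
        for mu_in in inputs:
            tp_bound = tuple(mu_in.get(x, x) for x in tp)
            # Snapshot of all triples in D at this point in time.
            snapshot = [t for d in sorted(docs) for t in sorted(data[d])]
            found = [(t, match(tp_bound, t)) for t in snapshot]
            found = [(t, mu) for t, mu in found if mu is not None]
            if traverse:
                # Side effect: expand D before any valuation is reported.
                for t, _ in found:
                    docs.update(adoc[u] for u in t if u in adoc)
            for _, mu in found:
                yield {**mu_in, **mu}

    pipeline: Iterator[Valuation] = iter([{}])       # the empty valuation
    for tp in bgp:
        pipeline = stage(tp, pipeline)
    return list(pipeline), docs


# Toy data (hypothetical): only the two producedBy triples are taken from the text.
adoc = {"producer1": "dPr1", "product2": "dp2", "product3": "dp3"}
data = {
    "dPr1": {("product2", "producedBy", "producer1"),
             ("product3", "producedBy", "producer1")},
    "dp2": {("productX", "oldVersionOf", "product2")},
    "dp3": set(),
}
bgp = [("?product", "producedBy", "producer1"),
       ("?previous", "oldVersionOf", "?product")]

without_expansion, docs_static = evaluate(bgp, {"producer1"}, adoc, data, traverse=False)
with_expansion, docs_traversed = evaluate(bgp, {"producer1"}, adoc, data, traverse=True)
```

Under these assumptions the run without expansion finds no solution, because the only triple that matches the second pattern lives in a document that is never discovered; the run with expansion finds it because the first stage has already added that document to the shared D.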
