Iterators for Traversal-Based Query Execution

7.1.2. Iterators for Traversal-Based Query Execution

We now adapt the iterators to define an implementation approach for our traversal-based query execution strategy. Hence, this approach focuses on executing C_LD(M) queries over a Web of Linked Data. As in the static case discussed before, a logical (query execution) plan specifies an order over the BGP of the C_LD(M) query that has to be executed, and a corresponding physical plan is a pipeline of iterators. However, to implement a traversal-based query execution we require a different kind of iterator. In this section we introduce these iterators, which we call link traversing iterators.

All link traversing iterators in a pipelined execution plan share a data structure that represents the (currently) discovered subweb of the queried Web of Linked Data. As in the abstract procedure tbExec of our query execution model (cf. Section 6.3.7, page 125f), we denote this discovered subweb by D. During query execution, D grows monotonically.

To initialize D at the beginning of an iterator-based query execution, we extend the Open function of root iterator I₀. Listing 7.3 specifies the adjusted function.

Listing 7.3 Open function for the root iterator I₀ in our iterator implementation of traversal-based query execution (functions GetNext and Close for I₀ are the same as in Listing 7.1 on page 131).

Require:
    S – a finite set of seed URIs (S ⊂ U)
    W – the queried Web of Linked Data
    D – the currently discovered part of W (note, all iterators have access to D)

FUNCTION Open
    1: D := D_init(S, W)    // D_init(S, W) is the S-seed part of W (cf. Definition 6.5, page 119)
    2: ready := true
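To make the initialization step concrete, the following Python sketch models the shared structure D and the adjusted Open function of Listing 7.3. It is an illustration under simplifying assumptions, not part of the formal definitions: the Web W is modeled as a dictionary from URIs to documents, a document is reduced to its set of RDF triples, and the names Web, d_init, all_data, and RootIterator, as well as the sample data, are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Document = FrozenSet[Triple]      # assumption: a document is just its set of triples
Web = Dict[str, Document]         # assumption: W as a lookup table URI -> document


def d_init(seeds: Iterable[str], web: Web) -> Set[Document]:
    """S-seed part of W: the documents obtained by looking up the seed URIs."""
    return {web[uri] for uri in seeds if uri in web}


def all_data(d: Set[Document]) -> Set[Triple]:
    """AllData(D): union of the triples of all discovered documents."""
    return set().union(*d)


class RootIterator:
    """Root iterator I0; only Open differs from the static case."""

    def __init__(self, seeds: Set[str], web: Web, d: Set[Document]) -> None:
        self.seeds, self.web = seeds, web
        self.d = d                # shared with every other iterator in the pipeline
        self.ready = False

    def open(self) -> None:
        # Line 1 of Listing 7.3; update in place so all iterators see the change.
        self.d.update(d_init(self.seeds, self.web))
        # Line 2 of Listing 7.3.
        self.ready = True


# Hypothetical sample data: one seed resolves to a document, the other does not.
web: Web = {"producer1": frozenset({("product2", "producedBy", "producer1")})}
shared_d: Set[Document] = set()
root = RootIterator({"producer1", "notAnLDDocument"}, web, shared_d)
root.open()
```

Updating the set in place, rather than rebinding it, is what lets every iterator that holds a reference to D observe the initialization.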

Listing 7.4 GetNext function for an iterator in our iterator implementation of traversal-based query execution (functions Open and Close are the same as in Listing 7.2 on page 132).

Require:
    tp – a triple pattern
    I_pred – a predecessor iterator
    W – the queried Web of Linked Data
    D – the currently discovered part of W (all iterators in the pipeline access this D)
    Ω_tmp – a set that allows the iterator to keep (precomputed) partial solutions between calls of this GetNext function; Ω_tmp is empty initially (cf. Listing 7.2)

FUNCTION GetNext
     1: while Ω_tmp = ∅ do
     2:     µ_input := I_pred.GetNext    // consume valuation from the input iterator
     3:     if µ_input = EndOfFile then
     4:         return EndOfFile
     5:     end if
     6:     tp′ := µ_input[tp]
     7:     G_snap := AllData(D)
     8:     T := { t ∈ G_snap | t is a matching triple for tp′ }
     9:     Ω_tmp := { µ_input ∪ µ′ | µ′ is a valuation with dom(µ′) = vars(tp′) and µ′[tp′] ∈ T }
    10:     for all t ∈ T do
    11:         D := EXP(D, t, W)    // EXP(D, t, W) denotes the t-expansion of D in W
    12:     end for                  // (cf. Definition 6.6, page 120)
    13: end while
    14: µ := an element in Ω_tmp
    15: Ω_tmp := Ω_tmp \ {µ}
    16: return µ

For the link traversing version of I₀, functions GetNext and Close do not require adjustments; hence, they are the same as in Listing 7.1 (cf. page 131).

For the link traversing version of the iterators that consume and report valuations, we also extend the functionality of their static counterparts. In this case, however, we have to adjust the GetNext function (Open and Close remain the same as in Listing 7.2 on page 132). Listing 7.4 specifies the adjusted GetNext function. Differences between this GetNext function and the GetNext function for the static case are highlighted in the listing. These differences are twofold:

1. While iterators for the static case compute valuations over a fixed set of RDF triples G, link traversing iterators use the set of all RDF triples in (a snapshot of) D for computing valuations (compare line 9 in Listing 7.2 to line 8 in Listing 7.4).

2. In addition to computing (and reporting) valuations, link traversing iterators also perform the incremental expansion of D that is characteristic for traversal-based query execution (and not necessary in the static case). In particular, a link traversing iterator performs expand operations (per Definition 6.6, page 120) each time it precomputes the next version of its set Ω_tmp. For these operations the iterator uses the same set of matching triples that it uses for generating the valuations in Ω_tmp (cf. lines 10 to 12 in Listing 7.4).
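The two differences can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this work: W is modeled as a dictionary from URIs to triple sets, EndOfFile is represented by None, the t-expansion is assumed to look up every term of a matching triple, and the behaviour of Open (Listing 7.2) and of the root iterator (Listing 7.1) is assumed rather than taken from those listings. All class and function names, the triple (product1, oldVersionOf, product2), and the content of the product3 document are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]
Document = FrozenSet[Triple]
Web = Dict[str, Document]          # assumption: W as a lookup table URI -> document


def match(tp: Triple, t: Triple) -> Optional[Valuation]:
    """Return mu' with dom(mu') = vars(tp) and mu'[tp] = t, or None if t does not match."""
    mu: Valuation = {}
    for p, v in zip(tp, t):
        if p.startswith("?"):                  # variables are written as "?name"
            if mu.setdefault(p, v) != v:       # same variable must bind the same term
                return None
        elif p != v:
            return None
    return mu


class Root:
    """Root iterator I0 (assumed: one empty valuation, then EndOfFile)."""

    def __init__(self, seeds: Set[str], web: Web, d: Set[Document]) -> None:
        self.seeds, self.web, self.d, self.ready = seeds, web, d, False

    def open(self) -> None:
        self.d.update(self.web[u] for u in self.seeds if u in self.web)
        self.ready = True

    def get_next(self) -> Optional[Valuation]:
        if self.ready:
            self.ready = False
            return {}
        return None


class LinkTraversingIterator:
    def __init__(self, tp: Triple, pred, web: Web, d: Set[Document]) -> None:
        self.tp, self.pred, self.web, self.d = tp, pred, web, d
        self.tmp: List[Valuation] = []         # Omega_tmp

    def open(self) -> None:
        self.pred.open()                       # assumed behaviour of Listing 7.2

    def get_next(self) -> Optional[Valuation]:
        while not self.tmp:                                        # line 1
            mu_in = self.pred.get_next()                           # line 2
            if mu_in is None:                                      # lines 3-5
                return None
            tp1 = tuple(mu_in.get(x, x) for x in self.tp)          # line 6
            g_snap = set().union(*self.d)                          # line 7: snapshot of D
            hits = [(t, m) for t in sorted(g_snap)
                    if (m := match(tp1, t)) is not None]           # line 8
            self.tmp = [{**mu_in, **m} for _, m in hits]           # line 9
            for t, _ in hits:                                      # lines 10-12
                self.d.update(self.web[u] for u in t if u in self.web)
        return self.tmp.pop(0)                                     # lines 14-16


web: Web = {
    "producer1": frozenset({("product2", "producedBy", "producer1"),
                            ("product3", "producedBy", "producer1")}),
    "product2": frozenset({("product1", "oldVersionOf", "product2")}),
    "product3": frozenset({("product3", "producedBy", "producer1")}),
}
d: Set[Document] = set()
i1 = LinkTraversingIterator(("?product", "producedBy", "producer1"),
                            Root({"producer1"}, web, d), web, d)
i2 = LinkTraversingIterator(("?previous", "oldVersionOf", "?product"), i1, web, d)
i2.open()
results = []
while (mu := i2.get_next()) is not None:
    results.append(mu)
```

In this sketch the second iterator only finds a match because the first iterator expanded the shared set d before reporting its first valuation; evaluating the same pipeline over the seed document alone would yield no solution.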

Example 7.2. Let Q^(B_ex,S_ex) be a C_LD(M) query with a set S_ex = {producer1} of seed URIs and a BGP B_ex = {tp₁, tp₂} that consists of the following two triple patterns:

tp₁ = (?product, producedBy, producer1),
tp₂ = (?previous, oldVersionOf, ?product).

For a traversal-based execution of this query over our example Web W_ex (cf. Example 2.1, page 18) we assume a physical plan of link traversing iterators I₀ to I₂ such that I₀ is the root iterator and iterators I₁ and I₂ are responsible for triple patterns tp₁ and tp₂, respectively. The sequence diagram in Figure 7.2(a) illustrates an execution of this plan; Figure 7.2(b) enumerates all valuations that the iterators report and consume during this execution, as well as all versions of the set Ω_tmp precomputed by each iterator.

In contrast to the sequence diagram that illustrates our example execution for the static case (cf. Figure 7.1(a), page 133), the diagram in Figure 7.2(a) contains an additional lifeline. This lifeline represents a (hypothetical) component that manages the currently discovered subweb D of the queried Web of Linked Data. We emphasize that we do not assume (or require) an explicit existence of such a management component in any traversal-based query execution system. Instead, the purpose of this lifeline is to illustrate the points at which iterators attempt to expand D.

The example execution begins with the initialization of D by root iterator I₀. Hence, when iterator I₁ executes its GetNext function for the first time, D consists of a single LD document, namely d_Pr1 = adoc_ex(producer1). The data in this document contains two RDF triples that match triple pattern tp₁ = tp(I₁). We denote these triples by t_(1,1) and t_(1,2) (cf. Figure 7.2(b)). Based on these matching triples, iterator I₁ precomputes two valuations: µ_(1,1) = {?product → product2} and µ_(1,2) = {?product → product3}. Instead of immediately reporting one of these valuations to its successor I₂ (as would happen in the static case), iterator I₁ first uses the two matching triples to expand D. As a result, D consists of the LD documents d_Pr1, d_p2 = adoc_ex(product2), and d_p3 = adoc_ex(product3) when I₁ reports µ_(1,1) to I₂.

Based on a matching triple in data_ex(d_p2), iterator I₂ now precomputes valuation µ_(2,1) as an augmentation of µ_(1,1) (cf. Figure 7.2). Thus, I₂ benefits from the previous expansion of D that led to the discovery of LD document d_p2. The execution proceeds as illustrated in Figure 7.2(a). □

As can be seen in the example, calling the GetNext function of a link traversing iterator may have the side effect of expanding D (as desired for an implementation of traversal-based query execution). Since all iterators share the same data structure for D, the next iterator that computes valuations may benefit immediately from this expansion (and from all previous expansions). For instance, in the previous example, the expand operations performed by iterator I₁ enable iterator I₂ to compute valuations.

In document: Querying a Web of Linked Data (pages 146-149)