
Result Containment and Infiniteness

From the document Querying a Web of Linked Data (pages 76-81)


4.2. Result Containment and Infiniteness

Definition 4.4 defines precisely what the sound and complete result of any SPARQL$_{\text{LD(R)}}$ query over any Web of Linked Data $W$ is. However, in contrast to SPARQL$_{\text{LD}}$ (as discussed in Chapter 3), there is no guarantee that such a (complete) SPARQL$_{\text{LD(R)}}$ query result is complete w.r.t. all data in $W$ since the corresponding $(S, c, P)$-reachable subweb of $W$ may not cover $W$ as a whole. We emphasize that such an incomplete coverage is even possible for the reachability criterion $c_{\text{All}}$ because the link graph of $W$ may not be connected; therefore, the $c_{\text{All}}$-semantics differs from the full-Web semantics.

The following proposition relates the result of any SPARQL$_{\text{LD(R)}}$ query to the result of its SPARQL$_{\text{LD}}$ counterpart.

Proposition 4.1. Let $Q_c^{P,S}$ be a SPARQL$_{\text{LD(R)}}$ query; let $Q^P$ be the SPARQL$_{\text{LD}}$ query that uses the same SPARQL expression $P$ as used by $Q_c^{P,S}$; let $W$ be a Web of Linked Data. Then, the following two properties hold:

1. $Q_c^{P,S}(W) = Q^P(R)$ with $R$ being the $(S, c, P)$-reachable subweb of $W$.

2. If $Q^P$ is monotonic, then $Q_c^{P,S}(W) \subseteq Q^P(W)$.

Proof. We first prove Property 1: By Definition 4.4, $Q_c^{P,S}(W) = [\![P]\!]_{\mathrm{AllData}(R)}$ (cf. page 63) and, by Definition 3.1, $Q^P(R) = [\![P]\!]_{\mathrm{AllData}(R)}$ (cf. page 42). Hence, we have $Q_c^{P,S}(W) = Q^P(R)$, as stated.

We now focus on Property 2: Suppose SPARQL$_{\text{LD}}$ query $Q^P$ is monotonic. By Definition 4.3 (cf. page 63), $R$ is an induced subweb of $W$. Therefore, by Proposition 2.1 (cf. page 21), $\mathrm{AllData}(R) \subseteq \mathrm{AllData}(W)$ holds. Then, $Q^P(R) \subseteq Q^P(W)$ because $Q^P$ is monotonic. Using the previously shown Property 1, we conclude $Q_c^{P,S}(W) \subseteq Q^P(W)$.
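Proposition 4.1 can be illustrated on a small finite Web. The following Python sketch is only an illustration under simplifying assumptions: the SPARQL expression is restricted to a single triple pattern (which is monotonic), all RDF terms are plain strings, and the names `match`, `evaluate`, `reachable_docs`, the documents `dA`, `dB`, `dC`, and the URIs are hypothetical. The link graph of the toy Web is not connected, so the reachability-based result is a proper subset of the full-Web result.

```python
def match(t, tp):
    """Return the solution mapping that maps pattern tp onto triple t, or None."""
    mu = {}
    for term, pat in zip(t, tp):
        if pat.startswith("?"):               # variable position
            if mu.setdefault(pat, term) != term:
                return None
        elif pat != term:                     # constant position must be equal
            return None
    return frozenset(mu.items())


def evaluate(tp, triples):
    """Evaluate a single triple pattern over a set of triples."""
    return {mu for mu in (match(t, tp) for t in triples) if mu is not None}


def all_data(data, docs):
    """Union of the triples of the given documents (AllData in the text)."""
    return set().union(*(data[d] for d in docs))


def reachable_docs(data, adoc, seeds, c, tp):
    """Documents of the (S, c, P)-reachable subweb for a finite toy Web."""
    reached = {adoc[u] for u in seeds if u in adoc}       # Case 1: seed documents
    frontier = list(reached)
    while frontier:
        d = frontier.pop()
        for t in data[d]:
            for u in t:
                target = adoc.get(u)                      # adoc is a partial mapping
                if target is not None and target not in reached and c(t, u, tp):
                    reached.add(target)                   # Case 2: follow a data link
                    frontier.append(target)
    return reached


c_all = lambda t, u, tp: True                             # assumed reading of c_All

# Toy Web (hypothetical): dC is not linked from dA or dB.
data = {
    "dA": {("a", "knows", "b")},
    "dB": {("b", "knows", "c")},
    "dC": {("x", "knows", "y")},
}
adoc = {"a": "dA", "b": "dB", "x": "dC"}
tp = ("?s", "knows", "?o")

R = reachable_docs(data, adoc, {"a"}, c_all, tp)
q_reach = evaluate(tp, all_data(data, R))                 # Q_c^{P,S}(W) = Q^P(R)
q_full = evaluate(tp, all_data(data, data.keys()))        # Q^P(W)
```

Here `q_reach` is computed exactly as Property 1 prescribes (evaluate over the data of the reachable subweb), and the comparison with `q_full` shows the containment of Property 2 for this one instance.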

Since the result of any SPARQL$_{\text{LD}}$ query over a finite Web of Linked Data is finite, we may use Proposition 4.1 (Property 1) to show the same for SPARQL$_{\text{LD(R)}}$ queries:

Corollary 4.1. The result of any SPARQL$_{\text{LD(R)}}$ query $Q_c^{P,S}$ over a finite Web of Linked Data $W$ is finite, and so is the $(S, c, P)$-reachable subweb of $W$.

Proof. Let $W = (D, \mathit{data}, \mathit{adoc})$, and let $R = (D_R, \mathit{data}_R, \mathit{adoc}_R)$ be the $(S, c, P)$-reachable subweb of $W$. We first show finiteness for $R$: By Definition 4.3, $R$ is an induced subweb of $W$ (cf. page 63) and, thus, by Definition 2.4, we have $D_R \subseteq D$ (cf. page 21). Then, using the finiteness of $D$, $D_R$ is finite and hence $R$ is finite.

Given the finiteness of $R$, the finiteness of $Q_c^{P,S}(W)$ follows directly from (i) Proposition 4.1, Property 1, and (ii) Proposition 3.5 (which shows that the result of any SPARQL$_{\text{LD}}$ query over a finite Web of Linked Data is finite; cf. page 57).

Corollary 4.1 focuses on a finite Web of Linked Data. Now, we study the implications of querying an infinite Web of Linked Data (using reachability-based query semantics).

We first take a look at some example queries:

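Example 4.3. For the example we assume the same infinite Web of Linked Data $W_{\text{inf}}$ as used in Example 3.4 (cf. page 58). We recall that $W_{\text{inf}} = (D_{\text{inf}}, \mathit{data}_{\text{inf}}, \mathit{adoc}_{\text{inf}})$ contains an LD document $d_i \in D_{\text{inf}}$ for every integer $i \in \mathbb{Z}$, that is, $\mathit{adoc}_{\text{inf}}(\text{no}_i) = d_i$ where URI $\text{no}_i \in \mathcal{U}$ identifies integer $i$. The data of each of these documents consists of two RDF triples that refer to the predecessor and to the successor of the corresponding integer:

$$\mathit{data}_{\text{inf}}(d_i) = \{ (\text{no}_i, \text{pred}, \text{no}_{i-1}), (\text{no}_i, \text{succ}, \text{no}_{i+1}) \} \quad \text{for all } i \in \mathbb{Z}.$$

Furthermore, as a basis for two SPARQL$_{\text{LD}}$ queries, Example 3.4 uses two triple patterns: $tp_1 = (\text{no}_0, \text{succ}, ?v)$ and $tp_2 = (?x, \text{succ}, ?y)$.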

We now revisit this example in the context of reachability-based query semantics. We consider the aforementioned reachability criteria $c_{\text{All}}$, $c_{\text{Match}}$, and $c_{\text{None}}$ (cf. page 62 in Section 4.1) and use URI $\text{no}_0$ as seed URI; i.e., $S = \{\text{no}_0\}$.

First, we focus on triple pattern $tp_1$: If we assume $\mathit{adoc}_{\text{inf}}(\text{pred}) = \mathit{adoc}_{\text{inf}}(\text{succ}) = \bot$, then the $(S, c_{\text{All}}, tp_1)$-reachable subweb of $W_{\text{inf}}$ consists of the LD documents for all integers and, thus, is infinite. In contrast, the corresponding reachable subwebs for $c_{\text{Match}}$ and $c_{\text{None}}$ are finite: The $(S, c_{\text{Match}}, tp_1)$-reachable subweb of $W_{\text{inf}}$ consists of LD documents $d_0$ and $d_1$, whereas the $(S, c_{\text{None}}, tp_1)$-reachable subweb of $W_{\text{inf}}$ consists only of $d_0$. Irrespective of these differences, the query result is the same in all three cases:

$$Q_{c_{\text{All}}}^{tp_1,S}(W_{\text{inf}}) = Q_{c_{\text{Match}}}^{tp_1,S}(W_{\text{inf}}) = Q_{c_{\text{None}}}^{tp_1,S}(W_{\text{inf}}) = \{ \{ ?v \to \text{no}_1 \} \}.$$
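The finite cases for $tp_1$ can be reproduced with a short Python sketch. It is a sketch under stated assumptions: documents $d_i$ are represented by the integer $i$, URIs $\text{no}_i$ by strings such as `"no-1"`, and the function names `data_inf`, `adoc_inf`, `reachable`, and `result` are hypothetical. The $c_{\text{All}}$ case is deliberately not executed because its traversal would not terminate on this infinite Web.

```python
def data_inf(i):
    """Triples of document d_i: links to predecessor and successor of i."""
    return {(f"no{i}", "pred", f"no{i - 1}"), (f"no{i}", "succ", f"no{i + 1}")}


def adoc_inf(u):
    """Partial mapping from URIs to documents; pred and succ map to no document."""
    return int(u[2:]) if u.startswith("no") else None


def match(t, tp):
    """Return the solution mapping that maps pattern tp onto triple t, or None."""
    mu = {}
    for term, pat in zip(t, tp):
        if pat.startswith("?"):
            if mu.setdefault(pat, term) != term:
                return None
        elif pat != term:
            return None
    return frozenset(mu.items())


def reachable(seeds, c, tp):
    """Traverse data links accepted by criterion c, starting from the seed URIs."""
    reached = {adoc_inf(u) for u in seeds} - {None}
    frontier = list(reached)
    while frontier:
        i = frontier.pop()
        for t in data_inf(i):
            for u in t:
                j = adoc_inf(u)
                if j is not None and j not in reached and c(t, u, tp):
                    reached.add(j)
                    frontier.append(j)
    return reached


def result(docs, tp):
    """Evaluate the triple pattern over all data of the given documents."""
    triples = set().union(*(data_inf(i) for i in docs))
    return {mu for mu in (match(t, tp) for t in triples) if mu is not None}


c_match = lambda t, u, tp: match(t, tp) is not None   # assumed reading of c_Match
c_none = lambda t, u, tp: False                       # c_None always returns false

tp1 = ("no0", "succ", "?v")
r_match = reachable({"no0"}, c_match, tp1)   # documents d_0 and d_1
r_none = reachable({"no0"}, c_none, tp1)     # document d_0 only
```

Although `r_match` and `r_none` differ, `result(r_match, tp1)` and `result(r_none, tp1)` both contain the single solution mapping that binds `?v` to `no1`, in line with the equation above.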

We now consider triple pattern $tp_2$: Under $c_{\text{None}}$-semantics the query result is the same as in the case of $tp_1$ because the $(S, c_{\text{None}}, tp_2)$-reachable subweb of $W_{\text{inf}}$ consists only of LD document $d_0$ (as before). For $c_{\text{All}}$ and $c_{\text{Match}}$ the reachable subwebs are infinite but different: The $(S, c_{\text{All}}, tp_2)$-reachable subweb of $W_{\text{inf}}$ consists, again, of the LD documents for all integers, whereas the $(S, c_{\text{Match}}, tp_2)$-reachable subweb of $W_{\text{inf}}$ consists of the LD documents $d_0, d_1, d_2, \ldots$. The query results for both criteria are also infinite and different from each other:

$$Q_{c_{\text{All}}}^{tp_2,S}(W_{\text{inf}}) = \{ \ldots, \{ ?x \to \text{no}_{-1}, ?y \to \text{no}_0 \}, \{ ?x \to \text{no}_0, ?y \to \text{no}_1 \}, \ldots \}$$

and

$$Q_{c_{\text{Match}}}^{tp_2,S}(W_{\text{inf}}) = \{ \{ ?x \to \text{no}_0, ?y \to \text{no}_1 \}, \{ ?x \to \text{no}_1, ?y \to \text{no}_2 \}, \ldots \} \subset Q_{c_{\text{All}}}^{tp_2,S}(W_{\text{inf}}).$$ □

The example illustrates that, for the case of an infinite Web of Linked Data, the results of SPARQL$_{\text{LD(R)}}$ queries may be either finite or infinite. In Example 3.4 we found the same heterogeneity for SPARQL$_{\text{LD}}$ queries (cf. page 58). However, for SPARQL$_{\text{LD(R)}}$ we may identify dependencies between query results and the corresponding reachable subwebs of the queried Web:

Proposition 4.2. Let $S \subseteq \mathcal{U}$ be a finite set of URIs; let $c$ be a reachability criterion; let $P$ be a SPARQL expression; let $W$ be a (potentially infinite) Web of Linked Data, and let $R$ denote the $(S, c, P)$-reachable subweb of $W$. Then, the following properties hold:

1. If $R$ is finite, then $Q_c^{P,S}(W)$ is finite.

2. If $Q_c^{P,S}(W)$ is infinite, then $R$ is infinite.

3. If $c$ is $c_{\text{None}}$, then $R$ is finite, and so is $Q_{c_{\text{None}}}^{P,S}(W)$.

Proof. Property 1: By Proposition 4.1 (Property 1), it holds that $Q_c^{P,S}(W) = Q^P(R)$ where $Q^P$ is the SPARQL$_{\text{LD}}$ query that uses the same SPARQL expression as used by $Q_c^{P,S}$. Since the result of any SPARQL$_{\text{LD}}$ query over a finite Web of Linked Data is finite (as shown in Proposition 3.5, page 57), Property 1 follows immediately.

Property 2: Suppose $Q_c^{P,S}(W)$ is infinite. We use proof by contradiction, that is, we assume $R$ is finite. Then, by Property 1, $Q_c^{P,S}(W)$ is also finite, a contradiction. Hence, $R$ must be infinite.

Property 3: Let $W = (D, \mathit{data}, \mathit{adoc})$ and $R = (D_R, \mathit{data}_R, \mathit{adoc}_R)$. Suppose $c$ is $c_{\text{None}}$. Since $c_{\text{None}}$ always returns false, it is easily verified that there does not exist an LD document $d \in D$ that satisfies Case 2 in Definition 4.2 (cf. page 62). Hence, $D_R$ contains the seed documents only, that is, $D_R = \{ d \in D \mid u \in S \text{ and } \mathit{adoc}(u) = d \}$ (cf. Case 1 in Definition 4.2). Since $S$ is finite, $D_R$ is finite, and so is $R$. Then, the finiteness of $Q_{c_{\text{None}}}^{P,S}(W)$ follows by Property 1.

Proposition 4.2 provides valuable insight into the dependencies between the (in)finiteness of reachable subwebs of an infinite Web and the (in)finiteness of query results. In practice, however, we are primarily interested in answering the following questions: Does the execution of a given SPARQL$_{\text{LD(R)}}$ query reach an infinite number of LD documents? Do we have to expect an infinite query result? We formalize these questions as the following LD decision problems and discuss them in the remainder of this section.

LD Problem: FinitenessReachablePart

Web Input: a (potentially infinite) Web of Linked Data $W$

Ordin. Input: a SPARQL$_{\text{LD(R)}}$ query $Q_c^{P,S}$

Question: Is the $(S, c, P)$-reachable subweb of $W$ finite?

LD Problem: Finiteness(SPARQL$_{\text{LD(R)}}$)

Web Input: a (potentially infinite) Web of Linked Data $W$

Ordin. Input: a SPARQL$_{\text{LD(R)}}$ query $Q_c^{P,S}$

Question: Is query result $Q_c^{P,S}(W)$ finite?
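To make the first question more concrete, the following Python sketch shows a budgeted traversal that can confirm finiteness of a reachable subweb but can never confirm infiniteness. It is a hedged illustration, not a decision procedure: the function name `reachable_part_finite`, the `budget` parameter, and the encoding of documents as integers are assumptions made for this sketch, and the Web used is the integer Web $W_{\text{inf}}$ of Example 4.3.

```python
def match(t, tp):
    """Return the solution mapping that maps pattern tp onto triple t, or None."""
    mu = {}
    for term, pat in zip(t, tp):
        if pat.startswith("?"):
            if mu.setdefault(pat, term) != term:
                return None
        elif pat != term:
            return None
    return frozenset(mu.items())


def reachable_part_finite(data, adoc, seeds, c, tp, budget):
    """Return True if the traversal saturates within `budget` documents.

    Return None (no verdict) once more than `budget` documents were reached;
    None does NOT mean that the reachable subweb is infinite.
    """
    reached = {adoc(u) for u in seeds} - {None}
    frontier = list(reached)
    while frontier:
        if len(reached) > budget:
            return None
        d = frontier.pop()
        for t in data(d):
            for u in t:
                target = adoc(u)
                if target is not None and target not in reached and c(t, u, tp):
                    reached.add(target)
                    frontier.append(target)
    return True


# The integer Web of Example 4.3, with document d_i encoded as the integer i.
def data_inf(i):
    return {(f"no{i}", "pred", f"no{i - 1}"), (f"no{i}", "succ", f"no{i + 1}")}


def adoc_inf(u):
    return int(u[2:]) if u.startswith("no") else None


c_all = lambda t, u, tp: True
c_match = lambda t, u, tp: match(t, tp) is not None
c_none = lambda t, u, tp: False

tp1 = ("no0", "succ", "?v")
tp2 = ("?x", "succ", "?y")
seeds = {"no0"}

verdicts = {
    "none/tp2": reachable_part_finite(data_inf, adoc_inf, seeds, c_none, tp2, 100),
    "match/tp1": reachable_part_finite(data_inf, adoc_inf, seeds, c_match, tp1, 100),
    "match/tp2": reachable_part_finite(data_inf, adoc_inf, seeds, c_match, tp2, 100),
    "all/tp1": reachable_part_finite(data_inf, adoc_inf, seeds, c_all, tp1, 100),
}
```

For the two infinite cases the function returns `None` for every budget, however large, which is the practical symptom of the question having no general terminating answer by traversal alone.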

As in the case of Finiteness(SPARQL$_{\text{LD}}$), discussed on page 58, an LD machine can trivially decide Finiteness(SPARQL$_{\text{LD(R)}}$) for unsatisfiable¹ SPARQL$_{\text{LD(R)}}$ queries.

In contrast, the satisfiability property of queries is irrelevant for FinitenessReachablePart: The reachable subweb of a queried Web of Linked Data may be infinite regardless of whether the corresponding SPARQL$_{\text{LD(R)}}$ query is unsatisfiable or satisfiable. Nonetheless, for a particular class of SPARQL$_{\text{LD(R)}}$ queries we can rule out the existence of infinitely large reachable subwebs. This class comprises all queries that use a reachability criterion that, by definition, ensures the finiteness of reachable subwebs in any possible case. We define this property of reachability criteria as follows:

Definition 4.5 (Ensuring Finiteness). A reachability criterion $c$ ensures finiteness if, for any Web of Linked Data $W$, any (finite) set $S \subseteq \mathcal{U}$ of URIs, and any SPARQL expression $P$, the $(S, c, P)$-reachable subweb of $W$ is finite. □

From the reachability criteria discussed so far, only $c_{\text{None}}$ ensures finiteness (see Proposition 4.2); this property does not hold for $c_{\text{All}}$ and $c_{\text{Match}}$ (as shown by Example 4.3).

We refer to the next section, in particular, Subsections 4.3.3 and 4.3.4 (cf. page 73ff), for a more comprehensive discussion of reachability criteria that ensure finiteness. However, due to its relevance for Finiteness(SPARQL$_{\text{LD(R)}}$), we emphasize that reachability criteria that ensure finiteness also guarantee finite query results:

Corollary 4.2. Let $c$ be a reachability criterion that ensures finiteness. For any SPARQL$_{\text{LD(R)}}$ query $Q_c^{P,S}$ under $c$-semantics and any Web of Linked Data $W$, query result $Q_c^{P,S}(W)$ is finite.

Proof. The corollary follows readily from Definition 4.5 and Proposition 4.2.

Given Definition 4.5 and Corollary 4.2 we see that, for a SPARQL$_{\text{LD(R)}}$ query whose reachability criterion ensures finiteness, an LD machine can immediately answer the questions posed by FinitenessReachablePart and by Finiteness(SPARQL$_{\text{LD(R)}}$). Thus, for both of these problems we have decision criteria that cover certain classes of queries. In general, however, both problems are undecidable for LD machines.

¹ We shall discuss satisfiability of SPARQL$_{\text{LD(R)}}$ queries in Section 4.4.1 (cf. page 77ff).

Theorem 4.1. FinitenessReachablePart and Finiteness(SPARQL$_{\text{LD(R)}}$) are not LD machine decidable.

Proof. We prove Theorem 4.1 by reducing the halting problem to FinitenessReachablePart and to Finiteness(SPARQL$_{\text{LD(R)}}$). While this proof resembles the proofs of Proposition 3.4 (cf. page 50) and Theorem 3.3 (cf. page 58), we need to use a Web of Linked Data $W_{\text{TMs3}}$ that differs from Webs $W_{\text{TMs}}$ and $W_{\text{TMs2}}$ (used in the aforementioned proofs). Although $W_{\text{TMs3}}$ also describes all possible computations of all Turing machines, this description differs from the descriptions in $W_{\text{TMs}}$ and $W_{\text{TMs2}}$.

We use the same symbols as in the aforementioned proofs: $\mathcal{W}$ denotes the countably infinite set of all words that describe Turing machines. For all $w \in \mathcal{W}$, $M(w)$ denotes the machine described by word $w$; $c^{w,x}$ denotes the computation of machine $M(w)$ on input $x$; URI $u_i^{w,x} \in \mathcal{U}$ identifies the $i$-th step in computation $c^{w,x}$. The countably infinite set of all these identifiers is denoted by $\mathcal{U}_{\text{TMsteps}}$.

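We now define $W_{\text{TMs3}}$ as a Web of Linked Data $(D_{\text{TMs3}}, \mathit{data}_{\text{TMs3}}, \mathit{adoc}_{\text{TMs3}})$ similar to the Web $W_{\text{TMs}}$ used for proving Proposition 3.4: $D_{\text{TMs3}}$ and $\mathit{adoc}_{\text{TMs3}}$ are the same as in $W_{\text{TMs}}$. That is, $D_{\text{TMs3}}$ consists of $|\mathcal{U}_{\text{TMsteps}}|$ different LD documents, each of which corresponds to one of the URIs in $\mathcal{U}_{\text{TMsteps}}$. Mapping $\mathit{adoc}_{\text{TMs3}}$ maps each URI $u_i^{w,x} \in \mathcal{U}_{\text{TMsteps}}$ to the corresponding LD document $d_i^{w,x} \in D_{\text{TMs3}}$. Mapping $\mathit{data}_{\text{TMs3}}$ for $W_{\text{TMs3}}$ is different from the corresponding mapping for $W_{\text{TMs}}$: The set $\mathit{data}_{\text{TMs3}}(d_i^{w,x})$ of RDF triples for an LD document $d_i^{w,x}$ is empty if computation $c^{w,x}$ halts with the $i$-th computation step. Otherwise, $\mathit{data}_{\text{TMs3}}(d_i^{w,x})$ contains a single RDF triple $(u_i^{w,x}, \text{next}, u_{i+1}^{w,x})$ which associates the computation step identified by URI $u_i^{w,x}$ with the next step in $c^{w,x}$ ($\text{next} \in \mathcal{U}$ denotes a URI for this relationship). Formally:

$$\mathit{data}_{\text{TMs3}}(d_i^{w,x}) := \begin{cases} \emptyset & \text{if computation } c^{w,x} \text{ halts with the } i\text{-th step,} \\ \{ (u_i^{w,x}, \text{next}, u_{i+1}^{w,x}) \} & \text{else.} \end{cases}$$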

Mappings $\mathit{adoc}_{\text{TMs3}}$ and $\mathit{data}_{\text{TMs3}}$ are Turing computable (by simulation).
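The idea of computing $\mathit{data}_{\text{TMs3}}$ by simulation can be sketched in Python as follows. This is a toy model and not the construction from the proof: a "machine" is assumed to be a pair of hypothetical functions `(init, step)`, where `step` returns `None` when the machine halts in that step; the URI scheme `u/<w>/<x>/<i>` and the two example machines are likewise invented for illustration. As a further simplifying assumption, the sketch also returns the empty set for step numbers beyond the halting step.

```python
def data_tms3(w, x, i, machines):
    """Triples of the document for the i-th step of machine w on input x.

    Simulates the first i steps; if the computation halts within them,
    the document carries no triples, otherwise one `next` link.
    """
    init, step = machines[w]
    config = init(x)
    for _ in range(i):
        config = step(config)
        if config is None:          # halted within the first i steps
            return set()
    return {(f"u/{w}/{x}/{i}", "next", f"u/{w}/{x}/{i + 1}")}


def path_length(w, x, machines, cap):
    """Follow next-links from step 1; return the number of documents on the
    path, or None if no end was found within `cap` steps (no verdict)."""
    i = 1
    while i <= cap:
        if not data_tms3(w, x, i, machines):
            return i
        i += 1
    return None


# Two hypothetical toy machines standing in for Turing machines.
machines = {
    # Counts its input down and halts in the step taken at 0.
    "countdown": (lambda x: x, lambda n: None if n == 0 else n - 1),
    # Increments forever and never halts.
    "loop": (lambda x: 0, lambda n: n + 1),
}

halting_path = path_length("countdown", 2, machines, cap=50)
looping_path = path_length("loop", 0, machines, cap=50)
```

In this toy model the halting machine yields a finite chain of documents, while for the non-halting machine the bounded walk ends without a verdict, whatever value of `cap` is chosen.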

For the reduction we use mapping $f$ which is defined as follows: Let $(w, x)$ be an input to the halting problem and let $?a, ?b \in \mathcal{V}$ be two distinct query variables, then $f(w, x) = \big( W_{\text{TMs3}}, Q_{c_{\text{Match}}}^{P_{w,x},S_{w,x}} \big)$ where $S_{w,x} = \{ u_1^{w,x} \}$ and $P_{w,x} = (?a, \text{next}, ?b)$. Given that $c_{\text{Match}}$ and $W_{\text{TMs3}}$ are independent of $(w, x)$, it can be easily seen that $f$ is computable by Turing machines (including LD machines).

Before we present the reduction we highlight a property of $W_{\text{TMs3}}$ that is important for our proof. Any RDF triple of the form $(u_i^{w,x}, \text{next}, u_{i+1}^{w,x})$ establishes a data link from LD document $d_i^{w,x}$ to LD document $d_{i+1}^{w,x}$. Based on such links we may reach all LD documents about all steps in a particular computation of any Turing machine (given the corresponding $w$ and $x$). Hence, for each possible computation $c^{w,x}$ of any Turing machine $M(w)$ we have a (potentially infinite) simple path $(d_1^{w,x}, \ldots, d_i^{w,x}, \ldots)$ in the link graph of $W_{\text{TMs3}}$. Each of these paths is finite if and only if the corresponding computation
