
4.2. Result Containment and Infiniteness

Definition 4.4 defines precisely what the sound and complete result of any SPARQL_LD(R) query over any Web of Linked Data W is. However, in contrast to SPARQL_LD (as discussed in Chapter 3), there is no guarantee that such a (complete) SPARQL_LD(R) query result is complete w.r.t. all data in W since the corresponding (S, c, P)-reachable subweb of W may not cover W as a whole. We emphasize that such an incomplete coverage is even possible for the reachability criterion c_All because the link graph of W may not be connected; therefore, the c_All-semantics differs from the full-Web semantics.

The following proposition relates the result of any SPARQL_LD(R) query to the result of its SPARQL_LD counterpart.

Proposition 4.1. Let Q^P,S_c be a SPARQL_LD(R) query; let Q^P be the SPARQL_LD query that uses the same SPARQL expression P as used by Q^P,S_c; let W be a Web of Linked Data. Then, the following two properties hold:

1. Q^P,S_c(W) = Q^P(R) with R being the (S, c, P)-reachable subweb of W.

2. If Q^P is monotonic, then Q^P,S_c(W) ⊆ Q^P(W).

Proof. We first prove Property 1: By Definition 4.4, Q^P,S_c(W) = [[P]]_AllData(R) (cf. page 63) and, by Definition 3.1, Q^P(R) = [[P]]_AllData(R) (cf. page 42). Hence, we have Q^P,S_c(W) = Q^P(R), as stated.

We now focus on Property 2: Suppose SPARQL_LD query Q^P is monotonic. By Definition 4.3 (cf. page 63), R is an induced subweb of W. Therefore, by Proposition 2.1 (cf. page 21), AllData(R) ⊆ AllData(W) holds. Then, Q^P(R) ⊆ Q^P(W) because Q^P is monotonic. Using the previously shown Property 1, we conclude Q^P,S_c(W) ⊆ Q^P(W).
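Both properties can be checked on a small finite Web. The following Python sketch is illustrative only: the toy Web (DATA, ADOC), the helper names, and the breadth-first reading of reachability (a document is reached if it is a seed document, or if an already reached document contains a triple with a URI that the criterion accepts and that adoc maps to it) are assumptions made for this example, paraphrasing Definition 4.2. The query is a single triple pattern, which is monotonic.

```python
from collections import deque

# Hypothetical toy Web W = (D, data, adoc); all identifiers are illustrative.
DATA = {
    "dA": {("a", "knows", "b")},
    "dB": {("b", "knows", "c")},
    "dC": {("c", "knows", "a")},
    "dX": {("x", "knows", "y")},  # not linked from dA, dB, or dC
}
ADOC = {"a": "dA", "b": "dB", "c": "dC", "x": "dX"}  # partial; other URIs are undefined


def c_all(triple, uri, pattern):
    """Reachability criterion c_All: accept every data link."""
    return True


def reachable_docs(seeds, criterion, pattern):
    """Documents of the (S, c, P)-reachable subweb, computed breadth-first."""
    reached = {ADOC[u] for u in seeds if u in ADOC}
    queue = deque(reached)
    while queue:
        doc = queue.popleft()
        for triple in DATA[doc]:
            for uri in triple:
                target = ADOC.get(uri)
                if (target is not None and target not in reached
                        and criterion(triple, uri, pattern)):
                    reached.add(target)
                    queue.append(target)
    return reached


def all_data(docs):
    """AllData: union of the triples of the given documents."""
    return set().union(*(DATA[d] for d in docs))


def evaluate(pattern, triples):
    """Solutions of one triple pattern, each as a frozenset of (variable, value) pairs."""
    solutions = set()
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            solutions.add(frozenset(binding.items()))
    return solutions


PATTERN = ("?s", "knows", "?o")
SEEDS = {"a"}

R = reachable_docs(SEEDS, c_all, PATTERN)
result_R = evaluate(PATTERN, all_data(R))     # Q^P(R), i.e. Q^P,S_c(W) by Property 1
result_W = evaluate(PATTERN, all_data(DATA))  # Q^P(W)
```

In this toy Web the document dX is not reachable from the seed, so result_R is a proper subset of result_W, in line with Property 2.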

Since the result of any SPARQL_LD query over a finite Web of Linked Data is finite, we may use Proposition 4.1 (Property 1) to show the same for SPARQL_LD(R) queries:

Corollary 4.1. The result of any SPARQL_LD(R) query Q^P,S_c over a finite Web of Linked Data W is finite, and so is the (S, c, P)-reachable subweb of W.

Proof. Let W = (D, data, adoc), and let R = (D_R, data_R, adoc_R) be the (S, c, P)-reachable subweb of W. We first show finiteness for R: By Definition 4.3, R is an induced subweb of W (cf. page 63) and, thus, by Definition 2.4, we have D_R ⊆ D (cf. page 21). Then, using the finiteness of D, D_R is finite and hence R is finite.

Given the finiteness of R, the finiteness of Q^P,S_c(W) follows directly from (i) Proposition 4.1, Property 1, and (ii) Proposition 3.5 (which shows that the result of any SPARQL_LD query over a finite Web of Linked Data is finite; cf. page 57).

Corollary 4.1 focuses on a finite Web of Linked Data. Now, we study the implications of querying an infinite Web of Linked Data (using reachability-based query semantics).

We first take a look at some example queries:

Example 4.3. For the example we assume the same infinite Web of Linked Data W_inf as used in Example 3.4 (cf. page 58). We recall that W_inf = (D_inf, data_inf, adoc_inf) contains an LD document d_i ∈ D_inf for every integer i ∈ Z, that is, adoc_inf(no_i) = d_i where URI no_i ∈ U identifies integer i. The data of each of these documents consists of two RDF triples that refer to the predecessor and to the successor of the corresponding integer: data_inf(d_i) = {(no_i, pred, no_i−1), (no_i, succ, no_i+1)} for all i ∈ Z. Furthermore, as a basis for two SPARQL_LD queries, Example 3.4 uses two triple patterns: tp₁ = (no₀, succ, ?v) and tp₂ = (?x, succ, ?y).

We now revisit this example in the context of reachability-based query semantics. We consider the aforementioned reachability criteria c_All, c_Match, and c_None (cf. page 62 in Section 4.1) and use URI no₀ as seed URI; i.e., S = {no₀}.

First, we focus on triple pattern tp₁: If we assume adoc_inf(pred) = adoc_inf(succ) = ⊥, then the (S, c_All, tp₁)-reachable subweb of W_inf consists of the LD documents for all integers and, thus, is infinite. In contrast, the corresponding reachable subwebs for c_Match and c_None are finite: The (S, c_Match, tp₁)-reachable subweb of W_inf consists of LD documents d₀ and d₁, whereas the (S, c_None, tp₁)-reachable subweb of W_inf consists only of d₀. Irrespective of these differences, the query result is the same in all three cases:

\[ Q^{tp_1,S}_{c_{All}}(W_{inf}) = Q^{tp_1,S}_{c_{Match}}(W_{inf}) = Q^{tp_1,S}_{c_{None}}(W_{inf}) = \{\, \{?v \to no_1\} \,\}. \]
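The tp₁ case can be replayed with a short Python sketch. Everything here is an assumption made for illustration: the string encoding of the URIs no_i, the use of the integer i as the identifier of d_i, the budget parameter, and the reading of c_Match as "the triple matches the pattern" (following Section 4.1). Because W_inf is infinite, the traversal is budgeted; hitting the budget does not prove that a reachable subweb is infinite, it only reports that the traversal was cut off.

```python
from collections import deque


def no(i):
    """Hypothetical string encoding of the URI no_i."""
    return f"no:{i}"


def adoc(uri):
    """adoc_inf: maps no_i to d_i (represented by i); pred and succ are undefined."""
    return int(uri[3:]) if uri.startswith("no:") else None


def data(i):
    """data_inf(d_i): predecessor and successor triple of integer i."""
    return {(no(i), "pred", no(i - 1)), (no(i), "succ", no(i + 1))}


def matches(triple, pattern):
    """Return the variable binding if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def c_all(triple, uri, pattern):
    return True


def c_match(triple, uri, pattern):
    return matches(triple, pattern) is not None


def c_none(triple, uri, pattern):
    return False


def crawl(seeds, criterion, pattern, budget):
    """Return (documents, complete); complete is False if the budget cut the traversal."""
    reached = {adoc(u) for u in seeds if adoc(u) is not None}
    queue = deque(reached)
    while queue:
        doc = queue.popleft()
        for triple in data(doc):
            for uri in triple:
                target = adoc(uri)
                if (target is not None and target not in reached
                        and criterion(triple, uri, pattern)):
                    if len(reached) >= budget:
                        return reached, False
                    reached.add(target)
                    queue.append(target)
    return reached, True


def evaluate(pattern, docs):
    """Solutions of the pattern over the data of the given documents."""
    solutions = set()
    for d in docs:
        for triple in data(d):
            binding = matches(triple, pattern)
            if binding is not None:
                solutions.add(frozenset(binding.items()))
    return solutions


TP1 = (no(0), "succ", "?v")
SEEDS = {no(0)}

docs_none, complete_none = crawl(SEEDS, c_none, TP1, budget=50)
docs_match, complete_match = crawl(SEEDS, c_match, TP1, budget=50)
docs_all, complete_all = crawl(SEEDS, c_all, TP1, budget=50)
```

Under these assumptions the c_None and c_Match traversals terminate with one and two documents, respectively, while the c_All traversal stops only because of the budget. Evaluating TP1 over any of the three document sets yields the single solution that binds ?v to no₁.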

We now consider triple pattern tp₂: Under c_None-semantics the query result is the same as in the case of tp₁ because the (S, c_None, tp₂)-reachable subweb of W_inf consists only of LD document d₀ (as before). For c_All and c_Match the reachable subwebs are infinite but different: The (S, c_All, tp₂)-reachable subweb of W_inf consists, again, of the LD documents for all integers, whereas the (S, c_Match, tp₂)-reachable subweb of W_inf consists of the LD documents d₀, d₁, d₂, ... . The query results for both criteria are also infinite and different from each other:

\[ Q^{tp_2,S}_{c_{All}}(W_{inf}) = \{\, \dots, \{?x \to no_{-1}, ?y \to no_0\}, \{?x \to no_0, ?y \to no_1\}, \dots \,\} \]

and

\[ Q^{tp_2,S}_{c_{Match}}(W_{inf}) = \{\, \{?x \to no_0, ?y \to no_1\}, \{?x \to no_1, ?y \to no_2\}, \dots \,\} \subset Q^{tp_2,S}_{c_{All}}(W_{inf}). \] □

The example illustrates that, for the case of an infinite Web of Linked Data, the results of SPARQL_LD(R) queries may be either finite or infinite. In Example 3.4 we found the same heterogeneity for SPARQL_LD queries (cf. page 58). However, for SPARQL_LD(R) we may identify dependencies between query results and the corresponding reachable subwebs of the queried Web:

Proposition 4.2. Let S ⊆ U be a finite set of URIs; let c be a reachability criterion; let P be a SPARQL expression; let W be a (potentially infinite) Web of Linked Data, and let R denote the (S, c, P)-reachable subweb of W. Then, the following properties hold:

1. If R is finite, then Q^P,S_c(W) is finite.

2. If Q^P,S_c(W) is infinite, then R is infinite.

3. If c is c_None, then R is finite, and so is Q^P,S_c_None(W).

Proof. Property 1: By Proposition 4.1 (Property 1), it holds that Q^P,S_c(W) = Q^P(R) where Q^P is the SPARQL_LD query that uses the same SPARQL expression as used by Q^P,S_c. Since the result of any SPARQL_LD query over a finite Web of Linked Data is finite (as shown in Proposition 3.5, page 57), Property 1 follows immediately.

Property 2: Suppose Q^P,S_c(W) is infinite. We use proof by contradiction, that is, we assume R is finite. Then, by Property 1, Q^P,S_c(W) is also finite, a contradiction. Hence, R must be infinite.

Property 3: Let W = (D, data, adoc) and R = (D_R, data_R, adoc_R). Suppose c is c_None. Since c_None always returns false, it is easily verified that there does not exist an LD document d ∈ D that satisfies Case 2 in Definition 4.2 (cf. page 62). Hence, D_R contains the seed documents only, that is, D_R = { d ∈ D | u ∈ S and adoc(u) = d } (cf. Case 1 in Definition 4.2). Since S is finite, D_R is finite, and so is R. Then, the finiteness of Q^P,S_c_None(W) follows by Property 1.
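The argument for Property 3 amounts to a single pass over the seed URIs. As a minimal sketch, assuming a hypothetical adoc function that returns None where the mapping is undefined and a hypothetical string encoding of URIs, the seed documents can be collected as follows; the number of collected documents never exceeds the number of seeds, even though the Web behind adoc_inf is infinite.

```python
def seed_documents(seeds, adoc):
    """D_R under c_None: the documents that adoc assigns to seed URIs (Case 1 only)."""
    docs = set()
    for u in seeds:
        d = adoc(u)
        if d is not None:  # adoc(u) may be undefined for some seed URIs
            docs.add(d)
    return docs


def adoc_inf(uri):
    """Hypothetical adoc of an infinite Web: every URI 'no:<i>' has a document i."""
    return int(uri[3:]) if uri.startswith("no:") else None


seeds = {"no:0", "no:7", "succ"}
d_r = seed_documents(seeds, adoc_inf)
```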

Proposition 4.2 provides valuable insight into the dependencies between the (in)finiteness of reachable subwebs of an infinite Web and the (in)finiteness of query results. In practice, however, we are primarily interested in answering the following questions: Does the execution of a given SPARQL_LD(R) query reach an infinite number of LD documents? Do we have to expect an infinite query result? We formalize these questions as the following LD decision problems and discuss them in the remainder of this section.

LD Problem: FinitenessReachablePart

Web Input: a (potentially infinite) Web of Linked Data W

Ordin. Input: a SPARQL_LD(R) query Q^P,S_c

Question: Is the (S, c, P)-reachable subweb of W finite?

LD Problem: Finiteness(SPARQL_LD(R))

Web Input: a (potentially infinite) Web of Linked Data W

Ordin. Input: a SPARQL_LD(R) query Q^P,S_c

Question: Is query result Q^P,S_c(W) finite?

As in the case of Finiteness(SPARQL_LD), discussed on page 58, an LD machine can trivially decide Finiteness(SPARQL_LD(R)) for unsatisfiable¹ SPARQL_LD(R) queries.

In contrast, the satisfiability property of queries is irrelevant for FinitenessReachablePart: The reachable subweb of a queried Web of Linked Data may be infinite regardless of whether the corresponding SPARQL_LD(R) query is unsatisfiable or satisfiable. Nonetheless, for a particular class of SPARQL_LD(R) queries we can rule out the existence of infinitely large reachable subwebs. This class comprises all queries that use a reachability criterion that, by definition, ensures the finiteness of reachable subwebs in every possible case. We define this property of reachability criteria as follows:

Definition 4.5 (Ensuring Finiteness). A reachability criterion c ensures finiteness if, for any Web of Linked Data W, any (finite) set S ⊆ U of URIs, and any SPARQL expression P, the (S, c, P)-reachable subweb of W is finite. □

From the reachability criteria discussed so far, only c_None ensures finiteness (see Proposition 4.2); this property does not hold for c_All and c_Match (as shown by Example 4.3). We refer to the next section, in particular, Subsections 4.3.3 and 4.3.4 (cf. page 73ff), for a more comprehensive discussion of reachability criteria that ensure finiteness. However, due to its relevance for Finiteness(SPARQL_LD(R)), we emphasize that using a reachability criterion that ensures finiteness also guarantees finite query results:

Corollary 4.2. Let c be a reachability criterion that ensures finiteness. For any SPARQL_LD(R) query Q^P,S_c under c-semantics and any Web of Linked Data W, query result Q^P,S_c(W) is finite.

Proof. The corollary follows readily from Definition 4.5 and Proposition 4.2.

Given Definition 4.5 and Corollary 4.2 we see that, for a SPARQL_LD(R) query whose reachability criterion ensures finiteness, an LD machine can immediately answer the questions posed by FinitenessReachablePart and by Finiteness(SPARQL_LD(R)). Thus, for both of these problems we have decision criteria that cover certain classes of queries. In general, however, both problems are undecidable for LD machines.

¹ We shall discuss satisfiability of SPARQL_LD(R) queries in Section 4.4.1 (cf. page 77ff).

Theorem 4.1. FinitenessReachablePart and Finiteness(SPARQL_LD(R)) are not LD machine decidable.

Proof. We prove Theorem 4.1 by reducing the halting problem to FinitenessReachablePart and to Finiteness(SPARQL_LD(R)). While this proof resembles the proofs of Proposition 3.4 (cf. page 50) and Theorem 3.3 (cf. page 58), we need to use a Web of Linked Data W_TMs3 that differs from Webs W_TMs and W_TMs2 (used in the aforementioned proofs). Although W_TMs3 also describes all possible computations of all Turing machines, this description differs from the descriptions in W_TMs and W_TMs2.

We use the same symbols as in the aforementioned proofs: W denotes the countably infinite set of all words that describe Turing machines. For all w ∈ W, M(w) denotes the machine described by word w; c^w,x denotes the computation of machine M(w) on input x; URI u^w,x_i ∈ U identifies the i-th step in computation c^w,x. The countably infinite set of all these identifiers is denoted by U_TMsteps.

We now define W_TMs3 as a Web of Linked Data (D_TMs3, data_TMs3, adoc_TMs3) similar to the Web W_TMs used for proving Proposition 3.4: D_TMs3 and adoc_TMs3 are the same as in W_TMs. That is, D_TMs3 consists of |U_TMsteps| different LD documents, each of which corresponds to one of the URIs in U_TMsteps. Mapping adoc_TMs3 maps each URI u^w,x_i ∈ U_TMsteps to the corresponding LD document d^w,x_i ∈ D_TMs3. Mapping data_TMs3 for W_TMs3 is different from the corresponding mapping for W_TMs: The set data_TMs3(d^w,x_i) of RDF triples for an LD document d^w,x_i is empty if computation c^w,x halts with the i-th computation step. Otherwise, data_TMs3(d^w,x_i) contains a single RDF triple (u^w,x_i, next, u^w,x_i+1) which associates the computation step identified by URI u^w,x_i with the next step in c^w,x (next ∈ U denotes a URI for this relationship). Formally:

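\[ data_{TMs3}\bigl(d^{w,x}_i\bigr) := \begin{cases} \emptyset & \text{if computation } c^{w,x} \text{ halts with the } i\text{-th step,} \\ \bigl\{ (u^{w,x}_i, \mathsf{next}, u^{w,x}_{i+1}) \bigr\} & \text{else.} \end{cases} \]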

Mappings adoc_TMs3 and data_TMs3 are Turing computable (by simulation).
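To make "by simulation" concrete, the following Python sketch replaces Turing machine descriptions with two hypothetical toy machines written as generators that yield one configuration per computation step. The machine names, the URI encoding, and the budget parameter are assumptions made for this illustration and are not part of the construction above. The sketch computes data_TMs3 for a given step by running the machine for at most i + 1 steps, and then follows next-links starting from step 1.

```python
from itertools import islice


def countdown(x):
    """Toy machine: one configuration per step; halts after reaching 0."""
    n = x
    while True:
        yield n
        if n == 0:
            return
        n -= 1


def loop_forever(x):
    """Toy machine that never halts."""
    while True:
        yield x


MACHINES = {"countdown": countdown, "loop": loop_forever}  # stands in for M(w)


def halts_with_step(w, x, i):
    """Simulate M(w) on x for at most i + 1 steps; True iff step i is the last step."""
    steps = list(islice(MACHINES[w](x), i + 1))
    return len(steps) == i


def uri(w, x, i):
    """Hypothetical encoding of the step URI u^{w,x}_i."""
    return f"step:{w}:{x}:{i}"


def data_tms3(w, x, i):
    """Triples of document d^{w,x}_i, following the case distinction literally.

    Indices beyond the halting step fall into the 'else' branch; such
    documents are never reached from step 1.
    """
    if halts_with_step(w, x, i):
        return set()
    return {(uri(w, x, i), "next", uri(w, x, i + 1))}


def follow_next(w, x, budget):
    """Follow next-links from step 1; return (documents visited, path ended)."""
    i = 1
    while i <= budget:
        if not data_tms3(w, x, i):
            return i, True
        i += 1
    return budget, False


halting = follow_next("countdown", 2, budget=20)
looping = follow_next("loop", 0, budget=20)
```

For the halting toy machine the path ends after three documents; for the non-halting one the traversal only stops because the budget is exhausted, which is all a bounded simulation can report.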

For the reduction we use mapping f which is defined as follows: Let (w, x) be an input to the halting problem and let ?a, ?b ∈ V be two distinct query variables, then f(w, x) = (W_TMs3, Q^{P_{w,x},S_{w,x}}_{c_Match}) where S_{w,x} = {u^w,x_1} and P_{w,x} = (?a, next, ?b). Given that c_Match and W_TMs3 are independent of (w, x), it can be easily seen that f is computable by Turing machines (including LD machines).

Before we present the reduction we highlight a property of W_TMs3 that is important for our proof. Any RDF triple of the form (u^w,x_i, next, u^w,x_i+1) establishes a data link from LD document d^w,x_i to LD document d^w,x_i+1. Based on such links we may reach all LD documents about all steps in a particular computation of any Turing machine (given the corresponding w and x). Hence, for each possible computation c^w,x of any Turing machine M(w) we have a (potentially infinite) simple path (d^w,x_1, ..., d^w,x_i, ...) in the link graph of W_TMs3. Each of these paths is finite if and only if the corresponding computation halts.

