E Proofs for Limits of Polynomial Time Learnability

To show Lemma 7, we first show Lemma 22, which uses Lemmas 20 and 21 from [12].

We also require the following lemma from [11], which characterizes concept inclusions entailed by acyclic EL TBoxes.

Lemma 19 ([11]). Let T be an acyclic EL TBox, r a role name and D an EL concept expression. Suppose that T ⊨ ⨅_{1≤i≤n} A_i ⊓ ⨅_{1≤j≤m} ∃r_j.C_j ⊑ D, where the A_i, 1 ≤ i ≤ n, are concept names, the C_j, 1 ≤ j ≤ m, are EL concept expressions, and m, n ≥ 0. Then

– if D is a concept name such that T does not contain an inclusion D ≡ C, for some concept expression C, then there exists A_i, 1 ≤ i ≤ n, such that T ⊨ A_i ⊑ D;

– if D is of the form ∃r.D′ then either (i) there exists A_i, 1 ≤ i ≤ n, such that T ⊨ A_i ⊑ ∃r.D′, or (ii) there exists r_j, 1 ≤ j ≤ m, such that r_j = r and T ⊨ C_j ⊑ D′.

Lemma 20. Let B = F₁ ⊓ ⋯ ⊓ F_n, where F_i ∈ {E_i, Ē_i}. For any 0 ≤ m ≤ n, any sequence of role names σ = σ¹⋯σ^m (we write ∃σ.B for ∃σ¹.⋯∃σ^m.B), any L = (σ₁, …, σ_n) ∈ L_n and any EL concept expression C over Σ_n, if T_L^B ⊨ C ⊑ ∃σ.B then either:

1. m = n, σ = σ_i for some 1 ≤ i ≤ n, and C is of the form A ⊓ C′, A_i ⊓ C′ or B_i ⊓ C′, for some EL concept expression C′; or

2. ⊨ C ⊑ ∃σ.B.

Proof. We prove the lemma by induction on m. For every F_i occurring in B, T_L^B does not contain an inclusion F_i ≡ C, where C is an EL concept expression; hence, by Lemma 19, whenever T_L^B ⊨ C ⊑ F_i there is a concept name Z, which is a conjunct of C, such that T_L^B ⊨ Z ⊑ F_i. Then, for m = 0, C is of the form Z ⊓ C′, where Z is a concept name, C′ is an EL concept expression and T_L^B ⊨ Z ⊑ F_i. This is only possible if Z is F_i itself. As this holds for all F_i, we have that ⊨ C ⊑ B.

For m > 0, by Lemma 19 we have one of the following two cases:

– C is of the form Z ⊓ C′, for some concept name Z and some EL concept expression C′, such that T_L^B ⊨ Z ⊑ ∃σ.B. It is easy to see that this is only possible if m = n, σ = σ_i and Z is one of A, A_i or B_i.

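– C is of the form ∃σ¹.C′ ⊓ C″ for some concept expressions C′ and C″ such that T_L^B ⊨ C′ ⊑ ∃σ².⋯∃σ^m.B. The sequence σ²⋯σ^m has length m − 1 < n, so the first alternative of the lemma cannot apply to it, and by the induction hypothesis ⊨ C′ ⊑ ∃σ².⋯∃σ^m.B.

But then ⊨ C ⊑ ∃σ.B.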
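Case 2 of Lemma 20 refers to subsumption with respect to the empty TBox, written ⊨ C ⊑ D. For EL this can be decided structurally: every concept name of D must be a conjunct of C, and every ∃r.D′ in D must be matched by some ∃r.C′ in C with ⊨ C′ ⊑ D′. The following Python sketch illustrates this standard characterisation; the `EL` dataclass encoding and the names `E1`, `E2bar` (standing in for E₁, Ē₂) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EL:
    """Hypothetical encoding of an EL concept: a conjunction of concept names
    and existential restrictions. EL() (no conjuncts) encodes ⊤."""
    names: frozenset = frozenset()
    exists: tuple = ()  # pairs (role_name, EL) encoding ∃role.filler


def conj(*names: str, exists=()) -> EL:
    """Conjunction of the given concept names and existential restrictions."""
    return EL(frozenset(names), tuple(exists))


def chain(roles, filler: EL) -> EL:
    """Build ∃r1.∃r2.⋯∃rk.filler for roles = (r1, ..., rk)."""
    for r in reversed(roles):
        filler = EL(frozenset(), ((r, filler),))
    return filler


def subsumed(c: EL, d: EL) -> bool:
    """Decide ⊨ C ⊑ D w.r.t. the empty TBox by structural comparison."""
    if not d.names <= c.names:
        return False
    # every ∃r.D' in D needs a matching ∃r.C' in C with ⊨ C' ⊑ D'
    return all(
        any(s == r and subsumed(c_sub, d_sub) for s, c_sub in c.exists)
        for r, d_sub in d.exists
    )


# B = E1 ⊓ Ē2 and C = X0 ⊓ ∃r.∃s.(E1 ⊓ Ē2 ⊓ A1)
B = conj("E1", "E2bar")
C = conj("X0", exists=[("r", chain(["s"], conj("E1", "E2bar", "A1")))])
print(subsumed(C, chain(["r", "s"], B)))  # ⊨ C ⊑ ∃r.∃s.B → True
print(subsumed(C, chain(["r", "r"], B)))  # ⊨ C ⊑ ∃r.∃r.B → False
```

Since B is a conjunction of concept names only, ⊨ C ⊑ ∃σ.B can hold only if B (or a conjunction containing it) appears as a subconcept of C, which is the observation used with Lemma 20 in the proof of Lemma 22.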

Lemma 21 ([12]). For any acyclic EL TBox T, any inclusion A ⊑ C ∈ T and any concept expression of the form ∃t.D, we have T ⊨ A ⊑ ∃t.D if, and only if, T ⊨ C ⊑ ∃t.D.

We are now ready for Lemma 22.

Lemma 22. For all EL concept inclusions C ⊑ D over Σ_n where B is not a subconcept of C:

– either T_L^B ⊨ C ⊑ D for every L ∈ L_n, or

– the number of L ∈ L_n such that T_L^B ⊨ C ⊑ D does not exceed the size of D.

Proof. To prove this lemma we argue by induction on the structure of D and show the following.

Claim 1. For all EL concept inclusions C ⊑ D over Σ_n where B ∈ B_n is not a subconcept of C, if there are L ∈ L_n and B ∈ B_n such that T_L^B ⊨ C ⊑ D, then:

– either T_L^B ⊨ C ⊑ D for every L ∈ L_n and every B ∈ B_n, or

– for each L ∈ L_n such that T_L^B ⊨ C ⊑ D there are σ in L and a sequence of roles t₁, …, t_m, m ≥ 0, such that ⊨ D ⊑ ∃t₁.⋯∃t_m.∃σ.⊤, where t_j ∈ {r, s}, 1 ≤ j ≤ m.

We assume throughout the proof that in all cases B is not a subconcept of C and that there exists some L₀ ∈ L_n such that T_{L₀}^B ⊨ C ⊑ D.

Base case: D is a concept name. We make the following case distinction.

– D is one of X_i, A_i, B_i, E_i or Ē_i for 1 ≤ i ≤ n. By Lemma 19, C is of the form Z ⊓ C′, for some concept name Z, and T_{L₀}^B ⊨ Z ⊑ D. For these concept names D, this is only possible if Z = D. But then for every L ∈ L_n we have T_L^B ⊨ C ⊑ D.

– D is X₀. By Lemma 19, C is of the form Z ⊓ C′, for some concept name Z, and T_{L₀}^B ⊨ Z ⊑ X₀. This is the case if either Z = X₀ or Z is one of A, A_i, B_i, 1 ≤ i ≤ n. In either case, for every L ∈ L_n we have T_L^B ⊨ C ⊑ X₀.

– D is A. If C is of the form A ⊓ C′ or, for all i, 1 ≤ i ≤ n, A_i or B_i is a conjunct of C, then for every L ∈ L_n we have T_L^B ⊨ C ⊑ A. Assume now that C is not of this form. Then for some j, 1 ≤ j ≤ n, C is neither of the form A ⊓ C′ nor of the form A_j ⊓ C′ nor of the form B_j ⊓ C′. Let L = (σ₁, …, σ_n) ∈ L_n be such that T_L^B ⊨ C ⊑ A. Notice that, for L = (σ₁, …, σ_n) ∈ L_n, T_L^B ⊨ C ⊑ A if, and only if, T_L^B ⊨ C ⊑ X₀ ⊓ ∃σ₁.B ⊓ ⋯ ⊓ ∃σ_n.B. By Lemma 20, for such a T_L^B we must have ⊨ C ⊑ ∃σ_j.B, but this is not possible as B is not a subconcept of C.

Thus, if D is a concept name, then either T_L^B ⊨ C ⊑ D for every L ∈ L_n, or there exists no L ∈ L_n such that T_L^B ⊨ C ⊑ D, where B is not a subconcept of C.

Induction step. If D = D₁ ⊓ D₂, then T_L^B ⊨ C ⊑ D if, and only if, T_L^B ⊨ C ⊑ D_i for both i ∈ {1, 2}. So the claim follows from the induction hypothesis.

For D = ∃t.D′, suppose that there is L ∈ L_n such that T_L^B ⊨ C ⊑ D. Then, by Lemma 19, either (i) there exists a conjunct Z of C, Z a concept name, such that T_L^B ⊨ Z ⊑ ∃t.D′, or (ii) there exists a conjunct ∃t.C′ of C with T_L^B ⊨ C′ ⊑ D′. We consider cases (i) and (ii) in turn.

(i) Let Z be a conjunct of C such that Z is a concept name and T_L^B ⊨ Z ⊑ ∃t.D′. Notice that Z cannot be E_i or Ē_i, as for no L ∈ L_n do we have T_L^B ⊨ E_i ⊑ ∃t.D′ or T_L^B ⊨ Ē_i ⊑ ∃t.D′. Consider the remaining possibilities.

• Z is one of X_i, 0 ≤ i ≤ n. It is easy to see that for L, L′ ∈ L_n we have T_L^B ⊨ X_i ⊑ ∃t.D′ if, and only if, T_{L′}^B ⊨ X_i ⊑ ∃t.D′. Thus, for every L ∈ L_n we have T_L^B ⊨ Z ⊑ ∃t.D′.

• Z is A. Suppose that for some L = (σ₁, …, σ_n) ∈ L_n we have T_L^B ⊨ A ⊑

(ii) Let ∃t.C′ be a conjunct of C with T_L^B ⊨ C′ ⊑ D′. The induction hypothesis implies that either (a) for every L ∈ L_n we have T_L^B ⊨ C′ ⊑ D′, or (b) for each L ∈ L_n such that T_L^B ⊨ C′ ⊑ D′ there are σ in L and a sequence of roles t₁, …, t_m, m ≥ 0, such that ⊨ D′ ⊑ ∃t₁.⋯∃t_m.∃σ.⊤, where t_j ∈ {r, s}, 1 ≤ j ≤ m. In case (a), for every L ∈ L_n we have T_L^B ⊨ C ⊑ ∃t.D′. In case (b), if for each L ∈ L_n such that T_L^B ⊨ C′ ⊑ D′ there is σ such that ⊨ D′ ⊑ ∃t₁.⋯∃t_m.∃σ.⊤, then the same holds for ∃t.D′ (notice that for every L ∈ L_n and every B ∈ B_n we have T_L^B ⊨ C′ ⊑ D′ iff T_L^B ⊨ ∃t.C′ ⊑ ∃t.D′).

Before we proceed to the proof of Lemma 7, we need Lemma 23.

Definition 3. The unravelling A^u of A into a (possibly infinite) tree is defined as follows:

– Ind(A^u) is the set of sequences b₀r₀⋯r_{n−1}b_n with b₀, …, b_n ∈ Ind(A), r₀, …, r_{n−1} ∈ N_R and r_i(b_i, b_{i+1}) ∈ A for 0 ≤ i < n;

– for each A(b) ∈ A and α = b₀r₀⋯r_{n−1}b_n ∈ Ind(A^u) with b_n = b, we have A(α) ∈ A^u;

– for each α = b₀r₀⋯r_{n−1}b_n ∈ Ind(A^u) with n > 0, we have r_{n−1}(b₀r₀⋯r_{n−2}b_{n−1}, α) ∈ A^u.
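Definition 3 can be made concrete by materialising only the depth-k part A^{u,k}_a that the proof of Lemma 23 uses. The following Python sketch assumes a hypothetical encoding of an ABox as a set of (A, b) pairs for concept assertions and (r, b, c) triples for role assertions, and represents each element of Ind(A^u) as a tuple (b₀, r₀, b₁, …, b_n). The helper `as_concept` renders the resulting tree as an EL concept string, in the spirit of the concept C_A of Lemma 23; all function names are illustrative.

```python
from collections import defaultdict


def unravel(concepts, roles, root, k):
    """Depth-k part A^{u,k}_root of the unravelling A^u (Definition 3).

    concepts: set of (A, b) pairs, one per concept assertion A(b)
    roles:    set of (r, b, c) triples, one per role assertion r(b, c)
    Individuals of the result are tuples (b0, r0, b1, ..., b_n) with b0 == root.
    """
    succ = defaultdict(list)
    for r, b, c in sorted(roles):
        succ[b].append((r, c))
    labels = defaultdict(set)
    for name, b in concepts:
        labels[b].add(name)

    inds, c_asserts, r_asserts = [], set(), set()
    frontier = [(root,)]
    for depth in range(k + 1):
        next_frontier = []
        for alpha in frontier:
            inds.append(alpha)
            # A(alpha) is in A^u whenever A(b_n) is in A, where b_n = alpha[-1]
            c_asserts |= {(name, alpha) for name in labels[alpha[-1]]}
            if depth == k:
                continue  # do not expand below depth k
            for r, c in succ[alpha[-1]]:
                child = alpha + (r, c)
                # r_{n-1}(parent, child) is in A^u
                r_asserts.add((r, alpha, child))
                next_frontier.append(child)
        frontier = next_frontier
    return inds, c_asserts, r_asserts


def as_concept(c_asserts, r_asserts, alpha):
    """Read the tree below alpha as an EL concept string (⊤ for no conjuncts)."""
    names = sorted(name for name, x in c_asserts if x == alpha)
    children = sorted((r, child) for r, parent, child in r_asserts if parent == alpha)
    parts = names + [f"∃{r}.({as_concept(c_asserts, r_asserts, child)})" for r, child in children]
    return " ⊓ ".join(parts) if parts else "⊤"


# The cyclic ABox {A(a), r(a, a)} unravels into an infinite r-path; keep depth 2.
inds, cs, rs = unravel({("A", "a")}, {("r", "a", "a")}, "a", 2)
print(len(inds))                    # → 3
print(as_concept(cs, rs, ("a",)))   # → A ⊓ ∃r.(A ⊓ ∃r.(A))
```

Cutting the tree at a finite depth is what keeps C_A a finite EL concept even when A contains cycles; the proof of Lemma 23 uses depth |D| + n.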

Lemma 23. For any ABox A and EL concept expression D over Σ_n there is a concept expression C_A such that a ∈ C_A^{I_A} and, for every L ∈ L_n and B ∈ B_n:

(T_L^B, A) ⊨ D(a) iff T_L^B ⊨ C_A ⊑ D.

Proof. Let A^u be the unravelling of A. Let T_L^B be a TBox for some arbitrary B ∈ B_n and L ∈ L_n. By definition of A^u we have that (T_L^B, A) ⊨ D(a) iff (T_L^B, A^u) ⊨ D(a).

Denote by A^{u,k}_a the subtree of A^u which is rooted in a ∈ Ind(A^u) and has depth k ∈ ℕ. Let (T_L^B)′ be the result of removing X₀ ⊓ ∃σ₁.B ⊓ ⋯ ⊓ ∃σ_n.B ⊑ A from T_L^B. Then ((T_L^B)′, A^u) ⊨ D(a) iff ((T_L^B)′, A^{u,|D|}_a) ⊨ D(a). Let I′ be the canonical model of (T_L^B)′ and A^u. By definition of T_L^B, one can turn I′ into a canonical model of T_L^B and A^u by including d ∈ A^{I′} whenever d ∈ (X₀ ⊓ ∃σ₁.B ⊓ ⋯ ⊓ ∃σ_n.B)^{I′}. Then (T_L^B, A^u) ⊨ D(a) iff (T_L^B, A^{u,|D|+n}_a) ⊨ D(a). Let C_A be the concept expression corresponding to the tree interpretation of A^{u,|D|+n}_a rooted in a. We have that, for every L ∈ L_n and B ∈ B_n, (T_L^B, A) ⊨ D(a) iff T_L^B ⊨ C_A ⊑ D. □

We can now proceed to the proof of Lemma 7. We say that an EL concept expression C occurs in an ABox A if there exists a ∈ Ind(A) such that A ⊨ C(a). For a, b ∈ Ind(A), a role chain from a to b is a sequence a₀·t₀·…·t_{n−1}·a_n with a₀ = a, a_n = b and t_i(a_i, a_{i+1}) ∈ A, where 0 ≤ i ≤ n − 1 and t_i ∈ {r, s}.
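Without a TBox, the ABox viewed as an interpretation is already a canonical model, so A ⊨ C(a) holds exactly when the description tree of C can be mapped homomorphically into A with its root sent to a. The following Python sketch checks whether a concept occurs in an ABox in this sense. The nested-tuple encoding of concepts (a frozenset of concept names together with a tuple of (role, filler) pairs), the ABox encoding as (A, b) pairs and (r, b, c) triples, and the string `E1bar` for Ē₁ are hypothetical choices made for illustration.

```python
def entails(concepts, roles, c, a):
    """A ⊨ C(a) for the empty TBox.

    c is a pair (names, exists): names is a frozenset of concept names and
    exists is a tuple of (role, filler) pairs encoding ∃role.filler.
    """
    names, exists = c
    if any((name, a) not in concepts for name in names):
        return False
    # every ∃r.D needs an r-successor b of a with A ⊨ D(b)
    return all(
        any(entails(concepts, roles, d, b) for s, x, b in roles if s == r and x == a)
        for r, d in exists
    )


def occurs(concepts, roles, c):
    """C occurs in A iff A ⊨ C(a) for some a ∈ Ind(A)."""
    inds = {b for _, b in concepts} | {x for _, x, _ in roles} | {y for _, _, y in roles}
    return any(entails(concepts, roles, c, a) for a in inds)


B = (frozenset({"E1", "E2bar"}), ())          # B = E1 ⊓ Ē2
concepts = {("E1", "a"), ("E2bar", "a")}
roles = {("r", "c", "a")}
print(occurs(concepts, roles, B))                                   # → True
print(entails(concepts, roles, (frozenset(), (("r", B),)), "c"))    # → True
print(occurs(concepts, roles, (frozenset({"E1", "E1bar"}), ())))    # → False
```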

Proof of Lemma 7. Consider any ABox A, any EL concept assertion D(a) over Σ_n with a ∈ Ind(A), and suppose that there are L ∈ L_n and B ∈ B_n such that (T_L^B ∪ T^∗, A) ⊨ D(a). We distinguish two cases:

1. For all i, 1 ≤ i ≤ n, E_i ⊓ Ē_i does not occur in A. First notice that in this case, for every EL concept expression C over Σ_n, a ∈ Ind(A) and T_L^B ∈ S:

(T_L^B ∪ T^∗, A) ⊨ C(a) iff (T_L^B, A) ⊨ C(a).

For any A and EL concept expression D over Σ_n, by Lemma 23 there is a concept expression C_A such that a ∈ C_A^{I_A} and, for every L ∈ L_n and B ∈ B_n:

(T_L^B, A) ⊨ D(a) iff T_L^B ⊨ C_A ⊑ D.

If there is no B ∈ B_n such that B occurs in A, then the lemma follows from Lemma 22. Notice that although our construction of C_A is not polynomial, Lemma 22 does not impose any restriction on the size of C_A. Otherwise, since for all i, 1 ≤ i ≤ n, E_i ⊓ Ē_i does not occur in A, the number of B ∈ B_n such that B occurs in A is linear in the size of A. So the number of B ∈ B_n such that (T_L^B ∪ T^∗, A) ⊨ D(a) does not exceed the size of A.

2. There is i, 1 ≤ i ≤ n, such that E_i ⊓ Ē_i occurs in A. Let (E_i ⊓ Ē_i)^A be the set of individuals b ∈ Ind(A) such that E_i(b), Ē_i(b) ∈ A. By construction of T^∗, for every ABox A and every EL concept expression D over Σ_n we have (T^∗, A) ⊨ D(b) for every b ∈ (E_i ⊓ Ē_i)^A. In particular, for every L ∈ L_n we have (T_L^B ∪ T^∗, A) ⊨ D(b). For a ∈ Ind(A) \ (E_i ⊓ Ē_i)^A we make a case distinction:

– There is a role chain from a to some b ∈ (E_i ⊓ Ē_i)^A: by definition of T^∗, as (E_i ⊓ Ē_i) ⊑ A for every 1 ≤ i ≤ n and every A ∈ Σ_n ∩ N_C, we have (T^∗, A) ⊨ (E₁ ⊓ Ē₁)(b). Then, since {∃r.(E₁ ⊓ Ē₁) ⊑ (E₁ ⊓ Ē₁), ∃s.(E₁ ⊓ Ē₁) ⊑ (E₁ ⊓ Ē₁)} ⊆ T^∗, we have (T^∗, A) ⊨ (E₁ ⊓ Ē₁)(a). In this case, by the argument above, for every L ∈ L_n and every EL concept expression D over Σ_n, we have (T_L^B ∪ T^∗, A) ⊨ D(a).

– For all b ∈ (E_i ⊓ Ē_i)^A there is no role chain from a to b: let A′ = A \ {E_i(b), Ē_i(b) | b ∈ (E_i ⊓ Ē_i)^A}. Since no role chain leads from a to any such b, for every EL concept expression D we have A ⊨ D(a) iff A′ ⊨ D(a). By definition of A′, E_i ⊓ Ē_i does not occur in A′, so the lemma follows as in Case 1.
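The case distinction in Case 2 depends only on which individuals have a role chain over r and s into (E_i ⊓ Ē_i)^A, which is a backward reachability question. The following Python sketch computes (E_i ⊓ Ē_i)^A, the individuals covered by the first bullet, and the concept assertions of the reduced ABox A′ from the second bullet. The encoding (concept assertions as (A, b) pairs, role assertions as (r, b, c) triples, the strings `E1`, `E1bar` for E₁, Ē₁) and the function names are hypothetical.

```python
from collections import deque


def clash(concepts, i):
    """(E_i ⊓ Ē_i)^A: individuals b with both E_i(b) and Ē_i(b) in A."""
    e, ebar = f"E{i}", f"E{i}bar"
    return {b for name, b in concepts if name == e and (ebar, b) in concepts}


def has_chain_into(roles, targets, allowed=("r", "s")):
    """Individuals a with a role chain a0·t0·…·t_{n-1}·a_n from a into targets,
    using only role names in `allowed` (targets themselves count, with n = 0)."""
    preds = {}
    for t, x, y in roles:
        if t in allowed:
            preds.setdefault(y, []).append(x)
    seen, queue = set(targets), deque(targets)
    while queue:  # breadth-first search along reversed role edges
        y = queue.popleft()
        for x in preds.get(y, []):
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return seen


def split_case_two(concepts, roles, i):
    """Return (individuals of the first bullet, concept assertions of A′)."""
    bad = clash(concepts, i)
    first_bullet = has_chain_into(roles, bad) - bad
    reduced = {(name, b) for name, b in concepts
               if not (b in bad and name in (f"E{i}", f"E{i}bar"))}
    return first_bullet, reduced


concepts = {("E1", "b"), ("E1bar", "b"), ("A1", "c")}
roles = {("r", "a", "b"), ("s", "c", "a"), ("r", "d", "e")}
print(split_case_two(concepts, roles, 1))
```

Here a and c fall under the first bullet (via r(a, b) and s(c, a)), while d and e are handled by the second bullet, where only the concept assertions E₁(b), Ē₁(b) are removed and the role assertions stay unchanged.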

□

The next lemma from [12] prepares the proof of Lemma 8.

Lemma 24 ([12]). For any 0 ≤ i ≤ n and Σ_n-concept D, if T₀ ⊭ X_i ⊑ D, then there exists a sequence of role names t₁, …, t_l such that ⊨ D ⊑ ∃t₁.⋯∃t_l.Y and T₀ ⊭ X_i ⊑ ∃t₁.⋯∃t_l.Y, where Y is either ⊤ or a concept name and 0 ≤ l ≤ n − i + 1.
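Lemma 24 replaces D by a single path ∃t₁.⋯∃t_l.Y with ⊨ D ⊑ ∃t₁.⋯∃t_l.Y. By the structural characterisation of EL subsumption, the concepts of this form that subsume D are exactly the root-to-node paths of the description tree of D ending in ⊤ or in a concept name labelling that node, so the candidate witnesses can be listed as below. This Python sketch only enumerates candidates: selecting one with T₀ ⊭ X_i ⊑ ∃t₁.⋯∃t_l.Y would require T₀, which is not modelled. The nested-tuple concept encoding is a hypothetical choice.

```python
def paths(d, prefix=()):
    """List all (roles, Y) such that ∃t1.⋯∃tl.Y, with roles = (t1, ..., tl),
    is a root-to-node path of the description tree of D and Y is ⊤ or a
    concept name labelling the last node. Each such path P satisfies ⊨ D ⊑ P.

    d is a pair (names, exists): a frozenset of concept names and a tuple of
    (role, filler) pairs encoding ∃role.filler.
    """
    names, exists = d
    out = [(prefix, "⊤")] + [(prefix, name) for name in sorted(names)]
    for role, filler in exists:
        out.extend(paths(filler, prefix + (role,)))
    return out


# D = ∃r.(X1 ⊓ ∃s.⊤)
D = (frozenset(), (("r", (frozenset({"X1"}), (("s", (frozenset(), ())),))),))
for roles, y in paths(D):
    print("".join(f"∃{t}." for t in roles) + y)
# prints ⊤, ∃r.⊤, ∃r.X1 and ∃r.∃s.⊤, one per line
```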

Proof of Lemma 8. For any n > 1 and any EL TBox H in Σ_n with |H| < 2ⁿ, there exist an ABox A, an individual a ∈ Ind(A) and an EL concept expression D over Σ_n such that (i) the size of A plus the size of D does not exceed 6n, and (ii) if (H, A) ⊨ D(a) then (T_L^B, A) ⊨ D(a) for at most one L ∈ L_n, and if (H, A) ⊭ D(a) then for every L ∈ L_n we have (T_L^B ∪ T^∗, A) ⊨ D(a).

Proof. As T_L^B ∪ T^∗ ⊨ C ⊑ D iff (T_L^B ∪ T^∗, A_{C,a}) ⊨ D(a), where A_{C,a} is an ABox whose canonical model is isomorphic to the tree interpretation of C with root ρ_C mapped to a ∈ Ind(A), to prove this lemma we show the following claim.

We define an exponentially large TBox T_∩ and use it to prove that one can select an EL concept inclusion C ⊑ D in such a way that either H ⊨ C ⊑ D and T_∩ ⊭ C ⊑ D, or vice versa. Then the oracle can return (A_{C,a}, D(a)) as a counterexample, where A_{C,a} is the tree-shaped ABox corresponding to the EL concept expression C rooted in a ∈ Ind(A).

To define T_∩, for any sequence b = b₁⋯b_n, where every b_i is either 0 or 1, we denote by C_b the conjunction ⨅_{i≤n} C_i, where C_i = A_i if b_i = 1 and C_i = B_i if b_i = 0.

Then we define

T_∩ = T₀ ∪ {C_b ⊑ A ⊓ X₀ | b ∈ {0,1}ⁿ}.
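The definition of T_∩ can be spelled out directly. The following Python sketch builds the conjuncts of C_b for every b ∈ {0,1}ⁿ and lists the 2ⁿ inclusions C_b ⊑ A ⊓ X₀ that T_∩ adds to T₀; T₀ itself is not modelled, and the strings `A1`, `B1`, … and the function names are hypothetical.

```python
from itertools import product


def c_b(bits):
    """Conjuncts of C_b: C_i = A_i if b_i = 1 and C_i = B_i if b_i = 0."""
    return frozenset(f"A{i}" if bit else f"B{i}" for i, bit in enumerate(bits, start=1))


def t_cap_extra(n):
    """The inclusions C_b ⊑ A ⊓ X0, b ∈ {0,1}^n, that T_∩ adds to T_0."""
    return [(c_b(bits), frozenset({"A", "X0"})) for bits in product((0, 1), repeat=n)]


print(sorted(c_b((1, 0, 1))))   # → ['A1', 'A3', 'B2']
print(len(t_cap_extra(10)))     # → 1024
```

All left-hand sides C_b are pairwise distinct, which is why T_∩ is exponentially large in n.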

Let A_{C_b} be the ABox corresponding to the concept expression C_b, as defined above. Since, for all i, 1 ≤ i ≤ n, E_i ⊓ Ē_i does not occur in A_{C_b}, we have (T_L^B ∪ T^∗, A_{C_b}) ⊨ C(a) iff (T_L^B, A_{C_b}) ⊨ C(a). Therefore, in the following we only consider T_L^B. We now consider the possibilities for H and T_∩.

(1) If H ⊭ T_∩, then there exists an inclusion C ⊑ D ∈ T_∩ such that H ⊭ C ⊑ D.

Clearly, C ⊑ D is entailed by T_L^B for every L ∈ L_n, and the size of C ⊑ D does not exceed 6n, so C ⊑ D is as required.

(2) Suppose that for some b ∈ {0,1}ⁿ and some concept expression of the form ∃t.D′ we have H ⊨ C_b ⊑ ∃t.D′ and T_∩ ⊭ C_b ⊑ ∃t.D′. To ‘minimise’ C_b ⊑ ∃t.D′, notice that T₀ ⊭ X₀ ⊑ ∃t.D′. Then, by Lemma 24, there exist a sequence of role names t₁, …, t_l, 0 ≤ l ≤ n + 1, and Y, either ⊤ or a concept name, such that ⊨ ∃t.D′ ⊑ ∃t₁.⋯∃t_l.Y, so H ⊨ C_b ⊑ ∃t₁.⋯∃t_l.Y, and T₀ ⊭ X₀ ⊑ ∃t₁.⋯∃t_l.Y. Clearly, the size of C_b ⊑ ∃t₁.⋯∃t_l.Y does not exceed 6n. It remains to prove that T_L^B ⊨ C_b ⊑ ∃t₁.⋯∃t_l.Y for at most one L ∈ L_n.

Suppose that for some L ∈ L_n we have T_L^B ⊨ C_b ⊑ ∃t₁.⋯∃t_l.Y. By Lemma 19, there is A_j or B_j such that T_L^B ⊨ A_j ⊑ ∃t₁.⋯∃t_l.Y (or T_L^B ⊨ B_j ⊑ ∃t₁.⋯∃t_l.Y, respectively). As T₀ ⊭ X₀ ⊑ ∃t₁.⋯∃t_l.Y, it is easy to see that this is only possible when l = n, (t₁, t₂, …, t_n) = σ_j, and Y is implied by B. Since every σ_j is unique, for every L′ ∈ L_n with L′ ≠ L we have T_{L′}^B ⊭ C_b ⊑ ∃σ_j.Y.

Thus, C_b ⊑ ∃t₁.⋯∃t_l.Y is as required.

(3) Finally, suppose that Cases 1 and 2 above do not apply. Then H ⊨ T_∩ and, for every b ∈ {0,1}ⁿ and every EL concept expression over Σ_n of the form ∃t.D′, if H ⊨ C_b ⊑ ∃t.D′ then T₀ ⊨ X₀ ⊑ ∃t.D′. We show that unless there exists an inclusion C ⊑ D satisfying the conditions of the lemma, H contains at least 2ⁿ different inclusions, which contradicts |H| < 2ⁿ.

Fix b ∈ {0,1}ⁿ. As H ⊨ T_∩, we have H ⊨ C_b ⊑ A. Then there must exist at least one inclusion C ⊑ A ⊓ D ∈ H such that H ⊨ C_b ⊑ C and ⊭ C ⊑ A. Let C = Z₁ ⊓ ⋯ ⊓ Z_m ⊓ ∃t₁.C′₁ ⊓ ⋯ ⊓ ∃t_l.C′_l, where Z₁, …, Z_m are distinct concept names. As H ⊨ C_b ⊑ ∃t_j.C′_j, we have T₀ ⊨ X₀ ⊑ ∃t_j.C′_j for j = 1, …, l. As H ⊨ T_∩, we have H ⊨ X₀ ⊑ ∃t_j.C′_j for j = 1, …, l. So H ⊨ Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ ⊑ A.

Suppose that for some i, 1 ≤ i ≤ n, there exists no j, 1 ≤ j ≤ m, such that Z_j is either A_i or B_i. Then T_L^B ⊭ Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ ⊑ A for any L ∈ L_n. Notice that in the worst case Z₁ ⊓ ⋯ ⊓ Z_m is the conjunction of all Σ_n-concept names except A_i and B_i, so the size of Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ ⊑ A does not exceed 6n, and Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ ⊑ A is as required.

Assume that Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ contains a conjunct B_i such that b_i ≠ 0. Then H ⊨ C_b ⊑ B_i and for no L ∈ L_n do we have T_L^B ⊨ C_b ⊑ B_i. The size of C_b ⊑ B_i does not exceed 6n, so it is as required.

Assume that Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ contains a conjunct A_i such that b_i ≠ 1. Then H ⊨ C_b ⊑ A_i and for no L ∈ L_n do we have T_L^B ⊨ C_b ⊑ A_i. The size of C_b ⊑ A_i does not exceed 6n, so it is as required.

The only remaining option is that Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ contains exactly the A_i with b_i = 1 and exactly the B_i with b_i = 0.

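This argument applies to arbitrary b ∈ {0,1}ⁿ, and in the remaining option the concept names Z₁, …, Z_m determine b, so distinct b yield distinct inclusions C ⊑ A ⊓ D ∈ H. Thus, if there exists no inclusion C ⊑ D satisfying the conditions of the lemma, then H contains at least 2ⁿ inclusions.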
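For a fixed b, the case analysis in Case (3) only inspects which of the names A_i, B_i occur among Z₁, …, Z_m. The following Python sketch mirrors it, checking the three situations in the order used above: it reports which kind of inclusion the argument returns, or that Z₁ ⊓ ⋯ ⊓ Z_m ⊓ X₀ matches C_b exactly, in which case b can be read off from the Z_j. The string encoding `A1`, `B1`, … and the function names are hypothetical.

```python
def case_three(bits, zs):
    """Classify the concept names zs = {Z_1, ..., Z_m} against b = bits."""
    n = len(bits)
    # some index i is covered by neither A_i nor B_i
    for i in range(1, n + 1):
        if f"A{i}" not in zs and f"B{i}" not in zs:
            return ("Z_1 ⊓ ⋯ ⊓ Z_m ⊓ X0 ⊑ A", i)
    # a conjunct B_i with b_i ≠ 0
    for i, bit in enumerate(bits, start=1):
        if f"B{i}" in zs and bit != 0:
            return ("C_b ⊑ B_i", i)
    # a conjunct A_i with b_i ≠ 1
    for i, bit in enumerate(bits, start=1):
        if f"A{i}" in zs and bit != 1:
            return ("C_b ⊑ A_i", i)
    return ("exact", None)


def read_off_b(zs, n):
    """In the exact case, b_i = 1 iff A_i is among the Z_j."""
    return tuple(1 if f"A{i}" in zs else 0 for i in range(1, n + 1))


print(case_three((1, 0), {"A1", "B2", "X0"}))   # → ('exact', None)
print(case_three((1, 0), {"A1"}))               # → ('Z_1 ⊓ ⋯ ⊓ Z_m ⊓ X0 ⊑ A', 2)
```

Because `read_off_b` recovers b in the exact case, two different b cannot be served by the same inclusion of H, which is the counting step behind the 2ⁿ bound.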

□

Proof of Theorem 5. The EL data retrieval framework is not polynomially exact learnable.

Proof. Assume that TBoxes are polynomial time learnable in the open data model. Then there exists a learning algorithm whose running time is bounded at any stage by a polynomial p(n, m). Choose n such that ⌊2ⁿ/n⌋ > (p(n, 6n))² and let S₁ = L_n and S₂ = B_n. We follow Angluin's strategy of removing elements from S₁ and S₂ in such a way that the learner cannot distinguish between any of the remaining TBoxes T_L^B encoded by L ∈ S₁ and B ∈ S₂. The strategy is as follows.

Given a membership query (T_L^B ∪ T^∗, A) ⊨ D(a) with A ⊨ C(a), if T_L^B ∪ T^∗ ⊨ C ⊑ D for every L ∈ L_n and every B ∈ B_n, then the answer is ‘yes’; otherwise the answer is ‘no’ and all L ∈ L_n and B ∈ B_n with T_L^B ∪ T^∗ ⊨ C ⊑ D are removed from S₁ and S₂, respectively. By Lemma 7, at most |D| elements can be removed from S₁, or at most |A| elements can be removed from S₂. Given an equivalence query with H, the answer is ‘no’, and Lemma 8 guarantees a concept inclusion C ⊑ D not entailed by H such that T_{L′}^B ∪ T^∗ ⊨ C ⊑ D for at most one L′ ∈ L_n. Then a counterexample (T, A) ⊨ D(a) with A ⊨ C(a) and of size bounded by 6n is produced (we take the size of a query or a counterexample (T, A) ⊨ D(a) to be the size of A plus the size of the concept expression D).

As all counterexamples produced are bounded by 6n, the overall running time of the algorithm is bounded by p(n, 6n). Hence, the learner asks no more than p(n, 6n) queries and the size of every query does not exceed p(n, 6n). By Lemmas 7 and 8, at most (p(n, 6n))² elements are removed from S₁ and S₂ during the run of the algorithm. But then the algorithm cannot distinguish between any TBoxes T_L^B and T_{L′}^{B′} for L ≠ L′ ∈ S₁ and B ≠ B′ ∈ S₂ based on the given answers, and we have derived a contradiction.
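The choice of n at the start of the proof is always possible because ⌊2ⁿ/n⌋ grows exponentially while (p(n, 6n))² grows polynomially. The following Python sketch finds the least such n for a given polynomial; the polynomial p(n, m) = n + m used in the example is a hypothetical stand-in for the learner's unknown running-time bound.

```python
from typing import Callable


def smallest_n(p: Callable[[int, int], int], start: int = 2) -> int:
    """Least n ≥ start with ⌊2^n / n⌋ > p(n, 6n)^2 (terminates for any polynomial p)."""
    n = start
    while 2 ** n // n <= p(n, 6 * n) ** 2:
        n += 1
    return n


n_star = smallest_n(lambda n, m: n + m)
print(n_star, 2 ** n_star // n_star, (n_star + 6 * n_star) ** 2)   # → 19 27594 17689
```

Even for this very small bound, n = 18 still fails (⌊2¹⁸/18⌋ = 14563 ≤ 15876), which shows how quickly the polynomial term dominates for small n before the exponential term takes over.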

From the document Exact Learning Description Logic Ontologies from Data Retrieval Examples (pages 27-33)