Exact Learning Description Logic Ontologies from Data Retrieval Examples

Academic year: 2022


Boris Konev, Ana Ozaki, and Frank Wolter

Department of Computer Science, University of Liverpool, UK

Abstract. We investigate the complexity of learning description logic ontologies in Angluin et al.’s framework of exact learning via queries posed to an oracle. We consider membership queries of the form “is individual $a$ a certain answer to a data retrieval query $q$ of ABox $\mathcal{A}$ and the target TBox?” and equivalence queries of the form “is a given TBox equivalent to the target TBox?”. We show that (i) DL-Lite TBoxes with role inclusions and $\mathcal{ELI}$ concept expressions on the right-hand side of inclusions and (ii) $\mathcal{EL}$ TBoxes without complex concept expressions on the right-hand side of inclusions can be learned in polynomial time. Both results are proved by a non-trivial reduction to learning from subsumption examples. We also show that arbitrary $\mathcal{EL}$ TBoxes cannot be learned in polynomial time.

1 Introduction

Building an ontology is prone to errors, time consuming, and costly. A large number of different research communities have addressed this problem by, for example, supplying tool support for editing ontologies [15, 4, 9], developing reasoning support for debugging ontologies [18], supporting modular ontology design [17], and investigating automated ontology generation from data or text [8, 6, 5, 14]. One major problem when building an ontology is the fact that domain experts are rarely ontology engineering experts and that, conversely, ontology engineers are typically not familiar with the domain of the ontology.

An ontology building project therefore often relies on the successful communication between an ontology engineer (familiar with the semantics of ontology languages) and a domain expert (familiar with the domain of interest). In this paper, we consider a simple model of this communication process and analyse, within this model, the computational complexity of reaching a correct domain ontology. We assume that

– the domain expert knows the domain ontology and its vocabulary without being able to formalize or communicate this ontology;

– the domain expert is able to communicate the vocabulary of the ontology and shares it with the ontology engineer. Thus, the domain expert and ontology engineer have a common understanding of the vocabulary of the ontology. The ontology engineer knows nothing else about the domain;

– the ontology engineer can pose queries to the domain expert which the domain expert answers truthfully. Assuming that the domain expert can interpret data in her area of expertise, the main queries posed by the ontology engineer are based on instance retrieval examples:

• assume a data instance $\mathcal{A}$ and a query $q(x)$ are given. Is the individual $a$ a certain answer to query $q(x)$ in $\mathcal{A}$ and the ontology $\mathcal{O}$?

In addition, we require a way for the ontology engineer to find out whether she has reconstructed the target ontology already and, if this is not the case, to request an example illustrating the incompleteness of the reconstruction. We abstract from defining a communication protocol for this, but assume for simplicity that the following query can be posed by the ontology engineer:

• Is this ontology $\mathcal{H}$ complete? If not, return a data instance $\mathcal{A}$, a query $q(x)$, and an individual $a$ such that $a$ is a certain answer to $q(x)$ in $\mathcal{A}$ and the ontology $\mathcal{O}$ and it is not a certain answer to $q(x)$ in $\mathcal{A}$ and the ontology $\mathcal{H}$.

Given this model, our question is whether the ontology engineer can learn the target ontology $\mathcal{O}$ and which computational resources are required for this depending on the ontology language in which the ontology $\mathcal{O}$ and the hypothesis ontologies $\mathcal{H}$ are formulated. Our model obviously abstracts from a number of fundamental problems in building ontologies and communicating about them. In particular, it is not realistic to assume that the domain expert knows the domain ontology and its vocabulary (without being able to formalize it) as it is well known that finding an appropriate vocabulary for a domain of interest is a major problem in ontology design [8]. We make this assumption here in order to isolate the problem of communication about the logical relationships between known vocabulary items and its dependence on the ontology language within which the relationships can be formulated.

The model described above is an instance of Angluin et al.’s framework of exact learning via queries to an oracle [1]. The queries using instance retrieval examples can be regarded as membership queries posed by a learner to an oracle and the completeness query based on a hypothesis $\mathcal{H}$ can be regarded as an equivalence query by the learner to the oracle. Formulated in Angluin’s terms we are thus interested in whether there exists a deterministic learning algorithm that poses membership and equivalence queries of the above form to an oracle and that learns an arbitrary ontology over a given ontology language in polynomial time.

We consider polynomial learnability in three distinct DLs. First, we show that DL-Lite ontologies with role inclusions and arbitrary $\mathcal{ELI}$ concepts on the right-hand side of concept inclusions can be learned in polynomial time if database queries in instance retrieval examples are $\mathcal{ELI}$ instance queries (or, equivalently, acyclic conjunctive queries). We call this DL DL-Lite$_{\mathcal{R}}$ and note that it is the core of the web ontology language profile OWL2 QL. We also note that without complex $\mathcal{ELI}$ concepts on the right-hand side of concept inclusions, polynomial learnability would be trivial as only finitely many non-equivalent such TBoxes exist over a given vocabulary of concept and role names. The second DL we consider is $\mathcal{EL}$, which is the logic underpinning the web ontology language profile OWL2 EL. We show that $\mathcal{EL}$ TBoxes cannot be learned in polynomial time using the protocol above if the database queries in instance retrieval examples are $\mathcal{EL}$ instance queries. We then consider the fragment $\mathcal{EL}_{\mathsf{lhs}}$ of $\mathcal{EL}$ without complex concepts on the right-hand side of concept inclusions and prove that it can be learned in polynomial time using the above protocol with instance retrieval examples.
The proofs of the positive learning results are by reduction to polynomial time learnability results for DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ for the case in which concept subsumptions rather than instance retrieval examples are used in the communication between the learner and the oracle [12]. Detailed proofs are provided in the full version at http://cgi.csc.liv.ac.uk/anaozaki/publ.html.

2 Preliminaries

Let $N_C$ be a countably infinite set of concept names and $N_R$ a countably infinite set of role names. The dialect DL-Lite$_{\mathcal{R}}$ of DL-Lite is defined as follows [7]. A role is a role name or an inverse role $r^-$ with $r \in N_R$. A role inclusion (RI) is of the form $r \sqsubseteq s$, where $r$ and $s$ are roles. A basic concept is either a concept name or of the form $\exists r.\top$, with $r$ a role. A DL-Lite$_{\mathcal{R}}$ concept inclusion (CI) is of the form $B \sqsubseteq C$, where $B$ is a basic concept expression and $C$ is an $\mathcal{ELI}$ concept expression, that is, $C$ is formed according to the rule $C, D := A \mid \top \mid C \sqcap D \mid \exists r.C \mid \exists r^-.C$, where $A$ ranges over $N_C$ and $r$ ranges over $N_R$. A DL-Lite$_{\mathcal{R}}$ TBox is a finite set of DL-Lite$_{\mathcal{R}}$ CIs and RIs.

As usual, an $\mathcal{EL}$ concept expression is an $\mathcal{ELI}$ concept expression that does not use inverse roles, an $\mathcal{EL}$ concept inclusion has the form $C \sqsubseteq D$ with $C$ and $D$ $\mathcal{EL}$ concept expressions, and a (general) $\mathcal{EL}$ TBox is a finite set of $\mathcal{EL}$ concept inclusions [2]. We also consider the restriction $\mathcal{EL}_{\mathsf{lhs}}$ of general $\mathcal{EL}$ TBoxes where only concept names are allowed on the right-hand side of concept inclusions. The size of a concept expression $C$, denoted by $|C|$, is the length of the string that represents it, where concept names and role names are considered to be of length one. A TBox signature is the set of concept and role names occurring in the TBox. The size of a TBox $\mathcal{T}$, denoted by $|\mathcal{T}|$, is $\sum_{C \sqsubseteq D \in \mathcal{T}} (|C| + |D|)$.

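Let $N_I$ be a countably infinite set of individual names. An ABox $\mathcal{A}$ is a finite non-empty set containing concept name assertions $A(a)$ and role assertions $r(a, b)$, where $a, b$ are individuals in $N_I$, $A$ is a concept name and $r$ is a role. The set of individuals that occur in $\mathcal{A}$ is denoted by $\mathsf{Ind}(\mathcal{A})$. We say that $\mathcal{A}$ is a singleton ABox if it contains only one ABox assertion. Assertions of the form $C(a)$ and $r(a, b)$, where $a, b \in N_I$, $C$ is an $\mathcal{ELI}$ concept expression, and $r \in N_R$, are called instance assertions. Note that instance assertions of the form $C(a)$ with $C$ neither a concept name nor $\top$ do not occur in ABoxes. The semantics of description logic is defined as usual [3]. We write $\mathcal{I} \models \alpha$ to say that an inclusion or assertion $\alpha$ is true in $\mathcal{I}$. An interpretation $\mathcal{I}$ is a model of a KB $(\mathcal{T}, \mathcal{A})$ if $\mathcal{I} \models \alpha$ for all $\alpha \in \mathcal{T} \cup \mathcal{A}$. $(\mathcal{T}, \mathcal{A}) \models \alpha$ means that $\mathcal{I} \models \alpha$ for all models $\mathcal{I}$ of $(\mathcal{T}, \mathcal{A})$.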

A learning framework $\mathfrak{F}$ is a triple $(X, \mathcal{L}, \mu)$, where $X$ is a set of examples (also called domain or instance space), $\mathcal{L}$ is a set of learning concepts¹ and $\mu$ is a mapping from $\mathcal{L}$ to $2^X$. The subsumption learning framework $\mathfrak{F}_S$, studied in [12], is defined as $(X_S, \mathcal{L}, \mu_S)$, where $\mathcal{L}$ is the set of all TBoxes that are formulated in a given DL; $X_S$ is the set of subsumption examples of the form $C \sqsubseteq D$, where $C, D$ are concept expressions of the DL under consideration; and $\mu_S(\mathcal{T})$ is defined as $\{C \sqsubseteq D \in X_S \mid \mathcal{T} \models C \sqsubseteq D\}$, for every $\mathcal{T} \in \mathcal{L}$. It should be clear that $\mu_S(\mathcal{T}) = \mu_S(\mathcal{T}')$ if, and only if, the TBoxes $\mathcal{T}$ and $\mathcal{T}'$ entail the same set of inclusions, that is, they are logically equivalent.

¹ In the learning literature (e.g., [1]), the term ‘learning concept’ is often defined as a set of examples. We do not distinguish between learning concepts and their representations and only consider representable learning concepts, in order to emphasize the task of identifying a TBox that is logically equivalent to the target TBox.

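We study the data retrieval learning framework $\mathfrak{F}_D$ defined as $(X_D, \mathcal{L}, \mu_D)$, where $\mathcal{L}$ is the same as in $\mathfrak{F}_S$; $X_D$ is the set of data retrieval examples of the form $(\mathcal{A}, D(a))$, where $\mathcal{A}$ is an ABox, $D(a)$ is a concept assertion of the DL under consideration, and $a \in \mathsf{Ind}(\mathcal{A})$; and $\mu_D(\mathcal{T}) = \{(\mathcal{A}, D(a)) \in X_D \mid (\mathcal{T}, \mathcal{A}) \models D(a)\}$. As in the case of learning from subsumptions, $\mu_D(\mathcal{T}) = \mu_D(\mathcal{T}')$ if, and only if, the TBoxes $\mathcal{T}$ and $\mathcal{T}'$ are logically equivalent.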

Given a learning framework $\mathfrak{F} = (X, \mathcal{L}, \mu)$, we are interested in the exact identification of a target learning concept $l \in \mathcal{L}$ by posing queries to oracles. Let $\mathsf{MEM}_{l,X}$ be the oracle that takes as input some $x \in X$ and returns ‘yes’ if $x \in \mu(l)$ and ‘no’ otherwise. We say that $x$ is a positive example for $l$ if $x \in \mu(l)$ and a negative example for $l$ if $x \notin \mu(l)$. Then a membership query is a call to the oracle $\mathsf{MEM}_{l,X}$. Similarly, for every $l \in \mathcal{L}$, we denote by $\mathsf{EQ}_{l,X}$ the oracle that takes as input a hypothesis learning concept $h \in \mathcal{L}$ and returns ‘yes’, if $\mu(h) = \mu(l)$, or a counterexample $x \in \mu(h) \oplus \mu(l)$ otherwise, where $\oplus$ denotes the symmetric set difference. An equivalence query is a call to the oracle $\mathsf{EQ}_{l,X}$.
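The two oracles can be illustrated on a finite toy framework. The following is a minimal sketch, not part of the paper: the class name `Oracles`, the choice of integers as examples, and the rule of returning the smallest element of the symmetric difference are all assumptions made for illustration.

```python
class Oracles:
    """Membership and equivalence oracles for a toy framework (X, L, mu).

    Hypothetical setup: mu maps a learning concept to a finite set of
    orderable examples, and `target` plays the role of l.
    """

    def __init__(self, mu, target):
        self.mu = mu
        self.target = target

    def mem(self, x):
        # MEM_{l,X}: 'yes' iff x is a positive example for the target.
        return x in self.mu(self.target)

    def eq(self, hypothesis):
        # EQ_{l,X}: None stands for 'yes'; otherwise return some element of
        # the symmetric difference mu(h) (+) mu(l) as a counterexample.
        difference = self.mu(hypothesis) ^ self.mu(self.target)
        return min(difference) if difference else None


# Toy instance: learning concepts are sets of integers and mu is the identity.
oracles = Oracles(mu=frozenset, target=frozenset({1, 4}))
positive_counterexample = oracles.eq(frozenset({1}))         # 4 is missing from h
negative_counterexample = oracles.eq(frozenset({1, 4, 5}))   # 5 is wrongly in h
```

A counterexample that lies in $\mu(l) \setminus \mu(h)$ is positive, and one in $\mu(h) \setminus \mu(l)$ is negative; the second call shows the latter case.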

We say that a learning framework $(X, \mathcal{L}, \mu)$ is exact learnable if there is an algorithm $A$ such that for any target $l \in \mathcal{L}$ the algorithm $A$ always halts and outputs $l' \in \mathcal{L}$ such that $\mu(l) = \mu(l')$ using membership and equivalence queries answered by the oracles $\mathsf{MEM}_{l,X}$ and $\mathsf{EQ}_{l,X}$, respectively. A learning framework $(X, \mathcal{L}, \mu)$ is polynomially exact learnable if it is exact learnable by an algorithm $A$ such that at every step² of computation the time used by $A$ up to that step is bounded by a polynomial $p(|l|, |x|)$, where $l$ is the target and $x \in X$ is the largest counterexample seen so far³. As argued in the introduction, for the subsumption and data retrieval learning frameworks we additionally assume that the signature of the target TBox is always known to the learner.

An important class of learning algorithms, which includes all algorithms presented in [12, 10, 16], always makes equivalence queries with hypotheses $h$ that are polynomial in the size of $l$ and such that $\mu(h) \subseteq \mu(l)$, so that counterexamples returned by the $\mathsf{EQ}_{l,X}$ oracles are always positive. We say that such algorithms use positive bounded equivalence queries.

3 Polynomial Time Learnability

In this section we prove polynomial time exact learnability of the DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ data retrieval learning frameworks. These frameworks are instances of the general definition given above, where the concept expression $D$ in a data retrieval example $(\mathcal{A}, D(a))$ is an $\mathcal{ELI}$ concept expression in the DL-Lite$_{\mathcal{R}}$ framework and an $\mathcal{EL}$ concept expression in the $\mathcal{EL}_{\mathsf{lhs}}$ framework, respectively.

The proof is by reduction to learning from subsumptions. We illustrate its idea for $\mathcal{EL}_{\mathsf{lhs}}$. To learn a TBox from data retrieval examples we run a learning from subsumptions algorithm as a ‘black box’. Every time the learning from subsumptions algorithm makes a membership or an equivalence query we rewrite the query into the data setting and pass it on to the data retrieval oracle. The oracle’s answer, rewritten back to the subsumption setting, is given to the learning from subsumptions algorithm. When the learning from subsumptions algorithm terminates we return the learnt TBox. This reduction is made possible by the close relationship between data retrieval and subsumption examples. For every TBox $\mathcal{T}$ and inclusion $C \sqsubseteq D$, one can interpret the concept expression $C$ as a labelled tree and encode this tree as an ABox $\mathcal{A}_C$ with root $\rho_C$ such that $\mathcal{T} \models C \sqsubseteq D$ iff $(\mathcal{T}, \mathcal{A}_C) \models D(\rho_C)$.

² We count each call to an oracle as one step of computation.

³ We assume some natural notion of a length of an example $x$ and a learning concept $l$, denoted $|x|$ and $|l|$, respectively.

Fig. 1: An ABox $\mathcal{A} = \{r(a, a), s(a, a), A(a)\}$ and its unravelling up to level $n$.

Then, membership queries in the subsumption setting can be answered with the help of a data retrieval oracle due to the relation between subsumptions and instance queries described above. An inclusion $C \sqsubseteq D$ is a (positive) subsumption example for some target TBox $\mathcal{T}$ if, and only if, $(\mathcal{A}_C, D(\rho_C))$ is a (positive) data retrieval example for the same target $\mathcal{T}$. To handle equivalence queries, we need to be able to rewrite data retrieval counterexamples returned by the data retrieval oracle into the subsumption setting. For every TBox $\mathcal{T}$ and data retrieval query $(\mathcal{A}, D(a))$ one can construct a concept expression $C_{\mathcal{A}}$ such that $(\mathcal{T}, \mathcal{A}) \models D(a)$ iff $\mathcal{T} \models C_{\mathcal{A}} \sqsubseteq D$. Such a concept expression $C_{\mathcal{A}}$ can be obtained by unravelling $\mathcal{A}$ into a tree-shaped ABox and representing it as a concept expression. This unravelling, however, can increase the ABox size exponentially. Thus, to obtain a polynomial bound on the running time of the learning process, $C_{\mathcal{A}} \sqsubseteq D$ cannot simply be returned as an answer to a subsumption equivalence query.

For example, for a target TBox $\mathcal{T} = \{\exists r^n.A \sqsubseteq B\}$ and a hypothesis $\mathcal{H} = \emptyset$ the data retrieval query $(\mathcal{A}, B(a))$, where $\mathcal{A} = \{r(a, a), s(a, a), A(a)\}$, is a positive counterexample. The tree-shaped unravelling of $\mathcal{A}$ up to level $n$ is a full binary tree of depth $n$, as shown in Fig. 1. On the other hand, the non-equivalence of $\mathcal{T}$ and $\mathcal{H}$ can already be witnessed by $(\mathcal{A}', B(a))$, where $\mathcal{A}' = \{r(a, a), A(a)\}$. The unravelling of $\mathcal{A}'$ up to level $n$ produces a linear size ABox $\{r(a, a_2), r(a_2, a_3), \ldots, r(a_{n-1}, a_n), A(a), A(a_2), \ldots, A(a_n)\}$, corresponding to the left-most path in Fig. 1, which, in turn, is of linear size w.r.t. the target inclusion $\exists r^n.A \sqsubseteq B$. Notice that $\mathcal{A}'$ is obtained from $\mathcal{A}$ by removing the $s(a, a)$ edge and checking, using membership queries, whether $(\mathcal{T}, \mathcal{A}') \models B(a)$ still holds. In other words, one might need to ask further membership queries in order to rewrite answers to data retrieval equivalence queries given by the data retrieval oracle into the subsumption setting.
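The blow-up in this example can be checked by counting the individuals of the unravelling. The sketch below is only an illustration: the function name `unravel_size` and the triple encoding `(role, source, target)` of role assertions are assumptions, and only forward role edges are followed, which suffices for this $\mathcal{EL}$ example.

```python
def unravel_size(role_assertions, root, depth):
    """Count the individuals of the tree-shaped unravelling up to `depth`.

    `role_assertions` is a list of (role, source, target) triples
    (a hypothetical encoding of the role assertions of an ABox).
    """
    successors = {}
    for role, source, target in role_assertions:
        successors.setdefault(source, []).append((role, target))

    # level[b] = number of tree nodes at the current depth that are copies of b
    level = {root: 1}
    total = 1
    for _ in range(depth):
        next_level = {}
        for individual, copies in level.items():
            for _role, target in successors.get(individual, []):
                next_level[target] = next_level.get(target, 0) + copies
        level = next_level
        total += sum(level.values())
    return total


full_abox = [("r", "a", "a"), ("s", "a", "a")]   # role part of A
reduced_abox = [("r", "a", "a")]                 # role part of A'
n = 10
full_size = unravel_size(full_abox, "a", n)        # full binary tree of depth n
reduced_size = unravel_size(reduced_abox, "a", n)  # single path of length n
```

With both loops present the count doubles at every level, whereas dropping $s(a, a)$ leaves one node per level.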

We address the need for rewriting counterexamples by introducing an abstract notion of reduction between different exact learning frameworks. To simplify notation, we assume that both learning frameworks use the same set of learning concepts $\mathcal{L}$ and only consider positive bounded equivalence queries. This definition of reduction can be easily extended to arbitrary learning frameworks and arbitrary queries.

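We say that a learning framework $\mathfrak{F} = (X, \mathcal{L}, \mu)$ polynomially reduces to $\mathfrak{F}' = (X', \mathcal{L}, \mu')$ if for some polynomials $p_1(\cdot)$, $p_2(\cdot)$ and $p_3(\cdot, \cdot)$ there exist a function $f : X' \to X$ and a partial function $g : \mathcal{L} \times \mathcal{L} \times X \to X'$, defined for every $(l, h, x)$ such that $|h| = p_1(|l|)$, $\mu(h) \subseteq \mu(l)$ and $x \in X$, for which the following conditions hold.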

– For all $x' \in X'$ we have $x' \in \mu'(l)$ if, and only if, $f(x') \in \mu(l)$;

– For all $x \in X$ we have $x \in \mu(l) \setminus \mu(h)$ if, and only if, $g(l, h, x) \in \mu'(l) \setminus \mu'(h)$;

– $f(x')$ is computable in time $p_2(|x'|)$;

– $g(l, h, x)$ is computable in time $p_3(|l|, |x|)$ and $l$ can only be accessed by calls to the membership oracle $\mathsf{MEM}_{l,X}$.

As in the case of learning algorithms, we consider every call to the oracle as one step of computation. Notice also that even though $g$ takes $h$ as input, the polynomial time bound on computing $g(l, h, x)$ does not depend on the size of $h$, as $g$ is only defined for $h$ polynomial in the size of $l$.

Theorem 1. Let $(X, \mathcal{L}, \mu)$ and $(X', \mathcal{L}, \mu')$ be learning frameworks. If there exists a polynomial reduction from $(X, \mathcal{L}, \mu)$ to $(X', \mathcal{L}, \mu')$ and a polynomial learning algorithm for $(X', \mathcal{L}, \mu')$ that uses membership queries and positive bounded equivalence queries then $(X, \mathcal{L}, \mu)$ is polynomially exact learnable.
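The idea behind Theorem 1 is that a learner for the second framework can be run unchanged once its queries are translated with $f$ and $g$. The following sketch shows one way to wire this up; it is an illustration under assumptions, not the paper's construction. The toy frameworks, the names `learn_via_reduction`, `set_learner`, `TARGET`, and the padding values 0 and 9 are all hypothetical, and the toy `eq` oracle is only meaningful for hypotheses contained in the target (positive bounded equivalence queries).

```python
def learn_via_reduction(mem, eq, f, g, learner):
    """Run a learner for F' against the oracles of F, translating with f and g."""

    def mem_prime(x_prime):
        # A membership query of F' is answered by asking F about f(x').
        return mem(f(x_prime))

    def eq_prime(hypothesis):
        # Ask F for a positive counterexample and rewrite it into F' with g.
        x = eq(hypothesis)
        return None if x is None else g(mem, hypothesis, x)

    return learner(mem_prime, eq_prime)


# Toy instance (hypothetical): learning concepts are sets of integers.
# In F an example is a tuple, positive iff it contains an element of the target;
# in F' an example is a single integer, positive iff it belongs to the target.
TARGET = frozenset({2, 5})


def mem(x):
    return any(element in TARGET for element in x)


def eq(hypothesis):
    missing = sorted(TARGET - hypothesis)
    # Pad the counterexample with 0 and 9, which occur in no hypothesis here.
    return (0, missing[0], 9) if missing else None


def f(x_prime):
    return (x_prime,)


def g(mem, hypothesis, x):
    # Shrink the tuple to one element that is still positive; the target is
    # accessed through membership queries only.
    return next(element for element in x if mem((element,)))


def set_learner(mem_prime, eq_prime):
    hypothesis = frozenset()
    while (counterexample := eq_prime(hypothesis)) is not None:
        hypothesis |= {counterexample}
    return hypothesis


learned = learn_via_reduction(mem, eq, f, g, set_learner)
```

Note that `g` mirrors the role it has in the text: it may issue further membership queries to turn a large counterexample of $\mathfrak{F}$ into a small one of $\mathfrak{F}'$.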

We use Theorem 1 to prove that DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ TBoxes can be learned in polynomial time from data retrieval examples. We employ the following result:

Theorem 2 ([12]). The DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ subsumption learning frameworks are polynomial time exact learnable with membership and positive bounded equivalence queries.

As the existence of $f$ is guaranteed by the following lemma, in what follows we prove the existence of $g$ and establish the corresponding time bounds.

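Lemma 1. Let $L \in \{\text{DL-Lite}_{\mathcal{R}}, \mathcal{EL}_{\mathsf{lhs}}\}$ and let $C \sqsubseteq D$ be an $L$ concept inclusion. Then $(\mathcal{T}, \mathcal{A}_C) \models D(\rho_C)$ if, and only if, $\mathcal{T} \models C \sqsubseteq D$.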
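The ABox $\mathcal{A}_C$ used in Lemma 1 is obtained by reading the concept $C$ as a labelled tree. The sketch below shows this encoding for $\mathcal{EL}$ concepts only; the nested-tuple representation of concepts, the function name `concept_to_abox`, and the naming scheme for fresh individuals are assumptions made for illustration. Inverse roles, as needed for $\mathcal{ELI}$, are not handled.

```python
from itertools import count


def concept_to_abox(concept, root="rho"):
    """Encode an EL concept as a tree-shaped ABox rooted at `root`.

    A concept is a pair (names, children): `names` lists the concept names
    of the conjunction and `children` lists (role, subconcept) pairs, one
    for each conjunct of the form "exists role.subconcept". The pair
    ([], []) stands for the top concept. (Hypothetical encoding.)
    """
    fresh = count(1)
    assertions = set()

    def walk(node, individual):
        names, children = node
        for name in names:
            assertions.add((name, individual))              # concept assertion
        for role, child in children:
            successor = f"{root}_{next(fresh)}"             # fresh individual
            assertions.add((role, individual, successor))   # role assertion
            walk(child, successor)

    walk(concept, root)
    return assertions


# C = A and exists r.(B and exists s.Top)
concept = (["A"], [("r", (["B"], [("s", ([], []))]))])
abox = concept_to_abox(concept)
```

Each existential restriction contributes one role assertion and one fresh individual, so the ABox is linear in the size of the concept.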

Polynomial Reduction for DL-Lite$_{\mathcal{R}}$ TBoxes. We show for any target $\mathcal{T}$ and hypothesis $\mathcal{H}$ polynomial in the size of $\mathcal{T}$ that Algorithm 1 transforms every positive counterexample in polynomial time to a positive counterexample with a singleton ABox (i.e., of the form $\{A(a)\}$ or $\{r(a, b)\}$). Using the equivalences $(\mathcal{T}, \{A(a)\}) \models C(a)$ iff $\mathcal{T} \models A \sqsubseteq C$ and $(\mathcal{T}, \{r(a, b)\}) \models C(a)$ iff $\mathcal{T} \models \exists r.\top \sqsubseteq C$, we then obtain a positive subsumption counterexample, so $g(l, h, x)$ is computable in polynomial time.

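Given a positive data retrieval counterexample $(\mathcal{A}, C(a))$, Algorithm 1 exhaustively applies the role saturation and parent-child merging rules introduced in [12]. We say that an instance assertion $C(a)$ is role saturated for $(\mathcal{T}, \mathcal{A})$ if $(\mathcal{T}, \mathcal{A}) \not\models C'(a)$ whenever $C'$ is the result of replacing a role $r$ by some role $s \in N_R \cap \Sigma_{\mathcal{T}}$ with $\mathcal{T} \not\models r \sqsubseteq s$ and $\mathcal{T} \models s \sqsubseteq r$, where $\Sigma_{\mathcal{T}}$ is the signature of the target TBox $\mathcal{T}$ known to the learner.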

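To define parent/child merging, we identify each $\mathcal{ELI}$ concept $C$ with a finite tree $T_C$ whose nodes are labeled with concept names and edges are labeled with roles in the standard way. For example, if $C = \exists t.(A \sqcap \exists r.\exists r.\exists r.B) \sqcap \exists s.\top$ then Fig. 2a illustrates $T_C$. Now, we say that an instance assertion $C(a)$ is parent/child merged for $\mathcal{T}$ and $\mathcal{A}$ if for nodes $n_1, n_2, n_3$ in $T_C$ such that $n_2$ is an $r$-successor of $n_1$, $n_3$ is an $s$-successor of $n_2$ and $\mathcal{T} \models r \equiv s$ we have $(\mathcal{T}, \mathcal{A}) \not\models C'(a)$ if $C'$ is the concept that results from identifying $n_1$ and $n_3$. For instance, the concept in Fig. 2c is the result of identifying the leaf labeled with $B$ in Fig. 2b with the parent of its parent.

Algorithm 1 Reducing the positive counterexample

 1: Let $C(a)$ be an instance assertion such that $(\mathcal{H}, \mathcal{A}) \not\models C(a)$ and $(\mathcal{T}, \mathcal{A}) \models C(a)$
 2: function REDUCECOUNTEREXAMPLE($\mathcal{A}$, $C(a)$)
 3:     Find a role saturated and parent/child merged $C(a)$ (membership queries)
 4:     if $C = C_0 \sqcap \dots \sqcap C_n$ then
 5:         Find $C_i$, $0 \le i \le n$, such that $(\mathcal{H}, \mathcal{A}) \not\models C_i(a)$
 6:         $C := C_i$
 7:     if $C = \exists r.C'$ and there is $r(a, b) \in \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}) \models C'(b)$ then
 8:         REDUCECOUNTEREXAMPLE($\mathcal{A}$, $C'(b)$)
 9:     else
10:         Find a singleton $\mathcal{A}' \subseteq \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}') \models C(a)$ but
11:             $(\mathcal{H}, \mathcal{A}') \not\models C(a)$ (membership queries)
12:         return $(\mathcal{A}', C(a))$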

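We present a run of Algorithm 1 for $\mathcal{T} = \{A \sqsubseteq \exists s.B,\ s \sqsubseteq r\}$ and $\mathcal{H} = \{s \sqsubseteq r\}$. Assume the oracle gives as counterexample $(\mathcal{A}, C(a))$, where $\mathcal{A} = \{t(a, b), A(b), s(a, c)\}$ and $C(a) = (\exists t.(A \sqcap \exists r.\exists r.\exists r.B) \sqcap \exists s.\top)(a)$ (Fig. 2a). Role saturation produces $C(a) = (\exists t.(A \sqcap \exists s.\exists s.\exists s.B) \sqcap \exists s.\top)(a)$ (Fig. 2b). Then, applying parent/child merging twice we obtain $C(a) = (\exists t.(A \sqcap \exists s.B) \sqcap \exists s.\top)(a)$ (Fig. 2c and 2d).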

Fig. 2: Concept $C$ being role saturated and parent/child merged (panels (a) to (d)).

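Since $(\mathcal{H}, \mathcal{A}) \not\models \exists t.(A \sqcap \exists s.B)(a)$, after Lines 4-6, Algorithm 1 updates $C$ by choosing the conjunct $\exists t.(A \sqcap \exists s.B)$. As $C$ is of the form $\exists t.C'$ and there is $t(a, b) \in \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}) \models C'(b)$, the algorithm recursively calls the function “ReduceCounterExample” with $(A \sqcap \exists s.B)(b)$. Now, since $(\mathcal{H}, \mathcal{A}) \not\models \exists s.B(b)$, after Lines 4-6, $C$ is updated to $\exists s.B$. Finally, $C$ is of the form $\exists s.C'$ and there is no $s(b, c) \in \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}) \models C'(c)$. So the algorithm proceeds to Lines 10-12, where it chooses $A(b) \in \mathcal{A}$. Since $(\mathcal{T}, \{A(b)\}) \models \exists s.B(b)$ and $(\mathcal{H}, \{A(b)\}) \not\models \exists s.B(b)$ we have that $\mathcal{T} \models A \sqsubseteq \exists s.B$ and $\mathcal{H} \not\models A \sqsubseteq \exists s.B$.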

Lemma 2. Let(A, C(a))be a positive counterexample. Then the following holds:

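1. if $C$ is a basic concept then there is a singleton $\mathcal{A}' \subseteq \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}') \models C(a)$;

2. if $C$ is of the form $\exists r.C'$ (or $\exists r^-.C'$) and $C$ is role saturated and parent/child merged then either there is $r(a, b) \in \mathcal{A}$ (or $r(b, a) \in \mathcal{A}$) such that $(\mathcal{T}, \mathcal{A}) \models C'(b)$ or there is a singleton $\mathcal{A}' \subseteq \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}') \models C(a)$.

Algorithm 2 Minimizing an ABox $\mathcal{A}$

 1: Let $\mathcal{A}$ be an ABox such that $(\mathcal{T}, \mathcal{A}) \models A(a)$ but $(\mathcal{H}, \mathcal{A}) \not\models A(a)$, for $A \in N_C$, $a \in \mathsf{Ind}(\mathcal{A})$.
 2: function MINIMIZEABOX($\mathcal{A}$)
 3:     Concept saturate $\mathcal{A}$ with $\mathcal{H}$
 4:     for every $A \in N_C \cap \Sigma_{\mathcal{T}}$ and $a \in \mathsf{Ind}(\mathcal{A})$ such that
 5:             $(\mathcal{T}, \mathcal{A}) \models A(a)$ and $(\mathcal{H}, \mathcal{A}) \not\models A(a)$ do
 6:         Domain Minimize $\mathcal{A}$ with $A(a)$
 7:         Role Minimize $\mathcal{A}$ with $A(a)$
 8:     return ($\mathcal{A}$)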

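Lemma 3. For any target DL-Lite$_{\mathcal{R}}$ TBox $\mathcal{T}$ and hypothesis DL-Lite$_{\mathcal{R}}$ TBox $\mathcal{H}$, given a positive data retrieval counterexample $(\mathcal{A}, C(a))$, Algorithm 1 computes in time polynomial in $|\mathcal{T}|$, $|\mathcal{H}|$, $|\mathcal{A}|$ and $|C|$ a counterexample $C'(b)$ such that $(\mathcal{T}, \mathcal{A}') \models C'(b)$, where $\mathcal{A}' \subseteq \mathcal{A}$ is a singleton ABox.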

Proof. (Sketch) Let $(\mathcal{A}, C(a))$ be the input of “ReduceCounterExample”. The number of membership queries in Line 3 is polynomial in $|C|$ and $|\mathcal{T}|$. If $C$ has more than one conjunct then it is updated in Lines 4-6, so $C$ becomes either (1) a basic concept or (2) of the form $\exists r.C'$ (or $\exists r^-.C'$). By Lemma 2, in case (1) there is a singleton $\mathcal{A}' \subseteq \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}') \models C(a)$, computed by Line 11 of Algorithm 1. In case (2) either there is a singleton $\mathcal{A}' \subseteq \mathcal{A}$ such that $(\mathcal{T}, \mathcal{A}') \models C(a)$, computed by Line 11 of Algorithm 1, or we obtain a counterexample with a refined $C$. Since the size of the refined counterexample is strictly smaller after every recursive call of “ReduceCounterExample”, the total number of calls is bounded by $|C|$. □

Using Theorem 2 and Theorem 1 we obtain:

Theorem 3. The DL-Lite$_{\mathcal{R}}$ data retrieval framework is polynomially exact learnable.

Polynomial Reduction for $\mathcal{EL}_{\mathsf{lhs}}$ TBoxes. In this section we give a polynomial algorithm computing $g$ for $\mathcal{EL}_{\mathsf{lhs}}$. First we note that the concept assertion in data retrieval counterexamples $(\mathcal{A}, D(a))$ can always be made atomic. Let $\Sigma_{\mathcal{T}}$ be the signature of the target TBox $\mathcal{T}$.

Lemma 4. If $(\mathcal{A}, D(a))$ is a positive counterexample then by posing polynomially many membership queries one can find a concept name $A \in \Sigma_{\mathcal{T}}$ and an individual $b \in \mathsf{Ind}(\mathcal{A})$ such that $(\mathcal{A}, A(b))$ is also a counterexample.

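Thus it suffices to show that given a positive counterexample $(\mathcal{A}, D(a))$ with $D \in N_C$, one can compute an $\mathcal{EL}$ concept expression $C$ bounded in size by $|\mathcal{T}|$ such that $(\mathcal{T}, \{C(b)\}) \models A(b)$ and $(\mathcal{H}, \{C(b)\}) \not\models A(b)$, where $A \in N_C$. As $(\mathcal{T}, \{C(b)\}) \models A(b)$ if and only if $\mathcal{T} \models C \sqsubseteq A$, we obtain a positive subsumption counterexample.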

Our algorithm for computing $g$ is based on two operations: minimization, computed by Algorithm 2, and unfolding. Algorithm 2 minimizes a given ABox with the following rules.

Algorithm 3 Computing a tree shaped ABox

 1: function FINDTREE($\mathcal{A}$)
 2:     MINIMIZEABOX($\mathcal{A}$)
 3:     while there is a cycle $c$ in $\mathcal{A}$ do
 4:         Unfold $a \in \mathsf{Ind}(\mathcal{A})$ in cycle $c$
 5:         MINIMIZEABOX($\mathcal{A}$)
 6:     Let $C$ be the concept expression corresponding to $\mathcal{A}$ with counterexample $A(a)$.
 7:     return ($C(a)$, $A(a)$)

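(Concept saturate $\mathcal{A}$ with $\mathcal{H}$) If $A(a) \notin \mathcal{A}$ and $(\mathcal{H}, \mathcal{A}) \models A(a)$ then replace $\mathcal{A}$ by $\mathcal{A} \cup \{A(a)\}$, where $A \in N_C \cap \Sigma_{\mathcal{T}}$ and $a \in \mathsf{Ind}(\mathcal{A})$.

(Domain Minimize $\mathcal{A}$ with $A(a)$) If $A(a)$ is a counterexample and $(\mathcal{T}, \mathcal{A}^{-b}) \models A(a)$ then replace $\mathcal{A}$ by $\mathcal{A}^{-b}$, where $\mathcal{A}^{-b}$ is the result of removing from $\mathcal{A}$ all ABox assertions in which $b$ occurs.

(Role Minimize $\mathcal{A}$ with $A(a)$) If $A(a)$ is a counterexample and $(\mathcal{T}, \mathcal{A}^{-r(b,c)}) \models A(a)$ then replace $\mathcal{A}$ by $\mathcal{A}^{-r(b,c)}$, where $\mathcal{A}^{-r(b,c)}$ is obtained by removing a role assertion $r(b, c)$ from $\mathcal{A}$.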
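The domain and role minimisation rules can be sketched for a single counterexample $A(a)$ as follows. This is a simplified illustration, not the paper's Algorithm 2: concept saturation with the hypothesis is omitted (the example uses an empty hypothesis), only one pair $A(a)$ is processed, and the membership oracle is simulated by a saturation-based entailment check that is assumed to be adequate here because $\mathcal{EL}_{\mathsf{lhs}}$ inclusions have only concept names on the right-hand side. The tuple encodings and all function names are hypothetical.

```python
def holds(facts, individual, concept):
    """Check an EL concept (names, children) at `individual` in a set of facts."""
    names, children = concept
    if any((name, individual) not in facts for name in names):
        return False
    return all(
        any(len(fact) == 3 and fact[0] == role and fact[1] == individual
            and holds(facts, fact[2], child) for fact in facts)
        for role, child in children
    )


def entails(tbox, abox, name, individual):
    """Decide (T, A) |= name(individual) by saturating A with the inclusions.

    `tbox` is a list of (lhs_concept, rhs_name) pairs; assertions are tuples
    (name, a) and (role, a, b).
    """
    facts = set(abox)
    individuals = {term for fact in abox for term in fact[1:]}
    changed = True
    while changed:
        changed = False
        for lhs, rhs_name in tbox:
            for a in individuals:
                if (rhs_name, a) not in facts and holds(facts, a, lhs):
                    facts.add((rhs_name, a))
                    changed = True
    return (name, individual) in facts


def minimise(abox, name, individual, mem):
    """Apply domain and role minimisation while mem(...) stays true."""
    abox = set(abox)
    others = {term for fact in abox for term in fact[1:]} - {individual}
    for b in sorted(others):                       # domain minimisation
        smaller = {fact for fact in abox if b not in fact[1:]}
        if mem(smaller, name, individual):
            abox = smaller
    for fact in sorted(f for f in abox if len(f) == 3):   # role minimisation
        if mem(abox - {fact}, name, individual):
            abox = abox - {fact}
    return abox


# Target T = { exists r.A  is subsumed by  B }, hypothesis H empty.
tbox = [(((), (("r", (("A",), ())),)), "B")]
abox = {("r", "a", "b"), ("A", "b"), ("s", "a", "c"), ("A", "c"), ("r", "c", "c")}
result = minimise(abox, "B", "a", lambda A, n, i: entails(tbox, A, n, i))
```

Removing assertions can never make the hypothesis entail more, so only the target side needs to be re-checked after each removal, which is what the `mem` callback stands for.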

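Lemma 5. Given a positive counterexample $(\mathcal{A}, D(a))$ with $D \in N_C$, Algorithm 2 computes in polynomially many steps with respect to $|\mathcal{A}|$, $|\mathcal{H}|$, and $|\mathcal{T}|$ an ABox $\mathcal{A}'$ such that $|\mathsf{Ind}(\mathcal{A}')| \le |\mathcal{T}|$ and $(\mathcal{A}', A(b))$ is a positive counterexample, for some $A \in N_C$ and $b \in \mathsf{Ind}(\mathcal{A}')$.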

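It remains to show that $\mathcal{A}$ can be made tree-shaped. We say that $\mathcal{A}$ has an (undirected) cycle if there is a finite sequence $a_0 \cdot r_1 \cdot a_1 \cdots r_k \cdot a_k$ such that (i) $a_0 = a_k$ and (ii) there are mutually distinct assertions of the form $r_{i+1}(a_i, a_{i+1})$ or $r_{i+1}(a_{i+1}, a_i)$ in $\mathcal{A}$, for $0 \le i < k$. The unfolding of a cycle $c = a_0 \cdot r_1 \cdot a_1 \cdots r_k \cdot a_k$ in a given ABox $\mathcal{A}$ is obtained by replacing $c$ by the cycle $c' = a_0 \cdot r_1 \cdot a_1 \cdots r_{k-1} \cdot a_{k-1} \cdot r_k \cdot \hat{a}_0 \cdot r_1 \cdots \hat{a}_{k-1} \cdot r_k \cdot a_0$, where the $\hat{a}_i$ are fresh individual names, $0 \le i \le k-1$, in such a way that (i) if $r(a_i, d) \in \mathcal{A}$, for an individual $d$ not in the cycle, then $r(\hat{a}_i, d) \in \mathcal{A}$; and (ii) if $A(a_i) \in \mathcal{A}$ then $A(\hat{a}_i) \in \mathcal{A}$.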
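Whether an ABox has an undirected cycle in this sense can be tested with a union-find pass over its role assertions: a cycle exists exactly when some assertion connects two individuals that are already connected by other assertions. The sketch below is an illustration; the function name and the triple encoding of role assertions are assumptions.

```python
def has_undirected_cycle(role_assertions):
    """Return True iff the role assertions contain an undirected cycle.

    `role_assertions` is an iterable of (role, source, target) triples.
    Self-loops such as r(a, a) and parallel assertions such as r(a, b),
    s(a, b) count as cycles, since the assertions involved are distinct.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for _role, source, target in set(role_assertions):
        root_source, root_target = find(source), find(target)
        if root_source == root_target:
            return True                     # this assertion closes a cycle
        parent[root_source] = root_target
    return False


loop = has_undirected_cycle([("r", "a", "a")])
parallel = has_undirected_cycle([("r", "a", "b"), ("s", "a", "b")])
tree = has_undirected_cycle([("r", "a", "b"), ("s", "b", "c"), ("r", "a", "d")])
```

Duplicates are removed first because the definition asks for mutually distinct assertions.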

We prove in the appendix that after every unfolding-minimisation step in Algorithm 3 the ABox $\mathcal{A}$ on the one hand becomes strictly larger and on the other does not exceed the size of the target TBox $\mathcal{T}$. Thus Algorithm 3 terminates after a polynomial number of steps yielding a tree-shaped counterexample.

Lemma 6. Algorithm 3 computes a minimal tree shaped ABox $\mathcal{A}$ with size polynomial in $|\mathcal{T}|$ and runs in polynomially many steps in $|\mathcal{T}|$ and $|\mathcal{A}|$.

Using Theorem 2 and Theorem 1 we obtain:

Theorem 4. The $\mathcal{EL}_{\mathsf{lhs}}$ data retrieval framework is polynomially exact learnable.

4 Limits of Polynomial Time Learnability

Our proof of non-polynomial learnability of general $\mathcal{EL}$ TBoxes from data retrieval examples extends previous results on non-polynomial learnability of $\mathcal{EL}$ TBoxes from subsumptions [12]. We start by giving a brief overview of the construction in [12], show that it fails in the data retrieval setting and then demonstrate how it can be modified.

The non-learnability proof in [12] proceeds as follows. A learner tries to exactly identify one of the possible target TBoxes $\{\mathcal{T}_L \mid L \in \mathfrak{L}_n\}$, for a set $\mathfrak{L}_n$, defined below, of size superpolynomial in $n$. At every stage of computation the oracle maintains a set of TBoxes $S$, which the learner is unable to distinguish based on the answers given so far. Initially $S = \{\mathcal{T}_L \mid L \in \mathfrak{L}_n\}$. It has been proved that for any $\mathcal{EL}$ inclusion $C \sqsubseteq D$ either $\mathcal{T}_L \models C \sqsubseteq D$ for every $L \in \mathfrak{L}_n$ or the number of $L \in \mathfrak{L}_n$ such that $\mathcal{T}_L \models C \sqsubseteq D$ does not exceed $|C|$. When a polynomial learner asks a membership query $C \sqsubseteq D$ the oracle answers ‘yes’ if $\mathcal{T}_L \models C \sqsubseteq D$ for every $L \in \mathfrak{L}_n$ and ‘no’ otherwise. In the latter case the oracle removes polynomially many $\mathcal{T}_L$ such that $\mathcal{T}_L \models C \sqsubseteq D$ from $S$. Similarly, for any equivalence query with hypothesis $\mathcal{H}$ asked by a polynomial learning algorithm there exists a polynomial size inclusion $C \sqsubseteq D$, which can be returned as a counterexample and that excludes only polynomially many TBoxes from $S$. Thus, every query to the oracle reduces the size of $S$ at most polynomially in $n$, but then the learner cannot distinguish between the remaining TBoxes of our initial superpolynomial set $S$.
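The counting behind this argument can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every query removes at most `removed_per_query` candidates from a set of $\lfloor 2^n/n \rfloor$ TBoxes (the size of the index set defined below); the function name and the sample bound $n^2$ are hypothetical and not taken from the paper.

```python
def min_queries(n, removed_per_query):
    """Lower bound on the number of queries needed to isolate one candidate.

    Assumes floor(2**n / n) initial candidates and that each oracle answer
    eliminates at most `removed_per_query` of them.
    """
    candidates = 2 ** n // n
    to_eliminate = candidates - 1
    return -(-to_eliminate // removed_per_query)   # ceiling division


# With a (hypothetical) per-query bound of n**2 eliminated candidates:
bound_20 = min_queries(20, 20 ** 2)
bound_40 = min_queries(40, 40 ** 2)
```

Because the number of candidates grows exponentially while the per-query loss grows only polynomially, the bound itself grows superpolynomially in $n$.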

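The set of indices $\mathfrak{L}_n$ and the target TBoxes $\mathcal{T}_L$ are defined as follows. Fix two role names $r$ and $s$. An $n$-tuple $L$ is a sequence of role sequences $(\boldsymbol{\sigma}_1, \ldots, \boldsymbol{\sigma}_n)$, where every $\boldsymbol{\sigma}_i$ is a sequence of role names $r$ and $s$, that is, $\boldsymbol{\sigma}_i = \sigma_i^1 \sigma_i^2 \ldots \sigma_i^n$ with $\sigma_i^j \in \{r, s\}$. Then $\mathfrak{L}_n$ is a set of $n$-tuples such that for every $L, L' \in \mathfrak{L}_n$ with $L = (\boldsymbol{\sigma}_1, \ldots, \boldsymbol{\sigma}_n)$, $L' = (\boldsymbol{\sigma}'_1, \ldots, \boldsymbol{\sigma}'_n)$, if $\boldsymbol{\sigma}_i = \boldsymbol{\sigma}'_j$ then $L = L'$ and $i = j$. There are $N = \lfloor 2^n/n \rfloor$ different tuples in $\mathfrak{L}_n$. For every $n > 0$ and every $n$-tuple $L = (\boldsymbol{\sigma}_1, \ldots, \boldsymbol{\sigma}_n)$ we define an acyclic $\mathcal{EL}$ TBox $\mathcal{T}_L$ as the union of $\mathcal{T}_0 = \{X_i \sqsubseteq \exists r.X_{i+1} \sqcap \exists s.X_{i+1} \mid 0 \le i < n\}$ and the following inclusions:

$$
\begin{aligned}
A_1 &\sqsubseteq \exists \boldsymbol{\sigma}_1.M \sqcap X_0, &\qquad B_1 &\sqsubseteq \exists \boldsymbol{\sigma}_1.M \sqcap X_0, \\
&\;\;\vdots & &\;\;\vdots \\
A_n &\sqsubseteq \exists \boldsymbol{\sigma}_n.M \sqcap X_0, &\qquad B_n &\sqsubseteq \exists \boldsymbol{\sigma}_n.M \sqcap X_0, \\
A &\equiv X_0 \sqcap \exists \boldsymbol{\sigma}_1.M \sqcap \dots \sqcap \exists \boldsymbol{\sigma}_n.M,
\end{aligned}
$$

where the expression $\exists \boldsymbol{\sigma}.C$ stands for $\exists \sigma^1.\exists \sigma^2.\ldots\exists \sigma^n.C$, $M$ is a concept name that ‘marks’ a designated path given by $\boldsymbol{\sigma}$, and $\mathcal{T}_0$ generates a full binary tree whose edges are labelled with the role names $r$ and $s$ and with $X_0$ at the root, $X_1$ at level 1 and so on.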
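One simple way to obtain an index set with the stated properties is to enumerate all $2^n$ role sequences of length $n$ and cut the enumeration into consecutive blocks of $n$ sequences. The sketch below illustrates this; it is one possible construction chosen for illustration, not necessarily the one used in [12], and the function name is hypothetical.

```python
from itertools import product


def build_index_set(n):
    """Return floor(2**n / n) n-tuples of pairwise distinct role sequences.

    Every role sequence over {r, s} of length n is used at most once, so two
    tuples never share a sequence and no tuple repeats a sequence.
    """
    sequences = ["".join(roles) for roles in product("rs", repeat=n)]
    return [
        tuple(sequences[start:start + n])
        for start in range(0, len(sequences) - n + 1, n)
    ]


index_set = build_index_set(4)
all_sequences = [sigma for L in index_set for sigma in L]
```

For $n = 4$ this yields $\lfloor 16/4 \rfloor = 4$ tuples; leftover sequences that do not fill a whole block are simply discarded.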

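In contrast to the subsumption framework, every $\mathcal{T}_L$ can be exactly identified using data retrieval queries. For example, as $X_0 \sqcap \exists \boldsymbol{\sigma}_1.M \sqcap \dots \sqcap \exists \boldsymbol{\sigma}_n.M \sqsubseteq A \in \mathcal{T}_L$, a learning from data retrieval queries algorithm can learn all the sequences in the $n$-tuple $L = (\boldsymbol{\sigma}_1, \ldots, \boldsymbol{\sigma}_n)$, by defining an ABox $\mathcal{A} = \{X_0(a_1), r(a_1, a_2), s(a_1, a_2), \ldots, r(a_{n-1}, a_n), s(a_{n-1}, a_n), M(a_n)\}$ and then proceeding with unfolding and minimizing $\mathcal{A}$ via membership queries of the form $(\mathcal{T}_L, \mathcal{A}) \models A(a_1)$.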

To show the non-tractability for data retrieval queries, we first modify $S$ in such a way that the concept expression which ‘marks’ the sequences in $L = (\boldsymbol{\sigma}_1, \ldots, \boldsymbol{\sigma}_n)$ is now given by the set $\mathfrak{B}_n$ of all conjunctions $F_1 \sqcap \dots \sqcap F_n$, where $F_i \in \{E_i, \bar{E}_i\}$, for $1 \le i \le n$. Intuitively, every member of $\mathfrak{B}_n$ encodes a binary string of length $n$ with $E_i$ encoding 1 and $\bar{E}_i$ encoding 0. For every $L \in \mathfrak{L}_n$ and every $B \in \mathfrak{B}_n$ we define $\mathcal{T}_L^B$ as the union of $\mathcal{T}_0$ and the concept inclusions defined above with $B$ replacing $M$.

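Then for any sequence $\boldsymbol{\sigma}$ of length $n$ there exists at most one $L \in \mathfrak{L}_n$, at most one $1 \le i \le n$ and at most one $B \in \mathfrak{B}_n$ such that $\mathcal{T}_L^B \models A_i \sqsubseteq \exists \boldsymbol{\sigma}.B$ and $\mathcal{T}_L^B \models B_i \sqsubseteq \exists \boldsymbol{\sigma}.B$. Notice that the size of each $\mathcal{T}_L^B$ is polynomial in $n$ and so $\mathfrak{L}_n$ contains superpolynomially many $n$-tuples in the size of each $\mathcal{T}_L^B$, with $L \in \mathfrak{L}_n$ and $B \in \mathfrak{B}_n$. Every $\mathcal{T}_L^B$ entails, among other inclusions, $\bigsqcap_{i=1}^{n} C_i \sqsubseteq A$, where every $C_i$ is either $A_i$ or $B_i$. Let $\Sigma_n$ be the signature of the TBoxes $\mathcal{T}_L^B$ and consider a TBox $\mathcal{T}$ defined as the following set of concept inclusions:

$$
\begin{aligned}
\exists r.(E_1 \sqcap \bar{E}_1) &\sqsubseteq (E_1 \sqcap \bar{E}_1), &\qquad \exists s.(E_1 \sqcap \bar{E}_1) &\sqsubseteq (E_1 \sqcap \bar{E}_1), \\
(E_1 \sqcap \bar{E}_1) &\sqsubseteq \exists r.(E_1 \sqcap \bar{E}_1), &\qquad (E_1 \sqcap \bar{E}_1) &\sqsubseteq \exists s.(E_1 \sqcap \bar{E}_1), \\
(E_i \sqcap \bar{E}_i) &\sqsubseteq A
\end{aligned}
$$

for every $1 \le i \le n$ and every $A \in \Sigma_n \cap N_C$.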

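The basic idea of extending our TBoxes with $\mathcal{T}$ is that if $a \in (E_i \sqcap \bar{E}_i)^{\mathcal{I}_{\mathcal{A}}}$, for an ABox $\mathcal{A}$ and individual $a \in \mathsf{Ind}(\mathcal{A})$, then for all $L \in \mathfrak{L}_n$ and $B \in \mathfrak{B}_n$, we have $(\mathcal{T}_L^B, \mathcal{A}) \models D(b)$, where $D$ is any $\mathcal{EL}$ concept expression over $\Sigma_n$ and $b \in \mathsf{Ind}(\mathcal{A})$ is any successor or predecessor of $a$ (or $a$ itself). This means that for each individual in $\mathcal{A}$ at most one $B$ of the $2^n$ binary strings in $\mathfrak{B}_n$ can be distinguished by data retrieval queries. The following lemma enables us to respond to membership queries without eliminating too many $L \in \mathfrak{L}_n$ and $B \in \mathfrak{B}_n$ used to encode $\mathcal{T}_L^B$ in the set of TBoxes that the learner cannot distinguish.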

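Lemma 7. For any ABox $\mathcal{A}$, any $\mathcal{EL}$ concept assertion $D(a)$ over $\Sigma_n$, and any $a \in \mathsf{Ind}(\mathcal{A})$, if there is $L \in \mathfrak{L}_n$ and $B \in \mathfrak{B}_n$ such that $(\mathcal{T}_L^B \cup \mathcal{T}, \mathcal{A}) \models D(a)$ then:

– either $(\mathcal{T}_L^B \cup \mathcal{T}, \mathcal{A}) \models D(a)$, for every $L \in \mathfrak{L}_n$ and $B \in \mathfrak{B}_n$, or

– $(\mathcal{T}_L^B \cup \mathcal{T}, \mathcal{A}) \models D(a)$ for at most $|D|$ elements $L \in \mathfrak{L}_n$, or

– $(\mathcal{T}_L^B \cup \mathcal{T}, \mathcal{A}) \models D(a)$ for at most $|\mathcal{A}|$ elements $B \in \mathfrak{B}_n$.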

The next lemma (proved in Appendix E) is immediate from Lemma 15 presented in [12]. It shows how the oracle can answer equivalence queries eliminating at most one $L \in \mathfrak{L}_n$ used to encode $\mathcal{T}_L^B$ in the set $S$ of TBoxes that the learner cannot distinguish.

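Lemma 8. For any $n > 1$ and any $\mathcal{EL}$ TBox $\mathcal{H}$ in $\Sigma_n$ with $|\mathcal{H}| < 2^n$, there exists an ABox $\mathcal{A}$, an individual $a \in \mathsf{Ind}(\mathcal{A})$ and an $\mathcal{EL}$ concept expression $D$ over $\Sigma_n$ such that (i) the size of $\mathcal{A}$ plus the size of $D$ does not exceed $6n$ and (ii) if $(\mathcal{H}, \mathcal{A}) \models D(a)$ then $(\mathcal{T}_L^B, \mathcal{A}) \models D(a)$ for at most one $L \in \mathfrak{L}_n$ and if $(\mathcal{H}, \mathcal{A}) \not\models D(a)$ then for every $L \in \mathfrak{L}_n$ we have $(\mathcal{T}_L^B \cup \mathcal{T}, \mathcal{A}) \models D(a)$.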

Then, by Lemmas 7 and 8, we have that: (i) any polynomial size membership query can distinguish at most polynomially many TBoxes fromS; and (ii) if the learner’s hypothesis is polynomial size then there exists a polynomial size counterexample that the oracle can give which distinguishes at most polynomially many TBoxes fromS.

Theorem 5. The $\mathcal{EL}$ data retrieval framework is not polynomially exact learnable.

5 Future Work

We plan to consider an extension of the learning protocol in which arbitrary conjunctive queries are admitted in queries to the domain expert/oracle. We then still have polynomial time learnability for $\mathcal{EL}_{\mathsf{lhs}}$ but conjecture non-polynomial time learnability for DL-Lite$_{\mathcal{R}}$. Another extension is exact learnability for the Horn-extension of DL-Lite$_{\mathcal{R}}$ for which we conjecture that polynomial time learnability still holds.

Bibliography

[1] D. Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1987.

[2] F. Baader, S. Brandt, and C. Lutz. Pushing the $\mathcal{EL}$ envelope. In IJCAI, pages 364–369. Professional Book Center, 2005.

[3] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.

[4] S. Bechhofer, I. Horrocks, C. Goble, and R. Stevens. OilEd: a reason-able ontology editor for the semantic web. In KI 2001: Advances in Artificial Intelligence, pages 396–408. Springer, 2001.

[5] D. Borchmann and F. Distel. Mining of $\mathcal{EL}$-GCIs. In The 11th IEEE International Conference on Data Mining Workshops, Vancouver, Canada, 11 December 2011. IEEE Computer Society.

[6] P. Buitelaar, P. Cimiano, and B. Magnini, editors. Ontology Learning from Text: Methods, Evaluation and Applications. IOS Press, 2005.

[7] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning, 39(3):385–429, 2007.

[8] P. Cimiano, A. Hotho, and S. Staab. Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Intell. Res. (JAIR), 24:305–339, 2005.

[9] J. Day-Richter, M. A. Harris, M. Haendel, S. Lewis, et al. OBO-Edit: an ontology editor for biologists. Bioinformatics, 23(16):2198–2200, 2007.

[10] M. Frazier and L. Pitt. Learning From Entailment: An Application to Propositional Horn Sentences. In ICML, pages 120–127, 1993.

[11] B. Konev, M. Ludwig, D. Walther, and F. Wolter. The logical difference for the lightweight description logic $\mathcal{EL}$. J. Artif. Intell. Res. (JAIR), 44:633–708, 2012.

[12] B. Konev, C. Lutz, A. Ozaki, and F. Wolter. Exact learning of lightweight description logic ontologies. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fourteenth International Conference, KR 2014, Vienna, Austria, July 20-24, 2014, 2014.

[13] C. Lutz, R. Piro, and F. Wolter. Description logic TBoxes: Model-theoretic characterizations and rewritability. In IJCAI, pages 983–988, 2011.

[14] Y. Ma and F. Distel. Learning formal definitions for SNOMED CT from text. In Artificial Intelligence in Medicine - 14th Conference on Artificial Intelligence in Medicine, AIME 2013, Murcia, Spain, May 29 - June 1, 2013. Proceedings, pages 73–77, 2013.

[15] M. A. Musen. Protégé ontology editor. Encyclopedia of Systems Biology, pages 1763–1765, 2013.

[16] C. Reddy and P. Tadepalli. Learning First-Order Acyclic Horn Programs from Entailment. In Proceedings of the 15th International Conference on Machine Learning (and Proceedings of the 8th International Conference on Inductive Logic Programming), pages 23–37. Morgan Kaufmann, 1998.


[17] H. Stuckenschmidt, C. Parent, and S. Spaccapietra, editors. Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization, volume 5445 of Lecture Notes in Computer Science. Springer, 2009.

[18] H. Wang, M. Horridge, A. Rector, N. Drummond, and J. Seidenberg. Debugging OWL-DL ontologies: A heuristic approach. In The Semantic Web – ISWC 2005, pages 745–757. Springer, 2005.

A Simulations and Canonical Models

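The semantics of DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ is given by interpretations. An interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ consists of a non-empty set $\Delta^{\mathcal{I}}$ and a function $\cdot^{\mathcal{I}}$ that assigns each concept name $A$ to a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ and each role name $r$ to a binary relation $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. We make the unique name assumption for the individuals of an ABox. To interpret an ABox $\mathcal{A}$, we consider interpretations $\mathcal{I}$ which assign to each $a \in \mathsf{Ind}(\mathcal{A})$ an element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$. An interpretation $\mathcal{I}$ satisfies an assertion $A(a) \in \mathcal{A}$ if $a^{\mathcal{I}} \in A^{\mathcal{I}}$ and an assertion $r(a, b) \in \mathcal{A}$ if $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$. The extension $C^{\mathcal{I}}$ of an $\mathcal{ELI}$ concept expression $C$ is inductively defined as follows:

– $\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$

– $(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$

– $(\exists r.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \exists e \in C^{\mathcal{I}} : (d, e) \in r^{\mathcal{I}}\}$

– $(\exists r^{-}.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \exists e \in C^{\mathcal{I}} : (e, d) \in r^{\mathcal{I}}\}$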
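The inductive definition above can be evaluated directly on a finite interpretation. The following Python sketch is only an illustration: the tuple encoding of concept expressions (`"top"`, `"name"`, `"and"`, `"exists"`, `"exists_inv"`) and the function name `extension` are assumptions made for this example and do not come from the paper.

```python
# Concepts are nested tuples (a hypothetical encoding chosen for this sketch):
#   ("top",), ("name", "A"), ("and", C, D), ("exists", "r", C), ("exists_inv", "r", C)

def extension(concept, domain, concept_ext, role_ext):
    """Return the set of domain elements that belong to `concept`."""
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "name":
        return set(concept_ext.get(concept[1], set()))
    if kind == "and":
        left = extension(concept[1], domain, concept_ext, role_ext)
        right = extension(concept[2], domain, concept_ext, role_ext)
        return left & right
    if kind in ("exists", "exists_inv"):
        role, filler = concept[1], concept[2]
        filler_ext = extension(filler, domain, concept_ext, role_ext)
        pairs = role_ext.get(role, set())
        if kind == "exists":
            # d has an r-successor e in the filler
            return {d for (d, e) in pairs if e in filler_ext}
        # d has an r-predecessor e in the filler
        return {d for (e, d) in pairs if e in filler_ext}
    raise ValueError(f"unknown constructor: {kind!r}")


# A small interpretation: domain {1, 2, 3}, A = {2}, r = {(1, 2), (3, 1)}.
domain = {1, 2, 3}
concept_ext = {"A": {2}}
role_ext = {"r": {(1, 2), (3, 1)}}

exists_r_A = ("exists", "r", ("name", "A"))
exists_inv_r_top = ("exists_inv", "r", ("top",))

print(extension(exists_r_A, domain, concept_ext, role_ext))
print(extension(exists_inv_r_top, domain, concept_ext, role_ext))
```

In this interpretation only element 1 has an $r$-successor in $A$, whereas both 1 and 2 have an $r$-predecessor.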

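An interpretation $\mathcal{I}$ satisfies:

– a concept inclusion $C \sqsubseteq D$, in symbols $\mathcal{I} \models C \sqsubseteq D$, if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$;

– a role inclusion $r \sqsubseteq s$, in symbols $\mathcal{I} \models r \sqsubseteq s$, if $r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$;

– an instance assertion $C(a)$, in symbols $\mathcal{I} \models C(a)$, if $a^{\mathcal{I}} \in C^{\mathcal{I}}$;

– a role assertion $r(a, b)$, in symbols $\mathcal{I} \models r(a, b)$, if $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$.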

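We say that an interpretation $\mathcal{I}$ is a model of a TBox $\mathcal{T}$ (an ABox $\mathcal{A}$) if $\mathcal{I} \models \alpha$ for all $\alpha \in \mathcal{T}$ ($\alpha \in \mathcal{A}$). A CI (an RI) $\alpha$ follows from a TBox $\mathcal{T}$ if every model of $\mathcal{T}$ is a model of $\alpha$, in symbols $\mathcal{T} \models \alpha$. We write $\models \alpha$ to denote that $\alpha$ follows from the empty TBox. A knowledge base (KB) is a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ consisting of a TBox and an ABox. An instance assertion $\alpha$ follows from $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ if every individual name that occurs in $\alpha$ also occurs in $\mathsf{Ind}(\mathcal{A})$ and every model of $(\mathcal{T}, \mathcal{A})$ is a model of $\alpha$, in symbols $(\mathcal{T}, \mathcal{A}) \models \alpha$. For DL-Lite$_{\mathcal{R}}$ KBs, we assume that the ABox $\mathcal{A}$ is closed under inverses, i.e., $r(a, b) \in \mathcal{A}$ iff $r^{-}(b, a) \in \mathcal{A}$. If we add assertions to or remove assertions from an ABox $\mathcal{A}$, we silently do this assuming that the resulting $\mathcal{A}$ is again closed under inverses. For $\mathcal{EL}_{\mathsf{lhs}}$ KBs, role assertions in $\mathcal{A}$ do not have inverse roles and we do not make $\mathcal{A}$ closed under inverses.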


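Homomorphisms and Simulations. A path in an $\mathcal{ELI}$ concept expression $C$ is a finite sequence $C_0 \cdot r_1 \cdot C_1 \cdot \ldots \cdot r_k \cdot C_k$, where $C_0 = C$, $k \geq 0$, and $\exists r_{i+1}.C_{i+1}$ is a conjunct of $C_i$, for $0 \leq i < k$. The set $\mathsf{paths}(C)$ contains all paths in $C$. We also define $\mathsf{tail}(p) = \{A \mid A \text{ is a conjunct of } C_k\}$, where $C_k$ is the last concept expression in path $p$.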

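Definition 1. The tree interpretation $\mathcal{I}_C$ of $C$ is defined as follows:

– $\Delta^{\mathcal{I}_C} = \{p = C_0 \cdot r_1 \cdot C_1 \cdot \ldots \cdot r_k \cdot C_k \mid p \in \mathsf{paths}(C)\}$

– $A^{\mathcal{I}_C} = \{p \mid A \in \mathsf{tail}(p)\}$

– $r^{\mathcal{I}_C} = \{(p, p') \mid p' = p \cdot r \cdot D\}$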
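Definition 1 can be turned into a small construction that enumerates $\mathsf{paths}(C)$ and reads off the concept and role extensions. The Python sketch below is a minimal illustration under assumptions of our own: concepts are encoded as nested tuples, paths are flat tuples alternating concepts and role labels, and the helper names `conjuncts`, `paths`, `tail` and `tree_interpretation` are hypothetical. Role labels are kept as plain strings, so an inverse role would simply appear as a separate label.

```python
# Concepts (hypothetical encoding): ("top",), ("name", "A"), ("and", C, D), ("exists", r, C)

def conjuncts(concept):
    """Flatten nested conjunctions into a list of conjuncts."""
    if concept[0] == "and":
        return conjuncts(concept[1]) + conjuncts(concept[2])
    return [concept]

def paths(concept):
    """All paths C0 . r1 . C1 ... rk . Ck, each stored as a flat tuple."""
    result = {(concept,)}
    frontier = [(concept,)]
    while frontier:
        p = frontier.pop()
        for conj in conjuncts(p[-1]):
            if conj[0] == "exists":
                q = p + (conj[1], conj[2])
                if q not in result:
                    result.add(q)
                    frontier.append(q)
    return result

def tail(p):
    """Concept names that are conjuncts of the last concept on the path."""
    return {conj[1] for conj in conjuncts(p[-1]) if conj[0] == "name"}

def tree_interpretation(concept):
    domain = paths(concept)
    concept_ext, role_ext = {}, {}
    for p in domain:
        for name in tail(p):
            concept_ext.setdefault(name, set()).add(p)
        if len(p) > 1:
            # p = parent . r . D, so (parent, p) belongs to the extension of r
            role_ext.setdefault(p[-2], set()).add((p[:-2], p))
    return domain, concept_ext, role_ext


# C = A ⊓ ∃r.(B ⊓ ∃s.⊤)
inner = ("and", ("name", "B"), ("exists", "s", ("top",)))
C = ("and", ("name", "A"), ("exists", "r", inner))
tree_domain, tree_concepts, tree_roles = tree_interpretation(C)
print(len(tree_domain))
```

For this concept the tree has three elements: the root path, its $r$-successor, and that successor's $s$-successor.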

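The next two lemmas relate our tree shaped interpretations with homomorphisms. A homomorphism $h : \mathcal{I}_C \to \mathcal{I}$ between the tree interpretation of an $\mathcal{ELI}$ concept expression $C$ and an interpretation $\mathcal{I}$ is a mapping from $\Delta^{\mathcal{I}_C}$ to $\Delta^{\mathcal{I}}$ such that, for all $p, p' \in \Delta^{\mathcal{I}_C}$:

– if $p \in A^{\mathcal{I}_C}$ then $h(p) \in A^{\mathcal{I}}$;

– if $(p, p') \in r^{\mathcal{I}_C}$ then $(h(p), h(p')) \in r^{\mathcal{I}}$.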

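Lemma 9. Let $C$ be an $\mathcal{ELI}$ concept expression. Denote as $\mathcal{I}_C$ the tree shaped interpretation of $C$. If there is a homomorphism $h : \mathcal{I}_C \to \mathcal{I}$ such that $h(\rho_C) = d$, where $\rho_C$ is the root path of $\mathcal{I}_C$, then $d \in C^{\mathcal{I}}$.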

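Proof. In the base case let $C$ be a concept name $A$. Since $\rho_A \in A^{\mathcal{I}_A}$, if there is a homomorphism $h : \mathcal{I}_A \to \mathcal{I}$ such that $h(\rho_A) = d$, then $d \in A^{\mathcal{I}}$. Now we make a case distinction:

– For $C = C_1 \sqcap C_2$: Suppose the lemma is true for a homomorphism $h_1 : \mathcal{I}_{C_1} \to \mathcal{I}$ with $h_1(\rho_{C_1}) = d$ and a homomorphism $h_2 : \mathcal{I}_{C_2} \to \mathcal{I}$ with $h_2(\rho_{C_2}) = d$. Then $d \in C_1^{\mathcal{I}}$ and $d \in C_2^{\mathcal{I}}$. By the semantics of $\sqcap$, $d \in (C_1 \sqcap C_2)^{\mathcal{I}}$.

– For $C = \exists r.C'$: We know that $\rho \in (\exists r.C')^{\mathcal{I}_C}$. By the semantics of $\exists$, there is $d$ such that $(\rho, d) \in r^{\mathcal{I}_C}$ and $d \in C'^{\mathcal{I}_C}$. If there is a homomorphism $h : \mathcal{I}_C \to \mathcal{I}$ then $(h(\rho), h(d)) \in r^{\mathcal{I}}$. Suppose the lemma is true for $C'$. Then $h(d) \in C'^{\mathcal{I}}$ and we have that $h(\rho) \in (\exists r.C')^{\mathcal{I}}$.

– For $C = \exists r^{-}.C'$: We know that $\rho \in (\exists r^{-}.C')^{\mathcal{I}_C}$. By the semantics of $\exists$, there is $d$ such that $(d, \rho) \in r^{\mathcal{I}_C}$ and $d \in C'^{\mathcal{I}_C}$. If there is a homomorphism $h : \mathcal{I}_C \to \mathcal{I}$ then $(h(d), h(\rho)) \in r^{\mathcal{I}}$. Suppose the lemma is true for $C'$. Then $h(d) \in C'^{\mathcal{I}}$ and we have that $h(\rho) \in (\exists r^{-}.C')^{\mathcal{I}}$.

□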

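Lemma 10. Let $C$ be an $\mathcal{ELI}$ concept expression. Denote as $\mathcal{I}_C$ the tree shaped interpretation of $C$. If $d \in C^{\mathcal{I}}$ then there is a homomorphism $h : \mathcal{I}_C \to \mathcal{I}$ such that $h(\rho_C) = d$, where $\rho_C$ is the root path of $\mathcal{I}_C$.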

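Proof. In the base case let $C$ be a concept name $A$. If $d \in A^{\mathcal{I}}$ then, since $\rho_A \in \Delta^{\mathcal{I}_A}$, we have that $h(\rho_A) = d$ is a homomorphism $h : \mathcal{I}_A \to \mathcal{I}$. Now we make a case distinction:

– For $C = C_1 \sqcap C_2$: If $d \in (C_1 \sqcap C_2)^{\mathcal{I}}$ then, by the semantics of $\sqcap$, $d \in C_1^{\mathcal{I}}$ and $d \in C_2^{\mathcal{I}}$. Suppose the lemma is true for $d_1 \in C_1^{\mathcal{I}}$ and $d_2 \in C_2^{\mathcal{I}}$. Then there is a homomorphism $h_1 : \mathcal{I}_{C_1} \to \mathcal{I}$ with $h_1(\rho_{C_1}) = d_1$ and a homomorphism $h_2 : \mathcal{I}_{C_2} \to \mathcal{I}$ with $h_2(\rho_{C_2}) = d_2$. Now, we need to show that the lemma is also true for $C = C_1 \sqcap C_2$. For $p \in \Delta^{\mathcal{I}_C}$, we define $h$ as follows:

$$h(p) = \begin{cases} d & \text{if } p = \rho, \text{ where } \rho \text{ is the root of } \mathcal{I}_C \\ h_1(p') & \text{if } p = p', \text{ where } p' \in \Delta^{\mathcal{I}_{C_1}} \\ h_2(p') & \text{if } p = p', \text{ where } p' \in \Delta^{\mathcal{I}_{C_2}} \end{cases}$$

We know that $\rho \in C^{\mathcal{I}_C}$. If $d \in C^{\mathcal{I}}$ then $h(\rho) = d$ is a partial homomorphism. Since $C = C_1 \sqcap C_2$, we have that for all $p \in \Delta^{\mathcal{I}_C} \setminus \{\rho\}$, $p \in \mathsf{paths}(C_1)$ or $p \in \mathsf{paths}(C_2)$. By hypothesis $h_1$ and $h_2$ are homomorphisms, so $h : \mathcal{I}_C \to \mathcal{I}$ is also a homomorphism.

– For $C = \exists r.C'$: If $d \in (\exists r.C')^{\mathcal{I}}$ then, by the semantics of $\exists$, there is a $d'$ such that $(d, d') \in r^{\mathcal{I}}$ and $d' \in C'^{\mathcal{I}}$. Suppose the lemma is true for $d' \in C'^{\mathcal{I}}$. Then there is a homomorphism $h' : \mathcal{I}_{C'} \to \mathcal{I}$ with $h'(\rho_{C'}) = d'$. Now, we need to show that the lemma is also true for $C = \exists r.C'$. For $p \in \Delta^{\mathcal{I}_C}$, we define $h$ as follows:

$$h(p) = \begin{cases} d & \text{if } p = \rho, \text{ where } \rho \text{ is the root of } \mathcal{I}_C \\ h'(p') & \text{if } p = \rho \cdot r \cdot p' \text{ and } p' \in \Delta^{\mathcal{I}_{C'}} \end{cases}$$

We know that $\rho \in C^{\mathcal{I}_C}$. If $d \in C^{\mathcal{I}}$ then $h(\rho) = d$ is a partial homomorphism. Since $C = \exists r.C'$, we have that for all $p \in \Delta^{\mathcal{I}_C} \setminus \{\rho\}$, $p = \rho \cdot r \cdot p'$ with $p' \in \mathsf{paths}(C')$. By hypothesis $h'$ is a homomorphism, so $h : \mathcal{I}_C \to \mathcal{I}$ is also a homomorphism.

– For $C = \exists r^{-}.C'$: If $d \in (\exists r^{-}.C')^{\mathcal{I}}$ then, by the semantics of $\exists$, there is a $d'$ such that $(d', d) \in r^{\mathcal{I}}$ and $d' \in C'^{\mathcal{I}}$. Suppose the lemma is true for $d' \in C'^{\mathcal{I}}$. Then there is a homomorphism $h' : \mathcal{I}_{C'} \to \mathcal{I}$ with $h'(\rho_{C'}) = d'$. Now, we need to show that the lemma is also true for $C = \exists r^{-}.C'$. For $p \in \Delta^{\mathcal{I}_C}$, we define $h$ as follows:

$$h(p) = \begin{cases} d & \text{if } p = \rho, \text{ where } \rho \text{ is the root of } \mathcal{I}_C \\ h'(p') & \text{if } p = \rho \cdot r^{-} \cdot p' \text{ and } p' \in \Delta^{\mathcal{I}_{C'}} \end{cases}$$

We know that $\rho \in C^{\mathcal{I}_C}$. If $d \in C^{\mathcal{I}}$ then $h(\rho) = d$ is a partial homomorphism. Since $C = \exists r^{-}.C'$, we have that for all $p \in \Delta^{\mathcal{I}_C} \setminus \{\rho\}$, $p = \rho \cdot r^{-} \cdot p'$ with $p' \in \mathsf{paths}(C')$. By hypothesis $h'$ is a homomorphism, so $h : \mathcal{I}_C \to \mathcal{I}$ is also a homomorphism.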

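□

Let $\mathcal{I}, \mathcal{J}$ be interpretations and $\Sigma$ a signature, $d \in \Delta^{\mathcal{I}}$ and $e \in \Delta^{\mathcal{J}}$. A relation $S \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{J}}$ is a $\Sigma$-simulation from $(\mathcal{I}, d)$ to $(\mathcal{J}, e)$ if $(d, e) \in S$ and the following conditions are satisfied:

– for all concept names $A \in \Sigma$ and all $(d, e) \in S$, if $d \in A^{\mathcal{I}}$ then $e \in A^{\mathcal{J}}$;

– for all role names $r \in \Sigma$, all $(d, e) \in S$ and all $d' \in \Delta^{\mathcal{I}}$, if $(d, d') \in r^{\mathcal{I}}$ then there exists $e' \in \Delta^{\mathcal{J}}$ such that $(e, e') \in r^{\mathcal{J}}$ and $(d', e') \in S$.

We write $(\mathcal{I}, d) \Rightarrow_{\Sigma} (\mathcal{J}, e)$ if there is a $\Sigma$-simulation from $(\mathcal{I}, d)$ to $(\mathcal{J}, e)$. If $\Sigma = \mathsf{N_C} \cup \mathsf{N_R}$ we omit $\Sigma$ and speak of a simulation $(\mathcal{I}, d) \Rightarrow (\mathcal{J}, e)$. Simulations preserve the membership of $\mathcal{EL}$ concept expressions [13]: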


Lemma 11. For all $\mathcal{EL}$ concept expressions $C$: if $d \in C^{\mathcal{I}}$ and $(\mathcal{I}, d) \Rightarrow (\mathcal{J}, e)$, then $e \in C^{\mathcal{J}}$.
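For finite interpretations, whether $(\mathcal{I}, d) \Rightarrow_{\Sigma} (\mathcal{J}, e)$ holds can be decided by computing the greatest $\Sigma$-simulation as a fixed point: start from all pairs that respect the concept names and repeatedly discard pairs whose role successors cannot be matched. The Python sketch below illustrates this standard refinement procedure; it is not taken from the paper, and the triple representation `(domain, concept_ext, role_ext)` and the function name `greatest_simulation` are assumptions made for the example.

```python
def greatest_simulation(I, J, concept_names, role_names):
    """Largest relation satisfying both simulation conditions (finite case)."""
    dom_i, conc_i, role_i = I
    dom_j, conc_j, role_j = J

    # Start with every pair that respects the concept names in the signature.
    sim = {
        (d, e)
        for d in dom_i
        for e in dom_j
        if all(d not in conc_i.get(A, set()) or e in conc_j.get(A, set())
               for A in concept_names)
    }

    def successors_matched(d, e):
        # Every r-successor of d must be simulated by some r-successor of e.
        for r in role_names:
            for (x, d2) in role_i.get(r, set()):
                if x != d:
                    continue
                if not any((e, e2) in role_j.get(r, set()) and (d2, e2) in sim
                           for e2 in dom_j):
                    return False
        return True

    # Remove violating pairs until nothing changes.
    while True:
        bad = {(d, e) for (d, e) in sim if not successors_matched(d, e)}
        if not bad:
            return sim
        sim -= bad


I = ({"x", "y"}, {"A": {"y"}}, {"r": {("x", "y")}})
J = ({"u", "v", "w"}, {"A": {"v"}}, {"r": {("u", "v"), ("u", "w")}})
sim = greatest_simulation(I, J, {"A"}, {"r"})
print(("x", "u") in sim)
```

Here $(\mathcal{I}, x) \Rightarrow (\mathcal{J}, u)$ holds, which agrees with Lemma 11: $x \in (\exists r.A)^{\mathcal{I}}$ and $u \in (\exists r.A)^{\mathcal{J}}$.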

The Canonical Model of an ABox. Canonical models of DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ knowledge bases are defined for a pair $(\mathcal{T}, \mathcal{A})$, where $\mathcal{A}$ is an ABox and $\mathcal{T}$ is a TBox with DL-Lite$_{\mathcal{R}}$ and $\mathcal{EL}_{\mathsf{lhs}}$ CIs, respectively. Before we proceed with knowledge bases, we present the canonical model of an ABox.

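Definition 2. The canonical model $\mathcal{I}_{\mathcal{A}} = (\Delta^{\mathcal{I}_{\mathcal{A}}}, \cdot^{\mathcal{I}_{\mathcal{A}}})$ of an ABox $\mathcal{A}$ is defined as follows:

– $\Delta^{\mathcal{I}_{\mathcal{A}}} = \{a \mid a \in \mathsf{Ind}(\mathcal{A})\}$

– $A^{\mathcal{I}_{\mathcal{A}}} = \{a \mid A(a) \in \mathcal{A}, A \in \mathsf{N_C}\}$

– $r^{\mathcal{I}_{\mathcal{A}}} = \{(a, b) \mid r(a, b) \in \mathcal{A}, r \in \mathsf{N_R}\}$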
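Definition 2 reads the ABox as an interpretation whose domain is the set of individuals. A minimal Python sketch follows, assuming that concept assertions are given as pairs `(A, a)` and role assertions as triples `(r, a, b)`; this encoding and the function name `abox_canonical_model` are choices made for the illustration only.

```python
def abox_canonical_model(concept_assertions, role_assertions):
    """Build (domain, concept_ext, role_ext) from the assertions of an ABox."""
    domain = {a for (_, a) in concept_assertions}
    domain |= {x for (_, a, b) in role_assertions for x in (a, b)}

    concept_ext = {}
    for (name, a) in concept_assertions:
        concept_ext.setdefault(name, set()).add(a)

    role_ext = {}
    for (role, a, b) in role_assertions:
        role_ext.setdefault(role, set()).add((a, b))

    return domain, concept_ext, role_ext


# The ABox {r(a, b), A(b), s(a, c)}
abox_domain, abox_concepts, abox_roles = abox_canonical_model(
    [("A", "b")],
    [("r", "a", "b"), ("s", "a", "c")],
)
print(sorted(abox_domain))
```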

A.1 The Canonical Model for DL-Lite$_{\mathcal{R}}$

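The canonical model $\mathcal{I}_{\mathcal{T},\mathcal{A}} = \bigcup_{n \geq 0} \mathcal{I}_n$ of a DL-Lite$_{\mathcal{R}}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ is defined by a sequence of interpretations $\mathcal{I}_n$. Let $\mathcal{I}_{\mathcal{A}}$ be the canonical interpretation of $\mathcal{A}$. In the base case, $\mathcal{I}_0 := \mathcal{I}_{\mathcal{A}}$. The domain $\Delta^{\mathcal{I}_n}$ contains, in addition to $\Delta^{\mathcal{I}_0}$, sequences $a_0 \cdot r_0 \cdot C_0 \cdot r_1 \cdot C_1 \cdot \ldots \cdot r_m \cdot C_m$, where $a_0 \in \mathsf{Ind}(\mathcal{A})$. For $p = a_0 \cdot r_0 \cdot C_0 \cdot r_1 \cdot C_1 \cdot \ldots \cdot r_m \cdot C_m$ and $q = C'_0 \cdot r'_1 \cdot C'_1 \cdot \ldots \cdot r'_{m'} \cdot C'_{m'}$ we define $p \cdot s \cdot q = a_0 \cdot r_0 \cdot C_0 \cdot r_1 \cdot C_1 \cdot \ldots \cdot r_m \cdot C_m \cdot s \cdot C'_0 \cdot r'_1 \cdot C'_1 \cdot \ldots \cdot r'_{m'} \cdot C'_{m'}$, that is, the concatenation of $p$ and $q$ through the role $s$. Assume $\mathcal{I}_n$ has been defined. Let $k \leq n$ be minimal such that there is a $p \in \Delta^{\mathcal{I}_k}$ with $p \in B^{\mathcal{I}_k}$, $B \sqsubseteq D \in \mathcal{T}$ but $p \notin D^{\mathcal{I}_k}$. Let $D_0, D_1, \ldots, D_l$ be the conjuncts of $D$ of the form $D_i = \exists s_i.E_i$, $0 \leq i \leq l$. Now, we define $\mathcal{I}_{n+1}$ as follows:

– $\Delta^{\mathcal{I}_{n+1}} = \Delta^{\mathcal{I}_n} \cup \{p \cdot s_i \cdot q \mid q \in \mathsf{paths}(E_i), 0 \leq i \leq l\}$;

– $A^{\mathcal{I}_{n+1}} = A^{\mathcal{I}_n} \cup \{p \cdot s_i \cdot q \mid A \in \mathsf{tail}(q), 0 \leq i \leq l\} \cup \{p \mid A \text{ is a conjunct of } D\}$;

– $r^{\mathcal{I}_{n+1}} = r^{\mathcal{I}_n} \cup \{(p \cdot s_i \cdot q, p \cdot s_i \cdot q') \mid (q, q') \in r'^{\mathcal{I}_{E_i}}, \mathcal{T} \models r' \sqsubseteq r, 0 \leq i \leq l\} \cup \{(p, p \cdot s_i \cdot E_i) \mid \mathcal{T} \models s_i \sqsubseteq r, 0 \leq i \leq l\}$.

(The canonical model $\mathcal{I}_{\mathcal{T},C} = \bigcup_{n \geq 0} \mathcal{I}_n$ of an $\mathcal{ELI}$ concept expression $C$ and a DL-Lite$_{\mathcal{R}}$ TBox $\mathcal{T}$ is defined analogously with $\mathcal{I}_0 = \mathcal{I}_C$, where $\mathcal{I}_C$ is the tree interpretation of $C$.)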

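As an example let $\mathcal{A} = \{r(a, b), A(b), s(a, c)\}$ and $\mathcal{T} = \{A \sqsubseteq \exists s.B\}$. Figures 3a and 3b show the interpretations $\mathcal{I}_{\mathcal{A}}$ and $\mathcal{I}_{\mathcal{T},\mathcal{A}}$, respectively.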
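The example can be replayed with a strongly simplified version of the construction. The Python sketch below is an assumption-laden illustration, not the general definition: it only handles CIs of the form $A \sqsubseteq \exists s.B$ with concept names $A$ and $B$, ignores role inclusions and inverse roles, and does not enforce the "minimal $k$" ordering. The function name `chase`, the triple encoding `(A, s, B)` of CIs and the `max_steps` bound (needed because canonical models can be infinite) are all hypothetical.

```python
def chase(concept_assertions, role_assertions, tbox, max_steps=100):
    """Extend the ABox interpretation until every CI (A, s, B) is satisfied.

    Elements are tuples: ("a",) for an individual a and p + (s, B) for the
    fresh s-successor of p introduced for the concept name B.
    """
    domain, concept_ext, role_ext = set(), {}, {}
    for (name, a) in concept_assertions:
        domain.add((a,))
        concept_ext.setdefault(name, set()).add((a,))
    for (role, a, b) in role_assertions:
        domain.update({(a,), (b,)})
        role_ext.setdefault(role, set()).add(((a,), (b,)))

    for _ in range(max_steps):
        violation = None
        for (lhs, s, rhs) in tbox:
            for p in sorted(concept_ext.get(lhs, set())):
                witnessed = any((p, q) in role_ext.get(s, set())
                                for q in concept_ext.get(rhs, set()))
                if not witnessed:
                    violation = (p, s, rhs)
                    break
            if violation:
                break
        if violation is None:
            return domain, concept_ext, role_ext
        p, s, rhs = violation
        fresh = p + (s, rhs)  # the path p . s . B
        domain.add(fresh)
        concept_ext.setdefault(rhs, set()).add(fresh)
        role_ext.setdefault(s, set()).add((p, fresh))
    raise RuntimeError("chase did not terminate within max_steps")


# A = {r(a, b), A(b), s(a, c)}, T = {A ⊑ ∃s.B}
kb_domain, kb_concepts, kb_roles = chase(
    [("A", "b")],
    [("r", "a", "b"), ("s", "a", "c")],
    [("A", "s", "B")],
)
print(sorted(kb_domain))
```

On this input a single step suffices: $b$ receives a fresh $s$-successor $b \cdot s \cdot B$ that belongs to $B$, while $a$ and $c$ are left unchanged.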

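It follows from Lemma 13 that the sequence of interpretations which defines $\mathcal{I}_{\mathcal{T},\mathcal{A}} = \bigcup_{n \geq 0} \mathcal{I}_n$ is the canonical model of a DL-Lite$_{\mathcal{R}}$ knowledge base $(\mathcal{T}, \mathcal{A})$. This lemma requires a technical lemma, presented below, which states that there is a homomorphism from $\mathcal{I}_{\mathcal{T},\mathcal{A}}$ to an arbitrary model of $(\mathcal{T}, \mathcal{A})$.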

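Lemma 12. Let $\mathcal{J}$ be a model of $(\mathcal{T}, \mathcal{A})$. Then there exists a homomorphism $h : \mathcal{I}_{\mathcal{T},\mathcal{A}} \to \mathcal{J}$ mapping $h(a) = a^{\mathcal{J}}$ for all $a \in \mathsf{Ind}(\mathcal{A})$ with the following properties:

– if $p \in A^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ then $h(p) \in A^{\mathcal{J}}$;

– if $(p, p') \in r^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ then $(h(p), h(p')) \in r^{\mathcal{J}}$, where $p, p' \in \Delta^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$.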

Fig. 3: Canonical models with $\mathcal{A} = \{r(a, b), A(b), s(a, c)\}$ and $\mathcal{T} = \{A \sqsubseteq \exists s.B\}$: (a) the canonical model $\mathcal{I}_{\mathcal{A}}$; (b) the canonical model $\mathcal{I}_{\mathcal{T},\mathcal{A}}$.

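Proof. The proof is by induction on the sequence of interpretations for the canonical model $\mathcal{I}_{\mathcal{T},\mathcal{A}}$. We define $h = \bigcup_{n \geq 0} h_n$ and set $h_0 : \Delta^{\mathcal{I}_0} \to \Delta^{\mathcal{J}}$ with $h_0(a) = a^{\mathcal{J}}$ for $a \in \Delta^{\mathcal{I}_0}$ and $a^{\mathcal{J}} \in \Delta^{\mathcal{J}}$. By definition of $\mathcal{I}_0$, $A(a) \in \mathcal{A}$ iff $a \in A^{\mathcal{I}_0}$ and $r(a, b) \in \mathcal{A}$ iff $(a, b) \in r^{\mathcal{I}_0}$. Since $\mathcal{J}$ is a model of $(\mathcal{T}, \mathcal{A})$, if $A(a) \in \mathcal{A}$ then $a^{\mathcal{J}} \in A^{\mathcal{J}}$. Similarly, if $r(a, b) \in \mathcal{A}$ then $(a^{\mathcal{J}}, b^{\mathcal{J}}) \in r^{\mathcal{J}}$. So $a \in A^{\mathcal{I}_0}$ implies $h_0(a) \in A^{\mathcal{J}}$. Also, $(a, b) \in r^{\mathcal{I}_0}$ implies $(h_0(a), h_0(b)) \in r^{\mathcal{J}}$. Thus, $h_0$ is a homomorphism.

Suppose it was proven that $h_n : \mathcal{I}_n \to \mathcal{J}$ is a homomorphism. Let $k \leq n$ be minimal such that there is a $p \in \Delta^{\mathcal{I}_k}$ with $p \in B^{\mathcal{I}_k}$, $B \sqsubseteq D \in \mathcal{T}$ but $p \notin D^{\mathcal{I}_k}$. As described above, for every conjunct $D_i$ of $D = D_0 \sqcap \ldots \sqcap D_l$, $0 \leq i \leq l$, $\mathcal{I}_{n+1}$ is defined as:

1. If $D_i$ is a concept name $A$ then $\mathcal{I}_{n+1}$ is defined in the same way as $\mathcal{I}_n$ except that $A^{\mathcal{I}_{n+1}} = A^{\mathcal{I}_n} \cup \{p\}$. Since $\mathcal{J}$ is a model of $(\mathcal{T}, \mathcal{A})$, if $h_n(p) \in B^{\mathcal{J}}$ and $B \sqsubseteq D \in \mathcal{T}$ then $h_n(p) \in D^{\mathcal{J}}$. Since $D_i = A$ is a conjunct of $D$, $h_n(p) \in A^{\mathcal{J}}$. So $h_{n+1} = h_n$ is a homomorphism.

2. Otherwise $D_i$ is of the form $\exists s.E$, where $s$ can be an inverse role. Then a copy of the tree shaped interpretation $\mathcal{I}_E$ of the concept expression $E$ connected by the role $s$ is added to $\mathcal{I}_{n+1}$ in the way described above. By hypothesis, $h_n(p) \in B^{\mathcal{J}}$. Since $\mathcal{J}$ is a model of $(\mathcal{T}, \mathcal{A})$, if $B \sqsubseteq D \in \mathcal{T}$ then $h_n(p) \in D^{\mathcal{J}}$. Since $D_i = \exists s.E$ is a conjunct of $D$, $h_n(p) \in (\exists s.E)^{\mathcal{J}}$. By the semantics of $\exists$,

– ($\ast$) there is $d \in \Delta^{\mathcal{J}}$ such that $(h_n(p), d) \in s^{\mathcal{J}}$ and $d \in E^{\mathcal{J}}$.

By Lemma 10, if $d \in E^{\mathcal{J}}$ then there is a homomorphism $h' : \mathcal{I}_E \to \mathcal{J}$ mapping $\rho_E$ to $d$, where $\mathcal{I}_E$ is the tree interpretation of $E$ rooted in $\rho_E$. Now, for every $p \cdot s \cdot q \in \Delta^{\mathcal{I}_{n+1}}$ such that $q \in \mathsf{paths}(E)$ we define $h_{n+1}(p \cdot s \cdot q) = h'(q)$. We want to show that $h_{n+1}$ is a homomorphism.

By definition of $\mathcal{I}_{n+1}$, $p \cdot s \cdot q \in A^{\mathcal{I}_{n+1}}$ iff $q \in A^{\mathcal{I}_E}$. If $q \in A^{\mathcal{I}_E}$ then (since $h' : \mathcal{I}_E \to \mathcal{J}$ is a homomorphism) $h'(q) \in A^{\mathcal{J}}$. So $p \cdot s \cdot q \in A^{\mathcal{I}_{n+1}}$ implies $h_{n+1}(p \cdot s \cdot q) = h'(q) \in A^{\mathcal{J}}$. Also, by definition of $\mathcal{I}_{n+1}$, $(p \cdot s \cdot q, p \cdot s \cdot q') \in r^{\mathcal{I}_{n+1}}$ iff $(q, q') \in r'^{\mathcal{I}_E}$ and $\mathcal{T} \models r' \sqsubseteq r$. If $(q, q') \in r'^{\mathcal{I}_E}$ then $(h'(q), h'(q')) \in r'^{\mathcal{J}}$. Since $\mathcal{J}$ is a model of $(\mathcal{T}, \mathcal{A})$, if $\mathcal{T} \models r' \sqsubseteq r$ then $(h'(q), h'(q')) \in r^{\mathcal{J}}$. As $h_{n+1}(p \cdot s \cdot q) = h'(q)$ and $h_{n+1}(p \cdot s \cdot q') = h'(q')$, if $(p \cdot s \cdot q, p \cdot s \cdot q') \in r^{\mathcal{I}_{n+1}}$ then $(h_{n+1}(p \cdot s \cdot q), h_{n+1}(p \cdot s \cdot q')) \in r^{\mathcal{J}}$. Finally, notice that by definition of $\mathcal{I}_{n+1}$, $t^{\mathcal{I}_{n+1}} = t^{\mathcal{I}_n} \cup \{(p, p \cdot s \cdot E) \mid \mathcal{T} \models s \sqsubseteq t\}$. By ($\ast$), we have that $(h_n(p), d) \in s^{\mathcal{J}}$. Since $\mathcal{J}$ is a model of $(\mathcal{T}, \mathcal{A})$, if $\mathcal{T} \models s \sqsubseteq t$ then $(h_n(p), d) \in t^{\mathcal{J}}$. By definition of $h' : \mathcal{I}_E \to \mathcal{J}$, $h'(\rho_E) = d$. So $h_{n+1}(p \cdot s \cdot E) = d$. Then $(p, p \cdot s \cdot E) \in t^{\mathcal{I}_{n+1}}$ implies $(h_n(p), h_{n+1}(p \cdot s \cdot E)) \in t^{\mathcal{J}}$.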
