Data Complexity in the EL family of Description Logics


Adila Krisnadhi¹ and Carsten Lutz²

¹ Faculty of Computer Science, University of Indonesia, adila@cs.ui.ac.id

² Institute for Theoretical Computer Science, TU Dresden, Germany, lutz@tcs.inf.tu-dresden.de

Abstract. We study the data complexity of instance checking and conjunctive query answering in the EL family of description logics, with a particular emphasis on the boundary of tractability. We identify a large number of intractable extensions of EL, but also show that in ELI^f, the extension of EL with inverse roles and global functionality, conjunctive query answering is tractable regarding data complexity. In contrast, already instance checking in EL extended with only inverse roles or global functionality is EXPTIME-complete regarding combined complexity.

1 Introduction

In recent years, lightweight description logics (DLs) have experienced increased interest because they admit highly efficient reasoning on large-scale ontologies. Most prominently, this is witnessed by the ongoing research on the DL-Lite and EL families of DLs (see also [12, 16] for other examples). The main application of EL and its relatives is as an ontology language [6, 2, 4]. In particular, the DL EL⁺⁺ proposed in [2] admits tractable reasoning while still providing sufficient expressive power to represent, for example, life-science ontologies. In contrast, the DL-Lite family of DLs is specifically tailored towards applications with a massive amount of instance data [9, 7, 8, 1]. In such applications, instance checking and conjunctive query answering are the most relevant reasoning services and should thus be computationally cheap, preferably tractable. When determining the computational complexity of these tasks for a given DL, it is often realistic to consider data complexity, where the size of the input is measured only in terms of the ABox (which represents instance data), but not in terms of the TBox (which corresponds to the schema) and the query, as the latter two tend to be small compared to the former. This is in contrast to combined complexity, where the sizes of the TBox and the query are also taken into account.

The aim of this paper is to study the EL family of DLs in the light of data-intensive applications. To this end, we analyze the data complexity of instance checking and conjunctive query answering in extensions of EL. For the DL-Lite family, such an investigation has been carried out e.g. in [8, 1], with complexities ranging from LOGSPACE-complete to co-NP-complete. It follows from the results in [8] that we cannot expect the data complexity to be below PTIME for members of the EL family (at least in the presence of so-called general TBoxes, i.e., sets of GCIs). The reason is that, in a crucial aspect, DL-Lite is even more lightweight than EL: in contrast to EL, DL-Lite allows neither qualified existential restrictions nor universal restrictions, and thus the interaction between different domain elements is very limited. When analyzing the data complexity of instance checking and conjunctive query answering in EL and its extensions, we therefore concentrate on mapping out the boundary of tractability.

Name                     Syntax    Semantics
top                      ⊤         ∆^I
conjunction              C ⊓ D     C^I ∩ D^I
existential restriction  ∃r.C      {x ∈ ∆^I | ∃y ∈ ∆^I : (x, y) ∈ r^I ∧ y ∈ C^I}
atomic negation          ¬A        ∆^I \ A^I
disjunction              C ⊔ D     C^I ∪ D^I
sink restriction         ∀r.⊥      {x | ¬∃y : (x, y) ∈ r^I}
value restriction        ∀r.C      {x | ∀y : (x, y) ∈ r^I → y ∈ C^I}
at-least restriction     (≥ k r)   {x | #{y ∈ ∆^I | (x, y) ∈ r^I} ≥ k}
at-most restriction      (≤ k r)   {x | #{y ∈ ∆^I | (x, y) ∈ r^I} ≤ k}
inverse roles            ∃r⁻.C     {x | ∃y : (y, x) ∈ r^I ∧ y ∈ C^I}
role negation            ∃¬r.C     {x | ∃y ∈ ∆^I : (x, y) ∉ r^I ∧ y ∈ C^I}
role union               ∃r∪s.C    {x | ∃y ∈ ∆^I : (x, y) ∈ r^I ∪ s^I ∧ y ∈ C^I}
transitive closure       ∃r⁺.C     {x | ∃y ∈ ∆^I : (x, y) ∈ (r^I)⁺ ∧ y ∈ C^I}

Table 1. Syntax and semantics of relevant DL constructors.
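To make the semantics column of Table 1 concrete, here is a minimal Python sketch that computes concept extensions over a finite interpretation. The nested-tuple encoding of concepts and role expressions (tags such as "exists", "inv", "plus") and the names `Interp`, `role_ext`, and `ext` are our own illustrative assumptions, not notation from the paper; "bottom" is added only so that sink restrictions ∀r.⊥ can be written.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


class Interp:
    """A finite interpretation: domain, concept-name and role-name extensions."""

    def __init__(self, domain: Iterable[str],
                 concepts: Dict[str, Set[str]], roles: Dict[str, Set[Pair]]):
        self.domain = set(domain)
        self.concepts = concepts
        self.roles = roles


def role_ext(I: Interp, r) -> Set[Pair]:
    """Extension of a role expression: a role name, ("inv", r), ("not", r),
    ("union", r, s), or ("plus", r) for the transitive closure."""
    if isinstance(r, str):
        return set(I.roles.get(r, set()))
    tag = r[0]
    if tag == "inv":
        return {(y, x) for (x, y) in role_ext(I, r[1])}
    if tag == "not":
        base = role_ext(I, r[1])
        return {(x, y) for x in I.domain for y in I.domain if (x, y) not in base}
    if tag == "union":
        return role_ext(I, r[1]) | role_ext(I, r[2])
    if tag == "plus":
        closure = role_ext(I, r[1])
        while True:  # add compositions until a fixpoint is reached
            step = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
            if step <= closure:
                return closure
            closure |= step
    raise ValueError(f"unknown role expression: {r!r}")


def ext(I: Interp, C) -> Set[str]:
    """Extension C^I, following the semantics column of Table 1."""
    if C == "top":
        return set(I.domain)
    if C == "bottom":
        return set()
    if isinstance(C, str):  # concept name
        return set(I.concepts.get(C, set()))
    tag = C[0]
    if tag == "and":
        return ext(I, C[1]) & ext(I, C[2])
    if tag == "or":
        return ext(I, C[1]) | ext(I, C[2])
    if tag == "not":  # atomic negation ¬A
        return I.domain - ext(I, C[1])
    if tag in ("exists", "forall"):  # ("exists", r, C) and ("forall", r, C)
        rel, filler = role_ext(I, C[1]), ext(I, C[2])

        def succ(x: str) -> Set[str]:
            return {y for (x2, y) in rel if x2 == x}

        if tag == "exists":
            return {x for x in I.domain if succ(x) & filler}
        return {x for x in I.domain if succ(x) <= filler}
    if tag in ("atleast", "atmost"):  # ("atleast", k, r) and ("atmost", k, r)
        k, rel = C[1], role_ext(I, C[2])
        counts = {x: len({y for (x2, y) in rel if x2 == x}) for x in I.domain}
        if tag == "atleast":
            return {x for x in I.domain if counts[x] >= k}
        return {x for x in I.domain if counts[x] <= k}
    raise ValueError(f"unknown concept: {C!r}")
```

For example, in the interpretation with r-edges x → y → z and A^I = {y}, the concept ∃r.A evaluates to {x} and ∀r.⊥ to {z}.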

We consider a wide range of extensions of EL, and analyze the data complexity of the mentioned tasks with acyclic TBoxes and with general TBoxes. When selecting extensions of EL, we focus on DLs for which instance checking has been proved intractable regarding combined complexity in [2]. We show that, in most of these extensions, instance checking is also intractable regarding data complexity. The notable exceptions are EL extended with globally functional roles and EL extended with inverse roles. It is shown in [3] that instance checking in these DLs is EXPTIME-complete regarding combined complexity. On the other hand, it follows from results in [12] that instance checking is tractable regarding data complexity in ELI^f, the extension of EL with both globally functional and inverse roles. In this paper, we extend this result to conjunctive query answering in ELI^f, and show that this problem is still tractable regarding data complexity.

2 Preliminaries

In DLs, concepts are inductively defined with the help of a set of constructors, starting with a set N_C of concept names and a set N_R of role names. In EL, concepts are formed using the three topmost constructors in Table 1. There and in general, we use r and s to denote role names, A and B to denote concept names, and C, D to denote concepts.

The additional constructors shown in Table 1 give rise to extensions of EL. We use canonical names to refer to such extensions, writing e.g. EL^∀r.⊥ for EL extended with sink restrictions and EL^C⊔D for EL extended with disjunction. Since we perform a very fine-grained analysis, EL^(≤kr) means the extension of EL with (≤ k r) for some fixed k ≥ 0 (but not for some fixed r).

In DLs, TBoxes are used to represent general knowledge about an application domain, and thus play the role of an ontology. We introduce two different forms of TBoxes.

An acyclic TBox T is a finite set of concept equations A ≐ C such that the left-hand sides are unique and there are no cycles, i.e., if {A_0 ≐ C_0, …, A_{n−1} ≐ C_{n−1}} ⊆ T, then for some i < n, A_i does not occur in C_{i+1}, where A_n := A_0 and C_n := C_0. A general TBox is a finite set of concept inclusions C ⊑ D (often called GCIs). Every concept equation A ≐ C can be written as two inclusions A ⊑ C and C ⊑ A, and thus general TBoxes subsume acyclic ones. ABoxes are used to represent instance data.

Let N_I be a set of individual names. An ABox is a finite set of expressions A(a) and r(a, b), where a and b are from N_I (here and in what follows). Observe that we disallow complex concepts in the ABox, as usual when studying data complexity.

The semantics of EL and its extensions is defined in terms of interpretations I = (∆^I, ·^I). The domain ∆^I is a non-empty set and the interpretation function ·^I maps each concept name A ∈ N_C to a subset A^I of ∆^I, each role name r ∈ N_R to a binary relation r^I on ∆^I, and each individual name a ∈ N_I to a domain element a^I ∈ ∆^I. The extension of ·^I to complex concepts is inductively defined as shown in the third column of Table 1, where #S denotes the cardinality of the set S. An interpretation I satisfies an equation A ≐ C iff A^I = C^I, an inclusion C ⊑ D iff C^I ⊆ D^I, an assertion C(a) iff a^I ∈ C^I, and an assertion r(a, b) iff (a^I, b^I) ∈ r^I. It is a model of a TBox T (ABox A) if it satisfies all equations/inclusions in T (assertions in A).

We will also consider EL^kf, the extension of EL with k-functional roles, i.e., roles for which every domain element can have at most k successors. In EL^kf, there are no additional concept constructors that may be used to build up complex concepts. Instead, a new kind of expression ⊤ ⊑ (≤ k r) is allowed in the TBox. These expressions can be understood as global at-most restrictions, in contrast to the local at-most restrictions shown in Table 1. An interpretation I satisfies ⊤ ⊑ (≤ k r) if |{e | (d, e) ∈ r^I}| ≤ k for all d ∈ ∆^I. Instead of 1-functional roles, we will speak of functional roles, as usual.

The two main inference problems considered in this paper are instance checking and conjunctive query entailment. An individual name a is an instance of a concept C w.r.t. an ABox A and a TBox T (written A, T ⊨ C(a)) iff a^I ∈ C^I in all models I of A and T. The instance problem is to decide, given a, C, A, and T, whether A, T ⊨ C(a).

Conjunctive query entailment is the decision problem corresponding to conjunctive query answering, which is a search problem. A conjunctive query is a set q of atoms C(v) and r(u, v), where u, v are variables. We use Var(q) to denote the variables used in q. If I is an interpretation and π is a mapping from Var(q) to ∆^I, we write I ⊨^π C(v) if π(v) ∈ C^I, I ⊨^π r(u, v) if (π(u), π(v)) ∈ r^I, I ⊨^π q if I ⊨^π α for all α ∈ q, and I ⊨ q if I ⊨^π q for some π. Finally, A, T ⊨ q means that for all models I of the ABox A and the TBox T, we have I ⊨ q. Now, conjunctive query entailment is to decide, given A, T, and q, whether A, T ⊨ q.
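For a finite interpretation, I ⊨ q can be decided by brute force over all mappings π : Var(q) → ∆^I. The following is a minimal sketch, assuming our own tuple encoding of atoms, ("A", v) for A(v) and ("r", u, v) for r(u, v), with only concept names in unary atoms. It enumerates |∆^I|^|Var(q)| mappings, which is polynomial in the size of I when the query is fixed; this is the counting behind the upper bound in Section 4.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, ...]  # ("A", v) encodes A(v); ("r", u, v) encodes r(u, v)


def query_vars(q: List[Atom]) -> List[str]:
    """Var(q), listed in order of first occurrence."""
    seen: List[str] = []
    for atom in q:
        for v in atom[1:]:
            if v not in seen:
                seen.append(v)
    return seen


def holds(pi: Dict[str, str], atom: Atom,
          concepts: Dict[str, Set[str]], roles: Dict[str, Set[Tuple[str, str]]]) -> bool:
    """I |=^pi atom, for a concept-name atom or a role atom."""
    if len(atom) == 2:
        return pi[atom[1]] in concepts.get(atom[0], set())
    return (pi[atom[1]], pi[atom[2]]) in roles.get(atom[0], set())


def find_match(domain: Set[str], concepts, roles, q: List[Atom]) -> Optional[Dict[str, str]]:
    """Return some pi with I |=^pi q, or None if I does not satisfy q."""
    vs = query_vars(q)
    for image in product(sorted(domain), repeat=len(vs)):
        pi = dict(zip(vs, image))
        if all(holds(pi, atom, concepts, roles) for atom in q):
            return pi
    return None
```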

It is not hard to see that, in EL, instance checking is a special case of conjunctive query entailment, as every EL-concept C can be converted into a tree-shaped query.
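The conversion can be sketched as a recursive walk over the concept in which every existential restriction ∃r.D introduces a fresh variable linked to its parent by an r-atom. The tuple encoding of concepts and the variable-naming scheme below are our own assumptions for illustration.

```python
from itertools import count
from typing import List, Tuple


def concept_to_query(concept, root: str = "x") -> List[Tuple[str, ...]]:
    """Translate an EL-concept into a tree-shaped conjunctive query rooted at `root`.

    Concepts: "top", a concept name, ("and", C, D) or ("exists", r, C).
    Atoms: (A, v) for A(v) and (r, u, v) for r(u, v).
    """
    fresh = count(1)
    atoms: List[Tuple[str, ...]] = []

    def walk(c, var: str) -> None:
        if c == "top":
            return                              # ⊤ imposes no constraint
        if isinstance(c, str):
            atoms.append((c, var))              # concept name A yields A(var)
        elif c[0] == "and":
            walk(c[1], var)
            walk(c[2], var)
        elif c[0] == "exists":
            child = f"{root}_{next(fresh)}"     # fresh variable for the r-successor
            atoms.append((c[1], var, child))
            walk(c[2], child)
        else:
            raise ValueError(f"not an EL-concept: {c!r}")

    walk(concept, root)
    return atoms
```

For instance, A ⊓ ∃r.(B ⊓ ∃s.⊤) becomes the query {A(x), r(x, x_1), B(x_1), s(x_1, x_2)}.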


Note that we do not partition the variables in a conjunctive query into answer variables and existentially quantified variables as usual. Since we are dealing with query entailment instead of query answering, this distinction is meaningless. Also observe that we do not allow individual names in conjunctive queries in place of variables. It is well known that individual names in the query can be simulated by concept names with only a linear blowup of the input; see, for example, [10] for details.

The last preliminary worth mentioning is the unique name assumption (UNA), which requires that for all a, b ∈ N_I with a ≠ b, we have a^I ≠ b^I. Most of our results do not depend on the UNA. Whenever they do, we will state explicitly whether the UNA is adopted or not.

3 Lower Bounds

We show that, in almost all extensions of EL introduced in Section 2, instance checking is co-NP-hard regarding data complexity. All our lower bounds assume only acyclic TBoxes.

For the sake of completeness, we note that the case where there is no TBox is not very interesting: because only concept names are admitted in the ABox, the additional concept constructors can then only occur in the query (which is a concept in the case of instance checking and a conjunctive query otherwise). In most cases (such as EL^¬A and EL^∀r.C), this means that no query which contains the additional constructor is entailed by any ABox. Thus, there is a trivial reduction to query answering in basic EL. In other cases such as EL^C⊔D, it is easily shown that conjunctive query containment is tractable regarding data complexity. A notable exception is EL^kf, k ≥ 2, for which instance checking is co-NP-complete already without TBoxes if the UNA is not adopted (as is proved below).

3.1 Basic Cases

In [19], Schaerf proves that instance checking in EL^¬A is co-NP-hard regarding data complexity. He uses a reduction from a variant of SAT that he calls 2+2-SAT. Our lower bounds for extensions of EL are obtained by variations of Schaerf's reduction.

For this reason, we start by repeating Schaerf's original reduction. Before we go into detail, a remark on EL^¬A is in order. In this extension of EL, the application of negation is restricted to concept names. However, full negation can easily be recovered using acyclic TBoxes: instead of writing ¬C, we may write ¬A and add a concept equation A ≐ C, with A a fresh concept name. Thus, we restrict the use of negation even further, namely to concept names that do not occur on the left-hand side of any concept equation in the (acyclic) TBox. As we shall see shortly, the TBoxes required for our lower bound are actually of very simple form.

A 2+2 clause is of the form (p_1 ∨ p_2 ∨ ¬n_1 ∨ ¬n_2), where each of p_1, p_2, n_1, n_2 is a propositional letter or a truth constant 1, 0. A 2+2 formula is a finite conjunction of 2+2 clauses. Now, 2+2-SAT is the problem of deciding whether a given 2+2 formula is satisfiable. It is shown in [19] that 2+2-SAT is NP-complete.

Let ϕ = c_0 ∧ ⋯ ∧ c_{n−1} be a 2+2 formula in m propositional letters q_0, …, q_{m−1}. Let c_i = p_{i,1} ∨ p_{i,2} ∨ ¬n_{i,1} ∨ ¬n_{i,2} for all i < n. We use f, the propositional letters q_0, …, q_{m−1}, the truth constants 1 and 0, and the clauses c_0, …, c_{n−1} as individual names.

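Define the TBox T as {Ā ≐ ¬A}, where Ā is a concept name, and the ABox A_ϕ as follows, where c, p_1, p_2, n_1, and n_2 are role names:

$$\mathcal{A}_\varphi := \{A(1), \bar{A}(0)\} \cup \{c(f, c_0), \ldots, c(f, c_{n-1})\} \cup \bigcup_{i<n} \{p_1(c_i, p_{i,1}),\ p_2(c_i, p_{i,2}),\ n_1(c_i, n_{i,1}),\ n_2(c_i, n_{i,2})\}$$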

It should be obvious that A_ϕ is a straightforward representation of ϕ. Models I of A_ϕ and T represent truth assignments for ϕ by setting q_i to true if q_i^I ∈ A^I and to false if q_i^I ∈ Ā^I. Since I is a model of T, this truth assignment is well-defined. Set C := ∃c.(∃p_1.Ā ⊓ ∃p_2.Ā ⊓ ∃n_1.A ⊓ ∃n_2.A). Intuitively, C expresses that ϕ is not satisfied, i.e., there is a clause in which the two positive literals and the two negative literals are all false. It is not hard to show the following.

Lemma 1 (Schaerf). A_ϕ, T ⊭ C(f) iff ϕ is satisfiable.

Thus, instance checking in EL^¬A w.r.t. acyclic TBoxes is co-NP-hard regarding data complexity.
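The reduction and the intuition behind Lemma 1 can be illustrated with a small Python sketch. Under our own encoding assumptions, a 2+2 formula is a list of clauses (p1, p2, n1, n2) whose entries are letter names or the constants "0"/"1", and "A_bar" stands for Ā. The sketch builds A_ϕ and evaluates C at f only in the models induced by truth assignments (domain = individual names, A^I = {1} ∪ true letters, Ā^I its complement), which is exactly the case analysis of the lemma; it is not a general DL reasoner.

```python
from itertools import product
from typing import List, Set, Tuple

Clause = Tuple[str, str, str, str]  # (p1, p2, n1, n2) encodes p1 ∨ p2 ∨ ¬n1 ∨ ¬n2


def schaerf_abox(phi: List[Clause]) -> Set[Tuple[str, ...]]:
    """Build A_phi; "A_bar" stands for Ā and clause i becomes individual c{i}."""
    abox = {("A", "1"), ("A_bar", "0")}
    for i, (p1, p2, n1, n2) in enumerate(phi):
        ci = f"c{i}"
        abox |= {("c", "f", ci), ("p1", ci, p1), ("p2", ci, p2),
                 ("n1", ci, n1), ("n2", ci, n2)}
    return abox


def letters(phi: List[Clause]) -> List[str]:
    return sorted({x for clause in phi for x in clause} - {"0", "1"})


def f_in_C(abox: Set[Tuple[str, ...]], true_letters: Set[str]) -> bool:
    """Evaluate C = ∃c.(∃p1.Ā ⊓ ∃p2.Ā ⊓ ∃n1.A ⊓ ∃n2.A) at f in the model with
    A^I = {1} ∪ true_letters and Ā^I = its complement."""
    a_ext = {"1"} | true_letters

    def succ(role: str, x: str) -> Set[str]:
        return {t[2] for t in abox if len(t) == 3 and t[0] == role and t[1] == x}

    def some(role: str, x: str, in_a: bool) -> bool:
        return any((y in a_ext) == in_a for y in succ(role, x))

    return any(some("p1", ci, False) and some("p2", ci, False)
               and some("n1", ci, True) and some("n2", ci, True)
               for ci in succ("c", "f"))


def refuted_by_some_assignment(phi: List[Clause]) -> bool:
    """Is f ∉ C^I in the model induced by some truth assignment?"""
    abox, qs = schaerf_abox(phi), letters(phi)
    return any(not f_in_C(abox, {q for q, b in zip(qs, bits) if b})
               for bits in product([False, True], repeat=len(qs)))


def satisfiable(phi: List[Clause]) -> bool:
    """Direct brute-force 2+2-SAT check, for comparison."""
    qs = letters(phi)
    for bits in product([False, True], repeat=len(qs)):
        val = {"1": True, "0": False}
        val.update(zip(qs, bits))
        if all(val[p1] or val[p2] or not val[n1] or not val[n2]
               for p1, p2, n1, n2 in phi):
            return True
    return False
```

On small formulas, `refuted_by_some_assignment(phi) == satisfiable(phi)` mirrors the statement of Lemma 1.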

This reduction can easily be adapted to EL^∀r.⊥. In all interpretations I, ∃r.⊤ and ∀r.⊥ partition the domain ∆^I and can thus be used to simulate the concept name A and its negation ¬A in the original reduction. We can thus simply replace the TBox T with T' := {A ≐ ∃r.⊤, Ā ≐ ∀r.⊥}.

In some extensions of EL, we only find concepts that cover the domain, but do not necessarily partition it. An example is EL^(≤kr), k ≥ 1, in which ∃r.⊤ and (≤ k r) provide a covering (for k = 0, observe that (≤ k r) is equivalent to ∀r.⊥). Interestingly, this does not pose a problem for the reduction. In the case of EL^(≤kr), we use the TBox T := {A ≐ ∃r.⊤, Ā ≐ (≤ k r)}, and the ABox A_ϕ as well as the query concept C remain unchanged. Let us show the following.

Lemma 2. A_ϕ, T ⊭ C(f) iff ϕ is satisfiable.

Proof. “If”. This direction is as in the proof of Lemma 1. Let t be a truth assignment satisfying ϕ. Define an interpretation I as follows:

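$$\begin{aligned}
\Delta^{\mathcal{I}} &:= \{f, c_0, \ldots, c_{n-1}, q_0, \ldots, q_{m-1}, 0, 1, d_0, \ldots, d_k\}\\
c^{\mathcal{I}} &:= \{(f, c_0), \ldots, (f, c_{n-1})\}\\
p_j^{\mathcal{I}} &:= \{(c_0, p_{0,j}), \ldots, (c_{n-1}, p_{n-1,j})\} \quad (j \in \{1, 2\})\\
n_j^{\mathcal{I}} &:= \{(c_0, n_{0,j}), \ldots, (c_{n-1}, n_{n-1,j})\} \quad (j \in \{1, 2\})\\
A^{\mathcal{I}} &:= \{1\} \cup \{q_i \mid i < m \text{ and } t(q_i) = 1\}\\
\bar{A}^{\mathcal{I}} &:= \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}}\\
r^{\mathcal{I}} &:= \{(e, d_j) \mid e \in A^{\mathcal{I}},\ j \le k\}
\end{aligned}$$

All individual names are interpreted as themselves. It is not hard to verify that I is a model of A_ϕ and T (every element of A^I has the k + 1 r-successors d_0, …, d_k, whereas all other elements have none, so that A^I = (∃r.⊤)^I and Ā^I = (≤ k r)^I), and that f ∉ C^I.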

“Only if”. Here we need to deal with the non-disjointness of ∃r.⊤ and (≤ k r). Let I be a model of A_ϕ and T such that f ∉ C^I. Define a truth assignment t by choosing, for each propositional letter q_i, a truth value t(q_i) such that t(q_i) = 1 implies q_i^I ∈ A^I and t(q_i) = 0 implies q_i^I ∈ Ā^I. Such a truth assignment exists since A and Ā cover the domain. However, it is not necessarily unique since A and Ā need not be disjoint.

To show that t satisfies ϕ, assume that it does not. Then there is a clause c_i = (p_{i,1} ∨ p_{i,2} ∨ ¬n_{i,1} ∨ ¬n_{i,2}) that is not satisfied by t. By definition of t, p_{i,1}^I, p_{i,2}^I ∈ Ā^I and n_{i,1}^I, n_{i,2}^I ∈ A^I. Thus c_i^I ∈ (∃p_1.Ā ⊓ ∃p_2.Ā ⊓ ∃n_1.A ⊓ ∃n_2.A)^I and we get f^I ∈ C^I, which is a contradiction. ❏

The cases EL^∀r.C and EL^∃¬r.C can be treated similarly, because a covering of the domain can be achieved by choosing the concepts ∃r.⊤ and ∀r.X in the case of EL^∀r.C, and ∃r.⊤ and ∃¬r.⊤ in the case of EL^∃¬r.C. In the case of EL^C⊔D, we use a TBox T' := {V ≐ X ⊔ Y}. In all models of T', the extension of V is covered by the concepts X and Y. Thus, we can use the above ABox A_ϕ, add V(q_i) for all i < m, and use the TBox T := T' ∪ {A ≐ X, Ā ≐ Y} and the same query concept C as above.

The case EL^∃r⁺.C is quite similar. In all models of the TBox T' := {V ≐ ∃r⁺.C}, the extension of V is covered by the concepts ∃r.C and ∃r.∃r⁺.C. Thus, we can use the same ABox and query concept as for EL^C⊔D, together with the TBox T := T' ∪ {A ≐ ∃r.C, Ā ≐ ∃r.∃r⁺.C}.

Theorem 1. For the following DLs, instance checking w.r.t. acyclic TBoxes is co-NP-hard regarding data complexity: EL^¬A, EL^∀r.⊥, EL^∀r.C, EL^∃¬r.C, EL^C⊔D, EL^∃r⁺.C, and EL^(≤kr) for all k ≥ 0.

For EL^∀r.⊥, EL^∀r.C, and EL^C⊔D, co-NP-hardness of conjunctive query containment w.r.t. general TBoxes has been established in [8]. It seems likely that the proofs (which are not given in detail) actually apply to instance checking and acyclic TBoxes.

3.2 Cases that depend on the UNA

The results in the previous subsection are independent of whether or not the UNA is adopted. In the following, we consider some cases that depend on the (non-)UNA, starting with EL^(≥kr).

In EL^(≥kr), k ≥ 2, it does not seem possible to find two concepts that a priori cover the domain and can be used to represent truth values in truth assignments. However, if we add slightly more structure to the ABox, such concepts can be found. We treat only the case k = 3 explicitly, but it easily generalizes to other values of k as long as k ≥ 2.

Consider the following auxiliary ABox, also shown in Figure 1.

A := {r(a, b_1), r(a, b_2), r(a, b_3), r(b_1, b_2), r(b_2, b_3), r(b_1, b_3)}.

Without the UNA, there are two cases for models of A: either two of b_1, b_2, b_3 are identified with the same domain element or they are not. In the first case, the r-assertion between the two identified individuals becomes an r-loop (for b_1 = b_3, an r-cycle through b_2) that is reachable from a, so a satisfies ∃r⁴.⊤, where ∃r⁴ denotes the four-fold nesting of ∃r. In the second case, a satisfies (≥ 3 r). It follows that we can reduce satisfiability of 2+2 formulas using a reduction very similar to the one for EL^(≤kr). The main differences are that (i) a copy of A is plugged in for each q_i, with a replaced by q_i, and (ii) we use the TBox T := {A ≐ ∃r⁴.⊤, Ā ≐ (≥ 3 r)}.

[Figure 1: the auxiliary ABox A drawn as a graph over a, b_1, b_2, b_3 with the r-edges listed above.]

Fig. 1. Auxiliary ABox A for EL^(≥3r) without UNA.
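The case distinction for the auxiliary ABox of Fig. 1 can be checked mechanically. This brute-force sketch (our own illustration) enumerates every way of identifying the individuals a, b_1, b_2, b_3, i.e., all set partitions, builds the quotient of the ABox, and checks that a satisfies ∃r⁴.⊤ or (≥ 3 r). For the identification induced by a given model, the quotient embeds injectively into that model, and both concepts are preserved under such embeddings, so checking quotients suffices.

```python
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[str, str]
FIG1_EDGES: List[Edge] = [("a", "b1"), ("a", "b2"), ("a", "b3"),
                          ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield all set partitions of `items` (all possible identifications)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def has_path(edges: Set[Tuple[int, int]], start: int, length: int) -> bool:
    """Does `start` satisfy ∃r^length.⊤, i.e. is there an r-path of that length?"""
    frontier = {start}
    for _ in range(length):
        frontier = {y for (x, y) in edges if x in frontier}
    return bool(frontier)


def covered_in_all_quotients(edges: List[Edge], k: int = 3, depth: int = 4) -> bool:
    """Check that a satisfies ∃r^depth.⊤ or (≥ k r) under every identification."""
    individuals = sorted({x for e in edges for x in e})
    for part in partitions(individuals):
        block: Dict[str, int] = {x: i for i, blk in enumerate(part) for x in blk}
        quotient = {(block[x], block[y]) for (x, y) in edges}
        a = block["a"]
        successors = {y for (x, y) in quotient if x == a}
        if not (has_path(quotient, a, depth) or len(successors) >= k):
            return False
    return True
```

Dropping the assertions r(b_2, b_3) and r(b_1, b_3) breaks the property: identifying b_1 with b_3 then leaves a with only two successors and no long r-path.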

Unlike the previous results, this lower bound clearly depends on the fact that the UNA is not adopted. If the UNA is adopted, we can prove the same result using a different auxiliary ABox. Again, we only treat the case k = 3, which easily generalizes.

Let

A' := {r(a, b_1), r(a, b_2), V(a), A(b_1), A(b_2)}

and consider the TBox T' := {V ≐ ∃r.B}. In every model I of A' and T', there is a d ∈ B^I such that (a^I, d) ∈ r^I. We can distinguish two cases: if d = b_i^I for some i ∈ {1, 2}, then a satisfies ∃r.(A ⊓ B). Otherwise, a satisfies (≥ 3 r). We can now continue the reduction as in the previous cases. Start with the ABox A_ϕ from the reduction for EL^¬A, add V(q_i) for all i < m and a copy of A' for each q_i, with a replaced by q_i. Then use the TBox T := T' ∪ {A ≐ ∃r.(A ⊓ B), Ā ≐ (≥ 3 r)} and the original query concept C. Observe that this reduction does not work without the UNA.

Theorem 2. For EL^(≥kr) with k ≥ 2, instance checking w.r.t. acyclic TBoxes is co-NP-hard regarding data complexity, both with and without the UNA.

Another case that depends on the (non-)UNA is EL^kf with k ≥ 2. We start by proving co-NP-hardness provided that the UNA is not adopted. For the case EL^1f, we will prove in Section 4 that instance checking (and even conjunctive query entailment) is tractable regarding data complexity, with or without the UNA. For simplicity, we only treat the case EL^2f explicitly. It is easy to generalize our argument to larger values of k. As in EL^(≥3r), in EL^2f it does not seem possible to find two concepts that cover the domain without providing additional structure via an ABox. Set

A'' := {r(a, b_1), r(a, b_2), r(a, b_3), r(b_1, b_2), A(b_1), A(b_2), B(b_3)},

where r is 2-functional and thus at least two of b_1, b_2, b_3 have to be identified with the same domain element. A graphical representation is given in Figure 2. Regarding models of A'', we can distinguish two cases: either b_3 is identified with b_1 or b_2, and then a satisfies ∃r.(A ⊓ B); or b_1 and b_2 are identified, and then r(b_1, b_2) becomes an r-loop, so a satisfies ∃r³.⊤, where ∃r³ denotes the three-fold nesting of ∃r. It follows that we can reduce satisfiability of 2+2 formulas using a reduction very similar to that for EL^(≥3r) above. Observe that we do not need a TBox at all to make this work. We take the original ABox A_ϕ defined for EL^¬A, add a copy of A'' for each q_i with a replaced by q_i, and replace A(1) with {r(1, e), A(e), B(e)} and Ā(0) with {r(0, e_0), r(e_0, e_1), r(e_1, e_2)}. Thus, 1 satisfies ∃r.(A ⊓ B) (representing true) and 0 satisfies ∃r³.⊤ (representing false). It remains to modify the query concept to

C' := ∃c.(∃p_1.∃r³.⊤ ⊓ ∃p_2.∃r³.⊤ ⊓ ∃n_1.∃r.(A ⊓ B) ⊓ ∃n_2.∃r.(A ⊓ B)).

[Figure 2: the auxiliary ABox A'' drawn as a graph; b_1 and b_2 are labelled A and b_3 is labelled B.]

Fig. 2. Auxiliary ABox A'' for EL^2f without UNA.

With the UNA and without TBoxes, instance checking in EL^kf, k ≥ 2, is tractable regarding data complexity. The same holds for conjunctive query answering. In a nutshell, a polytime algorithm is obtained by considering the input ABox as a (complete) description of an interpretation and then checking all possible matches of the conjunctive query. A special case that has to be taken into account is that of inconsistent ABoxes, such as those containing {r(a, b_1), r(a, b_2), r(a, b_3)} for a 2-functional role r, with the b_i mutually distinct. Such inconsistencies are easily detected. If one is found, the algorithm returns “yes” because an inconsistent ABox entails every consequence.
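A minimal sketch of this procedure, assuming the UNA, the empty TBox, ABoxes given as sets of tuples, and a map `bound` from role names to their bound k (all encodings are ours, for illustration). Under the UNA, the ABox read as an interpretation is a model whenever no individual has more than k distinct r-successors for a k-functional role r, and it maps into every other model, so matching the query against it decides entailment.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

RoleAssertion = Tuple[str, str, str]   # (r, a, b) encodes r(a, b)
ConceptAssertion = Tuple[str, str]     # (A, a) encodes A(a)


def inconsistent(roles: Set[RoleAssertion], bound: Dict[str, int]) -> bool:
    """Under the UNA: does some individual have more than k distinct r-successors
    for a role r declared k-functional (bound[r] = k)?"""
    successors: Dict[Tuple[str, str], Set[str]] = {}
    for r, a, b in roles:
        successors.setdefault((r, a), set()).add(b)
    return any(r in bound and len(bs) > bound[r] for (r, _), bs in successors.items())


def entails(concepts: Set[ConceptAssertion], roles: Set[RoleAssertion],
            bound: Dict[str, int], q: List[Tuple[str, ...]]) -> bool:
    """Decide A |= q for EL^kf with the UNA and the empty TBox."""
    if inconsistent(roles, bound):
        return True                       # an inconsistent ABox entails everything
    individuals = sorted({a for _, a in concepts} | {x for _, a, b in roles for x in (a, b)})
    variables = sorted({v for atom in q for v in atom[1:]})
    for image in product(individuals, repeat=len(variables)):
        pi = dict(zip(variables, image))  # candidate match into the ABox itself
        if all(((atom[0], pi[atom[1]]) in concepts) if len(atom) == 2
               else ((atom[0], pi[atom[1]], pi[atom[2]]) in roles)
               for atom in q):
            return True
    return False
```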

If we add acyclic TBoxes, instance checking in EL^kf, k ≥ 2, becomes co-NP-hard also with the UNA. We only treat the case k = 3, but our arguments generalize. As in the case of EL^2f without UNA, we have to give additional structure to the ABox.

Consider the TBox T'' := {V ≐ ∃r.B} and the ABox

A''' := {V(a), r(a, b_1), r(a, b_2), r(a, b_3), s(a, b_1), s'(a, b_2), s'(a, b_3)},

with r a 3-functional role. Then a satisfies ∃r.B in all models I of A''' and T''. Because of the UNA, b_1^I, b_2^I, b_3^I are three distinct r-successors of a^I, and 3-functionality rules out any further one. We can thus distinguish two cases: either b_1 satisfies B or one of b_2, b_3 does. In the first case, a satisfies ∃s.B and in the second case, it satisfies ∃s'.B. We can then continue the reduction as in the previous cases.

Theorem 3. For EL^kf with k ≥ 2, instance checking is

– tractable w.r.t. the empty TBox and with UNA;

– co-NP-hard in the following cases: (i) w.r.t. the empty TBox and without UNA, and (ii) w.r.t. acyclic TBoxes and with UNA.

4 Upper Bound

The only remaining extensions of EL introduced in Section 2 are EL^∃r⁻.C and EL^1f. For both of them, instance checking w.r.t. general TBoxes is EXPTIME-complete regarding combined complexity [2]. In this section, we consider the union ELI^f of EL^∃r⁻.C and EL^1f, i.e., the extension of EL with both inverse roles and globally functional roles. It follows from the results on Horn-SHIQ in [12] that instance checking in ELI^f w.r.t. general TBoxes is tractable regarding data complexity. A direct proof can be found in [14]. Here, we show that even conjunctive query answering in ELI^f is tractable regarding data complexity.

An inverse role is an expression r⁻ with r a role name. The interpretation of an inverse role is (r⁻)^I := {(e, d) | (d, e) ∈ r^I}. In ELI^f, roles and also their inverses can be declared functional using statements ⊤ ⊑ (≤ 1 r) in the TBox. For conveniently dealing with inverse roles, we use the following convention: if r = s⁻ (with s a role name), then r⁻ denotes s. Observe that, w.l.o.g., we do not admit inverse roles in the ABox and the query.

As a preliminary, we assume that TBoxes are in normal form, i.e., all concept inclusions are of one of the following forms, where A, A_1, A_2, and B are concept names or ⊤, and r is a role name or an inverse role:

A ⊑ B,   A ⊑ ∃r.B,   ⊤ ⊑ (≤ 1 r),   A_1 ⊓ A_2 ⊑ B,   ∃r.A ⊑ B

Let T be a TBox. T can be converted into normal form T' in polytime by introducing additional concept names; see [2] for more details. Moreover, it is not too difficult to see that for every ABox A and conjunctive query q not using any of the concept names that occur in T' but not in T, we have A, T ⊨ q iff A, T' ⊨ q.

Two other (standard) assumptions that we make w.l.o.g. are that (i) in all atoms C(v) in a conjunctive query q, C is a concept name; and (ii) conjunctive queries are connected, i.e., for all u, v ∈ Var(q), there are variables u = u_0, u_1, …, u_n = v, n ≥ 0, such that for every i < n, q contains an atom r(u_i, u_{i+1}) or r(u_{i+1}, u_i) for some role name r. It is easy to achieve (i) by replacing C(v) with A(v) and adding A ≐ C to the TBox, with A a fresh concept name. Regarding (ii), it is well known that entailment of non-connected queries can easily (and polynomially) be reduced to entailment of connected queries: if q is a non-connected query, then A, T ⊨ q iff A, T ⊨ q' for all connected components q' of q; see e.g. [10].
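The reduction to connected queries only needs the connected components of q, where role atoms link their two variables regardless of direction. A minimal union-find sketch, assuming our own tuple encoding of atoms (("A", v) for concept atoms, ("r", u, v) for role atoms):

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, ...]


def connected_components(q: List[Atom]) -> List[List[Atom]]:
    """Split q into its connected components (direction of role atoms ignored)."""
    parent: Dict[str, str] = {}

    def find(v: str) -> str:
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    def union(u: str, v: str) -> None:
        parent[find(u)] = find(v)

    for atom in q:
        for v in atom[1:]:
            find(v)                          # register every variable
        if len(atom) == 3:                   # role atom r(u, v) links u and v
            union(atom[1], atom[2])

    groups: Dict[str, List[Atom]] = {}
    for atom in q:
        groups.setdefault(find(atom[1]), []).append(atom)
    return list(groups.values())
```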

Our algorithm for conjunctive query answering in ELI^f is based on canonical models. To introduce canonical models, we need some preliminaries. Let T be a TBox and Γ a finite set of concept names. We use N_C^T to denote the set of all concept names occurring in T and ⊑_T to denote subsumption w.r.t. T, i.e., C ⊑_T D iff C^I ⊆ D^I for all models I of T. We write

$$\mathsf{sub}_{\mathcal{T}}(\Gamma) := \{A \in N_C^{\mathcal{T}} \mid \textstyle\bigsqcap_{A' \in \Gamma} A' \sqsubseteq_{\mathcal{T}} A\}$$

to denote the closure of Γ under subsuming concept names w.r.t. T. For the next definition, the reader should intuitively assume that we want to make all elements of Γ (jointly) true at a domain element in a model of T. If A ∈ Γ and A ⊑ ∃r.B ∈ T, then we say that Γ has an ∃r.B-obligation O, where

$$O := \{B\} \cup \{B' \in N_C^{\mathcal{T}} \mid \exists A' \in \Gamma : \exists r^-.A' \sqsubseteq B' \in \mathcal{T}\} \cup O',$$

with O' := ∅ if (⊤ ⊑ (≤ 1 r)) ∉ T, and O' := {B' ∈ N_C^T | ∃A' ∈ Γ : A' ⊑ ∃r.B' ∈ T} otherwise.
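The definition of an ∃r.B-obligation translates directly into code for a TBox in normal form. The sketch below uses a tuple encoding of normal-form axioms of our own choosing and writes an inverse role s⁻ as "s-". It computes only O; computing sub_T(O) additionally requires a subsumption oracle for ⊑_T, which is not implemented here.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical normal-form encoding:
#   ("sub", A, B)          A ⊑ B
#   ("conj", A1, A2, B)    A1 ⊓ A2 ⊑ B
#   ("some_rhs", A, r, B)  A ⊑ ∃r.B
#   ("some_lhs", r, A, B)  ∃r.A ⊑ B
#   ("func", r)            ⊤ ⊑ (≤ 1 r)
Axiom = Tuple[str, ...]


def inv(r: str) -> str:
    """r⁻, with the convention (s⁻)⁻ = s; "s-" encodes s⁻."""
    return r[:-1] if r.endswith("-") else r + "-"


def obligation(gamma: Set[str], r: str, b: str, tbox: Set[Axiom]) -> FrozenSet[str]:
    """Return the ∃r.B-obligation O of gamma (before taking sub_T)."""
    if not any(("some_rhs", a, r, b) in tbox for a in gamma):
        raise ValueError("gamma has no ∃r.B-obligation")
    o = {b}
    # {B' | A' ∈ gamma and ∃r⁻.A' ⊑ B' ∈ T}: what the new r-successor inherits
    o |= {ax[3] for ax in tbox
          if ax[0] == "some_lhs" and ax[1] == inv(r) and ax[2] in gamma}
    # O': if r is functional, every ∃r.B' required by gamma uses the same successor
    if ("func", r) in tbox:
        o |= {ax[3] for ax in tbox
              if ax[0] == "some_rhs" and ax[2] == r and ax[1] in gamma}
    return frozenset(o)
```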

Let T be a TBox in normal form and A an ABox, for which we want to decide conjunctive query entailment (for a yet unspecified query q). We use Ind(A) to denote the set of individual names occurring in A. To define a canonical model for A and T, we have to require that A is admissible w.r.t. T. What admissibility means depends on whether or not we make the UNA: A is admissible w.r.t. T if (i) the UNA is made and A is consistent w.r.t. T, or (ii) the UNA is not made and (⊤ ⊑ (≤ 1 r)) ∈ T implies that there are no a, b, c ∈ Ind(A) with r(a, b), r(a, c) ∈ A and b ≠ c. As will be discussed later, admissibility can be ensured by an easy (polytime) preprocessing step.
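Without the UNA, one way to obtain an admissible ABox is to repeatedly identify individuals that are r-successors of the same individual for a functional role r, until no such pair remains; every model must already identify them. A union-find sketch under our own assumptions: role assertions are triples (r, a, b) over role names, `functional` holds the roles declared functional, and an inverse role s⁻ is written "s-", in which case s-predecessors of a common individual are merged.

```python
from typing import Dict, Set, Tuple


def make_admissible(roles: Set[Tuple[str, str, str]], functional: Set[str]) -> Dict[str, str]:
    """Map every individual to a representative so that, after renaming, no individual
    has two distinct r-successors for a functional role r ("s-" encodes s⁻)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    changed = True
    while changed:                       # a merge may create new conflicts
        changed = False
        seen: Dict[Tuple[str, str], str] = {}
        for r, a, b in sorted(roles):
            # r(a, b) makes b an r-successor of a and a an r⁻-successor of b
            for role, src, dst in ((r, a, b), (r + "-", b, a)):
                if role not in functional:
                    continue
                key = (role, find(src))
                if key not in seen:
                    seen[key] = dst
                    continue
                prev, cur = find(seen[key]), find(dst)
                if prev != cur:
                    parent[prev] = cur   # identify the two successors
                    changed = True
    return {x: find(x) for _, a, b in roles for x in (a, b)}
```

Each merge reduces the number of equivalence classes, so the loop runs at most |Ind(A)| + 1 passes, in line with the polytime claim.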

We define a sequence of interpretations I_0, I_1, …, and the canonical model for A and T will then be the limit of this sequence. To facilitate the construction, it is helpful to use domain elements that have an internal structure. An existential for T is a concept ∃r.A that occurs on the right-hand side of some inclusion in T. A path p for T is a finite (possibly empty) sequence of existentials for T. We use ex(T) to denote the set of all existentials for T, ex*(T) to denote the set of all paths for T, and ε to denote the empty path. All interpretations I_i in the above sequence will satisfy

$$\Delta^{\mathcal{I}_i} \subseteq \{\langle a, p\rangle \mid a \in \mathsf{Ind}(\mathcal{A}) \text{ and } p \in \mathsf{ex}^*(\mathcal{T})\}.$$

For convenience, we use a slightly non-standard representation of interpretations when defining the sequence I_0, I_1, … and canonical interpretations: the function ·^I maps every element d ∈ ∆^I to a set of concept names d^I instead of every concept name A to a set of elements A^I. It is obvious how to translate back and forth between the standard representation and this one, and we will switch freely in what follows.

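To start the construction of the sequence I_0, I_1, …, define I_0 as follows:

$$\begin{aligned}
\Delta^{\mathcal{I}_0} &:= \{\langle a, \varepsilon\rangle \mid a \in \mathsf{Ind}(\mathcal{A})\}\\
r^{\mathcal{I}_0} &:= \{(\langle a, \varepsilon\rangle, \langle b, \varepsilon\rangle) \mid r(a, b) \in \mathcal{A}\}\\
\langle a, \varepsilon\rangle^{\mathcal{I}_0} &:= \{A \in N_C \mid \mathcal{A}, \mathcal{T} \models A(a)\}\\
a^{\mathcal{I}_0} &:= \langle a, \varepsilon\rangle
\end{aligned}$$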

Now assume that I_i has already been defined. We want to construct I_{i+1}. If it exists, select an element ⟨a, p⟩ ∈ ∆^{I_i} and an α = ∃r.A ∈ ex(T) such that ⟨a, p⟩^{I_i} has an α-obligation O, and (i) (⊤ ⊑ (≤ 1 r)) ∉ T and ⟨a, pα⟩ ∉ ∆^{I_i}, or (ii) there is no ⟨b, p'⟩ ∈ ∆^{I_i} with (⟨a, p⟩, ⟨b, p'⟩) ∈ r^{I_i}. Then do the following:

– add ⟨a, pα⟩ to ∆^{I_i};

– if r is a role name, add (⟨a, p⟩, ⟨a, pα⟩) to r^{I_i};

– if r = s⁻, add (⟨a, pα⟩, ⟨a, p⟩) to s^{I_i};

– set ⟨a, pα⟩^{I_i} := sub_T(O).

The resulting interpretation is I_{i+1} (and I_{i+1} = I_i if there are no ⟨a, p⟩ and α to be selected). We assume that the selected ⟨a, p⟩ is such that the length of p is minimal, and thus all obligations are eventually satisfied. To ensure that the constructed canonical model is unique, we also assume that the set ex(T) is well-ordered and the selected α is minimal for the node ⟨a, p⟩.

A proof of the following result can be found in the appendix.

Lemma 3. The canonical model I for T and A is a model of T and of A.


Our aim is to prove that we can verify whether A and T entail a conjunctive query q by checking whether the canonical model I for A and T matches q. Key to this result is the observation that the canonical model of A and T can be homomorphically embedded into any model of A and T. We first define homomorphisms and then state the relevant lemma.

Let I and J be interpretations. A function h : ∆^I → ∆^J is a homomorphism from I to J if the following holds:

1. for all individual names a, h(a^I) = a^J;

2. for all concept names A and all d ∈ ∆^I, d ∈ A^I implies h(d) ∈ A^J;

3. for all (possibly inverse) roles r and d, e ∈ ∆^I, (d, e) ∈ r^I implies (h(d), h(e)) ∈ r^J.

Lemma 4. Let I be the canonical model for A and T, and J a model of A and T. Then there is a homomorphism h from I to J.

Proof. Let I and J be as in the lemma. For each interpretation I_i in the sequence I_0, I_1, … used to construct I, we define a homomorphism h_i from I_i to J. The limit of the sequence h_0, h_1, … is then the desired homomorphism h from I to J. To start, define h_0 by setting h_0(⟨a, ε⟩) := a^J for all individual names a. Clearly, h_0 is a homomorphism:

– Condition 1 is satisfied by construction.

– For Condition 2, let ⟨a, ε⟩ ∈ A^{I_0}. Then A, T ⊨ A(a). Since J is a model of A and T, h_0(⟨a, ε⟩) = a^J ∈ A^J.

– For Condition 3, let (⟨a, ε⟩, ⟨b, ε⟩) ∈ r^{I_0}. Then r(a, b) ∈ A and, since J is a model of A and by definition of h_0, we have (h_0(⟨a, ε⟩), h_0(⟨b, ε⟩)) ∈ r^J.

Now assume that h_i has already been defined. If I_{i+1} = I_i, then h_{i+1} = h_i. Otherwise, there is a unique ⟨a, pα⟩ ∈ ∆^{I_{i+1}} \ ∆^{I_i}. Then ⟨a, p⟩ ∈ ∆^{I_i}, and ⟨a, p⟩^{I_i} has an α = ∃r.B-obligation O such that ⟨a, pα⟩^{I_{i+1}} = sub_T(O). Let A ∈ ⟨a, p⟩^{I_i} be such that A ⊑ ∃r.B ∈ T. By Condition 2 of homomorphisms, we have d = h_i(⟨a, p⟩) ∈ A^J. Since A ⊑ ∃r.B ∈ T, there is an e ∈ B^J with (d, e) ∈ r^J. Define h_{i+1} as the extension of h_i with h_{i+1}(⟨a, pα⟩) := e. We prove that the three conditions of homomorphisms are preserved:

– Condition 1 is untouched by the extension.

– Now for Condition 2. Since ⟨a, pα⟩^{I_{i+1}} = sub_T(O) and J is a model of T, it suffices to show that for all B' ∈ O, we have e ∈ B'^J. Let B' ∈ O. By definition of O, we can distinguish three cases.

First, let B' = B. Then we are done by the choice of e.

Second, let there be an A' ∈ ⟨a, p⟩^{I_i} such that ∃r⁻.A' ⊑ B' ∈ T. Since h_i satisfies Condition 2 of homomorphisms, we have d ∈ A'^J. Since J is a model of T and (d, e) ∈ r^J, it follows that e ∈ B'^J.

The third case is that ⊤ ⊑ (≤ 1 r) ∈ T and there is an A' ∈ ⟨a, p⟩^{I_i} such that A' ⊑ ∃r.B' ∈ T. As in the previous case, d ∈ A'^J, so d has an r-successor in B'^J; since r is functional in J and (d, e) ∈ r^J, this successor is e, and thus e ∈ B'^J.

– Condition 3 was satisfied by I_i and is clearly preserved by the extension to I_{i+1}. ❏
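For finite interpretations, the three homomorphism conditions can be checked directly. The sketch below assumes interpretations given as plain dicts (an encoding of our own) and checks Condition 3 for role names only; this suffices because (d, e) ∈ (r⁻)^I iff (e, d) ∈ r^I, and likewise in J.

```python
from typing import Dict


def is_homomorphism(h: Dict[str, str], I: dict, J: dict) -> bool:
    """Check Conditions 1-3 for h : Δ^I → Δ^J.

    I and J are dicts with keys "ind" (individual name -> element),
    "conc" (concept name -> set of elements) and "role" (role name -> set of pairs).
    """
    # Condition 1: individual names are respected
    if any(h[d] != J["ind"].get(a) for a, d in I["ind"].items()):
        return False
    # Condition 2: concept-name memberships are preserved
    for A, ext in I["conc"].items():
        if any(h[d] not in J["conc"].get(A, set()) for d in ext):
            return False
    # Condition 3: role edges are preserved (inverse roles follow automatically)
    for r, pairs in I["role"].items():
        if any((h[d], h[e]) not in J["role"].get(r, set()) for d, e in pairs):
            return False
    return True
```

In the example used to test it, a two-element interpretation is collapsed onto a single element with an r-loop, which is a homomorphism only if that element carries every concept name used.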


Lemma 5. Let I be the canonical model for A and T, and q a conjunctive query. Then A, T ⊨ q iff I ⊨ q.

Proof. Let I and q be as in the lemma. If I ⊭ q, then A, T ⊭ q since, by Lemma 3, I is a model of A and T. Now assume I ⊨^π q, and let J be a model of A and T. By Lemma 4, there is a homomorphism h from I to J. Define π' : Var(q) → ∆^J by setting π'(v) := h(π(v)). Since h preserves concept names and roles, J ⊨^{π'} q. ❏

Thus, we can decide query entailment by looking only at the canonical model. At this point, we are faced with the problem that we cannot simply construct the canonical model I and check whether I ⊨ q, since I is in general infinite. However, we can show that if I ⊨ q, then I ⊨^π q for some match π that maps all variables to elements that can be reached by travelling only a bounded number of role edges from some ABox individual.

Thus, it suffices to construct a sufficiently large “initial part” of I and check whether it matches q.

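To make this formal, let n be the size of A, m the size of T, and k the size of q. In the following, we use |p| to denote the length of a path p. The initial canonical model I' for A and T is obtained from the canonical model I for A and T by setting

$$\begin{aligned}
\Delta^{\mathcal{I}'} &:= \{\langle a, p\rangle \in \Delta^{\mathcal{I}} \mid |p| \le 2^m + k\}\\
A^{\mathcal{I}'} &:= A^{\mathcal{I}} \cap \Delta^{\mathcal{I}'}\\
r^{\mathcal{I}'} &:= r^{\mathcal{I}} \cap (\Delta^{\mathcal{I}'} \times \Delta^{\mathcal{I}'})\\
a^{\mathcal{I}'} &:= a^{\mathcal{I}}
\end{aligned}$$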

Lemma 6. Let I be the canonical model for A and T, I' the initial canonical model, and q a conjunctive query. Then I ⊨ q iff I' ⊨ q.

Proof. Let I, I', and q be as in the lemma. It is obvious that I' ⊨ q implies I ⊨ q. For the converse direction, let I ⊨^π q. First assume that there exist an a ∈ Ind(A) and a v ∈ Var(q) such that π(v) = a^I. Since q is connected and has at most k atoms, this means that for all v ∈ Var(q), we have π(v) = ⟨b, p⟩ for some b ∈ Ind(A) and some p with |p| ≤ k. It follows that I' ⊨^π q.

Now assume that there are no such a and v. Again since q is connected, this means that there is an a ∈ Ind(A) such that for all v ∈ Var(q), we have π(v) = ⟨a, p⟩ for some p ∈ ex*(T). If π(v) = ⟨a, p⟩ with |p| ≤ 2^m + k for all v ∈ Var(q), then I' ⊨^π q.

Otherwise, there is a v ∈ Var(q) such that π(v) = ⟨a, p⟩ with p ∈ ex*(T) and |p| > 2^m + k. Since q is connected, this implies that for all v ∈ Var(q), we have π(v) = ⟨a, p⟩ for some p ∈ ex*(T) with |p| > 2^m. Once more since q is connected, there is a v_0 ∈ Var(q) such that π(v_0) = ⟨a, p_0⟩ and for all v ∈ Var(q), we have π(v) = ⟨a, p⟩ with p_0 a prefix of p.

Since |p_0| > 2^m and the number of distinct labels d^I, d ∈ Δ^I, is bounded by 2^m, we can split p_0 into p_1p_2p_3 such that ⟨a, p_1⟩^I = ⟨a, p_1p_2⟩^I and p_2 ≠ ε. Now, let π′ : Var(q) → Δ^I be obtained by setting π′(v) := ⟨a, p_1p_3p⟩ if π(v) = ⟨a, p_1p_2p_3p⟩. In the full version of the proof given in the appendix, we show that I ⊨^{π′} q. Moreover, for each v ∈ Var(q) with π(v) = ⟨a, p⟩ and π′(v) = ⟨a, p′⟩, the length of p′ is strictly smaller than that of p. The construction applies whenever some variable is mapped to an element ⟨a, p⟩ with |p| > 2^m + k, and each application strictly shortens all paths; hence it can be repeated only finitely many times. We ultimately obtain a match π* such that I ⊨^{π*} q and, for all v ∈ Var(q), π*(v) = ⟨a, p⟩ with |p| ≤ 2^m + k. ❏
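The pigeonhole step of this proof can be made concrete. In the sketch below, which illustrates the argument under assumptions rather than reproducing the paper's construction, paths are tuples and label is a hypothetical callback returning the label ⟨a, p⟩^I of the element reached via path p (any hashable value, e.g. a frozenset of concept names). Among the |p_0| + 1 prefixes of p_0, two must share a label once |p_0| + 1 exceeds the number of possible labels; the first repetition yields p_1, p_2, p_3, and shorten computes the path used by π′.

```python
from typing import Callable, Hashable

Path = tuple[str, ...]


def split_at_repeated_label(
    p0: Path, label: Callable[[Path], Hashable]
) -> tuple[Path, Path, Path]:
    """Return (p1, p2, p3) with p0 == p1 + p2 + p3, p2 non-empty and
    label(p1) == label(p1 + p2), using the first repeated label among prefixes."""
    first_seen: dict[Hashable, int] = {}
    for i in range(len(p0) + 1):          # prefixes p0[:0], ..., p0[:len(p0)]
        lab = label(p0[:i])
        if lab in first_seen:
            j = first_seen[lab]           # p0[:j] and p0[:i] carry the same label
            return p0[:j], p0[j:i], p0[i:]
        first_seen[lab] = i
    raise ValueError("no repeated label: p0 is too short for the pigeonhole argument")


def shorten(p: Path, p1: Path, p2: Path) -> Path:
    """Map a path of the form p1 p2 rest to p1 rest (cut out the segment p2)."""
    if p[: len(p1) + len(p2)] != p1 + p2:
        raise ValueError("p does not start with p1 p2")
    return p1 + p[len(p1) + len(p2):]
```

Because every variable is mapped to a path extending p_0 = p_1p_2p_3, applying shorten to each image realizes the mapping π′(v) := ⟨a, p_1p_3p⟩.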

The initial canonical model I′ for A and T can be constructed in time polynomial in the size of A. In particular, (i) I_0 can be constructed in polytime since, due to the results of [12, 14], instance checking in ELI^f is tractable regarding data complexity; (ii) obligations can be computed in polytime since subsumption in ELI^f w.r.t. general TBoxes is decidable and the required checks are independent of the size of A; (iii) the number of elements in the initial canonical model is bounded by ℓ := n · m^{2^m + k} and is thus polynomial (in fact linear) in the size of A.

Our algorithm for deciding entailment of a conjunctive query q by a TBox T in normal form and an ABox A is as follows. If the UNA is made, we first check consistency of A w.r.t. T using one of the polytime algorithms from [12, 14]. If A is inconsistent w.r.t. T, we answer “yes”. If the UNA is not made, then we convert A into an ABox A′ that is admissible w.r.t. T and continue working with A′. Obviously, the conversion can be done in time polynomial in the size of A simply by identifying ABox individuals. Both with and without the UNA, at this point we have an ABox that is admissible w.r.t. T. The next step is to construct the initial canonical model I′ for T and A, and then to check for matches of q against this structure. The latter can be done in time polynomial in the size of A: there are at most ℓ^k (and thus polynomially many) mappings τ : Var(q) → Δ^{I′}, and each of them can be checked for being a match in polynomial time. We thus obtain a time bound for our algorithm of p(n^k · m^{k·2^m + k²}), with p() a polynomial. This bound is clearly polynomial in n.
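The matching step is a brute-force search over all mappings τ : Var(q) → Δ^{I′}. The Python sketch below illustrates it under encoding assumptions of our own: the query is a list of atoms, ("A", v) for a concept atom A(v) and ("r", u, v) for a role atom r(u, v), and the finite structure is given by plain sets and dictionaries (for instance, the extensions of I′). It tries at most |Δ^{I′}|^k candidate mappings, in line with the ℓ^k count above.

```python
from itertools import product
from typing import Hashable, Optional


def find_match(
    domain: set[Hashable],
    concepts: dict[str, set[Hashable]],
    roles: dict[str, set[tuple[Hashable, Hashable]]],
    query: list[tuple[str, ...]],
) -> Optional[dict[str, Hashable]]:
    """Return a match tau: Var(q) -> domain if one exists, else None."""
    variables = sorted({x for atom in query for x in atom[1:]})
    elements = list(domain)

    def satisfied(atom: tuple[str, ...], tau: dict[str, Hashable]) -> bool:
        if len(atom) == 2:                       # concept atom A(v)
            name, v = atom
            return tau[v] in concepts.get(name, set())
        name, u, v = atom                        # role atom r(u, v)
        return (tau[u], tau[v]) in roles.get(name, set())

    # At most |domain| ** len(variables) candidate mappings are tried.
    for image in product(elements, repeat=len(variables)):
        tau = dict(zip(variables, image))
        if all(satisfied(atom, tau) for atom in query):
            return tau
    return None
```

For query entailment only the existence of a match matters; returning the witness τ is a convenience of the sketch.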

Theorem 4. In ELI^f, conjunctive query answering w.r.t. general TBoxes is in P regarding data complexity.
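Unpacking the number of candidate mappings used by the algorithm makes explicit that the bound is polynomial only because T and q are fixed. The following restates quantities already defined above; the abbreviation c_{T,q} is introduced here only for readability.

```latex
\begin{align*}
\ell^k &= \bigl(n \cdot m^{2^m + k}\bigr)^k
        = n^k \cdot m^{k(2^m + k)}
        = n^k \cdot m^{k \cdot 2^m + k^2}
        = c_{\mathcal T, q} \cdot n^k,
  \qquad \text{where } c_{\mathcal T, q} := m^{k \cdot 2^m + k^2}.
\end{align*}
```

For fixed T and q, c_{T,q} is a constant, so p(ℓ^k) is a polynomial in n whose degree grows linearly with k. Since m^{k·2^m} = 2^{k·2^m·log₂ m}, the constant is doubly exponential in m for m ≥ 2.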

We conjecture that the time bound can be improved to O((n + 2^m)^k) (only single-exponential in m) by a more refined approach to canonical models. Basically, the idea is to work with a filtration of the canonical model instead of its initial part.

A matching lower bound can be taken from [8] (it relies on the presence of general TBoxes and already applies to instance checking), and thus we obtain P-completeness.

5 Summary and Outlook

The results of our investigation are summarized in Table 2. In all cases, the lower bounds apply to instance checking and the upper bounds to conjunctive query entailment. The coNP upper bounds are a consequence of the results in [10]. When the UNA is not explicitly mentioned, the results hold both with and without the UNA. We point out two interesting issues. First, for all of the considered extensions, we were able to show tractability regarding data complexity if and only if the logic is convex regarding instances, i.e., A, T ⊨ C(a) with C = D_0 ⊔ ··· ⊔ D_{n−1} implies A, T ⊨ D_i(a) for some i < n. It would be interesting to capture this phenomenon in a general result. Second, subtle differences such as the UNA or local versus global functionality (for the latter, compare EL^{(≤1 r)} with ELI^f) can have an impact on tractability.

Extensions of EL            | w.r.t. acyclic TBoxes | w.r.t. general TBoxes
EL^{¬A}                     | coNP-complete         | coNP-complete
EL^{C⊔D}                    | coNP-complete         | coNP-complete
EL^{∀r.⊥}, EL^{∀r.C}        | coNP-complete         | coNP-complete
EL^{(≤k r)}, k ≥ 0          | coNP-complete         | coNP-complete
EL^{kf}, k ≥ 2, w/o UNA     | coNP-complete         | coNP-complete (even w/o TBox)
EL^{kf}, k ≥ 2, with UNA    | coNP-complete         | coNP-complete (in P w/o TBox)
EL^{(≥k r)}, k ≥ 2          | coNP-complete         | coNP-complete
EL^{∃¬r.C}                  | coNP-hard             | coNP-hard
EL^{∃r∪s.C}                 | coNP-hard             | coNP-hard
EL^{∃r⁺.C}                  | coNP-hard             | coNP-hard
ELI^f                       | in P                  | P-complete

Table 2. Complexity of instance checking and conjunctive query entailment

As future work, it would be interesting to extend our upper bound by including more operators from the tractable description logic EL⁺⁺ proposed in [2]. For a start, it is not hard to show that conjunctive query entailment in full EL⁺⁺ is undecidable due to the presence of role inclusions r_1 ∘ ··· ∘ r_n ⊑ s. In the following, we briefly sketch the proof, which is by a reduction from the undecidable problem of deciding whether the intersection of the languages generated by two given context-free grammars G_i = (N_i, T, P_i, S_i), i ∈ {1, 2}, is empty. We assume w.l.o.g. that the sets of non-terminals N_1 and N_2 are disjoint. Then define a TBox

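$$
\mathcal T := \{\top \sqsubseteq \exists r_a.\top \mid a \in T\} \cup \{r_{A_1} \circ \cdots \circ r_{A_n} \sqsubseteq r_A \mid A \to A_1 \cdots A_n \in P_1 \cup P_2\}.
$$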

It is not too difficult to see that L(G_1) ∩ L(G_2) ≠ ∅ iff the conjunctive query r_{S_1}(u, v) ∧ r_{S_2}(u, v) is entailed by the ABox {⊤(a)} and the above TBox.
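The reduction itself is purely syntactic. The following Python sketch only generates the axioms and the query as strings; it does not (and cannot) decide entailment. The tuple encoding of productions and the r_X naming of roles are illustrative assumptions, and the sketch assumes every right-hand side is non-empty, matching the form r_1 ∘ ··· ∘ r_n ⊑ s of the role inclusions used above.

```python
from typing import Iterable

# Hypothetical encoding: a production A -> X1 ... Xn is the pair (A, (X1, ..., Xn)),
# where each Xi is a terminal or a non-terminal; every symbol X becomes the role r_X.
Production = tuple[str, tuple[str, ...]]


def reduction_tbox(terminals: Iterable[str], productions: Iterable[Production]) -> list[str]:
    """Axioms of the reduction TBox for the productions P1 ∪ P2, as strings."""
    # ⊤ ⊑ ∃r_a.⊤ for every terminal a: every element has an r_a-successor.
    axioms = [f"⊤ ⊑ ∃r_{a}.⊤" for a in sorted(terminals)]
    for lhs, rhs in productions:
        if not rhs:
            raise ValueError(f"empty right-hand side for {lhs}: not covered by the sketch")
        # r_{X1} ∘ ... ∘ r_{Xn} ⊑ r_A for every production A -> X1 ... Xn.
        chain = " ∘ ".join(f"r_{x}" for x in rhs)
        axioms.append(f"{chain} ⊑ r_{lhs}")
    return axioms


def reduction_query(s1: str, s2: str) -> str:
    """The query asking for a path labelled by a word derivable from both start symbols."""
    return f"r_{s1}(u, v) ∧ r_{s2}(u, v)"
```

Passing the union of both production sets to reduction_tbox mirrors the union P_1 ∪ P_2 in the definition; disjointness of N_1 and N_2 keeps the two grammars' roles apart.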

We have learned recently that the same undecidability result has been shown independently and in parallel in the workshop papers [17, 18]. For readers interested in the complexity of conjunctive query entailment in the EL family of DLs, both papers are recommended reading. In particular, the query answering algorithms presented there seem more suitable for implementation than the brute-force canonical model approach pursued in Section 4. We have also learned that our undecidability result is very similar to a number of undecidability results for subsumption in extensions of EL proved in [13].

Acknowledgement. We are grateful to Markus Krötzsch and Meng Suntisrivaraporn for valuable comments on earlier versions of this paper.


References

1. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. DL-Lite in the light of first-order logic. In Proc. of the 22nd Conf. on AI (AAAI-07). AAAI Press, 2007.

2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proc. of the 19th Int. Joint Conf. on AI (IJCAI-05), pages 364–369. Morgan Kaufmann, 2005.

3. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. Submitted to a journal, 2007.

4. F. Baader, C. Lutz, and B. Suntisrivaraporn. Is tractable reasoning in extensions of the description logic EL useful in practice? In Proc. of the 4th Int. WS on Methods for Modalities (M4M'05), 2005.

5. F. Baader, D. L. McGuinness, D. Nardi, and P. Patel-Schneider. The Description Logic Handbook: Theory, implementation and applications. Cambridge University Press, 2003.

6. S. Brandt. Polynomial time reasoning in a description logic with existential restrictions, GCI axioms, and—what else? In Proc. of the 16th European Conf. on AI (ECAI-2004), pages 298–302. IOS Press, 2004.

7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. DL-Lite: Tractable description logics for ontologies. In Proc. of the 20th National Conf. on AI (AAAI'05), pages 602–607. AAAI Press, 2005.

8. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Data complexity of query answering in description logics. In Proc. of the 10th Int. Conf. on KR (KR'06). AAAI Press, 2006.

9. D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, and G. Vetere. DL-Lite: Practical reasoning for rich DLs. In Proc. of the 2004 Int. WS on DLs (DL2004), volume 104 of CEUR Workshop Proceedings. CEUR-WS.org, 2004.

10. B. Glimm, I. Horrocks, C. Lutz, and U. Sattler. Conjunctive query answering for the description logic SHIQ. In Proc. of the 20th Int. Joint Conf. on AI (IJCAI-07). AAAI Press, 2007.

11. G. De Giacomo and M. Lenzerini. Boosting the correspondence between description logics and propositional dynamic logics. In Proc. of the 12th National Conf. on AI (AAAI'94), volume 1, pages 205–212. AAAI Press, 1994.

12. U. Hustadt, B. Motik, and U. Sattler. Data complexity of reasoning in very expressive description logics. In Proc. of the 19th Int. Joint Conf. on AI (IJCAI'05), pages 466–471. Professional Book Center, 2005.

13. Y. Kazakov. Saturation-based decision procedures for extensions of the guarded fragment. PhD thesis, University of Saarland, 2005.

14. A. Krisnadhi. Data complexity of instance checking in the EL family of description logics. Master thesis, TU Dresden, Germany, 2007.

15. A. Krisnadhi and C. Lutz. Data complexity of instance checking in the EL family of description logics. Available from http://lat.inf.tu-dresden.de/∼clu/papers/

16. M. Krötzsch, S. Rudolph, and P. Hitzler. On the complexity of Horn description logics. In Proc. of the 2nd WS on OWL: Experiences and Directions, number 216 in CEUR-WS (http://ceur-ws.org/), 2006.

17. M. Krötzsch and S. Rudolph. Conjunctive queries for EL with composition of roles. In Proc. of the 2007 Int. WS on DLs (DL2007). CEUR-WS.org, 2007.

18. R. Rosati. On conjunctive query answering in EL. In Proc. of the 2007 Int. WS on DLs (DL2007). CEUR-WS.org, 2007.

19. A. Schaerf. On the complexity of the instance checking problem in concept languages with existential quantification. Journal of Intelligent Information Systems, 2:265–278, 1993.


A Omitted Proofs

Lemma 3. The canonical model I for T and A is a model of T and of A.

Proof. By definition of I_0 and I, the canonical model is a model of A. To show that it is also a model of T, we make a case distinction according to the possible forms of concept inclusions in T:

– A ⊑ B and A_1 ⊓ A_2 ⊑ B. Satisfied since for all ⟨a, p⟩ ∈ Δ^I, we clearly have ⟨a, p⟩^I = sub_T(⟨a, p⟩^I).

– A ⊑ ∃r.B. Let ⟨a, p⟩ ∈ A^I. This together with A ⊑ ∃r.B ∈ T means that ⟨a, p⟩^I has α-obligation O, where α = ∃r.B. Clearly, B ∈ O. There are two cases.

• If p = ε, then A ∈ ⟨a, ε⟩^{I_0} and thus A, T ⊨ A(a). We distinguish three subcases.

For the first subcase, assume that (⊤ ⊑ (≤1 r)) ∉ T. By construction, there is an i > 0 such that (⟨a, ε⟩, ⟨a, α⟩) ∈ r^{I_i} ⊆ r^I and ⟨a, α⟩^{I_i} = sub_T(O). Since B ∈ O, ⟨a, α⟩ ∈ B^{I_i}. It follows that ⟨a, ε⟩ ∈ (∃r.B)^I.

For the second subcase, assume that (⊤ ⊑ (≤1 r)) ∈ T and there is a b ∈ Ind(A) such that r(a, b) ∈ A. Then A, T ⊨ B(b). By construction of I_0, we have (⟨a, ε⟩, ⟨b, ε⟩) ∈ r^{I_0} ⊆ r^I and B ∈ ⟨b, ε⟩^{I_0}. We thus obtain ⟨a, ε⟩ ∈ (∃r.B)^I by definition of I and the semantics.

For the third subcase, assume that (⊤ ⊑ (≤1 r)) ∈ T and there is no b ∈ Ind(A) such that r(a, b) ∈ A. By construction, there is a β = ∃r.B′ ∈ ex(T) such that ⟨a, ε⟩^{I_0} has β-obligation O′, and there is an i > 0 such that (⟨a, ε⟩, ⟨a, β⟩) ∈ r^{I_i} ⊆ r^I and ⟨a, β⟩^{I_i} = sub_T(O′). Since (⊤ ⊑ (≤1 r)) ∈ T, O = O′. Since B ∈ O, ⟨a, β⟩ ∈ B^{I_i}. It follows that ⟨a, ε⟩ ∈ (∃r.B)^I.

• Let p ≠ ε. Then there is an i > 0 such that ⟨a, p⟩ ∈ Δ^{I_i}. Let i be minimal with this property. There are again three subcases.

First, assume that (⊤ ⊑ (≤1 r)) ∉ T. Then there is a j > i with (⟨a, p⟩, ⟨a, pα⟩) ∈ r^{I_j} and ⟨a, pα⟩^I = sub_T(O). Since B ∈ O, ⟨a, p⟩ ∈ (∃r.B)^I.

Second, assume that (⊤ ⊑ (≤1 r)) ∈ T and there is no ⟨b, p′⟩ ∈ Δ^{I_i} such that (⟨a, p⟩, ⟨b, p′⟩) ∈ r^{I_i}. By construction, there is a β = ∃r.B′ ∈ ex(T) such that ⟨a, p⟩^{I_i} has β-obligation O′, and there is a j > i such that (⟨a, p⟩, ⟨a, pβ⟩) ∈ r^{I_j} and ⟨a, pβ⟩^{I_j} = sub_T(O′). Since (⊤ ⊑ (≤1 r)) ∈ T, O = O′. Since B ∈ O, ⟨a, pβ⟩ ∈ B^{I_j}. It follows that ⟨a, p⟩ ∈ (∃r.B)^I.

Last, assume that (⊤ ⊑ (≤1 r)) ∈ T and there is a ⟨b, p′⟩ ∈ Δ^{I_i} such that (⟨a, p⟩, ⟨b, p′⟩) ∈ r^{I_i}. By construction of the sequence I_0, I_1, ... and since p ≠ ε, this can only be the case if a = b and

1. p = p′β for some β = ∃r⁻.B′ ∈ ex(T), or
2. p′ = pβ for some β = ∃r.B′ ∈ ex(T).

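First for Case 1. Then ⟨a, p′⟩^{I_i} has β-obligation O′, and ⟨a, p⟩^{I_i} = sub_T(O′). By definition of obligations, A ∈ sub_T(O′) implies that

$$
\bigsqcap_{X \in \langle a, p' \rangle^{\mathcal I_i}} X \sqsubseteq_{\mathcal T} \exists r^-.A.
$$

Together with (⊤ ⊑ (≤1 r)) ∈ T and A ⊑ ∃r.B ∈ T, we get

$$
\bigsqcap_{X \in \langle a, p' \rangle^{\mathcal I_i}} X \sqsubseteq_{\mathcal T} B.
$$

Since ⟨a, p′⟩^{I_i} = sub_T(⟨a, p′⟩^{I_i}), we thus have B ∈ ⟨a, p′⟩^{I_i}. By the semantics, ⟨a, p⟩ ∈ (∃r.B)^I.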

Now for Case 2. Then ⟨a, p⟩^{I_i} has β-obligation O′, and ⟨a, p′⟩^{I_i} = sub_T(O′). Since (⊤ ⊑ (≤1 r)) ∈ T and A ⊑ ∃r.B ∈ T, we have O = O′ and thus B ∈ O′. Hence B ∈ ⟨a, p′⟩^{I_i} and ⟨a, p⟩ ∈ (∃r.B)^I.


– ∃r.A ⊑ B. Let ⟨a, p⟩ ∈ (∃r.A)^I. Then there is a ⟨b, p′⟩ ∈ A^I such that (⟨a, p⟩, ⟨b, p′⟩) ∈ r^I. We distinguish four cases.

• p = p′ = ε. Then r(a, b) ∈ A and A, T ⊨ A(b). Thus, A, T ⊨ B(a) and a^I ∈ B^I by definition of I_0.

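• p = ε, p′ ≠ ε. By construction of the sequence I_0, I_1, ..., this implies a = b and p′ = α = ∃r.B′ ∈ ex(T). Also by construction, ⟨a, ε⟩^I has ∃r.B′-obligation O, and ⟨a, α⟩^I = sub_T(O). Since A ∈ sub_T(O), it follows that

$$
\bigsqcap_{X \in \langle a, \varepsilon \rangle^{\mathcal I}} X \sqsubseteq_{\mathcal T} \exists r.A.
$$

Together with ∃r.A ⊑ B ∈ T, we get

$$
\bigsqcap_{X \in \langle a, \varepsilon \rangle^{\mathcal I}} X \sqsubseteq_{\mathcal T} B.
$$

Thus, B ∈ ⟨a, ε⟩^I.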

• p ≠ ε, p′ ≠ ε. There are two subcases. If p′ = pα for some α = ∃r.B′ ∈ ex(T), then we can argue analogously to the previous case. Thus, we only consider the case p = p′α for some α = ∃r⁻.B′ ∈ ex(T). In this case, ⟨a, p′⟩^I has ∃r⁻.B′-obligation O, and ⟨a, p⟩^I = sub_T(O). Since A ∈ ⟨a, p′⟩^I and ∃r.A ⊑ B ∈ T, we have B ∈ O. It follows that ⟨a, p⟩ ∈ B^I.

• p ≠ ε, p′ = ε. By construction of the sequence I_0, I_1, ..., this implies a = b and p = α = ∃r⁻.B′ ∈ ex(T). Also by construction, ⟨a, ε⟩^I has ∃r⁻.B′-obligation O, and ⟨a, α⟩^I = sub_T(O). Since A ∈ ⟨a, ε⟩^I and ∃r.A ⊑ B ∈ T, we have B ∈ O. Thus, ⟨a, α⟩ = ⟨a, p⟩ ∈ B^I.

– ⊤ ⊑ (≤1 r). Since A is admissible w.r.t. T, there are no a, b, c ∈ Ind(A) with b ≠ c such that, for some role name r, r(a, b) and r(a, c) are in A and ⊤ ⊑ (≤1 r) ∈ T. It follows that I_0 satisfies all ⊤ ⊑ (≤1 r) ∈ T. This property is clearly preserved when constructing I_i with i > 0, and thus it holds for I.

❏

Lemma 6. Let I be the canonical model for A and T, I′ the initial canonical model, and q a conjunctive query. Then I ⊨ q iff I′ ⊨ q.

Proof (full version). Let I, I′, and q be as in the lemma. It is obvious that I′ ⊨ q implies I ⊨ q. For the converse direction, let I ⊨^π q. First assume that there are an a ∈ Ind(A) and a v ∈ Var(q) such that π(v) = a^I. Since q is connected, this means that for all v ∈ Var(q), we have π(v) = ⟨b, p⟩ for some b ∈ Ind(A) and some path p with |p| ≤ k. It follows that I′ ⊨^π q.

Now assume that there are no such a and v. Again since q is connected, this means that there is an a ∈ Ind(A) such that for all v ∈ Var(q), we have π(v) = ⟨a, p⟩ for some p ∈ ex*(T). If π(v) = ⟨a, p⟩ with |p| ≤ 2^m + k for all v ∈ Var(q), then I′ ⊨^π q.

Otherwise, there is a v ∈ Var(q) such that π(v) = ⟨a, p⟩ with p ∈ ex*(T) and |p| > 2^m + k. Since q is connected, this implies that for all v ∈ Var(q), we have π(v) = ⟨a, p⟩ for some p ∈ ex*(T) with |p| > 2^m. Once more since q is connected, there is a v_0 ∈ Var(q) such that π(v_0) = ⟨a, p_0⟩ and, for all v ∈ Var(q), we have π(v) = ⟨a, p⟩ with p_0 a prefix of p.

Since |p_0| > 2^m and the number of distinct labels d^I, d ∈ Δ^I, is bounded by 2^m, we can split p_0 into p_1p_2p_3 such that ⟨a, p_1⟩^I = ⟨a, p_1p_2⟩^I and p_2 ≠ ε. Now, let π′ : Var(q) → Δ^I be obtained by setting π′(v) := ⟨a, p_1p_3p⟩ if π(v) = ⟨a, p_1p_2p_3p⟩.

We show the following:

1. for all v ∈ Var(q), π′(v) ∈ Δ^I and π(v)^I = π′(v)^I;
2. I ⊨^{π′} q.


For Point 1, let π(v) = ⟨a, p_1p_2p_3p⟩. Then π′(v) = ⟨a, p_1p_3p⟩. We prove by induction on the length of p′ that for all prefixes p′ of p_3p,

a) ⟨a, p_1p′⟩ ∈ Δ^I and
b) ⟨a, p_1p′⟩^I = ⟨a, p_1p_2p′⟩^I.

For p′ = ε, Point a) is true since ⟨a, p_0⟩ ∈ Δ^I and, by construction of I, ⟨a, p″⟩ ∈ Δ^I for all prefixes p″ of p_0, including p″ = p_1. Moreover, Point b) is true by the choice of p_1 and p_2.

Now assume that the claim has already been shown for p′, and let α ∈ ex(T) be such that p′α is a prefix of p_3p. Then ⟨a, p_1p_2p′⟩ has α-obligation O and ⟨a, p_1p_2p′α⟩^I = sub_T(O). By the induction hypothesis, ⟨a, p_1p′⟩^I = ⟨a, p_1p_2p′⟩^I. It follows that ⟨a, p_1p′⟩ also has α-obligation O (here we exploit the well-order on ex(T)). By construction of I, we thus have ⟨a, p_1p′α⟩ ∈ Δ^I and ⟨a, p_1p′α⟩^I = sub_T(O). The former proves Point a) and the latter Point b). This finishes the proof of Point 1.

For Point 2, let A(v) ∈ q. By Point 1, I ⊨^π A(v) implies I ⊨^{π′} A(v). Now let r(u, v) ∈ q. Then (π(u), π(v)) ∈ r^I. By construction of I, this implies that one of the following holds:

1. π(u) = ⟨a, p_1p_2p_3p⟩ and π(v) = ⟨a, p_1p_2p_3pα⟩ for some α = ∃r.B ∈ ex(T);
2. π(u) = ⟨a, p_1p_2p_3pα⟩ and π(v) = ⟨a, p_1p_2p_3p⟩ for some α = ∃r⁻.B ∈ ex(T).

In Case 1, we have π′(u) = ⟨a, p_1p_3p⟩ and π′(v) = ⟨a, p_1p_3pα⟩. Again by construction of I, this means (π′(u), π′(v)) ∈ r^I. Case 2 is analogous.

We have thus proved Point 2, i.e., I ⊨^{π′} q. Moreover, for each v ∈ Var(q) with π(v) = ⟨a, p⟩ and π′(v) = ⟨a, p′⟩, the length of p′ is strictly smaller than that of p, since p_2 ≠ ε. The construction applies whenever some variable is mapped to an element ⟨a, p⟩ with |p| > 2^m + k, and each application strictly shortens all paths; hence it can be repeated only finitely many times. We ultimately obtain a match π* such that I ⊨^{π*} q and, for all v ∈ Var(q), π*(v) = ⟨a, p⟩ with |p| ≤ 2^m + k. ❏