The Price of Selfishness: Conjunctive Query Entailment for ALCSelf is 2EXPTIME-hard


Bartosz Bednarczyk,¹,² Sebastian Rudolph¹

¹ Computational Logic Group, Technische Universität Dresden, Germany

² Institute of Computer Science, University of Wrocław, Poland

{bartosz.bednarczyk, sebastian.rudolph}@tu-dresden.de

Abstract

In logic-based knowledge representation, query answering has essentially replaced mere satisfiability checking as the inferencing problem of primary interest. For knowledge bases in the basic description logic ALC, the computational complexity of conjunctive query (CQ) answering is well known to be EXPTIME-complete and hence not harder than satisfiability. This does not change when the logic is extended by certain features (such as counting or role hierarchies), whereas adding others (inverses, nominals, or transitivity together with role hierarchies) makes CQ answering exponentially harder.

We contribute to this line of results by showing the surprising fact that even extending ALC by just the Self operator, which proved innocuous in many other contexts, increases the complexity of CQ entailment to 2EXPTIME. As is common for this type of problem, our proof establishes a reduction from alternating Turing machines running in exponential space, but several novel ideas and encoding tricks are required to make the approach work in this specific, restricted setting.

1 Introduction

Formal ontologies are of significant importance in artificial intelligence, playing a central role in the Semantic Web, ontology-based information integration, and peer-to-peer data management. In such scenarios, an especially prominent role is played by description logics (DLs) (Baader et al. 2017), a robust family of logical formalisms used to describe ontologies and serving as the logical underpinning of contemporary standardised ontology languages. To put knowledge bases to full use as a core part of intelligent information systems, much attention is being devoted to the area of ontology-based data access, with conjunctive queries (CQs) being employed as a fundamental querying formalism (Ortiz and Simkus 2012).

In recent years, it has become apparent that various modelling features of DLs affect the complexity of CQ answering in a rather strong sense. Let us focus on the most popular DL, ALC. It was first shown in (Lutz 2008) that CQ entailment is exponentially harder than the consistency problem for ALC extended with inverse roles (I). Shortly after, a combination of transitivity and role hierarchies (SH) was shown to be a culprit of higher worst-case complexity of reasoning (Eiter et al. 2009). Finally, also nominals (O) turned out to be problematic (Ngo, Ortiz, and Simkus 2016). Nevertheless, there are also more benign DL constructs regarding the complexity of CQ entailment. Examples are counting (Q) (Lutz 2008) (the complexity stays the same even for expressive arithmetical constraints (Baader, Bednarczyk, and Rudolph 2020)), role hierarchies alone (H) (Eiter, Ortiz, and Simkus 2012), or even a tamed use of higher-arity relations (Bednarczyk 2021a).

Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Our results. We study CQ entailment in ALC_Self, an extension of ALC with the Self operator, i.e. a modelling feature that allows us to specify the situation when an element is related to itself by a binary relationship. Among other things, this allows us to formalise the concept of a "narcissist":

Narcissist ⊑ ∃loves.Self

or to express that no person is their own parent:

Person ⊑ ¬∃hasParent.Self.

The Self operator is supported by the OWL 2 Web Ontology Language and the DL SROIQ (Horrocks, Kutz, and Sattler 2006). Due to the simplicity of the Self operator (it only refers to one element), it is easy to accommodate in automata techniques (Calvanese, Eiter, and Ortiz 2009) or consequence-based methods (Ortiz, Rudolph, and Simkus 2010) and thus, so far, there has been no real indication that the added expressivity provided by Self may change anything, complexity-wise. Arguably, this impression is further corroborated by the observation that Self features in two profiles of OWL 2 (the EL and the RL profile), again without harming tractability (Krötzsch, Rudolph, and Hitzler 2008).

In this work, however, we show a rather counter-intuitive result, namely that CQ entailment for ALC_Self is exponentially harder than for ALC. Hence, it places the seemingly innocuous Self operator among the "malign" modelling features, like (I), (SH), or (O). Moreover, this establishes 2EXPTIME-hardness of query entailment for the Z family (a.k.a. ALCHb^Self_reg) of DLs (Calvanese, Eiter, and Ortiz 2009), which had remained open until now, as well as 2EXPTIME-hardness of querying the forward guarded fragment (Bednarczyk 2021a) with equality.

Our proof goes via an encoding of computation trees of alternating Turing machines working in exponential space and follows the general hardness-proof scheme by Lutz (2008). However, to adjust the schema to ALC_Self, novel ideas are required: the ability to speak about self-loops is exploited to produce a single query that traverses trees in a root-to-leaf manner and to simulate disjunction inside CQs, which is useful to express that certain paths are repeated inside the tree.

For space reasons, we argue model-theoretically and refrain from presenting the axiomatisations in the main paper (they follow mostly standard ideas); they can be found in our arXiv report (Bednarczyk and Rudolph 2021).

2 Preliminaries

We recall the basics on description logics (DLs) (Baader et al. 2017) and query answering (Ortiz and Simkus 2012).

DLs. We fix countably infinite, pairwise disjoint sets of individual names N_I, concept names N_C, and role names N_R and introduce the description logic ALC_Self. Starting from N_C and N_R, the set 𝐂 of ALC_Self concepts is built using the following concept constructors: negation (¬C), conjunction (C ⊓ D), existential restriction (∃r.C), the top concept (⊤), and Self concepts (∃r.Self), with the grammar:

C, D ::= ⊤ | A | ¬C | C ⊓ D | ∃r.C | ∃r.Self,

where C, D ∈ 𝐂, A ∈ N_C, and r ∈ N_R. We often employ disjunction C ⊔ D := ¬(¬C ⊓ ¬D), universal restrictions ∀r.C := ¬∃r.¬C, bottom ⊥ := ¬⊤, and the less commonly used "inline implication" C → D := ¬C ⊔ D.

Assertions are of the form C(a) or r(a,b) for a, b ∈ N_I, C ∈ 𝐂, and r ∈ N_R. A general concept inclusion (GCI) has the form C ⊑ D for concepts C, D ∈ 𝐂. We use C ≡ D as a shorthand for the two GCIs C ⊑ D and D ⊑ C. A knowledge base (KB) K = (A, T) is composed of a finite non-empty set A (ABox) of assertions and a finite non-empty set T (TBox) of GCIs. We call the elements of A ∪ T axioms.

Name                 Syntax     Semantics
top concept          ⊤          ∆^I
concept name         A          A^I ⊆ ∆^I
role name            r          r^I ⊆ ∆^I × ∆^I
conc. negation       ¬C         ∆^I \ C^I
conc. intersection   C ⊓ D      C^I ∩ D^I
exist. restriction   ∃r.C       {d | ∃e. (d,e) ∈ r^I ∧ e ∈ C^I}
Self concept         ∃r.Self    {d | (d,d) ∈ r^I}

Table 1: Concepts and roles in ALC_Self.

Axiom α    I ⊨ α, if             Part of
C ⊑ D      C^I ⊆ D^I             TBox T
C(a)       a^I ∈ C^I             ABox A
r(a,b)     (a^I, b^I) ∈ r^I      ABox A

Table 2: Axioms in ALC_Self.

The semantics of ALC_Self is defined via interpretations I = (∆^I, ·^I) composed of a non-empty set ∆^I called the domain of I, and an interpretation function ·^I mapping individual names to elements of ∆^I, concept names to subsets of ∆^I, and role names to subsets of ∆^I × ∆^I. This mapping is extended to concepts (see Table 1) and finally used to define satisfaction of assertions and GCIs (see Table 2). We say that an interpretation I satisfies a KB K = (A, T) (or I is a model of K, written: I ⊨ K) if it satisfies all axioms of A ∪ T. A KB is consistent (or satisfiable) if it has a model, and inconsistent (or unsatisfiable) otherwise.
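The semantics of Tables 1 and 2 can be made concrete on finite interpretations. The following is a minimal sketch, not part of the paper: the tuple encoding of concepts and the names `ext` and `satisfies_gci` are assumptions made for illustration only.

```python
# Concepts are nested tuples (an encoding assumed for this sketch):
#   "top" | ("name", A) | ("not", C) | ("and", C, D)
#   | ("some", r, C) | ("self", r)

def ext(concept, domain, conc, role):
    """Return the extension of `concept` in a finite interpretation (Table 1)."""
    if concept == "top":
        return set(domain)
    tag = concept[0]
    if tag == "name":
        return set(conc.get(concept[1], set()))
    if tag == "not":
        return set(domain) - ext(concept[1], domain, conc, role)
    if tag == "and":
        return ext(concept[1], domain, conc, role) & ext(concept[2], domain, conc, role)
    if tag == "some":
        filler = ext(concept[2], domain, conc, role)
        return {d for (d, e) in role.get(concept[1], set()) if e in filler}
    if tag == "self":
        # Self concept: elements carrying an r-self-loop.
        return {d for (d, e) in role.get(concept[1], set()) if d == e}
    raise ValueError(f"unknown constructor: {tag}")

def satisfies_gci(lhs, rhs, domain, conc, role):
    """A GCI lhs ⊑ rhs holds iff the extension of lhs is contained in that of rhs."""
    return ext(lhs, domain, conc, role) <= ext(rhs, domain, conc, role)

# A toy interpretation for the two example axioms from the introduction.
domain = {"ann", "bob"}
conc = {"Narcissist": {"ann"}, "Person": {"ann", "bob"}}
role = {"loves": {("ann", "ann"), ("bob", "ann")}, "hasParent": {("bob", "ann")}}

narcissist_ok = satisfies_gci(("name", "Narcissist"), ("self", "loves"), domain, conc, role)
parent_ok = satisfies_gci(("name", "Person"), ("not", ("self", "hasParent")), domain, conc, role)
```

In this toy interpretation both example axioms are satisfied, because only `ann` has a `loves`-loop and nobody has a `hasParent`-loop.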

A homomorphism h : I → J is a concept-name- and role-name-preserving function that maps every element of ∆^I to some element of ∆^J, i.e. d ∈ A^I implies h(d) ∈ A^J and (d,e) ∈ r^I implies (h(d), h(e)) ∈ r^J for all role names r ∈ N_R, concept names A ∈ N_C, and d, e ∈ ∆^I.

Queries. Boolean conjunctive queries (CQs) are conjunctions of atoms of the form r(x,y) or A(z), where r is a role name, A is a concept name, and x, y, z are variables from a countably infinite set N_V. Given a CQ q, we denote with |q| the number of its atoms, and with Var(q) the set of all its variables. Let I be an interpretation, q a CQ, and π : Var(q) → ∆^I a variable assignment. We write I ⊨_π r(x,y) if (π(x), π(y)) ∈ r^I, and I ⊨_π A(z) if π(z) ∈ A^I. We say that π is a match for I and q if I ⊨_π α holds for every atom α ∈ q, and that I satisfies q (denoted with: I ⊨ q) whenever I ⊨_π q for some match π. The definitions are lifted to KBs: q is entailed by a KB K (written: K ⊨ q) if every model I of K satisfies q. We stress that satisfaction of conjunctive queries is preserved by homomorphisms, i.e. if I ⊨ q and there is a homomorphism h : I → J, then J ⊨ q. When I ⊨ K but I ⊭ q, we call I a countermodel for K and q. The query entailment problem asks if K ⊨ q holds for an input KB K and a CQ q.

Whenever convenient, we employ the path syntax of CQs to write queries in a concise way. By a path expression we mean an expression of the form

(A_0?; r_1; A_1?; r_2; A_2?; …; A_{n−1}?; r_n; A_n?)(x_0, x_n)

with all r_i ∈ N_R and A_i ∈ N_C ∪ {⊤}, serving as a shorthand for

⋀_{i=0}^{n} A_i(x_i) ∧ ⋀_{i=1}^{n} r_i(x_{i−1}, x_i).

Whenever A_i happens to be ⊤, it is omitted from the expression; this does not create ambiguities. Note that path CQs are just syntactic sugar and should not be mistaken, e.g., for regular path queries.
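To illustrate how a path expression unfolds into atoms and how matches are defined, here is a small brute-force sketch for finite interpretations. The helper names `expand_path` and `matches`, the variable naming scheme `x0, x1, …`, and the use of `None` for ⊤ are assumptions of this sketch; the enumeration is exponential in the number of variables and only meant for tiny examples.

```python
from itertools import product

def expand_path(concepts, roles, var="x"):
    """Expand (A0?; r1; A1?; ...; rn; An?)(x0, xn) into a list of atoms.

    `concepts` has n+1 entries (None stands for the top concept and is dropped),
    `roles` has n entries. Concept atoms are pairs, role atoms are triples.
    """
    assert len(concepts) == len(roles) + 1
    atoms = [(A, f"{var}{i}") for i, A in enumerate(concepts) if A is not None]
    atoms += [(r, f"{var}{i}", f"{var}{i + 1}") for i, r in enumerate(roles)]
    return atoms

def matches(atoms, domain, conc, role):
    """Yield every variable assignment that satisfies all atoms."""
    variables = sorted({v for atom in atoms for v in atom[1:]})
    for values in product(sorted(domain), repeat=len(variables)):
        pi = dict(zip(variables, values))
        if all(
            pi[a[1]] in conc.get(a[0], set()) if len(a) == 2
            else (pi[a[1]], pi[a[2]]) in role.get(a[0], set())
            for a in atoms
        ):
            yield pi

# The path expression (A?; r; r; B?)(x0, x2) over a three-element chain.
example_atoms = expand_path(["A", None, "B"], ["r", "r"])
example_matches = list(
    matches(example_atoms, {1, 2, 3}, {"A": {1}, "B": {3}}, {"r": {(1, 2), (2, 3)}})
)
```

On the chain 1 → 2 → 3 the only match sends `x0`, `x1`, `x2` to 1, 2, 3, respectively.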

2.1 Alternating Turing Machines

We next fix the notation of alternating Turing machines over a binary alphabet {0,1} working in exponential space (simply: ATMs). An ATM is defined as a tuple M = (N, Q, Q_∃, s_I, s_A, s_R, T), where Q is a finite set of states (usually denoted with s); Q_∃ ⊆ Q is a set of existential states; s_I, s_A, s_R ∈ Q are, respectively, pairwise different initial, accepting, and rejecting states, where we assume that s_I ∈ (Q \ Q_∃); T ⊆ (Q × {0,1}) × ({0,1} × Q × {−1,+1}) is the transition relation; and the natural number N (encoded in unary) is a parameter governing the size of the working tape. We call the states from Q_∀ := Q \ Q_∃ universal. The size of M, denoted with |M|, is N + |Q| + |Q_∃| + 3 + |T|.


A configuration of M is a word wsw′ ∈ {0,1}* Q {0,1}* with |ww′| = 2^N. We call wsw′ (i) existential (resp. universal) if s is existential (resp. universal), (ii) final if s is either s_A or s_R, (iii) non-final if it is not final, and (iv) accepting if s = s_A. Successor configurations are defined in terms of the transition relation T. For a, b, c, d ∈ {0,1} and v, v′, w, w′ ∈ {0,1}* with |v| = |w|, we let wbs′w′ be a quasi-successor configuration of vsav′ whenever (s,a,b,s′,+1) ∈ T, and we let ws′dbw′ be a quasi-successor configuration of vcsav′ whenever (s,a,b,s′,−1) ∈ T. If additionally we meet the requirement w = v, w′ = v′, and c = d, we speak of successor configurations.¹

Without loss of generality, we make the following additional assumptions about M. First, for each non-final (i.e. non-accepting and non-rejecting) state s and every letter a ∈ {0,1}, the set T(s,a) := {(s,a,b,s′,d) ∈ T} contains exactly two elements, denoted T_1(s,a) and T_2(s,a). Hence, every non-final configuration has exactly two successor configurations. Second, for any (s,a,b,s′,d) ∈ T, if s is existential then s′ is universal and vice versa. Third, the machine reaches a final state no later than after 2^{2^N} steps (for configuration sequences). Fourth and last, M never attempts to move left (resp. right) on the left-most (resp. right-most) tape cell.

A run of M is a finite tree, with nodes labelled by configurations of M, that satisfies all the conditions below:

• the root is labelled with the initial configuration s_I 0^{2^N},

• each node labelled with a non-final existential configuration wsw′ has a single child node which is labelled with one of the successor configurations of wsw′,

• each node labelled with a non-final universal configuration wsw′ has two child nodes which are labelled with the two successor configurations (w.r.t. T_1 and T_2) of wsw′,

• no node labelled with a final configuration has successors.

Quasi-runs of M are defined analogously by replacing the notion of successors with quasi-successors. Note that every run is also a quasi-run but not vice versa.

An ATM M is (quasi-)accepting if it has an accepting (quasi-)run, i.e. one all of whose leaves are labelled by accepting configurations. By (Chandra and Stockmeyer 1976), the problem of checking whether a given ATM is accepting is 2EXPTIME-hard.
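The difference between successors and quasi-successors can be sketched in a few lines. The representation below is an assumption of this sketch and not the paper's notation: a configuration wsw′ is stored as a triple (tape, head, state) with tape = ww′ and head = |w|, and transitions are 5-tuples (s, a, b, s′, d).

```python
def is_quasi_successor(c1, c2, T):
    """Check that c2 arises from c1 via some transition in T, where
    tape cells other than the scanned one are allowed to change."""
    tape1, head1, s1 = c1
    tape2, head2, s2 = c2
    if len(tape1) != len(tape2) or not 0 <= head2 < len(tape2):
        return False
    return any(
        s == s1 and a == tape1[head1]      # transition applicable in c1
        and s_new == s2                    # new state
        and head2 == head1 + d             # head moved by d
        and tape2[head1] == b              # scanned cell overwritten with b
        for (s, a, b, s_new, d) in T
    )

def is_successor(c1, c2, T):
    """A successor is a quasi-successor whose untouched cells are unchanged."""
    tape1, head1, _ = c1
    tape2 = c2[0]
    untouched_equal = all(
        x == y for i, (x, y) in enumerate(zip(tape1, tape2)) if i != head1
    )
    return is_quasi_successor(c1, c2, T) and untouched_equal

# Hypothetical one-transition machine fragment: in state "p" reading 0,
# write 1, switch to state "q" and move right.
T_example = {("p", "0", "1", "q", +1)}
start = ("0000", 0, "p")
proper = ("1000", 1, "q")   # only the scanned cell changed
faulty = ("1001", 1, "q")   # the last cell changed although it was not scanned
```

Here `proper` is a successor of `start`, whereas `faulty` is only a quasi-successor; such tape mismatches are exactly what the query constructed later has to detect.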

3 A High-Level Overview of the Encoding

Let M be an ATM. The core contribution of our paper is to present a polynomial-time reduction that, given M, constructs a pair (K_M, q_M), composed of an ALC_Self knowledge base and a conjunctive query, such that K_M ⊭ q_M iff M is accepting. Intuitively, the models of K_M will encode accepting quasi-runs of M, i.e. trees in which every node is a meaningful configuration of M, but the tape contents of consecutive configurations might not be in sync as they should. The query q_M will be responsible for detecting such errors. Hence, the existence of a countermodel for K_M and q_M will coincide with the existence of an accepting run of M. The intended models of K_M look as follows:

¹ In words, this corresponds to the common definition of successor configurations, while for quasi-successor configurations, untouched tape cells may change arbitrarily during the transition.

The depicted triangles are called configuration trees and encode configurations of M. The information contained in these configuration trees is "superimposed" on identical configuration units: full binary trees of height N+1 decorated with many self-loops² that will provide the "navigational infrastructure" for the query q_M to detect "tape mismatches". Every such tree has 2^N nodes at its N-th level, and each of these nodes represents a single tape cell of the machine. The (N+1)-th level of nodes serves a technical purpose that will be explained later. Lastly, the roots of configuration units store all remaining information required for the encoding: the current state of M, the previous and the current head position, as well as the transition used to arrive at this node from the previous configuration. Finally, the roots of configuration trees are interconnected by the role next, indicating that (r, r′) ∈ next^I holds iff the configuration represented by r′ is a quasi-successor of the configuration of r.

4 Configuration Units

In our encoding, a vital role is played by n-configuration units, which will later form the backbone of configuration trees. Roughly speaking, each n-configuration unit is a full binary tree of depth n, decorated with certain concepts, roles, and self-loops. We introduce configuration units by providing the formal definition, followed by a graphical depiction and an intuitive description. In order to represent configuration units inside interpretations, we employ role names from R_unit as well as concept names from C_unit:

R_unit := {ℓ_i, r_i, next | 1 ≤ i ≤ n}

C_unit := {Lvl_0, Lvl_i, L, R, Ad^0_i, Ad^1_i | 1 ≤ i ≤ n}.

Definition 4.1 (configuration unit). Given a number n, an n-configuration unit U is an interpretation (∆^U, ·^U) with

∆^U = {0,1}^{≤n} := {w ∈ {0,1}* | |w| ≤ n},

ℓ_i^U = {(w, w0) | |w| = i−1} ∪ {(w, w) | w ∈ ∆^U},

r_i^U = {(w, w1) | |w| = i−1} ∪ {(w, w) | w ∈ ∆^U},

next^U = {(w, w) | |w| = n},

(Lvl_i)^U = {w ∈ ∆^U | |w| = i},

L^U \ {ε} = {w0 ∈ ∆^U} and R^U = ∆^U \ L^U,

(Ad^b_i)^U = {w ∈ ∆^U | |w| ≥ i and its i-th letter is b}.
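Definition 4.1 translates directly into a finite data structure. The sketch below builds an n-configuration unit with words represented as Python strings; the function name `build_unit` and the identifiers `l1`, `r1`, `Ad0_1`, … are naming choices of this sketch. Since the definition leaves open whether ε belongs to L, the sketch arbitrarily puts ε into R.

```python
from itertools import product

def build_unit(n):
    """Return (domain, conc, role) of an n-configuration unit.

    Words over {0,1} are strings; the empty string plays the role of ε.
    """
    domain = {"".join(bits) for k in range(n + 1) for bits in product("01", repeat=k)}
    loops = {(w, w) for w in domain}
    # next is interpreted as self-loops at the leaves only.
    role = {"next": {(w, w) for w in domain if len(w) == n}}
    # Assumption of this sketch: ε is not in L (the definition permits both options).
    conc = {"L": {w for w in domain if w.endswith("0")}, "Lvl0": {""}}
    conc["R"] = domain - conc["L"]
    for i in range(1, n + 1):
        # l_i / r_i: edge to the left / right child at level i, plus loops everywhere.
        role[f"l{i}"] = {(w, w + "0") for w in domain if len(w) == i - 1} | loops
        role[f"r{i}"] = {(w, w + "1") for w in domain if len(w) == i - 1} | loops
        conc[f"Lvl{i}"] = {w for w in domain if len(w) == i}
        for b in "01":
            conc[f"Ad{b}_{i}"] = {w for w in domain if len(w) >= i and w[i - 1] == b}
    return domain, conc, role

unit_domain, unit_conc, unit_role = build_unit(2)
```

For n = 2 this yields the seven nodes ε, 0, 1, 00, 01, 10, 11, with `next`-loops exactly at the four leaves.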

² The concrete purpose of the abundant presence of self-loops will only become clear later, starting from Corollary 4.4.


The following drawing depicts a 2-configuration unit.

As one can see, the nodes in the tree are layered into levels according to their distance from the root: nodes at the i-th level are members of the Lvl_i concept and their distance from the root is equal to i. Next, each non-leaf node at the (i−1)-th level has two children, the left one and the right one (satisfying, respectively, the concepts L and R), and is connected to them via the roles ℓ_i and r_i, respectively. All nodes are equipped with ℓ_i- and r_i-self-loops, and all leaves are additionally endowed with next-loops. With all nodes inside the tree, we naturally associate their addresses, i.e. their "numbers" when nodes from the i-th level are enumerated from left to right. In order to encode the address of a given node at the i-th level, we employ concepts Ad^b_1, Ad^b_2, …, Ad^b_i with "values" b either 0 or 1, meaning that a node is in Ad^b_j iff the j-th bit of its address is equal to b. The most significant bit is Ad^b_1.

It is routine to axiomatise n-configuration units (cf. technical report). The provided axiomatisation is made formally precise by the following lemma:

Lemma 4.2. There is an ALC_Self-KB K^n_unit such that each n-configuration unit is a model of K^n_unit. For any model I of K^n_unit and any d ∈ (Lvl_0)^I there is an n-configuration unit U and a homomorphism h from U into I with h(ε) = d.

At this point, we would like to give the reader some intuition as to why units are decorated with different self-loops. First, we show that their presence can be exploited to navigate top-down through a given unit.

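Lemma 4.3. Let U be an n-configuration unit. Then for all w ∈ ∆^U we have (ε, w) ∈ ℓ_1^U ∘ r_1^U ∘ … ∘ ℓ_n^U ∘ r_n^U, with "∘" denoting the composition of relations, i.e. s^U ∘ t^U := {(c,e) | (c,d) ∈ s^U and (d,e) ∈ t^U for some d}.

Sketch. For simplicity we use s_i^U as an abbreviation of ℓ_1^U ∘ r_1^U ∘ … ∘ ℓ_i^U ∘ r_i^U. The proof is by induction on i, showing for all 1 ≤ i ≤ n that all words w of length at most i satisfy (ε, w) ∈ s_i^U. In the induction step, words of length at most i are retained via the ℓ_{i+1}- and r_{i+1}-self-loops, while a word wb of length i+1 is reached from w by taking the ℓ_{i+1}-edge followed by an r_{i+1}-loop (if b = 0) or an ℓ_{i+1}-loop followed by the r_{i+1}-edge (if b = 1).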

As a corollary of Lemma 4.3, we conclude that there is a single CQ detecting root-leaf pairs in units.

Corollary 4.4. Let U be an n-configuration unit. There is a single conjunctive query q_rl with x_0, x_{2n} ∈ Var(q_rl) such that the set {(π(x_0), π(x_{2n})) | U ⊨_π q_rl} is equal to the set of root-leaf pairs from U, i.e. (Lvl_0)^U × (Lvl_n)^U.

Proof. Take q_rl := (Lvl_0?; ℓ_1; r_1; …; ℓ_n; r_n; Lvl_n?)(x_0, x_{2n}). The correctness follows from Lemma 4.3.
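Lemma 4.3 and Corollary 4.4 can be checked mechanically for small n by composing the role interpretations of a unit. This is only a sanity check under the string encoding assumed here (words as Python strings, ε as the empty string), not a substitute for the inductive proof; the helper names are hypothetical.

```python
from itertools import product

def unit_roles(n):
    """Domain and the interpretations of l_i and r_i in an n-configuration unit."""
    domain = {"".join(b) for k in range(n + 1) for b in product("01", repeat=k)}
    loops = {(w, w) for w in domain}
    left = {i: {(w, w + "0") for w in domain if len(w) == i - 1} | loops
            for i in range(1, n + 1)}
    right = {i: {(w, w + "1") for w in domain if len(w) == i - 1} | loops
             for i in range(1, n + 1)}
    return domain, left, right

def compose(s, t):
    """Relational composition: (c, e) such that (c, d) in s and (d, e) in t."""
    return {(c, e) for (c, d) in s for (d2, e) in t if d == d2}

def top_down(n):
    """Compute l_1 ∘ r_1 ∘ ... ∘ l_n ∘ r_n over the unit of depth n."""
    domain, left, right = unit_roles(n)
    rel = {(w, w) for w in domain}  # identity, neutral for composition
    for i in range(1, n + 1):
        rel = compose(compose(rel, left[i]), right[i])
    return domain, rel

def root_leaf_pairs(n):
    """Matches of q_rl: top-down pairs starting in Lvl_0 and ending in Lvl_n."""
    _, rel = top_down(n)
    return {(c, e) for (c, e) in rel if len(c) == 0 and len(e) == n}
```

For instance, `root_leaf_pairs(2)` consists of the four pairs connecting ε with 00, 01, 10, and 11.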

5 From Units to Configuration Trees

In the next step, we enrich (N+1)-configuration units with additional concepts, allowing the units to represent a meaningful configuration of our ATM M = (N, Q, Q_∃, s_I, s_A, s_R, T). To this end, we employ a variety of new concept names from C_conf, consisting of HdHere, NoHdHere, St_s, HdPos^b_i, HdLet_a, Let_a, 0, 1, where s ∈ Q, b ∈ {0,1}, i ∈ {1, …, N}, and a ∈ {0,1}.

Before turning to a formal definition, we first describe how configurations are structurally represented in models. Recall that a configuration of M is a word wsw′ with |ww′| = 2^N (called the tape) and s ∈ Q. In our encoding, this configuration will be represented by an (N+1)-configuration unit C decorated by concepts from C_conf. The interpretation C stores the state s by associating the state concept St_s with its root. The tape content ww′ is represented by the internal nodes of C: the i-th letter of ww′ (i.e. the content of the ATM's i-th tape cell) is represented by the i-th node (according to their binary addresses) at the N-th level. In case this letter is 0, the corresponding node will be assigned the concept Let_0, while 1 is represented by Let_1. Yet, for reasons that will become clear only later, the tape cells' content is additionally represented in another way: if it is 0, then we label the i-th node's left child with 0 and its right child with 1. The reverse situation is implemented when the node represents the letter 1. Finally, there is a unique tape cell that is visited by the head of M, and the node corresponding to this cell is explicitly marked by the concept HdHere, while all other "tape cell nodes" are marked by NoHdHere. In order to implement this marking correctly, the address of the head's position needs to be explicitly recorded. Consequently, C's root node stores this address (encoded in binary using the HdPos^b_i concepts) and from there, these concept assignments are broadcast to and stored in all tape cell nodes on the N-th level. Similarly, we decorate C's root with the concept HdLet_a, meaning that the current letter scanned by the head is a.

After this informal description and depiction, the formal definition of configuration trees should be plausible.

Definition 5.1 (configuration tree). A configuration tree C of M is an interpretation C = (∆^C, ·^C) such that C is an (N+1)-configuration unit additionally satisfying:

• There exists a unique state s ∈ Q such that (St_s)^C = {ε} and (St_{s′})^C = ∅ for all s′ ≠ s.

• (Lvl_{N+1})^C = 0^C ∪ 1^C and 0^C ∩ 1^C = ∅.

• (Let_0)^C = {w | w0 ∈ 0^C, w1 ∈ 1^C}, (Let_1)^C = {w | w0 ∈ 1^C, w1 ∈ 0^C}, and (Let_0)^C ∪ (Let_1)^C = (Lvl_N)^C.

• There is a unique word w_head of length N witnessing HdHere^C = {w_head} and NoHdHere^C = (Lvl_N)^C \ {w_head}.

• For 1 ≤ i ≤ N and b ∈ {0,1} such that w_head ∈ (Ad^b_i)^C we put (HdPos^b_i)^C = (Lvl_0)^C ∪ (Lvl_N)^C and (HdPos^{1−b}_i)^C = ∅.

• (HdLet_a)^C = {ε} and (HdLet_{1−a})^C = ∅, where a is the unique letter from {0,1} such that w_head ∈ (Let_a)^C.
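As an illustration of Definition 5.1, the following sketch computes the extensions of the concepts from C_conf for a given state, tape, and head position. The function name `decorate`, the 0-based head index, and the key names such as `HdPos0_1` are assumptions of this sketch; the underlying unit structure (roles, levels, addresses) is omitted.

```python
from itertools import product

def decorate(N, state, tape, head):
    """Concept extensions of a configuration tree for tape (a 0/1 string of
    length 2^N), head position `head` (0-based), and state `state`."""
    assert len(tape) == 2 ** N and 0 <= head < 2 ** N
    # Level-N nodes in left-to-right order; the k-th one represents tape cell k.
    cells = ["".join(b) for b in product("01", repeat=N)]
    conc = {f"St_{state}": {""}, "0": set(), "1": set(),
            "Let_0": set(), "Let_1": set()}
    for w, a in zip(cells, tape):
        conc[f"Let_{a}"].add(w)
        # Letter 0: left child in 0, right child in 1; letter 1: the reverse.
        conc[a].add(w + "0")
        conc["1" if a == "0" else "0"].add(w + "1")
    w_head = cells[head]
    conc["HdHere"] = {w_head}
    conc["NoHdHere"] = set(cells) - {w_head}
    for i in range(1, N + 1):
        b = w_head[i - 1]
        # The head address is stored at the root and at all tape cell nodes.
        conc[f"HdPos{b}_{i}"] = {""} | set(cells)
        conc[f"HdPos{1 - int(b)}_{i}"] = set()
    conc[f"HdLet_{tape[head]}"] = {""}
    conc[f"HdLet_{1 - int(tape[head])}"] = set()
    return conc

# Tape 0100 with the head on the second cell (address 01), in a state named "q".
example_conf = decorate(2, "q", "0100", 1)
```

In the example, the cell with address 01 carries the letter 1, so its left child 010 is labelled with 1 and its right child 011 with 0.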

The axiomatisation K_conf of configuration trees can be found in the technical report, together with the proof of:

Lemma 5.2. Any configuration tree C is a model of K_conf. For any I ⊨ K_conf and d ∈ (Lvl_0)^I there is a configuration tree C and a homomorphism h from C into I with h(ε) = d.

6 Enriching Configuration Trees

Recall that the purpose of configuration trees is to place them inside a model that describes the run of the Turing machine M. In particular, this will require describing the progression from one configuration to another. In order to prepare for that, we next introduce an extended version of configuration trees that are enriched by additional information pertaining to their predecessor configuration in a run. To this end, we use new concept names from C_enr := {PTrns_t, Init, PHdHere, NoPHdHere, PHdAbv, NoPHdAbv, PHdPos^b_i, PHdLet_a}, with t ∈ T, 1 ≤ i ≤ N, b ∈ {0,1}, and a ∈ {0,1}. We use C_ptr to denote the set {Init, PTrns_t | t ∈ T}.

The concept PTrns_t, assigned to the root, indicates the transition through which the configuration has been reached from the previous configuration, while Init is used as its replacement for the initial configuration. In addition, concepts PHdPos^b_i and PHdLet_a are attached to the root in order to indicate, in a way very similar to HdPos^b_i and HdLet_a, the previous configuration's head position as well as the letter stored in that position on the current configuration's tape. For the sake of our encoding we also employ the concepts PHdHere and NoPHdHere, which play a role analogous to the HdHere and NoHdHere concepts from configuration trees. For technical reasons, we also introduce the concepts PHdAbv and NoPHdAbv that will label nodes on the (N+1)-th level iff their parent is labelled with the corresponding concept from {PHdHere, NoPHdHere}.

Definition 6.1 (enriched configuration tree). An enriched configuration tree E of M is an interpretation E = (∆^E, ·^E) such that E is a configuration tree additionally satisfying the following conditions on concepts from C_enr:

• There is exactly one concept C ∈ C_ptr for which C^E = {ε}, and for all C′ ∈ C_ptr \ {C} we have (C′)^E = ∅.

• (PTrns_{(s,a,b,s′,d)})^E = {ε} implies (St_{s′})^E = {ε} for all transitions (s,a,b,s′,d) ∈ T.

• PHdHere^E = {w_phd} and NoPHdHere^E = (Lvl_N)^E \ {w_phd} for the N-digit binary word w_phd encoding

– the number obtained as w_head − d (see Definition 5.1) whenever (PTrns_{(s,a,b,s′,d)})^E = {ε}, or

– the number 0 in case Init^E = {ε}.

• PHdAbv^E = {w0, w1 | w ∈ PHdHere^E} and NoPHdAbv^E = (Lvl_{N+1})^E \ PHdAbv^E.

• (PHdPos^b_i)^E = (Lvl_0)^E ∪ (Lvl_N)^E and (PHdPos^{1−b}_i)^E = ∅ for all 1 ≤ i ≤ N and 0 ≤ b ≤ 1 with w_phd ∈ (Ad^b_i)^E.

• (PHdLet_a)^E = {ε} and (PHdLet_{1−a})^E = ∅, where a is the unique letter from {0,1} such that w_phd ∈ (Let_a)^E.

• Init^E = {ε} implies ε ∈ L^E, (St_{s_I})^E = {ε}, (Let_0)^E = (Lvl_N)^E, and (HdPos^0_i)^E = (PHdPos^0_i)^E = (Lvl_0)^E ∪ (Lvl_N)^E for all 1 ≤ i ≤ N.

The corresponding axiomatisation K_enr as well as the proof of the following lemma can be found in the technical report.

Lemma 6.2. Any enriched configuration tree E is a model of K_enr. For any model I of K_enr and any d ∈ (Lvl_0)^I, there is an enriched configuration tree E and a homomorphism h from E into I with h(ε) = d.

7 Describing Accepting Quasi-Runs

Recall that a quasi-run R of M is simply a tree labelled with configurations of M where the root is labelled with the initial configuration s_I 0^{2^N}. Each node representing an existential configuration has one child labelled with a quasi-successor configuration, while each node representing a universal configuration has two children labelled by quasi-successor configurations obtained via different transitions.

In order to represent an accepting quasi-run by a model, we employ the notion of a quasi-computation tree Q, a structure intuitively defined from some R as follows: replace every node of R by its corresponding configuration tree, adequately enriched with information about its generating transition and the predecessor configuration. The roots of these enriched configuration trees are linked via the next role to express the quasi-succession relation of R. The roots of enriched configuration trees representing universal configurations are chosen to be labelled with L, their left next-child with L, and their right next-child with R (both corresponding to existential configurations). As expected, the Init concept decorates the root of the distinguished enriched configuration tree that represents R's initial configuration. As our attention is restricted to accepting quasi-runs R, we require that no enriched configuration tree occurring in Q carries a rejecting state. We now give a formal definition of such a structure Q.

Definition 7.1 (quasi-computation tree). A quasi-computation tree Q of M is an interpretation Q = (∆^Q, ·^Q) satisfying the following properties, where 𝔴 ranges over nodes of the tree 𝒯, 𝔟 over letters extending such nodes, and w over words addressing nodes inside a single configuration tree:

• ∆^Q := 𝒯 × {0,1}^{≤N+1}, where 𝒯 is³ a prefix-closed subset of {10,00}* · {ε,0,1} with 𝔴1 ∈ 𝒯 implying 𝔴0 ∈ 𝒯.

• For every 𝔴 ∈ 𝒯, the substructure of Q induced by {𝔴} × {0,1}^{≤N+1} is isomorphic to an enriched configuration tree of M via the isomorphism (𝔴, w) ↦ w.

• (𝔴, ε) ∈ R^Q if 𝔴 ends with 1, otherwise (𝔴, ε) ∈ L^Q.

• For any 𝔴 ≠ 𝔴′ and arbitrary w, w′ ∈ {0,1}^{≤N+1}, we have ((𝔴, w), (𝔴′, w′)) ∉ s^Q for any s ∈ R_unit \ {next}.

• next^Q \ {(d,d) | d ∈ ∆^Q} = {((𝔴, ε), (𝔴𝔟, ε)) | 𝔴𝔟, 𝔴 ∈ 𝒯, 𝔟 ∈ {0,1}}.

• Init^Q = {(ε, ε)}.

• For any 𝔴0 ∈ 𝒯 with (𝔴, ε) ∈ (St_s)^Q and (𝔴, ε) ∈ (Let_a)^Q, where T_1 and T_2 refer to the transition relation T of M:

– if 𝔴1 ∈ 𝒯 then (𝔴0, ε) ∈ (PTrns_{T_1(s,a)})^Q and (𝔴1, ε) ∈ (PTrns_{T_2(s,a)})^Q,

– if 𝔴1 ∉ 𝒯 then (𝔴0, ε) ∈ (PTrns_{T_1(s,a)})^Q or (𝔴0, ε) ∈ (PTrns_{T_2(s,a)})^Q.

• If (𝔴, w) ∈ HdHere^Q and 𝔴𝔟 ∈ 𝒯 then (𝔴𝔟, w) ∈ PHdHere^Q.

• (St_{s_R})^Q = ∅, and (𝔴, ε) ∈ (St_{s_A})^Q iff 𝔴 ∈ 𝒯 and 𝔴0 ∉ 𝒯.

³ 𝒯 is just a binary tree in which nodes at the i-th level have exactly 2 children if i is even and exactly one child otherwise.

Let T_M be the set of all GCIs presented so far (plus the additional ones used to axiomatise quasi-computations) and let A_M be an ABox composed of a single axiom Init(a) for a fresh individual name a. Put K_M := (A_M, T_M).

Lemma 7.2. Any accepting quasi-computation tree Q of M is a model of K_M. For any model I of K_M there exists an accepting quasi-computation tree Q and a homomorphism h : Q → I with h(ε, ε) = a^I.

8 Detecting Faulty Runs with a Single CQ

We have finally reached the point where querying comes into play. Our last goal is to design one single conjunctive query that detects "faulty configuration progressions" in quasi-computation trees, meaning that it matches a pair of positions in consecutive configuration trees that represent the same cell, are untouched by the head of M, and yet store different letters. Note that the lack of such cells in a quasi-computation tree means that any two consecutive configuration trees represent not only quasi-successor configurations but actually proper successors, and hence the structure as such even represents a "proper" run. We start by formalising our requirements for such a query:

Lemma 8.1. There exists a CQ q_M of size polynomial in N with two distinguished variables x, y such that for all quasi-computation trees Q we have Q ⊨_π q_M iff there exist a word 𝔴 (a node of the tree underlying Q), a letter 𝔟, and a word w of length N+1 such that:

• π(x) = (𝔴, w), π(y) = (𝔴𝔟, w),

• π(y) ∈ NoPHdAbv^Q,

• π(x) ∈ 0^Q and π(y) ∈ 1^Q.

Note the asymmetry in the 3rd bullet point above: we ignore the reverse constellation. Yet, due to our encoding, if the reverse situation occurs at one child of a tape cell node, then the original one occurs at its sibling. Hence, every mismatch in a sense causes two inconsistencies from the point of view of the (N+1)-level nodes. This solves the mystery of introducing level N+1 in our configuration trees and the particular encoding of tape symbols: it is crucial for catching faulty progressions by using one single CQ. Before proving Lemma 8.1 we show how it implies our main theorem:
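The reason why checking only the "0 in the old tree, 1 in the new tree" pattern suffices can be verified by a four-case enumeration. The sketch below is an illustration with hypothetical helper names; it only models the labels of the two children of a tape cell node.

```python
def children_labels(letter):
    """Labels (left child, right child) on level N+1 below a tape cell
    storing `letter`: 0 is encoded as (0, 1) and 1 as (1, 0)."""
    return ("0", "1") if letter == "0" else ("1", "0")

def query_hits(old_letter, new_letter):
    """Child positions (0 = left, 1 = right) at which the old configuration
    carries label 0 and the new one carries label 1."""
    old, new = children_labels(old_letter), children_labels(new_letter)
    return [k for k in (0, 1) if old[k] == "0" and new[k] == "1"]

# Every change of letter yields exactly one hit; unchanged letters yield none.
all_cases = {(a, b): query_hits(a, b) for a in "01" for b in "01"}
```

A change from 0 to 1 is caught at the left child and a change from 1 to 0 at the right child, so a single asymmetric pattern covers both kinds of mismatch.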

Theorem 8.2. Conjunctive query entailment over ALC_Self knowledge bases is 2EXPTIME-hard.

Proof. It suffices to show that CQ non-entailment over ALC_Self KBs is 2EXPTIME-hard. Take K_M as defined in Section 7 and q_M as given by Lemma 8.1. We claim that K_M ⊭ q_M iff M is accepting. The "if" direction is easy: we take an accepting run of M and turn it into a quasi-computation tree Q. By Lemma 7.2 we conclude that Q ⊨ K_M. We also have that Q ⊭ q_M due to the fact that any two consecutive configuration trees represent proper successor configurations. For the second direction it suffices to show that if M is not accepting then K_M ⊨ q_M. Indeed, assume that M is not accepting and let I be a model of K_M. By Lemma 7.2 there is a quasi-computation tree Q and a homomorphism h : Q → I with h(ε, ε) = a^I. But this quasi-computation tree must represent a "faulty" run: otherwise it would correspond to an accepting run of M, which does not exist by assumption. Hence there must be a match of q_M to Q. As query matches are preserved under homomorphisms, we conclude I ⊨ q_M. Thus all models I of K_M have matches of q_M, which implies K_M ⊨ q_M.

In the forthcoming query definitions, we employ a convenient naming scheme. By writing q[x,y] we indicate that the variables x, y ∈ Var(q) are global (i.e. the same across (sub)queries that we might join together) while its other variables are local (i.e. pairwise different from any variables occurring in other queries; this can always be enforced by renaming). Going back to the query, we proceed as follows.

We first prepare a query q_main[x,y] with two global distinguished variables x, y that relates any two domain elements whenever they are leaf nodes of consecutive configuration trees. Then q_main[x,y] is combined with queries q^i_adr[x,y] for all 1 ≤ i ≤ N+1 with the intended meaning that x and y have the same i-th bit of their addresses. Additionally, our final query will require that x be mapped to a node satisfying 0 and y to a node satisfying 1 and NoPHdAbv (cf. Lemma 8.1).

To construct q_main[x,y] we essentially employ Lemma 4.3.

Lemma 8.3. There exists a CQ q_main[x,y] such that for any quasi-computation tree Q the set M_{q_main} := {(π(x), π(y)) | Q ⊨_π q_main} is composed precisely of the pairs of leaves of two consecutive configuration trees of Q. Formally: M_{q_main} = {((𝔴, w), (𝔴𝔟, v)) ∈ ∆^Q × ∆^Q | |w| = |v| = N+1, 𝔟 ∈ {0,1}}.

Proof. It suffices to take q_main := q_rl[x_r, x] ∧ next(x_r, y_r) ∧ q_rl[y_r, y], with q_rl as in Corollary 4.4 for n = N+1. That M_{q_main} is a superset of the set above follows from the fact that the enriched configuration trees inside a quasi-computation tree are, in particular, configuration units; hence, containment follows by Corollary 4.4. We now focus on the other direction. Let Q ⊨_π q_main. By the 5th item of Definition 7.1 we know that π(x_r) and π(y_r) must be two distinct roots of enriched configuration trees E_{x_r}, E_{y_r} (recall that q_rl places x_r in Lvl_0, and roots carry no next-loops). By the 4th item of Definition 7.1 we know that the interpretation of the roles r_i and ℓ_i is restricted to pairs of domain elements located inside the same enriched configuration tree (which is, by definition, a configuration tree and hence a configuration unit). Since q_rl only employs the roles ℓ_i, r_i and the concepts Lvl_0, Lvl_{N+1}, we conclude that q_rl has exactly the same set of matches in E_{x_r} as in its underlying unit. Hence, by Corollary 4.4 we know that x (resp. y) is indeed mapped to a leaf of E_{x_r} (resp. to a leaf of E_{y_r}), which finishes the proof.

The next part of our query construction focuses on sub- queriesq_adrⁱ [x,y]that are meant to relate leaves having equal i-th bits of addresses. In order to construct it we combine together several smaller queries, written in path syntax below.

• We letq↓[x,y] := (`1;r1;. . .;`N+1;rN+1)(x,y)define the top-down query. It intuitively traverses an enriched configuration tree in a top-down manner. Note that q_↓[x,y]is actually the major sub-query ofq_rl[x,y].

• The ℓ_i-top-down query q_{ℓ_i↓}[x,y] is similar to q_↓[x,y], but with the ℓ_i; r_i part replaced by just ℓ_i. The intended behavior is that again a tree is traversed from root to leaves, but this time, an ℓ_i-edge must be crossed when going from the (i−1)-th to the i-th level. The r_i-top-down query q_{r_i↓}[x,y] is defined by replacing ℓ_i; r_i in q_↓[x,y] with r_i.

An important ingredient in the construction is the query q_{=0}^{i-th bit}[x,y] defined as follows:

Lvl_{N+1}(x) ∧ q_{ℓ_i↓}[x′,x] ∧ next(x′,y′) ∧ q_{ℓ_i↓}[y′,y] ∧ Lvl_{N+1}(y).

In total analogy, we define q_{=1}^{i-th bit}[x,y] by using q_{r_i↓} instead of q_{ℓ_i↓}. Any match π of the query q_{=b}^{i-th bit}[x,y] instantiates the variables x and y in a quasi-computation tree Q according to one of the following two scenarios: either π(x) = π(y), or π(x) and π(y) are leaves in two consecutive enriched configuration trees inside the quasi-computation tree and both of these leaves have their i-th address bit set to b.
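The two scenarios can be checked mechanically on a small example. The following Python sketch is illustrative only and not part of the construction: it assumes a toy structure with two consecutive trees (hypothetically named "w" and "wb") of depth DEPTH standing in for N+1, self-loops for every ℓ_i and r_i at every node, and next occurring only as self-loops on leaves and as one root-to-root edge. Path queries are evaluated by plain relational composition.

```python
from itertools import product

DEPTH = 2            # stands for N+1; toy value chosen for illustration
TREES = ("w", "wb")  # hypothetical names for two consecutive trees


def build_structure():
    """Toy model: two full binary trees of depth DEPTH linked by `next`."""
    addresses = ["".join(bits) for k in range(DEPTH + 1)
                 for bits in product("01", repeat=k)]
    nodes = [(t, a) for t in TREES for a in addresses]
    roles = {}
    for i in range(1, DEPTH + 1):
        # Assumed: self-loops for every l_i and r_i at every node.
        roles[f"l{i}"] = {(n, n) for n in nodes}
        roles[f"r{i}"] = {(n, n) for n in nodes}
    for tree, addr in nodes:
        if len(addr) < DEPTH:
            i = len(addr) + 1  # edge from level i-1 to level i
            roles[f"l{i}"].add(((tree, addr), (tree, addr + "0")))
            roles[f"r{i}"].add(((tree, addr), (tree, addr + "1")))
    leaves = {n for n in nodes if len(n[1]) == DEPTH}
    # Assumed: `next` self-loops at leaves only, plus one root-to-root edge.
    roles["next"] = {(n, n) for n in leaves} | {(("w", ""), ("wb", ""))}
    return roles, leaves


def compose(r, s):
    """Relational composition r;s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def top_down(roles, i, bit):
    """Path l_1;r_1;...;l_DEPTH;r_DEPTH with l_i;r_i replaced by l_i or r_i."""
    names = []
    for j in range(1, DEPTH + 1):
        if j == i:
            names.append(f"l{j}" if bit == 0 else f"r{j}")
        else:
            names += [f"l{j}", f"r{j}"]
    rel = roles[names[0]]
    for name in names[1:]:
        rel = compose(rel, roles[name])
    return rel


def bit_query_matches(roles, leaves, i, bit):
    """Pairs (x, y) with Lvl(x), q[x',x], next(x',y'), q[y',y], Lvl(y)."""
    down = top_down(roles, i, bit)
    up = {(x, xp) for (xp, x) in down}
    rel = compose(compose(up, roles["next"]), down)
    return {(x, y) for (x, y) in rel if x in leaves and y in leaves}


def expected(leaves, i, bit):
    """Scenario 1 (same leaf) plus scenario 2 (consecutive trees, bit i = bit)."""
    same = {(d, d) for d in leaves}
    paired = {(x, y) for x in leaves for y in leaves
              if x[0] == "w" and y[0] == "wb"
              and x[1][i - 1] == y[1][i - 1] == str(bit)}
    return same | paired


roles, leaves = build_structure()
for i in range(1, DEPTH + 1):
    for bit in (0, 1):
        assert bit_query_matches(roles, leaves, i, bit) == expected(leaves, i, bit)
```

In this toy model the self-loops are what let the whole path query collapse onto a single leaf, which yields the first scenario; without them only the second scenario would have matches.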

Lemma 8.4. Let Q be a quasi-computation tree and let M_{q_{=b}^{i-th bit}} = {(π(x), π(y)) | Q |=_π q_{=b}^{i-th bit}} for b ∈ {0,1}. Then M_{q_{=b}^{i-th bit}} is equal to the union of M_1^b := {((w, w), (w, w))} and M_2^b := {((w, ubv), (wb, u′bv′)) | |u| = |u′| = i−1}.

Proof. We show the statement for b = 0; the case b = 1 then follows by symmetry. First we show M_1^0 ⊆ M_{q_{=0}^{i-th bit}}. This is easy: for any leaf d = (w, w) we map all variables of q_{=0}^{i-th bit}[x,y] to d; this is a match due to the presence of all the self-loops at the leaves. To show M_2^0 ⊆ M_{q_{=0}^{i-th bit}} we take any pair (d, e) ∈ M_2^0 with d = (w, w) and e = (wb, v). Let π be a variable assignment that maps x to d, y to e, x′ to (w, ε), and y′ to (wb, ε). The variables of q_{ℓ_i↓}[x′,x] are mapped to (w, w_j), where w_j is the prefix of w of length j, following the path from (w, ε) to (w, w) level by level. We stress that ((w, w_{i−1}), (w, w_i)) ∈ ℓ_i^Q holds, which is crucial for the construction to work, and that every node (w, w_j) has all ℓ- and r-loops. The variables of q_{ℓ_i↓}[y′,y] are mapped analogously. After noticing that d, e ∈ Lvl_{N+1}^Q and that (π(x′), π(y′)) ∈ next^Q holds, we conclude that π is a match of q_{=0}^{i-th bit}[x,y] in Q.

Now we focus on showing that M_{q_{=0}^{i-th bit}} ⊆ M_1^0 ∪ M_2^0. Take any match π and note that x, y must be mapped to leaves. For π(x′) and π(y′) we consider two cases:

1. π(x′) = π(y′). As the roots do not have next-loops, π(x′) must be a leaf. This implies that all variables of q_{ℓ_i↓}[x′,x] map to a single domain element (otherwise we would not reach a leaf after traversing such a path). Arguing similarly we infer that all variables of q_{ℓ_i↓}[y′,y] are mapped to the same element. Thus π(x) = π(y) holds.

2. π(x′) ≠ π(y′). Since all next-edges at leaves are self-loops, we conclude that π(x′) is the root of some enriched configuration tree and π(y′) is the root of some corresponding quasi-successor in Q (by the definition of next^Q). By the satisfaction of q_{ℓ_i↓}[x′,x] there is a sequence of domain elements forming a path from π(x′) to π(x) that witnesses it. Moreover, since the subquery q_{ℓ_i↓}[x′,x] leads from the root to a leaf, we necessarily cross an ℓ_i-edge when leaving the (i−1)-th level, meaning that the i-th bit of the address of π(x) is equal to 0. Thus we infer that π(x) ∈ (Ad^0_i)^Q. Analogously, we infer π(y) ∈ (Ad^0_i)^Q. Hence (π(x), π(y)) ∈ M_2^0.

The query q_adr^i[x,y] pairing leaves in consecutive enriched configuration trees with coinciding i-th address bit is defined as:

q_adr^i[x,y] := q_main[x,y] ∧ q_{=0}^{i-th bit}[x,z] ∧ q_{=1}^{i-th bit}[z,y]

Lemma 8.5. For any quasi-computation tree Q we have that M_{q_adr^i} = {(π(x), π(y)) | Q |=_π q_adr^i[x,y]} is composed precisely of the leaf pairs in two consecutive enriched configuration trees of Q having equal i-th bit of address, formally:

M_{q_adr^i} = M_{q_main} ∩ ((Ad^0_i)^Q × (Ad^0_i)^Q ∪ (Ad^1_i)^Q × (Ad^1_i)^Q).

Sketch. By employing the definition of the query, Lemma 8.4 and relational calculus.
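The relational calculus behind this sketch can be replayed on a toy instance. The Python snippet below is an illustration under stated assumptions, not the authors' proof: it takes the match sets of Lemma 8.3 and Lemma 8.4 as given for two hypothetical consecutive trees "w" and "wb" with leaf addresses of length DEPTH (standing in for N+1), composes the relations for bit 0 and bit 1 through the shared variable z, and intersects the result with the matches of q_main. It also assumes that auxiliary variables such as z are local to each q_adr^i when the queries are later conjoined over all i.

```python
from itertools import product

DEPTH = 3  # stands for N+1; toy value chosen for illustration
ADDRS = ["".join(bits) for bits in product("01", repeat=DEPTH)]
# Hypothetical names "w" and "wb" for two consecutive trees; leaves only.
LEAVES = [(t, a) for t in ("w", "wb") for a in ADDRS]


def compose(r, s):
    """Relational composition r;s (join on the shared middle variable)."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def m_main():
    """Leaf pairs in two consecutive trees (the shape given by Lemma 8.3)."""
    return {(("w", a), ("wb", v)) for a in ADDRS for v in ADDRS}


def m_bit(i, bit):
    """M_1 union M_2 from Lemma 8.4 for position i and bit value `bit`."""
    m1 = {(d, d) for d in LEAVES}
    m2 = {(x, y) for (x, y) in m_main()
          if x[1][i - 1] == y[1][i - 1] == str(bit)}
    return m1 | m2


def m_adr(i):
    """Matches of q_main[x,y], q_=0[x,z], q_=1[z,y], projected to (x, y)."""
    return m_main() & compose(m_bit(i, 0), m_bit(i, 1))


def same_bit(i):
    """Right-hand side of Lemma 8.5: consecutive leaves agreeing on bit i."""
    return {(x, y) for (x, y) in m_main() if x[1][i - 1] == y[1][i - 1]}


for i in range(1, DEPTH + 1):
    assert m_adr(i) == same_bit(i)

# Conjoining over all positions leaves exactly the pairs with equal addresses.
equal_addr = set.intersection(*(m_adr(i) for i in range(1, DEPTH + 1)))
assert equal_addr == {(("w", a), ("wb", a)) for a in ADDRS}
```

The composition behaves like a disjunction: whichever of the two bit queries does not apply maps its variables onto a single leaf, and the combination where both apply non-trivially is impossible because z would need both bit values (and would have to lie in both trees).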

We are finally ready to present our query by means of which we can conclude with the proof of Lemma 8.1.

q_M := ⋀_{i=1}^{N+1} q_adr^i[x,y] ∧ NoPHdAbv(y) ∧ 0(x) ∧ 1(y)

Proof of Lemma 8.1. Let q_M be as defined above and observe that its size is clearly polynomial in N. Note that q_M satisfies our requirements. The 1st item follows from two lemmas: the fact that x and y are mapped to leaves of two consecutive enriched configuration trees follows from Lemma 8.3, and the fact that x and y are mapped to nodes having equal addresses follows from Lemma 8.5 applied for every 1 ≤ i ≤ N+1. The 2nd and the 3rd items hold since we supplemented our query with NoPHdAbv(y) ∧ 0(x) ∧ 1(y).

9 Conclusions

Conjunctive query entailment for ALC_Self is, in fact, 2ExpTime-complete, where membership follows from results for much stronger logics (Calvanese, Eiter, and Ortiz 2009). Hardness, shown in this paper, came as quite a surprise to us (in fact, we spent quite some time trying to prove ExpTime-membership, see (Bednarczyk 2021b)). The key insight of our proof (and maybe the take-home message from this paper) is that the presence of Self allows us to mimic case distinction over paths (and hence the handling of disjunctive information) through concatenation, by providing the opportunity for one of the two disjuncts to idle by “circling in place”.

On a last note, our result holds for plain ALC_Self TBoxes, since the only ABox assertion Init(a) can be replaced by the GCI ⊤ ⊑ ∃aux.Init for an auxiliary role name aux.


Acknowledgements

This work was supported by the ERC through the Consolidator Grant No. 771779.

References

Baader, F.; Bednarczyk, B.; and Rudolph, S. 2020. Satisfiability and Query Answering in Description Logics with Global and Local Cardinality Constraints. In Giacomo, G. D.; Catalá, A.; Dilkina, B.; Milano, M.; Barro, S.; Bugarín, A.; and Lang, J., eds., ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, volume 325, 616–623. IOS Press.

Baader, F.; Horrocks, I.; Lutz, C.; and Sattler, U. 2017. An Introduction to Description Logic. Cambridge University Press. ISBN 978-0-521-69542-8.

Bednarczyk, B. 2021a. Exploiting Forwardness: Satisfiability and Query-Entailment in Forward Guarded Fragment. In Faber, W.; Friedrich, G.; Gebser, M.; and Morak, M., eds., Logics in Artificial Intelligence - 17th European Conference, JELIA 2021, Virtual Event, May 17-20, 2021, Proceedings, volume 12678 of Lecture Notes in Computer Science, 179–193. Springer.

Bednarczyk, B. 2021b. Lutz’s Spoiler Technique Revisited: A Unified Approach to Worst-Case Optimal Entailment of Unions of Conjunctive Queries in Locally-Forward Description Logics. CoRR, abs/2108.05680.

Bednarczyk, B.; and Rudolph, S. 2021. The Price of Selfishness: Conjunctive Query Entailment for ALCSelf is 2ExpTime-hard. CoRR, abs/2106.15150.

Calvanese, D.; Eiter, T.; and Ortiz, M. 2009. Regular Path Queries in Expressive Description Logics with Nominals. In Boutilier, C., ed., IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, 714–720.

Chandra, A. K.; and Stockmeyer, L. J. 1976. Alternation. In 17th Annual Symposium on Foundations of Computer Science, Houston, Texas, USA, 25-27 October 1976, 98–108. IEEE Computer Society.

Eiter, T.; Lutz, C.; Ortiz, M.; and Simkus, M. 2009. Query Answering in Description Logics with Transitive Roles. In Boutilier, C., ed., IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, 759–764.

Eiter, T.; Ortiz, M.; and Simkus, M. 2012. Conjunctive query answering in the description logic SH using knots. J. Comput. Syst. Sci., 78(1): 47–85.

Horrocks, I.; Kutz, O.; and Sattler, U. 2006. The Even More Irresistible SROIQ. In Doherty, P.; Mylopoulos, J.; and Welty, C. A., eds., Proceedings, Tenth International Conference on Principles of Knowledge Representation and Reasoning, Lake District of the United Kingdom, June 2-5, 2006, 57–67. AAAI Press.

Krötzsch, M.; Rudolph, S.; and Hitzler, P. 2008. ELP: Tractable Rules for OWL 2. In Sheth, A. P.; Staab, S.; Dean, M.; Paolucci, M.; Maynard, D.; Finin, T. W.; and Thirunarayan, K., eds., The Semantic Web - ISWC 2008, 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings, volume 5318 of Lecture Notes in Computer Science, 649–664. Springer.

Lutz, C. 2008. The Complexity of Conjunctive Query Answering in Expressive Description Logics. In Armando, A.; Baumgartner, P.; and Dowek, G., eds., Automated Reasoning, 4th International Joint Conference, IJCAR 2008, Sydney, Australia, August 12-15, 2008, Proceedings, volume 5195 of Lecture Notes in Computer Science, 179–193. Springer.

Ngo, N.; Ortiz, M.; and Simkus, M. 2016. Closed Predicates in Description Logics: Results on Combined Complexity. In Baral, C.; Delgrande, J. P.; and Wolter, F., eds., Principles of Knowledge Representation and Reasoning: Proceedings of the Fifteenth International Conference, KR 2016, Cape Town, South Africa, April 25-29, 2016, 237–246. AAAI Press.

Ortiz, M.; Rudolph, S.; and Simkus, M. 2010. Worst-Case Optimal Reasoning for the Horn-DL Fragments of OWL 1 and 2. In Lin, F.; Sattler, U.; and Truszczynski, M., eds., Principles of Knowledge Representation and Reasoning: Proceedings of the Twelfth International Conference, KR 2010, Toronto, Ontario, Canada, May 9-13, 2010. AAAI Press.

Ortiz, M.; and Simkus, M. 2012. Reasoning and Query Answering in Description Logics. In Eiter, T.; and Krennwallner, T., eds., Reasoning Web. Semantic Technologies for Advanced Query Answering - 8th International Summer School 2012, Vienna, Austria, September 3-8, 2012. Proceedings, volume 7487 of Lecture Notes in Computer Science, 1–53. Springer.