Preserving Constraints with the Stable Chase

David Carral¹, Markus Krötzsch², Maximilian Marx³, Ana Ozaki⁴, and Sebastian Rudolph⁵

1 Center for Advancing Electronics Dresden (cfaed), TU Dresden david.carral@tu-dresden.de

2 Center for Advancing Electronics Dresden (cfaed), TU Dresden markus.kroetzsch@tu-dresden.de

3 Center for Advancing Electronics Dresden (cfaed), TU Dresden maximilian.marx@tu-dresden.de

4 Center for Advancing Electronics Dresden (cfaed), TU Dresden ana.ozaki@tu-dresden.de

5 Computational Logic Group, TU Dresden sebastian.rudolph@tu-dresden.de

Abstract

Conjunctive query answering over databases with constraints – also known as (tuple-generating) dependencies – is considered a central database task. To this end, several versions of a construction called chase have been described. Given a set Σ of dependencies, it is interesting to ask which constraints not contained in Σ that are initially satisfied in a given database instance are preserved when computing a chase over Σ. Such constraints are an example for the more general class of incidental constraints which when added to Σ as new dependencies do not affect the certain answers and might even speed up query answering.

After formally introducing incidental constraints, we show that deciding incidentality is undecidable for tuple-generating dependencies, even when restricting to classes for which query entailment is decidable. We find that for dependency sets admitting a finite universal model, the core chase can be used to decide incidentality. For the infinite case, we propose the stable chase, which is a generalisation of the core chase, and study its relation to incidental constraints.

1998 ACM Subject Classification F.4.1 Mathematical Logic, H.2.3 Languages, I.2.3 Deduction and Theorem Proving, I.2.4 Knowledge Representation Formalisms and Methods

Keywords and phrases Incidental constraints, Tuple-generating dependencies, Infinite core chase, Universal Model, BCQ entailment

Digital Object Identifier 10.4230/LIPIcs.ICDT.2018.12

1 Introduction

The chase [7, 14, 23] is an essential family of algorithms used to solve entailment questions in databases in the presence of constraints, such as computing certain answers to queries in data integration scenarios. Given a database instance I and a set of dependencies Σ, chase procedures compute an instance that extends I and satisfies all constraints in Σ, and that is universal in the sense that it admits a homomorphism into any other model of I and Σ. In particular, such a universal model can be used for query answering, as it entails exactly the certain answers to conjunctive queries over I and Σ.

Now I might satisfy constraints that are not part of Σ, and it is a relevant question to ask whether or not these constraints are preserved by the chase, i.e., whether they still hold in the universal model that is computed. This can be viewed as an extension of integrity checks to the virtual, possibly infinite views that are defined by a set of dependencies. Moreover, constraints that are preserved in this sense can safely be assumed to hold, and hence be used in algorithms. For instance, query rewriting algorithms can benefit from additional constraints [22, 25].

For the case of Datalog rules (full dependencies) Σ, constraint preservation is a known problem in databases [1, 28], which is typically further generalised by asking if some set of constraints Γ is implied by Σ given arbitrary input instances I that merely satisfy certain input constraints Γ′. Constraint preservation then is the special case where Γ = Γ′. Traditionally, one asks which constraints Γ are satisfied in the (unique, finite) least model of Σ, but there have also been works that consider all (first-order) models [30].

Unfortunately, however, these simple notions of constraint preservation (or implication) are no longer meaningful if we consider more general theories Σ that may contain tuple-generating dependencies. Which constraints are preserved then becomes highly sensitive to the details of the chase, since a constraint might be preserved in some universal models of I and Σ but not in others. It is often possible to preserve a constraint even if it is not logically entailed by I and Σ. How can we find out if any universal model preserves a particular constraint, and how can we possibly compute such a model? The answer is not obvious, especially in the general case where universal models are necessarily infinite.

To tackle this problem, we propose the notion of incidental constraints to capture the intuitive idea of a constraint being “preservable” (possibly with some effort). Concretely, a constraint ρ is incidental for I and Σ if adding ρ to Σ does not lead to any additional answer to conjunctive queries over I (and thereby to many other positive queries). Constraints that do not hold in I may therefore be incidental, too.

We only require conjunctive query equivalence rather than semantic equivalence, since the primary use of the chase is positive query answering. As a result, incidentality is not the same as logical entailment. For example, any constraint whose premise is not entailed (as a Boolean conjunctive query, BCQ) is incidental, and is satisfied by all universal models, yet may not be entailed in general. Even dependencies that are violated in universal models can be incidental:

▶ Example 1. Consider the dependency ρ = R(x, y) → ∃z.R(y, z) and an instance I with a single fact R(n₀, n₁) where n₀ and n₁ are nulls. Then I and ρ have a universal model that is an infinite R-chain, starting at R(n₀, n₁). The dependency ρ′ = R(y, z) → ∃x.R(x, y) is not satisfied in this model, but is incidental for I and ρ. Indeed, I and {ρ, ρ′} have a universal model that is a two-way infinite R-chain, which entails the same queries as the one-way infinite chain, but is not a universal model of I and ρ.

We study incidentality and the related problem of recognising incidental constraints. This problem turns out to be hard: it is on the second level of the arithmetic hierarchy, and remains undecidable even in cases where conjunctive query answering is decidable. We give a complete (and computable) characterisation for theories that admit a finite model. Even for cases where a finitary computation procedure is impossible, we seek a deeper understanding of models that preserve incidental constraints. This leads us to develop a new notion of chase, which we use to establish the existence of core models that characterise both BCQ answers and the entailed incidental relationships. In summary, our main contributions are as follows:

We formalise a new notion of constraint preservation based on incidental dependencies.

We show that incidentality is not recursively enumerable (RE) in general, and remains undecidable even in restricted cases.

We show that the core chase [14] can be used to decide incidentality for cases where a finite universal model exists.

We develop the stable chase as a generalisation of the core chase to the infinite case.

We show that the stable chase produces a core that can be used both for query answering and for characterising full incidental dependencies.

Finally, we combine our results to establish the existence of a model that entails the same queries as a universal model and that satisfies exactly the tuple-generating dependencies that are incidental. This model can no longer be universal, but it is a core.

2 Preliminaries

We consider countably infinite, disjoint sets of constants ∆_c and of nulls ∆_n. A schema S is a finite set of relation symbols, where ar(R) is the arity of R ∈ S. An instance I over S assigns to each relation symbol R ∈ S a (possibly infinite) ar(R)-ary relation R^I over ∆_c ∪ ∆_n. Often, we do not explicitly mention that an instance I is defined over a schema S, and simply assume that such a signature has been fixed. The active domain of I, denoted by ∆^I, is the set of all domain elements that occur in relations of I. We write a for a tuple ⟨a_1, …, a_n⟩ of domain elements.

Morphisms. Let I and J be instances over a schema S. A homomorphism h: I → J is a function from ∆^I to ∆^J such that (i) h(c) = c for all c ∈ ∆^I ∩ ∆_c, and (ii) a ∈ R^I implies h(a) ∈ R^J for all R ∈ S and a = ⟨a_1, …, a_n⟩ ∈ (∆^I)^ar(R), where h(a) is short for the tuple ⟨h(a_1), …, h(a_n)⟩. It is strong if (ii) is strengthened to require a ∈ R^I if and only if h(a) ∈ R^J.¹ An embedding is an injective strong homomorphism, and an isomorphism is a bijective strong homomorphism (i.e., a surjective embedding). An endomorphism of I is a homomorphism h: I → I.
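The morphism notions above can be made concrete for finite instances. The following is a minimal brute-force sketch, not taken from the paper: it assumes a hypothetical encoding where a fact is a pair (relation name, argument tuple), an instance is a Python set of facts, and nulls are strings starting with "_". The function names are illustrative only.

```python
from itertools import product

# Assumed encoding (not from the paper): a fact is (relation, args),
# an instance is a set of facts, and nulls are strings starting with "_".
def is_null(e):
    return e.startswith("_")

def domain(inst):
    """Active domain of a finite instance, in a fixed (sorted) order."""
    return sorted({a for _, args in inst for a in args})

def homomorphisms(src, dst):
    """Yield every homomorphism src -> dst as a dict (brute force)."""
    src_dom, dst_dom = domain(src), domain(dst)
    for image in product(dst_dom, repeat=len(src_dom)):
        h = dict(zip(src_dom, image))
        if any(not is_null(a) and h[a] != a for a in src_dom):
            continue  # condition (i): constants must be mapped to themselves
        if all((r, tuple(h[a] for a in args)) in dst for r, args in src):
            yield h   # condition (ii): every fact of src is preserved

def is_strong(h, src, dst):
    """Check 'a in R^src iff h(a) in R^dst' for all tuples over dom(src)."""
    rels = {(r, len(args)) for r, args in src | dst}
    dom = domain(src)
    for r, k in rels:
        for args in product(dom, repeat=k):
            image = (r, tuple(h[a] for a in args))
            if ((r, args) in src) != (image in dst):
                return False
    return True

def is_embedding(h, src, dst):
    """An embedding is an injective strong homomorphism."""
    return len(set(h.values())) == len(h) and is_strong(h, src, dst)

def entails_bcq(inst, query_inst):
    """inst |= q_J holds iff some homomorphism J -> inst exists."""
    return next(homomorphisms(query_inst, inst), None) is not None

# An edge maps into a loop, but not strongly, and not the other way round.
edge = {("R", ("_x", "_y"))}
loop = {("R", ("_a", "_a"))}
h = next(homomorphisms(edge, loop))
print(h, is_strong(h, edge, loop), entails_bcq(edge, loop))
```

The search enumerates all functions between the two active domains, so it is exponential and only meant to illustrate the definitions on small examples.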

Dependencies and Queries. We use a countably infinite set ∆_v of variables, disjoint from ∆_c ∪ ∆_n. A term is an element t ∈ ∆_v ∪ ∆_c. We use letters x, y, z, u, v, w and expressions such as x for tuples ⟨x_1, …, x_ℓ⟩ of the corresponding elements. We treat such tuples as sets when order is not relevant. An atom is a formula R(t) with R ∈ S and |t| = ar(R).

First-order formulae are defined as usual. We write ϕ[x] to emphasise that the free variables in ϕ are a subset of x. A tuple-generating dependency (TGD) is a formula of the form

∀x,z.(ϕ[x,z]→ ∃y.ψ[x,y]) (1)

where the body ϕ and the head ψ are conjunctions of atoms, and ψ contains at least one conjunct. TGDs never contain free variables, hence we usually omit the universal quantifiers. A TGD is full if it does not contain existentially quantified variables. A Boolean conjunctive query (BCQ), or simply a query, is a formula of the form ∃y.ϕ[y] with ϕ a conjunction of atoms. We allow TGDs with empty bodies to assert facts (possibly including existentials), and we often omit → in this case. Throughout this paper, we assume that Σ denotes a finite set of TGDs.

A conjunction of atoms ϕ (resp. a BCQ q = ∃y.ϕ[y]) gives rise to a finite instance I_ϕ (resp. I_q), obtained by treating ϕ as a set of relational tuples using a fresh null n_x in place of each variable x. Conversely, every finite instance I induces a conjunction ϕ_I that has an atom for every relational tuple, using fresh variables x_n in place of nulls n. The BCQ q_I then is the existential closure of ϕ_I. Note that TGDs can encode a given finite instance I using a dependency → q_I. This is why we will generally state our results for sets Σ of TGDs without mentioning an additional instance.

¹ Strong homomorphisms were called full by Deutsch et al. [14].

Universal Models and Cores. Instances naturally correspond to first-order interpretations. We let |= denote first-order modelhood and entailment. Note that, for an instance I and a finite instance J, we have I |= q_J iff there exists a homomorphism h: J → I. The set of all BCQs modelled (entailed) by an interpretation I or a set of TGDs Σ is denoted by BCQ(I) and BCQ(Σ), respectively.

A model J |= Σ is universal if, for every model I |= Σ, there is a homomorphism h: J → I. In this case, BCQ(J) = BCQ(Σ), i.e., J and Σ are BCQ-equivalent [14]. Two instances I and J are BCQ-equivalent if BCQ(I) = BCQ(J).

▶ Definition 2. An instance I is a core if every endomorphism of I is an embedding. A core I is called a core of J if there is an endomorphism h of J such that I is the restriction of J to the image of h.

Definition 2 corresponds to Bauslaugh’s property IN [5], and has also been used, e.g., in studies of constraint satisfaction [8]. Bauslaugh favours a stronger definition based on isomorphisms instead of endomorphisms (property ISN), but this forces cores to be unique up to isomorphism, which is too restrictive for our needs. There are several further definitions of cores, all of which differ only on infinite instances [5, 6]. For finite instances, Definition 2 agrees with the one in [14] and a unique core (up to isomorphism) always exists, whereas (for Definition 2) infinite instances may have no core or several cores (see examples in Section 4).
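For finite instances, a core in the sense of Definition 2 can be found by repeatedly restricting the instance to the image of a non-injective endomorphism (on a finite instance, every injective endomorphism is an embedding). The sketch below is one naive way to do this and is not the paper's procedure; the fact encoding (pairs of relation name and argument tuple, nulls as strings starting with "_") and all function names are assumptions made for illustration.

```python
from itertools import product

# Assumed encoding (not from the paper): facts are (relation, args) pairs,
# instances are sets of facts, nulls are strings starting with "_".
def is_null(e):
    return e.startswith("_")

def domain(inst):
    return sorted({a for _, args in inst for a in args})

def endomorphisms(inst):
    """Yield every endomorphism of a finite instance (brute force)."""
    dom = domain(inst)
    for image in product(dom, repeat=len(dom)):
        h = dict(zip(dom, image))
        if any(not is_null(a) and h[a] != a for a in dom):
            continue  # constants are fixed
        if all((r, tuple(h[a] for a in args)) in inst for r, args in inst):
            yield h

def restrict(inst, elems):
    """Restriction of inst to the facts that only use elements of elems."""
    return {(r, args) for r, args in inst if all(a in elems for a in args)}

def core(inst):
    """Shrink inst along non-injective endomorphisms until none is left."""
    inst = set(inst)
    while True:
        for h in endomorphisms(inst):
            image = set(h.values())
            if len(image) < len(h):   # non-injective, so not an embedding
                inst = restrict(inst, image)
                break
        else:
            return inst               # every endomorphism is injective

# The null _n can be folded onto the constant b.
print(core({("R", ("a", "_n")), ("R", ("a", "b"))}))
```

Each round composes with the previous retractions, so the result is the restriction of the input to the image of a single endomorphism, as Definition 2 requires. The search is exponential in the size of the active domain.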

Applying Rules. A TGD ρ as in (1) is applicable to an instance I if there is a homomorphism h: I_ϕ → I. We then extend h to I_ψ by defining, for all variables y ∈ y that are existentially quantified, h(n_y) = n_{y,ρ,h} to be a null that is specific for y, ρ, and h, where we assume that all nulls of the form n_{y,ρ,h} exist and are mutually distinct. Let ρ(I) denote the union of I with all sets of the form h(I_ψ) for some extended homomorphism h: I_ϕ → I. For a set Σ of TGDs, we set Σ(I) = ⋃_{ρ∈Σ} ρ(I).
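The rule application step ρ(I) and its lifting Σ(I) can be sketched as follows. This is an illustrative encoding rather than the paper's: variables are assumed to be strings starting with "?", nulls strings starting with "_", and the null n_{y,ρ,h} is represented by a string built from the variable name, a rule name, and the match h. Body matches are computed directly on the variables, which corresponds to homomorphisms from I_ϕ.

```python
# Assumed encoding (not from the paper): atoms and facts are (relation, terms),
# variables start with "?", nulls start with "_", a TGD is (body, head).
def is_var(t):
    return t.startswith("?")

def matches(body, inst, h=None):
    """Yield all assignments of the body variables that map body into inst."""
    h = dict(h or {})
    if not body:
        yield h
        return
    (rel, terms), rest = body[0], body[1:]
    for r, args in inst:
        if r != rel or len(args) != len(terms):
            continue
        h2 = dict(h)
        if all((h2.setdefault(t, a) if is_var(t) else t) == a
               for t, a in zip(terms, args)):
            yield from matches(rest, inst, h2)

def apply_tgd(name, body, head, inst):
    """Compute rho(I): add h(I_psi) for every match h of the body."""
    out = set(inst)
    for h in matches(body, inst):
        key = ",".join(f"{v}={h[v]}" for v in sorted(h))
        for rel, terms in head:
            # existential variables get a null specific to (y, rho, h)
            out.add((rel, tuple(
                h.get(t, f"_{t[1:]}|{name}|{key}") if is_var(t) else t
                for t in terms)))
    return out

def apply_all(rules, inst):
    """Compute Sigma(I) as the union of rho(I) over all rules."""
    out = set(inst)
    for name, (body, head) in rules.items():
        out |= apply_tgd(name, body, head, inst)
    return out

# Example 1: rho = R(x, y) -> exists z. R(y, z) on the instance {R(n0, n1)}.
rho = ([("R", ("?x", "?y"))], [("R", ("?y", "?z"))])
step1 = apply_tgd("rho", *rho, {("R", ("_n0", "_n1"))})
step2 = apply_tgd("rho", *rho, step1)
print(len(step1), len(step2))
```

Because the null name is determined by the variable, the rule, and the match, reapplying a rule to an old match reproduces the same fact instead of creating a fresh null, so each step only grows the instance through new matches.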

3 Incidental Dependencies

It is intuitive to ask whether a dependency ρ that holds for a finite instance I is “preserved” by a given set Σ of TGDs. We formalise this as follows, where we omit I since it can be captured by a TGD in Σ:

▶ Definition 3. A TGD ρ is incidental for a set Σ of TGDs if BCQ(Σ) = BCQ(Σ ∪ {ρ}). The set of all incidental TGDs of Σ is denoted ICDT(Σ).

Clearly, Σ ⊆ ICDT(Σ). Indeed, every TGD that is logically entailed is also incidental. However, the converse is not true, as illustrated in Example 1 (where the instance I can be expressed by a TGD → ∃x, y.R(x, y)). In particular, incidental TGDs are not automatically “preserved” in an arbitrary chase procedure, hence we avoid this terminology, though it was used previously, e.g., related to constraint preservation under non-recursive full TGDs [28].

Note that our notion of incidental TGDs is not specific to BCQs. Indeed, BCQ-equivalent sets of TGDs are also equivalent with respect to many other types of negation-free queries, such as Datalog queries and their numerous fragments, including (unions of) conjunctive regular path queries [18, 11], monadic [12] and linear [19] Datalog queries, (nested) monadically defined queries [27, 9] and many more. Queries with negation, however, are not preserved: in Example 1, Σ |= ∃x.∀y.¬R(y, x) whereas Σ ∪ {ρ} ⊭ ∃x.∀y.¬R(y, x).

A noteworthy property is that a TGD is incidental exactly if it does not lead to a newly entailed BCQ in a single derivation step. To state this formally, we use ρ(p) to abbreviate the BCQ q_{ρ(I_p)} for any BCQ p and TGD ρ.

▶ Theorem 4. For a TGD ρ and a set Σ of TGDs, ρ ∈ ICDT(Σ) iff ρ(q) ∈ BCQ(Σ) for every q ∈ BCQ(Σ).

Proof. (⇒) For the contrapositive, assume that q ∈ BCQ(Σ) and ρ(q) ∉ BCQ(Σ). Then ρ is applicable to I_q. Let J be any model of Σ. Since J |= q, there is a homomorphism I_q → J, hence ρ is applicable to J. Therefore, any instance J′ ⊇ J that satisfies ρ entails ρ(q). Since J was arbitrary, we find that Σ ∪ {ρ} |= ρ(q). Hence ρ is not incidental for Σ.

(⇐) Assume that ρ(q) ∈ BCQ(Σ) for all q ∈ BCQ(Σ). Suppose for a contradiction that Σ ∪ {ρ} |= q for some q ∉ BCQ(Σ). Then there is a finite derivation I = ρ_k(ρ_{k−1}(… ρ_0(I_∅) …)) with I_∅ the empty instance and rules ρ_i ∈ Σ ∪ {ρ}, such that I |= q. Let q w.l.o.g. be such that k is minimal. Then Σ |= q_J for J = ρ_{k−1}(… ρ_0(I_∅) …), and we have ρ_k = ρ. By the assumption on ρ, also Σ |= ρ(q_J). Therefore Σ |= q_I, since q_I = q_{ρ(J)} is isomorphic to ρ(q_J). Hence, since I |= q implies q_I |= q, we get the desired contradiction Σ |= q. ◀

An important insight from the preceding theorem is that incidentality for some set Σ can be established solely on the basis of BCQ(Σ).

▶ Lemma 5. For every two BCQ-equivalent sets Σ, Σ′ of TGDs, ICDT(Σ) = ICDT(Σ′).

Proof. Let ρ ∈ ICDT(Σ) be an incidental TGD for Σ. Then, by Theorem 4, ρ(q) ∈ BCQ(Σ) for every q ∈ BCQ(Σ). Due to BCQ equivalence, this means ρ(q) ∈ BCQ(Σ′) for every q ∈ BCQ(Σ′), which, by the other direction of Theorem 4, implies that ρ ∈ ICDT(Σ′). The converse follows by symmetry. ◀

Among other things, this insight can be leveraged to show Theorem 6 below, which establishes that individual incidental TGDs are also jointly incidental, i.e., do not entail any additional BCQs together.

▶ Theorem 6. For every set Σ of TGDs, BCQ(Σ) = BCQ(ICDT(Σ)).

Proof. Let q ∈ BCQ(ICDT(Σ)) be a BCQ. Then, by compactness, there is a finite subset Γ = {γ_1, …, γ_k} ⊆ ICDT(Σ) such that q ∈ BCQ(Σ ∪ Γ). But then BCQ(Σ) = BCQ(Σ ∪ {γ_1}) = BCQ(Σ ∪ {γ_1, γ_2}) = ⋯ = BCQ(Σ ∪ Γ): Since γ_1 is incidental for Σ, we have BCQ(Σ) = BCQ(Σ ∪ {γ_1}). By Lemma 5, γ_2 is incidental for Σ ∪ {γ_1}, i.e., BCQ(Σ ∪ {γ_1}) = BCQ(Σ ∪ {γ_1, γ_2}). Further applications of Lemma 5 show that γ_k is incidental for Σ ∪ {γ_1, …, γ_{k−1}}, yielding the above equality. This shows q ∈ BCQ(Σ). Hence, BCQ(ICDT(Σ)) ⊆ BCQ(Σ), and by monotonicity, we also have BCQ(Σ) ⊆ BCQ(ICDT(Σ)). ◀

▶ Definition 7. Incidental is the following decision problem. Given a set Σ of TGDs and a TGD ρ, is ρ incidental for Σ?

Since BCQ entailment checking over a set of TGDs is already undecidable in general, it is not surprising that the same is true for Incidental. However, the problem is actually on the second level of the arithmetic hierarchy [26], i.e., strictly harder than query answering, such that neither incidental dependencies nor non-incidental dependencies can be recursively enumerated:


▶ Theorem 8. Incidental is Π⁰₂-complete, and in particular neither RE nor coRE.

Proof. For membership note that we can characterise incidentality by quantifying over (finite) derivations (or proofs) in some theory. Indeed, a TGD ρ is incidental for Σ if: for all derivations that show Σ ∪ {ρ} |= q for some BCQ q, there is a derivation that shows Σ |= q. Using Gödel numbers for representing derivations, this condition can be expressed as a ∀∃-sentence in first-order arithmetic.

We show hardness by many-one reduction from the universal halting problem, which is as follows: given a (deterministic) Turing machine M, does M halt on all inputs? Universal halting is known to be complete for Π⁰₂ (see [26, Theorem VIII], and apply Post’s Theorem).

For the reduction, we construct for a given TM M a set Σ_M of TGDs and a full TGD ρ such that ρ is incidental for Σ_M iff M halts universally. The rules of Σ_M consist of three parts: Σ_1 ensures that each model contains representations of all possible inputs; Σ_2 simulates M on a particular input; Σ_3 marks elements of an accepting TM simulation with a specific unary relation halted. The rule ρ then asserts that initial elements in TM simulations are always marked by halted, which is incidental if all runs have indeed terminated. The detailed constructions in each case are given in the appendix. ◀

There are many known classes of TGD sets for which query answering becomes decidable, such as acyclic TGDs or guarded TGDs [15, 16, 2, 3, 21, 13], and Incidental does indeed become coRE in this case.

▶ Theorem 9. Let C be a class of sets of TGDs over which BCQ entailment is decidable. There is an algorithm that, given Σ ∈ C, enumerates all TGDs ρ such that ρ ∉ ICDT(Σ).

Proof. Let ρ be an arbitrary TGD. Then ρ is non-incidental iff there is some BCQ q such that either Σ |= q but Σ ∪ {ρ} ⊭ q, or Σ ⊭ q but Σ ∪ {ρ} |= q. Due to monotonicity of TGDs, only the second case can occur. Now, enumerating all q such that Σ ⊭ q and checking Σ ∪ {ρ} |= q yields a semi-decision procedure for non-incidentality. Using a suitable diagonalisation, we can enumerate all ρ ∉ ICDT(Σ). ◀

By Theorem 9, establishing non-incidentality of a given rule ρ is RE, even in cases where Σ ∪ {ρ} ∉ C. On the other hand, Incidental in general remains undecidable even if BCQ-entailment is decidable, and even when asking only for the incidentality of one fixed full dependency.

▶ Theorem 10. There is a class C of sets of TGDs for which BCQ answering is decidable, and a full dependency ρ for which Σ ∪ {ρ} ∈ C for all Σ ∈ C, such that the following problem is undecidable: given some Σ ∈ C, is ρ incidental for Σ?

Proof. We show undecidability by reducing the halting problem of deterministic Turing machines when started on the empty tape. Consider a Turing machine M = ⟨Q, Γ, δ, q_s, q_e⟩ as in the proof of Theorem 8, which w.l.o.g. does not return to its initial state q_s in any run.

We consider predicate symbols as used in the proof of Theorem 8, and define the set τ(M) of TGDs to contain the rules Σ_2 as in this proof, together with the additional rules (facts):

→ ∃v, w. head_{q_s}(v) ∧ symbol(v) ∧ right(v, w) ∧ symbol(w) ∧ end(w)   (2)

→ ∃v. right(v, v) ∧ right⁺(v, v) ∧ next(v, v) ∧ end(v) ∧ ⋀_{q ∈ Q∖{q_s}} head_q(v) ∧ ⋀_{σ ∈ Γ} symbol_σ(v)   (3)

Here, (2) encodes the initial configuration of M on the empty tape, which is the start of a Turing machine simulation as effected by Σ_2; and (3) creates an element that stands in all possible relations not involving head_{q_s}. Let ρ = head_{q_e}(x) → halted(x), and let C be the class of all TGD sets of the form τ(M) or τ(M) ∪ {ρ}.

BCQ answering over TGD sets of C is decidable. Indeed, any BCQ that does not contain head_{q_s} is trivially entailed by any Σ ∈ C, due to (3). On the other hand, if a connected component in a BCQ contains head_{q_s}, then it describes a property of a finite initial segment of the simulation of a TM, which can be checked effectively.

For a Turing machine M, the full TGD ρ is incidental for τ(M) iff M does not halt on the empty input. Indeed, if M does not halt, then the only occurrence of head_{q_e} in a universal model of τ(M) is in the element created due to (3), and ρ is already satisfied by this element. Conversely, if M halts, then head_{q_e} occurs for an element that is connected to the starting sequence created due to (2). Hence, there is a BCQ of the form q = ∃x. head_{q_s}(x_0) ∧ p_1(x_0, x_1) ∧ … ∧ p_n(x_{n−1}, x_n) ∧ halted(x_n) with p_i ∈ {right, next}, such that τ(M) ⊭ q and τ(M) ∪ {ρ} |= q. ◀

The previous result is particularly interesting since it only considers situations where query answering is decidable, both for the TGDs with and without the candidate dependency ρ. In spite of this general result, concrete classes of TGD sets with decidable BCQ entailment may allow us to decide Incidental, as discussed in the next section.

4 Cores and Incidentals

In this section we relate incidental dependencies with the notion of a core of an instance. Theorem 11 shows that if a set of TGDs has a finite universal model I then all incidental dependencies follow from the core of I. It then follows from Theorem 11 that if the core chase [14] (see also Definition 19) terminates then Incidental is decidable. In the following, let core(I) denote the core of a finite instance I.

▶ Theorem 11. Let Σ be a set of TGDs with a finite universal model I and let ρ be a TGD. Then, ρ ∈ ICDT(Σ) iff core(I) |= ρ.

Proof. (⇒) Consider ρ = ϕ[x, z] → ∃y.ψ[x, y] with ρ ∈ ICDT(Σ). Let h: I_ϕ → core(I) be some homomorphism, and assume that it is extended to I_ψ using new nulls to map to as defined before. Then set J := core(I) ∪ h(I_ψ), i.e., the core with the consequence of ρ under h added (possibly by adding new elements). J is finite since core(I) is, and clearly ρ(core(I)) |= q_J. Therefore Σ ∪ {ρ} |= q_J, and hence Σ |= q_J by incidentality. So core(I) |= q_J since core(I) is a universal model, and we obtain a homomorphism g: J → core(I). But then the restriction of g to elements of ∆^core(I) is an endomorphism, and therefore an embedding since core(I) is a core. Every embedding on a finite core is an isomorphism [20, 5], so g has an inverse g⁻¹: core(I) → core(I). For K = g(h(I_ϕ ∪ I_ψ)) we have K ⊆ core(I) and hence g⁻¹(K) ⊆ core(I). Since g⁻¹(g(h(I_ϕ))) = h(I_ϕ), we can find a homomorphism h′ such that h′(I_ϕ) = h(I_ϕ) and h′(I_ψ) ⊆ g⁻¹(K) ⊆ core(I) (h′ may differ from h in the choice of null values for existentially quantified variables). This shows that ρ is satisfied by core(I) for the particular match h. Since h was arbitrary, we obtain core(I) |= ρ.

(⇐) This follows by direct application of the definitions. ◀

Given this connection between finite cores and incidental dependencies, one may ask whether it extends to cases where the set of TGDs does not admit a finite universal model. Unfortunately, this is not the case: Example 1 shows a case where an incidental dependency does not hold in a universal model that is in fact a core (the one-way infinite chain).
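For the finite case, Theorem 11 suggests a direct test: compute the core of a finite universal model and check whether it satisfies ρ. The following sketch illustrates this on a small hand-made example. The encoding (variables as strings starting with "?", nulls starting with "_"), the function names, and the example set Σ = {→ R(a, b), → T(b), → ∃z.R(a, z)} are assumptions made here and do not come from the paper.

```python
# Assumed encoding (not from the paper): facts and atoms are (relation, terms);
# variables start with "?", nulls with "_", everything else is a constant.
def is_var(t):
    return t.startswith("?")

def is_null(t):
    return t.startswith("_")

def matches(atoms, inst, movable, h=None):
    """Yield all maps on movable terms that send every atom into inst."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for r, args in inst:
        if r != rel or len(args) != len(terms):
            continue
        h2 = dict(h)
        if all((h2.setdefault(t, a) if movable(t) else t) == a
               for t, a in zip(terms, args)):
            yield from matches(rest, inst, movable, h2)

def core(inst):
    """Core of a finite instance via non-injective endomorphisms."""
    inst = set(inst)
    while True:
        dom = {a for _, args in inst for a in args}
        for h in matches(sorted(inst), inst, is_null):
            image = {h.get(a, a) for a in dom}
            if len(image) < len(dom):
                inst = {(r, args) for r, args in inst if set(args) <= image}
                break
        else:
            return inst

def satisfies(inst, body, head):
    """inst |= body -> exists y. head: every body match extends to the head."""
    return all(any(True for _ in matches(head, inst, is_var, h))
               for h in matches(body, inst, is_var))

# A finite universal model of the assumed Sigma, and its core.
universal = {("R", ("a", "b")), ("R", ("a", "_n")), ("T", ("b",))}
c = core(universal)
rho = ([("R", ("?x", "?y"))], [("T", ("?y",))])   # R(x, y) -> T(y)
print(satisfies(c, *rho), satisfies(universal, *rho))
```

In this example the dependency R(x, y) → T(y) holds in the core but not in the universal model it was computed from, so by Theorem 11 it is incidental although the chosen universal model does not preserve it.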

Figure 1: Universal models that have (a) two non-isomorphic cores, (b) no core, and (c) a core that is not a model, where R is black and S is orange (grey)

This discrepancy between incidentals and cores goes together with a general loss of good properties of the core on infinite models. Finite instances (i) always have a core, which is (ii) unique up to isomorphism [17, 20], and (iii) the core of a finite universal model of a set of TGDs is also a universal model [14]. Examples 12, 13, and 14 show that we no longer have any of these properties when dealing with infinite universal models.

▶ Example 12. Let Σ consist of the following TGDs:

∃x, y.R(x, y)    ∃z, w.S(z, w)    R(x, y) → ∃z.S(y, z)    S(x, y) → ∃z.R(y, z)

Figure 1a illustrates a universal model I of Σ. The upper and the lower chain of relations each by itself is a core of I, but the chains are not isomorphic, so property (ii) does not hold.

▶ Example 13. Let Σ consist of the following three TGDs:

∃x, y.R(x, y) ∧ S(x, y)    R(x, y) ∧ S(x, y) → ∃z.R(y, z) ∧ S(y, z)    R(y, z) → ∃x.R(x, y)

Figure 1b illustrates a universal model of Σ, which is not a core, since there are non-embedding endomorphisms that map parts of the single chain into the double chain. In fact, one can see that this instance does not have a core.

▶ Example 14. Let Σ consist of the following TGDs:

∃x, y.R(x, y) ∧ S(x, y)    R(x, y) ∧ S(x, y) → ∃z.R(y, z) ∧ S(y, z)    S(y, z) → ∃x.R(x, y)

Figure 1c shows a universal model I of Σ. It is not a core, since there is a non-embedding endomorphism that maps each element to its right neighbour. This results in an instance that is isomorphic to I with the left-most node and its R-relation removed, which is a core of I but not a model for the third rule in Σ.

Nevertheless, cores can be relevant in finding instances that satisfy incidental dependencies.

To this end, we consider a particularly well-behaved type of core that can be obtained as a limit of a growing sequence of finite cores.

▶ Definition 15 (Core Cover). An instance J has a core cover if there are finite subinstances J_0 ⊆ J_1 ⊆ J_2 ⊆ … with J = ⋃_{i≥0} J_i such that, for all J_i, every homomorphism h: J_i → J is an embedding.

▶ Theorem 16. If an instance has a core cover then it is a core.

Proof. Consider an instance J with core cover (J_i)_{i≥0}, and an endomorphism h: J → J. By Definition 15, the restriction h_i: J_i → J is an embedding for all i ≥ 0. With J = ⋃_{i≥0} J_i and J_i ⊆ J_{i+1} it follows that h is an embedding: otherwise, since injectivity and being strong both are finitary conditions, there would be a non-embedding h_i: J_i → J. ◀

We remark that the condition that J_i ⊆ J_{i+1} is needed for Theorem 16 to hold. Figure 2 illustrates an instance that satisfies the remaining conditions of Definition 15 for a set of disjoint instances (J_i)_{i≥0}, but which is not a core.

Figure 2: An instance that satisfies most conditions of Definition 15 but is not a core (the disjoint subinstances are labelled J_0, J_1, J_2, J_3, …)

Figure 3: A core without a core cover, using two relations R (black) and S (orange/grey)

▶ Example 17. Having a core cover is a sufficient but not a necessary condition for an instance to be a core. Figure 3 illustrates an instance I that is a core. Indeed, any endomorphism must preserve the adjacency in this two-way infinite chain. But since one pair of elements is not S-related, only this very same pair can be mapped to this position in the chain, so the only endomorphism is the identity mapping.

However, I has no core cover, since any finite subset of I that contains the pair without the S connection can be mapped by a non-strong endomorphism into a sufficiently long fully R-S-connected segment of I.

The next theorem shows that cores with core covers can characterise the set of full incidental dependencies for a set of TGDs.

▶ Theorem 18. Let Σ be a set of TGDs and let I be an instance. Assume that BCQ(I) = BCQ(Σ) and I has a core cover. Then, ρ ∈ ICDT(Σ) iff I |= ρ, for any full dependency ρ.

Proof. (⇒) Let (I_i)_{i≥0} be a core cover for I, and consider a full dependency ρ: ϕ → ψ that is incidental for Σ. If I |= ϕ for some homomorphism h: I_ϕ → I, then there is I_i such that h can be considered as a homomorphism I_ϕ → I_i. Let J := I_i ∪ h(I_ψ), where we note that h does not introduce new nulls since ρ is full. Similar to the proof of Theorem 11, we find that Σ ∪ {ρ} |= q_J. Therefore I |= q_J as ρ is incidental, and there is a corresponding homomorphism g: J → I. Since ∆^J = ∆^{I_i}, g is a homomorphism g: I_i → I, and therefore an embedding (Definition 15). This shows that h(I_ϕ ∪ I_ψ) ⊆ I_i as required. Since h and I_i were arbitrary, we conclude that I |= ρ.

(⇐) This follows by direct application of the definitions. ◀

Given Theorem 18 and the observation that a core cover is closely related to a bottom-up construction of a core, one naturally wonders if a chase-like procedure could be used to obtain a suitable model. The prime candidate is the core chase of Deutsch et al. [14]:

▶ Definition 19. The core chase sequence for a set Σ of TGDs is a sequence I_0, I_1, … of instances, where I_0 is the empty instance, and, for each i > 0, I_i is the core of Σ(I_{i−1}). A finite core chase sequence I_0, …, I_ℓ is terminating if I_ℓ |= Σ, and in this case, I_ℓ is called the core chase.

Intuitively, the procedure defined by Deutsch et al. consists of applying the rules and taking the core of the resulting instance in each step. Deutsch et al. do not define the core chase for cases where Σ requires infinite models, and indeed the limit of infinite core chase sequences is not defined here. While this issue can be repaired by using a more sophisticated definition, the deeper problem is that the result of applying the rules and then taking the core in each step may not be a core. This can be seen, e.g., from the TGDs in Example 13, on which an infinite core chase would simply produce the universal model shown in Figure 1b, which is not a core.
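The terminating case of Definition 19 can be sketched as a loop that alternates rule application and core computation, stopping once the current instance satisfies Σ. The code below is a naive illustration under assumed conventions (variables as strings starting with "?", nulls starting with "_", rules given as a dict from a hypothetical rule name to a body/head pair). The step bound max_steps is an addition made here, since the sequence need not terminate.

```python
# Assumed encoding (not from the paper): atoms and facts are (relation, terms),
# variables start with "?", nulls with "_"; rules map a name to (body, head).
def is_var(t):
    return t.startswith("?")

def is_null(t):
    return t.startswith("_")

def matches(atoms, inst, movable, h=None):
    """Yield all maps on movable terms that send every atom into inst."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for r, args in inst:
        if r != rel or len(args) != len(terms):
            continue
        h2 = dict(h)
        if all((h2.setdefault(t, a) if movable(t) else t) == a
               for t, a in zip(terms, args)):
            yield from matches(rest, inst, movable, h2)

def apply_all(rules, inst):
    """Sigma(I): apply every rule to every body match, with named nulls."""
    out = set(inst)
    for name, (body, head) in rules.items():
        for h in matches(body, inst, is_var):
            key = ",".join(f"{v}={h[v]}" for v in sorted(h))
            for rel, terms in head:
                out.add((rel, tuple(
                    h.get(t, f"_{t[1:]}|{name}|{key}") if is_var(t) else t
                    for t in terms)))
    return out

def core(inst):
    """Core of a finite instance via non-injective endomorphisms."""
    inst = set(inst)
    while True:
        dom = {a for _, args in inst for a in args}
        for h in matches(sorted(inst), inst, is_null):
            image = {h.get(a, a) for a in dom}
            if len(image) < len(dom):
                inst = {(r, args) for r, args in inst if set(args) <= image}
                break
        else:
            return inst

def satisfies(inst, body, head):
    return all(any(True for _ in matches(head, inst, is_var, h))
               for h in matches(body, inst, is_var))

def core_chase(rules, max_steps):
    """Return the core chase if reached within max_steps, else None."""
    inst = set()                                  # I_0 is the empty instance
    for _ in range(max_steps):
        if all(satisfies(inst, b, hd) for b, hd in rules.values()):
            return inst                           # I_l |= Sigma: terminating
        inst = core(apply_all(rules, inst))       # I_i = core(Sigma(I_{i-1}))
    return None

rules = {
    "r1": ([], [("R", ("a", "?z"))]),                      # -> exists z. R(a, z)
    "r2": ([], [("R", ("a", "b"))]),                       # -> R(a, b)
    "r3": ([("R", ("?x", "?y"))], [("R", ("?x", "?z"))]),  # R(x,y) -> exists z. R(x,z)
}
print(core_chase(rules, 5))
```

On the rules of Example 1 the same loop keeps producing longer chains, so it returns None for any bound, which matches the observation that no finite universal model exists there.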


5 The Stable Chase

In the following section, we show that all sets of TGDs admit a BCQ-equivalent model that is a core and that characterises full incidental dependencies. To this end, we introduce the stable chase, a novel variant of the chase. Our approach can be viewed as a generalisation of the core chase where core computation is performed by looking for non-embedding homomorphisms of an instance into any future instance along a chase sequence. If such a homomorphism is found, all instances in the current chase sequence are rewritten as follows:

▶ Definition 20. Consider a homomorphism h: I → J on finite instances over S, and let ≺ be a strict total order on ∆^I. The h-rewriting of an instance K is obtained as follows:

1. For all R ∈ S and a ∈ (∆^K ∩ ∆^I)^ar(R) with h(a) ∈ R^J, insert a into R^K.

2. Replace all a ∈ ∆^K ∩ ∆^I by the ≺-least element b ∈ ∆^K ∩ ∆^I for which h(a) = h(b).

The h-rewriting of a sequence of instances is the sequence of h-rewritings of its members.
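The two steps of Definition 20 can be sketched directly for finite instances. The encoding below is an assumption for illustration (facts as pairs of relation name and argument tuple, the homomorphism h as a dict on ∆^I, and the order ≺ as a list of the elements of ∆^I with the least element first); the function names are hypothetical.

```python
from itertools import product

def h_rewriting(K, h, J, order):
    """h-rewriting of K for a homomorphism h: I -> J (given as a dict).

    order lists the elements of dom(I) from least to greatest.
    """
    dom_K = {a for _, args in K for a in args}
    # Elements shared by K and I, listed in the given order.
    shared = [a for a in order if a in dom_K and a in h]
    out = set(K)
    # Step 1: pull back every fact of J along h onto the shared elements.
    for r, k in {(r, len(args)) for r, args in J}:
        for args in product(shared, repeat=k):
            if (r, tuple(h[a] for a in args)) in J:
                out.add((r, args))
    # Step 2: replace each shared element by the least one with the same image.
    least, rep = {}, {}
    for a in shared:
        rep[a] = least.setdefault(h[a], a)
    return {(r, tuple(rep.get(a, a) for a in args)) for r, args in out}

def rewrite_sequence(seq, h, J, order):
    """The h-rewriting of a sequence rewrites each member."""
    return [h_rewriting(K, h, J, order) for K in seq]

# A shift along a chain that is injective but not strong: step 1 adds the
# missing S-fact, and step 2 merges nothing.
I = {("R", ("_n0", "_n1")), ("S", ("_n0", "_n1")), ("R", ("_m1", "_n0"))}
J = I | {("R", ("_n1", "_n2")), ("S", ("_n1", "_n2"))}
shift = {"_m1": "_n0", "_n0": "_n1", "_n1": "_n2"}
print(sorted(h_rewriting(I, shift, J, ["_n0", "_n1", "_m1"])))
```

When h is not injective, step 2 collapses elements with the same image onto their ≺-least representative; when h is injective but not strong, as in the shift above, only step 1 changes the instance.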

▶ Example 21. Let I_{i,j} be the instance occurring in the i-th row, j-th column of Figure 4. Moreover, let h: I_{3,2} → I_{3,3} be the homomorphism that maps n_i to n_{i+1} for every −1 ≤ i ≤ 2. Then, I_{4,2} is the h-rewriting of I_{3,2}, and (the sequence) I_{4,1}, I_{4,2}, I_{4,3} is the h-rewriting of (the sequence) I_{3,1}, I_{3,2}, I_{3,3}.

We proceed with the definition of a stabilising chase sequence for a set of TGDs, which is a chase sequence that evolves in the sense that also previously derived instances may be modified at a later stage. The limit of this construction will yield a chase sequence from which we can obtain the potentially infinite core we are looking for.

▶ Definition 22 (Stabilising Chase Sequence). A stabilising chase sequence for a set Σ of TGDs is a series Q = Q_0, Q_1, … of chase sequences. Each Q_k = Q_{k,0} ⋯ Q_{k,ℓ(k)} is a finite chase sequence of length ℓ(k) + 1 consisting of instances Q_{k,i}, such that the following hold:

1. Q_0 is the singleton sequence containing the empty instance;

2. for all k ≥ 0, either

(2.a) Q_{k+1} = Q_{k,0}, …, Q_{k,ℓ(k)}, Σ(Q_{k,ℓ(k)}) is Q_k extended by Σ(Q_{k,ℓ(k)}), or

(2.b) Q_{k+1} is the h-rewriting of Q_k for some homomorphism h: Q_{k,i} → Q_{k,j} with 0 ≤ i ≤ j that is not an embedding,

where we require that the order ≺ from Definition 20 is an extension of the (partial) order in which new nulls are introduced, and that all possible rewritings will eventually be applied: if there is a homomorphism h: Q_{k,i} → Q_{k,j} as in (2.b), then there is k′ > k such that h is an embedding from the sub-structure of Q_{k′,i} on which h is defined to Q_{k,j}.

Our requirement on ≺ ensures that in cases where two elements are merged by a homomorphism in step (2.b), we will always pick one as a representative that has the longest history in the chase sequence. This ensures monotone growth of the domain within a sequence.

While we define the stabilising chase sequence Q = Q_0, Q_1, . . . to be infinite, it may happen that neither new derivations nor core constructions are possible at some stage. The process can still continue with step (2.a), appending copies of the last instance of the chase sequence, even if they contain no new derivations. Finite termination of the chase is therefore captured in the sequence becoming constant at some point.
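Step (2.b) of Definition 22 depends on finding a homomorphism between two instances of the current sequence that is not an embedding. For small finite instances this can be sketched by brute force. The sketch below assumes that an embedding is an injective homomorphism that also reflects all tuples (a strong homomorphism), that homomorphisms are total, and that all elements are nulls; the function names and the example data are hypothetical.

```python
from itertools import product


def domain(inst):
    """Active domain: every element occurring in some tuple of the instance."""
    return {x for tuples in inst.values() for t in tuples for x in t}


def homomorphisms(I, J):
    """Yield every total map from dom(I) to dom(J) that preserves all tuples."""
    src, tgt = sorted(domain(I)), sorted(domain(J))
    for images in product(tgt, repeat=len(src)):
        h = dict(zip(src, images))
        if all(tuple(h[x] for x in t) in J.get(R, set())
               for R, ts in I.items() for t in ts):
            yield h


def is_embedding(h, I, J):
    """Assumed notion of embedding: injective and tuple-reflecting."""
    if len(set(h.values())) < len(h):
        return False
    inverse = {v: k for k, v in h.items()}
    for R, ts in J.items():
        for t in ts:
            if all(x in inverse for x in t):
                if tuple(inverse[x] for x in t) not in I.get(R, set()):
                    return False
    return True


def find_non_embedding(I, J):
    """Return some non-embedding homomorphism from I to J, or None."""
    return next((h for h in homomorphisms(I, J)
                 if not is_embedding(h, I, J)), None)


# Hypothetical data: two chains that can be merged, and a single chain that cannot.
two_chains = {"R": {("a0", "a1")}, "S": {("a1", "a2"), ("b0", "b1")}}
single_chain = {"R": {("a0", "a1")}, "S": {("a1", "a2")}}
collapse = find_non_embedding(two_chains, two_chains)
nothing = find_non_embedding(single_chain, single_chain)
```

The search is exponential in the size of the source instance, so it only serves to make the condition in (2.b) concrete; it says nothing about how such homomorphisms should be found in practice.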

Example 23. Figure 4 illustrates a stabilising chase sequence Q for the set of TGDs from Example 13. Q_4 is the h-rewriting of Q_3 for the non-strong homomorphism h denoted with dotted arrows in the figure.

Figure 4 Stabilising chase sequence Q of Example 13 without the initial sequence Q_0 (rows Q_1 to Q_6, over domain elements n_{−3}, . . . , n_4); relations R and S are denoted in black and orange (grey), respectively; domain elements are named below each column of instances.

Example 24. A stabilising chase sequence might not be unique. For the set Σ of TGDs from Example 12, parallel chase steps as in (2.a) of Definition 22 yield instances that contain finite initial segments R(a_0, a_1), S(a_1, a_2), . . . and S(b_0, b_1), R(b_1, b_2), . . . of parallel chains as in Figure 1a. Non-embedding homomorphisms collapse the lower chain into (a longer future version of) the upper, or vice versa. In each case, the chase will produce initial segments of a single infinite chain, which might begin with either R or S depending on the chosen homomorphism.

For a particular stabilising chase sequence, however, the instances occurring in the i-th positions of the sequence will eventually stabilise to a unique structure.

Definition 25. An instance I is stable for position i in a stabilising chase sequence Q if there is k ≥ 0 such that I = Q_{k′,i} for all k′ ≥ k.

Lemma 26. There is a unique stable instance for every stabilising chase sequence Q and position i ≥ 0. This stable instance is a core.

Proof. There are three ways in which the finite structure Q_{ℓ,i} may evolve for some ℓ ≥ 0:

(1) Q_{ℓ,i} = Q_{ℓ+1,i}; (2) ∆^{Q_{ℓ,i}} ⊃ ∆^{Q_{ℓ+1,i}}; or (3) R^{Q_{ℓ,i}} ⊂ R^{Q_{ℓ+1,i}} for some relational symbol R.

The (not mutually exclusive) cases (2) and (3) can only occur a finite number of times. For (2), it is clear that the finite domain cannot decrease in size infinitely often. Moreover, domain elements are only ever renamed if two of them are merged by a homomorphism during rewriting. The finite bound for (3) follows since there are only finitely many relations over a finite domain. Therefore, there is some k for which Q_{k,i} is stable.

Stable instances are cores since otherwise they would admit a non-embedding endomorphism, which would eventually be used in step (2.b) of Definition 22, contradicting stability. ∎

We may therefore denote the stable instance for position i in Q by st(Q, i), and use the sequence of stable instances to define an infinite structure:

Definition 27 (Stable Chase). If Q is a stabilising chase sequence for some set Σ of TGDs, then (st(Q, i))_{i≥0} is a stable chase sequence for Σ, and ⋃_{i≥0} st(Q, i) is a stable chase for Σ.


Example 28. In Figure 4, all instances below the dashed line are stable in the stabilising chase sequence Q from Example 23 (using the TGDs Σ from Example 13). The corresponding stable chase sequence is (S_i)_{i≥0} where, for every i ≥ 0, S_i is a chain of length 2i of elements sequentially connected by R and S. The stable chase for Σ is a two-way infinite chain of elements sequentially connected by R and S. The stable chase S is unique up to isomorphism in this case.

6 Properties of the Stable Chase

We start by showing that every set of TGDs admits a stable chase. We then show that the stable chase algorithm yields a model of Σ (Theorem 30) that is BCQ-equivalent to Σ (Theorem 31), and that is a core (Theorem 32). We show that it satisfies all full incidental dependencies (Theorem 33) and that it coincides with the result of the core chase in finite cases (Theorem 34). Nevertheless, we observe that the stable chase is neither unique nor a universal model. Finally, we show the existence of another BCQ-equivalent model that is a core and entails all incidental dependencies.

Theorem 29. Every set of TGDs has a stable chase sequence.

Proof. We show that every set of TGDs admits a stabilising chase sequence Q. Indeed, let Q = Q_0, Q_1, . . . be a series constructed as follows:

1. Set Q_0 as the singleton sequence containing the empty instance.

2. For every k ≥ 0: If every homomorphism h : Q_{k,i} → Q_{k,j} for every 0 ≤ i ≤ j is an embedding, then let Q_{k+1} = Q_{k,0} · · · Q_{k,ℓ(k)} Σ(Q_{k,ℓ(k)}). Otherwise, Q_{k+1} is the h-rewriting of Q_k with h some (arbitrarily chosen) non-embedding homomorphism from some instance of Q_k to another.

It is clear that the resulting series Q satisfies conditions 1 and 2 of Definition 22.

It remains to verify the fairness condition on the application of step (2.b). Consider some k ≥ 0, some 0 ≤ i ≤ j, and some non-embedding homomorphism h : Q_{k,i} → Q_{k,j}. Then, let Q_{k′} be the sequence in Q with the same length as Q_k such that k′ is maximal (note that k′ ≥ k). By item 2 and Definition 22, every homomorphism from Q_{k″,i} to Q_{k″,j} with k″ ≥ k′ is an embedding. Moreover, we can show via induction that there is a homomorphism h′ : Q_{k,j} → Q_{k″,j} for every k″ ≥ k′. Note that, given some k″ ≥ k, the existence of a non-embedding homomorphism h″ : Q_{k″,i} → Q_{k,j} would imply the existence of another homomorphism from Q_{k″,i} to Q_{k″,j} which is not an embedding either (namely, h′ ∘ h″). Hence, for every k″ ≥ k′, every homomorphism h″ : Q_{k″,i} → Q_{k,j} is an embedding. ∎

Theorem 30. If C is a stable chase for Σ, then C |= Σ.

Proof. Let Q be the stabilising chase sequence from which C was extracted. Consider any rule ϕ → ∃y.ψ ∈ Σ that is applicable to C based on some homomorphism h : I_ϕ → C. Since I_ϕ is finite, there is i ≥ 0 such that h restricts to a homomorphism I_ϕ → st(Q, i). Let k be the least number such that Q_{k,i} = st(Q, i). By Definition 22, we find that Q_{k,i} ⊆ Q_{k,j} for all i ≤ j ≤ ℓ(k). Moreover, there is k′ > k with ℓ(k′) = ℓ(k) + 1 and Q_{k′,ℓ(k)+1} = Σ(Q_{k′−1,ℓ(k)}) (step 2.a). Since st(Q, i) = Q_{k,i} = Q_{k′−1,i} ⊆ Q_{k′−1,ℓ(k)}, rule ϕ → ∃y.ψ is applicable to Q_{k′−1,ℓ(k)} under h. Therefore, Q_{k′,ℓ(k)+1} contains the result of this rule application, and by Definition 20 this remains true (possibly for some renaming of new nulls) in st(Q, ℓ(k) + 1) and hence in C. ∎

Theorem 31. If C is a stable chase for Σ, then C and Σ are BCQ-equivalent.


Proof. By Theorem 30 and the definition of BCQ entailment, Σ |= q implies C |= q for all BCQs q.

For the converse, we show by induction on k that Σ |= q_I for every instance I of Q_k, where q_I denotes the BCQ corresponding to I. For the base case, Q_{0,0} is the empty instance, and the claim is immediate. Now assume the claim holds for all instances of Q_k. Definition 22 has two ways for constructing Q_{k+1}:

(2.a) Then the only new instance is Σ(Q_{k,ℓ(k)}). Since the claim holds for I = Q_{k,ℓ(k)}, we find that Σ |= q_I for the corresponding BCQ q_I. Therefore, any rule application that is possible on I is possible (up to isomorphism) in any model of Σ, and hence Σ |= q_{Σ(I)}, which entails the claim.

(2.b) Let h : Q_{k,i} → Q_{k,j} be the homomorphism used for the rewriting. Then h restricts to a homomorphism Q_{k+1,i} → Q_{k,j}. By Definition 22, we find that Q_{k+1,i′} ⊆ Σ(Q_{k+1,i′−1}) for all i′ > i. Therefore, Q_{k+1,ℓ(k)} ⊆ Σ^{ℓ(k)−i}(Q_{k+1,i}) (‡). It suffices to consider BCQs q that are entailed by Q_{k+1,ℓ(k)} (where ℓ(k) = ℓ(k + 1)), since they subsume all BCQ entailment in any instance of Q_{k+1}. By (‡), Q_{k+1,ℓ(k)} |= q implies Q_{k+1,i}, Σ |= q. Using the homomorphism h : Q_{k+1,i} → Q_{k,j}, we conclude Q_{k,j}, Σ |= q and hence Σ |= q. ∎

Theorem 32. If C is a stable chase for Σ, then C is a core.

Proof. By Definition 27, there is some rewritten chase sequence S = S_0, S_1, . . . with C = ⋃_{i≥0} S_i. Moreover, for every i ≥ j ≥ 0 and every homomorphism h : S_i → S_j, h is an embedding. Since every element of S is finite, every homomorphism h mapping such an element to C is also an embedding. Since S_{i−1} ⊆ S_i for every i ≥ 1, we conclude that S is a core cover for C. Therefore, the claim follows from Theorem 16. ∎

The previous observation that the stable chase sequence yields a core cover, together with the BCQ-equivalence of stable chase and Σ (Theorem 31), lets us apply Theorem 18 to conclude that the stable chase does indeed characterise the full incidental dependencies:

Theorem 33. Every stable chase of Σ entails exactly those full dependencies that are incidental for Σ.

As one would expect in the light of Theorem 8, the stable chase does not constitute a semi-decision procedure for incidentality or non-incidentality. On the one hand, the stable chase may not terminate; on the other hand, we cannot even decide whether a given finite instance in a stabilising chase sequence is already stable.

The core chase can be viewed as a special case of the stable chase procedure, since it can be obtained by prioritising step (2.b) in Definition 22, while applying it only to the last instance in a chase sequence Qk (this forces the homomorphism that is used to be an endomorphism). For finite models, this does not change the outcome, and indeed the stable chase coincides with the core chase whenever the latter is defined:

Theorem 34. If a set Σ of TGDs has a finite universal model, then the stable chase over Σ is equal to the result of the core chase, up to isomorphism.

Proof. Deutsch et al. showed that the core chase yields a finite universal model in this case [14, Theorem 7]. Let U be this model, and let C be a stable chase of Σ. Since U is universal, there is a homomorphism h : C → U, since C is a model (Theorem 30). Moreover, since U is finite, U |= q_U, and since U and C are BCQ-equivalent (Theorem 31), there is a homomorphism h′ : U → C. Therefore, the function h′ ∘ h is an endomorphism over C with a finite range. Since C is a core (Theorem 32), every endomorphism (including h′ ∘ h) must be injective and hence C must be finite. Since C is finite, BCQ-equivalent to U, and a core, we conclude that C is equal to U up to isomorphism. ∎

Figure 5 Instances I (left), J (middle), and K (right). Roles U, V, R, and S are represented with dashed and black, dashed and orange (possibly grey), black, and orange (possibly grey) arrows, respectively; dotted edges indicate the continuation of a sequence of elements up to some length.

We continue with some limitations of the stable chase: it may not yield a universal model, it may admit uncountably many non-isomorphic results, and it cannot be used to characterise non-full incidental TGDs. As already pointed out in Section 4, there are sets of TGDs that only admit universal models which are not cores (e.g., see the set of TGDs from Example 14). Hence, since the stable chase is guaranteed to yield a core (Theorem 32), it may not always produce a universal model. To illustrate the other limitations, consider the following example.

Example 35. Consider a set Σ of TGDs containing the following dependencies.

∃x, y.V(x, y)
V(x, y) → ∃z.V(y, z)
V(x, y) → ∃z, w.U(x, z) ∧ R(z, w)
V(x, y) → ∃z, w.U(x, z) ∧ S(z, w)
U(x, y) ∧ R(y, z) → U(x, z)
U(x, y) ∧ S(y, z) → U(x, z)
R(x, y) → ∃z.S(y, z)
S(x, y) → ∃z.R(y, z)

Moreover, let I, J, and K be the instances depicted in Figure 5.

By iteratively applying the chase step (2.a) from Definition 22 during the computation of some stabilising chase sequence Q of Σ, we can produce an instance such as I containing an arbitrarily long V-chain, and two alternating R and S chains linked to every element v_i of this V-chain. Applying step (2.b) from Definition 22, we can, for each pair of chains in I linked to the same v_i, collapse the lower chain into the upper, or vice versa. In each case, the chase will produce initial segments of a single alternating infinite chain, which might begin with R or S. Applying such h-rewritings, we can produce instances such as J and K.

The h-rewritings discussed above are somewhat similar to the rewriting discussed in Example 24. However, in the current example, we have an infinite number of rewritings to consider: one for each element v_i in the infinite V-chain. Taking into account all of these choices, we can generate uncountably many different stable chase sequences, which can in turn be used to define uncountably many non-isomorphic stable chases.

Finally, note that there are (non-full) incidental TGDs for Σ, such as V(x, y) → ∃z.V(z, x), which are not entailed by any stable chase of Σ.
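The first part of the construction in Example 35, repeated application of step (2.a), can be illustrated with a small Python sketch that encodes the eight dependencies and applies one parallel chase step at a time. The precise definition of Σ(·) is given earlier in the paper; the sketch assumes a step in which a rule fires for a body match only if its head is not already satisfied in the current instance, and all names (`SIGMA`, `matches`, `chase_step`) as well as the naming scheme for nulls are hypothetical.

```python
from itertools import count

# A TGD is a pair (body, head) of atom lists; an atom is (relation, variables).
# Head variables that do not occur in the body are existentially quantified.
SIGMA = [
    ([], [("V", ("x", "y"))]),
    ([("V", ("x", "y"))], [("V", ("y", "z"))]),
    ([("V", ("x", "y"))], [("U", ("x", "z")), ("R", ("z", "w"))]),
    ([("V", ("x", "y"))], [("U", ("x", "z")), ("S", ("z", "w"))]),
    ([("U", ("x", "y")), ("R", ("y", "z"))], [("U", ("x", "z"))]),
    ([("U", ("x", "y")), ("S", ("y", "z"))], [("U", ("x", "z"))]),
    ([("R", ("x", "y"))], [("S", ("y", "z"))]),
    ([("S", ("x", "y"))], [("R", ("y", "z"))]),
]


def matches(atoms, inst, sub):
    """Yield all extensions of the substitution sub that map atoms into inst."""
    if not atoms:
        yield dict(sub)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for tup in inst.get(rel, ()):
        new = dict(sub)
        if all(new.setdefault(v, c) == c for v, c in zip(args, tup)):
            yield from matches(rest, inst, new)


def chase_step(sigma, inst, fresh):
    """Assumed parallel step: fire every rule on every unsatisfied body match."""
    out = {rel: set(ts) for rel, ts in inst.items()}
    for body, head in sigma:
        for m in matches(body, inst, {}):
            if any(True for _ in matches(head, inst, m)):
                continue  # head already satisfied for this match
            ext = dict(m)
            for _, args in head:
                for v in args:
                    if v not in ext:
                        ext[v] = f"n{next(fresh)}"  # fresh null
            for rel, args in head:
                out.setdefault(rel, set()).add(tuple(ext[v] for v in args))
    return out


fresh = count()
step1 = chase_step(SIGMA, {}, fresh)      # only the rule with empty body fires
step2 = chase_step(SIGMA, step1, fresh)   # extends the V-chain, starts both chains
```

After two steps, the instance contains a V-chain of length two and the first edges of one R-chain and one S-chain attached to its first element, which is the beginning of the shape described for I. The collapsing of chains by step (2.b) is not covered by this sketch.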

As highlighted by the previous example, instances resulting from the stable chase sequence may not be used to characterise non-full TGDs. Nevertheless, we can show that, for a set of TGDs, there is an instance that satisfies all incidentals. While this result shows the existence of a suitable structure, it does not offer a constructive way of approximating it, since it relies on the (infinite) set of all incidentals to be given.

Theorem 36. Given a set Σ of TGDs, there is an instance I such that:

1. I is a core;

2. I |= Σ;

3. BCQ(I) = BCQ(Σ); and

4. ρ ∈ ICDT(Σ) iff I |= ρ, for any TGD ρ.

Proof. To show this theorem, we sketch how one can adapt the stable chase so that it can deal with infinite sets of TGDs. In this case, infinite instances may occur in a stabilising chase of ICDT(Σ), and hence the stable chase is not well-defined. To avoid this, we slightly modify Definition 22: in (2.a), instead of setting Q_{k+1} as the extension of Q_k with ICDT(Σ)(Q_{k,ℓ(k)}), we define this sequence as the extension of Q_k with ρ(Q_{k,ℓ(k)}) for some ρ ∈ ICDT(Σ). Moreover, we must also ensure fairness of the application of the rules in ICDT(Σ); i.e., each rule in ICDT(Σ) must be applied after the computation of a finite number of sequences. With this modified version of the stable chase, one can show that it maintains its main properties.

Then, by Theorem 29, there is some stable chase I of ICDT(Σ). By Theorem 32, I is a core; by Theorem 30, I |= ICDT(Σ) and hence I entails all subsets of ICDT(Σ), including Σ; and by Theorem 31, BCQ(I) = BCQ(Σ). Also, if ρ ∈ ICDT(Σ), then I |= ρ since I |= ICDT(Σ). Conversely, if I |= ρ then ρ(q) ∈ BCQ(Σ) for every q ∈ BCQ(Σ). Therefore, by Theorem 4, ρ ∈ ICDT(Σ). ∎
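The fairness requirement in the proof above can be met by a simple dovetailing schedule, sketched here in Python. The sketch assumes that the rules of ICDT(Σ) are given as an enumeration ρ_0, ρ_1, . . . and only produces indices into it; since the proof relies on the set of all incidentals being given, this enumeration is an assumption and is not claimed to be computable. The name `fair_schedule` is hypothetical.

```python
from itertools import count, islice


def fair_schedule():
    """Yield rule indices 0; 0, 1; 0, 1, 2; ... without end.

    Every index n first occurs after finitely many steps and then
    recurs in every later stage, so no rule is postponed forever.
    """
    for stage in count():
        yield from range(stage + 1)


# The rule applied in the modified step (2.a) at each point would be the
# one whose index the schedule yields next.
first_choices = list(islice(fair_schedule(), 10))
```

Any other schedule in which each index occurs infinitely often would serve the same purpose; the triangular order is chosen only because it is easy to state.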

7 Conclusion

To the best of our knowledge, this is the first study on constraint implication in the presence of arbitrary theories of tuple-generating dependencies. This idea is embodied in our new notion of incidental dependencies, which correspond to constraints that can be safely assumed to hold when checking BCQ entailment, despite not being a consequence of the given TGD set. Even for a single, fixed instance, finding incidental dependencies remains a challenging problem which is highly undecidable.

Our work reveals close connections between incidental dependencies and cores. If a finite universal model exists, its unique core perfectly characterises the incidentals. The correspondence breaks down if models become infinite, but we can still find cases where cores characterise at least all full incidental dependencies. However, one then has to be content with cores that are BCQ-equivalent to the universal models, but that are not universal themselves. To obtain such cores, we presented the stable chase as a generalisation of the core chase that can be used to build infinite models, and which is interesting in its own right.

On the theoretical level, several questions remain for future work: Is there a construction akin to the stable chase which produces a BCQ-equivalent model that is indicative of all incidental TGDs (not just the full ones), without knowing all incidentals beforehand? What are the computational characteristics of Incidental for restricted classes of TGDs (such as guarded [4], sticky [10], etc.)? Obviously, all classes that warrant a finite universal model (such as diverse versions of acyclic TGDs [16, 24, 21, 13] and full TGDs) guarantee decidability, but the exact complexity of checking incidentality of individual TGDs would still be of interest. Further questions arise when considering equality-generating dependencies in addition to TGDs. Finally, it is of great importance to understand how known incidentals can be exploited toward more efficient practical query answering, as already suggested in some previous works [22, 25].

Acknowledgements This work is partly supported by the German Research Foundation (DFG) in CRC 912 (HAEC) and in Emmy Noether grant KR 4381/1-1.

References

1 Serge Abiteboul and Richard Hull. Data functions, datalog and negation. In Proc. SIGMOD Int. Conf. on Management of Data (SIGMOD’88), pages 143–153. ACM, 1988.

2 Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. Extending decidable cases for rules with existential variables. In Craig Boutilier, editor, Proc. 21st Int. Joint Conf. on Artificial Intelligence (IJCAI’09), pages 677–682. IJCAI, 2009.

3 Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artificial Intelligence, 175(9–10):1620–1654, 2011.

4 Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the complexity lines for generalized guarded existential rules. In Walsh [29], pages 712–717.

5 Bruce L. Bauslaugh. Core-like properties of infinite graphs and structures. Discrete Mathematics, 138(1):101–111, 1995.

6 Bruce L. Bauslaugh. Cores and compactness of infinite directed graphs. J. of Comb. Theory Ser. B, 68(2), 1996.

7 Catriel Beeri and Moshe Y. Vardi. A proof procedure for data dependencies. J. ACM, 31(4):718–741, September 1984.

8 Manuel Bodirsky. The core of a countably categorical structure. In Volker Diekert and Bruno Durand, editors, Proc. 22nd Annual Symposium on Theoretical Aspects of Computer Science (STACS’05), volume 3404 of Lecture Notes in Computer Science, pages 110–120. Springer, 2005.

9 Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. Reasonable highly expressive query languages. In Qiang Yang and Michael Wooldridge, editors, Proc. 24th Int. Joint Conf. on Artificial Intelligence (IJCAI’15), pages 2826–2832. AAAI Press, 2015.

10 Andrea Calì, Georg Gottlob, and Andreas Pieris. Towards more expressive ontology languages: The query answering problem. J. of Artif. Intell., 193:87–128, 2012.

11 Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. Reasoning on regular path queries. SIGMOD Record, 32(4):83–92, 2003.

12 Stavros S. Cosmadakis, Haim Gaifman, Paris C. Kanellakis, and Moshe Y. Vardi. Decidable optimization problems for database logic programs (preliminary report). In Janos Simon, editor, Proc. 20th Annual ACM Symposium on Theory of Computing (STOC’88), pages 477–490. ACM, 1988.

13 Bernardo Cuenca Grau, Ian Horrocks, Markus Krötzsch, Clemens Kupke, Despoina Magka, Boris Motik, and Zhe Wang. Acyclicity notions for existential rules and their application to query answering in ontologies. J. of Artificial Intelligence Research, 47:741–808, 2013.

14 Alin Deutsch, Alan Nash, and Jeffrey B. Remmel. The chase revisited. In Maurizio Lenzerini and Domenico Lembo, editors, Proc. 27th Symposium on Principles of Database Systems (PODS’08), pages 149–158. ACM, 2008.

15 Alin Deutsch and Val Tannen. Reformulation of XML queries and constraints. In Diego Calvanese, Maurizio Lenzerini, and Rajeev Motwani, editors, Proc. 9th Int. Conf. on Database Theory (ICDT’03), volume 2572 of LNCS, pages 225–241. Springer, 2003.

16 Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89–124, 2005.

17 Ronald Fagin, Phokion G. Kolaitis, and Lucian Popa. Data exchange: Getting to the core. ACM Trans. Database Syst., 30(1):174–210, March 2005.

18 Daniela Florescu, Alon Levy, and Dan Suciu. Query containment for conjunctive queries with regular expressions. In Alberto O. Mendelzon and Jan Paredaens, editors, Proc. 17th Symposium on Principles of Database Systems (PODS’98), pages 139–148. ACM, 1998.

19 Georg Gottlob and Christos H. Papadimitriou. On the complexity of single-rule datalog queries. Inf. Comput., 183(1):104–122, 2003.

20 Pavol Hell and Jaroslav Nešetřil. The core of a graph. Discrete Mathematics, 109:117–126, 1992.

21 Markus Krötzsch and Sebastian Rudolph. Extending decidable existential rules by joining acyclicity and guardedness. In Walsh [29], pages 963–968.

22 Markus Krötzsch and Veronika Thost. Ontologies for knowledge graphs: Breaking the rules. In Yolanda Gil, Elena Simperl, Paul Groth, Freddy Lecue, Markus Krötzsch, Alasdair Gray, Marta Sabou, Fabian Flöck, and Hideaki Takeda, editors, Proc. 15th Int. Semantic Web Conf. (ISWC’16), volume 9981 of LNCS, pages 376–392. Springer, 2016.

23 David Maier, Alberto O. Mendelzon, and Yehoshua Sagiv. Testing implications of data dependencies. ACM Transactions on Database Systems, 4:455–469, 1979.

24 Bruno Marnette. Generalized schema-mappings: from termination to tractability. In Jan Paredaens and Jianwen Su, editors, Proc. 28th Symposium on Principles of Database Systems (PODS’09), pages 13–22. ACM, 2009.

25 Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology-based data access: Ontop of databases. In Harith Alani, Lalana Kagal, Achille Fokoue, Paul T. Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha F. Noy, Chris Welty, and Krzysztof Janowicz, editors, Proc. 12th Int. Semantic Web Conf. (ISWC’13), volume 8218 of Lecture Notes in Computer Science, pages 558–573. Springer, 2013.

26 Hartley Rogers, Jr. Theory of Recursive Functions and Effective Computability. MIT Press, paperback edition, 1987.

27 Sebastian Rudolph and Markus Krötzsch. Flag & check: Data access with monadically defined queries. In Richard Hull and Wenfei Fan, editors, Proc. 32nd Symposium on Principles of Database Systems (PODS’13), pages 151–162. ACM, 2013.

28 Yehoshua Sagiv. Optimizing datalog programs. In Moshe Y. Vardi, editor, Proc. Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’87), pages 349–362. ACM, 1987.

29 Toby Walsh, editor. Proc. 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI’11). AAAI Press/IJCAI, 2011.

30 Ke Wang and Li-Yan Yuan. Preservation of integrity constraints in definite DATALOG programs. Inf. Process. Lett., 44(4):185–193, 1992.

8 Appendix: Proof of Theorem 8

Theorem 8. Incidental is Π⁰₂-complete, and in particular neither RE nor coRE.