Datalog-Expressibility for Monadic and Guarded Second-Order Logic


Manuel Bodirsky^1*, Simon Knäuer^1**, and Sebastian Rudolph^2***

1 TU Dresden, Institut für Algebra, Germany

2 TU Dresden, Computational Logic Group, Germany

Abstract. We characterise the sentences in Monadic Second-order Logic (MSO) that are, over finite structures, equivalent to a Datalog program, in terms of an existential pebble game. We also show that for every class C of finite structures that can be expressed in MSO and is closed under homomorphisms, and for all ℓ, k ∈ N, there exists a canonical Datalog program Π of width (ℓ, k), that is, a Datalog program of width (ℓ, k) which is sound for C (i.e., Π only derives the goal predicate on a finite structure A if A ∈ C) and with the property that Π derives the goal predicate whenever some Datalog program of width (ℓ, k) which is sound for C derives the goal predicate. The same characterisations also hold for Guarded Second-order Logic (GSO), which properly extends MSO. To prove our results, we show that every class C in GSO whose complement is closed under homomorphisms is a finite union of constraint satisfaction problems (CSPs) of ω-categorical structures.

1 Introduction

Monadic Second-order Logic (MSO) is an important logic in theoretical computer science. By Büchi’s theorem, a formal language can be defined in MSO if and only if it is regular (see, e.g., [23]).

MSO sentences can be evaluated in polynomial time on classes of structures whose treewidth is bounded by a constant; this is known as Courcelle’s theorem [15]. The latter result even holds for the more expressive Guarded Second-order Logic (GSO) [17,20], which extends First-order Logic by second-order quantifiers over guarded relations. Guarded Second-order Logic contains Guarded First-order Logic (which itself captures many description logics [19]).

Another fundamental formalism in theoretical computer science, which is heavily studied in database theory, is Datalog (see, e.g., [23]). Every Datalog program can be evaluated on finite structures in polynomial time. Like MSO, Datalog strikes a good balance between expressivity and good mathematical and computational properties. Two important parameters of a Datalog program Π are the maximal arity ℓ of its auxiliary predicates (IDBs), and the maximal number k of variables per rule in Π. We then say that Π has width (ℓ, k), following the terminology of Feder and Vardi [18]. These parameters are important both in theory and in practice: ℓ closely corresponds to memory space and k to computation time needed when evaluating Π on a given structure.

In some scenarios we are interested in combining the good computational properties of expressibility in Datalog with those of expressibility in MSO. And indeed, several important query formalisms, such as monadic Datalog, monadically defined queries, or nested monadically defined queries are contained both in MSO and in Datalog [24]. In this paper we are interested in two questions that (perhaps surprisingly) turn out to be closely related:

1. Which classes of finite structures are simultaneously expressible in MSO and in Datalog?

2. Which constraint satisfaction problems (CSPs) can be expressed in MSO, or, more generally, in GSO?

* The author has received funding from the European Research Council via the ERC Consolidator Grant No. 681988 (CSP-Infinity).

** The author is supported by the DFG Research Training Group 1763 (QuantLA).

*** The author has received funding from the European Research Council via the ERC Consolidator Grant No. 771779 (DeciGUT).

arXiv:2010.05677v1 [cs.LO] 12 Oct 2020


For a structure B with a finite relational signature τ, the constraint satisfaction problem for B is the class of all finite τ-structures that homomorphically map to B. Every finite-domain constraint satisfaction problem can already be expressed in monotone monadic SNP (MMSNP; [18]), which is a small fragment of MSO. On the other hand, the constraint satisfaction problem for (Q; <), which is the class of all finite acyclic digraphs (V; E), cannot be expressed in MMSNP [5], but can be expressed in MSO by the sentence

∀X ≠ ∅ ∃x ∈ X ∀y ∈ X : ¬E(x, y).
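To make the sentence above concrete, the following sketch evaluates it by brute force over all non-empty vertex subsets and compares the result with a direct acyclicity test. The function names and the encoding of a digraph as a vertex collection plus a set of edge pairs are assumptions made for this illustration only.

```python
from itertools import combinations


def satisfies_mso_sentence(vertices, edges):
    """Every non-empty X contains some x with no edge E(x, y) to any y in X."""
    vertices = list(vertices)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            members = set(subset)
            has_sink = any(
                all((x, y) not in edges for y in members) for x in members
            )
            if not has_sink:
                return False
    return True


def is_acyclic(vertices, edges):
    """Repeatedly delete vertices without outgoing edges into the remaining set."""
    remaining = set(vertices)
    while remaining:
        sinks = {
            x for x in remaining
            if all((x, y) not in edges for y in remaining)
        }
        if not sinks:
            return False  # every remaining vertex has a successor: a cycle exists
        remaining -= sinks
    return True
```

The subset enumeration is exponential in the number of vertices, so this is only meant for checking small examples by hand, not as an evaluation procedure.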

The class of CSPs of arbitrary infinite structures B is quite large; it is easy to see that a class D of finite structures with a finite relational signature τ is a CSP of a countably infinite structure if and only if

– it is closed under disjoint unions, and

– A ∈ D for any A that maps homomorphically to some A′ ∈ D.

The second item can equivalently be rephrased as the complement of D (meant within the class of all finite τ-structures; this comment applies throughout and will be omitted in the following) being closed under homomorphisms: a class C is closed under homomorphisms if for any structure A ∈ C that maps homomorphically to some structure B we have B ∈ C. Examples of classes of structures that are closed under homomorphisms naturally arise from Datalog. We say that a class C of finite τ-structures is in Datalog^3 if there exists a Datalog program Π with a distinguished nullary predicate goal such that Π derives goal on a finite τ-structure if and only if the structure is in C; in this case, we write ⟦Π⟧ for C. Every class of τ-structures in Datalog is closed under homomorphisms.

However, not every class of finite structures in Datalog describes the complement of a CSP: consider for example, for unary predicates R and B, the class C_{R,B} of finite {R, B}-structures A such that R^A is empty or B^A is empty. Clearly, C_{R,B} is not closed under disjoint unions. However, a finite structure is in C_{R,B} if and only if the Datalog program that consists of just one rule

goal :− R(x), B(y)

does not derive goal on that structure.
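The failure of closure under disjoint unions can be checked on two one-element structures. The sketch below is illustrative only: the triple encoding (domain, R, B) and the helper names are assumptions, not notation from the text.

```python
def derives_goal(r_elems, b_elems):
    """The rule goal :- R(x), B(y) fires iff both unary relations are non-empty."""
    return len(r_elems) > 0 and len(b_elems) > 0


def disjoint_union(s1, s2):
    """Disjoint union of two {R, B}-structures given as (domain, R, B) triples.

    Elements are tagged with 0 or 1 so that the two domains cannot overlap.
    """
    (d1, r1, b1), (d2, r2, b2) = s1, s2

    def tag(elems, i):
        return {(i, e) for e in elems}

    return (
        tag(d1, 0) | tag(d2, 1),
        tag(r1, 0) | tag(r2, 1),
        tag(b1, 0) | tag(b2, 1),
    )


# Two structures in C_{R,B}: one with only R non-empty, one with only B non-empty.
only_r = ({"u"}, {"u"}, set())
only_b = ({"v"}, set(), {"v"})
union = disjoint_union(only_r, only_b)
```

Neither `only_r` nor `only_b` lets the rule fire, but their disjoint union does, so the union leaves C_{R,B}.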

An important class of CSPs is the class of CSPs for structures B that are countably infinite and ω-categorical. A structure B is ω-categorical if all countable models of the first-order theory of B are isomorphic. A well-known example of an ω-categorical structure is (Q; <), which is a result due to Cantor. Constraint satisfaction problems of ω-categorical structures can be evaluated in polynomial time on classes of treewidth bounded by some constant k ∈ N, by a result of Bodirsky and Dalmau [6]. The polynomial-time algorithm presented by Bodirsky and Dalmau is in fact a Datalog program of width (k−1, k). A Datalog program Π is called sound for a class of τ-structures C if ⟦Π⟧ ⊆ C. Bodirsky and Dalmau showed that if C is the complement of the CSP of an ω-categorical τ-structure B then there exists for all ℓ, k ∈ N a canonical Datalog program of width (ℓ, k) for C, i.e., a Datalog program Π of width (ℓ, k) such that

– Π is sound for C, and

– ⟦Π′⟧ ⊆ ⟦Π⟧ for every Datalog program Π′ of width (ℓ, k) which is sound for C.

Moreover, whether the canonical Datalog program of width (ℓ, k) for C derives goal on a given τ-structure A can be characterised in terms of the existential pebble game from finite model theory, played on (A, B) [6]. The existential (ℓ, k) pebble game is played by two players, called Spoiler and Duplicator (see, e.g., [16, 18, 22]). Spoiler starts by placing k pebbles on elements a_1, ..., a_k of A, and Duplicator responds by placing k pebbles b_1, ..., b_k on B. If the map that sends a_1, ..., a_k to b_1, ..., b_k is not a partial homomorphism from A to B, then the game is over and Spoiler wins.

Otherwise, Spoiler removes all but at most ℓ pebbles from A, and Duplicator has to respond by removing the corresponding pebbles from B. Then Spoiler can again place all his pebbles on A, and Duplicator must again respond by placing her pebbles on B. If the game continues forever, then Duplicator wins. If B is a finite, or more generally a countable ω-categorical structure, then Spoiler has a winning strategy for the existential (ℓ, k) pebble game on (A, B) if and only if the canonical Datalog program of width (ℓ, k) for CSP(B) derives goal on A (Theorem 21). This connection played an essential role in proving Datalog inexpressibility results, for example for the class of finite-domain CSPs [2] (leading to a complete classification of those finite structures B such that the complement of CSP(B) can be expressed in Datalog [3]).

3 Warning: Feder and Vardi [18] say that a CSP is in Datalog if its complement in the class of all finite τ-structures is in Datalog.
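For finite A and finite B, the winner of the existential (ℓ, k) pebble game described above can be computed by a greatest-fixed-point iteration over partial homomorphisms. The following is a minimal sketch under stated assumptions: both structures are finite and given as a domain plus a dictionary from relation names to sets of tuples, ℓ ≤ k, and the names `is_partial_hom` and `spoiler_wins` are hypothetical. It is a standard textbook-style procedure, not the construction used in the paper, and it does not cover infinite ω-categorical B.

```python
from itertools import combinations, product


def is_partial_hom(h, rel_a, rel_b):
    """h (a dict) preserves every tuple of A that lies entirely inside its domain."""
    return all(
        tuple(h[x] for x in t) in rel_b[name]
        for name, tuples in rel_a.items()
        for t in tuples
        if all(x in h for x in t)
    )


def spoiler_wins(dom_a, rel_a, dom_b, rel_b, l, k):
    """Decide the existential (l, k) pebble game on finite (A, B), assuming l <= k."""
    dom_a, dom_b = list(dom_a), list(dom_b)

    # Positions after Spoiler's removal step: partial homomorphisms on <= l elements.
    alive = set()
    for size in range(l + 1):
        for dom in combinations(dom_a, size):
            for img in product(dom_b, repeat=size):
                h = dict(zip(dom, img))
                if is_partial_hom(h, rel_a, rel_b):
                    alive.add(frozenset(h.items()))

    def small_restrictions(g):
        items = list(g.items())
        for size in range(min(l, len(items)) + 1):
            for sub in combinations(items, size):
                yield frozenset(sub)

    def duplicator_can_answer(h, extra):
        # Duplicator needs an extension g of h all of whose small restrictions survive.
        for img in product(dom_b, repeat=len(extra)):
            g = {**h, **dict(zip(extra, img))}
            if is_partial_hom(g, rel_a, rel_b) and all(
                r in alive for r in small_restrictions(g)
            ):
                return True
        return False

    changed = True
    while changed:  # delete positions from which Spoiler can force a win
        changed = False
        for pos in list(alive):
            h = dict(pos)
            free = [a for a in dom_a if a not in h]
            broken = any(
                not duplicator_can_answer(h, extra)
                for n_new in range(k - len(h) + 1)
                for extra in combinations(free, n_new)
            )
            if broken:
                alive.discard(pos)
                changed = True

    return frozenset() not in alive  # Spoiler wins iff the empty position is lost
```

As a usage example, take A to be the undirected 5-cycle and B the two-element clique, so that CSP(B) is 2-colourability: Spoiler wins the (2, 3) game but not the (1, 2) game. The running time is exponential in k, so the sketch is only suitable for very small parameters.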

Results and Consequences

We present a characterisation of those GSO sentences Φ that are over finite structures equivalent to a Datalog program. Our characterisation involves a variant of the existential pebble game from finite model theory, which we call the (ℓ, k)-game. This game is defined for a homomorphism-closed class C of finite τ-structures, and it is played by the two players Spoiler and Duplicator on a finite τ-structure A as follows.

– Duplicator picks a countable τ-structure B such that CSP(B) ∩ C = ∅.

– The game then continues as the existential (ℓ, k) pebble game played by Spoiler and Duplicator on (A, B).

In Section 6 we show that a GSO sentence Φ is equivalent to a Datalog program of width (ℓ, k) if and only if

– ⟦Φ⟧ is closed under homomorphisms, and

– Spoiler wins the existential (ℓ, k)-game for ⟦Φ⟧ on A if and only if A |= Φ.

We also show that for every GSO sentence Φ whose class of finite models C is closed under homomorphisms and for all ℓ, k ∈ N there exists a canonical Datalog program Π of width (ℓ, k) for C (Theorem 24). To prove these results, we first show that every class of finite structures in GSO whose complement is closed under homomorphisms is a finite union of CSPs that can also be expressed in GSO (Lemma 17; an analogous statement holds for MSO). Moreover, every CSP in GSO is the CSP of a countable ω-categorical structure (Corollary 10). We present an example of such a CSP which is even expressible in MSO and coNP-complete, and hence not the CSP of a reduct of a finitely bounded homogeneous structure, unless NP = coNP (Proposition 20). Note that our results imply that every class of finite structures that can be expressed both in GSO and in Datalog is a finite intersection of the complements of CSPs for ω-categorical structures.

In general, it is not true that a Datalog program describes a finite intersection of complements of CSPs (we present a counterexample in Example 19).

2 Preliminaries

In the entire text, τ denotes a finite signature containing relation symbols and sometimes also constant symbols. If R ∈ τ is a relation symbol, we write ar(R) for its arity. If A is a τ-structure we use the corresponding capital roman letter A to denote the domain of A; the domains of structures are assumed to be non-empty. If R ∈ τ, then R^A ⊆ A^ar(R) denotes the corresponding relation of A.

A primitive positive τ-formula (in database theory also conjunctive query) is a first-order τ-formula without disjunction, negation, and universal quantification. Every primitive positive formula is equivalent to a formula of the form

∃x_1, ..., x_n (ψ_1 ∧ · · · ∧ ψ_m)

where ψ_1, ..., ψ_m are atomic τ-formulas. The canonical database of a primitive positive τ-formula φ which does not contain the equality symbol is the τ-structure S_φ whose vertices are the variables of φ and where (y_1, ..., y_n) ∈ R^{S_φ} if and only if φ contains the conjunct R(y_1, ..., y_n). An existential positive τ-formula is a first-order τ-formula without negation and universal quantification. It is easy to see that every existential positive τ-formula is equivalent to a disjunction of primitive positive τ-formulas (and hence referred to as a union of conjunctive queries in database theory). We write ψ(x_1, ..., x_n) if the free variables of ψ are from x_1, ..., x_n. If A is a τ-structure and ψ(x_1, ..., x_n) is a τ-formula, then the relation

{(a_1, ..., a_n) | A |= ψ(a_1, ..., a_n)}

is called the relation defined by ψ over A; if ψ can be chosen to be primitive positive (or existential positive) then this relation is called primitively positively definable (or existentially positively definable, respectively).
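The canonical database defined above is easy to build mechanically. The sketch below assumes that an equality-free primitive positive formula is given as a list of atoms (relation name, tuple of variable names) and that every variable of the formula occurs in at least one atom; the function name and the encoding are illustrative assumptions.

```python
def canonical_database(atoms):
    """Return (domain, relations) of the canonical database of a pp formula.

    atoms: list of (relation_name, tuple_of_variables), one entry per conjunct.
    """
    domain = {v for _, args in atoms for v in args}
    relations = {}
    for name, args in atoms:
        # Each conjunct R(y_1, ..., y_n) contributes the tuple (y_1, ..., y_n) to R.
        relations.setdefault(name, set()).add(tuple(args))
    return domain, relations


# Canonical database of the formula  exists z (E(x, z) and E(z, y)).
path_domain, path_relations = canonical_database(
    [("E", ("x", "z")), ("E", ("z", "y"))]
)
```

Quantified and free variables are treated alike here, since the definition only refers to the variables and conjuncts of the formula.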

For all logics over the signature τ considered in this text, we say that two formulas Φ(x_1, ..., x_n) and Ψ(x_1, ..., x_n) are equivalent if for all τ-structures A and all a_1, ..., a_n ∈ A we have

A |= Φ(a_1, ..., a_n) ⇔ A |= Ψ(a_1, ..., a_n).

Formulas without free variables are called sentences; in database theory, formulas are often called queries and sentences are often called Boolean queries. If Φ is a sentence, we write ⟦Φ⟧ for the class of all finite models of Φ.

A reduct of a relational structure A is a structure A′ obtained from A by dropping some of the relations, and A is called an expansion of A′.

2.1 Datalog

In this section we refer to the finite set of relation and constant symbols τ as EDBs (for extensional database predicates). Let ρ be a finite set of new relation symbols, called the IDBs (for intensional database predicates). A Datalog program is a set of rules of the form

ψ_0 :− ψ_1, ..., ψ_n

where ψ_0 is an atomic ρ-formula and ψ_1, ..., ψ_n are atomic (ρ ∪ τ)-formulas. If A is a τ-structure, and Π is a Datalog program with EDBs τ and IDBs ρ, then a (τ ∪ ρ)-expansion A′ of A is called a fixed point of Π on A if A′ satisfies the sentence

∀x̄ (ψ_0 ∨ ¬ψ_1 ∨ · · · ∨ ¬ψ_n)

for each rule ψ_0 :− ψ_1, ..., ψ_n. If A_1 and A_2 are two (ρ ∪ τ)-structures with the same domain A, then A_1 ∩ A_2 denotes the (ρ ∪ τ)-structure with domain A such that R^{A_1 ∩ A_2} := R^{A_1} ∩ R^{A_2}. Note that if A_1 and A_2 are two fixed points of Π on A, then A_1 ∩ A_2 is a fixed point of Π on A, too.

Hence, there exists a unique smallest (with respect to inclusion) fixed point of Π on A, which we denote by Π(A). It is well-known that if A is a finite structure then Π(A) can be computed in polynomial time in the size of A [23]. If R ∈ ρ, we also say that Π computes R^{Π(A)} on A. A Datalog program together with a distinguished predicate R ∈ ρ may also be viewed as a formula, which we also call a Datalog query, and which over a given τ-structure A denotes the relation R^{Π(A)}. If the distinguished predicate has arity 0, we often call it the goal predicate; we say that Π derives goal on A if goal^{Π(A)} = {()}. The class C of finite τ-structures A such that Π derives goal on A is called the class of finite τ-structures computed by Π, and denoted by ⟦Π⟧. Note that this class C is in universal second-order logic (we have to express that in every expansion of the input by relations for the IDBs that satisfies all the rules of the Datalog program the goal predicate is non-empty).
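The smallest fixed point Π(A) can be computed by naive bottom-up evaluation: start with empty IDBs and keep applying rules until nothing new is derived. The sketch below is one simple way to do this, not an optimised evaluator. The encoding is an assumption made for illustration: an atom is a pair (predicate name, tuple of variable names), a rule is a pair (head atom, list of body atoms), and constants are not supported.

```python
from itertools import product


def least_fixed_point(domain, edb, rules):
    """Naive bottom-up Datalog evaluation on a finite structure.

    domain: list of elements; edb: dict predicate -> set of tuples;
    rules: list of (head, body) with atoms of the form (predicate, variables).
    Returns a dict mapping every EDB and IDB predicate to its set of tuples.
    """
    facts = {pred: set(tuples) for pred, tuples in edb.items()}
    for (head_pred, _), _ in rules:
        facts.setdefault(head_pred, set())

    changed = True
    while changed:
        changed = False
        for (head_pred, head_vars), body in rules:
            variables = sorted({v for _, vs in body for v in vs} | set(head_vars))
            # Try every assignment of domain elements to the variables of the rule.
            for values in product(domain, repeat=len(variables)):
                env = dict(zip(variables, values))
                if all(
                    tuple(env[v] for v in vs) in facts.get(pred, set())
                    for pred, vs in body
                ):
                    derived = tuple(env[v] for v in head_vars)
                    if derived not in facts[head_pred]:
                        facts[head_pred].add(derived)
                        changed = True
    return facts


# A program of width (2, 3) that derives goal exactly on digraphs with a directed cycle.
cycle_rules = [
    (("P", ("x", "y")), [("E", ("x", "y"))]),
    (("P", ("x", "y")), [("P", ("x", "z")), ("E", ("z", "y"))]),
    (("goal", ()), [("P", ("x", "x"))]),
]
```

With k variables per rule, each round inspects |A|^k assignments, which reflects the role of k as a bound on computation time mentioned in the introduction. A derived nullary goal shows up as the relation {()}, matching the convention goal^{Π(A)} = {()}.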

2.2 Second-Order Logic

Second-order logic is the extension of first-order logic which additionally allows existential and universal quantification over relations; that is, if R is a relation symbol and φ is a second-order τ ∪ {R}-formula, then ∃R : φ and ∀R : φ are second-order τ-formulas. If A is a τ-structure and Φ is a second-order τ-sentence, we write A |= Φ (and say that A is a model of Φ) if A satisfies Φ, which is defined in the usual Tarskian style. We write ⟦Φ⟧ for the class of all finite models of Φ.

A second-order formula is called monadic if all second-order variables are unary. We use syntactic sugar and also write ∀x ∈ X : ψ instead of ∀x (X(x) ⇒ ψ) and ∃x ∈ X : ψ instead of ∃x (X(x) ∧ ψ).


2.3 Guarded Second-Order Logic

Guarded Second-order Logic (GSO), introduced by Grädel, Hirsch, and Otto [20], is the extension of guarded first-order logic by second-order quantifiers. Guarded (first-order) τ-formulas are defined inductively by the following rules [1]:

1. all atomic τ-formulas are guarded τ-formulas;

2. if φ and ψ are guarded τ-formulas, then so are φ ∧ ψ, φ ∨ ψ, and ¬φ;

3. if ψ(x̄, ȳ) is a guarded τ-formula and α(x̄, ȳ) is an atomic τ-formula such that all free variables of ψ occur in α, then ∃ȳ (α(x̄, ȳ) ∧ ψ(x̄, ȳ)) and ∀ȳ (α(x̄, ȳ) ⇒ ψ(x̄, ȳ)) are guarded τ-formulas.

Guarded second-order formulas are defined similarly, but we additionally allow (unrestricted) second-order quantification; GSO generalises Courcelle’s logic MSO_2 from graphs to general relational structures.

Definition 1. A second-order τ-formula is called guarded if it is defined inductively by the rules (1)-(3) for guarded first-order logic and additionally by second-order quantification.

There are many semantically equivalent ways of introducing GSO [20]. Let B be a τ-structure. Then (t_1, ..., t_n) ∈ B^n is called guarded in B if there exists an atomic τ-formula φ and b_1, ..., b_k such that B |= φ(b_1, ..., b_k) and {t_1, ..., t_n} ⊆ {b_1, ..., b_k}. Note that (for n = 1) every element of B is guarded (because of the atomic formula x = x). A relation R ⊆ B^n is called guarded if all tuples in R are guarded. Note that all unary relations are guarded. If Ψ is an arbitrary second-order sentence, we say that a finite structure A satisfies Ψ with guarded semantics, in symbols A |=_g Ψ, if all second-order quantifiers in Ψ are evaluated over guarded relations only. Note that for MSO sentences, the usual semantics and the guarded semantics coincide.
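For a finite relational structure, the guardedness of a tuple can be tested directly from this definition. The sketch below assumes a purely relational signature, with the structure given as a dictionary from relation names to sets of tuples; the function name is hypothetical.

```python
def is_guarded(tup, relations):
    """Check whether the tuple `tup` is guarded in the given relational structure.

    A tuple with at most one distinct entry is guarded by the atomic formula x = x.
    Otherwise all its entries must occur together in one tuple of some relation.
    """
    entries = set(tup)
    if len(entries) <= 1:
        return True
    return any(
        entries <= set(t)
        for tuples in relations.values()
        for t in tuples
    )


def is_guarded_relation(relation, relations):
    """A relation is guarded if all of its tuples are guarded."""
    return all(is_guarded(t, relations) for t in relation)
```

On a directed path 0 → 1 → 2 with edge relation E, the pairs (0, 1) and (1, 0) are guarded, while (0, 2) is not, so a guarded binary relation on this structure can only relate adjacent or equal elements.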

Sometimes, we will also use the term GSO (MSO, Datalog) to denote all problems (i.e., all classes of structures) that can be expressed in the formalism. In particular, this justifies saying that a certain CSP is in GSO (MSO, Datalog).

Proposition 2 (see [20]). Guarded Second-order Logic and full Second-order Logic with guarded semantics are equally expressive.

It follows that GSO is at least as expressive as MSO. There are Datalog programs that are equivalent to a GSO sentence, but not to an MSO sentence. The proof is based on a variant of an example of a Datalog query in GSO given in [11] (Example 2) and can be found in the next section.

3 A Class in Datalog and GSO, but not in MSO

Proposition 3. There is a Datalog query that can be expressed in GSO but not in MSO.

Proof. Let τ be the signature consisting of the binary relation symbols S, T, R, N, and let C be the class of finite τ-structures such that the following Datalog program with one binary IDB U derives goal.

U(x, y) :− S(x, y)

U(x′, y′) :− U(x, y), N(x, x′), N(y, y′), R(x′, y′)

goal :− U(x, y), T(x, y)

On the left of Figure 1 one can find an example of an {S, T, R, N}-structure B where the given Datalog program derives goal. To show that C is not MSO definable, suppose for contradiction that there exists an MSO sentence Φ such that ⟦Φ⟧ = C. We use Φ to construct an MSO sentence Ψ which holds on a finite word w ∈ {a, b}^∗ (represented as a structure with signature P_a, P_b, < in the usual way [23]) if and only if w ∈ {a^n b^n | n ≥ 1}. Since the language {a^n b^n | n ≥ 1} is not regular, this contradicts Courcelle’s theorem [15] (cf. Büchi’s theorem above).

Fig. 1. An example of an {S, T, R, N}-structure B in the class C of Proposition 3.

Let Ψ_1 be the MSO sentence obtained from Φ by replacing all subformulas of Φ of the form

– S(x, y) by a formula φ_S(x, y) that states that x is the smallest element with respect to <, that P_b(y), and that there is no z < y in P_b;

– T(x, y) by a formula φ_T(x, y) that states that P_a(x), that there is no z > x in P_a, and that y is the largest element with respect to <;

– R(x, y) by the formula φ_R(x, y) given by x < y;

– N(x, y) by a formula φ_N(x, y) stating that y is the next element after x with respect to <.

The resulting MSO sentence Ψ_1 has the signature {P_a, P_b, <}; let Ψ be the conjunction of Ψ_1 with the sentence Ψ_2 which states that for all x, y ∈ A, if x < y and P_a(y) then P_a(x). We first show that if A is a {<, P_a, P_b}-structure that represents a word w_A ∈ {a, b}^∗, then A |= Ψ if and only if w_A is of the form a^n b^n for some n ≥ 1. Let B be the {S, T, R, N}-structure with the same domain as A such that for X ∈ {S, T, R, N} we have X^B := {(x, y) | A |= φ_X(x, y)}. See Figure 1 for an example of a structure A such that w_A = a^4 b^4 and the corresponding {S, T, R, N}-structure B.
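The translation from words to {S, T, R, N}-structures can be tried out on small inputs. The sketch below interprets the four relations over the positions 0, ..., |w|−1 of a word according to φ_S, φ_T, φ_R, φ_N, evaluates the Datalog program of Proposition 3 by a direct fixed-point loop, and checks Ψ_2 separately. The function names and the position encoding are assumptions made for this illustration.

```python
def word_to_structure(word):
    """Relations S, T, R, N over the positions of `word`, following phi_S, ..., phi_N."""
    n = len(word)
    pos_a = [i for i, c in enumerate(word) if c == "a"]
    pos_b = [i for i, c in enumerate(word) if c == "b"]
    S = {(0, pos_b[0])} if pos_b else set()       # x smallest, y the first b
    T = {(pos_a[-1], n - 1)} if pos_a else set()  # x the last a, y largest
    R = {(x, y) for x in range(n) for y in range(n) if x < y}
    N = {(x, x + 1) for x in range(n - 1)}        # successor relation
    return S, T, R, N


def program_derives_goal(S, T, R, N):
    """Evaluate the three rules for U and goal until no new U-facts appear."""
    U = set(S)  # U(x, y) :- S(x, y)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(U):
            for (x1, x2) in N:
                for (y1, y2) in N:
                    # U(x', y') :- U(x, y), N(x, x'), N(y, y'), R(x', y')
                    if x1 == x and y1 == y and (x2, y2) in R and (x2, y2) not in U:
                        U.add((x2, y2))
                        changed = True
    return any(pair in T for pair in U)  # goal :- U(x, y), T(x, y)


def satisfies_psi2(word):
    """Psi_2: no letter a occurs after a letter b."""
    return "ba" not in word
```

For instance, on the word abab the program derives goal although the word is not of the form a^n b^n; it is the conjunct Ψ_2 that rules this word out.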

If w_A is of the form a^n b^n for some n ≥ 1, then A clearly satisfies Ψ_2. To show that it also satisfies Ψ_1, let v_1, ..., v_n, w_1, ..., w_n ∈ A be such that {v_1, ..., v_n} = P_a^A and {w_1, ..., w_n} = P_b^A and such that for all i, j ∈ {1, ..., n}, if i < j then v_i <^A v_j and w_i <^A w_j. Then

(v_1, w_1) ∈ S^B,
(v_n, w_n) ∈ T^B,
(v_i, w_i) ∈ R^B for all i ∈ {2, ..., n−1},                    (1)
(v_i, v_{i+1}), (w_i, w_{i+1}) ∈ N^B for all i ∈ {1, ..., n−1}.

It follows that B satisfies Φ and therefore A |= Ψ.

For the converse direction, suppose that A |= Ψ. Clearly, w_A ∈ a^∗b^∗ because A |= Ψ_2. Moreover, since A |= Ψ_1 we have that B |= Φ, and hence there exist n ∈ N and elements v_1, ..., v_n, w_1, ..., w_n ∈ A such that B satisfies (1). We first prove that P_a^A = {v_1, ..., v_n} and |P_a^A| = n. Since (v_n, w_n) ∈ T^B we have φ_T(v_n, w_n) and hence v_n ∈ P_a^A. Since B |= N(v_1, v_2), ..., N(v_{n−1}, v_n) we have that v_1 < v_2 < · · · < v_{n−1} < v_n holds in A and it also follows that |P_a^A| = n. Then for every i ∈ {1, ..., n} we have that v_i ∈ P_a^A because v_i ≤ v_n, v_n ∈ P_a^A, and w_A ∈ a^∗b^∗. Now suppose for contradiction that there exists x ∈ P_a^A \ {v_1, ..., v_n}; choose x largest with respect to <^A. Since (v_n, w_n) ∈ T^B and x ∈ P_a^A we must have x ≤ v_n, and hence x < v_n since x ∉ {v_1, ..., v_n}. Then there exists y ∈ A such that φ_N(x, y) holds in A. Since y ≤ v_n, v_n ∈ P_a^A, and w_A ∈ a^∗b^∗, we must have y ∈ P_a^A. By the maximal choice of x we get that y = v_i for some i ∈ {1, ..., n}. But then φ_N(x, v_i) implies that x ∈ {v_1, ..., v_{n−1}}, a contradiction. Similarly, one can prove that P_b^A = {w_1, ..., w_n} and that |P_b^A| = n. This implies that w_A = a^n b^n.

We finally have to prove that C is in GSO. Let Φ be the GSO {S, T, R, N}-sentence with existentially quantified unary relations V, W, and existentially quantified binary relations R′ ⊆ R and N′ ⊆ N, which states that

– there are elements v_1, v_n ∈ V and w_1, w_n ∈ W such that S(v_1, w_1) and T(v_n, w_n) hold;

– for every x ∈ V \ {v_1} there exists a unique element y ∈ V \ {v_n} such that N′(y, x) holds;

– for every x ∈ V \ {v_n} there exists a unique element y ∈ V \ {v_1} such that N′(x, y) holds;

– for every x ∈ W \ {w_1} there exists a unique element y ∈ W \ {w_n} such that N′(y, x) holds;

– for every x ∈ W \ {w_n} there exists a unique element y ∈ W \ {w_1} such that N′(x, y) holds;

– for all v ∈ V and w ∈ W we have that N′(v_1, v) ∧ N′(w_1, w) implies R′(v, w);

– for all v, v′ ∈ V \ {v_1, v_n} and w, w′ ∈ W \ {w_1, w_n} we have that R′(v, w) ∧ N′(v, v′) ∧ N′(w, w′) implies R′(v′, w′);

– for all v ∈ V and w ∈ W we have that N′(v, v_n) ∧ N′(w, w_n) implies R′(v, w).

Then Φ holds on a finite {S, T, R, N}-structure B if and only if B has elements v_1, ..., v_n, w_1, ..., w_n satisfying (1), which is the case if and only if B ∈ C.

4 Homomorphism-Closed GSO

We prove that the class of finite models of a GSO sentence is a finite union of CSPs of ω-categorical structures whenever its complement is closed under homomorphisms. In particular, every CSP in GSO (and therefore every CSP in MSO) is the CSP of an ω-categorical structure. CSPs that can be formulated as the CSP of an ω-categorical structure have been characterised [8]; this characterisation will be recalled in the next section.

4.1 CSPs for Countably Categorical Structures

By the theorem of Ryll-Nardzewski, a countable structure B is ω-categorical if and only if for every n ∈ N there are finitely many orbits of the componentwise action of the automorphism group of B on B^n (see, e.g., [21]). We now present a condition that characterises classes of structures that are CSPs of ω-categorical structures. Let C be a class of finite τ-structures. Let Λ_n be the class of primitive positive τ-formulas with free variables x_1, ..., x_n whose canonical database is in C. We define ∼^C_n to be the equivalence relation on Λ_n such that φ_1 ∼^C_n φ_2 holds if for all primitive positive τ-formulas ψ(x_1, ..., x_n) we have that the canonical database of φ_1(x_1, ..., x_n) ∧ ψ(x_1, ..., x_n) belongs to C if and only if the canonical database of φ_2(x_1, ..., x_n) ∧ ψ(x_1, ..., x_n) belongs to C.

The index of an equivalence relation is the number of its equivalence classes.

Theorem 4 (Bodirsky, Hils, Martin [8]). Let C be a constraint satisfaction problem. Then there is an ω-categorical structure B such that C = CSP(B) iff ∼^C_n has finite index for all n.

Moreover, the structure B can be chosen so that for all n ∈ N the orbits of the componentwise action of the automorphism group of B on B^n are primitively positively definable in B.

Example 5. The structure B_1 := (Z; <) is not ω-categorical. However, ∼^CSP(B_1)_n has finite index for all n, and indeed CSP(Z; <) = CSP(Q; <) and (Q; <) is ω-categorical. On the other hand, for B_2 := (Z; Succ) we have that the index of ∼^CSP(B_2)_2 is infinite, and it follows that there is no ω-categorical structure B such that CSP(B_2) = CSP(B); see [5].

A rich source of ω-categorical structures are structures with finite relational signature that are homogeneous, i.e., every isomorphism between finite substructures can be extended to an automorphism. There are uncountably many countable homogeneous digraphs with pairwise distinct CSP, and it follows that there are homogeneous digraphs with undecidable CSPs. A structure B is called finitely bounded if there exists a finite set F of finite structures such that a finite structure A embeds into B if and only if no structure in F embeds into A.

It is well-known that if a structure is ω-categorical, then all of its reducts are ω-categorical as well [21]. Moreover, it is easy to see that the CSP of reducts of finitely bounded structures is in NP. It has been conjectured that the CSP of reducts of finitely bounded homogeneous structures is in P or NP-complete [10]; this conjecture generalises the finite-domain complexity dichotomy that was conjectured by Feder and Vardi [18] and proved by Bulatov [13] and by Zhuk [25].


4.2 Quantifier Rank

In this section, it will be convenient to work with signatures that also contain constant symbols.

The quantifier rank of a second-order τ-formula Φ is the maximal number of nested (first-order or second-order) quantifiers in Φ; for this definition, we view Φ as a second-order sentence with guarded semantics, just as in [4]. If A and B are τ-structures and q ∈ N we write A ≡^GSO_q B if A and B satisfy the same GSO τ-sentences of quantifier rank at most q.

Lemma 6 (Proposition 3.3 in [4]). Let q ∈ N and τ be a finite signature with relation and constant symbols. Then ≡^GSO_q is an equivalence relation with finite index on the class of all finite τ-structures. Moreover, every equivalence class of ≡^GSO_q can be defined by a single GSO sentence with quantifier rank q. The analogous statements hold for MSO as well.

If A is a τ-structure and ā is a k-tuple of elements of A, then we write (A, ā) for a τ ∪ {c_1, ..., c_k}-structure expanding A where c_1, ..., c_k denote fresh constant symbols being mapped to the corresponding entries of ā. If A and B are τ-structures and ā ∈ A^k, b̄ ∈ B^k, then when writing (A, ā) ≡^GSO_q (B, b̄) we implicitly assume that we have chosen the same constant symbols for ā and for b̄.

Lemma 7 (Proposition 3.4 in [4]). Let q ∈ N and let A and B be τ-structures. Then A ≡^GSO_{q+1} B if and only if the following properties hold:

– (first-order forth) For every a ∈ A, there exists b ∈ B such that (A, a) ≡^GSO_q (B, b).

– (first-order back) For every b ∈ B, there exists a ∈ A such that (A, a) ≡^GSO_q (B, b).

– (second-order forth) For every expansion A′ of A by a guarded relation, there exists an expansion B′ of B by a guarded relation such that A′ ≡^GSO_q B′.

– (second-order back) For every expansion B′ of B by a guarded relation, there exists an expansion A′ of A by a guarded relation such that A′ ≡^GSO_q B′.

In the following,τ denotes a finite relational signature.

Definition 8. Let ρ := {c_1, ..., c_n} be a finite set of constant symbols. Then D_n is defined to be the set of all pairs (A, B) of finite (τ ∪ ρ)-structures such that

– c^A = c^B for all constant symbols c ∈ ρ;

– {c_1^A, ..., c_n^A} = A ∩ B = {c_1^B, ..., c_n^B}.

We write A ⊎ B for the structure with domain A ∪ B such that R^{A⊎B} := R^A ∪ R^B for each relation symbol R ∈ τ and c^{A⊎B} = c^A = c^B for each constant symbol c ∈ ρ.
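The operation ⊎ of Definition 8 glues two structures along their shared constants. A minimal sketch, assuming a structure is encoded as a triple (domain set, dictionary of relations, dictionary of constants); the encoding and the function name `glue` are hypothetical.

```python
def glue(struct_a, struct_b):
    """Form the structure A ⊎ B for a pair (A, B) in D_n.

    Each structure is a triple (domain, relations, constants), where relations
    maps relation names to sets of tuples and constants maps names to elements.
    """
    dom_a, rel_a, const_a = struct_a
    dom_b, rel_b, const_b = struct_b
    # Membership in D_n: same constant interpretation, overlap exactly the constants.
    if const_a != const_b:
        raise ValueError("constants must be interpreted identically")
    if dom_a & dom_b != set(const_a.values()):
        raise ValueError("the domains must intersect exactly in the constants")
    relations = {
        name: rel_a.get(name, set()) | rel_b.get(name, set())
        for name in rel_a.keys() | rel_b.keys()
    }
    return dom_a | dom_b, relations, dict(const_a)
```

For n = 0 there are no constants, the domains must be disjoint, and the operation is the usual disjoint union.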

The following theorem in the special case of n= 0 is Proposition 4.1 in [4].

Theorem 9. Let q, n, r, s ∈ N, let (A_1, B_1), (A_2, B_2) ∈ D_n, and let ā_1 ∈ (A_1)^r, ā_2 ∈ (A_2)^r, b̄_1 ∈ (B_1)^s, b̄_2 ∈ (B_2)^s be such that (A_1, ā_1) ≡^GSO_q (A_2, ā_2) and (B_1, b̄_1) ≡^GSO_q (B_2, b̄_2). Then (A_1 ⊎ B_1, ā_1, b̄_1) ≡^GSO_q (A_2 ⊎ B_2, ā_2, b̄_2).

Proof. Our proof is by induction on q. Every quantifier-free formula is a Boolean combination of atomic formulas, so for q = 0 it suffices to consider atomic formulas φ. By symmetry, it suffices to show that if (A_1 ⊎ B_1, ā_1, b̄_1) |= φ then (A_2 ⊎ B_2, ā_2, b̄_2) |= φ. Then φ is built using a relation symbol R ∈ τ, and the tuple that witnesses the truth of φ in A_1 ⊎ B_1 must be from R^{A_1} or from R^{B_1}, by the definition of A_1 ⊎ B_1. We first consider the former case; the latter case can be treated similarly. If a constant that appears in φ is from A_1 ∩ B_1, then by the definition of D_n this element is denoted by a constant symbol c ∈ ρ, and therefore we may assume without loss of generality that φ is a formula over the signature of (A_1, ā_1). Hence, (A_1, ā_1) |= φ and by assumption (A_2, ā_2) |= φ. This in turn implies that (A_2 ⊎ B_2, ā_2, b̄_2) |= φ.

For the inductive step, suppose that the claim holds for q, and that (A_1, ā_1) ≡^GSO_{q+1} (A_2, ā_2) and (B_1, b̄_1) ≡^GSO_{q+1} (B_2, b̄_2). By symmetry and Lemma 7 it suffices to verify the properties (first-order forth) and (second-order forth). Let c_1 ∈ A_1 ∪ B_1. We may assume that c_1 ∈ A_1; the case that c_1 ∈ B_1 can be shown similarly. By Lemma 7, there exists c_2 ∈ A_2 such that (A_1, ā_1, c_1) ≡^GSO_q (A_2, ā_2, c_2). By the inductive assumption, this implies that

(A_1 ⊎ B_1, ā_1, c_1, b̄_1) ≡^GSO_q (A_2 ⊎ B_2, ā_2, c_2, b̄_2)

and concludes the proof of (first-order forth).

Now let R be a guarded relation of A_1 ⊎ B_1 of arity k. Let A′_1 be the expansion of A_1 by the guarded relation R ∩ A_1^k, and B′_1 be the expansion of B_1 by the guarded relation R ∩ B_1^k. By Lemma 7 there are expansions A′_2 of A_2 and B′_2 of B_2 by guarded relations such that (A′_1, ā_1) ≡^GSO_q (A′_2, ā_2) and (B′_1, b̄_1) ≡^GSO_q (B′_2, b̄_2). By the inductive assumption, this implies that (A′_1 ⊎ B′_1, ā_1, b̄_1) ≡^GSO_q (A′_2 ⊎ B′_2, ā_2, b̄_2), which completes the proof of (second-order forth).

Corollary 10. Let C be a CSP that can be expressed in GSO. Then there exists a countable ω-categorical structure B such that C = CSP(B).

Proof. Let τ be the signature of C, and let Φ be a GSO τ-formula with quantifier rank q such that C = ⟦Φ⟧. By Theorem 4 it suffices to show that the equivalence relation ∼^C_n has finite index for every n ∈ N. Let ρ := {c_1, ..., c_n} be a set of new constant symbols. By Lemma 6, there exists an m ∈ N such that ≡^GSO_q has m equivalence classes on (τ ∪ ρ)-structures. If φ(x_1, ..., x_n) is a primitive positive τ-formula, then define S_φ to be the (τ ∪ ρ)-structure obtained from the canonical database of φ by setting c_i^{S_φ} := x_i for all i ∈ {1, ..., n}.

We claim that if S_φ ≡^GSO_q S_ψ, then φ ∼^C_n ψ. Let θ(x_1, ..., x_n) be a primitive positive τ-formula; we may assume that the existentially quantified variables of θ are disjoint from the existentially quantified variables of φ and of ψ, so that (S_φ, S_θ), (S_ψ, S_θ) ∈ D_n. Since S_φ ≡^GSO_q S_ψ and S_θ ≡^GSO_q S_θ, we have S_φ ⊎ S_θ ≡^GSO_q S_ψ ⊎ S_θ by Theorem 9. This implies that the canonical database of φ ∧ θ satisfies Φ if and only if the canonical database of ψ ∧ θ does, which proves the claim.

The claim implies that ∼^C_n has at most m equivalence classes, concluding the proof.

Example 11. Let Φ be the following MSO sentence.

∀X ∃x : X(x) ∧ ∃x, y ∈ X ∀z ∈ X (¬E(x, z) ∨ ¬E(y, z))

It is easy to see that ⟦Φ⟧ is closed under disjoint unions and that its complement is closed under homomorphisms. Corollary 10 implies that there exists a countable ω-categorical structure B with CSP(B) = ⟦Φ⟧.

Example 12. For k ≥ 2, let Ψ_k be the following MSO sentence.

∀X ¬ ∀x, y (E(x, y) ⇒ E(y, x)) ∧ ∀x : ¬E(x, x)
  ∧ ∃x : x ∈ X ∧ ∀x, y, z ∈ X : ¬(E(x, y) ∧ E(y, z) ∧ E(z, x))
  ∧ ∀x̄, ȳ ∈ X ( ⋀_{i,j≤k} ¬E(x_i, x_j) ⇒ ∃z ∈ X ⋀_{i≤k} E(x_i, z) ∧ ⋀_{i≤ℓ} ¬E(y_i, z) )

The formula states that a given graph is symmetric and without loops, and does not contain non-empty triangle-free graphs that satisfy the triangle-free k-point extension property. Starting from k ≥ 5, it is not known whether Ψ_k describes the class of all graphs; see [14]. It is easy to check that ⟦Ψ_k⟧ is closed under disjoint unions and that its complement is closed under homomorphisms, and again the assumptions of Corollary 10 hold.

4.3 Finite Unions of CSPs

In this section we prove that every class in GSO whose complement is closed under homomorphisms is a finite union of CSPs (Lemma 17); the statement announced at the beginning of Section 4 then follows (Corollary 18). Throughout this section, let C be a non-empty class of finite τ-structures whose complement is closed under homomorphisms. In particular, C contains the structure I with only one element where all relations are empty.

Let ∼ be the equivalence relation defined on C by letting A ∼ B if for every C ∈ C we have A ⊎ C ∈ C if and only if B ⊎ C ∈ C; here ⊎ denotes the usual disjoint union of structures, which is a special case of Definition 8 for n = 0. Note that the equivalence classes of ∼ are in one-to-one correspondence to the equivalence classes of ∼^C_0. Also note that C is closed under disjoint unions if and only if ∼ has only one equivalence class.

Remark 13. Since the complement of C is closed under homomorphisms, C is closed under disjoint unions if and only if it has the joint homomorphism property: if A ∈ C and B ∈ C, then there exists a C ∈ C such that A and B map homomorphically to C. The joint homomorphism property is undecidable even if C is given by a universal first-order sentence φ (i.e., C = ⟦φ⟧) [12].

If A ∈ C, then we write [A] for the equivalence class of A with respect to ∼. The following observations are immediate consequences of the definitions:

1. [A] is closed under disjoint unions.

2. ∼-equivalence classes are closed under homomorphic equivalence.

3. A ∈ [I] if and only if A ⊎ B ∈ C for all B ∈ C.

Lemma 14. Let A ∈ C and let D be the smallest subclass of C that contains [A] and whose complement is closed under homomorphisms. Then

1. D is a union of equivalence classes of ∼, and

2. if ∼ has more than one equivalence class, then C \ D is non-empty.

Proof. Let C ∈ D, let B be a finite structure with a homomorphism to C, and let B′ ∈ [B]. Since B ⊎ C and C are homomorphically equivalent, we have that B ⊎ C ∼ C. We claim that B′ ⊎ C ∼ C. To see this, let D ∈ C. Then

C ⊎ D ∈ C ⇔ (B ⊎ C) ⊎ D ∈ C    (since B ⊎ C ∼ C)
          ⇔ B ⊎ (C ⊎ D) ∈ C
          ⇔ B′ ⊎ (C ⊎ D) ∈ C   (since B ∼ B′)
          ⇔ (B′ ⊎ C) ⊎ D ∈ C

which shows the claim. So B′ ⊎ C ∈ [C] = [A]. Since B′ has a homomorphism to B′ ⊎ C we obtain that B′ ∈ D; this proves the first statement.

To prove the second statement, first observe that the statement is clear if A ∈ [I]: the complement of [I] is closed under homomorphisms, so that D = [I], and the statement then follows from the assumption that ∼ has more than one equivalence class. Otherwise, if A ∉ [I], then there exists a structure B ∈ C such that A ⊎ B ∉ C. Then B ∈ C \ D can be shown indirectly as follows: otherwise B would have a homomorphism to a structure A′ ∈ [A]. Since B ⊎ A′ is homomorphically equivalent to A′, we have B ⊎ A′ ∼ A′ ∼ A and in particular B ⊎ A′ ∈ C. But B ⊎ A′ ∈ C if and only if B ⊎ A ∈ C since A ∼ A′. This is in contradiction to our assumption on B.

Example 15. We consider a signature τ := {R_1, R_2, R_3} of unary relation symbols. Define for every i ∈ {1, 2, 3} the τ-structure S_i to be a one-element structure where R_i is non-empty and R_j, for j ≠ i, is empty. Let

C := CSP(S_1 ⊎ S_2) ∪ CSP(S_2 ⊎ S_3) ∪ CSP(S_3 ⊎ S_1).

Clearly, the complement of C is closed under homomorphisms. The equivalence classes of ∼ can be described as follows. For distinct i, j ∈ {1, 2, 3},

[S_i ⊎ S_j] = CSP(S_i ⊎ S_j) \ (CSP(S_i) ∪ CSP(S_j))
[S_i] = CSP(S_i) \ [I]
[I] = CSP(I).
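The classes listed in Example 15 can be checked mechanically. The following Python sketch is an illustration only and not part of the original development. It assumes that, for a purely unary signature, the existence of a homomorphism depends only on the set of element types occurring in a structure (the type of an element is the set of indices i with the element in R_i), so a structure is represented by its set of types and disjoint union becomes set union. All names in the code are hypothetical.

```python
from itertools import chain, combinations

COLOURS = (1, 2, 3)


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))


# Assumption: a structure is abstracted to the set of its element types.
TYPES = [frozenset(s) for s in subsets(COLOURS)]

# The templates S_i ⊎ S_j, given by the types of their two elements.
TEMPLATES = [frozenset({frozenset({i}), frozenset({j})})
             for i, j in combinations(COLOURS, 2)]


def maps_to(structure, template):
    """Homomorphism test for unary signatures: every element type of the
    structure must be contained in the type of some template element."""
    return all(any(t <= s for s in template) for t in structure)


def in_class(structure):
    """Membership in C, the union of the three CSPs."""
    return any(maps_to(structure, tpl) for tpl in TEMPLATES)


# Non-empty structures up to their set of types, and those lying in C.
STRUCTURES = [frozenset(s) for s in subsets(TYPES) if s]
MEMBERS = [a for a in STRUCTURES if in_class(a)]


def profile(a):
    """All C in the class such that the disjoint union of a and C stays in
    the class; a ~ b holds exactly when the profiles agree."""
    return frozenset(c for c in MEMBERS if in_class(a | c))


def equivalence_classes():
    classes = {}
    for a in MEMBERS:
        classes.setdefault(profile(a), []).append(a)
    return list(classes.values())


if __name__ == "__main__":
    for cls in equivalence_classes():
        print(sorted(sorted(map(sorted, a)) for a in cls))
```

Under these assumptions the script groups the members of C into seven classes, matching [I], the three classes [S_i], and the three classes [S_i ⊎ S_j].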


Let q ∈ N and let τ be a finite relational signature. Recall that Lemma 6 asserts that the equivalence relation ≡^GSO_q on the class of finite τ-structures has finitely many equivalence classes C_1, . . . , C_m, and that each of the equivalence classes C_i can be defined by a single GSO τ-sentence Ψ_i with quantifier rank q; we write T_q^τ := {Ψ_1, . . . , Ψ_m} for this set of GSO sentences. If Φ is a GSO τ-sentence of quantifier rank q, let Φ_1, . . . , Φ_n, for some n ≤ m, be the set of all sentences in T_q^τ that imply Φ. Then n is called the degree of Φ. Let ∼ be the equivalence relation defined in the beginning of this section for the class C := ⟦Φ⟧.

Lemma 16. Let Φ be a GSO τ-sentence of quantifier rank q. Then for every class D ⊆ ⟦Φ⟧ which is a union of ∼-classes there exists a GSO sentence Ψ of quantifier rank q such that D = ⟦Ψ⟧.

Proof. Let Φ_1, . . . , Φ_n be all the formulas in T_q^τ that imply Φ. As in the proof of Corollary 10 one can use Theorem 9 to show for all finite τ-structures A, B that if A ≡^GSO_q B, then A ∼ B. It follows that there exists I ⊆ {1, . . . , n} such that D = ⋃_{i∈I} ⟦Φ_i⟧, and ⋁_{i∈I} Φ_i is the desired GSO sentence.

Lemma 17. Let Φ be a GSO τ-sentence such that the complement of ⟦Φ⟧ is closed under homomorphisms. Then there are GSO sentences Φ_1, . . . , Φ_m each of which describes a CSP such that Φ is equivalent to Φ_1 ∨ · · · ∨ Φ_m. If Φ is an MSO sentence, then Φ_1, . . . , Φ_m can be chosen to be MSO sentences as well.

Proof. Let q be the quantifier rank of Φ. We prove the statement by induction on the degree n of Φ. Note that Lemma 16 implies that ∼ has at most n equivalence classes on τ-structures. Hence, if n = 1, then ⟦Φ⟧ is closed under disjoint unions, and we are done.

Let A_1, . . . , A_s be τ-structures such that {[A_1], . . . , [A_s]} is the set of all equivalence classes of ∼ that are distinct from [I]. Let C_i be the smallest subclass of ⟦Φ⟧ that contains [A_i] and whose complement is closed under homomorphisms. Note that ⟦Φ⟧ = ⋃_{i≤s} C_i since [I] is contained in C_i for all i ≤ s. By Lemma 14 (1), each class C_i is a union of ∼-classes, and by Lemma 16 it follows that there exists a GSO sentence Ψ_i of quantifier rank q such that C_i = ⟦Ψ_i⟧. Hence, Φ is equivalent to ⋁_{i≤s} Ψ_i. Lemma 14 (2) asserts that ⟦Φ⟧ \ C_i is non-empty, and hence the degree of Ψ_i must be strictly smaller than n. The statement now follows from the inductive assumption. The same argument applies to MSO as well.

Lemma 17 together with Corollary 10 implies the following.

Corollary 18. Every GSO sentence which is closed under homomorphisms is equivalent to a finite conjunction of GSO sentences each of which describes the complement of a CSP of an ω-categorical structure. The analogous statement holds for MSO.

Not every class of structures that can be expressed in Second-order Logic is a finite intersection of complements of CSPs. We even have an example of a class of finite τ-structures that can be expressed in Datalog but cannot be written in this form.

Example 19. Let S and T be unary, and let R be a binary relation symbol. Let C be the class of all finite {S, T, R}-structures A such that the following Datalog program Π with the binary IDB E derives goal on A.

E(x, y) :− S(x), S(y)
E(x, y) :− E(x′, y′), R(x′, x), R(y′, y)
goal :− T(x), E(x, x′), R(x′, y)

For n ∈ N, let P_n be the {S, T, R}-structure on the domain {1, . . . , n} with

S^{P_n} := {1},   T^{P_n} := {n},   R^{P_n} := {(i, i+1) | i ∈ {1, . . . , n−1}}.

It is easy to see that each of the structures in {P_n | n ≥ 1} is not contained in C, and that the disjoint union of P_i and P_j, for i ≠ j, is contained in C. It follows that C is not a finite intersection of complements of CSPs (and, by Corollary 18, cannot be expressed in GSO).
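The two observations of Example 19 can be tested on small instances. The Python sketch below is an illustration under stated assumptions: structures are encoded as dictionaries with keys "dom", "S", "T", "R", elements of P_n are tagged pairs so that disjoint unions are easy to form, and the program Π is evaluated by a naive fixpoint iteration. The encoding and all function names are hypothetical.

```python
def path(n, tag):
    """The structure P_n, with elements (tag, 1), ..., (tag, n)."""
    return {
        "dom": {(tag, i) for i in range(1, n + 1)},
        "S": {(tag, 1)},
        "T": {(tag, n)},
        "R": {((tag, i), (tag, i + 1)) for i in range(1, n)},
    }


def disjoint_union(a, b):
    """Disjoint union; assumes the two structures use different tags."""
    return {key: a[key] | b[key] for key in ("dom", "S", "T", "R")}


def derives_goal(a):
    """Naive bottom-up evaluation of the program from Example 19."""
    # E(x, y) :- S(x), S(y)
    e = {(x, y) for x in a["S"] for y in a["S"]}
    changed = True
    while changed:
        # E(x, y) :- E(x', y'), R(x', x), R(y', y)
        new = {(x, y)
               for (x0, y0) in e
               for (p, x) in a["R"] if p == x0
               for (q, y) in a["R"] if q == y0}
        changed = not new <= e
        e |= new
    # goal :- T(x), E(x, x'), R(x', y)
    has_successor = {p for (p, _) in a["R"]}
    return any(x in a["T"] and x1 in has_successor for (x, x1) in e)


if __name__ == "__main__":
    print(derives_goal(path(3, "a")))
    print(derives_goal(disjoint_union(path(2, "a"), path(3, "b"))))
```

On a single path the relation E only ever relates elements at the same distance from the start, so the element in T is paired with an element without R-successor; in the union of two paths of different lengths the shorter path's last element is paired with an inner element of the longer one.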


5 A coNP-complete CSP in MSO

In this section we show that the class of CSPs in MSO is (under complexity-theoretic assumptions) larger than the class of CSPs for reducts of finitely bounded structures (see Section 4.1). Let T = {T_2, T_3, . . .} be the set of Henson tournaments: the tournament T_n, for n ≥ 2, has vertices 0, 1, . . . , n+1 and the following edges:

– (i, i+1) for i ∈ {0, . . . , n};

– (0, n+1);

– (j, i) for i+1 < j and (i, j) ≠ (0, n+1).
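The three edge rules above translate directly into code. The following Python sketch is an illustration only; the function names are hypothetical. It builds T_n as a set of vertices and a set of directed edges and checks that the result is a tournament, i.e., loopless with exactly one edge between any two distinct vertices.

```python
def henson_tournament(n):
    """Vertices and edges of T_n, following the three rules in the text."""
    if n < 2:
        raise ValueError("T_n is only defined for n >= 2")
    vertices = set(range(n + 2))
    # (i, i+1) for i in {0, ..., n}
    edges = {(i, i + 1) for i in range(n + 1)}
    # (0, n+1)
    edges.add((0, n + 1))
    # (j, i) for i+1 < j and (i, j) != (0, n+1)
    edges |= {(j, i)
              for i in vertices for j in vertices
              if i + 1 < j and (i, j) != (0, n + 1)}
    return vertices, edges


def is_tournament(vertices, edges):
    """Loopless, and exactly one edge between any two distinct vertices."""
    for u in vertices:
        if (u, u) in edges:
            return False
        for v in vertices:
            if u < v and ((u, v) in edges) == ((v, u) in edges):
                return False
    return True


if __name__ == "__main__":
    for n in range(2, 6):
        vertices, edges = henson_tournament(n)
        print(n, len(vertices), len(edges), is_tournament(vertices, edges))
```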

The class C of all finite loopless digraphs that do not embed any of the digraphs from T is an amalgamation class, and hence there exists a homogeneous structure H with age C. It has been shown in [7] that CSP(H) is coNP-complete.

Proposition 20. CSP(H)can be expressed in MSO.

Proof. We have to find an MSO sentence that holds on a given digraph (V;E) if and only if (V;E) does not embed any of the tournaments from T. We specify an MSO {X, E}-sentence Φ, for a unary relation symbol X, that is true on a finite {X, E}-structure S if and only if (X^S; E^S) is isomorphic to T_n, for some n ≥ 2. In Φ we existentially quantify over

– two vertices s, t ∈ X (that stand for the vertex 0 and the vertex n+1 in T_n);

– a partition of X \ {s} into two sets A and B (they stand for the set of even and the set of odd numbers in {1, . . . , n+1}).

The formula Φ has the following conjuncts:

1. a first-order formula that states that E defines a tournament on X;

2. a first-order formula that expresses that E is a linear order on A with maximal element a;

3. a first-order formula that expresses that E is a linear order on B with maximal element b;

4. E(s, t), E(s, a), E(a, b), and E(x, s) for all x ∈ X \ {a, t};

5. a first-order formula that states that if there is an edge from an element x ∈ A to an element y ∈ B then there is precisely one element z ∈ A such that (y, z), (z, x) ∈ E, unless y = t;

6. a first-order formula that states that if there is an edge from an element x ∈ B to an element y ∈ A then there is precisely one element z ∈ B such that (y, z), (z, x) ∈ E, unless y = t.

We claim that the MSO sentence ∀x: ¬E(x, x) ∧ ∀X: ¬Φ holds on a finite digraph if and only if the digraph is loopless and does not embed T_n, for all n ≥ 2. The forwards implication easily follows from the observation that if (X;T) is isomorphic to T_n, for some n ≥ 2, then Φ holds; this is straightforward from the construction of Φ (and the explanations above given in brackets).

Conversely, suppose that Φ holds. Then (X;T) is a tournament. We construct an isomorphism f from (X;T) to T_{|X|−1} as follows. Define f(s) := 0, f(a) := 1, and f(b) := 2. Since E(a, b), by item 5 there exists exactly one a′ ∈ A such that E(b, a′) and E(a′, a). Define f(a′) := 3. If a′ = t then we have found an isomorphism with T_2. Otherwise, the partial map f defined so far is an embedding into T_n for some n ≥ 3. Item 6 and E(b, a′) imply that there exists exactly one b′ ∈ B such that E(a′, b′) and E(b′, b), and we define f(b′) := 4. Continuing in this manner, we eventually define f on all of X and find an isomorphism with T_{|X|−1}.

This shows that, unless NP = coNP, CSP(H) cannot be expressed as CSP(B) for a reduct B of a finitely bounded structure, since such CSPs are in NP. We do not know how to show this statement without complexity-theoretic assumptions, even if we just want to rule out that CSP(H) can be expressed as CSP(B) for some reduct of a finitely bounded homogeneous structure.


6 Canonical Datalog Programs

A remarkable fact about the expressive power of Datalog for constraint satisfaction problems over finite domains is the existence of canonical Datalog programs [18]; this has been generalised to CSPs for ω-categorical structures.

Theorem 21 (Bodirsky and Dalmau [6]). Let B be a countable ω-categorical τ-structure. Then for all ℓ, k ∈ N there exists a canonical Datalog program Π of width (ℓ, k) for the complement of CSP(B). Moreover, for every finite τ-structure A the following are equivalent:

– Π derives goal on A;

– Spoiler has a winning strategy for the existential (ℓ, k)-pebble game on (A, B).
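To give a feeling for the game in Theorem 21, the following Python sketch decides the winner of a simplified variant. It is an illustration under explicit assumptions and not the game of the theorem itself: the template B is finite, the signature consists of a single binary relation, and there is only one parameter k (the number of pebbles) instead of the pair (ℓ, k). The sketch computes the largest family of partial homomorphisms with at most k elements in the domain that is closed under restrictions and has the extension property; Duplicator wins exactly when this family is non-empty. All function names are hypothetical.

```python
from itertools import combinations, product


def partial_homs(a_dom, a_edges, b_dom, b_edges, k):
    """All partial homomorphisms from A to B with at most k domain elements,
    each encoded as a frozenset of (element, image) pairs."""
    homs = set()
    for size in range(k + 1):
        for dom in combinations(sorted(a_dom), size):
            for img in product(sorted(b_dom), repeat=size):
                h = dict(zip(dom, img))
                if all((h[u], h[v]) in b_edges
                       for (u, v) in a_edges if u in h and v in h):
                    homs.add(frozenset(h.items()))
    return homs


def spoiler_wins(a_dom, a_edges, b_dom, b_edges, k):
    """True if no non-empty winning family for Duplicator exists."""
    family = partial_homs(a_dom, a_edges, b_dom, b_edges, k)
    changed = True
    while changed:
        changed = False
        for h in list(family):
            dom = {a for (a, _) in h}
            # Closure under removing one pebble.
            restrictions_ok = all(h - {pair} in family for pair in h)
            # If a pebble is left, every new element needs some answer.
            extensions_ok = len(h) == k or all(
                a in dom or any(h | {(a, b)} in family for b in b_dom)
                for a in a_dom)
            if not (restrictions_ok and extensions_ok):
                family.discard(h)
                changed = True
    return frozenset() not in family


def cycle(n):
    """The undirected cycle on n vertices as a symmetric digraph."""
    edges = {(i, (i + 1) % n) for i in range(n)}
    return set(range(n)), edges | {(v, u) for (u, v) in edges}


K2 = ({0, 1}, {(0, 1), (1, 0)})

if __name__ == "__main__":
    print(spoiler_wins(*cycle(3), *K2, 2))
    print(spoiler_wins(*cycle(3), *K2, 3))
```

With two pebbles Duplicator survives on the triangle against K2, since any two vertices can be mapped to distinct vertices; with three pebbles Spoiler covers the whole triangle and wins.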

We later need the following well-known fact.

Lemma 22. If C_1 and C_2 are in Datalog, then so are C_1 ∪ C_2 and C_1 ∩ C_2. If Π_1 and Π_2 are Datalog programs of width (ℓ, k), then there is a Datalog program Π of width (ℓ, k) for ⟦Π_1⟧ ∪ ⟦Π_2⟧ and for ⟦Π_1⟧ ∩ ⟦Π_2⟧.

Proof. For union, let Π be obtained by taking the union of the rules of Π_1 and of Π_2, possibly after renaming IDB predicate names to make them disjoint except for goal. For intersection, we proceed similarly, but we first rename the symbol goal in Π_1 to goal_1 and the symbol goal in Π_2 to goal_2. Finally we add the new rule goal :− goal_1, goal_2 to the union of Π_1 and Π_2. It is clear that these constructions preserve the width.
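The two constructions in the proof of Lemma 22 are purely syntactic, as the following Python sketch illustrates. The representation is an assumption made for the example: a program is a list of rules, a rule is a head atom with a tuple of body atoms, all arguments are variables, and IDBs are the predicates that occur in some head. Renaming appends the hypothetical suffixes _1 and _2, which are assumed not to clash with existing predicate names.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    pred: str
    args: tuple


class Rule(NamedTuple):
    head: Atom
    body: tuple


def idbs(program):
    """IDB predicates: those occurring in the head of some rule."""
    return {rule.head.pred for rule in program}


def rename(program, suffix, keep=frozenset()):
    """Append suffix to every IDB name that is not listed in keep."""
    names = idbs(program) - set(keep)

    def fix(atom):
        return Atom(atom.pred + suffix, atom.args) if atom.pred in names else atom

    return [Rule(fix(r.head), tuple(fix(a) for a in r.body)) for r in program]


def union(p1, p2):
    """Rules of both programs; IDBs made disjoint except for goal."""
    return rename(p1, "_1", keep={"goal"}) + rename(p2, "_2", keep={"goal"})


def intersection(p1, p2):
    """Rename goal to goal_1 and goal_2, then add goal :- goal_1, goal_2."""
    glue = Rule(Atom("goal", ()), (Atom("goal_1", ()), Atom("goal_2", ())))
    return rename(p1, "_1") + rename(p2, "_2") + [glue]


def width(program):
    """(maximal IDB arity, maximal number of variables per rule)."""
    ell = max(len(r.head.args) for r in program)
    k = max(len({v for atom in (r.head, *r.body) for v in atom.args})
            for r in program)
    return ell, k


if __name__ == "__main__":
    p = [Rule(Atom("E", ("x", "y")), (Atom("R", ("x", "y")),)),
         Rule(Atom("goal", ()), (Atom("E", ("x", "x")),))]
    print(width(p), width(union(p, p)), width(intersection(p, p)))
```

The added rule has a head of arity 0 and no variables, which is why it cannot increase either component of the width.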

Theorem 23. Let Φ be a GSO sentence such that ⟦Φ⟧ is closed under homomorphisms. Let ℓ, k ∈ N. Then there exists a canonical Datalog program Π of width (ℓ, k) for ⟦Φ⟧.

Proof. By Corollary 18 there are GSO sentences Φ_1, . . . , Φ_m and ω-categorical structures B_1, . . . , B_m such that Φ is equivalent to Φ_1 ∧ · · · ∧ Φ_m and ⟦¬Φ_i⟧ = CSP(B_i). Let Π_i be the canonical Datalog program of width (ℓ, k) for the complement of CSP(B_i), which exists by Theorem 21. Then Lemma 22 implies that there exists a Datalog program Π of width (ℓ, k) such that ⟦Π⟧ = ⟦Π_1⟧ ∩ · · · ∩ ⟦Π_m⟧. It is clear that Π is sound for ⟦Φ⟧. To see that Π is a canonical Datalog program for ⟦Φ⟧, suppose that A is such that some Datalog program Π′ of width (ℓ, k) which is sound for ⟦Φ⟧ derives goal on A. Since, for every i ∈ {1, . . . , m}, the program Π′ is also sound for ⟦Φ_i⟧, and Π_i is a canonical Datalog program for ⟦Φ_i⟧, the program Π_i derives goal on A. Hence, A ∈ ⟦Π⟧ = ⟦Π_1⟧ ∩ · · · ∩ ⟦Π_m⟧.

Theorem 24. Let Φ be a GSO sentence. Then Φ can be expressed in Datalog if and only if

1. ⟦Φ⟧ is closed under homomorphisms, and

2. there exist ℓ, k ∈ N such that for all finite structures A, Spoiler wins the (ℓ, k)-game for ⟦Φ⟧ on A if and only if A ⊨ Φ.

Proof. First suppose that ⟦Φ⟧ is in Datalog. That is, there exist ℓ, k ∈ N and a Datalog program Π of width (ℓ, k) such that ⟦Φ⟧ = ⟦Π⟧. Then clearly ⟦Φ⟧ is closed under homomorphisms, and by Lemma 17, there are GSO sentences Φ_1, . . . , Φ_m such that Φ is equivalent to Φ_1 ∧ · · · ∧ Φ_m and ⟦Φ_i⟧ is the complement of a CSP, for each i ∈ {1, . . . , m}. Corollary 10 implies that there exists an ω-categorical structure B_i such that CSP(B_i) = ⟦¬Φ_i⟧. Now suppose that A is a finite τ-structure such that A ⊨ Φ. Then Spoiler wins the (ℓ, k)-game as follows. Suppose that Duplicator plays the countable structure B such that CSP(B) ∩ ⟦Φ⟧ = ∅. Then CSP(B) ∩ ⟦Φ_i⟧ = ∅ for some i ∈ {1, . . . , m}; otherwise, if there is a structure A_i ∈ CSP(B) ∩ ⟦Φ_i⟧ for every i ∈ {1, . . . , m}, then the disjoint union of A_1, . . . , A_m satisfies Φ_i since Φ_i is closed under homomorphisms, and is in CSP(B) since CSP(B) is closed under disjoint unions; but this is in contradiction to our assumption that CSP(B) ∩ ⟦Φ⟧ = ∅. Hence, CSP(B) ⊆ CSP(B_i) and hence there is a homomorphism h from B to B_i (see [6]). Note that Π is sound for the complement of CSP(B_i), and Π derives goal on A, and hence Theorem 21 implies that Spoiler wins the existential (ℓ, k)-pebble game on (A, B_i). But since B homomorphically maps to B_i, this implies that Spoiler wins the existential (ℓ, k)-pebble game on (A, B). Now suppose that A ⊨ ¬Φ. Hence, there exists i ∈ {1, . . . , m} such that A ⊨ ¬Φ_i. Then Duplicator wins the (ℓ, k)-game as follows. She starts by playing B_i. Then A homomorphically maps to B_i, and Duplicator can win the existential (ℓ, k)-pebble game on (A, B_i) by always playing along the homomorphism.

For the converse implication, suppose that 1. and 2. hold. Since ⟦Φ⟧ is closed under homomorphisms, Corollary 18 implies that there are GSO sentences Φ_1, . . . , Φ_m and ω-categorical structures B_1, . . . , B_m such that Φ is equivalent to Φ_1 ∧ · · · ∧ Φ_m and ⟦¬Φ_i⟧ = CSP(B_i). By Theorem 21, for every i ∈ {1, . . . , m} there exists a canonical Datalog program Π_i of width (ℓ, k) for ⟦Φ_i⟧. Then Lemma 22 implies that there exists a Datalog program Π such that ⟦Π⟧ = ⟦Π_1⟧ ∩ · · · ∩ ⟦Π_m⟧. Since each Π_i is sound for ⟦Φ_i⟧, it follows that Π is sound for ⟦Φ⟧. Hence, it suffices to show that if A is a finite τ-structure such that A ⊨ Φ, then Π derives goal on A. Since A ⊨ Φ_i for all i ∈ {1, . . . , m}, the assumption implies that Spoiler wins the existential (ℓ, k)-pebble game on (A, B_i). By Theorem 21, it follows that Π_i derives goal on A. Hence, Π derives goal on A.

7 Conclusion and Open Problems

We provided a game-theoretic characterisation of those problems in Guarded Second-order Logic that are equivalent to a Datalog program. We also proved the existence of canonical Datalog programs for GSO sentences whose models are closed under homomorphisms. To prove these results, we showed that every class of finite τ-structures in GSO whose complement is closed under homomorphisms is a finite union of CSPs. We also showed that every CSP in GSO can be formulated as a CSP of an ω-categorical structure. These results also imply that the so-called universal-algebraic approach, which has eventually led to the classification of finite-domain CSPs in Datalog [3], can be applied to study problems that are simultaneously in Datalog and in GSO (also see [9]). Our results might also pave the way towards a syntactic characterisation of Datalog ∩ GSO. We close with two open problems.

1. Nested monadically defined queries (Nemodeq) have been introduced by Rudolph and Krötzsch [24]; they prove that Nemodeq is contained both in MSO and in Datalog. We ask whether, conversely, every problem in MSO ∩ Datalog is expressible as a Nemodeq.

2. Is every CSP of a reduct of a finitely bounded homogeneous structure in GSO?

References

1. Hajnal Andréka, István Németi, and Johan van Benthem. Modal languages and bounded fragments of predicate logic. J. Philos. Log., 27(3):217–274, 1998. URL: https://doi.org/10.1023/A:1004275029985, doi:10.1023/A:1004275029985.

2. Albert Atserias, Andrei A. Bulatov, and Anuj Dawar. Affine systems of equations and counting infinitary logic. Theoretical Computer Science, 410(18):1666–1683, 2009.

3. Libor Barto and Marcin Kozik. Constraint satisfaction problems solvable by local consistency methods. Journal of the ACM, 61(1):3:1–3:19, 2014.

4. Achim Blumensath. Monadic second-order logic. Lecture Notes, 2020.

5. Manuel Bodirsky. Complexity of infinite-domain constraint satisfaction. Submitted for publication in the LNL Series, Cambridge University Press, 2020.

6. Manuel Bodirsky and Víctor Dalmau. Datalog and constraint satisfaction with infinite templates. Journal of Computer and System Sciences, 79:79–100, 2013. A preliminary version appeared in the proceedings of the Symposium on Theoretical Aspects of Computer Science (STACS'05).

7. Manuel Bodirsky and Martin Grohe. Non-dichotomies in constraint satisfaction complexity. In Luca Aceto, Ivan Damgard, Leslie Ann Goldberg, Magnús M. Halldórsson, Anna Ingólfsdóttir, and Igor Walukiewicz, editors, Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP), Lecture Notes in Computer Science, pages 184–196. Springer Verlag, July 2008.

8. Manuel Bodirsky, Martin Hils, and Barnaby Martin. On the scope of the universal-algebraic approach to constraint satisfaction. In Proceedings of the Symposium on Logic in Computer Science (LICS), pages 90–99. IEEE Computer Society, July 2010.


9. Manuel Bodirsky, Wied Pakusa, and Jakub Rydval. Temporal constraint satisfaction problems in fixed-point logic. In Holger Hermanns, Lijun Zhang, Naoki Kobayashi, and Dale Miller, editors, LICS '20: 35th Annual ACM/IEEE Symposium on Logic in Computer Science, Saarbrücken, Germany, July 8-11, 2020, pages 237–251. ACM, 2020. URL: https://doi.org/10.1145/3373718.3394750, doi:10.1145/3373718.3394750.

10. Manuel Bodirsky, Michael Pinsker, and András Pongrácz. Projective clone homomorphisms. Journal of Symbolic Logic, 2019. To appear. doi:10.1017/jsl.2019.23.

11. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. Reasonable highly expressive query languages - IJCAI-15 distinguished paper (honorary mention). In Qiang Yang and Michael J. Wooldridge, editors, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 2826–2832. AAAI Press, 2015.

12. Samuel Braunfeld. Towards the undecidability of atomicity for permutation classes via the undecidability of joint embedding for hereditary graph classes, 2019. Preprint available at arXiv:1903.11932.

13. Andrei A. Bulatov. A dichotomy theorem for nonuniform CSPs. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, pages 319–330, 2017.

14. Gregory Cherlin. Two problems on homogeneous structures, revisited. Model Theoretic Methods in Finite Combinatorics, Contemporary Mathematics, 558, 2011.

15. Bruno Courcelle and Joost Engelfriet. Graph Structure and Monadic Second-Order Logic: A Language-Theoretic Approach. Cambridge University Press, 40 W. 20 St. New York, NY, United States, 2012.

16. Victor Dalmau, Phokion G. Kolaitis, and Moshe Y. Vardi. Constraint satisfaction, bounded treewidth, and finite-variable logics. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP), pages 310–326, 2002.

17. Michael Elberfeld, Martin Grohe, and Till Tantau. Where first-order and monadic second-order logic coincide. CoRR, abs/1204.6291, 2012. URL: http://arxiv.org/abs/1204.6291, arXiv:1204.6291.

18. Tomás Feder and Moshe Y. Vardi. The computational structure of monotone monadic SNP and constraint satisfaction: a study through Datalog and group theory. SIAM Journal on Computing, 28:57–104, 1999.

19. Erich Grädel. Description logics and guarded fragments of first order logic. In Enrico Franconi, Giuseppe De Giacomo, Robert M. MacGregor, Werner Nutt, and Christopher A. Welty, editors, Proceedings of the 1998 International Workshop on Description Logics (DL'98), IRST, Povo - Trento, Italy, June 6-8, 1998, volume 11 of CEUR Workshop Proceedings. CEUR-WS.org, 1998. URL: http://ceur-ws.org/Vol-11/graedel.ps.

20. Erich Grädel, Colin Hirsch, and Martin Otto. Back and forth between guarded and modal logics. ACM Trans. Comput. Log., 3(3):418–463, 2002.

21. Wilfrid Hodges. A shorter model theory. Cambridge University Press, Cambridge, 1997.

22. Phokion G. Kolaitis and Moshe Y. Vardi. On the expressive power of Datalog: Tools and a case study. Journal of Computer and System Sciences, 51(1):110–134, 1995.

23. Leonid Libkin. Elements of Finite Model Theory. Springer, 2004.

24. Sebastian Rudolph and Markus Krötzsch. Flag & check: Data access with monadically defined queries. In Proc. 32nd Symposium on Principles of Database Systems (PODS'13), pages 151–162. ACM, June 2013. doi:10.1145/2463664.2465227.

25. Dmitriy N. Zhuk. A proof of CSP dichotomy conjecture. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, pages 331–342, 2017. https://arxiv.org/abs/1704.01914.