
Example 8.12. Consider the Thue system $(\Sigma, R^{(s)})$ with alphabet $\Sigma = \{a, b\}$ and rule set given by $R = \{(aaa, aa), (bbb, bb), (ab, ba)\}$. The semigroup $S = S(\Sigma, R^{(s)})$ has the presentation

\[ S = \langle a, b \mid aa = aaa,\ bb = bbb,\ ab = ba \rangle. \]

Any word of $S$ is of the form $[a^{i_1} b^{j_1} \cdots a^{i_k} b^{j_k}]$, where $i_1, \ldots, i_k, j_1, \ldots, j_k \geq 0$ and at least one of these numbers is non-zero. First, the relation $[ab] = [ba]$ implies that each element of $S$ has the shape $[a^i b^j]$, where $i, j \geq 0$ and $i + j \geq 1$. Second, the relations $[aa] = [aaa]$ and $[bb] = [bbb]$ allow both exponents to be reduced to at most 2, so the semigroup $S$ consists of eight elements: $[a], [b], [aa], [ab], [bb], [aab], [abb], [aabb]$. ♦

The word problem for the semigroup $S = S(\Sigma, R^{(s)})$ asks whether or not arbitrary strings $s, t \in \Sigma^+$ describe the same semigroup element, i.e., $[s] = [t]$. By (8.19), there is an effective reduction of the word problem for Thue systems to the word problem for semigroups. This leads to the following result, which was independently established by Emil Post (1897-1954) and Andrey Markov Jr. (1903-1979).

Theorem 8.13. The word problem for semigroups is undecidable.
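Undecidability concerns the class of all finitely presented semigroups; for a particular semigroup such as the one in Example 8.12 the word problem can be easy. The following minimal sketch assumes the normal form $[a^i b^j]$ with $0 \leq i, j \leq 2$ derived in that example; the helper names `normal_form`, `elements`, and `same_element` are illustrative and not part of the text.

```python
from itertools import product


def normal_form(word):
    """Normal form a^i b^j of a non-empty word over {a, b} in Example 8.12.

    ab = ba lets all a's move to the front; aa = aaa and bb = bbb
    cap each exponent at 2.
    """
    i = min(word.count("a"), 2)
    j = min(word.count("b"), 2)
    return "a" * i + "b" * j


def elements(max_len=5):
    """Collect the normal forms of all non-empty words up to max_len."""
    forms = set()
    for n in range(1, max_len + 1):
        for letters in product("ab", repeat=n):
            forms.add(normal_form("".join(letters)))
    return forms


def same_element(s, t):
    """Decide [s] = [t] for this particular semigroup only."""
    return normal_form(s) == normal_form(t)


print(sorted(elements(), key=lambda w: (len(w), w)))
```

The pair of capped letter counts is unchanged by each of the three relations, so distinct normal forms really denote distinct elements, and the enumeration yields the eight elements listed in the example.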

8.4 Post’s Correspondence Problem

The Post correspondence problem is an undecidable problem that was introduced by Emil Post in 1946.

Due to its simplicity it is often used in proofs of undecidability.

A Post correspondence system (PCS) over an alphabet $\Sigma$ having at least two symbols is a finite set of pairs of elements in $\Sigma^+$,


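\[ \Pi = \{(\alpha_i, \beta_i) \in \Sigma^+ \times \Sigma^+ \mid 1 \leq i \leq l\}. \tag{8.20} \]

For each finite sequence $\mathbf{i} = (i_1, \ldots, i_r) \in \{1, \ldots, l\}^+$ of indices, define the (left) string

\[ \alpha(\mathbf{i}) = \alpha_{i_1} \circ \alpha_{i_2} \circ \cdots \circ \alpha_{i_r} \tag{8.21} \]

and the (right) string

\[ \beta(\mathbf{i}) = \beta_{i_1} \circ \beta_{i_2} \circ \cdots \circ \beta_{i_r}, \tag{8.22} \]

where $\circ$ denotes the concatenation of strings.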

A solution of the PCS $\Pi$ is a sequence $\mathbf{i} \in \{1, \ldots, l\}^+$ of indices such that $\alpha(\mathbf{i}) = \beta(\mathbf{i})$. Note that the alphabet $\Sigma$ is required to have at least two symbols, since the problem is decidable if the alphabet has only one element. Moreover, if $\mathbf{i}$ is a solution of a PCS, then every repetition $\mathbf{i}\mathbf{i} \cdots \mathbf{i}$ of it is also a solution. Thus if a PCS has one solution, it has infinitely many.
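The definitions (8.21) and (8.22) translate directly into code. The sketch below is illustrative only: the function names are assumptions, indices are 1-based as in the text, and the pairs are those of Example 8.14. The bounded search can confirm that a solution exists, but a result of `None` only says that no solution of length at most `max_len` was found; no choice of bound turns it into a decision procedure.

```python
from itertools import product


def alpha(pairs, seq):
    """Left string alpha(i) for a sequence of 1-based indices."""
    return "".join(pairs[i - 1][0] for i in seq)


def beta(pairs, seq):
    """Right string beta(i) for a sequence of 1-based indices."""
    return "".join(pairs[i - 1][1] for i in seq)


def is_solution(pairs, seq):
    """A solution is a non-empty index sequence with alpha(i) = beta(i)."""
    return len(seq) > 0 and alpha(pairs, seq) == beta(pairs, seq)


def search(pairs, max_len):
    """Brute-force search over all index sequences of length <= max_len."""
    for r in range(1, max_len + 1):
        for seq in product(range(1, len(pairs) + 1), repeat=r):
            if is_solution(pairs, seq):
                return seq
    return None


# Pairs of Example 8.14 over the alphabet {a, b, c}.
pairs = [("aa", "a"), ("ba", "ab"), ("c", "ac")]
print(is_solution(pairs, (1, 2, 3)))           # the solution from the example
print(is_solution(pairs, (1, 2, 3, 1, 2, 3)))  # its repetition
print(search(pairs, 3))
```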

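Example 8.14. The PCS $\Pi = \{(\alpha_1, \beta_1) = (aa, a), (\alpha_2, \beta_2) = (ba, ab), (\alpha_3, \beta_3) = (c, ac)\}$ over the alphabet $\Sigma = \{a, b, c\}$ has the solution $\mathbf{i} = (1, 2, 3)$, since

\[ \alpha(\mathbf{i}) = \alpha_1 \circ \alpha_2 \circ \alpha_3 = aa \circ ba \circ c = aabac = a \circ ab \circ ac = \beta_1 \circ \beta_2 \circ \beta_3 = \beta(\mathbf{i}). \]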

Note that each sequence of the shape $\mathbf{i} = (1, 2, \ldots, 2, 3)$ provides a solution. ♦

The word problem for Post correspondence systems asks whether a Post correspondence system has a solution or not. This problem is undecidable.

To see this, the word problem for semi-Thue systems will be reduced to this problem. To this end, let $(\Sigma, R)$ be a semi-Thue system, where $\Sigma = \{a_1, \ldots, a_n\}$ and $R = \{(u_i, v_i) \mid 1 \leq i \leq m\}$. Take a copy of the alphabet $\Sigma$ given by $\bar\Sigma = \{\bar a_1, \ldots, \bar a_n\}$. In this way, each string $s = a_{i_1} a_{i_2} \ldots a_{i_r}$ over $\Sigma$ can be assigned a copy $\bar s = \bar a_{i_1} \bar a_{i_2} \ldots \bar a_{i_r}$ over $\bar\Sigma$.

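Put $q = m + n$ and let $s, t \in \Sigma^+$. Define the PCS $\Pi = \Pi(\Sigma, R, s, t)$ over the extended alphabet

\[ \Sigma' = \Sigma \cup \bar\Sigma \cup \{x, y, z\} \tag{8.23} \]

by the following $l = 2q + 4$ pairs, where $x$, $y$, $z$ are new symbols:

\[
\begin{array}{lll}
(\alpha_i, \beta_i) &= (u_i, \bar v_i), & 1 \leq i \leq m, \\
(\alpha_{m+i}, \beta_{m+i}) &= (a_i, \bar a_i), & 1 \leq i \leq n, \\
(\alpha_{q+i}, \beta_{q+i}) &= (\bar u_i, v_i), & 1 \leq i \leq m, \\
(\alpha_{q+m+i}, \beta_{q+m+i}) &= (\bar a_i, a_i), & 1 \leq i \leq n, \\
(\alpha_{2q+1}, \beta_{2q+1}) &= (y, z), & \\
(\alpha_{2q+2}, \beta_{2q+2}) &= (z, y), & \\
(\alpha_{2q+3}, \beta_{2q+3}) &= (x, xsy), & \\
(\alpha_{2q+4}, \beta_{2q+4}) &= (ztx, x). &
\end{array}
\tag{8.24}
\]

The first $q$ pairs have an unbarred left component and a barred right component; the next $q$ pairs are their mirror images with the bars exchanged.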
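The construction (8.24) can be sketched in a few lines. This is an illustration under stated assumptions rather than a definitive implementation: the barred copy of a symbol is modelled by its upper-case letter (so $\Sigma$ must consist of lower-case letters other than `x`, `y`, `z`), the first $q$ pairs are taken as (unbarred, barred) and the next $q$ as (barred, unbarred), and the names `bar` and `build_pcs` are hypothetical.

```python
def bar(word):
    """Barred copy of a word; modelled here (an assumption) by upper case."""
    return word.upper()


def build_pcs(sigma, rules, s, t):
    """Return the l = 2q + 4 pairs of the PCS Pi(Sigma, R, s, t).

    sigma is a string of lower-case symbols, rules a list of pairs (u, v).
    The returned list is 0-based, so pair number k of (8.24) is pairs[k - 1].
    """
    assert all(ch.islower() and ch not in "xyz" for ch in sigma)
    pairs = []
    pairs += [(u, bar(v)) for u, v in rules]    # pairs 1 .. m
    pairs += [(a, bar(a)) for a in sigma]       # pairs m+1 .. q
    pairs += [(bar(u), v) for u, v in rules]    # pairs q+1 .. q+m
    pairs += [(bar(a), a) for a in sigma]       # pairs q+m+1 .. 2q
    pairs += [("y", "z"), ("z", "y")]           # pairs 2q+1, 2q+2
    pairs += [("x", "x" + s + "y")]             # pair 2q+3
    pairs += [("z" + t + "x", "x")]             # pair 2q+4
    return pairs


rules = [("aa", "a"), ("ba", "ab"), ("c", "ac")]
for k, pair in enumerate(build_pcs("abc", rules, "aaba", "ab"), start=1):
    print(k, pair)
```

With $n = m = 3$ this gives $q = 6$ and $l = 16$ pairs for the alphabet $\{a, b, c\}$.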

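Example 8.15. (Cont'd) Take the semi-Thue system $(\Sigma, R)$ with alphabet $\Sigma = \{a, b, c\}$ and rule set $R = \{(aa, a), (ba, ab), (c, ac)\}$. The corresponding PCS $\Pi = \Pi(\Sigma, R, s, t)$ over the alphabet $\Sigma' = \{a, b, c, \bar a, \bar b, \bar c, x, y, z\}$ consists of the following pairs:

\[
\begin{array}{lll}
(\alpha_1, \beta_1) = (aa, \bar a), & (\alpha_2, \beta_2) = (ba, \overline{ab}), & (\alpha_3, \beta_3) = (c, \overline{ac}), \\
(\alpha_4, \beta_4) = (a, \bar a), & (\alpha_5, \beta_5) = (b, \bar b), & (\alpha_6, \beta_6) = (c, \bar c), \\
(\alpha_7, \beta_7) = (\overline{aa}, a), & (\alpha_8, \beta_8) = (\overline{ba}, ab), & (\alpha_9, \beta_9) = (\bar c, ac), \\
(\alpha_{10}, \beta_{10}) = (\bar a, a), & (\alpha_{11}, \beta_{11}) = (\bar b, b), & (\alpha_{12}, \beta_{12}) = (\bar c, c), \\
(\alpha_{13}, \beta_{13}) = (y, z), & (\alpha_{14}, \beta_{14}) = (z, y), & \\
(\alpha_{15}, \beta_{15}) = (x, xsy), & (\alpha_{16}, \beta_{16}) = (ztx, x). &
\end{array}
\]

♦

Lemma 8.16. If $s, t \in \Sigma^+$ with $s = t$ or $s \to_R t$, there is a sequence $\mathbf{i} \in \{1, \ldots, q\}^+$ of indices such that $\alpha(\mathbf{i}) = s$ and $\beta(\mathbf{i}) = \bar t$, and there is a sequence $\mathbf{i}' \in \{q + 1, \ldots, 2q\}^+$ of indices such that $\alpha(\mathbf{i}') = \bar s$ and $\beta(\mathbf{i}') = t$.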

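Proof. If $s = t$, the pairs $(a_i, \bar a_i)$ provide a sequence $\mathbf{i}$ with $\alpha(\mathbf{i}) = s$ and $\beta(\mathbf{i}) = \bar s$. Similarly, the pairs $(\bar a_i, a_i)$ yield a sequence $\mathbf{i}'$ with $\alpha(\mathbf{i}') = \bar s$ and $\beta(\mathbf{i}') = s$.

If $s \to_R t$, then $s = x u_i y$ and $t = x v_i y$ for some $x, y \in \Sigma^*$ and $1 \leq i \leq m$ (here $x$ and $y$ denote strings over $\Sigma$, not the new symbols of (8.23)). Write $x = x_1 \ldots x_k$ and $y = y_1 \ldots y_l$ as words over $\Sigma$. Then the sequence $\mathbf{i}$ given by the sequence of pairs

\[ (x_1, \bar x_1), \ldots, (x_k, \bar x_k), (u_i, \bar v_i), (y_1, \bar y_1), \ldots, (y_l, \bar y_l) \tag{8.25} \]

satisfies $\alpha(\mathbf{i}) = s$ and $\beta(\mathbf{i}) = \bar t$. Similarly, the sequence $\mathbf{i}'$ defined by the sequence of pairs

\[ (\bar x_1, x_1), \ldots, (\bar x_k, x_k), (\bar u_i, v_i), (\bar y_1, y_1), \ldots, (\bar y_l, y_l) \tag{8.26} \]

fulfills $\alpha(\mathbf{i}') = \bar s$ and $\beta(\mathbf{i}') = t$. ⊓⊔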

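Example 8.17. (Cont'd) Let $s = aaac$ and $t = aaaac$. Then the sequences $\mathbf{i} = (4, 4, 4, 6)$ and $\mathbf{i}' = (10, 10, 10, 12)$ provide

\[
\begin{aligned}
\alpha(\mathbf{i}) &= \alpha_4 \circ \alpha_4 \circ \alpha_4 \circ \alpha_6 = a \circ a \circ a \circ c = aaac = s, \\
\beta(\mathbf{i}) &= \beta_4 \circ \beta_4 \circ \beta_4 \circ \beta_6 = \bar a \circ \bar a \circ \bar a \circ \bar c = \overline{aaac} = \bar s, \\
\alpha(\mathbf{i}') &= \alpha_{10} \circ \alpha_{10} \circ \alpha_{10} \circ \alpha_{12} = \bar a \circ \bar a \circ \bar a \circ \bar c = \overline{aaac} = \bar s, \\
\beta(\mathbf{i}') &= \beta_{10} \circ \beta_{10} \circ \beta_{10} \circ \beta_{12} = a \circ a \circ a \circ c = aaac = s.
\end{aligned}
\]

Moreover, $s \to_R t$ by the rule $(c, ac) \in R$, and the sequences $\mathbf{i} = (4, 4, 4, 3)$ and $\mathbf{i}' = (10, 10, 10, 9)$ yield

\[
\begin{aligned}
\alpha(\mathbf{i}) &= \alpha_4 \circ \alpha_4 \circ \alpha_4 \circ \alpha_3 = a \circ a \circ a \circ c = aaac = s, \\
\beta(\mathbf{i}) &= \beta_4 \circ \beta_4 \circ \beta_4 \circ \beta_3 = \bar a \circ \bar a \circ \bar a \circ \overline{ac} = \overline{aaaac} = \bar t, \\
\alpha(\mathbf{i}') &= \alpha_{10} \circ \alpha_{10} \circ \alpha_{10} \circ \alpha_9 = \bar a \circ \bar a \circ \bar a \circ \bar c = \overline{aaac} = \bar s, \\
\beta(\mathbf{i}') &= \beta_{10} \circ \beta_{10} \circ \beta_{10} \circ \beta_9 = a \circ a \circ a \circ ac = aaaac = t.
\end{aligned}
\]

♦

Note that if the PCS $\Pi = \Pi(\Sigma, R, s, t)$ has a solution $\mathbf{i} = (i_1, \ldots, i_r)$, i.e., $\alpha(\mathbf{i}) = \beta(\mathbf{i})$, then the solution starts with $i_1 = 2q + 3$ and ends with $i_r = 2q + 4$, so the corresponding string has the prefix $xsy$ and the postfix $ztx$. The reason is that by construction $(\alpha_{2q+3}, \beta_{2q+3}) = (x, xsy)$ is the only pair whose components have the same prefix and $(\alpha_{2q+4}, \beta_{2q+4}) = (ztx, x)$ is the only pair whose components have the same postfix.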


Proposition 8.18. Let $s, t \in \Sigma^+$. If there is a derivation $s \to_R^* t$ in the semi-Thue system $(\Sigma, R)$, the PCS $\Pi = \Pi(\Sigma, R, s, t)$ has a solution.

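Proof. Let $s \to_R^* t$ with

\[ s = s_1 \to_R s_2 \to_R \cdots \to_R s_{k-1} \to_R s_k = t, \tag{8.27} \]

where $s_i \to_R s_{i+1}$, $1 \leq i \leq k - 1$. We may assume that the number of words $k$ is odd; otherwise, the terminal word may be appended once (without effect), i.e., $s = s_1, s_2, \ldots, s_k = t, s_{k+1} = t$. Then by Lemma 8.16, for each $j$, $1 \leq j < k$, there is a sequence $\mathbf{i}^{(j)}$ of indices such that $\alpha(\mathbf{i}^{(j)}) = s_j$ and $\beta(\mathbf{i}^{(j)}) = \bar s_{j+1}$ if $j$ is odd, and $\alpha(\mathbf{i}^{(j)}) = \bar s_j$ and $\beta(\mathbf{i}^{(j)}) = s_{j+1}$ if $j$ is even.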

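We claim that a solution of the PCS is given by the sequence

\[ \mathbf{i} = (2q + 3, \mathbf{i}^{(1)}, 2q + 1, \mathbf{i}^{(2)}, 2q + 2, \mathbf{i}^{(3)}, 2q + 1, \ldots, \mathbf{i}^{(k-2)}, 2q + 1, \mathbf{i}^{(k-1)}, 2q + 4). \tag{8.28} \]

Indeed, this sequence can be evaluated as follows, where each column shows the contribution of one index or block of indices:

\[
\begin{array}{l|cccccccccccc}
\mathbf{i} & 2q+3 & \mathbf{i}^{(1)} & 2q+1 & \mathbf{i}^{(2)} & 2q+2 & \mathbf{i}^{(3)} & 2q+1 & \cdots & \mathbf{i}^{(k-2)} & 2q+1 & \mathbf{i}^{(k-1)} & 2q+4 \\
\hline
\alpha(\mathbf{i}) & x & s_1 & y & \bar s_2 & z & s_3 & y & \cdots & s_{k-2} & y & \bar s_{k-1} & z s_k x \\
\beta(\mathbf{i}) & x s_1 y & \bar s_2 & z & s_3 & y & \bar s_4 & z & \cdots & \bar s_{k-1} & z & s_k & x
\end{array}
\tag{8.29}
\]

Since $s_1 = s$ and $s_k = t$, both rows concatenate to the same string $x s_1 y \bar s_2 z s_3 y \cdots \bar s_{k-1} z s_k x$; the right string is always one word ahead of the left string until the last pair closes the gap.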

This proves the claim. ⊓⊔
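The proof is constructive, so the solution (8.28) can be computed from a given derivation. The sketch below makes the same modelling assumptions as before (barred symbols as upper-case letters, $\Sigma$ a string of lower-case letters other than `x`, `y`, `z`), and all function names are hypothetical. `step_indices` follows the case distinction of Lemma 8.16 and `solution_from_derivation` assembles the sequence (8.28).

```python
def bar(word):
    """Barred copy of a word; modelled here (an assumption) by upper case."""
    return word.upper()


def build_pcs(sigma, rules, s, t):
    """Pairs of Pi(Sigma, R, s, t) as in (8.24); pair k is pairs[k - 1]."""
    pairs = [(u, bar(v)) for u, v in rules] + [(a, bar(a)) for a in sigma]
    pairs += [(bar(u), v) for u, v in rules] + [(bar(a), a) for a in sigma]
    pairs += [("y", "z"), ("z", "y"), ("x", "x" + s + "y"), ("z" + t + "x", "x")]
    return pairs


def step_indices(sigma, rules, s, t, offset):
    """Index sequence of Lemma 8.16 for s == t or a single step s ->_R t.

    offset is 0 for the pairs 1..q and q for the pairs q+1..2q.
    """
    m = len(rules)

    def letter(ch):
        return offset + m + sigma.index(ch) + 1

    if s == t:
        return [letter(ch) for ch in s]
    for i, (u, v) in enumerate(rules, start=1):
        for p in range(len(s) - len(u) + 1):
            if s[p:p + len(u)] == u and s[:p] + v + s[p + len(u):] == t:
                rest = s[p + len(u):]
                return ([letter(ch) for ch in s[:p]] + [offset + i]
                        + [letter(ch) for ch in rest])
    raise ValueError("not a single rewriting step: %r -> %r" % (s, t))


def solution_from_derivation(sigma, rules, words):
    """Assemble the sequence (8.28) from a derivation s_1, ..., s_k."""
    words = list(words)
    if len(words) % 2 == 0:          # make the number of words odd
        words.append(words[-1])
    q = len(rules) + len(sigma)
    seq = [2 * q + 3]
    for j in range(1, len(words)):   # j = 1, ..., k-1
        offset = 0 if j % 2 == 1 else q
        seq += step_indices(sigma, rules, words[j - 1], words[j], offset)
        if j < len(words) - 1:       # separator (y, z) after odd j, (z, y) after even j
            seq.append(2 * q + 1 if j % 2 == 1 else 2 * q + 2)
    seq.append(2 * q + 4)
    return seq


rules = [("aa", "a"), ("ba", "ab"), ("c", "ac")]
words = ["aaba", "aaab", "aab", "ab"]
seq = solution_from_derivation("abc", rules, words)
pairs = build_pcs("abc", rules, words[0], words[-1])
left = "".join(pairs[i - 1][0] for i in seq)
right = "".join(pairs[i - 1][1] for i in seq)
print(seq)
print(left == right)
```

When several rule applications transform $s_j$ into $s_{j+1}$, the sketch simply takes the first one it finds; any of them yields a valid index block.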

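Example 8.19. (Cont'd) Consider the derivation (with an even number of words)

\[ s = aaba \to_R aaab \to_R aab \to_R ab = t \]

given in order by the rules $(ba, ab)$, $(aa, a)$, and $(aa, a)$. Since the number of words is even, the terminal word $ab$ is appended once. An associated solution of the PCS $\Pi = \Pi(\Sigma, R, aaba, ab)$ is defined by the sequence

\[ \mathbf{i} = (15, 4, 4, 2, 13, 7, 10, 11, 14, 1, 5, 13, 10, 11, 16), \]

where

\[
\begin{array}{l|ccccccccc}
\mathbf{i} & 15 & (4,4,2) & 13 & (7,10,11) & 14 & (1,5) & 13 & (10,11) & 16 \\
\hline
\alpha(\mathbf{i}) & x & aaba & y & \overline{aaab} & z & aab & y & \overline{ab} & zabx \\
\beta(\mathbf{i}) & xaabay & \overline{aaab} & z & aab & y & \overline{ab} & z & ab & x
\end{array}
\]

♦

Example 8.20. (Cont'd) Consider the derivation (with an odd number of words)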

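\[ s = baaac \to_R abaac \to_R aabac \to_R aaabc \to_R aaabac = t \]

given in order by the rules $(ba, ab)$, $(ba, ab)$, $(ba, ab)$, and $(c, ac)$. A corresponding solution of the PCS $\Pi = \Pi(\Sigma, R, baaac, aaabac)$ is defined by the sequence

\[ \mathbf{i} = (15, 2, 4, 4, 6, 13, 10, 8, 10, 12, 14, 4, 4, 2, 6, 13, 10, 10, 10, 11, 9, 16), \]

where

\[
\begin{array}{l|ccccccccc}
\mathbf{i} & 15 & (2,4,4,6) & 13 & (10,8,10,12) & 14 & (4,4,2,6) & 13 & (10,10,10,11,9) & 16 \\
\hline
\alpha(\mathbf{i}) & x & baaac & y & \overline{abaac} & z & aabac & y & \overline{aaabc} & zaaabacx \\
\beta(\mathbf{i}) & xbaaacy & \overline{abaac} & z & aabac & y & \overline{aaabc} & z & aaabac & x
\end{array}
\]

♦

Proposition 8.21. Let $s, t \in \Sigma^+$. If the PCS $\Pi = \Pi(\Sigma, R, s, t)$ has a solution, there is a derivation $s \to_R^* t$ in the semi-Thue system $(\Sigma, R)$.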

Proof. Let $\mathbf{i} = (i_1, \ldots, i_r)$ be a solution of the PCS $\Pi$. The observation prior to Proposition 8.18 shows that the string $w = \alpha(\mathbf{i}) = \beta(\mathbf{i})$ must have the form $w = xsy \ldots ztx$ with $s = s_1$. Then by the construction of the PCS $\Pi$, the sequence $\mathbf{i}$ must have the form given in (8.29). From this, we conclude that $s = s_1 \to_R s_2 \to_R \cdots \to_R s_k = t$, as required. ⊓⊔

Propositions 8.18 and 8.21 provide an effective reduction of the word problem for semi-Thue systems to the word problem for Post correspondence systems. But the word problem for semi-Thue systems is undecidable. This gives rise to the following result.

Theorem 8.22. The word problem for Post correspondence systems is undecidable.

The Post correspondence problem can be used to prove an important undecidability result for context-free languages. The context-free languages form a principal class of formal languages that are particularly useful for the construction of compilers for programming languages.

A context-free grammar is a quadruple $G = (\Sigma, V, R, S)$, where $\Sigma$ is a finite set of terminals, $V$ is a finite set of non-terminals or variables, $S \in V$ is the start symbol, and $R$ is a finite subset of $V \times (\Sigma \cup V)^*$ whose elements are called rewriting rules. The alphabets $\Sigma$ and $V$ are assumed to be disjoint. The pair $(\Sigma \cup V, R)$ can be viewed as a semi-Thue system, but here rewriting rules of the form $(v, \epsilon)$ with $v \in V$ are allowed.

The language of a context-free grammar $G = (\Sigma, V, R, S)$ consists of all terminal strings that can be derived from the start symbol, i.e.,

\[ L(G) = \{w \in \Sigma^* \mid S \to_R^* w\}. \tag{8.30} \]

The language $L(G)$ generated by a context-free grammar $G$ is called context-free.

Example 8.23. The archetypical context-free language

\[ L(G) = \{a^n b^n a^k \mid n \geq 1, k \geq 1\} \tag{8.31} \]

is generated by the grammar $G$, where $\Sigma = \{a, b\}$, $V = \{S, A, B\}$, $S$ is the start symbol, and $R = \{(S, AB), (A, aAb), (A, ab), (B, Ba), (B, a)\}$. For instance, the string $aaabbbaa$ is derived as follows:

\[ S \to_R AB \to_R ABa \to_R Aaa \to_R aAbaa \to_R aaAbbaa \to_R aaabbbaa. \]
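The derivation above can be replayed mechanically. The following sketch is only an illustration: the helper names `derive` and `in_language` are assumptions, each step rewrites the leftmost occurrence of the given non-terminal (sufficient here, since every sentential form of the derivation contains each non-terminal at most once), and the membership test checks the shape $a^n b^n a^k$ directly instead of parsing with the grammar.

```python
import re

# Rule set R of Example 8.23.
RULES = {("S", "AB"), ("A", "aAb"), ("A", "ab"), ("B", "Ba"), ("B", "a")}


def derive(start, steps):
    """Apply the given rules in order, each to the leftmost occurrence."""
    forms = [start]
    for lhs, rhs in steps:
        if (lhs, rhs) not in RULES:
            raise ValueError("unknown rule: %r" % ((lhs, rhs),))
        current = forms[-1]
        if lhs not in current:
            raise ValueError("%r does not occur in %r" % (lhs, current))
        forms.append(current.replace(lhs, rhs, 1))
    return forms


def in_language(word):
    """Check the shape a^n b^n a^k with n >= 1 and k >= 1 directly."""
    match = re.fullmatch(r"(a+)(b+)(a+)", word)
    return match is not None and len(match.group(1)) == len(match.group(2))


steps = [("S", "AB"), ("B", "Ba"), ("B", "a"),
         ("A", "aAb"), ("A", "aAb"), ("A", "ab")]
forms = derive("S", steps)
print(" -> ".join(forms))
print(in_language(forms[-1]))
```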


Example 8.24. An arithmetic expression in a programming language is a well-formed formula given by a combination of constants, variables, and operators. A typical context-free grammar describing (simple) arithmetic expressions consists of the following rewriting rules:

expr → expr op expr
expr → (expr)
expr → −expr
expr → id

op → +
op → −
op → ∗
op → /
op → ↑

The terminals of this grammar are id (identifier), +, −, ∗, /, ↑, (, and ), and the non-terminals are expr and op, where expr is the start symbol. ♦

The problem whether two context-free grammars generate disjoint languages or not is undecidable.

To see this, the word problem for Post correspondence systems will be reduced to this problem. For this, let $(\Sigma, \Pi)$ be a PCS, where $\Pi = \{(\alpha_i, \beta_i) \mid 1 \leq i \leq m\}$ and where, without loss of generality, $\Sigma$ contains none of the symbols $1, \ldots, m$. Define two grammars $G_\alpha$ and $G_\beta$ such that both have the same alphabet of terminals $\Sigma \cup \{1, \ldots, m\}$, the same alphabet of non-terminals $V = \{S\}$, and therefore $S$ as common start symbol. Moreover, the rewriting rules of $G_\alpha$ are

\[ R_\alpha = \{(S, i\alpha_i) \mid 1 \leq i \leq m\} \cup \{(S, iS\alpha_i) \mid 1 \leq i \leq m\} \tag{8.32} \]

and the rewriting rules of $G_\beta$ are

\[ R_\beta = \{(S, i\beta_i) \mid 1 \leq i \leq m\} \cup \{(S, iS\beta_i) \mid 1 \leq i \leq m\}. \tag{8.33} \]

The construction immediately gives the following result.

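Proposition 8.25. The languages of the grammars $G_\alpha$ and $G_\beta$ are

\[ L(G_\alpha) = \{i_r \ldots i_1 \alpha_{i_1} \ldots \alpha_{i_r} \mid 1 \leq i_1, \ldots, i_r \leq m,\ r \geq 1\} \tag{8.34} \]

and

\[ L(G_\beta) = \{i_r \ldots i_1 \beta_{i_1} \ldots \beta_{i_r} \mid 1 \leq i_1, \ldots, i_r \leq m,\ r \geq 1\}, \tag{8.35} \]

respectively. Moreover,

\[ L(G_\alpha) \cap L(G_\beta) = \{i_r \ldots i_1 \alpha_{i_1} \ldots \alpha_{i_r} \mid \mathbf{i} = (i_1, \ldots, i_r) \text{ solves the PCS } (\Sigma, \Pi)\}. \tag{8.36} \]

Example 8.26. (Cont'd) Take the PCS $(\Sigma, \Pi)$, where $\Sigma = \{a, b, c\}$ and $\Pi = \{(aa, a), (ba, ab), (c, ac)\}$. The corresponding grammars $G_\alpha$ and $G_\beta$ are given by the respective sets of rewriting rules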

\[ R_\alpha = \{(S, 1aa), (S, 2ba), (S, 3c), (S, 1Saa), (S, 2Sba), (S, 3Sc)\} \tag{8.37} \]

and

\[ R_\beta = \{(S, 1a), (S, 2ab), (S, 3ac), (S, 1Sa), (S, 2Sab), (S, 3Sac)\}. \tag{8.38} \]

A solution of the PCS is $\mathbf{i} = (1, 2, 2, 3)$, since

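\[ \alpha(\mathbf{i}) = \alpha_1 \circ \alpha_2 \circ \alpha_2 \circ \alpha_3 = aa \circ ba \circ ba \circ c = aababac = a \circ ab \circ ab \circ ac = \beta_1 \circ \beta_2 \circ \beta_2 \circ \beta_3 = \beta(\mathbf{i}). \]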

The corresponding derivations in $G_\alpha$ and $G_\beta$ are

\[ S \to 3Sc \to 32Sbac \to 322Sbabac \to 3221aababac \]

and

\[ S \to 3Sac \to 32Sabac \to 322Sababac \to 3221aababac, \]

respectively. Hence, the intersection $L(G_\alpha) \cap L(G_\beta)$ is non-empty. ♦

Proposition 8.25 provides an effective reduction of the word problem for Post correspondence systems to the problem whether the intersection of two context-free languages is empty or not. But the word problem for Post correspondence systems is undecidable and therefore we obtain the following result.

Theorem 8.27. The problem of deciding whether the intersection of two context-free languages is empty or not is undecidable.
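The correspondence between solutions of the PCS and common words of $L(G_\alpha)$ and $L(G_\beta)$ in (8.36) can be checked on Example 8.26. The sketch below is illustrative: the function names are assumptions, indices are written as single decimal digits (so it assumes $m \leq 9$), and `side` selects the left components (0, grammar $G_\alpha$) or the right components (1, grammar $G_\beta$).

```python
def word(pairs, seq, side):
    """Word i_r ... i_1 gamma_{i_1} ... gamma_{i_r} of (8.34) or (8.35)."""
    prefix = "".join(str(i) for i in reversed(seq))
    body = "".join(pairs[i - 1][side] for i in seq)
    return prefix + body


def derivation(pairs, seq, side):
    """Sentential forms of the derivation of word(pairs, seq, side).

    The rules (S, iS gamma_i) are applied for i_r, ..., i_2 and the
    derivation is closed with the rule (S, i_1 gamma_{i_1}).
    """
    form = "S"
    forms = [form]
    for i in reversed(seq[1:]):
        form = form.replace("S", "%dS%s" % (i, pairs[i - 1][side]))
        forms.append(form)
    first = seq[0]
    forms.append(form.replace("S", "%d%s" % (first, pairs[first - 1][side])))
    return forms


pairs = [("aa", "a"), ("ba", "ab"), ("c", "ac")]
seq = (1, 2, 2, 3)
print(" -> ".join(derivation(pairs, seq, 0)))
print(" -> ".join(derivation(pairs, seq, 1)))
print(word(pairs, seq, 0) == word(pairs, seq, 1))
```

For an index sequence that is not a solution, such as (2, 1), the two words share the index prefix but differ in the remainder, so the sequence contributes nothing to the intersection.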