4.1 Closure Properties of Language Families


Chapter 4 Algebraic Properties of Language Families

In this chapter we study the behaviour of languages under certain operations. In particular, we are interested in whether applying an operation to languages of a given language family yields a language of the same family. The results will be used to present characterizations of language families by operations. In addition, we give a characterization of the family of regular languages by properties of associated congruence classes.

4.1 Closure Properties of Language Families

The basic definition for the behaviour of language families with respect to operations is the following.

Definition 4.1 We say that a family ℒ of languages is closed under the n-ary operation τ if, for any languages L₁, L₂, . . . , Lₙ in ℒ, τ(L₁, L₂, . . . , Lₙ) ∈ ℒ.

We first study the closure properties of the families of the Chomsky hierarchy under set-theoretic operations.

Lemma 4.2 The families L(REG), L(LIN), L(CF), L(CS), and L(RE) are closed under union.

Proof. Let L₁ and L₂ be two languages in L(X) with X ∈ {REG, LIN, CF, CS, RE}.

Then there are grammars G₁ = (N₁, T₁, P₁, S₁) and G₂ = (N₂, T₂, P₂, S₂) of type X such that L(G₁) = L₁ and L(G₂) = L₂. Without loss of generality we assume that N₁ ∩ N₂ = ∅ (if this is not the case, we rename some nonterminals to obtain disjointness). We construct the grammar

G = ({S} ∪ N₁ ∪ N₂, T₁ ∪ T₂, {S → S₁, S → S₂} ∪ P₁ ∪ P₂, S),

where S is a new symbol not contained in N₁ ∪ N₂ ∪ T₁ ∪ T₂. Obviously, G is of type X, too. Moreover, any derivation has the form

$$S \Longrightarrow S_i \Longrightarrow^{*}_{G_i} w \in L(G_i)$$

for some i ∈ {1, 2}, because N₁ ∩ N₂ = ∅ implies that the rules of Pᵢ do not produce nonterminals of Nⱼ with j ≠ i, i. e., we cannot mix the productions of P₁ and P₂. Thus we can derive exactly the words of L(G₁) ∪ L(G₂) = L₁ ∪ L₂. Therefore L₁ ∪ L₂ = L(G) ∈ L(X). □

Lemma 4.3 The families L(REG), L(CS), and L(RE) are closed under intersection. The families L(LIN) and L(CF) are not closed under intersection.

Proof. L(REG). We have to show that, for two regular languages L₁ and L₂, their intersection L₁ ∩ L₂ is a regular language, too. We only give the proof for the case λ ∉ L₁ ∩ L₂ and leave the modifications for the general case to the reader.

Let

G₁ = (N₁, T₁, P₁, S₁) and G₂ = (N₂, T₂, P₂, S₂) be two regular grammars with

L(G₁) = L₁ and L(G₂) =L₂.

By Theorem 2.28, we can assume that both grammars are in normal form, i. e., all rules have the form A → aB or A → a with nonterminals A, B and a terminal a. We consider the regular grammar

G = (N₁ × N₂, T, P, (S₁, S₂)) with

P = {(A₁, B₁) → a(A₂, B₂) : A₁ → aA₂ ∈ P₁, B₁ → aB₂ ∈ P₂}

∪ {(A, B) → a : A → a ∈ P₁, B → a ∈ P₂}.

It is easy to see that a derivation

(S₁, S₂) ⇒ a₁(A₁, B₁) ⇒ a₁a₂(A₂, B₂) ⇒ . . . ⇒ a₁a₂ . . . aₙ₋₁(Aₙ₋₁, Bₙ₋₁) ⇒ a₁a₂ . . . aₙ₋₁aₙ exists in G if and only if derivations

S₁ ⇒ a₁A₁ ⇒ a₁a₂A₂ ⇒ . . . ⇒ a₁a₂ . . . aₙ₋₁Aₙ₋₁ ⇒ a₁a₂ . . . aₙ₋₁aₙ and

S₂ ⇒ a₁B₁ ⇒ a₁a₂B₂ ⇒ . . . ⇒ a₁a₂ . . . aₙ₋₁Bₙ₋₁ ⇒ a₁a₂ . . . aₙ₋₁aₙ

exist in G₁ and G₂, respectively. Therefore w ∈ L(G) holds if and only if w ∈ L(G₁) and w ∈ L(G₂). Hence

L(G) = L(G₁) ∩ L(G₂) = L₁ ∩ L₂. Since G is a regular grammar, L₁ ∩ L₂ is a regular language.
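The product construction can be tried out mechanically. The following Python sketch is an illustration under assumptions of our own: a rule A → aB is encoded as the triple (A, a, B) and a rule A → a as (A, a, None); the function names `product_grammar` and `derives` and the two example grammars are hypothetical. `derives` decides membership by tracking the set of nonterminals that can follow each prefix.

```python
from itertools import product

# Encoding (assumption of this sketch): A -> aB is (A, a, B), A -> a is (A, a, None).

def product_grammar(p1, s1, p2, s2):
    """Rules and start symbol of the grammar G from the proof of Lemma 4.3."""
    rules = set()
    for (a1, x, a2) in p1:
        for (b1, y, b2) in p2:
            if x != y:
                continue  # both components must generate the same letter
            if a2 is not None and b2 is not None:
                rules.add(((a1, b1), x, (a2, b2)))   # (A1,B1) -> x(A2,B2)
            elif a2 is None and b2 is None:
                rules.add(((a1, b1), x, None))       # (A,B) -> x
    return rules, (s1, s2)


def derives(rules, start, word):
    """Decide whether a normal-form regular grammar derives a non-empty word."""
    if not word:
        return False  # normal-form grammars have no lambda-rules
    current = {start}
    for ch in word[:-1]:
        current = {rhs for (lhs, x, rhs) in rules
                   if lhs in current and x == ch and rhs is not None}
    return any(lhs in current and x == word[-1] and rhs is None
               for (lhs, x, rhs) in rules)


# G1 (start E): non-empty words over {a, b} with an even number of a's.
P1 = {("E", "a", "O"), ("E", "b", "E"), ("E", "b", None),
      ("O", "a", "E"), ("O", "b", "O"), ("O", "a", None)}
# G2 (start T): non-empty words over {a, b} ending with b.
P2 = {("T", "a", "T"), ("T", "b", "T"), ("T", "b", None)}

P, S = product_grammar(P1, "E", P2, "T")
print(derives(P, S, "aab"), derives(P, S, "ab"), derives(P, S, "aba"))  # True False False

# Compare with the intended intersection on all words of length 1..6.
words = ["".join(t) for n in range(1, 7) for t in product("ab", repeat=n)]
mismatches = [w for w in words
              if derives(P, S, w) != (w.count("a") % 2 == 0 and w.endswith("b"))]
print(mismatches)  # []
```

Only pairs of rules that emit the same letter survive, which is exactly why a derivation of G exists only when both component derivations exist.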

L(RE). Let L₁ ∈ L(RE) and L₂ ∈ L(RE) be given. By Theorem 3.19, there are deterministic Turing machines

M₁ = (X, Z₁, z₀,₁, Q₁, δ₁, Q₁) and M₂ = (X, Z₂, z₀,₂, Q₂, δ₂, Q₂)

with

T(M₁) = L₁ and T(M₂) = L₂.

Without loss of generality we can assume that Z₁ ∩ Z₂ = ∅. We construct a Turing machine M which works as follows (the formal description is left to the reader). First the machine replaces every letter x of the input word by (x, x). Then it works as M₁, using only the first components of the letters; thus the input word is stored in the second components (if a ∗ is read, it is handled as (∗, ∗)). If M reaches a state from Q₁, then it replaces every letter (a, b) by its second component b, i. e., the input word is on the tape again. Now M starts to work as M₂ and stops if a state of Q₂ is reached.

According to this construction, M first checks whether the input is accepted by M₁ and then whether it is in T(M₂). Thus M accepts a word w if and only if w is accepted by M₁ as well as by M₂. Consequently,

T(M) = T(M₁) ∩ T(M₂) = L₁ ∩ L₂, which proves that L₁ ∩ L₂ ∈ L(RE) by Theorem 3.19.

L(CS). The proof can be given analogously to that for recursively enumerable languages, but we use linearly bounded automata and Theorem 3.23.

L(LIN) and L(CF). To prove the assertion it suffices to give two linear languages whose intersection is not context-free. We consider the linear grammars

G₁ = ({S, A}, {a, b, c}, {S → Sc, S → Ac, A → aAb, A → ab}, S),
G₂ = ({S, A}, {a, b, c}, {S → aS, S → aA, A → bAc, A → bc}, S).

It is easy to see that

L(G₁) = {aⁿbⁿcᵐ | n ≥ 1, m ≥ 1} and L(G₂) = {aᵐbⁿcⁿ | n ≥ 1, m ≥ 1}.

Obviously, L(G₁) ∩ L(G₂) = {aⁿbⁿcⁿ | n ≥ 1}. By the proof of Theorem 16.13 we know

that L(G₁) ∩ L(G₂) is not context-free and hence not linear. □
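The two "easy to see" claims can be checked on short words. The Python sketch below is only an illustration under our own assumptions: the hypothetical helper `generate` enumerates leftmost derivations breadth-first and discards sentential forms longer than a bound, which is sound here because neither grammar has λ-rules. A bounded check illustrates, but does not prove, the equalities.

```python
from collections import deque


def generate(rules, start, max_len):
    """All terminal words of length <= max_len derivable from `start`.

    `rules` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of `rules` counts as a terminal.
    Assumes there are no lambda-rules, so sentential forms never shrink and
    forms longer than max_len can be discarded.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    words = set()
    while queue:
        form = queue.popleft()
        # position of the leftmost nonterminal, if any
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add("".join(form))
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


# The linear grammars G1 and G2 from the proof.
G1 = {"S": [("S", "c"), ("A", "c")], "A": [("a", "A", "b"), ("a", "b")]}
G2 = {"S": [("a", "S"), ("a", "A")], "A": [("b", "A", "c"), ("b", "c")]}

common = generate(G1, "S", 9) & generate(G2, "S", 9)
print(sorted(common, key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```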

Lemma 4.4 The families L(REG) and L(CS) are closed under complement. The families L(LIN), L(CF), and L(RE) are not closed under complement.

Proof. L(REG). Let L be a regular language. Then there is a deterministic finite automaton A = (alph(L), Z, z₀, F, δ) such that L = T(A); since δ is defined for every state and every letter, δ*(z₀, w) is defined for every word w. Thus w ∈ L if and only if δ*(z₀, w) ∈ F. Consequently, w ∈ C(L) if and only if δ*(z₀, w) ∉ F, i. e., if and only if δ*(z₀, w) ∈ Z \ F. Thus the automaton A′ = (alph(L), Z, z₀, Z \ F, δ) accepts C(L).

Therefore C(L) is regular.
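The construction of A′ only exchanges F for Z \ F. A minimal Python sketch, assuming our own encoding of a deterministic finite automaton as a tuple (states, alphabet, delta, start, finals) with a total transition dictionary; the names `accepts` and `complement` are hypothetical:

```python
from itertools import product

# A DFA is a tuple (states, alphabet, delta, start, finals); delta must be
# total, i.e. defined for every (state, letter) pair, as for the automaton A.

def accepts(dfa, word):
    states, alphabet, delta, start, finals = dfa
    z = start
    for x in word:
        z = delta[(z, x)]
    return z in finals


def complement(dfa):
    """The automaton A' of the proof: same transitions, final states Z minus F."""
    states, alphabet, delta, start, finals = dfa
    return (states, alphabet, delta, start, states - finals)


# A accepts the words over {a, b} that end with a.
A = ({0, 1}, {"a", "b"},
     {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0},
     0, {1})
A_c = complement(A)

words = ["".join(t) for n in range(5) for t in product("ab", repeat=n)]
print([w for w in words if accepts(A_c, w)][:4])  # ['', 'b', 'ab', 'bb']
```

Totality of δ matters: with a partial transition function, words that get stuck would be rejected by both A and A′.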

L(CS). We omit the proof since it requires some knowledge not presented in this book and is relatively long. We refer to [32] and [17] and the original papers [13], [29].

L(RE). If L(RE) were closed under complement, then every recursively enumerable language would be recursive by Theorem 3.10, in contradiction to Theorem 3.11.

L(CF). Let us assume that L(CF) is closed under complement. Let L₁ and L₂ be two arbitrary context-free languages. We set

X = alph(L₁) ∪ alph(L₂), X₁ = X \ alph(L₁), X₂ = X \ alph(L₂).


Let R₁ and R₂ be the sets of all words over X which contain at least one letter of X₁ and X₂, respectively. If Xᵢ = ∅ for some i ∈ {1, 2}, then Rᵢ is the empty set and therefore a regular set. If Xᵢ ≠ ∅, then the regular grammar

$$G_i = \Big(\{S, A\},\ X,\ \bigcup_{a \in \mathrm{alph}(L_i)} \{S \to aS\} \ \cup \bigcup_{b \in X_i} \{S \to bA,\ S \to b\} \ \cup \bigcup_{x \in X} \{A \to xA,\ A \to x\},\ S\Big)$$

generates Rᵢ (a derivation can only terminate from S, or switch from S to A, when a letter of Xᵢ is generated). Hence in all cases R₁ and R₂ are regular languages and therefore context-free, too. By our assumption and Lemma 4.2, for i ∈ {1, 2},

X* \ Lᵢ = ((alph(Lᵢ))* \ Lᵢ) ∪ Rᵢ = C(Lᵢ) ∪ Rᵢ is a context-free language. Again, by Lemma 4.2,

R = (X* \ L₁) ∪ (X* \ L₂)

is context-free. Now our assumption implies that

L₁ ∩ L₂ = X* \ ((X* \ L₁) ∪ (X* \ L₂)) = (alph(R))* \ R

is a context-free language, which means that the intersection of arbitrary context-free languages is context-free. This contradicts Lemma 4.3. Therefore our assumption is not valid, i. e., L(CF) is not closed under complement.
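The two set identities used in this proof can be checked on small finite languages. The Python sketch below uses example languages and helper names of our own choosing, and restricts X* to the words of length at most n; this is enough to illustrate the identities because every set involved is compared word by word.

```python
from itertools import product


def words_up_to(alphabet, n):
    """All words over `alphabet` of length at most n."""
    return {"".join(t) for k in range(n + 1) for t in product(sorted(alphabet), repeat=k)}


def alph(lang):
    return {x for w in lang for x in w}


n = 4
X = {"a", "b", "c"}
L1 = {"ab", "aab", "b"}          # alph(L1) = {a, b}, so X1 = {c}
L2 = {"ab", "ca", "b", "cc"}     # alph(L2) = {a, b, c}, so X2 is empty
universe = words_up_to(X, n)

ok = True
for L in (L1, L2):
    Xi = X - alph(L)
    Ri = {w for w in universe if any(x in Xi for x in w)}
    # X* \ L_i  =  ((alph(L_i))* \ L_i)  ∪  R_i, restricted to length <= n
    ok &= universe - L == (words_up_to(alph(L), n) - L) | Ri

# De Morgan step: L1 ∩ L2 = X* \ ((X* \ L1) ∪ (X* \ L2))
ok &= L1 & L2 == universe - ((universe - L1) | (universe - L2))
print(ok)  # True
```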

L(LIN). We repeat the proof for L(CF) word by word, but replace "context-free" by "linear" and L(CF) by L(LIN) throughout. □

Lemma 4.5 The families L(REG) and L(CS) are closed under set-theoretic difference. The families L(LIN), L(CF), and L(RE) are not closed under set-theoretic difference.

Proof. Let X and Y be two languages and V = alph(X) ∪ alph(Y). Let us assume that alph(X) \ alph(Y) is not empty (the easy modifications for alph(X) ⊆ alph(Y) are left to the reader). From the proof of Lemma 4.4, we know that V* \ (alph(Y))* is in L(REG) and therefore in L(CS), too. Because

X \ Y = (V* \ Y) ∩ X = ((V* \ (alph(Y))*) ∪ ((alph(Y))* \ Y)) ∩ X

= ((V* \ (alph(Y))*) ∪ C(Y)) ∩ X, the first assertion follows by Lemmas 4.2 to 4.4.

Since (alph(L))* is a regular language and therefore belongs to all language families under consideration, the complement C(L) = (alph(L))* \ L is a special case of difference. Thus the second statement of Lemma 4.4 implies the second assertion. □

We now consider a special case of intersection: the language of the family under consideration is intersected with a regular set.

Lemma 4.6 The families L(REG), L(LIN), L(CF), L(CS), and L(RE) are closed under intersection with regular languages.


Proof. The statement holds trivially for L(REG), L(CS), and L(RE), because each of these language families is closed under intersection (see Lemma 4.3) and contains all regular languages (see Theorem 2.37).

In order to prove the statement for L(CF), we construct a pushdown automaton M which accepts L ∩ R for a given context-free language L and a given regular language R.

Let

M₁ = (X, Z₁, Γ, z₀,₁, F₁, δ₁) and A₂ = (X, Z₂, z₀,₂, F₂, δ₂)

be a pushdown automaton and a finite automaton, respectively, such that T(M₁) = L and T(A₂) = R. We construct the pushdown automaton

M = (X, Z₁ × Z₂, Γ, (z₀,₁, z₀,₂), F₁ × F₂, δ) where

((z₁′, z₂′), R, β) ∈ δ((z₁, z₂), a, γ) if (z₁′, R, β) ∈ δ₁(z₁, a, γ) and δ₂(z₂, a) = z₂′,
((z₁′, z₂), N, β) ∈ δ((z₁, z₂), a, γ) if (z₁′, N, β) ∈ δ₁(z₁, a, γ).

By definition, M behaves on the first component of the state and on the pushdown tape as M₁, and on the second component of the state as A₂ (A₂ reads a letter only when M₁ moves to the right on the input). Hence M accepts a word w if and only if w is accepted by M₁ as well as by A₂. Thus T(M) = L ∩ R.
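To see the product construction at work, here is a Python sketch under simplifying assumptions of our own: a pushdown automaton is a dictionary from (state, letter or None, top symbol) to sets of (new state, replacement string), where None stands for a step that does not move the input head (the N case) and a letter for a step that reads it (the R case); the stack is a string whose first character is the top, and acceptance is by final state after the whole input has been read. All names and both example automata are hypothetical. As in the definition of δ, the finite automaton changes its state only on reading steps.

```python
from collections import deque
from itertools import product


def pda_accepts(delta, start, finals, word, bottom="#"):
    """Breadth-first search over configurations (state, position, stack).
    It terminates here because the example has no stack-growing lambda-loops."""
    seen = set()
    queue = deque([(start, 0, bottom)])
    while queue:
        conf = queue.popleft()
        if conf in seen:
            continue
        seen.add(conf)
        state, pos, stack = conf
        if pos == len(word) and state in finals:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        for new_state, push in delta.get((state, None, top), ()):
            queue.append((new_state, pos, push + rest))          # head stays (N)
        if pos < len(word):
            for new_state, push in delta.get((state, word[pos], top), ()):
                queue.append((new_state, pos + 1, push + rest))  # head moves (R)
    return False


def product_pda(delta1, start1, finals1, dfa):
    """Pair each state of the PDA with a state of the finite automaton."""
    states2, delta2, start2, finals2 = dfa
    delta = {}
    for (z1, x, top), targets in delta1.items():
        for z2 in states2:
            z2_new = z2 if x is None else delta2[(z2, x)]  # A2 reads only on R-steps
            delta[((z1, z2), x, top)] = {((t, z2_new), push) for t, push in targets}
    finals = {(f1, f2) for f1 in finals1 for f2 in finals2}
    return delta, (start1, start2), finals


# M1 accepts {a^n b^n | n >= 1}.
D1 = {("p", "a", "#"): {("p", "A#")}, ("p", "a", "A"): {("p", "AA")},
      ("p", "b", "A"): {("q", "")}, ("q", "b", "A"): {("q", "")},
      ("q", None, "#"): {("f", "#")}}
# A2 accepts the words over {a, b} whose length is divisible by 4.
A2 = ({0, 1, 2, 3}, {(z, x): (z + 1) % 4 for z in range(4) for x in "ab"}, 0, {0})

D, S, F = product_pda(D1, "p", {"f"}, A2)
words = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
print([w for w in words if pda_accepts(D, S, F, w)])  # ['aabb', 'aaaabbbb']
```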

For the family of linear languages, we only note that the construction of M from M₁ gives a 1-turn pushdown automaton if M₁ is a 1-turn pushdown automaton. □

We now study the algebraically motivated operations concatenation and Kleene closure as well as operations related to homomorphisms.

Lemma 4.7 The families L(REG), L(CF), L(CS), and L(RE) are closed under concatenation. L(LIN) is not closed under concatenation.

Proof. L(CF). Again, we start with two context-free grammars G₁ = (N₁, T, P₁, S₁) and G₂ = (N₂, T, P₂, S₂) with N₁ ∩ N₂ = ∅ and show that the grammar

G = (N₁ ∪ N₂ ∪ {S}, T, P₁ ∪ P₂ ∪ {S → S₁S₂}, S), where S is a new symbol, generates

L(G₁) · L(G₂). It is sufficient to note that, up to the order of the applications of rules, any derivation in G has the form

S ⇒ S₁S₂ ⇒* w₁S₂ ⇒* w₁w₂

where, for i ∈ {1, 2}, Sᵢ ⇒* wᵢ is a derivation in Gᵢ (i. e., the derivation only uses rules of Pᵢ). Since G is a context-free grammar, L(G₁) · L(G₂) is a context-free language.

L(CS) and L(RE). We repeat the proof for L(CF), where we assume without loss of generality that the grammars are in Kuroda normal form (see Theorem 2.19). This ensures that the derivations in G₁ and G₂ cannot be influenced by the context of the other part. Furthermore, in the case of L(CS) we have to take care of the empty word, which requires representing the concatenation as a union of products of languages without the empty word and the language consisting only of the empty word; e. g., if λ ∈ L(G₁) and λ ∈ L(G₂), then

L(G₁) · L(G₂) = ((L(G₁) \ {λ}) · (L(G₂) \ {λ})) ∪ (L(G₁) \ {λ}) ∪ (L(G₂) \ {λ}) ∪ {λ}.

The details are left to the reader.

L(REG). The above proof for L(CF) does not work for regular languages, since the newly introduced rule S → S₁S₂ does not have the required form.

Let G₁ = (N₁, T₁, P₁, S₁) and G₂ = (N₂, T₂, P₂, S₂) be regular grammars such that L(G₁) = L₁, L(G₂) = L₂, and N₁ ∩ N₂ = ∅. Then we construct the grammar

G = (N₁ ∪ N₂, T₁ ∪ T₂, P₁′ ∪ P₂, S₁) where

P₁′ = {A → wB : A → wB ∈ P₁, B ∈ N₁} ∪ {A → wS₂ : A → w ∈ P₁, w ∈ T₁*}.

According to this construction, all derivations in G have the form S₁ ⇒* w′A ⇒ w′wS₂ ⇒* w′ww₂, where

S₁ ⇒* w′A ⇒ w′w = w₁ and S₂ ⇒* w₂ are derivations in G₁ and G₂, respectively. Hence

L(G) = {w₁w₂ : w₁ ∈ L(G₁), w₂ ∈ L(G₂)} = L(G₁) · L(G₂).

L(LIN). The method used for L(REG) does not work, since the derivation in the first grammar can end somewhere in the middle of the word and not at its end as in the case of regular grammars.

By Example 2.5, L = {aⁿbⁿ | n ≥ 1} is a linear language. However, the language L · L = {aⁿbⁿaᵐbᵐ | n ≥ 1, m ≥ 1} is not linear, as we have shown in the proof of

Theorem 2.34. □

Lemma 4.8 The families L(REG), L(CF), L(CS), and L(RE) are closed under (positive) Kleene closure. L(LIN) is not closed under (positive) Kleene closure.

Proof. We first prove the statement for positive Kleene closure.

L(CF). Let L be a context-free language, and let G = (N, T, P, S) be a context-free grammar which generates L. We set

G′ = (N ∪ {S′}, T, P ∪ {S′ → SS′, S′ → S}, S′)

(where S′ is again an additional symbol). Up to the order of the application of the rules, any derivation in G′ has the form

S′ ⇒ SS′ ⇒* w₁S′ ⇒ w₁SS′ ⇒* w₁w₂S′ ⇒ w₁w₂SS′ ⇒ . . .

⇒ w₁w₂ . . . wₙ₋₁S′ ⇒ w₁w₂ . . . wₙ₋₁S ⇒* w₁w₂ . . . wₙ₋₁wₙ,

where, for 1 ≤ i ≤ n, each derivation S ⇒* wᵢ uses only rules of P. Thus wᵢ ∈ L(G) = L for 1 ≤ i ≤ n, and hence w₁w₂ . . . wₙ ∈ Lⁿ. Conversely, every word of Lⁿ, n ≥ 1, can be generated in this way, and only words of Lᵐ with m ≥ 1 can be generated. Therefore

$$L(G') = \bigcup_{n \ge 1} L^n = L^+,$$

which proves the context-freeness of L⁺.

L(CS) and L(RE). Let L be a language of L(X), X ∈ {CS,RE}. Then L can be generated by a grammar G = (N, T, P, S) in Kuroda normal form (see Theorem 2.19).

We set

G′ = (N ∪ {S′, S″}, T, P ∪ P′, S′), where P′ consists of the rules

S′ → S, S′ → SS″,

xS″ → xSS″, xS″ → xS for x ∈ T.

These rules ensure that the subderivations starting from S cannot influence each other by context (since a new derivation can only be started after the preceding one has produced its last terminal letter). Now we get L(G′) = L⁺ as above. The details of the proof are left to the reader.

L(REG). Let G = (N, T, P, S) be a regular grammar with L(G) = L. We construct the regular grammar G′ = (N, T, P′, S), where P′ is obtained by adding all rules of the form

A → wS for A → w ∈ P, w ∈ T*, to P. Then the derivations of G′ have the form

S ⇒* w₁′A₁ ⇒ w₁′w₁″S ⇒* w₁′w₁″w₂′A₂ ⇒ w₁′w₁″w₂′w₂″S

⇒* w₁′w₁″ . . . w′ₙ₋₁w″ₙ₋₁S ⇒* w₁′w₁″ . . . w′ₙ₋₁w″ₙ₋₁wₙ,

where wᵢ′wᵢ″ ∈ L(G) for 1 ≤ i ≤ n − 1 and wₙ ∈ L(G). Now L(G′) = L⁺ can easily be proved.
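The regular case adds, for every terminating rule A → w, the rule A → wS, which restarts the derivation. A minimal Python sketch, again assuming our own encoding of A → wB as (A, w, B) with B = None for A → w; `words_of` and `plus_grammar` are hypothetical names.

```python
def words_of(rules, start, max_len):
    """All words of length <= max_len generated by a regular grammar whose
    rules (A, w, B) stand for A -> wB (B = None means A -> w)."""
    seen, stack, result = set(), [("", start)], set()
    while stack:
        prefix, nt = stack.pop()
        if (prefix, nt) in seen:
            continue
        seen.add((prefix, nt))
        for lhs, w, rhs in rules:
            if lhs != nt or len(prefix + w) > max_len:
                continue
            if rhs is None:
                result.add(prefix + w)
            else:
                stack.append((prefix + w, rhs))
    return result


def plus_grammar(rules, start):
    """P' = P together with A -> wS for every terminating rule A -> w of P."""
    return set(rules) | {(a, w, start) for (a, w, b) in rules if b is None}


P = {("S", "a", "B"), ("B", "b", None), ("S", "c", None)}   # L(G) = {ab, c}
P_plus = plus_grammar(P, "S")
print(sorted(words_of(P_plus, "S", 3), key=lambda w: (len(w), w)))
# ['c', 'ab', 'cc', 'abc', 'cab', 'ccc']
```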

Kleene closure. If λ ∈ L, then L* = L⁺ and we can use the above constructions. If λ ∉ L, then L* = L⁺ ∪ {λ}; since a grammar with the single rule S → λ generates the language consisting only of the empty word, the assertion follows from the above result for L⁺ and Lemma 4.2.

L(LIN). We consider the linear language L(G₂) = {aⁿbⁿ | n ≥ 1} from Example 2.5.

It is easy to see that

$$L(G_2)^+ = \{a^{n_1}b^{n_1}a^{n_2}b^{n_2}\cdots a^{n_t}b^{n_t} \mid t \ge 1,\ n_i \ge 1 \text{ for } 1 \le i \le t\}.$$

Let us assume that L(G₂)⁺ is linear. Since R = {a^p b^q a^r b^s | p, q, r, s ≥ 1} is regular (the verification is left to the reader), the language

L(G₂)⁺ ∩ R = {aⁿbⁿaᵐbᵐ | n, m ≥ 1}

is also linear by Lemma 4.6. However, as an application of the pumping lemma for linear languages we have shown that L(G₂)⁺ ∩ R is not linear. This contradiction shows that our assumption is wrong, i. e., L(G₂)⁺ is not a linear language. Thus we have shown that the family of linear languages is not closed under positive Kleene closure. The analogous statement for the Kleene closure follows as above, taking into consideration that

L(G₂)* ∩ R = L(G₂)⁺ ∩ R. □

Lemma 4.9 The families L(REG), L(LIN), L(CF), and L(RE) are closed under homomorphisms.

Proof. Let h be a homomorphism which maps T* to Y*.

L(CF). Let L be a context-free language. Then there is a context-free grammar G = (N, T, P, S) in Chomsky normal form such that L(G) = L (see Theorem 2.26).

Therefore all rules have the form A → BC or A → a with A, B, C ∈ N and a ∈ T. Moreover, we can arrange the order of the applications of rules such that any derivation has the form

S ⇒* A₁A₂ . . . Aₖ ⇒ a₁A₂A₃ . . . Aₖ ⇒ a₁a₂A₃A₄ . . . Aₖ ⇒ . . . ⇒ a₁a₂ . . . aₖ (where only rules of the form A → BC are applied in the subderivation S ⇒* A₁A₂ . . . Aₖ). We now construct the grammar G′ = (N, Y, P′, S), where P′ is obtained from P by replacing every rule of the form A → a ∈ P by A → h(a). Then, without loss of generality, the derivations in G′ have the form

S ⇒* A₁A₂ . . . Aₖ ⇒ h(a₁)A₂A₃ . . . Aₖ ⇒ h(a₁)h(a₂)A₃A₄ . . . Aₖ ⇒ . . .

⇒ h(a₁)h(a₂) . . . h(aₖ) = h(a₁a₂ . . . aₖ).

Thus a word z belongs to L(G′) if and only if z = h(w) for some w ∈ L(G), and therefore L(G′) = h(L(G)) = h(L). Furthermore, G′ is a context-free grammar. Hence L(CF) is closed under homomorphisms.
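The grammar G′ differs from G only in its terminating rules. The Python sketch below illustrates this under our own assumptions: grammars are dictionaries from nonterminals to lists of right-hand sides, the input grammar is in Chomsky normal form (so a right-hand side of length 1 is a terminal), and h is non-erasing, which keeps the length pruning of the hypothetical helper `generate` sound. The names `generate` and `image_grammar` and the example are our own.

```python
from collections import deque


def generate(rules, start, max_len):
    """Terminal words of length <= max_len derivable from `start` (BFS over
    leftmost derivations). Symbols that are not keys of `rules` are terminals;
    the length pruning is sound only if no rule erases symbols."""
    seen, queue, words = {(start,)}, deque([(start,)]), set()
    while queue:
        form = queue.popleft()
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add("".join(form))
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def image_grammar(rules, h):
    """Replace each rule A -> a by A -> h(a); rules A -> BC stay unchanged."""
    return {lhs: [tuple(h[rhs[0]]) if len(rhs) == 1 else rhs for rhs in rhss]
            for lhs, rhss in rules.items()}


# A grammar in Chomsky normal form for {a^n b^n | n >= 1}.
G = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")],
     "A": [("a",)], "B": [("b",)]}
h = {"a": "0", "b": "11"}          # a non-erasing homomorphism
G_img = image_grammar(G, h)
print(sorted(generate(G_img, "S", 9), key=len))  # ['011', '001111', '000111111']
```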

L(RE). We repeat the proof for L(CF) but use the Kuroda normal form instead of the Chomsky normal form.

L(LIN). Let L be a linear language. Then there is a linear grammar G = (N, T, P, S) generating L. Moreover, any derivation in G has the form

S ⇒ w₁A₁v₁ ⇒ w₁w₂A₂v₂v₁ ⇒ . . . ⇒ w₁w₂ . . . wₖAₖvₖvₖ₋₁ . . . v₁

⇒ w₁w₂ . . . wₖuvₖvₖ₋₁ . . . v₁

where the rules S → w₁A₁v₁, Aᵢ → wᵢ₊₁Aᵢ₊₁vᵢ₊₁ for 1 ≤ i ≤ k − 1, and Aₖ → u are applied.

We now define the grammar G′ = (N, Y, P′, S) by

P′ = {A → h(w)Bh(v) | A → wBv ∈ P} ∪ {A → h(w) | A → w ∈ P}.

Any derivation in G′ has the form

S ⇒ h(w₁)A₁h(v₁) ⇒ h(w₁)h(w₂)A₂h(v₂)h(v₁)

⇒ . . . ⇒ h(w₁)h(w₂) . . . h(wₖ)Aₖh(vₖ)h(vₖ₋₁) . . . h(v₁)

⇒ h(w₁)h(w₂) . . . h(wₖ)h(u)h(vₖ)h(vₖ₋₁) . . . h(v₁)

= h(w₁w₂ . . . wₖuvₖvₖ₋₁ . . . v₁).

Again, a word z belongs to L(G′) if and only if z = h(w) for some w ∈ L(G), and therefore L(G′) = h(L(G)) = h(L). The assertion follows because G′ is linear.

L(REG). The construction given in the proof for L(LIN) yields a regular grammar G′ if G is regular. □

We have not yet discussed whether L(CS) is closed under homomorphisms. This will be settled in Chapter 5.

Lemma 4.10 The families L(REG), L(LIN), L(CF), L(CS), and L(RE) are closed under inverse homomorphisms.

Proof. L(REG). Let L be a regular language. Then there is a deterministic finite automaton A = (X, Z, z₀, F, δ) such that T(A) = L. Now let h : Y* → X* be a homomorphism. Then a₁a₂ . . . aₙ ∈ h⁻¹(L), with aᵢ ∈ Y for 1 ≤ i ≤ n, if and only if h(a₁a₂ . . . aₙ) = h(a₁)h(a₂) . . . h(aₙ) ∈ L. We construct the automaton A′ = (Y, Z, z₀, F, δ′) by setting

δ′(z, a) = δ*(z, h(a)) for z ∈ Z and a ∈ Y.

By definition of δ′, we immediately have (by induction on n)

δ′*(z₀, a₁a₂ . . . aₙ) = δ*(z₀, h(a₁)h(a₂) . . . h(aₙ)).

Therefore a₁a₂ . . . aₙ ∈ T(A′) if and only if h(a₁)h(a₂) . . . h(aₙ) ∈ T(A). This implies that A′ accepts h⁻¹(T(A)) = h⁻¹(L). Hence h⁻¹(L) is regular.
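The definition δ′(z, a) = δ*(z, h(a)) can be computed directly, including for letters with h(a) = λ. A Python sketch under our own assumptions (a complete DFA given as a transition dictionary; the names `run` and `inverse_hom_dfa` and the example are hypothetical):

```python
from itertools import product


def run(delta, start, word):
    """Extended transition function delta* of a complete DFA."""
    z = start
    for x in word:
        z = delta[(z, x)]
    return z


def inverse_hom_dfa(delta, states, h):
    """delta'(z, a) = delta*(z, h(a)) for every state z and letter a of Y."""
    return {(z, a): run(delta, z, image) for z in states for a, image in h.items()}


# A accepts the words over {a, b} containing an even number of b's.
states = {0, 1}
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
finals = {0}
# h : {x, y, e}* -> {a, b}*; note that h(e) is the empty word.
h = {"x": "ab", "y": "b", "e": ""}

delta_prime = inverse_hom_dfa(delta, states, h)
words = ["".join(t) for n in range(6) for t in product("xye", repeat=n)]
agree = all((run(delta_prime, 0, w) in finals) ==
            (run(delta, 0, "".join(h[c] for c in w)) in finals) for w in words)
print(agree)  # True
```

Here h⁻¹(L) consists of the words over {x, y, e} in which x and y occur an even number of times in total; the letter e leaves every state unchanged.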

L(CF). Let L be a context-free language and M = (X, Z, Γ, z₀, F, δ) be a pushdown automaton with T(M) = L. Moreover, let h : Y* → X* be a homomorphism. For any letter a ∈ Y with h(a) = b₁b₂ . . . b_{r_a}, we introduce new symbols (a, i), 1 ≤ i ≤ r_a + 1. Let Z′ be the set of all new symbols. Then we consider the pushdown automaton

M′ = (Y, {(z, z) | z ∈ Z} ∪ (Z × Z′), Γ, (z₀, z₀), {(z, z) | z ∈ F}, δ′), where δ′ is defined as follows:

δ′((z, z), a, #) = {((z, (a, 1)), λ)} for z ∈ Z, a ∈ Y,

δ′((z, z), a, γ) = {((z, (a, 1)), γ)} for z ∈ Z, a ∈ Y, γ ∈ Γ,

δ′((z, (a, i)), λ, γ) = {((z′, (a, i + 1)), β) | (z′, β) ∈ δ(z, bᵢ, γ)} for z ∈ Z, a ∈ Y, 1 ≤ i ≤ r_a, γ ∈ Γ ∪ {#},

δ′((z, (a, i)), λ, γ) = {((z′, (a, i)), β) | (z′, β) ∈ δ(z, λ, γ)} for z ∈ Z, a ∈ Y, 1 ≤ i ≤ r_a, γ ∈ Γ ∪ {#},

δ′((z, (a, r_a + 1)), λ, γ) = {((z, z), γ)} for z ∈ Z, a ∈ Y, γ ∈ Γ,

δ′((z, (a, r_a + 1)), λ, #) = {((z, z), λ)} for z ∈ Z, a ∈ Y.

After reading a letter a in state (z, z), we change to (z, (a, 1)) and simulate the work of M on h(a) = b₁b₂ . . . b_{r_a}, changing the first component according to M and moving from (a, i) to (a, i + 1) whenever bᵢ is "read". A state (z′, (a, r_a + 1)) indicates that the work on h(a) has been simulated, and we enter (z′, z′). Therefore the pushdown automaton M′ accepts a₁a₂ . . . aₙ if and only if we obtain (q, q) for some q ∈ F on the input a₁a₂ . . . aₙ, i. e., if and only if the simulation on h(a₁)h(a₂) . . . h(aₙ) leads to q ∈ F. Thus a₁a₂ . . . aₙ ∈ T(M′) if and only if h(a₁)h(a₂) . . . h(aₙ) ∈ T(M) = L, i. e., if and only if a₁a₂ . . . aₙ ∈ h⁻¹(L).

We omit the proofs for L(LIN), L(CS), and L(RE), which can be given analogously, i. e., the automaton for h⁻¹(L) simulates the work of the automaton for L. □

The proof of the following lemma is left to the reader (see Exercise ???).

Lemma 4.11 The families L(REG), L(LIN), L(CF), L(CS), and L(RE) are closed under reversal. □

We summarize the closure properties of the families of the Chomsky hierarchy in the table given in Figure 4.1, where a + or – at the intersection of the column associated with a family ℒ and the row associated with an operation τ means that ℒ is closed or not closed under τ, respectively.

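| Operation | L(RE) | L(CS) | L(CF) | L(LIN) | L(REG) |
|---|---|---|---|---|---|
| union | + | + | + | + | + |
| intersection | + | + | – | – | + |
| intersection with regular sets | + | + | + | + | + |
| complement | – | + | – | – | + |
| product | + | + | + | – | + |
| (positive) Kleene closure | + | + | + | – | + |
| homomorphisms | + | – | + | + | + |
| non-erasing homomorphisms | + | + | + | + | + |
| inverse homomorphisms | + | + | + | + | + |
| reversal | + | + | + | + | + |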

Figure 4.1: Table of closure properties

We now show that a family of languages which is closed under certain operations is also closed under some further operations. To shorten the statements, we introduce the following notion.

Definition 4.12 A family ℒ of languages is called an abstract family of languages (abbreviated AFL) if

– it contains at least one non-empty language, and

– it is closed under union, product, positive Kleene closure, non-erasing homomorphisms, inverse homomorphisms, and intersection with regular languages.

The family ℒ is called a full AFL if, in addition, it is closed under (arbitrary) homomorphisms.

By Figure 4.1, L(REG), L(CF), L(CS), and L(RE) are AFLs; L(REG), L(CF), and L(RE) are full AFLs; L(LIN) is not an abstract family of languages.

Lemma 4.13 Any full AFL is closed under Kleene closure.


Proof. Since L* = L⁺ ∪ {λ} and any full AFL is closed under positive Kleene closure and union, it is sufficient to show that any full AFL contains {λ}.

Let ℒ be a full AFL. By definition, ℒ contains a non-empty language K. If K = {λ}, then the assertion holds. If K ≠ {λ}, then K contains a non-empty word z. We define the homomorphism h : (alph(K))* → (alph(K))* by h(a) = λ for all a ∈ alph(K). Then

{λ} = h(K ∩ {z}).

Because ℒ is closed under intersection with regular sets (the finite set {z} is regular) and under arbitrary homomorphisms, we obtain

{λ} ∈ ℒ. □

Theorem 4.14 Any AFL is closed under set-theoretic difference with regular languages.

Proof. Let ℒ be an AFL. For a language L ⊆ X* from ℒ and a regular set R ⊆ X*, we have L \ R = L ∩ (X* \ R). Since X* \ R is regular, too (see Lemma 4.5), L \ R is the intersection of a language in ℒ with a regular set. Thus L \ R ∈ ℒ by the closure properties required for an AFL. □

Theorem 4.15 Any full AFL is closed under left and right quotients by regular sets, i. e., for any language L of the full AFL ℒ and any regular set R, the quotients D_l(L, R) and D_r(L, R) belong to ℒ.

Proof. We only give the proof for the left quotient; the proof for the right quotient is analogous.

Let ℒ be a full AFL, L a language in ℒ, and R a regular set. Furthermore, let X = alph(L) ∪ alph(R) and X′ = {a′ | a ∈ X}.

We define the homomorphisms

h : X* → X′*, h₁ : (X ∪ X′)* → X* and h₂ : (X ∪ X′)* → X* by

h(a) = a′, h₁(a′) = a, h₁(a) = a, h₂(a′) = λ, h₂(a) = a for a ∈ X.

Additionally, we consider the set

Q = h(R)(alph(L))*.

By the closure of L(REG) under homomorphisms and concatenation (see Lemmas 4.7 and 4.9), Q is regular. For a word w = a₁a₂ . . . aₖ over X we write w′ = a₁′a₂′ . . . aₖ′ = h(w). Because

h₂(h₁⁻¹(L) ∩ Q) = h₂({w′v | w′ ∈ h(R), v ∈ (alph(L))*, wv ∈ L})

= h₂({w′v | w ∈ R, v ∈ (alph(L))*, wv ∈ L})

= {h₂(w′)h₂(v) | w ∈ R, v ∈ (alph(L))*, wv ∈ L}

= {v | wv ∈ L for some w ∈ R}, we have

D_l(L, R) = h₂(h₁⁻¹(L) ∩ Q).

By the closure properties of a full AFL (inverse homomorphisms, intersection with regular sets, and arbitrary homomorphisms), we obtain D_l(L, R) ∈ ℒ. □
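The identity D_l(L, R) = h₂(h₁⁻¹(L) ∩ Q) can be checked by brute force for finite L and R. In the Python sketch below, the encoding is our own: letters of X are lower-case and the primed copy a′ is written as the upper-case letter; the helper names and example sets are hypothetical. Since h₁ preserves length, it suffices to enumerate words over X ∪ X′ up to the maximal length of a word in L.

```python
from itertools import product

# Illustrative encoding: the letters of X are lower-case, and the primed copy
# a' of a letter a is written as the upper-case letter A.

def h1(u):
    return u.lower()                                   # a' -> a, a -> a

def h2(u):
    return "".join(c for c in u if c.islower())        # a' -> lambda, a -> a

def in_Q(u, R, alph_L):
    """Is u in Q = h(R)(alph(L))*, i.e. a primed copy of a word of R
    followed by unprimed letters of alph(L)?"""
    k = 0
    while k < len(u) and u[k].isupper():
        k += 1
    return u[:k].lower() in R and all(c in alph_L for c in u[k:])


L = {"ab", "aab", "ba", "b"}
R = {"a", "aa"}
alph_L = {c for w in L for c in w}
letters = sorted(alph_L | {c for w in R for c in w})
letters += [c.upper() for c in letters]

# h1 preserves length, so words of h1^{-1}(L) are no longer than the words of L.
max_len = max(len(w) for w in L)
candidates = ("".join(t) for n in range(max_len + 1) for t in product(letters, repeat=n))
via_homomorphisms = {h2(u) for u in candidates if h1(u) in L and in_Q(u, R, alph_L)}

direct = {w[len(p):] for w in L for p in R if w.startswith(p)}
print(sorted(via_homomorphisms), sorted(direct))  # ['ab', 'b'] ['ab', 'b']
```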


Theorem 4.16 Any full AFL is closed under substitutions by regular sets.

Proof. Let ℒ be a full AFL, L ⊆ X* a language of ℒ, and τ a substitution from X* to subsets of Y* such that τ(a) is a regular set for every a ∈ X. Let X = {a₁, a₂, . . . , aₙ} and τ(aᵢ) = Rᵢ ∈ L(REG) for 1 ≤ i ≤ n. We define

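X′ = {a′ | a ∈ X},

h₁ : (X′ ∪ Y)* → X* by h₁(x′) = x for x ∈ X and h₁(y) = λ for y ∈ Y,

h₂ : (X′ ∪ Y)* → Y* by h₂(x′) = λ for x ∈ X and h₂(y) = y for y ∈ Y,

$$R = \Big(\bigcup_{i=1}^{n} a_i' R_i\Big)^{*}.$$

The set R is regular by Lemmas 4.2, 4.7, and 4.8.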

Then we get

h₁⁻¹(L) = {u₀x₁′u₁x₂′u₂ . . . x_r′u_r | x₁x₂ . . . x_r ∈ L, uᵢ ∈ Y* for 0 ≤ i ≤ r},

h₁⁻¹(L) ∩ R = {x₁′u₁x₂′u₂ . . . x_r′u_r | x₁x₂ . . . x_r ∈ L, uᵢ ∈ τ(xᵢ) for 1 ≤ i ≤ r},

h₂(h₁⁻¹(L) ∩ R) = {u₁u₂ . . . u_r | x₁x₂ . . . x_r ∈ L, uᵢ ∈ τ(xᵢ) for 1 ≤ i ≤ r},

and finally,

τ(L) = h₂(h₁⁻¹(L) ∩ R).

By the closure properties required for a full AFL, we obtain τ(L) ∈ ℒ. □
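The marker construction can be replayed by brute force on a small example. In this Python sketch all choices are our own assumptions: X = {a, b} (lower-case), the marker x′ is the upper-case letter, Y = {0, 1}, and τ assigns finite (hence regular) sets; arbitrary regular sets τ(a) would need a real recognizer in `in_R`. The helper names are hypothetical.

```python
import re
from itertools import product

# Illustrative encoding: X = {a, b}, the marker x' is the upper-case letter,
# and Y = {0, 1}. tau assigns finite (hence regular) sets to the letters.
tau = {"a": {"0", "00"}, "b": {"1"}}
L = {"ab", "b"}

def h1(u):
    return "".join(c.lower() for c in u if c.isupper())   # x' -> x, y -> lambda

def h2(u):
    return "".join(c for c in u if not c.isupper())       # x' -> lambda, y -> y

def in_R(u):
    """Is u in R, i.e. a sequence of blocks x'v with v in tau(x)?"""
    if u and not u[0].isupper():
        return False
    blocks = re.findall(r"([A-Z])([^A-Z]*)", u)
    return all(rest in tau[mark.lower()] for mark, rest in blocks)

# Bound on |u|: each letter of a word of L contributes one marker plus at most
# the longest word of its image.
max_len = max(sum(1 + max(map(len, tau[c])) for c in w) for w in L)
alphabet = ["A", "B", "0", "1"]
candidates = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
via_homomorphisms = {h2(u) for u in candidates if h1(u) in L and in_R(u)}

direct = {"".join(parts) for w in L for parts in product(*(sorted(tau[c]) for c in w))}
print(sorted(via_homomorphisms) == sorted(direct), sorted(direct))
# True ['001', '01', '1']
```

A word starting with a letter of Y is rejected by `in_R`, which reflects that u₀ = λ for the words of h₁⁻¹(L) ∩ R.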

4.2 Algebraic Characterizations of Language Families

4.2.1 Characterizations of Language Families by Operations

The aim of this section is to present some characterizations of language families by algebraic means. We start with characterizations by closure under certain operations and by the membership of very special languages.

Definition 4.17 Regular expressions over an alphabet X are inductively defined as follows:

1. ∅, λ and x with x∈X are regular expressions.

2. If R₁, R₂ and R are regular expressions, then (R₁+R₂), (R₁·R₂) and R^∗ are also regular expressions.

With any regular expression we associate a regular language.

Definition 4.18 For a regular expression U over the alphabet X, the associated set M(U) is inductively defined by the following settings:

1. M(∅) = ∅, M(λ) = {λ}, and M(x) = {x} for x ∈ X,


2. If R₁, R₂, and R are regular expressions, then

M((R₁ + R₂)) = M(R₁) ∪ M(R₂), M((R₁ · R₂)) = M(R₁) · M(R₂),

M(R*) = (M(R))*.

Example 4.19 Let X = {a, b, c}. By condition 1 of Definition 4.17, R₀ = λ, R₁ = a, R₂ = b, R₃ = c

are regular expressions over X. By condition 2 of Definition 4.17, the following constructs are also regular expressions:

R₁′ = (R₁ · R₁) = (a · a), R₁″ = (R₁′ · R₁) = ((a · a) · a), R₂′ = R₂* = b*,

R₂″ = (R₂′ + R₁″) = (b* + ((a · a) · a)), R₃′ = R₃* = c*,

R₃″ = (R₃ · R₃′) = (c · c*),

R₄ = (R₂″ · R₃″) = ((b* + ((a · a) · a)) · (c · c*)),

R₅ = (R₀ + R₄) = (λ + ((b* + ((a · a) · a)) · (c · c*))).

According to Definition 4.18 we obtain the following associated sets (where obvious simplifications are made):

M(R₀) = {λ}, M(R₁) = {a}, M(R₂) = {b}, M(R₃) = {c}, M(R₁′) = M((R₁ · R₁)) = {a} · {a} = {a²},

M(R₁″) = M((R₁′ · R₁)) = {a²} · {a} = {a³}, M(R₂′) = M(R₂*) = {b}* = {bᵐ : m ≥ 0},

M(R₂″) = M((R₂′ + R₁″)) = {bᵐ : m ≥ 0} ∪ {a³}, M(R₃′) = M(R₃*) = {c}* = {cⁿ : n ≥ 0},

M(R₃″) = M((R₃ · R₃′)) = {c} · {cⁿ : n ≥ 0} = {cⁿ : n ≥ 1}, M(R₄) = M((R₂″ · R₃″)) = ({bᵐ : m ≥ 0} ∪ {a³}) · {cⁿ : n ≥ 1}

= {bᵐcⁿ : m ≥ 0, n ≥ 1} ∪ {a³cⁿ : n ≥ 1},

M(R₅) = M((R₀ + R₄)) = {λ} ∪ ({bᵐcⁿ : m ≥ 0, n ≥ 1} ∪ {a³cⁿ : n ≥ 1})

= {λ} ∪ {bᵐcⁿ : m ≥ 0, n ≥ 1} ∪ {a³cⁿ : n ≥ 1}.
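Definition 4.18 translates directly into a recursive function. Because the sets M(U) are usually infinite, the Python sketch below computes only the words of length at most n; the encoding of regular expressions as nested tuples and the function name `M` are our own assumptions. Truncation is harmless for products and stars, because every factor of a word of length at most n has length at most n itself.

```python
def M(expr, n):
    """Words of length <= n in the set M(expr) of Definition 4.18.

    Hypothetical encoding of regular expressions as nested tuples:
    ("empty",), ("lambda",), ("sym", x), ("+", R1, R2), (".", R1, R2), ("*", R).
    """
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {expr[1]} if n >= 1 else set()
    if kind == "+":
        return M(expr[1], n) | M(expr[2], n)
    if kind == ".":
        left, right = M(expr[1], n), M(expr[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "*":
        base, result, frontier = M(expr[1], n), {""}, {""}
        while frontier:  # append one more factor until nothing new fits
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression {expr!r}")


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
R2pp = ("+", ("*", b), (".", (".", a, a), a))      # (b* + ((a·a)·a))
R3pp = (".", c, ("*", c))                          # (c·c*)
R4 = (".", R2pp, R3pp)
R5 = ("+", ("lambda",), R4)
print(sorted(M(R4, 3), key=lambda w: (len(w), w)))
# ['c', 'bc', 'cc', 'bbc', 'bcc', 'ccc']
```

With n = 4 the word a³c appears as well, matching the component {a³cⁿ : n ≥ 1} of M(R₄).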

If U = ((. . . ((R₁ + R₂) + R₃) + . . .) + Rₙ), then, to shorten the notation, we write

$$U = \sum_{i=1}^{n} R_i.$$

Obviously,

$$M(U) = \bigcup_{i=1}^{n} M(R_i).$$

In an analogous way we use sums and unions over other index sets.


Theorem 4.20 A language L is regular if and only if there is a regular expression R such that M(R) =L.

Proof. (⇐) We show by induction that, for any regular expression U, the associated set M(U) is regular.

If U is a regular expression by condition 1 of Definition 4.17, then the associated sets M(∅) = ∅, M(λ) = {λ}, and M(x) = {x} with x ∈ X are finite and therefore regular (see Exercise ???).

Now let U be a regular expression which is obtained from regular expressions R₁, R₂, and R according to condition 2 of Definition 4.17, and let M(R₁), M(R₂), and M(R) be the sets associated with R₁, R₂, and R, respectively. By the induction hypothesis, M(R₁), M(R₂), and M(R) are regular. If U = (R₁ + R₂), then M(U) = M(R₁) ∪ M(R₂), which is regular by Lemma 4.2. If U = (R₁ · R₂) or U = R*, then the associated sets M(U) = M(R₁) · M(R₂) and M(U) = (M(R))*, respectively, are also regular by Lemmas 4.7 and 4.8.

(⇒) Let L be a regular language. Then there is a deterministic finite automaton A = (X, Z, z₀, F, δ) with T(A) = L. Without loss of generality we can assume that

Z = {0, 1, 2, . . . , r} and z₀ = 0

for some r ≥ 0. For i, j ∈ Z and 0 ≤ k ≤ r + 1, let $L^k_{i,j}$ denote the set of all words w satisfying the following two conditions:

(a) δ*(i, w) = j,

(b) for every u ≠ λ with w = uu′ and |u| < |w|, we have δ*(i, u) < k.

Obviously,

$$L = T(A) = \bigcup_{j \in F} L^{r+1}_{0,j}. \tag{4.1}$$

We now prove that, for any set $L^k_{i,j}$ with i, j ∈ Z and 0 ≤ k ≤ r + 1, there is a regular expression $R^k_{i,j}$ with $M(R^k_{i,j}) = L^k_{i,j}$. The proof is given by induction on k.

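Let k = 0. For i ≠ j, by definition, $L^0_{i,j}$ consists of all words w which transform state i directly into state j: for k = 0, condition (b) would require δ*(i, u) < 0 for every non-empty proper prefix u of w, so w has no such prefix and no intermediate state can occur.

Since w ≠ λ for i ≠ j, w is a word of length 1. Therefore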

$$L^0_{i,j} = \{x : x \in X,\ \delta(i, x) = j\}.$$

This can be written as

$$L^0_{i,j} = \bigcup_{x \in X,\ \delta(i,x) = j} \{x\}.$$

Thus we also have

$$L^0_{i,j} = M\Big(\sum_{x \in X,\ \delta(i,x) = j} x\Big) = \bigcup_{x \in X,\ \delta(i,x) = j} \{x\},$$

which proves our assertion. If i = j, then, in addition to the words of length 1 which transform i into i, the empty word belongs to $L^0_{i,i}$. Hence

$$L^0_{i,i} = M\Big(\lambda + \sum_{x \in X,\ \delta(i,x) = i} x\Big) = \{\lambda\} \cup \bigcup_{x \in X,\ \delta(i,x) = i} \{x\}$$
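The base case of the induction can be made concrete. The Python sketch below builds, for a given complete DFA, the set $L^0_{i,j}$ and a regular expression for it written with the left-nested sums introduced after Example 4.19. Two points are our own assumptions: the function names are hypothetical, and an empty sum (no letter leads from i to j, with i ≠ j) is written as ∅, a convention the text above does not state explicitly.

```python
def base_set(delta, alphabet, i, j):
    """L^0_{i,j}: the letters leading from i to j, plus the empty word if i = j."""
    letters = {x for x in alphabet if delta[(i, x)] == j}
    return letters | ({""} if i == j else set())


def base_expression(delta, alphabet, i, j):
    """A regular expression for L^0_{i,j}, written with left-nested sums
    ((...(R1+R2)+...)+Rn); an empty sum is written as ∅ (our convention)."""
    terms = (["λ"] if i == j else []) + sorted(x for x in alphabet if delta[(i, x)] == j)
    if not terms:
        return "∅"
    expr = terms[0]
    for t in terms[1:]:
        expr = f"({expr}+{t})"
    return expr


# A DFA over {a, b} with states 0, 1, 2 (state 2 is a sink).
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 0,
         (2, "a"): 2, (2, "b"): 2}
for i in range(3):
    print([base_expression(delta, "ab", i, j) for j in range(3)])
# ['(λ+b)', 'a', '∅']
# ['b', 'λ', 'a']
# ['∅', '∅', '((λ+a)+b)']
```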
