April 23, 2004
Basic formal concepts
1.1 Formal Background
1.1.1 Monoid
In this section we present a brief introduction to monoid theory.
Definition 1.1.1 A monoid ℳ (semigroup with unit element) is a triple ⟨M, ◦, e⟩ where
• M is a non-empty set, the set of monoid elements;
• ◦ : M × M → M is the monoid operation (we will use infix notation);
• e ∈ M is the monoid unit element,
and the following conditions hold:
• ∀a, b, c ∈ M (a ◦ (b ◦ c) = (a ◦ b) ◦ c) (associativity),
• ∀a ∈ M (a ◦ e = e ◦ a = a).
Later we often use M to denote the monoid ℳ = ⟨M, ◦, e⟩.
Let ℳ = ⟨M, ◦, e⟩ be a monoid. For T1, T2 ⊆ M, we define:
T1 ◦ T2 := {t1 ◦ t2 | t1 ∈ T1, t2 ∈ T2}.
It is easy to show that for any monoid ℳ = ⟨M, ◦, e⟩ the triple ℳ̂ = ⟨2^M, ◦, {e}⟩ is again a monoid. Note that we use the same symbol for the basic monoid operation and for the “lifted” version that acts on subsets.
Definition 1.1.2 Let ℳ = ⟨M, ◦, e⟩ be a monoid and T ⊆ M. Then for every n ∈ ℕ we define T^n inductively:
1. T^0 = {e},
2. T^{k+1} = T^k ◦ T,
and T^+ = ⋃_{k=1}^{∞} T^k, T∗ = ⋃_{k=0}^{∞} T^k. We call T∗ the iteration or Kleene star of T.
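The lifted operation and the iteration can be tried out on a small finite monoid. The following is a minimal sketch, assuming the integers modulo 12 under addition as an example monoid; the names `lift`, `kleene_star` and `add12` are hypothetical and not part of the text.

```python
from itertools import product

def lift(op, t1, t2):
    """Lifted operation on subsets: {a op b | a in t1, b in t2}."""
    return frozenset(op(a, b) for a, b in product(t1, t2))

def kleene_star(op, e, t):
    """Union of all powers T^k for k >= 0.

    Computed as the least set containing e and closed under
    right-multiplication by t; terminates because the monoid is finite.
    """
    star = frozenset({e})                  # T^0 = {e}
    while True:
        bigger = star | lift(op, star, t)  # adds the next power of T
        if bigger == star:
            return star
        star = bigger

# Assumed example monoid: integers modulo 12 under addition, unit element 0.
add12 = lambda a, b: (a + b) % 12

print(sorted(lift(add12, {1, 2}, {10, 11})))
print(sorted(kleene_star(add12, 0, {4, 6})))
```

In this example the iteration of {4, 6} consists of the even residues modulo 12, which is closed under the operation and contains the unit element.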
Definition 1.1.3 A subset T ⊆ M of a monoid ℳ is a submonoid of ℳ if e ∈ T and T^2 ⊆ T.
Proposition 1.1.4 For any subset T ⊆ M of a monoid ℳ the set T∗ is the least submonoid of ℳ containing T.
Definition 1.1.5 Let ℳ1 = ⟨M1, ◦, e1⟩ and ℳ2 = ⟨M2, •, e2⟩ be monoids. A function ϕ : M1 → M2 is a monoid homomorphism iff the following conditions hold:
• ϕ(e1) = e2,
• ∀a, b ∈ M1 (ϕ(a ◦ b) = ϕ(a) • ϕ(b)).
Let T ⊆ M1; then by ϕ(T) ⊆ M2 we denote the set {ϕ(m) | m ∈ T}.
Proposition 1.1.6 The composition of two homomorphisms is again a homomorphism.
Proposition 1.1.7 Let ϕ : M1 → M2 be a monoid homomorphism.
1. If Tα ⊆ M1 for every α ∈ A, then ϕ(⋃_{α∈A} Tα) = ⋃_{α∈A} ϕ(Tα).
2. If T1, T2 ⊆ M1, then ϕ(T1 ◦ T2) = ϕ(T1) • ϕ(T2).
3. If T ⊆ M1, then ϕ(T∗) = ϕ(T)∗.
Proof. The proofs of Points 1, 2 are straightforward. The proof of Point 3 is based on a simple induction, using Points 1, 2.
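Points 2 and 3 can be checked exhaustively on a small example. The sketch below assumes the homomorphism x ↦ x mod 3 from the integers modulo 6 to the integers modulo 3 (both under addition); all function names are hypothetical, and the check illustrates the statement rather than proving it.

```python
from itertools import combinations, product

def lift(op, t1, t2):
    """Lifted operation on subsets."""
    return frozenset(op(a, b) for a, b in product(t1, t2))

def star(op, e, t):
    """Iteration of t in a finite monoid with operation op and unit e."""
    result = frozenset({e})
    while True:
        bigger = result | lift(op, result, t)
        if bigger == result:
            return result
        result = bigger

def subsets(universe):
    """All subsets of a finite collection, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Assumed example: phi(x) = x mod 3 maps (Z_6, +, 0) to (Z_3, +, 0).
add6 = lambda a, b: (a + b) % 6
add3 = lambda a, b: (a + b) % 3
phi = lambda x: x % 3
image = lambda t: frozenset(phi(x) for x in t)

def check_proposition():
    """Return (point 2 holds, point 3 holds) over all subsets of Z_6."""
    all_subsets = subsets(range(6))
    point2 = all(image(lift(add6, t1, t2)) == lift(add3, image(t1), image(t2))
                 for t1, t2 in product(all_subsets, repeat=2))
    point3 = all(image(star(add6, 0, t)) == star(add3, 0, image(t))
                 for t in all_subsets)
    return point2, point3

print(check_proposition())
```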
Using Point 2 of the above proposition it is easily shown that ϕ induces a monoid homomorphism ϕ : 2^{M1} → 2^{M2}.
Definition 1.1.8 Let ℳ1 = ⟨M1, ◦, e1⟩ and ℳ2 = ⟨M2, •, e2⟩ be monoids. Then the triple ℳ1 × ℳ2 = ⟨M1 × M2, ◦ × •, ⟨e1, e2⟩⟩ is a monoid called the monoid cartesian product. Here ◦ × • denotes the function ◦ × • : (M1 × M2) × (M1 × M2) → M1 × M2 defined as ⟨u1, v1⟩ (◦ × •) ⟨u2, v2⟩ = ⟨u1 ◦ u2, v1 • v2⟩.
Example 1.1.9 Some examples of monoids are listed below:
1. Let a set X be given. The set of all relations r ⊆ X × X with relation composition as monoid operation and the identity function as unit element is a monoid Rel(X).
2. The set of all (partial) functions f : X → X is a submonoid of Rel(X), written (pFun(X)) Fun(X).
3. The set of all bijections of X defines a submonoid of Rel(X), which is a group called the permutation group of X.
An alphabet Σ is a set of symbols.
Definition 1.1.10 A word w over an alphabet Σ is an n-tuple w = ⟨a1, . . . , an⟩, where n ≥ 0 and ai ∈ Σ for i = 1, . . . , n.
The integer n is called the length of w. Later we use |w| to denote the length of w. If n = 0 we have the only 0-tuple ε, called the empty word. The concatenation of two words u = ⟨a1, . . . , an⟩ and v = ⟨b1, . . . , bm⟩ over Σ is the word over Σ
u · v = ⟨a1, . . . , an, b1, . . . , bm⟩.
Clearly |u · v| = |u| + |v| and |ε| = 0. Later we may write w = a1 . . . an for w = ⟨a1, . . . , an⟩. Recall that words represent finite strings.
Definition 1.1.11 The set of words over an alphabet Σ with concatenation as monoid operation and the empty word ε as unit element is called the free monoid Σ∗ = ⟨Σ∗, ·, ε⟩ over Σ.
Clearly, the only submonoid of Σ∗ containing Σ is Σ∗, and for each element of Σ∗ there exists a unique presentation as a finite concatenation of elements of Σ.
Proposition 1.1.12 Let ℳ = ⟨M, ◦, e⟩ be a monoid, let Σ be an alphabet and ϕ : Σ → M be a function. Then the natural extension φ of ϕ over Σ∗, inductively defined as
1. φ(ε) = e,
2. φ(σ · a) = φ(σ) ◦ ϕ(a), where σ ∈ Σ∗, a ∈ Σ,
is the only homomorphism between the monoids Σ∗ and ℳ which is an extension of ϕ.
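The inductive definition of the natural extension amounts to a left-to-right fold over the letters of a word. A minimal sketch, assuming Python strings as words over single-character letters; `extend` and the two example homomorphisms are hypothetical names chosen for illustration.

```python
def extend(phi, op, e):
    """Return the natural extension of phi (letters -> M) to words.

    op is the monoid operation of the target monoid and e its unit element.
    """
    def phi_star(word):
        result = e                       # image of the empty word
        for a in word:                   # image(sigma . a) = image(sigma) op phi(a)
            result = op(result, phi(a))
        return result
    return phi_star

# Assumed example 1: target monoid (Z, +, 0); the extension counts the letter "a".
count_a = extend(lambda s: 1 if s == "a" else 0, lambda x, y: x + y, 0)

# Assumed example 2: target monoid is the free monoid over {0, 1};
# the extension substitutes a binary code word for each letter.
code = extend({"a": "01", "b": "1"}.get, lambda x, y: x + y, "")

print(count_a("abba"), code("ab"))
```

Both extensions respect concatenation, for instance `code(u + v) == code(u) + code(v)` for any words `u` and `v` over {a, b}.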
Definition 1.1.13 Let s ∈ Σ∗. Then v ∈ Σ∗ is an infix of s if s = u · v · w for some u, w ∈ Σ∗. If u = ε then v is a prefix of s. If w = ε then v is a suffix of s.
Definition 1.1.14 The reverse function ρ : Σ∗ → Σ∗ is defined as ρ(ε) := ε, ∀a ∈ Σ : ρ(a) := a, ∀u, v ∈ Σ∗ : ρ(u · v) := ρ(v) · ρ(u).
A subset L ⊆ Σ∗ is called a language over Σ. Later we shall consider finite alphabets only.
1.2 Classical finite-state automata theory
1.2.1 Finite-state automata and automaton languages
Definition 1.2.1 A finite state automaton (FSA) is a tuple of the form A = ⟨Σ, Q, I, F, ∆⟩ where Σ is the input alphabet, Q is the set of states, I ⊆ Q is the set of initial states, F ⊆ Q is the set of final states, and ∆ ⊆ Q × Σ_ε × Q is the transition relation. The transition ⟨q, a, p⟩ ∈ ∆ begins at q, ends at p and has the label a. Here Σ_ε := Σ ∪ {ε}.
If we have ∆ ⊆ Q × Σ × Q, then the FSA A is called ε-free.
Definition 1.2.2 A path c in A is a finite sequence of k > 0 transitions:
c = ⟨q0, a1, q1⟩ ⟨q1, a2, q2⟩ . . . ⟨q_{k−1}, ak, qk⟩, where ⟨q_{i−1}, ai, qi⟩ ∈ ∆ for i = 1, . . . , k.
The integer k is called the length of c. The word w = a1 · . . . · ak is called the label of c. We may denote the path c as
c = q0 →^{a1} q1 . . . →^{ak} qk.
The null path of q ∈ Q is 0_q, beginning and ending in q with label ε.
Definition 1.2.3 The generalized transition relation ∆∗ is defined as the smallest subset of Q × Σ∗ × Q with the following closure properties:
• For all q ∈ Q we have ⟨q, ε, q⟩ ∈ ∆∗.
• For all q1, q2, q3 ∈ Q and w ∈ Σ∗, a ∈ Σ_ε: if ⟨q1, w, q2⟩ ∈ ∆∗ and ⟨q2, a, q3⟩ ∈ ∆, then also ⟨q1, w · a, q3⟩ ∈ ∆∗.
Clearly, the set ∆∗ contains exactly the triples consisting of the beginning, label and ending of the paths in A. For q ∈ Q we write L_A(q) := {w ∈ Σ∗ | ∃q1 ∈ F : ⟨q, w, q1⟩ ∈ ∆∗} for the language of all words that are labels of paths leading from q to a final state. The language accepted by A is L(A) := L_A(I) = ⋃_{q∈I} L_A(q). Given A as above, the set of active states for input w ∈ Σ∗ is {q ∈ Q | ∃q0 ∈ I : ⟨q0, w, q⟩ ∈ ∆∗}.
The FSA A1 and A2 are equivalent if L(A1) = L(A2).
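Acceptance can be decided by tracking the set of active states letter by letter. The sketch below assumes that ε is encoded as the empty string and that words are Python strings of single-character letters; the `FSA` tuple and the helper names are hypothetical.

```python
from typing import NamedTuple

EPS = ""  # assumed encoding of the label ε

class FSA(NamedTuple):
    sigma: frozenset     # input alphabet
    states: frozenset    # Q
    initial: frozenset   # I
    final: frozenset     # F
    delta: frozenset     # set of (q, a, p) triples, a in sigma or EPS

def eps_closure(fsa, states):
    """All states reachable from the given states by ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (src, label, dst) in fsa.delta:
            if src == q and label == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def active_states(fsa, word):
    """States q such that some initial state reaches q on a path labelled word."""
    current = eps_closure(fsa, fsa.initial)
    for a in word:
        step = {dst for (src, label, dst) in fsa.delta
                if src in current and label == a}
        current = eps_closure(fsa, step)
    return current

def accepts(fsa, word):
    return bool(active_states(fsa, word) & fsa.final)

# Assumed example: 0 -a-> 0, 0 -ε-> 1, 1 -b-> 2; the accepted language is a*b.
example = FSA(
    sigma=frozenset("ab"),
    states=frozenset({0, 1, 2}),
    initial=frozenset({0}),
    final=frozenset({2}),
    delta=frozenset({(0, "a", 0), (0, EPS, 1), (1, "b", 2)}),
)
print(accepts(example, "aab"), accepts(example, "ba"))
```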
1.2.2 ε-closure of finite-state automata
Definition 1.2.4 The ε-closure C_ε : Q → 2^Q is defined as C_ε(q) = {q′ ∈ Q | ⟨q, ε, q′⟩ ∈ ∆∗}.
Proposition 1.2.5 For any FSA A = ⟨Σ, Q, I, F, ∆⟩ there exists an equivalent ε-free FSA.
Proof. Let us consider the FSA A′ = ⟨Σ, Q, I, F ∪ {q ∈ Q | C_ε(q) ∩ F ≠ ∅}, ∆′⟩, where
∆′ = {⟨q1, a, q2⟩ | q1, q2 ∈ Q, a ∈ Σ, ∃q′ ∈ C_ε(q1) : ⟨q′, a, q2⟩ ∈ ∆}.
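The ε-removal can be sketched directly from the definition of the ε-closure. This is an illustrative sketch, assuming ε is encoded as the empty string; it marks as final every state whose ε-closure meets F, so that ε-moves taken after the last letter are covered as well. The function names are hypothetical.

```python
EPS = ""  # assumed encoding of the label ε

def eps_closure(delta, q):
    """All states reachable from q by ε-transitions only (including q)."""
    closure, stack = {q}, [q]
    while stack:
        s = stack.pop()
        for (src, label, dst) in delta:
            if src == s and label == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def remove_epsilon(states, initial, final, delta):
    """Build an ε-free automaton accepting the same language."""
    closure = {q: eps_closure(delta, q) for q in states}
    # q1 -a-> q2 whenever some state in the closure of q1 has an a-transition to q2
    new_delta = {(q1, a, q2)
                 for q1 in states
                 for (q, a, q2) in delta
                 if a != EPS and q in closure[q1]}
    new_final = {q for q in states if closure[q] & set(final)}
    return set(states), set(initial), new_final, new_delta

def accepts(initial, final, delta, word):
    """Acceptance test for an ε-free automaton."""
    current = set(initial)
    for a in word:
        current = {dst for (src, label, dst) in delta
                   if src in current and label == a}
    return bool(current & set(final))

# Assumed example: 0 -a-> 1 -ε-> 2 -b-> 2 with I = {0}, F = {2}; language a b*.
delta = {(0, "a", 1), (1, EPS, 2), (2, "b", 2)}
Q, I, F, D = remove_epsilon({0, 1, 2}, {0}, {2}, delta)
print(accepts(I, F, D, "abb"), accepts(I, F, D, "b"))
```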
1.2.3 Closure properties of finite-state automata and cartesian product construction
Proposition 1.2.6
1. For A_∅ = ⟨Σ, ∅, ∅, ∅, ∅⟩ we have L(A_∅) = ∅.
2. For A_ε = ⟨Σ, {q}, {q}, {q}, ∅⟩ we have L(A_ε) = {ε}.
3. Let a ∈ Σ. For the FSA A_a = ⟨Σ, {q0, q1}, {q0}, {q1}, {⟨q0, a, q1⟩}⟩ we have L(A_a) = {a}.
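Proposition 1.2.7 Let A1 = ⟨Σ, Q1, I1, F1, ∆1⟩ and A2 = ⟨Σ, Q2, I2, F2, ∆2⟩ be two FSA with Q1 ∩ Q2 = ∅.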
1. For the FSA A = ⟨Σ, Q1 ∪ Q2, I1 ∪ I2, F1 ∪ F2, ∆1 ∪ ∆2⟩ we have L(A) = L(A1) ∪ L(A2).
2. For the FSA A = ⟨Σ, Q1 ∪ Q2, I1, F2, ∆1 ∪ ∆2 ∪ {⟨q1, ε, q2⟩ | q1 ∈ F1 & q2 ∈ I2}⟩ we have L(A) = L(A1) · L(A2).
3. Let q0 be a new state. For the FSA A = ⟨Σ, Q1 ∪ {q0}, {q0}, F1 ∪ {q0}, ∆1 ∪ {⟨q0, ε, q1⟩ | q1 ∈ I1} ∪ {⟨q2, ε, q0⟩ | q2 ∈ F1}⟩ we have L(A) = L(A1)∗.
4. For the FSA A = ⟨Σ, Q1, F1, I1, ∆′⟩, where ∆′ = {⟨q2, a, q1⟩ | ⟨q1, a, q2⟩ ∈ ∆1}, we have L(A) = ρ(L(A1)).
5. If A1 and A2 are ε-free FSA, then for the FSA A = ⟨Σ, Q1 × Q2, I1 × I2, F1 × F2, ∆′⟩, where ∆′ := {⟨⟨q1, q2⟩, a, ⟨r1, r2⟩⟩ | ⟨q1, a, r1⟩ ∈ ∆1 & ⟨q2, a, r2⟩ ∈ ∆2}, we have L(A) = L(A1) ∩ L(A2).
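The cartesian product construction for intersection can be written down in a few lines. A minimal sketch, assuming ε-free automata given as triples (initial states, final states, transitions) with the state set left implicit; the two example automata and all names are hypothetical.

```python
def intersect(a1, a2):
    """Product automaton of two ε-free automata (initial, final, delta)."""
    (i1, f1, d1), (i2, f2, d2) = a1, a2
    initial = {(p, q) for p in i1 for q in i2}
    final = {(p, q) for p in f1 for q in f2}
    # Both components must move on the same letter.
    delta = {((q1, q2), a, (r1, r2))
             for (q1, a, r1) in d1
             for (q2, b, r2) in d2
             if a == b}
    return initial, final, delta

def accepts(automaton, word):
    """Acceptance test for an ε-free automaton."""
    initial, final, delta = automaton
    current = set(initial)
    for a in word:
        current = {dst for (src, label, dst) in delta
                   if src in current and label == a}
    return bool(current & set(final))

# Assumed example automata over {a, b}:
# even_a accepts words with an even number of a's, ends_b accepts words ending in b.
even_a = ({"e"}, {"e"},
          {("e", "a", "o"), ("o", "a", "e"), ("e", "b", "e"), ("o", "b", "o")})
ends_b = ({0}, {1},
          {(0, "a", 0), (0, "b", 1), (1, "a", 0), (1, "b", 1)})
both = intersect(even_a, ends_b)
print(accepts(both, "aab"), accepts(both, "ab"))
```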
1.2.4 Determinization of finite-state automata
A finite state automaton A is deterministic iff the transition relation is a function δ : Q × Σ → Q and |I| = 1. Let A = ⟨Σ, Q, q0, F, δ⟩ be a deterministic FSA, and let δ∗ : Q × Σ∗ → Q denote the generalized transition function, which is defined as in the nondeterministic case.
The following theorem gives us an effective procedure for determinization of FSA.
Theorem 1.2.8 For any FSA A = ⟨Σ, Q, I, F, ∆⟩ there exists a deterministic FSA A_D such that L(A) = L(A_D).
Proof. Let A_D := ⟨Σ, 2^Q, I, F_D, δ⟩, where F_D := {S ⊆ Q | C_ε(S) ∩ F ≠ ∅} and δ(S, a) := {q ∈ Q | ∃q1 ∈ S : ⟨q1, a, q⟩ ∈ ∆∗} for S ⊆ Q, a ∈ Σ. Here C_ε(S) denotes ⋃_{q∈S} C_ε(q).
Clearly, the function δ in the above proof is a total function.
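The subset construction can be sketched as a worklist algorithm. The sketch below is a variant under stated assumptions: ε is encoded as the empty string, only subsets reachable from the start subset are generated (rather than all of 2^Q), and the start subset is the ε-closure of the initial states, so a subset is final exactly when it meets F. The names are hypothetical.

```python
EPS = ""  # assumed encoding of the label ε

def closure(delta, states):
    """ε-closure of a set of states, returned as a frozenset."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (src, label, dst) in delta:
            if src == s and label == EPS and dst not in result:
                result.add(dst)
                stack.append(dst)
    return frozenset(result)

def determinize(sigma, initial, final, delta):
    """Return (start subset, final subsets, total transition function)."""
    start = closure(delta, initial)
    trans, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for a in sigma:
            step = {dst for (src, label, dst) in delta
                    if src in subset and label == a}
            target = closure(delta, step)   # the empty subset acts as a sink
            trans[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    d_final = {s for s in seen if s & set(final)}
    return start, d_final, trans

def run(dfa, word):
    start, d_final, trans = dfa
    state = start
    for a in word:
        state = trans[(state, a)]
    return state in d_final

# Assumed example: words over {a, b} whose second-to-last letter is a.
nfa_delta = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2), (1, "b", 2)}
dfa = determinize({"a", "b"}, {0}, {2}, nfa_delta)
print(run(dfa, "bab"), run(dfa, "abb"))
```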
Proposition 1.2.9 For any deterministic FSA A = ⟨Σ, Q, q0, F, δ⟩ where δ is a total function the following holds: for the FSA A′ = ⟨Σ, Q, q0, Q \ F, δ⟩ we have L(A′) = Σ∗ \ L(A).
1.2.5 Regular languages and regular expressions
A language over Σ is called an automaton language iff it is recognized by some FSA. The class of automaton languages contains all finite languages and is closed under union, concatenation, iteration, intersection, reversal and complementation.
Definition 1.2.10 Let Σ be a finite alphabet. We define a regular language over Σ by induction:
1. ∅ is a regular language;
2. {ε} is a regular language;
3. if σ ∈ Σ, then {σ} is a regular language;
4. if L1, L2 ⊆ Σ∗ are regular languages then
• L1 ∪ L2 is a regular language (union),
• L1 · L2 is a regular language (concatenation),
• L1∗ is a regular language (Kleene closure).
In a similar way we define regular expressions.
Definition 1.2.11 Let a finite alphabet Σ be given. A regular expression over Σ is a word over Σ ∪ {(, ), ∗, +, ·, ∅}, defined by induction:
1. ∅ is a regular expression;
2. ε is a regular expression;
3. if σ ∈ Σ, then σ is a regular expression;
4. if E1 and E2 are regular expressions over Σ, then
• (E1 + E2) is a regular expression,
• (E1 · E2) is a regular expression,
• (E1∗) is a regular expression.
Each regular expression naturally corresponds to a regular language, and vice versa, by the obvious correspondence between the definitions.
The following theorem presents a deeper result on the correspondence between automaton languages and regular languages.
1.2.6 Equivalence between regular languages and automaton languages
Theorem 1.2.12 (Kleene) A language is regular if and only if it is an automaton language.
Proof. (=⇒) This direction follows directly from the closure properties given in Propositions 1.2.6 and 1.2.7.
(⇐=) Let A = ⟨Σ, Q, q1, F, δ⟩ be a deterministic FSA. Let Q = {q1, . . . , qn}, and for 0 ≤ k ≤ n let Qk := {q1, . . . , qk}. We define
R^k_{ij} = {v ∈ Σ∗ | v = a1 . . . am & δ∗(qi, v) = qj & ∀l ∈ {1, . . . , m − 1} : δ∗(qi, a1 . . . al) ∈ Qk}.
Note that the final condition ∀l ∈ {1, . . . , m − 1} : δ∗(qi, a1 . . . al) ∈ {q1, . . . , qk} is vacuous in the situation where m = 1. Clearly
R^0_{ij} = {a ∈ Σ | δ(qi, a) = qj}, if i ≠ j,
R^0_{ij} = {ε} ∪ {a ∈ Σ | δ(qi, a) = qj}, if i = j,
and
R^k_{ij} = R^{k−1}_{ij} ∪ (R^{k−1}_{ik} · (R^{k−1}_{kk})∗ · R^{k−1}_{kj}).
From the above presentation it follows by induction that for any i, j, k ∈ {1, . . . , n} the languages R^k_{ij} are regular. We have that
L(A) = ⋃_{j : qj ∈ F} R^n_{1j}.
Hence L(A) is a regular language.
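The recursion for the sets R^k_{ij} can be turned into a small program that produces a regular expression from a deterministic automaton. The sketch below is illustrative only: it assumes states numbered 1, . . . , n with state 1 initial, emits patterns in the syntax of Python's `re` module (with `None` standing for the empty language), and makes no attempt to simplify the result. All function names are hypothetical.

```python
import re
from itertools import product

def union(x, y):
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def concat(x, y):
    return None if x is None or y is None else x + y

def star(x):
    # The iteration of the empty language and of {ε} is {ε}.
    return "" if x in (None, "") else f"(?:{x})*"

def kleene_regex(n, sigma, delta, finals):
    """delta maps (i, a) to j; returns a pattern for L(A), or None if empty."""
    R = {}
    for i, j in product(range(1, n + 1), repeat=2):      # base case k = 0
        parts = [re.escape(a) for a in sorted(sigma) if delta.get((i, a)) == j]
        if i == j:
            parts.append("")                             # ε belongs to R^0_ii
        if not parts:
            R[i, j] = None
        elif parts == [""]:
            R[i, j] = ""
        else:
            R[i, j] = "(?:" + "|".join(parts) + ")"
    for k in range(1, n + 1):                            # allow q_k as intermediate
        loop = star(R[k, k])
        R = {(i, j): union(R[i, j], concat(concat(R[i, k], loop), R[k, j]))
             for (i, j) in R}
    result = None
    for j in sorted(finals):
        result = union(result, R[1, j])
    return result

def words(sigma, max_len):
    """All words over sigma of length at most max_len."""
    return ["".join(t) for m in range(max_len + 1)
            for t in product(sorted(sigma), repeat=m)]

# Assumed example: two states, accepting words with an even number of a's.
delta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
regex = kleene_regex(2, {"a", "b"}, delta, {1})
print(bool(re.fullmatch(regex, "abba")), bool(re.fullmatch(regex, "ab")))
```

The generated pattern grows quickly with the number of states, which reflects the nested structure of the recursion rather than any property of the language itself.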
It is well-known that for any regular language L there exists a deterministic FSA A_L such that L(A_L) = L and A_L is minimal (w.r.t. number of states) among all deterministic FSA accepting L. A_L is unique up to renaming of states.
1.2.7 Minimal finite-state automata and the Myhill-Nerode equivalence relation
Let a finite alphabet Σ be given.
Definition 1.2.13 An equivalence relation R ⊆ Σ∗ × Σ∗ is called right invariant if
∀u, v ∈ Σ∗ : u R v ⇒ (∀w ∈ Σ∗ : u · w R v · w).
Proposition 1.2.14 Let A = ⟨Σ, Q, q0, F, δ⟩ be a deterministic FSA. Then the relation
R_A = {⟨u, v⟩ ∈ Σ∗ × Σ∗ | δ∗(q0, u) = δ∗(q0, v)}
is a right invariant equivalence relation and |{[s]_{R_A} | s ∈ Σ∗}| ≤ |Q|.
Proposition 1.2.15 Let L ⊆ Σ∗ be a language over Σ. Then the relation
R_L = {⟨u, v⟩ ∈ Σ∗ × Σ∗ | ∀w ∈ Σ∗ : u · w ∈ L ↔ v · w ∈ L}
is a right invariant equivalence relation.
Proposition 1.2.16 Let L ⊆ Σ∗ be a language over Σ such that the index of R_L is finite. Then for the deterministic FSA
A_L = ⟨Σ, {[s]_{R_L} | s ∈ Σ∗}, [ε]_{R_L}, {[s]_{R_L} | s ∈ L}, {⟨[u]_{R_L}, a, [u · a]_{R_L}⟩ | u ∈ Σ∗, a ∈ Σ}⟩
we have that L(A_L) = L.
Proposition 1.2.17 Let A = ⟨Σ, Q, q0, F, δ⟩ be a deterministic FSA. Then we have that
R_A ⊆ R_{L(A)}.
Theorem 1.2.18 For any deterministic FSA there exists a unique (up to state renaming) equivalent deterministic FSA, which is minimal with respect to the number of states.
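One standard way to compute such a minimal automaton is partition refinement in the style of Moore's algorithm, which is not described in the text above. The following sketch assumes a total transition function given as a dictionary and that every state is reachable from the start state; states are merged when no word distinguishes them. The names are hypothetical.

```python
def minimize(states, sigma, start, finals, delta):
    """Merge indistinguishable states of a total DFA with reachable states.

    Returns (block ids, start block, final blocks, transition dictionary).
    """
    letters = sorted(sigma)
    # Initial partition: final versus non-final states.
    block = {q: int(q in finals) for q in states}
    while True:
        # Two states stay together only if they agree on their own block
        # and on the blocks of all their successors.
        signature = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in letters)
                     for q in states}
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):
            break                      # no block was split: partition is stable
        block = refined
    new_delta = {(block[q], a): block[delta[(q, a)]]
                 for q in states for a in letters}
    return set(block.values()), block[start], {block[q] for q in finals}, new_delta

def run(start, finals, delta, word):
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in finals

# Assumed example: three states for "even number of a's"; states 0 and 2 are equivalent.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 0}
m_states, m_start, m_finals, m_delta = minimize({0, 1, 2}, {"a", "b"}, 0, {0, 2}, delta)
print(len(m_states), run(m_start, m_finals, m_delta, "abab"))
```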