1.2 Classical finite-state automata theory

Academic year: 2023

April 23, 2004

Basic formal concepts

1.1 Formal Background

1.1.1 Monoid

In this section we present a brief introduction to monoid theory.

Definition 1.1.1 A monoid M (semigroup with unit element) is a triple ⟨M, ◦, e⟩ where

• M is a non-empty set, the set of monoid elements;

• ◦ : M × M → M is the monoid operation (we will use infix notation);

• e ∈ M is the monoid unit element,

and the following conditions hold:

• ∀a, b, c ∈ M (a ◦ (b ◦ c) = (a ◦ b) ◦ c) (associativity),

• ∀a ∈ M (a ◦ e = e ◦ a = a) (unit element).

Later we often use M to denote the monoid M = ⟨M, ◦, e⟩.

Let M = ⟨M, ◦, e⟩ be a monoid. For T1, T2 ⊆ M, we define:

T1 ◦ T2 := {t1 ◦ t2 | t1 ∈ T1, t2 ∈ T2}.

It is easy to show that for any monoid M = ⟨M, ◦, e⟩ the triple 2^M = ⟨2^M, ◦, {e}⟩ is again a monoid. Note that we use the same symbol for the basic monoid operation and for the “lifted” version that acts on subsets.
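The lifted operation on subsets can be illustrated with a minimal Python sketch. The concrete monoid used here (integers modulo 6 under addition) and the names `lift`, `add_mod6` and `set_add` are assumptions chosen for illustration only; they do not come from the text.

```python
from itertools import product

def lift(op):
    """Lift a binary operation on elements to an operation on sets of elements."""
    def lifted(t1, t2):
        # T1 ∘ T2 := {t1 ∘ t2 | t1 in T1, t2 in T2}
        return frozenset(op(a, b) for a, b in product(t1, t2))
    return lifted

# Illustrative monoid (an assumption): integers modulo 6 under addition, unit 0.
def add_mod6(a, b):
    return (a + b) % 6

set_add = lift(add_mod6)
unit_set = frozenset({0})      # the unit {e} of the lifted monoid

t1 = frozenset({1, 2})
t2 = frozenset({3, 5})
print(sorted(set_add(t1, t2)))  # [0, 1, 4, 5]
```

Multiplying any subset by `unit_set` from either side returns the subset unchanged, which is the unit law of the lifted monoid.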

Definition 1.1.2 Let M = ⟨M, ◦, e⟩ be a monoid and T ⊆ M. For every n ∈ ℕ we define T^n inductively:

1. T^0 = {e}

2. T^(k+1) = T^k ◦ T

We also define T^+ = ⋃_{k=1}^{∞} T^k and T* = ⋃_{k=0}^{∞} T^k. We call T* the iteration or Kleene star of T.

Definition 1.1.3 A subset T ⊆ M of a monoid M is a submonoid of M if e ∈ T and T^2 ⊆ T.

Proposition 1.1.4 For any subset T ⊆ M of a monoid M the set T* is the least submonoid of M containing T.
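For a finite monoid, the iteration T* can be computed by adding products with elements of T until nothing new appears. The following Python sketch assumes a finite monoid given by a function `op` and a unit `unit`; the example monoid (integers modulo 12 under addition) and all names are hypothetical.

```python
def kleene_star(t, op, unit):
    """Return T* (the union of all T^k, k >= 0) for a subset T of a FINITE monoid."""
    result = {unit}        # T^0 = {e}
    frontier = {unit}      # elements found in the last round
    while frontier:
        # Multiply the newest elements by T and keep only unseen products.
        frontier = {op(a, b) for a in frontier for b in t} - result
        result |= frontier
    return result

# Illustrative monoid (an assumption): integers modulo 12 under addition.
def add_mod12(a, b):
    return (a + b) % 12

star = kleene_star({4}, add_mod12, 0)
print(sorted(star))  # [0, 4, 8]
```

The result contains the unit and is closed under the operation, so it is a submonoid in the sense of Definition 1.1.3.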

Definition 1.1.5 Let M1 = ⟨M1, ◦, e1⟩ and M2 = ⟨M2, •, e2⟩ be monoids. A function ϕ : M1 → M2 is a monoid homomorphism iff the following conditions hold:

• ϕ(e1) = e2

• ∀a, b ∈ M1 (ϕ(a ◦ b) = ϕ(a) • ϕ(b))

For T ⊆ M1, we write ϕ(T) ⊆ M2 for the set {ϕ(m) | m ∈ T}.

Proposition 1.1.6 The composition of two homomorphisms is again a homomorphism.

Proposition 1.1.7 Let ϕ : M1 → M2 be a monoid homomorphism.

1. If Tα ⊆ M1 for every α ∈ A, then ϕ(⋃_{α∈A} Tα) = ⋃_{α∈A} ϕ(Tα).

2. If T1, T2 ⊆ M1, then ϕ(T1 ◦ T2) = ϕ(T1) • ϕ(T2).

3. If T ⊆ M1, then ϕ(T*) = ϕ(T)*.

Proof. The proofs of Points 1 and 2 are straightforward. The proof of Point 3 is based on a simple induction, using Points 1 and 2.

Using Point 2 of the above proposition it is easily shown that ϕ induces a monoid homomorphism ϕ : 2^M1 → 2^M2.

Definition 1.1.8 Let M1 = ⟨M1, ◦, e1⟩ and M2 = ⟨M2, •, e2⟩ be monoids. Then the triple M1 × M2 = ⟨M1 × M2, ◦ × •, ⟨e1, e2⟩⟩ is a monoid called the monoid cartesian product. Here ◦ × • denotes the function ◦ × • : (M1 × M2) × (M1 × M2) → M1 × M2 defined as ⟨u1, v1⟩ (◦ × •) ⟨u2, v2⟩ = ⟨u1 ◦ u2, v1 • v2⟩.

Example 1.1.9 Some examples of monoids are listed below:

1. Let a set X be given. The set of all relations r ⊆ X × X with relation composition as monoid operation and the identity function as unit element is a monoid Rel(X).

2. The set of all functions f : X → X is a submonoid of Rel(X), written Fun(X). The same holds for the set of all partial functions, written pFun(X).

3. The set of all bijections of X defines a submonoid of Rel(X), which is a group called the permutation group of X.

An alphabet Σ is a set of symbols.

Definition 1.1.10 A word w over an alphabet Σ is an n-tuple w = ⟨a1, ..., an⟩, where n ≥ 0 and ai ∈ Σ for i = 1, ..., n.

The integer n is called the length of w. Later we use |w| to denote the length of w. If n = 0 we have the only 0-tuple ε, called the empty word. The concatenation of two words u = ⟨a1, ..., an⟩ and v = ⟨b1, ..., bm⟩ over Σ is the word over Σ

u·v = ⟨a1, ..., an, b1, ..., bm⟩.

Clearly |u·v| = |u| + |v| and |ε| = 0. Later we may write w = a1...an for w = ⟨a1, ..., an⟩. Recall that words represent finite strings.

Definition 1.1.11 The set Σ* of words over an alphabet Σ with concatenation as monoid operation and the empty word ε as unit element is called the free monoid Σ* = ⟨Σ*, ·, ε⟩ over Σ.

Clearly, the only submonoid of Σ* containing Σ is Σ*, and for each element of Σ* there exists a unique presentation as a finite concatenation of elements of Σ.

Proposition 1.1.12 Let M = ⟨M, ◦, e⟩ be a monoid, let Σ be an alphabet and ϕ : Σ → M be a function. Then the natural extension φ of ϕ over Σ*, inductively defined as

1. φ(ε) = e

2. φ(σ·a) = φ(σ) ◦ ϕ(a), where σ ∈ Σ*, a ∈ Σ,

is the only homomorphism between the monoids Σ* and M which is an extension of ϕ.
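The inductive definition of the natural extension is a left fold over the word. The sketch below assumes that words are represented as Python strings and that the target monoid is given by a function `op` and a unit `unit`; the example (the length homomorphism into the natural numbers under addition) and all names are illustrative assumptions.

```python
from functools import reduce

def extend(phi, op, unit):
    """Return the natural extension of phi: Σ -> M to words over Σ."""
    def phi_star(word):
        # φ(ε) = e and φ(σ·a) = φ(σ) ∘ ϕ(a): a left fold over the symbols.
        return reduce(lambda acc, a: op(acc, phi(a)), word, unit)
    return phi_star

# Illustrative target monoid (an assumption): natural numbers under addition.
def add(x, y):
    return x + y

# Sending every symbol to 1 yields the length function |w|.
length = extend(lambda a: 1, add, 0)
print(length("abba"))  # 4
```

Because the extension is a homomorphism, `length(u + v)` equals `length(u) + length(v)` for all strings `u` and `v`.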

Definition 1.1.13 Let s ∈ Σ*. Then v ∈ Σ* is an infix of s if s = u·v·w for some u, w ∈ Σ*. If u = ε then v is a prefix of s. If w = ε then v is a suffix of s.

Definition 1.1.14 The reverse function ρ : Σ* → Σ* is defined as ρ(ε) := ε, ∀a ∈ Σ : ρ(a) := a, ∀u, v ∈ Σ* : ρ(u·v) := ρ(v)·ρ(u).
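As a small illustration of Definitions 1.1.13 and 1.1.14, the following Python sketch enumerates prefixes, suffixes and infixes of a word and computes the reverse by the recursive rule ρ(u·v) = ρ(v)·ρ(u). Representing words as Python strings and splitting at the midpoint are assumptions of this sketch; any split point gives the same result.

```python
def reverse(word):
    """Reverse a word using ρ(ε) = ε, ρ(a) = a, ρ(u·v) = ρ(v)·ρ(u)."""
    if len(word) <= 1:
        return word
    mid = len(word) // 2
    return reverse(word[mid:]) + reverse(word[:mid])

def prefixes(s):
    return {s[:j] for j in range(len(s) + 1)}

def suffixes(s):
    return {s[i:] for i in range(len(s) + 1)}

def infixes(s):
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

print(reverse("abc"))          # cba
print(sorted(infixes("ab")))   # ['', 'a', 'ab', 'b']
```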

A subset L ⊆ Σ* is called a language over Σ. Later we shall consider finite alphabets only.

1.2 Classical finite-state automata theory

1.2.1 Finite-state automata and automaton languages

Definition 1.2.1 A finite state automaton (FSA) is a tuple of the form A = ⟨Σ, Q, I, F, ∆⟩ where Σ is the input alphabet, Q is the finite set of states, I ⊆ Q is the set of initial states, F ⊆ Q is the set of final states, and ∆ ⊆ Q × Σε × Q is the transition relation. The transition ⟨q, a, p⟩ ∈ ∆ begins at q, ends at p and has the label a. Here Σε := Σ ∪ {ε}.

If we have ∆ ⊆ Q × Σ × Q, then the FSA A is called ε-free.

Definition 1.2.2 A path c in A is a finite sequence of k > 0 transitions:

c = ⟨q0, a1, q1⟩ ⟨q1, a2, q2⟩ ... ⟨q_{k−1}, ak, qk⟩, where ⟨q_{i−1}, ai, qi⟩ ∈ ∆ for i = 1, ..., k.

The integer k is called the length of c. The word w = a1·...·ak is called the label of c. We may denote the path c as

c = q0 →^{a1} q1 ... →^{ak} qk.

The null path of q ∈ Q is 0_q, beginning and ending in q with label ε.

Definition 1.2.3 The generalized transition relation ∆* is defined as the smallest subset of Q × Σ* × Q with the following closure properties:

• For all q ∈ Q we have ⟨q, ε, q⟩ ∈ ∆*.

• For all q1, q2, q3 ∈ Q and w ∈ Σ*, a ∈ Σε: if ⟨q1, w, q2⟩ ∈ ∆* and ⟨q2, a, q3⟩ ∈ ∆, then also ⟨q1, w·a, q3⟩ ∈ ∆*.

Clearly, the set ∆* contains exactly the triples of the beginning, label and ending of the paths in A. For q ∈ Q we write L_A(q) := {w ∈ Σ* | ∃q1 ∈ F : ⟨q, w, q1⟩ ∈ ∆*} for the language of all words that are labels of paths leading from q to a final state. The language accepted by A is L(A) := L_A(I) = ⋃_{q∈I} L_A(q). Given A as above, the set of active states for input w ∈ Σ* is {q ∈ Q | ∃q0 ∈ I : ⟨q0, w, q⟩ ∈ ∆*}.

The FSA A1 and A2 are equivalent if L(A1) = L(A2).
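The set of active states can be computed symbol by symbol, which gives a direct acceptance test. The Python sketch below is one possible encoding, not the notation of the text: transitions are triples `(q, a, p)`, the empty string stands for ε, and the example automaton is hypothetical.

```python
EPS = ""  # assumption of this sketch: the empty string labels ε-transitions

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in delta:
            if p == q and a == EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def active_states(word, initial, delta):
    """States reachable from an initial state by a path labelled `word`."""
    current = eps_closure(initial, delta)
    for symbol in word:
        step = {r for (p, a, r) in delta if p in current and a == symbol}
        current = eps_closure(step, delta)
    return current

def accepts(word, initial, final, delta):
    """A word is accepted iff some active state is final."""
    return bool(active_states(word, initial, delta) & set(final))

# Hypothetical example: an automaton for the language a*b.
initial, final = {0}, {2}
delta = {(0, "a", 0), (0, EPS, 1), (1, "b", 2)}
print(accepts("aab", initial, final, delta))  # True
```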

1.2.2 ε-closure of finite-state automata

Definition 1.2.4 The ε-closure Cε : Q → 2^Q is defined as Cε(q) = {q′ ∈ Q | ⟨q, ε, q′⟩ ∈ ∆*}.

Proposition 1.2.5 For any FSA A = ⟨Σ, Q, I, F, ∆⟩ there exists an equivalent ε-free FSA.

Proof. Let us consider the FSA A′ = ⟨Σ, Q, I, F ∪ {q ∈ I | Cε(q) ∩ F ≠ ∅}, ∆′⟩, where

∆′ = {⟨q1, a, q2⟩ | q1, q2 ∈ Q, a ∈ Σ, ∃q′ ∈ Cε(q1) : ⟨q′, a, q2⟩ ∈ ∆*}.

The FSA A′ is ε-free and L(A′) = L(A).
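The construction in the proof can be sketched in Python as follows. The sketch reads the transition set of the ε-free automaton as all triples ⟨q1, a, q2⟩ for which there is a path from q1 to q2 labelled by the single symbol a, with ε-moves allowed before and after it. The encoding (triples, empty string for ε) and the example automaton are assumptions of this sketch.

```python
EPS = ""  # assumption of this sketch: the empty string labels ε-transitions

def eps_closure(states, delta):
    """States reachable from `states` by ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in delta:
            if p == q and a == EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def remove_epsilon(states, initial, final, delta):
    """Build an ε-free automaton following the proof of Proposition 1.2.5."""
    new_delta = set()
    for q1 in states:
        for (p, a, r) in delta:
            # (q1, a, q2) is added iff a path q1 -ε*-> p -a-> r -ε*-> q2 exists.
            if a != EPS and p in eps_closure({q1}, delta):
                new_delta |= {(q1, a, q2) for q2 in eps_closure({r}, delta)}
    # Initial states that reach a final state by ε-moves become final.
    new_final = set(final) | {q for q in initial
                              if eps_closure({q}, delta) & set(final)}
    return set(states), set(initial), new_final, new_delta

def accepts_eps_free(word, initial, final, delta):
    current = set(initial)
    for symbol in word:
        current = {r for (p, a, r) in delta if p in current and a == symbol}
    return bool(current & set(final))

# Hypothetical example: an automaton for a* that relies on ε-transitions.
states, initial, final = {0, 1, 2}, {0}, {2}
delta = {(0, EPS, 1), (1, "a", 1), (1, EPS, 2)}
_, new_initial, new_final, new_delta = remove_epsilon(states, initial, final, delta)
print(sorted(new_final))  # [0, 2]
```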

1.2.3 Closure properties of finite-state automata and cartesian product construction

Proposition 1.2.6

1. For A = ⟨Σ, ∅, ∅, ∅, ∅⟩ we have L(A) = ∅.

2. For Aε = ⟨Σ, {q}, {q}, {q}, ∅⟩ we have L(Aε) = {ε}.

3. Let a ∈ Σ. For the FSA Aa = ⟨Σ, {q0, q1}, {q0}, {q1}, {⟨q0, a, q1⟩}⟩ we have L(Aa) = {a}.

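Proposition 1.2.7 Let A1 = ⟨Σ, Q1, I1, F1, ∆1⟩ and A2 = ⟨Σ, Q2, I2, F2, ∆2⟩ be two FSA with Q1 ∩ Q2 = ∅. Then the following holds: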

1. For the FSA A = ⟨Σ, Q1 ∪ Q2, I1 ∪ I2, F1 ∪ F2, ∆1 ∪ ∆2⟩ we have L(A) = L(A1) ∪ L(A2).

2. For the FSA A = ⟨Σ, Q1 ∪ Q2, I1, F2, ∆1 ∪ ∆2 ∪ {⟨q1, ε, q2⟩ | q1 ∈ F1 & q2 ∈ I2}⟩ we have L(A) = L(A1)·L(A2).

3. Let q0 be a new state. For the FSA A = ⟨Σ, Q1 ∪ {q0}, {q0}, F1 ∪ {q0}, ∆1 ∪ {⟨q0, ε, q1⟩ | q1 ∈ I1} ∪ {⟨q2, ε, q0⟩ | q2 ∈ F1}⟩ we have L(A) = L(A1)*.

4. For the FSA A = ⟨Σ, Q1, F1, I1, ∆′⟩, where ∆′ = {⟨q2, a, q1⟩ | ⟨q1, a, q2⟩ ∈ ∆1}, we have L(A) = ρ(L(A1)).

5. If A1 and A2 are ε-free FSA, then for the FSA A = ⟨Σ, Q1 × Q2, I1 × I2, F1 × F2, ∆′⟩, where ∆′ := {⟨⟨q1, q2⟩, a, ⟨r1, r2⟩⟩ | ⟨q1, a, r1⟩ ∈ ∆1 & ⟨q2, a, r2⟩ ∈ ∆2}, we have L(A) = L(A1) ∩ L(A2).
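The cartesian product construction for intersection can be sketched in a few lines of Python. In this sketch an ε-free FSA is a triple `(initial, final, delta)` of sets, with transitions as triples `(q, a, p)`; this encoding and the two example automata are assumptions made for illustration.

```python
def intersect(a1, a2):
    """Product automaton of two ε-free FSA given as (initial, final, delta)."""
    i1, f1, d1 = a1
    i2, f2, d2 = a2
    initial = {(p, q) for p in i1 for q in i2}
    final = {(p, q) for p in f1 for q in f2}
    # Both components must move on the same symbol.
    delta = {((p1, p2), a, (r1, r2))
             for (p1, a, r1) in d1 for (p2, b, r2) in d2 if a == b}
    return initial, final, delta

def accepts(word, automaton):
    initial, final, delta = automaton
    current = set(initial)
    for symbol in word:
        current = {r for (p, a, r) in delta if p in current and a == symbol}
    return bool(current & final)

# Hypothetical examples over {a, b}:
# even_a accepts words with an even number of a's, ends_b words ending in b.
even_a = ({"e"}, {"e"},
          {("e", "a", "o"), ("o", "a", "e"), ("e", "b", "e"), ("o", "b", "o")})
ends_b = ({0}, {1},
          {(0, "a", 0), (0, "b", 1), (1, "a", 0), (1, "b", 1)})

both = intersect(even_a, ends_b)
print(accepts("aab", both))  # True
```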

1.2.4 Determinization of finite-state automata

A finite state automaton A is deterministic iff the transition relation is a function δ : Q × Σ → Q and |I| = 1. Let A = ⟨Σ, Q, q0, F, δ⟩ be a deterministic FSA, and let δ* : Q × Σ* → Q denote the generalized transition function, which is defined as in the nondeterministic case.

The following theorem gives us an effective procedure for determinization of FSA.

Theorem 1.2.8 For any FSA A = ⟨Σ, Q, I, F, ∆⟩ there exists a deterministic FSA A_D such that L(A) = L(A_D).

Proof. Let A_D := ⟨Σ, 2^Q, I, F_D, δ⟩, where F_D := {S ⊆ Q | Cε(S) ∩ F ≠ ∅} and δ(S, a) := {q ∈ Q | ∃q1 ∈ S : ⟨q1, a, q⟩ ∈ ∆*} for S ⊆ Q, a ∈ Σ. Here Cε(S) := ⋃_{q∈S} Cε(q).

Clearly, the function δ in the above proof is a total function.

Proposition 1.2.9 For any deterministic FSA A = ⟨Σ, Q, q0, F, δ⟩ where δ is a total function the following holds: for the FSA A′ = ⟨Σ, Q, q0, Q \ F, δ⟩ we have L(A′) = Σ* \ L(A).
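A minimal Python sketch of the subset construction and of complementation follows. It is a variant of the construction in the proof, not a literal transcription: it builds only the subsets reachable from the start subset and keeps every subset ε-closed, which yields the same language. The encoding (triples for transitions, empty string for ε) and the example automaton are assumptions.

```python
EPS = ""  # assumption of this sketch: the empty string labels ε-transitions

def eps_closure(states, delta):
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in delta:
            if p == q and a == EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def determinize(alphabet, initial, final, delta):
    """Subset construction restricted to reachable, ε-closed subsets."""
    start = frozenset(eps_closure(initial, delta))
    dfa_delta, todo, seen = {}, [start], {start}
    while todo:
        s = todo.pop()
        for a in alphabet:
            step = {r for (p, b, r) in delta if p in s and b == a}
            t = frozenset(eps_closure(step, delta))
            dfa_delta[(s, a)] = t          # total: the empty set is a sink
            if t not in seen:
                seen.add(t)
                todo.append(t)
    dfa_final = {s for s in seen if s & set(final)}
    return seen, start, dfa_final, dfa_delta

def complement(dfa):
    """Swap final and non-final states of a total deterministic FSA."""
    states, start, final, delta = dfa
    return states, start, states - final, delta

def run(word, dfa):
    _, q, final, delta = dfa
    for a in word:
        q = delta[(q, a)]
    return q in final

# Hypothetical example: nondeterministic FSA for words over {a, b} ending in "ab".
nfa_delta = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
dfa = determinize("ab", {0}, {2}, nfa_delta)
print(len(dfa[0]), run("bab", dfa), run("bab", complement(dfa)))  # 3 True False
```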

1.2.5 Regular languages and regular expressions

A language over Σ is called an automaton language iff it is recognized by some FSA. The class of automaton languages contains all finite languages and is closed under union, concatenation, iteration, intersection, reversal and complementation.

Definition 1.2.10 Let Σ be a finite alphabet. We define a regular language over Σ by induction:

1. ∅ is a regular language;

2. {ε} is a regular language;

3. if σ ∈ Σ, then {σ} is a regular language;

4. if L1, L2 ⊆ Σ* are regular languages then

• L1 ∪ L2 is a regular language (union),

• L1 · L2 is a regular language (concatenation),

• L1* is a regular language (Kleene closure).

In a similar way we define regular expressions.

Definition 1.2.11 Let a finite alphabet Σ be given. A regular expression over Σ is a word over Σ ∪ {(, ), *, +, ·, ∅, ε}, defined by induction:

1. ∅ is a regular expression;

2. ε is a regular expression;

3. if σ ∈ Σ, then σ is a regular expression;

4. if E1 and E2 are regular expressions over Σ, then

• (E1 + E2) is a regular expression,

• (E1 · E2) is a regular expression,

• (E1)* is a regular expression.

Each regular expression naturally corresponds to a regular language and vice versa, by the obvious correspondence between the definitions.

The following theorem presents a deeper result on the correspondence between automaton languages and regular languages.

1.2.6 Equivalence between regular languages and automaton languages

Theorem 1.2.12 (Kleene) A language is regular if and only if it is an automaton language.

Proof. (⟹) This direction follows directly from the closure properties given in Propositions 1.2.6 and 1.2.7.

(⟸) Let A = ⟨Σ, Q, q1, F, δ⟩ be a deterministic FSA. Let Q = {q1, ..., qn}, and for 0 ≤ k ≤ n let Qk := {q1, ..., qk}. We define

R^k_{ij} = {v ∈ Σ* | v = a1...am, δ*(qi, v) = qj & ∀l ∈ {1, ..., m−1} : δ*(qi, a1...al) ∈ Qk}.

Note that the final condition ∀l ∈ {1, ..., m−1} : δ*(qi, a1...al) ∈ {q1, ..., qk} is vacuous in the situation where m ≤ 1. Clearly

R^0_{ij} = {a ∈ Σ | δ(qi, a) = qj}, if i ≠ j,
R^0_{ij} = {ε} ∪ {a ∈ Σ | δ(qi, a) = qj}, if i = j,

and

R^k_{ij} = R^{k−1}_{ij} ∪ (R^{k−1}_{ik} · (R^{k−1}_{kk})* · R^{k−1}_{kj}).

From the above presentation it follows by induction on k that for any i, j, k ∈ {1, ..., n} the languages R^k_{ij} are regular. We have that

L(A) = ⋃_{j | qj ∈ F} R^n_{1j}.

Hence L(A) is a regular language.
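The recurrence for the languages R^k_{ij} can be checked mechanically on a small automaton. The Python sketch below is an illustration under stated assumptions rather than a regular-expression generator: it represents each language by its finite set of words of length at most `bound`, numbers the states 1, ..., n, and uses a hypothetical two-state automaton accepting the words with an even number of a's.

```python
from itertools import product

def concat(l1, l2, bound):
    return {u + v for u in l1 for v in l2 if len(u) + len(v) <= bound}

def star(l, bound):
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, l, bound) - result
        result |= frontier
    return result

def kleene_languages(n, alphabet, delta, bound):
    """Return r with r[(i, j)] = words of length <= bound in R^n_{ij}.
    States are 1..n; delta maps (state, symbol) to a state (total function)."""
    # k = 0: single transitions, plus ε on the diagonal.
    r = {(i, j): {a for a in alphabet if delta[(i, a)] == j}
                 | ({""} if i == j else set())
         for i in range(1, n + 1) for j in range(1, n + 1)}
    for k in range(1, n + 1):
        loop = star(r[(k, k)], bound)
        # R^k_ij = R^(k-1)_ij ∪ R^(k-1)_ik · (R^(k-1)_kk)* · R^(k-1)_kj
        r = {(i, j): r[(i, j)]
                     | concat(concat(r[(i, k)], loop, bound), r[(k, j)], bound)
             for i in range(1, n + 1) for j in range(1, n + 1)}
    return r

# Hypothetical automaton: state 1 = even number of a's (initial and final).
alphabet = "ab"
delta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
final = {1}
BOUND = 4
r = kleene_languages(2, alphabet, delta, BOUND)
language = set().union(*(r[(1, j)] for j in final))

def dfa_accepts(word):
    q = 1
    for a in word:
        q = delta[(q, a)]
    return q in final

expected = {"".join(w) for m in range(BOUND + 1)
            for w in product(alphabet, repeat=m) if dfa_accepts("".join(w))}
print(language == expected)  # True
```

Truncating at a length bound loses nothing for short words, since every word of length at most `bound` in a product or iteration is built from factors of length at most `bound`.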

It is well-known that for any regular language L there exists a deterministic FSA A_L such that L(A_L) = L and A_L is minimal (w.r.t. number of states) among all deterministic FSA accepting L. A_L is unique up to renaming of states.

1.2.7 Minimal finite-state automata and the Myhill-Nerode equivalence relation

Let a finite alphabet Σ be given.

Definition 1.2.13 An equivalence relation R ⊆ Σ* × Σ* is called right invariant if

∀u, v ∈ Σ* : u R v ⇒ (∀w ∈ Σ* : u·w R v·w).

Proposition 1.2.14 Let A = ⟨Σ, Q, q0, F, δ⟩ be a deterministic FSA. Then the relation

R_A = {⟨u, v⟩ ∈ Σ* × Σ* | δ*(q0, u) = δ*(q0, v)}

is a right invariant equivalence relation and |{[s]_{R_A} | s ∈ Σ*}| ≤ |Q|.

Proposition 1.2.15 Let L ⊆ Σ* be a language over Σ. Then the relation

R_L = {⟨u, v⟩ ∈ Σ* × Σ* | ∀w ∈ Σ* : u·w ∈ L ↔ v·w ∈ L}

is a right invariant equivalence relation.

Proposition 1.2.16 Let L ⊆ Σ* be a language over Σ such that the index of R_L is finite. Then for the deterministic FSA

A_L = ⟨Σ, {[s]_{R_L} | s ∈ Σ*}, [ε]_{R_L}, {[s]_{R_L} | s ∈ L}, {⟨[u]_{R_L}, a, [u·a]_{R_L}⟩ | u ∈ Σ*, a ∈ Σ}⟩

we have that L(A_L) = L.

Proposition 1.2.17 Let A = ⟨Σ, Q, q0, F, δ⟩ be a deterministic FSA. Then we have that

R_A ⊆ R_{L(A)}.

Theorem 1.2.18 For any deterministic FSA there exists a unique (up to state renaming) equivalent deterministic FSA, which is minimal with respect to the number of states.
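One standard way to compute the minimal automaton in practice is partition refinement in the style of Moore: states are merged when no word distinguishes them, so the resulting states correspond to the classes of R_L for L = L(A). The text does not prescribe this procedure; the Python sketch below, including its encoding of a total deterministic FSA as a dictionary and the example automaton, is an assumption made for illustration.

```python
from itertools import product

def minimize(alphabet, start, final, delta):
    """Moore-style partition refinement on a total deterministic FSA."""
    symbols = sorted(alphabet)
    # Keep only states reachable from the start state.
    reachable, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for a in symbols:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                todo.append(r)
    # Start from the partition {final, non-final} and refine until stable.
    block = {q: int(q in final) for q in reachable}
    while True:
        ids = {}
        refined = {q: ids.setdefault(
                       (block[q],) + tuple(block[delta[(q, a)]] for a in symbols),
                       len(ids))
                   for q in reachable}
        stable = len(ids) == len(set(block.values()))
        block = refined
        if stable:
            break
    min_delta = {(block[q], a): block[delta[(q, a)]]
                 for q in reachable for a in symbols}
    min_final = {block[q] for q in reachable if q in final}
    return set(block.values()), block[start], min_final, min_delta

def run(word, start, final, delta):
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in final

# Hypothetical automaton with redundant states: even number of a's over {a, b}.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 3,
         (2, "a"): 3, (2, "b"): 0, (3, "a"): 0, (3, "b"): 1}
final = {0, 2}
min_states, min_start, min_final, min_delta = minimize("ab", 0, final, delta)
print(len(min_states))  # 2
```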
