Implementing finite state machines and learning Prolog along the way
Detmar Meurers: Intro to Computational Linguistics I OSU, LING 684.01
Overview
•A first introduction to Prolog
•Encoding finite state machines in Prolog
•Recognition and generation with finite state machines in Prolog
•Completing the FSM recognition and generation algorithms to use
• ǫtransitions
• abbreviations
•Encoding finite state transducers in Prolog
2
The Prolog programming language (1)
PROgrammation LOGique was invented by Alain Colmerauer and colleagues at Marseille and Edinburgh in the early 70s. A Prolog program is written in a subset of first order predicate logic. There are
•constants naming entities
– syntax: starting with lower-case letter (or number or single quoted) – examples:twelve, a, q 1, 14, ’John’
•variables over entities
– syntax: starting with upper-case letter (or an underscore) – examples:A, This, twelve,
•predicate symbols naming relations among entities
– syntax: predicate name starting with a lower-case letter with parentheses around comma-separated arguments
– examples:father(tom,mary),age(X,15)
3
The Prolog programming language (2)
A Prolog program consists of a set of Horn clauses:
•unit clauses or facts
– syntax: predicate followed by a dot – example:father(tom,mary).
•non-unit clauses or rules
– syntax:rel0 :- rel1, ..., reln. – example:grandfather(Old,Young) :-
father(Old,Middle), father(Middle,Young).
4
The Prolog programming language (3)
•No global variables: Variables only have scope over a single clause.
•No explicit typing of variables or of the arguments of predicates.
•Negation by failure: For\+(P)Prolog attempts to prove P, and if this succeeds, it fails.
5
A first Prolog program
grandfather.pl father(adam,ben).
father(ben,claire).
father(ben,chris).
grandfather(Old,Young) :- father(Old,Middle), father(Middle,Young).
Query:
?- grandfather(adam,X).
X = claire ? ; X = chris ? ; no
6
Recursive relations in Prolog
Compound terms as data structures
To define recursive relations, one needs a richer data structure than the constants (atoms) introduced so far: compound terms.
A compound term comprises a functor and a sequence of one or more terms, the argument.1Compound terms are standardly written in prefix notation.2
Example:
– binary tree:bin tree(mother, l-dtr, r-dtr) – example:bin tree(s, np, bin tree(vp,v,n))
1An atom can be thought of as a functor with arity 0.
2Infix and postfix operators can also be defined, but need to be declared.
7
Recursive relations in Prolog
Lists as special compound terms
•empty list: represented by the atom ”[]”
•non-empty list: compound term with ”.” as binary functor – first argument: first element of list (“head ”)
– second argument: rest of list (“tail”) Example:.(a, .(b, .(c, .(d,[]))))
8
Abbreviating notations for lists
•bracket notation:[ element1 | restlist ] Example:[a | [b | [c | [d | []]]]]
•element separator:[ element1 , element2 ]
=[ element1 | [ element2 | []]]
Example:[a, b, c, d]
9
An example for the four notations
[a,b,c,d] = .(a, .(b, .(c, .(d,[]))))
= [a | [b | [c | [d | []]]]]
= a
b c
d []
. . . .
10
Recursive relations in Prolog
Example relations I: append
•Idea: a relation concatenating two lists
•Example:?- append([a,b,c],[d,e],X).⇒X=[a,b,c,d,e]
append([],L,L).
append([H|T],L,[H|R]) :- append(T,L,R).
11
Recursive relations in Prolog
Example relations IIa: (naive) reverse
•Idea: reverse a list
•Example:?- reverse([a,b,c],X).⇒X=[c,b,a]
naive_reverse([],[]).
naive_reverse([H|T],Result) :- naive_reverse(T,Aux), append(Aux,[H],Result).
12
Recursive relations in Prolog
Example relations IIb: reverse
reverse(A,B) :-
reverse_aux(A,[],B).
reverse_aux([],L,L).
reverse_aux([H|T],L,Result) :- reverse_aux(T,[H|L],Result).
13
Some practical matters
•To start Prolog on the Linguistics Department Unix machines:
• SWI-Prolog:pl(on Mac OSX:swipl)
• SICStus:prologorM-x run-prologin XEmacs
•At the Prolog prompt (?-):
• Trace the next command:trace.
• Exit Prolog:halt.
• Consult a file in Prolog:[filename].3
•The manuals are accessible from the course web page.
3The.plsuffix is added automatically, but use single quotes if name starts with a capital letter or contains special characters such as ”.” or ”–”. For example[’MyGrammar’].or[’˜/file-1’].
14
Encoding finite state automata in Prolog
What needs to be represented?
A finite state automaton is a quintuple(Q,Σ, E, S, F)with
•Qa finite set of states
•Σa finite set of symbols, the alphabet
•S⊆Qthe set of start states
•F⊆Qthe set of final states
•Ea set of edgesQ×(Σ∪ {ǫ})×Q
15
Prolog representation of a finite state automaton
The FSA is represented by the following kind of Prolog facts:
•initial nodes:initial(nodename).
•final nodes:final(nodename).
•edges:arc(from-node, label, to-node).
16
A simple example
FSTN representation of FSM:
0
1 2
3 4
5 6 c
r u r o
l o
Prolog encoding of FSM:
initial(0).
final(1).
arc(0,c,6). arc(6,o,5). arc(5,l,4). arc(4,o,2).
arc(2,r,1). arc(2,u,3). arc(3,r,1).
17
An example with two final states
FSTN representation of FSM:
0 1
2 3 c
a d
b
Prolog encoding of FSM:
initial(0).
final(1). final(2).
arc(0,c,1). arc(1,d,1). arc(0,a,3). arc(3,b,2).
18
Recognition with FSMs in Prolog
fstn traversal basic.pl
test(Words) :-
initial(Node),
recognize(Node,Words).
recognize(Node,[]) :- final(Node).
recognize(FromNode,String) :- arc(FromNode,Label,ToNode), traverse(Label,String,NewString), recognize(ToNode,NewString).
traverse(First,[First|Rest],Rest).
19
Generation with FSMs in Prolog
generate :- test(X), write(X), nl, fail.
20
Encoding finite state transducers in Prolog What needs to be represented?
A finite state transducer is a 6-tuple(Q,Σ1,Σ2, E, S, F)with
•Qa finite set of states
•Σ1a finite set of symbols, the input alphabet
•Σ2a finite set of symbols, the output alphabet
•S⊆Qthe set of start states
•F⊆Qthe set of final states
•Ea set of edgesQ×(Σ1∪ {ǫ})×Q×(Σ2∪ {ǫ})
21
Prolog representation of a transducer
The only change compared to automata, is an additional argument in the representation of the arcs:
arc(from-node, label-in, to-node, label-out).
Example:
initial(1).
final(5).
arc(1,2,where,ou).
arc(2,3,is,est).
arc(3,4,the,la).
arc(4,5,exit,sortie).
arc(4,5,shop,boutique).
arc(4,5,toilet,toilette).
arc(3,6,the,le).
arc(6,5,policeman,gendarme).
22
Processing with a finite state transducer
test(Input,Output) :- initial(Node),
transduce(Node,Input,Output), write(Output),nl.
transduce(Node,[],[]) :- final(Node).
transduce(Node1,String1,String2) :- arc(Node1,Node2,Label1,Label2),
traverse2(Label1,Label2,String1,NewString1, String2,NewString2), transduce(Node2,NewString1,NewString2).
traverse2(Word1,Word2,[Word1|RestString1],RestString1, [Word2|RestString2],RestString2).
23
FSMs with ǫ transitions and abbreviations
Defining Prolog representations
1. Decide on a symbol to use to markǫtransitions:’#’
2. Define abbreviations for labels:
macro(Label,Word).
3. Define a relationspecial/1to recognize abbreviations and epsilon transitions:
special(’#’).
special(X) :- macro(X,_).
24
FSMs with ǫ transitions and abbreviations
Extending the recognition algorithm test(Words) :-
initial(Node),
recognize(Node,Words).
recognize(Node,[]) :- final(Node).
recognize(FromNode,String) :- arc(FromNode,Label,ToNode), traverse(Label,String,NewString), recognize(ToNode,NewString).
25
traverse(Label,[Label|RestString],RestString) :-
\+ special(Label).
traverse(Abbrev,[Label|RestString],RestString) :- macro(Abbrev,Label).
traverse(’#’,String,String).
special(’#’).
special(X) :- macro(X,_).
26
A tiny English fragment as an example
(fsa/ex simple engl.pl) initial(1).
final(9).
arc(1,np,3).
arc(1,det,2).
arc(2,n,3).
arc(3,pv,4).
arc(4,adv,5).
arc(4,’#’,5).
arc(5,det,6).
arc(5,det,7).
arc(5,’#’,8).
arc(6,adj,7).
arc(6,mod,6).
arc(7,n,9).
arc(8,adj,9).
arc(8,mod,8).
arc(9,cnj,4).
arc(9,cnj,1).
macro(np,kim).
macro(np,sandy).
macro(np,lee).
macro(det,a).
macro(det,the).
macro(det,her).
macro(n,consumer).
macro(n,man).
macro(n,woman).
macro(pv,is).
macro(pv,was).
macro(cnj,and).
macro(cnj,or).
macro(adj,happy).
macro(adj,stupid).
macro(mod,very).
macro(adv,often).
macro(adv,always).
macro(adv,sometimes).
27
Reading assignment
•Pages 1–26 of Fernando Pereira and Stuart Shieber (1987): Prolog and Natural-Language Analysis. Stanford: CSLI.
28