A three-valued logic for Inductive Logic Programming


LS-8 Report 4

Siegfried Bell and Steffo Weber

Informatik VIII
University of Dortmund
D-44221 Dortmund
email: bell@ls8.informatik.uni-dortmund.de

Dortmund, July 12, 1993

¹ This report is an extended version of the presentation "On the close logical relationship between FOIL and the frameworks of Helft and Plotkin" given at the 3rd International Workshop on Inductive Logic Programming.

Abstract

Inductive Logic Programming (ILP) is closely related to Logic Programming (LP) by its name. We extract the basic differences of ILP and LP by comparing both, and give definitions of the basic assumptions of their paradigms, e.g. the closed world assumption, the open domain assumption and the open world assumption used in ILP.

We then define a three-valued semantics of ILP and point out relations between our semantics and the frameworks of Plotkin [Plotkin, 1971] and of Helft [Helft, 1989]. Finally, we show how FOIL [Quinlan, 1990] fits into our work, and we compare our semantics with other three-valued logics.

Contents

1 Introduction
2 Logic Programming and Inductive Logic Programming
3 A logical framework
   3.1 Assumptions in ILP
   3.2 A partial Herbrand interpretation
   3.3 A semantics of a program P
   3.4 Valid hypotheses
   3.5 Summary
4 Relations
   4.1 Helft: induction as nonmonotonic inference
   4.2 FOIL in a logical framework
   4.3 Plotkin's inductive task
   4.4 Comparison with other three-valued logics
5 Conclusions and further works


1 Introduction

The relationship of logic, control and algorithms can be expressed symbolically by the equation: Algorithm = Logic + Control [Kowalski, 1979]. Kowalski used this equation to define that logic programs express the logical component of algorithms, and logic programming (LP) is the corresponding research area [Apt and van Emden, 1982]. Further, Kowalski pointed out that LP is closely related to the closed world assumption (CWA), in contrast to the open world assumption (OWA) of classical logic [Kowalski, 1989].

Inductive Logic Programming is a new area of research, introduced and described by S. Muggleton [Muggleton, 1990] as the intersection of logic programming and machine learning. We give some understandings of this description, because there is no unique view. First, ILP can be understood as program synthesis (P. Brazdil); second, as an inductive method for programmers (K. Morik) to develop and verify programs or to represent knowledge; and third, our understanding: we emphasize the words logic programming and understand ILP as the intersection of the research areas of inductive logic and logic programming. In the following we will describe ILP from this point of view.

In Inductive Logic Programming there are two main frameworks, proposed by [Plotkin, 1971] and [Helft, 1989]. These frameworks differ in their underlying assumptions: what they consider false, and what they consider to be still unknown. We characterize the first one by the OWA. Examples of this framework are CLINT [De Raedt and Bruynooghe, 1989], GOLEM [Muggleton and Feng, 1992] and RDT [Kietz and Wrobel, 1991]. FOIL [Quinlan, 1990] can be used in two modes. First, it can be used regarding the OWA. Second, FOIL can be used regarding the CWA; with respect to this mode it is an example of Helft's framework. But neither Helft nor Quinlan give formal definitions of their CWA or OWA.

In the following we argue that the CWA in ILP cannot be the same as the CWA in logic programming. Therefore, we need a new definition of the CWA in ILP (CWA_ILP). Also, we define what we mean by an open domain assumption (ODA) and the OWA. We then formalize these settings with a model-based three-valued interpretation of formulas in ILP, taking into consideration the well-known field of logic programming. Finally we show that this framework incorporates Plotkin's settings of inductive inference and that it is a logical description of FOIL, and we compare our logic with other three-valued logics.

2 Logic Programming and Inductive Logic Programming

In this section we compare the semantics of logic programming (LP) and Inductive Logic Programming (ILP) to work out the differences between these two paradigms.

Table 1 contains logic program 1, which should represent four elements and stuff. But what do we mean if we write such a logic program? K. Apt [Apt, 1990] argued that in logic programming we mean that water is an element, ..., and mud is not an element, water is no stuff, ... This meaning is also listed in Table 1.

Shepherdson [Shepherdson, 1987] claimed that we want to deduce from the fact son(bill, joe, jane) and the parent relation given by logic program 2 in Table 1 the negative information ¬son(bill, harry, maude) and further ¬parent(harry, maude, bill).


Logic Program 1:
    element(fire). element(air). element(water). element(earth). stuff(mud).
Meaning:
    element(fire). element(air). element(water). element(earth). stuff(mud).
    ¬element(mud). ¬stuff(water). ¬stuff(earth). ¬stuff(air). ¬stuff(fire).

Logic Program 2:
    parents(Father, Mother, Child) :- son(Child, Father, Mother).
    parents(Father, Mother, Child) :- daughter(Child, Father, Mother).
    son(bill, joe, jane). daughter(sue, harold, maude).
Meaning A:
    parents(Father, Mother, Child) ← son(Child, Father, Mother).
    parents(Father, Mother, Child) ← daughter(Child, Father, Mother).
Meaning B:
    parents(Father, Mother, Child) ↔
        son(Child, Father, Mother) ∨ daughter(Child, Father, Mother).
    ¬son(bill, harry, maude).

Table 1: Logic Programming

This means we should interpret logic program 2 as given by Meaning B. Another reading is the interpretation of rules as premise, material implication and conclusion, Meaning A. This reading would not permit us to deduce the negative information, because from a false premise we can conclude everything. So, with logic program 2 we mean that the parent relation is defined by the son and the daughter predicate, or that the parent relation is equivalent to the son or daughter relation.

Kowalski [Kowalski, 1989] related these interpretations of logic programs to the closed world assumption, in contrast to the open world assumption. Rules of logic programs should be interpreted as if-then-and-only-if definitions regarding the CWA. The if-then-and-only-if reading reflects our Meaning B in Table 1. He defined a static and a dynamic management of a data base depending upon the form of definitions: whether data (facts or rules) are defined by means of complete if-then-and-only-if definitions or only by means of the if-halves, whether the only-if half of an if-then-and-only-if definition is stated explicitly or assumed implicitly, and whether the only-if assumption is understood as a statement of the object language or as a statement of the meta-language.

The meaning of logic programs was formally captured by the closed world assumption [Reiter, 1978]. Because this solution has some computational problems, Clark proposed the Negation As Failure rule [Clark, 1978], whereas Apt and van Emden described semantics based on least fixpoints [Apt and van Emden, 1982]. This corresponds to a semantics in which all that is given is true and all that is not given is false.

We say:
    Background knowledge       p(c), p(a), p(b)       incomplete set
    and positive examples:     p(a) → q(a)            of formulas
                               p(b) → q(b)
    Negative examples:         q(d)
We mean:
    Hypothesis:                p(X) → q(X)
    Prediction:                q(c)

Table 2: Inductive Logic Programming

Shepherdson summarizes this in the helpful motto: we mean what we say and nothing more.

Definition 1 (Closed World Assumption, CWA_LP) Let P be a logic program. Then define

    CWA_LP(P) := P ∪ { ¬A : P ⊮ A }

A less restrictive interpretation is used in ILP, in which we mean more than what we say (regarding Shepherdson), or in which we have a dynamic representation (regarding Kowalski).
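As a small illustration (not part of the original report), the following Python sketch computes CWA_LP for a program that consists only of ground facts, such as Logic Program 1, so that entailment reduces to membership; the helper name cwa_lp and the tuple encoding of atoms are our own assumptions.

```python
# Minimal sketch of CWA_LP for a ground, rule-free program:
# every Herbrand-base atom that the program does not entail is negated.

def cwa_lp(facts, herbrand_base):
    """Return the program together with the negations of all atoms it does not entail."""
    return set(facts) | {("not", a) for a in herbrand_base if a not in facts}

# Logic Program 1: four elements and one kind of stuff.
facts = {("element", "fire"), ("element", "air"), ("element", "water"),
         ("element", "earth"), ("stuff", "mud")}

# Herbrand base over the minimal signature: both predicates applied to all constants.
constants = {c for (_, c) in facts}
base = {(p, c) for p in ("element", "stuff") for c in constants}

for literal in sorted(cwa_lp(facts, base), key=str):
    print(literal)   # e.g. ('not', ('element', 'mud')), ('not', ('stuff', 'water')), ...
```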

Following the framework of [Plotkin, 1971], the inductive learning task in ILP can be described as using a background theory B, a set of positive examples E+ and a set of negative examples E−. The desired output is a hypothesis which describes the positive facts w.r.t. the background knowledge and no negative facts. Plotkin has proven that there is no hypothesis which covers exactly the positive examples and no other fact. In general, we can deduce from a hypothesis facts which are not known. Let us call this set of facts the prediction.

Or, in another sense, in learning we complete an incomplete set of data with the prediction. Table 2 shows such an incomplete set of formulas from which we want to infer a complete set of formulas. Again we distinguish between what we say and what we mean. In a semantical view, we want to infer an interpretation in which the prediction is satisfied. We do this by adding the hypothesis p(X) → q(X), which describes our intended interpretation and satisfies q(c). So, we see the learning task as inferring a correct interpretation with respect to an incomplete set of formulas.

Many learning systems make special assumptions in order to infer the correct interpretation. For example, FOIL [Quinlan, 1990] uses two assumptions, CWA and ODA. Let us recall how FOIL can be used:

1. A set of positive examples E+ is presented to the system.

2. The set E− is generated according to the closed world assumption (CWA) with respect to the minimal signature of E+.

3. FOIL learns hypotheses which describe the positive examples and do not describe the negative examples.

Positive examples:       Relation member
                             e, [e,o,[x]]
                             o, [e,o,[x]]
                             [x], [e,o,[x]]
                             o, [o,[x]]
                             [x], [o,[x]]
                             [x], [[x]]
                             x, [x]

Background knowledge:    Relation null
                             []
                         Relation *components
                             [e,o,[x]], e, [o,[x]]
                             [o,[x]], o, [[x]]
                             [[x]], [x], []
                             [x], x, []

Hypotheses:              member(A,B) :- components(B,C,D), =(A,C)
                         member(A,B) :- components(B,C,D), member(A,D)

Prediction:              member(1,[1])

Table 3: FOIL learns member

4. The prediction is not affected by the CWA, according to the open domain assumption (ODA).

Table 3 shows positive examples of the predicate member and background knowledge in the form of the predicates null and components. FOIL learns two hypotheses which describe the positive examples and do not describe the negative examples. The negative examples are like null([x]) and are generated by the CWA. The prediction is, for example, member(1,[1]), which is not affected by the CWA, although it is not mentioned as a positive example.
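The following sketch is our illustration, not FOIL's actual implementation: it generates negative examples from positive tuples by the CWA restricted to the minimal signature, so that every tuple over the constants occurring in E+ that is not a positive example is taken to be negative, while tuples over new constants (such as 1) are left untouched and remain available as predictions. For simplicity the tuples use flat constant names rather than nested lists.

```python
from itertools import product

def closed_world_negatives(positives, arity):
    """E- for one target relation: all tuples over the constants occurring in E+
    (the minimal signature) which are not positive examples."""
    constants = {c for tup in positives for c in tup}
    return {tup for tup in product(constants, repeat=arity) if tup not in positives}

# A tiny propositional stand-in for the member example (flat constants only).
e_plus = {("e", "l1"), ("o", "l1"), ("x", "l2")}
e_minus = closed_world_negatives(e_plus, arity=2)

print(len(e_minus), "negative examples generated")
print(("e", "l2") in e_minus)         # True: covered by the CWA over the minimal signature
print(("1", "list_of_1") in e_minus)  # False: outside the minimal signature, stays unknown
```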

First, we cannot use the CWA as defined by [Lloyd, 1987] in logic programming.² Lloyd and others relate the CWA to the domain closure assumption (DCA), whereas we need a CWA which is related to the ODA.

Hence CWA_LP is not equal to CWA_ILP, but then we have to ask what the formal definition of the CWA_ILP is, or why the prediction is not influenced by the CWA_ILP. Obviously, the definition of the CWA_ILP must allow for prediction.

One solution was presented by Helft [Helft, 1989]. He relates the CWA to the Open Domain Assumption (ODA). Helft proposes that only a subset of the formulas according to a given signature should be influenced by the CWA_ILP. Table 4 shows an example of Helft's framework with a given logic program. He constructs, regarding the logic program, a minimal model and then finds hypotheses which have to be valid in this minimal model.

² The CWA determines a unique model, because equality axioms, freeness axioms and the domain closure axioms restrict the domain to ground terms, and the set of formulas and the completion determine the model.

Logic Program:
    deputy(tom).    deputy(x) → corrupt(x).
    rich(tom).      rich(bill).

Minimal Model:
    deputy(tom), corrupt(tom), rich(tom), rich(bill)

Hypotheses:
    φ1: rich(x) ← deputy(x)
    φ2: rich(x) ← corrupt(x)
    φ3: corrupt(x) ← rich(x)

I_CWA ⊨ φ1, φ2

Table 4: Example of Helft

In the example, the rules φ1 and φ2 are valid. Finally, he applies the hypotheses regarding the ODA.

But Helft does not give any formal definition of the CWA_ILP and the ODA. We give in the next section definitions of these assumptions based on a three-valued semantics. Three-valued semantics are widely used in logic programming, see [Fitting, 1985] or [Shepherdson, 1987], but we cannot transfer their techniques identically. In logic programming, three-valued semantics are motivated by the observation that each logic program succeeds, fails or goes on forever. So it is natural to use three values in the semantics of logic programs.

In ILP we have a similar case: each clause is true, false or unknown, because the prediction is a set of propositions which we do not know. They may be true or false.

In the words of Kowalski [Kowalski, 1989], logic programs or databases have a static representation, whereas in ILP, sets of formulas should have a dynamic representation. This seems very natural, because in learning it should be possible that anything which is currently unknown can be learned and become known later.

3 A logical framework

In this section we formalize our logical framework. We define the term minimal signature and the three assumptions CWA_ILP, OWA and ODA. We then give a semantics of logic programs regarding ILP based on partial Herbrand interpretations and describe the validity of hypotheses. For logical definitions of first order languages we refer to [Apt, 1990]. We restrict ourselves to the languages of Horn clause logic which are mainly used in ILP. Any undefined expressions can be found in [Lloyd, 1987] or [Apt, 1990].

3.1 Assumptions in ILP

In the following, Σ = {PS, FS, V} denotes a signature, where PS is a set of predicate symbols, FS a set of function symbols and V a set of variables, as usual. TERM(Σ) is defined as the set of terms and FORM(Σ) as the set of formulas over Σ. A signature is the specific part of a language.

Let us define what is meant by the term minimal generating signature.

Definition 2 (Generating signature) Let Σ = {PS, FS, V} be a signature and X a set of formulas. We say that Σ generates X iff for every A ∈ X it holds that A ∈ FORM(Σ).

Definition 3 (Minimal generating signature) Let Σ = {PS, FS, V} and Σ' = {PS', FS', V'} be two signatures and X a set of formulas. We write Σ ≤ Σ' iff PS ⊆ PS' and FS ⊆ FS'. We say that Σ is a minimal signature w.r.t. X iff

1. Σ generates X, and

2. for every generating signature Σ' of X, Σ is minimal under ≤.
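As a hedged sketch of Definitions 2 and 3 (ours, not the report's), the code below extracts the minimal generating signature of a set of atoms represented as nested tuples with the predicate or function symbol first; the tuple representation and the convention that variables start with a capital letter are our assumptions.

```python
def minimal_signature(formulas):
    """Collect the predicate symbols, function/constant symbols and variables
    that actually occur in X; by construction this signature generates X and
    no smaller one does."""
    ps, fs, vs = set(), set(), set()

    def walk(term, top=False):
        if isinstance(term, tuple):           # compound: (symbol, arg1, ..., argn)
            (ps if top else fs).add((term[0], len(term) - 1))
            for arg in term[1:]:
                walk(arg)
        elif isinstance(term, str) and term[0].isupper():
            vs.add(term)                      # variables written with capital letters
        else:
            fs.add((term, 0))                 # constants are 0-ary function symbols

    for atom in formulas:
        walk(atom, top=True)
    return ps, fs, vs

X = {("member", "e", ("cons", "e", ("cons", "o", "nil")))}
print(minimal_signature(X))
# e.g. ({('member', 2)}, {('cons', 2), ('e', 0), ('o', 0), ('nil', 0)}, set())
```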

The point is that the prediction should not be affected by the CWA. Thus, the completion of the given set of formulas is only done w.r.t. the minimal signature. In Table 3 we generalize the positive and negative examples to the hypotheses (member(A,B) :- components(B,C,D), ...), which are also valid for numbers, e.g. member(1,[1]).

The idea is that if we have a model M for the given incomplete set of formulas, then there is a model M' for the generalization. Clearly, M is not necessarily a model of these generalized formulas, because M may not interpret all instantiations of these formulas. Hence, M is only a partial interpretation. But partial interpretations are closely connected with a third truth value, because one can regard the uninterpreted formulas as formulas which have the truth value undefined. What happens within the generalization step is that formulas which have the truth value unknown are assigned the value true. And the ODA permits that valid formulas remain valid.

Let us put these ideas into a more precise form. According to the way FOIL can be used, we relate the ODA to a minimal signature and the CWA to a minimal model based on the minimal signature of a given set of formulas or a logic program.

Definition 4 (Open Domain Assumption, ODA) Let Σ be a signature, A ∈ FORM(Σ) a formula, V a set of variables, σ: V → TERM(Σ) a state (or variable assignment), and σ_G a ground variable assignment (this means the assigned terms of TERM(Σ) contain no variables). Let M_Σ (M_Σ') be a model frame w.r.t. Σ (Σ'). The ODA can be stated as: if

    M_Σ ⊨ σ_G(A)   for every state σ_G,

then for every Σ' ≥ Σ and every state σ'_G : V' → TERM(Σ'),

    M_Σ' ⊨ σ'_G(A).

Definition 5 (Closed World Assumption, CWA_ILP) Let Σ be a minimal generating signature according to a set of formulas X and A a ground atom. M_min is the least model of X, and ⊩ is the semantical entailment relation. Then define

    CWA(X) := { A : M_min ⊩ A and A ∈ FORM(Σ) }
            ∪ { ¬A : M_min ⊮ A and A ∈ FORM(Σ) }

In logic programming, OWA and CWA are used mutually exclusively [Kowalski, 1989]. Kowalski related the CWA to a static representation and the OWA to a dynamic representation. Thus, in ILP, CWA_ILP and OWA are also used mutually exclusively. But to express the OWA we need more formal definitions and a closer look at three-valued semantics.

3.2 A partial Herbrand interpretation

Here, we follow the framework of [Wagner, 1991]. For the signature Σ, the Herbrand universe U_Σ consists of all ground terms. We regard non-ground clauses as a dynamic representation of the corresponding set of ground clauses formed by means of the current domain of individuals. Therefore, a state or variable assignment is simply a function which assigns a ground term to each variable. The Herbrand base, B_Σ, is the set of all ground atoms. We use the same symbols for syntactical and semantical objects.

M = ⟨M+, M−⟩ characterizes a partial Herbrand interpretation in which M+ and M− are disjoint subsets of B_Σ. M+ contains the atoms which are true and M− contains those which are false. Partial Herbrand interpretations give rise to a model relation ⊨ as well as to a countermodel relation =|.

Definition 6 (Satisfaction) Let σ be a state and F a formula. We write M ⊨ σ(F) to say that M satisfies F in a state σ.

    M ⊨ F                      iff  M ⊨ σ(F) for all states σ
    M ⊨ A                      iff  A ∈ M+
    M ⊨ F1 ∨ ... ∨ Fn          iff  M ⊨ Fi for some 1 ≤ i ≤ n
    M ⊨ F1 ∧ ... ∧ Fn          iff  M ⊨ Fi for all i = 1, ..., n
    M ⊨ A ← F1 ∧ ... ∧ Fn      iff  A ∈ M+ or Fi ∈ M− for some i = 1, ..., n
    M ⊨ ¬F                     iff  M =| F
    M =| A                     iff  A ∈ M−
    M =| ¬F                    iff  M ⊨ F
    M =| F1 ∧ ... ∧ Fn         iff  M =| Fi for some 1 ≤ i ≤ n
    M =| F1 ∨ ... ∨ Fn         iff  M =| Fi for all i = 1, ..., n

We write M ⊨ X, where X is a set of formulas, if M ⊨ A for each A ∈ X. The symbol ⊩ denotes the semantical entailment relation. If, for example, X is a set of formulas and A is a formula, then we write X ⊩ A iff every model of X is a model of A, i.e., if M ⊨ X then M ⊨ A.
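To make Definition 6 concrete, here is a small Python sketch (our own, under the assumption that all formulas are ground and rule bodies are conjunctions of atoms) of the model relation ⊨ and the countermodel relation =| over a partial Herbrand interpretation ⟨M+, M−⟩; the function names sat and countersat are hypothetical.

```python
# Ground three-valued satisfaction over a partial Herbrand interpretation (M+, M-).
# Formulas: ("atom", a), ("not", F), ("and", F1, ..., Fn), ("or", F1, ..., Fn),
# and rules ("rule", head_atom, body_conjunction).

def sat(m_pos, m_neg, f):              # M |= F
    tag = f[0]
    if tag == "atom":
        return f[1] in m_pos
    if tag == "not":
        return countersat(m_pos, m_neg, f[1])
    if tag == "and":
        return all(sat(m_pos, m_neg, g) for g in f[1:])
    if tag == "or":
        return any(sat(m_pos, m_neg, g) for g in f[1:])
    if tag == "rule":                  # A <- B1,...,Bn: head true or some body atom false
        head, body = f[1], f[2]
        return head in m_pos or any(b[1] in m_neg for b in body[1:])
    raise ValueError(tag)

def countersat(m_pos, m_neg, f):       # M =| F (countermodel relation)
    tag = f[0]
    if tag == "atom":
        return f[1] in m_neg
    if tag == "not":
        return sat(m_pos, m_neg, f[1])
    if tag == "and":
        return any(countersat(m_pos, m_neg, g) for g in f[1:])
    if tag == "or":
        return all(countersat(m_pos, m_neg, g) for g in f[1:])
    raise ValueError(tag)

m_pos, m_neg = {"p", "q"}, {"r"}
rule = ("rule", "s", ("and", ("atom", "p"), ("atom", "r")))
print(sat(m_pos, m_neg, rule))                  # True: the body atom r is false
print(sat(m_pos, m_neg, ("atom", "s")))         # False
print(countersat(m_pos, m_neg, ("atom", "s")))  # False: s is unknown
```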

Definition 7 (Least Model) Let M = ⟨M+, M−⟩ and M' = ⟨M'+, M'−⟩ be two models for a set X of formulas. We say that M is a submodel of M' (denoted by M ≤ M') iff M+ ⊆ M'+. We say that M is a least model for X if for every M' ⊨ X, M ≤ M' holds.

Note that, due to the fact that nothing is said about M− and M'−, there can be incomparable models with respect to ≤. However, we now have an ordering on models and interpretations, respectively.

Let us briefly recall some results of logic programming.

Observation 1 Let M, M' be two interpretations such that M ≤ M' and X a set of formulas ...

2. P has a least model.

A proper partial Herbrand interpretation determines a three-valued assignment v_M on ground atoms. We use v_M to describe the OWA and give an equivalent definition of CWA_ILP regarding the truth assignment v_M.

Definition 8 (Truth Assignment with OWA) Let M = ⟨M+, M−⟩ be the partial Herbrand interpretation and A a ground atom. The OWA is used iff the three-valued truth assignment v_M yields:

    v_M(A) = true      if M ⊨ A
             false     if M =| A
             unknown   otherwise

Definition 9 (Truth Assignment with CWA_ILP) Let M = ⟨M+, M−⟩ be the partial Herbrand interpretation, A a ground atom, and Σ a minimal generating signature according to a set of formulas X. The CWA_ILP is used iff the three-valued truth assignment v_M yields:

    v_M(A) = true      if M ⊨ A and A ∈ FORM(Σ)
             false     if not M ⊨ A and A ∈ FORM(Σ)
             unknown   otherwise
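A minimal sketch (ours, for ground atoms only) of the two truth assignments of Definitions 8 and 9: under the OWA an atom is false only if it lies in M−, whereas under CWA_ILP every atom of the minimal signature that is not true becomes false, and only atoms outside FORM(Σ) stay unknown. The string encoding of atoms and the function names are our assumptions.

```python
def v_owa(m_pos, m_neg, atom):
    if atom in m_pos:
        return "true"
    if atom in m_neg:
        return "false"
    return "unknown"

def v_cwa_ilp(m_pos, minimal_base, atom):
    if atom in minimal_base:
        return "true" if atom in m_pos else "false"
    return "unknown"

# Deputy example: M+ from the least model, M- from the explicit negative example.
m_pos = {"deputy(tom)", "corrupt(tom)", "rich(tom)", "rich(bill)"}
m_neg = {"deputy(bill)"}
minimal_base = m_pos | {"deputy(bill)", "corrupt(bill)"}

print(v_owa(m_pos, m_neg, "corrupt(bill)"))             # unknown
print(v_cwa_ilp(m_pos, minimal_base, "corrupt(bill)"))  # false
print(v_cwa_ilp(m_pos, minimal_base, "corrupt(mary)"))  # unknown: outside the signature
```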

3.3 A semantics of a program P

We now give two interpretations: I_CWA and I_Neg. The first interpretation behaves exactly like the CWA_ILP combined with the ODA, in that those formulas which are not explicitly interpreted as true are (implicitly) false. The second interpretation, I_Neg, which corresponds to the OWA, interprets formulas as false only if they are stated explicitly as false. Hence, the interpretations are able to handle formulas which are unknown, namely those which are neither true nor false.

Let B_Σ' be the Herbrand base of a logical program P ⊆ FORM(Σ') and B_Σ the Herbrand base of a minimal generating signature Σ of P. To determine the least model, we introduce, according to [Apt, 1990], an immediate consequence operator T_P(I). Then we iteratively apply T_P(I) to construct M+. The result is a least fixpoint of M+ with respect to a minimal generating signature Σ of P. The set M− is determined by the Herbrand base of the minimal signature without M+.

Definition 10 (T operator) The T_P(I) operator maps one Herbrand interpretation to another Herbrand interpretation:

    A ∈ T_P(I)  iff  for some substitution σ and a clause B ← B1, ..., Bn of P
                     we have A = Bσ and I ⊨ σ(B1, ..., Bn).

If T_P(I_f) = I_f holds, I_f is called a pre-fixpoint of T. We know from the results of logic programming that T_P(I_f) is a model of P if I_f is a pre-fixpoint of T_P. Another result is that each monotonic operator T has a least fixpoint which is also its least pre-fixpoint. If Σ is a minimal generating signature for P, we can construct M− by B_Σ without M+.
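A hedged sketch (ours) of the construction just described: a naive immediate consequence operator for ground definite clauses, iterated to its least fixpoint to obtain M+, with M− taken as the rest of the Herbrand base of the minimal signature. The ground instantiation of the deputy rule and the pair encoding of clauses are our assumptions.

```python
def t_p(rules, interpretation):
    """Immediate consequence operator for ground clauses (head, [body atoms])."""
    return {head for head, body in rules if all(b in interpretation for b in body)}

def least_fixpoint(rules):
    m_pos = set()
    while True:
        nxt = t_p(rules, m_pos)
        if nxt == m_pos:
            return m_pos
        m_pos = nxt

# Ground version of the deputy example (the single rule instantiated for both constants).
rules = [
    ("deputy(tom)", []), ("rich(tom)", []), ("rich(bill)", []),
    ("corrupt(tom)", ["deputy(tom)"]),
    ("corrupt(bill)", ["deputy(bill)"]),
]
herbrand_base = {p + "(" + c + ")" for p in ("deputy", "corrupt", "rich")
                 for c in ("tom", "bill")}

m_pos = least_fixpoint(rules)
m_neg = herbrand_base - m_pos     # I_CWA: everything else in B_Sigma is false
print(sorted(m_pos))  # ['corrupt(tom)', 'deputy(tom)', 'rich(bill)', 'rich(tom)']
print(sorted(m_neg))  # ['corrupt(bill)', 'deputy(bill)'] as in Table 5
```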

P:      deputy(tom), deputy(x) → corrupt(x), rich(tom), rich(bill)

B_MS:   deputy(tom), corrupt(tom), rich(tom), rich(bill),
        deputy(bill), corrupt(bill)

M+:     deputy(tom), corrupt(tom), rich(tom), rich(bill)
M−:     deputy(bill), corrupt(bill)

Table 5: Example of I_CWA

Definition 11 (Interpretation I_CWA) Let P be a program generated by a minimal signature Σ. The interpretation I_CWA is characterized by ⟨M+, M−⟩, where M+ is a pre-fixpoint of T_P(I). M− is determined by B_Σ, or formally:

    M− = { a : a ∈ B_Σ and a ∉ M+ }

Table 5 shows a set of formulas P and the corresponding Herbrand base B_MS. M+ is constructed by the least model of P and M− by the difference of B_MS and M+.

Observation 2 Let P ⊆ FORM(Σ') be a program and Σ the minimal generating signature of P. If FORM(Σ) = FORM(Σ'), then M+ corresponds exactly to the least model of P.

It is easy to see that the interpretation I_CWA corresponds to the CWA_ILP and the ODA. Now, let us formulate an interpretation which handles explicit negation regarding the OWA. The interpretation I_Neg of a set of formulas with negative examples is constructed from the partial Herbrand models. M+ was constructed from the least fixpoint of T_P(I). The set M− is a least model of the negative examples.

Definition 12 (Interpretation I_Neg) The interpretation I_Neg is characterized by ⟨M+, M−⟩, where M+ is a pre-fixpoint of T_P(I). M− is a Herbrand interpretation of the negative examples, or formally:

    M− = { a : a ∈ E− and a is ground }

Clearly, formulas which are neither in M+ nor in M− are unknown. We do not have a normative truth-value assignment like the CWA (where those things which are unknown are false).

Table 6 shows an example of I_Neg, with a set of clauses P and the corresponding Herbrand base B_MS. M+ is constructed from the least model of P and M− from the given negative examples.

P:      deputy(tom), deputy(x) → corrupt(x), rich(tom), rich(bill),
        ¬deputy(bill)

B_MS:   deputy(tom), corrupt(tom), rich(tom), rich(bill),
        deputy(bill), corrupt(bill)

M+:     deputy(tom), corrupt(tom), rich(tom), rich(bill)
M−:     deputy(bill)

Table 6: Example of I_Neg

3.4 Valid hypotheses

We define a set of hypotheses H which characterizes the intended interpretation of a given set of formulas. The hypotheses assign, via the T_P operator, the truth value true or false to those formulas which up to now have the truth value unknown. Formally, the T_P operator maps the truth value unknown of a formula to true or false. But first we define the set H and, in favor of its size, two more restricted sets. The interpretation M = ⟨M+, M−⟩ is related to I_CWA or I_Neg.

Definition 13 (Set of Hypotheses H) Let Pre ∈ FORM(Σ), where Σ is the minimal generating signature of the examples, and Q ∈ FORM(Σ). Q may be a conjunction of atoms.

    H = { Q ← Pre : for all σ (σ(Q) ∉ M− or σ(Pre) ∉ M+) }

Helft's idea [Helft, 1989] is to restrict this set H by "there is a σ such that σ(Pre) ∈ M+". This ensures that for each hypothesis both the premise and the conclusion have to be satisfied at least once. This gives a better confirmation because, for example, if we add the negative example woman(tom), we do not have the hypothesis deputy(tom) ← woman(tom), in contrast to H.

Definition 14 (Helft's Set of Hypotheses, H_Helft)

    H_Helft = { Q ← Pre : ∀σ (σ(Q) ∉ M− or σ(Pre) ∉ M+) and ∃σ σ(Pre) ∈ M+ }

A further restriction is to regard only hypotheses with the same conclusion. This is called single predicate learning, in contrast to multiple predicate learning of the set H [Muggleton, 1993].

Definition 15 (Single Predicate Learning, H_SPL)

    H_SPL = { Q ← Pre : ∀σ (σ(Q) ∉ M− or σ(Pre) ∉ M+)
                        and ∃σ σ(Pre) ∈ M+
                        and ∀i,j Q_i = Q_j }
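To illustrate Definitions 13 and 14 on the ground deputy example, here is a sketch (ours; it enumerates only ground clauses head ← premise over the minimal Herbrand base, so the single ground "instance" of each clause is the clause itself and no variables are involved); the function names are hypothetical.

```python
def in_h(head, premise, m_pos, m_neg):
    # Q <- Pre is in H iff it never maps a true premise to a false conclusion.
    return not (head in m_neg and premise in m_pos)

def in_h_helft(head, premise, m_pos, m_neg):
    # Helft additionally requires the premise to be satisfied at least once.
    return in_h(head, premise, m_pos, m_neg) and premise in m_pos

m_pos = {"deputy(tom)", "corrupt(tom)", "rich(tom)", "rich(bill)"}
m_neg = {"deputy(bill)", "corrupt(bill)"}
base = m_pos | m_neg

h_helft = {(q, p) for q in base for p in base
           if q != p and in_h_helft(q, p, m_pos, m_neg)}
print(("rich(tom)", "deputy(tom)") in h_helft)    # True, cf. phi1 in Table 4
print(("deputy(bill)", "rich(bill)") in h_helft)  # False: conclusion false, premise true
```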

Table 7 shows a procedural view of our framework. We determine from a given set of formulas the partial Herbrand interpretation. We then construct a set H and try to map the truth value unknown to true or false with the T_P operator. Hence, we have determined the prediction.

Given set of WFFs  ⇒  Partial Herbrand Interpretation  ⇒  Set H  ⇒  Prediction

Table 7: A procedural view of our framework

                 Logic Programming     Inductive Logic Programming
                                       I_CWA                I_Neg
                 CWA + DCA             CWA_ILP + ODA        OWA

Table 8: Summary of the assumptions

3.5 Summary

We have related the OWA to the interpretation I_Neg, and the combination of CWA_ILP and ODA to the interpretation I_CWA (Table 8). The set of hypotheses is determined regarding the interpretation and is a set of all possible hypotheses. Several learning algorithms restrict this set; for example, FOIL with single predicate learning uses only a subset with always the same conclusion, in contrast to multiple predicate learning. However, in the next section we point out several relations which can be described using this framework.

4 Relations

4.1 Helft: induction as nonmonotonic inference

In the following we restrict the framework of [Helft, 1989] to Horn clause logic. It is easy to see that the minimal models coincide with the least fixpoint under a Herbrand interpretation.

Definition 16 (Helft's language) Helft uses a language which is restricted by the following items:

1. Groundable: Function symbols are not used and every variable in the head of a clause has to appear in the body. This ensures that a finite least model always exists.

2. Injective: There is a substitution or state σ, mapping the literals of P onto elements of A, such that for every pair of variables x, y of P, σ(x) ≠ σ(y). This ensures that the set of generalizations is finite and no unnecessary variables are introduced.

Helft constructs a minimal model with respect to a set of formulas. In our terminology, M+ corresponds to the least model and M− is the Herbrand base without the least model.

P:      deputy(tom), deputy(x) → corrupt(x), rich(tom), rich(bill)

        φ1: rich(x) ← deputy(x)
        φ2: rich(x) ← corrupt(x)
        φ3: corrupt(x) ← rich(x)

B_MS:   deputy(tom), corrupt(tom), rich(tom), rich(bill),
        deputy(bill), corrupt(bill)

M+:     deputy(tom), corrupt(tom), rich(tom), rich(bill)
M−:     deputy(bill), corrupt(bill)

I_CWA ⊨ φ1, φ2

Table 9: Example of Helft

In our restricted setting we always have a unique minimal model, whereas Helft has to distinguish between strong and weak generalizations.

Definition 17 (Φ) Let M be an interpretation and φ = P ← Q a clause. Φ is a finite set of groundable clauses, and Φ is the set of generalizations which are defined as follows:

1. M ⊨ φ and M ⊨ σ(Q), where σ(Q) is one ground instance and Q is injective over M.

2. Φ ⊮ ⊥.

3. For every clause φ' of Φ(Σ): if φ' ⊩ φ then φ ⊩ φ'.

Theorem 1 I_CWA(Σ) ⊨ Φ

Proof. Assume that I_CWA(Σ) ⊭ P ← Q and P ← Q ∈ Φ. Without loss of generality assume that the rule is ground. Note that Q can be a conjunction of atoms.

We know from the definition of satisfiability that P ∉ M+ and that Q ∉ M−. We also know that P and Q are in the minimal signature Σ. Therefore P ∈ M− and Q ∈ M+, because all members of B_Σ are by definition in M+ or in M−.

In the words of [Helft, 1989], M ⊭ P ← Q. This contradicts the first condition of Helft's definition. Therefore P ← Q is not in Φ, and this concludes our proof. □

Table 9 shows an example of Helft, in which each proposed generalization is satisfied by I_CWA(Σ).

4.2 FOIL in a logical framework

We show in this section that FOIL is an example of a learning system based on Plotkin's conditions. In another view, FOIL is an example of Helft's framework. Both fit into our logical framework.

Input of FOIL:      p: bill, tom, hank, joe, pete, carl, go.
                    rich(p) bill,+ tom,+.
                    *corrupt(p) tom,+.
                    *deputy(p) tom,+; bill,-.

Output of FOIL:     *** Warning: the following definition
                    *** does not cover 1 tuple in the relation
                    rich(A) :- *corrupt(A)

M+:    deputy(tom), corrupt(tom), rich(tom), rich(bill)
M−:    deputy(bill)

Table 10: Example of FOIL

We do not want to investigate the information-based selection of a certain generalization out of the hypothesis set. We do not regard anything like predicate invention in this framework. We also do not allow rules with negative literals in the body.

FOIL is a learning system which infers from a training set of examples a formula which describes the selected concept. The examples are given by ground atoms and are divided into positive examples and background knowledge. FOIL selects the best hypothesis with an information-based measure that covers as many as possible of the positive examples and no negative example. FOIL works in two modes. In one mode, only positive examples are given, and the CWA is used to construct negative examples. This corresponds to our interpretation I_CWA. The other mode, in which positive and negative examples are given, corresponds to our interpretation I_Neg. Our observation that each hypothesis is a member of our hypotheses set H is independent of the chosen mode.

Table 10 shows FOIL applied to the example of Helft. The rule is valid because the conditions of Plotkin are fulfilled. The only exception is that rich(bill) cannot be deduced from the hypothesis and the background knowledge.

Theorem 2 Each FOIL learning result is a member of H_SPL.

Proof: This is an easy proof because H_SPL is exactly constructed to fulfill the conditions of Plotkin and Helft. First, we show this for the case where positive and negative examples are given explicitly, and second, for the case where FOIL uses the CWA. P ← Q is a FOIL learning result.

1. Here the conditions of Plotkin are fulfilled. By definition of the set H_SPL, we know that P ∉ M− or Q ∉ M+. Then we see that P also has to be in M+, because Q ∈ M+ and we want to deduce P from the hypothesis. The consistency is guaranteed by the disjointness of M+ and M−.

2. In this case, we know that all positive examples are in M+, and if we want to deduce the positive examples, P also has to be in M+. To deduce the positive examples, we also need to satisfy the background knowledge. This means that, for instance, ...

4.3 Plotkin's inductive task

Following the framework of [Plotkin, 1971], the inductive learning task can be described using a background theory B, a set of positive examples E+, a set of negative examples E−, a hypothesis H, and a partial ordering relation ≤. The conditions are:

1. The background theory should not entail the positive examples:
   B ⊮ E+ (prior necessity).

2. It should be consistent with the examples:
   B, E−, E+ ⊮ ⊥ (prior satisfiability).

3. The background theory and the hypothesis have to entail the positive examples:
   B, H ⊩ E+ (posterior sufficiency).

4. The background theory and the hypothesis should not entail the negative examples:
   B, H, E− ⊮ ⊥ (posterior satisfiability).

5. We should use the most specific hypothesis that fulfills the above requirements. Therefore we have an ordering ≤ on the hypotheses, where h1 ≤ h2 means that h1 is more general than h2.

Plotkin has proven that, in general, there is no hypothesis which covers exactly the positive examples and nothing else. In general, we can deduce from a hypothesis facts which are not known. We call this set of facts the prediction. To see that the set H fulfills the conditions of Plotkin, we test each condition:

1. The background theory should not entail the positive examples. This can be guaranteed by the restriction that the background theory and the examples are disjoint.

2. That the background knowledge and the examples should be consistent is satisfied by the restriction that M+ and M− are disjoint.

3. The condition B, H ⊨ E+ guarantees that we can conclude the positive examples from the hypothesis and the background knowledge. This means that if the body of each hypothesis is in M+, we can deduce the head. Therefore Q ∈ M+.

4. The consistency is satisfied by the restriction that M+ and M− are disjoint. A naive check of these four conditions on ground facts is sketched below.
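As announced above, this is a naive sketch (ours, not Plotkin's formulation): everything is ground and propositional, entailment is approximated by set membership and union, and the hypothesis is represented by the set of facts it allows to derive together with B; all names are hypothetical.

```python
def plotkin_conditions(background, e_plus, e_minus, covered_by_hypothesis):
    """covered_by_hypothesis: ground facts derivable from B together with H."""
    prior_necessity = not e_plus <= background            # B does not already entail E+
    prior_satisfiability = not (background | e_plus) & e_minus
    posterior_sufficiency = e_plus <= covered_by_hypothesis
    posterior_satisfiability = not covered_by_hypothesis & e_minus
    return (prior_necessity and prior_satisfiability
            and posterior_sufficiency and posterior_satisfiability)

# Deputy example: a hypothesis like phi3 (corrupt(x) <- rich(x)) also derives
# corrupt(bill), which the CWA treats as a negative example, so the check fails.
background = {"deputy(tom)", "rich(tom)", "rich(bill)"}
e_plus = {"corrupt(tom)"}
e_minus = {"corrupt(bill)"}
print(plotkin_conditions(background, e_plus, e_minus,
                         covered_by_hypothesis={"corrupt(tom)", "corrupt(bill)"}))  # False
print(plotkin_conditions(background, e_plus, e_minus,
                         covered_by_hypothesis={"corrupt(tom)"}))                   # True
```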

4.4 Comparison with other three-valued logics

Using our framework, the underlying logic of ILP can be compared with other three-valued logics fairly easily. We will compare our three-valued logic with the logics of Kleene and Łukasiewicz [Turner, 1984], whereas we only regard the implication.

Logic of ILP:
    Pre → Q    t  f  u
    t          t  f  t
    f          t  t  t
    u          t  t  t

Kleene's logic:
    Pre → Q    t  f  u
    t          t  f  u
    f          t  t  t
    u          t  u  u

Łukasiewicz's logic:
    Pre → Q    t  f  i
    t          t  f  i
    f          t  t  t
    i          t  i  t

Table 11: Comparison of logics

First let us look at Kleene's strong three-valued implication. Kleene's concern was to model mathematical statements which are undecidable; that means his third truth value represents neither true nor false, but rather a state of partial ignorance [Turner, 1984]. We guess that this is common to our approach, where the third truth value also indicates that we do not know anything. Nevertheless, we have a very optimistic view in learning: Kleene did not want to conclude something known from an unknown sentence, especially in the case where u → u is undecided.

For a second comparison, let us look at Łukasiewicz's three-valued logic. In this logic the case u → u is true. Turner points out that the difference originates from the different interpretations of the third truth value. Whereas in the context of Kleene this third truth value is a truth gap, Łukasiewicz assigns the third truth value to a statement if no assignment of true or false is possible; however, he deals with statements about the future.

Again we have the most optimistic view of the implication. This means that if it is possible that a fact implies another fact, this statement is true. The other semantics are more cautious, because obviously this can be false.
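The three implication tables can be reproduced with the following small sketch (ours); it encodes each implication as a function on the values t, f, u (writing u also for Łukasiewicz's third value i) and prints the rows of Table 11. The function names are hypothetical.

```python
VALUES = ("t", "f", "u")

def impl_ilp(p, q):        # optimistic: false only for t -> f
    return "f" if (p, q) == ("t", "f") else "t"

def impl_kleene(p, q):     # strong Kleene implication
    if p == "f" or q == "t":
        return "t"
    if p == "t" and q == "f":
        return "f"
    return "u"

def impl_lukasiewicz(p, q):  # differs from Kleene only in u -> u = t
    return "t" if (p, q) == ("u", "u") else impl_kleene(p, q)

for name, impl in (("ILP", impl_ilp), ("Kleene", impl_kleene),
                   ("Lukasiewicz", impl_lukasiewicz)):
    print(name)
    for p in VALUES:
        print(" ", p, [impl(p, q) for q in VALUES])
```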

5 Conclusions and further works

In this work, we have compared the basic assumptions of logic programming and ILP. Then, we have given a three-valued model-based semantics of ILP and shown that FOIL fits in this framework. The semantics reflects the two main assumptions which we need in machine learning. Finally, we have related our framework to other works of ILP and three-valued logics.

Currently we are implementing our framework with ISABELLE [Paulson, 1989] and doing experiments. We are also evaluating more restrictions on the set H.

In further works we want to show the need to handle more than three truth values in machine learning. For example, we will show that it is useful to handle inconsistencies in an incremental learning system. We often come up with inconsistencies, and we want to wait for a less busy time to be able to make revisions, because making revisions is an expensive task.

Acknowledgments

We would like to thank Katharina Morik and the members of Computer Science VIII of the University of Dortmund. This work was partly supported by the European Community ...

References

[Apt, 1990] Apt, K. (1990). Logic programming. In van Leeuwen, J., editor, Handbook of Theoretical Computer Science, Volume B. Elsevier.

[Apt and van Emden, 1982] Apt, K. and van Emden, M. (1982). Contributions to the theory of logic programming. Journal of the ACM, 29(3).

[Clark, 1978] Clark, K. (1978). Negation as failure. In Gallaire, H. and Minker, J., editors, Logic and Data Bases, pages 293-322. Plenum Press, New York.

[De Raedt and Bruynooghe, 1989] De Raedt, L. and Bruynooghe, M. (1989). Towards friendly concept-learners. In Proc. of the 11th Int. Joint Conf. on Artificial Intelligence, pages 849-854, Los Altos, CA. Morgan Kaufmann.

[Fitting, 1985] Fitting, M. (1985). A Kripke-Kleene semantics for general logic programs. Journal of Logic Programming, 2:295-312.

[Helft, 1989] Helft, N. (1989). Induction as nonmonotonic inference. In Proceedings of the 1st International Conference on Knowledge Representation and Reasoning.

[Kietz and Wrobel, 1991] Kietz, J.-U. and Wrobel, S. (1991). Controlling the complexity of learning in logic through syntactic and task-oriented models. In Muggleton, S., editor, Inductive Logic Programming, chapter 16, pages 335-360. Academic Press, London. Also available as Arbeitspapiere der GMD No. 503, 1991.

[Kowalski, 1979] Kowalski, R. (1979). Logic for Problem Solving. Elsevier Science Publishing Co. Inc.

[Kowalski, 1989] Kowalski, R. (1989). Logic for data description. In Mylopoulos, J. and Brodie, M., editors, Readings in Artificial Intelligence and Databases. Morgan Kaufmann.

[Lloyd, 1987] Lloyd, J. (1987). Foundations of Logic Programming. Springer Verlag, Berlin, New York, 2nd edition.

[Muggleton, 1990] Muggleton, S. (1990). Inductive logic programming. In Proceedings of the 1st Conference on Algorithmic Learning Theory.

[Muggleton, 1993] Muggleton, S. (1993). Inductive logic programming: Derivations, successes and shortcomings. In Brazdil, P., editor, ECML-93, European Conference on Machine Learning.

[Muggleton and Feng, 1992] Muggleton, S. and Feng, C. (1992). Efficient induction of logic programs. In Muggleton, S., editor, Inductive Logic Programming, chapter 13, pages 281-298. Academic Press, London.

[Paulson, 1989] Paulson, L. (1989). The foundation of a generic theorem prover. Journal of Automated Reasoning.

[Plotkin, 1971] Plotkin, G. D. (1971). A further note on inductive generalization. In Meltzer, B. and Michie, D., editors, Machine Intelligence, chapter 8, pages 101-124. American Elsevier.

[Quinlan, 1990] Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5(3):239-266.

[Reiter, 1978] Reiter, R. (1978). On closed world data bases. In Gallaire, H. and Minker, J., editors, Logic and Data Bases. Plenum Press, New York.

[Shepherdson, 1987] Shepherdson, J. (1987). Negation in logic programming. In Minker, J., editor, Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Publishers Inc.

[Turner, 1984] Turner, R. (1984). Logics for Artificial Intelligence. Ellis Horwood Limited.

[Wagner, 1991] Wagner, G. (1991). Logic programming with strong negation and inexact predicates. Journal of Logic and Computation, 1(6):835-859.
