Knowledge-Based Systems and Deductive Databases

Academic year: 2021


Wolf-Tilo Balke Philipp Wille

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de


6.1 Implementation of Datalog in DBs

6.2 Top-Down Evaluation

Knowledge-Based Systems and Deductive Databases – Wolf-Tilo Balke – Philipp Wille – IfIS – TU Braunschweig

Next Lecture


•  The Datalog semantics are given by Herbrand interpretations

– A Datalog program $\mathcal{P}$ is a set of Horn clauses

– Any Herbrand interpretation that satisfies $\mathcal{P}$ is a model

– Unfortunately, it is not quite that easy to compute a Herbrand model for $\mathcal{P}$

– Also, multiple models exist per program: which one conveys the intended semantics?

Semantics of Datalog

•  Datalog^f

– Datalog^f is computationally complete

– The intended semantics of a Datalog^f program is given by the least Herbrand model

• For the least Herbrand model $M$, $M \subseteq M'$ holds for any other Herbrand model $M'$

• This leads to $M := \bigcap \mathcal{M}$, where $\mathcal{M}$ is the set of all Herbrand models

• Informally: The least model is a model for $\mathcal{P}$ and does not contain superfluous statements


•  Operational semantics for Datalog^f

– To compute the least Herbrand model, a fixpoint iteration approach can be employed

• Start with an empty set of ground atoms

• Iteratively refine the set (by adding more atoms)

• Fixpoint iteration is monotonic (the set is only expanded in each iteration)

• As soon as the fixpoint is reached, the set becomes stable (i.e. no changes)

• The method is finite for Datalog^f

• The stable result is equivalent to the least Herbrand model


•  Iterative transformation step:

– Elementary production rule $T_\mathcal{P}$

– Idea: Apply all given rules whose premises are contained in the set of the previous step

• For the empty initial set $\{\}$, this puts all facts (atoms that hold without premises) into the result

• In the following steps, everything that can be derived by a single application of any rule is added
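The iteration can be sketched in Python for a program that has already been grounded. The encoding of a rule as a pair (head, list of premises) and the names `t_p` and `least_fixpoint` are assumptions made for this illustration only; facts are rules with an empty premise list.

```python
def t_p(rules, interpretation):
    """One application of the elementary production rule on ground rules."""
    return {head for head, body in rules
            if all(atom in interpretation for atom in body)}


def least_fixpoint(rules):
    """Start with the empty set and apply t_p until nothing changes."""
    current = set()
    while True:
        following = t_p(rules, current)
        if following == current:
            return current
        current = following


# Hypothetical ground program: two facts and three ground rule instances.
rules = [
    ("edge(1,2)", []),
    ("edge(2,3)", []),
    ("path(1,2)", ["edge(1,2)"]),
    ("path(2,3)", ["edge(2,3)"]),
    ("path(1,3)", ["edge(1,2)", "path(2,3)"]),
]

first_step = t_p(rules, set())   # only the facts
model = least_fixpoint(rules)    # all five atoms
```

The first step yields exactly the facts, and each later step adds what one further rule application can derive.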


•  Datalog^neg is more difficult

– Datalog^neg does not provide more expressiveness, but allows for more natural modeling

– Problems:

• Datalog^neg is potentially unsafe (i.e. generates infinite or excessively large models)

• Datalog^neg is potentially ambiguous (i.e. multiple distinct models possible)

– In general, no least Herbrand model is possible

– Instead, there are multiple minimal Herbrand models with $\forall M$ which are minimal models: $\nexists M'$ such that $M' \subset M$

– The intersection of minimal models is not a model itself…


•  Datalog^neg problems can be addressed by restricting possible programs

– Ambiguity: Assume negation as failure

• A fact that cannot be proven is assumed to be false

– Safety: Enforce positive grounding

• Each variable appearing in a negative literal needs to appear in a positive literal of the same rule

• Such a variable is positively grounded

• Evaluation can thus be restricted to known facts; examination of the whole (potentially infinite) universe is not necessary
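A check for positive grounding can be sketched as follows. The rule encoding (atoms as (predicate, arguments) pairs, body literals as (is_positive, atom) pairs, variables as capitalized strings) and the predicates `single`, `person` and `married` are hypothetical and chosen only for this example.

```python
def variables(atom):
    """Variables of an atom; by assumption these are capitalized strings."""
    _, args = atom
    return {a for a in args if isinstance(a, str) and a[:1].isupper()}


def is_positively_grounded(head, body):
    """True if all head variables and all variables of negative literals
    also occur in some positive body literal."""
    grounded = set()
    needed = variables(head)
    for is_positive, atom in body:
        if is_positive:
            grounded |= variables(atom)
        else:
            needed |= variables(atom)
    return needed <= grounded


# single(X) :- person(X), not married(X).   (safe)
safe = is_positively_grounded(
    ("single", ("X",)),
    [(True, ("person", ("X",))), (False, ("married", ("X",)))],
)

# single(X) :- not married(X).   (X is never positively grounded)
unsafe = is_positively_grounded(
    ("single", ("X",)),
    [(False, ("married", ("X",)))],
)
```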


– These restrictions allow a deterministic choice of models

• Negative dependencies of ground instances induce a preference on models

• The “best” model with respect to that preference is called the perfect model and is also a minimal model

• The perfect model is the intended semantics of Datalog^neg

– The operational semantics of Datalog^neg is given by iterated fixpoint iteration

• Take advantage of positive grounding and work along program partitions representing the program strata


– For each stratum, consider only rules which are positively grounded in a previous stratum

– On the union of those rules and the previous ground instances, apply normal fixpoint iteration

• i.e. iterate a fixpoint iteration along the program strata

•  Both fixpoint iteration and iterated fixpoint iteration are very inefficient

– Better algorithms in the next lectures…
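Iterating a fixpoint iteration along the strata can be sketched for ground rules as follows. The encoding (head, positive premises, negated premises), the assumption that the strata are already given in order, and the bird example are illustrative and not taken from the lecture.

```python
def eval_stratum(rules, known):
    """Fixpoint iteration for one stratum. Negated atoms are only checked
    against `known`, the result of the previous strata."""
    current = set(known)
    while True:
        derived = {
            head
            for head, positive, negated in rules
            if all(a in current for a in positive)
            and all(a not in known for a in negated)
        }
        following = current | derived
        if following == current:
            return current
        current = following


def iterated_fixpoint(strata):
    """Evaluate the strata one after another, lowest stratum first."""
    model = set()
    for rules in strata:
        model = eval_stratum(rules, model)
    return model


strata = [
    # Stratum 0: facts only.
    [("bird(tweety)", [], []), ("bird(sam)", [], []), ("penguin(sam)", [], [])],
    # Stratum 1: ground instances of flies(X) :- bird(X), not penguin(X).
    [("flies(tweety)", ["bird(tweety)"], ["penguin(tweety)"]),
     ("flies(sam)", ["bird(sam)"], ["penguin(sam)"])],
]
model = iterated_fixpoint(strata)
```

Checking negated atoms against the lower strata only is what makes the result deterministic in this sketch: a negated predicate is completely computed before it is used.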


•  In the previous week, we have seen the elementary production operator $T_\mathcal{P}$

– But how can we put this operator to use?

– Many deductive DBMS do not choose to implement everything from scratch

•  Implementations in Prolog and Lisp are especially common

– However, for reliably storing huge amounts of data (e.g. the facts in the extensional DB), there is already a wonderful technology: Relational Databases

•  Also, most applications already use RDBs and SQL

6.1 Datalog^neg in Algebra

•  In this section, we will map Datalog^neg to relational algebra

– This will allow us to implement Datalog concepts within an RDB

– Idea:

•  Take a Datalog program

•  Translate it to relational algebra

•  Evaluate the algebra statement

•  Return the results

– Also, this will allow us to take advantage of established features of databases

•  Query optimization

•  Indexing

•  ACID properties

•  Load balancing

•  etc.


•  When using the Relational Model and Relational Algebra, we assume the following:

– Data (i.e. facts) is stored in multiple relations

– A relation $R$ over some sets $D_1, \ldots, D_n$ is a subset of their Cartesian product

• $R \subseteq D_1 \times \ldots \times D_n$

• The sets $D_1, \ldots, D_n$ are finite and are called domains

6.1 Relational Algebra

•  Relational algebra operations available

– Base operations of relational algebra

• $\times$ Cartesian Product

• $\sigma$ Selection

• $\pi$ Projection

• $\cup$ Set Union

• $\setminus$ Set Minus

– Derived operations

• $\bowtie$ Joins ($R \bowtie S \equiv \sigma_\theta (R \times S)$)

• $\ltimes, \rtimes$ Left & Right Semi Joins ($R \ltimes S \equiv \pi_{att(R)} (R \bowtie S)$)

•  In the following, we will use variants of normal relational algebra

– Attributes are referenced by their number instead of their name, e.g. #1

– When using references to relations in binary operations, e.g. joins, we may also refer to them as [left] or [right]

• $(R \times S) \bowtie_{[left].\#3 = [right].\#1} W$

– We distinguish two types of relational algebra

• RelAlg$^+$, excluding the set minus operator

• RelAlg, including the set minus operator


•  Examples:

– Name of hero with id=1

• $\pi_{\#2}\, \sigma_{\#1=1} (H)$

– All powers of hero with id=2

• $\pi_{\#5} ((\sigma_{\#1=2}\, H) \bowtie_{H.\#1 = HP.\#1} HP \bowtie_{[left].\#2 = [right].\#1} P)$

H

id  name
1   Phoenix
2   Professor X

HP

hid  pid
1    2
2    1
2    3
1    3
1    4

P

id  power
1   Psionic Manipulation
2   Telekinesis
3   Telepathy
4   Force Field Generation
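The two example queries can be tried out on small in-memory relations. This is only a sketch: relations are modeled as Python sets of tuples, the helpers `select`, `project` and `join` are names made up here, attribute numbers are 1-based as in the notation above, and the table contents are the hero example as far as it can be read from the slides.

```python
def select(relation, predicate):
    """Selection: keep the tuples satisfying the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *columns):
    """Projection on 1-based attribute numbers."""
    return {tuple(row[c - 1] for c in columns) for row in relation}


def join(left, right, theta):
    """Theta join: concatenate all pairs of tuples satisfying theta."""
    return {l + r for l in left for r in right if theta(l, r)}


H = {(1, "Phoenix"), (2, "Professor X")}
HP = {(1, 2), (2, 1), (2, 3), (1, 3), (1, 4)}
P = {(1, "Psionic Manipulation"), (2, "Telekinesis"),
     (3, "Telepathy"), (4, "Force Field Generation")}

# Name of hero with id=1
name_of_hero_1 = project(select(H, lambda row: row[0] == 1), 2)

# All powers of hero with id=2
hero_2 = select(H, lambda row: row[0] == 2)
with_links = join(hero_2, HP, lambda l, r: l[0] == r[0])      # H.#1 = HP.#1
with_powers = join(with_links, P, lambda l, r: l[3] == r[0])  # HP.#2 = P.#1
power_ids = project(with_powers, 5)
power_names = project(with_powers, 6)
```

In this sketch the left operand of the second join is the already joined relation, so the pid column is addressed as its fourth attribute. Attribute #5 of the final result is the power id and #6 the power name.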

•  In the following, we will implement a simple fixpoint iteration with relational algebra

– We will only consider safe Datalog^neg programs, i.e. negative literals and head variables are positively grounded

•  Given is a safe Datalog^neg program $\mathcal{P}$ and a relational database

– Task:

• Store the extensional DB in tables

• Encode the intensional DB in a customized relational algebra elementary production operator

6.1 Implementation

– Each predicate symbol $r_1, \ldots, r_m$ of the extensional database is assigned to a relation $R_1, \ldots, R_m$

•  i.e. those predicates provide the facts; each predicate has its own relation

– Each predicate symbol $q_1, \ldots, q_m$ of the intensional database is assigned to a relation $Q_1, \ldots, Q_m$

•  i.e. those predicates are defined by rules

– For ease of use, we restrict each predicate to be defined either in the intensional or the extensional DB

•  i.e. each predicate which was used to define facts is not allowed to occur in the head of a rule

•  This does not limit the expressiveness of Datalog programs


– The predicate symbols $<, >, \leq, \geq, =, \neq$ are assigned to the hypothetical relations $\mathcal{A} := \{LT, GT, LTE, GTE, EQ, NEQ\}$

• Those relations are of infinite size and thus, of course, not stored in the RDB

• We will see later that they can be removed


•  Just a short consideration: How could we map relational algebra to Datalog?

– $\sigma_{\#2=5}\, R$ $\Rightarrow$ R(X, 5).

– $\pi_{\#1}\, R$ $\Rightarrow$ R'(X) :- R(X, Y).

– $R \times S$ $\Rightarrow$ RS(W, X, Y, Z) :- R(W, X), S(Y, Z).

– $R \bowtie_{[left].\#1=[right].\#2} S$ $\Rightarrow$ RS(W, X, Y, Z) :- R(W, X), S(Y, Z), W=Z.

– $R \ltimes_{[left].\#1=[right].\#2} S$ $\Rightarrow$ RS(W, X) :- R(W, X), S(Y, Z), W=Z.

– $R \cup S$ $\Rightarrow$ R'(X, Y) :- R(X, Y). R'(X, Y) :- S(X, Y).

– $R \setminus S$ $\Rightarrow$ R'(X, Y) :- R(X, Y), ¬S(X, Y).


•  Now, how can we translate from Datalog to relational algebra?

– Some pre-processing is necessary!

•  Transform all rules of the intensional DB such that the head contains only variables

– This can be achieved by replacing any head constant with a new variable and adding a literal binding that variable to the old value

– e.g. q(X, a, b) :- $L_1, \ldots, L_n$ $\Rightarrow$ q(X, Y, Z) :- $L_1, \ldots, L_n$, Y=a, Z=b
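This pre-processing step can be sketched in Python. The atom encoding (predicate, argument tuple), the convention that variables are capitalized strings, and the fresh variable names `V1`, `V2`, … are assumptions of this illustration.

```python
from itertools import count


def is_variable(term):
    """By assumption, variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def lift_head_constants(head, body):
    """Replace each head constant by a fresh variable and append a literal
    binding that variable to the old constant."""
    predicate, args = head
    used = {t for _, atom_args in [head] + body for t in atom_args if is_variable(t)}
    fresh = (f"V{i}" for i in count(1))
    new_args, bindings = [], []
    for term in args:
        if is_variable(term):
            new_args.append(term)
        else:
            variable = next(v for v in fresh if v not in used)
            used.add(variable)
            new_args.append(variable)
            bindings.append(("=", (variable, term)))
    return (predicate, tuple(new_args)), body + bindings


# q(X, a, b) :- l(X).   becomes   q(X, V1, V2) :- l(X), V1=a, V2=b.
new_head, new_body = lift_head_constants(
    ("q", ("X", "a", "b")),
    [("l", ("X",))],
)
```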


•  Change the order of the body literals such that their safety is ensured by previous body literals

– A literal is unsafe if it is potentially infinite

– e.g., R(X, Y) :- X=Y, p(X), q(Y) is not in correct order, as the safety of X=Y is not ensured by previous literals

• There are infinitely many possibilities for X being equal to Y

– $\Rightarrow$ R(X, Y) :- p(X), q(Y), X=Y

• is in correct order, as p(X) and q(Y) limit the possible values of X and Y

– We also sort positive literals before negative ones

• …for positive grounding


•  Each rule R :- $L_1, \ldots, L_n$ is now transformed to relational algebra as follows

–  For each literal $L_1, \ldots, L_n$, the respective atomic component $A_i \equiv p_i(t_1, \ldots, t_m)$ is transformed into a relational expression $E_i$

•  $E_i \equiv \sigma_\theta(P_i)$ with $P_i$ being the relation corresponding to $p_i$

•  The selection criterion $\theta$ is a conjunction of conditions defined as follows: for each $t_j$, a condition is added

–  $\#j = t_j$ if $t_j$ is a constant symbol

–  $\#j = \#k$ if $t_j$ and $t_k$ are the same variable


– Example:

• p(X, 2) :- q(X, X, Y, 2), r(X, 1) $\Rightarrow$ (replace constants) p(X, Z) :- q(X, X, Y, 2), r(X, 1), Z=2 $\Rightarrow$ (translate to RelAlg)

• $E_1 := \sigma_{\#1=\#2 \wedge \#4=2}(Q)$

• $E_2 := \sigma_{\#2=1}(R)$

• $E_3 := \sigma_{\#2=2}(EQ)$

•  After treating the single literals, we will compose the body expression $F$ from left to right

–  Initialize the temporary expression $F_1 := E_1$
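How the selection criterion of a single literal is collected can be sketched as follows. The function name `selection_conditions` and the convention that variables are capitalized strings are assumptions of this example; the conditions are returned as strings in the #-notation used above.

```python
def selection_conditions(args):
    """Conditions for one literal p(t1, ..., tm): '#j=c' for a constant c
    at position j, '#j=#k' for a variable repeated at positions j and k."""
    conditions = []
    first_seen = {}
    for position, term in enumerate(args, start=1):
        if isinstance(term, str) and term[:1].isupper():
            if term in first_seen:
                conditions.append(f"#{first_seen[term]}=#{position}")
            else:
                first_seen[term] = position
        else:
            conditions.append(f"#{position}={term}")
    return conditions


e1 = selection_conditions(("X", "X", "Y", 2))  # literal q(X, X, Y, 2)
e2 = selection_conditions(("X", 1))            # literal r(X, 1)
e3 = selection_conditions(("Z", 2))            # literal Z=2, i.e. EQ(Z, 2)
```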


•  Depending on the variables in the literals, the following expressions $F_2, \ldots, F_k$ are generated differently:

–  $F_i := F_{i-1} \times E_i$ iff $L_i$ does not contain any variables of the previous body literals, i.e. $vars(L_i) \cap vars(\{L_1, \ldots, L_{i-1}\}) = \emptyset$

•  R(X, Y, Z) :- q(X, 2), r(Y), Z=3 $\Rightarrow$

•  $E_1 := F_1 = \sigma_{\#2=2}(Q)$; $E_2 = R$; $E_3 = \sigma_{\#2=3}(EQ)$ $\Rightarrow$

•  $F_2 := (\sigma_{\#2=2}(Q)) \times R$; $F_3 := (\sigma_{\#2=2}(Q)) \times R \times \sigma_{\#2=3}(EQ)$

•  In short: Conjunctions of unrelated literals result in computing the Cartesian product


–  $F_i := F_{i-1} \bowtie_\theta E_i$ iff $L_i$ is positive and shares variables with previous body literals

•  $\theta$ forces the columns representing the shared variables to be equal

•  R(X, Y) :- q(3, X), r(Y), X<Y $\Rightarrow$

•  $E_1 := F_1 = \sigma_{\#1=3}(Q)$; $E_2 = R$; $E_3 = LT$ $\Rightarrow$

•  $F_2 := \sigma_{\#1=3}(Q) \times R$

•  $F_3 := (\sigma_{\#1=3}(Q) \times R) \bowtie_{[left].\#2=[right].\#1 \wedge [left].\#3=[right].\#2} LT$

•  In short: Conjunctions of related positive literals result in generating a join, using the related variables as join condition


–  $F_i := F_{i-1} \setminus (F_{i-1} \ltimes_\theta E_i)$ iff $L_i$ is negative and shares variables with previous body literals

•  $\theta$ forces the columns representing the shared variables to be equal

•  R(X) :- q(X), ¬r(X) $\Rightarrow$ $E_1 := F_1 = Q$, $E_2 = R$ $\Rightarrow$ $F_2 := Q \setminus (Q \ltimes_{Q.\#1 = R.\#1} R)$

•  In short: Conjunctions of related negative literals result in generating a set minus, removing those tuples which are related to the negative literal
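The three ways of composing the body expression can be illustrated on tiny relations. Relations are modeled as sets of tuples; the helper names and the contents of `Q` and `R` are made up for this sketch.

```python
def product(left, right):
    """Cartesian product: used when the new literal shares no variables."""
    return {l + r for l in left for r in right}


def theta_join(left, right, theta):
    """Theta join: used for a positive literal sharing variables."""
    return {l + r for l in left for r in right if theta(l, r)}


def semijoin(left, right, theta):
    """Left semi join: tuples of `left` with at least one partner in `right`."""
    return {l for l in left if any(theta(l, r) for r in right)}


Q = {(1,), (2,), (3,)}
R = {(2,), (4,)}


def same_first_column(l, r):
    return l[0] == r[0]


unrelated = product(Q, R)                          # q(X), r(Y)
shared = theta_join(Q, R, same_first_column)       # q(X), r(X)
negated = Q - semijoin(Q, R, same_first_column)    # q(X), not r(X)
```

The last line follows the pattern of the negative case: the semi join finds the tuples of Q related to R, and the set minus removes them.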


•  Now, we still have the infinite hypothetical relations $\mathcal{A} := \{LT, GT, LTE, GTE, EQ, NEQ\}$ in our expressions

– Each join $E \bowtie_\theta A_i$ or Cartesian product $E \times A_i$ for any “normal” expression $E$ and $A_i \in \mathcal{A}$ is replaced by a suitable expression of the form $\pi(\sigma(E))$, e.g.

• $E \bowtie_{E.\#1=LT.\#1 \wedge E.\#2=LT.\#2} LT$ $\Rightarrow$ $\sigma_{\#1<\#2}(E)$

– This expression was created by, e.g.: E(X, Y, …), X<Y

• $E \bowtie_{E.\#1=EQ.\#1} EQ$ $\Rightarrow$ $\pi_{attributesOf(E),\, EQ.\#1}(E)$

– This expression was created by, e.g.: E(X, …), X=Y


• $E \times (\sigma_{\#2=c}\, EQ)$ $\Rightarrow$ $\pi_{attributesOf(E),\, c}(E)$

– This expression was created by, e.g.: E(…), X=c

•  Examples:

– R(X, Y) :- q(3, X), r(Y), X<Y $\Rightarrow$

– $F := (\sigma_{\#1=3}(Q) \times R) \bowtie_{[left].\#2=[right].\#1 \wedge [left].\#3=[right].\#2} LT$

– $F = \sigma_{\#2<\#3}(\sigma_{\#1=3}(Q) \times R)$

– By algebraic optimization, this will later result in

• $F = (\sigma_{\#1=3}\, Q) \bowtie_{\#2<\#3} R$
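That the infinite relation LT is not needed can be checked on sample data. The contents of `Q` and `R` below are invented for this sketch; the comparison X<Y is evaluated as a selection condition, once on the Cartesian product and once as a join condition.

```python
# Rule: R(X, Y) :- q(3, X), r(Y), X<Y
Q = {(3, 1), (3, 5), (4, 0)}   # sample tuples of q
R = {(2,), (7,)}               # sample tuples of r

# sigma_{#1=3}(Q) x R
f2 = {q + r for q in Q if q[0] == 3 for r in R}

# sigma_{#2<#3}(...) replaces the join with the hypothetical relation LT
f = {row for row in f2 if row[1] < row[2]}

# Optimized form: (sigma_{#1=3} Q) joined with R on #2<#3
optimized = {q + r for q in Q if q[0] == 3 for r in R if q[1] < r[0]}

# Projection on the head variables X and Y
head_tuples = {(row[1], row[2]) for row in f}
```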


•  Finally, the whole rule $C \equiv$ R :- $L_1, \ldots, L_n$ is now transformed to the expression $eval(C) := \pi_{head(R)}(F)$

– i.e. to evaluate the rule $C$, we project all variables appearing in its head from its body expression $F$

•  For evaluating one iteration step for a given intensional predicate $q_i$, all related rules have to be united

– $eval(q_i) := \bigcup_{C \in def(q_i)} eval(C)$


•  Now, the elementary production rule $T_\mathcal{P}$ corresponds to evaluating all $eval(q_i)$

•  Queries $Q \equiv p(t_1, \ldots, t_n)$ can be transformed to relational algebra likewise

•  Also note that Datalog can be translated to RelAlg$^+$, while Datalog^neg has to be translated to full RelAlg

– Negation requires the highly inefficient set minus operator


•  To actually perform the fixpoint iteration, the following steps are executed

1.  Create tables for each intensional predicate $q_i$

2.  Execute the elementary production $T_\mathcal{P}$ (i.e. run $eval(q_i)$ for each intensional predicate) and store the results temporarily

a.  If the result tables are of the same size as the predicate tables, the fixpoint has been reached and we can continue with step 3

b.  Replace the content of the intensional predicate tables with the respective temporary tables

c.  Continue with step 2

3.  Run the actual query on the tables to obtain the final result


•  Example

– edge(1, 2). edge(1, 3). edge(2, 4). edge(3, 4). edge(4, 5).

– path(X, Y) :- edge(X, Y).

– path(X, Y) :- edge(X, Z), path(Z, Y).

– path(2, X)?

– The facts all go into the extensional table Edge; an intensional table Path is created

• Table edge (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table path (#1, #2): empty

– path(X, Y) :- edge(X, Y).

• $F := \pi_{\#1,\#2}\, \sigma_{true}\, Edge = Edge$

– path(X, Y) :- edge(X, Z), path(Z, Y).

• $F := \pi_{\#1,\#4} (\sigma_{true}\, Edge \bowtie_{[left].\#2=[right].\#1} \sigma_{true}\, Path) = \pi_{\#1,\#4} (Edge \bowtie_{[left].\#2=[right].\#1} Path)$

– path(2, X)? $\equiv$ path(Y, X), Y=2

• $F := \sigma_{\#1=2}\, Path$

– $eval(path) := Edge \cup \pi_{\#1,\#4} (Edge \bowtie_{[left].\#2=[right].\#1} Path)$


•  Execute elementary production on current tables

– $eval(path) := Edge \cup \pi_{\#1,\#4} (Edge \bowtie_{[left].\#2=[right].\#1} Path)$

• Table edge (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table path (#1, #2): empty

• Table temppath (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

•  Replace path table and repeat

– $eval(path) := Edge \cup \pi_{\#1,\#4} (Edge \bowtie_{[left].\#2=[right].\#1} Path)$

• Table edge (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table path (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table temppath (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (1, 4), (2, 5), (3, 5)

•  Replace path table and repeat

– $eval(path) := Edge \cup \pi_{\#1,\#4} (Edge \bowtie_{[left].\#2=[right].\#1} Path)$

• Table edge (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table path (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (1, 4), (2, 5), (3, 5)

• Table temppath (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (1, 4), (2, 5), (3, 5), (1, 5)

•  Replace path table and repeat

– No change: the fixpoint is reached

• Table edge (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table path (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (1, 4), (2, 5), (3, 5), (1, 5)

• Table temppath (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (1, 4), (2, 5), (3, 5), (1, 5)

•  Run query to obtain final result

– $\sigma_{\#1=2}\, Path$

• Table edge (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)

• Table path (#1, #2): (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (1, 4), (2, 5), (3, 5), (1, 5)

• Table result (#1, #2): (2, 4), (2, 5)
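The whole iteration on the edge and path tables can be reproduced with an in-memory SQLite database from Python. This is a sketch under assumptions: the column names `a` and `b`, the table name `temppath` and the use of SQLite are choices made here, and the fixpoint test compares table sizes as described in the steps of the procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edge (a INTEGER, b INTEGER);
    CREATE TABLE path (a INTEGER, b INTEGER);
    CREATE TABLE temppath (a INTEGER, b INTEGER);
    INSERT INTO edge VALUES (1, 2), (1, 3), (2, 4), (3, 4), (4, 5);
""")


def size(table):
    """Number of rows of one of the fixed tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


iterations = 0
while True:
    iterations += 1
    # Step 2: eval(path) = edge UNION projection of (edge JOIN path)
    conn.execute("DELETE FROM temppath")
    conn.execute("""
        INSERT INTO temppath
        SELECT a, b FROM edge
        UNION
        SELECT e.a, p.b FROM edge AS e JOIN path AS p ON e.b = p.a
    """)
    # Step 2a: same size means the fixpoint has been reached
    if size("temppath") == size("path"):
        break
    # Step 2b: replace the content of path with the temporary table
    conn.execute("DELETE FROM path")
    conn.execute("INSERT INTO path SELECT a, b FROM temppath")

# Step 3: run the actual query path(2, X)?
result = sorted(conn.execute("SELECT a, b FROM path WHERE a = 2"))
```

The size comparison is sufficient here because the production is monotonic: the temporary table always contains the current path table.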

•  Given an extensional database and a query, there are two general strategies for evaluation

– Bottom-Up: Start with given facts in the EDB and generate all new facts. Then discard those which don’t match the query

• e.g. fixpoint iteration

• Performs well in restricted and smaller scenarios

• “forward-chaining”

6.2 Strategies

– Top-Down: Start with query and generate proofs down to the EDB facts

• Most logic programming environments choose this approach

– e.g. SLD resolution

• Performs well in more complex scenarios where bottom-up becomes prohibitive

• “backward-chaining”


•  Scenario

– All facts are contained in the extensional database EDB

– All rules are contained in the Datalog program $\mathcal{P}$

•  No facts in $\mathcal{P}$

– Given is a goal query $Q \equiv p(t_1, \ldots, t_n)$?

•  Bottom-up problems

– Generate all deducible facts of $\mathcal{P} \cup EDB$

– When finished, throw away all facts not matching the query pattern. Especially:

•  All those facts whose predicate is not p

•  All those facts whose predicate is p, but which are more general than the query


– Example with constants:

• $Q \equiv$ p(a, X, b)?

• Why should we generate all facts of p and later discard those which are not subsumed by $Q$?

– In the next lecture, we will explore bottom-up approaches which avoid generating unnecessary facts

• Magic Sets

• Counting techniques

•  Today, we start with a simple top-down approach


•  Basic Idea:

– Start with the query $Q \equiv p(t_1, \ldots, t_n)$?

– Iteratively generate all proof trees ending with a ground instance of $Q$ and starting with known facts

• Iterate over tree depth

• As a helper data structure create all possible search trees of current depth

• Transform search trees to all possible proof trees

• Stop if no additional search trees / proof trees can be constructed

6.2 Top-Down Evaluation

– A search tree is a generic proof tree which is still parameterized to some extent

• Proof trees can be generated from search trees

• Leaf nodes are called subgoal nodes

• Root node is called goal node


•  Example:

– e(1, 2). e(1, 3). e(2, 4). e(3, 4). e(4, 5). e(5, 6). e(5, 7).

– p(X, Y) :- e(X, Y). (Rule 1)

– p(X, Y) :- e(X, Z), p(Z, Y). (Rule 2)

– $Q \equiv$ p(2, X)

[Figure: graph over the nodes 1 to 7 with one edge labeled e per fact]

•  Proof Trees of depth 0

– Which facts are ground instances of $Q$?

– In our example, this is not the case for any fact…

•  Search Trees of depth 1

– Find all rules $R \equiv B$ :- $A_1, \ldots, A_k$ such that $Q$ and $B$ are unifiable

•  Unifiable: There are substitutions such that $B$ matches $Q$

– For each rule $R$, construct a search tree with $Q$ as root

•  Attach a rule node to $Q$ containing $R$

•  Attach $k$ subgoal nodes representing $A_1, \ldots, A_k$ in their unified form
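The unification test between a query and a rule head can be sketched for function-free atoms. The encoding of atoms as (predicate, argument tuple), capitalized strings as variables, and the renamed head variables `X1` and `Y1` are assumptions of this illustration.

```python
def is_variable(term):
    """By assumption, variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, substitution):
    """Follow variable bindings until a constant or an unbound variable."""
    while is_variable(term) and term in substitution:
        term = substitution[term]
    return term


def unify(atom_a, atom_b):
    """Return a substitution making both atoms equal, or None.
    Function symbols do not occur in Datalog, so no occurs check is needed."""
    (pred_a, args_a), (pred_b, args_b) = atom_a, atom_b
    if pred_a != pred_b or len(args_a) != len(args_b):
        return None
    substitution = {}
    for s, t in zip(args_a, args_b):
        s, t = walk(s, substitution), walk(t, substitution)
        if s == t:
            continue
        if is_variable(s):
            substitution[s] = t
        elif is_variable(t):
            substitution[t] = s
        else:
            return None
    return substitution


query = ("p", (2, "X"))
head = ("p", ("X1", "Y1"))   # head of rule 1 or 2 with renamed variables
match = unify(query, head)
no_match = unify(query, ("e", ("X1", "Y1")))
clash = unify(("p", (1, 2)), ("p", (1, 3)))
```

Renaming the rule variables before unification avoids accidental clashes with the variables of the query.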


•  Search Trees of depth 1

– Rule 1: p(X, Y) :- e(X, Y).

– Rule 2: p(X, Y) :- e(X, Z), p(Z, Y).

– Search tree $T_1$

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, X).

• Subgoal: e(2, X)

– Search tree $T_2$

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, Z), p(Z, X).

• Subgoals: e(2, Z), p(Z, X)

•  To generate proof trees from a given search tree, we have to find a substitution $\rho$ such that for each goal node with clause $C$, $\rho(C) \in \mathcal{P} \cup EDB$

– By applying this substitution to the whole tree, we obtain a proof tree

– The root node is a result of the query

•  Example:

– Find a substitution for $T_1$ ($T_2$ does not have one)

– Search tree $T_1$

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, X).

• Subgoal: e(2, X)

– Substitution $\rho := \{X = 4\}$

– Proof tree $P_1$

• Goal: p(2, 4)

• Rule: p(Y, X) :- e(Y, X).

• Subgoal: e(2, 4)

•  For any $n > 1$, all existing search trees of depth n-1 are expanded by treating any subgoal node as a goal node

– Thus, new rule nodes and subgoals are appended

•  Example: Expanding $T_2$ to $T_{2,2}$ and $T_{2,1}$

– Search tree $T_{2,2}$

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, Z), p(Z, X). Subgoals: e(2, Z), p(Z, X)

• Rule for subgoal p(Z, X): p(Z, X) :- e(Z, W), p(W, X). Subgoals: e(Z, W), p(W, X)

•  $T_{2,1}$ and some substitutions $\rho$

– Search tree $T_{2,1}$

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, Z), p(Z, X). Subgoals: e(2, Z), p(Z, X)

• Rule for subgoal p(Z, X): p(Z, X) :- e(Z, X). Subgoal: e(Z, X)

– Substitution $\rho := \{Z = 4, X = 5\}$

– Proof tree $P_{2,1}$

• Goal: p(2, 5)

• Rule: p(Y, X) :- e(Y, Z), p(Z, X). Subgoals: e(2, 4), p(4, 5)

• Rule for subgoal p(4, 5): p(Z, X) :- e(Z, X). Subgoal: e(4, 5)

•  $T_{2,2,1}$ and substitutions $\rho_1$ and $\rho_2$

– Search tree $T_{2,2,1}$

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, Z), p(Z, X). Subgoals: e(2, Z), p(Z, X)

• Rule for subgoal p(Z, X): p(Z, X) :- e(Z, W), p(W, X). Subgoals: e(Z, W), p(W, X)

• Rule for subgoal p(W, X): p(W, X) :- e(W, X). Subgoal: e(W, X)

– Substitution $\rho_1 := \{Z = 4, W = 5, X = 6\}$ yields the proof tree $P_{2,2,1}(1)$ with root p(2, 6)

– Substitution $\rho_2 := \{Z = 4, W = 5, X = 7\}$ yields the proof tree $P_{2,2,1}(2)$ with root p(2, 7)

•  Please note:

– By applying this type of backward-chaining, not all possible proof trees for the query can be generated

– Only proof trees having ground facts in all leaf nodes are possible

• Those trees are called full proof trees

• However, for each proof tree matching the query, there is also a respective full proof tree


•  We can see that the backward chaining proof trees can reach arbitrary depth

– The backward chaining method is sound and complete

– But consider the iterated use of rule 2

– The tree is of infinite depth

• Goal: $Q \equiv$ p(2, X)

• Rule: p(Y, X) :- e(Y, Z), p(Z, X). Subgoals: e(2, Z), p(Z, X)

• Rule: p(Z, X) :- e(Z, W), p(W, X). Subgoals: e(Z, W), p(W, X)

• Rule: p(W, X) :- e(W, V), p(V, X). Subgoals: e(W, V), p(V, X)

• …

•  When do we stop building trees?

– A priori, we have no idea which recursion depth we will need

• ?path(a, X)

• Obviously, the more nodes we have, the deeper the recursion can become

– Still, the number of sensible combinations of EDB facts and predicates in $\mathcal{P}$ is limited since

• Both the database and the Datalog program are finite

• We can only substitute any constant symbol from some fact in any predicate symbol at any position of a variable

[Figure: chain of nodes a, b, c, d, e, …]

•  Theorem: Backward chaining remains complete if the search depth is limited to $\#predicates \cdot \#constants^{max(args)}$

– #predicates is the number of predicate symbols used

– #constants is the number of constant symbols used

– max(args) is the maximum number of arguments, i.e. the arity, of all predicate symbols

– With this theorem, we can stop the backward chaining process after the last sensible production
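A depth-limited backward chaining can be sketched for the example program with the e and p predicates. Note the assumptions: this sketch explores the search space depth-first (similar to Prolog) instead of building all search trees level by level, the encoding of atoms and rules is made up here, and the depth limit is simply computed from the bound of the theorem.

```python
from itertools import count

FACTS = {("e", (1, 2)), ("e", (1, 3)), ("e", (2, 4)), ("e", (3, 4)),
         ("e", (4, 5)), ("e", (5, 6)), ("e", (5, 7))}
RULES = [
    (("p", ("X", "Y")), [("e", ("X", "Y"))]),                     # Rule 1
    (("p", ("X", "Y")), [("e", ("X", "Z")), ("p", ("Z", "Y"))]),  # Rule 2
]


def is_variable(term):
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    while is_variable(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Extend subst so that atoms a and b become equal, or return None."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return None
    subst = dict(subst)
    for s, t in zip(a[1], b[1]):
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_variable(s):
            subst[s] = t
        elif is_variable(t):
            subst[t] = s
        else:
            return None
    return subst


_fresh = count()


def rename(rule):
    """Give the rule fresh variable names for each application."""
    n = next(_fresh)

    def ren(atom):
        return atom[0], tuple(f"{t}_{n}" if is_variable(t) else t for t in atom[1])

    head, body = rule
    return ren(head), [ren(b) for b in body]


def solve(goals, subst):
    """goals is a list of (atom, remaining depth) pairs."""
    if not goals:
        yield subst
        return
    (atom, depth), rest = goals[0], goals[1:]
    for fact in FACTS:                      # leaf: match a ground fact
        s = unify(atom, fact, subst)
        if s is not None:
            yield from solve(rest, s)
    if depth > 0:                           # inner node: apply a rule
        for rule in RULES:
            head, body = rename(rule)
            s = unify(atom, head, subst)
            if s is not None:
                yield from solve([(b, depth - 1) for b in body] + rest, s)


constants = {c for _, args in FACTS for c in args}
bound = 2 * len(constants) ** 2   # #predicates * #constants ** max(args)
answers = sorted({walk("X", s) for s in solve([(("p", (2, "X")), bound)], {})})
```

For the query p(2, X) the sketch finds the same answers as the proof trees above, i.e. the nodes 4, 5, 6 and 7.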


•  Proof sketch:

– $\#predicates \cdot \#constants^{max(args)}$ is an upper limit for the number of distinct ground facts derived from $\mathcal{P}$ and EDB (purely syntactical)

– We can limit the production process to full proof trees, where at least one new fact is added in each depth level (otherwise the new level is useless…)

– Since we only have a limited number of ground facts, also the number of levels has to be limited…


•  Consider an example: a finite number of facts {path(a, b), path(b, c), …, path(m, n)} and a rule path(X, Y) :- path(X, Z), path(Z, Y).

– Worst case

• Longest possible deduction chain is path(a, n) of length n-1

– The least determined query is ?path(X, Y), i.e. all paths

• There are $n$ constant symbols and a single predicate symbol

• The constants can occur in two places, i.e. $max(args) = 2$

• That means the maximum number of deducible facts is $n^2$
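As a small worked instance (the choice of five constants a, …, e is an assumption made only for this illustration), the bound can be compared with the number of path facts that are actually derivable on such a chain:

```latex
\[
\#predicates \cdot \#constants^{max(args)} = 1 \cdot 5^{2} = 25,
\qquad
\bigl|\{\, path(x, y) \mid x \text{ precedes } y \text{ in the chain} \,\}\bigr|
  = \binom{5}{2} = 10 \le 25 .
\]
```

The bound is purely syntactical and therefore usually far from tight.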

[Figure: chain of nodes a, b, c, d, e, …, n]

•  Many backward-chaining algorithms rely on the concepts of search trees and proof trees

•  However, the generation strategy may differ

– In the previous example, the search trees have been generated one by one according to their depth

• depth 0, depth 1, depth 2, …

• This is called the level saturation strategy and resembles a breadth-first approach

– Alternatively, depth-first approaches are possible

• Rule saturation strategy


•  The previously presented top-down algorithm is extremely naïve

– It generates all possible search and proof trees up to the worst-case depth which are somehow related to the query

• Performance is far from optimal

– In case of less restricted scenarios (e.g. not only Horn clauses or infinite universes), this approach is inevitably doomed to failure

6.2 Resolution

•  From the field of “real” logics, we can borrow the concept of resolution

– A technique for refutation theorem proving

• “Reductio ad absurdum”

– Mainly explored around 1965 by J.A. Robinson

– Established itself as THE standard technique for logical symbolic computation


•  There are several variants of resolution

– Best known in the field of logic programming is the class of SLD resolution algorithms

• “Linear Resolution with Selection Function for Definite Clauses”

• Most popular among these are the general algorithms employed in languages like Prolog or Lisp

• However, in the next lecture we shall study a simplified SLD resolution algorithm suitable for Datalog

– Be curious – that will be fun!


•  The research and developments in the area of deductive databases successfully provided the ability to perform recursive queries

– And with these, some limited reasoning capabilities

•  However, most applications have been tailored to work with traditional SQL based databases

– When using SQL2 (SQL-92), recursive queries cannot be facilitated without external control and huge performance penalties

– SQL2 is still the default for most of today’s databases

6.3 Recursive SQL

•  SQL3 (SQL-99) is a later SQL standard which mainly aims at widening the scope of SQL

– Contains many features which extend beyond the scope of traditional RDBs

•  Binary Large Objects

•  Limited support for soft constraints

•  Updatable views

•  Active databases

•  Object orientation

•  UDF / UDT / UDM

•  References

•  Recursive Temporary Tables


•  Recursive temporary tables adopt many concepts of deductive databases into the SQL world

– Most vendors developed proprietary implementations of recursive tables

• Nobody cared for the standard…

• Syntax may thus differ

– In DB2 known as Common Table Expressions
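As an illustration of a recursive temporary table, the path example can be written as a recursive common table expression. The sketch below runs it on an in-memory SQLite database from Python; the `WITH RECURSIVE` syntax shown is the SQLite dialect and is an assumption of this example, as the syntax differs between vendors. The table contents and the column names `a` and `b` are likewise chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edge (a INTEGER, b INTEGER);
    INSERT INTO edge VALUES (1, 2), (1, 3), (2, 4), (3, 4), (4, 5);
""")

# path(X, Y) :- edge(X, Y).
# path(X, Y) :- edge(X, Z), path(Z, Y).
# Query: path(2, X)?
rows = conn.execute("""
    WITH RECURSIVE path(a, b) AS (
        SELECT a, b FROM edge
        UNION
        SELECT e.a, p.b FROM edge AS e JOIN path AS p ON e.b = p.a
    )
    SELECT a, b FROM path WHERE a = 2 ORDER BY b
""").fetchall()
```

Here the database system performs the fixpoint iteration itself, so no external control loop is needed.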
