Wolf-Tilo Balke Christoph Lofi
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Knowledge-Based Systems
and Deductive Databases
9. Deduction with Uncertainty
9.1 Uncertain Knowledge
9.2 Probabilistic Reasoning
9.3 Belief Networks
• Transform the program to relational algebra
– Easy… just use the rules from the lecture
– Eval(R) = S
– Eval(P) = Q ∖ (Q ⋉(Q.#1 = R.#1 ⋀ Q.#2 = R.#2) R) ⋃ Q ∖ (Q ⋉(Q.#1 = S.#1 ⋀ Q.#2 = S.#2) S) ⋃ π#1, #2(σ#1 = #2(Q × Q))
• Compute the fixpoint
– R = {(1, 3)}
– P = {(1, 2), (2, 3), (1, 3)}
Exercise 1.1&1.2
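• The evaluation loop itself is easy to sketch in code: apply every rule to the current fact sets until nothing new can be derived. A minimal naive-evaluation sketch in Python – the exercise's actual rules are not repeated on this slide, so a hypothetical transitive-closure rule stands in for them:

# Sketch of naive fixpoint evaluation for a Datalog program.
def naive_fixpoint(edb, rules):
    """edb: dict relation -> set of tuples; rules: functions mapping the
    current database to a set of (relation, tuple) facts."""
    db = {rel: set(facts) for rel, facts in edb.items()}
    changed = True
    while changed:                       # iterate until fixpoint
        changed = False
        for rule in rules:
            for rel, fact in rule(db):
                if fact not in db.setdefault(rel, set()):
                    db[rel].add(fact)
                    changed = True
    return db

# Hypothetical rule: P(x, z) :- P(x, y), P(y, z)   (transitive closure)
def transitivity(db):
    return {('P', (x, z))
            for (x, y1) in db.get('P', set())
            for (y2, z) in db.get('P', set()) if y1 == y2}

print(naive_fixpoint({'P': {(1, 2), (2, 3)}}, [transitivity]))
# -> P = {(1, 2), (2, 3), (1, 3)}, matching the fixpoint above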
• WITH trip (from, to, price) AS (
    SELECT from, to, price FROM flight
    UNION ALL
    SELECT f.from, t.to, f.price + t.price
    FROM flight f, trip t
    WHERE f.to = t.from AND f.price + t.price < 400)
  SELECT from, to, price
  FROM trip
  WHERE from IN (SELECT code FROM airport WHERE city = 'Berlin')
    AND to IN (SELECT code FROM airport WHERE city = 'Stuttgart')
Exercise 2.1
• WITH trip (from, to, price) AS (
    SELECT from, to, price FROM flight
    UNION ALL
    SELECT f.from, t.to, f.price + t.price
    FROM flight f, trip t
    WHERE f.to = t.from AND f.code != 'lh113'
      AND f.price + t.price < 400)
  SELECT from, to, price FROM trip
  WHERE from IN (SELECT code FROM airport WHERE city = 'Berlin')
    AND to IN (SELECT code FROM airport WHERE city = 'Stuttgart')
Exercise 2.2
• WITH trip (from, to, price) AS (
    SELECT from, to, price FROM flight
    UNION ALL
    SELECT f.from, t.to, f.price + t.price
    FROM flight f, trip t
    WHERE f.to = t.from AND f.price + t.price < 1000)
  SELECT p1 + p2 FROM
    (SELECT min(price) AS p1 FROM trip
     WHERE from = 'MUC' AND to = 'STR'),
    (SELECT min(price) AS p2 FROM trip
     WHERE from = 'STR' AND to = 'HAM')
Exercise 2.3
• WITH fourports (a, b, c, d, p) AS (
    SELECT f1.from, f2.from, f3.from, f3.to,
           f1.price + f2.price + f3.price
    FROM flight f1, flight f2, flight f3
    WHERE f1.to = f2.from AND f2.to = f3.from
      AND f1.from != f3.from AND f2.from != f3.to)
  SELECT * FROM fourports
  WHERE p IN (SELECT min(p) FROM fourports)
Exercise 2.4
• We have discussed ways of deriving new facts from other (ground) facts
– But often several rules can lead to a certain fact, and we cannot be sure which one it was
• A patient experiences toothache: what is the reason?
– Sometimes a certain fact might be derivable from ground facts only in certain cases
• A normal bird can fly, except for penguins, ostriches,…
9.1 Uncertainty
• Typical sources of imperfect information in deductive databases are…
– Incomplete information
• Information is simply missing, which might clash with the closed world assumption
– Imprecise information
• The information needed has only been specified in a vague way, e.g., a person is young: young(Tim).
• Queries about Tim's age are difficult to answer: ?age(Tim, 67) is false, but what about ?age(Tim, 25)?
– Uncertain information
• A deduction is not always correct, e.g., the question whether a bird can fly: fly(X) :- bird(X).
• What about penguins, dead birds, or birds with clipped wings?
9.1 Uncertainty
• Consider an expert system for dentists
– All possible causes for toothaches are contained in a database and the reason should be deduced
– cavities(X) :- toothache(X).
periodontosis(X) :- toothache(X).
• Not very helpful, since all possible causes are listed. Thus, all rules fire…
– cavities(X) :- toothache(X), ¬periodontosis(X).
periodontosis(X) :- toothache(X), ¬cavities(X).
• Not very helpful either, because now we need to disprove all alternatives before any rule fires…
• Remember the assumption of 'negation as failure'
9.1 Uncertainty
• But how do dentists deal with the problem?
– Like in our second program look for positive or negative clues
• e.g., bleeding of gums,…
• Still, how does a dentist know what to look for?
– What are probable causes?
– What are possible causes?
– Knowing the patient, what is the (subjective) judgement?
9.1 Uncertainty
• Basic idea: assign a measure of validity to each rule or statement and propagate this measure
through the deduction process
– Probabilistic truth values
• Use statistics: how often are cavities the reason, and how often is periodontosis?
• Leads to a probability distribution over possible worlds
– Possibility values
• What are possible causes, and to what degree do they cause toothache?
• Leads to a possibility distribution over possible worlds
– Belief values
• Lead to belief networks with facts that may influence each other
– …
9.1 Uncertainty
• Dealing with uncertainty usually requires an open world assumption
– Facts not stated in the database may or may not be false
• But the reasoning gets more difficult
– Remember our discussion about the existence of several minimal models in Datalog¬
– The reasoning process is no longer monotonic
• Introduction of new knowledge might lead to a revision (and sometimes refutation) of previously derived facts
9.1 Uncertainty
• Non-monotonic reasoning considers that statements once considered true may have to be revised in the light of new facts
– Tweety is a bird.
• Can Tweety fly? Yes!
– Tweety is a bird. Tweety is 2.5 meters tall.
• Can Tweety fly? No!
– The introduction of a new fact has challenged the general rule that birds can fly
• Only ostriches reach a height of 2.5 meters!
9.1 Non-Monotonic Reasoning
• There are several classical approaches to dealing with the problem
– Default logic
– Predicate circumscription
– Autoepistemic reasoning
– …
9.1 Non-Monotonic Reasoning
• Default logic was proposed by Raymond Reiter (University of Toronto) in 1980
– Can express logical facts like
'by default, something is true'
– Basically a default theory consists of two parts D and W
• W is a set of first order logical formulae known to be true
• D is a set of default rules of the form
prerequisite : justification1, …, justificationn
――――――――――――――――――――――――――
conclusion
9.1 Non-Monotonic Reasoning
– prerequisite : justification1, …, justificationn
――――――――――――――――――――――――――
conclusion
– If we believe the prerequisite to be true, and each justificationi is consistent with our current beliefs, we are led to believe that the conclusion is true
• Example: bird(X) : fly(X) / fly(X) with W = {bird(condor), bird(penguin), fly(eagle), ¬fly(penguin)}
• fly(condor) is true by default, since it is a bird and we have no justification to believe otherwise
• But fly(penguin) cannot be derived here: although bird(penguin) is true, we know that the justification is false
• Neither can we deduce bird(eagle), which would be abduction
9.1 Non-Monotonic Reasoning
• A common default assumption is the closed world assumption
true : ¬F
――――
¬F
• The semantics of default logics is again based on fixpoints
– Use set W as initial theory T
– Add to a theory T every fact that can be deduced by using any of the default rules in D, so-called extensions to the theory T
– Repeat until nothing new can be deduced
– If T is consistent with all justifications of the default rules used to derive any extension, output T
9.1 Non-Monotonic Reasoning
• The last check in the algorithm is necessary to avoid inconsistent theories
– i.e. something has been deduced using a justification that was later proven to be false
– E.g., consider the default rule with W := Ø
true : A(X)
――――
¬A(X)
• Since A(X) is consistent with W, we may conclude ¬A(X), which however is inconsistent with the previously assumed justification A(X)
• In this case the theory simply has no extensions
9.1 Non-Monotonic Reasoning
• Interestingly, the semantics is non-deterministic
– The deduced theory may depend on the sequence in which defaults are applied
• Example: D := { bird(X) : fly(X) / fly(X),  penguin(X) : ¬fly(X) / ¬fly(X) }
with W = {bird(Tweety), penguin(Tweety)}
• Starting with W both default rules are applicable
• If we use the first rule, the extension fly(Tweety) would be added, and the second default rule is no longer applicable
• In case we apply the second rule first, the extension would be ¬fly(Tweety)
9.1 Non-Monotonic Reasoning
• Entailment of a formula from a default theory can be defined in two ways
– Skeptical entailment
• A formula is entailed by a default theory if it is entailed by all its extensions
– Credulous entailment
• A formula is entailed by a default theory if it is entailed by at least one of its extensions
– For example, our Tweety theory has two extensions, one in which Tweety can fly and one in which he cannot fly
• Neither extension is skeptically entailed
• Both of them are credulously entailed
9.1 Non-Monotonic Reasoning
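• For small propositional default theories, the fixpoint semantics and both entailment modes can be checked by brute force. A minimal Python sketch of the Tweety theory above (illustrative only: the predicates are propositionalized, and the final justification re-check of the algorithm is folded into the application step):

# Literals are strings; '-' prefixes negation.
def neg(l):
    return l[1:] if l.startswith('-') else '-' + l

def consistent(facts):
    return all(neg(f) not in facts for f in facts)

def extensions(W, defaults):
    """Try every application order of the defaults; a default fires if its
    prerequisite is in the theory and no justification is contradicted."""
    results = []
    def apply(theory, remaining):
        fired = False
        for i, (pre, justs, concl) in enumerate(remaining):
            if pre in theory and all(neg(j) not in theory for j in justs):
                fired = True
                apply(theory | {concl}, remaining[:i] + remaining[i+1:])
        if not fired and consistent(theory) and theory not in results:
            results.append(theory)
    apply(set(W), defaults)
    return results

W = ['bird', 'penguin']
D = [('bird', ['fly'], 'fly'),         # bird : fly / fly
     ('penguin', ['-fly'], '-fly')]    # penguin : -fly / -fly

exts = extensions(W, D)
print(exts)                       # two extensions: one with fly, one with -fly
print(set.intersection(*exts))    # skeptical: only bird and penguin
print(set.union(*exts))           # credulous: fly and -fly both entailed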
• Predicate circumscription was introduced by John McCarthy (Stanford University) in 1978
– Inventor of LISP and the 'space fountain'
– Basically, circumscription tries to formalize the common sense assumption that things are as expected, unless specified otherwise
9.1 Non-Monotonic Reasoning
• Consider the problem whether Tweety can fly, if we assume that Tweety is a penguin…
– Sure, Tweety can fly,…
…because he takes a helicopter!
– This solution is intuitively not valid, since no helicopter was mentioned in our facts
– Of course we could exclude all possible ways to fly in our program, but…
9.1 Non-Monotonic Reasoning
• Circumscription is a rule of conjecture that can be used for jumping to certain conclusions
– The objects that can be shown to have a certain property P by reasoning from certain facts A are all the objects that satisfy P
• More generally, circumscription can be used to conjecture that the substitutions that can be shown to satisfy a predicate are all the tuples satisfying this predicate
– Thus, the set of relevant tuples is circumscribed
9.1 Non-Monotonic Reasoning
• Example: by circumscription a bird can be
conjectured to fly unless something prevents it
– The only entities that can prevent the bird from flying are those whose existence follows from the facts
• If no clipped wings, being a penguin or other circumstances preventing flight are deducible, then the bird is concluded to fly
• Basically, this can be done by adding a predicate
¬abnormal(X) to all rules about flying
– The correctness of this conclusion depends on having taken into account all relevant facts when the
circumscription was made
9.1 Non-Monotonic Reasoning
• Circumscription therefore tries to derive all minimal models of a set of formulae
– If we have a predicate p(X1, …, Xn) then a model tells
whether the predicate is true for any possible substitution with terms for Xi
• The extension of p(X1, …, Xn) in a model is the set of substitutions for which p(X1, …, Xn) evaluates to true
– The circumscription of a formula is a minimization: only the least possible number of predicates is believed
• The circumscription of p(X1, …, Xn) in a formula is obtained by selecting only models with a minimal extension of p(X1, …, Xn)
9.1 Non-Monotonic Reasoning
• Example
– Consider a formula of the type A ⋀ (B ⋁ C) → D, like fly(X) :- bird(X), eagle(X).
fly(X) :- bird(X), condor(X).
• Obviously bird(X) has to be true in any model, but to be minimal only eagle(X) or condor(X) has to be true
• Hence there are two circumscriptions of the formula {bird(X), eagle(X)} and {bird(X), condor(X)}, but not {bird(X), eagle(X), condor(X)}
– Note that predicates are only evaluated as false if it is possible
• eagle(X) and condor(X) cannot both be false
9.1 Non-Monotonic Reasoning
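• For a single fixed individual X, the circumscription of the example above can be computed by brute-force enumeration. A minimal Python sketch (an illustration for this example, not a general circumscription procedure):

# Enumerate all models of bird(X) ∧ (eagle(X) ∨ condor(X)) and keep only
# those whose set of true atoms is minimal w.r.t. set inclusion.
from itertools import product

ATOMS = ('bird', 'eagle', 'condor')

def is_model(m):
    return m['bird'] and (m['eagle'] or m['condor'])

models = []
for values in product([False, True], repeat=len(ATOMS)):
    m = dict(zip(ATOMS, values))
    if is_model(m):
        models.append({a for a in ATOMS if m[a]})

# a model is minimal if no other model's true atoms are a strict subset
minimal = [m for m in models if not any(other < m for other in models)]
print(minimal)   # the two circumscriptions: {bird, eagle} and {bird, condor}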
• But sometimes circumscription handles disjunctive information incorrectly
– Toss a coin onto a chess board and consider the predicate lies_on(X, Y) describing where it lies
– There are several possibilities of models
• Obviously {lies_on(coin, floor)} should be false, since it was not mentioned that the coin could miss the board
• That leaves {lies_on(coin, white)}, {lies_on(coin, black)}, and {lies_on(coin, white), lies_on(coin, black)} for the overlapping case
– But the last model would be filtered out as not being minimal by circumscription
• One possibility to remedy this case is theory curbing, where iteratively the least upper bound(s) of the minimal models are also accepted as models
9.1 Non-Monotonic Reasoning
• Autoepistemic Logic was introduced by Robert C. Moore (Microsoft Research) in 1985
• Autoepistemic logic can not only express facts, but also knowledge and lack of knowledge about facts
• Formalizes non-monotonicity using statements with a belief operator B
– For every well-formed formula F, the 'belief atom' B(F) means that F is believed
– ¬B(F) means that F is not believed
9.1 Non-Monotonic Reasoning
• It uses the following axioms
– All propositional tautologies are axioms
– If we believe in B(X) :- A(X)., then whenever we believe in A(X), we also have to believe in B(X)
– Inconsistent conclusions are never believed, i.e. ¬B(false)
• It uses modus ponens as its inference rule
– Given a conditional claim A → B and the truth of the antecedent A, it can be logically concluded that the consequent B must be true as well
9.1 Non-Monotonic Reasoning
• This can be used to derive stable sets of sentences which are then believed
– i.e. the reflection of our own state of knowledge
• If we do not believe in a fact, then we believe that we do not believe it
– B(bird(X)) ⋀ ¬B(¬fly(X)) → fly(X)
– If I believe that X is a bird, and I do not believe that X cannot fly, then I will conclude that X flies
9.1 Non-Monotonic Reasoning
• A belief theory T describes the knowledge base
– A restricted belief interpretation of T is a set of belief atoms I such that for each B(F) appearing in T, either B(F) ∈ I or ¬B(F) ∈ I (but not both)
– A restricted belief model of T is a belief interpretation I such that T ⋃ I is consistent
9.1 Non-Monotonic Reasoning
• Again, expansions of the theory can be derived
– Since all belief atoms have to be either true or false, the theory can be treated like a set of propositional formulae
– In particular, checking whether T entails F can be done using the rules of the propositional calculus
– In order for an initial assumption to be an expansion, it must hold that F is entailed iff B(F) has been initially assumed true
9.1 Non-Monotonic Reasoning
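• For small propositional theories, this expansion check can be run by brute force: enumerate all restricted belief interpretations and keep those in which each B(F) was assumed true exactly when F comes out entailed. A Python sketch for the bird/fly rule above, propositionalized to a single individual:

# Theory T: the fact bird, plus B(bird) ∧ ¬B(¬fly) → fly.
# Belief atoms b_bird and b_negfly are ordinary propositional variables.
from itertools import product

def models(belief):
    """All assignments to (bird, fly) satisfying T ∪ I for fixed beliefs."""
    b_bird, b_negfly = belief
    result = []
    for bird, fly in product([False, True], repeat=2):
        fact = bird                                    # W contains bird
        rule = (not (b_bird and not b_negfly)) or fly  # the belief rule
        if fact and rule:
            result.append((bird, fly))
    return result

def entails(belief, f):
    """T ∪ I entails f iff f holds in every model of T ∪ I."""
    ms = models(belief)
    return bool(ms) and all(f(bird, fly) for bird, fly in ms)

# stability: B(F) was assumed true  <=>  F is entailed
for belief in product([False, True], repeat=2):
    b_bird, b_negfly = belief
    stable = (b_bird == entails(belief, lambda bird, fly: bird) and
              b_negfly == entails(belief, lambda bird, fly: not fly))
    if stable and models(belief):
        print('expansion:', {'B(bird)': b_bird, 'B(-fly)': b_negfly})
# -> exactly one expansion: bird is believed, ¬fly is not, so fly follows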
• Probability theory deals with expressing the belief or knowledge that a certain event will or has occurred
• In general, there are two major factions among probability theorists
– Frequentist view:
• The probability of an event is its relative frequency of occurrence during a long-running random experiment
• Major supporters: Neyman, Pearson, Wald, …
9.2 Probability
– Bayesian view:
• Probabilities can be assigned to any event or statement whether it is part of a random process or not
• Probabilities thus express the degree of belief that a given event will happen
• Major supporters: Bayes, Laplace, de Finetti, …
• During the following slides, we will encounter both views
– …but the formal notation and theory are similar in both
9.2 Probability
• The probability of an event or statement A is given by P(A)
– P(A) ∈ [0,1]
– P(¬A):=1-P(A)
– Depending on your world view, a probability of P(A) = 0.8 may mean
• During a long-running random experiment, A was the outcome in 80% of all trials
• You have a strong belief (quantified by 0.8 of a maximum of 1) that A can / will happen
9.2 Probability
• Given two events A and B and assuming that they are statistically independent of each other,
probabilities may be combined
– P(A ⋀ B)= P(A) * P(B)
• also written P(A, B)
– e.g.
• P(isYellow(Tweety))=0.8 and P(canFly(Tweety))=0.2
⇒ P(isYellow(Tweety), canFly(Tweety)) = 0.16
9.2 Probability
• However, events are often not independent, thus we need conditional probabilities
– This is written as P(A | B)
• P(A | B) is the conditional probability of A given B
• P(A | B) := P(A ⋀ B) / P(B)
• e.g. P(canBark(X) | dog(X)) = 0.9
– Given that X is a dog, X can bark with a probability of 0.9
• Based on conditional probabilities, we can derive a simple deductive system
– Probabilistic rules:
• B ←P(B|A) A or B :-P(B|A) A
9.2 Probabilistic Reasoning
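• Numerically, the definition P(A | B) := P(A ⋀ B) / P(B) can be evaluated directly from joint frequencies. A small Python sketch with made-up counts (the numbers are purely illustrative):

# Hypothetical counts from observing 1000 animals:
n_total = 1000
n_dog = 200              # animals that are dogs
n_dog_and_bark = 180     # dogs that can bark

p_dog = n_dog / n_total
p_dog_and_bark = n_dog_and_bark / n_total
print(p_dog_and_bark / p_dog)   # P(canBark | dog) = 0.9, as in the example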
• Of course, we can also form deductive chains
• Example:
– dog(X) ←0.6 domestic_animal(X).
canBark(X) ←0.9 dog(X).
⊢ canBark(X) ←?? domestic_animal(X).
– So, assuming statistical independence between barking and domestic animals, we may conclude that the probabilities may be just multiplied, i.e.
canBark(X) ←0.54 domestic_animal(X).
9.2 Probabilistic Reasoning
• Unfortunately, this naïve approach breaks quickly
• Example:
– dog(X) ←0.6 domestic_animal(X).
canBark(X) ←0.9 dog(X).
⊢ canBark(X) ←0.54 domestic_animal(X).
– domestic_animal(X) ←1.0 cat(X).
⊢ canBark(X) ←0.54 cat(X).
• Cats can bark with 0.54 probability? Something is wrong…
– Problem:
• dog(X) ←0.6 domestic_animal(X) ←1.0 cat(X). ??
9.2 Probabilistic Reasoning
• Why can't we have any confidence about barking cats?
– Not enough information!
– We don't know about P(cat(X) | dog(X)), or P(bark(X) | cat(X)), …
9.2 Probabilistic Reasoning
dog(X) ←0.7 domestic_animal(X).
canBark(X) ←0.9 dog(X).
domestic_animal(X) ←1.0 cat(X).
canBark(X) ←?? domestic_animal(X).
canBark(X) ←?? cat(X).
[Venn diagrams: both extremes are consistent with these rules – one world where all cats bark and one where no cat barks]
• Given two events with their respective
probabilities, P(A)=α and P(B)=β, how could they be related, i.e. what is P(A ⋀ B) ?
a) A and B could be independent, and thus P(A ⋀ B) :=P(A) * P(B)
• e.g. P(isMonday(today)), P(cat(Garfield))
b) A and B could be mutually exclusive, thus P(A ⋀ B) := 0
• e.g. P(isMonday(today)), P(isTuesday(today))
c) A implies B, thus P(A ⋀ B) = P(A)
• e.g. P(isCat(X)), P(isAnimal(X))
9.2 Probabilistic Reasoning
d) There could also be no quantifiable relationship between P(A) and P(B)
• However, according to Boole, we can at least provide an interval which contains P(A ⋀ B)
• max(0, P(A)+P(B)-1) ≤ P(A ⋀ B) ≤ min(P(A), P(B))
– Those two boundaries are called T-Norms
– Minimum T-Norm: min(a, b) (also known as Gödel T-Norm)
– Łukasiewicz T-Norm: max(0, a+b−1)
• Example: P(A) = 0.33, P(B) = 0.23 ⇒ 0 ≤ P(A ⋀ B) ≤ 0.23
9.2 Probabilistic Reasoning
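• These bounds are trivial to compute. A small Python sketch:

# Boole's bounds on P(A ∧ B), given only P(A) and P(B).
def conjunction_bounds(pa, pb):
    lower = max(0.0, pa + pb - 1.0)   # Łukasiewicz T-Norm
    upper = min(pa, pb)               # Minimum (Gödel) T-Norm
    return lower, upper

print(conjunction_bounds(0.33, 0.23))   # -> (0.0, 0.23), as above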
[Venn diagrams: the three cases – P(A) and P(B) disjoint, overlapping, and nested]
– Obviously, there may also be many additional cases (like negative correlation, A implies B when C, etc.)
– However, if there is no further information available, upper/lower bound estimation is the only possible approach
• We should try to incorporate these results into our to-be- developed chaining rule
– Thus, we can conclude
• If there are no further properties known for A and B but their probabilities, their combined probability can only be described by an interval
9.2 Probabilistic Reasoning
• Confidence intervals also help to model probabilistic rules
– B ←(x1, x2) A iff 0 ≤ x1 ≤ P(B|A) ≤ x2 ≤ 1
• i.e. given A, the probability for B is between x1 and x2
• If x1 = x2, this can be abbreviated as B ←x1 A
• e.g. canBark(X) ← (0.8, 1.0) dog(X)
9.2 Probabilistic Reasoning
• Also, rules combined with their converse can be stated that way
– A ←(x1, x2) B and its converse B ←(y1, y2) A, denoted as A (y1, y2)↔(x1, x2) B
– e.g. domesticAnimal(X) (0.3, 0.3) ↔(1.0, 1.0) cat(X)
9.2 Probabilistic Reasoning
• The dominant reason for these flawed deductions is mixing causal rules with diagnostic rules
– Causal Rules: Relate a known cause to its effect
• A is the cause for B; A is given and B happened because of A
• e.g. groundIsWet ←1.0 sprinklerWasOn
– Diagnostic Rules: Try to relate an observable effect to its cause
• i.e. B ←0.2 A
• B is the cause for A, but just with a weaker probability / belief
• e.g. sprinklerWasOn ←0.3 groundIsWet
9.2 Probabilistic Reasoning
• Rule chaining along purely causal OR purely diagnostic rules works just fine
– groundIsWet ←1.0 sprinklerWasOn
youGetWetFeet ←0.97 groundIsWet
⊢ youGetWetFeet ←0.97 sprinklerWasOn
– groundIsWet ←0.2 youGotWetFeet
itRained ←0.9 groundIsWet
⊢ itRained ←0.18 youGotWetFeet
9.2 Probabilistic Reasoning
• But careful:
– Causal: groundIsWet ←1.0 sprinklerWasOn
– Diagnostic: itRained ←0.9 groundIsWet
– but not: itRained ←0.9 sprinklerWasOn
• Rain and the sprinkler are both causes of wet ground, but are otherwise unrelated
9.2 Probabilistic Reasoning
• Causal and diagnostic rules can be treated in pairs
– Diagnostic rules are the converse of causal rules
– groundIsWet ←1.0 sprinklerWasOn
groundIsWet 0.1→ sprinklerWasOn
Written as:
groundIsWet 0.1↔1.0 sprinklerWasOn
groundIsWet 0.9↔1.0 itRained
– Now, we need a heuristic for dealing with diagnostic and causal rules together
9.2 Probabilistic Reasoning
• Observation:
– Causal rules usually have a quite high probability:
B ←~1.0 A
• If probability was low, A is not really the cause for B
– Diagnostic rules usually have a lower probability:
A ←≪1.0 B
• i.e., B may be the effect of A, but it is usually also the effect of other causes
9.2 Probabilistic Reasoning
• Observation:
– So, the main syntactic difference between those rule types is the strength of belief in the deduction
– Consider bi-directional rules:
groundIsWet 0.1↔1.0 sprinklerWasOn
• ← is probably the causal rule: the sprinkler wets the ground for sure
• → is probably a diagnostic rule: there may be many other reasons for wet ground
– Thus, when chaining two rules with diverging probabilities, we probably mix diagnostic and causal rules
• A chaining rule needs a strong dampening factor for diverging probabilities
9.2 Probabilistic Reasoning
• A correct chaining rule can be given as follows:
– C (y1, y2)↔(x1, x2) B, B (v1, v2)↔(u1, u2) A ⊢ C ←(z1, z2) A
– z1 =
• (u1/v1) · max(0, v1 + x1 − 1), if v1 > 0
• u1, if v1 = 0 and x1 = 1
• 0, otherwise
– z2 =
• min(1, u2 + t·(1 − y1), 1 − u1 + t·y1, t) with t = (u2·x2)/(v1·y1), if v1 > 0 and y1 > 0
• min(1, 1 − u1 + (u2·x2)/v1), if v1 > 0 and y1 = 0
• 1 − u1, if v1 = 0 and x2 = 0
• 1, otherwise
Proof and derivation in:
9.2 Probabilistic Reasoning
• This chaining rule can be obtained by a lengthy proof within a deductive calculus
– …thus, it is correct
– Unfortunately, it is not really intuitively obvious what it does and how it works
• But we can try to find some rationales
– The chaining rule tries to 'play safe' by incorporating the T-Norms as a worst-case estimation
• Łukasiewicz T-Norm as safe lower bound, Minimum T-Norm as safe upper bound
9.2 Probabilistic Reasoning
• But it works:
– dog(X) 1.0↔0.7 domestic(X).
barks(X) ←0.9 dog(X).
domestic(X) 0.3↔1.0 cat(X).
– By using the chaining rule, we get
⊢ barks(X) ←(0.63, 0.93) domestic(X).
⊢ barks(X) ←(0.0, 1.0) cat(X).
– If now additional knowledge is added, the belief intervals change
dog(X) ←1.0 barks(X). (Only dogs bark)
dog(X) ←0.0 cat(X). (Cats are no dogs)
⊢ barks(X) 0.0↔0.0 cat(X). (No barking cats)
9.2 Probabilistic Reasoning
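• The case distinction of the chaining rule is mechanical enough to implement directly. A Python sketch (variable names as in the rule; an unknown converse defaults to the trivial interval [0, 1]), reproducing the example above:

# Chain C (y1,y2)<->(x1,x2) B and B (v1,v2)<->(u1,u2) A into C <-(z1,z2) A.
# Note that v2 and y2 do not occur in the bounds.
def chain(x1, x2, y1, y2, u1, u2, v1, v2):
    if v1 > 0:                          # lower bound z1
        z1 = u1 / v1 * max(0.0, v1 + x1 - 1.0)
    elif x1 == 1:
        z1 = u1
    else:
        z1 = 0.0
    if v1 > 0 and y1 > 0:               # upper bound z2
        t = (u2 * x2) / (v1 * y1)
        z2 = min(1.0, u2 + t * (1 - y1), 1 - u1 + t * y1, t)
    elif v1 > 0:                        # v1 > 0 and y1 == 0
        z2 = min(1.0, 1 - u1 + (u2 * x2) / v1)
    elif x2 == 0:                       # v1 == 0 and x2 == 0
        z2 = 1 - u1
    else:
        z2 = 1.0
    return z1, z2

# barks <-0.9 dog (converse unknown), dog 1.0<->0.7 domestic:
print(chain(x1=0.9, x2=0.9, y1=0.0, y2=1.0,
            u1=0.7, u2=0.7, v1=1.0, v2=1.0))    # -> (0.63, 0.93)
# barks <-(0.63, 0.93) domestic, domestic 0.3<->1.0 cat:
print(chain(x1=0.63, x2=0.93, y1=0.0, y2=1.0,
            u1=1.0, u2=1.0, v1=0.3, v2=0.3))    # -> (0.0, 1.0)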
• The chaining rule dampens all conclusions which seem to involve mixed causal/diagnostic chains
– Rule: C (y1, y2)↔(x1, x2) B, B (v1, v2)↔(u1, u2) A ⊢ C ←(z1, z2) A
– Known: rain 1.0↔0.9 wet (← diagnostic?, → causal) and wet 0.1↔1.0 sprinkler (← causal, → diagnostic)
– z1 takes a very low value and z2 a very high value if the rules seem of different type: here ⊢ rain ←(0.0, 1.0) sprinkler
9.2 Probabilistic Reasoning
• Let's try to perform a "safer" chaining
– Rule: C (y1, y2)↔(x1, x2) B, B (v1, v2)↔(u1, u2) A ⊢ C ←(z1, z2) A
– Known: wetFeet 0.2↔0.97 wetGround (← causal, → diagnostic) and wetGround 0.1↔1.0 sprinkler (← causal, → diagnostic)
– Result: wetFeet ←(0.7, 1.0) sprinkler
– Both z1 and z2 take higher values for purely causal chaining
9.2 Probabilistic Reasoning
• Summary: probabilistic deduction
– Chaining rules produce new rules which are only true within a certain confidence interval
– Non-monotonicity is reflected by adjusting those confidence intervals
– For computing the confidence intervals of a chain, the converse rules are considered
• Thus, the problem of chaining diagnostic and causal rules is solved implicitly
9.2 Probabilistic Reasoning
• Bayesian belief networks are used to represent a set of random variables and their conditional
probabilities
– Introduced by Judea Pearl (UCLA) in 1985
• The networks explicitly model the
independence relationships in the data
– These independence relationships can then be used to make probabilistic inferences
9.3 Bayesian Belief Networks
• Bayesian networks are directed acyclic graphs whose nodes represent random variables
– Edges represent the direct (causal) influence between variables
• Missing edges encode conditional independencies between the variables
– What causes toothaches? Does flu have anything to do with it?
9.3 Bayesian Belief Networks
[Figure: network with nodes flu, cavities, periodontosis, and toothache; cavities and periodontosis have edges to toothache, flu is not connected to toothache]
• Nodes are annotated with (conditional) probabilities
– Root nodes are assigned prior probability distributions
– Child nodes are assigned conditional probability tables with respect to
their parents
9.3 Bayesian Belief Networks
[Figure: the network annotated with probabilities – priors P(has_flu), P(has_cavities), P(has_periodontitis) at the root nodes; a conditional probability table P(toothache | has_cavities, has_periodontitis), P(toothache | has_cavities, ¬has_periodontitis), P(toothache | ¬has_cavities, has_periodontitis), P(toothache | ¬has_cavities, ¬has_periodontitis) at toothache; and P(gum bleeding | has_periodontitis), P(gum bleeding | ¬has_periodontitis) at the gum bleeding node]
• What is the full joint distribution?
– P(X1, X2, ..., Xn)
= P(X1) * P(X2, X3, ..., Xn | X1)
= P(X1) * P(X2 | X1) * P(X3, X4, ..., Xn | X1, X2)
= ...
= P(X1) * P(X2 | X1) * P(X3 | X1, X2) * ... * P(Xn | X1, ..., Xn-1)
– Note that we did not use any independence assumption here
9.3 Bayesian Belief Networks
• Now, use the semantics of Bayesian belief networks (local Markov property)
– Let X1, …, Xn be an ordering of the nodes such that only the nodes that are indexed lower than i may have a directed path to Xi
– The full joint distribution can now be defined as the product of the local conditional distributions
P(X1, …, Xn) = Π1≤ i ≤ n P(Xi | Parents(Xi))
• Note that all these probabilities are available in the network
9.3 Bayesian Belief Networks
• For example, what is the joint probability that somebody has periodontitis and toothache, but no cavities?
– P(has_periodontitis, ¬has_cavities, toothache)
= P(has_periodontitis) * P(¬has_cavities) *
P(toothache | ¬has_cavities, has_periodontitis)
9.3 Bayesian Belief Networks
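• In code, this factorization is just a product over the local tables. A Python sketch for the toothache network – the slides give no concrete numbers, so all probabilities below are made up for illustration:

from itertools import product

P_FLU, P_CAV, P_PERIO = 0.05, 0.10, 0.08      # priors (illustrative)
P_TOOTH = {(True, True): 0.95, (True, False): 0.70,   # P(tooth | cav, perio)
           (False, True): 0.60, (False, False): 0.05}
P_GUM = {True: 0.80, False: 0.03}             # P(gum bleeding | perio)

def joint(flu, cav, perio, tooth, gum):
    """P(X1, …, Xn) as the product of P(Xi | Parents(Xi))."""
    p = (P_FLU if flu else 1 - P_FLU)
    p *= (P_CAV if cav else 1 - P_CAV)
    p *= (P_PERIO if perio else 1 - P_PERIO)
    pt = P_TOOTH[(cav, perio)]
    p *= (pt if tooth else 1 - pt)
    pg = P_GUM[perio]
    p *= (pg if gum else 1 - pg)
    return p

# P(has_periodontitis, ¬has_cavities, toothache): flu and gum bleeding
# are unmentioned, so they are summed out (and sum to 1 here).
print(sum(joint(f, False, True, True, g)
          for f, g in product([False, True], repeat=2)))
# = P(perio) * P(¬cav) * P(tooth | ¬cav, perio) = 0.08 * 0.9 * 0.6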
• Given a Bayesian network and its conditional probability tables, we can compute all probabilities of the form P(H | X1, X2, …, Xn)
– Where H and X1, X2, ..., Xn are assignments to nodes (i.e. random variables) in the network
– H is the hypothesis we are interested in
– X1, X2, ..., Xn are the influences
• By being conditionally dependent on their parents, beliefs are propagated through the network
9.3 Bayesian Belief Networks
• Inferring causal or diagnostic information can be done using the joint probability distributions
– E.g., what is the probability that somebody has cavities given that he/she suffers from toothache?
– Can be evaluated using the conditional probability formula:
P(has_cavities | toothache)
= P(has_cavities, toothache) / P(toothache)
9.3 Bayesian Belief Networks
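• Both numerator and denominator are sums of entries of the full joint distribution, so the query can be answered by enumeration. A Python sketch, reusing joint() and the illustrative tables from the previous sketch:

from itertools import product

def prob(fixed):
    """Sum the joint over all assignments consistent with the dict fixed,
    e.g. prob({'tooth': True}) = P(toothache)."""
    names = ['flu', 'cav', 'perio', 'tooth', 'gum']
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in fixed.items()):
            total += joint(a['flu'], a['cav'], a['perio'],
                           a['tooth'], a['gum'])
    return total

# conditional probability formula: P(H | E) = P(H, E) / P(E)
p = prob({'cav': True, 'tooth': True}) / prob({'tooth': True})
print(p)   # P(has_cavities | toothache) under the illustrative numbers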
• A Bayesian belief network for breast cancer diagnosis
9.3 Example: Medicine
• More reasoning
– Fuzzy logic and possibilistic systems
– Case-based reasoning