Knowledge-Based Systems and Deductive Databases

(1)

Wolf-Tilo Balke, Hermann Kroll

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Knowledge-Based Systems and Deductive Databases

(2)

8. Deduction with Uncertainty

8.1 Uncertain Knowledge
8.2 Probabilistic Application
8.3 Belief Networks

(3)

• We have discussed ways of deriving new facts from other (ground) facts

But often several rules can lead to a certain fact and we cannot be sure which one it was

A patient experiences a toothache; what is the reason?

Sometimes a certain fact might be derived from ground facts only in certain cases

A normal bird can fly, except for penguins, ostriches,…

8.1 Uncertainty

(4)

• Typical sources of imperfect information in deductive databases are…

Incomplete information

Information is simply missing, which might clash with the closed world assumption

Imprecise information

The information needed has only been specified in a vague way, e.g., a person is young: young(Tim).

Queries about Tim’s age are difficult to answer, e.g., ?age(Tim, 67) is false, but what about ?age(Tim, 25)?

Uncertain information

A deduction is not always correct, e.g., the question whether a bird can fly: fly(X) :- bird(X).

What about penguins, dead birds, or birds with clipped wings?

8.1 Uncertainty

(5)

• Consider an expert system for dentists

All possible causes for toothaches are contained in a database and the reason should be deduced

cavities(X) :- toothache(X).

periodontosis(X) :- toothache(X).

Not very helpful, since all possible causes are listed. Thus, all rules fire…

cavities(X) :- toothache(X), ¬periodontosis(X).

periodontosis(X) :- toothache(X), ¬cavities(X).

Not very helpful either, because now we need to disprove all alternatives before any rule fires…

Remember the assumption of ‘negation as failure’

8.1 Uncertainty

(6)

• But how do dentists deal with the problem?

Like in our second program, look for positive or negative clues

e.g., bleeding of gums,…

• Still, how does a dentist know what to look for?

What are probable causes?

What are possible causes?

Knowing the patient, what is the (subjective) judgement?

8.1 Uncertainty

(7)

Basic idea: assign a measure of validity to each rule or statement and propagate this measure through the deduction process

Probabilistic truth values

Use statistics: how often are cavities the reason, and how often is periodontosis?

Leads to a probability distribution over possible worlds

Possibility values

What are possible causes and to what degree do they cause toothache?

Leads to a possibility distribution over possible worlds

Belief values

Lead to belief networks with facts that may influence each other

– …

8.1 Uncertainty

(8)

• Usually, dealing with uncertainty requires an open world assumption

Facts not stated in the database may or may not be false

• But the reasoning gets more difficult

Remember our discussion about the existence of several minimal models in Datalog¬

The reasoning process is not monotonic any more

Introduction of new knowledge might lead to a revision (and sometimes refutation) of previously derived facts

8.1 Uncertainty

(9)

Probability theory deals with expressing the belief or knowledge that a certain event will occur or has occurred

• In general, there are two major factions among probability theorists

Frequentistic view:

Probability of an event is its relative frequency of occurrence during a long-running random experiment

Major supporters: Neyman, Pearson, Wald, …

8.2 Probability

(10)

Bayesian view:

Probabilities can be assigned to any event or statement whether it is part of a random process or not

Probabilities thus express the degree of belief that a given event will happen

Major supporters: Bayes, Laplace, de Finetti, …

• During the following slides, we will encounter both views

…but still, formal notation and theory is similar in both

8.2 Probability

(11)

• The probability of an event or statement A is given by P(A)

– P(A) ∈ [0,1]

P(¬A):=1-P(A)

Depending on your world view, a probability of P(A)=0.8 may mean

During a long-running random experiment, A was the outcome of 80% of all tries

You have a strong belief (quantified by 0.8 of a maximum of 1) that A can / will happen

8.2 Probability

(12)

• Given two events A and B and assuming that they are statistically independent of each other, probabilities may be combined

– P(A ⋀ B)= P(A) * P(B)

also written P(A, B)

e.g.

P(isYellow(Tweety))=0.8 and P(canFly(Tweety))=0.2

⤇ P(isYellow(Tweety), canFly(Tweety)) = 0.16

8.2 Probability

(13)

• However, events are often not independent, thus we need conditional probabilities

This is written as P(A | B)

P(A | B) is the conditional probability of A given B

P(A | B) := P(A ⋀ B) / P(B)

e.g. P(canBark(X) | dog(X)) = 0.9

Given that X is a dog, X can bark with a probability of 0.9

• Based on conditional probabilities, we can derive a simple deductive system

Probabilistic rules:

B ←P(B|A) A or B :- P(B|A) A

8.2 Probabilistic Reasoning
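These two combination rules are easy to verify numerically; a minimal Python sketch (the Tweety values come from the example above, while the dog prior and the joint are invented for illustration):

p_yellow, p_fly = 0.8, 0.2
# independence: P(A ⋀ B) = P(A) * P(B)
print(p_yellow * p_fly)                # 0.16, as above

# conditional probability: P(A | B) = P(A ⋀ B) / P(B)
p_dog = 0.4             # assumed prior P(dog(X)), not from the slides
p_bark_and_dog = 0.36   # assumed joint P(canBark(X) ⋀ dog(X)), not from the slides
print(p_bark_and_dog / p_dog)          # P(canBark(X) | dog(X)) = 0.9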

(14)

• Of course, we can also form deductive chains

Example:

– dog(X) ←0.6 domestic_animal(X).
– canBark(X) ←0.9 dog(X).

⊢ canBark(X) ←?? domestic_animal(X).

So, assuming statistical independence between barking and domestic animals, we may conclude that the probabilities may be just multiplied, i.e.

⊢ canBark(X) ←0.54 domestic_animal(X).

8.2 Probabilistic Reasoning

(15)

• Unfortunately, this naïve approach breaks quickly

Example:

– dog(X) ←0.6 domestic_animal(X).
– canBark(X) ←0.9 dog(X).
⊢ canBark(X) ←0.54 domestic_animal(X).

– domestic_animal(X) ←1.0 cat(X).
⊢ canBark(X) ←0.54 cat(X).

Cats can bark with 0.54 probability? Something is wrong…

Problem: dog(X) ←0.6 domestic_animal(X) ←1.0 cat(X). ??

8.2 Probabilistic Reasoning

(16)

• Why can’t we have any confidence about barking cats?

Not enough information!

We don’t know about

P(cat(x)|dog(x)), or P(bark(X)|cat(X)), …

8.2 Probabilistic Reasoning

dog(X) ←0.7 domestic_animal(X).
canBark(X) ←0.9 dog(X).
domestic_animal(X) ←1.0 cat(X).
canBark(X) ←?? domestic_animal(X).
canBark(X) ←?? cat(X).

[Figure: two Venn diagrams of domestic animals containing dogs (which bark) and cats — one extreme where all cats bark, one where no cat barks]

(17)

• Given two events with their respective probabilities, P(A)=α and P(B)=β, how could they be related, i.e. what is P(A ⋀ B)?

a) A and B could be independent, and thus

P(A ⋀ B) :=P(A) * P(B)

e.g. P(isMonday(today)), P(cat(Garfield))

b) A and B could be mutually exclusive, thus

P(A ⋀ B) := 0

e.g. P(isMonday(today)), P(isTuesday(today))

c) A implies B, thus P(A ⋀ B) = P(A)

e.g. P(isCat(X)), P(isAnimal(X))

8.2 Probabilistic Reasoning

(18)

d) There could also be no quantifiable relationship between P(A) and P(B)

However, according to Boole, we can at least provide an interval which contains P(A ⋀ B)

max(0, P(A)+P(B)-1) ≤ P(A ⋀ B) ≤ min(P(A), P(B))

Those two boundaries are called T-Norms

Minimum T-Norm: min(a, b) (also known as Gödel T-Norm)
Łukasiewicz T-Norm: max(0, a+b-1)

Example: P(A) = 0.33, P(B) = 0.23 ⇒ 0 ≤ P(A ⋀ B) ≤ 0.23

8.2 Probabilistic Reasoning

[Figure: three overlap diagrams — A and B disjoint: P(A ⋀ B) = 0 = max(0, a+b-1); partial overlap: P(A ⋀ B) = 0.053; B contained in A: P(A ⋀ B) = 0.23 = min(a, b)]
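Both bounds are straightforward to compute; a minimal Python sketch:

def lukasiewicz(a, b):
    # Łukasiewicz T-Norm: safe lower bound for P(A ⋀ B)
    return max(0.0, a + b - 1.0)

def minimum(a, b):
    # Minimum (Gödel) T-Norm: safe upper bound for P(A ⋀ B)
    return min(a, b)

# example from above: P(A) = 0.33, P(B) = 0.23
print(lukasiewicz(0.33, 0.23), minimum(0.33, 0.23))   # 0.0 0.23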

(19)

Obviously, there may also be many additional cases (like negative correlation, A implies B when C, etc...)

However, if there is no further information available, upper/lower bound estimation is the only possible approach

We should try to incorporate these results into our to-be-developed chaining rule

Thus, we can conclude

If there are no further properties known for A and B but their probabilities, their combined probability can only be described by an interval

8.2 Probabilistic Reasoning

(20)

• Confidence intervals also help to model probabilistic rules

– B ←(x1, x2) A iff 0 ≤ x1 ≤ P(B|A) ≤ x2 ≤ 1

i.e. given A, the probability for B is between x1 and x2

If x1=x2, this can still be abbreviated with B ←x1 A

e.g. canBark(X) ←(0.8, 1.0) dog(X)

8.2 Probabilistic Reasoning

(21)

• Also, rules combined with their converse can be stated that way

– A ←(x1, x2) B and its converse B ←(y1, y2) A, denoted as A ←(y1, y2)(x1, x2) B

– e.g. domesticAnimal(X) ←(0.3, 0.3)(1.0, 1.0) cat(X)

8.2 Probabilistic Reasoning

(22)

• The dominant reason for these flawed deductions is mixing causal rules with diagnostic rules

Causal Rules: Relate a known cause to its effect

A is the cause for B; A is given and B happened because of A

e.g. groundIsWet ←1.0 sprinklerWasOn

Diagnostic Rules: Try to relate an observable effect to its cause

i.e. B ←0.2 A

B is the cause for A, but just with a weaker probability / belief

e.g. sprinklerWasOn ←0.3 groundIsWet

8.2 Probabilistic Reasoning

(23)

• Rule chaining along just causal OR diagnostic rules works just fine

groundIsWet ←1.0 sprinklerWasOn and youGetWetFeet ←0.97 groundIsWet
⊢ youGetWetFeet ←0.97 sprinklerWasOn

groundIsWet ←0.2 youGotWetFeet and itRained ←0.9 groundIsWet
⊢ itRained ←0.18 youGotWetFeet

8.2 Probabilistic Reasoning

(24)

• But careful:

Causal: groundIsWet ←1.0 sprinklerWasOn

Diagnostic: itRained ←0.9 groundIsWet

but not: itRained ←0.9 sprinklerWasOn
(both are causes of wet ground, but are otherwise unrelated)

8.2 Probabilistic Reasoning

(25)

• Causal and diagnostic rules can be treated in pairs

Diagnostic rules are the converse of causal rules

groundIsWet ←1.0 sprinklerWasOn and its converse sprinklerWasOn ←0.1 groundIsWet

Written as: groundIsWet ←(0.1)(1.0) sprinklerWasOn

Similarly: groundIsWet ←(0.9)(1.0) itRained

Now, we need a heuristic for dealing with diagnostic and causal rules together

8.2 Probabilistic Reasoning

(26)

Observation:

Causal rules usually have a quite high probability: B ←~1.0 A

If the probability was low, A would not really be the cause for B

Diagnostic rules usually have a lower probability: A ←≪1.0 B

i.e., B may be the effect of A, but it is usually also the effect of other causes

8.2 Probabilistic Reasoning

(27)

Observation:

– So, the main syntactic difference between those rule types is the strength of belief in the deduction

– Consider bi-directional rules:
groundIsWet ←(0.1)(1.0) sprinklerWasOn
The 1.0 part is probably a causal rule: the sprinkler wets the ground for sure
The 0.1 part is probably a diagnostic rule: there may be many other reasons for wet ground

– Thus, when chaining two rules with diverging probabilities, we probably mix diagnostic and causal rules

– A chaining rule needs a strong dampening factor for diverging probabilities

8.2 Probabilistic Reasoning

(28)

• A correct chaining rule can be given as follows:

C ←(y1, y2)(x1, x2) B,  B ←(v1, v2)(u1, u2) A  ⊢  C ←(z1, z2) A

z1 = (u1/v1) * max(0, v1+x1-1)    if v1 > 0
z1 = u1                           if v1 = 0 and x1 = 1
z1 = 0                            otherwise

z2 = min(1, u2+t*(1-y1), 1-u1+t*y1, t)  with t = (u2*x2)/(v1*y1)    if v1 > 0 and y1 > 0
z2 = min(1, 1-u1+(u2*x2)/v1)                                        if v1 > 0 and y1 = 0
z2 = 1-u1                                                           if v1 = 0 and x2 = 0
z2 = 1                                                              otherwise

Proof and derivation in:

U. Güntzer, W. Kießling, H. Thöne. New directions for uncertainty reasoning in deductive databases. Proc. ACM SIGMOD, 1991

8.2 Probabilistic Reasoning
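Because the case analysis is easy to get wrong by hand, here is a minimal Python sketch of the chaining rule as stated above (intervals as pairs; an unknown converse defaults to the vacuous interval (0.0, 1.0); point probabilities are degenerate intervals). The printed values reproduce the examples on the following slides:

def chain(x, y, u, v):
    # C <-(y)(x)- B and B <-(v)(u)- A entail C <-(z1, z2)- A
    # x = [x1,x2] bounds P(C|B), y = [y1,y2] bounds P(B|C),
    # u = [u1,u2] bounds P(B|A), v = [v1,v2] bounds P(A|B)
    x1, x2 = x; y1, y2 = y; u1, u2 = u; v1, v2 = v
    if v1 > 0:
        z1 = (u1 / v1) * max(0.0, v1 + x1 - 1.0)
    elif x1 == 1.0:
        z1 = u1
    else:
        z1 = 0.0
    if v1 > 0 and y1 > 0:
        t = (u2 * x2) / (v1 * y1)
        z2 = min(1.0, u2 + t * (1.0 - y1), 1.0 - u1 + t * y1, t)
    elif v1 > 0:
        z2 = min(1.0, 1.0 - u1 + (u2 * x2) / v1)
    elif x2 == 0.0:
        z2 = 1.0 - u1
    else:
        z2 = 1.0
    return z1, z2

# barks <-(0,1)(0.9)- dog, dog <-(1.0)(0.7)- domestic  =>  (0.63, 0.93)
print(chain((0.9, 0.9), (0.0, 1.0), (0.7, 0.7), (1.0, 1.0)))
# barks <-(0,1)(0.63,0.93)- domestic, domestic <-(0.3)(1.0)- cat  =>  (0.0, 1.0)
print(chain((0.63, 0.93), (0.0, 1.0), (1.0, 1.0), (0.3, 0.3)))
# wetFeet <-(0.2)(0.97)- wetGround, wetGround <-(0.1)(1.0)- sprinkler  =>  (0.7, 1.0)
print(chain((0.97, 0.97), (0.2, 0.2), (1.0, 1.0), (0.1, 0.1)))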

(29)

• This chaining rule can be obtained by a lengthy proof within a deductive calculus

…thus, it is correct

Unfortunately, it is not really intuitively obvious what it does and how it works

But we can try to find some rationales

The chaining rule tries to ‘play safe’ by incorporating the T-Norms as a worst-case estimation

8.2 Probabilistic Reasoning

Łukasiewicz T-Norm as safe lower bound; Minimum T-Norm as safe upper bound

(30)

• But it works:

– dog(X) ←(1.0)(0.7) domestic(X).
– barks(X) ←0.9 dog(X).
– domestic(X) ←(0.3)(1.0) cat(X).

– By using the chaining rule, we get

⊢ barks(X) ←(0.63, 0.93) domestic(X).
⊢ barks(X) ←(0.0, 1.0) cat(X).

– If now additional knowledge is added, the belief intervals change

dog(X) ←1.0 barks(X). (Only dogs bark)
dog(X) ←0.0 cat(X). (Cats are no dogs)

⊢ barks(X) ←(0.0)(0.0) cat(X). (No barking cats)

8.2 Probabilistic Reasoning

(31)

• The chaining rule dampens all conclusions which seem to involve mixed causal/diagnostic chains

8.2 Probabilistic Reasoning

Rule: C ←(y1, y2)(x1, x2) B,  B ←(v1, v2)(u1, u2) A  ⊢  C ←(z1, z2) A

Known: rain ←(1.0)(0.9) wet.  wet ←(0.1)(1.0) sprinkler.
(the 0.9 part looks diagnostic, while the 1.0 parts look causal)

⊢ rain ←(0.0, 1.0) sprinkler.
(z1 gets a very low value and z2 a very high value if the chained rules seem of different type)

(32)

• Let’s try to perform a “safer” chaining

8.2 Probabilistic Reasoning

Rule: C ←(y1, y2)(x1, x2) B,  B ←(v1, v2)(u1, u2) A  ⊢  C ←(z1, z2) A

Known: wetFeet ←(0.2)(0.97) wetGround.  wetGround ←(0.1)(1.0) sprinkler.
(here the 0.97 and 1.0 parts are causal, the 0.2 and 0.1 parts diagnostic)

Result: wetFeet ←(0.7, 1.0) sprinkler
(chaining along the causal direction yields higher values)

(33)

• Summary: probabilistic deduction

Chaining rules produce new rules which are only true within a certain confidence interval

Non-monotonicity is reflected by adjusting those confidence intervals

For computing the confidence intervals of a chain, the converse rules are considered

Thus, the problem of chaining diagnostic and causal rules is solved implicitly

8.2 Probabilistic Reasoning

(34)

Non-monotonic reasoning considers that sometimes statements considered true have to be revised in the light of new facts

Tweety is a bird.

Can Tweety fly? Yes!

Tweety is a bird. Tweety is 2.5 meters tall.

Can Tweety fly? No!

The introduction of a new fact has challenged the general rule that birds can fly

Only ostriches reach a height of 2.5 meters!

8.1 Non-Monotonic Reasoning

(35)

• There are several classical approaches to dealing with the problem

Default logic

Predicate circumscription

Autoepistemic reasoning

8.1 Non-Monotonic Reasoning

(36)

Default logic was proposed by Raymond Reiter (University of Toronto) in 1980

Can express logical facts like

‘by default, something is true’

Basically a default theory consists of two parts D and W

W is a set of first order logical formulae known to be true

D is a set of default rules of the form

  prerequisite : justification1, …, justificationn
  ------------------------------------------------
  conclusion

8.1 Default Logic

(37)

–   prerequisite : justification1, …, justificationn
    ------------------------------------------------
    conclusion

– If we believe the prerequisite to be true, and each of the justificationi is consistent with our current beliefs, we are led to believe that conclusion is true

Example:
  bird(X) : fly(X)
  ----------------
  fly(X)
with W = {bird(condor), bird(penguin), fly(eagle), ¬fly(penguin)}

fly(condor) is true by default, since it is a bird and we have no justification to believe otherwise

But fly(penguin) cannot be derived here, since although bird(penguin) is true, we know that the justification is false

Neither can we deduce bird(eagle), which would be abduction

8.1 Default Logic

(38)

• A common default assumption is the closed world assumption:

  true : ¬F
  ---------
  ¬F

• The semantics of default logics is again based on fixpoints

– Use set W as initial theory T

– Add to a theory T every fact that can be deduced by using any of the default rules in D, so-called extensions to the theory T

– Repeat until nothing new can be deduced

– If T is consistent with all justifications of the default rules used to derive any extension, output T

8.1 Default Logic
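For ground theories, this fixpoint procedure can be sketched in a few lines of Python; a brute-force sketch that treats entailment as membership of ground literals (enough for the examples in this section, not a general first-order prover) and simply tries every application order:

from itertools import permutations

def neg(lit):
    # syntactic negation of a ground literal, written with a '~' prefix
    return lit[1:] if lit.startswith("~") else "~" + lit

def extensions(W, D):
    # W: set of ground literals; D: list of defaults (prerequisite, justifications, conclusion)
    found = set()
    for order in permutations(D):
        T, used = set(W), []
        changed = True
        while changed:
            changed = False
            for pre, justs, concl in order:
                applicable = (pre in T and concl not in T
                              and all(neg(j) not in T for j in justs))
                if applicable:
                    T.add(concl)
                    used.extend(justs)
                    changed = True
        # keep T only if every used justification is still consistent with it
        if all(neg(j) not in T for j in used):
            found.add(frozenset(T))
    return found

# the inconsistent default (true : A(X)) / ¬A(X) yields no extensions;
# 'true' is modeled here as an always-present fact
print(extensions({"true"}, [("true", ["A"], "~A")]))   # set()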

(39)

• The last check in the algorithm is necessary to avoid inconsistent theories

i.e. something has been deduced using a justification that was later proven to be false

E.g. consider a default rule
  true : A(X)
  -----------
  ¬A(X)
and W := Ø

Since A(X) is consistent with W, we may conclude ¬A(X), which however is inconsistent with the previously assumed A(X)

In this case the theory simply has no extensions

8.1 Default Logic

(40)

• Interestingly, the semantics is non-deterministic

The deduced theory may depend on the sequence in which defaults are applied

Example: D consists of the two default rules
  bird(X) : fly(X)        penguin(X) : ¬fly(X)
  ----------------        --------------------
  fly(X)                  ¬fly(X)
with W = {bird(Tweety), penguin(Tweety)}

Starting with W both default rules are applicable

If we use the first rule, the extension fly(Tweety) would be added, and the second default rule is no longer applicable

In case we apply the second rule first, the extension would be ¬fly(Tweety)

8.1 Default Logic

(41)

Entailment of a formula from a default theory can be defined in two ways

Skeptical entailment

A formula is entailed by a default theory if it is entailed by all its extensions

Credulous entailment

A formula is entailed by a default theory if it is entailed by at least one of its extensions

– For example, our Tweety theory has two extensions, one in which Tweety can fly and one in which he cannot fly

Neither extension is skeptically entailed

Both of them are credulously entailed

8.1 Default Logic
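Reusing extensions and neg from the sketch above, the Tweety theory indeed yields two extensions, and the two entailment notions can be read off directly:

W = {"bird(tweety)", "penguin(tweety)"}
D = [("bird(tweety)", ["fly(tweety)"], "fly(tweety)"),
     ("penguin(tweety)", ["~fly(tweety)"], "~fly(tweety)")]
exts = extensions(W, D)
print(len(exts))                    # 2

skeptical = set.intersection(*(set(e) for e in exts))
credulous = set.union(*(set(e) for e in exts))
print("fly(tweety)" in skeptical)   # False: not entailed by all extensions
print("fly(tweety)" in credulous)   # True: entailed by at least one extension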

(42)

Predicate circumscription was introduced by John McCarthy (Stanford University) in 1978

Inventor of LISP and the ‘space fountain’

Basically circumscription tries to formalize the common sense assumption that things are as expected, unless specified otherwise

8.1 Predicate Circumscription

(43)

• Consider the problem whether Tweety can fly, if we assume that Tweety is a penguin…

Sure, Tweety can fly, …

…because he takes a helicopter!

This solution is intuitively not valid, since no helicopter was mentioned in our facts

Of course we could exclude all possible ways to fly in our program, but…

8.1 Predicate Circumscription

(44)

• Circumscription is a rule of conjecture that can be used for jumping to certain conclusions

The objects that can be shown to have a certain property P by reasoning from certain facts A are all the objects that satisfy P

More generally, circumscription can be used to conjecture that the substitutions that can be shown to satisfy a predicate are all the tuples satisfying this predicate

Thus, the set of relevant tuples is circumscribed

8.1 Predicate Circumscription

(45)

Example: by circumscription a bird can be conjectured to fly unless something prevents it

The only entities that can prevent the bird from flying are those whose existence follows from the facts

If no clipped wings, penguin-hood, or other circumstances preventing flight are deducible, then the bird is concluded to fly

Basically, this can be done by adding a predicate ¬abnormal(X) to all rules about flying

The correctness of this conclusion depends on having taken into account all relevant facts when the circumscription was made

8.1 Predicate Circumscription

(46)

• Circumscription therefore tries to derive all minimal models of a set of formulae

– If we have a predicate p(X1, …, Xn), then a model tells whether the predicate is true for any possible substitution with terms for the Xi

The extension of p(X1, …, Xn) in a model is the set of substitutions for which p(X1, …, Xn) evaluates to true

– The circumscription of a formula is a minimization: believing only the least possible number of predicates

The circumscription of p(X1, …, Xn) in a formula is obtained by selecting only models with a minimal extension of p(X1, …, Xn)

8.1 Predicate Circumscription

(47)

Example

Consider a formula of the type A ⋀ (B ⋁ C) → D, like

fly(X) :- bird(X), eagle(X).
fly(X) :- bird(X), condor(X).

Obviously bird(X) has to be true in any model, but to be minimal only eagle(X) or condor(X) has to be true

Hence there are two circumscriptions of the formula, {bird(X), eagle(X)} and {bird(X), condor(X)}, but not {bird(X), eagle(X), condor(X)}

Note that predicates are evaluated as false whenever possible

eagle(X) and condor(X) cannot both be false

8.1 Predicate Circumscription
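For this propositional instance, the minimal models can be enumerated by brute force; a small sketch (exponential enumeration, for illustration only):

from itertools import product

def models(atoms, formula):
    # all truth assignments satisfying the formula, as sets of true atoms
    for bits in product((False, True), repeat=len(atoms)):
        m = dict(zip(atoms, bits))
        if formula(m):
            yield frozenset(a for a in atoms if m[a])

def minimal(ms):
    # keep only models whose extension is subset-minimal
    ms = list(ms)
    return [m for m in ms if not any(other < m for other in ms)]

atoms = ("bird", "eagle", "condor")
formula = lambda m: m["bird"] and (m["eagle"] or m["condor"])
print(minimal(models(atoms, formula)))
# [{bird, eagle}, {bird, condor}] — the non-minimal {bird, eagle, condor} is dropped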

(48)

• But sometimes circumscription handles disjunctive information incorrectly

– Toss a coin onto a chess board and consider the predicate lies_on(X, Y) describing where it lies

– There are several possibilities of models

Obviously {lies_on(coin, floor)} should be false, since it was not mentioned that the coin could miss the board

That leaves {lies_on(coin, white)}, {lies_on(coin, black)}, and {lies_on(coin, white), lies_on(coin, black)} for the overlapping case

– But the last model would be filtered out as not being minimal by circumscription

One possibility to remedy this case is theory curbing, where iteratively the least upper bound(s) of the minimal models are added until the set of models is closed

8.1 Predicate Circumscription

(49)

Autoepistemic Logic was introduced by Robert C. Moore (Microsoft Research) in 1985

• Autoepistemic logic can not only express facts, but also knowledge and lack of knowledge about facts

• Formalizes non-monotonicity using statements with a belief operator B

– For every well-formed formula F, the ‘belief atom’ B(F) means that F is believed

– ¬B(F) means that F is not believed

8.1 Autoepistemic Logic

(50)

• It uses the following axioms

All propositional tautologies are axioms

If we believe in B(X) :- A(X)., then whenever we believe in A(X), we also have to believe in B(X)

Inconsistent conclusions are never believed, i.e. ¬B(false)

• It uses modus ponens as inference rule

Given a conditional claim A → B and the truth of the antecedent A, it can be logically concluded that the consequent B must be true as well

8.1 Autoepistemic Logic

(51)

• This can be used to derive stable sets of sentences which are then believed

i.e. the reflection of our own state of knowledge

• If we do not believe in a fact, then we believe that we do not believe it

B(bird(X)) ⋀ ¬B(¬fly(X)) → fly(X)

If I believe that X is a bird and if I don’t believe that X cannot fly, then I will conclude that X flies

8.1 Autoepistemic Logic

(52)

• A belief theory T describes the knowledge base

A restricted belief interpretation of T is a set of belief atoms I such that for each B(F) appearing in T either B(F) ∈ I or ¬B(F) ∈ I (but not both)

A restricted belief model of T is a belief interpretation I such that T ⋃ I is consistent

8.1 Autoepistemic Logic

(53)

• Again expansions to the theory can be derived

Since all belief atoms have to be either true or false, the theory can be treated like propositional formulae

In particular, checking whether T entails F can be done using the rules of the propositional calculus

In order for an initial assumption to be an expansion, it must hold that F is entailed iff B(F) has been initially assumed true

8.1 Autoepistemic Logic
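For the flying-birds rule above, this expansion check can be carried out by brute force; a minimal sketch (ground atoms only; the theory's belief atoms are B(bird) and B(¬fly), written B(~fly) in code, and propositional entailment is decided by enumerating the four worlds over bird and fly):

from itertools import product

def entails(clauses, formula):
    # T entails F iff F holds in every world satisfying all clauses
    for bird, fly in product((False, True), repeat=2):
        w = {"bird": bird, "fly": fly}
        if all(c(w) for c in clauses) and not formula(w):
            return False
    return True

def objective_part(I):
    # reduce the theory under belief assignment I:
    # fact bird(tweety), and rule B(bird) ⋀ ¬B(¬fly) → fly
    clauses = [lambda w: w["bird"]]
    if I["B(bird)"] and not I["B(~fly)"]:
        clauses.append(lambda w: w["fly"])
    return clauses

for b1, b2 in product((False, True), repeat=2):
    I = {"B(bird)": b1, "B(~fly)": b2}
    T = objective_part(I)
    # stability: F is entailed iff B(F) was assumed true
    stable = (entails(T, lambda w: w["bird"]) == b1 and
              entails(T, lambda w: not w["fly"]) == b2)
    if stable:
        print("stable expansion:", I)   # believes bird, not ¬fly => fly is concluded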

(54)

Bayesian belief networks are used to represent a set of random variables and their conditional probabilities

Introduced by Judea Pearl (UCLA) in 1985

• The networks explicitly model the independence relationships in the data

These independence relationships can then be used to make probabilistic inferences

8.3 Bayesian Belief Networks

(55)

• Bayesian networks are directed acyclic graphs whose nodes represent random variables

Edges represent the direct (causal) influence between variables

Missing edges encode conditional independencies between the variables

What causes toothaches?

Has flu anything to do with it?

8.3 Bayesian Belief Networks

[Figure: example network with nodes flu, cavities, periodontosis, toothache, and gum bleeding]

(56)

Nodes are annotated with (conditional) probabilities

Root nodes are assigned prior probability distributions

Child nodes are assigned conditional probability tables with respect to their parents

8.3 Bayesian Belief Networks

[Figure: the toothache network annotated with priors P(has_flu), P(has_cavities), P(has_periodontitis) at the root nodes; the toothache node carries P(toothache | ±has_cavities, ±has_periodontitis) for all four parent combinations, and the gum bleeding node carries P(gum bleeding | has_periodontitis) and P(gum bleeding | ¬has_periodontitis)]

(57)

• What is the full joint distribution?

P(X1, X2, ..., Xn)
= P(X1) * P(X2, X3, ..., Xn | X1)
= P(X1) * P(X2 | X1) * P(X3, X4, ..., Xn | X1, X2)
= ...
= P(X1) * P(X2 | X1) * P(X3 | X1, X2) * ... * P(Xn | X1, ..., Xn-1)

Note that we did not use any independence assumption here

8.3 Bayesian Belief Networks

(58)

• Now, use the semantics of Bayesian belief networks (local Markov property)

Let X1, …, Xn be an ordering of the nodes such that only the nodes that are indexed lower than i may have a directed path to Xi

The full joint distribution can now be defined as the product of the local conditional distributions

P(X1, …, Xn) = Π(1 ≤ i ≤ n) P(Xi | Parents(Xi))

Note that all these probabilities are available in the network

8.3 Bayesian Belief Networks

(59)

• For example, what is the joint probability that somebody has periodontitis and toothache, but no cavities?

P(has_periodontitis, ¬has_cavities, toothache)
= P(has_periodontitis) * P(¬has_cavities) * P(toothache | ¬has_cavities, has_periodontitis)

8.3 Bayesian Belief Networks
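By the local Markov property this joint is just a three-factor product; a quick numeric sketch (the probability values are invented for illustration, as the slides give none):

P_perio, P_cav = 0.10, 0.15   # assumed priors
P_tooth = {                   # assumed P(toothache | cavities, periodontitis)
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.6, (False, False): 0.05,
}

# P(has_periodontitis, ¬has_cavities, toothache)
print(P_perio * (1 - P_cav) * P_tooth[(False, True)])   # 0.10 * 0.85 * 0.6 = 0.051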

(60)

• Given a Bayesian network and its conditional probability tables, we can compute all probabilities of the form P(H | X1, X2, …, Xn)

Where H and X1, X2, ..., Xn are assignments to nodes (i.e. random variables) in the network

H is the hypothesis we are interested in

X1, X2, ..., Xn are the influences

• By being conditionally dependent on their parents, beliefs are propagated through the network

8.3 Bayesian Belief Networks

(61)

Inferring causal or diagnostic information can be done using the joint probability distributions

E.g., what is the probability that somebody has cavities given that he/she suffers from toothache?

Can be evaluated using the conditional probability formula:

P(has_cavities | toothache)
= P(has_cavities, toothache) / P(toothache)

8.3 Bayesian Belief Networks
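Such diagnostic queries can be answered by enumeration: sum the joint distribution over the unobserved variables and divide. A self-contained sketch with the same invented numbers as before:

from itertools import product

P_cav, P_perio = 0.15, 0.10   # assumed priors
P_tooth = {(True, True): 0.9, (True, False): 0.7,
           (False, True): 0.6, (False, False): 0.05}

def joint(cav, perio, tooth):
    # product of the local conditional distributions
    p = (P_cav if cav else 1 - P_cav) * (P_perio if perio else 1 - P_perio)
    pt = P_tooth[(cav, perio)]
    return p * (pt if tooth else 1 - pt)

# P(has_cavities | toothache) = P(has_cavities, toothache) / P(toothache),
# marginalizing periodontitis out by enumeration
num = sum(joint(True, pe, True) for pe in (False, True))
den = sum(joint(c, pe, True) for c, pe in product((False, True), repeat=2))
print(num / den)   # ≈ 0.55 with the assumed numbers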

(62)

• A Bayesian belief network for breast cancer diagnosis

8.3 Example: Medicine

by Charles Kahn, Linda M. Roberts, Kun Wang, Deb Jenks, Peter Haddawy (1995)

(63)

• More reasoning

Fuzzy logic and possibilistic systems

Case-based reasoning

• Heuristic reasoning

8 Next Lecture
