Wolf-Tilo Balke Christoph Lofi
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Knowledge-Based Systems
and Deductive Databases
9. Deduction with Uncertainty
9.1 Uncertain Knowledge
9.2 Probabilistic Reasoning
9.3 Belief Networks
• Transform the program to relational algebra
– Easy… just use the rules from the lecture
– Eval(R) = S
– Eval(P) = Q ∖ (Q ⋉(Q.#1 = R.#1 ⋀ Q.#2 = R.#2) R) ⋃ Q ∖ (Q ⋉(Q.#1 = S.#1 ⋀ Q.#2 = S.#2) S) ⋃ π#1, #2(σ#1 = #2(Q × Q))
• Compute the fixpoint
– R = {(1, 3)}
– P = {(1, 2), (2, 3), (1, 3)}
Exercise 1.1&1.2
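• The evaluation loop itself is easy to sketch in code: apply every rule to the current fact sets until nothing new can be derived. A minimal naive-evaluation sketch in Python – the exercise's actual rules are not repeated on this slide, so a hypothetical transitive-closure rule stands in for them:

# Sketch of naive fixpoint evaluation for a Datalog program.
def naive_fixpoint(edb, rules):
    """edb: dict relation -> set of tuples; rules: functions mapping the
    current database to a set of (relation, tuple) facts."""
    db = {rel: set(facts) for rel, facts in edb.items()}
    changed = True
    while changed:                       # iterate until fixpoint
        changed = False
        for rule in rules:
            for rel, fact in rule(db):
                if fact not in db.setdefault(rel, set()):
                    db[rel].add(fact)
                    changed = True
    return db

# Hypothetical rule: P(x, z) :- P(x, y), P(y, z)   (transitive closure)
def transitivity(db):
    return {('P', (x, z))
            for (x, y1) in db.get('P', set())
            for (y2, z) in db.get('P', set()) if y1 == y2}

print(naive_fixpoint({'P': {(1, 2), (2, 3)}}, [transitivity]))
# -> P = {(1, 2), (2, 3), (1, 3)}, matching the fixpoint above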
• WITH trip (from, to, price) AS (
    SELECT from, to, price FROM flight
    UNION ALL
    SELECT f.from, t.to, f.price + t.price
    FROM flight f, trip t
    WHERE f.to = t.from AND f.price + t.price < 400)
  SELECT from, to, price
  FROM trip
  WHERE from IN (SELECT code FROM airport WHERE city = 'Berlin')
    AND to IN (SELECT code FROM airport WHERE city = 'Stuttgart')
Exercise 2.1
• WITH trip (from, to, price) AS (
    SELECT from, to, price FROM flight
    UNION ALL
    SELECT f.from, t.to, f.price + t.price
    FROM flight f, trip t
    WHERE f.to = t.from AND f.code != 'lh113'
      AND f.price + t.price < 400)
  SELECT from, to, price FROM trip
  WHERE from IN (SELECT code FROM airport WHERE city = 'Berlin')
    AND to IN (SELECT code FROM airport WHERE city = 'Stuttgart')
Exercise 2.2
• WITH trip (from, to, price) AS (
    SELECT from, to, price FROM flight
    UNION ALL
    SELECT f.from, t.to, f.price + t.price
    FROM flight f, trip t
    WHERE f.to = t.from AND f.price + t.price < 1000)
  SELECT p1 + p2 FROM
    (SELECT min(price) AS p1 FROM trip
     WHERE from = 'MUC' AND to = 'STR'),
    (SELECT min(price) AS p2 FROM trip
     WHERE from = 'STR' AND to = 'HAM')
Exercise 2.3
• WITH fourports (a, b, c, d, p) AS (
    SELECT f1.from, f2.from, f3.from, f3.to,
           f1.price + f2.price + f3.price
    FROM flight f1, flight f2, flight f3
    WHERE f1.to = f2.from AND f2.to = f3.from
      AND f1.from != f3.from AND f2.from != f3.to)
  SELECT * FROM fourports
  WHERE p IN (SELECT min(p) FROM fourports)
Exercise 2.4
• We have discussed ways of deriving new facts from other (ground) facts
– But often several rules can lead to a certain fact, and we cannot be sure which one it was
• A patient experiences toothache: what is the reason?
– Sometimes a certain fact might be derivable from ground facts only in certain cases
• A normal bird can fly, except for penguins, ostriches,…
9.1 Uncertainty
• Typical sources of imperfect information in deductive databases are…
– Incomplete information
• Information is simply missing, which might clash with the closed world assumption
– Imprecise information
• The information needed has only been specified in a vague way, e.g., a person is young: young(Tim).
• Queries about Tim's age are difficult to answer: ?age(Tim, 67) is false, but what about ?age(Tim, 25)?
– Uncertain information
• A deduction is not always correct, e.g., the question whether a bird can fly: fly(X) :- bird(X).
• What about penguins, dead birds, or birds with clipped wings?
9.1 Uncertainty
• Consider an expert system for dentists
– All possible causes for toothaches are contained in a database and the reason should be deduced
– cavities(X) :- toothache(X).
periodontosis(X) :- toothache(X).
• Not very helpful, since all possible causes are listed. Thus, all rules fire…
– cavities(X) :- toothache(X), ¬periodontosis(X).
periodontosis(X) :- toothache(X), ¬cavities(X).
• Not very helpful either, because now we need to disprove all alternatives before any rule fires…
• Remember the assumption of 'negation as failure'
9.1 Uncertainty
• But how do dentists deal with the problem?
– Like in our second program look for positive or negative clues
• e.g., bleeding of gums,…
• Still, how does a dentist know what to look for?
– What are probable causes?
– What are possible causes?
– Knowing the patient, what is the (subjective) judgement?
9.1 Uncertainty
• Basic idea: assign a measure of validity to each rule or statement and propagate this measure
through the deduction process
– Probabilistic truth values
• Use statistics: how often are cavities the reason, and how often is periodontosis?
• Leads to a probability distribution over possible worlds
– Possibility values
• What are possible causes, and to what degree do they cause toothache?
• Leads to a possibility distribution over possible worlds
– Belief values
• Lead to belief networks with facts that may influence each other
– …
9.1 Uncertainty
• Dealing with uncertainty usually requires an open world assumption
– Facts not stated in the database may or may not be false
• But the reasoning gets more difficult
– Remember our discussion about the existence of several minimal models in Datalog¬
– The reasoning process is no longer monotonic
• Introduction of new knowledge might lead to a revision (and sometimes refutation) of previously derived facts
9.1 Uncertainty
• Non-monotonic reasoning considers that statements once considered true may have to be revised in the light of new facts
– Tweety is a bird.
• Can Tweety fly? Yes!
– Tweety is a bird. Tweety is 2.5 meters tall.
• Can Tweety fly? No!
– The introduction of a new fact has challenged the general rule that birds can fly
• Only ostriches reach a height of 2.5 meters!
9.1 Non-Monotonic Reasoning
• There are several classical approaches to dealing with the problem
– Default logic
– Predicate circumscription
– Autoepistemic reasoning
– …
9.1 Non-Monotonic Reasoning
• Default logic was proposed by Raymond Reiter (University of Toronto) in 1980
– Can express logical facts like
'by default, something is true'
– Basically a default theory consists of two parts D and W
• W is a set of first order logical formulae known to be true
• D is a set of default rules of the form
prerequisite : justification1, …, justificationn
――――――――――――――――――――――――――
conclusion
9.1 Non-Monotonic Reasoning
– prerequisite : justification1, …, justificationn
――――――――――――――――――――――――――
conclusion
– If we believe the prerequisite to be true, and each justificationi is consistent with our current beliefs, we are led to believe that the conclusion is true
• Example: bird(X) : fly(X) / fly(X) with W = {bird(condor), bird(penguin), fly(eagle), ¬fly(penguin)}
• fly(condor) is true by default, since it is a bird and we have no justification to believe otherwise
• But fly(penguin) cannot be derived here: although bird(penguin) is true, we know that the justification is false
• Neither can we deduce bird(eagle), which would be abduction
9.1 Non-Monotonic Reasoning
• A common default assumption is the closed world assumption
true : ¬F
――――
¬F
• The semantics of default logics is again based on fixpoints
– Use set W as initial theory T
– Add to a theory T every fact that can be deduced by using any of the default rules in D, so-called extensions to the theory T
– Repeat until nothing new can be deduced
– If T is consistent with all justifications of the default rules used to derive any extension, output T
9.1 Non-Monotonic Reasoning
• The last check in the algorithm is necessary to avoid inconsistent theories
– i.e. something has been deduced using a justification that was later proven to be false
– E.g., consider the default rule with W := Ø
true : A(X)
――――
¬A(X)
• Since A(X) is consistent with W, we may conclude ¬A(X), which however is inconsistent with the previously assumed justification A(X)
• In this case the theory simply has no extensions
9.1 Non-Monotonic Reasoning
• Interestingly, the semantics is non-deterministic
– The deduced theory may depend on the sequence in which defaults are applied
• Example: D := { bird(X) : fly(X) / fly(X),  penguin(X) : ¬fly(X) / ¬fly(X) }
with W = {bird(Tweety), penguin(Tweety)}
• Starting with W both default rules are applicable
• If we use the first rule, the extension fly(Tweety) would be added, and the second default rule is no longer applicable
• In case we apply the second rule first, the extension would be ¬fly(Tweety)
9.1 Non-Monotonic Reasoning
• Entailment of a formula from a default theory can be defined in two ways
– Skeptical entailment
• A formula is entailed by a default theory if it is entailed by all its extensions
– Credulous entailment
• A formula is entailed by a default theory if it is entailed by at least one of its extensions
– For example, our Tweety theory has two extensions, one in which Tweety can fly and one in which he cannot fly
• Neither extension is skeptically entailed
• Both of them are credulously entailed
9.1 Non-Monotonic Reasoning
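• For small propositional default theories, the fixpoint semantics and both entailment modes can be checked by brute force. A minimal Python sketch of the Tweety theory above (illustrative only: the predicates are propositionalized, and the final justification re-check of the algorithm is folded into the application step):

# Literals are strings; '-' prefixes negation.
def neg(l):
    return l[1:] if l.startswith('-') else '-' + l

def consistent(facts):
    return all(neg(f) not in facts for f in facts)

def extensions(W, defaults):
    """Try every application order of the defaults; a default fires if its
    prerequisite is in the theory and no justification is contradicted."""
    results = []
    def apply(theory, remaining):
        fired = False
        for i, (pre, justs, concl) in enumerate(remaining):
            if pre in theory and all(neg(j) not in theory for j in justs):
                fired = True
                apply(theory | {concl}, remaining[:i] + remaining[i+1:])
        if not fired and consistent(theory) and theory not in results:
            results.append(theory)
    apply(set(W), defaults)
    return results

W = ['bird', 'penguin']
D = [('bird', ['fly'], 'fly'),         # bird : fly / fly
     ('penguin', ['-fly'], '-fly')]    # penguin : -fly / -fly

exts = extensions(W, D)
print(exts)                       # two extensions: one with fly, one with -fly
print(set.intersection(*exts))    # skeptical: only bird and penguin
print(set.union(*exts))           # credulous: fly and -fly both entailed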
• Predicate circumscription was introduced by John McCarthy (Stanford University) in 1978
– Inventor of LISP and the 'space fountain'
– Basically, circumscription tries to formalize the common sense assumption that things are as expected, unless specified otherwise
9.1 Non-Monotonic Reasoning
• Consider the problem whether Tweety can fly, if we assume that Tweety is a penguin…
– Sure, Tweety can fly,…
…because he takes a helicopter!
– This solution is intuitively not valid, since no helicopter was mentioned in our facts
– Of course we could exclude all possible ways to fly in our program, but…
9.1 Non-Monotonic Reasoning
• Circumscription is a rule of conjecture that can be used for jumping to certain conclusions
– The objects that can be shown to have a certain property P by reasoning from certain facts A are all the objects that satisfy P
• More generally, circumscription can be used to conjecture that the substitutions that can be shown to satisfy a predicate are all the tuples satisfying this predicate
– Thus, the set of relevant tuples is circumscribed
9.1 Non-Monotonic Reasoning
• Example: by circumscription a bird can be
conjectured to fly unless something prevents it
– The only entities that can prevent the bird from flying are those whose existence follows from the facts
• If no clipped wings, being a penguin or other circumstances preventing flight are deducible, then the bird is concluded to fly
• Basically, this can be done by adding a predicate
¬abnormal(X) to all rules about flying
– The correctness of this conclusion depends on having taken into account all relevant facts when the
circumscription was made
9.1 Non-Monotonic Reasoning
• Circumscription therefore tries to derive all minimal models of a set of formulae
– If we have a predicate p(X1, …, Xn) then a model tells
whether the predicate is true for any possible substitution with terms for Xi
• The extension of p(X1, …, Xn) in a model is the set of substitutions for which p(X1, …, Xn) evaluates to true
– The circumscription of a formula is a minimization: only the least possible number of predicates is believed
• The circumscription of p(X1, …, Xn) in a formula is obtained by selecting only models with a minimal extension of p(X1, …, Xn)
9.1 Non-Monotonic Reasoning
• Example
– Consider a formula of the type A ⋀ (B ⋁ C) → D, like fly(X) :- bird(X), eagle(X).
fly(X) :- bird(X), condor(X).
• Obviously bird(X) has to be true in any model, but to be minimal only eagle(X) or condor(X) has to be true
• Hence there are two circumscriptions of the formula {bird(X), eagle(X)} and {bird(X), condor(X)}, but not {bird(X), eagle(X), condor(X)}
– Note that predicates are only evaluated as false if it is possible
• eagle(X) and condor(X) cannot both be false
9.1 Non-Monotonic Reasoning
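• For a single fixed individual X, the circumscription of the example above can be computed by brute-force enumeration. A minimal Python sketch (an illustration for this example, not a general circumscription procedure):

# Enumerate all models of bird(X) ∧ (eagle(X) ∨ condor(X)) and keep only
# those whose set of true atoms is minimal w.r.t. set inclusion.
from itertools import product

ATOMS = ('bird', 'eagle', 'condor')

def is_model(m):
    return m['bird'] and (m['eagle'] or m['condor'])

models = []
for values in product([False, True], repeat=len(ATOMS)):
    m = dict(zip(ATOMS, values))
    if is_model(m):
        models.append({a for a in ATOMS if m[a]})

# a model is minimal if no other model's true atoms are a strict subset
minimal = [m for m in models if not any(other < m for other in models)]
print(minimal)   # the two circumscriptions: {bird, eagle} and {bird, condor}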
• But sometimes circumscription handles disjunctive information incorrectly
– Toss a coin onto a chess board and consider the predicate lies_on(X, Y) describing where it lies
– There are several possibilities of models
• Obviously {lies_on(coin, floor)} should be false, since it was not mentioned that the coin could miss the board
• That leaves {lies_on(coin, white)}, {lies_on(coin, black)}, and {lies_on(coin, white), lies_on(coin, black)} for the overlapping case
– But the last model would be filtered out as not being minimal by circumscription
• One possibility to remedy this case is theory curbing, where iteratively the least upper bound(s) of the minimal models are also accepted as models
9.1 Non-Monotonic Reasoning
• Autoepistemic Logic was introduced by Robert C. Moore (Microsoft Research) in 1985
• Autoepistemic logic can not only express facts, but also knowledge and lack of knowledge about facts
• Formalizes non-monotonicity using statements with a belief operator B
– For every well-formed formula F, the 'belief atom' B(F) means that F is believed
– ¬B(F) means that F is not believed
9.1 Non-Monotonic Reasoning
• It uses the following axioms
– All propositional tautologies are axioms
– If we believe in B(X) :- A(X)., then whenever we believe in A(X), we also have to believe in B(X)
– Inconsistent conclusions are never believed, i.e. ¬B(false)
• It uses modus ponens as its inference rule
– Given a conditional claim A → B and the truth of the antecedent A, it can be logically concluded that the consequent B must be true as well
9.1 Non-Monotonic Reasoning
• This can be used to derive stable sets of sentences which are then believed
– i.e. the reflection of our own state of knowledge
• If we do not believe in a fact, then we believe that we do not believe it
– B(bird(X)) ⋀ ¬B(¬fly(X)) → fly(X)
– If I believe that X is a bird, and I do not believe that X cannot fly, then I will conclude that X flies
9.1 Non-Monotonic Reasoning
• A belief theory T describes the knowledge base
– A restricted belief interpretation of T is a set of belief atoms I such that for each B(F) appearing in T, either B(F) ∈ I or ¬B(F) ∈ I (but not both)
– A restricted belief model of T is a belief interpretation I such that T ⋃ I is consistent
9.1 Non-Monotonic Reasoning
• Again, expansions of the theory can be derived
– Since all belief atoms have to be either true or false, the theory can be treated like a set of propositional formulae
– In particular, checking whether T entails F can be done using the rules of the propositional calculus
– In order for an initial assumption to be an expansion, it must hold that F is entailed iff B(F) has been initially assumed true
9.1 Non-Monotonic Reasoning
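• For small propositional theories, this expansion check can be run by brute force: enumerate all restricted belief interpretations and keep those in which each B(F) was assumed true exactly when F comes out entailed. A Python sketch for the bird/fly rule above, propositionalized to a single individual:

# Theory T: the fact bird, plus B(bird) ∧ ¬B(¬fly) → fly.
# Belief atoms b_bird and b_negfly are ordinary propositional variables.
from itertools import product

def models(belief):
    """All assignments to (bird, fly) satisfying T ∪ I for fixed beliefs."""
    b_bird, b_negfly = belief
    result = []
    for bird, fly in product([False, True], repeat=2):
        fact = bird                                    # W contains bird
        rule = (not (b_bird and not b_negfly)) or fly  # the belief rule
        if fact and rule:
            result.append((bird, fly))
    return result

def entails(belief, f):
    """T ∪ I entails f iff f holds in every model of T ∪ I."""
    ms = models(belief)
    return bool(ms) and all(f(bird, fly) for bird, fly in ms)

# stability: B(F) was assumed true  <=>  F is entailed
for belief in product([False, True], repeat=2):
    b_bird, b_negfly = belief
    stable = (b_bird == entails(belief, lambda bird, fly: bird) and
              b_negfly == entails(belief, lambda bird, fly: not fly))
    if stable and models(belief):
        print('expansion:', {'B(bird)': b_bird, 'B(-fly)': b_negfly})
# -> exactly one expansion: bird is believed, ¬fly is not, so fly follows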
• Probability theory deals with expressing the belief or knowledge that a certain event will or has occurred
• In general, there are two major factions among probability theorists
– Frequentist view:
• The probability of an event is its relative frequency of occurrence during a long-running random experiment
• Major supporters: Neyman, Pearson, Wald, …
9.2 Probability
– Bayesian view:
• Probabilities can be assigned to any event or statement whether it is part of a random process or not
• Probabilities thus express the degree of belief that a given event will happen
• Major supporters: Bayes, Laplace, de Finetti, …
• During the following slides, we will encounter both views
– …but the formal notation and theory are similar in both
9.2 Probability
• The probability of an event or statement A is given by P(A)
– P(A) ∈ [0,1]
– P(¬A):=1-P(A)
– Depending on your world view, a probability of P(A) = 0.8 may mean
• During a long-running random experiment, A was the outcome in 80% of all trials
• You have a strong belief (quantified by 0.8 of a maximum of 1) that A can / will happen
9.2 Probability
• Given two events A and B and assuming that they are statistically independent of each other,
probabilities may be combined
– P(A ⋀ B)= P(A) * P(B)
• also written P(A, B)
– e.g.
• P(isYellow(Tweety))=0.8 and P(canFly(Tweety))=0.2
⇒ P(isYellow(Tweety), canFly(Tweety)) = 0.16
9.2 Probability
• However, events are often not independent, thus we need conditional probabilities
– This is written as P(A | B)
• P(A | B) is the conditional probability of A given B
• P(A | B) := P(A ⋀ B) / P(B)
• e.g. P(canBark(X) | dog(X)) = 0.9
– Given that X is a dog, X can bark with a probability of 0.9
• Based on conditional probabilities, we can derive a simple deductive system
– Probabilistic rules:
• B ←P(B|A) A or B :-P(B|A) A
9.2 Probabilistic Reasoning
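• Numerically, the definition P(A | B) := P(A ⋀ B) / P(B) can be evaluated directly from joint frequencies. A small Python sketch with made-up counts (the numbers are purely illustrative):

# Hypothetical counts from observing 1000 animals:
n_total = 1000
n_dog = 200              # animals that are dogs
n_dog_and_bark = 180     # dogs that can bark

p_dog = n_dog / n_total
p_dog_and_bark = n_dog_and_bark / n_total
print(p_dog_and_bark / p_dog)   # P(canBark | dog) = 0.9, as in the example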
• Of course, we can also form deductive chains
• Example:
– dog(X) ←0.6 domestic_animal(X).
canBark(X) ←0.9 dog(X).
⊢ canBark(X) ←?? domestic_animal(X).
– So, assuming statistical independence between barking and domestic animals, we may conclude that the probabilities may be just multiplied, i.e.
canBark(X) ←0.54 domestic_animal(X).
9.2 Probabilistic Reasoning
• Unfortunately, this naïve approach breaks quickly
• Example:
– dog(X) ←0.6 domestic_animal(X).
canBark(X) ←0.9 dog(X).
⊢ canBark(X) ←0.54 domestic_animal(X).
– domestic_animal(X) ←1.0 cat(X).
⊢ canBark(X) ←0.54 cat(X).
• Cats can bark with 0.54 probability? Something is wrong…
– Problem:
• dog(X) ←0.6 domestic_animal(X) ←1.0 cat(X). ??
9.2 Probabilistic Reasoning
• Why can't we have any confidence about barking cats?
– Not enough information!
– We don't know about P(cat(X) | dog(X)), or P(bark(X) | cat(X)), …
9.2 Probabilistic Reasoning
dog(X) ←0.7 domestic_animal(X).
canBark(X) ←0.9 dog(X).
domestic_animal(X) ←1.0 cat(X).
canBark(X) ←?? domestic_animal(X).
canBark(X) ←?? cat(X).
[Venn diagrams: both extremes are consistent with these rules – one world where all cats bark and one where no cat barks]
• Given two events with their respective
probabilities, P(A)=α and P(B)=β, how could they be related, i.e. what is P(A ⋀ B) ?
a) A and B could be independent, and thus P(A ⋀ B) :=P(A) * P(B)
• e.g. P(isMonday(today)), P(cat(Garfield))
b) A and B could be mutually exclusive, thus P(A ⋀ B) := 0
• e.g. P(isMonday(today)), P(isTuesday(today))
c) A implies B, thus P(A ⋀ B) = P(A)
• e.g. P(isCat(X)), P(isAnimal(X))
9.2 Probabilistic Reasoning
d) There could also be no quantifiable relationship between P(A) and P(B)
• However, according to Boole, we can at least provide an interval which contains P(A ⋀ B)
• max(0, P(A)+P(B)-1) ≤ P(A ⋀ B) ≤ min(P(A), P(B))
– Those two boundaries are called T-Norms
– Minimum T-Norm: min(a, b) (also known as Gödel T-Norm)
– Łukasiewicz T-Norm: max(0, a+b−1)
• Example: P(A) = 0.33, P(B) = 0.23 ⇒ 0 ≤ P(A ⋀ B) ≤ 0.23
9.2 Probabilistic Reasoning
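• These bounds are trivial to compute. A small Python sketch:

# Boole's bounds on P(A ∧ B), given only P(A) and P(B).
def conjunction_bounds(pa, pb):
    lower = max(0.0, pa + pb - 1.0)   # Łukasiewicz T-Norm
    upper = min(pa, pb)               # Minimum (Gödel) T-Norm
    return lower, upper

print(conjunction_bounds(0.33, 0.23))   # -> (0.0, 0.23), as above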
[Venn diagrams: the three cases – P(A) and P(B) disjoint, overlapping, and nested]
– Obviously, there may also be many additional cases (like negative correlation, A implies B when C, etc.)
– However, if there is no further information available, upper/lower bound estimation is the only possible approach
• We should try to incorporate these results into our to-be- developed chaining rule
– Thus, we can conclude
• If there are no further properties known for A and B but their probabilities, their combined probability can only be described by an interval
9.2 Probabilistic Reasoning
• Confidence intervals also help to model probabilistic rules
– B ←(x1, x2) A iff 0 ≤ x1 ≤ P(B|A) ≤ x2 ≤ 1
• i.e. given A, the probability for B is between x1 and x2
• If x1 = x2, this can be abbreviated as B ←x1 A
• e.g. canBark(X) ← (0.8, 1.0) dog(X)
9.2 Probabilistic Reasoning
• Also, rules combined with their converse can be stated that way
– A ←(x1, x2) B and its converse B ←(y1, y2) A, denoted as A (y1, y2)↔(x1, x2) B
– e.g. domesticAnimal(X) (0.3, 0.3) ↔(1.0, 1.0) cat(X)
9.2 Probabilistic Reasoning
• The dominant reason for these flawed deductions is mixing causal rules with diagnostic rules
– Causal Rules: Relate a known cause to its effect
• A is the cause for B; A is given and B happened because of A
• e.g. groundIsWet ←1.0 sprinklerWasOn
– Diagnostic Rules: Try to relate an observable effect to its cause
• i.e. B ←0.2 A
• B is the cause for A, but just with a weaker probability / belief
• e.g. sprinklerWasOn ←0.3 groundIsWet
9.2 Probabilistic Reasoning
• Rule chaining along purely causal OR purely diagnostic rules works just fine
– groundIsWet ←1.0 sprinklerWasOn
youGetWetFeet ←0.97 groundIsWet
⊢ youGetWetFeet ←0.97 sprinklerWasOn
– groundIsWet ←0.2 youGotWetFeet
itRained ←0.9 groundIsWet
⊢ itRained ←0.18 youGotWetFeet
9.2 Probabilistic Reasoning
• But careful:
– Causal: groundIsWet ←1.0 sprinklerWasOn
– Diagnostic: itRained ←0.9 groundIsWet
– but not: itRained ←0.9 sprinklerWasOn
• Rain and the sprinkler are both causes of wet ground, but are otherwise unrelated
9.2 Probabilistic Reasoning
• Causal and diagnostic rules can be treated in pairs
– Diagnostic rules are the converse of causal rules
– groundIsWet ←1.0 sprinklerWasOn
groundIsWet 0.1→ sprinklerWasOn
Written as:
groundIsWet 0.1↔1.0 sprinklerWasOn
groundIsWet 0.9↔1.0 itRained
– Now, we need a heuristic for dealing with diagnostic and causal rules together
9.2 Probabilistic Reasoning
• Observation:
– Causal rules usually have a quite high probability:
B ←~1.0 A
• If probability was low, A is not really the cause for B
– Diagnostic rules usually have a lower probability:
A ←≪1.0 B
• i.e., B may be the effect of A, but it is usually also the effect of other causes
9.2 Probabilistic Reasoning
• Observation:
– So, the main syntactic difference between those rule types is the strength of belief in the deduction
– Consider bi-directional rules:
groundIsWet 0.1↔1.0 sprinklerWasOn
• ← is probably the causal rule: the sprinkler wets the ground for sure
• → is probably a diagnostic rule: there may be many other reasons for wet ground
– Thus, when chaining two rules with diverging probabilities, we probably mix diagnostic and causal rules
• A chaining rule needs a strong dampening factor for diverging probabilities
9.2 Probabilistic Reasoning
• A correct chaining rule can be given as follows:
– C (y1, y2)↔(x1, x2) B, B (v1, v2)↔(u1, u2) A ⊢ C ←(z1, z2) A
– z1 =
• (u1/v1) · max(0, v1 + x1 − 1), if v1 > 0
• u1, if v1 = 0 and x1 = 1
• 0, otherwise
– z2 =
• min(1, u2 + t·(1 − y1), 1 − u1 + t·y1, t) with t = (u2·x2)/(v1·y1), if v1 > 0 and y1 > 0
• min(1, 1 − u1 + (u2·x2)/v1), if v1 > 0 and y1 = 0
• 1 − u1, if v1 = 0 and x2 = 0
• 1, otherwise
Proof and derivation in:
9.2 Probabilistic Reasoning
• This chaining rule can be obtained by a lengthy proof within a deductive calculus
– …thus, it is correct
– Unfortunately, it is not really intuitively obvious what it does and how it works
• But we can try to find some rationales
– The chaining rule tries to 'play safe' by incorporating the T-Norms as a worst-case estimation
• Łukasiewicz T-Norm as safe lower bound, Minimum T-Norm as safe upper bound
9.2 Probabilistic Reasoning
• But it works:
– dog(X) 1.0↔0.7 domestic(X).
barks(X) ←0.9 dog(X).
domestic(X) 0.3↔1.0 cat(X).
– By using the chaining rule, we get
⊢ barks(X) ←(0.63, 0.93) domestic(X).
⊢ barks(X) ←(0.0, 1.0) cat(X).
– If now additional knowledge is added, the belief intervals change
dog(X) ←1.0 barks(X). (Only dogs bark)
dog(X) ←0.0 cat(X). (Cats are no dogs)
⊢ barks(X) 0.0↔0.0 cat(X). (No barking cats)
9.2 Probabilistic Reasoning
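• The case distinction of the chaining rule is mechanical enough to implement directly. A Python sketch (variable names as in the rule; an unknown converse defaults to the trivial interval [0, 1]), reproducing the example above:

# Chain C (y1,y2)<->(x1,x2) B and B (v1,v2)<->(u1,u2) A into C <-(z1,z2) A.
# Note that v2 and y2 do not occur in the bounds.
def chain(x1, x2, y1, y2, u1, u2, v1, v2):
    if v1 > 0:                          # lower bound z1
        z1 = u1 / v1 * max(0.0, v1 + x1 - 1.0)
    elif x1 == 1:
        z1 = u1
    else:
        z1 = 0.0
    if v1 > 0 and y1 > 0:               # upper bound z2
        t = (u2 * x2) / (v1 * y1)
        z2 = min(1.0, u2 + t * (1 - y1), 1 - u1 + t * y1, t)
    elif v1 > 0:                        # v1 > 0 and y1 == 0
        z2 = min(1.0, 1 - u1 + (u2 * x2) / v1)
    elif x2 == 0:                       # v1 == 0 and x2 == 0
        z2 = 1 - u1
    else:
        z2 = 1.0
    return z1, z2

# barks <-0.9 dog (converse unknown), dog 1.0<->0.7 domestic:
print(chain(x1=0.9, x2=0.9, y1=0.0, y2=1.0,
            u1=0.7, u2=0.7, v1=1.0, v2=1.0))    # -> (0.63, 0.93)
# barks <-(0.63, 0.93) domestic, domestic 0.3<->1.0 cat:
print(chain(x1=0.63, x2=0.93, y1=0.0, y2=1.0,
            u1=1.0, u2=1.0, v1=0.3, v2=0.3))    # -> (0.0, 1.0)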
• The chaining rule dampens all conclusions which seem to involve mixed causal/diagnostic chains
– Rule: C (y1, y2)↔(x1, x2) B, B (v1, v2)↔(u1, u2) A ⊢ C ←(z1, z2) A
– Known: rain 1.0↔0.9 wet (← diagnostic?, → causal) and wet 0.1↔1.0 sprinkler (← causal, → diagnostic)
– z1 takes a very low value and z2 a very high value if the rules seem of different type: here ⊢ rain ←(0.0, 1.0) sprinkler
9.2 Probabilistic Reasoning
• Let's try to perform a "safer" chaining
– Rule: C (y1, y2)↔(x1, x2) B, B (v1, v2)↔(u1, u2) A ⊢ C ←(z1, z2) A
– Known: wetFeet 0.2↔0.97 wetGround (← causal, → diagnostic) and wetGround 0.1↔1.0 sprinkler (← causal, → diagnostic)
– Result: wetFeet ←(0.7, 1.0) sprinkler
– Both z1 and z2 take higher values for purely causal chaining
9.2 Probabilistic Reasoning
• Summary: probabilistic deduction
– Chaining rules produce new rules which are only true within a certain confidence interval
– Non-monotonicity is reflected by adjusting those confidence intervals
– For computing the confidence intervals of a chain, the converse rules are considered
• Thus, the problem of chaining diagnostic and causal rules is solved implicitly
9.2 Probabilistic Reasoning
• Bayesian belief networks are used to represent a set of random variables and their conditional
probabilities
– Introduced by Judea Pearl (UCLA) in 1985
• The networks explicitly model the
independence relationships in the data
– These independence relationships can then be used to make probabilistic inferences
9.3 Bayesian Belief Networks
• Bayesian networks are directed acyclic graphs whose nodes represent random variables
– Edges represent the direct (causal) influence between variables
• Missing edges encode conditional independencies between the variables
– What causes toothaches? Does flu have anything to do with it?
9.3 Bayesian Belief Networks
[Figure: network with nodes flu, cavities, periodontosis, and toothache; cavities and periodontosis have edges to toothache, flu is not connected to toothache]
• Nodes are annotated with (conditional) probabilities
– Root nodes are assigned prior probability distributions
– Child nodes are assigned conditional probability tables with respect to
their parents
9.3 Bayesian Belief Networks
[Figure: the network annotated with probabilities – priors P(has_flu), P(has_cavities), P(has_periodontitis) at the root nodes; a conditional probability table P(toothache | has_cavities, has_periodontitis), P(toothache | has_cavities, ¬has_periodontitis), P(toothache | ¬has_cavities, has_periodontitis), P(toothache | ¬has_cavities, ¬has_periodontitis) at toothache; and P(gum bleeding | has_periodontitis), P(gum bleeding | ¬has_periodontitis) at the gum bleeding node]
• What is the full joint distribution?
– P(X1, X2, ..., Xn)
= P(X1) * P(X2, X3, ..., Xn | X1)
= P(X1) * P(X2 | X1) * P(X3, X4, ..., Xn | X1, X2)
= ...
= P(X1) * P(X2 | X1) * P(X3 | X1, X2) * ... * P(Xn | X1, ..., Xn-1)
– Note that we did not use any independence assumption here
9.3 Bayesian Belief Networks
• Now, use the semantics of Bayesian belief networks (local Markov property)
– Let X1, …, Xn be an ordering of the nodes such that only the nodes that are indexed lower than i may have a directed path to Xi
– The full joint distribution can now be defined as the product of the local conditional distributions
P(X1, …, Xn) = Π1≤ i ≤ n P(Xi | Parents(Xi))
• Note that all these probabilities are available in the network
9.3 Bayesian Belief Networks
• For example, what is the joint probability that somebody has periodontitis and toothache, but no cavities?
– P(has_periodontitis, ¬has_cavities, toothache)
= P(has_periodontitis) * P(¬has_cavities) *
P(toothache | ¬has_cavities, has_periodontitis)
9.3 Bayesian Belief Networks
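• In code, this factorization is just a product over the local tables. A Python sketch for the toothache network – the slides give no concrete numbers, so all probabilities below are made up for illustration:

from itertools import product

P_FLU, P_CAV, P_PERIO = 0.05, 0.10, 0.08      # priors (illustrative)
P_TOOTH = {(True, True): 0.95, (True, False): 0.70,   # P(tooth | cav, perio)
           (False, True): 0.60, (False, False): 0.05}
P_GUM = {True: 0.80, False: 0.03}             # P(gum bleeding | perio)

def joint(flu, cav, perio, tooth, gum):
    """P(X1, …, Xn) as the product of P(Xi | Parents(Xi))."""
    p = (P_FLU if flu else 1 - P_FLU)
    p *= (P_CAV if cav else 1 - P_CAV)
    p *= (P_PERIO if perio else 1 - P_PERIO)
    pt = P_TOOTH[(cav, perio)]
    p *= (pt if tooth else 1 - pt)
    pg = P_GUM[perio]
    p *= (pg if gum else 1 - pg)
    return p

# P(has_periodontitis, ¬has_cavities, toothache): flu and gum bleeding
# are unmentioned, so they are summed out (and sum to 1 here).
print(sum(joint(f, False, True, True, g)
          for f, g in product([False, True], repeat=2)))
# = P(perio) * P(¬cav) * P(tooth | ¬cav, perio) = 0.08 * 0.9 * 0.6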
• Given a Bayesian network and its conditional probability tables, we can compute all probabilities of the form P(H | X1, X2, …, Xn)
– Where H and X1, X2, ..., Xn are assignments to nodes (i.e. random variables) in the network
– H is the hypothesis we are interested in
– X1, X2, ..., Xn are the influences
• By being conditionally dependent on their parents, beliefs are propagated through the network
9.3 Bayesian Belief Networks
• Inferring causal or diagnostic information can be done using the joint probability distributions
– E.g., what is the probability that somebody has cavities given that he/she suffers from toothache?
– Can be evaluated using the conditional probability formula:
P(has_cavities | toothache)
= P(has_cavities, toothache) / P(toothache)
9.3 Bayesian Belief Networks
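• Both numerator and denominator are sums of entries of the full joint distribution, so the query can be answered by enumeration. A Python sketch, reusing joint() and the illustrative tables from the previous sketch:

from itertools import product

def prob(fixed):
    """Sum the joint over all assignments consistent with the dict fixed,
    e.g. prob({'tooth': True}) = P(toothache)."""
    names = ['flu', 'cav', 'perio', 'tooth', 'gum']
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in fixed.items()):
            total += joint(a['flu'], a['cav'], a['perio'],
                           a['tooth'], a['gum'])
    return total

# conditional probability formula: P(H | E) = P(H, E) / P(E)
p = prob({'cav': True, 'tooth': True}) / prob({'tooth': True})
print(p)   # P(has_cavities | toothache) under the illustrative numbers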
• A Bayesian belief network for breast cancer diagnosis
9.3 Example: Medicine
• More reasoning
– Fuzzy logic and possibilistic systems
– Case-based reasoning