Inference in Belief Networks
Stefan Edelkamp
Universität Bremen
May 6, 2015
Overview
Bayes Theorem
Notation: P(x) is short for P(X = x).
Theorem (Bayes, 1763):
P(h) = prior probability of hypothesis h
P(d) = prior probability of training data d
P(h|d) = probability of h given d
P(d|h) = probability of d given h

P(h|d) = P(d|h) · P(h) / P(d)

Proof: By definition of conditional probability,
P(h|d) := P(h ∩ d) / P(d) = P(d|h) · P(h) / P(d).
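A minimal numeric sketch in Python (all numbers hypothetical, not from the lecture): computing the posterior P(h|d) from an assumed prior and likelihood via the theorem above.

    # Hypothetical numbers: 2% base rate, 90% sensitivity, 10% false-positive rate
    p_h = 0.02              # prior P(h)
    p_d_given_h = 0.90      # likelihood P(d|h)
    p_d_given_not_h = 0.10  # P(d|not h)

    # Total probability: P(d) = P(d|h)*P(h) + P(d|not h)*P(not h)
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

    # Bayes: P(h|d) = P(d|h) * P(h) / P(d)
    p_h_given_d = p_d_given_h * p_h / p_d
    print(p_h_given_d)      # ~ 0.155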
Warm-Up: Naive Bayes Classifier
Each instance is described by attributes a1, ..., an. Most probable value:

v* = argmax_{vj ∈ V} P(vj | a1, ..., an)
   = argmax_{vj ∈ V} P(a1, ..., an | vj) · P(vj) / P(a1, ..., an)
   = argmax_{vj ∈ V} P(a1, ..., an | vj) · P(vj)

Naive Bayes assumption P(a1, ..., an | vj) = ∏_i P(ai | vj) yields

v* = argmax_{vj ∈ V} P(vj) · ∏_i P(ai | vj)

Example: P(y) P(sun|y) P(wind|y) < P(n) P(sun|n) P(wind|n)
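A minimal sketch of this decision rule in Python, assuming the priors and per-attribute likelihoods are given as tables (all numbers hypothetical, chosen so the example inequality above holds):

    prior = {'y': 0.6, 'n': 0.4}
    likelihood = {
        'y': {'sun': 0.7, 'wind': 0.3},
        'n': {'sun': 0.4, 'wind': 0.8},
    }

    def classify(attributes):
        # v* = argmax_v P(v) * prod_i P(a_i | v)
        def score(v):
            s = prior[v]
            for a in attributes:
                s *= likelihood[v][a]
            return s
        return max(prior, key=score)

    # P(y)P(sun|y)P(wind|y) = 0.126 < P(n)P(sun|n)P(wind|n) = 0.128
    print(classify(['sun', 'wind']))  # -> 'n'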
Overview
Belief Networks
Belief/Bayesian Network (BN): graphical representation of causality and independence using conditional probability tables (CPTs).
BNs provide a way to structure knowledge and to exploit that structure for computational gain.
Definition
Definition: BN = (V, A, P), where V are the vertices, A the adjacencies (arcs), and P a compact representation of the joint probability distribution over all variables.
A BN is fully defined by the graph (V, A) plus the CPTs P(yi | φ(Yi)), where φ(Yi) denotes the immediate predecessors of Yi in the graph, i.e.,

P(y1, ..., yn) = ∏_{i=1}^{n} P(yi | φ(Yi))

Naive BN: input nodes attached to one output node.
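A sketch of this factored joint for a hypothetical two-node network Rain → WetGrass (names and numbers are illustrative assumptions, not from the lecture):

    # CPTs: P(Rain) and P(WetGrass | Rain), keyed by (value, parent value)
    p_rain = {True: 0.2, False: 0.8}
    p_wet_given_rain = {
        (True, True): 0.9,  (False, True): 0.1,
        (True, False): 0.1, (False, False): 0.9,
    }

    def joint(rain, wet):
        # Product over the graph: P(rain, wet) = P(rain) * P(wet | rain)
        return p_rain[rain] * p_wet_given_rain[(wet, rain)]

    print(joint(True, True))  # 0.2 * 0.9 = 0.18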
Inference in Bayesian Networks
How can one infer the (probabilities of) values of one or more network variables, given observed values of others?
The BN contains all the information needed for this inference. If only one variable has an unknown value, it is easy to infer. In the general case, the problem is hard!
In practice
Monte Carlo methods "simulate" the network randomly to calculate approximate solutions.
Exact inference methods work well for some network structures, e.g., polytrees (at most one undirected path between any two nodes).
Overview
Inventor
Judea Pearl, UCLA, Turing Award Winner
Hardness Results
Gregory F. Cooper (Stanford)
The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks
Bucket Elimination
Rina Dechter, UCI, AAAI Fellow
Bucket Elimination: A Unifying Framework for Probabilistic Inference
Overview
SAT
In SAT we are given a formula f as a conjunction of clauses over the literals {x1, ..., xn} ∪ {¬x1, ..., ¬xn}.
The task is to search for an assignment a = (a1, ..., an) ∈ {0,1}^n for x1, ..., xn such that f(a) = true.
Theorem (Cook, 1971): SAT is NP-complete.
In 3-SAT the instances consist of clauses of the form l1 ∨ l2 ∨ l3, with li ∈ {x1, ..., xn} ∪ {¬x1, ..., ¬xn}.
Theorem (Garey/Johnson, 1979): 3-SAT is NP-complete.
Probabilistic Inference is NP-hard
W.l.o.g. assume Boolean (propositional) variables.
Typically, PI in BNs means calculating P(Exp1 | Exp2), where Expi is a conjunction of (instantiated) random variables.
Example: P(X = T | Y = T ∧ Z = F)
Most restricted decision problem PIBN: is P(Y = T) > 0?
Theorem (Cooper): PIBN is NP-hard.
Towards a Proof
Let C = {c1, ..., cm} be a set of 3-SAT clauses over {u1, ..., un}. We construct a belief network for which we show:
P(Y = T) > 0 if and only if C is satisfiable.
Belief-Network Structure
Probabilities
Truth-Setting Component: one for each variable
P(u1 = T) = P(u2 = T) = ... = P(un = T) = 1/2
Clause-Satisfaction Testing Component: one for each clause, w.l.o.g. ux ∨ uy ∨ uz
Variables: ux, uy, uz, Cj
Adjacencies: {(ux, Cj), (uy, Cj), (uz, Cj)}
Conditional Probabilities:
P(Cj = T | ux = F, uy = F, uz = F) = 0
P(Cj = T | ux = F, uy = F, uz = T) = 1
...
P(Cj = T | ux = T, uy = T, uz = T) = 1
Overall-Satisfaction Testing Component
Variables: Y
Adjacencies: link from each Cj to Y
Conditional Probabilities: P(Y = T | C1 = T, ..., Cm = T) = 1 and P(Y = T | ·) = 0 otherwise (Y is the conjunction of all Cj).
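A brute-force sketch of why the construction works (my enumeration for illustration, not Cooper's algorithm): summing the deterministic network over all 2^n assignments, each with prior (1/2)^n, gives P(Y = T) > 0 exactly when some assignment satisfies all clauses.

    from itertools import product

    # Hypothetical 3-SAT instance over u1..u3; a literal is (index, positive?)
    clauses = [[(1, True), (2, False), (3, True)],
               [(1, False), (2, True), (3, True)]]

    def p_y_true(n, clauses):
        total = 0.0
        for bits in product([False, True], repeat=n):  # truth-setting: prior (1/2)^n
            # Cj = T iff its clause is satisfied; Y = T iff all Cj = T
            sat = all(any(bits[i - 1] == pos for i, pos in c) for c in clauses)
            total += 0.5 ** n if sat else 0.0
        return total

    print(p_y_true(3, clauses) > 0)  # True iff the formula is satisfiable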
Overview
Bucket Elimination (BE)
Given:
BN structure (V, A, P),
ordering π on the n variables with attached CPTs,
evidence nodes Ei with values ei,
query node X.
Algorithm to compute P(X = x | E1 = e1, ..., En = en) for all x:
Create n+1 buckets: b∅ and one bucket bi per variable Xi.
Store each CPT in the bucket of its highest-indexed variable (w.r.t. π).
Process the buckets from highest index down, eliminating the associated variable in each; a bucket-creation sketch follows below.
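A minimal sketch of the bucket setup, assuming each CPT is a pair (scope, table) and the ordering π is a list of variable names (all names hypothetical; the extra bucket b∅ for fully-instantiated factors is omitted):

    def make_buckets(order, cpts):
        # One bucket per variable in the elimination ordering pi;
        # each CPT goes into the bucket of its highest-ranked variable.
        rank = {v: i for i, v in enumerate(order)}
        buckets = {v: [] for v in order}
        for scope, table in cpts:
            highest = max(scope, key=rank.get)
            buckets[highest].append((scope, table))
        return buckets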
Algorithm
3 operators:
Join (combine 2 CPTs): h(x) = (f ⊗ g)(x) = f(x | Vars(f)) × g(x | Vars(g))
Eliminate (remove a variable): elim_X[f](y) = Σ_x f(Y = y, X = x)
Project (delete a variable): h_{X,−Y}(x, y) = h_X(x)
Loop:
1 Project the evidence into the CPTs
2 Process the buckets from highest to lowest:
1 g_X = elim_X[f_{X,1} ⊗ f_{X,2} ⊗ ... ⊗ f_{X,k}] is a function of ∪_i Vars(f_{X,i}) \ {X}
2 Store g_X into bucket b_Y, where Y is the highest-indexed variable in Vars(g_X)
Example: [Blackboard]
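A sketch of the two core operators on factors, each represented as a pair (scope, table) over Boolean variables; the usage line reuses the hypothetical Rain → WetGrass network from above:

    from itertools import product

    def join(f, g):
        # (f ⊗ g): pointwise product over the union of the two scopes
        fv, ft = f
        gv, gt = g
        hv = fv + tuple(v for v in gv if v not in fv)
        ht = {}
        for vals in product([False, True], repeat=len(hv)):
            a = dict(zip(hv, vals))
            ht[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
        return hv, ht

    def eliminate(f, x):
        # elim_x[f]: sum variable x out of the scope of f
        fv, ft = f
        hv = tuple(v for v in fv if v != x)
        ht = {}
        for vals, p in ft.items():
            key = tuple(v for name, v in zip(fv, vals) if name != x)
            ht[key] = ht.get(key, 0.0) + p
        return hv, ht

    f_rain = (('R',), {(True,): 0.2, (False,): 0.8})
    f_wet = (('W', 'R'), {(True, True): 0.9, (False, True): 0.1,
                          (True, False): 0.1, (False, False): 0.9})
    # P(W) = elim_R[f_rain ⊗ f_wet]  ->  P(W=T) = 0.26
    print(eliminate(join(f_rain, f_wet), 'R'))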
Overview
Learning of Bayesian Networks
Learning (LEARN): three cases
structure known and all variables observed ⇒ as easy as training a Naive Bayes classifier
structure known, variables partially observable ⇒ learn the network's CPTs using gradient ascent
structure not known ⇒ greedy search to add/remove edges and nodes
Further Inference Tasks
Most Probable Explanation (MPE): most likely assignment to all hidden variables given the evidence
Algorithm: like BE, but replace Σ by max (see the sketch after this list)
Complexity: NP-complete for general graphs, easy for polytrees
Maximum a Posteriori (MAP): most likely assignment to some hidden variables given the evidence
Complexity: NP^PP-complete for general graphs, NP-hard for polytrees
Value of Information: which evidence to seek?
Sensitivity Analysis: which variables are most critical?
Explanation: why does this happen?
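In the factor sketch above, the MPE variant only changes the eliminate operator, replacing the sum by a max (same hypothetical factor format as before):

    def eliminate_max(f, x):
        # MPE: maximize x out instead of summing it out
        fv, ft = f
        hv = tuple(v for v in fv if v != x)
        ht = {}
        for vals, p in ft.items():
            key = tuple(v for name, v in zip(fv, vals) if name != x)
            ht[key] = max(ht.get(key, 0.0), p)
        return hv, ht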
Overview
Wrap-up and Outlook
Conclusion
Bayesian a.k.a. Belief Network represents a joint probability function compactly
natural model of causality; Naive BN is a special case
PI is hard; MPE, MAP, and LEARN as well; BE is efficient in practice
Outlook
BE is linear (in the size of the CPTs) for chains/trees/polytrees, exponential in the size of the largest bucket
BE for PI and MAP more efficient with pruning rules (e.g., prune all