(1)

Inference in Belief Networks

Stefan Edelkamp

Universität Bremen

May 6, 2015

(2)

Overview

(3)

Bayes Theorem

Notation: P(x) is short for P(X = x).

Theorem (Bayes, 1763):

    P(h | d) = P(d | h) · P(h) / P(d)

where

P(h) = prior probability of the hypothesis h
P(d) = prior probability of the training data d
P(h | d) = probability of h given d
P(d | h) = probability of d given h

Proof: By the definition of conditional probability,

    P(h | d) := P(h ∩ d) / P(d) = P(d | h) · P(h) / P(d)
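A quick numeric illustration (the numbers are invented for this sketch, not from the slides): applying the theorem to a hypothetical diagnostic test in Python.

```python
# Bayes' theorem on illustrative numbers: hypothesis h = "disease present",
# data d = "test positive". All values below are assumptions for the sketch.
p_h = 0.01           # prior P(h)
p_d_given_h = 0.95   # likelihood P(d | h)
p_d_given_not_h = 0.05

# Total probability: P(d) = P(d | h) P(h) + P(d | ~h) P(~h)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior: P(h | d) = P(d | h) P(h) / P(d)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"P(h | d) = {p_h_given_d:.3f}")  # ~0.161
```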

(4)

Warm-Up: Naive Bayes Classifier

Each instance is described by attributes a_1, ..., a_n. The most probable class value is

    v = argmax_{v_j ∈ V} P(v_j | a_1, ..., a_n)
      = argmax_{v_j ∈ V} P(a_1, ..., a_n | v_j) · P(v_j) / P(a_1, ..., a_n)
      = argmax_{v_j ∈ V} P(a_1, ..., a_n | v_j) · P(v_j)

The Naive Bayes assumption P(a_1, ..., a_n | v_j) = ∏_i P(a_i | v_j) yields

    v = argmax_{v_j ∈ V} P(v_j) · ∏_i P(a_i | v_j)

Example: P(y) P(sun | y) P(wind | y) < P(n) P(sun | n) P(wind | n), so the classifier predicts n.
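A minimal sketch of this classifier in Python, assuming a toy weather data set (the attribute values and labels are hypothetical, not from the slides):

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (attribute tuple, class label) pairs.
    Returns class counts and per-attribute value counts."""
    priors = Counter(label for _, label in examples)
    cond = defaultdict(Counter)  # (attribute position, label) -> value counts
    for attrs, label in examples:
        for i, a in enumerate(attrs):
            cond[(i, label)][a] += 1
    return priors, cond, len(examples)

def classify(attrs, priors, cond, n):
    # v = argmax_v P(v) * prod_i P(a_i | v), with frequency estimates
    def score(label):
        p = priors[label] / n
        for i, a in enumerate(attrs):
            p *= cond[(i, label)][a] / priors[label]
        return p
    return max(priors, key=score)

data = [(("sun", "weak"), "y"), (("sun", "strong"), "n"),
        (("rain", "weak"), "n"), (("sun", "weak"), "y")]
priors, cond, n = train_naive_bayes(data)
print(classify(("sun", "weak"), priors, cond, n))  # -> 'y'
```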

(5)

Overview

(6)

Belief Networks

Belief/Bayesian Network (BN): a graphical representation of causality and independence using conditional probability tables (CPTs).

BNs provide a way to structure knowledge and to exploit that structure for computational gain.

(7)

Definition

Definition: A BN is a triple (V, A, P), where V are the vertices, A the adjacencies, and P a compact representation of the joint probability distribution over all variables.

A BN is fully defined by the graph (V, A) plus the CPTs P(y_i | φ(Y_i)), where φ(Y_i) denotes the immediate predecessors of Y_i in the graph, i.e.,

    P(y_1, ..., y_n) = ∏_{i=1}^{n} P(y_i | φ(Y_i))

Naive BN: input nodes attached to one output node.
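A minimal sketch of this factorization, assuming a hypothetical two-node network Rain → WetGrass with invented CPT entries:

```python
# Joint probability as the product of CPT entries, per the factorization
# P(y_1, ..., y_n) = prod_i P(y_i | parents(Y_i)). Network and numbers
# are assumptions for this sketch.
cpt_rain = {True: 0.2, False: 0.8}     # P(Rain)
cpt_wet = {                            # P(WetGrass | Rain)
    True:  {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}

def joint(rain, wet):
    # P(rain, wet) = P(rain) * P(wet | rain)
    return cpt_rain[rain] * cpt_wet[rain][wet]

# Sanity check: the four joint entries sum to 1.
assert abs(sum(joint(r, w) for r in (True, False) for w in (True, False)) - 1) < 1e-9
print(joint(True, True))  # 0.2 * 0.9 = 0.18
```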

(8)

Inference in Bayesian Networks

How can one infer the (probabilities of) values of one or more network variables, given observed values of others?

The BN contains all information needed for this inference. If only one variable has an unknown value, it is easy to infer. In the general case, the problem is hard!

In practice:

Monte Carlo methods "simulate" the network randomly to calculate approximate solutions (a sketch follows below).

Exact inference methods work well for some network structures, e.g., polytrees (at most one undirected path connecting every two nodes).
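One common Monte Carlo variant is rejection sampling; the slides do not name a specific method, so the following sketch (on the hypothetical Rain → WetGrass network from above) is only an illustration:

```python
import random

def sample_network():
    # Forward-sample the hypothetical Rain -> WetGrass network:
    # P(Rain) = 0.2, P(Wet | Rain) = 0.9, P(Wet | ~Rain) = 0.1.
    rain = random.random() < 0.2
    wet = random.random() < (0.9 if rain else 0.1)
    return rain, wet

def rejection_sample(n=100_000):
    # Estimate P(Rain = T | Wet = T): discard samples that
    # disagree with the evidence, average over the rest.
    hits = kept = 0
    for _ in range(n):
        rain, wet = sample_network()
        if wet:
            kept += 1
            hits += rain
    return hits / kept

print(rejection_sample())  # exact value: 0.18 / 0.26 ≈ 0.692
```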

(9)

Overview

(10)

Inventor

Judea Pearl, UCLA, Turing Award Winner

(11)

Hardness Results

Gregory F. Cooper (Stanford)

The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks

(12)

Bucket Elimination

Rina Dechter, UCI, AAAI Fellow

Bucket Elimination: A Unifying Framework for Probabilistic Inference

(13)

Overview

(14)

SAT

In SAT we are given a formula f as a conjunction of clauses over the literals {x_1, ..., x_n} ∪ {¬x_1, ..., ¬x_n}.

The task is to search for an assignment a = (a_1, ..., a_n) ∈ {0, 1}^n for x_1, ..., x_n such that f(a) = true.

Theorem (Cook, 1971): SAT is NP-complete.

In 3-SAT the instances consist of clauses of the form l_1 ∨ l_2 ∨ l_3, with l_i ∈ {x_1, ..., x_n} ∪ {¬x_1, ..., ¬x_n}.

Theorem (Garey/Johnson, 1979): 3-SAT is NP-complete.
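A minimal brute-force satisfiability check, using the common convention of encoding literal x_i as the integer i and ¬x_i as -i (an assumption of this sketch, not notation from the slides):

```python
from itertools import product

def satisfiable(clauses, n):
    # Try all 2^n assignments; a clause is satisfied if any literal is true.
    for bits in product((False, True), repeat=n):
        a = {i + 1: b for i, b in enumerate(bits)}  # assignment for x_1..x_n
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# (x1 or x2 or not x3) and (not x1 or x3 or x2)
print(satisfiable([(1, 2, -3), (-1, 3, 2)], n=3))  # True
```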

(15)

Probabilistic Inference is NP-hard

W.l.o.g., assume Boolean (propositional) variables.

Typically, PI in BNs means calculating P(Exp_1 | Exp_2), where Exp_i is a conjunction of (instantiated) random variables.

Example: P(X = T | Y = T ∧ Z = F)

Most restricted: the decision problem PIBN asks whether P(Y = T) > 0.

Theorem (Cooper): PIBN is NP-hard.

(16)

Towards a Proof

Let C = {c_1, ..., c_m} be a set of 3-SAT clauses over {u_1, ..., u_n}. We construct a belief network for which we show:

    P(Y = T) > 0 if and only if C is satisfiable.

Belief-network structure: [figure]

(17)

Probabilities

Truth-setting component (one for each variable):

    P(u_1 = T) = P(u_2 = T) = ... = P(u_n = T) = 1/2

Clause-satisfaction testing component (one for each clause, w.l.o.g. c_j = u_x ∨ u_y ∨ u_z):

Variables: u_x, u_y, u_z, C_j
Adjacencies: {(u_x, C_j), (u_y, C_j), (u_z, C_j)}
Conditional probabilities:

    P(C_j = T | u_x = F, u_y = F, u_z = F) = 0
    P(C_j = T | u_x = F, u_y = F, u_z = T) = 1
    ...
    P(C_j = T | u_x = T, u_y = T, u_z = T) = 1

Overall-satisfaction testing component:

Variable: Y
Adjacencies: a link from each C_j to Y
Conditional probabilities: P(Y = T | C_1 = T, ..., C_m = T) = 1, and 0 otherwise. Thus Y is true exactly when every clause is satisfied, so P(Y = T) is the fraction of the 2^n assignments that satisfy C, which is positive iff C is satisfiable.
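A sketch of the construction by brute-force enumeration (the two example clauses are invented; an actual reduction would of course not enumerate, since that is exactly the hard part):

```python
from itertools import product

# Cooper's construction on the hypothetical clauses (u1 v u2 v u3) and
# (~u1 v u2 v u3): each u_i is a fair coin, each C_j a deterministic OR
# of its literals, and Y the AND of all C_j.
clauses = [(1, 2, 3), (-1, 2, 3)]
n = 3

def p_y_true():
    sat = 0
    for bits in product((False, True), repeat=n):
        a = {i + 1: b for i, b in enumerate(bits)}
        c_vals = [any(a[abs(l)] == (l > 0) for l in cl) for cl in clauses]
        sat += all(c_vals)  # Y = C_1 and ... and C_m
    return sat / 2 ** n     # each assignment has probability 1/2^n

# P(Y = T) > 0 iff some assignment satisfies all clauses.
print(p_y_true())  # 6/8 = 0.75, so this instance is satisfiable
```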

(18)

Overview

(19)

Bucket Elimination (BE)

Given:

BN structure (V, A, P),
an ordering π on the n variables with attached CPTs,
evidence nodes E_i with values e_i,
query node X.

Algorithm to compute P(X = x | E_1 = e_1, ..., E_n = e_n) for all x:

Create n + 1 buckets b and b_i, one per variable X_i.
Store each CPT in the bucket of its highest-indexed variable.
Process each bucket from the highest index down to eliminate the associated variable.

(20)

Algorithm

3 operators:

Join (combine 2 CPTs): h(x) = (f ⊗ g)(x) = f(x | Vars(f)) × g(x | Vars(g))
Eliminate (remove a variable): elim_X[f](y) = Σ_x f(Y = y, x)
Project (delete a variable): h_{X,−Y}(x, y) = h_X(x)

Loop:

1. Project the evidence into the CPTs.
2. Process the buckets from highest to lowest index:
   1. g_X = elim_X[f_{X,1} ⊗ f_{X,2} ⊗ ... ⊗ f_{X,k}] is a function of ∪_i Vars(f_{X,i}) \ {X}.
   2. Store g_X into bucket b_Y, where Y is the highest-indexed variable of g_X.

Example: [blackboard]
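A compact sketch of these operators over factors stored as (variable tuple, table) pairs; the representation, names, and the tiny Rain → WetGrass example are assumptions of this sketch, not the lecture's own code:

```python
from itertools import product
from functools import reduce

def join(f, g):
    # h = f (x) g: multiply the tables over the union of the variables.
    fv, ft = f
    gv, gt = g
    hv = fv + tuple(v for v in gv if v not in fv)
    ht = {}
    for vals in product((False, True), repeat=len(hv)):
        env = dict(zip(hv, vals))
        ht[vals] = ft[tuple(env[v] for v in fv)] * gt[tuple(env[v] for v in gv)]
    return hv, ht

def eliminate(f, x):
    # elim_x[f]: sum variable x out of the factor.
    fv, ft = f
    hv = tuple(v for v in fv if v != x)
    ht = {}
    for vals, p in ft.items():
        key = tuple(v for var, v in zip(fv, vals) if var != x)
        ht[key] = ht.get(key, 0.0) + p
    return hv, ht

def bucket_elimination(factors, order):
    # Sum out the variables in `order`; the remaining ones form the query.
    for x in order:
        bucket = [f for f in factors if x in f[0]]        # bucket of x
        factors = [f for f in factors if x not in f[0]]
        if bucket:
            factors.append(eliminate(reduce(join, bucket), x))
    result_vars, table = reduce(join, factors)
    total = sum(table.values())                            # normalize
    return result_vars, {vals: p / total for vals, p in table.items()}

# Hypothetical Rain -> WetGrass network; query the marginal P(Rain).
f_rain = (("Rain",), {(True,): 0.2, (False,): 0.8})
f_wet = (("Rain", "Wet"), {(True, True): 0.9, (True, False): 0.1,
                           (False, True): 0.1, (False, False): 0.9})
print(bucket_elimination([f_rain, f_wet], order=["Wet"]))
```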

(21)

Overview

(22)

Learning of Bayesian Networks

Learning (LEARN): three cases

structure known and all variables observed ⇒ as easy as training a Naive Bayes classifier (see the counting sketch below)
structure known, variables partially observable ⇒ learn the network's CPTs using gradient ascent
structure not known ⇒ greedy search to add/subtract edges and nodes
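For the fully observed case, CPT entries are just conditional frequencies; a minimal sketch on the hypothetical Rain → WetGrass network with invented observations:

```python
from collections import Counter

# Maximum-likelihood CPT estimation from fully observed data.
data = [(True, True), (True, False), (False, False),
        (False, False), (True, True), (False, True)]  # (rain, wet) pairs

joint_counts = Counter(data)
rain_counts = Counter(r for r, _ in data)

# P(Rain = T) and P(Wet = T | Rain) as relative frequencies:
p_rain = rain_counts[True] / len(data)
p_wet_given_rain = {r: joint_counts[(r, True)] / rain_counts[r]
                    for r in (True, False)}
print(p_rain, p_wet_given_rain)  # 0.5 {True: 0.667, False: 0.333} (approx.)
```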

(23)

Further Inference Tasks

Most Probable Explanation (MPE): the most likely assignment to all hidden variables given the evidence

Algorithm: like BE, but replace Σ by max (see the sketch after this list)
Complexity: NP-complete for general graphs, easy for polytrees

Maximum a Posteriori (MAP): the most likely assignment to some hidden variables given the evidence

Complexity: NP^PP-complete for general graphs, NP-hard for polytrees

Value of Information: Which evidence to seek?
Sensitivity Analysis: Which variables are most critical?
Explanation: Why does this happen?
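The Σ → max swap is a one-line change in the factor representation from the earlier bucket-elimination sketch (max-product only; a full MPE solver would also record argmax back-pointers to recover the assignment):

```python
def max_eliminate(f, x):
    # Like eliminate(), but takes the max over x instead of summing it out.
    fv, ft = f
    hv = tuple(v for v in fv if v != x)
    ht = {}
    for vals, p in ft.items():
        key = tuple(v for var, v in zip(fv, vals) if var != x)
        ht[key] = max(ht.get(key, 0.0), p)
    return hv, ht

f = (("A", "B"), {(True, True): 0.4, (True, False): 0.1,
                  (False, True): 0.2, (False, False): 0.3})
print(max_eliminate(f, "B"))  # (('A',), {(True,): 0.4, (False,): 0.3})
```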

(24)

Overview

(25)

Wrap-up and Outlook

Conclusion

A Bayesian (a.k.a. belief) network represents a joint probability function compactly.
It is a natural model of causality; the Naive BN is a special case.
PI is hard; MPE, MAP, and LEARN are as well.
BE is efficient in practice: linear (in the size of the CPTs) for chains/trees/polytrees, exponential in the size of the largest bucket.

Outlook

BE for PI and MAP can be made more efficient with pruning rules (e.g., prune all ...)
