
Manfred Schramm, Institut für Informatik, Technische Universität München, schramm@pit-systems.de
Bertram Fronhofer, KI Institut, Fakultät für Informatik, Technische Universität Dresden, fronhoefer@pit-systems.de

Abstract

For using probability theory when reasoning with uncertain knowledge, two main approaches have been developed: Bayesian networks, which may describe large probability distributions, but require a graph structure and a complete specification of a set of conditional probabilities according to this graph, which may be cumbersome to work out, and, on the other hand, MaxEnt completion, which can also cope with incomplete knowledge, but lacks the ordering of knowledge by a graph and the expressibility of certain independencies.

This paper discusses ways to combine these two approaches. To this end, two kinds of incomplete Bayesian networks are considered, and variations of the MaxEnt completion algorithms are discussed in view of their ability to process these incomplete Bayesian networks. This analysis detected limits of the use of so-called ground constraints.

Keywords: Probabilistic Reasoning, Maximum Entropy, Bayesian networks, Conditionals, Counterfactuals

1 Introduction

For a couple of years Bayesian networks have become increasingly popular in many application domains where dealing with uncertain knowledge is an everyday task. Although quite successful, Bayesian networks suffer from the drawback that their use requires the uncertain knowledge to be known completely with respect to their graph, which quite often is not the case.¹

In this paper we focus on the problem of how to deal with incomplete Bayesian networks.

Being incomplete, the application of some kind of completion procedure is indispensable for obtaining a (complete) probability distribution.

A natural proposal is to complete while obeying the Principle of Maximum Entropy. This kind of completion suggests itself due to its minimality properties (see e.g. [SF01]) and due to the efficient MaxEnt completion algorithms developed for its computation in case the given knowledge is specified in form of linear constraints ([PIT, SPI]).

As is well known (see e.g. [Hun89, Luk00]), the application of the MaxEnt completion algorithm to Bayesian networks leads to problems, as the resulting probability distribution differs from the one intended by the Bayesian network specification. This is even the case when taking the rules of a complete Bayesian network as input, because certain independence requirements implicit in Bayesian networks are not taken (automatically) into account by the MaxEnt algorithm.

¹ That the specification of a Bayesian network may also be cumbersome due to the right arrangement of this knowledge (a task which may become impossible to perform by hand in case of large sets of knowledge) is a further problem which, however, lies outside the scope of this paper.

To cope with this problem, the MaxEnt completion algorithm can be modified in two ways, the so-called grounded update ([RM96]) and the joint distribution method [Hun89, Luk00], although with different complexity.

Having at hand, with grounded update and joint distribution method, two methods which can deal with a complete Bayesian network, we might expect that they are also applicable to incomplete Bayesian networks.

This is true, although with different complexity, if incomplete Bayesian network means that some of the rules in some nodes are simply missing (see Def. 2).

However, if we further generalize incompleteness in the direction of rules whose antecedents are no longer required to be elementary events, grounded MaxEnt is no longer applicable. This case is interesting as such rules may occur when single influences of one node on another are known. We discuss an elementary case of such an incomplete Bayesian network (see Def. 3) and show that only the joint distribution method works to our satisfaction.

Let us finally point out that both update methods are available in the system [PIT], and all the examples in this paper are calculated with PIT and are also available on our webpage for online tests.

The paper is organized as follows: In section 2 we summarize important fundamental concepts and finally define in subsection 2.3 the notions of a (complete) Bayesian network and the two incompleteness concepts bayes-1-incomplete and bayes-2-incomplete on which we will focus in this paper. In section 3 we present the MaxEnt completion algorithm and its variation, the grounded completion algorithm. In section 4 we apply the MaxEnt completion algorithm to complete Bayesian networks. In sections 5 and 6 we discuss completion of the two types of incomplete Bayesian networks mentioned above. In section 7 we summarize the results.

2 Technical Preliminaries

2.1 Notation

- Let V be a finite set of discrete variables. Each V_i ∈ V can be identified with its set of possible values {v_i1, ..., v_ik_i} (k_i > 1), thus denoting the size of the variable V_i by k_i. For our examples, we assume these variables to be two-valued. This allows us to consider a variable as the set of a value and its negation, e.g. A = {a, ¬a}.

- Let G = (V, K) be a directed acyclic graph (dag) whose set V of nodes is a set of variables and with a set of edges K ⊆ V × V.

- Let pa(V) = {W ∈ V | (W, V) ∈ K} be the set of parent nodes of variable V in G.

- Let KB be a set of conditional probabilities on the cartesian product of the variables in V, also called constraints. Let us mark elements of KB by a superscript c, as in P^c(f|g) = x, with f and g events in this cartesian product.

- Let P* be the probability model with maximal entropy, given a set of linear constraints on P (see below for the definition of P).

- In the terminological tradition of probability theory, we consider the cartesian product Ω_V of the variables in V as an event space with its power set as set of events or event algebra A_V. Further, we consider P : A_V → [0, 1] as a function which assigns probabilities to events in compliance with the laws of probability theory. We denote conditional probabilities by P(b|a) := P(a∧b) / P(a) and call a the antecedent of the conditional P(b|a).

- A Bayesian network (BN) is a structure (G, KB), where G is a dag as defined above and KB is the smallest set with the following property:

    for all v_i ∈ V_i and all y ∈ pa(V_i) there exists a p_{v_i,y} ∈ [0, 1] with [P^c(v_i|y) = p_{v_i,y}] ∈ KB.

  A BN specifies a unique joint probability distribution P over V by

    P(v_1, ..., v_n) = ∏_{i=1}^n P^c(v_i | y)        (BNA) (1)

  where (v_1, ..., v_n) ∈ Ω_V and y = (v_{l_1}, ..., v_{l_{|pa(V_i)|}}) ∈ pa(V_i) with v_{l_j} ∈ V_{l_j} ∈ pa(V_i).

- The Method of Maximum Entropy (MaxEnt)² takes a knowledge base K of constraints on the set of possible P and defines a set S_K of probability distributions which fulfill all the constraints in K and have maximal (Shannon) entropy, where the (Shannon) entropy is defined as³

    H(v_1, ..., v_n) := −∑_{i=1}^n v_i log v_i        (2)

  If the constraints in K are linear with respect to P, S_K is known to contain a unique P to which we refer by P*.

Like BN, systems based on the MaxEnt method use a unique probability distribution in order to obtain unique probabilistic judgments. But MaxEnt systems also differ from BN:

- MaxEnt systems (like [PIT, SPI]) are based on a set of linear constraints. In particular, they do not allow the specification of non-linear constraints like independence relations (neither specified as a set of equations nor specified as dags (as used in [Pea88])).

- MaxEnt systems derive independence relations automatically from the set of constraints and use these independencies for making the implementation efficient.

2.3 Incomplete Bayesian Networks

As defined above, the pair (G, KB) defines a Bayesian network (BN). In order to specify what we mean by an incomplete BN we need some auxiliary definitions.

Definition 1. We define a set E of events as admissible if it is generated from a subset SE ⊆ pa(X) by deleting elements (of the parent nodes) in the elementary events in pa(X).

² For the justification of this method see e.g. [JV90], [SF01].

³ The base of the logarithm does not matter; in most cases ln is taken.

Example: Consider a node X with two binary parents A and B. Then pa(X) = {(a∧b), (a∧¬b), (¬a∧b), (¬a∧¬b)}. With SE = {(a∧b), (a∧¬b), (¬a∧b)}, the set of events E = {a, (a∧¬b)} is admissible, as e.g. b was deleted when generating a from a∧b, nothing was deleted in (a∧¬b), and ¬a, b were deleted from (¬a∧b).
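To illustrate Definition 1, the following Python sketch (our own encoding, with elementary events represented as sets of literals; all names are hypothetical) checks, for the example above, that every element of E can be obtained from some element of SE by deleting literals:

```python
from itertools import combinations

def events_by_deletion(elementary: frozenset) -> set:
    """All non-empty events obtainable from an elementary event by deleting literals."""
    lits = list(elementary)
    return {frozenset(c) for r in range(1, len(lits) + 1) for c in combinations(lits, r)}

ab, a_nb, na_b = frozenset({"a", "b"}), frozenset({"a", "¬b"}), frozenset({"¬a", "b"})
SE = [ab, a_nb, na_b]
E = [frozenset({"a"}), a_nb]   # E = {a, a∧¬b} from the example

# 'a' arises from a∧b by deleting b; a∧¬b is kept unchanged.
print(all(any(e in events_by_deletion(s) for s in SE) for e in E))   # True
```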

For a BN = (G, KB) we have for each variable X in V the following pattern of rules in KB:

  For all (but one) values x ∈ X and for all e ∈ pa(X) there exists a p_{x,e} ∈ [0, 1] such that [P(x|e) = p_{x,e}] ∈ KB.

Definition 2. A specification KB is called bayes-1-incomplete iff it is obtained from a complete BN by deleting rules. More formally, we just require that

  for all x in a subset S ⊆ X there exists a subset SE(x) ⊆ pa(X), and for every element e ∈ SE(x) there exists a p_{x,e} ∈ [0, 1], such that [P(x|e) = p_{x,e}] ∈ KB.

Definition 3. A specification KB is called bayes-2-incomplete iff it is obtained from a bayes-1-incomplete BN by giving up events in the antecedents of our rules. More formally, we just require that

  for all x in a subset S ⊆ X there exists an admissible set E(x) such that for all e ∈ E(x) there exists a p_{x,e} ∈ [0, 1] such that [P(x|e) = p_{x,e}] ∈ KB.

In this paper we will limit our investigation to these two cases of incompleteness. Of course, they are not the only ones which can be imagined. A further step would be to allow arbitrary conditional probabilities on the variables X ∪ pa(X), but since this generalization will not falsify our negative results, we decided to focus on an interesting subclass. Equivalently, we might allow probability intervals⁴ for the constraints (but not for the information about the distribution of the parents, here denoted by Q(...)) without changing the qualitative results.

3 Completion Algorithms

In this section we want to present different algorithms for completing a set of linear constraints on an event space. They are all based on the efficient algorithm described for instance in [RM96] and rely heavily on fundamental investigations by [Csi75].

Given a set of m (linear) constraints C_1, ..., C_m, with this algorithm the problem of maximizing entropy is solved by calculating a sequence of distributions (P^0, P^1, ...), where the fixpoint of this sequence, as has been shown in [Csi75], is equivalent to the MaxEnt distribution P* satisfying C_1, ..., C_m. The uniform distribution is taken as starting point P^0. The distribution P^{k+1} is calculated from P^k by taking the set of all distributions which fulfill the constraint C_{[(k+1) mod m]} and choosing from this set the distribution P^{k+1} for which the information-theoretic distance CR(P^{k+1}, P^k) to the distribution P^k is minimal, where this distance is calculated by the cross-entropy function (also called Kullback-Leibler number or I-divergence)

  CR(P^{k+1}, P^k) := ∑_{ω ∈ Ω_V} P^{k+1}(ω) log( P^{k+1}(ω) / P^k(ω) )

⁴ As discussed in [Luk00] and also implemented in the system [PIT].
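The overall scheme can be pictured with the following Python skeleton (our own sketch, independent of the PIT and SPIRIT implementations): it cycles through the constraints, replaces the current distribution by its cross-entropy projection onto the next constraint, and stops once a sweep no longer changes the distribution. The argument project_onto is a placeholder for the per-constraint update rules given below (eq. 3 and eq. 4).

```python
import math

def cross_entropy(p_new: dict, p_old: dict) -> float:
    """CR(p_new, p_old) = sum over all atoms w of p_new(w) * log(p_new(w) / p_old(w))."""
    return sum(q * math.log(q / p_old[w]) for w, q in p_new.items() if q > 0.0)

def maxent_by_iteration(atoms, constraints, project_onto, tol=1e-10, max_sweeps=1000):
    """Iterate cross-entropy projections; the fixpoint is the MaxEnt solution (cf. [Csi75])."""
    p = {w: 1.0 / len(atoms) for w in atoms}          # start from the uniform distribution P^0
    for _ in range(max_sweeps):
        p_before = dict(p)
        for c in constraints:                         # constraint C_[(k+1) mod m]
            p = project_onto(p, c)                    # cross-entropy minimal update, see eq. (3)/(4)
        if max(abs(p[w] - p_before[w]) for w in atoms) < tol:
            break
    return p
```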

Instead of solving the problem of maximizing entropy (MaxEnt problem for short) for a set of linear constraints directly, e.g. by means of the well known Lagrange multiplier method, with the algorithm described above we solve a sequence of problems where we have to minimize the cross-entropy CR(P^{k+1}, P^k) (cross-entropy problems for short). This is an advantage, because the computation of CR(P^{k+1}, P^k) is very simple and efficient for certain types of linear constraints, as e.g. conditional probability statements.

For instance, for a conditional constraint

  P^c(b|a) = x

the distribution P^{k+1} with minimal CR(P^{k+1}, P^k) can be computed from P^k by means of the following formula (this means that P^c(b|a) = x is the constraint C_{[(k+1) mod m]}):

  P^{k+1}(ω) = γ · P^k(ω) · α^(1−x)   for ω ∈ (a∧b)
  P^{k+1}(ω) = γ · P^k(ω) · α^(−x)    for ω ∈ (a∧¬b)        (3)
  P^{k+1}(ω) = γ · P^k(ω)             for ω ∈ (¬a)

where

  α = x · P^k(a∧¬b) / ((1−x) · P^k(a∧b))

and γ is the factor needed for renormalization. (This factor may also be omitted and normalization can be done easily in a subsequent independent step.)
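A direct Python transcription of this update (our own sketch, following the reconstruction of eq. (3) above, with the renormalization done as a separate step) looks as follows; applied once to the uniform distribution with x = 0.9 it reproduces, up to rounding, the distribution P_0 of the example below:

```python
def float_update(p: dict, atoms_ab, atoms_anb, x: float) -> dict:
    """Cross-entropy minimal update for a floating constraint P(b|a) = x, cf. eq. (3).

    p          maps atoms to probabilities,
    atoms_ab   lists the atoms of the event a∧b,
    atoms_anb  lists the atoms of the event a∧¬b (all remaining atoms belong to ¬a).
    """
    pab = sum(p[w] for w in atoms_ab)
    panb = sum(p[w] for w in atoms_anb)
    alpha = (x * panb) / ((1.0 - x) * pab)
    q = dict(p)
    for w in atoms_ab:
        q[w] = p[w] * alpha ** (1.0 - x)
    for w in atoms_anb:
        q[w] = p[w] * alpha ** (-x)
    z = sum(q.values())                               # renormalization factor gamma
    return {w: v / z for w, v in q.items()}

p0 = {"a∧b": 0.25, "a∧¬b": 0.25, "¬a∧b": 0.25, "¬a∧¬b": 0.25}
print(float_update(p0, ["a∧b"], ["a∧¬b"], 0.9))
# ≈ {a∧b: 0.37, a∧¬b: 0.04, ¬a∧b: 0.30, ¬a∧¬b: 0.30}; cf. the (approximate) distribution P_0 below
```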

Example: If we assume a knowledge base KB_0 consisting of just one constraint, this yields (approximately) the following MaxEnt distribution P_0:

  Knowledge base KB_0:   P^c(b|a) = 0.9

  P_0(a∧b) = 0.36   P_0(a∧¬b) = 0.04   P_0(¬a∧b) = 0.30   P_0(¬a∧¬b) = 0.30

Remark 1. In view of the objectives of this paper, it is important to notice that P(a) has changed from its initial value P^0(a) = 0.5 to the value P_0(a) = P^1(a) ≈ 0.4. The constraint P(b|a) therefore has changed the 'weight' P(a) of its antecedent a, i.e. the 'weight' floated.

3.2 Grounded Conditional Probabilities

This floating of P(a) in the example just presented is necessary for really obtaining the distribution with maximum entropy. However, this has been criticized by people who construe conditional probabilities as causalities and, as we will show in section 4.1, it is also in conflict with the intentions underlying the specification of a Bayesian network by a set of linear constraints.⁵ To cope with the demands of these critics, a different update method for computing P^{k+1} from P^k has been proposed where the 'weight' of the antecedent remains stable or grounded.⁶

⁵ See e.g. the discussion in [Hun89] as a counterargument to a critique of Pearl concerning MaxEnt, or see [Luk00], which mentions a counterintuitive dependency on a meta level by MaxEnt.

Since these two update methods may be selected individually for each constraint, we will write P^c_f(x|y) for a constraint which shall be processed by the standard update method float, and P^c_g(x|y) for a constraint which shall be processed by the update method ground.⁷

This ground method respects the idea that the antecedent should not be changed by the update. [Hun89] uses this idea to define probabilistic counterfactuals, and [RM96] mentions that this method could be used to implement BN and also offers the user the choice of this method in the system SPIRIT (described in [RM96]).

Using the method ground we obtain for a conditional constraint P^c_g(b|a) = x:

  P^{k+1}(ω) = P^k(ω) · x / P^k(b|a)          for ω ∈ (a∧b)
  P^{k+1}(ω) = P^k(ω) · (1−x) / P^k(¬b|a)     for ω ∈ (a∧¬b)        (4)
  P^{k+1}(ω) = P^k(ω)                         for ω ∈ (¬a)
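Analogously, here is a small Python sketch of the grounded update (again our own transcription, this time of eq. (4)): the mass of the antecedent a is only redistributed between a∧b and a∧¬b, so P(a) stays fixed. Applied once to the uniform distribution with x = 0.9 it yields exactly the distribution P_1 shown below.

```python
def ground_update(p: dict, atoms_ab, atoms_anb, x: float) -> dict:
    """Grounded update for P_g(b|a) = x, cf. eq. (4): the weight of the antecedent a is unchanged."""
    pab = sum(p[w] for w in atoms_ab)
    panb = sum(p[w] for w in atoms_anb)
    pa = pab + panb                               # weight of the antecedent, kept 'grounded'
    q = dict(p)
    for w in atoms_ab:
        q[w] = p[w] * x / (pab / pa)              # factor x / P^k(b|a)
    for w in atoms_anb:
        q[w] = p[w] * (1.0 - x) / (panb / pa)     # factor (1-x) / P^k(¬b|a)
    return q                                      # already normalized: mass of a and of ¬a unchanged

p0 = {"a∧b": 0.25, "a∧¬b": 0.25, "¬a∧b": 0.25, "¬a∧¬b": 0.25}
print(ground_update(p0, ["a∧b"], ["a∧¬b"], 0.9))
# {a∧b: 0.45, a∧¬b: 0.05, ¬a∧b: 0.25, ¬a∧¬b: 0.25}, i.e. the distribution P_1 below
```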

Remark 2. This update is also optimal with respect to cross-entropy, i.e. if we start with a distribution P^k and a conditional probability P(b|a) = x together with the condition P^{k+1}(a) = P^k(a), the solution found by minimization of cross-entropy will be the same as applying the update rule given in eq. (4).⁸

In the example from above, now with a ground constraint, we obtain the following probability distribution:

  Knowledge base KB_1:   P^c_g(b|a) = 0.9

  P_1(a∧b) = 0.45   P_1(a∧¬b) = 0.05   P_1(¬a∧b) = 0.25   P_1(¬a∧¬b) = 0.25

In our example P^1 = P_1, i.e. one update step already yields the fixpoint. Consequently, P_1 is the distribution P for which the distance CR(P, P^0) is minimal given the ground constraint P^c_g(b|a) = 0.9.

Let us add the remark that for unconditional constraints it makes no difference whether they are processed as float or ground, as P(a) ≡ P(a|Ω_V) and P(Ω_V) = 1 is not changed anyway.

4 MaxEnt-like Algorithms Applied to Complete BN

Together with a (complete) BN = (G, KB), the algorithm of equation (1) (BNA) is used to compute the probability of a particular event in the distribution determined by (the graph and) the knowledge base.
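To make this concrete, here is a small Python sketch (our own illustration, not part of the paper or of the PIT/SPIRIT systems) that evaluates the BNA product formula (1) for the three-node network of Figure 1, using the conditional probabilities of the running example below (KB_2/KB_3):

```python
from itertools import product

# CPT values of the running example: P(a) = P(b) = 0.5 and
# P(c|a,b) = 0.1, P(c|a,¬b) = 0.2, P(c|¬a,b) = 0.3, P(c|¬a,¬b) = 0.4.
p_a = {True: 0.5, False: 0.5}
p_b = {True: 0.5, False: 0.5}
p_c_given = {(True, True): 0.1, (True, False): 0.2,
             (False, True): 0.3, (False, False): 0.4}

def bna_joint(a: bool, b: bool, c: bool) -> float:
    """P(a,b,c) = P(a) * P(b) * P(c|a,b), cf. equation (1)."""
    pc = p_c_given[(a, b)] if c else 1.0 - p_c_given[(a, b)]
    return p_a[a] * p_b[b] * pc

print(bna_joint(True, True, True))   # 0.5 * 0.5 * 0.1 = 0.025
# sanity check: the joint distribution defined by BNA sums to 1
assert abs(sum(bna_joint(*w) for w in product([True, False], repeat=3)) - 1.0) < 1e-12
```

The value 0.025 for a∧b∧c agrees with the distribution P_3 computed below for the complete network.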

⁶ This terminology goes back to [RM96], where float rules are distinguished from ground rules.

⁷ In the syntax of PIT, the default is float, while a constraint to be processed by the method ground has to be marked by 'm' (for respecting marginal independence), as e.g. in Pm(c|a) = 0.7 instead of the standard case P(c|a) = 0.7 for the method float.

⁸ As possible elements in the subspaces (a∧b), (a∧¬b) and (¬a) are not distinguished by a constraint, the Lagrange solution will use the same factors for every element inside these sets. But given that we have to use the same factors inside the subsets, the problem can be reduced to three variables (one for each case) and the (unique) solution calculated from the conditions.

[Figure 1: Marginal independence of the variables A and B in a BN: a node C with parent nodes A and B.]

If the BN is incomplete, BNA is no longer applicable and a completion algorithm must be used instead.

Since we want to apply algorithms inspired by MaxEnt completion, it is reasonable to analyze first whether these algorithms are applicable to compute the probability of an event in case of a complete BN, i.e. whether they can replace BNA. Unfortunately, this is not without problems.

Technically, for this investigation we can restrict our focus to the most simple (and standard) example of a BN which includes marginal independencies: a binary node C with two binary parent nodes A and B (see fig. 1).⁹ The reason is (see [Luk00]) that we may use the (total) ordering of nodes, given by the graph of a BN, to split up the BN into a set of (son-parent) problems (which we will call local problems) and to solve these local problems through MaxEnt completion in the order given by the graph G of the BN. The (possibly iteration dependent) information about the probability of the parents (which has to be respected by the local problem) is marked by Q (as in Q(a∧b)), as this can be seen as a Query to the whole network.

4.1 Using (floating) conditional probabilities

If we just take the set KB of probabilistic rules from a BN specification and compute the MaxEnt distribution satisfying these rules, then in nearly all cases the computed MaxEnt distribution will contain additional dependencies which would not be contained in the distribution computed by the ordinary BNA.

This effect is well known ([Hun89, Luk00]) and stems from the fact that the ordinary BNA implicitly assumes marginal independencies (of the parents). However, this information is not derivable from the set of rules and therefore

- the information about marginal independencies (of the parents) was not imparted to the MaxEnt completion algorithm and, moreover,

- these independencies seem to contain relevant information (as the MaxEnt completion algorithm usually does not heed these independencies without being explicitly told to).

This effect can also be described by saying that the MaxEnt completion algorithm uses the moral (undirected) graph (see [BKI00]) derived from the dag G of the BN when inferring independencies.

To recall this effect, we use a specification of our (simple) BN of Figure 1 (with A, B and C binary nodes). The result on KB_2 is the following distribution P_2:

  Knowledge base KB_2:
  P^c(a) = Q(a) = 0.5
  P^c(b) = Q(b) = 0.5
  P^c(c|a∧b)   = 0.1
  P^c(c|a∧¬b)  = 0.2
  P^c(c|¬a∧b)  = 0.3
  P^c(c|¬a∧¬b) = 0.4

  Distribution P_2:
  P_2(a∧b∧c)   ≈ 0.024    P_2(¬a∧b∧c)   ≈ 0.077
  P_2(a∧b∧¬c)  ≈ 0.219    P_2(¬a∧b∧¬c)  ≈ 0.180
  P_2(a∧¬b∧c)  ≈ 0.051    P_2(¬a∧¬b∧c)  ≈ 0.097
  P_2(a∧¬b∧¬c) ≈ 0.206    P_2(¬a∧¬b∧¬c) ≈ 0.146

For checking the independence of the parents A and B we calculate

  P_2(a) · P_2(b) = 0.25,   P_2(a∧b) = 0.243.

Though the dependence effect is small in our (running) example, we see that the (marginal) independence of the parents A and B (present in the (a priori) uniform distribution) is lost.
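This check can be reproduced directly from the printed table; a small Python sketch (values copied from P_2 above, rounded as printed) computes the relevant marginals:

```python
# Joint distribution P_2 as printed above (rounded values).
P2 = {("a", "b", "c"): 0.024,   ("a", "b", "¬c"): 0.219,
      ("a", "¬b", "c"): 0.051,  ("a", "¬b", "¬c"): 0.206,
      ("¬a", "b", "c"): 0.077,  ("¬a", "b", "¬c"): 0.180,
      ("¬a", "¬b", "c"): 0.097, ("¬a", "¬b", "¬c"): 0.146}

p_a  = sum(v for k, v in P2.items() if k[0] == "a")           # 0.5
p_b  = sum(v for k, v in P2.items() if k[1] == "b")           # 0.5
p_ab = sum(v for k, v in P2.items() if k[:2] == ("a", "b"))   # 0.243
print(p_a * p_b, p_ab)   # 0.25 versus 0.243: A and B are (slightly) dependent under P_2
```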

4.2 Using grounded conditional probabilities

As already mentioned in [RM96], we get the same result as with the BNA if we consider the rules as ground.

By definition (see eq. 4), ground conditional probabilities do not change the probability of their antecedent. Because an antecedent is an elementary event of the joint distribution of the parents, the update does not change this distribution. Therefore, using the conditionals of the BN in the mode 'ground' will keep the parents in their state of independence.

Example: Continuing the example from before, with the rules now assumed as ground, we obtain the knowledge base KB_3. The result of PIT on KB_3 is shown in distribution P_3:

  Knowledge base KB_3:
  P^c(a) = Q(a) = 0.5
  P^c(b) = Q(b) = 0.5
  P^c_g(c|a∧b)   = 0.1
  P^c_g(c|a∧¬b)  = 0.2
  P^c_g(c|¬a∧b)  = 0.3
  P^c_g(c|¬a∧¬b) = 0.4

  Distribution P_3:
  P_3(a∧b∧c)   ≈ 0.025    P_3(¬a∧b∧c)   ≈ 0.075
  P_3(a∧b∧¬c)  ≈ 0.225    P_3(¬a∧b∧¬c)  ≈ 0.175
  P_3(a∧¬b∧c)  ≈ 0.050    P_3(¬a∧¬b∧c)  ≈ 0.100
  P_3(a∧¬b∧¬c) ≈ 0.200    P_3(¬a∧¬b∧¬c) ≈ 0.150

We calculate

  P_3(a) · P_3(b) = 0.25 = P_3(a∧b)

and see that the independence of the parents is preserved.

4.3 Using the joint distribution of the parents

As mentioned in the discussion in [Hun89], an equivalent result can be assured by including into the knowledge base the actual (iteration dependent) joint distribution of the parents of the node (here denoted as Q(...)). In this case, these additional constraints ensure the intended joint distribution of the parents.

As it is sufficient to consider a local update problem, the knowledge base contains numbers (instead of iteration dependent queries to the network, given in the (general) specification).

Example:

  Knowledge base KB_4:
  P^c(a∧b)  = Q(a∧b)  = 0.25                 P^c_f(c|a∧b)   = 0.1
  P^c(a∧¬b) = Q(a∧¬b) = 0.25                 P^c_f(c|a∧¬b)  = 0.2
  P^c(¬a∧b) = Q(¬a∧b) = 0.25                 P^c_f(c|¬a∧b)  = 0.3
  (P^c(¬a∧¬b) = Q(¬a∧¬b) = 0.25, redundant)  P^c_f(c|¬a∧¬b) = 0.4

As already mentioned, the resulting distribution is equivalent to P_3.

Remark 3. Note that it makes no difference whether the constraints are processed as ground or as float, since the parents are fully constrained and cannot float.

Remark 4. It is not sufficient to add only the actual marginal probabilities of the parents (i.e. Q(a) and Q(b) instead of the complete distribution of the parents in KB_4). In this case, A and B become dependent (compare KB_8 below).

Discussion: For a BN with a standard set of conditional probabilities, the method of using ground conditionals (as in KB_3) fulfills the demand of preserving the independencies (among the parents). The method is easy to specify and is applied locally (i.e. in every single update step). It therefore does not need to introduce any additional constraints.

The alternative of using additional constraints (as in KB_4) does not seem to be of interest, as the necessary number of additional constraints is exponential in the number n of parents and their respective sizes k_i, namely (∏_{i=1}^n k_i) − 1. Moreover, the calculation of the queries may also become expensive (even if done only once as proposed in [Luk00]), if the network demands the conditioning on other nodes.
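To give a rough feeling for this growth (our own illustrative calculation): a node with 10 binary parents would already require 2^10 − 1 = 1023 additional constraints on the parents' joint distribution, whereas the ground method introduces none.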

5 Bayes-1-Incomplete BN

Having shown how algorithms derived from the MaxEnt completion algorithm can be used to deal with complete BN specifications, we will now study their application to incomplete BN.

We consider first the case of bayes-1-incomplete BN. Referring to Figure 1, we give the following example, using the mode ground for our rules. The result of PIT on KB_5 is presented in distribution P_5:

  Knowledge base KB_5:
  P^c_g(c|a∧b)   = 0.1
  P^c_g(c|¬a∧¬b) = 0.4

  Distribution P_5:
  P_5(a∧b∧c)   ≈ 0.025    P_5(¬a∧b∧c)   ≈ 0.125
  P_5(a∧b∧¬c)  ≈ 0.225    P_5(¬a∧b∧¬c)  ≈ 0.125
  P_5(a∧¬b∧c)  ≈ 0.125    P_5(¬a∧¬b∧c)  ≈ 0.100
  P_5(a∧¬b∧¬c) ≈ 0.125    P_5(¬a∧¬b∧¬c) ≈ 0.150

As in the complete case of KB_3, we obtain P_5(a) · P_5(b) = P_5(a∧b) = 0.25. This situation is indeed covered by the discussion of complete Bayesian networks, i.e. the method of ground conditionals carries over to bayes-1-incomplete BN.

6 Bayes-2-Incomplete BN

But incompleteness may go even further, and in this section we will discuss bayes-2-incompleteness.

For our running example, we take a close look at the case where the events we are conditioning on are not elementary with respect to the parents' distribution:

  {P^c(c|a) = x_1, P^c(c|b) = x_2}.

How should we specify this situation, where we have knowledge about single influences ('causal' or not), but no information about the joint influence?

6.1 Using (floating) conditional probabilities?

If preservation of the possible marginal independence of the parents is not desired, we can use the standard method of floating constraints (compare the complete case KB_2 above). The result of PIT on KB_6 is presented in distribution P_6:

  Knowledge base KB_6:
  P^c(c|a) = 0.9
  P^c(c|b) = 0.9

  Distribution P_6:
  P_6(a∧b∧c)   ≈ 0.221    P_6(¬a∧b∧c)   ≈ 0.188
  P_6(a∧b∧¬c)  ≈ 0.009    P_6(¬a∧b∧¬c)  ≈ 0.037
  P_6(a∧¬b∧c)  ≈ 0.188    P_6(¬a∧¬b∧c)  ≈ 0.160
  P_6(a∧¬b∧¬c) ≈ 0.037    P_6(¬a∧¬b∧¬c) ≈ 0.160

Within this distribution we obtain (as expected) a dependency between A and B:

  P(a∧b) = 0.23, but P(a) · P(b) = 0.455² ≈ 0.207.

Since there are arguments to accept the dependence (for a particular example see [Hun89]), we mention this possibility. For all other cases, we have to try different approaches.

6.2 Using grounded conditional probabilities?

Using the method of grounded constraints we obtain the knowledge base KB_7 and the distribution P_7:

  Knowledge base KB_7:
  P^c_g(c|a) = 0.9
  P^c_g(c|b) = 0.9

  Distribution P_7:
  P_7(a∧b∧c)   ≈ 0.280     P_7(¬a∧b∧c)   ≈ 0.163
  P_7(a∧b∧¬c)  ≈ 0.0125    P_7(¬a∧b∧¬c)  ≈ 0.037
  P_7(a∧¬b∧c)  ≈ 0.214     P_7(¬a∧¬b∧c)  ≈ 0.125
  P_7(a∧¬b∧¬c) ≈ 0.0425    P_7(¬a∧¬b∧¬c) ≈ 0.125

Within this distribution we obtain P(a) = 0.550 and P(b) = 0.493, which, faced with the symmetric specification, is a shock. Indeed, the result depends on the ordering of the constraints, which is the worst case to happen. (If we switch the two rules in KB_7, the probabilities switch as well!)

This means that the method 'ground' only works in certain cases. More precisely, the concept of grounded constraints may not work with sets of conditionals whose antecedents are not disjoint. In such cases, applying the update rule to one constraint may change the antecedent of some other constraint (as e.g. the update of P^c_g(c|a) may change P(b)), which is of course not intended by the method ground.
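This order dependence can be reproduced experimentally with a few lines of Python (our own sketch, not one of the PIT runs reported in this paper): sweeping repeatedly over the two grounded constraints, once in each order, leads to different marginals for A and B, in line with the asymmetry discussed above.

```python
from itertools import product

atoms = list(product(["a", "¬a"], ["b", "¬b"], ["c", "¬c"]))

def ground_update(p, atoms_yes, atoms_no, x):
    """Grounded update for P_g(c|antecedent) = x, cf. eq. (4); the antecedent's mass stays fixed."""
    py, pn = sum(p[w] for w in atoms_yes), sum(p[w] for w in atoms_no)
    pa = py + pn
    q = dict(p)
    for w in atoms_yes:
        q[w] = p[w] * x * pa / py
    for w in atoms_no:
        q[w] = p[w] * (1.0 - x) * pa / pn
    return q

def sweep(order, sweeps=200):
    """Apply P_g(c|a) = 0.9 and P_g(c|b) = 0.9 repeatedly in the given order (0 = a, 1 = b)."""
    p = {w: 1.0 / 8 for w in atoms}
    for _ in range(sweeps):
        for i in order:
            ante = "a" if i == 0 else "b"
            yes = [w for w in atoms if w[i] == ante and w[2] == "c"]
            no  = [w for w in atoms if w[i] == ante and w[2] == "¬c"]
            p = ground_update(p, yes, no, 0.9)
    return p

for order in [(0, 1), (1, 0)]:
    p = sweep(order)
    pa = sum(v for w, v in p.items() if w[0] == "a")
    pb = sum(v for w, v in p.items() if w[1] == "b")
    print(order, round(pa, 3), round(pb, 3))   # the two orders give different values for P(a) and P(b)
```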

6.3 Using grounded conditional probabilities and the marginal probabilities of the parents?

This concept is presented in the next example. For a local situation it is sufficient to consider the knowledge base KB_8 with Q(a) = Q(b) = 0.5. The result of PIT on KB_8 is distribution P_8:

  Knowledge base KB_8:
  P^c(a) = Q(a) = 0.5
  P^c(b) = Q(b) = 0.5
  P^c_g(c|a) = 0.9
  P^c_g(c|b) = 0.9

  Distribution P_8:
  P_8(a∧b∧c)   ≈ 0.261    P_8(¬a∧b∧c)   ≈ 0.189
  P_8(a∧b∧¬c)  ≈ 0.011    P_8(¬a∧b∧¬c)  ≈ 0.039
  P_8(a∧¬b∧c)  ≈ 0.189    P_8(¬a∧¬b∧c)  ≈ 0.136
  P_8(a∧¬b∧¬c) ≈ 0.039    P_8(¬a∧¬b∧¬c) ≈ 0.136

Though P(a) and P(b) no longer depend on the sequence of the constraints, A and B are probabilistically dependent, as P_8(a∧b) = 0.272 but P(a) · P(b) = 0.25. Again, this may be accepted (see [Hun89]) or not.

6.4 Using conditional probabilities and the joint distribution of the parents

For working with the method of complete parent distributions (mentioned by [Hun89, Luk00]) it is sufficient to consider the local problem KB_9 with Q(a∧b) = Q(a∧¬b) = Q(¬a∧b) = Q(¬a∧¬b) = 0.25. In our example the difference between grounded and floating conditional probabilities is not relevant, as the complete distribution of the parents is fixed. The result of PIT on KB_9 is distribution P_9:

  Knowledge base KB_9:
  P^c(a∧b)  = Q(a∧b)  = 0.25
  P^c(a∧¬b) = Q(a∧¬b) = 0.25
  P^c(¬a∧b) = Q(¬a∧b) = 0.25
  (P(¬a∧¬b) = Q(¬a∧¬b) = 0.25, redundant)
  P^c(c|a) = 0.9
  P^c(c|b) = 0.9

  Distribution P_9:
  P_9(a∧b∧c)   ≈ 0.241    P_9(¬a∧b∧c)   ≈ 0.209
  P_9(a∧b∧¬c)  ≈ 0.009    P_9(¬a∧b∧¬c)  ≈ 0.041
  P_9(a∧¬b∧c)  ≈ 0.209    P_9(¬a∧¬b∧c)  ≈ 0.125
  P_9(a∧¬b∧¬c) ≈ 0.041    P_9(¬a∧¬b∧¬c) ≈ 0.125

By specification, A and B remain independent.

7 Conclusion

In contrast to complete or bayes-1-incomplete BN, where we can use ground conditionals, with bayes-2-incomplete BN we seem to be bound (in the general case) to the expensive method of specifying the (complete) joint distribution of the parents (as described in [Hun89, Luk00]) in order to preserve their possible independence.

A second result is that a constraint of type ground becomes suspicious, as we conclude that a constraint of type ground is not a linear constraint on the possible distributions P. If the constraint were linear, the MaxEnt solution would be unique and equivalent to the fixpoint of the iteration. As the fixpoint is order dependent in this case (and therefore especially not unique), the constraint P^c_g(c|e) = x, i.e. P(c|e) = x with 'stable' antecedent e, cannot be linear. This argument is not falsified by the observation that in every cross-entropy step a grounded constraint is mapped into two linear constraints: some of the corresponding probabilities are calculated on the fly (expressed by Q(...)), which seems to generate a sort of non-linear statement in view of the global MaxEnt problem. This view is also supported by the observation that a constraint of type ground cannot be expressed in the matrix of linear constraints given for the MaxEnt problem. This matrix only works on linear vectors of P; the equation P^{k+1}(x) = P^k(x) for an event x cannot be expressed on this level.

References

[BKI00] Christoph Beierle and Gabriele Kern-Isberner. Methoden wissensbasierter Systeme. Vieweg, 1st edition, 2000.

[Csi75] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability, 3(1):146-158, 1975.

[Hun89] Daniel Hunter. Causality and maximum entropy updating. Int. Journal of Approximate Reasoning, 3:87-114, 1989.

[JV90] J. B. Paris and A. Vencovská. A note on the inevitability of maximum entropy. Int. Journal of Approximate Reasoning, 4:183-223, 1990.

[Luk00] Thomas Lukasiewicz. Credal networks under maximum entropy. In Uncertainty in AI, pages 363-370, 2000.

[Pea88] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Kaufmann, San Mateo, CA, 1988.

[PIT] Homepage of PIT. http://www.pit-systems.de.

[RM96] Wilhelm Rödder and Carl-Heinz Meyer. Coherent knowledge processing at maximum entropy by SPIRIT. In KI 96, FernUniversität Hagen, Germany, 1996.

[SF01] M. Schramm and B. Fronhofer. PIT: a system for reasoning with probabilities. Technical Report 287 - 8/2001, FernUniversität Hagen, Fachbereich Informatik, 2001.

[SPI] Homepage of SPIRIT. http://www.xspirit.de.
