Manfred Schramm                              Bertram Fronhöfer
Institut für Informatik                      KI Institut, Fakultät für Informatik
Technische Universität München               Technische Universität Dresden
schramm@pit-systems.de                       fronhoefer@pit-systems.de
Abstract
For using probability theory when reasoning with uncertain knowledge, two main approaches have been developed: on the one hand Bayesian networks, which may describe large probability distributions, but require a graph structure and a complete specification of a set of conditional probabilities according to this graph, which may be cumbersome to work out; and on the other hand MaxEnt completion, which can also cope with incomplete knowledge, but lacks the ordering of knowledge by a graph and the expressibility of certain independencies.
This paper discusses ways to combine these two approaches. To this end, two kinds of incomplete Bayesian networks are considered, and variations of the MaxEnt completion algorithm are discussed in view of their ability to process these incomplete Bayesian networks. This analysis detected limits of the use of so-called ground constraints.
Keywords: Probabilistic Reasoning, Maximum Entropy, Bayesian networks, Conditionals,
Counterfactuals
1 Introduction
For a couple of years, Bayesian networks have become increasingly popular in many application domains where dealing with uncertain knowledge is an everyday task. Although quite successful, Bayesian networks suffer from the drawback that their use requires the uncertain knowledge to be known completely with respect to their graph, which quite often is not the case.¹

¹ That the specification of a Bayesian network may also be cumbersome due to the right arrangement of this knowledge (a task which may become impossible to perform by hand in case of large sets of knowledge) is a further problem which, however, lies outside the scope of this paper.
In this paper we focus on the problem of how to deal with incomplete Bayesian networks.
Being incomplete, the application of some kind of completion procedure is indispensable for obtaining a (complete) probability distribution.
A natural proposal is to complete while obeying the Principle of Maximum Entropy. This kind of completion suggests itself due to its minimality properties (see e.g. [SF01]) and due to the efficient MaxEnt completion algorithms developed for its computation in case the given knowledge is specified in the form of linear constraints ([PIT, SPI]).
As is well known (see e.g. [Hun89, Luk00]), the application of the MaxEnt completion algorithm to Bayesian networks leads to problems, as the resulting probability distribution differs from the one intended by the Bayesian network specification. This is even the case when taking the rules of a complete Bayesian network as input, because certain independence requirements implicit in Bayesian networks are not (automatically) taken into account by the MaxEnt algorithm.
1
ThatthespecicationofaBayesiannetworkmayalsobecumbersomedueto therightarrangementof
this knowledge | atask which may become impossible to be performed by hand in case of large sets of
knowledge|isafurtherproblemwhich,however,liesoutsidethescopeofthispaper.
To cope with this problem, the MaxEnt completion algorithm can be modified in two ways: the so-called grounded update ([RM96]) and the joint distribution method ([Hun89, Luk00]), albeit with different complexity.
Having at hand, with the grounded update and the joint distribution method, two methods which can deal with a complete Bayesian network, we might expect that they are also applicable to incomplete Bayesian networks. This is true, albeit with different complexity, if 'incomplete Bayesian network' means that merely some of the rules at some nodes are missing (see Def. 2).
However, if we generalize incompleteness further in the direction of rules whose antecedents are no longer required to be elementary events, grounded MaxEnt is no longer applicable. This case is interesting, as such rules may occur when single influences of one node on another are known. We discuss an elementary case of such an incomplete Bayesian network (see Def. 3) and show that only the joint distribution method works to our satisfaction.
Let us finally point out that both update methods are available in the system [PIT], and all the examples in this paper are calculated with PIT and are also available on our webpage for online tests.
The paper is organized as follows:
In section 2 we summarize important fundamental concepts and finally define in subsection 2.3 the notions of a (complete) Bayesian network and the two incompleteness concepts bayes-1-incomplete and bayes-2-incomplete on which we will focus in this paper. In section 3 we present the MaxEnt completion algorithm and its variation, the grounded completion algorithm. In section 4 we apply the MaxEnt completion algorithm to complete Bayesian networks. In sections 5 and 6 we discuss completion of the two types of incomplete Bayesian networks mentioned above. In section 7 we summarize the results.
2 Technical Preliminaries
2.1 Notation
Let $\mathbf{V}$ be a finite set of discrete variables. Each $V_i \in \mathbf{V}$ can be identified with its set of possible values $\{v_{i1}, \ldots, v_{ik_i}\}$ ($k_i > 1$), thus denoting the size of the variable $V_i$ by $k_i$. For our examples, we assume these variables to be two-valued. This allows us to consider a variable as the set of a value and its negation, e.g. $A = \{a, \neg a\}$.
Let $G = (\mathbf{V}, K)$ be a directed acyclic graph (DAG) whose set $\mathbf{V}$ of nodes is a set of variables and with a set of edges $K \subseteq \mathbf{V} \times \mathbf{V}$. Let $pa(V) = \{W \in \mathbf{V} \mid (W, V) \in K\}$ be the set of parent nodes of variable $V$ in $G$.
Let $KB$ be a set of conditional probabilities on the Cartesian product of the variables in $\mathbf{V}$, also called constraints. Let us mark elements of $KB$ by $c$, as in $P_c(f \mid g) = x$, with $f$ and $g$ events in this Cartesian product.
Let $P^*$ be the probability model with maximal entropy, given a set of linear constraints on $P$ (see below for the definition of $P$).
In the terminological tradition of probability theory, we consider $\Omega_V := \times_{V_i \in \mathbf{V}} V_i$ as an event space with its power set as set of events, or event algebra $\mathcal{A}_V$. Further, we consider $P : \mathcal{A}_V \to [0,1]$ as a function which assigns probabilities to events in compliance with the laws of probability theory. We denote conditional probabilities by
$$P(b \mid a) := \frac{P(a \wedge b)}{P(a)}$$
and call $a$ the antecedent of the conditional $P(b \mid a)$.
A Bayesian network (BN) is a structure $(G, KB)$, where $G$ is a DAG as defined above and $KB$ is the smallest set with the following property:
$$\forall v_i \in V_i,\; y \in pa(V_i): \;\exists\, p_{v_i,y} \in [0,1] \text{ with } [P_c(v_i \mid y) = p_{v_i,y}] \in KB$$
A BN specifies a unique joint probability distribution $P$ over $\mathbf{V}$ by:
$$P(v_1, \ldots, v_n) = \prod_{i=1}^{n} P_c(v_i \mid y_i) \qquad \text{(BNA)} \qquad (1)$$
where $(v_1, \ldots, v_n) \in \Omega_V$ and $y_i = (v_{l_1}, \ldots, v_{l_{|pa(V_i)|}}) \in pa(V_i)$ with $v_{l_j} \in V_{l_j} \in pa(V_i)$.
The Method of Maximum Entropy (MaxEnt)² takes a knowledge base $K$ of constraints on the set of possible $P$ and defines a set $S_K$ of probability distributions which fulfill all the constraints in $K$ and have maximal (Shannon) entropy, where the (Shannon) entropy is defined as³
$$H(v_1, \ldots, v_n) := -\sum_{i=1}^{n} v_i \log v_i \qquad (2)$$
If the constraints in $K$ are linear with respect to $P$, $S_K$ is known to contain a unique $P$, to which we refer by $P^*$.

² For the justification of this method see e.g. [JV90], [SF01].
³ The base of the logarithm does not matter; in most cases ln is taken.
Like BN, systems based on the MaxEnt method use a unique probability distribution in order to obtain a unique probabilistic judgment. But MaxEnt systems also differ from BN:
MaxEnt systems (like [PIT, SPI]) are based on a set of linear constraints. In particular, they do not allow the specification of non-linear constraints like independence relations (neither specified as a set of equations nor specified as DAGs (as used in [Pea88])).
MaxEnt systems derive independence relations automatically from the set of constraints and use these independencies for making the implementation efficient.
2.3 Incomplete Bayesian Networks
As defined above, the pair $(G, KB)$ defines a Bayesian network (BN). In order to specify what we mean by an incomplete BN, we need some auxiliary definitions.
Definition 1 We define a set $E$ of events as admissible if it is generated from a subset $SE \subseteq pa(X)$ by deleting elements (of the parent nodes) in the elementary events in $pa(X)$.
For example, let the binary variables $A$ and $B$ be the parents of $X$. Then $pa(X) = \{(a \wedge b), (a \wedge \neg b), (\neg a \wedge b), (\neg a \wedge \neg b)\}$, read as a set of elementary events. With $SE = \{(a \wedge b), (a \wedge \neg b), (\neg a \wedge b)\}$, the set of events $E = \{a, (a \wedge \neg b)\}$ is admissible, as e.g. $b$ was deleted when generating $a$ from $a \wedge b$, nothing was deleted in $(a \wedge \neg b)$, and $\neg a, b$ were deleted from $(\neg a \wedge b)$.
For a BN $= (G, KB)$ we have for each variable $X$ in $\mathbf{V}$ the following pattern of rules in $KB$: for all (but one) values $x \in X$ and for all $e \in pa(X)$ there exists a $p_{x,e} \in [0,1]$ such that $[P(x \mid e) = p_{x,e}] \in KB$.
Definition 2 A specification $KB$ is called bayes-1-incomplete iff it is obtained from a complete BN by deleting rules. More formally, we just require that for all $x$ in a subset $S \subseteq X$ there exists a subset $SE(x) \subseteq pa(X)$, and for every element $e \in SE(x)$ there exists a $p_{x,e} \in [0,1]$ such that $[P(x \mid e) = p_{x,e}] \in KB$.
Definition 3 A specification $KB$ is called bayes-2-incomplete iff it is obtained from a bayes-1-incomplete BN by giving up events in the antecedents of our rules. More formally, we just require that for all $x$ in a subset $S \subseteq X$ there exists an admissible set $E(x)$ such that for all $e \in E(x)$ there exists a $p_{x,e} \in [0,1]$ such that $[P(x \mid e) = p_{x,e}] \in KB$.
In this paper we will limit our investigation to these two cases of incompleteness. Of course, they are not the only ones which can be imagined. A further step would be to allow arbitrary conditional probabilities on the variables $X \cup pa(X)$, but since this generalization will not falsify our negative results, we decided to focus on an interesting subclass. Equivalently, we might allow probability intervals⁴ for the constraints (but not for the information about the distribution of the parents, here denoted by $Q(\ldots)$) without changing the qualitative results.

⁴ As discussed in [Luk00] and also implemented in the system [PIT].
3 Completion Algorithms
In this section we want to present different algorithms for completing a set of linear constraints on an event space. They are all based on the efficient algorithm described for instance in [RM96] and rely heavily on fundamental investigations by [Csi75].
Given a set of $m$ (linear) constraints $C_1, \ldots, C_m$, with this algorithm the problem of maximizing entropy is solved by calculating a sequence of distributions $(P^0, P^1, \ldots)$, where the fixpoint of this sequence, as has been shown in [Csi75], is equivalent to the MaxEnt distribution $P^*$ satisfying $C_1, \ldots, C_m$. The uniform distribution is taken as starting point $P^0$. The distribution $P^{k+1}$ is calculated from $P^k$ by taking the set of all distributions which fulfill the constraint $C_{[(k+1) \bmod m]}$ and choosing from this set the distribution $P^{k+1}$ for which the information theoretic distance $CR(P^{k+1}, P^k)$ to the distribution $P^k$ is minimal, where this distance is calculated by the cross-entropy function (also called Kullback-Leibler number or I-divergence):
$$CR(P^{k+1}, P^k) := \sum_{\omega \in \Omega} P^{k+1}(\omega) \log \frac{P^{k+1}(\omega)}{P^k(\omega)}$$
Instead of solving the problem of maximizing entropy (MaxEnt problem, for short) for a set of linear constraints directly, e.g. by means of the well-known Lagrange multiplier method, with the algorithm described above we solve a sequence of problems where we have to minimize the cross-entropy $CR(P^{k+1}, P^k)$ (cross-entropy problems, for short). This is an advantage, because the computation of $CR(P^{k+1}, P^k)$ is very simple and efficient for certain types of linear constraints, e.g. conditional probability statements.
For instance, for a conditional constraint
$$P_c(b \mid a) = x$$
the distribution $P^{k+1}$ with minimal $CR(P^{k+1}, P^k)$ can be computed from $P^k$ by means of the following formula (this means that $P_c(b \mid a) = x$ is the constraint $C_{[(k+1) \bmod m]}$):
$$P^{k+1}(\omega) = \alpha \cdot \begin{cases} P^k(\omega) \cdot \beta^{\,1-x} & \text{for } \omega \in (a \wedge b) \\ P^k(\omega) \cdot \beta^{\,-x} & \text{for } \omega \in (a \wedge \neg b) \\ P^k(\omega) & \text{for } \omega \in (\neg a) \end{cases} \qquad (3)$$
where
$$\beta = \frac{x \cdot P^k(a \wedge \neg b)}{(1-x) \cdot P^k(a \wedge b)}$$
and $\alpha$ is the factor needed for renormalization. (This factor may also be omitted, and normalization can be done easily in a subsequent independent step.)
Example: If we assume a knowledge base $KB_0$ consisting of just one constraint, this yields (approximately) the following MaxEnt distribution $P^*_0$:

Knowledge base $KB_0$: $P_c(b \mid a) = 0.9$

  $P^*_0(a \wedge b) = 0.36$   $P^*_0(a \wedge \neg b) = 0.04$   $P^*_0(\neg a \wedge b) = 0.30$   $P^*_0(\neg a \wedge \neg b) = 0.30$
Remark 1 In view of the objectives of this paper, it is important to notice that $P(a)$ has changed from its initial value $P^0(a) = 0.5$ to the value $P^*(a) = P^1(a) \approx 0.4$. The constraint $P(b \mid a)$ therefore has changed the 'weight' $P(a)$ of its antecedent $a$, i.e. the 'weight' floated.
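A direct transcription of eq. (3) reproduces the numbers of the $KB_0$ example above, including the floating of $P(a)$; the encoding of worlds as pairs of truth values and all names are our own.

```python
def float_update(p, in_a, in_b, x):
    """Cross-entropy-minimal update for a float constraint P_c(b|a) = x (eq. 3)."""
    p_ab  = sum(q for w, q in p.items() if in_a(w) and in_b(w))
    p_anb = sum(q for w, q in p.items() if in_a(w) and not in_b(w))
    beta = (x * p_anb) / ((1.0 - x) * p_ab)          # the factor from eq. (3)
    new = {w: q * (beta ** (1.0 - x) if in_b(w) else beta ** (-x))
              if in_a(w) else q
           for w, q in p.items()}
    z = sum(new.values())                            # renormalization (alpha)
    return {w: q / z for w, q in new.items()}

uniform = {(a, b): 0.25 for a in (True, False) for b in (True, False)}
p1 = float_update(uniform, lambda w: w[0], lambda w: w[1], 0.9)
# p1 matches P*_0 above up to rounding; P(a) floats from 0.5 to about 0.41.
```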
3.2 Grounded Conditional Probabilities
This floating of $P(a)$ in the example just presented is necessary for really obtaining the distribution with maximum entropy. However, this has been criticized by people who construe conditional probabilities as causalities, and, as we will show in section 4.1, it is also in conflict with the intentions underlying the specification of a Bayesian network by a set of linear constraints.⁵ To cope with the demands of these critics, a different update method for computing $P^{k+1}$ from $P^k$ has been proposed where the 'weight' of the antecedent remains stable or grounded.⁶

⁵ See e.g. the discussion in [Hun89] as counterargument to a critique of Pearl concerning MaxEnt, or see [Luk00], mentioning a counterintuitive dependency on a meta level by MaxEnt.
⁶ This terminology goes back to [RM96], where float rules are distinguished from ground rules.
Since these two update methods may be selected individually for each constraint, we will write $P_{c_f}(x \mid y)$ for a constraint which shall be processed by the standard update method float, and $P_{c_g}(x \mid y)$ for a constraint which shall be processed by the update method ground.⁷
This ground method respects the idea that the antecedent should not be changed by the update. [Hun89] uses this idea to define probabilistic counterfactuals; [RM96] mentions that this method could be used to implement BN and also offers the user the choice of this method in the system SPIRIT (described in [RM96]).

⁷ In the syntax of PIT, the default value is float, while a constraint to be processed by the method ground has to be marked by 'm' (for respecting marginal independence), as e.g. in $Pm(c \mid a) = 0.7$ instead of the standard case $P(c \mid a) = 0.7$ for the method float.
Using the method ground, we obtain for a conditional constraint $P_{c_g}(b \mid a) = x$:
$$P^{k+1}(\omega) = \begin{cases} P^k(\omega) \cdot \dfrac{x}{P^k(b \mid a)} & \text{for } \omega \in (a \wedge b) \\[4pt] P^k(\omega) \cdot \dfrac{1-x}{P^k(\neg b \mid a)} & \text{for } \omega \in (a \wedge \neg b) \\[4pt] P^k(\omega) & \text{for } \omega \in (\neg a) \end{cases} \qquad (4)$$
Remark 2 This update is also optimal with respect to cross-entropy, i.e. if we start with a distribution $P^k$ and a conditional probability $P(b \mid a) = x$ together with the condition $P^{k+1}(a) = P^k(a)$, the solution found by minimization of cross-entropy will be the same as applying the update rule given in eq. (4).⁸

⁸ As the elements in the subspaces $(a \wedge b)$, $(a \wedge \neg b)$ and $(\neg a)$ are not distinguished by the constraint, the Lagrange solution will use the same factors for every element inside these sets. But given that we have to use the same factors inside the subsets, the problem can be reduced to three variables (one for each case) and the (unique) solution calculated from the conditions.
In the example from above, now with a ground constraint, we obtain the following probability distribution:

Knowledge base $KB_1$: $P_{c_g}(b \mid a) = 0.9$

  $P^*_1(a \wedge b) = 0.45$   $P^*_1(a \wedge \neg b) = 0.05$   $P^*_1(\neg a \wedge b) = 0.25$   $P^*_1(\neg a \wedge \neg b) = 0.25$
In our example $P^1 = P^*_1$. Consequently, $P^*_1$ is the distribution $P$ such that the distance $CR(P, P^0)$ is minimal, given the ground constraint $P_{c_g}(b \mid a) = 0.9$.
Let us add the remark that for unconditional constraints it makes no difference whether they are processed as float or ground, as $P(a) \equiv P(a \mid \top)$ and $P(\top) = 1$ is not changed anyway.
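For comparison, a sketch of eq. (4) in the same style as the float sketch above: starting from the uniform distribution, it reproduces $P^*_1$ exactly, and $P(a)$ stays at $0.5$ (no renormalization is needed, as the two scaled parts of $a$ keep their total weight).

```python
def ground_update(p, in_a, in_b, x):
    """Update for a ground constraint P_cg(b|a) = x (eq. 4): P(a) is kept fixed."""
    p_ab = sum(q for w, q in p.items() if in_a(w) and in_b(w))
    p_a  = sum(q for w, q in p.items() if in_a(w))
    cond = p_ab / p_a                       # current P^k(b | a)
    return {w: q * (x / cond if in_b(w) else (1.0 - x) / (1.0 - cond))
               if in_a(w) else q
            for w, q in p.items()}

uniform = {(a, b): 0.25 for a in (True, False) for b in (True, False)}
p1 = ground_update(uniform, lambda w: w[0], lambda w: w[1], 0.9)
# p1 == {(T,T): 0.45, (T,F): 0.05, (F,T): 0.25, (F,F): 0.25}; P(a) stays 0.5.
```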
4 MaxEnt-like Algorithms Applied to Complete BN
Together with a (complete) BN $= (G, KB)$, the algorithm of equation (1) (BNA) is used to compute the probability of a particular event in the distribution determined by (the graph and) the knowledge base.
Figure 1: Marginal independence of the variables $A$ and $B$ in a BN (a binary node $C$ with parent nodes $A$ and $B$).
If the BN is incomplete, BNA is no longer applicable and a completion algorithm must be used instead.
Since we want to apply algorithms inspired by MaxEnt completion, it is reasonable to analyze first whether these algorithms are applicable to compute the probability of an event in case of a complete BN, i.e. whether they can replace BNA. Unfortunately, this is not without problems.
Technically, we can restrict our focus for this investigation to the simplest (and standard) example of a BN which includes marginal independencies: a binary node $C$ with two binary parent nodes $A$ and $B$ (see Fig. 1). The reason is (see [Luk00]) that we may use the (total) ordering of nodes, given by the graph of a BN, to split up the BN into a set of (son-parent) problems (which we will call local problems) and to solve these local problems through MaxEnt completion in the order given by the graph $G$ of the BN. The (possibly iteration dependent) information about the probability of the parents (which has to be respected by the local problem) is marked by $Q$ (as in $Q(a \wedge b)$), as this can be seen as a query to the whole network.
4.1 Using (floating) conditional probabilities
If we just take the set $KB$ of probabilistic rules from a BN specification and compute the MaxEnt distribution satisfying these rules, then in nearly all cases the computed MaxEnt distribution will contain additional dependencies which would not be contained in the distribution computed by the ordinary BNA.
This effect is well known ([Hun89, Luk00]) and stems from the fact that the ordinary BNA implicitly assumes marginal independencies (of the parents). However, this information is not derivable from the set of rules, and therefore the information about marginal independencies (of the parents) was not imparted to the MaxEnt completion algorithm; moreover, these independencies seem to contain relevant information (as the MaxEnt completion algorithm usually does not heed these independencies without being explicitly told to).
This effect can also be described by saying that the MaxEnt completion algorithm uses the moral (undirected) graph (see [BKI00]) derived from the DAG $G$ of the BN when inferring independencies.
To recall this effect, we use a specification of our (simple) BN of Figure 1 (with $A$, $B$ and $C$ binary nodes). The result on $KB_2$ is the following distribution $P^*_2$:

Knowledge base $KB_2$:
  $P_c(a) = Q(a) = 0.5$
  $P_c(b) = Q(b) = 0.5$
  $P_c(c \mid a \wedge b) = 0.1$
  $P_c(c \mid a \wedge \neg b) = 0.2$
  $P_c(c \mid \neg a \wedge b) = 0.3$
  $P_c(c \mid \neg a \wedge \neg b) = 0.4$

  $P^*_2(a \wedge b \wedge c) \approx 0.024$        $P^*_2(\neg a \wedge b \wedge c) \approx 0.077$
  $P^*_2(a \wedge b \wedge \neg c) \approx 0.219$     $P^*_2(\neg a \wedge b \wedge \neg c) \approx 0.180$
  $P^*_2(a \wedge \neg b \wedge c) \approx 0.051$     $P^*_2(\neg a \wedge \neg b \wedge c) \approx 0.097$
  $P^*_2(a \wedge \neg b \wedge \neg c) \approx 0.206$  $P^*_2(\neg a \wedge \neg b \wedge \neg c) \approx 0.146$
For checking the independence of the parents $A$ and $B$ we calculate
$$P^*_2(a) \cdot P^*_2(b) = 0.25, \qquad P^*_2(a \wedge b) = 0.243$$
Though the dependence effect is small in our (running) example, we see that the (marginal) independence of the parents $A$ and $B$ (present in the (a priori) uniform distribution) is lost.
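Combining the float update with a constraint cycle gives the completion loop described in section 3. The following self-contained sketch (the encoding of constraints as (antecedent, consequent, value) triples is our own) runs it on $KB_2$ and exhibits the dependence just computed.

```python
from itertools import product

def float_update(p, ante, cons, x):
    """Eq. (3): cross-entropy-minimal update enforcing P(cons | ante) = x."""
    p_fg  = sum(q for w, q in p.items() if ante(w) and cons(w))
    p_fng = sum(q for w, q in p.items() if ante(w) and not cons(w))
    beta  = (x * p_fng) / ((1.0 - x) * p_fg)
    new = {w: q * ((beta ** (1.0 - x) if cons(w) else beta ** (-x))
                   if ante(w) else 1.0)
           for w, q in p.items()}
    z = sum(new.values())
    return {w: q / z for w, q in new.items()}

# Worlds are triples (a, b, c) of truth values.
A, B, C = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])
TRUE = lambda w: True
kb2 = [(TRUE, A, 0.5), (TRUE, B, 0.5),
       (lambda w: A(w) and B(w),         C, 0.1),
       (lambda w: A(w) and not B(w),     C, 0.2),
       (lambda w: not A(w) and B(w),     C, 0.3),
       (lambda w: not A(w) and not B(w), C, 0.4)]

p = {w: 1 / 8 for w in product([True, False], repeat=3)}
for _ in range(200):                      # enough sweeps for convergence here
    for ante, cons, x in kb2:
        p = float_update(p, ante, cons, x)

p_a  = sum(q for w, q in p.items() if A(w))
p_b  = sum(q for w, q in p.items() if B(w))
p_ab = sum(q for w, q in p.items() if A(w) and B(w))
print(p_a * p_b, p_ab)  # ~0.25 vs ~0.243: the marginal independence is lost
```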
4.2 Using grounded conditional probabilities
As already mentioned in [RM96], we get the same result as with the BNA if we consider the rules as ground.
By definition (see eq. 4), ground conditional probabilities do not change the probability of their antecedent. Because each antecedent is an elementary event of the joint distribution of the parents, the update does not change this distribution. Therefore, using the conditionals of the BN in the mode 'ground' will keep the parents in their state of independence.
Example: Continuing the example from before, but assuming the rules as ground, gives $KB_3$. The result of PIT on $KB_3$ is shown in distribution $P^*_3$:

Knowledge base $KB_3$:
  $P_c(a) = Q(a) = 0.5$
  $P_c(b) = Q(b) = 0.5$
  $P_{c_g}(c \mid a \wedge b) = 0.1$
  $P_{c_g}(c \mid a \wedge \neg b) = 0.2$
  $P_{c_g}(c \mid \neg a \wedge b) = 0.3$
  $P_{c_g}(c \mid \neg a \wedge \neg b) = 0.4$

  $P^*_3(a \wedge b \wedge c) \approx 0.025$        $P^*_3(\neg a \wedge b \wedge c) \approx 0.075$
  $P^*_3(a \wedge b \wedge \neg c) \approx 0.225$     $P^*_3(\neg a \wedge b \wedge \neg c) \approx 0.175$
  $P^*_3(a \wedge \neg b \wedge c) \approx 0.050$     $P^*_3(\neg a \wedge \neg b \wedge c) \approx 0.100$
  $P^*_3(a \wedge \neg b \wedge \neg c) \approx 0.200$  $P^*_3(\neg a \wedge \neg b \wedge \neg c) \approx 0.150$
We calculate
$$P^*_3(a) \cdot P^*_3(b) = 0.25 = P^*_3(a \wedge b)$$
and see that the independency of the parents is preserved.
4.3 Using the joint distribution of the parents
As mentioned in the discussion in [Hun89], an equivalent result can be ensured by including into the knowledge base the actual (iteration dependent) joint distribution of the parents of the node (here denoted as $Q(\ldots)$). In this case, these additional constraints fix the joint distribution of the parents.
As it is sufficient to consider a local update problem, the knowledge base contains numbers (instead of the iteration dependent queries to the network given in the (general) specification).
Example:

Knowledge base $KB_4$:
  $P_c(a \wedge b) = Q(a \wedge b) = 0.25$;                      $P_{c_f}(c \mid a \wedge b) = 0.1$
  $P_c(a \wedge \neg b) = Q(a \wedge \neg b) = 0.25$;              $P_{c_f}(c \mid a \wedge \neg b) = 0.2$
  $P_c(\neg a \wedge b) = Q(\neg a \wedge b) = 0.25$;              $P_{c_f}(c \mid \neg a \wedge b) = 0.3$
  ($P_c(\neg a \wedge \neg b) = Q(\neg a \wedge \neg b) = 0.25$, redundant);  $P_{c_f}(c \mid \neg a \wedge \neg b) = 0.4$

As already mentioned, the resulting distribution is equal to $P^*_3$.
Remark 3 Note that it makes no difference whether the constraints are processed as ground or as float, since the parents are fully constrained and cannot float.

Remark 4 It is not sufficient to add only the actual marginal probabilities of the parents (i.e. $Q(a)$ and $Q(b)$ instead of the complete distribution of the parents in $KB_4$). In this case, $A$ and $B$ become dependent (compare $KB_8$ below).
Discussion: For a BN with a standard set of conditional probabilities, the method of using ground conditionals (as in $KB_3$) fulfills the demand of preserving the independencies (among the parents). The method is easy to specify and is applied locally (i.e. in every single update step). It therefore does not need to introduce any additional constraints.
The alternative of using additional constraints (as in $KB_4$) does not seem to be of interest, as the necessary number of additional constraints is exponential in the number $n$ of parents and their respective sizes $k_i$, namely $(\prod_{i=1}^{n} k_i) - 1$; for instance, ten binary parents already require $2^{10} - 1 = 1023$ additional constraints. Moreover, the calculation of the queries may also become expensive (even if done only once, as proposed in [Luk00]), if the network demands conditioning on other nodes.
5 Bayes-1-Incomplete BN
Having shown how algorithms derived from the MaxEnt completion algorithm can be used to deal with complete BN specifications, we will now study their application to incomplete BN.
We consider first the case of bayes-1-incomplete BN. Referring to Figure 1, we give the following example, using the mode ground for our rules. The result of PIT on $KB_5$ is presented in distribution $P^*_5$:
Knowledge base $KB_5$:
  $P_{c_g}(c \mid a \wedge b) = 0.1$
  $P_{c_g}(c \mid \neg a \wedge \neg b) = 0.4$

  $P^*_5(a \wedge b \wedge c) \approx 0.025$        $P^*_5(\neg a \wedge b \wedge c) \approx 0.125$
  $P^*_5(a \wedge b \wedge \neg c) \approx 0.225$     $P^*_5(\neg a \wedge b \wedge \neg c) \approx 0.125$
  $P^*_5(a \wedge \neg b \wedge c) \approx 0.125$     $P^*_5(\neg a \wedge \neg b \wedge c) \approx 0.100$
  $P^*_5(a \wedge \neg b \wedge \neg c) \approx 0.125$  $P^*_5(\neg a \wedge \neg b \wedge \neg c) \approx 0.150$

As in the complete case of $KB_3$, we obtain $P^*_5(a) \cdot P^*_5(b) = P^*_5(a \wedge b) = 0.25$.
This situation is indeed covered by the discussion of complete Bayesian networks, i.e. the method of ground constraints preserves the independence of the parents here as well.

6 Bayes-2-Incomplete BN

But incompleteness may go even further, and we will discuss in this section bayes-2-incompleteness.
For our running example, we take a close look at the case where the events we are conditioning on are not elementary with respect to the parents' distribution:
$$\{P_c(c \mid a) = x_1,\; P_c(c \mid b) = x_2\}$$
How should we specify this situation, where we have knowledge about single influences ('causal' or not), but no information about the joint influence?
6.1 Using (floating) conditional probabilities ?
If preservation of the possible marginal independence of the parents is not desired, we can use the standard method of floating constraints (compare the complete case $KB_2$ above). The result of PIT on $KB_6$ is presented in distribution $P^*_6$:
Knowledge base $KB_6$:
  $P_c(c \mid a) = 0.9$
  $P_c(c \mid b) = 0.9$

  $P^*_6(a \wedge b \wedge c) \approx 0.221$        $P^*_6(\neg a \wedge b \wedge c) \approx 0.188$
  $P^*_6(a \wedge b \wedge \neg c) \approx 0.009$     $P^*_6(\neg a \wedge b \wedge \neg c) \approx 0.037$
  $P^*_6(a \wedge \neg b \wedge c) \approx 0.188$     $P^*_6(\neg a \wedge \neg b \wedge c) \approx 0.160$
  $P^*_6(a \wedge \neg b \wedge \neg c) \approx 0.037$  $P^*_6(\neg a \wedge \neg b \wedge \neg c) \approx 0.160$
Within this distribution we obtain (as expected) a dependency between $A$ and $B$:
$$P(a \wedge b) = 0.23, \quad \text{but} \quad P(a) \cdot P(b) = 0.455^2 \approx 0.207$$
Since there are arguments to accept the dependence (for a particular example see [Hun89]), we mention this possibility. For all other cases, we have to try different approaches.
6.2 Using grounded conditional probabilities ?
Using the method of grounded constraints, we obtain the knowledge base $KB_7$ and the distribution $P^*_7$:
Knowledge base $KB_7$:
  $P_{c_g}(c \mid a) = 0.9$
  $P_{c_g}(c \mid b) = 0.9$

  $P^*_7(a \wedge b \wedge c) \approx 0.280$         $P^*_7(\neg a \wedge b \wedge c) \approx 0.163$
  $P^*_7(a \wedge b \wedge \neg c) \approx 0.0125$     $P^*_7(\neg a \wedge b \wedge \neg c) \approx 0.037$
  $P^*_7(a \wedge \neg b \wedge c) \approx 0.214$      $P^*_7(\neg a \wedge \neg b \wedge c) \approx 0.125$
  $P^*_7(a \wedge \neg b \wedge \neg c) \approx 0.0425$  $P^*_7(\neg a \wedge \neg b \wedge \neg c) \approx 0.125$
Within this distribution we obtain $P(a) = 0.550$ and $P(b) = 0.493$, which, faced with the symmetric specification, is a shock. Indeed, the result depends on the ordering of the constraints, which is the worst case to happen. (If we switch the two rules in $KB_7$, the probabilities switch as well!)
This means that the method 'ground' only works in certain cases. More precisely, the concept of grounded constraints may not work with sets of conditionals whose antecedents are not disjoint. In such cases, applying the update rule to one constraint may change the antecedent of some other constraint (as e.g. the update of $P_{c_g}(c \mid a)$ may change $P(b)$), which is of course not intended by the method ground.
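The order dependence can be reproduced with a small experiment: cycling the ground update of eq. (4) over the two rules of $KB_7$ settles, in our runs of this sketch, on marginals that depend on which rule comes first (the world encoding is the same as in the earlier sketches).

```python
from itertools import product

def ground_update(p, ante, cons, x):
    """Eq. (4): enforce P(cons | ante) = x while keeping P(ante) fixed."""
    p_fg = sum(q for w, q in p.items() if ante(w) and cons(w))
    p_f  = sum(q for w, q in p.items() if ante(w))
    cond = p_fg / p_f
    return {w: q * ((x / cond if cons(w) else (1.0 - x) / (1.0 - cond))
                    if ante(w) else 1.0)
            for w, q in p.items()}

A, B, C = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])

def run(constraints, sweeps=200):
    p = {w: 1 / 8 for w in product([True, False], repeat=3)}
    for _ in range(sweeps):
        for ante, cons, x in constraints:
            p = ground_update(p, ante, cons, x)
    return p

p_ab = run([(A, C, 0.9), (B, C, 0.9)])   # ground P(c|a) first, then P(c|b)
p_ba = run([(B, C, 0.9), (A, C, 0.9)])   # the same rules in the opposite order
marg = lambda p, f: sum(q for w, q in p.items() if f(w))
print(marg(p_ab, A), marg(p_ab, B))      # ~0.55 vs ~0.49 (cf. P*_7 above)
print(marg(p_ba, A), marg(p_ba, B))      # the two values switch with the order
```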
6.3 Using grounded conditional probabilities and the marginal probabilities of the parents ?
This concept is presented in the next example. For a local situation it is sufficient to consider the knowledge base $KB_8$ with $Q(a) = Q(b) = 0.5$. The result of PIT on $KB_8$ is distribution $P^*_8$:
Knowledge base $KB_8$:
  $P_c(a) = Q(a) = 0.5$
  $P_c(b) = Q(b) = 0.5$
  $P_{c_g}(c \mid a) = 0.9$
  $P_{c_g}(c \mid b) = 0.9$

  $P^*_8(a \wedge b \wedge c) \approx 0.261$        $P^*_8(\neg a \wedge b \wedge c) \approx 0.189$
  $P^*_8(a \wedge b \wedge \neg c) \approx 0.011$     $P^*_8(\neg a \wedge b \wedge \neg c) \approx 0.039$
  $P^*_8(a \wedge \neg b \wedge c) \approx 0.189$     $P^*_8(\neg a \wedge \neg b \wedge c) \approx 0.136$
  $P^*_8(a \wedge \neg b \wedge \neg c) \approx 0.039$  $P^*_8(\neg a \wedge \neg b \wedge \neg c) \approx 0.136$
Though $P(a)$ and $P(b)$ no longer depend on the sequence of the constraints, $A$ and $B$ are probabilistically dependent, as $P^*_8(a \wedge b) = 0.272$ but $P(a) \cdot P(b) = 0.25$. Again, this may be accepted (see [Hun89]) or not.
6.4 Using conditional probabilities and the joint distribution of the parents
For working with the method of complete parent distributions, mentioned by [Hun89, Luk00], it is sufficient to consider the local problem $KB_9$ with $Q(a \wedge b) = Q(a \wedge \neg b) = Q(\neg a \wedge b) = Q(\neg a \wedge \neg b) = 0.25$. In our example the difference between grounded and floating conditional probabilities is not relevant, as the complete distribution of the parents is fixed. The result of PIT on $KB_9$ is distribution $P^*_9$:
Knowledge base $KB_9$:
  $P_c(a \wedge b) = Q(a \wedge b) = 0.25$
  $P_c(a \wedge \neg b) = Q(a \wedge \neg b) = 0.25$
  $P_c(\neg a \wedge b) = Q(\neg a \wedge b) = 0.25$
  ($P_c(\neg a \wedge \neg b) = Q(\neg a \wedge \neg b) = 0.25$, redundant)
  $P_c(c \mid a) = 0.9$
  $P_c(c \mid b) = 0.9$

  $P^*_9(a \wedge b \wedge c) \approx 0.241$        $P^*_9(\neg a \wedge b \wedge c) \approx 0.209$
  $P^*_9(a \wedge b \wedge \neg c) \approx 0.009$     $P^*_9(\neg a \wedge b \wedge \neg c) \approx 0.041$
  $P^*_9(a \wedge \neg b \wedge c) \approx 0.209$     $P^*_9(\neg a \wedge \neg b \wedge c) \approx 0.125$
  $P^*_9(a \wedge \neg b \wedge \neg c) \approx 0.041$  $P^*_9(\neg a \wedge \neg b \wedge \neg c) \approx 0.125$
By specification, $A$ and $B$ remain independent.
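As a sketch, $KB_9$ can be run with the same machinery as the $KB_2$ loop of section 4.1; the snippet below is a continuation of that sketch (it assumes float_update, TRUE, A, B and C as defined there). Fixing three of the four parent cells (the fourth is redundant) pins down the whole parent distribution, so the two conditionals cannot disturb it.

```python
from itertools import product

# Continuation of the KB_2 sketch from section 4.1: float_update, TRUE, A, B, C
# are assumed to be defined as there.
AB   = lambda w: A(w) and B(w)
A_nB = lambda w: A(w) and not B(w)
nA_B = lambda w: not A(w) and B(w)

kb9 = [(TRUE, AB, 0.25), (TRUE, A_nB, 0.25), (TRUE, nA_B, 0.25),
       (A, C, 0.9), (B, C, 0.9)]

p = {w: 1 / 8 for w in product([True, False], repeat=3)}
for _ in range(200):
    for ante, cons, x in kb9:
        p = float_update(p, ante, cons, x)
# At the fixpoint P(a & b) = P(a) * P(b) = 0.25: the parents stay independent
# by construction, whatever the two conditionals do (cf. P*_9 above).
```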
7 Conclusion
In contrast to complete or bayes-1-incomplete BN, where we can use ground conditionals, with bayes-2-incomplete BN we seem to be bound (in the general case) to the expensive method of specifying the (complete) joint distribution of the parents (as described in [Hun89, Luk00]) in order to preserve their possible independence.
A second result is that a constraint of type ground becomes suspicious, as we conclude that a constraint of type ground is not a linear constraint on the possible distributions $P$. If the constraint were linear, the MaxEnt solution would be unique and equivalent to the fixpoint of the iteration. As the fixpoint depends on the ordering of the constraints in this case (and is therefore in particular not unique), the constraint $P_g(c \mid e) = x$, i.e. $P(c \mid e) = x$ with 'stable' antecedent $e$, cannot be linear. This argument is not falsified by the observation that in every cross-entropy step a grounded constraint is mapped onto two linear constraints: some of the corresponding probabilities are calculated on the fly (expressed by $Q(\ldots)$), which seems to generate a sort of non-linear statement in view of the global MaxEnt problem.
This view is also supported by the observation that a constraint of type ground cannot be expressed in the matrix of linear constraints given for the MaxEnt problem. This matrix only works on linear vectors of $P$; the equation $P^{k+1}(x) = P^k(x)$ for an antecedent $x$ cannot be expressed on this level.
References
[BKI00] Christoph Beierle and Gabriele Kern-Isberner. Methoden wissensbasierter Systeme. Vieweg, 1st edition, 2000.
[Csi75] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability, 3(1):146-158, 1975.
[Hun89] Daniel Hunter. Causality and maximum entropy updating. International Journal of Approximate Reasoning, 3:87-114, 1989.
[JV90] J. B. Paris and A. Vencovská. A note on the inevitability of maximum entropy. International Journal of Approximate Reasoning, 4:183-223, 1990.
[Luk00] Thomas Lukasiewicz. Credal networks under maximum entropy. In Uncertainty in AI, pages 363-370, 2000.
[Pea88] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[PIT] Homepage of PIT. http://www.pit-systems.de.
[RM96] Wilhelm Rödder and Carl-Heinz Meyer. Coherent knowledge processing at maximum entropy by SPIRIT. In KI 96, FernUniversität Hagen, Germany, 1996.
[SF01] M. Schramm and B. Fronhöfer. PIT: a system for reasoning with probabilities. Technical Report 287-8/2001, FernUniversität Hagen, Fachbereich Informatik, 2001.
[SPI] Homepage of SPIRIT. http://www.xspirit.de.