Wolf-Tilo Balke Christoph Lofi
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Knowledge-Based Systems
and Deductive Databases
9.1 Expert Systems
9.2 Heuristic Reasoning
9.3 Fuzzy Reasoning
9.4 Case-Based Reasoning
9 Expert Systems
• Expert systems were the main application of A.I. in the early 1980s
• Idea: Create a system which can draw
conclusions and thus support people in difficult decisions
– Simulate a human expert
– Extract knowledge of experts and just cheaply copy it to all places you might need it
9.1 Expert Systems
• Expert Systems were supposed to be especially useful in
– Medical diagnosis
• …was initially a failure
• Currently making a comeback in specialized areas
– Production and machine failure diagnosis
• Works quite well
– Financial services
• Widely used
9.1 Expert Systems
• Usually, three user groups are involved when maintaining and using an expert system
– End Users: The group that actually uses the system for problem solving assistance
• e.g. young and/or general doctors, field users deploying complex machinery, …
– Domain Experts: Are those experts whose knowledge is to be “extracted”
• e.g. highly-skilled specialist doctors, engineers of complex machinery, ...
– Knowledge Engineers: Assist the domain experts in representing knowledge in a formally usable form, e.g.
representing it as rules
9.1 Expert Systems
• Common architecture of an expert system
– User Interface: Usually based on a question-response dialog
– Inference Engine: Tries to deduce an answer based on the knowledge base and the problem data
– Explanation System: Explains to the user why a certain answer was given or question asked
– Knowledge Base: Set of rules and base facts
– Problem Data: Facts provided for a specific problem via user interface
[Figure: expert system architecture with the components User Interface, Explanation System, Inference Engine, Problem Data, and Knowledge Base]
• Building an expert system has several steps
– Building up the knowledge base requires extracting knowledge in the form of rules and beliefs from domain experts
• For complex domains, this is almost impossible
– Deciding on a suitable reasoning technique
• This part is usually well-understood
– Designing an explanation facility
• Automatically generating sensible explanations or even arguments for derived facts is a major problem
• Often only the proof tree is returned…
9.1 Expert Systems
• The actual way of performing deduction in expert systems may differ
– Often, Prolog/Datalog-based logic programming engines form the core
– Heuristic approaches, like MYCIN
– Fuzzy approaches
– Case-based reasoning
9.1 Expert Systems
• MYCIN
– Developed in the early 1970s at Stanford University, USA
– Medical expert system for treating infections
• Diagnosis of infection types and recommended antibiotics (many antibiotic names end in -mycin)
– Around 600 rules (also supporting uncertainty)
– MYCIN was treated as a success by the project team…
• Experiments showed good results, especially with rare infections
– … but was never used in practice
• Too clumsy
• Technological constraints
9.2 MYCIN
• Design considerations
– Uncertain reasoning is necessary
• There is no complete and doubt-free data in medicine
– However, most known approaches for uncertain reasoning had some severe drawbacks
• No real distinction between doubt, lack of knowledge and absence of belief
• As seen in the last lecture: one very often ends up with confidence intervals of [0, 1], i.e. the deductions are useless
• A lot of additional facts or rules are necessary to reliably use uncertain reasoning
9.2 MYCIN
• MYCIN pioneered the idea of certainty factors for uncertain deduction
– Certainty factors: the relative change of belief in some hypothesis facing a given observation
– MYCIN is a heuristic system
• Rules provided by experts are heuristic rules (i.e. they are usually correct, but not always)
• Also, additional heuristics are involved by making certain assumptions (like the underlying model or independence of observations)
9.2 MYCIN
• MYCIN example rule
– I.e. the expert stating this rule would strongly
strengthen his/her belief in streptococcus when given the observations 1-3
9.2 MYCIN
If the organism
1) stains grampos
2) has coccus shape
3) grows in chains
then there is suggestive evidence (0.7) that it is streptococcus
• MYCIN example
9.2 MYCIN
---PATIENT-1---
1) Patient's name: FRED SMITH
2) Sex: MALE
3) Age: 55
4) Have you been able to obtain positive cultures from a site at which Fred Smith has an infection? YES
---INFECTION-1---
5) What is the infection? PRIMARY-BACTEREMIA
6) Please give the date when signs of INFECTION-1 appeared. 5/5/75
The most recent positive culture associated with the primary bacteremia will be referred to as:
---CULTURE-1---
7) From what site was the specimen for CULTURE-1 taken? BLOOD
8) Please give the date when this culture was obtained. 5/9/75
The first significant organism from this blood culture will be called:
---ORGANISM-1---
9) Enter the identity of ORGANISM-1. UNKNOWN
• The certainty factor model is further based on measures of belief and disbelief
– Certainty factor can be computed by combining belief and disbelief measures
– Both are treated individually, i.e. increasing belief does not decrease disbelief automatically
9.2 MYCIN
• The informal definitions of disbelief and belief are as follows
– Measure of belief for hypothesis h given the observation E
• MB(h|E) = x means “In the light of evidence E, one’s belief that h is true increases by x”
– Measure of disbelief for hypothesis h given the observation E
• MD(h|E) = x means “In the light of evidence E, one’s disbelief that h is true increases by x”
9.2 MYCIN
• Examples:
– MB(canFly(x)|isBird(x))=0.8
• “Knowing that x is a bird, my belief that x can fly increases strongly by 0.8”
– MD(canFly(x)|isBiggerThan(x, 2.00m))=0.9
• “Knowing that x is bigger than 2.00m, my disbelief that x can fly increases strongly by 0.9”
– MD(canFly(x)| isBird(x))=0.1
• “Knowing that x is a bird, my disbelief that x can fly increases by 0.1”
– Could be a chicken, or penguin, or whatever
9.2 MYCIN
• The certainty factor is finally the difference of belief and disbelief for a given pair of hypothesis and observation
– CF(h|E) := MB(h|E) - MD(h|E)
– Thus, certainty factors are within [-1, 1]
– A certainty factor describes the change of belief when a given fact/observation is known
• It is thus a relative measurement combining belief and disbelief
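The definition CF(h|E) := MB(h|E) - MD(h|E) can be tried out with a minimal Python sketch. The function name `certainty_factor` is only illustrative; the input values are the bird example from the previous slide.

```python
def certainty_factor(mb: float, md: float) -> float:
    """CF(h|E) := MB(h|E) - MD(h|E); both measures must lie in [0, 1]."""
    if not (0.0 <= mb <= 1.0 and 0.0 <= md <= 1.0):
        raise ValueError("MB and MD must lie in [0, 1]")
    return mb - md


# Bird example: MB(canFly(x)|isBird(x)) = 0.8, MD(canFly(x)|isBird(x)) = 0.1
cf_bird = certainty_factor(0.8, 0.1)
print(round(cf_bird, 2))  # 0.7
```

Since both measures lie in [0, 1], the result always lies in [-1, 1].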
9.2 Certainty Factors
– A positive certainty factor means that after learning a fact, my belief in something increases
• The fact “confirms” the hypothesis
• For a negative certainty factor, the disbelief increases
– If only certainty factors are used for knowledge modeling, one can extract the corresponding belief and disbelief values directly
• This approach is used in MYCIN
9.2 Certainty Factors
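$$
MB(\ldots) = \begin{cases} 0 & \text{if } CF(\ldots) < 0 \\ CF(\ldots) & \text{if } CF(\ldots) \ge 0 \end{cases}
\qquad
MD(\ldots) = \begin{cases} -CF(\ldots) & \text{if } CF(\ldots) < 0 \\ 0 & \text{if } CF(\ldots) \ge 0 \end{cases}
$$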
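Extracting belief and disbelief from a single certainty factor can be sketched as follows. The function name `belief_from_cf` is an assumption made for this illustration.

```python
from typing import Tuple


def belief_from_cf(cf: float) -> Tuple[float, float]:
    """Split a certainty factor in [-1, 1] into the pair (MB, MD)."""
    if not -1.0 <= cf <= 1.0:
        raise ValueError("a certainty factor must lie in [-1, 1]")
    mb = cf if cf >= 0 else 0.0   # confirming evidence only raises belief
    md = -cf if cf < 0 else 0.0   # disconfirming evidence only raises disbelief
    return mb, md


print(belief_from_cf(0.5))   # (0.5, 0.0)
print(belief_from_cf(-0.9))  # (0.0, 0.9)
```

At most one of the two values is non-zero, so MB - MD gives back the original certainty factor.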
• Also note that CF(h|E)+CF(¬ h|E)≤1
– They are not probabilities! I.e. the known equality P(h|E)+P(¬ h|E)=1 does not hold for certainty factors
• This actually means
– If some evidence supports a hypothesis, this does not mean that the negation is supported in the inverse manner
• E.g., no reliable statements regarding the negation
9.2 Certainty Factors
• How are belief factors and certainty factors related to probability?
– We will need a formalization in order to derive valid rules for combination and chaining of rules
– For understanding and modeling knowledge and rules, the informal definition is usually used
• I.e. the quantified change of belief when a given fact/observation is discovered
• Assumption: The formal model matches the intended semantics of the informal definition
9.2 Certainty Factors
• Measure of belief
– This means
• Is 0 if P(h|E)≤P(h), i.e. the evidence does not increase the probability of the hypothesis
• Otherwise, it is the increase in probability given the evidence, in proportion to the uncertainty (improbability) of the hypothesis alone
9.2 Certainty Factors
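$$
MB(h|E) = \begin{cases} \dfrac{\max(P(h|E),\, P(h)) - P(h)}{1 - P(h)} & \text{if } P(h) \ne 1 \\ 1 & \text{otherwise} \end{cases}
$$
[Figure: the interval from 0.0 to 1.0 with the increase P(h|E)-P(h) marked]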
• The definitions of the measure of disbelief and the certainty factor are analogous
– Assumption: This statistical notation represents the fuzzy concept of a human increase of belief
9.2 Certainty Factors
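$$
MD(h|E) = \begin{cases} \dfrac{P(h) - \min(P(h|E),\, P(h))}{P(h)} & \text{if } P(h) \ne 0 \\ 1 & \text{otherwise} \end{cases}
$$
$$
CF(h|E) = \begin{cases} \dfrac{P(h|E) - P(h)}{1 - P(h)} & \text{if } P(h|E) \ge P(h),\ P(h) \ne 1 \\ \dfrac{P(h|E) - P(h)}{P(h)} & \text{if } P(h) \ge P(h|E),\ P(h) \ne 0 \end{cases}
$$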
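The probabilistic definitions of MB, MD and CF can be checked numerically with a small Python sketch. The function names and the probabilities in the usage lines are hypothetical and chosen only for illustration.

```python
def mb_prob(p_h: float, p_h_given_e: float) -> float:
    """Measure of belief: relative increase of P(h) caused by evidence E."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def md_prob(p_h: float, p_h_given_e: float) -> float:
    """Measure of disbelief: relative decrease of P(h) caused by evidence E."""
    if p_h == 0.0:
        return 1.0
    return (p_h - min(p_h_given_e, p_h)) / p_h


def cf_prob(p_h: float, p_h_given_e: float) -> float:
    """Certainty factor as the difference of belief and disbelief."""
    return mb_prob(p_h, p_h_given_e) - md_prob(p_h, p_h_given_e)


# Hypothetical numbers: the evidence raises P(h) from 0.5 to 0.75 ...
print(cf_prob(0.5, 0.75))  # 0.5
# ... or lowers it from 0.5 to 0.25.
print(cf_prob(0.5, 0.25))  # -0.5
```

In both calls only one of the two measures is non-zero, which matches the case distinction in the definition of CF(h|E).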
• These definitions heavily rely on various a priori probabilities and conditional probabilities
– Those are usually not known and / or cannot be determined
– A user-provided certainty factor (based on informal definitions) thus proxies for all those probabilities
• “Given observation E, my belief in h decreases by 0.3” thus implicitly contains information on P(h|E), P(h) and their relation
9.2 Certainty Factors
• So finally, the simplest form of rules using certainty factors is
– IF a THEN h WITH CF(h|a)
– Thus, we can have confirming rules (positive CF) or disconfirming rules (negative CF)
– Based on this rule type, some simple operations may be defined
• Chaining
• Parallel Combination
9.2 Certainty Factors
• Cognitive user load using different models
– Strict reasoning:
“If there are black dots on teeth, then this is caries.”
• Easy, but too restrictive and thus often leads to wrong rules
– Probabilistic reasoning:
“If there are black dots on teeth, then this is caries with a probability of 0.82.”
• Absolute statement on probabilistic frequencies
• Lots of statistical evaluation necessary to determine all needed a-priori and conditional probabilities
– Certainty factors:
“If there are black dots on teeth, then this is a strong positive (0.8) evidence for caries.”
• Relative statement on strength of evidence
• No absolute statistics necessary
9.2 Certainty Factors
• Rule chaining
– Chain rules consecutively, e.g.
• IF e THEN a WITH CF(a|e)
• IF a THEN h WITH CF(h|a)
• ⤇ IF e THEN h WITH CF(h|e)
– CF(h|e) = MB(h|e) - MD(h|e) can be computed from its components as follows
• MB(h|e) = MB(h|a) * max(0, CF(a|e))
• MD(h|e) = MD(h|a) * max(0, CF(a|e))
• Thus, chaining is essentially a simple multiplication
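Rule chaining can be sketched in a few lines of Python. The function name `chain` and the certainty factors in the usage lines are assumptions made for illustration; MB(h|a) and MD(h|a) are extracted from CF(h|a) as its positive and negative part.

```python
def chain(cf_a_given_e: float, cf_h_given_a: float) -> float:
    """Chain 'IF e THEN a' with 'IF a THEN h' and return CF(h|e)."""
    strength = max(0.0, cf_a_given_e)   # a disbelieved premise contributes nothing
    mb_h_a = max(0.0, cf_h_given_a)     # MB(h|a)
    md_h_a = max(0.0, -cf_h_given_a)    # MD(h|a)
    mb_h_e = mb_h_a * strength          # MB(h|e)
    md_h_e = md_h_a * strength          # MD(h|e)
    return mb_h_e - md_h_e


# Hypothetical rules: CF(a|e) = 0.8 and CF(h|a) = 0.5
print(round(chain(0.8, 0.5), 2))   # 0.4
# If e speaks against a, the second rule cannot fire at all.
print(chain(-0.6, 0.5))            # 0.0
```

The second call shows the effect of max(0, CF(a|e)): a rule whose premise is disbelieved contributes neither belief nor disbelief.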
9.2 Certainty Factors
• Parallel combination
– Combining multiple rules for the same hypothesis
• IF e₁ THEN h WITH CF(h|e₁)
• IF e₂ THEN h WITH CF(h|e₂)
– Parallel combination should be undefined when both certainty factors are opposing with maximal certainty
9.2 Certainty Factors
• The combined certainty factor can be computed independently by determining the belief and disbelief values
9.2 Certainty Factors
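$$
MB(h|e_1, e_2) = \begin{cases} 0 & \text{if } MD(h|e_1, e_2) = 1 \\ MB(h|e_1) + MB(h|e_2) - MB(h|e_1) \cdot MB(h|e_2) & \text{otherwise} \end{cases}
$$
$$
MD(h|e_1, e_2) = \begin{cases} 0 & \text{if } MB(h|e_1, e_2) = 1 \\ MD(h|e_1) + MD(h|e_2) - MD(h|e_1) \cdot MD(h|e_2) & \text{otherwise} \end{cases}
$$
The combination is undefined if both are 1; special handling is needed.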
• Example:
– If there are black dots on teeth, my belief in caries increases moderately (0.5).
– If the x-ray shows no damage to the enamel, then my belief in caries decreases strongly (-0.9).
• CF(caries|dots) = 0.5, CF(caries|noDamage) = -0.9
CF(caries|dots, noDamage) = ?
• MB(caries|dots) = 0.5, MD(caries|dots) = 0
MB(caries|noDamage) = 0, MD(caries|noDamage) = 0.9
– MB(caries|dots, noDamage) = 0.5 + 0 - 0.5*0 = 0.5
MD(caries|dots, noDamage) = 0 + 0.9 - 0*0.9 = 0.9
CF(caries|dots, noDamage) = 0.5 - 0.9 = -0.4
9.2 Certainty Factors
– If the gum is red, my belief in periodontitis increases moderately (0.5).
– If there are loose teeth, my belief in periodontitis increases slightly (0.3).
• CF(periodontitis|redGum) = 0.5, CF(periodontitis|looseTeeth) = 0.3
CF(periodontitis|redGum, looseTeeth) = ?
• MB(periodontitis|redGum) = 0.5, MD(periodontitis|redGum) = 0
MB(periodontitis|looseTeeth) = 0.3, MD(periodontitis|looseTeeth) = 0
– MB(periodontitis|redGum, looseTeeth) = 0.5 + 0.3 - 0.5*0.3 = 0.65
MD(periodontitis|redGum, looseTeeth) = 0 + 0 - 0*0 = 0
CF(periodontitis|redGum, looseTeeth) = 0.65
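The two dental examples can be reproduced with a short Python sketch of parallel combination. The function name `combine_parallel` is illustrative. Computing both sums first and only then applying the special cases is one possible reading of the mutually dependent definitions and should be taken as an assumption.

```python
def combine_parallel(cf1: float, cf2: float) -> float:
    """Combine two certainty factors for the same hypothesis h."""
    mb1, md1 = max(0.0, cf1), max(0.0, -cf1)
    mb2, md2 = max(0.0, cf2), max(0.0, -cf2)
    mb = mb1 + mb2 - mb1 * mb2   # combined belief
    md = md1 + md2 - md1 * md2   # combined disbelief
    if mb == 1.0 and md == 1.0:
        # Opposing evidence with maximal certainty: no sensible result exists.
        raise ValueError("combination is undefined for CF = 1 and CF = -1")
    if md == 1.0:
        mb = 0.0                 # absolute disbelief overrides any belief
    elif mb == 1.0:
        md = 0.0                 # absolute belief overrides any disbelief
    return mb - md


# Caries: black dots (0.5) against an x-ray showing no damage (-0.9)
print(round(combine_parallel(0.5, -0.9), 2))  # -0.4
# Periodontitis: red gum (0.5) and loose teeth (0.3)
print(round(combine_parallel(0.5, 0.3), 2))   # 0.65
```

Two confirming rules reinforce each other without ever exceeding 1, while opposing rules simply offset each other.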
9.2 Certainty Factors
• How did the actual MYCIN system work?
– Only confirming or disconfirming rules with certainty factors
– For each patient, a predefined set of standard facts has to be provided
• Like age, general condition, common facts on the sample, etc.
• These are used to rule out all completely unrealistic conclusions
9.2 MYCIN
---PATIENT-1---
1) Patient's name: FRED SMITH
2) Sex: MALE
3) Age: 55
4) Have you been able to obtain positive cultures from a site at which Fred Smith has an infection? YES
---INFECTION-1---
5) What is the infection? PRIMARY-BACTEREMIA
6) Please give the date when signs of INFECTION-1 appeared. 5/5/75
The most recent positive culture associated with the primary bacteremia will be referred to as:
---CULTURE-1---
7) From what site was the specimen for CULTURE-1 taken? BLOOD
8) Please give the date when this culture was obtained. 5/9/75
• After that, the system switches to a backward-chaining approach
– Most promising rules are selected, and the system tries to prove each of them
• Discard all rules with known false premises
• Prefer rules with high certainty factors
– Missing information is requested from the user in a dialog-style interaction
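The backward-chaining step can be illustrated with a strongly simplified Python sketch. It is not MYCIN's actual implementation. The `Rule` class, the `prove` function, the fact names, taking the minimum over all premises of a rule, and treating a premise with a certainty factor of 0 or less as false are all assumptions made for this illustration. The rule set is assumed to be free of cycles.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    premises: Tuple[str, ...]  # all premises must hold (conjunction)
    conclusion: str
    cf: float                  # certainty factor of the rule, in [-1, 1]


def prove(goal: str, rules: List[Rule], facts: Dict[str, float],
          ask: Callable[[str], float]) -> float:
    """Return a certainty factor for `goal`, asking the user when needed."""
    if goal in facts:
        return facts[goal]
    candidates = [r for r in rules if r.conclusion == goal]
    if not candidates:
        # No rule concludes the goal: request the missing information.
        facts[goal] = ask(goal)
        return facts[goal]
    # Prefer rules with high certainty factors.
    candidates.sort(key=lambda r: abs(r.cf), reverse=True)
    mb = md = 0.0
    for rule in candidates:
        # Discard rules with a premise that is already known to be false.
        if any(facts.get(p, 1.0) <= 0.0 for p in rule.premises):
            continue
        strength = 1.0
        for premise in rule.premises:
            strength = min(strength, prove(premise, rules, facts, ask))
            if strength <= 0.0:
                break  # premise failed, so no further questions are asked
        if strength <= 0.0:
            continue
        contribution = rule.cf * strength  # chaining
        if contribution >= 0.0:
            mb = mb + contribution - mb * contribution  # parallel combination
        else:
            weight = -contribution
            md = md + weight - md * weight
    facts[goal] = mb - md
    return facts[goal]


rules = [Rule(("stains_grampos", "has_coccus_shape", "grows_in_chains"),
              "streptococcus", 0.7)]

# Simulated dialog: the user reports a rod, so the coccus premise is false.
answers = {"stains_grampos": 1.0, "has_coccus_shape": -1.0}
result = prove("streptococcus", rules, {}, lambda q: answers.get(q, 1.0))
print(result)  # 0.0
```

In this run the third premise is never asked, because the rule has already failed on its second premise. If all three observations are confirmed with certainty 1.0, the call yields the rule's own certainty factor of 0.7.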
9.2 MYCIN
The first significant organism from this blood culture will be called:
---ORGANISM-1---
9) Enter the identity of ORGANISM-1. UNKNOWN
10) Is ORGANISM-1 a rod or coccus (etc.)? ROD
• Finally, the system will present all possible deductions to the user along with their certainty factors
• After that, the user may query why the system deduced those facts (the system explains the answers)
– Mainly using the proof trees of each successful rule
9.2 MYCIN
** Did you use RULE 163 to find out anything about ORGANISM-1?
RULE163 was tried in the context of ORGANISM-1, but it failed because it is not true that the patient has had a genito-urinary tract manipulative procedure (clause 3).
** Why didn't you consider streptococcus as a possibility?
The following rule could have been used to determine that the identity of ORGANISM-1 was streptococcus: RULE033
But clause 2 (“the morphology of the organism is coccus”) was already known to be
false for ORGANISM-1, so the rule was never tried.
• Was MYCIN a success?
– Partially…
– During field evaluation, MYCIN deduced a correct treatment in 69% of all test infections
• …which is a lot better than diagnoses by average non-specialist physicians
• …but worse than diagnoses by infection specialists (~80%)
• However, those specialists often disagreed, so a real evaluation is not possible: there is no “gold” standard for infection treatments
• This result is very representative of most expert systems, which perform worse than real experts, but usually better than non-experts
– However, the system never made it into practice mainly due to legal and ethical issues
• Who is responsible (and can be sued) in case of a mistake?
9.2 MYCIN
• Vagueness is in the nature of most expert decisions
– Symptom for cavities: a person has discolored teeth.
Not at all, slightly, very,…?!
• Vagueness can be modeled not only by an agent's belief in a statement, but also directly
– Fuzzy set theory (Lotfi Zadeh, 1965)
– Expresses the degree of possibility (as opposed to probability)
– Captures the idea of linguistic variables
9.3 Fuzzy Reasoning
• Crisp set membership degrees (1 or 0) are often insufficient for expressing vague concepts
– Consider the set of discolored teeth in cavities diagnosis
• Teeth with brown spots are discolored (1)
• White teeth are not discolored (0)
• What about yellowish teeth?
Depends on the degree of stain! ([0,1])
9.3 Fuzzy Reasoning
[Figure: “discolored” vs. “normal”]
• A fuzzy set is defined by a membership function μ mapping the universe Ω to the unit interval [0,1]
– The normal set operations with the characteristic membership function can be easily extended to fuzzy sets
• (μ₁ ⋂ μ₂)(ω) := min{μ₁(ω), μ₂(ω)}
• (μ₁ ⋃ μ₂)(ω) := max{μ₁(ω), μ₂(ω)}
• The complement of μ is (¬μ)(ω) := 1 - μ(ω)
– Some characteristics of Boolean algebra are preserved, others not
• E.g., distributivity and De Morgan's laws hold, but the law of the excluded middle (μ ⋃ ¬μ = Ω) and the law of contradiction (μ ⋂ ¬μ = ∅) do not
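The fuzzy set operations can be tried out with a minimal Python sketch. The function names are illustrative. The membership degrees for white (0), yellow (0.5) and brown (1) teeth follow the discoloration example, with 0.5 for yellow being an assumed value.

```python
from typing import Callable, Dict

Membership = Callable[[str], float]


def fuzzy_and(mu1: Membership, mu2: Membership) -> Membership:
    """Intersection: pointwise minimum of the membership degrees."""
    return lambda w: min(mu1(w), mu2(w))


def fuzzy_or(mu1: Membership, mu2: Membership) -> Membership:
    """Union: pointwise maximum of the membership degrees."""
    return lambda w: max(mu1(w), mu2(w))


def fuzzy_not(mu: Membership) -> Membership:
    """Complement: one minus the membership degree."""
    return lambda w: 1.0 - mu(w)


DEGREES: Dict[str, float] = {"white": 0.0, "yellow": 0.5, "brown": 1.0}


def discolored(color: str) -> float:
    return DEGREES[color]


not_discolored = fuzzy_not(discolored)

# For crisp degrees the Boolean laws still hold ...
print(fuzzy_or(discolored, not_discolored)("brown"))    # 1.0
# ... but for yellow teeth "discolored or not discolored" is not fully true,
print(fuzzy_or(discolored, not_discolored)("yellow"))   # 0.5
# and "discolored and not discolored" is not fully false.
print(fuzzy_and(discolored, not_discolored)("yellow"))  # 0.5
```

The last two lines show a Boolean property that is lost: an element can partially belong to a fuzzy set and to its complement at the same time.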
9.3 Fuzzy Reasoning
• How can this be applied for reasoning?
– We have fuzzy facts and can deduce new (fuzzy) facts from them
– Back to our toothache example…
• Fact: Tom has yellow stained teeth.
• Rule: If a person has very discolored teeth, then the person has cavities.
• Does Tom have cavities..?!
• Obviously we need to relate the degree of staining with the premise of our rule
– Usually the degree is (linguistically) discretized
– Different degrees of staining have different possibilities of relating to cavities
9.3 Fuzzy Reasoning
• This leads to possibility distributions
– Only depend on the possibility that a case is described by a certain class
• Nobody would state that somebody with white teeth has “discolored” teeth
• There is a possibility of 50% that a yellow stain would be considered as “discolored”
– Are somewhat similar to probability distributions, but not depending on observed cases
• Possibility is an upper limit for probabilities
– Possibility theory introduced by L. Zadeh in 1978
9.3 Fuzzy Reasoning
• A possibility distribution assigns the possibility of a characteristic to some measurable property
[Figure: two possibility distributions, “Somebody has ‘discolored’ teeth” and “Somebody has ‘very discolored’ teeth”, each plotting the possibility (0 to 1.0) over the tooth color (white, yellow, brown, black)]
• An important feature is the ability to define hedges
– Provide operations to maintain close ties to natural language, and allow for the mathematical generation of fuzzy statements
– The initial definition of hedges is a subjective process
• A simple example may transform the statement “teeth are stained.” to “teeth are very stained.”
– The hedge “very” is usually defined as μ_very(ω) := (μ(ω))²
• Thus, if μ_stained(Tom) = 0.8, then μ_very_stained(Tom) = 0.64
– Other common hedges are “more or less”, typically √(μ(ω)), “somewhat”, “rather”, “sort of”, ...
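Hedges can be written as functions that transform one membership function into another. The following Python sketch uses illustrative names (`very`, `more_or_less`, `mu_stained`) and the value 0.8 for Tom from the example.

```python
import math
from typing import Callable

Membership = Callable[[str], float]


def very(mu: Membership) -> Membership:
    """Hedge 'very': squares the membership degree."""
    return lambda w: mu(w) ** 2


def more_or_less(mu: Membership) -> Membership:
    """Hedge 'more or less': square root of the membership degree."""
    return lambda w: math.sqrt(mu(w))


def mu_stained(person: str) -> float:
    degrees = {"Tom": 0.8}  # membership degree of Tom in "stained"
    return degrees[person]


print(round(very(mu_stained)("Tom"), 2))          # 0.64
print(round(more_or_less(mu_stained)("Tom"), 3))  # 0.894
```

Because all degrees lie in [0, 1], “very” lowers a degree and “more or less” raises it, and hedges can be nested, e.g. `very(very(mu_stained))`.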