Knowledge-Based Systems and Deductive Databases

Wolf-Tilo Balke, Christoph Lofi

Institut für Informationssysteme

Technische Universität Braunschweig

http://www.ifis.cs.tu-bs.de

9 Expert Systems

9.1 Expert Systems

9.2 Heuristic Reasoning

9.3 Fuzzy Reasoning

9.4 Case-Based Reasoning

9.1 Expert Systems

• Expert systems were the main application of A.I. in the early 1980s

• Idea: create a system that can draw conclusions and thus support people in difficult decisions

– Simulate a human expert

– Extract the knowledge of experts and cheaply copy it to every place where it is needed

• Expert systems were supposed to be especially useful in

– Medical diagnosis

• …which turned out to be a failure

• Currently, expert systems are making a comeback in specialized areas

– Production and machine failure diagnosis

• Works quite well

– Financial services

• Widely used

• Usually, three user groups are involved in maintaining and using an expert system

– End Users: the group that actually uses the system for problem-solving assistance

• e.g. young and/or general doctors, field users deploying complex machinery, …

– Domain Experts: the experts whose knowledge is to be “extracted”

• e.g. highly skilled specialist doctors, engineers of complex machinery, …

– Knowledge Engineers: assist the domain experts in representing their knowledge in a formally usable form, e.g. as rules

• Common architecture of an expert system

– User Interface: usually based on a question-response dialog

– Inference Engine: tries to deduce an answer based on the knowledge base and the problem data

– Explanation System: explains to the user why a certain answer was given or a certain question was asked

– Knowledge Base: set of rules and base facts

– Problem Data: facts provided for a specific problem via the user interface

[Figure: architecture of an expert system with the components User Interface, Explanation System, Inference Engine, Problem Data, and Knowledge Base]
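To make the interplay of knowledge base, problem data, inference engine, and explanation system concrete, here is a minimal Python sketch. It assumes propositional rules with conjunctive premises and a naive forward-chaining engine, a simplification chosen for brevity rather than how any particular expert system works; the rule names and facts (`R1`, `fever`, …) are hypothetical. The "explanation" is just the proof tree of a derived fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    premises: frozenset  # conjunction: all premises must be known facts
    conclusion: str


def forward_chain(rules, problem_data):
    """Inference engine: fire rules until no new fact can be derived.

    Returns all known facts and, for each derived fact, the name of the rule
    that derived it (raw material for the explanation system).
    """
    known = set(problem_data)
    derived_by = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.premises <= known:
                known.add(rule.conclusion)
                derived_by[rule.conclusion] = rule.name
                changed = True
    return known, derived_by


def explain(fact, derived_by, rules_by_name):
    """Explanation system: rebuild the proof tree of `fact` as nested tuples."""
    if fact not in derived_by:
        return (fact, "given")  # base fact from the problem data
    rule = rules_by_name[derived_by[fact]]
    subproofs = [explain(p, derived_by, rules_by_name) for p in sorted(rule.premises)]
    return (fact, rule.name, subproofs)


# Hypothetical knowledge base and problem data, for illustration only.
KNOWLEDGE_BASE = [
    Rule("R1", frozenset({"fever", "positive_culture"}), "infection"),
    Rule("R2", frozenset({"infection", "gram_positive"}), "gram_positive_infection"),
]
facts, derived_by = forward_chain(KNOWLEDGE_BASE, {"fever", "positive_culture", "gram_positive"})
proof = explain("gram_positive_infection", derived_by, {r.name: r for r in KNOWLEDGE_BASE})
```

Keeping the knowledge base (`KNOWLEDGE_BASE`) separate from the problem data (the set passed to `forward_chain`) mirrors the architecture: the same rules are reused for every new problem.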

• Building an expert system involves several steps

– Building up the knowledge base requires extracting knowledge, in the form of rules and beliefs, from domain experts

• For complex domains, this is almost impossible

– Choosing a suitable reasoning technique

• This part is usually well understood

– Designing an explanation facility

• Automatically generating sensible explanations or even arguments for derived facts is a major problem

• Often only the proof tree is returned…

• The actual way of performing deduction in expert systems may differ

– Often, Prolog/Datalog-based logic programming engines form the core

– Heuristic approaches, like MYCIN

– Fuzzy approaches

– Case-based reasoning

9.2 MYCIN

• MYCIN

– Developed in the early 1970s at Stanford University, USA

– Medical expert system for treating infections

• Diagnosis of infection types and recommendation of antibiotics (antibiotic names often end in “-mycin”)

– Around 600 rules (also supporting uncertainty)

– MYCIN was considered a success by the project team…

• Experiments showed good results, especially with rare infections

– … but it was never used in practice

• Too clumsy

• Technological constraints

• Design considerations

– Uncertain reasoning is necessary

• There is no complete and doubt-free data in medicine

– However, most known approaches for uncertain reasoning had some severe drawbacks

• No real distinction between doubt, lack of knowledge and absence of belief

• As seen in the last lecture: you very often end up with confidence intervals of [0, 1], i.e. the deductions are useless

• A lot of additional facts or rules are necessary to use uncertain reasoning reliably

• MYCIN pioneered the idea of certainty factors for uncertain deduction

– Certainty factors: the relative change of belief in some hypothesis in view of a given observation

– MYCIN is a heuristic system

• Rules provided by experts are heuristic rules (i.e. they are usually, but not always, correct)

• Also, additional heuristics are involved in the form of certain assumptions (like the underlying model or the independence of observations)

• MYCIN example rule

If the organism 1) stains grampos, 2) has coccus shape, and 3) grows in chains,

then there is suggestive evidence of 0.7 that it is streptococcus

– I.e., the expert stating this rule would strongly increase his/her belief in streptococcus when given the observations 1-3

• MYCIN example

---PATIENT-1---

1) Patient's name: FRED SMITH

2) Sex: MALE

3) Age: 55

4) Have you been able to obtain positive cultures from a site at which Fred Smith has an infection? YES

---INFECTION-1---

5) What is the infection? PRIMARY-BACTEREMIA

6) Please give the date when signs of INFECTION-1 appeared. 5/5/75

The most recent positive culture associated with the primary bacteremia will be referred to as:

---CULTURE-1---

7) From what site was the specimen for CULTURE-1 taken? BLOOD

8) Please give the date when this culture was obtained. 5/9/75

The first significant organism from this blood culture will be called:

---ORGANISM-1---

9) Enter the identity of ORGANISM-1. UNKNOWN

• The certainty factor model is further based on measures of belief and disbelief

– The certainty factor can be computed by combining the measures of belief and disbelief

– Both are treated independently, i.e. an increase in belief does not automatically decrease disbelief

• The informal definitions of disbelief and belief are as follows

– Measure of belief for hypothesis h given the observation E

• MB(h|E) = x means “In the light of evidence E, one’s belief that h is true increases by x”

– Measure of disbelief for hypothesis h given the observation E

• MD(h|E) = x means “In the light of evidence E, one’s disbelief that h is true increases by x”


• Examples:

– MB(canFly(x)|isBird(x))=0.8

• “Knowing that x is a bird, my belief that x can fly increases strongly by 0.8”

– MD(canFly(x)|isBiggerThan(x, 2.00m))=0.9

• “Knowing that x is bigger than 2.00m, my disbelief that x can fly increases strongly by 0.9”

– MD(canFly(x)| isBird(x))=0.1

• “Knowing that x is a bird, my disbelief that x can fly increases by 0.1”

– It could be a chicken, a penguin, or some other flightless bird

9.2 Certainty Factors

• Finally, the certainty factor is the difference between belief and disbelief for a given pair of hypothesis and observation

– CF(h|E) := MB(h|E) - MD(h|E)

– Thus, since MB and MD both lie in [0, 1], certainty factors are within [-1, 1]

– A certainty factor describes the change of belief when a given fact/observation is known

• It is thus a relative measurement combining belief and disbelief

– A positive certainty factor means that after learning a fact, my belief in something increases

• The fact “confirms” the hypothesis

• For a negative certainty factor, the disbelief increases

– If only certainty factors are used for knowledge modeling, the corresponding belief and disbelief values can be extracted directly

• This approach is used in MYCIN

$$\mathrm{MB}(h|E) = \begin{cases} \mathrm{CF}(h|E) & \text{if } \mathrm{CF}(h|E) \geq 0 \\ 0 & \text{if } \mathrm{CF}(h|E) < 0 \end{cases} \qquad \mathrm{MD}(h|E) = \begin{cases} 0 & \text{if } \mathrm{CF}(h|E) \geq 0 \\ -\mathrm{CF}(h|E) & \text{if } \mathrm{CF}(h|E) < 0 \end{cases}$$
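Going from MB/MD to a certainty factor and back takes only a few lines. This Python sketch assumes plain floats for MB, MD, and CF; the function names are illustrative and not taken from MYCIN.

```python
def certainty_factor(mb: float, md: float) -> float:
    """CF(h|E) := MB(h|E) - MD(h|E); in [-1, 1] when MB and MD are in [0, 1]."""
    return mb - md


def split_certainty_factor(cf: float) -> tuple:
    """Recover (MB, MD) from a CF: a non-negative CF is pure belief,
    a negative CF is pure disbelief."""
    if not -1.0 <= cf <= 1.0:
        raise ValueError("a certainty factor must lie in [-1, 1]")
    if cf >= 0:
        return cf, 0.0
    return 0.0, -cf
```

Because at most one of the two extracted values is non-zero, this round trip loses nothing; it only works because the modeling uses certainty factors alone, as in MYCIN.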

• Also note that CF(h|E)+CF(¬h|E)≤1

– Certainty factors are not probabilities! I.e., the well-known equality P(h|E)+P(¬h|E)=1 does not hold for certainty factors

• This actually means

– If some evidence supports a hypothesis, this does not mean that the negation is supported in the inverse manner

• I.e., no reliable statements can be made regarding the negation

• How are measures of belief and certainty factors related to probability?

– We need a formalization in order to derive valid rules for the combination and chaining of rules

– For understanding and modeling knowledge and rules, the informal definition is usually used

• I.e. the quantified change of belief when a given fact/observation is discovered

• Assumption: the formal model matches the intended semantics of the informal definition

• Measure of belief

$$\mathrm{MB}(h|E) = \begin{cases} \dfrac{\max(P(h|E), P(h)) - P(h)}{1 - P(h)} & \text{if } P(h) \neq 1 \\ 1 & \text{otherwise} \end{cases}$$

– This means

• It is 0 if P(h|E)≤P(h), i.e. the evidence does not increase the probability of the hypothesis

• Otherwise, it is the increase in probability caused by the evidence, relative to the maximal possible increase 1-P(h), i.e. the uncertainty (improbability) of the hypothesis alone

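• The definitions of the measure of disbelief and of the certainty factor are analogous

– Assumption: this statistical notation adequately represents the vague, human notion of an increase in belief

$$\mathrm{MD}(h|E) = \begin{cases} \dfrac{P(h) - \min(P(h|E), P(h))}{P(h)} & \text{if } P(h) \neq 0 \\ 1 & \text{otherwise} \end{cases}$$

$$\mathrm{CF}(h|E) = \begin{cases} \dfrac{P(h|E) - P(h)}{1 - P(h)} & \text{if } P(h|E) \geq P(h),\ P(h) \neq 1 \\ \dfrac{P(h|E) - P(h)}{P(h)} & \text{if } P(h) \geq P(h|E),\ P(h) \neq 0 \end{cases}$$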
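When P(h) and P(h|E) happen to be known, the probabilistic definitions can be evaluated directly. The following Python sketch (function names are illustrative) implements MB, MD, and the case-wise CF; it assumes the inputs are valid probabilities in [0, 1] and raises an error otherwise.

```python
def measure_of_belief(p_h: float, p_h_given_e: float) -> float:
    """MB(h|E): relative increase of P(h) caused by E (0 if E does not raise it)."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def measure_of_disbelief(p_h: float, p_h_given_e: float) -> float:
    """MD(h|E): relative decrease of P(h) caused by E (0 if E does not lower it)."""
    if p_h == 0.0:
        return 1.0
    return (p_h - min(p_h_given_e, p_h)) / p_h


def certainty_factor(p_h: float, p_h_given_e: float) -> float:
    """CF(h|E) via the case distinction on P(h|E) versus P(h)."""
    if p_h_given_e >= p_h and p_h != 1.0:
        return (p_h_given_e - p_h) / (1.0 - p_h)
    if p_h >= p_h_given_e and p_h != 0.0:
        return (p_h_given_e - p_h) / p_h
    raise ValueError("inputs must be probabilities in [0, 1]")
```

For 0 < P(h) < 1 the case-wise CF coincides with MB - MD. At P(h) = 0 or P(h) = 1, however, the conventions MD = 1 and MB = 1 make the two disagree, so a consistency check should use interior values.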

• These definitions heavily rely on various a priori probabilities and conditional probabilities

– Those are usually not known and/or cannot be determined

– A user-provided certainty factor (based on the informal definitions) thus serves as a proxy for all those probabilities

• “Given observation E, my belief in h decreases by 0.3” thus implicitly contains information on P(h|E), P(h), and their relation

• So finally, the simplest form of rules using certainty factors is

– IF a THEN h WITH CF(h|a)

– Thus, we can have confirming rules (positive CF) or disconfirming rules (negative CF)

– Based on this rule type, some simple operations may be defined

• Chaining

• Parallel Combination


• Cognitive load on the user for different models

– Strict reasoning:

“If there are black dots on teeth, then this is caries.”

• Easy, but too restrictive and thus often leads to wrong rules

– Probabilistic reasoning:

“If there are black dots on teeth, then this is caries with a probability of 0.82.”

• Absolute statement on probabilistic frequencies

• Lots of statistical evaluation necessary to determine all needed a-priori and conditional probabilities

– Certainty factors:

“If there are black dots on teeth, then this is a strong positive (0.8) evidence for caries.”

• Relative statement on strength of evidence

• No absolute statistics necessary


• Rule chaining

– Chain rules consecutively, e.g.

• IF e THEN a WITH CF(a|e)

• IF a THEN h WITH CF(h|a)

• ⤇ IF e THEN h WITH CF(h|e)

– CF(h|e) = MB(h|e) - MD(h|e) can be computed from its components as follows

• MB(h|e) = MB(h|a) * max(0, CF(a|e))

• MD(h|e) = MD(h|a) * max(0, CF(a|e))

• Thus, chaining is essentially a simple multiplication
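Rule chaining needs one multiplication per measure. A minimal Python sketch, assuming the second rule is given by its MB and MD and the first rule by its CF; the function name and the example numbers are hypothetical.

```python
def chain(mb_h_given_a: float, md_h_given_a: float, cf_a_given_e: float) -> tuple:
    """Chain IF e THEN a (CF(a|e)) with IF a THEN h (MB(h|a), MD(h|a)).

    Only positive evidence for the intermediate fact a is propagated: if
    CF(a|e) <= 0, the second rule contributes nothing.
    Returns (MB(h|e), MD(h|e), CF(h|e)).
    """
    factor = max(0.0, cf_a_given_e)
    mb = mb_h_given_a * factor
    md = md_h_given_a * factor
    return mb, md, mb - md


# Hypothetical numbers: CF(a|e) = 0.5 and a rule IF a THEN h with CF(h|a) = 0.8.
mb_h, md_h, cf_h = chain(0.8, 0.0, 0.5)
```

Here CF(h|e) = 0.8 · 0.5 = 0.4. Negative evidence against the intermediate fact a blocks the chain instead of reversing the conclusion.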


• Parallel combination

– Combining multiple rules for the same hypothesis

• IF e₁ THEN h WITH CF(h|e₁)

• IF e₂ THEN h WITH CF(h|e₂)

– Parallel combination should be undefined when both certainty factors oppose each other with maximal certainty (i.e. CF = 1 and CF = -1)

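• The combined certainty factor can be computed by determining the combined belief and disbelief values independently

$$\mathrm{MB}(h|e_1,e_2) = \begin{cases} 0 & \text{if } \mathrm{MD}(h|e_1,e_2) = 1 \\ \mathrm{MB}(h|e_1) + \mathrm{MB}(h|e_2) - \mathrm{MB}(h|e_1)\cdot\mathrm{MB}(h|e_2) & \text{otherwise} \end{cases}$$

$$\mathrm{MD}(h|e_1,e_2) = \begin{cases} 0 & \text{if } \mathrm{MB}(h|e_1,e_2) = 1 \\ \mathrm{MD}(h|e_1) + \mathrm{MD}(h|e_2) - \mathrm{MD}(h|e_1)\cdot\mathrm{MD}(h|e_2) & \text{otherwise} \end{cases}$$

– Undefined if both are 1; special handling is needed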

• Example:

– If there are black dots on teeth, my belief in caries increases moderately (0.5).

– If the x-ray shows no damage to the enamel, then my belief in caries decreases strongly (-0.9).

• CF(caries|dots) = 0.5, CF(caries|noDamage) = -0.9

CF(caries|dots, noDamage) = ?

• MB(caries|dots) = 0.5, MD(caries|dots) = 0

MB(caries|noDamage) = 0, MD(caries|noDamage) = 0.9

– MB(caries|dots, noDamage) = 0.5 + 0 - 0.5·0 = 0.5

MD(caries|dots, noDamage) = 0 + 0.9 - 0·0.9 = 0.9

CF(caries|dots, noDamage) = 0.5 - 0.9 = -0.4

– If the gum is red, my belief in periodontitis increases moderately (0.5).

– If there are loose teeth, my belief in periodontitis increases slightly (0.3).

• CF(periodontitis|redGum) = 0.5, CF(periodontitis|looseTeeth) = 0.3

CF(periodontitis|redGum, looseTeeth) = ?

• MB(periodontitis|redGum) = 0.5, MD(periodontitis|redGum) = 0

MB(periodontitis|looseTeeth) = 0.3, MD(periodontitis|looseTeeth) = 0

– MB(periodontitis|redGum, looseTeeth) = 0.5 + 0.3 - 0.5·0.3 = 0.65

MD(periodontitis|redGum, looseTeeth) = 0 + 0 - 0·0 = 0

CF(periodontitis|redGum, looseTeeth) = 0.65
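The parallel-combination formulas, including their special cases, can be sketched in Python as follows. The sketch assumes exact float comparisons against 1.0 (a real system would need a tolerance) and raises an error for the undefined case; the function name is illustrative. Applied to the caries and periodontitis examples, it reproduces CF = -0.4 and CF = 0.65.

```python
def combine_parallel(mb1: float, md1: float, mb2: float, md2: float) -> tuple:
    """Combine two rules for the same hypothesis h (evidence e1 and e2).

    Belief and disbelief are combined separately as x + y - x*y.
    Returns (MB(h|e1,e2), MD(h|e1,e2), CF(h|e1,e2)).
    """
    mb = mb1 + mb2 - mb1 * mb2
    md = md1 + md2 - md1 * md2
    if mb == 1.0 and md == 1.0:
        # Maximal belief and maximal disbelief oppose each other.
        raise ValueError("parallel combination is undefined for MB = MD = 1")
    if md == 1.0:
        mb = 0.0  # certain disbelief overrides any belief
    elif mb == 1.0:
        md = 0.0  # certain belief overrides any disbelief
    return mb, md, mb - md


caries = combine_parallel(0.5, 0.0, 0.0, 0.9)         # dots, noDamage
periodontitis = combine_parallel(0.5, 0.0, 0.3, 0.0)  # redGum, looseTeeth
```

Since x + y - x·y never decreases when another piece of evidence is added, each further confirming rule strengthens belief while the result stays below 1 unless one input is already 1.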


• How did the actual MYCIN system work?

– Only confirming or disconfirming rules with certainty factors

– For each patient, a predefined set of standard facts has to be provided

• Like age, gender, general condition, common facts on the sample, etc.

• These are used to rule out all completely unrealistic conclusions

---PATIENT-1---

1) Patient's name: FRED SMITH

2) Sex: MALE

3) Age: 55

4) Have you been able to obtain positive cultures from a site at which Fred Smith has an infection? YES

---INFECTION-1---

5) What is the infection? PRIMARY-BACTEREMIA

6) Please give the date when signs of INFECTION-1 appeared. 5/5/75

The most recent positive culture associated with the primary bacteremia will be referred to as:

---CULTURE-1---

7) From what site was the specimen for CULTURE-1 taken? BLOOD

8) Please give the date when this culture was obtained. 5/9/75

• After that, the system switches to a backward-chaining approach

– The most promising rules are selected, and the system tries to prove each of them

• Discard all rules with known false premises

• Prefer rules with high certainty factors

– Missing information is requested from the user in a dialog-style interaction

The first significant organism from this blood culture will be called:

---ORGANISM-1---

9) Enter the identity of ORGANISM-1. UNKNOWN

10) Is ORGANISM-1 a rod or coccus (etc.)? ROD
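This goal-driven questioning can be imitated with a small backward-chaining sketch in Python. It is not MYCIN's actual control structure: it assumes an acyclic rule base and Boolean facts, and it treats a sub-goal as established as soon as any rule for it succeeds (ignoring that rule's certainty factor). The rule name `RULE-X` and the helper `scripted_user` are hypothetical; the `ask` callback stands in for the user dialog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    premises: tuple  # conjunction of facts
    conclusion: str
    cf: float        # certainty factor attached to the rule


def backward_chain(goal, rules, known, ask):
    """Try all rules concluding `goal`; return (rule name, CF) for each success.

    known: dict fact -> bool, shared over the whole session
    ask:   callback that requests a missing fact from the user
    """
    # Prefer rules with high certainty factors.
    candidates = sorted((r for r in rules if r.conclusion == goal),
                        key=lambda r: r.cf, reverse=True)
    successes = []
    for rule in candidates:
        # Discard rules with a premise already known to be false.
        if any(known.get(p) is False for p in rule.premises):
            continue
        # all() stops at the first failing premise, so no needless questions.
        if all(holds(p, rules, known, ask) for p in rule.premises):
            successes.append((rule.name, rule.cf))
    return successes


def holds(fact, rules, known, ask):
    """Establish a premise: reuse a known value, prove it as a sub-goal, or ask."""
    if fact not in known:
        if any(r.conclusion == fact for r in rules):
            known[fact] = bool(backward_chain(fact, rules, known, ask))
        else:
            known[fact] = ask(fact)
    return known[fact]


def scripted_user(answers):
    """Hypothetical stand-in for the dialog: answers from a dict, logging questions."""
    asked = []

    def ask(fact):
        asked.append(fact)
        return answers[fact]

    return ask, asked


# Hypothetical rule modeled on the streptococcus example rule.
RULES = [Rule("RULE-X", ("gram_positive", "coccus", "chains"), "streptococcus", 0.7)]
```

If `coccus` is already known to be false (the organism is a rod), the rule is skipped without asking anything, which mirrors the explanation that a rule "was never tried".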

• Finally, the system presents all possible deductions to the user along with their certainty factors

• After that, the user may ask why the system deduced those facts (the system explains its answers)

– Mainly using the proof trees of each successful rule

** Did you use RULE 163 to find out anything about ORGANISM-1?

RULE163 was tried in the context of ORGANISM-1, but it failed because it is not true that the patient has had a genito-urinary tract manipulative procedure (clause 3).

** Why didn't you consider streptococcus as a possibility?

The following rule could have been used to determine that the identity of ORGANISM-1 was streptococcus: RULE033

But clause 2 (“the morphology of the organism is coccus”) was already known to be false for ORGANISM-1, so the rule was never tried.

• Was MYCIN a success?

– Partially…

– During field evaluation, MYCIN deduced a correct treatment in 69% of all test infections

• …which is a lot better than diagnoses by average non-specialist physicians

• …but worse than diagnoses by infection specialists (~80%; however, those specialists often disagreed, so a real evaluation is not possible, as there is no “gold” standard for infection treatments)

• This result is very representative of most expert systems, which perform worse than real experts, but usually better than non-experts

– However, the system never made it into practice, mainly due to legal and ethical issues

• Who is responsible (and can be sued) in case of a mistake?

9.3 Fuzzy Reasoning

• Vagueness is in the nature of most expert decisions

– Symptom for cavities: a person has discolored teeth.

Not at all, slightly, very, …?!

• Vagueness can be modeled not only through an agent's belief in a statement, but also directly

– Fuzzy set theory (Lotfi Zadeh, 1965)

– Expresses the degree of possibility (as opposed to probability)

– Captures the idea of linguistic variables

• Crisp set membership degrees (1 or 0) are often insufficient for expressing vague concepts

– Consider the set of discolored teeth in cavities diagnosis

• Teeth with brown spots are discolored (1)

• White teeth are not discolored (0)

• What about yellowish teeth?

Depends on the degree of staining! ([0, 1])

[Figure: membership functions for the classes “normal” and “discolored”]

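• A fuzzy set is defined by a membership function μ mapping the universe Ω to the unit interval [0, 1]

– The normal set operations on characteristic (crisp) membership functions can easily be extended to fuzzy sets

• $(\mu_1 \cap \mu_2)(\omega) := \min\{\mu_1(\omega), \mu_2(\omega)\}$

• $(\mu_1 \cup \mu_2)(\omega) := \max\{\mu_1(\omega), \mu_2(\omega)\}$

• The complement: $\bar{\mu}(\omega) := 1 - \mu(\omega)$

– Some characteristics of Boolean algebra are preserved, others are not

• E.g., distributivity and De Morgan's laws hold, but the laws of excluded middle and non-contradiction do not: for $\mu(\omega) = 0.5$, both $(\mu \cup \bar{\mu})(\omega)$ and $(\mu \cap \bar{\mu})(\omega)$ equal 0.5…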
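The min/max/complement operations, and which Boolean laws survive them, can be checked with a few lines of Python. Fuzzy sets are represented as dictionaries over a shared finite universe (tooth colors); all membership degrees are hypothetical illustration values.

```python
def f_intersection(a: dict, b: dict) -> dict:
    """(mu1 ∩ mu2)(w) := min{mu1(w), mu2(w)}; a and b share one universe."""
    return {w: min(a[w], b[w]) for w in a}


def f_union(a: dict, b: dict) -> dict:
    """(mu1 ∪ mu2)(w) := max{mu1(w), mu2(w)}."""
    return {w: max(a[w], b[w]) for w in a}


def f_complement(a: dict) -> dict:
    """Complement: 1 - mu(w)."""
    return {w: 1.0 - m for w, m in a.items()}


# Universe: tooth colors. All membership degrees are hypothetical.
discolored = {"white": 0.0, "yellow": 0.5, "brown": 1.0, "black": 1.0}
sensitive = {"white": 0.2, "yellow": 0.4, "brown": 0.7, "black": 0.9}

# De Morgan: complement of the intersection equals union of the complements.
de_morgan_holds = (f_complement(f_intersection(discolored, sensitive))
                   == f_union(f_complement(discolored), f_complement(sensitive)))
# Excluded middle fails: "discolored or not discolored" is only 0.5 for yellow.
excluded_middle = f_union(discolored, f_complement(discolored))
```

On crisp sets (degrees 0 or 1 only) these operations reduce to ordinary intersection, union, and complement, which is why they count as extensions of the classical ones.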


• How can this be applied to reasoning?

– We have fuzzy facts and can deduce new (fuzzy) facts from them

– Back to our toothache example…

• Fact: Tom has yellow stained teeth.

• Rule: If a person has very discolored teeth, it is cavities.

• Does Tom have cavities..?!

• Obviously, we need to relate the degree of staining to the premise of our rule

– Usually the degree is (linguistically) discretized

– Different degrees of staining have different possibilities of relating to cavities

• This leads to possibility distributions

– They depend only on the possibility that a case is described by a certain class

• Nobody would state that somebody with white teeth has “discolored” teeth

• There is a possibility of 50% that a yellow stain would be considered “discolored”

– They are somewhat similar to probability distributions, but do not depend on observed cases

• Possibility is an upper bound for probability

– Possibility theory was introduced by L. Zadeh in 1978

• A possibility distribution assigns the possibility of a characteristic to some measurable property

– Somebody has “discolored” teeth

– Somebody has “very discolored” teeth

[Figure: possibility distributions (from 0 to 1.0) of both statements over the tooth colors white, yellow, brown, and black]

• An important feature is the ability to define hedges

– They provide operations that maintain close ties to natural language, and allow for the mathematical generation of fuzzy statements

– The initial definition of hedges is a subjective process

• A simple example may transform the statement “teeth are stained.” into “teeth are very stained.”

– The hedge “very” is usually defined as $\mu_{\text{very}}(\omega) := (\mu(\omega))^2$

• Thus, if $\mu_{\text{stained}}(\text{Tom}) = 0.8$, then $\mu_{\text{very stained}}(\text{Tom}) = 0.64$

– Other common hedges are “more or less”, typically $\sqrt{\mu(\omega)}$, “somewhat”, “rather”, “sort of”, …
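Hedges are pointwise transformations of membership degrees, as this minimal Python sketch shows. The names `very`, `more_or_less`, and `hedge_set` are illustrative, only the two hedges with formulas given above are implemented, and the second person in the example is hypothetical.

```python
import math


def very(mu: float) -> float:
    """Concentration hedge: mu_very(w) := mu(w) ** 2."""
    return mu ** 2


def more_or_less(mu: float) -> float:
    """Dilation hedge, typically sqrt(mu(w))."""
    return math.sqrt(mu)


def hedge_set(hedge, fuzzy_set: dict) -> dict:
    """Apply a hedge pointwise to every membership degree of a fuzzy set."""
    return {w: hedge(m) for w, m in fuzzy_set.items()}


# Tom's degree is the 0.8 from the example above; "Ann" is hypothetical.
stained = {"Tom": 0.8, "Ann": 0.3}
very_stained = hedge_set(very, stained)
more_or_less_stained = hedge_set(more_or_less, stained)
```

Since membership degrees lie in [0, 1], squaring never increases a degree ("very" makes a statement harder to satisfy), while the square root never decreases it.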


• Still, possibility distributions have to be linked to determining the truth of conclusions

– The idea is a conditional possibility distribution

• possibility(yellow stains | teeth are very discolored) = truth(teeth are very discolored | yellow stains)

• The first part uses the fuzzy membership function describing the classes of all stains that are considered discolored

• Now let's turn to the problem of reasoning

– Consider the general case with fuzzy sets A, A′ over Ω₁ and B over Ω₂

• Fact: X is A′.

• Rule: if X is A, then Y is B.

– Depending on the connection between A and A′, the inference results in the conclusion “Y is B′” with a fuzzy set B′ over Ω₂

• How can this be calculated?

– Encode each piece of information by possibility measures corresponding to suitable fuzzy sets
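How B′ is computed is left open here. One common choice, shown as an assumption rather than as the method of this lecture, is Zadeh's compositional rule of inference with the Mamdani (min) implication: B′(y) is the maximum over x of min(A′(x), A(x), B(y)). A Python sketch over finite universes follows; all sets in it are hypothetical.

```python
def infer_b_prime(a_prime: dict, a: dict, b: dict) -> dict:
    """Generalized modus ponens over finite universes (sup-min composition).

    Fact:  X is A'                    (a_prime over Omega_1)
    Rule:  if X is A, then Y is B     (a over Omega_1, b over Omega_2)
    Assumed rule relation R(x, y) = min(A(x), B(y))  (Mamdani implication)
    Result: B'(y) = max over x of min(A'(x), R(x, y))
    """
    return {y: max(min(a_prime[x], a[x], b_y) for x in a) for y, b_y in b.items()}


# Hypothetical sets: tooth colors (Omega_1) and a coarse cavity scale (Omega_2).
discolored = {"white": 0.0, "yellow": 0.5, "brown": 1.0, "black": 1.0}
yellow_stains = {"white": 0.0, "yellow": 1.0, "brown": 0.0, "black": 0.0}
cavities = {"none": 0.0, "initial": 0.6, "advanced": 1.0}

# Fact: "Tom's teeth have yellow stains". Rule: "if teeth are discolored, then cavities".
b_prime = infer_b_prime(yellow_stains, discolored, cavities)
```

Pulling B(y) out of the maximum shows that B′(y) = min(h, B(y)), where h is the height of A′ ∩ A: the consequent is clipped at the degree to which the fact matches the premise. With A′ = A and A reaching 1 somewhere, B′ = B, as modus ponens would suggest.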


• If knowledge from two or more facts with respective possibility distributions has to be combined…

– First, the facts have to be aggregated (using min, max, …)

– Second, the aggregated possibility distribution has to be established (corresponding, e.g., to the conjunction of the facts)

• The actual inference process applying rules in fuzzy expert systems usually has four steps

– Fuzzification

– Inference

– Composition

– Defuzzification

– This is called Mamdani-style fuzzy inference, introduced by Ebrahim Mamdani of the University of London in 1975

• Fuzzification

– Membership functions defined on the input variables are applied to their actual values, to determine the degree of truth for each rule premise

• Inference

– The truth value for the premise of each rule is computed and applied to the conclusion part of that rule

• Either cut the consequent membership function at the level of the antecedent truth value (clipping)

• Or adjust the consequent membership function by multiplying all its membership degrees by the antecedent truth value (scaling)

– This results in one fuzzy subset being assigned to each output variable for each rule

• Composition (or aggregation)

– Union of the outputs of all rules

– All of the fuzzy subsets assigned to each output variable are combined to form a single fuzzy subset for each output variable

• Defuzzification

– It may be useful to just examine the fuzzy subsets that result from the composition process

– More often, this fuzzy value needs to be converted to a single crisp value (called defuzzification)

• Example: “Tom's teeth have yellow stains”

– Fuzzification: to what degree are they “slightly discolored”, “discolored”, “very discolored”, …?

– Inference: apply all inputs to the fuzzy rules and calculate the degrees of the conclusions

• “if teeth are slightly discolored, then cavities are unlikely”, …, “if teeth are very discolored, then cavities are almost certain”

• This leads to a possibility distribution for a diagnosis of cavities

– Composition: aggregate all membership degrees for the different conclusions using “⋃”

– Defuzzification: there are several defuzzification methods, but probably the most popular one is the centroid technique

• It returns the centre of gravity of the aggregate set, i.e. the point at which the area under its membership curve would balance (the point splitting that area into two equal halves belongs to a different method, the bisector of area)
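The four Mamdani steps can be traced end to end in a short Python sketch. Everything concrete in it is a hypothetical assumption: a staining degree on [0, 1] as input, triangular membership functions, two rules, an output scale for cavities discretized into 101 points, and the discrete centroid Σ y·μ(y) / Σ μ(y). The `scaling` flag switches the inference step from clipping to scaling.

```python
def triangle(a: float, b: float, c: float):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical input: degree of staining on [0, 1] (0 = white, 1 = black).
slightly_discolored = triangle(-0.6, 0.0, 0.6)
very_discolored = triangle(0.4, 1.0, 1.6)
# Hypothetical output: plausibility of cavities on [0, 1].
unlikely = triangle(-0.5, 0.0, 0.5)
almost_sure = triangle(0.5, 1.0, 1.5)

RULES = [(slightly_discolored, unlikely), (very_discolored, almost_sure)]
OUTPUT_UNIVERSE = [i / 100 for i in range(101)]  # discretized output axis


def mamdani(x: float, rules=RULES, ys=OUTPUT_UNIVERSE, scaling: bool = False) -> float:
    aggregate = [0.0] * len(ys)
    for premise, consequent in rules:
        truth = premise(x)                                     # 1) fuzzification
        for i, y in enumerate(ys):
            m = consequent(y)
            implied = truth * m if scaling else min(truth, m)  # 2) inference
            aggregate[i] = max(aggregate[i], implied)          # 3) composition (union)
    total = sum(aggregate)
    if total == 0.0:
        raise ValueError("no rule fired; the centroid is undefined")
    return sum(y * m for y, m in zip(ys, aggregate)) / total  # 4) centroid
```

For a staining degree of 0.0, only the "slightly discolored" rule fires and the crisp output lands near the low end of the scale (about 0.16); for 1.0 it lands near the high end (about 0.84).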

9.4 Case-Based Reasoning

• Case-based reasoning (CBR) is a methodology for solving problems by utilizing previous experiences

– It is not really a formal reasoning process, but relies on heuristics to arrive at conclusions

• Similar to case-based law systems using precedents…

• Or case analysis in medical treatments…

• Or repairing a car…

• Examples

– Cooking banana pancakes is like cooking normal pancakes… just throw in some bananas…

– Biomimicry: imitate nature to utilize natural effects for complex engineering tasks

• E.g., how to cool houses in Africa without air-conditioning?

• Idea: the same way termites build their mounds

• General operation

– Present the system with a problem

– Search a case base for the most similar problems

– Return their solutions

[Figure: CBR system with GUI, case retriever, case reasoner (adaptation), and case base]

• Cases are records of previous experiences

– Problem specification

– Relevant attributes of the environment

– Applied solution

– Benefit/success of the solution

• The representation needs to reflect all features necessary for retrieval

• 4-phase model proposed by Agnar Aamodt and Enric Plaza in 1994

• Retrieve

– Given a target problem, retrieve cases from memory that are relevant to solving it

• Reuse

– Map the solution from the previous case to the target problem

• Revise

– Test the new solution in the real world (or a simulation) and, if necessary, revise

• Retain

– Store successfully adapted experiences as a new case in memory


• Case retrieval is the process of finding the closest, i.e. most similar, cases to the current case

– (Indexed) features of cases in the case base are compared to the features of the current case

– Syntactical approaches vs. semantic approaches

– The hardest part of the CBR process is defining a suitable similarity measure

• Nearest neighbor retrieval, hierarchical browsing, knowledge-guided approaches, validated retrieval, …

• Often a semi-automatic process
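Weighted nearest-neighbor retrieval, one of the approaches listed above, can be sketched in Python. The similarity measure is an assumption for illustration: numeric features use one minus the range-normalized distance, symbolic features use exact matching, and a weighted average combines them. The car-repair cases, features, and weights are hypothetical, and the environment attributes are folded into the `problem` dictionary.

```python
from dataclasses import dataclass


@dataclass
class Case:
    problem: dict          # problem specification and environment attributes
    solution: str          # applied solution
    success: float = 1.0   # benefit/success of the solution (unused by retrieval)


def local_similarity(a, b, value_range=None) -> float:
    """Numeric feature: 1 - range-normalized distance; symbolic: exact match."""
    if value_range:
        return 1.0 - min(abs(a - b) / value_range, 1.0)
    return 1.0 if a == b else 0.0


def similarity(query: dict, case: Case, weights: dict, ranges: dict) -> float:
    """Weighted average of the local similarities of the indexed features."""
    total = sum(weights.values())
    score = sum(w * local_similarity(query[f], case.problem[f], ranges.get(f))
                for f, w in weights.items())
    return score / total


def retrieve(query: dict, case_base: list, weights: dict, ranges: dict, k: int = 1) -> list:
    """Nearest-neighbor retrieval: return the k most similar cases, best first."""
    ranked = sorted(case_base, key=lambda c: similarity(query, c, weights, ranges),
                    reverse=True)
    return ranked[:k]


# Hypothetical car-repair case base.
CASE_BASE = [
    Case({"symptom": "no_start", "battery_age": 6, "make": "A"}, "replace battery"),
    Case({"symptom": "no_start", "battery_age": 1, "make": "B"}, "check starter"),
    Case({"symptom": "overheating", "battery_age": 3, "make": "A"}, "refill coolant"),
]
WEIGHTS = {"symptom": 3.0, "battery_age": 2.0, "make": 0.5}
RANGES = {"battery_age": 10}  # years; only numeric features get a range
query = {"symptom": "no_start", "battery_age": 5, "make": "B"}
best = retrieve(query, CASE_BASE, WEIGHTS, RANGES, k=2)
```

The first case wins despite its mismatching make, because symptom and battery age carry most of the weight; choosing those weights is exactly the hard part of defining a similarity measure.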


• Case adaptation translates the retrieved solution into a solution appropriate for the current problem

– Applied in the reuse phase (basic adaptation) and in the revise phase (learning from failure)

– Often a manual process needing deeper domain understanding

– The degree of success (and thus the value for the case base) has to be measured


• Case base maintenance is part of the retain phase

– The larger the case base, the more of the problem space is covered, but too many cases will degrade system performance

– Maintenance strategies are quite similar to caching strategies

– Is a case really necessary for the case base?

• How successful was its solution?

• Are there already similar cases?

• How often is a specific case used?


• Comparison to rule-based systems

– Rule bases…

• Abstract knowledge in a set of production rules of the “If…Then…” type

• Have to be acquired before the system can be used

• Applicable to a large set of general domains

• Provide proofs for derived statements

– Case bases…

• Only state specific characteristics of previous cases plus solutions

• Are built up while the system is used

• Applicable only to specific kinds of domains

• Provide arguments for derived statements

• When to use case-based reasoning?

– Does the domain have an underlying model?

• Random factors cannot be captured…

– Are there exceptions and novel cases?

• Without them rules might be easier…

– Do cases recur?

• If not, there is no point in building a case base…

– Is there significant benefit in adapting past solutions?

• The reasoning process might be more expensive than actually solving the problem…