
Deep-Level Interpretation of Text

2.5 Deep-level Text Interpretation as Abduction

2.5.3 The Abduction Algorithm

The abduction algorithm (see Algorithm 1, p. 54) is at the heart of the DLI process and has two main tasks: hypothesis construction and hypothesis selection.

The main function, called compute_explanations, is in charge of finding explanations for an observation (γ). This function requires the following elements as input: a domain ontology O, a set of rules R, an Abox Ax associated with the document object dx being interpreted, an Abox assertion γ representing the observation that should be explained, an Abox Γ2 with the rest of the observations obtained by SLI, and a preference function Ω.

As will be explained in Section 2.5.4, DLI distinguishes between bona fide (Γ1) and fiat (Γ2) observations from Γ. Observations in Γ2 should be explained. Distinguishing between bona fide and fiat observations is a design decision. This design decision is specified by means of rule heads in R which define the space of abducibles. Ax contains Γ1.

The preference function Ω is used to configure the ampliative capabilities of the abduction process, i.e., to reduce (or enlarge) the number of possible explanations by influencing the variable substitution function σ. This will be explained later. As an example, consider the following input:

O = the AEO ontology from Appendix A (see page 145)
R = the rules in Appendix B (see page 149)
Ax = all assertions from Figure 2.17 (page 40)
γ = (hjName1, city1) : sportsNameToCityName
Γ2 = all assertions from Figure 2.19 (page 41) minus γ
Ω = :reuse-old :one-new-ind

54 CHAPTER 2. DEEP-LEVEL INTERPRETATION OF TEXT

Algorithm 1 The Abduction Algorithm

1:  function compute_explanations(O, R, Ax, γ, Γ2, Ω)
2:    γ′ := transform_into_query(γ)   // where γ = (z) : P and γ′ = P(z)
3:    Θs := {{Q1(σ(Y1)), ..., Qn(σ(Yn))} | ∃r ∈ R with r := P(X) ← Q1(Y1), ..., Qn(Yn),
            ∃σ such that P(σ(X)) = γ′}
4:    ∆s := {∆ | ∃Θ ∈ Θs,
            individuals = inds(Γ2) ∪ inds(Ax) ∪ inds(γ) ∪ {new_i | i ∈ {1, ..., n}, n = |vars(Θ)|},
            ∃σ ∈ get_substitutions(vars(Θ), Ω, individuals) such that
            ∆ = {Q1(σ(Y1)), ..., Qn(σ(Yn))}, Σ ∪ transform(∆) ∪ Ax ⊭ ⊥}
5:    ∆s1 := sup(poset(∆s, λ(x, y). S(x, O, Ax) < S(y, O, Ax)))
6:    ∆s2 := {}
7:    for all ∆ ∈ ∆s1 do
8:      ∆′ := {}
9:      for all α ∈ ∆ do
10:       if {() | α} is false w.r.t. O then
11:         ∆′ := ∆′ ∪ transform(α)
12:       end if
13:     end for
14:     ∆s2 := ∆s2 ∪ {∆′}
15:   end for
16:   ∆s3 := {∆ ∈ ∆s2 | ¬∃∆′ ∈ ∆s2 with ∆′ ≠ ∆ such that O ∪ ∆′ |= ∆}
17:   return ∆s3

18: function S(∆, O, Ax)
19:   Sc := |{α ∈ ∆ | O ∪ Ax |= α}|
20:   Ss := |{α ∈ ∆ | O ∪ Ax ⊭ α}|
21:   S := Sc − Ss
22:   return S

The function transform_into_query is applied to the Abox assertion γ and returns the corresponding query atom of the form P(z).

γ′ := sportsNameToCityName(hjName1, city1)
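As a minimal sketch, this step can be illustrated in Python; the tuple encoding of assertions and query atoms is a hypothetical choice made for illustration, not the data structure used in the thesis:

```python
def transform_into_query(assertion):
    """Sketch of transform_into_query: map an Abox assertion of the form
    (z) : P -- encoded here as a hypothetical (args, predicate) tuple --
    to the corresponding query atom P(z), encoded as (predicate, args)."""
    args, predicate = assertion
    return (predicate, args)

# gamma = (hjName1, city1) : sportsNameToCityName
gamma = (("hjName1", "city1"), "sportsNameToCityName")
print(transform_into_query(gamma))
# -> ('sportsNameToCityName', ('hjName1', 'city1'))
```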

The next step (see line 3 of Algorithm 1, page 54) is to identify rules that apply in a backward way, i.e., from the head (P(X)) to the body of a rule (Q1(Y1), . . . , Qn(Yn)).

A rule is identified if there exists a substitution σ such that the instantiated head P(σ(X)) matches γ′.

For this example, and according to the set of rules in Appendix B (page 149), three matching heads are found, and body substitutions are obtained for each rule. The result is a set Θs of atom sets Θ as follows:

Θs := { {HighJumpCompetition(Z),
         hasName(Z, hjName1), HighJumpName(hjName1),
         takesPlace(Z, city1), CityName(city1)},
        {PoleVaultCompetition(Z),
         hasName(Z, hjName1), PoleVaultName(hjName1),
         takesPlace(Z, city1), CityName(city1)},
        {SportsCompetition(Z),
         hasName(Z, hjName1), SportsName(hjName1),
         takesPlace(Z, city1), CityName(city1)} }
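The backward matching of line 3 can be sketched as follows. The rule encoding and the helper names `match_head` and `candidate_bodies` are hypothetical, introduced only for illustration, and just one of the three rules is shown:

```python
# Illustrative sketch (not the thesis implementation): backward matching of
# a query atom against rule heads to build the candidate set Theta_s.
# Rules are (head, body) tuples; atoms are (predicate, args) tuples.

def match_head(head, query):
    """Try to match a rule head against the query atom; return a variable
    substitution sigma, or None if predicate or arity differ."""
    pred_h, args_h = head
    pred_q, args_q = query
    if pred_h != pred_q or len(args_h) != len(args_q):
        return None
    return dict(zip(args_h, args_q))  # map head variables to query terms

def apply_sigma(atom, sigma):
    pred, args = atom
    return (pred, tuple(sigma.get(a, a) for a in args))

def candidate_bodies(rules, query):
    """For every rule whose head matches the query, return its body with
    the head substitution applied (unbound body variables stay as is)."""
    thetas = []
    for head, body in rules:
        sigma = match_head(head, query)
        if sigma is not None:
            thetas.append([apply_sigma(a, sigma) for a in body])
    return thetas

# One of the three rules of the running example, in this toy encoding:
rules = [
    (("sportsNameToCityName", ("N", "C")),
     [("HighJumpCompetition", ("Z",)),
      ("hasName", ("Z", "N")), ("HighJumpName", ("N",)),
      ("takesPlace", ("Z", "C")), ("CityName", ("C",))]),
]
query = ("sportsNameToCityName", ("hjName1", "city1"))
print(candidate_bodies(rules, query)[0][1])
# -> ('hasName', ('Z', 'hjName1'))
```

Note how the head variables N and C are bound to hjName1 and city1, while the body variable Z remains unbound and must be handled by the substitution step of line 4.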

The next step (see line 4) is to find substitutions for the variables in each Θ ∈ Θs to obtain a corresponding ∆. For this purpose the function get_substitutions is used. It gets as input the variables from Θ, obtained from a call to the function vars; a set individuals composed of all individuals in the knowledge base, i.e., individuals from Γ2, Ax and γ, obtained with the help of the function inds, together with a freshly created individual for each variable in Θ (|vars(Θ)|); and finally the preference function Ω.

The function Ω is used to set a priority over the elements of the set individuals used by the substitution function σ. In this example the preference function (Ω: :reuse-old :one-new-ind⁷) prioritizes the use of individuals that already exist in the knowledge base (individuals from Γ2, Ax, γ) over the newly created individuals, as long as such a substitution is consistent w.r.t. the background knowledge (Σ ∪ {σ(Y) : Q} ⊭ ⊥). In

⁷ In RacerPro, the preference function Ω can be configured by using the following commands: “:c-mode (:reuse-old :one-new-ind)” for concepts, and “:r-mode (:reuse-old :one-new-ind)” for roles. “:reuse-old” refers to the use of existing individuals and has a higher priority than “:one-new-ind”, which allows creating a new individual.

this way, the result of get_substitutions is a set of ∆ containing sets of substitutions {Q1(σ(Y1)), ..., Qn(σ(Yn))}, such that once they are transformed into assertions of the form {(z1) : Q1, ..., (zn) : Qn} with the help of the function transform, Σ ∪ transform(∆) ∪ Ax ⊭ ⊥ holds. In this way, the set ∆s contains only explanations ∆ which are consistent w.r.t. the knowledge base.
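A sketch of how the preference ordering might be realized; this `get_substitutions` is a simplified stand-in that only orders the candidate individuals and leaves the consistency check to the caller:

```python
from itertools import product

def get_substitutions(variables, existing, n_new=1):
    """Illustrative sketch of the preference (:reuse-old :one-new-ind):
    candidate individuals are ordered so that existing ones come first
    and freshly created ones (new1, new2, ...) come last; substitutions
    are then enumerated in that preference order."""
    fresh = [f"new{i+1}" for i in range(n_new)]
    pool = list(existing) + fresh          # reuse-old before one-new-ind
    for combo in product(pool, repeat=len(variables)):
        yield dict(zip(variables, combo))

subs = get_substitutions(["Z"], ["hjName1", "city1"])
print([s["Z"] for s in subs])  # -> ['hjName1', 'city1', 'new1']
```

In the running example, every substitution that reuses an existing individual for Z is rejected by the consistency check, so the fresh individual new1, last in the preference order, is the one that survives.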

In the running example, all individuals in Figure 2.17 (page 40) are used in the substitution process. But all such substitutions result in Aboxes which are inconsistent with respect to the Tbox due to disjointness axioms (DLC ⊑ ¬SLC). For this reason a freshly created individual (new1) is used for substitution, such that the following ∆s is created:

∆s = {
  ∆1 = {new1 : HighJumpCompetition,
        (new1, hjName1) : hasName, hjName1 : HighJumpName,
        (new1, city1) : takesPlace, city1 : CityName},
  ∆2 = {new1 : SportsCompetition,
        (new1, hjName1) : hasName, hjName1 : SportsName,
        (new1, city1) : takesPlace, city1 : CityName}
}

Note that the second set of atoms in this example (see the second element of Θs on page 55) produces the following explanation, which is inconsistent w.r.t. Σ:

∆ = {new1 : PoleVaultCompetition,
     (new1, hjName1) : hasName, hjName1 : PoleVaultName,
     (new1, city1) : takesPlace, city1 : CityName}

The universal restriction over the role hasName, whose range must be an instance of type PoleVaultName, forces hjName1 to be of type PoleVaultName, which produces an inconsistency due to disjointness axioms in the Tbox:

PoleVaultCompetition ⊑ SportsCompetition
                     ⊓ ∀hasPart.PoleVaultRound
                     ⊓ ∀hasName.PoleVaultName
HighJumpName ⊑ SportsName ⊓ ¬PoleVaultName

A ∆ that is inconsistent w.r.t. Σ is discarded. In this way, reasoning services and Σ are used to restrict the number of explanations. This helps in keeping the consistency constraint introduced before (see page 50). Moreover, this example shows that disjointness axioms in the Tbox are relevant during the variable substitution process.
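The role of the disjointness axioms can be illustrated with a toy consistency check; this is a hypothetical stand-in for the reasoner, and real Abox consistency checking is of course far more involved:

```python
def violates_disjointness(types, disjoint_pairs):
    """Sketch of the consistency check: an individual asserted to have
    two types that a Tbox disjointness axiom declares incompatible
    makes the Abox inconsistent."""
    return any({a, b} <= set(types) for a, b in disjoint_pairs)

# hjName1 is forced to be a PoleVaultName (via the range of hasName)
# while SLI already typed it as a HighJumpName; the two are disjoint.
disjoint = [("HighJumpName", "PoleVaultName")]
print(violates_disjointness(["HighJumpName", "PoleVaultName"], disjoint))
# -> True
```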

Continuing with the explanation of the abduction algorithm, line 5 shows how to obtain preferred explanations. The set of explanations in ∆s is transformed into a poset and all explanations with a score lower than the highest score found in the poset are discarded.

The function S computes a score (S := Sc − Ss) that meets the criteria of simplicity and consilience following the definitions described in Section 2.5.2. Sc represents the number of assertions in ∆ that are entailed by O ∪ Ax, and therefore are not hypothesized. Ss is the number of hypothesized assertions in ∆. The score formula can be paraphrased as follows: the fewer hypothesized assertions an explanation contains (simplicity, i.e., Ss) and the more bona fide assertions a theory (∆) explains (consilience, i.e., Sc), the higher its preference score will be. In this example, the following scores are obtained:

Sc(∆1) = 2, Ss(∆1) = 3, S(∆1) = −1
Sc(∆2) = 2, Ss(∆2) = 3, S(∆2) = −1
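The score computation can be sketched as follows; the set `entailed` is a hypothetical stand-in for the reasoner's entailment check against O ∪ Ax:

```python
def score(delta, entailed):
    """Preference score S = Sc - Ss from Algorithm 1: Sc counts the
    assertions of the explanation entailed by O u Ax (not hypothesized),
    Ss counts the hypothesized ones."""
    s_c = sum(1 for a in delta if a in entailed)
    s_s = len(delta) - s_c
    return s_c - s_s

# Delta_1 from the running example; the two name/city typings are assumed
# to be entailed already (Sc = 2), the rest is hypothesized (Ss = 3).
delta1 = ["new1:HighJumpCompetition", "(new1,hjName1):hasName",
          "hjName1:HighJumpName", "(new1,city1):takesPlace",
          "city1:CityName"]
entailed = {"hjName1:HighJumpName", "city1:CityName"}
print(score(delta1, entailed))  # -> -1
```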

In this example, both explanations have the highest score, such that the set ∆s2 contains more than one explanation ∆. The following steps, from line 6 until line 15, reduce the size of each ∆ in ∆s2 to obtain minimal explanations, i.e., explanations containing only assertions that are hypothesized, such that {α ∈ ∆ | {() | α} is false w.r.t. O}. Continuing with the example, ∆s2 is as follows:

∆s2 = {
  ∆1 = {new1 : HighJumpCompetition,
        (new1, hjName1) : hasName, (new1, city1) : takesPlace},
  ∆2 = {new1 : SportsCompetition,
        (new1, hjName1) : hasName, (new1, city1) : takesPlace}
}
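The reduction of lines 6 to 15 amounts to filtering out the already entailed assertions; again, the set `entailed` is a hypothetical stand-in for the boolean query {() | α} evaluated against O:

```python
def reduce_to_hypothesized(delta, entailed):
    """Sketch of lines 6-15 of Algorithm 1: keep only the assertions that
    are not already entailed by the background knowledge, i.e. the
    hypothesized ones."""
    return [a for a in delta if a not in entailed]

delta1 = ["new1:HighJumpCompetition", "(new1,hjName1):hasName",
          "hjName1:HighJumpName", "(new1,city1):takesPlace",
          "city1:CityName"]
entailed = {"hjName1:HighJumpName", "city1:CityName"}
print(reduce_to_hypothesized(delta1, entailed))
# -> ['new1:HighJumpCompetition', '(new1,hjName1):hasName', '(new1,city1):takesPlace']
```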

Finally, the ∆s in ∆s2 are compared for entailment, such that the most specific ∆(s) are preferred. The following entailment relation exists between ∆1 and ∆2: Σ∪∆1 |= ∆2. The abductive retrieval inference service returns ∆1 as the most preferred explanation, since it provides more information than ∆2.

return:

∆1 = {new1 : HighJumpCompetition,
      (new1, hjName1) : hasName, (new1, city1) : takesPlace}
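The final selection of line 16 can be sketched with a hypothetical entailment oracle; the toy subsumption table below encodes only that HighJumpCompetition is subsumed by SportsCompetition, standing in for the reasoner check Σ ∪ ∆′ |= ∆:

```python
def most_specific(deltas, entails):
    """Sketch of line 16 of Algorithm 1: discard any explanation that is
    entailed by another one, keeping the most specific explanation(s).
    `entails` is a hypothetical oracle entails(d1, d2)."""
    return [d for d in deltas
            if not any(other is not d and entails(other, d)
                       for other in deltas)]

# Toy subsumption: a HighJumpCompetition is a SportsCompetition, so
# Delta_1 entails Delta_2 but not vice versa.
sub = {"HighJumpCompetition": {"SportsCompetition", "HighJumpCompetition"}}

def entails(d1, d2):
    return all(any(a2 in sub.get(a1, {a1}) for a1 in d1) for a2 in d2)

d1 = ["HighJumpCompetition", "hasName", "takesPlace"]
d2 = ["SportsCompetition", "hasName", "takesPlace"]
print(most_specific([d1, d2], entails))  # keeps only d1
```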

With the previous example, we explained the Abox abduction algorithm. In the following section, we explain the DLI process, which relies on Abox abduction.
