
Experiments and Evaluation

6.3 Seed Behavior

We conducted a series of experiments to investigate the behavior of the seed complexity and its influence on relevant sentence retrieval for the Nobel Prize winning event (Xu et al. 2006) and the management succession event. In our experiments, we start from the entire list of Nobel Prize winners of 1998 and 1999. Our Nobel Prize winning event seed contains four arguments: recipient, prize name, year, and area (6.3).

Since the seed is a semantic relation, we can also map any slot value to a number of patterns. We therefore generated all variants of the potential mentions of person names and areas, in order to boost the matching coverage of our seeds against the texts. For example, the person name Alan J. Heeger can be mentioned as Alan J. Heeger, Alan Heeger, Heeger, or A. J. Heeger. We did the same with the prize area: for instance, the mention variants of Chemistry include chemical, and sometimes the professional description chemist also provides an indication of the area. A seed instance then looks as follows:

recipient   "Alan Heeger" | "Alan J. Heeger" | "A. J. Heeger" | "Heeger"
prize       "Nobel"
area        "Chemistry" | "chemical" | "chemist"
year        "2000"
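
As an illustration of how such mention variants could be generated automatically, the following minimal sketch derives surface variants from a full person name and pairs an area with a small variant lexicon. The helper function and the lexicon entries are illustrative assumptions, not the actual SProUT gazetteer resources.

# A minimal sketch of mention-variant generation for seed arguments.
# The function and the area lexicon are illustrative assumptions,
# not the actual SProUT gazetteer resources.

def person_name_variants(full_name: str) -> set:
    """Derive plausible surface variants of a person name, e.g.
    'Alan J. Heeger' -> {'Alan J. Heeger', 'Alan Heeger', 'Heeger', 'A. J. Heeger'}.
    Assumes the name has at least a first and a last token."""
    first, *middles, last = full_name.split()
    variants = {full_name, last, f"{first} {last}"}
    initials = " ".join(f"{token[0]}." for token in [first] + middles)
    variants.add(f"{initials} {last}")
    return variants

# Small hand-crafted lexicon mapping a prize area to related mentions
# (adjective and profession noun); the entries are illustrative.
AREA_VARIANTS = {
    "Chemistry": {"Chemistry", "chemical", "chemist"},
    "Physics": {"Physics", "physical", "physicist"},
}

print(person_name_variants("Alan J. Heeger"))
print(AREA_VARIANTS["Chemistry"])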

Thus, we annotated our training texts in the Nobel Prize domain with the entity mentions of the seed events automatically, using SProUT. Then all sentences containing entity mentions of the seeds are extracted by our system. The extracted sentences are sorted by the number of event arguments they contain: quaternary, ternary, and binary complexity. A sentence with quaternary complexity is a sentence containing all four arguments of one event seed. Within ternary and binary complexity, we classify the sentences into different groups according to the entity class combination, e.g., <person, area, time>, <person, prize, area>, <person, area>, etc. We then evaluated whether these sentences are about the Nobel Prize winning event. In Table 6.2, we show the distribution of the seed complexity in the sentences describing the events.

complexity    matched sentences    relevant event extents    precision %
4-ary         36                   34                        94.0
3-ary         110                  96                        87.0
2-ary         495                  18                        3.6

Table 6.2: Nobel Prize domain: distribution of the seed complexity

For the entity-class combinations of 3-ary and 2-ary, i.e., the projections of the target relation, we also carried out a distribution count, presented in Table 6.3.

combination (3-ary, 2-ary)    matched sentences    relevant event extents    precision %
person, prize, area           103                  91                        82.0
person, prize, time           0                    0                         0.0
person, area, year            1                    1                         100.0
prize, area, year             6                    4                         68.0
person, prize                 40                   15                        37.5
person, area                  123                  0                         0.0
person, year                  8                    3                         37.5
prize, area                   286                  0                         0.0
prize, year                   25                   0                         0.0
area, year                    12                   0                         0.0

Table 6.3: Nobel Prize domain: distribution of relation projections

Table 6.2 tells us that the more event arguments a sentence contains, the higher the probability that the sentence is an event extent. Table 6.3 shows the differences between the entity class combinations with respect to event identification. We can potentially regard these values as additional validation criteria for event extraction rules. Whereas Table 6.2 helps us estimate in advance the contribution of the different arity classes to successful event extraction, Table 6.3 shows us which types of incomplete seeds might be most useful. Both distributions, especially the second one, depend very much on the kind of relations to be extracted. Such seed analyses could therefore be used to better characterize a given relation-extraction task.
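
To make the analysis concrete, the following sketch shows one way such a distribution count could be computed: each sentence is reduced to the set of seed argument classes whose mention variants it contains, and matched sentences are then grouped by arity and by entity class combination. The seed values, corpus sentences, and relevance judgements are illustrative, and plain substring matching stands in for the SProUT entity annotation.

from collections import defaultdict

# Seed arguments mapped to their mention variants (illustrative values).
seed = {
    "person": {"Alan Heeger", "Alan J. Heeger", "A. J. Heeger", "Heeger"},
    "prize":  {"Nobel"},
    "area":   {"Chemistry", "chemical", "chemist"},
    "year":   {"2000"},
}

def matched_arguments(sentence, seed):
    """Return the set of seed argument classes whose mention variants
    occur in the sentence (substring matching as a stand-in for SProUT)."""
    return frozenset(arg for arg, variants in seed.items()
                     if any(v in sentence for v in variants))

# (sentence, manual judgement whether it is a relevant event extent);
# both are purely illustrative.
corpus = [
    ("Alan J. Heeger won the Nobel Prize in Chemistry in 2000.", True),
    ("Heeger, a chemist, gave a lecture in 2000.", False),
]

by_arity = defaultdict(lambda: [0, 0])        # arity -> [matched, relevant]
by_combination = defaultdict(lambda: [0, 0])  # combination -> [matched, relevant]

for sentence, relevant in corpus:
    args = matched_arguments(sentence, seed)
    if len(args) < 2:                          # only binary complexity and above
        continue
    for counter in (by_arity[len(args)], by_combination[args]):
        counter[0] += 1
        counter[1] += int(relevant)

for arity, (matched, rel) in sorted(by_arity.items(), reverse=True):
    precision = 100.0 * rel / matched if matched else 0.0
    print(f"{arity}-ary: {matched} matched, {rel} relevant, {precision:.1f}% precision")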

The target relation in the management succession domain is a little more problematic than the Nobel Prize award relation, since the same entity concept person can assume either the personIn or the personOut role. We constructed two sets of relation instances for the evaluation of the seed behavior. The relation instances are extracted from the gold-standard annotation.


ambiguous set: a set of relation instances where the same person occurs in the corpus as personIn in one relation instance and as personOut in another relation instance.

unambiguous set: a set of relation instances where a person has only the personIn role or only the personOut role in the corpus.
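
The partition into these two sets can be read off the gold-standard relation instances directly. The sketch below assumes the instances are available as simple dictionaries with personIn and personOut slots; this representation and the example values are illustrative, not the actual annotation format.

# Illustrative partition of gold-standard relation instances into the
# ambiguous and the unambiguous set; the instance representation is assumed.
instances = [
    {"personIn": "J. Smith", "personOut": "A. Jones",
     "organization": "Acme Corp.", "position": "CEO"},
    {"personIn": "A. Jones", "personOut": "B. Miller",
     "organization": "Acme Corp.", "position": "CFO"},
]

persons_in = {inst["personIn"] for inst in instances}
persons_out = {inst["personOut"] for inst in instances}
ambiguous_persons = persons_in & persons_out          # persons seen in both roles

ambiguous_set = [inst for inst in instances
                 if inst["personIn"] in ambiguous_persons
                 or inst["personOut"] in ambiguous_persons]
unambiguous_set = [inst for inst in instances if inst not in ambiguous_set]

print(len(ambiguous_set), "ambiguous /", len(unambiguous_set), "unambiguous instances")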

The ambiguous set contains 60 instances occurring in the corpus, while the unambiguous set contains 55 instances. First, we put the two sets of instances together and calculated the general distribution of the seed complexity.

complexity    matched sentences    relevant event extents    precision %
4-ary         21                   19                        90.4
3-ary         102                  77                        75.4
2-ary         206                  86                        40.7

Table 6.4: Management succession: distribution of the seed complexity

Table 6.4 confirms our interpretation of Table 6.2: the greater the arity of the seed relation, the higher the precision of relevant sentence retrieval. However, both tables also show that the less complex projections of the target relation help find more relevant sentences. Therefore, the relation projections play an important role in improving recall. For the entity class combinations of 3-ary and 2-ary, we also carried out a distribution count for the two different seed sets, presented in Table 6.5 and Table 6.6.

combination (3-ary, 2-ary)           matched sentences    relevant event extents    precision %
personIn, personOut, organization    6                    6                         100.0
personIn, personOut, position        10                   7                         70.0
personIn, organization, position     26                   20                        76.9
personOut, organization, position    13                   9                         69.2
personIn, personOut                  12                   11                        91.7
personIn, organization               40                   11                        27.5
personIn, position                   19                   8                         42.1
personOut, organization              25                   4                         16.0
personOut, position                  6                    2                         33.3
organization, position               0                    0                         0.0

Table 6.5: Ambiguous set: distribution of relation projections



combination (3-ary, 2-ary)           matched sentences    relevant event extents    precision %
personIn, personOut, organization    8                    4                         50.0
personIn, personOut, position        12                   8                         66.7
personIn, organization, position     15                   15                        100.0
personOut, organization, position    12                   8                         66.7
personIn, personOut                  21                   11                        52.4
personIn, organization               11                   0                         0.0
personIn, position                   16                   9                         56.3
personOut, organization              14                   8                         57.1
personOut, position                  8                    6                         75.0
organization, position               0                    0                         0.0

Table 6.6: Unambiguous set: distribution of relation projections

If we ignore the combination cases in the unambiguous set where no matches are found, the results for the entity combinations in Table 6.6 are in general much better than those in Table 6.5. This means that the unambiguous relation instances are better seed candidates than the ambiguous relation instances for finding the relevant event extents. Furthermore, we also compared the projections containing both personIn and personOut with the projections containing only one person role, either personIn or personOut. It turns out that the projections with two person roles achieve on average better precision (73.3%) than the projections with only one person role (48.7%). This gives us a very useful insight into the domain and confirms our discussion of the ambiguous seed example in section 5.3 of the previous chapter:

A relation instance whose arguments play unambiguous semantic roles in the corpus, or which is itself unambiguous, is a better seed candidate for learning unambiguous patterns than relation instances which have potential ambiguities.

An interesting side effect of this study is the observation that there is almost no sentence in the corpus containing only the argument pair organization and position.

Seed construction analysis thus helps us learn the characteristics of a relation and its projections, as well as their potential influence on the quality of pattern extraction.
