
In the document Entity Linking to Wikipedia (pages 152-157)

4.8 Evaluation

4.8.2 Evaluation of Candidate Retrieval

First, we evaluate the quality of the proposed candidate retrieval. In these experiments, we omit candidate consolidation through the Ranking SVM and set the prediction to the top-ranked candidate returned by the index search, i.e.

ê(m) = arg max_{e ∈ e(m)} s_IW(q, e),    (4.23)

¹ This analyzer is also available for other languages, e.g. German and French.



where the query q is formed according to Alg. 2. This corresponds to an unsupervised entity linking model, for which we report performance here in Ratinov et al.'s BoT measure, i.e. F_BoT, to be consistent with the results published in Pilz and Paaß [2012]. For this model, we evaluate the influence of search coverage as well as the effect of prioritization on e_coh candidates retrieved from collective search, for the different weighting factors of cross coherence.
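The prediction rule of Eq. 4.23 can be sketched in a few lines; the list-of-pairs candidate representation is an assumption for illustration, since the real system works directly on the ranked index result:

```python
def predict_unsupervised(candidates):
    """Sketch of the unsupervised prediction in Eq. 4.23: return the
    candidate entity with the highest index score s_IW for the query q.
    `candidates` is a hypothetical list of (entity, score) pairs as
    returned by the index search for a mention m."""
    if not candidates:
        return None  # empty candidate set e(m): nothing to link to
    entity, _score = max(candidates, key=lambda pair: pair[1])
    return entity
```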

For evaluation, we distinguish among the following degrees of search coverage that treat the different attributes of a mention as described in Section 4.5:

1. name coverage (SCn): We use only the surface form of the mention, name(m), to place queries against alias fields (line 5 in Alg. 2). We evaluate the name of a mention both in its original form and in its expanded form (the two SCn variants in Tab. 4.5). Here, we use no other information such as type or context in the query terms.

2. name and type coverage (SCnt): We extend the query with terms for the type type(m) assigned to a mention by the NER model. This information may be missing for some mentions, but if available it activates line 7 in Alg. 2.

3. name, type and context coverage (SCntc): In the full search coverage, we additionally query context fields using the mention context text(m), activating line 9 in Alg. 2.

4. prioritization: To evaluate the quality of the candidates retrieved from collective search, we prioritize on the candidate e_coh(m) using the baseline cross coherence weight cohSRL* (Eq. 4.11). This activates line 4 in Alg. 2.

Search coverage is evaluated cumulatively, i.e. experiments using type information use expanded names, and experiments using contextual evidence use expanded names and type information. We use the baseline cross coherence weight cohSRL* for the collective search candidate and evaluate the effect of the different weight factors separately.
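The cumulative coverage levels can be sketched as follows; the field names and the dictionary-based mention representation are illustrative assumptions, since the real system issues these clauses as an index query per Alg. 2:

```python
def build_query(mention, coverage):
    """Build query clauses for one of the cumulative coverage levels:
    "n" (expanded name only, SCn), "nt" (+ NER type, SCnt) and
    "ntc" (+ mention context, SCntc). Field names are hypothetical."""
    # line 5 in Alg. 2: the (expanded) surface form against alias fields
    clauses = [("alias", mention["name"])]
    # line 7: the NER type, which may be missing for some mentions
    if coverage in ("nt", "ntc") and mention.get("type"):
        clauses.append(("type", mention["type"]))
    # line 9: the mention context against context fields
    if coverage == "ntc":
        clauses.append(("context", mention["context"]))
    return clauses
```

Each level strictly extends the previous one, which is what makes the evaluation cumulative.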

Tab. 4.5 shows the results obtained for the different search coverages on the benchmark corpora. In the table, the figure left of the arrow is obtained for the varying degrees of search coverage alone; the figure right of the arrow shows how performance changes when we additionally use collective search and prioritization on e_coh candidates.

We observe that the proposed name expansion in SCn has a positive effect: it generally increases performance or at least yields results similar to using the original name. The increase in performance is highest on MSNBC, which can be explained by the annotation scheme: for this corpus, all entity mentions are to be linked, not only the first ones, which typically use the full name of the underlying entity. Since later mentions of an entity in a document are often abbreviated, name expansion is especially useful here.
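A plausible sketch of such an expansion follows; the longest-match heuristic is an assumption for illustration and not the exact expansion procedure of Section 4.5:

```python
def expand_name(surface, earlier_mentions):
    """Expand an abbreviated mention to the longest earlier surface form
    in the same document that contains it, e.g. "Obama" -> "Barack Obama".
    `earlier_mentions` are the mention strings seen so far in the document."""
    matches = [m for m in earlier_mentions if surface != m and surface in m]
    return max(matches, key=len) if matches else surface
```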

Chapter 4 Local and Global Search for Entity Linking

Table 4.5: Influence of search coverage and of prioritization on the e_coh candidate on F_BoT performance (all values in %). We omit candidate consolidation and use the candidate with the highest score s_IW as prediction (cf. Eq. 4.23). The figure left of the arrow is obtained without prioritization on e_coh, the figure right of the arrow with prioritization. Apart from IITB, candidate prioritization consistently improves performance on all corpora.

increased search coverage →

corpus     SCn (orig.)      SCn (exp.)       SCnt             SCntc
MSNBC      75.20 → 76.75    77.86 → 78.98    77.97 → 78.92    77.86 → 79.43
ACE        76.17 → 78.03    76.44 → 78.19    76.43 → 78.19    76.98 → 78.03
AQUAINT    81.27 → 81.80    81.16 → 81.69    80.61 → 81.69    81.58 → 80.97
CoNLLb     64.27 → 68.34    65.17 → 69.02    66.32 → 74.43    70.18 → 77.10
IITB       75.90 → 75.48    76.13 → 75.60    76.14 → 75.63    73.67 → 72.49

Interestingly, Cucerzan [2007] reported that his title&redirect baseline using exact matches in combination with the EMP prior achieved an accuracy of 51.7% on MSNBC. This is notably lower than the accuracy of 63.7% obtained with our name baseline, which does not even use the EMP yet.

The usage of type information (SCnt) has only marginal influence. This is not surprising considering our model design: we did not focus on this attribute, in order to avoid type dependency and error propagation from NER models. Only on CoNLLb do we observe a slight increase of about 1 pp in performance. Note that this corpus was designed for the evaluation of NER models and contains high-quality manual annotations of named entity types. Given that for the other corpora the influence of this attribute is negligible and it also does not dramatically decrease performance, we argue that its usage is in general acceptable.

We find that the usage of contextual information (SCntc) is also helpful in general. For CoNLLb we observe the highest influence of contextual information, boosting performance by about 5 pp compared to the purely name-based search, and by about 4 pp compared to the search using type information. We assume that this is because this corpus consists of editorial news documents, where authors use canonical names and take special care to clarify the ambiguity of a mention by providing disambiguating terms close to it. In web documents such as IITB this may be missing. Here, somewhat counterintuitively, the usage of contextual information leads to a notable decrease in performance.

Concerning candidate prioritization, we find a general improvement in performance on most of the corpora and nearly all configurations of search coverage. The highest increase is again observed on CoNLLb with up to 7 pp, but the other corpora also improve, by about 1 pp on average. Again, IITB is the exception. This may stem from the comparably high percentage of mentions denoting missing entities in this corpus. Note that we did not manually check the existence of the given ground truth entities, so missing entities may indeed also denote truly uncovered entities. In that case, collective search cannot retrieve as many relevant source entities, since this fraction is anti-proportional to the number of matches on these entities' link text fields. Another explanation is that for this corpus the entities are simply not as semantically related as for the other corpora. Interestingly, this goes along with the results published in Kulkarni et al. [2009], which show that their collective approach performs only one percentage point better than a local name baseline using popularity priors.

Table 4.6: F_BoT performance on the benchmark corpora for different cross coherence weights using full search coverage SCntc (all values in %). We omit candidate consolidation and use the candidate with the highest score s_IW as prediction (cf. Eq. 4.23). We observe no notable difference among the weighting factors; for MSNBC there is no difference at all.

weighting factors in cross coherence coh_×

corpus     cohSRL*   cohτSRL*   cohcosSRL*   cohcos
MSNBC      79.43     79.43      79.43        79.43
ACE        78.03     78.03      77.54        77.54
AQUAINT    80.09     80.83      80.78        80.86
CoNLLb     77.10     77.38      77.35        77.37
IITB       72.49     72.39      72.52        72.57

To summarize the findings so far, we observe that the more information we use, the better, in general, the performance of the linking model. Thus we now evaluate the different weighting factors for cross coherence (Eqs. 4.11 to 4.13 and 4.15) using the full search coverage SCntc. As Tab. 4.6 shows, the influence of the proposed weighting factors is not striking. Compared to the baseline cohSRL*, which uses no additional weight on semantic relatedness, we find no difference for MSNBC and only minor improvements for the other corpora when varying the weighting scheme. However, this result is not very surprising, since the influence of cross coherence weights on a purely search-based prediction is not very strong: it affects only the identity of the collective search candidate, and as we see from the obtained results, this does not change often. Nevertheless, we will still evaluate the different weighting schemes in the experiments on candidate consolidation. As stated in Section 4.7, these weights are used in two dedicated features and thus may have higher influence in the context of candidate consolidation.

For a better interpretability of the cross coherence influence, we also analysed the average cross coherence of the ground truth entities in the benchmark corpora. The results are given in Tab. 4.7. Independently of the used weight, the average cross coherence is highest on CoNLLb. This may be due to the underlying nature of the documents at hand, but we should also keep in mind that this corpus was chosen as a reference to demonstrate the effect of relational or collective information (Hoffart et al. [2011b]). Interestingly, the average coherence is lowest on IITB, a dataset that was also published together with a collective approach (Kulkarni et al. [2009]). We assume that here the average coherence is low because of the comparably high number of mentions per document. Fig. 4.3 depicts the average cross coherence over ground truth entities in relation to the increase in F_BoT performance when prioritizing on e_coh candidates. The figure implies that the increase in performance is related to the average cross coherence and that a high value results in a high increase in F_BoT performance.

Table 4.7: For each of the proposed cross coherence weights, the table shows the average cross coherence of the ground truth entities in the benchmark corpora. The resulting values are strongly correlated (p < 0.02), even though cohcos does not use relational information from SRL*.

weighting factors in cross coherence coh_×

corpus     cohSRL*   cohτSRL*   cohcosSRL*   cohcos
MSNBC      0.381     0.179      0.104        0.215
ACE        0.354     0.139      0.096        0.211
AQUAINT    0.310     0.119      0.070        0.161
CoNLLb     0.402     0.208      0.134        0.262
IITB      0.224     0.087      0.041        0.112
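The correlation claim from the caption of Tab. 4.7 can be checked directly on the table's values; for instance, the cohSRL* and cohcos columns correlate strongly under Pearson's coefficient, computed here from scratch:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

coh_srl = [0.381, 0.354, 0.310, 0.402, 0.224]  # cohSRL* column of Tab. 4.7
coh_cos = [0.215, 0.211, 0.161, 0.262, 0.112]  # cohcos column of Tab. 4.7
r = pearson(coh_srl, coh_cos)  # close to 1: strongly correlated
```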

To summarize, we observe that unsupervised entity linking using only the name of a mention already achieves fairly good results: F_BoT ranges between 64% and 81%, with an average of about 75% across the different corpora. This is due to our carefully chosen alias resources as well as the title&redirect baseline candidate e_loc(m). We can increase performance through the usage of collective search candidates, but this increase varies with the corpus at hand. This goes along with the results obtained by Varma et al. [2009], who showed that an elaborate candidate selection method has a major impact on performance, though its extent depends on the corpus at hand. It may either reduce the required complexity of the consecutive candidate consolidation or even render it obsolete.
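The quoted range and average can be verified against the name-only column of Tab. 4.5:

```python
# name-only (SCn, no prioritization) F_BoT values from Tab. 4.5
name_fbot = {"MSNBC": 75.20, "ACE": 76.17, "AQUAINT": 81.27,
             "CoNLLb": 64.27, "IITB": 75.90}
lo, hi = min(name_fbot.values()), max(name_fbot.values())  # 64.27 and 81.27
avg = sum(name_fbot.values()) / len(name_fbot)             # about 75
```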

However, since we did not incorporate any kind of threshold on the index score s_IW, this unsupervised linking model cannot handle uncovered entity mentions. Instead of empirically determining such a threshold, we prefer learning an appropriate candidate consolidation model. The training procedure as well as the results obtained with this candidate consolidation model are described next.
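A threshold-based variant, the alternative the thesis deliberately avoids, would look roughly as follows; the function and the NIL encoding are hypothetical:

```python
def predict_with_threshold(candidates, tau):
    """Hypothetical variant of Eq. 4.23 with a NIL threshold tau on the
    index score s_IW: if even the best candidate scores below tau, the
    mention is predicted as uncovered. The thesis instead learns a
    candidate consolidation model rather than tuning tau empirically."""
    if not candidates:
        return None  # no candidate retrieved at all: NIL
    entity, score = max(candidates, key=lambda pair: pair[1])
    return entity if score >= tau else None  # None encodes NIL here
```

The drawback motivating the learned alternative is that a good tau would have to be tuned per corpus and index configuration.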



Figure 4.3: For each cross coherence weight (cohSRL*, cohτSRL*, cohcosSRL*, cohcos), the figure plots the average cross coherence coh_× over the ground truth entities e+(m_i) of each benchmark corpus (MSNBC, ACE, AQUAINT, CoNLL, IITB) against the increase in F_BoT performance when prioritizing on e_coh candidates. The figure implies that the increase in performance is related to the average cross coherence and that a high value may result in a high increase in F_BoT performance.
