
4.1.2 Synonym Detection in Knowledge Graphs

To the best of our knowledge, the research landscape for synonymous relation detection in knowledge graphs is sparse.

Abedjan and Naumann [1] argue that synonyms are problematic for querying since only partial results may be returned. To overcome this issue, they propose query expansion using synonymously used relations. Two relations are synonymously used if one relation may replace another relation in a specific domain of interest.

As an example, they use the relations artist and starring. These relations are not synonymous (as defined by us in the next section). In the context of movies, however, both relations are used synonymously since they describe the relationship between a movie and an actor. Some synonymously used relations are only synonymous in specific contexts and could thus be defined as hypernyms or hyponyms. As an example, company and recordLabel may be used synonymously for music, but an integration of both relations for the complete knowledge graph is not suitable. In this work, we do not use their definition of synonymously used relations but rely on clear-cut synonymous relations that are generally valid, as defined in the next section.

Furthermore, restricting ourselves to clear-cut synonyms makes a manual evaluation easier. Since the work by Abedjan and Naumann was the only available previous work at the time of publication of our work, we used it as a baseline.

The method proposed by Abedjan and Naumann is based on association rule mining and consists of four steps: (1) All relation pairs with similar object ranges, i.e., a high proportion of overlapping object entities, are computed. This involves the computation of frequent itemsets using a minimum support value. (2) Relation pairs are filtered by the type of their object entities; similar to the first step, a high overlap in their type distributions is required. (3) Synonymously used relations should not occur for the same subject entity. To capture this, a reversed correlation coefficient is computed. (4) All relation pairs are ranked by the reversed correlation coefficient. Pairs with a high rank are most likely to be synonymously used.
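
To make the baseline more concrete, the following is a minimal Python sketch of steps (1), (3), and (4). The toy triples are invented, the object-range overlap with a minimum support threshold only stands in for the frequent-itemset computation, and the reversed correlation coefficient is simplified to one minus the fraction of shared subject entities; it approximates, but does not reproduce, the exact measure used by Abedjan and Naumann.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy triples: (subject, relation, object).
triples = [
    ("Movie_A", "artist", "Actor_1"),
    ("Movie_A", "starring", "Actor_1"),
    ("Movie_B", "starring", "Actor_2"),
    ("Movie_C", "artist", "Actor_2"),
]

subjects = defaultdict(set)  # relation -> subject entities
objects = defaultdict(set)   # relation -> object entities
for s, p, o in triples:
    subjects[p].add(s)
    objects[p].add(o)

MIN_SUPPORT = 0.1  # assumed minimum support threshold

candidates = []
for p, q in combinations(objects, 2):
    # Step (1): object-range overlap as a proxy for the frequent-itemset step.
    range_overlap = len(objects[p] & objects[q]) / min(len(objects[p]), len(objects[q]))
    if range_overlap < MIN_SUPPORT:
        continue
    # Step (3): simplified "reversed correlation": penalise relation pairs
    # that frequently co-occur on the same subject entity.
    co_subjects = len(subjects[p] & subjects[q])
    reversed_corr = 1.0 - co_subjects / min(len(subjects[p]), len(subjects[q]))
    candidates.append((reversed_corr, p, q))

# Step (4): rank candidate pairs; high scores indicate likely synonymously used relations.
for score, p, q in sorted(candidates, reverse=True):
    print(f"{p} ~ {q}: {score:.2f}")
```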

Abedjan and Naumann evaluate their approach on small and manually created datasets from DBpedia, Magnatune, and Govwild. The achieved precision values depend strongly on the input minimum support value and the dataset the approach is evaluated on. While only a precision of 15% is achieved on DBpedia in one setting, 100% precision is possible on Magnatune. In the end, the authors show that the second step (type filtering) of their method is not applicable for datasets with missing type information and does not improve the overall precision. Hence, in our evaluation, we left out the type filtering step and only implemented the first, third, and fourth steps.

Another approach for identifying synonymous relations was proposed by Zhang et al. [123, 124, 125]. The general idea of the approach presented in the three papers is similar to ours. It is a purely data-driven approach, which does not require any domain knowledge, and it is independent of the language since no string metrics are used for computing the similarity.

The matching approach identifies synonymous relations in a knowledge graph in three steps: (1) First, a blocking technique is used to prevent a quadratic comparison between all pairs of relations: only relations whose subject entities share the same entity types are compared. (2) Three similarity measures are computed: a triple overlap, a subject overlap, and a cardinality comparison. The three measures are integrated into a single similarity score. (3) Synonym groups of relations are formed by agglomerative clustering using the similarity scores from the previous step. The output of the algorithm is a set of clusters of relations, where each cluster contains only synonymous relations.
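
As an illustration of how such a pipeline fits together, the following sketch combines blocking by subject type, a simplified combination of the three similarity measures, and a single-linkage agglomeration realised via a union-find structure. The toy relations, the simple averaging of the three measures, and the threshold of 0.5 are our own assumptions and only approximate the measures defined by Zhang et al. [123, 124].

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: relation -> set of (subject, object) pairs, plus subject types.
rel_pairs = {
    "birthPlace": {("Alice", "Berlin"), ("Bob", "Paris")},
    "placeOfBirth": {("Alice", "Berlin"), ("Carol", "Rome")},
    "deathPlace": {("Dan", "Oslo")},
}
subject_type = {"Alice": "Person", "Bob": "Person", "Carol": "Person", "Dan": "Person"}

def combined_similarity(a, b):
    """Simplified stand-in for the three measures: triple overlap,
    subject overlap, and cardinality comparison, averaged into one score."""
    triple_ov = len(a & b) / min(len(a), len(b))
    subj_a, subj_b = {s for s, _ in a}, {s for s, _ in b}
    subject_ov = len(subj_a & subj_b) / min(len(subj_a), len(subj_b))
    card_a, card_b = len(a) / len(subj_a), len(b) / len(subj_b)
    cardinality = min(card_a, card_b) / max(card_a, card_b)
    return (triple_ov + subject_ov + cardinality) / 3

# Step (1): blocking -- only compare relations that share a subject entity type.
blocks = defaultdict(set)
for rel, pairs in rel_pairs.items():
    for s, _ in pairs:
        blocks[subject_type[s]].add(rel)

# Steps (2)+(3): score pairs inside each block and merge relations whose
# similarity exceeds a threshold (single-linkage agglomeration via union-find).
THRESHOLD = 0.5  # assumed clustering cut-off
parent = {r: r for r in rel_pairs}
def find(r):
    while parent[r] != r:
        r = parent[r]
    return r

for rels in blocks.values():
    for p, q in combinations(rels, 2):
        if combined_similarity(rel_pairs[p], rel_pairs[q]) >= THRESHOLD:
            parent[find(p)] = find(q)

clusters = defaultdict(set)
for r in rel_pairs:
    clusters[find(r)].add(r)
print(list(clusters.values()))  # e.g. [{'birthPlace', 'placeOfBirth'}, {'deathPlace'}]
```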

Overall, the performance of the technique has been evaluated in several experiments across the three publications. In the first publication, experiments on synonymous relation detection have been performed on the datasets DBpedia and Syndice [123]. The precision in these small-scale experiments is over 70% for DBpedia at high recall values and over 85% for Syndice. In another publication, the approach has been extended to detect synonymous relations in order to perform query expansion on DBpedia [125].

In the corresponding journal publication [124], an extensive evaluation and discussion of the approach are presented, involving a detailed evaluation of its performance on different DBpedia datasets and an analysis of typical sources of errors.

Unfortunately, the implementation of the approach was not available for comparison. However, since the approach relies heavily on overlapping triples for the synonymous relations, we expect the results to exhibit the same weaknesses as our baseline by Abedjan and Naumann [1].

After the publication of our work, another approach for measuring semantic similarity between relations in knowledge graphs has been published [23]. Relation similarity is expressed over head-tail entity probability distributions that are estimated with a feed-forward neural network. The similarity of two relations is then computed as the Kullback-Leibler divergence between their distributions.
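
The following sketch illustrates the core similarity computation under the assumption that the head-tail distributions are taken from empirical counts rather than from the feed-forward network used in [23]; the toy triples, the smoothing constant, and the symmetrisation are illustrative choices of ours.

```python
import math
from collections import Counter

EPS = 1e-6  # additive smoothing so unseen pairs do not yield infinite divergence

# Hypothetical toy triples; [23] estimates the distributions with a neural network,
# whereas raw empirical counts are used here for illustration.
triples = [
    ("Alice", "birthPlace", "Berlin"),
    ("Bob", "birthPlace", "Paris"),
    ("Alice", "placeOfBirth", "Berlin"),
    ("Carol", "placeOfBirth", "Rome"),
]

def head_tail_counts(relation):
    """Empirical counts over (head, tail) pairs for one relation."""
    return Counter((h, t) for h, r, t in triples if r == relation)

def kl_divergence(rel_p, rel_q):
    """D_KL(P || Q) over the union of observed (head, tail) pairs."""
    p_counts, q_counts = head_tail_counts(rel_p), head_tail_counts(rel_q)
    support = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + EPS * len(support)
    q_total = sum(q_counts.values()) + EPS * len(support)
    kl = 0.0
    for pair in support:
        p = (p_counts[pair] + EPS) / p_total
        q = (q_counts[pair] + EPS) / q_total
        kl += p * math.log(p / q)
    return kl

# Symmetrised divergence: small values indicate similar relations.
d = kl_divergence("birthPlace", "placeOfBirth") + kl_divergence("placeOfBirth", "birthPlace")
print(f"symmetrised KL divergence: {d:.3f}")
```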

As a baseline, the authors chose to compare to knowledge graph embedding approaches similar to what we propose in Section 4.2. The authors argue that their approach has two important advantages over knowledge graph embeddings: (1) embeddings have a fixed dimension size for every relation, and (2) comparing probability distributions is more interpretable than simply measuring distances in the knowledge graph embedding space. Even though the evaluation of this work comprises several experiments on different datasets and a handful of different applications, only two embedding baselines are used (TransE and DistMult). Both of them achieve only average precision in our experiments presented in Section 4.2.

To give better insight into the results of this work, we go into the details of their various experiments.

1. Human Judgement: In a first experiment, the similarity scores are compared to human judgments by computing the Spearman correlation (a minimal sketch of this evaluation is given after this list). While the presented approach achieves a correlation of 0.63, DistMult achieves a correlation of almost 0.60.

2. Synthetic Synonyms on Wikidata: In this experiment, synonym detection is tested on synthetically created synonyms, similar to our experiments in Section 4.2 and Section 4.3. Here, the precision of their method is 65% at a recall of 50%. These results are slightly better than those we show on Wikidata in Section 4.2, but significantly worse than those of our method in Section 4.3. However, since this work's experimental setup is slightly different, a detailed comparison to the results of our work is not possible.

3. Synonym Detection on the ReVerb Dataset: The ReVerb dataset is a large set of triples extracted from natural language text by an open information extraction method. Hence, the triples have no schema, and synonymous relations may have different IRIs. For their evaluation, the authors approximate precision and recall using a sampling method. The presented methods achieve a precision of around 30% for various recall values, while the knowledge graph embedding-based approaches show results below 10% precision.

4. Relation Prediction on FB15K: Relation prediction is the task of predicting new triples based on existing triples in a knowledge graph. For TransE [12], the authors evaluate how relation similarity may be incorporated into the negative filtering of the embedding training to improve the prediction quality. Indeed, the prediction quality on the FB15K knowledge graph completion dataset is improved by a small margin.

5. Relation Extraction on TACRED: Similar to the relation prediction case, in this experiment the relation similarity is integrated as an adaptive margin into the soft margin loss of a relation extraction technique. The results show a slight improvement for one of the methods.
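
The sketch below illustrates the human-judgement evaluation from the first experiment: Spearman's rank correlation between a method's similarity scores and human ratings, assuming SciPy is available. The two score lists are made-up illustrative values, not data from [23].

```python
from scipy.stats import spearmanr

# Hypothetical scores: human similarity ratings for relation pairs and the
# corresponding scores produced by an automatic similarity method.
human_judgments = [4.5, 1.0, 3.2, 0.5, 2.8]
method_scores = [0.91, 0.12, 0.55, 0.20, 0.60]

# Spearman's rho compares the rankings induced by the two score lists; a value
# around 0.63 (as reported for the approach in [23]) indicates a moderately
# strong agreement with the human ranking.
rho, p_value = spearmanr(human_judgments, method_scores)
print(f"Spearman correlation: {rho:.2f} (p = {p_value:.3f})")
```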

Overall, some of the experiments show that the presented methods are superior to knowledge graph embedding-based approaches. In other experiments, both methods seem to be on par. However, significantly fewer embedding methods have been evaluated, and the overall experimental setup is different. An interesting takeaway from this work is that synonymous relation detection can be used in various applications to improve the results of existing systems.