
6. Application to Unsupervised Semantic Class Induction

6.1 Semantic Classes in Lexical Semantic Resources

A semantic class is a set of words that share the same semantic feature (Kozareva, Riloff, and Hovy 2008). Depending on the definition of the notion of the semantic feature, the granularity and sizes of semantic classes may vary greatly. Examples of concrete semantic classes include sets of animals (dog, cat, ...), vehicles (car, motorcycle, ...), and fruit trees (apple tree, peach tree, ...). In this experiment, we use a gold standard derived from a reference lexicographical database, namely WordNet (Fellbaum 1998).

This allows us to benchmark the ability of Watset to reconstruct the semantic lexicon of such a reliable reference resource that has been widely used in NLP for many decades.

6.1.1 WordNet Supersenses. The first dataset used in our experiments consists of 26 broad semantic classes, also known as supersenses in the literature (Ciaramita and Johnson 2003): person, communication, artifact, act, group, food, cognition, possession, location, substance, state, time, attribute, object, process, tops, phenomenon, event, quantity, motive, animal, body, feeling, shape, plant, and relation.

This system of broad semantic categories was used by the lexicographers who originally constructed WordNet to thematically order the synsets; Figure 14 shows the distribution of the 82,115 noun synsets from WordNet 3.1 across the supersenses. In our experiments in this section, these classes are used as the gold standard clustering of word senses as recorded in WordNet.

34 We used the DKPro Agreement toolkit by Meyer et al. (2014) to compute the inter-annotator agreement.

35 The examples are from the file triw2v-watset-n30-top-top-triples.txt, which is available in the "Downloads" section of our GitHub repository at https://github.com/uhh-lt/triframes.

Figure 14
A summary of the noun semantic classes in WordNet supersenses (Ciaramita and Johnson 2003). [Bar chart: number of WordNet synsets per noun supersense; noun.person and noun.artifact are the largest classes.]

One can observe a Zipfian-like power-law distribution (Zipf 1949), with a few clusters, such as artifact and person, accounting for a large fraction of all nouns in the resource. Overall, we decided to focus on nouns in this experiment, as the input distributional thesauri (presented in Section 6.2) are the most studied for modelling noun semantics (Panchenko et al. 2016b).
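For reference, this supersense-based gold standard can be reproduced directly from WordNet, since every noun synset carries the name of the lexicographer file it belongs to. The following is a minimal sketch assuming NLTK's WordNet interface; the paper does not prescribe a particular toolkit:

# Group WordNet noun synsets by supersense, i.e., by lexicographer file name
# such as 'noun.artifact' or 'noun.person'.
from collections import defaultdict
from nltk.corpus import wordnet as wn  # requires the NLTK 'wordnet' data package

supersenses = defaultdict(set)
for synset in wn.all_synsets(pos=wn.NOUN):
    supersenses[synset.lexname()].add(synset.name())

print(len(supersenses))                           # 26 noun supersenses
print(sum(len(v) for v in supersenses.values()))  # ~82,000 noun synsets (version-dependent)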

The WordNet supersenses were later also used as a system of broad sense labels for word sense disambiguation (Flekova and Gurevych 2016). For BabelNet, there is a similar dataset called BabelDomains (Camacho-Collados and Navigli 2017), produced by automatically labeling BabelNet synsets with 32 different domains based on the topics of Wikipedia featured articles. Despite its larger size, however, BabelDomains provides only a silver standard, as it was created semi-automatically. We thus opt to use only the WordNet supersenses in the following, since they provide a gold standard created by human experts.

6.1.2 Flat Cuts of the WordNet Taxonomy. The second type of semantic classes used in our study is more semantically specific: these classes are defined as subtrees of WordNet rooted at a fixed path length of d steps from the root node. We used the following procedure to gather these semantic classes.

First, we find the set of synsets that are located at a distance of exactly d edges from the root node. Each such starting node, e.g., the synset plant_material.n.01, identifies one semantic class. The starting node and all of its descendants, e.g., cork.n.01, coca.n.03, ethyl_alcohol.n.01, methylated_spirit.n.01, and so on in the case of the plant material example, are included in the semantic class. Finally, we remove semantic classes that contain only one element, as our goal is to create a gold standard dataset for clustering.
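A minimal sketch of this procedure, again assuming NLTK's WordNet interface and using the shortest hypernym path to measure the distance from the root (the paper does not specify how synsets with several hypernym paths are treated):

# Build "taxonomy cut" semantic classes: every noun synset exactly d edges
# below the root starts a class consisting of itself and all of its descendants.
from nltk.corpus import wordnet as wn

def taxonomy_cut(d):
    classes = {}
    for synset in wn.all_synsets(pos=wn.NOUN):
        # min_depth() is the length of the shortest hypernym path to entity.n.01
        if synset.min_depth() == d:
            members = {synset} | set(synset.closure(lambda s: s.hyponyms()))
            if len(members) > 1:  # discard singleton classes
                classes[synset.name()] = sorted(m.name() for m in members)
    return classes

# The d = 5 cut contains classes such as rock.n.02, toxin.n.01, and axis.n.01 (cf. Table 17).
cuts = {d: taxonomy_cut(d) for d in (4, 5, 6)}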

Figure 15 illustrates the distribution of the number of semantic classes as a function of the path length from the root. As one may observe, the largest number of clusters is obtained for a path length d of 7. In our experiments, we use three versions of these WordNet "taxonomy cuts," corresponding to d ∈ {4, 5, 6}, since the cluster sizes generated at these levels are already substantially larger than those from the supersense dataset, while providing a complementary evaluation at different levels of granularity.

Figure 15
Relationship between the number of semantic classes and the path length from the WordNet (Fellbaum 1998) root. We have chosen d ∈ {4, 5, 6} for our experiments. [Bar chart: number of semantic classes for each path length d = 1, ..., 19, peaking at 19,088 classes for d = 7.]

Table 17

Examples of semantic classes extracted from the WordNet hierarchy of synsets for the path length d = 5 from the root synset.

Root Synset    Child Synsets

rock.n.02      aphanite.n.01, caliche.n.02, claystone.n.01, dolomite.n.01, emery_stone.n.01, fieldstone.n.01, gravel.n.01, ballast.n.02, bank_gravel.n.01, shingle.n.02, greisen.n.01, igneous_rock.n.01, andesite.n.01, ...63 more entries..., tufa.n.01

toxin.n.01     animal_toxin.n.01, venom.n.01, kokoi_venom.n.01, snake_venom.n.01, anatoxin.n.01, botulin.n.01, cytotoxin.n.01, enterotoxin.n.01, nephrotoxin.n.01, endotoxin.n.01, exotoxin.n.01, ...19 more entries..., ricin.n.01

axis.n.01      coordinate_axis.n.01, x-axis.n.01, y-axis.n.01, z-axis.n.01, major_axis.n.01, minor_axis.n.01, optic_axis.n.01, principal_axis.n.01, semimajor_axis.n.01, semiminor_axis.n.01

Although at some levels, such as d = 2, the number of semantic classes is similar to the number of supersenses (Ciaramita and Johnson 2003), there is no one-to-one relationship between them. As Richardson, Smeaton, and Murphy (1994) point out, such a cut-based derivative resource might be biased towards concepts belonging to shallow hierarchies: the node for "horse" is 10 levels from the root, while the node for "cow" is 13 levels deep. However, we believe that these classes add a further perspective to our evaluation while remaining interpretable. Examples of the extracted semantic classes are presented in Table 17.
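The depth imbalance mentioned above can also be checked directly; a small sketch under the same NLTK assumption (exact depths may differ slightly across WordNet versions):

from nltk.corpus import wordnet as wn

# Shortest hypernym-path length from the root for the two examples discussed above.
for name in ('horse.n.01', 'cow.n.01'):
    print(name, wn.synset(name).min_depth())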

Figure 16
An example of the lexical unit "java" and a part of its neighborhood in a distributional thesaurus. This polysemous word is not disambiguated, so it acts as a hub between two different senses. [Graph: the displayed neighbors of "java" are lisp, pascal, cobol, delphi, eiffel, erlang, c, python, fortran, ruby, soap, beer, cocoa, lemonade, espresso, tea, cappuccino, malt, coffee, and palm.]