
4.1.2 Frame Identification System SimpleFrameId

In the previous Section 4.1.1, we saw that state-of-the-art systems for Frame Identification encode the situational context of the predicate using pre-trained textual embeddings for words (see Hermann et al., 2014). Hence, it is assumed that the context of the situation is explicitly expressed in words. Two aspects are important.

First, the textual embedding of the predicate itself is promising, as this embedding contains information about the contexts the predicate appeared in during training. Second, the textual embeddings of the context words in the sentence are promising, as they reveal the actual context and thus the actual meaning of the predicate in question. That is, Frame Identification systems using textual embeddings in these two ways assume and implement the idea of context carrying the meaning of single words (distributional hypothesis, Harris, 1954).

We follow this assumption with our Frame Identification systems that are based on textual embeddings (Hartmann et al., 2017; Botschen et al., 2018a). In this section, we discuss our prototype system SimpleFrameId for Frame Identification, out of which we then develop our optimized system UniFrameId for English and German FrameNets (see next Section 4.1.3).

4.1.2.1 Architecture: Matrix Factorization versus Neural Approach

We explain the development of the Frame Identification classifier in the context of SimpleFrameId (Hartmann et al., 2017). First, we re-implement the matrix factorization architecture of the previous state-of-the-art approach by Hermann et al. (2014), aiming to replicate the results of the state-of-the-art system Hermann-14. Then, we explore an alternative approach with a neural network architecture, which is our prototype system for Frame Identification, SimpleFrameId.

Textual Input Embeddings for both Approaches. The input representation (Equation 4.1) for both approaches is a simple concatenation $\circ$ (cf. Equation 3.19) of the predicate's pre-trained embedding $\vec{v}(\text{pred})$ and an embedding of the predicate context $\vec{v}(\text{cont})$:

$$\vec{v}(\text{in}) = \vec{v}(\text{cont}) \circ \vec{v}(\text{pred}), \quad \text{with } \vec{v}(\text{cont}) = \frac{\sum_{w \in \text{cont}} vsm(w)}{|\text{cont}|} \text{ and } \vec{v}(\text{pred}) = vsm(\text{pred}). \tag{4.1}$$

Regarding the predicate context cont, we experiment with two kinds of contexts to build a dimension-wise mean of the pre-trained embeddings of a set of selected words w in the sentence. First, following Hermann-14, we consider the dependency parse of the sentence: we include only words that are direct dependents of the predicate to build an average of the respective word embeddings (we will refer to this as the dependency-based bag-of-words approach DepBOW). Second, we include all the words in the sentence to build an average of the respective word embeddings (we will refer to this as the sentence-based bag-of-words approach SentBOW).

Thus, in both cases, we consider the average of the pre-trained embeddings of either the predicate's dependents or of all words in the sentence.
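The construction of Equation 4.1 can be sketched in a few lines of NumPy; the vector space, its dimensionality, and the example sentence below are hypothetical toy values, not the embeddings used in the actual experiments:

```python
import numpy as np

def input_embedding(vsm, pred, context_words):
    """Build the Frame Identification input vector (Equation 4.1):
    the dimension-wise mean of the context word embeddings,
    concatenated with the predicate's own embedding.
    `vsm` maps a word to its pre-trained vector."""
    v_cont = np.mean([vsm[w] for w in context_words], axis=0)
    v_pred = vsm[pred]
    return np.concatenate([v_cont, v_pred])

# Toy 3-dimensional vector space (hypothetical values).
vsm = {
    "bought": np.array([1.0, 0.0, 0.0]),
    "she":    np.array([0.0, 1.0, 0.0]),
    "a":      np.array([0.0, 0.0, 1.0]),
    "car":    np.array([0.0, 1.0, 1.0]),
}

# SentBOW passes all words of the sentence as context; DepBOW would
# instead pass only the predicate's direct syntactic dependents.
v_in = input_embedding(vsm, "bought", ["she", "a", "car", "bought"])
```

The same function covers both context variants; only the list of context words changes between DepBOW and SentBOW.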

As the pre-trained word embeddings by Hermann et al. (2014) are not publicly available, we choose other pre-trained word embeddings that are public. Hermann et al. (2014) incorporate the notion of context in terms of syntactic dependents into their approach; thus, we decide to choose dependency-based embeddings (Levy and Goldberg, 2014a, cf. Section 3.2). By this choice, syntactic knowledge of the co-occurrence of syntactic dependents is integrated into the word embeddings directly.

We experiment with two different classification methods to process the input representations: one is a matrix factorization approach following the line of Hermann-14, the other is a straightforward two-layer neural network.

Matrix Factorization Approach. With the matrix factorization approach, we follow the line of the current state-of-the-art system Hermann-14 (Hermann et al., 2014, cf. Section 4.1.1) and learn representations for frames and predicates in the same latent space using the WSABIE algorithm (Weston et al., 2011, cf. Section 3.2).3 We will refer to this approach as WSB. Note that a by-product of the approach oriented on Hermann-14 is the set of WSABIE embeddings for frames, which will be further examined in Section 5.1.

The outputs are scores for each frame known to the system by the lexicon, such that the frame with the highest score is selected as prediction.
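At prediction time, scoring in such a shared latent space reduces to dot products: a learned linear map projects the input into the latent space, and each frame embedding is scored against the projection. A minimal sketch, assuming hypothetical dimensions and randomly initialized (i.e., untrained) parameters in place of the parameters WSABIE would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_latent, n_frames = 6, 4, 5  # hypothetical sizes

# WSABIE learns a linear map M for inputs and an embedding matrix F
# with one row per frame, so inputs and frames share a latent space.
M = rng.normal(size=(d_latent, d_in))
F = rng.normal(size=(n_frames, d_latent))

def score_frames(v_in):
    """Score every frame known to the system: project the input into
    the latent space, then take dot products with frame embeddings."""
    return F @ (M @ v_in)

v_in = rng.normal(size=d_in)
scores = score_frames(v_in)
predicted_frame = int(np.argmax(scores))  # highest-scoring frame wins
```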

Neural Network Approach. With the neural approach, we follow the recent success of neural methods, which improved the performance of role labeling (cf. PathLSTM, NNs for SRL, and Open-SESAME in Table 4.1) but had not yet been applied to Frame Identification. We decide on a conceptually simple prototype to explore the potential of neural methods for Frame Identification. We will refer to this approach as NN.

Our neural network-based system is a two-layer feed-forward neural network, trained with the Adagrad optimizer. The first hidden layer comprises 256 neurons, followed by 100 neurons in the second hidden layer. Each node in the output layer corresponds to one frame-label class known from the lexicon. We use rectified linear units (ReLU, Nair and Hinton, 2010) as the activation function for the hidden layers, and a softmax activation function for the output layer, yielding a multinomial distribution over frames. At test time, we take the highest activated neuron (arg max) to obtain the most likely frame label according to the classifier as the final prediction. Optionally, filtering based on the lexicon can be performed on the predicted probabilities for each frame label. As this is a prototype, no hyperparameters have been optimized yet; this is done with UniFrameId (cf. Section 4.1.3).

Note that the classifier itself is agnostic to the predicate’s part-of-speech and exact lemma and only relies on word representations from the vsm.
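The forward pass and the optional lexicon filtering can be sketched as follows; layer sizes and parameters are tiny hypothetical stand-ins (the real system uses 256 and 100 hidden units and trained weights):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def predict_frame(v_in, params, lexicon_mask=None):
    """Two-layer feed-forward pass: two ReLU hidden layers, then a
    softmax over all frame labels. If a lexicon mask is given, frames
    the predicate cannot evoke are zeroed out before the arg max."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = relu(W1 @ v_in + b1)
    h2 = relu(W2 @ h1 + b2)
    probs = softmax(W3 @ h2 + b3)
    if lexicon_mask is not None:
        probs = probs * lexicon_mask  # lexicon-based filtering
    return int(np.argmax(probs))

rng = np.random.default_rng(1)
dims = [6, 8, 5, 4]  # input, hidden1, hidden2, number of frames
params = []
for i in range(3):
    params += [rng.normal(size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1])]
params = tuple(params)

# Mask says the predicate can evoke frames 0, 2, and 3, but not 1.
frame = predict_frame(rng.normal(size=6), params,
                      lexicon_mask=np.array([1.0, 0.0, 1.0, 1.0]))
```

With the mask applied, a frame outside the predicate's lexicon entry can never be predicted, since its filtered probability is zero while the remaining softmax probabilities stay positive.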

4.1.2.2 Experimental Setup and Data

We contrast the performance of four systems with respect to Frame Identification: dependency- versus sentence-based bag-of-words for input embeddings in the matrix factorization approach, WSB+DepBOW and WSB+SentBOW, and also in the neural network approach, NN+DepBOW and NN+SentBOW. Regarding the approach, WSB+DepBOW is most similar to Hermann-14 (Hermann et al., 2014). We compare the performance of our systems to the state-of-the-art system Hermann-14.

3 In our implementation, we use the LightFM package (Kula, 2015) for matrix factorization with the WARP option for a Weighted Approximate-Rank Pairwise loss.


model                      acc     acc amb
Hermann-14                 88.41   73.10
WSB+DepBOW                 85.69   69.93
WSB+SentBOW                84.46   67.56
NN+DepBOW                  87.53   73.58
SimpleFrameId: NN+SentBOW  87.63   73.80

Table 4.2: FrameId results (in %) on FrameNet test data. Reported are overall accuracy and accuracy for ambiguous predicates. Best results highlighted in bold. Models: (a) state of the art Hermann-14, (b) WSB+DepBOW, (c) WSB+SentBOW, (d) NN+DepBOW, (e) SimpleFrameId: NN+SentBOW.

Data and Data Splits: Berkeley FrameNet. The Berkeley FrameNet (Baker et al., 1998; Ruppenhofer et al., 2016), as presented in Section 2.1.2, is a lexical resource for English with annotations based on frame semantics (Fillmore, 1976).

The fully annotated texts provide the sentences with frame labels for the predicates for training and evaluation. The lexicon, mapping predicates to the frames they can evoke, can be used to facilitate the identification of the frame for a predicate. (Table 2.1 contains the lexicon statistics, Table 2.2 the dataset statistics.)

In this work, we use FrameNet 1.5 to ensure comparability with the previous state of the art. Also, we use the common evaluation split for Frame Identification systems introduced by Das and Smith (2011), together with the development split of Hermann et al. (2014). As there is only one single annotation, representing the consensus of the experts, it is impossible to determine the performance of a single human based on the experts' agreement.

4.1.2.3 Results and Discussion

We present the results of our four systems in Table 4.2.

Interestingly, we find that our straightforward neural approach NN+SentBOW, using sentence-based bag-of-words embeddings, achieves results (accuracy of 87.63%) comparable to the state-of-the-art system Hermann-14 (accuracy of 88.41%). From now on, we refer to our best system NN+SentBOW as SimpleFrameId.

However, the performance of WSB+DepBOW (accuracy of 85.69%) is worse than that of Hermann-14, even though the WSB+DepBOW approach is the most similar one to Hermann-14. This gap in performance is put into perspective, but not closed, by taking into account the slightly worse performance of Hermann-14 (accuracy of 86.49%) when using the Semafor lexicon to be directly comparable.

Our initial attempts to replicate Hermann-14, which is not publicly available, revealed that the container-based input feature space is very sparse: there exist many syntactic paths that can connect a predicate to its arguments, but a predicate instance rarely has more than five arguments in the sentence. So, by design, the input representation bears no information in most of its path containers. Moreover, Hermann-14 makes heavy use of automatically created dependency parses, which might decline in quality when applied to a new domain or another language.

With respect to the input representation, we find an interesting tendency. On the one hand, for the matrix factorization approach WSB, the dependency-based input representation DepBOW is the better choice compared to the sentence-based input representation SentBOW. This mirrors the strength of the dependency-based input representation in the matrix factorization approach proposed by Hermann et al. (2014). On the other hand, for the neural approach NN, using all words of the sentence as context representation (SentBOW) performs slightly better than the more complex context representation that uses dependency parses (DepBOW). This could be an effect of the network leveraging the dependency information incorporated into the dependency-based word embeddings by Levy and Goldberg (2014a).

We demonstrate that the straightforward neural approach SimpleFrameId, which is a simpler system compared to Hermann-14, achieves competitive performance on the FrameNet test data. Importantly, the performance on ambiguous predicates is slightly higher with SimpleFrameId (accuracy of 73.80%) than with Hermann-14 (accuracy of 73.10%). Furthermore, its performance is competitive even out of domain; for details, the interested reader is referred to Hartmann et al. (2017).

As we find an advantage of the neural approach (accuracy of 87.63%) over the matrix factorization approach (accuracy of 85.69%) in terms of both performance and simplicity, we decide to further explore the potential of the neural approach.