4.2 Approach
4.2.1 Model description
From a broad perspective, our model (see Figure 4.1) is a matching function: given a question $q$ and sets of candidate subject entities and relations, $C_s = \{s_1, \ldots, s_m\}$ and $C_r = \{r_1, \ldots, r_n\}$ respectively, it returns the subject and predicate that match the question best. To do so, it (1) maps the question $q$ to a vector representation $\mathbf{r}_q = (\mathbf{r}_q^s, \mathbf{r}_q^r)^\top$, where $\mathbf{r}_q^s$ and $\mathbf{r}_q^r$ are the subject- and relation-specific encodings of the question respectively, (2) maps each candidate subject $s_i \in C_s$ to a vector representation $\mathbf{r}_{s_i}$, (3) maps each candidate predicate $r_j \in C_r$ to a vector representation $\mathbf{r}_{r_j}$, and (4) computes scores $S_s(q, s_i)$ and $S_r(q, r_j)$ for each pair $(\mathbf{r}_q^s, \mathbf{r}_{s_i})$, $i = 1, \ldots, m$, and $(\mathbf{r}_q^r, \mathbf{r}_{r_j})$, $j = 1, \ldots, n$. Based on these scores the final prediction is $(\hat{s}, \hat{r})$, with

$\hat{s} = \arg\max_{s_i \in C_s} S_s(q, s_i)$ ,  (4.1)
$\hat{r} = \arg\max_{r_j \in C_r} S_r(q, r_j)$ .  (4.2)
Steps (1)-(3) heavily rely on RNNs with Gated Recurrent Units (GRUs) [45], which are described in Section 2.2.2. In the following subsections, the four parts of our model are described in detail.
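To make these four steps and the predictions of Equations (4.1) and (4.2) concrete, the following minimal Python sketch scores pre-computed candidate representations against the question encodings and takes the argmax; all variable names, vectors, and candidate identifiers are illustrative placeholders rather than part of our implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors (the score function used later in Section 4.2.1).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(r_q_s, r_q_r, subject_reps, predicate_reps, score):
    """Steps (2)-(4) after encoding: score each candidate against the matching
    question encoding and return the best subject and predicate (Eqs. 4.1, 4.2).

    subject_reps / predicate_reps: dicts mapping candidate id -> representation vector
    score: similarity function between two vectors
    """
    s_hat = max(subject_reps, key=lambda s: score(r_q_s, subject_reps[s]))       # Eq. (4.1)
    r_hat = max(predicate_reps, key=lambda r: score(r_q_r, predicate_reps[r]))   # Eq. (4.2)
    return s_hat, r_hat

# Toy example with random vectors standing in for the learned encodings.
rng = np.random.default_rng(0)
r_q_s, r_q_r = rng.normal(size=16), rng.normal(size=16)
subjects = {"subject_1": rng.normal(size=16), "subject_2": rng.normal(size=16)}
predicates = {"/meteorology/affected_area/cyclone": rng.normal(size=16)}
print(predict(r_q_s, r_q_r, subjects, predicates, cosine))
```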
Figure 4.2: Question encoding network ENC_Q. Each word is represented by a vector using REP_W. The sequence of word vectors is encoded using a GRU.
Representing the question
The mapping of a question $q = \{w_1, \ldots, w_n\}$ to its subject- and predicate-related vector representations $\mathbf{r}_q^s$ and $\mathbf{r}_q^r$, respectively, is done using a single-layered unidirectional GRU-based encoder network. We call this part of the model the question encoder ENC_Q:

$\mathbf{r}_q = \begin{pmatrix}\mathbf{r}_q^s \\ \mathbf{r}_q^r\end{pmatrix} = \mathrm{ENC_Q}(\{w_1, \ldots, w_n\})$ .  (4.3)

The question encoder ENC_Q first uses the word representation function $\mathrm{REP_W}(w_t)$ to generate vector representations for all words $w_t$, $t = 1, \ldots, n$ (as described in the next paragraph), which are subsequently fed to the RNN until all words have been seen. Starting with the initial hidden state $\mathbf{h}_0$, the GRU of the question encoder RNN iteratively updates its hidden state $\mathbf{h}_t$ after processing each word according to Equations (2.6) to (2.9), where the word representation vector $\mathrm{REP_W}(w_t)$ is fed as input to the GRU (i.e. $\mathbf{x}_t = \mathrm{REP_W}(w_t)$). The final hidden state $\mathbf{h}_n$ (produced after processing the last word, represented by $\mathrm{REP_W}(w_n)$) is returned by ENC_Q as the representation of question $q$. The question encoder is visualized in Figure 4.2.
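For illustration, a minimal PyTorch-style sketch of such a single-layer unidirectional GRU question encoder is given below; the module names, dimensions, and the embedding layer standing in for REP_W are assumptions made for the example, not the configuration we actually use.

```python
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """Sketch of ENC_Q: runs a single-layer unidirectional GRU over the word
    representations and returns the final hidden state as the question encoding r_q."""

    def __init__(self, rep_w: nn.Module, word_dim: int, enc_dim: int):
        super().__init__()
        self.rep_w = rep_w                       # REP_W: produces one vector per word
        self.gru = nn.GRU(word_dim, enc_dim, batch_first=True)

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        x = self.rep_w(word_ids)                 # (batch, n_words, word_dim)
        _, h_n = self.gru(x)                     # h_n: (1, batch, enc_dim), the final state
        return h_n.squeeze(0)                    # r_q = (r_q^s, r_q^r) stacked into one vector

# Toy usage: a plain embedding layer stands in for REP_W here.
rep_w = nn.Embedding(num_embeddings=1000, embedding_dim=64)
enc_q = QuestionEncoder(rep_w, word_dim=64, enc_dim=128)
question = torch.randint(0, 1000, (1, 4))        # e.g. "What cyclone affected Hainan"
print(enc_q(question).shape)                     # torch.Size([1, 128])
```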
Word representation
In the following we describe how we generate the vector representations of the words $w_1, \ldots, w_n$. In order to exploit both word- and character-level information of the question, we use a "nested" word- and character-level approach, concatenating the pre-trained embedding of a word with an RNN-based encoding on character level.
As word embeddings, we use GloVe [40] vectors provided on the GloVe website1. Such pre-trained word embeddings implicitly incorporate word semantics inferred from a large text corpus, based on the distributional semantics hypothesis [186]. This hypothesis can be phrased as "words with similar meanings occur in similar contexts", which in our case translates to similar vectors for words with similar meanings. See Section 2.4.1 for more on pre-trained word embeddings. Using such pre-trained word embeddings allows us to better handle synonyms and to find better matches between words in the question and subject labels or predicate URIs. In addition, during testing, it allows the model to handle words that have not been seen during training.
1http://nlp.stanford.edu/projects/glove/
Figure 4.3: Word representation network REP_W with example. The word is considered as a sequence of characters and fed to ENC_W, where each character is embedded and the sequence of character vectors is encoded using a GRU to produce a character-level encoding of that word. This is concatenated with the word embedding (we use GloVe) to produce the complete word representation.
The word embedding of $w_t$, resulting in the $d_w$-dimensional vector representation $\mathbf{w}_t^e$, can be formally described as follows:

$\mathbf{w}_t^e = \mathbf{W}_e^\top \mathbf{v}_t$ ,  (4.4)

where $\mathbf{W}_e \in \mathbb{R}^{|V_w| \times d_w}$ is the provided pre-trained word embedding matrix for a vocabulary $V_w$ of size $|V_w|$ (GloVe covers 400k words), and $\mathbf{v}_t$ is the one-hot vector representation of $w_t$. Since the coverage of word embeddings is limited, many words appearing in the questions (especially those that are part of a reference to a particular subject entity, e.g. the last name "Golfis" in the question "What city was Alex Golfis born in") are not contained in the vocabulary of the pre-trained embeddings. In such cases (20.8% and 14.5% of unique words in the train resp. test questions), we set the word embedding to the zero vector.
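This lookup with the zero-vector fallback for out-of-vocabulary words can be sketched as follows; the file name and the 300-dimensional setting are only examples of the vectors distributed on the GloVe website.

```python
import numpy as np

def load_glove(path, dim=300):
    # Parse a GloVe text file into {word: vector}; each line is "word v1 v2 ... v_dim".
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def word_embedding(word, glove, dim=300):
    # Equation (4.4): look up the pre-trained embedding; OOV words get the zero vector.
    return glove.get(word.lower(), np.zeros(dim, dtype=np.float32))

# glove = load_glove("glove.6B.300d.txt")    # example file name from the GloVe website
# word_embedding("golfis", glove).any()      # -> False: OOV word, zero vector
```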
The encoding of the word $w_t = \{w_t^1, \ldots, w_t^K\}$ on character level is based on an RNN encoder:

$\mathbf{w}_t^c = \mathrm{ENC_W}(\{w_t^1, \ldots, w_t^K\})$ .  (4.5)

Inside ENC_W, the characters $w_t^k$, $k = 1, \ldots, K$ are first embedded by

$\mathbf{c}_t^k = \mathbf{W}_c^\top \mathbf{v}_t^k$ ,  (4.6)

with character embedding matrix $\mathbf{W}_c \in \mathbb{R}^{|V_c| \times d_c}$ (for $|V_c|$ characters) learned during training, and $\mathbf{v}_t^k$ the one-hot vector representation of the character $w_t^k$. Then we feed the sequence of character vectors $\{\mathbf{c}_t^1, \ldots, \mathbf{c}_t^K\}$ to a single-layered unidirectional GRU network and take its final state as the character-level word encoding $\mathbf{w}_t^c$.
The added character-level encoding provides information necessary for matching question words with entity labels, in addition to providing distinguishable representations for OOV words. This approach is similar to the char2word model proposed by Ling et al. [187], with the difference that we use a unidirectional GRU network.
Finally, to get the vector representation of a word $w_t$, the word embedding $\mathbf{w}_t^e$ and character-level encoding $\mathbf{w}_t^c$ are concatenated:

$\mathrm{REP_W}(w_t) = \begin{pmatrix}\mathbf{w}_t^e \\ \mathbf{w}_t^c\end{pmatrix}$ .  (4.7)

The whole word representation network is illustrated in Figure 4.3.
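A compact PyTorch-style sketch of this nested representation (Equations (4.4) to (4.7)) is shown below; the dimensions, the untrained embedding standing in for GloVe, and the character indexing via ord() are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Sketch of REP_W for a single word: concatenates the pre-trained word
    embedding w_e with the final GRU state over its characters w_c (ENC_W)."""

    def __init__(self, word_emb: nn.Embedding, n_chars: int, char_dim: int, char_enc_dim: int):
        super().__init__()
        self.word_emb = word_emb                              # W_e, pre-trained (e.g. GloVe)
        self.char_emb = nn.Embedding(n_chars, char_dim)       # W_c, learned (Eq. 4.6)
        self.char_gru = nn.GRU(char_dim, char_enc_dim, batch_first=True)

    def forward(self, word_id: torch.Tensor, char_ids: torch.Tensor) -> torch.Tensor:
        w_e = self.word_emb(word_id)                          # (batch, d_w), Eq. (4.4)
        c = self.char_emb(char_ids)                           # (batch, n_chars, d_c)
        _, h_last = self.char_gru(c)                          # final state = w_c, Eq. (4.5)
        w_c = h_last.squeeze(0)                               # (batch, char_enc_dim)
        return torch.cat([w_e, w_c], dim=-1)                  # Eq. (4.7)

# Toy usage with a random (untrained) embedding standing in for GloVe.
rep_w = WordRepresentation(nn.Embedding(1000, 50), n_chars=128, char_dim=16, char_enc_dim=32)
word_id = torch.tensor([7])                                   # index of "hainan" in the vocabulary
char_ids = torch.tensor([[ord(ch) for ch in "hainan"]])       # character indices
print(rep_w(word_id, char_ids).shape)                         # torch.Size([1, 82])
```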
Representing the subject
We use both the entity label and the type label of the entities in the knowledge graph to build subject representations. Entity labels are encoded on the level of characters because of the high prevalence of OOV words and their importance for entity labels. On the other hand, OOV words are rather rare in type labels and thus type labels are encoded on word level.
For Freebase entities, we extract entity labels using the type.object.name properties of entities, and type labels of entities by first getting the common.topic.notable_types of entities2 and then taking the type.object.name value of the types.
The character-level entity label encoding $\mathbf{s}_l$ and the word-level type label encoding $\mathbf{s}_t$ are concatenated to produce the subject representation vector:

$\mathbf{s}_l = \mathrm{ENC_{SL}}(\{c_s^1, c_s^2, \ldots\})$ ,  (4.8a)
$\mathbf{s}_t = \mathrm{ENC_{ST}}(\{w_t^1, w_t^2, \ldots\})$ ,  (4.8b)
$\mathbf{r}_s = \begin{pmatrix}\mathbf{s}_l \\ \mathbf{s}_t\end{pmatrix}$ ,  (4.8c)

where ENC_SL is the character-level encoder of the subject entity label and ENC_ST is the word-level type label encoder. The label characters and type label words, respectively, are first embedded (following Equation 4.6 and Equation 4.4, respectively) and the embedding vectors are fed to the respective encoding RNNs. Both ENC_SL and ENC_ST correspond to single-layer unidirectional GRU-based RNNs and take their final hidden state as the entity label encoding $\mathbf{s}_l$ and type label encoding $\mathbf{s}_t$, respectively.
The subject representation network is visualized in Figure 4.4.
This method of building subject representations is similar to the method proposed by Sun et al. [188], who focus on entity linking and use CNNs for word-level entity name encoding (instead of RNN-based character-level encodings as in our model, which allow handling OOV words) and word-level entity type name encoding, followed by an additional layer that merges the two (whereas we simply concatenate both representations).
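A sketch of this subject encoder under the same illustrative assumptions as before (placeholder dimensions, an untrained word embedding instead of GloVe, characters indexed via ord()) could look as follows:

```python
import torch
import torch.nn as nn

class SubjectEncoder(nn.Module):
    """Sketch of the subject representation (Eq. 4.8): a character-level GRU over the
    entity label (ENC_SL) and a word-level GRU over the type label (ENC_ST), concatenated."""

    def __init__(self, n_chars=128, char_dim=16, n_words=1000, word_dim=50, enc_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.word_emb = nn.Embedding(n_words, word_dim)       # would be GloVe in practice
        self.enc_sl = nn.GRU(char_dim, enc_dim, batch_first=True)   # ENC_SL (characters)
        self.enc_st = nn.GRU(word_dim, enc_dim, batch_first=True)   # ENC_ST (words)

    def forward(self, label_char_ids: torch.Tensor, type_word_ids: torch.Tensor) -> torch.Tensor:
        _, s_l = self.enc_sl(self.char_emb(label_char_ids))   # final state s_l, Eq. (4.8a)
        _, s_t = self.enc_st(self.word_emb(type_word_ids))    # final state s_t, Eq. (4.8b)
        return torch.cat([s_l.squeeze(0), s_t.squeeze(0)], dim=-1)  # r_s, Eq. (4.8c)

# Toy usage: entity label "hainan", type label "chinese province" (word ids are made up).
enc = SubjectEncoder()
label = torch.tensor([[ord(ch) for ch in "hainan"]])
type_label = torch.tensor([[11, 42]])
print(enc(label, type_label).shape)                           # torch.Size([1, 128])
```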
Representing the predicate
We use the predicate URIs provided by the KG to build latent vector representations of the predicates.
The predicate URI is first split into words $w_r^1, w_r^2, \ldots$, each word is embedded (as described by Equation 4.4), and then the word embeddings are fed into a single-layer word-level GRU-based encoder ENC_R that takes the final state of its RNN as the representation of the predicate URI, that is

$\mathbf{r}_r = \mathrm{ENC_R}(\{w_r^1, w_r^2, \ldots\})$ .  (4.9)

The relation encoding network is visualized in Figure 4.5.
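As an illustration of this predicate encoder (with placeholder names; the hash-based vocabulary lookup and the untrained embedding merely stand in for a real GloVe-initialized vocabulary), consider the following sketch:

```python
import torch
import torch.nn as nn

class PredicateEncoder(nn.Module):
    """Sketch of ENC_R: embed the words of the predicate URI and take the final
    GRU state as the predicate representation r_r (Eq. 4.9)."""

    def __init__(self, word_emb: nn.Embedding, enc_dim: int):
        super().__init__()
        self.word_emb = word_emb
        self.gru = nn.GRU(word_emb.embedding_dim, enc_dim, batch_first=True)

    def forward(self, uri_word_ids: torch.Tensor) -> torch.Tensor:
        _, h_last = self.gru(self.word_emb(uri_word_ids))
        return h_last.squeeze(0)

def split_uri(uri: str):
    # e.g. "/meteorology/affected_area/cyclone" -> ["meteorology", "affected", "area", "cyclone"]
    return [w for part in uri.strip("/").split("/") for w in part.split("_")]

enc_r = PredicateEncoder(nn.Embedding(1000, 50), enc_dim=64)
words = split_uri("/meteorology/affected_area/cyclone")
word_ids = torch.tensor([[hash(w) % 1000 for w in words]])    # stand-in for a vocabulary lookup
print(enc_r(word_ids).shape)                                  # torch.Size([1, 64])
```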
2The notable types property provides the single, most characteristic type for that entity. However, using all types of the entity (e.g. concatenating their labels) could be interesting as well, which we leave for future work.
Figure 4.4: Entity encoder with example. The entity label is encoded on character level (ENC_SL) and the subject type label is encoded on word level (ENC_ST). The two are concatenated to produce the subject vector.
Figure 4.5: Predicate encoder network ENC_R with example. The predicate URI is split into words and encoded on word level using GloVe embeddings.
Matching scores
Given the question encoding vector $\mathbf{r}_q = (\mathbf{r}_q^s, \mathbf{r}_q^r)^\top$, the latent vector representation $\mathbf{r}_r$ of the relation, and the latent representation $\mathbf{r}_s$ of the subject entity, we compute two matching scores: one between the question and the subject entity and one between the question and the predicate, as follows:

$S_s(q, s) = \cos(\mathbf{r}_q^s, \mathbf{r}_s)$ ,  (4.10a)
$S_r(q, r) = \cos(\mathbf{r}_q^r, \mathbf{r}_r)$ ,  (4.10b)

where $\cos$ is the cosine similarity given by $\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$.
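For completeness, the two scores can be computed with PyTorch's built-in cosine similarity as in the following sketch; the dimensionalities and random vectors are placeholders for the actual encoder outputs.

```python
import torch
import torch.nn.functional as F

# Illustrative encodings; in the model these come from ENC_Q and the entity/predicate encoders.
r_q_s, r_q_r = torch.randn(128), torch.randn(64)
r_s, r_r = torch.randn(128), torch.randn(64)

S_s = F.cosine_similarity(r_q_s, r_s, dim=0)   # Eq. (4.10a)
S_r = F.cosine_similarity(r_q_r, r_r, dim=0)   # Eq. (4.10b)
print(S_s.item(), S_r.item())
```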