4.2 Approach

4.2.1 Model description

From a broad perspective, our model (see Figure 4.1) is a matching function: given a question $q$ and sets of candidate subject entities and relations, $\mathcal{C}_s = \{s_1, \ldots, s_n\}$ and $\mathcal{C}_p = \{p_1, \ldots, p_m\}$ respectively, it returns the subject and predicate that match the question best. To do so, it

(1) maps the question $q$ to a vector representation $\mathbf{r}_q = (\mathbf{r}_q^s, \mathbf{r}_q^p)^T$, where $\mathbf{r}_q^s$ and $\mathbf{r}_q^p$ are the subject-specific and predicate-specific encodings of the question, respectively,

(2) maps each candidate subject $s_i \in \mathcal{C}_s$ to a vector representation $\mathbf{r}_{s_i}$,

(3) maps each candidate predicate $p_j \in \mathcal{C}_p$ to a vector representation $\mathbf{r}_{p_j}$,

(4) and computes scores $S_s(q, s_i)$ and $S_p(q, p_j)$ for each pair $(\mathbf{r}_q^s, \mathbf{r}_{s_i})$, $i = 1, \ldots, n$, and $(\mathbf{r}_q^p, \mathbf{r}_{p_j})$, $j = 1, \ldots, m$.

Based on these scores, the final prediction is $(\hat{s}, \hat{p})$, with

$\hat{s} = \operatorname{argmax}_{s_i \in \mathcal{C}_s} S_s(q, s_i)$ ,   (4.1)

$\hat{p} = \operatorname{argmax}_{p_j \in \mathcal{C}_p} S_p(q, p_j)$ .   (4.2)

Steps (1)-(3) heavily rely on RNNs with Gated Recurrent Units (GRUs) [45], which are described in Section 2.2.2. In the following subsections, the four parts of our model are described in detail.
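To make the selection step concrete, the following is a minimal sketch of the scoring and selection procedure of Equations (4.1) and (4.2). It assumes the three encoders are available as functions that return fixed-size vectors; the names encode_question, encode_subject, and encode_predicate are illustrative placeholders rather than parts of our implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors (cf. Equation 4.10).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(question, cand_subjects, cand_predicates,
            encode_question, encode_subject, encode_predicate):
    # Step (1): encode the question into subject- and predicate-specific vectors.
    r_q_s, r_q_p = encode_question(question)
    # Steps (2)-(4): encode every candidate and score it against the question.
    s_scores = [cosine(r_q_s, encode_subject(s)) for s in cand_subjects]
    p_scores = [cosine(r_q_p, encode_predicate(p)) for p in cand_predicates]
    # Equations (4.1) and (4.2): pick the highest-scoring subject and predicate.
    return (cand_subjects[int(np.argmax(s_scores))],
            cand_predicates[int(np.argmax(p_scores))])
```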

Figure 4.2: Question encoding network ENCQ. Each word is represented by a vector using REPW. The sequence of word vectors is encoded using a GRU.

Representing the question

The mapping of a question $q = \{w_1, \ldots, w_T\}$ to its subject- and predicate-related vector representations $\mathbf{r}_q^s$ and $\mathbf{r}_q^p$, respectively, is done using a single-layered unidirectional GRU-based encoder network. We call this part of the model the question encoder ENCQ:

$\mathbf{r}_q = \begin{bmatrix} \mathbf{r}_q^s \\ \mathbf{r}_q^p \end{bmatrix} = \mathrm{ENCQ}(\{w_1, \ldots, w_T\})$ .   (4.3)

The question encoder ENCQ first uses the word representation function $\mathrm{REPW}(w_t)$ to generate vector representations for all words $w_t$, $t = 1, \ldots, T$ (as described in the next paragraph), which are subsequently fed to the RNN until all words have been seen. Starting with the initial hidden state $\mathbf{h}_0$, the GRU of the question encoder RNN iteratively updates its hidden state $\mathbf{h}_t$ after processing each word according to Equations (2.6) to (2.9), where the word representation vector $\mathrm{REPW}(w_t)$ is fed as input to the GRU (i.e. $\mathbf{x}_t = \mathrm{REPW}(w_t)$). The final hidden state $\mathbf{h}_T$ (produced after processing the last word, represented by $\mathrm{REPW}(w_T)$) is returned by ENCQ as the representation of question $q$. The question encoder is visualized in Figure 4.2.
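As a rough illustration of ENCQ, a minimal PyTorch-style sketch is given below. It assumes REPW is available as a module that returns one vector per word, and it splits the final GRU state in half to obtain the two question encodings; this split is one plausible reading of Equation (4.3), not necessarily the exact implementation.

```python
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """Sketch of ENCQ: a GRU over word representations (Equation 4.3)."""

    def __init__(self, repw: nn.Module, word_dim: int, hidden_dim: int):
        super().__init__()
        self.repw = repw  # word representation function REPW
        self.gru = nn.GRU(word_dim, hidden_dim, batch_first=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, T); char_ids: (batch, T, K) -- the inputs REPW needs.
        x = self.repw(word_ids, char_ids)  # (batch, T, word_dim)
        _, h_T = self.gru(x)               # final hidden state: (1, batch, hidden_dim)
        r_q = h_T.squeeze(0)
        # Assumed half/half split into subject- and predicate-specific encodings.
        r_q_s, r_q_p = torch.chunk(r_q, 2, dim=-1)
        return r_q_s, r_q_p
```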

Word representation

In the following we describe how we generate the vector representations of the words $w_1, \ldots, w_T$. In order to exploit both word- and character-level information of the question, we use a "nested" word- and character-level approach, concatenating the pre-trained embedding of a word with an RNN-based encoding on character level.

As word embeddings, we use GloVe [40] vectors provided on the GloVe website¹. Such pre-trained word embeddings implicitly incorporate word semantics inferred from a large text corpus, based on the distributional semantics hypothesis [186]. This hypothesis can be phrased as "words with similar meanings occur in similar contexts", which in our case translates to similar vectors for words with similar meanings. See Section 2.4.1 for more on pre-trained word embeddings. Using such pre-trained word embeddings allows us to better handle synonyms and to find better matches between words in the question and subject labels or predicate URIs. In addition, during testing, it allows us to handle words that have not been seen during training.

¹ http://nlp.stanford.edu/projects/glove/

Figure 4.3: Word representation network REPW with example. The word is considered as a sequence of characters and fed to ENCW, where each character is embedded and the sequence of character vectors is encoded using a GRU to produce a character-level encoding of that word. This is concatenated with the word embedding (we use GloVe) to produce the complete word representation.

The word embedding of $w_t$, resulting in the $d_{we}$-dimensional vector representation $\mathbf{w}_t^e$, can be formally described as follows:

$\mathbf{w}_t^e = \mathbf{W}_g^\top \mathbf{v}_t$ ,   (4.4)

where $\mathbf{W}_g \in \mathbb{R}^{|V_g| \times d_{we}}$ is the provided pre-trained word embedding matrix for a vocabulary of size $|V_g|$ (GloVe covers 400k words), and $\mathbf{v}_t$ is the one-hot vector representation of $w_t$. Since the coverage of word embeddings is limited, many words appearing in the questions (especially those that are part of a reference to a particular subject entity, e.g. the last name "Golfis" in the question "What city was Alex Golfis born in") are not contained in the vocabulary of the pre-trained embeddings. In such cases (20.8% and 14.5% of unique words in the train and test questions, respectively), we set the word embedding to the zero vector.
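To illustrate the lookup with the zero-vector fallback for out-of-vocabulary words, here is a small sketch. The loading helper, the file handling, and the 300-dimensional setting are assumptions for illustration, not details taken from our setup.

```python
import numpy as np

D_WE = 300  # assumed GloVe dimensionality

def load_glove(path):
    # Load GloVe vectors from a plain-text file into a {word: vector} dictionary.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def word_embedding(word, glove):
    # Equation (4.4): look up the pre-trained embedding of the word;
    # OOV words (e.g. "golfis") fall back to the zero vector.
    return glove.get(word.lower(), np.zeros(D_WE, dtype=np.float32))
```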

The encoding of the word $w_t = \{w_t^1, \ldots, w_t^K\}$ on character level is based on an RNN encoder:

$\mathbf{w}_t^c = \mathrm{ENCW}(\{w_t^1, \ldots, w_t^K\})$ .   (4.5)

Inside ENCW, the characters $w_t^k$, $k = 1, \ldots, K$, are first embedded by

$\mathbf{c}_t^k = \mathbf{W}_c^\top \mathbf{v}_t^k$ ,   (4.6)

with the character embedding matrix $\mathbf{W}_c \in \mathbb{R}^{|V_c| \times d_{ce}}$ (for $|V_c|$ characters) learned during training, and $\mathbf{v}_t^k$ the one-hot vector representation of the character $w_t^k$. Then we feed the sequence of character vectors $\{\mathbf{c}_t^1, \ldots, \mathbf{c}_t^K\}$ to a single-layered unidirectional GRU network and take its final state as the character-level word encoding $\mathbf{w}_t^c$.

The added character-level encoding provides information necessary for matching question words with entity labels, in addition to providing distinguishable representations for OOV words. This approach is similar to the char2word model proposed by Ling et al. [187], with the difference that we use a unidirectional GRU network.

Finally, to get the vector representation of a word $w_t$, the word embedding $\mathbf{w}_t^e$ and the character-level encoding $\mathbf{w}_t^c$ are concatenated:

$\mathrm{REPW}(w_t) = \begin{bmatrix} \mathbf{w}_t^e \\ \mathbf{w}_t^c \end{bmatrix}$ .   (4.7)

The whole word representation network is illustrated in Figure 4.3.
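Putting the two parts together, a PyTorch-style sketch of REPW might look as follows; the dimensions, the padding convention, and the assumption that row 0 of the embedding matrix is a zero vector for OOV words are illustrative choices, not our exact configuration.

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Sketch of REPW: GloVe embedding concatenated with a character-level GRU encoding (Equation 4.7)."""

    def __init__(self, glove_weights, n_chars, char_emb_dim=50, char_enc_dim=100):
        super().__init__()
        # Frozen pre-trained word embeddings; row 0 is assumed to be the zero vector for OOV words.
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        # Character embeddings W_c, learned during training (Equation 4.6).
        self.char_emb = nn.Embedding(n_chars, char_emb_dim, padding_idx=0)
        self.char_gru = nn.GRU(char_emb_dim, char_enc_dim, batch_first=True)  # ENCW

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, T); char_ids: (batch, T, K)
        B, T, K = char_ids.shape
        w_e = self.word_emb(word_ids)                  # (B, T, d_we)
        c = self.char_emb(char_ids.reshape(B * T, K))  # (B*T, K, d_ce)
        _, h_K = self.char_gru(c)                      # final state: (1, B*T, char_enc_dim)
        w_c = h_K.squeeze(0).reshape(B, T, -1)         # (B, T, char_enc_dim)
        return torch.cat([w_e, w_c], dim=-1)           # Equation (4.7)
```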

Representing the subject

We use both the entity label and the type label of the entities in the knowledge graph to build subject representations. Entity labels are encoded on the level of characters because of the high prevalence of OOV words and their importance for entity labels. On the other hand, OOV words are rather rare in type labels and thus type labels are encoded on word level.

For Freebase entities, we extract entity labels using the type.object.name properties of entities, and type labels of entities by first getting the common.topic.notable_types of entities² and then taking the type.object.name value of the types.

² The notable types property provides the single, most characteristic type for that entity. However, using all types of the entity (e.g. concatenating their labels) could be interesting as well, which we leave for future work.

The character-level entity label encoding $\mathbf{s}_l$ and the word-level type label encoding $\mathbf{s}_t$ are concatenated to produce the subject representation vector

$\mathbf{s}_l = \mathrm{ENCSL}(\{c_s^1, c_s^2, \ldots\})$ ,   (4.8a)

$\mathbf{s}_t = \mathrm{ENCST}(\{w_t^1, w_t^2, \ldots\})$ ,   (4.8b)

$\mathbf{r}_s = \begin{bmatrix} \mathbf{s}_l \\ \mathbf{s}_t \end{bmatrix}$ ,   (4.8c)

where ENCSL is the character-level encoder of the subject entity label and ENCST is the word-level type label encoder. The label characters and the type label words, respectively, are first embedded (following Equation (4.6) and Equation (4.4), respectively) and the embedding vectors are fed to the respective encoding RNNs. Both ENCSL and ENCST correspond to single-layer unidirectional GRU-based RNNs and take their final hidden state as the entity label encoding $\mathbf{s}_l$ and the type label encoding $\mathbf{s}_t$, respectively.

The subject representation network is visualized in Figure 4.4.

Figure 4.4: Entity encoder with example. The entity label is encoded on character level (ENCSL) and the subject type label is encoded on word level (ENCST). The two are concatenated to produce the subject vector.
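A corresponding sketch of the subject encoder is given below; the vocabularies and dimensions are again illustrative assumptions.

```python
import torch
import torch.nn as nn

class SubjectEncoder(nn.Module):
    """Sketch of the subject encoder: character-level label encoder ENCSL and
    word-level type label encoder ENCST, concatenated as in Equation (4.8)."""

    def __init__(self, n_chars, glove_weights, char_dim=50, enc_dim=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.enc_sl = nn.GRU(char_dim, enc_dim, batch_first=True)                # ENCSL
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.enc_st = nn.GRU(glove_weights.size(1), enc_dim, batch_first=True)   # ENCST

    def forward(self, label_char_ids, type_word_ids):
        # label_char_ids: (batch, L) characters of the entity label, e.g. "hainan";
        # type_word_ids: (batch, M) words of the type label, e.g. "chinese province".
        _, s_l = self.enc_sl(self.char_emb(label_char_ids))  # (1, batch, enc_dim)
        _, s_t = self.enc_st(self.word_emb(type_word_ids))   # (1, batch, enc_dim)
        # Equation (4.8c): concatenate label and type encodings.
        return torch.cat([s_l.squeeze(0), s_t.squeeze(0)], dim=-1)
```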

This method of building subject representations is similar to the method proposed by Sun et al. [188], who focus on entity linking and use CNNs for word-level entity name encoding (instead of RNNs over characters as in our model, which allows us to handle OOV words) and word-level entity type name encoding, followed by an additional layer that merges the two (whereas we simply concatenate both representations).

Representing the predicate

We use the predicate URIs provided by the KG to build latent vector representations of the predicates.

The predicate URI is first split into words $w_p^1, w_p^2, \ldots$, each word is embedded (as described by Equation (4.4)), and then the word embeddings are fed into a single-layer word-level GRU-based encoder ENCR that takes the final state of its RNN as the representation of the predicate URI, that is

$\mathbf{r}_p = \mathrm{ENCR}(\{w_p^1, w_p^2, \ldots\})$ .   (4.9)

The relation encoding network is visualized in Figure 4.5.

Figure 4.5: Predicate encoder network ENCR with example. The predicate URI is split into words and encoded on word level using GloVe embeddings.
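A matching sketch of ENCR, assuming the predicate URI has already been tokenized into word ids of the GloVe vocabulary:

```python
import torch.nn as nn

class PredicateEncoder(nn.Module):
    """Sketch of ENCR: a word-level GRU over the words of the predicate URI (Equation 4.9)."""

    def __init__(self, glove_weights, enc_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.gru = nn.GRU(glove_weights.size(1), enc_dim, batch_first=True)

    def forward(self, uri_word_ids):
        # uri_word_ids: (batch, N) ids of the words split from the predicate URI,
        # e.g. "meteorology", "affected", "area", "cyclone".
        _, h_N = self.gru(self.word_emb(uri_word_ids))  # final hidden state
        return h_N.squeeze(0)                           # r_p: (batch, enc_dim)
```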

Matching scores

Given the question encoding vector $\mathbf{r}_q = (\mathbf{r}_q^s, \mathbf{r}_q^p)$, the latent vector representation $\mathbf{r}_p$ of the relation, and the latent representation $\mathbf{r}_s$ of the subject entity, we compute two matching scores, one between the question and the subject entity and one between the question and the predicate, as follows:

$S_s(q, s) = \cos(\mathbf{r}_q^s, \mathbf{r}_s)$ ,   (4.10a)

$S_p(q, p) = \cos(\mathbf{r}_q^p, \mathbf{r}_p)$ ,   (4.10b)

where $\cos$ is the cosine similarity, given by $\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|}$.
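For completeness, both scores can also be computed in a vectorized way over all candidates at once; a small PyTorch-style sketch under that assumption:

```python
import torch.nn.functional as F

def matching_scores(r_q_s, r_q_p, R_s, R_p):
    # r_q_s, r_q_p: (dim,) question encodings; R_s: (n, dim) and R_p: (m, dim)
    # stacked candidate subject and predicate encodings.
    s_scores = F.cosine_similarity(r_q_s.unsqueeze(0), R_s, dim=-1)  # Equation (4.10a), shape (n,)
    p_scores = F.cosine_similarity(r_q_p.unsqueeze(0), R_p, dim=-1)  # Equation (4.10b), shape (m,)
    return s_scores, p_scores
```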