
Uncovering divergent linguistic information in word embeddings

VL Embeddings

Uni Heidelberg

SS 2019

Uncovering linguistic information in word embeddings

Artetxe et al. (2018): Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

Word embeddings capture more information than we can directly observe.

We can apply linear transformations to pretrained embeddings to adjust performance to different tasks along the axes of similarity/relatedness and semantics/syntax:

FastText — good for syntactic analogies
Dep-based embeddings — good for functional similarities
GloVe — good for semantic analogies

Linear transformation of embedding matrix

Artetxe et al. (2018)

Let $X$ be the word embedding matrix, and let $X_i$ be the embedding of the $i$-th word in the vocabulary.

The dot product $\mathrm{sim}(i,j) = X_i \cdot X_j$ is a measure of the similarity between the $i$-th and the $j$-th word.

Define the similarity matrix $M(X) := XX^\top$ so that $\mathrm{sim}(i,j) = M(X)_{ij}$ ⇒ first-order similarity.

We can also define a second-order similarity measure:

first order: how similar are $w_i$ and $w_j$?

second order: how similar are the contexts of $w_i$ and $w_j$?

We can even define a third-, fourth-, or $n$-th-order similarity.

Idea: some higher-order similarities might be better at capturing specific aspects of language.

A toy example with a five-word vocabulary and three-dimensional embeddings:

$$X = \begin{pmatrix} -0.19 & 0.45 & -0.40 \\ -0.28 & 0.43 & -0.39 \\ 0.02 & -0.40 & -0.39 \\ 0.03 & -0.22 & -0.31 \\ -0.03 & -0.07 & -0.19 \end{pmatrix} \quad \text{(rows: cat, dog, blue, red, happy)}$$

$$X^\top = \begin{pmatrix} -0.19 & -0.28 & 0.02 & 0.03 & -0.03 \\ 0.45 & 0.43 & -0.40 & -0.22 & -0.07 \\ -0.40 & -0.39 & -0.39 & -0.31 & -0.19 \end{pmatrix} \quad \text{(columns: cat, dog, blue, red, happy)}$$

Here $\mathrm{sim}(i,j) = X_i \cdot X_j$, and $M(X) := XX^\top$ gives $\mathrm{sim}(i,j) = M(X)_{ij}$.
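To make this concrete, a minimal numpy sketch of the toy example above (my own illustration, not from the slides); it builds $X$, computes $M(X) = XX^\top$, and reads off $\mathrm{sim}(\text{cat}, \text{dog})$:

import numpy as np

# Toy embedding matrix from the slide: one row per word, 3 dimensions.
words = ["cat", "dog", "blue", "red", "happy"]
X = np.array([
    [-0.19,  0.45, -0.40],
    [-0.28,  0.43, -0.39],
    [ 0.02, -0.40, -0.39],
    [ 0.03, -0.22, -0.31],
    [-0.03, -0.07, -0.19],
])

# First-order similarity matrix: M(X) = X X^T, so M[i, j] = X_i . X_j
M = X @ X.T

i, j = words.index("cat"), words.index("dog")
print(M[i, j])            # ~0.4027, the dot product of the cat and dog rows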


Define the first-order similarity matrix as

$$M(X) := XX^\top \quad\text{so that}\quad \mathrm{sim}(i,j) = M(X)_{ij}$$

Define the second-order similarity matrix as

$$M_2(X) := M(M(X)) \quad\text{so that}\quad \mathrm{sim}_2(i,j) = M_2(X)_{ij}, \qquad\text{where } M_2(X) = XX^\top XX^\top$$

Define the $n$-th-order similarity matrix as

$$M_n(X) := (XX^\top)^n \quad\text{so that}\quad \mathrm{sim}_n(i,j) = M_n(X)_{ij}$$

Instead of changing the similarity measure, we can also change the word embeddings themselves through a linear transformation so that they directly capture this second- or $n$-th-order similarity.
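As a sketch, these definitions translate directly into code (the function name is mine, for illustration):

import numpy as np

def nth_order_similarity(X: np.ndarray, n: int) -> np.ndarray:
    """n-th-order similarity matrix M_n(X) = (X X^T)^n."""
    return np.linalg.matrix_power(X @ X.T, n)

For n = 1 this is the plain dot-product similarity; for n = 2 it compares the similarity profiles (rows of $M(X)$) of two words, i.e. how similarly they relate to the rest of the vocabulary.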

Why $M_2(X) = XX^\top XX^\top$ follows from the definition $M_2(X) := M(M(X))$:

$$\begin{aligned}
M_2(X) &= M(M(X)) \\
&= M(X)\,M(X)^\top \\
&= XX^\top (XX^\top)^\top && (AB)^\top = B^\top A^\top \\
&= XX^\top X^{\top\top} X^\top \\
&= XX^\top XX^\top
\end{aligned}$$
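A quick numerical sanity check of this identity on random data (my own check, not part of the original slides):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # any embedding matrix will do

M = lambda A: A @ A.T            # the first-order similarity operator
assert np.allclose(M(M(X)), X @ X.T @ X @ X.T)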


Let $X^\top X = Q \Lambda Q^\top$ be the eigendecomposition of $X^\top X$.

$\Lambda$ is a positive diagonal matrix whose entries are the eigenvalues of $X^\top X$, and $Q$ is an orthogonal matrix with the corresponding eigenvectors as columns.

Define the linear transformation matrix $W := Q\sqrt{\Lambda}$ and apply $W$ to the original embeddings $X$ ⇒ $X' = XW$.

Then $M(X') = M_2(X)$ ⇒ the transformed embeddings $X'$ capture the second-order similarity as defined for the original embeddings.

The eigendecomposition can be obtained from the SVD of $X$:

$$X = \underbrace{U}_{m \times m} \cdot \underbrace{\Sigma}_{m \times n} \cdot \underbrace{Q^\top}_{n \times n} \qquad \text{(SVD, } X \in \mathbb{R}^{m \times n}\text{)}$$

$$\begin{aligned}
X^\top X &= (U \Sigma Q^\top)^\top \cdot (U \Sigma Q^\top) \\
&= Q \cdot \Sigma^\top U^\top U \cdot \Sigma \cdot Q^\top && U \text{ orthonormal} \\
&= Q \cdot \Sigma^\top \Sigma \cdot Q^\top && \Sigma \text{ diagonal, with the square roots of the eigenvalues} \\
&= Q \cdot \Lambda \cdot Q^\top
\end{aligned}$$
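The same relationship can be checked with numpy (a sketch; note that numpy returns $V^\top$ rather than $Q$, and the eigenvalues of $X^\top X$ are the squared singular values):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))

U, s, Vh = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vh
Q = Vh.T

# Lambda = Sigma^T Sigma has the squared singular values on its diagonal:
assert np.allclose(Q @ np.diag(s**2) @ Q.T, X.T @ X)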


Proof that $X' = XW$ captures second-order similarity:

$$\begin{aligned}
M(X') &= X' \cdot X'^\top && \text{first-order similarity of } X' \\
&= X \cdot W \cdot (XW)^\top && (AB)^\top = B^\top A^\top \\
&= X \cdot W \cdot W^\top \cdot X^\top && W := Q\sqrt{\Lambda} \\
&= X \cdot Q\sqrt{\Lambda} \cdot (Q\sqrt{\Lambda})^\top \cdot X^\top && (AB)^\top = B^\top A^\top \\
&= X \cdot Q\sqrt{\Lambda} \cdot \sqrt{\Lambda}^\top \cdot Q^\top X^\top \\
&= X \cdot Q \cdot \Lambda \cdot Q^\top \cdot X^\top && Q \Lambda Q^\top = X^\top X \\
&= X \cdot X^\top \cdot X \cdot X^\top = M_2(X) && \text{second-order similarity of } X
\end{aligned}$$
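The whole construction in a few lines of numpy (a sketch; eigh applies because $X^\top X$ is symmetric, and tiny negative eigenvalues from floating-point noise are clipped before taking the square root):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))

# Eigendecomposition X^T X = Q Lambda Q^T
eigvals, Q = np.linalg.eigh(X.T @ X)
eigvals = np.clip(eigvals, 0.0, None)

W = Q @ np.diag(np.sqrt(eigvals))   # W = Q sqrt(Lambda)
X2 = X @ W                          # X' = X W

# M(X') equals the second-order similarity of the original embeddings:
assert np.allclose(X2 @ X2.T, X @ X.T @ X @ X.T)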

More generally

Define $W_\alpha := Q\Lambda^\alpha$

where $\alpha$ is a parameter of the transformation that adjusts it to the desired similarity order:

first-order similarity: $\alpha = 0$ ⇒ $M(XW_0) = M(X)$
second-order similarity: $\alpha = 0.5$ ⇒ $M(XW_{0.5}) = M_2(X)$
$n$-th-order similarity: $\alpha = (n-1)/2$ ⇒ $M(XW_\alpha) = M_n(X)$

Check for $\alpha = 0$:

$$\begin{aligned}
M(XW_0) &= M(X \cdot Q \cdot \Lambda^0) && \Lambda^0 \text{ is the identity matrix} \\
&= M(X \cdot Q) \\
&= X \cdot Q \cdot (XQ)^\top \\
&= X \cdot Q \cdot Q^\top \cdot X^\top && Q \text{ orthogonal} \\
&= X \cdot X^\top \\
&= M(X)
\end{aligned}$$
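The key step is that an orthogonal rotation leaves all dot products unchanged, which is easy to confirm numerically (my own check, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))
_, Q = np.linalg.eigh(X.T @ X)     # Q is orthogonal: Q Q^T = I

assert np.allclose((X @ Q) @ (X @ Q).T, X @ X.T)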


Check for $\alpha = 0.5$:

$$\begin{aligned}
M(XW_{0.5}) &= M(X \cdot Q\sqrt{\Lambda}) \\
&= X \cdot Q\sqrt{\Lambda} \cdot (X \cdot Q\sqrt{\Lambda})^\top \\
&= X \cdot Q\sqrt{\Lambda} \cdot \sqrt{\Lambda}^\top Q^\top \cdot X^\top \\
&= X \cdot Q\sqrt{\Lambda} \cdot \sqrt{\Lambda} \cdot Q^\top \cdot X^\top && \sqrt{\Lambda} \text{ diagonal, hence symmetric} \\
&= X \cdot Q \cdot \Lambda \cdot Q^\top \cdot X^\top \\
&= X \cdot X^\top \cdot X \cdot X^\top \\
&= M_2(X)
\end{aligned}$$


Assuming that the embeddings $X$ already capture some second-order similarity, it is also possible to transform them so that they capture the corresponding first-order similarity (using a negative $\alpha$).

One can easily generalise this to higher-order similarities by using smaller (more negative) values of $\alpha$.

⇒ The parameter $\alpha$ can be used to either increase or decrease the similarity order that we want our embeddings to capture.

⇒ $\alpha$ can be continuous.
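Putting everything together, a minimal sketch of the $\alpha$-parameterised transformation (the function name and the eigenvalue floor are my own; the reference implementation is the uncovec repository linked below). The floor matters because $\Lambda^\alpha$ with $\alpha < 0$ requires strictly positive eigenvalues:

import numpy as np

def alpha_transform(X: np.ndarray, alpha: float, floor: float = 1e-12) -> np.ndarray:
    """Return X' = X Q Lambda^alpha, where X^T X = Q Lambda Q^T.

    alpha = 0 preserves first-order similarity, alpha = 0.5 yields
    second-order similarity, negative alpha decreases the order;
    alpha may be any real number.
    """
    eigvals, Q = np.linalg.eigh(X.T @ X)
    eigvals = np.maximum(eigvals, floor)    # keep Lambda^alpha well defined for alpha < 0
    return X @ Q @ np.diag(eigvals ** alpha)

A typical use is to sweep $\alpha$ (e.g. from $-1$ to $1$) on pretrained embeddings and pick the value that works best for the task at hand.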

Linear transformation of different embeddings

Artetxe et al. (2018)

[Result figures not recoverable from the transcript: the linear transformation applied to different pretrained embeddings.]

Lessons learned for intrinsic and extrinsic evaluation

Artetxe et al. (2018)

Standard intrinsic evaluation is static and incomplete.

⇒ Intrinsic evaluation is not a good predictor of performance in downstream applications.

⇒ Systems that use embeddings as features can learn the task-specific optimal balance between the two axes (similarity/relatedness and semantics/syntax).

References

Mikolov, Yih and Zweig (2013): Linguistic Regularities in Continuous Space Word Representations. NAACL 2013.

Faruqui, Tsvetkov, Rastogi and Dyer (2016): Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. The 1st Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany.

Artetxe, Labaka, Lopez-Gazpio and Agirre (2018): Uncovering Divergent Linguistic Information in Word Embeddings with Lessons for Intrinsic and Extrinsic Evaluation. CoNLL 2018, Brussels, Belgium.

Rubenstein and Goodenough (1965): Contextual correlates of synonymy. Communications of the ACM 8(10): 627-633.

Harris (1954): Distributional structure. Word 10(2-3): 146-162.

Bruni, Tran and Baroni (2014): Multimodal Distributional Semantics. Journal of Artificial Intelligence Research 49: 1-47.

Collobert, Weston, Bottou, Karlen, Kavukcuoglu and Kuksa (2011): Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research 12: 2493-2537.

Lu, Wang, Bansal, Gimpel and Livescu (2015): Deep multilingual correlation for improved word embeddings. NAACL 2015.

Rastogi, Van Durme and Arora (2015): Multiview LSA: Representation learning via generalized CCA. NAACL 2015.

Chiu, Korhonen and Pyysalo (2016): Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance. ACL 2016.

Data and Code

Code for Artetxe et al. (2018): https://github.com/artetxem/uncovec

The MEN dataset: https://staff.fnwi.uva.nl/e.bruni/MEN

Datasets for word vector evaluation: https://github.com/vecto-ai/word-benchmarks
