Collaborative Filtering
ML, Stefan Edelkamp
Example
5 users, 4 movies, ratings 1-5
The task is to fill in the missing entries
     D1  D2  D3  D4
U1    5   3   -   1
U2    4   -   -   1
U3    1   1   -   5
U4    1   -   -   4
U5    -   1   5   4
Goal
● Prediction via matrix factorization
● U1 ~ U2 (because of D1, D2)
● U4's missing rating on D2 can be inferred, since U4 and U5 agree (both high) on D4 and U5 rates D2 low
      D1    D2    D3    D4
U1   4.97  2.98  2.18  0.98
U2   3.97  2.40  1.97  0.99
U3   1.02  0.93  5.32  4.93
U4   1.00  0.85  4.59  3.93
U5   1.36  1.07  4.89  4.12
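The reconstruction closely matches the observed ratings; a quick NumPy check of the largest deviation on the observed entries (matrices copied from the tables above):

```python
import numpy as np

# Observed ratings (0 = missing) and the reconstruction from the slides
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
R_hat = np.array([[4.97, 2.98, 2.18, 0.98],
                  [3.97, 2.40, 1.97, 0.99],
                  [1.02, 0.93, 5.32, 4.93],
                  [1.00, 0.85, 4.59, 3.93],
                  [1.36, 1.07, 4.89, 4.12]])

observed = R > 0
max_dev = np.max(np.abs(R_hat - R)[observed])
print(max_dev)  # largest absolute deviation on an observed entry: 0.12
```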
Formulas
● R ≈ P * Q^T = R'
→ R'ij = Pi^T * Qj = sum_k Pik * Qkj
● Eij^2 = (Rij - R'ij)^2 = (Rij - sum_k Pik * Qkj)^2
● d Eij^2 / d Pik = -2 * (Rij - R'ij) * Qkj = -2 * Eij * Qkj
● d Eij^2 / d Qkj = -2 * (Rij - R'ij) * Pik = -2 * Eij * Pik
● Update P'ik := Pik + 2 * alpha * Eij * Qkj
● Update Q'kj := Qkj + 2 * alpha * Eij * Pik
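The update rules can be checked on a single observed rating; a minimal sketch (the factor values are illustrative, not from the example above):

```python
import numpy as np

# Toy factors for one user i and one item j with K = 2 latent features
P_i = np.array([0.5, 1.0])   # row i of P (illustrative values)
Q_j = np.array([1.0, 0.5])   # row j of Q, so R'ij = P_i . Q_j
R_ij = 5.0                   # observed rating
alpha = 0.0002               # learning rate from the slides

# Error on this entry: Eij = Rij - R'ij
e_ij = R_ij - P_i @ Q_j

# Gradient-descent updates: P'ik := Pik + 2*alpha*Eij*Qkj (and symmetrically for Q)
P_i_new = P_i + 2 * alpha * e_ij * Q_j
Q_j_new = Q_j + 2 * alpha * e_ij * P_i

# One small step must not increase the squared error on this entry
print((R_ij - P_i @ Q_j) ** 2, (R_ij - P_i_new @ Q_j_new) ** 2)
```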
Stopping
● alpha = 0.0002 (learning rate)
● Predict the missing (?) entries; entries equal to 0 mark missing ratings and are skipped during training
● The matrix need not be reproduced exactly; it only has to be approximated on the observed entries
● Take every (ui, dj, Rij) in the training set
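Collecting the training set of observed triples can be sketched like this (0 encodes a missing rating, matching the example matrix):

```python
# Ratings matrix from the example; 0 marks a missing entry
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]

# Training set: every observed triple (u_i, d_j, R_ij)
training_set = [(i, j, R[i][j])
                for i in range(len(R))
                for j in range(len(R[i]))
                if R[i][j] > 0]

print(len(training_set))  # 13 observed ratings
```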
Regularization
● Error + (beta/2) * (||P||^2 + ||Q||^2)
→ P'ik := Pik + alpha * (2 * Eij * Qkj - beta * Pik)
→ Q'kj := Qkj + alpha * (2 * Eij * Pik - beta * Qkj)
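The same one-entry check works for the regularized updates; a sketch with beta shrinking the factors (again with illustrative numbers):

```python
import numpy as np

P_i = np.array([0.5, 1.0])   # illustrative factor rows, K = 2
Q_j = np.array([1.0, 0.5])
R_ij = 5.0
alpha, beta = 0.0002, 0.02   # parameters from the slides

e_ij = R_ij - P_i @ Q_j

# Regularized updates: P'ik := Pik + alpha * (2*Eij*Qkj - beta*Pik)
P_i_new = P_i + alpha * (2 * e_ij * Q_j - beta * P_i)
Q_j_new = Q_j + alpha * (2 * e_ij * P_i - beta * Q_j)

# Regularized objective on this entry: Eij^2 + (beta/2)*(||P_i||^2 + ||Q_j||^2)
def obj(P, Q):
    return (R_ij - P @ Q) ** 2 + (beta / 2) * (P @ P + Q @ Q)

print(obj(P_i, Q_j), obj(P_i_new, Q_j_new))  # the step decreases the objective
```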
Input and output
@INPUT:
R     : a matrix to be factorized, dimension N x M
P     : an initial matrix of dimension N x K
Q     : an initial matrix of dimension M x K
K     : the number of latent features
steps : the maximum number of steps to perform the optimisation
alpha : the learning rate
beta  : the regularization parameter
@OUTPUT: the final matrices P and Q
Code
import numpy

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in range(steps):
        # Stochastic gradient step over every observed rating (R[i][j] > 0)
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:], Q[:,j])
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        # Regularized squared error over the observed entries
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:], Q[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q.T
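Running the factorization on the example matrix can be sketched as follows; the SGD loop is inlined so the snippet is self-contained (K = 2 latent features, random non-negative initialization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ratings from the example; 0 marks a missing entry
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

N, M = R.shape
K = 2
P = rng.random((N, K))
Q = rng.random((M, K))
alpha, beta = 0.0002, 0.02

for step in range(5000):
    for i in range(N):
        for j in range(M):
            if R[i][j] > 0:
                eij = R[i][j] - P[i] @ Q[j]
                # Regularized SGD updates; tuple assignment keeps both
                # updates based on the pre-step factors
                P[i], Q[j] = (P[i] + alpha * (2 * eij * Q[j] - beta * P[i]),
                              Q[j] + alpha * (2 * eij * P[i] - beta * Q[j]))

print(np.round(P @ Q.T, 2))  # approximates R on the observed entries
```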
Extensions
● Require all matrix entries to be non-negative (non-negative matrix factorization, NMF)
● No subtraction: the original data are reconstructed as a linear combination of latent features.
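A minimal sketch of the Lee-Seung multiplicative updates for NMF, here on a small fully observed non-negative matrix (the toy matrix and iteration count are illustrative); the factors stay non-negative because the updates only multiply:

```python
import numpy as np

rng = np.random.default_rng(1)

V = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])   # non-negative data of rank 2

K = 2
W = rng.random((3, K)) + 0.1      # strictly positive init avoids dead entries
H = rng.random((K, 3)) + 0.1
eps = 1e-9                        # guards against division by zero

for _ in range(2000):
    # Multiplicative updates (Lee & Seung, 2001): no subtraction,
    # so W and H remain element-wise non-negative throughout
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print(np.round(W @ H, 2))  # reconstruction of V
```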
Literature
Gábor Takács et al. (2008). Matrix factorization and neighbor based algorithms for the Netflix prize problem. In: Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, Switzerland, October 23-25, pp. 267-274.
Patrick Ott (2008). Incremental Matrix Factorization for Collaborative Filtering. Science, Technology and Design 01/2008, Anhalt University of Applied Sciences.
Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press, pp. 556-562.
Daniel D. Lee and H. Sebastian Seung (1999). Learning the parts of objects by non-negative matrix factorization. Nature, Vol. 401, No. 6755 (21 October 1999), pp. 788-791.