Collaborative Filtering
ML, Stefan Edelkamp
Example
5 users, 4 movies, ratings 1-5
The task is to fill in the missing entries
     D1  D2  D3  D4
U1    5   3   -   1
U2    4   -   -   1
U3    1   1   -   5
U4    1   -   -   4
U5    -   1   5   4
Goal
● Prediction via matrix factorization
● U1 ~ U2 (because of D1, D2)
● U4's missing rating on D2 can be inferred, since U4 and U5 agree (both high) on D4 and U5 rates D2 low
      D1    D2    D3    D4
U1   4.97  2.98  2.18  0.98
U2   3.97  2.40  1.97  0.99
U3   1.02  0.93  5.32  4.93
U4   1.00  0.85  4.59  3.93
U5   1.36  1.07  4.89  4.12
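The reconstruction closely matches the observed ratings; a quick NumPy check of the largest deviation on the observed entries (matrices copied from the tables above):

```python
import numpy as np

# Observed ratings (0 = missing) and the reconstruction from the slides
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
R_hat = np.array([[4.97, 2.98, 2.18, 0.98],
                  [3.97, 2.40, 1.97, 0.99],
                  [1.02, 0.93, 5.32, 4.93],
                  [1.00, 0.85, 4.59, 3.93],
                  [1.36, 1.07, 4.89, 4.12]])

observed = R > 0
max_dev = np.max(np.abs(R_hat - R)[observed])
print(max_dev)  # largest absolute deviation on an observed entry: 0.12
```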
Formulas
● R ≈ P * Q^T = R'
→ R'ij = Pi^T * Qj = sum_k Pik * Qkj
● Eij^2 = (Rij - R'ij)^2 = (Rij - sum_k Pik * Qkj)^2
● d Eij^2 / d Pik = -2 * (Rij - R'ij) * Qkj = -2 * Eij * Qkj
● d Eij^2 / d Qkj = -2 * (Rij - R'ij) * Pik = -2 * Eij * Pik
● Update P'ik := Pik + 2 * alpha * Eij * Qkj
● Update Q'kj := Qkj + 2 * alpha * Eij * Pik
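The update rules can be checked on a single observed rating; a minimal sketch (the factor values are illustrative, not from the example above):

```python
import numpy as np

# Toy factors for one user i and one item j with K = 2 latent features
P_i = np.array([0.5, 1.0])   # row i of P (illustrative values)
Q_j = np.array([1.0, 0.5])   # row j of Q, so R'ij = P_i . Q_j
R_ij = 5.0                   # observed rating
alpha = 0.0002               # learning rate from the slides

# Error on this entry: Eij = Rij - R'ij
e_ij = R_ij - P_i @ Q_j

# Gradient-descent updates: P'ik := Pik + 2*alpha*Eij*Qkj (and symmetrically for Q)
P_i_new = P_i + 2 * alpha * e_ij * Q_j
Q_j_new = Q_j + 2 * alpha * e_ij * P_i

# One small step must not increase the squared error on this entry
print((R_ij - P_i @ Q_j) ** 2, (R_ij - P_i_new @ Q_j_new) ** 2)
```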
Stopping
● alpha = 0.0002 (learning rate)
● Predict the missing (?) entries; entries equal to 0 mark missing ratings and are skipped during training
● The matrix need not be reproduced exactly; it only has to be approximated on the observed entries
● Take every (ui, dj, Rij) in the training set
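Collecting the training set of observed triples can be sketched like this (0 encodes a missing rating, matching the example matrix):

```python
# Ratings matrix from the example; 0 marks a missing entry
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]

# Training set: every observed triple (u_i, d_j, R_ij)
training_set = [(i, j, R[i][j])
                for i in range(len(R))
                for j in range(len(R[i]))
                if R[i][j] > 0]

print(len(training_set))  # 13 observed ratings
```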
Regularization
● Error + (beta/2) * (||P||^2 + ||Q||^2)
→ P'ik := Pik + alpha * (2 * Eij * Qkj - beta * Pik)
→ Q'kj := Qkj + alpha * (2 * Eij * Pik - beta * Qkj)
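The same one-entry check works for the regularized updates; a sketch with beta shrinking the factors (again with illustrative numbers):

```python
import numpy as np

P_i = np.array([0.5, 1.0])   # illustrative factor rows, K = 2
Q_j = np.array([1.0, 0.5])
R_ij = 5.0
alpha, beta = 0.0002, 0.02   # parameters from the slides

e_ij = R_ij - P_i @ Q_j

# Regularized updates: P'ik := Pik + alpha * (2*Eij*Qkj - beta*Pik)
P_i_new = P_i + alpha * (2 * e_ij * Q_j - beta * P_i)
Q_j_new = Q_j + alpha * (2 * e_ij * P_i - beta * Q_j)

# Regularized objective on this entry: Eij^2 + (beta/2)*(||P_i||^2 + ||Q_j||^2)
def obj(P, Q):
    return (R_ij - P @ Q) ** 2 + (beta / 2) * (P @ P + Q @ Q)

print(obj(P_i, Q_j), obj(P_i_new, Q_j_new))  # the step decreases the objective
```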
Input and output
@INPUT:
R     : a matrix to be factorized, dimension N x M
P     : an initial matrix of dimension N x K
Q     : an initial matrix of dimension M x K
K     : the number of latent features
steps : the maximum number of steps to perform the optimisation
alpha : the learning rate
beta  : the regularization parameter
@OUTPUT: the final matrices P and Q
Code
import numpy

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in range(steps):
        # Stochastic gradient step over every observed rating (R[i][j] > 0)
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:], Q[:,j])
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        # Regularized squared error over the observed entries
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:], Q[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q.T
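Running the factorization on the example matrix can be sketched as follows; the SGD loop is inlined so the snippet is self-contained (K = 2 latent features, random non-negative initialization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ratings from the example; 0 marks a missing entry
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

N, M = R.shape
K = 2
P = rng.random((N, K))
Q = rng.random((M, K))
alpha, beta = 0.0002, 0.02

for step in range(5000):
    for i in range(N):
        for j in range(M):
            if R[i][j] > 0:
                eij = R[i][j] - P[i] @ Q[j]
                # Regularized SGD updates; tuple assignment keeps both
                # updates based on the pre-step factors
                P[i], Q[j] = (P[i] + alpha * (2 * eij * Q[j] - beta * P[i]),
                              Q[j] + alpha * (2 * eij * P[i] - beta * Q[j]))

print(np.round(P @ Q.T, 2))  # approximates R on the observed entries
```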
Extensions
● Require all matrix entries to be non-negative (non-negative matrix factorization, NMF)
● No subtraction: the original data are reconstructed as a linear combination of latent features.
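A minimal sketch of the Lee-Seung multiplicative updates for NMF, here on a small fully observed non-negative matrix (the toy matrix and iteration count are illustrative); the factors stay non-negative because the updates only multiply:

```python
import numpy as np

rng = np.random.default_rng(1)

V = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])   # non-negative data of rank 2

K = 2
W = rng.random((3, K)) + 0.1      # strictly positive init avoids dead entries
H = rng.random((K, 3)) + 0.1
eps = 1e-9                        # guards against division by zero

for _ in range(2000):
    # Multiplicative updates (Lee & Seung, 2001): no subtraction,
    # so W and H remain element-wise non-negative throughout
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print(np.round(W @ H, 2))  # reconstruction of V
```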
Literature
Gábor Takács et al. (2008). Matrix factorization and neighbor based algorithms for the Netflix prize problem. In: Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, Switzerland, October 23-25, pp. 267-274.
Patrick Ott (2008). Incremental Matrix Factorization for Collaborative Filtering. Science, Technology and Design 01/2008, Anhalt University of Applied Sciences.
Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press, pp. 556-562.
Daniel D. Lee and H. Sebastian Seung (1999). Learning the parts of objects by non-negative matrix factorization. Nature, Vol. 401, No. 6755 (21 October 1999), pp. 788-791.