Refine the computePrediction() logic for the CorrelationRecommender algorithm

danithaca - June 5, 2009 - 16:11
Project:Recommender API
Version:6.x-2.x-dev
Component:Code
Category:bug report
Priority:critical
Assigned:danithaca
Status:active
Description

The code works but is ugly. It needs some cleanup.

Also, it doesn't support the (Resnick 94) variant, which computes the mean based only on the items that are co-voted. I think my algorithm should work fine, but it's better to provide that variation as well.
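To make the co-voted idea concrete, here is a rough sketch in Python (not the module's PHP, and not the CorrelationRecommender code). It computes a Pearson correlation where each user's mean is taken only over the items both users voted on. The function name and the dict-of-ratings input shape are assumptions made for illustration.

```python
from math import sqrt


def co_rated_pearson(ratings_a, ratings_b):
    """Pearson correlation with means taken over co-voted items only.

    ratings_a, ratings_b: dicts mapping item id -> rating (hypothetical shape).
    Returns None when fewer than two items are co-voted or when either
    user has zero variance on the co-voted items.
    """
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return None

    # Means are computed over the co-voted items, not over all of a user's votes.
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)

    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)
```

In this sketch, items that only one user voted on (for example a single outlier vote) do not shift either mean, which is the difference from averaging over all of a user's votes.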

#1

danithaca - June 5, 2009 - 22:26
Priority:minor» normal

Also, implement Amazon's adjusted tweak (Linden, 2003)

#2

danithaca - June 29, 2009 - 21:28

Fixed the (Resnick, 94) problem as a side effect of #483102: Internalize NAN support to the Vector/Matrix class.
However, computePrediction() still needs more work.

#3

danithaca - June 29, 2009 - 23:45
Category:task» bug report

Not fixed. Should work on it again.

#4

danithaca - July 3, 2009 - 15:06
Priority:normal» critical

This has to be fixed and made more efficient (maybe by limiting to the k nearest neighbors); otherwise it will take too much time.
On a site with 50 users and 500 nodes, computeSimilarity() took <10 seconds, while computePrediction() took 5-6 minutes.

#5

danithaca - July 3, 2009 - 21:45

The computePrediction() logic probably requires a new architecture. The key to improving performance is to skip unnecessary computations, for example by using kNN (k-nearest-neighbor).
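A minimal sketch of what limiting the prediction to the k nearest neighbors could look like, again in Python for illustration rather than the module's PHP. It assumes a mean-centered weighted-sum prediction, which this thread does not confirm is what computePrediction() uses. The function name and the (similarity, neighbor mean, neighbor rating) tuple layout are hypothetical.

```python
import heapq


def predict_knn(target_mean, neighbors, k):
    """Predict one user's rating for one item from the k most similar users.

    target_mean: the target user's mean rating.
    neighbors: list of (similarity, neighbor_mean, neighbor_rating) tuples,
        one per user who rated the item (hypothetical layout).
    k: how many neighbors to keep; everything else is skipped.
    """
    # Keep only the k neighbors with the largest absolute similarity.
    top = heapq.nlargest(k, neighbors, key=lambda n: abs(n[0]))

    norm = sum(abs(sim) for sim, _, _ in top)
    if norm == 0:
        # No usable neighbors: fall back to the user's own mean.
        return target_mean

    # Weighted sum of each neighbor's deviation from their own mean.
    offset = sum(sim * (rating - mean) for sim, mean, rating in top)
    return target_mean + offset / norm
```

The saving comes from the per-prediction work depending on k instead of on the total number of users, at the cost of ignoring weakly similar neighbors.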

#6

danithaca - July 3, 2009 - 23:30

Implemented the kNN algorithm. Could work faster, but the bug still exists and the code is error-prone. Needs refactoring.

#7

danithaca - July 4, 2009 - 02:23

We re-used the 'lowerbound' setting for computeSimilarity(). Maybe we should use a new config param instead.

#8

jabowery - September 13, 2009 - 18:03

I suspect you're going to need to come up with some sort of API to allow offloaded batch computation. That way people can hook up multicore processors on the LAN to do the matrix computations.

Are you using singular value decomposition?

#9

danithaca - September 13, 2009 - 19:40

Thanks for the comments.
I'm not sure whether PHP would support multiprocessing. For high-performance computing, I'm thinking of outsourcing to Apache Mahout or other Java/C++ implementations. This module will remain logically simple and provide an interface for 3rd-party app integration.
SVD is the next algorithm I'm going to develop.

 
 
