More than a year of usage of this module has shown that the current PHP-based architecture does not scale to high-demand environments. It takes too many CPU cycles and too much memory to finish the computation (see #509424: too much memory consumption for details). Due to the complex nature of the problem, there is no way the current PHP implementation can handle it. Therefore, the module has to move to a different architecture.

There are 3 approaches to be evaluated:

1. Outsource the main matrix computation to 3rd party software (running locally), such as Octave, Matlab, Python+NumPy, or Java+JAMA/Commons Math. This would involve the least code change: only Matrix.php would need to change so that it calls that software.

2. Outsource the entire recommender computation to 3rd party software running locally. The most likely candidate is Apache Mahout (#414570: add local java (Mahout) support). The changes to the current code would include some stub classes in Recommender.php and some stub functions in recommender.module. The benefit is that I no longer have to develop the algorithm myself.

3. Outsource recommender computation to remote 3rd party software through web services/REST. This would probably involve a lot of code change (an interface for web services/REST would have to be developed). However, it might solve the performance problem once and for all. My best candidate is Apache Mahout (#503212: add Apache Mahout web services support), although some people suggest easyrec.
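To make approach 1 more concrete, here is a minimal sketch of what the external side could look like if Python were chosen. It uses only the standard library so that it stays self-contained; NumPy would replace the inner loops in practice. Everything in it is an assumption for illustration: the function names, the JSON exchange format, and the use of item-item cosine similarity as a stand-in for whatever Matrix.php actually computes.

```python
import json
import math


def cosine_similarity_matrix(ratings):
    """Return item-item cosine similarities for a user-by-item matrix.

    `ratings` is a list of rows (one per user), each a list of numbers
    (one per item). This layout is an assumption made for the sketch.
    """
    if not ratings:
        return []
    n_items = len(ratings[0])
    # Build one column vector per item.
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(i, n_items):
            if norms[i] == 0 or norms[j] == 0:
                value = 0.0  # an item with no ratings is similar to nothing
            else:
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                value = dot / (norms[i] * norms[j])
            # The similarity matrix is symmetric, so fill both halves.
            sim[i][j] = sim[j][i] = value
    return sim


def handle_request(json_text):
    """Hypothetical exchange format: JSON matrix in, JSON matrix out."""
    ratings = json.loads(json_text)
    return json.dumps(cosine_similarity_matrix(ratings))
```

Under this sketch, Matrix.php would serialize the matrix to JSON, hand it to a local process running a function like `handle_request`, and decode the JSON result, so the rest of the PHP code would stay unchanged.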

If you have any comments, please feel free to post them here.

Comments

slowfamily

Hi danithaca,

Any progress on this? Have you narrowed down which approach you are leaning toward?

IMO either of the first two choices would be fine. I don't see the need to make it work via REST, especially if that's going to take a bit more work. If you feel your algorithm works pretty well, then just recoding Matrix.php would probably be fairly easy and would get the job done, since the main issue here is the matrix calculations.

slowfamily

Oops. Gave me an error when I submitted, and created a second comment. Sorry about that.

danithaca

Status: Active » Fixed

Decision made: will follow option #2 -- outsource the entire recommender computation to Apache Mahout. If Mahout can't run locally, it will use mysql-client to connect remotely to the Drupal database for data access.

achton

Subscribe! Sounds incredibly cool.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

ressa

Thanks for your work on the module! I think you made the right decision in outsourcing the computation to Apache Mahout locally (option #2) instead of outsourcing to a 3rd party. With VPS providers like Linode at their current prices, I would much rather spend money on a bigger slice there, and on overall performance, than buy a 3rd party service only for the recommender computations.

dabeast

Subscribing. This sounds like a great idea.

danithaca

Submitted http://groups.drupal.org/node/137054 to work on this issue.