Currently Recommender API requires direct database access. This is a design choice made for two reasons: 1) Mahout requires direct database access, and 2) it performs better. However, when Recommender API is deployed on a remote server, people often can't give it direct database access because of technical limitations or security concerns. In those cases, the solution is to transfer the required Drupal data to another database that Recommender API can access directly.

Some Drupal modules already handle data transfer:
1. http://drupal.org/project/services: Drupal publishes data for others to pull.
2. http://drupal.org/project/feeds: Drupal pulls data from another data source.

What we need here is for Drupal to push data to another database. This seems to be missing in Drupal. Some possible solutions:

The first approach is to have Drupal output data and have the remote RecAPI pull it:

  • Drupal uses the services module to publish data, and RecAPI pulls it.
  • Drupal uses the Backup and Migrate module to dump data, then transfers it to RecAPI.
  • Drupal uses Views or RSS to export data, and RecAPI pulls it.
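A rough sketch of what the RecAPI side of this pull model might look like, assuming Drupal exposes the preference data as a JSON list over HTTP. The field names `uid`, `nid` and `score`, and the helper names `parse_export` and `pull`, are hypothetical and not part of any existing module:

```python
import json
import urllib.request


def parse_export(payload):
    """Turn a JSON export into (user_id, item_id, score) tuples.

    The "uid"/"nid"/"score" keys are an assumed layout, not a real
    Services or Views format.
    """
    records = json.loads(payload)
    return [(int(r["uid"]), int(r["nid"]), float(r["score"])) for r in records]


def pull(url):
    """Fetch the export from the Drupal site and parse it."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_export(resp.read().decode("utf-8"))
```

RecAPI would then load the returned tuples into its own database on whatever schedule suits it, so the Drupal site only has to serve one more HTTP request.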

The second approach is to use database replication/synchronization tools to transfer data between Drupal and the database RecAPI will use. Candidate tools are discussed in the comments below.

The third approach is to have Drupal push data directly to RecAPI.

  • Could look at the framework in http://drupal.org/project/apachesolr on how data is pushed to another server.
  • Write a new Drupal module that uses XML-RPC or REST to call a RecAPI web service and push data to it.
  • Could use Web Analytics (GA, Piwik, etc.) to transfer, say, browsing/purchasing data directly from the client to RecAPI.
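To make the XML-RPC variant of the push idea concrete, here is a minimal client-side sketch in Python (a real Drupal module would be written in PHP). The remote method name `recommender.addPreferences` and the batch size are assumptions for illustration only; RecAPI does not currently expose such a service:

```python
import xmlrpc.client


def batches(records, size):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def push(endpoint, records, size=500):
    """Send records to the remote service in batches; return how many were sent."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    sent = 0
    for batch in batches(records, size):
        # Hypothetical remote method; a real service would define its own name.
        proxy.recommender.addPreferences(batch)
        sent += len(batch)
    return sent
```

Batching keeps each request small, but every batch is still work the RecAPI server has to accept and process on the sender's schedule.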

I'm still investigating which solution is best.

Comments

danithaca

Title: support data transfer on HTTP (beyond direct database access) » support data transfer over HTTP (beyond direct database access)

I'll first look at the third approach, and then the second approach.
If you have suggestions, please comment.

danithaca

Some more thoughts:

  • The ApacheSolr approach is quite heavyweight. You need to define the data structure in XML, and then perhaps need another Solr instance to keep it separate from the search index. This does not look like the right approach.
  • Having the client push data to the RecAPI server would add overhead on the RecAPI server, which is already overloaded. Since the Drupal client is already optimized to serve HTTP requests, it's perhaps easier to use the pull model defined in the first approach.
  • The second approach requires a dependency on third-party applications, which is also not good.

So the most promising approach seems to be the first one.

danithaca

Looks like http://dbreplicator.org is a fork/successor of http://opensource.replicator.daffodilsw.com/. Both use java.rmi to transfer data, so they are basically not applicable here. https://www.forge.funambol.org/download/ is very complicated. For the second approach, it looks like http://symmetricds.codehaus.org is the only possible solution.

Another promising approach is simply to generate CSV files and send both the input and output files automatically via HTTP. This could be the most effective way, and also a simple one.
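A minimal sketch of the CSV-over-HTTP idea, in Python for brevity. The `user,item,score` column layout is an assumption modeled on the kind of file Mahout's file-based data model reads, and the helper names and upload URL are hypothetical:

```python
import csv
import io
import urllib.request


def preferences_to_csv(rows):
    """Serialize (user_id, item_id, score) tuples as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for user_id, item_id, score in rows:
        writer.writerow([user_id, item_id, score])
    return buf.getvalue()


def build_upload_request(url, csv_text):
    """Build (but do not send) an HTTP POST carrying the CSV body."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/csv; charset=utf-8"},
    )


def upload(url, csv_text):
    """Send the CSV and return the HTTP status code."""
    with urllib.request.urlopen(build_upload_request(url, csv_text), timeout=30) as resp:
        return resp.status
```

The output file (the computed recommendations) could travel back the same way in the opposite direction, so neither side needs direct access to the other's database.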

mikeytown2

Something to think about:
http://drupal.org/project/httprl

danithaca

Status: Active » Fixed

This is done. See #1238572 (create cloud service to help people use the Recommender modules) for details on how to use the cloud service.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.