CRE currently offers valuable user, content, and comment recommendations based on a single criterion: voting behavior.

There are many other criteria we might want to include in content recommendations. Examples include:

* recommend users based on similar user profile data
* recommend users based on similar
* recommend content based on similar terms

Ideally we would be able to produce recommendations based on any one of these criteria or - more importantly - on a synthesis of all of these criteria.

Looking briefly at the CRE code, it seems that the methods are partially generalized. That is, while some methods are very voting-specific, others seem to be relatively agnostic in terms of the parameters handled. In terms of data storage, the cre_average_vote table is obviously voting-specific but cre_similarity_matrix looks to be generic.

How much work would it be to convert the module to use, e.g., a hook system that allowed modules to implement recommendation parameters, with votingapi being one such implementation?

An existing approach that's potentially relevant is the search engine, which assigns match quality scores based on diverse criteria, with an admin UI to adjust the relative weighting of these criteria. Is this a useful model for enabling recommendation on multiple weighted parameters?

Comments

Scott Reynolds’s picture

Assigned: Unassigned » Scott Reynolds

Have you looked at how cre_top works in the API documentation? You are able to provide it with a function that allows you to augment the query object with new tables and other criteria.

So if you wanted to personalize a search, you call cre_top() and augment the query object with your search db query.

As far as generalizing the cre_similarity matrix to use something other than votingapi, I don't know how I would approach that, but I will of course listen to ideas. I believe, though, the better approach is to see how the other criteria can be modeled as a vote.

(for instance, a node with a tag gets a +1 vote for that tag, a node view gives a +1 vote on that node, etc.). If you give me more information with a tiny spec, I can lay some ideas out for ya.
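Roughly sketched, that modeling might look like this (assuming a votingapi_set_vote()-style helper; check VotingAPI's actual function signatures before relying on this):

function mymodule_nodeapi(&$node, $op) {
  if ($op == 'insert') {
    // Each taxonomy term applied to the node counts as a +1 "vote" for that term.
    foreach ((array) $node->taxonomy as $term) {
      votingapi_set_vote('term', $term->tid, 1, 'mymodule');
    }
  }
}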

nedjo’s picture

Hi Scott,

Thanks for your response.

There are several existing approaches to the question of recommending content or, looked at slightly differently, inferring what content is related or similar to a given user or piece of content:

  1. Similar By Terms http://drupal.org/project/similarterms

    This simple module provides a block that shows, for a particular piece of content, other pieces of content that are similar to it in terms of the categories (tags, taxonomy terms) that have been applied.

    Comparison is done through a single SQL query that considers the number of shared terms.

  2. Similar Entries, http://drupal.org/project/similar

    Similar Entries presents a list of content similar to the content being viewed using two criteria: (1) the similarity of the content itself and (2) the similarity of categories applied to the content.

    Only two fields (title and body) are considered when comparing similarity. This is done through the use of MySQL full text indexes. Category (taxonomy term) matching is rudimentary and looks to answer the question "does the other piece of content share at least one term with this piece of content".

  3. Related Links, http://drupal.org/project/relatedlinks

    Includes support for "discovered links" (nodes inferred to be similar to the node being viewed). The particular function is _relatedlinks_get_discovered_links(). This code uses the search module's functionality and indexing to determine content similarity based on similar keyword occurrence as well as implementing taxonomy matching. Matching takes into account the author and date of content and results are collated with some basic ranking parameters.
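The shared-terms comparison at the heart of Similar By Terms (approach 1 above) can be sketched as a single query along these lines (using core's term_node table; the module's actual query differs in its details):

-- For a given node X, rank other nodes by how many taxonomy terms they share with it.
SELECT tn.nid, COUNT(tn.tid) AS shared_terms
FROM term_node tn
WHERE tn.tid IN (SELECT tid FROM term_node WHERE nid = X)
  AND tn.nid <> X
GROUP BY tn.nid
ORDER BY shared_terms DESC;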

What's missing is a way to enable these various approaches to interact and form part of an overall set of recommendations. It's a drag to have to choose one solution or another when what we typically want is control over how we combine them.

Looking at Similar By Terms, for example: it's a nice implementation, but it's hard-coded to its particular block context. I can't take it as a parameter and mix it with other similarity parameters. I can't do anything with it other than enable its block.

This is where CRE comes in. It's what a content recommendation API is all about. What's confusing, though, is that the data are described in terms of "vote" rather than, say, "score" totals.

Can we enable other parameters to be plugged into CRE?

Sticking with our term similarity measure, say we calculate a percentage-based score: a piece of content is 100% similar if it shares all terms with content item X and 0% similar if it shares none. Can we use this score as a new criterion in our content recommendation? Can we add support to similarterms.module that hooks into CRE? If so, how? If not, what changes would be needed to CRE?
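To make that concrete, here's a plain-PHP sketch of such a score (illustration only, not an existing CRE or similarterms function):

function term_similarity_score(array $x_terms, array $other_terms) {
  // $x_terms: term IDs on content item X; $other_terms: term IDs on the other item.
  if (empty($x_terms)) {
    return 0;
  }
  $shared = count(array_intersect($x_terms, $other_terms));
  // 100 if the other item shares all of X's terms, 0 if it shares none.
  return 100 * $shared / count($x_terms);
}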

(And then can we then, e.g., expose the similarity scores to Views, so we can construct arbitrary views sorted by similarity, rather than the hard-coded block lists currently in these modules?)

I believe, though, the better approach is to see how the other criteria can be modeled as a vote.

This is indeed what we need to do: model all criteria on a common basis. I suspect it would be clearer to call that a "score" (as we do in Search module), but the semantics are not critical.

Likely implementing this would require moving to a cron-based processing approach like we use in Search. That is, we need a way to index all new content (and users, and comments) and then mark it for reindexing when it has been edited. (I vaguely wonder if we could piggy-back on Search module's system, since the conditions under which indexing and reindexing need to happen might be the same. But probably not.)
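Following search module's pattern, the cron piece might look something like this (the cre_index table, cre_index_node(), and the cre_cron_limit variable are all hypothetical names, not existing CRE code):

function cre_cron() {
  // Score nodes that have never been indexed, or have changed since indexing.
  $limit = (int) variable_get('cre_cron_limit', 100);
  $result = db_query_range("SELECT n.nid FROM {node} n LEFT JOIN {cre_index} i ON n.nid = i.nid WHERE i.nid IS NULL OR n.changed > i.indexed ORDER BY n.changed", 0, $limit);
  while ($row = db_fetch_object($result)) {
    cre_index_node(node_load($row->nid));
  }
}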

So here's the "tiny spec" you suggested:

Enable the consideration of two additional criteria in content recommendation:

  • Term similarity (the content has been similarly categorized). Model implementation and sample code: Similar By Terms module.
  • Content similarity (the content is similar in terms of wording or the values in particular fields). Model implementation and sample code: either Similar Entries (using MySQL full text indexing) or Related Links (leveraging search module indexing).

Ideally these implementations could go into their respective modules (e.g., Similar By Terms) in such a way that they would be taken into account in node, user, and content recommendations as provided by those CRE modules.

Extras on the wishlist:

* Ability for admins to tweak the weighting of each score parameter

Possible hook approach:


function cre_nodeapi(&$node, $op, $teaser, $page) {
  switch ($op) {
    case 'insert':
    case 'update':
      // Let implementing modules queue this node for (re)scoring.
      module_invoke_all('cre_queue', 'node', $node);
      break;
  }
}

jcruz’s picture

Hi Scott and nedjo,

Sounds like you're on to something. It seems like an ideal Recommendation Engine would take into account votes, taxonomy, and similarity scores.

Scott, I'm interested to see where you think we should go from here in order to implement this. Also, thanks for the great module.

-John

nedjo’s picture

My "possible hook approach" above makes no sense. Instead, we would likely do best to model this on the search module's code; see:

search_cron(), http://api.drupal.org/api/function/search_cron/5
hook_update_index(), http://api.drupal.org/api/function/hook_update_index/5
search_index(), http://api.drupal.org/api/function/search_index/5

This patch on search module has some useful approaches in terms of how to enable hook-based rankings:

http://drupal.org/node/145242

nedjo’s picture

I've committed a draft module to my sandbox to sketch out the first steps of a generalized approach:

http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/nedjo/modul...

It introduces two new hooks, 'recommended_types' and 'recommended_factors'.

Types are types of objects supporting ranking, e.g., 'node' and 'user'.

Factors are ways that those objects can be ranked, e.g., by comment count, by number of hits, etc.

Factors can be adjusted (given different weights) through a UI based on that in the search module.

Ranking can be generic (give me the top-ranked nodes) or relative to a given item (e.g., node 23). I've only sketched in the reference approach and haven't yet written any implementations for it.
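Implementations of those hooks might look roughly like this (the hook names come from the sandbox module; the return structures shown are my assumptions about its API):

// Declare an object type that supports ranking.
function mymodule_recommended_types() {
  return array('node' => t('Content'));
}

// Offer term similarity as one weightable ranking factor for nodes.
function mymodule_recommended_factors($type) {
  if ($type == 'node') {
    return array('term_similarity' => t('Similarity of taxonomy terms'));
  }
}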

moshe weitzman’s picture

Any comments on nedjo's sandbox code?

icecreamyou’s picture

I like this idea--I'd really like to see CRE take CCK fields into account.

coupet’s picture

Computed rankings based on multiple factors for each object should be available to Search module for sorting.

Scott Reynolds’s picture

I did this stuff with the 6.x version, via the similarity objects module: http://drupal.org/project/similarity