In testing the SQL query from relevant_content_get_nodes(), there seems to be an error that inflates the tag count. Because the term_node table holds records for multiple node revisions (vid), each node-term combination (nid-tid) is counted once per revision instead of once. As a result, a node that was updated many times can appear as the most relevant node even though it does not have the highest number of common terms.
I believe the SQL join should be:
LEFT JOIN {term_node} tn ON tn.nid = n.nid AND tn.vid = n.vid AND {$term_sql}
instead of:
LEFT JOIN {term_node} tn ON tn.nid = n.nid AND {$term_sql}
By adding the tn.vid = n.vid condition, we only count the tags for the current (latest) node revision (because the node table only holds the last revision).
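To illustrate the effect, here is a minimal sketch using an in-memory SQLite database from Python. It is not the module's actual query: the two tables are reduced to the columns discussed above, the sample rows are made up, and `tn.tid IN (5, 6)` merely stands in for whatever `{$term_sql}` expands to. The helper name `count_common_terms` is hypothetical.

```python
import sqlite3

def count_common_terms(with_vid_condition):
    """Return {nid: matching term rows} for a simplified node/term_node schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (nid INTEGER, vid INTEGER);
        CREATE TABLE term_node (nid INTEGER, vid INTEGER, tid INTEGER);
        -- node holds only the current revision of each node
        INSERT INTO node VALUES (1, 13), (2, 20);
        -- node 1: a single term (5), recorded for three revisions
        INSERT INTO term_node VALUES (1, 11, 5), (1, 12, 5), (1, 13, 5);
        -- node 2: two terms (5 and 6), recorded for one revision
        INSERT INTO term_node VALUES (2, 20, 5), (2, 20, 6);
    """)
    # The proposed fix: restrict the join to the current revision.
    vid_sql = "AND tn.vid = n.vid" if with_vid_condition else ""
    rows = conn.execute(f"""
        SELECT n.nid, COUNT(tn.tid) AS cnt
        FROM node n
        LEFT JOIN term_node tn
          ON tn.nid = n.nid {vid_sql} AND tn.tid IN (5, 6)
        GROUP BY n.nid
        ORDER BY n.nid
    """).fetchall()
    conn.close()
    return dict(rows)

print(count_common_terms(False))  # join without the vid condition
print(count_common_terms(True))   # join with tn.vid = n.vid
```

With this sample data, the join without the vid condition counts 3 for node 1 (one term across three revisions) and 2 for node 2, so node 1 ranks first. With the vid condition, the counts become 1 and 2, matching the actual number of common terms.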
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | relevant-content-wrong-query-723526.patch | 828 bytes | adr_p |
Comments
Comment #1
waldmanm commented
Just wondering if any of the maintainers has looked at this. From the testing I've done, this seems to be a serious error in the core functionality of the module: it does not calculate the "relatedness" of nodes properly, which can lead to erroneous results. The fix is easy (see my description above). I'd expect that either this gets fixed or, if I'm wrong, the bug gets closed.
Micah
Comment #2
adr_p commented
I confirm the problem is serious. Without the modification, a node matching only one term but having many revisions will be placed much higher in the result set than it should be. Patch attached.
Comment #3
nicholasthompson
Hi,
Drupal 6 is no longer supported. Sorry for this bug.