We should NOT re-index nodes that link to themselves. This simple patch fixes what is a horrible performance bug, probably causing EVERY node with a comment form to be indexed twice. There are probably other cases, but comment is the one that caused us to look into this, ... the problem exists any time a nodes links create an internal reference.

While indexing, search looks for links to other nodes, and uses this knowledge to help with the scoring factor. Here's a quick overview on how this works. Let's say node/10 links to node/20. The more links to a given node, the higher it's relevancy. This is handled through the {search_node_links} table. However, there is no implicit link relevancy score. The way that a node's relevancy is increased is that words from the referencing node are added to the indexed text of the referenced node. So, during indexing, if we see a link from one node to another, we MUST re-index the referenced node after indexing the referencing node. This is inefficient, but the way things work.

Thanks to agentrickard for noticing this on #568350: UI cleanup of status and re-index button!

Comments

douggreen’s picture

Title: Search links cause nodes many nodes to be indexed twice » Search links cause many nodes to be indexed twice
douggreen’s picture

Status: Needs review » Needs work

There's something wrong with this patch. We can't assume that we're indexing a nid, so something needs to be done to make sure that the type is right, possibly just adding a $type = 'node' to the conditional. Also, while reviewing this, please look above this in the code, and see if we should do this check before the link is put into the $results array.

This should be back-ported once committed to 7.x.

douggreen’s picture

Issue tags: +Performance
douggreen’s picture

#581594: minor db_query() fix to search for DB TNG includes a simpletest for this...

douggreen’s picture

Status: Needs work » Needs review
StatusFileSize
new644 bytes

And here's the extra check for $type...

cburschka’s picture

Status: Needs review » Needs work
+++ modules/search/search.module	20 Sep 2009 15:42:44 -0000
@@ -678,7 +678,7 @@
+    else if ($sid != $nid || $type != 'node') {

The style guide favors elseif over else if here.

I'm on crack. Are you, too?

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new2.69 KB

Rerolled with the elseif fixed and also added the tests from #581594: minor db_query() fix to search for DB TNG.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Looks great.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

Yay! Critical bug fixes with tests. Yum, my favourite! :)

Could we get a one or two line comment along with the fix though? There's a lot of good info in this issue about the bug which could stand to be captured inline so someone doesn't try and 'optimize' this later.

sun.core’s picture

Anyone? webchick is only asking to clarify docs. 2 minute job! ;)

jhodgdon’s picture

It's not a 2-minute job any more, because the patch above doesn't apply...

jhodgdon’s picture

Status: Needs work » Needs review
Issue tags: -Performance

#7: 569536_581594.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Performance

The last submitted patch, 569536_581594.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
StatusFileSize
new2.72 KB

Here's a re-roll of the patch in #7, with an expanded comment. Hopefully I've captured what webchick was hoping to see in that comment...

jhodgdon’s picture

StatusFileSize
new2.72 KB

Hmmm, newline problem, try this one.

grendzy’s picture

I couldn't reproduce this bug, so I'm not sure what it's supposed to fix. Can anyone provide steps to reproduce? Also the attached test passes without the fix to search.module, so the test doesn't seem to relate to the bug.

I tried adding a watchdog statement that would log index operations that were skipped by this extra conditional, and it was never triggered. I also checked the contents of the search tables both with and without this patch and they looked the same to me.

jhodgdon’s picture

The tests came from a different issue, I gather -- see #7. They should really be in a separate patch, but here they are...

We should probalby add a test to verify that this patch actually fixes the issue...
The problem, as I understand it:

a) Make a node that links to itself.
b) Index that node.
c) During indexing, links in the text of the node being indexed, linking to other nodes, cause the linked-to node to be added to the search_node_links table and to be marked for reindexing.
d) This shouldn't happen if the link is to the node being indexed, but under the old code, it was.

grendzy’s picture

StatusFileSize
new2.77 KB

jhodgdon, thanks for the clarification! Here's an updated patch that tests for the bug itself. This test will now fail without the patch to search.module.

catch’s picture

StatusFileSize
new2.1 KB

Uploading just the test from #18 so testbot can fail it.

Status: Needs review » Needs work

The last submitted patch, 569536_search_twice.patch, failed testing.

catch’s picture

Status: Needs work » Reviewed & tested by the community

Patch looks good so #18 is RTBC.

mradcliffe’s picture

1 week, 2 days, patch might be out of date, retesting.

#18: 569536_search_twice.patch queued for re-testing.

jhodgdon’s picture

Just to clarify, #19 is catch just uploading the test so we could verify that the test fails without the patch (which it did). #18 is the patch that should be committed (which passes tests).

dries’s picture

Status: Reviewed & tested by the community » Fixed

Looks good, and glad to see we added tests. Committed to CVS HEAD.

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

We should consider this for D6 back-port as well, I think.

dpearcefl’s picture

Priority: Critical » Normal
Status: Patch (to be ported) » Postponed (maintainer needs more info)

Is this issue still needed?

dpearcefl’s picture

Priority: Normal » Major
Status: Postponed (maintainer needs more info) » Patch (to be ported)
jhodgdon’s picture

The fix has not been ported to Drupal 6, so yes, the issue is still needed until a decision is made.

albert volkman’s picture

Status: Patch (to be ported) » Needs review
StatusFileSize
new770 bytes

D6 backport.

jhodgdon’s picture

Please read the help text under attachments. If you name patches like that, the test bot will ignore them. :)

Anyway. I'm not doing search.module maintaining any more, so I'm unfollowing myself from this issue... I don't think anyone is actually maintaining search.module actually. So I apologize if the probable lack of reviews on this issue makes you think no one appreciates your efforts... :(

albert volkman’s picture

@jhodgdon my feelings aren't hurt at all. Just trying to whittle down the number of open tickets!

I named the file like that because it's for D6. I thought that was the standard practice since there aren't tests for D6?

albert volkman’s picture

StatusFileSize
new770 bytes

Renaming so testbot will be happy.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Wow, looking through old issues...

This fix was carefully tested in 7.x and this is a straight backport of it to 6.x. I think we should consider adding this to 6.x.

jhodgdon’s picture

Version: 6.x-dev » 7.x-dev
Issue summary: View changes
Status: Reviewed & tested by the community » Fixed

Talked with Gabor (the Drupal 6 branch maintainer) and D6 issues are really not being committed unless they're really essential -- we really don't have a test system for Drupal 6 and it's too dangerous. So... putting this back to D7 / fixed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.