We should NOT re-index nodes that link to themselves. This simple patch fixes what is a horrible performance bug, probably causing EVERY node with a comment form to be indexed twice. There are probably other cases, but comment is the one that caused us to look into this, ... the problem exists any time a nodes links create an internal reference.
While indexing, search looks for links to other nodes, and uses this knowledge to help with the scoring factor. Here's a quick overview on how this works. Let's say node/10 links to node/20. The more links to a given node, the higher it's relevancy. This is handled through the {search_node_links} table. However, there is no implicit link relevancy score. The way that a node's relevancy is increased is that words from the referencing node are added to the indexed text of the referenced node. So, during indexing, if we see a link from one node to another, we MUST re-index the referenced node after indexing the referencing node. This is inefficient, but the way things work.
Thanks to agentrickard for noticing this on #568350: UI cleanup of status and re-index button!
| Comment | File | Size | Author |
|---|---|---|---|
| #32 | search_twice-569536-d6-29.patch | 770 bytes | albert volkman |
| #29 | search_twice-569536-29-d6.patch | 770 bytes | albert volkman |
| #19 | 569536_search_twice.patch | 2.1 KB | catch |
| #18 | 569536_search_twice.patch | 2.77 KB | grendzy |
| #15 | 569536withnewline.patch | 2.72 KB | jhodgdon |
Comments
Comment #1
douggreen commentedComment #2
douggreen commentedThere's something wrong with this patch. We can't assume that we're indexing a nid, so something needs to be done to make sure that the type is right, possibly just adding a $type = 'node' to the conditional. Also, while reviewing this, please look above this in the code, and see if we should do this check before the link is put into the $results array.
This should be back-ported once committed to 7.x.
Comment #3
douggreen commentedComment #4
douggreen commented#581594: minor db_query() fix to search for DB TNG includes a simpletest for this...
Comment #5
douggreen commentedAnd here's the extra check for $type...
Comment #6
cburschkaThe style guide favors elseif over else if here.
I'm on crack. Are you, too?
Comment #7
berdirRerolled with the elseif fixed and also added the tests from #581594: minor db_query() fix to search for DB TNG.
Comment #8
catchLooks great.
Comment #9
webchickYay! Critical bug fixes with tests. Yum, my favourite! :)
Could we get a one or two line comment along with the fix though? There's a lot of good info in this issue about the bug which could stand to be captured inline so someone doesn't try and 'optimize' this later.
Comment #10
sun.core commentedAnyone? webchick is only asking to clarify docs. 2 minute job! ;)
Comment #11
jhodgdonIt's not a 2-minute job any more, because the patch above doesn't apply...
Comment #12
jhodgdon#7: 569536_581594.patch queued for re-testing.
Comment #14
jhodgdonHere's a re-roll of the patch in #7, with an expanded comment. Hopefully I've captured what webchick was hoping to see in that comment...
Comment #15
jhodgdonHmmm, newline problem, try this one.
Comment #16
grendzy commentedI couldn't reproduce this bug, so I'm not sure what it's supposed to fix. Can anyone provide steps to reproduce? Also the attached test passes without the fix to search.module, so the test doesn't seem to relate to the bug.
I tried adding a watchdog statement that would log index operations that were skipped by this extra conditional, and it was never triggered. I also checked the contents of the search tables both with and without this patch and they looked the same to me.
Comment #17
jhodgdonThe tests came from a different issue, I gather -- see #7. They should really be in a separate patch, but here they are...
We should probalby add a test to verify that this patch actually fixes the issue...
The problem, as I understand it:
a) Make a node that links to itself.
b) Index that node.
c) During indexing, links in the text of the node being indexed, linking to other nodes, cause the linked-to node to be added to the search_node_links table and to be marked for reindexing.
d) This shouldn't happen if the link is to the node being indexed, but under the old code, it was.
Comment #18
grendzy commentedjhodgdon, thanks for the clarification! Here's an updated patch that tests for the bug itself. This test will now fail without the patch to search.module.
Comment #19
catchUploading just the test from #18 so testbot can fail it.
Comment #21
catchPatch looks good so #18 is RTBC.
Comment #22
mradcliffe1 week, 2 days, patch might be out of date, retesting.
#18: 569536_search_twice.patch queued for re-testing.
Comment #23
jhodgdonJust to clarify, #19 is catch just uploading the test so we could verify that the test fails without the patch (which it did). #18 is the patch that should be committed (which passes tests).
Comment #24
dries commentedLooks good, and glad to see we added tests. Committed to CVS HEAD.
Comment #25
jhodgdonWe should consider this for D6 back-port as well, I think.
Comment #26
dpearcefl commentedIs this issue still needed?
Comment #27
dpearcefl commentedComment #28
jhodgdonThe fix has not been ported to Drupal 6, so yes, the issue is still needed until a decision is made.
Comment #29
albert volkman commentedD6 backport.
Comment #30
jhodgdonPlease read the help text under attachments. If you name patches like that, the test bot will ignore them. :)
Anyway. I'm not doing search.module maintaining any more, so I'm unfollowing myself from this issue... I don't think anyone is actually maintaining search.module actually. So I apologize if the probable lack of reviews on this issue makes you think no one appreciates your efforts... :(
Comment #31
albert volkman commented@jhodgdon my feelings aren't hurt at all. Just trying to whittle down the number of open tickets!
I named the file like that because it's for D6. I thought that was the standard practice since there aren't tests for D6?
Comment #32
albert volkman commentedRenaming so testbot will be happy.
Comment #33
jhodgdonWow, looking through old issues...
This fix was carefully tested in 7.x and this is a straight backport of it to 6.x. I think we should consider adding this to 6.x.
Comment #34
jhodgdonTalked with Gabor (the Drupal 6 branch maintainer) and D6 issues are really not being committed unless they're really essential -- we really don't have a test system for Drupal 6 and it's too dangerous. So... putting this back to D7 / fixed.