Performance Issue during Indexing - extra unused query
douggreen - May 13, 2008 - 13:14
| Project: | Drupal |
| Version: | 6.x-dev |
| Component: | search.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
Not LIVE FROM THE MINNESOTA SEARCH SPRINT, but found on the plane ride home...
In search_index (the function that scores HTML), a SQL to UPDATE the {search_index} is done, and if that fails then an INSERT is done. However, since search_wipe is called just before this loop, there will never be any matching items for the UPDATE and it will always fail. The loop also can't introduce any duplicate records because the array key is the word. Thus I don't think we ever need to do the UPDATE. And the attached patch will half the number of queries needed for every word inserted into the index!
| Attachment | Size |
|---|---|
| search_update.patch | 1.07 KB |

#1
Hum. And what about the big comment in front of the
UPDATEthat says:This
UPDATEis required, so this is by design.#2
Ah, I think you're right on the collate comment. But if that's the only case where we need to UPDATE, we should try the INSERT first and then do the UPDATE if the INSERT fails. The duplicate INSERT will fail because of the word_sid_type unique key. I think that this approach will result in one query instead of two 99% of the time.
#3
Right, we should invert those queries, because the INSERT will infrequently fail.
#4
#5
And since scores are floats and not numbers the %d in the UPDATE should be fixed.
#6
Please also fix the type of the sid column in the
UPDATEquery (it should be an int). Oh this was messy, and the fact that I fixed that bug on the day of the release of D6 is not an excuse.#7
What's wrong with the type? It's a string, which is correct.
#8
The sid is an int column:
'sid' => array('type' => 'int',
'unsigned' => TRUE,
'not null' => TRUE,
'default' => 0,
'description' => t('The {search_dataset}.sid of the searchable item to which the word belongs.'),
),
But there the
%din theUPDATEquery is in quotes.#9
I didn't notice the quotes around '%d'... The attached patch fixes those too.
#10
Looks good to me. This is a big performance win, and should be backported IMO.
#11
Ooops. We overslept on that one.
Here is a slightly modified version of douggreen's patch (with an updated comment).
Please review!
#12
@Damien, thanks for fixing the comment, I obviously missed that!
Is it at all possible to keep the comment on two lines, or did it break 80 characters that way?
Please remove the @ in front of db_query. I'm not sure why and when people need to add this, but I don't think it's needed.
#13
I replaced
@db_query. withdb_query. I tried to make that comment fit on 2 lines, but couldn't make it terse yet explanatory. But while working on it, I modified Damien's text. I'm not going to be stubborn on what the comment says, it's the code here that's really important. As moshe points out, we might want to backport this, so let's try to agree on the comment verbiage, and get this RTBC!#14
@douggreen - did you mean to attach a patch?
#15
doh!
#16
Hum. The comment looks good. But the @ in front of the db_query is required. The INSERT should fail with a duplicated key warning.
#17
What does the @ do in php? I thought it was redundant. The only place in core that there is an @db_query is in modules/node/node.module line +194, and I kinda assumed that this was a mistake.
#18
The @ prefix suppresses error reporting (http://www.php.net/manual/en/language.operators.errorcontrol.php).
#19
I think that this is ready now... new patch attached restoring the '@' sign. Damien's patch (#11) above was good enough too. My latest patch should be identical to his, just with a slightly different comment.
The reason this patch has take so long is that it just took some education on my part to understand why we needed @db_query!
#20
@douggreen, I like your comment better.
Marking #19 as (very well) reviewed. We finally made it!
#21
still applies cleanly.
#22
Tested, reviewed and committed. Thanks.
#23
patch applies just fine to D6.
#24
Thanks, committed to 6.x.
#25
Automatically closed -- issue fixed for two weeks with no activity.