Download & Extend

Performance Issue during Indexing - extra unused query

Project:Drupal core
Version:6.x-dev
Component:search.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

Not LIVE FROM THE MINNESOTA SEARCH SPRINT, but found on the plane ride home...

In search_index (the function that scores HTML), a SQL to UPDATE the {search_index} is done, and if that fails then an INSERT is done. However, since search_wipe is called just before this loop, there will never be any matching items for the UPDATE and it will always fail. The loop also can't introduce any duplicate records because the array key is the word. Thus I don't think we ever need to do the UPDATE. And the attached patch will half the number of queries needed for every word inserted into the index!

AttachmentSizeStatusTest resultOperations
search_update.patch1.07 KBIgnored: Check issue status.NoneNone

Comments

#1

Status:needs review» closed (works as designed)

Hum. And what about the big comment in front of the UPDATE that says:

The database will collate similar words (accented and non-accented forms, etc.),
and the score is additive, so first add and then insert.

This UPDATE is required, so this is by design.

#2

Ah, I think you're right on the collate comment. But if that's the only case where we need to UPDATE, we should try the INSERT first and then do the UPDATE if the INSERT fails. The duplicate INSERT will fail because of the word_sid_type unique key. I think that this approach will result in one query instead of two 99% of the time.

AttachmentSizeStatusTest resultOperations
257192.patch1.21 KBIgnored: Check issue status.NoneNone

#3

Status:closed (works as designed)» needs work

Right, we should invert those queries, because the INSERT will infrequently fail.

#4

Status:needs work» needs review

#5

And since scores are floats and not numbers the %d in the UPDATE should be fixed.

AttachmentSizeStatusTest resultOperations
257912.patch1.21 KBIgnored: Check issue status.NoneNone

#6

Please also fix the type of the sid column in the UPDATE query (it should be an int). Oh this was messy, and the fact that I fixed that bug on the day of the release of D6 is not an excuse.

#7

What's wrong with the type? It's a string, which is correct.

#8

What's wrong with the type? It's a string, which is correct.

The sid is an int column:

      'sid' => array(
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'default' => 0,
        'description' => t('The {search_dataset}.sid of the searchable item to which the word belongs.'),
      ),

But there the %d in the UPDATE query is in quotes.

#9

I didn't notice the quotes around '%d'... The attached patch fixes those too.

AttachmentSizeStatusTest resultOperations
257912.patch1.21 KBIgnored: Check issue status.NoneNone

#10

Looks good to me. This is a big performance win, and should be backported IMO.

#11

Assigned to:douggreen» Anonymous

Ooops. We overslept on that one.

Here is a slightly modified version of douggreen's patch (with an updated comment).

Please review!

AttachmentSizeStatusTest resultOperations
257912_1.patch1.38 KBIgnored: Check issue status.NoneNone

#12

@Damien, thanks for fixing the comment, I obviously missed that!

Is it at all possible to keep the comment on two lines, or did it break 80 characters that way?

Please remove the @ in front of db_query. I'm not sure why and when people need to add this, but I don't think it's needed.

#13

I replaced @db_query. with db_query. I tried to make that comment fit on 2 lines, but couldn't make it terse yet explanatory. But while working on it, I modified Damien's text. I'm not going to be stubborn on what the comment says, it's the code here that's really important. As moshe points out, we might want to backport this, so let's try to agree on the comment verbiage, and get this RTBC!

#14

@douggreen - did you mean to attach a patch?

#15

doh!

AttachmentSizeStatusTest resultOperations
257912.patch1.46 KBIgnored: Check issue status.NoneNone

#16

Status:needs review» needs work

Hum. The comment looks good. But the @ in front of the db_query is required. The INSERT should fail with a duplicated key warning.

#17

What does the @ do in php? I thought it was redundant. The only place in core that there is an @db_query is in modules/node/node.module line +194, and I kinda assumed that this was a mistake.

#18

The @ prefix suppresses error reporting (http://www.php.net/manual/en/language.operators.errorcontrol.php).

The only place in core that there is an @db_query is in modules/node/node.module

... as well as in cache.inc and bootstrap.inc, everywhere an INSERT query could potentially fail.

#19

I think that this is ready now... new patch attached restoring the '@' sign. Damien's patch (#11) above was good enough too. My latest patch should be identical to his, just with a slightly different comment.

The reason this patch has take so long is that it just took some education on my part to understand why we needed @db_query!

AttachmentSizeStatusTest resultOperations
257912.patch1.46 KBIgnored: Check issue status.NoneNone

#20

Status:needs work» reviewed & tested by the community

@douggreen, I like your comment better.

Marking #19 as (very well) reviewed. We finally made it!

#21

still applies cleanly.

#22

Status:reviewed & tested by the community» fixed

Tested, reviewed and committed. Thanks.

#23

Version:7.x-dev» 6.x-dev
Status:fixed» reviewed & tested by the community

patch applies just fine to D6.

#24

Status:reviewed & tested by the community» fixed

Thanks, committed to 6.x.

#25

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

nobody click here