Performance Issue during Indexing - extra unused query

douggreen - May 13, 2008 - 13:14
Project:Drupal
Version:6.x-dev
Component:search.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed
Description

Not LIVE FROM THE MINNESOTA SEARCH SPRINT, but found on the plane ride home...

In search_index (the function that scores HTML), a SQL to UPDATE the {search_index} is done, and if that fails then an INSERT is done. However, since search_wipe is called just before this loop, there will never be any matching items for the UPDATE and it will always fail. The loop also can't introduce any duplicate records because the array key is the word. Thus I don't think we ever need to do the UPDATE. And the attached patch will half the number of queries needed for every word inserted into the index!

AttachmentSize
search_update.patch1.07 KB

#1

Damien Tournoud - May 13, 2008 - 13:55
Status:needs review» by design

Hum. And what about the big comment in front of the UPDATE that says:

The database will collate similar words (accented and non-accented forms, etc.),
and the score is additive, so first add and then insert.

This UPDATE is required, so this is by design.

#2

douggreen - May 13, 2008 - 15:16

Ah, I think you're right on the collate comment. But if that's the only case where we need to UPDATE, we should try the INSERT first and then do the UPDATE if the INSERT fails. The duplicate INSERT will fail because of the word_sid_type unique key. I think that this approach will result in one query instead of two 99% of the time.

AttachmentSize
257192.patch 1.21 KB

#3

Damien Tournoud - May 13, 2008 - 15:27
Status:by design» needs work

Right, we should invert those queries, because the INSERT will infrequently fail.

#4

douggreen - May 13, 2008 - 15:56
Status:needs work» needs review

#5

douggreen - May 13, 2008 - 16:20

And since scores are floats and not numbers the %d in the UPDATE should be fixed.

AttachmentSize
257912.patch 1.21 KB

#6

Damien Tournoud - May 13, 2008 - 16:32

Please also fix the type of the sid column in the UPDATE query (it should be an int). Oh this was messy, and the fact that I fixed that bug on the day of the release of D6 is not an excuse.

#7

douggreen - May 13, 2008 - 17:06

What's wrong with the type? It's a string, which is correct.

#8

Damien Tournoud - May 13, 2008 - 18:51

What's wrong with the type? It's a string, which is correct.

The sid is an int column:

      'sid' => array(
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'default' => 0,
        'description' => t('The {search_dataset}.sid of the searchable item to which the word belongs.'),
      ),

But there the %d in the UPDATE query is in quotes.

#9

douggreen - May 13, 2008 - 18:11

I didn't notice the quotes around '%d'... The attached patch fixes those too.

AttachmentSize
257912.patch 1.21 KB

#10

moshe weitzman - May 13, 2008 - 19:50

Looks good to me. This is a big performance win, and should be backported IMO.

#11

Damien Tournoud - June 4, 2008 - 22:58
Assigned to:douggreen» Anonymous

Ooops. We overslept on that one.

Here is a slightly modified version of douggreen's patch (with an updated comment).

Please review!

AttachmentSize
257912_1.patch 1.38 KB

#12

douggreen - June 6, 2008 - 15:16

@Damien, thanks for fixing the comment, I obviously missed that!

Is it at all possible to keep the comment on two lines, or did it break 80 characters that way?

Please remove the @ in front of db_query. I'm not sure why and when people need to add this, but I don't think it's needed.

#13

douggreen - June 29, 2008 - 11:12

I replaced @db_query. with db_query. I tried to make that comment fit on 2 lines, but couldn't make it terse yet explanatory. But while working on it, I modified Damien's text. I'm not going to be stubborn on what the comment says, it's the code here that's really important. As moshe points out, we might want to backport this, so let's try to agree on the comment verbiage, and get this RTBC!

#14

moshe weitzman - June 29, 2008 - 11:54

@douggreen - did you mean to attach a patch?

#15

douggreen - June 29, 2008 - 14:20

doh!

AttachmentSize
257912.patch 1.46 KB

#16

Damien Tournoud - June 30, 2008 - 00:17
Status:needs review» needs work

Hum. The comment looks good. But the @ in front of the db_query is required. The INSERT should fail with a duplicated key warning.

#17

douggreen - June 30, 2008 - 21:06

What does the @ do in php? I thought it was redundant. The only place in core that there is an @db_query is in modules/node/node.module line +194, and I kinda assumed that this was a mistake.

#18

Damien Tournoud - June 30, 2008 - 21:33

The @ prefix suppresses error reporting (http://www.php.net/manual/en/language.operators.errorcontrol.php).

The only place in core that there is an @db_query is in modules/node/node.module

... as well as in cache.inc and bootstrap.inc, everywhere an INSERT query could potentially fail.

#19

douggreen - July 1, 2008 - 12:10

I think that this is ready now... new patch attached restoring the '@' sign. Damien's patch (#11) above was good enough too. My latest patch should be identical to his, just with a slightly different comment.

The reason this patch has take so long is that it just took some education on my part to understand why we needed @db_query!

AttachmentSize
257912.patch 1.46 KB

#20

Damien Tournoud - July 1, 2008 - 12:17
Status:needs work» reviewed & tested by the community

@douggreen, I like your comment better.

Marking #19 as (very well) reviewed. We finally made it!

#21

catch - August 14, 2008 - 16:44

still applies cleanly.

#22

Dries - August 14, 2008 - 20:14
Status:reviewed & tested by the community» fixed

Tested, reviewed and committed. Thanks.

#23

moshe weitzman - August 15, 2008 - 01:23
Version:7.x-dev» 6.x-dev
Status:fixed» reviewed & tested by the community

patch applies just fine to D6.

#24

Gábor Hojtsy - September 17, 2008 - 06:47
Status:reviewed & tested by the community» fixed

Thanks, committed to 6.x.

#25

Anonymous (not verified) - October 1, 2008 - 06:52
Status:fixed» closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.