Performance Issue during Indexing - search_dataset.sid_type unique key should be an Index

douggreen - May 13, 2008 - 13:07
Project:Drupal
Version:7.x-dev
Component:search.module
Category:bug report
Priority:critical
Assigned:Unassigned
Status:closed
Description

Not LIVE FROM THE MINNESOTA SEARCH SPRINT, but found on the plane ride home...

The {search_dataset} has a unique key named "sid_type". Because of the query in search_wipe, which is called on every single sid that gets indexed, this unique key should be an index.

    db_query("DELETE FROM {search_dataset} WHERE sid = %d AND type = '%s'", $sid, $type);

I think that this was an index at one time. And I think that this might of gotten mixed up in the conversion to schema api.

This patch should be backported to 6.x.

#1

douggreen - May 13, 2008 - 13:08

And here's the patch...

AttachmentSize
search_dataset_sid_type.patch 1.25 KB

#2

Damien Tournoud - May 13, 2008 - 13:28

This sid_type should in fact be a primary key.

The change from index to unique was consciously made during the 6.x development cycle. It lead to some problems (such as collations problems solved in #218403: Duplicate entry errors in search indexer), but it's a good choice.

#3

douggreen - May 13, 2008 - 15:20

I'm not too clear on the differences here. But I do know that the current "unique key" implementation is significantly impacting this DELETE which is done on every single node that is indexed. If you know a better way, please offer the patch. Thanks!

#4

Damien Tournoud - May 13, 2008 - 15:25

I know a better way: just make this key a primary key.

#5

earnie - May 15, 2008 - 15:04

@Damien: How is Primary key better than a Unique key in this instance?

#6

douggreen - May 15, 2008 - 15:43

A unique key is just a constraint, whereas a primary key is an index. You get no performance gain from constraints, but you do from indexes.

#7

Damien Tournoud - May 15, 2008 - 15:54

@douggreen: In fact, for MySQL the difference is not very clear. MySQL also creates an index for UNIQUE KEY, but it's not clear to me if there is prefix and suffix compression for these indexes, nor whereas MySQL can use it to lookup values. Moreover, PRIMARY KEYS are clustered (the physical order of rows change), whereas other index types are non-clustered.

For the {search_dataset} particular case, a PRIMARY KEY is what makes sense anyway.

#8

douggreen - May 15, 2008 - 16:19

I do know that the unique key in mysql is giving terrible performance on the DELETE FROM {search_dataset} WHERE sid = %d AND type = '%s', so it surprises me to hear that "MySQL also creates an index for UNIQUE KEY."

#9

robertDouglass - May 15, 2008 - 20:30

Subscribe.

#10

lilou - October 26, 2008 - 13:50

Reroll patch #1 and add description to update function.

AttachmentSize
issue-257910-10.patch 1.41 KB

#11

Dries - November 2, 2008 - 14:56
Status:needs review» fixed

Committed to CVS HEAD. Thanks.

#12

Damien Tournoud - November 2, 2008 - 15:02
Status:fixed» active

Sorry, why removing the unique constraint? It makes total sense!

#13

Damien Tournoud - November 2, 2008 - 15:15
Assigned to:douggreen» Anonymous
Status:active» needs review

Here is a revert of #10, changing the unique constraints in search_index and search_dataset by primary keys.

AttachmentSize
257910-search-primary-keys.patch 1.95 KB

#14

Dries - November 2, 2008 - 17:56
Status:needs review» fixed

I think you're right. Committed the follow-up patch.

#15

Damien Tournoud - November 2, 2008 - 20:40
Status:fixed» needs review

Sorry, finally forgot to drop the redundant 'word' index in the schema.

AttachmentSize
257910-search-primary-keys-followup.patch 596 bytes

#16

mfb - November 2, 2008 - 23:46
Priority:normal» critical
Status:needs review» needs work

There's an update script in system.install running on the search tables, but search module and its tables may not even be installed! This causes errors at update.php. The update script should be in search.install instead.

#17

Damien Tournoud - November 3, 2008 - 00:23
Status:needs work» needs review

Oups.

AttachmentSize
257910-search-primary-keys-followup.patch 2.27 KB

#18

webchick - November 3, 2008 - 06:00
Status:needs review» needs work

This no longer applies because of some whitespace changes in #303889: Impossible to update D6 -> D7 but should be a quick re-roll.

#19

chx - November 3, 2008 - 06:04
Status:needs work» needs review
AttachmentSize
257910.patch 2.1 KB

#20

kbahey - November 3, 2008 - 06:14
Status:needs review» reviewed & tested by the community

I tested this on top of #46 in http://drupal.org/node/303889, and that combination made update.php work again.

#21

webchick - November 3, 2008 - 06:33
Status:reviewed & tested by the community» fixed

Third time's a charm? :)

#22

Damien Tournoud - November 3, 2008 - 08:29

I'm pretty sure this issue will came back to haunt us. A ghost will step in, and say "this is an *horrible* idea, please revert". The only question is: when will that happen?

--project followup subject--

System Message - November 17, 2008 - 08:58

Automatically closed -- issue fixed for two weeks with no activity.

--project followup subject--

System Message - November 17, 2008 - 09:04

Automatically closed -- issue fixed for two weeks with no activity.

#23

System Message - November 17, 2008 - 09:12
Status:fixed» closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.