The recently-applied search patch from http://drupal.org/node/146466 causes database schema inconsistency on mysql (at least) after an upgrade to 5.3 to 6. schema.module reports:

* search_dataset
o indexes sid_type: missing in database
o unique keys sid_type: unexpected (not an error)
* search_index
o unique keys word_sid_type: missing in database
o unique keys sid_word_type: unexpected (not an error)

The "missing" indices are in the schema but not the database. The "unexpected" indices are what actually exist in the database post-upgrade; schema.module never reports extra indices as an error.
It looks like the indexes are not properly updated to reflect the new schema. Needless to say this will probably result in some pretty poor performance.

CommentFileSizeAuthor
#7 192348.patch1.57 KBdouggreen
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

douggreen’s picture

Barry, I don't understand what schema module is report that you think I need to fix. IIRC correctly, and I want to be careful to not speak for someone else, but I believe that Steven thought that search module had a few too many indexes. Are you saying that we removed one inconsistently? Or are you saying that we removed one that we really need?

bjaspan’s picture

I have no opinion about what database indexes should exist for the search tables. However, the indexes that exist in the database and the indexes that are declared by search_schema() should be the same. Currently, if you upgrade a site from D5.3 to D6, they are not the same. The differences are shown by the schema comparison report.

For example, the report shows that the search_index table is declared (in search.install) to have a unique index named word_sid_type but, in the database, that unique index does not exist. Instead, a unique index named sid_word_type exists. This is not just an index naming issue; if the column orders in the index are different (as suggested by the name) then the database index is useless and Drupal's search module for sites upgraded from 5.3 to 6 will be very, very slow.

JirkaRybka’s picture

Also differently named indexes may be hardly referenced in future hook_update_n() functions, if needed to change them.

douggreen’s picture

Assigned: Unassigned » douggreen

{search_index} appears to be a naming problem. In hook_schema it's called word_sid_type, but in system_update_6036, it's called side_word_type. I can fix that in another system update if you think it's going to cause some confusion.

I might see the problem with {search_dataset}. There is a unique key on {search_dataset}. Is there a difference between a unique key and a unique index. I actually thought they were the same, but maybe they aren't. Based on my schema definition, do I need to create a new hook_update and drop the KEY and create an INDEX?

This is the schema definition.

  $schema['search_dataset'] = array(
    ...,
    'indexes' => array('sid_type' => array('sid', 'type')),
  );

And this is the update code (only used for MySQL):

     $ret[] = update_sql("ALTER IGNORE TABLE {search_dataset} ADD UNIQUE KEY sid_type (sid, type)");
bjaspan’s picture

Looking at search.install I see:

  $schema['search_index'] = array(
    ....
    'unique keys' => array('word_sid_type' => array('word', 'sid', 'type')),
  );

Looking system_update_6036() I see:

      $ret[] = update_sql("ALTER IGNORE TABLE {search_index} ADD UNIQUE KEY sid_word_type (sid, word, type)");

So should the index be on (word, sid, type) or on (sid, word, type)? This isn't just about confusion based on the name, they need to be consistent. Otherwise, some queries will use the index for sites created from a fresh install but when the site has been upgraded from D5, or vice versa. Nightmare! To fix this, you do not need to create a new system_update function; just decide which column order is right for the index and then either change the schema or change system_update_6036() (Drupal's policy is not to write update functions from HEAD to HEAD; just modify 6036 since D6 isn't released yet). Since I don't know anything about the history of this code I can't tell you which one is right to change.

Regarding key/index/unique, here's the scoop: KEY and INDEX are synonyms to mysql. "KEY" == "INDEX" == non-unique index which means more than one row can exist with identical values in all the index columns. "UNIQUE KEY" == "UNIQUE INDEX" == unique index which means that multiple rows with identical values in all the index columns are forbidden. The reason for the distinction is that databases can make substantially different optimizations based on the knowledge that an index is either unique or not unique. (FYI, just to complete the picture, "PRIMARY KEY" == "UNIQUE KEY" where additionally all of the index columns must be NOT NULL, and there can only be one such index per table).

In Schema API, 'indexes' declares non-unique indexes (what mysql calls INDEX a.k.a. KEY) and 'unique keys' declares unique indexes (what mysql calls UNIQUE KEY a.k.a. UNIQUE INDEX). 'unique key' is called 'unique key' and not 'unique index' just because 'key' sounds more like something that is unique.

So, in search.install, I see:

  $schema['search_dataset'] = array(
    ...
    'indexes' => array('sid_type' => array('sid', 'type')),
  );

and in system_update_6036() I see

      $ret[] = update_sql("ALTER IGNORE TABLE {search_dataset} ADD UNIQUE KEY sid_type (sid, type)");

So in one place you are declaring a non-unique index and in the other a unique index. This is a big problem, but again fixing it is easy. Just decide which one is right and do the same thing in both places.

One other thing. I see this comment in system_update_6036():

      // Add a unique index for the search_index.
      // Since it's possible that some existing sites have duplicates,
      // create the index using the IGNORE keyword, which ignores duplicate errors.
      // However, pgsql doesn't support it

I did not follow all the details of this patch but, ummm... unique indexes are supposed to be unique. I have no idea what the consequences are of using ALTER IGNORE TABLE to make mysql ignore the fact that you are declaring an illegal unique index. At a guess, I'd say that mysql's "unique" index for that table will exclude rows and the search module will end up giving incorrect results in very subtle, hard-to-debug ways. Are you really really sure you understand what you are doing by intentionally creating a "unique" index on non-unique sets of rows?

To put it another way, there is a reason pgsql does not support this. Drupal almost certainly should not be doing it.

bjaspan’s picture

I just read about ALTER IGNORE TABLE. The docs say:

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Well, okay. The behavior is clearly specified. If you wrote system_update_6036() to expect that ALTER TABLE IGNORE will delete rows from the table that violate the unique-key constraint and that is the behavior you want, fine. I'm concerned, though, because your comment says "Since it's possible that some existing sites have duplicates,
create the index using the IGNORE keyword, which ignores duplicate errors." mysql does not *ignore* duplicate errors, it deletes the rows that would cause duplicate errors. Is that what you meant it to do?

Note that whether or not using ALTER IGNORE TABLE is correct for system_update_6036(), you still need to make sure that for both search tables in question the unique vs. non-unique property and column order of indexes match for new Drupal 6 installations and for Drupal 5 to Drupal 6 upgrades.

douggreen’s picture

Status: Active » Needs review
FileSize
1.57 KB

Yes, that's the behavior I want -- delete the duplicate rows, and both tables need to have UNIQUE keys. Thanks for catching this inconsistency!

Dries’s picture

I've committed the patch in #7 and as such I'm lowering the priority. I haven't tested this with the schema.module which is why I leave the patch as 'code needs review'. Barry, do you think you could give this a try and mark it 'fixed' if all the problems disappeared?

Dries’s picture

Priority: Critical » Normal

I committed this patch to CVS HEAD and as such I'm lowering the priority. I haven't tested it against the schema module yet, which is why I'm not marking this as 'fixed' yet. It would be great if Barry, or someone else, could confirm that this problem has been fixed. Thanks.

bjaspan’s picture

Status: Needs review » Fixed

This patch results in zero schema inconsistencies for a fresh install and 5.3 to 6 upgrade on mysql and pgsql.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.