On installation of the current 8.x I get the following error when using MyISAM:

Syntax error or access violation: 1071 Specified key was too long; max key length is 1000 bytes: CREATE TABLE {taxonomy_term_data} ( `tid` INT unsigned NOT NULL auto_increment COMMENT 'Primary Key: Unique term ID.', `uuid` VARCHAR(128) NULL DEFAULT NULL COMMENT 'Unique Key: Universally unique identifier for this entity.', `vid` VARCHAR(255) NOT NULL DEFAULT '' COMMENT 'The ID of the vocabulary to which the term is assigned.', `langcode` VARCHAR(12) NOT NULL DEFAULT '' COMMENT 'The language.langcode of this term.', `name` VARCHAR(255) NOT NULL DEFAULT '' COMMENT 'The term name.', `description` LONGTEXT NULL DEFAULT NULL COMMENT 'A description of the term.', `format` VARCHAR(255) NULL DEFAULT NULL COMMENT 'The filter_format.format of the description.', `weight` INT NOT NULL DEFAULT 0 COMMENT 'The weight of this term in relation to other terms.', PRIMARY KEY (`tid`), UNIQUE KEY `uuid` (`uuid`), INDEX `taxonomy_tree` (`vid`, `weight`, `name`), INDEX `vid_name` (`vid`, `name`), INDEX `name` (`name`) ) ENGINE = InnoDB DEFAULT CHARACTER SET utf8 COMMENT 'Stores term information.'; Array ( )

Specified key was too long of the table taxonomy_term_data

Reproducible on http://simplytest.me/project/drupal/8.x

About the server:

  • MySQL 5.5 (5.5.24-0ubuntu0.12.04.1) using MyISAM
  • PHP 5.3 (5.3.10-1ubuntu3.4)

Related issue
#1852896: Throw an exception if a schema defines a key that would be over 1000 bytes in MySQL

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

patrickd’s picture

To reproduce this locally you have to make sure to use MyISAM instead of the default InnoDB engine

1. Edit your mysqld settings-file (on ubuntu it's /etc/mysql/my.cnf)
2. Goto [mysql] and add the following lines

skip-external-locking
skip-innodb
default_storage_engine=MyISAM

3. Restart the mysqld by service mysql restart or /etc/init.d/mysql restart
Note that your not able to use your old InnoDB databases anymore with this configuration, you either need to convert them or create a new Database with this config for installing drupal 8.



On simplytest.me I'm using MyISAM because it turned out to be 20%-30% faster on installations, as there are more installations than actual site usage on the service I'd like to keep it that way.

Berdir’s picture

Something similar happened recently in #347988: Move $user->data into own table, see comment #117.

Sounds like this could use some sort of test, loop over all hook_schema() implementations and calculate the index length?

catch’s picture

Priority: Normal » Critical
sun’s picture

Status: Active » Needs review
FileSize
1.52 KB

This bug needs to be fixed independently, but FYI, we're trying to introduce an automated validation to prevent this from happening in the future: #1852896: Throw an exception if a schema defines a key that would be over 1000 bytes in MySQL

A) We can either limit the maximum field length of 'vid' (and thus ALL vocabulary IDs/machine names) to say, 64 characters.

B) Or we can limit the schema index lengths only.

C) Both.

I'd personally prefer to do A), and would actually like to start to think about introducing a common/globally accepted maximum length for config entity IDs/machine names of 64 chars (or if absolutely needed, 128 chars). FWIW, #type 'machine_name' defaults to a #maxlength of 64 since D7 already. That looks appropriate to me.

Damien Tournoud’s picture

Alternatively, is there any reason not to make vid a varbinary? That would also make the indexes more compact, which is a really good thing.

patrickd’s picture

#4

Your removing
- '#maxlength' => 255,
shouldn't there be a
+ '#maxlength' => 64,
?

Anyway, patch works and solves the issue on MyISAM! :)

thanks

sun’s picture

I looked into @Damien's suggestion in #5 - I did not verify them, but the user comments on http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html are rather concerning.

sun’s picture

Title: MyISAM support broken » Taxonomy module fails to install on MyISAM due to too long schema index

Better issue title.

AFAICS, the only remaining part would be to double-check the D7 machine_name column of vocabularies, and if needed, perform some automated truncation of machine names (which would have consequences elsewhere).

And sadly, yes, D7's taxonomy_schema() defines a length of 255...

We will have to truncate the maximum length in one way or the other at some point, since #1186034: Length of configuration object name can be too small is still also on the table.

Berdir’s picture

See also #1852454: Restrict module and theme name lengths to 50 characters. I think 64 sounds like a good default length for most of these identifiers, be it modules, machine names, identifiers.

sun’s picture

Related: #1701014: Validate config object names is about to add a validation for the maximum config object name length.

We will need the identical solution for #1779026: Convert Text Formats to Configuration System — the filter format machine name also had a maximum of 255 chars in D7. :-/

I guess we'll need a update.inc helper function that allows to check existing machine names and potentially truncates them, though still ensuring uniqueness (hence, requiring to pass-in all existing names).

Something like this should do it:

function update_truncate_machine_names(array &$names, $maxlength = 64) {
  foreach ($names as &$name) {
    // Only truncate names that are too long.
    if (strlen($name) > $maxlength) {
      // Check truncation to full 64 chars first.
      $name = substr($name, 0, 63);
      // Initialize suffix counter. (Humans start to count at one.)
      $suffix = 1;
      // Loop until the name is unique.
      while (in_array($name, $names)) {
        $name = substr($name, 0, 61) . '_' . $suffix++;
      }
    }
  }
}
andypost’s picture

FileSize
595 bytes

Suppose while we have no clue about length of machine name let's just make a index shorter

Status: Needs review » Needs work

The last submitted patch, 1871032-vid-index-11.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review

#11: 1871032-vid-index-11.patch queued for re-testing.

xjm’s picture

I agree with sun; I'd much prefer to enforce a maximum length for the values rather than #11.

Berdir’s picture

So do I.

But that might require more discussion, bigger changes, update functions, tests.. And it does not prevent the index length problem completely (Nothing stops you from adding an index over 5 machine name columns which might be a problem again).

What about getting the stop-gap in, so that we can get the index length validation in and then discuss cutting the max length down (64 sounds fine to me)

xjm’s picture

Yeah this is probably a duplicate of #1709960: declare a maximum length for entity and bundle machine names in the longrun, so I guess this patch is okay as a stopgap fix.

thedavidmeister’s picture

So you're saying to apply #11 now so that people can at least install the 8.x branch for dev but leave this issue open? or close this issue and immediately re-open a new critical issue about the data loss that will happen for people upgrading their D7 sites?

sun’s picture

Status: Needs review » Reviewed & tested by the community

I think it makes sense to move forward with #11, in order to unblock related issues.

But in the end, we have to do #4/#10 anyway at some point, and it doesn't make sense for me to push the problem to yet another new issue, so I'd suggest to commit #11 but revert this issue to active/needs review/work after committing, so we can investigate and do #4/#10 afterwards.

Net result: Other issues unblocked. This issue still critical. To figure out how to resolve the upgrade path for vocabulary machine names, and ideally, supply a generic helper facility + pattern for all other instances of the problem.

thedavidmeister’s picture

Seems fair. Pretty nasty that you can't install D8 at all right now on some servers.

Once this whole D8 thing is done, maybe we could try to think of a more sophisticated way to handle relations, dependencies and progress in issues. It really gets confusing to come into epic core threads that is already months old that is "fixed with a bandaid patch but left open for X".

Anyways, how does limiting the length of columns affect exporting/importing objects from Drupal 7 sites that were upgraded to D8 and presumably could have longer machine names already in the db than "fresh" D8 sites. I assume we need to handle changing the schema in the update script as well, not just on install?

Also, is it really a good idea to start numbering truncated machine name sequentially? doesn't that re-introduce the main problems that machine names were implemented to fix?

Ie.

Site A has in the database:

vocab 1 = long_name_foo
vocab 2 = long_name_bar

Site B has in the database:

vocab 1 = long_name_bar
vocab_2 = long_name_baz
vocab 3 = long_name_foo

Totally fine to deploy between the two currently using machine names. Run an update script that starts renaming them sequentially based on some query and things could start to get hairy. Even if you do something like order the machine names alphabetically it doesn't help if you have vocabs mid-alphabet that only exist in one instance of the site.

Would it be fair to say that a hash of the long name would be safer as it preserves the "identity" of the machine name, even if it is a little less human friendly at that point?

xjm’s picture

@thedavidmeister, probably best to discuss over in #1709960: declare a maximum length for entity and bundle machine names, but the idea is not that we would always truncate machine names. We would only do it for machine names that are being upgraded from D7 and have more than the maximum allowed characters. Starting in D8, it will become impossible to even create a machine name length over the maximum length.

thedavidmeister’s picture

@xjm - I understand that there is no problem for brand new D8 sites. I'm talking about two D7 sites that currently sync content or configuration or both based on machine name. When both are upgraded to D8 they may start "syncing" the wrong data if the truncation is based on a sequential algorithm, depending on the current states of their database.

Not a problem if one of the sites is just a dev instance of the other and can have the database overwritten with the prod site. If you have two prod sites syncing data by machine name then the sequential approach could really break things.

xjm’s picture

@thedavidmeister, Ah, now I understand what you are saying. I've added that to the thread at #1709960: declare a maximum length for entity and bundle machine names.

thedavidmeister’s picture

Also, IMO this issue should be closed when the patch is applied and the upgrade path discussion continued at #1709960: declare a maximum length for entity and bundle machine names (as well as bumping that issue to critical).

The title of this bug implies it is really just about the installation process, that other issue has some more discussion on how best to handle the upgrade process.

catch’s picture

Status: Reviewed & tested by the community » Needs work

No upgrade path for the index change?

andypost’s picture

Status: Needs work » Needs review
FileSize
1.04 KB

Added update function

thedavidmeister’s picture

Status: Needs review » Needs work

Nobody thinks we shouldn't prioritise an upgrade path... at the moment there's just two parallel discussions about the same issue and it really doesn't matter where our efforts go but let's not have extensive meta discussions about where the final patch should live and just focus on what it should look like.

Catch, are you advocating that we should *not* push the patch in #11 that prevents some people from installing D8 right now because we don't have an upgrade path for D7 users yet? or are you just saying that this situation "needs work" overall?

Is an md5 hash of existing machine names that are too long ok? Is there a case to not use hashes, other than human readability? (I do know that I used to swear at Panels every time it did that to me until somebody told me why it was doing that...)

The numeric thing is obviously a far more user friendly option but there's at least one easy way to confound it so I'm really worried it will just ruin somebody's day.

Also, is there somewhere where should be documenting changes like this as we go so they can be picked up by Coder or similar when the contrib migrations begin?

thedavidmeister’s picture

aaah looking at andypost's patch makes sense to me. I know what you're talking about now catch :)

catch’s picture

Status: Needs work » Needs review

@thedavidmeister I'm saying we should update the index change for 7.x sites so that 8.x and 7.x upgraded sites are kept in sync at all times.

As soon as we make schema changes in 8.x that aren't reflected in 7.x we introduce hidden bugs for the 7-8 upgrade path that might never get fixed, since there is no test coverage for that.

#25 looks good if it comes back green.

thedavidmeister’s picture

yeah, i got that when I looked in the patch. I think i just misunderstood what you were saying.

thedavidmeister’s picture

lol:

we absolutely need boilerplate upgrade tests, but whoever the unlucky person is who posts the first update hook to Drupal 8 is can do those

so, that would be andypost?

catch’s picture

No we already have some upgrade path tests fortunately, that one is only to have generic testing for schema changes.

andypost’s picture

Patch is green, needs RTBC

sun’s picture

Status: Needs review » Reviewed & tested by the community

Thanks!

catch’s picture

That should happen in taxonomy_update_8006() where the field is changed.

That should have been done anyway in fact in the original commit, but I missed it, see comment on db_change_field():

IMPORTANT NOTE: To maintain database portability, you have to explicitly recreate all indices and primary keys that are using the changed field.

That means that you have to drop all affected keys and indexes with db_drop_{primary_key,unique_key,index}() before calling db_change_field(). To recreate the keys and indices, pass the key definitions as the optional $keys_new argument directly to db_change_field().

vs. the update:

function taxonomy_update_8006() {
  db_change_field('taxonomy_term_data', 'vid', 'vid', array(
    'type' => 'varchar',
    'length' => 255,
    'not null' => TRUE,
    'default' => '',
    'description' => 'The ID of the vocabulary to which the term is assigned.',
  ));

catch’s picture

Status: Reviewed & tested by the community » Needs work
andypost’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
1.48 KB
webchick’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed to 8.x. Thanks!

Do we also need this change in 7.x?

andypost’s picture

@webchick D7 is not affected by this

xjm’s picture

Here are the indexes from taxonomy in D7:

    'indexes' => array(
      'taxonomy_tree' => array('vid', 'weight', 'name'), 
      'vid_name' => array('vid', 'name'), 
      'name' => array('name'),
    ),
    'indexes' => array(
      'term_node' => array('tid', 'sticky', 'created'), 
      'nid' => array('nid'),
    ), 

Since only the name field is a non-integer, they're all safely below 333 characters. Converting to a (very long) machine name for vocabulary is what caused the issue in D8.

webchick’s picture

Gotcha, thanks. :)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.