When writing hook_update_N() functions to create a new table, it seems natural and obvious to use the module's hook_schema() function to access the current definition of the table to avoid duplicating the table definition in the hook_update_N() function. However, you cannot safely use hook_schema() from within a hook_update_N() function.

Consider the following scenario: You created the module example. Initially, it has no tables, and its initial schema is empty. For version 6.x-1.5, you define a new table T. This means that you create the file M.install and, in it, create the function example_schema() which defines the table T:

function example_schema() {
  $schema['T'] = array(
    'fields' => array(
      'a' => array('type' => 'int'),
      'b' => array('type' => 'int'),
    )
  );
  return $schema;
}

You also write an update function to bring module M from version 6.x-1.0 to version 6.x-1.5 by creating table T:

function example_update_6100() {
  $schema = drupal_get_schema('M');
  $ret = array();
  db_create_table($ret, 'T', $schema['T']);  // DON'T DO THIS!
  return $ret;
}

For the 6.x-1.8 release, M no longer needs the field T.b. So, you update example_schema() to reflect the current schema and add an update function:

function example_schema() {
  $schema['T'] = array(
    'fields' => array(
      'a' => array('type' => 'int'),
      // NOTE: field 'b' has been removed
  ));
  return $schema;
}

function example_update_6100() {
  $schema = drupal_get_schema('M');
  $ret = array();
  db_create_table($ret, 'T', $schema['T']);  // DON'T DO THIS
  return $ret;
}

function example_update_6101() {
  $ret = array();
  db_drop_field($ret, 'T', 'b');
  return $ret;
}

Everything looks fine, but it isn't. The problem is that example_update_6101() cannot assume the field T.b exists. Dropping it can result in a failed SQL query. To understand why, consider two case histories:

  1. User 1 installs version 6.x-1.0 of the module example, upgrades to version 6.x-1.5 when available, and upgrades to version 6.x-1.8 when available. This works fine.
  2. User 2 installs version 6.x-1.0 of example.module, does not upgrade to version 6.x-1.5 when available, but does upgrade to version 6.x-1.8 when available. When user 2 upgrades to version 6.x-1.8 and runs update.php, example_update_6100() and example_update_6101() both run but example_schema() already contains the new structure. When example_update_6100() calls db_create_table($ret, $schema['T']), field T.b is not created. Therefore, when example_update_6101() tries to drop it, an error occurs.

The crux of the problem is that hook_schema() always defines the current schema for the module. However, hook_update_N() functions have to assume that the database matches the schema as it was when the hook_update_N() function was written.

The solution is simple. Do not refer to your own module's hook_schema() from within an update function. Instead, make everything explicit. In the above example, example_update_6100() has to specify the table structure itself, like this:

function example_update_6100() {
  $schema['T'] = array(
    'fields' => array(
      'a' => array('type' => 'int'),
      'b' => array('type' => 'int'),
  ));

  $ret = array();
  db_create_table($ret, 'T', $schema['T']);  // THIS IS SAFE BECAUSE $schema IS EXPLICIT
  return $ret;
}

This way, each function in the sequence of updates always knows exactly what it is doing. Sure, it looks redundant to include the definition of table T in both example_schema() and example_update_6100() but that is temporary. In the future when the definition of table T has radically changed, example_update_6100() will still contain its original definition and look nothing at all like the definition in example_schema().

Comments

artur.ejsmont’s picture

I noticed that drupal_write_record function can lead to the same type of issues if used in hook_update_XXXX.

It uses current schema definition calling "hook_schema" not inspecting the schema that is in the database. Let me draw a scenario for you:

1. my first release has table "table" in hook_shema with column A
2. my update 6001 has code like this

    $ret = array();
    db_add_field($ret, 'table', 'B', array('type' => 'varchar', 'length' => 32, 'not null' => TRUE, 'default' => '') );
    $object = new stdClass();
    $object->A = 'ok';
    $object->B = 'not ok';
    drupal_write_record(          'table', &$object );
    return $ret;

so i expect to have row in table 'table' with value "not ok" in it after i run the patch right?
3. i have another update 6002 where i rename table or do something of that sort
I also update hook_schema so there is no table 'table' in it any more
4. some user gets module and makes fresh install - all ok as only hook_schema is called which has the new table name and my drupal_write_record is in the install hook.
5. But. Now lets assume i have module with my oryginal schema installed, i update code so now hook_schema (the code was updated) already has new table name but database does not. If i run update 6001 now, i refer to incorrect table name and insert does not work.

Reason of the problem is that im refering to hook_schema from hook_update_XXXX (indirectly) even that i would never expect it to happen .

I got similar problem with drupal_write_record when column name was not in schema ... there was no error as column did not appear in the hook_schema definition it was just silently ignored!

Hope it helps someone.

Art
--------
http://artur.ejsmont.org