Once a DB table is created though db_create_table() I would expect that the previously cached DB schema (see drupal_get_schema) would be invalidated.

Instead every time db_create_table() is executed after that the DB schema is already cached the newly create table schema won't be inserted in the cached schema resulting in the fact that subsequents calls to drupal_write_record will fail.

Currently if one needs to insert something into the table after a db_create_table() has to call drupal_get_schema('table', TRUE) before doing any insertion. I think this is pretty illogical as IMHO one would just expect that the newly created table to be in the DB schema.

This easily leads to bugs for example in .install files and update functions.
For example: http://drupal.org/node/716506#comment-2612184

I would suggest adding a call to call drupal_get_schema('table', TRUE) in db_create_table() so that the cached schema will be refreshed..
Another approach would be making drupal_get_schema intelligent enough to understand that it's cached schema is out of date but this probably would have some problems with performance..

Hope this helps,
I'm interested in hearing your comments.

Fabio Varesano

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sun’s picture

Category: feature » task
Crell’s picture

I am not sure that adding the line to db_create_table() itself is a good idea. For one thing, it's just a procedural wrapper around the OO DB layer, and direct function calls within the DB layer classes is in most cases a bug. For another, tables are generally created en masse, like on install. Clearing and rebuilding the cache after every single alteration to the schema is needless thrashing. Plus, the schema cache is based on hook_schema() anyway, not the actual tables themselves. :-)

Rather, I think we should be making sure that after any operation likely to affect the schema we clear the cache. Eg, after each update hook, or after a series of update hooks gets called. In fact, don't we do that already since the cache is cleared after an update to begin with?

sun’s picture

Similar thoughts crossed my mind when reading the other, original issue. My thought was that a module update, which adds a table, will have the new table already defined in hook_schema().

I quickly tried to look up whether we clear the schema cache before executing updates, and probably we're not - I think that's more likely to be the cause here.

Crell’s picture

fax8: I don't quite follow the use case where the current setup would be causing a problem. Can you give an example? I couldn't quite follow how the Mollom issue you linked is related.

fax8’s picture

Basically the linked mollom bug was affected by this "inconsistence" because they had a series of serialized drupal variables (mollom_form_*) which they were trying to move to a dedicated table during an update function.

The logical approach they used was to create a new table with db_create_table() then populate the new table with subsequent calls to drupal_write_record() .. unfortunately the db schema isn't modified after a db_create_table() so that this check

$schema = drupal_get_schema($table);
  if (empty($schema)) {
    return FALSE;
  }

failed resulting in loosing the data to be inserted.

So, my point is that each time that a table is created/modified/deleted I would expect that the changes to be available in subsequent calls to drupal_get_schema() .. but, as the schema get cached, this doesn't happen.

Dave Reid’s picture

More clearly you can't do this in an upgrade currently:

function mymodule_upgrade_7000() {
  db_create_table('my_new_table', array(
    ... // table definition
  ));

  drupal_write_record('my_new_table', array(...));
}

I think we have a similar issue with #620298: Schema not available in hook_install() but this is for updates, not installs.

Crell’s picture

Right, that's a long-standing issue with Schema API. You cannot rely on the schema being valid during an update hook because it's "in flux" at that point, and the caches won't be cleared until after the update process is done. I am not really sure how to solve that.

sun’s picture

Title: db_create_table() should invalidate/refresh cached DB schema » Updated DB schema not available in module update functions
Category: task » bug
Status: Active » Needs review
FileSize
790 bytes

Somehow, I thought it would be as simply as this.

Thoughts?

Crell’s picture

Can you include a unit test to verify that it works? If it really is that simple that would make a lot of people very happy... :-)

Dave Reid’s picture

I'm not sure how #8 fixes the situation at hand.

sun’s picture

Let me try to explain why it should fix this bug - using your example snippet:

function mymodule_schema() {
  // ...other, older stuff...

  $schema['my_new_table'] = array(
    // ...
  );
  return $schema;
}

function mymodule_upgrade_7000() {
  // Create our new table, *as defined in mymodule_schema()*.
  db_create_table('my_new_table', array(
    // ...
  ));

  // At this point, HEAD currently does not know of 'my_new_table', since
  // update.php does not flush the cached schema prior to executing updates.
  // Therefore, drupal_write_record() fails, because it is based on the stale,
  // cached schema.
  // My simple patch makes update.php flush the schema cache (once) - prior
  // to executing all module updates.
  // Therefore, we change the /actual/ database schema above with
  // db_create_table(), and are able to use drupal_write_record() here, since
  // the cached schema has been flushed, and the NEW schema information from
  // hook_schema() is available by default.
  drupal_write_record('my_new_table', array(...));
}

I'm not sure how to properly test this though.

nonsie’s picture

subscribing

Dave Reid’s picture

Yeah unfortunately any call to drupal_write_record would then cache the schema, and any other update in another module that tries to do the same thing is screwed.

sun’s picture

@Dave: Sorry, I'm unable to follow you. Could you elaborate a bit more, please?

drupal_write_record() does not cache the schema, it tries to use it. It currently cannot use it, because the schema is outdated.

This patch is essentially doing exactly the same as the fix that we applied to Mollom's module update. Just "once for all" instead.

fax8’s picture

@Dave not if all the table schema update/create functions care to invalidate the cached schema or modify with the new info the existing cached one.

Crell’s picture

fax8: The schema update routines affect the DATABASE. The schema definitions that drupal_get_schema() uses come from the hook_schema() implementations. Drupal does nothing at present to keep those in sync for you. That is an entirely manual process, and will remain so (unfortunately) for Drupal 7. The database-modification functions have no reason to be messing with the manual declarations one way or another.

Better solution: Just don't use drupal_write_record() in an update hook. Honestly I don't really understand what purpose it serves anymore with the insert, update, and merge query builders.

David_Rothstein’s picture

drupal_write_record() is very useful still because it allows you to pass (potentially unknown) objects rather than having to specify every single database column. In theory it's not quite as critical in an update hook - but the fact is that many API functions rely on drupal_write_record(), so if that doesn't work it means you can't use your module's API functions either. That was the issue that made #620298: Schema not available in hook_install() so annoying, and we really should fix this one too.

I think @sun's patch is correct as is, and clearing the cache only that once is all that is needed.... Here is how I understand it:

  • The situation we are imagining here is that one or more modules have updated code, which includes both (a) changes to their hook_schema() definition, and (b) new update functions that make existing sites match that schema definition.
  • The problem, though, is that hook_schema() has never been invoked for any of the new code, so the cached data is out of date.
  • By clearing the cache this once at the beginning, we ensure that the first time drupal_get_schema() is called - e.g., as a result of drupal_write_record() - it will do a module_invoke_all('schema') type thing and cache the correct data for all modules at once.
  • So the first call to drupal_write_record() will indeed populate the cache, but that cache will be correct because all modules being updated at the same time will have been correctly queried.

Anyway, if we can't test this (or even if we can) we should at least try to expand the code comment a bit to be more specific about how this works; it's a very tricky and confusing situation.

Crell’s picture

If a cache flush right before we start the update process works, I'm good with it. Although if it were that easy, why didn't we ever do it before? :-) It may be worth asking bjaspan for his input here, since he wrote the schema system originally.

bjaspan’s picture

Sadly, I do not think the simple patch will work.

hook_schema() reflects what the db will look like after all updates have been run. The current problem is that the cached schema structure is not updated until after updates are run, so that means that in the window between a call to db_create_table() and when all updates are done, the schema structure is wrong. Changing Drupal so the schema structure is updated before updates are run just changes the window when the schema structure is wrong to time between before the updates are run to when all the necessary db_create_table() statements are executed. Either way, there is a still a window when the schema structure does not accurately reflect the database. Also consider that it isn't just db_create_table() that can trigger this situation, but any db alteration.

Here's a scenario showing the problem with updating the schema before updates:

1. Module Foo 1.0 and Bar 1.0 are installed
2. Module Bar 1.1 is installed which requires module Foo 1.1, so both are installed at once.
3. The update to Foo 1.1 changes Foo's tables.
4. Prior to updates being run, the schema structure is wiped. Now the schema structure reflects Foo's 1.1 table structure, but the db still has the Foo 1.0 table structure.
5. Bar's updates run first. bar_update_1() calls drupal_write_record('foo', ...).
6. Boom.

Here's another scenario:

1. Module Foo 1.0 is installed.
2. Module Foo 1.1, 1.2, and 1.3 are released with db updates, but a site owner chooses not to install them.
3. The site owner installs Foo 1.4.
4. Before updates are run, the schema structure is wiped. Now the schema structure reflects Foo's 1.4 table structure, but the db still has the Foo 1.0 table structure.
5. foo_update_1_1() calls db alteration functions to bring the db up to the 1.1 schema, then calls drupal_write_record('foo', ...).
6. Boom.

The problem here is that the Schema API is not keeping the cached data structure in sync with the db tables, but db_write_record() assumes that it does.

The only completely correct solution I can think of at the moment is for the Schema API functions update the cached schema structure for the change they are making, so it always reflects the actual state of the database in real time. Yes, this means that for the duration of update functions, the *correct* schema structure is *not* the sum of what is returned by hook_schema(), because hook_schema() assumes all updates have been run. Once all updates are run, the schema cache can be wiped and replaced by the results of hook_schema(), though unless there are bugs there will be no difference between that and the dynamically updated schema structure.

To put it another way, this is all just a slightly more complex case of an API not maintaining consistency due to caching. Callers should never have to know that an API performs internal caching. See http://jaspan.com/dx-files-static-caching-drupal.

sun’s picture

Thanks for reviewing, Barry!

However, you will get the exact same or similar errors if you remove the "flush-schema-cache-prior-running-updates" steps from both of your examples, because the examples are focusing on an entirely different topic: update function dependencies.

The only difference this patch will lead to is that when module B depends on A and the updates ran in proper order, then both module B and A will be able to use API functions that may use drupal_write_record() to insert/update data.

In D7, both scenarios could actually be resolved with hook_update_dependencies(), and perhaps the use-cases for this hook are much broader in contrib than we thought.

Technically, I'd agree that a "perfect" solution would update the updated parts of the schema immediately after performing the update. However, I'm not sure whether that is doable within the current system.

Thoughts?

David_Rothstein’s picture

Right, Barry's first example sounds like a case of module Bar not properly implementing hook_update_dependencies() when it should have.

But the second example, I don't think is? That one looks like it might be a problem...

sun’s picture

1. Module Foo 1.0 is installed.
2. Module Foo 1.1, 1.2, and 1.3 are released, all with DB updates, but not installed.
3. Module Foo 1.4 is installed.
4. Updates are run, without flushing schema cache before.
5. foo_update_1_1() alters the database, invokes drupal_write_record().
6. Boom.

No difference. Actually, that's like it is now.

However, I can absolutely see the additional problem of changing schemas across multiple updates + API and helper functions failing due to this. But then again, that's a different issue:

1. Foo 1.0 installed.
2. Foo 1.1 - 1.4 released, schema changes everywhere, but only Foo 1.4 is installed.
3. Prior running updates, flush all caches.
4. Boom.

But yeah, I see now that it's totally error prone to use this function during updates. So, can we prevent this from happening please?

bjaspan’s picture

re #22, yes my exact point is that regenerating the schema cache before *or* after all updates are run has the same problem. That's why the patch in #8 won't work.

The patch in #22 correctly reflects the fact that it isn't save to use drupal_write_record() during updates, so it seems like an improvement over the current situation.

Well... it's an improvement in terms of total correctness, but then it also prevents the function from being used in situations where a module authors knows it will work fine, such as an update function for a module that has NEVER changed its tables and just wants to create some new rows. However, the limitations on when doing that is truly safe are very subtle (e.g. if the module has ever updated a table in a prior update, it isn't safe), so it is probably best to outlaw it entirely.

sun’s picture

Exactly... the module author may know that now, but if all of a sudden the schema of the affected tables is changed or needs to be changed, the module author would have to rewrite all of his updates. Hence, better don't allow it upfront.

Just double-checked core .install files and fortunately, we're not using drupal_write_record() anywhere. ;)

Crell’s picture

Status: Needs review » Needs work

There are contrib modules for D7 that are starting to use drupal_write_record() to save entities. Actually I think even core does that. #22 would effectively prevent $entity_save() from ever being possible inside an update hook, which is a non-starter to me.

bjaspan’s picture

Options that I am aware of:

1. Leave things the way they are, remaining vulnerable to subtle bugs under unpredictable circumstances.
2. Fix the Schema API functions so that drupal_get_schema() always tells the truth; the "right" solution.
3. Forbid any operation during update operations that might trigger the subtle and unpredictable bugs.

I point out that #2 is a not an API change, it is a bug fix that makes the APIs work as advertised.

sun’s picture

@Crell: Are you sure? :)

http://api.drupal.org/api/function/file_save/7
http://api.drupal.org/api/function/locale_date_format_save/7
http://api.drupal.org/api/function/path_save/7

errrrrr, technically, one could say that drupal_write_record() is dead in core already ;)

Crell’s picture

@bjaspan: Yes. #2 is also by far the most difficult. :-)

@sun: I don't follow. How is it dead if it's being used in all of those functions?

sun’s picture

If we make those 3 functions use db_merge() like everywhere else, then drupal_write_record() is not used throughout core. Hence, those changes along with the patch in #22 could be committed.

Crell’s picture

Hm. Using drupal_write_record gets a +1 in my book in general, so I'm OK with that. :-) We'd need to make sure it was well-documented that certain functions are blacklisted in update hooks, and make sure we know what all of them are.

bjaspan’s picture

How difficult would it be for Schema API functions to update the cached schema structure? For create or alter table or field or index, they have the new structure to add/replace in to the schema, and for drop they know which tables or fields or indexes to remove from the schema.

David_Rothstein’s picture

If we make those 3 functions use db_merge() like everywhere else, then drupal_write_record() is not used throughout core.

Hm, what am I missing here? When I go to http://api.drupal.org/api/function/drupal_write_record/7 it says "26 functions call drupal_write_record()"....

and a number of those definitely get used in update hooks.

How difficult would it be for Schema API functions to update the cached schema structure? For create or alter table or field or index, they have the new structure to add/replace in to the schema, and for drop they know which tables or fields or indexes to remove from the schema.

This sounds like it would be tricky because you'd have to make sure that cache never got deleted while there are still outstanding updates to run, right? Because if it did get deleted, you'd never really be able to get it back to where it was supposed to be. For example, if an update function was aborted, you might not come back to run the remaining updates until later, and you'd need the cached data to still be there or otherwise the second scenario from #19 would once again occur.

Here's a thought: What parts of http://drupal.org/project/schema are we willing to put into core in order to solve this problem? Actually, as far as I can tell, some of the schema introspection stuff looks like it's already in the new database API, isn't it?

Because then, you could primarily just do something like this:

function drupal_get_schema() {
  if (defined('MAINTENANCE_MODE') && MAINTENANCE_MODE == 'update') {
    // Do not trust the results of hook_schema() when running updates; instead,
    // get the schema information from the database directly.
    ....
  }
...
}
Dave Reid’s picture

I would much rather go for a solution like 32 and always invalidate caching the schema while in update mode.

David_Rothstein’s picture

Here's another fun gotcha that can occur when trying to use an API function inside an update, even if that API function doesn't use drupal_write_record: #734710: API functions that invoke hooks don't work as expected when a disabled module is being uninstalled

Maybe we should just tell people to stop using API functions altogether :)

Crell’s picture

Ideally, we should make the schema objects entirely able to self-analyze their own tables and be done with it, and then allow an over-ride via the schema hooks that's Drupal specific. The problem is that is a major change to the Schema system, and while I want to do it I'm fairly certain we can't get away with it in alpha2 state. Also, last I checked schema module was MySQL-specific. Remember that we have 3 databases we need to make work.

(This is one of the reasons I've always hated schema-dependent code.)

bjaspan’s picture

re #32: David makes a great point for why having the Schema API functions dynamically update the cached schema is harder than I realized.

re #35: schema.module's inspection code supports mysql and postgres, or at least it did when I last touched it. I've given the module to Mike Ryan.

derhasi’s picture

It seems drupal_get_schema() is not using drupal_static at the moment. By doing so, we could alter the static cached $schema:

  $schema = drupal_static('drupal_get_schema');
  $schema['my_table']['fields']['my_field'] = array(...);

I guess building a wrapper function for this method would be a good idea. So someone could use

  db_add_field('my_table', 'my_field', array(...));
  schema_alter_instantly('field', 'my_table', 'my_field', array(...));

What do you think of this approach?

sun’s picture

Version: 7.x-dev » 8.x-dev
Component: database system » database update system
Status: Needs work » Closed (won't fix)
Issue tags: +Upgrade path

In #922996: Upgrade path is broken since update.php does not enforce empty module list, we want to eliminate the entire access to module APIs and information provided by hooks during the update.php process, since that is the only right thing to do.

As mentioned earlier in this issue, I still think it would be a good idea to get rid of drupal_write_record() altogether. It is a pretty lame data-schema binding/conversion attempt, which doesn't take the actual entity properties into account and performs a "dumb" transformation of object properties into values that might be suitable for the storage schema columns.

So let me go out on a limb and call this issue won't fix, and provide a new issue for the proposed direction:
#1774104: Replace all trivial drupal_write_record() calls with db_merge()