Coming from #934050: Change format into string:

Relatively complex problem space; the below list of problems are depending on each other, so it does not make sense to split them into separate issues. The resolution needs to take all of them into account.

Problem

  • A field schema is set in stone after initial creation and cannot be updated.

Goal

  • Field type modules have to be able to programmatically update their field schema.

Details

  • As with any other module, the field schema of field type modules will change over time. Currently, Field API tries to prevent any schema updates on the assumption that the schema update has been triggered via Field UI. While limiting the possibilities of the user interface makes sense to some extent, there must be a way for field type modules to perform intended, safe, and wanted field schema updates, just like any other module has to be able to update its schema. Field type modules are no exception to that rule.

    AFAICS, Field UI already implements the necessary logic to prevent users from updating a field schema through the UI. The current patch does not change the runtime API of field_update_field(), so it still disallows field schema changes. Instead, an update helper function is introduced, which basically is the same as field_update_field(), but works within the update.php environment and does not disallow schema changes.

EDIT: Splitted the other issues with field_update_field() into #937554: field_update_field() relies on bogus {field_config}.data values

Files: 
CommentFileSizeAuthor
#26 937442-26_maintain_field_schema-937554.patch13.59 KBalex_b
FAILED: [[SimpleTest]]: [MySQL] 27,343 pass(es), 0 fail(s), and 1 exception(es).
[ View ]
#26 937442-26_maintain_field_schema.patch11.3 KBalex_b
FAILED: [[SimpleTest]]: [MySQL] 26,138 pass(es), 1 fail(s), and 1 exception(es).
[ View ]
#2 937442-2_maintain_field_schema.patch11.4 KBalex_b
PASSED: [[SimpleTest]]: [MySQL] 26,165 pass(es).
[ View ]
drupal.field-update.0.patch14.46 KBsun
PASSED: [[SimpleTest]]: [MySQL] 26,154 pass(es).
[ View ]

Comments

Tagging

StatusFileSize
new11.4 KB
PASSED: [[SimpleTest]]: [MySQL] 26,165 pass(es).
[ View ]

Version:7.x-dev» 8.x-dev
Category:bug» feature
Priority:Critical» Normal
Status:Needs review» Needs work

There is a whole lot of problems when you try to change the schema of a field and it's not something you can solve in a generic way. While adding / dropping indexes is possible (it does not touch data), changing and / or columns is not something that is possible most of the time. The lack of bulk field update API bites us here but that can't be helped. The two are inseparable. For example, if you want to change cardinality from 1 to 2 or from 2 to 3, will your widget and formatter be able to handle that? When you add a new column, but then how are you going to ensure that it contains the correct value? For PostgreSQL we might need to support USING clauses of ALTER TABLE.

You will throw another tantrum here that storage engines are bad, unusable, etc -- sorry, I can't help that either. The field storage engine concept is extremely strong -- it's not just that I am a rabid MongoDB fan -- when we designed the storage engine layer, MongoDB did not even exist. The motivation was an unsure hope in what will be called NoSQL databases (note that the field storage engine layer predates the wildspread use of this word) because SQL just plain sucks at storing this kind of data and the definite need for remote fields.

There is only so much that can be done in one release. We got extremely far this release but -- some work is left for contrib. Already entity module helps with singular updates, and batching them won't be hard, once that's done writing a module that copies the old field values into a new field and then drops the old will be possible.

If there will be a need to actually do a schema change we can write an update helper that changes the SQL tables directly, that's what the current update does. That does not mean we need to change the field_sql_storage.module.

Version:8.x-dev» 7.x-dev
Category:feature» bug
Priority:Normal» Critical

@chx, If you do not want to maintain your stored data, then that is your problem.

Version:7.x-dev» 8.x-dev
Category:bug» feature
Priority:Critical» Normal

http://drupal.org/node/361683#comment-1234964

Everyone agrees we do *not* want field migration on the main data-storage

Users scream about stuff. So it goes, we're pretty used to it. People who don't understand the complexities of schema changes aren't qualified to have their screaming heard.

You had almost two years to write this code. It did not happen. Drupal 8.

Priority:Normal» Major

Not supporting schema changes was one of the initial design decisions of the Field API. Not just to make the lives of everyone difficult, but because it is a *very complex* matter.

The initial Field API patch has been first marked as RTBC more then one year and a half ago. Field update is not something we can or should handle so late in the Drupal 7 cycle.

This problem can be investigated and figured out in contrib. Two possible directions:

  • Supporting updates of field storage outside of the current Field API (and maybe only for the SQL Storage)
  • Migrating data from one version of the field to the next via import/export

#3 / #5 / #6:

Does this mean that it would be acceptable for the Drupal 7 cycle that a schema update helper like _update_7000_field_update_field() in the patch at hand does not invoke 'hook_field_update_forbid'? (It does right now.)

Or in a more general way: what is the workaround for the limitation that field schema's can't be changed? The intention of this thread is not to find a general solution for schema changes, but to allow module maintainers to do a controlled schema change on the hook_update_N() level.

Version:8.x-dev» 7.x-dev
Category:feature» bug
Priority:Major» Critical

What Alex said. (Field type) modules need a way to update fields during module updates. In any way. Not doing so violates Drupal's #1 principle: Stored user data is retained and updated.

Here's what I don't understand.

I used to maintain a module called OG Block Visibility. In it, there's a table that effectively foreign keys off to block.delta in core. Let's say at some point we decide it's a good idea to make block.delta a varchar(128) field. Here's how we would do that:

1. In core, write a block_update_XXXX() that does a one-line db_change_field() call.
2. File a bug with OG Block Visibility saying "Hey, core changed its schema to varchar(128). You should too, or your users are going to get their data truncated."
3. Write a og_block_visibility_update_XXXX() with a one-line db_change_field() call.
4. There is no step 4.

What we don't do is make it Drupal core's responsibility to be both omniscient and psychic, and derive meaning from database schemas to infer that such-and-such a column is actually a block.delta, and make that change accordingly on behalf of the module author. Which is good, since core is not both omniscient and psychic, and so will probably get it wrong.

I fail to see how this situation is any different.

IMO, core should be handling its own format columns, which are filter.format, filter_format.format, users.signature_format, and any fields that are type text_with_summary, text_long, whatever (which can be determined from rows in the field_config table). That's where core's responsibility begins and ends: core's modules' own data.

Will this method work with PBS module or Fancy Format module? Probably not! Because those are contributed modules, they need to come up with its own mechanism to handle core's schema change. Just as OG Block Visibility does.

Now, a generic API for performing field updates sounds like a wonderful feature. For Drupal 8. In the meantime, KarenS and joachim are both trying to solve this problem in contrib, and have been for the past year or more. We can't hold 7.0's release up on this.

What am I missing?

webchick posted in the wrong issue based on an IRC discussion. Please disregard. We will get back later.

I think that, for now, if modules want to update the fields they created, they should do it their very own way. At least, they have the complete knowledge of what's in them, therefore on how to make them evolve.

It would be a pity to block the release of Drupal 7 until we manage to find a generic way do it.

I do not feel like it's a violation of the "saving data principle" since none is lost. And it would be the choice of the module coders to change something in their code that would require changes in fields. Not Drupal's fault :c)

Note: I will skip my turn in the "change-the-priority-everytime-a-comment-is-posted" game ;c)

Version:7.x-dev» 8.x-dev
Category:bug» feature
Priority:Critical» Normal

Enough of this. It wastes everyone's time and we do not go ahead. It's not feasible to fix this in Drupal 7.

In Drupal 7, neither field type modules nor fields can change their schemas. The root cause of this is not storage (which might or might be fixable) but widgets and formatters that expect certain data structures and might or might not break when they get something different. This is simply not fixable within this release cycle. Only move back if you have something to say about this. The demand to change anyways is just that: a demand that is not supported. Not everything can be done in one release.

What webchick wanted to post into this issue is "Core handling generic field schema changes sounds like a 18-month long problem to solve. I don't see us solving it by 7.0. KarenS and joachim have been working on it for months and afaik neither has achieved success." but she needed to leave and posted in hurry to the wrong issue. Edit: oh I see she actually posted that too but mixed it up with the two approaches that were discussed for the filter table update.

Version:8.x-dev» 7.x-dev
Category:feature» bug
Priority:Normal» Critical
Status:Needs work» Needs review

I have troubles to follow the logic.

1) No one suggested to do a "perfect" solution. Nothing is perfect. And nothing will ever be perfect.

2) While the problem space may not be new and may have been discussed during initial Field API discussions ~2 years ago, no one despite the developers being involved was actually aware of that. Don't tell me I wouldn't have worked on core. That's plain ridiculous.

3) I perfectly understand that more complex and more troublesome updates are a possible scenario, too. However, the use-case at hand does not involve any configurable field logic. Text module *knows* that 'format' is an integer. If that conversion from integer to string would be a problem, then all the other module updates throughout core would fail equally. This simple use-case is 100% solvable with the tools at hand (resp. in the patch). No one ever suggested a solution for doing more complex field updates.

4) By performing manual queries within the Text module update, any other field storage engines will not notice that a field update happened at all. We can surely do that, but it's slightly beyond me that exactly those people that are actively using other storage engines are objecting a patch that makes other storage engines aware of a field update (attempt).

To summarize:

5) All that is being suggested is a solution for *simple* field schema updates. I'm perfectly aware of the fact that there can be more complex scenarios. Scope creep.

6) The actual update at hand is safe, because it does not involve configurable field schema settings. Text fields are not going to be the only fields that will have to do such simple updates.

7) The proposed changes allow other field storage engines than SQL to react to the field update, if they need to, or simply skip the update.

8) We can easily add some comments to explain that the functionality is unsuitable and must not be used for more complex field updates. That's an entirely different can of worms anyway.

Conclusion:

What is the point of disallowing easy cases, just because there are more troublesome cases? The proposed changes are merely a small step forward. And nothing else.

Category:bug» feature
Priority:Critical» Major

Drupal 7 or Drupal 8, this is a feature request. It doesn't prevent anything from working, so it cannot possibly be critical.

We have no declarative API for database schema updates. We don't have it because it's a difficult problem where the start and end points are not sufficient because the path taken can influence the end result.

This patch introduce a half-baked API where:

- you can add / remove / change columns
- you can add / remove / change indexes
- you cannot rename columns
- you cannot migrate data from one column to the other

We need to be aware of the limitations but, all things considered, this is not that bad.

chx was worried about the data (ie. schema) / formatter / widget dependencies. My guess is that this is something the modules that provide field types, formatters or widgets will have to deal with. I can expect that most schema modifications required by the fields in contrib will be backward compatible (for example: adding a column).

Yes, that's exactly what I mean. Also, slight correction: HEAD already supports indexes, but only indexes. This patch adds columns, because we happen to need it.

subscribe

+1 to sun.
subscribe.

Without commenting on the particular implementation at the moment, if you have a field that, after the initial release, you decide needs to have one property out of a serialized options column split out to its own column, what do you do?

This exact scenario happened multiple times to CCK field modules in Drupal 6. If there's no way to do that in Drupal 7, then that's a clear regression. I agree it's a complicated problem, but it's still a regression if field modules have no upgrade path available to them.

The address field for d6 also had a couple of schema changes to be able to more accurately record addresses.

If you have a field that, after the initial release, you decide needs to have one property out of a serialized options column split out to its own column, what do you do?

@Crell, in that case, you create the new column first, and the migrate the data over in a batched update.

#20 - so how does that work in concrete terms? That basically means you can't keep your field type name when you change your schema, correct? So field type 'date' would need to turn into 'date2'? Am I missing something here?

@alex_b:

Crell's example is a bit different. He intends to extract some serialized info from an existing column into a new one.

In that case, Damien's solution works perfectly: create the new column, and then run an update on the two columns.

The restriction is about changing the type of an existing column (from text to number, or from date to text, for instance) or the cardinalities.

This involves many possibilities and complex behaviors, especially since it has to run on any storage type and take care of widgets/formatters, that won't be solved in Drupal 7's Field API.

@alex_b: if we decide to go ahead with this patch, you can keep your field type name, but you can only do "simple" schema modifications. Every thing else (like migrating data from one column to the other) will need to be done by an expensive batched migration:

<?php
$data
= $query
 
->fieldCondition($field)
  ->
execute();
foreach (
$data as $entity_type => $entities) {
 
$entities = entity_load($entity_type, array_keys($entities));
  foreach (
$entities as $entity_id => $entity) {
   
// Update the field.
   
[...]
   
field_attach_update($entity_type, $entity);
  }
}
?>

As a (probably better) alternative, the field can also "overlay" the previous data structure: on load, convert the old structure to the new one and leverage the natural lifecycle of the entities to progressively migrate the data.

wow, subs for now.

So after talking to some people (among them chx and sun) and thinking more on this, I really don't know what's the way forward here. IMO, swappable storage engines for fields with significantly limited possibilities for managing schema changes is broken. That said, I don't know the solution.

(I'd like to emphasize that this issue is about modules managing changes to their field schemas - schema changes induced by site building tasks is a different topic.)

Our options for schema maintenance seem to be these:

a) No changes allowed: Allow schema changes to field instances without data. Once there is data in field instances, assume that only index changes are necessary (this is the status quo).
b) Migration: Create a new field of a new type, migrate data from field instances of the old type to field instances of the new type. This seems to be the somewhat agreed upon solution for upcoming Drupal major releases. I have serious doubts around how practical this is (create a new field type every time the field's schema changes?). At the very least it will require a much more robust batch API and hence is not fit for a recommended course of action for Drupal 7 module maintainers.
c) Comparison based delegation: A change API based on hook_field_storage_update_field($field, $prior_field) that detects changes between the field specs passed in and changes the current schema. This is the path we started down in D7 and this patch expands on it. Many have commented that this is not a realistic approach for all kinds of problems with reliably detecting changes between schema specs and adjusting them. This is also the reason why we have hook_update_N() in Drupal. That said, Damien hinted in #14 that within limitations, this could be a workable solution. I'd like to learn more about this because frankly, while knowing that this approach isn't a general solution (trust me, I've been burned) - I have a hard time fathoming the actual limitations of this approach.
d) Granular Change API: Come up with an abstraction layer where every storage engine would have to implement the equivalents of db_change_field(), db_remove_index(), db_add_index() etc. This is really just and idea at this point and I don't know how feasible this is.
e) Schema version based delegation: A hook_field_update_[storage_engine]_N() style approach where core provides update hooks for the sql storage engine and storage engine maintainers need to provide equivalents. hook_update_N() may be even sufficient for this approach. This way our community winds up writing [storage engines] x [field modules] x [update hooks] update hooks. Not ideal. But maybe a compromise that carries us through 7?

Note that it is entirely possible to just change a field's schema with hook_update_N() not heeding the conceptual limitations of the storage layer abstraction. Nothing stops contrib modules from changing their schemas the way they used to in CCK times. I have a strong hunch that this is what we will see in D7 contrib.

For Drupal 7 I think at the very least we need clear documentation on this limitation. Module maintainers need to understand what the implications of changing a field's schema are.

I know this issue is branching off a pretty heated debate around making the filter format field a string #934050: Change format into string - I really don't care about filter formats here. This is larger than making filter formats exportable. By all the debate that managing schema changes had during the past couple of years, I think it's worth giving this problem another very hard thought before the Drupal 7 beta cycle comes to an end.

StatusFileSize
new11.3 KB
FAILED: [[SimpleTest]]: [MySQL] 26,138 pass(es), 1 fail(s), and 1 exception(es).
[ View ]
new13.59 KB
FAILED: [[SimpleTest]]: [MySQL] 27,343 pass(es), 0 fail(s), and 1 exception(es).
[ View ]

Improved on #2. I summarize:

- This is a #25 c) style approach.
- Introduces a _update_7000_field_update_field($prior_field, $field) update helper that invokes hook_field_storage_update_field()
- Expands field_sql_storage_field_storage_update_field() to allow for adding, changing and dropping fields using db_add_field(), db_drop_field() and db_change_field().

Changes to #2:

- Keep field_sql_storage_field_update_forbid() in place.
- Don't invoke hook_field_update_forbid() from _update_7000_field_update_field($prior_field, $field).
- Adjust tests accordingly.

Patch requires fixes from #937554: field_update_field() relies on bogus {field_config}.data values. I am attaching one patch with and without #937554 applied.

I am submitting this for review, I am not convinced that this is the recommendable course of action for Drupal 7.

Status:Needs review» Needs work

The last submitted patch, 937442-26_maintain_field_schema.patch, failed testing.

- Introduces a _update_7000_field_update_field($prior_field, $field) update helper that invokes hook_field_storage_update_field()

i stopped reading here. never call hooks in updates. I thought I was clear on IRC and hopefully in issues like this: we never upgrade field schemas. Write a bulk update API in contrib, due to the lack of entity CRUD you can't really in core, copy to a new field (quite possibly massaging the data as needed), drop the old field.

As for the filter change: gather the core SQL tables that have filter columns and update their schema and disregard everything else. End of both stories.

b) Migration: Create a new field of a new type, migrate data from field instances of the old type to field instances of the new type. This seems to be the somewhat agreed upon solution for upcoming Drupal major releases. I have serious doubts around how practical this is (create a new field type every time the field's schema changes?). At the very least it will require a much more robust batch API and hence is not fit for a recommended course of action for Drupal 7 module maintainers.

Ok, so for Drupal 7, the only possible course of action is this. Your two concerns here are partially bogus. First, field's schema should not change that often, so it doesn't really matter if the operation is costly or not. Second, the whole update process relies on the Batch API *anyway*.

We need to make sure that it is and remains possible to migrate data from one field to the other, meaning:

  • Create a new field field_name_new
  • Migrate data in batch from field_name to field_name_new
  • Delete field_name
  • Rename field_name_new to field_name

As far as I know, the current API provides everything except a way to rename a field.

chx: To make sure I understand #28, you're saying that for the format int/string conversion we should ignore abstraction and rely on the fact that we know that will be a local DB table in practice, and for any existing D7 sites that have that field in a non-SQL storage make it their problem?

I am not objecting to that approach, I just want to make sure I'm understanding it correctly.

Component:field system» documentation

#28

never call hooks in updates

Right, this is just proof of concept for #25 c). Look at the code that winds up getting invoked (field_sql_storage_field_storage_update_field()) this pattern could be done in an update safe way. Doesn't change anything about the larger problems of this approach.

As for the filter change: gather the core SQL tables that have filter columns and update their schema and disregard everything else.

We're on the same page here.

#29

the whole update process relies on the Batch API *anyway*

Sure, there's just a speed difference between db level operations and entity API level load/save operations.

#28 / #29

So looks like there's no way around field migration. Where's the best place to document this?

Component:documentation» field system

"As far as I know, the current API provides everything except a way to rename a field."

Would the mongo backend be able to rename a nested property in all its node records ?

Component:field system» documentation

did not mean to change component

Status:Needs work» Needs review

ouch, I've been bitten by this as-well, when wanting to drop some columns form OG tables.

Component:documentation» field system
Issue tags:+Needs Documentation

Subscribing, this is going to be a pain for multicomponent fields like the name field.

I need three new columns added to achive all of the new features requested, so if now 1 by 1, I'll end up with 4 field types and the need to write update scripts to convert one field type to the next field type for every step. Then to clean up the mess latter, the old field types would need to be removed, risking users being caught up in the inactive field issues.... This is way to messy and time consuming, so I will have to stop features that require schema changes until a work around is found.

During the course of Drupal 7 development, I became aware of the policy not to do any changes that may lose user data, but I would not have imagined that adding a new column was going to be an issue. I had to read through the code last night to see that the core system only does indexes.

While this is very late in the development cycle, I would love to see the DB maintainers to have another look at this. I have to agree with sun about most developers would not have known about the way core has implemented ccks' handling of the schema changes.

Cheers
Alan

Pinging some life into this. Two feature requests in a week, each requiring a change (new column) to the schema....

We need to make sure that it is and remains possible to migrate data from one field to the other, meaning:

Create a new field field_name_new
Migrate data in batch from field_name to field_name_new
Delete field_name
Rename field_name_new to field_name

As far as I know, the current API provides everything except a way to rename a field.

So any pointers for contrib maintainers on a safe way to do this?

Create a new field field_tmp
Migrate data in batch from field_name to field_tmp
Delete field_name
Create a new field field_name
Migrate data in batch from field_tmp to field_name
Delete field_tmp

Cheers

Issue tags:-Needs Documentation

Hi

Just want to throw my 10 cents in.

In many cases changes in application business logic imply changes in field schema definition.

In my case i have developed a n-tuple family of modules (see my profile) that heavily relies on cck's ability to adopt field schema api at runtime.

Version:7.x-dev» 8.x-dev

This blocks OG #1277842: Data type of gid column in og_field_schema is float instead of int, we need some way to deal with this in contrib.

Status:Needs review» Needs work

There's no patch here that needs review or applies to the current 8.x HEAD.