Comments

dropcube’s picture

Status: Postponed » Active
Issue tags: +Fields API, +FilterSystemRevamp
dropcube’s picture

Assigned: Unassigned » dropcube

I am working on this.

bjaspan’s picture

Interesting. I haven't looked at the partial patch from the previous issue, but I can see that once again abstract field storage makes this a bit tricky. The "obvious" approach is for text.module to implement the hook and db_update() all the field tables to change the format column. But text.module can't, because it does not know where the field data is stored. Options:

1. Call field_attach_query() on the field, specifying the format column as a query parameter, then field_attach_load() and field_attach_update() for each object id. Inefficient, obviously.

2. Implement a new function, something like field_attach_bulk_update($field, array('format' => new_value)), which invokes hook_field_storage_bulk_update(), which in field_sql_storage.module's case would just do the db_update() we wanted to start with, but for other storage engines could do whatever is appropriate.

yched’s picture

This is a pattern of "how does a reference field react to the change or deletion of a referenced 'thing' ?".
I'm not sure there's a fast enough way to actually update all the field data.

Note that check_markup() runs in text_field_load() and is saved in the field cache.

The idea so far was rather:
3. when a filter is deleted (or updated), use field_attach_query() to find the objects with a text field using this format and clean the corresponding field cache entries.
In text_field_load(), check whether the loaded format exists before running check_markup(), and if not, use the default format instead. The stored data keep the non-existing format, and replacement is done just-in-time.

Option 3 costs 1 indexed SELECT query per formatted text field + 1 indexed DELETE query on field cache
Option 2 costs 1 indexed UPDATE query per formatted text field + I'm not sure how we retrieve the cache_ids to clean : either same as option 3, or clean all field cache entries.

Input formats being changed or deleted happens rarely enough that we could clean the whole field cache anyway. Then option 3 just costs one indexed DELETE query on field cache, and we're sure it can scale whatever the number of text fields. Might not be true for all 'reference' fields, though.

catch’s picture

Just found this from #597236: Add entity caching to core. I also think we can clear the whole field cache - it's rare, and it's also unusual to have more than 4-5 input formats on most sites - which is likely to mean that we're clearing a large number of entries anyway most of the time - the exception would be deleting or modifying an unused format, but that's not enough of a use case to complicate things.

#3 sounds like the best of the three options.

sun’s picture

Issue tags: +D7 API clean-up

Tagging.

Note that #353458: hook_file_references() was not designed for a highly flexible field storage basically needs to do the same. Instead of a text format id, a file id may need to be updated.

catch’s picture

sun’s picture

Status: Closed (duplicate) » Active

Hm, no, these are not duplicates. One of the options in #2 mentioned by bjaspan is still required.

The 3rd option in #3 mentioned by yched doesn't fly. Filtered text can only be filtered, processed, and displayed in the assigned text format. Automatically falling back to another text format must not happen.

yched’s picture

Well, the options in #2 also planned replacing the deleted text format with the default format. The difference was rather when the replacement is done.

What are we supposed to do with existing data when an input format is dropped ?

sun’s picture

http://api.drupal.org/api/function/hook_filter_format_delete/7 gets a "new format" as second argument, which needs to be used for any replacement. Currently, that's always the fallback format in HEAD, but #635212: Fallback format is not configurable is about to change this.

Either way, the hook gets a text format to use as replacement value, so the essentials of this issue don't change. :)

Somewhere else I already figured:

Another way to resolve this would be to use Schema API to find all database columns that have a foreign key referring to {filter_format}.format and update all of them in one shot. However, even that wouldn't work for Field API due to its pluggable storage model.

catch’s picture

Why not prevent input formats being deleted which have content assigned to them? Can provide a hook for pluggable field storage to respond to.

David_Rothstein’s picture

Subscribing.

catch’s picture

Status: Active » Needs work
StatusFileSize
new2.62 KB

Here's a very rough draft for #11, it's unlikely I'll have time to take it any further.

Adds a new hook "filter_format_usage" - modules can return TRUE or FALSE if they're using the format anywhere.

On delete, we prevent deleting a format if any module returns that it's using it and set a message.

On update, we just clear the entire field cache since it's a sufficiently rare operation and trying to do it cleverly would be too much work.

To get this up and running, block, user (signatures) and text module, at least, would need to implement the new hook.

sun’s picture

Assigned: dropcube » Unassigned

Sorry for not following up earlier. Tinkered quite some time about this issue recently. So here we go:

WTF? ;) We introduced a pluggable field storage, which does not allow to mass-update values? How are modules supposed to maintain their fields and values then?

This issue smells much larger to me then. I mean, this is just one of a gazillion of situations, in which modules need to be able to properly update stored data, be it for module update reasons or for administratively initiated changes. Preventing usage of fields containing certain data or preventing actions on them until all entities containing those fields are manually updated certainly won't work out ;)

Hence, the only valid solution is bjaspan's 2)

Implement a new function, something like field_attach_bulk_update($field, array('format' => new_value)), which invokes hook_field_storage_bulk_update(), which in field_sql_storage.module's case would just do the db_update() we wanted to start with, but for other storage engines could do whatever is appropriate.

catch’s picture

If you want to bulk update entities, you should use $entity_save(), that's the only valid way to invoke the various _update(), pre_save() and other hooks. Doing direct updates doesn't cut it any more IMO.

yched’s picture

Sadly, I think sun is right...
Yet
- Coming up with a non-SQL-non-PDO-non-DBTNG meta syntax for this sounds like party.
- I'm not sure how to ensure atomic, non-timeoutable mass updates, or deal with the fact that maybe we can't.

catch’s picture

I don't believe there's a valid use case for deleting a text format which has content assigned to it outside of development/site building. Certainly there's not enough of a use case that core needs to support that. Once again, the only valid APi for updating entities in a storage agnostic way is $entity_save(). I don't think we want to support querying every possible entity which has a text format then bulk updating that in core, whether it's trying to do a mass db_update() via hook_schema() hacks or a potentially endless batch operation loading and saving individual nodes.

fwiw it'd be quite easy to do a bulk update in mongodb, but just triggering that when an admin deletes a text format is completely out of the question and would bring your site down. For a start mongo-based sites are not going to be creating indexes on every single text field format, just in case the format gets deleted one day, because the way entities / fields are stored in mongo (all in one collection, not as separate 'tables') means you need to be very sparing with the number of indexes. So you'd have a massive unindexed update operation happening as a result of a form submission, not good.

Also if core prevents this from happening via the UI, there's very little stopping a contrib "Text format replace" module from being written which could try to implement one of these options if there's really a pressing need for deleting in-use text formats. We don't do magical conversion of nodes if node types are deleted, or changing node types on the fly, nor do we allow moving taxonomy terms between vocabularies in core. Deleting text formats strikes me as a somewhat similar situation except with even more unpredictable implications since they have only a very loose relationship to the actual data they're associated with. So since it's a somewhat rare operation, which has security implications, I'd rather core was unhelpful here rather than too helpful.

sun’s picture

I don't believe there's a valid use case for deleting a text format which has content assigned to it outside of development/site building.

(?) Sure. Drupal's configuration is set in stone always and forever and ever, after initial setup. Care to remove the edit links then, please?

catch’s picture

No, because editing text formats works quite fine without bringing my sites to their knees or requiring potentially millions of values to be updated after a form submission via incredibly complex and hard to maintain code which doesn't exist yet.

David_Rothstein’s picture

I think people updating sites from D6 to D7 might wind up with a valid reason to delete some text formats with data in them, since the upgrade path currently messes around with their text format list a bit. They might have a good reason to delete one and convert its data to another.

I think we ought to really consider @yched's idea in #4 of "just in time" replacement, or something like it. Over at #562694: Text formats should throw an error and refuse to process if a plugin goes missing - and for an entirely different reason - I suggested a while ago that perhaps every time a text format was deleted, filter module should keep a record of what it was replaced with. This would just be a simple associative array in the variables table: e.g., array(1 => 2, 5 => 7) means that text format 1 was deleted and all content was (supposed to have been) converted to text format 2, text format 5 was deleted and all content was (supposed to have been) converted to text format 7, etc. Then if check_markup() ever encounters something with text format 1 or 5, it knows what to filter it with instead.

If we do this then it seems that would completely remove the need to update all field data the instant the form is submitted, right? We want the stored data to be correct of course - so ideally it should still happen - but it could then happen gradually, similar to how the Field API already handles bulk deletion.

yched’s picture

I was thinking of something along the lines of David's suggestion. My initial idea was doing the 'just in time' replacement in text.module's hook_field_load(), but doing this in check_markup() has the advantage of catching direct db access as well (Views...).

Actually, I think both places would be needed, because doing it in hook_field_load() ensures that the 'new' format gets preselected and saved next time the entity is edited.

bjaspan’s picture

Assigned: Unassigned » bjaspan

Okay, this is the part where I pretend I still remember something about Field API and Drupal in general. But it's DCSF code sprint, so I'll give it a shot. :-)

Someone is trying to delete an input format. To some extent this makes the argument that "there is no use case for that" moot. Or rather, if there is no use case, or if it is unimplementable, we should remove the delete feature from the format system, instead of just rejecting it in Field API (having a content-related feature that works "everywhere" except in the major subsystem we use for all content seems non-sensical to me). I'll assume that isn't going to happen.

To actually implement it, we have three proposals:

1. The dumb way: f_a_query() all the affected objects, then for each one, load it, change it, save it. This will work fine until we have many objects ("many" here might be as few as 100s or 1000s) at which point it will take forever, hit the PHP timeout, etc. We could do a batch API thing, but that will just solve "PHP timeout," not "take forever."

2. The bulk way: Add a new f_a_bulk_update() API. This seems "more scalable" because it is a single SQL query for that storage engine, but as catch points out in #17, that is an illusion. Ultimately we can't update an arbitrarily large amount of data in a single time-limited operation regardless of how it is stored.

2.5. No one has mentioned it yet, but we could do a thing like with field deletion... except I don't really think that makes sense in this case. Or rather, to the extent that it does, it is...

3. The "just in time" approach in which we wipe the field cache and update each object as it eventually loaded, keeping a lookaside table of the deleted format and the format to replace it with.

My take is this: #1 is easy but will fail quickly enough that probably isn't worth doing, but OTOH it would at least be something, provided we did it in a way that contrib could override. #2 is fundamentally no better than #1, and involves creating a new Field API function which we are really too late for. #3 actually seems like it can work, and scale. It will also involve a new Format API function to centralize the "replace deleted input formats with these new ones" table, but that will be much simplier than #2.

So, I'll get to work on #3.

sun’s picture

I still don't get it. True, I'm purposively broadening the scope, but in the end, it's still the same question, just asked generically:

We have no field API functionality to mass-update fields in entities? (?)

In the end, if we go with 3), then this means that anything that wants to update any value in any field will need to implement a custom field value storage mapping table that dynamically updates an entity with the supposed to be new values for a field on loading? And implement all of that through custom code?

How does that ensure data integrity?

bjaspan’s picture

It does suck, doesn't it?

I agree that it is a problem. However, it is not a new problem and has nothing to do with Field API's storage model. The bulk-update approach which for SQL reduces to "UPDATE field_data_body SET format=1 WHERE..." looks like "the right solution" but just try it with 50M nodes. So, to answer your question: No, we have no way to mass-update fields in entities, **and we never did**.

I think your larger point is legitimate: If the right solution to this issue is a lookaside table for handling bulk updates "on the fly," then probably we should have a more general system for it. Just fixing it here is a one-off hack. Getting this right will be part of making Drupal scale for Big Data and I suspect will not happen for D7. Unless you have a suggestion. :-)

bjaspan’s picture

Status: Needs work » Needs review
StatusFileSize
new4.3 KB

Here is my first patch, which seems to work. Marking CNR for testing. I will now go write new tests for it.

sun’s picture

Seriously, I can't stress enough that this contradicts Drupal's principle of handling and maintaining user data.

What are field modules in contrib supposed to do to maintain their field data? Add infinitely increasing system variables to store replacement values? And just hope that Views and other code will always load an entire entity to display a field value?

From all known options, a field attach batch update is the only one that sounds remotely acceptable to me. If we need to load entire entities and execute a batch, then be it.

bjaspan’s picture

StatusFileSize
new7.34 KB

For better or worse, here is a new patch with passing tests and a gratuitous typo fix.

I think you have a legitimate concern. I'll walk next door and discuss this with Dries and Angie (wow it's nice to be able to say that :-)).

bjaspan’s picture

Just spoke at length with Mike Booth, drothstein, ksenzee, and Dries. We have a new plan which I am now attempting.

Status: Needs review » Needs work

The last submitted patch, format-delete-556022-27.patch, failed testing.

bjaspan’s picture

Status: Needs work » Needs review
StatusFileSize
new3.48 KB

Here's the current scoop:

sun is right that we need an API for bulk-updating Field API data. However, doing it right will be complicated, and it is too late for D7 core. Just take a look at the docs on field_attach_query() to get an idea of the issues to be addressed. A correct field_attach_bulk_update() needs to be able to store a translation table for the old and new values for the columns being updated which is used when the data in the db is not updated yet, and then it needs a cron hook (like field_attach_delete()) to update field data in batches. A semi-correct solution would be to perform the bulk update in a single operation all within in the current page request. In principle this seems simple, but it gets a bit annoying. We need to bulk update both the current table and revisions table. We need to re-factor field_attach_query() to re-use its query-builder for field_attach_update(). We need to add a new Field Storage API hook to implement bulk update. None of this semi-correct solution is impossible and maybe we can achieve it for D7, but I certainly can't do it today.

Also, we realized this is not exactly a Field API issue. The format system needs a way to deal with invalid/deleted input formats. It mostly already has one, in which text in an invalid input format is rendered either into the default or fallback input format; drothstein says it is possible/likely that this fix has not been forward-ported from the D6 security issue, but that is a format system issue, not a Field API issue. And of course handling deleted input formats correctly is what #562694: Text formats should throw an error and refuse to process if a plugin goes missing is all about.

The simplest possible fix here is just to make sure that text.module clears the Field API cache when an input format changes or gets deleted. The data in the format column will remain wrong, and the format system will handle it. I've attached the trivial patch for this. Unfortunately, even the simplest possible fix is not simple, because in fact the format system will NOT handle it. The format system logs undefined index errors when handled an invalid format id.

Here is my current understanding of the path forward:

* This patch to text.module is an improvement on the current situation because it fixes the bug that the Field API cache is not cleared when input formats change.
* This patch reveals a bug in filter.module that it doesn't handle invalid format ids. That bug needs to be fixed.
* We can have a non-critical D7 issue to implement the semi-correct f_a_bulk_update() function, which may or may not happen.
* We should have a D8 issue for the correct batched version.

All that being said, my *next* post on this issue will be a text_filter_format_delete() operation that just updates the damn data the dumb, non-scalable, but easily implemented way. Unless I discover it too is a maze of twisty little passages.

Maybe I stopped paying attention to core for six months for a good reason. :-)

bjaspan’s picture

No, forget it. The dumb way is so ridiculously inefficient that I can't bear to write it.

bjaspan’s picture

Status: Needs review » Needs work
sun’s picture

Status: Needs work » Needs review
+++ modules/field/modules/text/text.test	22 Apr 2010 22:30:02 -0000
@@ -216,6 +216,14 @@ class TextFieldTestCase extends DrupalWe
+    // Delete the text format and verify the field content renders in
+    // plain text.

This is exactly what must not happen, never, under any circumstances.

.

Question: Why do I store data in Drupal at all? If it isn't able to properly update and maintain it?

The more we are discussing the high-level question at all, the more I'm seriously concerned about Drupal's future. This limitation is more than a release blocker.

96 critical left. Go review some!

catch’s picture

I still firmly believe we should prevent the deletion of the text format if there is any content assigned to it - so require a pattern of update content first, then delete the format. That may not be possible to do in core either, but it won't result in orphan text fields.

bjaspan’s picture

Status: Needs review » Needs work

@sun, I don't get it. You are complaining that Field API has no way to bulk update data, but if it does (and actually I'm writing it right now), text.module will use it to update the format column to change a deleted format id to... what? Not the fallback, because that will result in check_plain(), which you say must never happen.

@catch, I'm beginning to agree.

sun’s picture

It's not check_plain()'ed. The $fallback format that is passed to hook_filter_format_delete() needs to be assigned. The filters of the fallback format are configurable. In addition, #635212: Fallback format is not configurable allows the site administrator to choose the text format to pass as fallback.

bjaspan’s picture

StatusFileSize
new9.49 KB

Okay, here is a first pass at field_attach_bulk_update(). There are some TODOs and no tests, but the one test for text.module which uses it on format delete works. I am far from convinced this function is a good idea. At this point I think there are serious problems or objections to all possible solutions to this problem, including not solving it.

chx says he and David Strauss think field_attach_query() should actually be removed from core, which of course would this function impossible, which is fine with chx as he says he can't implement it for mongodb anyway. His suggested solution is that the text.module, on load, checks if the format is valid, and if not updates the stored value right then; thus the db slowly works it way into a correct state over time, without a separate batch process. Not a bad idea, but we'll have to see how it works out.

chx’s picture

Sure I can implement it! It's even implemented (just broken right now 'cos the latest structures are not implemented) I said the function is useless because it's a field level and not an entity level query. How will you find the latest ten published nodes with the 'flag' field being 1 if you can't query node.status with field_attach_query?

bjaspan’s picture

@chx: You are asking, "How can I search node documents and field data without a join?" Well, there are two answers:

1. You do what I'm assuming you want, which is store the field data in the node document and query node.status and node.field_flag in one operation. That works great for MongoDB I assume. You could also query node.status, node.field_flag, and node.field_some_multi_value_field all in one query because those are all in the node documents, right? Of course you know that you can't do that with SQL without a join or without completely de-normalizing by storing all the node properties in every field table (which still won't work for multi-value fields), which you know isn't going to happen.

2. You do the join in the application. Query the latest N published nodes. Then query all of those ids that have the flag field set. When you only get 3 matches, get the next N published nodes. Iterate.

Obviously #2 sucks for you. But your proposed alternative, removing field_attach_query() from Field API, makes every site using SQL stop working in important ways, and obviously we're not going to break every Drupal site except for Examiner.com. Perhaps we've just discovered that the Field API's goal of using a pluggable field storage engine so that we can work with SQL and document databases is unachievable in a single consistent and efficient system.

I'm not at all familiar with document databases, so it is entirely possibly I'm missing something.

This discussion is only relevant for this issue to the extent it bears on whether we should solve the text format problem with a field_attach_bulk_update().

yesct’s picture

huh? what needs to be done here? Certainly a re-roll...

damien tournoud’s picture

I opened #832572: Add a standard worker queue as a push forward to provide a way for modules to easily register long-running bulk operations.

catch’s picture

Issue tags: +API change

If we go with Barry's approach, adding API change tag.

bjaspan’s picture

I just re-read this issue yesterday and honestly I have no idea what the right approach is. I am leaning towards catch's idea in #34: "I still firmly believe we should prevent the deletion of the text format if there is any content assigned to it - so require a pattern of update content first, then delete the format. That may not be possible to do in core either, but it won't result in orphan text fields." Among other benefits, it is actually achievable in a small amount of time using existing APIs.

David_Rothstein’s picture

I'm really not a big fan of crippling the delete format functionality, especially when it will likely be needed in the D6->D7 update. Plus, I don't see how it can be done with existing APIs... doesn't it need the new API to query all modules and ask them if a particular text format is in use or not?

I wonder if we are over-thinking this. Why can't we just use the batch API here? Seems like that was exactly what it was designed for. Earlier @bjaspan said this:

1. The dumb way: f_a_query() all the affected objects, then for each one, load it, change it, save it. This will work fine until we have many objects ("many" here might be as few as 100s or 1000s) at which point it will take forever, hit the PHP timeout, etc. We could do a batch API thing, but that will just solve "PHP timeout," not "take forever."

But so what if there are some (probably rare) cases where it takes forever and the person has to stare at the progress bar for hours on end? If that's how long it takes, that's how long it takes. If they get bored and click away to interrupt the batch process then we just have to make sure it fails gracefully, which should be easy: Just don't delete the text format itself until the end of the batch process rather than the beginning. That way if they interrupt things, their site is still in a consistent state, and they can go back and "re-delete" the text format again later, and pick up right where they left off.

David_Rothstein’s picture

By the way, I assume this could be implemented by adding an optional $sandbox parameter to hook_filter_format_delete(), just like update functions have. So it wouldn't even be an API change, just a minor API addition. Modules that don't care about it could just ignore it.

yched’s picture

re #44: If there's an API function to delete a text format, then processing the delete cannot involve Batch API. Batch API assumes a UI, and thus cannot be used in a generic API func.

David_Rothstein’s picture

Batch API can be used without a UI if $batch['progressive'] = FALSE, so I was thinking maybe there'd be an optional parameter to filter_format_delete() that controls whether to do that or not?

(Although if anyone actually used it as an API that would then theoretically run the risk of PHP timeouts again. However, I think this is an edge case, for a number of reasons; we could just document it or something.)

Anonymous’s picture

putting aside the specifics of text.module and filters, i think the most important thing we need to build in to handle "update a bazillion fields/entities" is something like #41, because it allows a clean way for big sites to throw hardware at the issue in whatever way suits their setup best.

for 99.9% of sites, the default queue handling Damien suggests will do. for the very much smaller set of sites where this matters, a queue will make it easy to process the updates with something like Gearman. now, something that scales horizontally as easily as Gearman will probably just expose the data store as the limiting factor, but bigger sites will be in control of where to make the trade-offs.

philbar’s picture

Issue tags: +beta blocker

tag

yched’s picture

Also, it seems that any solution building on a 'long-running process' needs to account for site hits that happen meanwhile (i.e a node with a text value using a deprecated input format, being edited or viewed before the background process has handled it).

So whatever way we go (or not) for the mass-update part, it seems the solution always at least includes a 'just-in-time replacement' part as safeguard (approach explained in #4 and #20, implemented in #25, resurfaced in #37 second paragraph).

David_Rothstein’s picture

I think the just-in-time part may wind up being necessary for some edge cases of #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x - but I had forgotten there was already some code for it here :)

But for the narrow purposes of this issue, as long as we wait until after the conversion process is complete to delete the actual text format from the database (rather than the other way around like we do now) I don't think we actually have to worry about site hits in the meantime, do we? If we do it that way, then from their point of view, it's just like the deletion didn't happen yet; the node should still be perfectly viewable in the old text format.

yched’s picture

re #51, delayed deletion of the input format - As we found out when dealing with field deletion, delayed deletion has some non trivial consequences, since you need to deal with the 'in-between' time frame as well. You find yourself with half-citizens (marked deleted) that you need to keep around and invent their behavior completely:
make sure they don't show up in admin screens; ideally you don't want them used on new content; handle dependencies on filters...

Off-hand, I'd say 'just in time' replacement looks simpler :-)

David_Rothstein’s picture

Most of that complexity is not an issue if the batch API is used, since the user doesn't expect it to "actually" be deleted until they see the progress bar complete.

I guess it is true that there can be a race condition if content is created with a text format that is in the process of being deleted - and in that case, that content could get stuck with that format. This seems like another argument for having the 'just-in-time' behavior around as a backup (similar to the other edge cases in #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x). But I think we ideally need both: Make a good-faith effort to completely delete the text format (which should work fine probably 99% of the time anyway) and then have the just-in-time replacement as a backup to acknowledge that we can never guarantee we removed everything.

This is consistent with the way it was done in D6, actually (although that was cruder than this would be). The text formats were deleted by filter.module, and then with the security fix that went in, there was a backup behavior for things that did not get updated during the deletion process.

mikey_p’s picture

How on earth will this actually work? Do we need to have a hook where each module will return a list of {tablename}.column where they store format ID? Do we have any idea how we plan to batch this?

Alternatively in #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x David mentioned simply storing a mapping to be used by check_markup for displaying content using a deleted format. If we did that we wouldn't need to worry about race conditions, batch operations, or finding all the correct DB fields, regardless of storage engine, or worrying about text formats stored in disabled modules. Overall that seems like a huge win, when we would need to have a failsafe in check_markup() anyway. Thoughts?

Also at this point, this seems to be the same issue as #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x. Is there any reason that one of these couldn't be close in favor of the other?

David_Rothstein’s picture

It would definitely be simplest just to have the failsafe in check_markup() and stop there, but above there were some complaints (I think justified) that it's really not a good idea to purposefully leave incorrect data in the database and never make any attempt to update it to the new, actual value. The idea of doing both is that it's a realistic compromise for D7 - we make a best effort to update the data using existing D7 methods (the batch API), but knowing there are some edge cases where omissions will slip through, we have the other method as a backup.

As for how it would work, it would be just like http://api.drupal.org/api/function/hook_update_N/7, I think. The filter module would take care of preparing the batch and registering all the operations for it based on modules that implement hook_filter_format_delete(). The individual modules would not see any API change at all here really, just a new, optional $sandbox parameter passed along with the hook, which they could ignore if they want to (just like most update functions don't need to bother with $sandbox). But modules which have a lot of data and need to do a multi-pass update could use $sandbox to control that. The individual modules would still be responsible for updating their own database tables and everything, just like now.

If we implement both solutions at the same time, then it might be possible to close #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x in favor of this one, but we probably shouldn't do that just yet...

chx’s picture

Version: 7.x-dev » 8.x-dev
Category: bug » feature
Priority: Critical » Normal

We had our party from #16 with EntityFieldQuery so a contrib can be written that moves stuff from one format to the other. mikey_p is fixing check_markup to spit out n/a in case of need at #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x

David_Rothstein’s picture

Huh? How is this not a critical D7 bug?

mikey_p’s picture

After talking this over I agree, the only security issue is that from #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x and anything else is bordering on babysitting. The best approach here for the future would be to simple add a hook for each module to respond when validating the deletion of a format, and if any module responds, then block the deletion of the format. But I don't think it's critical, since there is no data loss.

David_Rothstein’s picture

Version: 8.x-dev » 7.x-dev
Category: feature » bug
Priority: Normal » Critical

To clarify the bug here, the steps to reproduce are:

1. Create a node with a "Filtered HTML" body.
2. Use the filter module UI to delete the "Filtered HTML" format. It tells you it's going to reassign all content that previously had that format to plain text.
3. Go back and look at your node. It's not in plain text.

I think we need to leave this issue open until that problem is fixed - even if it winds up getting fixed somewhere else :) The bug here is (relatively) specific to text.module.

chx’s picture

How on earth is this specific to text module? If I use that format in a custom block which then gets cached then ...? This is NOT specific to text module.

chx’s picture

StatusFileSize
new615 bytes

Is this what we want, then?

chx’s picture

Status: Needs work » Needs review
damien tournoud’s picture

Title: text.module must act when a text format is altered » Don't reassign deleted text formats to the fallback format

Ok, so I tend to agree there: it doesn't make a lot of sense to reassign existing content to the fallback format. We should leave the format as it is, and make sure that check_markup() doesn't output anything when the format doesn't exist.

So, in addition to #61, we need to remove the existing implementations of hook_filter_format_delete() in the block module and the user module.

chx’s picture

Really #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x needs to be merged in. But I agree. a) the filter delete operation is rare enough that the code weight to move nicely from one filter format to another screams for contrib and not core b) such a move making sense is not common enough (even out of those rare cases). It's much more likely that we will display invalid garbage (to the end user at least. Can they read HTML? Markdown? TeX? Deal with lengthy inline commands?) which might reveal data we don't want (that's the sechole).

Edit: however it is now possible to make such a move thanks to EntityFieldQuery at least for fields. We can at least hope that modules add foreign keys to their schemas pointing to filter format fid. If text module does not do that which I suspect it does then we probably should figure it out.

David_Rothstein’s picture

StatusFileSize
new24.32 KB

How on earth is this specific to text module? If I use that format in a custom block which then gets cached then ...? This is NOT specific to text module.

You may be right that there are caching bugs too, but that's not the same issue. The problem with text.module is that the wrong data is actually being stored in the database, so no amount of cache clearing will solve it.

Ok, so I tend to agree there: it doesn't make a lot of sense to reassign existing content to the fallback format.
...
It's much more likely that we will display invalid garbage (to the end user at least. Can they read HTML? Markdown? TeX? Deal with lengthy inline commands?) which might reveal data we don't want (that's the sechole).

That is why in Drupal 7 we do not force them to switch it to the fallback format (like Drupal 6 did). Instead, we give them a choice - see attached screenshot. (Of course, it's still true that if they make a bad choice, they can introduce a security risk.)

David_Rothstein’s picture

Title: Don't reassign deleted text formats to the fallback format » When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do

All previous issue titles here presupposed a solution - retitling it to focus on the actual problem instead :)

In terms of the batch API way of handling this, I still don't understand the problem. It's exactly what we do for other rare operations in core that can potentially involve thousands of nodes being loaded and resaved, etc, such as http://api.drupal.org/api/function/user_cancel/7

catch’s picture

That has its own issues which have been bumped to D8 - see #89181: Use queue API for node and comment, user, node multiple deletes, short version is batch is no good for anything other than form submissions.

sun’s picture

Status: Needs review » Needs work

We are running in circles. We are hitting the same absurdity all over again: Drupal is not capable of applying changes to larger amounts of data.

A bit insane. I really wonder why this is not covered in a http://drupal.org/node/X where X <= 7.

Instead of playing ping pong with issue titles, statuses, and patches, let's decide on one of the following resolutions:

  1. Screw proper loading/saving. Most database update functions do not use a proper loading and saving procedure either, and actually, we are missing a separation between user updates and system updates of entities. Killing a text format would need to load + save tons of diverse entities, triggering actions (potentially having entirely unintended effects, such as sending an e-mail on a content update, updating the last changed timestamp, etc), log messages, and whatnot. Why? We have no notion of performing system updates on entities, so why aren't we just executing the required database queries, flush all caches, and be done with it?
  2. Screw API changes/deletions. All of the current operations are triggered via the UI, so just use Batch API + document it. People that need to trigger mass-updates and mass-deletions via non-UI functionality can come up with tweaks for Batch API or a replacement or just use custom code.
catch’s picture

For #1, we'd need a bulk update API for field API so that it works for pluggable storage - Barry actually started on this in one of those issues. We could try to have something like this but it really feels like a D8 or contrib task at this point.

On #2 One the one hand we already have that as an issue, cleaning it up is for D8, on the other I don't love adding more of this to D7, but could live with it if we add explicit @todos to do something else in D8 and open issues similar to the node/comment multiple delete ones.

David_Rothstein’s picture

short version is batch is no good for anything other than form submissions.
...
People that need to trigger mass-updates and mass-deletions via non-UI functionality can come up with tweaks for Batch API or a replacement or just use custom code.

Could you explain these statements? I use batch API without a UI all the time and probably so do you (every time you run the command-line installer in D7, or "drush updatedb" even earlier than that). It works fine.

Perhaps what you are actually saying is "if you plan to use this outside the form API on a GIANT amount of data you should check your PHP execution time limit first because you might hit it", and yes, if going this route we could document that. I really can't imagine the use case for deleting text formats outside of the UI anyway (e.g. on cron), though.

David_Rothstein’s picture

I really can't imagine the use case for deleting text formats outside of the UI anyway (e.g. on cron), though.

To clarify: I guess I can imagine doing it outside of the UI in an update function, but that is already inside a batch process anyway :)

damien tournoud’s picture

No go for those two resolutions. I only see two possible course of action:

  • Fix the Batch API so that it uses a proper queue and doesn't rely on the browser session to run it
  • Remove the mass filter update feature, that is a corner case anyway and should belong in contrib
sun’s picture

Re-reading my #68 and last #72:

2) Removing the option to delete a text format is a no-go. I've personally configured bogus text formats myself and as Wysiwyg module maintainer I know that there are countless of users who created varying amounts of formats that have to be merged/replaced with another existing format.

1) Thinking low-level and generally, I totally agree with the need for a low-level equivalent to Batch API, very potentially based on Queue API. Regardless of this issue, we need to address that problem space, so efforts should be continued on that other issue (I think + hope chx is working on something already). However:

The common causes are rather simple: "uid 123 just became 0, since 123 left us.", or "text format 3 was deleted, replace with 4."

In previous versions, we simply went with

UPDATE x SET y = new WHERE y = old;

but since D7, we seem to officially support no-sql databases, for which such an update may not work.

Actually, Field API seems to have been designed to officially support alternative storage engines than SQL, so that's likely the reason why we officially have to support mass-maintenance of values in no-sql storages. Not having used any of those myself, and considering them as very advanced usage (who in those 80% will ......), I'd be happy to leave mass-maintenance for no-sql storages to contrib, where those storage engines have to be invented in the first place.

That is, because: Even if we'd have a batch-alike update system being based on the new queue system, we'd still have to invent something new to differ between user and system updates. Everyone who worked with VBO (the only (UI-only) mass-maintenance attempt that exists AFAIK) knows what happens when updating or performing actions on 217 nodes via Batch API...

bjaspan’s picture

I suggested a bulk update API that for SQL reduces to the UPDATE query back in comment #3. While it is true that previously "we simply went with UPDATE x SET y = new WHERE y = old;", in #15 catch observed that that kind of update is really no longer kosher, even if we were willing to assume SQL-only storage.

I'm trying to come up with The Simplest Solution That Could Possibly Work.

Here's the thing: We *imagine* that the old way of "UPDATE x SET y = new WHERE y = old" actually did work because we think of that update query as instant. It is not. If the table is big enough or y isn't indexed, that query can take a LONG TIME. In fact I recently had a table so big that "SELECT COUNT(*) FROM table" by itself took over an hour before I ultimately killed it. So, the fact is that we have NEVER really had a scalable solution to this problem.

Therefore, I think The Simplest Solution That Could Possibly Work is to just do the queries when we need to do them. In this case, that means that text.module in hook_format_deleted or whatever will query for all entities with the old format, and then one at a time load them, update the value, and save them (well, we could use multi-load). Yes, this might run out of PHP execution time... but the UPDATE query could, too.

So that is my current suggestion.

chx’s picture

Bah, just batch API it. As David Rothstein points out we can run a batch API various ways, this discussion is not getting Drupal 7 out the door.

bjaspan’s picture

Status: Needs work » Needs review
StatusFileSize
new7.25 KB

I re-read this issue again from the beginning and we discussed it at today's D7 code sprint. In attendance: Dries, Moshe, scor, David Rothstein, hefox, Benjamin Melançon, Tim Loudin, and myself.

We decided that while we would like to keep all the data up to date and consistent, doing that has all kinds of issues we cannot solve quickly. We cannot to a load/save on all entities because we'll run out of time and even if we batch it we may trigger actions (sending emails, updating external web services, etc.) that we don't want to. We do not really want to implement the Field Bulk Update API because it's too late, it violates the entity abstraction, and it won't even fully solve the problem.

We considered the alternatives of disallowing deleting a format (either entirely, or only if content using it exists), declaring that "deleting a format" really means replacing its filters with one that always returns "This content is no longer available.", or implementing the "replacement map" functionality. We (or least I) do not like the idea of needing a replacement map for potentially every module for every column; clearly that is not a long-term solution.

But this issue is not about handing potentially every piece of data for every module; it is about deleting a text format and doing something reasonable.

Moshe ultimately made the killer argument: When a site admin clicks "delete a format", she doesn't want to be told "no, first you have to go find and update all your content manually." That's a PITA. She also does not want to be see "This content is no longer available" again until she goes and manually updates everything. She does not even want to see "all your content will be migrated to the new format by a magical perfect process, just please wait while your site continues to display some content in the format you just deleted." Instead, what she wants is for the format to be replaced by her selected replacement, RIGHT NOW.

Therefore, we decided to go with the replacement map option. It's simple, it accomplishes the goal, it lets us move forward, and it is probably needed even if/when someone implements the actual "update all the database entries to reflect the new format directly."

I have thus re-rolled my patch from #25 and added some tests. Attached.

Note that the "we" in this post was not completely unanimous. We reached agreement though some people still thought other approaches might be better. Anyone who wants to is welcome to implement those other approaches... but this issue will no longer be open in critical state. :-)

Status: Needs review » Needs work

The last submitted patch, format-delete-556022-76.patch, failed testing.

bjaspan’s picture

The test failure is because my patch makes it look like deleted formats are not actually deleted. Clearly changing the behavior of filter_format_load() as I did is wrong.

bjaspan’s picture

Status: Needs work » Needs review
StatusFileSize
new7.84 KB

New version. This operates on check_markup() and filter_list_format().

chx’s picture

Status: Needs review » Reviewed & tested by the community

I guess this works.

sun’s picture

Status: Reviewed & tested by the community » Needs work

This patch contains various coding style issues and the helper functions could use better names, and it's totally illogical to declare those "private" -- every module on this earth has to call them.

I still need to read up on the latest follow-ups. The chosen way to solve this is a total hack IMO and this will bite us badly. I can only suspect that this topic was solely discussed within the single scope of deleting a text format. The real scope is What happens with stored field values that reference something, if that referenced thing changes or vanishes? Node references, user references, taxonomy term references, the real scope applies to all values that are references, using foreign keys.

bjaspan’s picture

@sun, I explained our rationale in #76. The goal is to ship D7 with something that works, not continue to go around in circles forever attempting to solve intractable problems. The relevant question is: Is D7 so useless either without solving this problem, or with solving it as we decided to in #79, such that the majority of people who would otherwise use it can't use it? The answer is obviously no. Yes this solution is a hack, but it is a hack that provides a reasonable tradeoff, and after 2.5+ years of development we cannot hold back D7 forever trying to perfect solutions to every problem.

Regarding the fact that the functions are private: The goal was to make an internal-only behavior change, not change/complicate the public filter API. If you know of problems with the current changes that break callers of the filter API, by all means bring them up; that is perfectly valid review feedback.

Regarding your question: What happens with stored field values that reference something if the referenced thing changes or vanishes? It is actually two questions:

1. Changes. We generally handle this fine. Certainly text formats, nodes, users, terms, etc. that changed are handled fine. We clear the cache and get the new info on next reference. There may be cache clearing bugs, but that is not what this issue is about.

2. Vanishes. The simple and obvious solution is that a reference to something that no longer exists stops working and displays some error: "reference to non-existent [thing]". Trying to do better than that in the general case for arbitrary quantities of data, automatically, and invisibly to the user is probably impossible, and it is certainly not a critical "D7 is useless with out" requirement for core. In the case of node/user/term reference, "referenced thing no longer exists" is probably even the correct behavior. In the case of text formats, we actually considered that any text using a deleted format should render to "This content is no longer available." That was my first choice yesterday. However, Moshe made the compelling argument that the hack we implemented is substantially more usable than either doing nothing (which we considered) and "content not available." So that's what we did.

We've been going around in circles for months. At various points in this thread, someone (frequently you) has rejected every even theoretically possible solution. We need to ship D7. So the goal is to find something that works and move on.

I'll leave this as CNW for now but without some other tangible, implementable solution in the very near future (or without real identified problems with the current patch), I think it should go back to RTBC.

moshe weitzman’s picture

Status: Needs work » Reviewed & tested by the community

This patch downgrades a problem from critical to ugly. Some future patch can go from ugly to pretty.

Back to RTBC. If someone downgrades it again, please state exactly how this should be resolved.

damien tournoud’s picture

I like.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review

It's minor, but in the code comment for text_filter_format_delete() "input formats" should be "text formats"...

Plus, I think that comment really needs to explain why we *aren't* actually replacing the deleted format like we're supposed to (in other words, some kind of short summary of the discussion we had here, maybe even a @todo). Otherwise it will look pretty weird to have a delete hook that doesn't actually replace or delete something, just clears a cache.

I haven't looked too closely at the rest but it seemed reasonable.

sun’s picture

Status: Needs review » Needs work

Seriously, storage in a variable? How about introducing a proper {filter_format}.replacement column? Modules may have to join on those values to figure out update todos.

The PHPDoc of hook_filter_format_delete() implementations that do not perform a replacement should clearly state and clarify why they are not replacing, and noting a @todo for D8 wouldn't hurt either.

Likewise, the API docs for hook_filter_format_delete() should explain this yet unknown and entirely new concept of not deleting data and instead mapping it forever.

Why is the test not using drupalCreateNode() ?

David_Rothstein’s picture

Likewise, the API docs for hook_filter_format_delete() should explain this yet unknown and entirely new concept of not deleting data and instead mapping it forever.

As per my comment I definitely agree about the API docs, but this isn't actually an entirely new concept. It's exactly what happens for all contrib modules in D6; none of them ever delete or replace anything when a text format is deleted.

It wasn't good then and it isn't good now, but it's not entirely new. The only difference is that D6 did not give site administrators a choice for the replacement format whereas D7 does; therefore, the "mapping" that occurred within filter_list_format()/etc. in the filter module looked a lot simpler in D6 than it does in this patch. But it was still there.

yched’s picture

I agree with the approach taken here.

One minor nitpick : Ideally "// Handle a chain of deleted formats." would happen once and for all in _filter_set_replacement_format() rather than each time over and over again in _filter_get_replacement_format().

bjaspan’s picture

Component: field system » filter.module
Status: Needs work » Needs review
StatusFileSize
new10.33 KB

I've changed this issue from field system to filter.module because that's how we ended up deciding to address it.

Changes in this patch:

* Improved docs everywhere.
* The tests now use drupalCreateNode() instead of node_save().
* Changed _filter_set_fallback_format() to handle chains of deleted formats. However, I also left the chain logic in _filter_get_fallback_format() because... something is nagging me that under some condition it may be necessary, and if it is not necessary then the loop will only ever run once anyway.

Non-changes in this patch:

I am still using a variable to store the fallback map. My original plan at the code sprint was actually to do as sun suggested and add a "fallback" column to the filter_format table. I abandoned that plan for the following reasons:

* All the database queries on tables filter and filter_format would need to be updated.
* It requires a much larger API change since now there needs to be a way to list both deleted and non-deleted formats.
* The internal logic for filter_formats() and other functions would need to change to handle caches of deleted and non-deleted formats separately, making it a more complicated patch.
* Providing a join-able column is only useful for modules that store their data in an SQL table, and the whole reason this issue is complex is that we're trying not to assume that text data is stored in SQL. OTOH, a join-able column is in no way necessary: Now that the fallback map variable is documented, modules can just as easily do the equivalent of "SELECT * FROM mytext WHERE format IN (array_keys($fallback_map))", no join necessary.

In short, it seems to me that a fallback column (or a separate filter_format_deleted table which I also considered) would be a much more complicated patch without any real benefit over the current approach.

bjaspan’s picture

I guess I could go for a function to return the fallback map instead of suggesting that modules access the variable directly.

bjaspan’s picture

StatusFileSize
new10.53 KB

New patch, adds filter_get_fallback_formats() instead of documenting the internal variable.

catch’s picture

Component: filter.module » field system
Status: Needs review » Needs work

I prefer a variable to a table (unless we wrap calls to that table in both a cache_get(), cache_set() + static caching) - check_markup() running on 300 comments or 30 nodes has to call this function for each.

@sun, nodereference and userreference have been dealing with this issue for years now - they just leave stale data lying around and don't display anything, taxonomy follows that pattern. This is obviously a problem we need to solve, and it'd be nice to do it retrospectively (or have the ability to do so) when we do, but I agree with Barry/the patch that we can't do it here, and no-one has opened critical bug reports for that generic problem that I know about.

yched’s picture

Not sure why catch set this back to CNW (not clear from the comment) - my personal remark is that comments lines wrap consistently before 80 chars :-).

Other than that, looks good to me.

webchick’s picture

Component: field system » filter.module
Status: Needs work » Needs review

Looks like a straight-up cross-post with Barry to me.

catch’s picture

sorry that was a crosspost with barry, just took ages to get back to the tab after starting it :(

David_Rothstein’s picture

+function _filter_set_fallback_format($format_id, $fallback_id) {
+function _filter_get_fallback_format($format_id) {
+function filter_get_fallback_formats() {
...
(and others)

This is actually incorrect use of the term "fallback format" - in the filter system, "fallback" refers to http://api.drupal.org/api/function/filter_fallback_format/7.

I think we have existing code during filter deletion that (for historical reasons) uses this wrong term also, but that's a bug to be fixed and we probably shouldn't add to it here :) I'd suggest "replacement format" instead?

Patch looks good otherwise. (I'm trying to think if there are any other places in the code that would need to call these new functions, but can't think of any, at least off the top of my head - the places added in the patch seem to cover it.)

bjaspan’s picture

StatusFileSize
new11.85 KB

New patch. Changes "fallback" to "replacement" everywhere that we talk about the format to be used in place of one that was deleted, so "fallback" now only means "the format that cannot be deleted and that everyone is able to use." Also, comments are now wrapped at 79 characters.

chx’s picture

Status: Needs review » Reviewed & tested by the community

looks good, let's get our stubbornest critical out of the way.

sun’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new11.1 KB

wow, I think I've never seen design as ugly as this. This is most probably the worst Drupal API change since.. like ever? At the very least, Drupal modules tried to update values, so as to ensure data integrity. But not trying at all is just wow, überugly. This basically means we could as well resurrect the epic Deletion API issue.

The consequences of this patch are horrible to think through. Starting from potential endless recursion issues, to breaking data integrity, breaking table joins, complicating module updates, potentially introducing security vulnerabilities in contributed and custom modules, and certainly more I can't even think of yet. An insane recipe for introducing critical bugs.

I do not approve this Filter API change. From all possible options, this is the absolute no-go resolution. But anyway, at least trying to prevent some possible infinite recursion problems, and fixed various comments and names.

sun’s picture

StatusFileSize
new11.3 KB

Status: Needs review » Needs work

The last submitted patch, drupal.kill_.sanity.100.patch, failed testing.

bjaspan’s picture

re #99: "At the very least, Drupal modules tried to update values, so as to ensure data integrity." No, they don't. Try deleting a content type sometime.

bjaspan’s picture

In #358437-23: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x I suggest that these two issues, while not exactly the same, are really best served by a single patch.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new12.7 KB

Sorry for being asinine, but this adds a lot of long-term burden to Filter system/module maintainer(s).

I'm putting a new patch up for review. Simplicity: If we say we cannot ensure data integrity when a text format is deleted, then we simply cannot expose functionality to delete a text format.

Dead simple. It's illogical and utterly wrong to try to expose a feature that is and cannot be supported by the rest of the system.

You can make a text format unavailable by revoking user permissions. That's it, case closed.

Status: Needs review » Needs work

The last submitted patch, drupal.filter-format-delete.104.patch, failed testing.

sun’s picture

Another debate concluded that we're optimizing Drupal for 1% large-scale sites here. For 99%, i.e., all other more common sites, execution of various DELETE queries as well as flushing of various caches is no big deal at all. We can serve the needs of most users by implementing the originally suggested field_attach_bulk_update() + making the default implementation in field_sql_storage respond to it. If your site falls into that 1% of large-scale sites, then you can always replace field_sql_storage with a custom replacement module, or simply implement some alter hooks in a custom module to remove the functionality to delete a text format. But there's no reason why 99% of all sites should be discriminated, just because 1% is unable to do the same. Heck, I could even tag this issue with "accessibility"...

moshe weitzman’s picture

Status: Needs work » Reviewed & tested by the community

OK, lets send this to the committers. We're not going to come to a happy agreement quickly here.

#97 is RTBC and passes all tests. #104 is preferred by sun but does not pass tests yet.

dries’s picture

Status: Reviewed & tested by the community » Needs work

Sun, I don't support the proposed patch in #104.

People that don't know text formats might try to create one only to see how it works, what it does, etc. I know I would, and I would hate it to get stuck with a text format called 'test test'. ;-)

There are other valid use cases where it is completely harmless to delete a text format. The confirmation page of the text format deletion option should warn people about the consequences (it already does, I believe), and then we let the administrator make his/her own decision at his/her own risk.

Removing the delete option because it can cause damage is a crazy pattern. Do you recommend we remove the delete button from content types too?

Administrators can cause damage on any IT system. Delete the wrong file on Microsoft Windows, and your entire system is hosed ... it doesn't stop hundreds of millions of people from using Windows.

Let's continue with #97.

dries’s picture

Status: Needs work » Reviewed & tested by the community

Cross-post.

moshe weitzman’s picture

@sun - please let us know if you intend to keep going with #100. I'm not sure what you are doing there but I notice that some test failures have crept in. If you prefer not to keep going, thats ok.

bjaspan’s picture

I agree that sun's changes in #100 are improvements on my #97. I will pick up his changes and re-roll the patch soon if he does not. Currently, I'm taking an unplanned detour through the western Denmark due to mis-reading the train maps. At least I have internet, though. :-)

bjaspan’s picture

Status: Reviewed & tested by the community » Needs work

I am now working on re-rolling this patch.

bjaspan’s picture

Status: Needs work » Needs review
StatusFileSize
new13.66 KB

New patch. I updated function names, code, code placement, and comments based on sun in #100 (except for the slams against Field API). I introduced two test fixes worth noting.

FilterSecurityTestCase creates a node, deletes its text format, and verifies that the node body renders as nothing. We now need this:

+    // For this test we need content in a text format that the filter
+    // system knows nothing about. This means we have to explicitly
+    // remove it from deleted formats replacement list; otherwise, the
+    // next text will fail because the text will be formatted in the
+    // fallback format instead of as the empty string.
+    $deleted = variable_get('filter_deleted_formats', array());
+    unset($deleted[$format_id]);
+    variable_set('filter_deleted_formats', $deleted);

I also had to remove the last two lines in testTextFormatReplacements(), but I suspect something is wrong because this code passes before this patch (I assume), but I do not undersatnd how:

    // Delete Filtered HTML, verify default format is used.
    filter_format_delete($filtered_html_format);
    $this->drupalGet('node/' . $node->nid);
    $this->assertRaw(check_markup($html_text, $plaintext_format->format), 'Content was rendered in Plain text.');
    // We cannot verify the format form element because under these
    // conditions the form element is not rendered at all.
    //$this->drupalGet('node/' . $node->nid . '/edit');
    //$this->assertFieldByName("body[$langcode][0][format]", filter_fallback_format(), 'Text format widget defaults to Plain text.');

We're being kicked out of the code sprint room, so I'll look at that last issue again later tonight.

bjaspan’s picture

No, sorry, my final comment in #113 is wrong. The last two lines I commented out do not work before this patch, because they are added by this patch. sun added a check for the format element in the form to all three cases in testTextFormatReplacements(). His third such test fails because there is no format form element when there is only one format allowed.

I do not see where in the code makes the select element not get rendered when it only has one option. So, either the element should not get rendered and I should change the test to verify it is not rendered, or the element should get rendered with one option and I should put hte test back the way sun wrote it.

So: Should the format select widget get rendered when it only has one option? If not, where is the code that makes it not?

bjaspan’s picture

I found the answer to my question in #114.

sun’s picture

StatusFileSize
new11.94 KB

Re-rolled #100 + fixed the tests. Didn't look at the differences in #113 yet.

bjaspan’s picture

StatusFileSize
new13.62 KB

The answer to my question is that the format selector is not rendered if there is only one format. This behavior comes from the #access attribute:

  $element['format']['format'] = array(
    '#type' => 'select',
    '#title' => t('Text format'),
    '#options' => $options,
    '#default_value' => $element['#format'],
    '#access' => count($formats) > 1,
    '#weight' => 10,
    '#attributes' => array('class' => array('filter-list')),
    '#parents' => array_merge($element['#parents'], array('format')),
  );

I have therefore removed the test for the format element from this patch when there is only one format.

I believe this patch is RTBC, but it contains some (minor) test fixes since the last RTBC and sun's patch, so it could stand another quick review.

bjaspan’s picture

StatusFileSize
new13.5 KB

Reviewed my own patch, made two fixes:

+++ modules/filter/filter.module	22 Aug 2010 16:25:25 -0000
@@ -258,14 +258,31 @@ function filter_format_save(&$format) {
+ * hook_filter_format_delete(). ¶

Fixed whitespace.

+++ modules/filter/filter.module	22 Aug 2010 16:25:25 -0000
@@ -874,6 +957,7 @@ function filter_process_format($element)
+  dpm($element);

Removed.

Powered by Dreditor.

sun’s picture

StatusFileSize
new12.62 KB

Incorporated additional changes from #113.

Not sure whether this patch should go in. Given the thinking we seem to apply to the overall issue, we may as well just silently ignore the problem - just like we do for node types. In both cases it's irresponsible to expose a delete functionality at all. Just that.

Having to choose between irresponsibility and insanity, I think I'd prefer irresponsibility, since that doesn't have the consequence of having to maintain and clean up this mess later on.

dries’s picture

With the patch, we enable people to test/innovate in contrib. Maybe it is no longer 'critical' though?

bjaspan’s picture

Replies to #119:

1. The adds back comments that "blame" Field API for being inadequate. I don't think we need one module slamming another module in its comments. There is a very real use case not touching Field API where this issue strikes: A module that is disabled when a text format is deleted. We're all one big happy family of code, right?

2. I simply disagree with sun that "late fixup" is a completely insane approach. Sure, it would be great to make all of our data perfectly consistent all the time. It is not feasible in core at this time, and the disabled-module use case means it may never be. The approach in #118 is an improvement on what we currently offer.

3. #119 adds a test for "NoFieldByName" checking that the format selector is gone. I don't really see why that belongs in this patch/test case, but it is harmless to have it.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new13.5 KB

Here is #18 re-uploaded. I have reviewed and it is RTBC. I think this merits critical priority still.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs review

If we decide to silently ignore the problem, we should stop letting users select a replacement text format. All text in a deleted format will then render to the empty string. We discussed this option at the D7 code sprint last week and I even advocated for it, but Moshe made the compelling argument against it: It is better to do what the site admin wants, because we can.

David_Rothstein’s picture

I would like to review the latest patch but am confused whether I should review #119 or #122.

Yes, this is definitely still a critical issue. Unless you like Drupal telling you "Content assigned to the deleted text format will be reassigned to the chosen one" but instead displaying all your content as an empty string instead, which is the current behavior of D7 HEAD :)

I am finding myself very sympathetic to @sun's comment in #106:

Another debate concluded that we're optimizing Drupal for 1% large-scale sites here. For 99%, i.e., all other more common sites, execution of various DELETE queries as well as flushing of various caches is no big deal at all. We can serve the needs of most users by implementing the originally suggested field_attach_bulk_update() + making the default implementation in field_sql_storage respond to it.

We need Barry's patch no matter what, but it does seem silly not to actually make some effort to update the data in addition to that, when on 99% of sites this won't really be more complicated than a db_update().

Barry's point, I think, is that that part could be handled in a non-critical followup, which I think may be true but since @sun feels very strongly about it I wonder if we won't just make everyone happy by dealing with it here :) We just need to be careful not to bite off too much. With this new hook, we'd need to name it and document in such a way so that it's crystal clear this is NOT for generic ("mission critical") bulk updates, but only for "housekeeping-related" ones. And that the calling code needs to accept the possibility that some field storage engines won't be able to respond to it.

sun’s picture

re #120:

Contrib can innovate with or without this patch. Though that's kinda unlikely to happen, since not even Drupal core top contributors, ninjas, masters, and gurus were able to resolve this issue.

re #121:

1) I think you're mistaken, the test has been updated/corrected.

2) It's not blaming Field API. It's explaining reality.

3) Sure, you do not have to maintain this code. I'll be blamed for Filter API insanity.

4) #118 still uses a bad variable name and doesn't contain improved phpDoc.

To perhaps clarify #119:

I just meant to mark this issue won't fix and leave HEAD as is. People can delete formats, but like node types, existing data may or may not be updated.

#119 passed in the meantime. But I'd prefer to mark this issue won't fix in the meantime. The added code here is not an improvement, it makes things worse. The help text on the text format delete confirmation form may rather need a warning that not all of the existing content may be updated.

dries’s picture

I don't see how the patch in #119 makes things worse.

Even good code results in WTFs. See http://www.osnews.com/story/19266/WTFs_m ;-)

damien tournoud’s picture

As I said in #72, I would strongly prefer we simply remove the "mass filter update feature" and just make sure that a content in a deleted text format will never be rendered.

This type of mass update requires a lot of thinking (you generally need to implement transformations of a text format into an other with partial rendered - for example from BBCode to Filtered HTML), and I don't believe this is a feature that should be in core in the first place.

bjaspan’s picture

Okay, I think as a group we have slightly reduced the set of things we communally think constitute the minimally viable change here. That's progress! Here are the contenders for "What is the least we can do to make this not critical and not make the situation worse":

1. Nothing.
2. Remove the ability to delete a text format completely.
3. Remove the ability to delete a text format completely if any content uses it.
4. Leave in deleting text formats, but remove the ability to select a replacement format and have the form say "all content in the format you are deleting will disappear."
5. The current patch: Leave in deleting text formats with a replacement map applied during check_markup().

My votes, in order: 5, 2, 3, 4. I think #1 is unacceptable because text in the UI is currently just a lie and is trivial to fix. #4 is my last choice because once the format is deleted with no trace, there is basically no way to ever recover. #3 is problematic because it makes assumptions about fields storage; it is easy for SQL, and easily implementable via our EntityFieldQuery, but could be very inefficient for NoSQL storage. #2 works and is logical, it is just bad SEX (Site Executive eXperience :-)). This is why I ended up coding #5, obviously.

While ranking my votes, I considered how I would vote on the similar question on node type deletion. The difference is that for node type deletion: we do not falsely promise that we do anything at all so at least #1 does not involve a lie and thus "do nothing" is #4, including the "never recover" part; #5 is nonsensical, we can't map one content type to another; and #3 is both easy and feasible because node storage is always SQL. Thus for node type deletion my votes would be: 3, 2, 4.

Note that in both cases I think we can do the real cleanup job we all want to do, in D8, using 'deleted' status columns "everywhere" and using the job queue. I just think it is too late for that for D7 because it is more complicated than people realize.

David_Rothstein’s picture

One more contender, as per my comment above:

6. Same as #5, but additionally

  • have text.module invoke a hook to give the field storage engine a chance to do a "housekeeping" update,
  • implement that hook in the obvious way in field_sql_storage.module with db_update(),
  • document/accept the fact that some storage engines might not be able to implement it.

To me this seems like the most reasonable compromise as it means we are making a "best effort" to actually maintain the integrity of the data within core (and leaving it up to other storage modules to decide whether they are able to do that or not).

#1-#4 don't make any sense to me. Again, we have working code for this replacement feature in Drupal 6 (http://api.drupal.org/api/function/filter_list_format/6) and Barry's patch basically just ports that to Drupal 7. I think it would take a pretty compelling argument to refuse to port that to Drupal 7, and instead rip out or completely change the text format deletion feature at this late date.

bjaspan’s picture

I think it would be a mistake to design core in a way we know is impossible to implement on non-SQL databases efficiently. Yes, today that is only 1% of sites (actually, today it is probably exactly 1 site), but we hope to see many much larger Drupal sites evolve during the lifetime of D7, and have Drupal's non-SQL ecosystem grow with it.

So I see option 6 as at best not required to solve the critical issue and at worst a step in the wrong direction. That said, a rough draft of that approach is in comment #37.

Can we agree to stop talking about option 6 or any other kind of bulk update until we commit/decide on something from option 1 to 5?

bjaspan’s picture

StatusFileSize
new13.69 KB

New patch. This is my patch plus the variable-name fix from sun. The only remaining differences between our patches are:

1. Mine contains this and sun's doesn't:

-function text_filter_format_delete() {
+function text_filter_format_delete($format, $replacement) {
   field_cache_clear();

The patch adds these args to the hook in filter.api.php. I'm sitting next to Angie and she says the hook should list the formal args even if they are not used.

2. Comments.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

thanks for the summary. as you say, #5 for right now. wait for green before commit.

as for that db_update(), i'm not so sure. one could argue that it is an API violation of the entity to change a format without firing the appropriate hooks (i.e. hook_node_update, …)?

sun’s picture

Priority: Critical » Major
Status: Reviewed & tested by the community » Needs review
+ * In some cases this is not possible to do immediately. Therefore, the Filter
+ * API maintains a history of all deleted formats and their corresponding
+ * replacement formats. When the Filter API is asked to operate on a deleted
+ * format, it instead uses the corresponding replacement format. This feature
+ * of Filter API may be deprecated someday, so modules should update their
+ * data if possible.

To clarify again in case people don't realize it, this bears the consequence that any "upgrade", "migration", or fix for the underlying issue (performing mass-updates), will HAVE TO add an entirely new hook or similarly complex ONE-TIME system to be able to determine, where orphan formats are used. That's a serious afterburner in WTFs/m.

re #130: Why can't your NoSQL driver in contrib not simply hide the text format deletion feature (and similar other mass-updates)?

For Filter API, the most sane move forward would be either (ordered by preference):

1. Nothing. But clarify the help text to explain the "best effort". Due to #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x, all not updated content will be rendered as empty string now.

6. Make at least Field SQL Storage do a "best effort" for 99% of Drupal users. Those other 1% won't delete a format anyway, as they probably know what they're doing. (This only ranks 2nd, because I want to believe bjaspan that it's hard to implement.)

2. Remove the entire text format deletion functionality.

Since 1 + 2 are both valid ways out, nothing is really broken here. Hence, demoting to major. D6 did not have this mass-update nor mapping feature either, D6 is simply horribly broken in what it does with the fallback to the default format.

bjaspan’s picture

Priority: Major » Critical
Status: Needs review » Reviewed & tested by the community

You do not get to unilaterally de-prioritize the issue, taking it off the list everyone is looking at to decide what needs to be worked on.

I think you and I are done now. Dries will have to decide what to do.

sun’s picture

StatusFileSize
new12.75 KB

Let's remind webchick of why it's a bad idea to spell out arguments that your hook implementation does not need at all:

#236657: Swapped arguments in system_clear_cache_submit function
#525570: system_clear_cache_submit signature changed from drupal-6.12 to drupal-6.13 with no obvious reason
#507668: Parameter order for system_clear_cache_submit changed in 6.13

Those needless arguments needlessly broke lots of Drupal sites.

Merged the improved comments from #131 with improved comments from #119. I'm still opposed to this patch, as it's the worst from all possible resolutions.

David_Rothstein’s picture

+ * elements, but input formats are not deleted that often.

Still says "input formats" - change to "text formats".

Otherwise one or the other of the above two patches are probably fine - I don't have time to investigate both :) We can follow up post-commit with pursuing option #6 - that's fine.

*****

I think it would be a mistake to design core in a way we know is impossible to implement on non-SQL databases efficiently. Yes, today that is only 1% of sites (actually, today it is probably exactly 1 site), but we hope to see many much larger Drupal sites evolve during the lifetime of D7, and have Drupal's non-SQL ecosystem grow with it.

So I see option 6 as at best not required to solve the critical issue and at worst a step in the wrong direction.

I don't think it's a step in the wrong direction for Drupal 7 (or at least not more than anything else here): Just like the rest of the patch, it acknowledges the current limitations and works with them. And by invoking a hook, we leave open the possibility that any/all field storage modules might eventually figure out how to implement it effectively.

Frankly, if non-SQL databases can't ever figure this out, I don't think they're going to have much of a future for storing Drupal field data (at least not outside highly-specialized situations).

as for that db_update(), i'm not so sure. one could argue that it is an API violation of the entity to change a format without firing the appropriate hooks (i.e. hook_node_update, …)?

If so, then we violate it a lot already for other entity properties (e.g. http://api.drupal.org/api/function/user_filter_format_delete/7) - no reason to stop now :) But I don't think so. We aren't actually updating the content of the entity itself, just the way in which it gets displayed when it gets rendered; I think there may be a distinction. And @sun had a good point of not wanting to trigger 10,000 email alerts saying that 10,000 nodes were updated, just because we deleted a text format.

bjaspan’s picture

I've asked Dries to hold off making a decision on this. Let's all just cool it for a few days and ponder silently.

sun’s picture

Thanks for that. :)

I would be more comfortable with this patch, if we were not storing this replacement mapping in a system variable. I've put a lot of energy into rewriting Filter module for D7, to modernize it and make it suck less. Which first and foremost means that it no longer pollutes the system variable storage with lots of data that's actually tied to configuration data properly stored elsewhere and only ever needed when a filtered piece of content is created or updated.

Therefore, restarting to pollute the global system variable space, right after the API was finally cleaned up, is what makes things worse. Contrary to that, I'm not strictly against keeping deleted or otherwise invalid records (instead of a replacement mapping in a system variable), as that basically maps to my proposal in #562694: Text formats should throw an error and refuse to process if a plugin goes missing - although that patch goes one step further and not only applies the paradigm to text formats, but also to each filter in a format.

However, the proper and clean way would be to keep deleted records forever, as basically being proposed in the epic #147723: Deletion API for core. If and only if considering any replacement mapping.

When not considering a replacement mapping (as that pretty much kills everything you've ever learned about data integrity and no amount of NoSQL or whatever hyped technology will ever make me feel comfortable with invalidating my data), then just simply doing nothing here, just like we do nothing for node types, will work until we find a working solution to perform mass updates and mass deletions -- which we should still implement as simple database update query for field_sql_storage, to at least cover the 99% use-case of Drupal.

Our future tasks are pretty clear:

- Introduce a global concept of maintenance updates; i.e., without triggering any kind of notifications, or whatever else could happen. (For D7, we could at least start with agreeing on and setting a simple flag like $entity->maintenance_update = TRUE to try to prevent this.)

- Introduce a reliable queue that can be processed through batch UI, cron, or other means.

- Introduce a standard way to perform mass updates and mass deletions.

bjaspan’s picture

I agree with all of your "future tasks" but think they are all for D8 or beyond. I do not think we want to add an API to "cover the 99% use-case of Drupal" (i.e. assume SQL) for very much the same reason you do not want to add Drupal variables-based logic back into the Filter system: it is not the future and we've just paid the price to move away from it.

However, I am moving towards is thinking that the variable-based mapping approach is not the right one.

Since we cannot yet implement "true" text format deletion in a way we like, I suggest we actually don't delete anything at all. Instead, the "delete" option should be replaced with a "hide" option that removes everyone's permission (except uid 1 of course) to view and edit that content in one fell swoop. Admins can then choose between that or just changing the text format into whatever they would have replaced it with instead. Sure that means they might end up with two similar formats, but frankly I don't care about that, and I'm not even sure it would happen anyway.

The "hide" option will just remove privs, so there cannot be an "unhide" option because it won't know what privs to put back (we should not even try to restore the previous set of privs because there is a good chance those won't ever be the right then to unhide to). So when you click "hide", the confirmation dialog should say, "to unhide this, go to the permissions page and decide who to give permission to."

Can we agree on this approach?

Incidentally, I like the same approach for content types. You can delete them only if there is no content using them (because we have a way to determine that). If there is content, you can "hide" it by stripping permissions. But that's a separate issue.

sun’s picture

#562694: Text formats should throw an error and refuse to process if a plugin goes missing is a hard requirement to implement a {filter_format}.status for hiding/disabling text formats instead of deleting them. Although there is a complete and working patch, that issue was recently moved to D8 by reviewers though. Therefore, #562694 would have to be re-targeted against D7 and committed, to continue with the suggested approach here.

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Bumping out of my queue for now. Thanks for taking a calm, collected look at the situation now post-DC. :)

David_Rothstein’s picture

I think I agree with @sun in #138 although not entirely sure which he is suggesting for D7 vs D8 :) In any case, keeping the approach of the current patch but switching from the variables table to storing the deleted/replacement info in the text format record itself seems pretty reasonable. It shouldn't affect performance, since calling filter_format_load() already loads and caches everything for all formats.

It really seems like it would be silly to not do the db_update() in the case of field_sql_storage, no? I don't think we're sacrificing anything for the future with that approach, but rather the opposite. If we've all agreed that we want core to handle such bulk updates generically in D8, then providing a hook in D7 and leaving it up to the storage engines to implement if they want to means that maybe a solution will work itself out in contrib even earlier than D8.

Regarding hiding text formats by taking away their permissions, we already have two UIs for that in core (the permission page, plus the individual format configuration page) so I wouldn't be too excited about adding a third :) Also, this still leaves this format available all over the admin interface. I don't think it's really acceptable to have things in core that people can create but never delete, especially when we had that ability in Drupal 6 and earlier already.

bjaspan’s picture

Assigned: bjaspan » Unassigned

re: #140 and #142: No, no, no, no, no, no, no, no, no.

I've had it. Finish this issue yourselves.

BenK’s picture

As one of three issues left tagged "beta blocker" I thought I'd check in and see where this issue stands. It looks like there has been some very heated discussion. What needs to be done to get this off of the beta blocker list?

--Ben

David_Rothstein’s picture

Anyone know why this issue is marked as a "beta blocker" to begin with?

Note that all the security aspects of this have since moved to #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x. The effect of this bug now is basically just "I deleted a text format and now all my site's content is hidden". A critical bug, but I don't think it should block a D7 beta release.

Also, it seems to me that any API change that comes out of this issue (if at all) would be small, and probably more of an API addition than anything...

webchick’s picture

Issue tags: -beta blocker

Agreed.

StuartJNCC’s picture

This seems one of the more intractable issues still left in the critical queue so, whilst I don't claim to be able to help much in solving it, I thought it might be useful to summarise the very lengthy discussion:

The problem

If you delete a text format in current HEAD, you are offered an option to select a text format that will be used in its place (see screen shot in #65). This implies that existing references to the text format being deleted will be updated to the one chosen, but that isn't actually done.

In Drupal 6 there is a site-wide default text format (plain text) that is used if a text format can't be found. This has a potential problem: say the deleted text format was BBCode. Plain text won't recognise BBCode markup at all and will display it unfiltered. That may display stuff like an email address embedded in the markup that user's don't want revealed.

Drupal 7 added a feature in #635212: Fallback format is not configurable which allows the replacement text format to be user-selected. As noted by David_Rothstein in #65 "Of course, it's still true that if they make a bad choice, they can introduce a security risk." However, the fact that the update to the chosen text format is not actually done, left the situation somewhat worse. It seems that the filter that will actually get applied is the top one in the drop-down list of text formats that will be displayed on the page. For the Admin of a fresh Drupal 7 install that is "Full HTML"! The potential security hole that results has been tackled in #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x.

The options

These are summarised in #128:

1. Do nothing

Not tenable. Effectively D7 lies to the user.

2. Do not allow text formats to be deleted

Dries gives the reasons why this won't do in #108. Basically, if people cannot try out a text format and then delete it, it makes it very hard to test or develop text formats.

3. Do not allow text formats to be deleted if they are in use

This was first suggested by catch in #13 and is raised repeatedly throughout the discussions.

The two counter-arguments are:
1. Its not what users want. This is best articulated in #76 in a quote attributed to Moshe Weitzman:

When a site admin clicks "delete a format", she doesn't want to be told "no, first you have to go find and update all your content manually."

2. It would possibly require a new API by which modules declare which text formats they use and be difficult to implement efficiently on a non-SQL storage engine. This is best expressed by bjaspan in #130

I think it would be a mistake to design core in a way we know is impossible to implement on non-SQL databases efficiently. Yes, today that is only 1% of sites (actually, today it is probably exactly 1 site), but we hope to see many much larger Drupal sites evolve during the lifetime of D7, and have Drupal's non-SQL ecosystem grow with it.

4. Fix it by updating all references to the deleted text format with the chosen replacement

What bjaspan calls "The dumb way" of doing this (#22) is "f_a_query() all the affected objects, then for each one, load it, change it, save it.". The problem with this approach is that it might work for SQL based storage providing the database is not too big, but is difficult to implement for non-SQL based storage and, whatever the storage engine, will likely hit the PHP timeout limit if there are many objects to update.

The alternative is to use some sort of batch processing to perform the update. One problem with this approach is how to implement it. Various possibilities were discussed, including suggestions for a new bulk update API.

Another problem concern what happens if a user edits or adds content based on the to-be-deleted text format during the period before the batch update has finished? It was suggested that, providing the text format is not actually deleted until the end of the update process, this should not actually cause problems. However, there could be race conditions where an add or edit was in progress as the last batch terminated and the text format was actually deleted. This could result in objects being left with the deleted text format. Therefore, option 5 might be needed as well.

5. Just in time approach

Keep a "replacement map" which remembers the text format that has been deleted and the one that was chosen to replace it. Use this to do the update on-the-fly when an object using the deleted text format is accessed. Over a period of use, the system gradually gets updated. This sort of approach was first suggested by yched in #4 and bjaspan produced a patch which was RBTC by Moshe Weitzman in #134.

Sun strongly objects to this patch (#99) and, throughout, sought to generalise the issue to cover the case of any item being deleted when it may be a foreign key referenced elsewhere in the system: for example, deleting a node type or a content type. He also has a basic objection to text being rendered in a format other than the one in which it was written (#33) "This is exactly what must not happen, never, under any circumstances." His immediate objection to the patch: "The chosen way to solve this is a total hack IMO and this will bite us badly." (#81) and that he replacement map is stored in a variable (#86). It is not clear (to me) what sun's preferred solution is.

(Just to put my cards on the table - mine would be 3: don't allow deletion of a text format that is in use.)

chx’s picture

Deleted this comment, will write another.

chx’s picture

Assigned: Unassigned » chx
Status: Needs review » Needs work

Here is the solution. We punt on the bulk update API for D8 and for D7 we extend field storage with an update column operation and add text_filter_format_delete and let it call hook_field_storage_update_column. This is possible in SQL and MongoDB both. We will give an opportunity to modules to veto on deletion, they can do something else as they see fit. I talked to David Strauss and he is in agreement. This is not a perfect solution. This is a feasible solution that gets D7 out the door.

david strauss’s picture

> I talked to David Strauss and he is in agreement. This is not a perfect solution. This is a feasible solution that gets D7 out the door.

Yes.

sun’s picture

Assigned: chx » Unassigned
Status: Needs work » Needs review

Thanks for the issue summary, Stuart!

My preferred solution is to simply update the values properly. For simplicity, via Batch API. Implemented by every module, as usual, similar to the user cancellation process. If required, additionally providing a hook for certain field storage modules to intercept the batch and do whatever they think is right to do. The default implementation, field_sql_storage, can happily work like it always works.

Additionally, we have to prevent unexpected magic triggers, rules, and actions from firing during this update. By introducing a $entity->maintenance_update property.

--

x-post with chx... a simple column update would surely work for me, too. For this issue, that's totally sufficient -- although I hoped that we'd be able to also fix the following issues in one fell swoop with the Batch API approach:

#89181: Use queue API for node and comment, user, node multiple deletes
#355905: node_delete_multiple() has to use batch API
#355926: Add comment_mass_update()
#814990: node_mass_update() can't rely on a browser session
#832572: Add a standard worker queue

David_Rothstein’s picture

We punt on the bulk update API for D8 and for D7 we extend field storage with an update column operation and add text_filter_format_delete and let it call hook_field_storage_update_column. This is possible in SQL and MongoDB both.

Absolutely! But... we discussed this exact idea above. See #124, #129, #142 so on and so forth. In response, people said "no no no no" and "alternative storage engines can't do that". If the actual answer is "they can" or "some of them can", then even better.

And to clarify, the reason why a simple column update is fine is that it's what we do elsewhere. Nowhere else do we try to load the entities and resave them when a text format is deleted. For example:
http://api.drupal.org/api/function/user_filter_format_delete/7
http://api.drupal.org/api/function/block_filter_format_delete/7

(And as @sun pointed out several times above, loading and resaving entities would lead to undesirable side effects anyway.)

What's good enough for those modules is good enough for text.module too. And we have the backup method anyway for situations where the column cannot or does not get updated - we still do need to keep that (i.e. most of Barry's patch or some version thereof). But by invoking a hook, we let the storage engine handle the situation however it wants to, or however it best can. Let's do it :)

chx’s picture

Assigned: Unassigned » chx
Status: Needs review » Needs work

David mentioned that internally non SQL engines might implement this by storing a replacement map and on load check the map, save it back gradually removing the need for a map. I know this was not liked for the filter system -- but David and I agree that this actually is an internal business to the storage engine. Many override points will be necessary -- even the SQL implementation might want an override which does this internal map thing to avoid lock contention. We might want to help storage engines in Drupal 8 with mapping but for Drupal 7 we just tell them to update the column, close our eyes and pray.

damien tournoud’s picture

Ok for this approach. As #153 mentions, it's way better to push the complexity down the stack to the storage engine then to try to solve the scalability issues in the application layer.

David_Rothstein’s picture

Please see lots - and lots and LOTS :) - of comments above, which explain why (even assuming a perfect world where field storage engines handle things perfectly) the filter module still needs to maintain a mapping of deleted text formats: to handle the case of disabled modules.

I don't see any reason to push off to Drupal 8 something that already existed in Drupal 6. We need both approaches to solve this issue.

chx’s picture

David, handling text formats in disabled module is another, not critical issue. It existed since eternal and whether we fix it in D7 or not is a whole another pile of hurt. THIS issue is about fixing field storage and nothing else.

sun’s picture

@David: I agree with chx. That has always been the case, and if there is an issue to discuss and solve it, then that's #562694: Text formats should throw an error and refuse to process if a plugin goes missing

chx’s picture

And "we have the backup method anyway for situations where the column cannot or does not get updated " -- no we don't, as said above, a module can veto can the deletion and then it's up to the vetoing module to do something good. This whole issue have blown up about doing what Barry's patch did so that's something we definitely not do.

David_Rothstein’s picture

@chx, @sun: No, it works absolutely fine in Drupal 6. Try it! (For example, try deleting a text format while CCK is disabled, then reenable CCK.)

Not fixing this would be a regression. And given our new method of handling content with an unrecognized text format in Drupal 7 - in particular, we don't display the content at all - it would IMHO be a critical regression.

And according to StuartJNCC's excellent summary, it has been part of this issue since somewhere around comment #4 :) This issue is about exactly what the title says it's about. We claim in the UI that when a text format is deleted, all modules convert their data (or at least have their data "converted" for them). We need to make that statement true. Or I suppose we could explicitly warn the user about cases where it does not, but that seems silly given that it would be confusing and a regression, and we already have Barry's working code to solve it.

sun’s picture

@David: It only works, because D6 is broken. I already stated in http://drupal.org/node/562694 and elsewhere that D6 has this invalid and utterly broken fallback to the default format, but only just because the Filter API in D6 is not able at all to properly account for it. That is primarily because modules don't even have a chance to update their values, since there is no API hook invoked in the first place. Therefore, you cannot compare D6 and D7. Not at all. And, please, this issue is only about Text module (resp. Field [storage] API) not being able to perform an action that would otherwise be a simple 5 minute fix: db_update('foo')->fields(array('format' => 3))->condition('format', 2)->execute(). Everything else we discuss in separate issues.

chx’s picture

Status: Needs work » Closed (won't fix)

Enough!

chx’s picture

Assigned: chx » Unassigned

Find another fool who will find a solution , rally people behind the solution just to get someone drag up a totally new, hairy edge case blocking this issue.

chx’s picture

Just so people understand, because it's impossible to understand from what David said so far, his new issue is this:

  1. Disable a module.
  2. Delete a format that module partcipated in.
  3. Reenable that module.
  4. Things now might be broken.

Whatever. We do not want to release Drupal 7, I can see.

sun’s picture

Title: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do » Missing EntityFieldUpdateQuery
Component: filter.module » field system
Status: Closed (won't fix) » Needs review
StatusFileSize
new10.8 KB

Spent the past hours with this patch. I don't understand at all what's deemed to be "impossible" here.

Someone more knowledgeable of OOP might want to finish this patch.

catch’s picture

David's case (which as I understand it is exactly what chx outlines about in #163) needs its own issue, it's ridiculous to make that a prerequisite to allowing text module to do the bulk update.

On adding the hook and letting field storage engines do as much on their own as possible (including blocking delete), that's fine with me. In #7 I outlined the main problem with this approach in regards to mongodb - mongodb_field_storage stores everything in one collection, there's a limit to the number of indexes, no site is going to have an index on the format key for the body field, for example, since you never, ever need to query that except for this particular situation. However unindexed queries are no longer blocking as of mongodb 1.6, and if the hooks are flexible enough then it could be implemented in a way which doesn't require an unindexed query anyway, or just punt on it at this point.

I double checked text_field_schema() and that already has an index on format in SQL, so no need to add one.

I think the field update patch should move to a new issue which is less cursed than this one, and the text_field_format_delete() could be added here once that's in.

David_Rothstein’s picture

Wow. We seriously all need to chill out here. (Again. There is something about this issue...)

I apologize if some of my recent replies came off as curt - I was writing them fast in the middle of doing other things. I also need to remember that not everyone followed all the complicated twists this issue took (and that some of the discussion may have taken place in other issues that were at one point intertwined with this one; the example I brought up is definitely not new, though). In any case, I thought we had a productive conversation earlier on IRC. @chx, I'm sorry if you felt differently.

However, please do not accuse me or anyone of "blocking this issue". It is hardly blocking an issue to point out that the existing (already written) patch in the issue solves certain problems that the new (proposed but not yet written) patch would not be able to solve.

And if we really wanted to get Drupal 7 released absolutely as fast as possible, we would simply commit Barry's patch as is, because it works and fixes the critical bug. Period :) It's fine to have other concerns beyond that, but acknowledge them for what they are.

There is widespread agreement that we want to introduce something like what @sun just posted. If anyone feels strongly that this issue should be split up somehow (rather than simply combining that patch with Barry's), feel free to move Barry's patch to another issue or vice versa. Whatever makes you happy.

Status: Needs review » Needs work

The last submitted patch, drupal.entityfieldupdatequery.164.patch, failed testing.

catch’s picture

Status: Needs work » Closed (duplicate)

@David, yes it's been discussed in depth as part of this issue, but there are two distinct problems to solve, this what I wanted to make clear.

I think we should kill this issue dead and split everything to new ones - one for the bulk update, one for the just-in-time replacement for re-enabled modules. So I'm going to do that right now.

1. EntityFieldUpdateQuery - I think everyone wants this, even if there are concerns about how it'll pan out, but not having it is worse. #914306: Create a batched framework for entity updates

2. Handling cases where modules are disabled, then enabled, and information they depended upon has disappeared.
#914316: Disabled modules don't get text format replacement run on their data

@sun, while a straight update doesn't solve the bulk delete/bulk unpublish issues you mention, it would eventually allow for things like merging two taxonomy terms together, removing references to deleted terms etc.. The maintenance update (we did something with a referenced object, so need to go and clear out everything) is different from an actual content or status change even if the problems are similar.

PS. If anyone thinks the two issues I created don't cover all the problems discussed here, please 1. open a new issue 2. link to it for this one 3. don't get too pissed off with me and re-open this one again, I really think we need to break this up into as small pieces as possible and not kill each other over this.

chx’s picture

Title: Missing EntityFieldUpdateQuery » Missing hook_field_storage_update_integer_column
Status: Closed (duplicate) » Needs work

Although I am out of the issue, I need to shut this down before it gets out of hand. It seems that noone read what I have outlined above. We are not trying to come up with generic solutions this late. We are coming up with one thing and that's updating integer columns. We can debate whether we want to allow updating string columns as well. I would rather not. It's too late, it's an ugly stopgap towards a bulk update API which is a lot of work. Getting this right is also a lot of work. Any assumption taken for the update capabilities of field storage engines is false.

However, this issue is a soul killer and that's why I won't fixed it. As I said yesterday, find another fool who writes this, because I had enough. See, after I came up with a compromise and got everyone behind it it was happily and totally ignored and something else was done.

catch’s picture

Status: Needs work » Closed (duplicate)

I'm moving this back to duplicate on the assumption chx cross-posted. Agreed that int updates should be restricted (and this covers all 'reference' fields we currently have anyway).

chx’s picture

Status: Closed (duplicate) » Needs work

I did not cross post. There is no other issue for hook_field_storage_update_integer_column. I have moved EntityFieldUpdateQuery to Drupal 8 where it belongs.

chx’s picture

Title: Missing hook_field_storage_update_integer_column » When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do

Resetting title. Also note that while I did write field_sql_storage_update_integer_column the next step is a serious challenge, how do you expose this to the world? field attach API deals with entities. So I am in support of #914458: Remove the format delete reassignment "feature"

sun’s picture

Serious waste of time, I see. Can we please stop to post arbitrary statements like

We are coming up with one thing and that's updating integer columns.
Agreed that int updates should be restricted

without any foundation? The latest patch doesn't need such a restriction.

chx’s picture

Status: Needs work » Closed (duplicate)
gisle’s picture

Issue summary: View changes
Issue tags: -D7 API clean-up +API clean-up

Removing non-canonical duplicate tag. See #2426171: Multiple tags similar to 'API clean-up' for background.