Closed (duplicate)
Project:
Drupal core
Version:
7.x-dev
Component:
field system
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
21 Aug 2009 at 19:05 UTC
Updated:
27 Jun 2020 at 14:17 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
dropcube commentedComment #2
dropcube commentedI am working on this.
Comment #3
bjaspan commentedInteresting. I haven't looked at the partial patch from the previous issue, but I can see that once again abstract field storage makes this a bit tricky. The "obvious" approach is for text.module to implement the hook and db_update() all the field tables to change the format column. But text.module can't, because it does not know where the field data is stored. Options:
1. Call field_attach_query() on the field, specifying the format column as a query parameter, then field_attach_load() and field_attach_update() for each object id. Inefficient, obviously.
2. Implement a new function, something like field_attach_bulk_update($field, array('format' => new_value)), which invokes hook_field_storage_bulk_update(), which in field_sql_storage.module's case would just do the db_update() we wanted to start with, but for other storage engines could do whatever is appropriate.
Comment #4
yched commentedThis is a pattern of "how does a reference field react to the change or deletion of a referenced 'thing' ?".
I'm not sure there's a fast enough way to actually update all the field data.
Note that check_markup() runs in text_field_load() and is saved in the field cache.
The idea so far was rather:
3. when a filter is deleted (or updated), use field_attach_query() to find the objects with a text field using this format and clean the corresponding field cache entries.
In text_field_load(), check whether the loaded format exists before running check_markup(), and if not, use the default format instead. The stored data keep the non-existing format, and replacement is done just-in-time.
Option 3 costs 1 indexed SELECT query per formatted text field + 1 indexed DELETE query on field cache
Option 2 costs 1 indexed UPDATE query per formatted text field + I'm not sure how we retrieve the cache_ids to clean : either same as option 3, or clean all field cache entries.
Input formats being changed or deleted happens rarely enough that we could clean the whole field cache anyway. Then option 3 just costs one indexed DELETE query on field cache, and we're sure it can scale whatever the number of text fields. Might not be true for all 'reference' fields, though.
Comment #5
catchJust found this from #597236: Add entity caching to core. I also think we can clear the whole field cache - it's rare, and it's also unusual to have more than 4-5 input formats on most sites - which is likely to mean that we're clearing a large number of entries anyway most of the time - the exception would be deleting or modifying an unused format, but that's not enough of a use case to complicate things.
#3 sounds like the best of the three options.
Comment #6
sunTagging.
Note that #353458: hook_file_references() was not designed for a highly flexible field storage basically needs to do the same. Instead of a text format id, a file id may need to be updated.
Comment #7
catchDuplicate of #562694: Text formats should throw an error and refuse to process if a plugin goes missing.
Comment #8
sunHm, no, these are not duplicates. One of the options in #2 mentioned by bjaspan is still required.
The 3rd option in #3 mentioned by yched doesn't fly. Filtered text can only be filtered, processed, and displayed in the assigned text format. Automatically falling back to another text format must not happen.
Comment #9
yched commentedWell, the options in #2 also planned replacing the deleted text format with the default format. The difference was rather when the replacement is done.
What are we supposed to do with existing data when an input format is dropped ?
Comment #10
sunhttp://api.drupal.org/api/function/hook_filter_format_delete/7 gets a "new format" as second argument, which needs to be used for any replacement. Currently, that's always the fallback format in HEAD, but #635212: Fallback format is not configurable is about to change this.
Either way, the hook gets a text format to use as replacement value, so the essentials of this issue don't change. :)
Somewhere else I already figured:
Comment #11
catchWhy not prevent input formats being deleted which have content assigned to them? Can provide a hook for pluggable field storage to respond to.
Comment #12
David_Rothstein commentedSubscribing.
Comment #13
catchHere's a very rough draft for #11, it's unlikely I'll have time to take it any further.
Adds a new hook "filter_format_usage" - modules can return TRUE or FALSE if they're using the format anywhere.
On delete, we prevent deleting a format if any module returns that it's using it and set a message.
On update, we just clear the entire field cache since it's a sufficiently rare operation and trying to do it cleverly would be too much work.
To get this up and running, block, user (signatures) and text module, at least, would need to implement the new hook.
Comment #14
sunSorry for not following up earlier. Tinkered quite some time about this issue recently. So here we go:
WTF? ;) We introduced a pluggable field storage, which does not allow to mass-update values? How are modules supposed to maintain their fields and values then?
This issue smells much larger to me then. I mean, this is just one of a gazillion of situations, in which modules need to be able to properly update stored data, be it for module update reasons or for administratively initiated changes. Preventing usage of fields containing certain data or preventing actions on them until all entities containing those fields are manually updated certainly won't work out ;)
Hence, the only valid solution is bjaspan's 2)
Comment #15
catchIf you want to bulk update entities, you should use $entity_save(), that's the only valid way to invoke the various _update(), pre_save() and other hooks. Doing direct updates doesn't cut it any more IMO.
Comment #16
yched commentedSadly, I think sun is right...
Yet
- Coming up with a non-SQL-non-PDO-non-DBTNG meta syntax for this sounds like party.
- I'm not sure how to ensure atomic, non-timeoutable mass updates, or deal with the fact that maybe we can't.
Comment #17
catchI don't believe there's a valid use case for deleting a text format which has content assigned to it outside of development/site building. Certainly there's not enough of a use case that core needs to support that. Once again, the only valid APi for updating entities in a storage agnostic way is $entity_save(). I don't think we want to support querying every possible entity which has a text format then bulk updating that in core, whether it's trying to do a mass db_update() via hook_schema() hacks or a potentially endless batch operation loading and saving individual nodes.
fwiw it'd be quite easy to do a bulk update in mongodb, but just triggering that when an admin deletes a text format is completely out of the question and would bring your site down. For a start mongo-based sites are not going to be creating indexes on every single text field format, just in case the format gets deleted one day, because the way entities / fields are stored in mongo (all in one collection, not as separate 'tables') means you need to be very sparing with the number of indexes. So you'd have a massive unindexed update operation happening as a result of a form submission, not good.
Also if core prevents this from happening via the UI, there's very little stopping a contrib "Text format replace" module from being written which could try to implement one of these options if there's really a pressing need for deleting in-use text formats. We don't do magical conversion of nodes if node types are deleted, or changing node types on the fly, nor do we allow moving taxonomy terms between vocabularies in core. Deleting text formats strikes me as a somewhat similar situation except with even more unpredictable implications since they have only a very loose relationship to the actual data they're associated with. So since it's a somewhat rare operation, which has security implications, I'd rather core was unhelpful here rather than too helpful.
Comment #18
sun(?) Sure. Drupal's configuration is set in stone always and forever and ever, after initial setup. Care to remove the edit links then, please?
Comment #19
catchNo, because editing text formats works quite fine without bringing my sites to their knees or requiring potentially millions of values to be updated after a form submission via incredibly complex and hard to maintain code which doesn't exist yet.
Comment #20
David_Rothstein commentedI think people updating sites from D6 to D7 might wind up with a valid reason to delete some text formats with data in them, since the upgrade path currently messes around with their text format list a bit. They might have a good reason to delete one and convert its data to another.
I think we ought to really consider @yched's idea in #4 of "just in time" replacement, or something like it. Over at #562694: Text formats should throw an error and refuse to process if a plugin goes missing - and for an entirely different reason - I suggested a while ago that perhaps every time a text format was deleted, filter module should keep a record of what it was replaced with. This would just be a simple associative array in the variables table: e.g.,
array(1 => 2, 5 => 7)means that text format 1 was deleted and all content was (supposed to have been) converted to text format 2, text format 5 was deleted and all content was (supposed to have been) converted to text format 7, etc. Then if check_markup() ever encounters something with text format 1 or 5, it knows what to filter it with instead.If we do this then it seems that would completely remove the need to update all field data the instant the form is submitted, right? We want the stored data to be correct of course - so ideally it should still happen - but it could then happen gradually, similar to how the Field API already handles bulk deletion.
Comment #21
yched commentedI was thinking of something along the lines of David's suggestion. My initial idea was doing the 'just in time' replacement in text.module's hook_field_load(), but doing this in check_markup() has the advantage of catching direct db access as well (Views...).
Actually, I think both places would be needed, because doing it in hook_field_load() ensures that the 'new' format gets preselected and saved next time the entity is edited.
Comment #22
bjaspan commentedOkay, this is the part where I pretend I still remember something about Field API and Drupal in general. But it's DCSF code sprint, so I'll give it a shot. :-)
Someone is trying to delete an input format. To some extent this makes the argument that "there is no use case for that" moot. Or rather, if there is no use case, or if it is unimplementable, we should remove the delete feature from the format system, instead of just rejecting it in Field API (having a content-related feature that works "everywhere" except in the major subsystem we use for all content seems non-sensical to me). I'll assume that isn't going to happen.
To actually implement it, we have three proposals:
1. The dumb way: f_a_query() all the affected objects, then for each one, load it, change it, save it. This will work fine until we have many objects ("many" here might be as few as 100s or 1000s) at which point it will take forever, hit the PHP timeout, etc. We could do a batch API thing, but that will just solve "PHP timeout," not "take forever."
2. The bulk way: Add a new f_a_bulk_update() API. This seems "more scalable" because it is a single SQL query for that storage engine, but as catch points out in #17, that is an illusion. Ultimately we can't update an arbitrarily large amount of data in a single time-limited operation regardless of how it is stored.
2.5. No one has mentioned it yet, but we could do a thing like with field deletion... except I don't really think that makes sense in this case. Or rather, to the extent that it does, it is...
3. The "just in time" approach in which we wipe the field cache and update each object as it eventually loaded, keeping a lookaside table of the deleted format and the format to replace it with.
My take is this: #1 is easy but will fail quickly enough that probably isn't worth doing, but OTOH it would at least be something, provided we did it in a way that contrib could override. #2 is fundamentally no better than #1, and involves creating a new Field API function which we are really too late for. #3 actually seems like it can work, and scale. It will also involve a new Format API function to centralize the "replace deleted input formats with these new ones" table, but that will be much simplier than #2.
So, I'll get to work on #3.
Comment #23
sunI still don't get it. True, I'm purposively broadening the scope, but in the end, it's still the same question, just asked generically:
We have no field API functionality to mass-update fields in entities? (?)
In the end, if we go with 3), then this means that anything that wants to update any value in any field will need to implement a custom field value storage mapping table that dynamically updates an entity with the supposed to be new values for a field on loading? And implement all of that through custom code?
How does that ensure data integrity?
Comment #24
bjaspan commentedIt does suck, doesn't it?
I agree that it is a problem. However, it is not a new problem and has nothing to do with Field API's storage model. The bulk-update approach which for SQL reduces to "UPDATE field_data_body SET format=1 WHERE..." looks like "the right solution" but just try it with 50M nodes. So, to answer your question: No, we have no way to mass-update fields in entities, **and we never did**.
I think your larger point is legitimate: If the right solution to this issue is a lookaside table for handling bulk updates "on the fly," then probably we should have a more general system for it. Just fixing it here is a one-off hack. Getting this right will be part of making Drupal scale for Big Data and I suspect will not happen for D7. Unless you have a suggestion. :-)
Comment #25
bjaspan commentedHere is my first patch, which seems to work. Marking CNR for testing. I will now go write new tests for it.
Comment #26
sunSeriously, I can't stress enough that this contradicts Drupal's principle of handling and maintaining user data.
What are field modules in contrib supposed to do to maintain their field data? Add infinitely increasing system variables to store replacement values? And just hope that Views and other code will always load an entire entity to display a field value?
From all known options, a field attach batch update is the only one that sounds remotely acceptable to me. If we need to load entire entities and execute a batch, then be it.
Comment #27
bjaspan commentedFor better or worse, here is a new patch with passing tests and a gratuitous typo fix.
I think you have a legitimate concern. I'll walk next door and discuss this with Dries and Angie (wow it's nice to be able to say that :-)).
Comment #28
bjaspan commentedJust spoke at length with Mike Booth, drothstein, ksenzee, and Dries. We have a new plan which I am now attempting.
Comment #30
bjaspan commentedHere's the current scoop:
sun is right that we need an API for bulk-updating Field API data. However, doing it right will be complicated, and it is too late for D7 core. Just take a look at the docs on field_attach_query() to get an idea of the issues to be addressed. A correct field_attach_bulk_update() needs to be able to store a translation table for the old and new values for the columns being updated which is used when the data in the db is not updated yet, and then it needs a cron hook (like field_attach_delete()) to update field data in batches. A semi-correct solution would be to perform the bulk update in a single operation all within in the current page request. In principle this seems simple, but it gets a bit annoying. We need to bulk update both the current table and revisions table. We need to re-factor field_attach_query() to re-use its query-builder for field_attach_update(). We need to add a new Field Storage API hook to implement bulk update. None of this semi-correct solution is impossible and maybe we can achieve it for D7, but I certainly can't do it today.
Also, we realized this is not exactly a Field API issue. The format system needs a way to deal with invalid/deleted input formats. It mostly already has one, in which text in an invalid input format is rendered either into the default or fallback input format; drothstein says it is possible/likely that this fix has not been forward-ported from the D6 security issue, but that is a format system issue, not a Field API issue. And of course handling deleted input formats correctly is what #562694: Text formats should throw an error and refuse to process if a plugin goes missing is all about.
The simplest possible fix here is just to make sure that text.module clears the Field API cache when an input format changes or gets deleted. The data in the format column will remain wrong, and the format system will handle it. I've attached the trivial patch for this. Unfortunately, even the simplest possible fix is not simple, because in fact the format system will NOT handle it. The format system logs undefined index errors when handled an invalid format id.
Here is my current understanding of the path forward:
* This patch to text.module is an improvement on the current situation because it fixes the bug that the Field API cache is not cleared when input formats change.
* This patch reveals a bug in filter.module that it doesn't handle invalid format ids. That bug needs to be fixed.
* We can have a non-critical D7 issue to implement the semi-correct f_a_bulk_update() function, which may or may not happen.
* We should have a D8 issue for the correct batched version.
All that being said, my *next* post on this issue will be a text_filter_format_delete() operation that just updates the damn data the dumb, non-scalable, but easily implemented way. Unless I discover it too is a maze of twisty little passages.
Maybe I stopped paying attention to core for six months for a good reason. :-)
Comment #31
bjaspan commentedNo, forget it. The dumb way is so ridiculously inefficient that I can't bear to write it.
Comment #32
bjaspan commentedComment #33
sunThis is exactly what must not happen, never, under any circumstances.
.
Question: Why do I store data in Drupal at all? If it isn't able to properly update and maintain it?
The more we are discussing the high-level question at all, the more I'm seriously concerned about Drupal's future. This limitation is more than a release blocker.
96 critical left. Go review some!
Comment #34
catchI still firmly believe we should prevent the deletion of the text format if there is any content assigned to it - so require a pattern of update content first, then delete the format. That may not be possible to do in core either, but it won't result in orphan text fields.
Comment #35
bjaspan commented@sun, I don't get it. You are complaining that Field API has no way to bulk update data, but if it does (and actually I'm writing it right now), text.module will use it to update the format column to change a deleted format id to... what? Not the fallback, because that will result in check_plain(), which you say must never happen.
@catch, I'm beginning to agree.
Comment #36
sunIt's not check_plain()'ed. The $fallback format that is passed to hook_filter_format_delete() needs to be assigned. The filters of the fallback format are configurable. In addition, #635212: Fallback format is not configurable allows the site administrator to choose the text format to pass as fallback.
Comment #37
bjaspan commentedOkay, here is a first pass at field_attach_bulk_update(). There are some TODOs and no tests, but the one test for text.module which uses it on format delete works. I am far from convinced this function is a good idea. At this point I think there are serious problems or objections to all possible solutions to this problem, including not solving it.
chx says he and David Strauss think field_attach_query() should actually be removed from core, which of course would this function impossible, which is fine with chx as he says he can't implement it for mongodb anyway. His suggested solution is that the text.module, on load, checks if the format is valid, and if not updates the stored value right then; thus the db slowly works it way into a correct state over time, without a separate batch process. Not a bad idea, but we'll have to see how it works out.
Comment #38
chx commentedSure I can implement it! It's even implemented (just broken right now 'cos the latest structures are not implemented) I said the function is useless because it's a field level and not an entity level query. How will you find the latest ten published nodes with the 'flag' field being 1 if you can't query node.status with field_attach_query?
Comment #39
bjaspan commented@chx: You are asking, "How can I search node documents and field data without a join?" Well, there are two answers:
1. You do what I'm assuming you want, which is store the field data in the node document and query node.status and node.field_flag in one operation. That works great for MongoDB I assume. You could also query node.status, node.field_flag, and node.field_some_multi_value_field all in one query because those are all in the node documents, right? Of course you know that you can't do that with SQL without a join or without completely de-normalizing by storing all the node properties in every field table (which still won't work for multi-value fields), which you know isn't going to happen.
2. You do the join in the application. Query the latest N published nodes. Then query all of those ids that have the flag field set. When you only get 3 matches, get the next N published nodes. Iterate.
Obviously #2 sucks for you. But your proposed alternative, removing field_attach_query() from Field API, makes every site using SQL stop working in important ways, and obviously we're not going to break every Drupal site except for Examiner.com. Perhaps we've just discovered that the Field API's goal of using a pluggable field storage engine so that we can work with SQL and document databases is unachievable in a single consistent and efficient system.
I'm not at all familiar with document databases, so it is entirely possibly I'm missing something.
This discussion is only relevant for this issue to the extent it bears on whether we should solve the text format problem with a field_attach_bulk_update().
Comment #40
yesct commentedhuh? what needs to be done here? Certainly a re-roll...
Comment #41
damien tournoud commentedI opened #832572: Add a standard worker queue as a push forward to provide a way for modules to easily register long-running bulk operations.
Comment #42
catchIf we go with Barry's approach, adding API change tag.
Comment #43
bjaspan commentedI just re-read this issue yesterday and honestly I have no idea what the right approach is. I am leaning towards catch's idea in #34: "I still firmly believe we should prevent the deletion of the text format if there is any content assigned to it - so require a pattern of update content first, then delete the format. That may not be possible to do in core either, but it won't result in orphan text fields." Among other benefits, it is actually achievable in a small amount of time using existing APIs.
Comment #44
David_Rothstein commentedI'm really not a big fan of crippling the delete format functionality, especially when it will likely be needed in the D6->D7 update. Plus, I don't see how it can be done with existing APIs... doesn't it need the new API to query all modules and ask them if a particular text format is in use or not?
I wonder if we are over-thinking this. Why can't we just use the batch API here? Seems like that was exactly what it was designed for. Earlier @bjaspan said this:
But so what if there are some (probably rare) cases where it takes forever and the person has to stare at the progress bar for hours on end? If that's how long it takes, that's how long it takes. If they get bored and click away to interrupt the batch process then we just have to make sure it fails gracefully, which should be easy: Just don't delete the text format itself until the end of the batch process rather than the beginning. That way if they interrupt things, their site is still in a consistent state, and they can go back and "re-delete" the text format again later, and pick up right where they left off.
Comment #45
David_Rothstein commentedBy the way, I assume this could be implemented by adding an optional $sandbox parameter to hook_filter_format_delete(), just like update functions have. So it wouldn't even be an API change, just a minor API addition. Modules that don't care about it could just ignore it.
Comment #46
yched commentedre #44: If there's an API function to delete a text format, then processing the delete cannot involve Batch API. Batch API assumes a UI, and thus cannot be used in a generic API func.
Comment #47
David_Rothstein commentedBatch API can be used without a UI if $batch['progressive'] = FALSE, so I was thinking maybe there'd be an optional parameter to filter_format_delete() that controls whether to do that or not?
(Although if anyone actually used it as an API that would then theoretically run the risk of PHP timeouts again. However, I think this is an edge case, for a number of reasons; we could just document it or something.)
Comment #48
Anonymous (not verified) commentedputting aside the specifics of text.module and filters, i think the most important thing we need to build in to handle "update a bazillion fields/entities" is something like #41, because it allows a clean way for big sites to throw hardware at the issue in whatever way suits their setup best.
for 99.9% of sites, the default queue handling Damien suggests will do. for the very much smaller set of sites where this matters, a queue will make it easy to process the updates with something like Gearman. now, something that scales horizontally as easily as Gearman will probably just expose the data store as the limiting factor, but bigger sites will be in control of where to make the trade-offs.
Comment #49
philbar commentedtag
Comment #50
yched commentedAlso, it seems that any solution building on a 'long-running process' needs to account for site hits that happen meanwhile (i.e a node with a text value using a deprecated input format, being edited or viewed before the background process has handled it).
So whatever way we go (or not) for the mass-update part, it seems the solution always at least includes a 'just-in-time replacement' part as safeguard (approach explained in #4 and #20, implemented in #25, resurfaced in #37 second paragraph).
Comment #51
David_Rothstein commentedI think the just-in-time part may wind up being necessary for some edge cases of #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x - but I had forgotten there was already some code for it here :)
But for the narrow purposes of this issue, as long as we wait until after the conversion process is complete to delete the actual text format from the database (rather than the other way around like we do now) I don't think we actually have to worry about site hits in the meantime, do we? If we do it that way, then from their point of view, it's just like the deletion didn't happen yet; the node should still be perfectly viewable in the old text format.
Comment #52
yched commentedre #51, delayed deletion of the input format - As we found out when dealing with field deletion, delayed deletion has some non trivial consequences, since you need to deal with the 'in-between' time frame as well. You find yourself with half-citizens (marked deleted) that you need to keep around and invent their behavior completely:
make sure they don't show up in admin screens; ideally you don't want them used on new content; handle dependencies on filters...
Off-hand, I'd say 'just in time' replacement looks simpler :-)
Comment #53
David_Rothstein commentedMost of that complexity is not an issue if the batch API is used, since the user doesn't expect it to "actually" be deleted until they see the progress bar complete.
I guess it is true that there can be a race condition if content is created with a text format that is in the process of being deleted - and in that case, that content could get stuck with that format. This seems like another argument for having the 'just-in-time' behavior around as a backup (similar to the other edge cases in #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x). But I think we ideally need both: Make a good-faith effort to completely delete the text format (which should work fine probably 99% of the time anyway) and then have the just-in-time replacement as a backup to acknowledge that we can never guarantee we removed everything.
This is consistent with the way it was done in D6, actually (although that was cruder than this would be). The text formats were deleted by filter.module, and then with the security fix that went in, there was a backup behavior for things that did not get updated during the deletion process.
Comment #54
mikey_p commentedHow on earth will this actually work? Do we need to have a hook where each module will return a list of {tablename}.column where they store format ID? Do we have any idea how we plan to batch this?
Alternatively in #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x David mentioned simply storing a mapping to be used by check_markup for displaying content using a deleted format. If we did that we wouldn't need to worry about race conditions, batch operations, or finding all the correct DB fields, regardless of storage engine, or worrying about text formats stored in disabled modules. Overall that seems like a huge win, when we would need to have a failsafe in check_markup() anyway. Thoughts?
Also at this point, this seems to be the same issue as #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x. Is there any reason that one of these couldn't be close in favor of the other?
Comment #55
David_Rothstein commentedIt would definitely be simplest just to have the failsafe in check_markup() and stop there, but above there were some complaints (I think justified) that it's really not a good idea to purposefully leave incorrect data in the database and never make any attempt to update it to the new, actual value. The idea of doing both is that it's a realistic compromise for D7 - we make a best effort to update the data using existing D7 methods (the batch API), but knowing there are some edge cases where omissions will slip through, we have the other method as a backup.
As for how it would work, it would be just like http://api.drupal.org/api/function/hook_update_N/7, I think. The filter module would take care of preparing the batch and registering all the operations for it based on modules that implement hook_filter_format_delete(). The individual modules would not see any API change at all here really, just a new, optional $sandbox parameter passed along with the hook, which they could ignore if they want to (just like most update functions don't need to bother with $sandbox). But modules which have a lot of data and need to do a multi-pass update could use $sandbox to control that. The individual modules would still be responsible for updating their own database tables and everything, just like now.
If we implement both solutions at the same time, then it might be possible to close #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x in favor of this one, but we probably shouldn't do that just yet...
Comment #56
chx commentedWe had our party from #16 with EntityFieldQuery so a contrib can be written that moves stuff from one format to the other. mikey_p is fixing check_markup to spit out n/a in case of need at #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x
Comment #57
David_Rothstein commentedHuh? How is this not a critical D7 bug?
Comment #58
mikey_p commentedAfter talking this over I agree, the only security issue is that from #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x and anything else is bordering on babysitting. The best approach here for the future would be to simple add a hook for each module to respond when validating the deletion of a format, and if any module responds, then block the deletion of the format. But I don't think it's critical, since there is no data loss.
Comment #59
David_Rothstein commentedTo clarify the bug here, the steps to reproduce are:
1. Create a node with a "Filtered HTML" body.
2. Use the filter module UI to delete the "Filtered HTML" format. It tells you it's going to reassign all content that previously had that format to plain text.
3. Go back and look at your node. It's not in plain text.
I think we need to leave this issue open until that problem is fixed - even if it winds up getting fixed somewhere else :) The bug here is (relatively) specific to text.module.
Comment #60
chx commentedHow on earth is this specific to text module? If I use that format in a custom block which then gets cached then ...? This is NOT specific to text module.
Comment #61
chx commentedIs this what we want, then?
Comment #62
chx commentedComment #63
damien tournoud commentedOk, so I tend to agree there: it doesn't make a lot of sense to reassign existing content to the fallback format. We should leave the format as it is, and make sure that check_markup() doesn't output anything when the format doesn't exist.
So, in addition to #61, we need to remove the existing implementations of hook_filter_format_delete() in the block module and the user module.
Comment #64
chx commentedReally #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x needs to be merged in. But I agree. a) the filter delete operation is rare enough that the code weight to move nicely from one filter format to another screams for contrib and not core b) such a move making sense is not common enough (even out of those rare cases). It's much more likely that we will display invalid garbage (to the end user at least. Can they read HTML? Markdown? TeX? Deal with lengthy inline commands?) which might reveal data we don't want (that's the sechole).
Edit: however it is now possible to make such a move thanks to EntityFieldQuery at least for fields. We can at least hope that modules add foreign keys to their schemas pointing to filter format fid. If text module does not do that which I suspect it does then we probably should figure it out.
Comment #65
David_Rothstein commentedYou may be right that there are caching bugs too, but that's not the same issue. The problem with text.module is that the wrong data is actually being stored in the database, so no amount of cache clearing will solve it.
That is why in Drupal 7 we do not force them to switch it to the fallback format (like Drupal 6 did). Instead, we give them a choice - see attached screenshot. (Of course, it's still true that if they make a bad choice, they can introduce a security risk.)
Comment #66
David_Rothstein commentedAll previous issue titles here presupposed a solution - retitling it to focus on the actual problem instead :)
In terms of the batch API way of handling this, I still don't understand the problem. It's exactly what we do for other rare operations in core that can potentially involve thousands of nodes being loaded and resaved, etc, such as http://api.drupal.org/api/function/user_cancel/7
Comment #67
catchThat has its own issues which have been bumped to D8 - see #89181: Use queue API for node and comment, user, node multiple deletes, short version is batch is no good for anything other than form submissions.
Comment #68
sunWe are running in circles. We are hitting the same absurdity all over again: Drupal is not capable of applying changes to larger amounts of data.
A bit insane. I really wonder why this is not covered in a http://drupal.org/node/X where X <= 7.
Instead of playing ping pong with issue titles, statuses, and patches, let's decide on one of the following resolutions:
Comment #69
catchFor #1, we'd need a bulk update API for field API so that it works for pluggable storage - Barry actually started on this in one of those issues. We could try to have something like this but it really feels like a D8 or contrib task at this point.
On #2 One the one hand we already have that as an issue, cleaning it up is for D8, on the other I don't love adding more of this to D7, but could live with it if we add explicit @todos to do something else in D8 and open issues similar to the node/comment multiple delete ones.
Comment #70
David_Rothstein commentedCould you explain these statements? I use batch API without a UI all the time and probably so do you (every time you run the command-line installer in D7, or "drush updatedb" even earlier than that). It works fine.
Perhaps what you are actually saying is "if you plan to use this outside the form API on a GIANT amount of data you should check your PHP execution time limit first because you might hit it", and yes, if going this route we could document that. I really can't imagine the use case for deleting text formats outside of the UI anyway (e.g. on cron), though.
Comment #71
David_Rothstein commentedTo clarify: I guess I can imagine doing it outside of the UI in an update function, but that is already inside a batch process anyway :)
Comment #72
damien tournoud commentedNo go for those two resolutions. I only see two possible course of action:
Comment #73
sunRe-reading my #68 and last #72:
2) Removing the option to delete a text format is a no-go. I've personally configured bogus text formats myself and as Wysiwyg module maintainer I know that there are countless of users who created varying amounts of formats that have to be merged/replaced with another existing format.
1) Thinking low-level and generally, I totally agree with the need for a low-level equivalent to Batch API, very potentially based on Queue API. Regardless of this issue, we need to address that problem space, so efforts should be continued on that other issue (I think + hope chx is working on something already). However:
The common causes are rather simple: "uid 123 just became 0, since 123 left us.", or "text format 3 was deleted, replace with 4."
In previous versions, we simply went with
UPDATE x SET y = new WHERE y = old;
but since D7, we seem to officially support no-sql databases, for which such an update may not work.
Actually, Field API seems to have been designed to officially support alternative storage engines than SQL, so that's likely the reason why we officially have to support mass-maintenance of values in no-sql storages. Not having used any of those myself, and considering them as very advanced usage (who in those 80% will ......), I'd be happy to leave mass-maintenance for no-sql storages to contrib, where those storage engines have to be invented in the first place.
That is, because: Even if we'd have a batch-alike update system being based on the new queue system, we'd still have to invent something new to differ between user and system updates. Everyone who worked with VBO (the only (UI-only) mass-maintenance attempt that exists AFAIK) knows what happens when updating or performing actions on 217 nodes via Batch API...
Comment #74
bjaspan commentedI suggested a bulk update API that for SQL reduces to the UPDATE query back in comment #3. While it is true that previously "we simply went with UPDATE x SET y = new WHERE y = old;", in #15 catch observed that that kind of update is really no longer kosher, even if we were willing to assume SQL-only storage.
I'm trying to come up with The Simplest Solution That Could Possibly Work.
Here's the thing: We *imagine* that the old way of "UPDATE x SET y = new WHERE y = old" actually did work because we think of that update query as instant. It is not. If the table is big enough or y isn't indexed, that query can take a LONG TIME. In fact I recently had a table so big that "SELECT COUNT(*) FROM table" by itself took over an hour before I ultimately killed it. So, the fact is that we have NEVER really had a scalable solution to this problem.
Therefore, I think The Simplest Solution That Could Possibly Work is to just do the queries when we need to do them. In this case, that means that text.module in hook_format_deleted or whatever will query for all entities with the old format, and then one at a time load them, update the value, and save them (well, we could use multi-load). Yes, this might run out of PHP execution time... but the UPDATE query could, too.
So that is my current suggestion.
Comment #75
chx commentedBah, just batch API it. As David Rothstein points out we can run a batch API various ways, this discussion is not getting Drupal 7 out the door.
Comment #76
bjaspan commentedI re-read this issue again from the beginning and we discussed it at today's D7 code sprint. In attendance: Dries, Moshe, scor, David Rothstein, hefox, Benjamin Melançon, Tim Loudin, and myself.
We decided that while we would like to keep all the data up to date and consistent, doing that has all kinds of issues we cannot solve quickly. We cannot to a load/save on all entities because we'll run out of time and even if we batch it we may trigger actions (sending emails, updating external web services, etc.) that we don't want to. We do not really want to implement the Field Bulk Update API because it's too late, it violates the entity abstraction, and it won't even fully solve the problem.
We considered the alternatives of disallowing deleting a format (either entirely, or only if content using it exists), declaring that "deleting a format" really means replacing its filters with one that always returns "This content is no longer available.", or implementing the "replacement map" functionality. We (or least I) do not like the idea of needing a replacement map for potentially every module for every column; clearly that is not a long-term solution.
But this issue is not about handing potentially every piece of data for every module; it is about deleting a text format and doing something reasonable.
Moshe ultimately made the killer argument: When a site admin clicks "delete a format", she doesn't want to be told "no, first you have to go find and update all your content manually." That's a PITA. She also does not want to be see "This content is no longer available" again until she goes and manually updates everything. She does not even want to see "all your content will be migrated to the new format by a magical perfect process, just please wait while your site continues to display some content in the format you just deleted." Instead, what she wants is for the format to be replaced by her selected replacement, RIGHT NOW.
Therefore, we decided to go with the replacement map option. It's simple, it accomplishes the goal, it lets us move forward, and it is probably needed even if/when someone implements the actual "update all the database entries to reflect the new format directly."
I have thus re-rolled my patch from #25 and added some tests. Attached.
Note that the "we" in this post was not completely unanimous. We reached agreement though some people still thought other approaches might be better. Anyone who wants to is welcome to implement those other approaches... but this issue will no longer be open in critical state. :-)
Comment #78
bjaspan commentedThe test failure is because my patch makes it look like deleted formats are not actually deleted. Clearly changing the behavior of filter_format_load() as I did is wrong.
Comment #79
bjaspan commentedNew version. This operates on check_markup() and filter_list_format().
Comment #80
chx commentedI guess this works.
Comment #81
sunThis patch contains various coding style issues and the helper functions could use better names, and it's totally illogical to declare those "private" -- every module on this earth has to call them.
I still need to read up on the latest follow-ups. The chosen way to solve this is a total hack IMO and this will bite us badly. I can only suspect that this topic was solely discussed within the single scope of deleting a text format. The real scope is What happens with stored field values that reference something, if that referenced thing changes or vanishes? Node references, user references, taxonomy term references, the real scope applies to all values that are references, using foreign keys.
Comment #82
bjaspan commented@sun, I explained our rationale in #76. The goal is to ship D7 with something that works, not continue to go around in circles forever attempting to solve intractable problems. The relevant question is: Is D7 so useless either without solving this problem, or with solving it as we decided to in #79, such that the majority of people who would otherwise use it can't use it? The answer is obviously no. Yes this solution is a hack, but it is a hack that provides a reasonable tradeoff, and after 2.5+ years of development we cannot hold back D7 forever trying to perfect solutions to every problem.
Regarding the fact that the functions are private: The goal was to make an internal-only behavior change, not change/complicate the public filter API. If you know of problems with the current changes that break callers of the filter API, by all means bring them up; that is perfectly valid review feedback.
Regarding your question: What happens with stored field values that reference something if the referenced thing changes or vanishes? It is actually two questions:
1. Changes. We generally handle this fine. Certainly text formats, nodes, users, terms, etc. that changed are handled fine. We clear the cache and get the new info on next reference. There may be cache clearing bugs, but that is not what this issue is about.
2. Vanishes. The simple and obvious solution is that a reference to something that no longer exists stops working and displays some error: "reference to non-existent [thing]". Trying to do better than that in the general case for arbitrary quantities of data, automatically, and invisibly to the user is probably impossible, and it is certainly not a critical "D7 is useless with out" requirement for core. In the case of node/user/term reference, "referenced thing no longer exists" is probably even the correct behavior. In the case of text formats, we actually considered that any text using a deleted format should render to "This content is no longer available." That was my first choice yesterday. However, Moshe made the compelling argument that the hack we implemented is substantially more usable than either doing nothing (which we considered) and "content not available." So that's what we did.
We've been going around in circles for months. At various points in this thread, someone (frequently you) has rejected every even theoretically possible solution. We need to ship D7. So the goal is to find something that works and move on.
I'll leave this as CNW for now but without some other tangible, implementable solution in the very near future (or without real identified problems with the current patch), I think it should go back to RTBC.
Comment #83
moshe weitzman commentedThis patch downgrades a problem from critical to ugly. Some future patch can go from ugly to pretty.
Back to RTBC. If someone downgrades it again, please state exactly how this should be resolved.
Comment #84
damien tournoud commentedI like.
Comment #85
David_Rothstein commentedIt's minor, but in the code comment for text_filter_format_delete() "input formats" should be "text formats"...
Plus, I think that comment really needs to explain why we *aren't* actually replacing the deleted format like we're supposed to (in other words, some kind of short summary of the discussion we had here, maybe even a @todo). Otherwise it will look pretty weird to have a delete hook that doesn't actually replace or delete something, just clears a cache.
I haven't looked too closely at the rest but it seemed reasonable.
Comment #86
sunSeriously, storage in a variable? How about introducing a proper {filter_format}.replacement column? Modules may have to join on those values to figure out update todos.
The PHPDoc of hook_filter_format_delete() implementations that do not perform a replacement should clearly state and clarify why they are not replacing, and noting a @todo for D8 wouldn't hurt either.
Likewise, the API docs for hook_filter_format_delete() should explain this yet unknown and entirely new concept of not deleting data and instead mapping it forever.
Why is the test not using drupalCreateNode() ?
Comment #87
David_Rothstein commentedAs per my comment I definitely agree about the API docs, but this isn't actually an entirely new concept. It's exactly what happens for all contrib modules in D6; none of them ever delete or replace anything when a text format is deleted.
It wasn't good then and it isn't good now, but it's not entirely new. The only difference is that D6 did not give site administrators a choice for the replacement format whereas D7 does; therefore, the "mapping" that occurred within filter_list_format()/etc. in the filter module looked a lot simpler in D6 than it does in this patch. But it was still there.
Comment #88
yched commentedI agree with the approach taken here.
One minor nitpick : Ideally "// Handle a chain of deleted formats." would happen once and for all in _filter_set_replacement_format() rather than each time over and over again in _filter_get_replacement_format().
Comment #89
bjaspan commentedI've changed this issue from field system to filter.module because that's how we ended up deciding to address it.
Changes in this patch:
* Improved docs everywhere.
* The tests now use drupalCreateNode() instead of node_save().
* Changed _filter_set_fallback_format() to handle chains of deleted formats. However, I also left the chain logic in _filter_get_fallback_format() because... something is nagging me that under some condition it may be necessary, and if it is not necessary then the loop will only ever run once anyway.
Non-changes in this patch:
I am still using a variable to store the fallback map. My original plan at the code sprint was actually to do as sun suggested and add a "fallback" column to the filter_format table. I abandoned that plan for the following reasons:
* All the database queries on tables filter and filter_format would need to be updated.
* It requires a much larger API change since now there needs to be a way to list both deleted and non-deleted formats.
* The internal logic for filter_formats() and other functions would need to change to handle caches of deleted and non-deleted formats separately, making it a more complicated patch.
* Providing a join-able column is only useful for modules that store their data in an SQL table, and the whole reason this issue is complex is that we're trying not to assume that text data is stored in SQL. OTOH, a join-able column is in no way necessary: Now that the fallback map variable is documented, modules can just as easily do the equivalent of "SELECT * FROM mytext WHERE format IN (array_keys($fallback_map))", no join necessary.
In short, it seems to me that a fallback column (or a separate filter_format_deleted table which I also considered) would be a much more complicated patch without any real benefit over the current approach.
Comment #90
bjaspan commentedI guess I could go for a function to return the fallback map instead of suggesting that modules access the variable directly.
Comment #91
bjaspan commentedNew patch, adds filter_get_fallback_formats() instead of documenting the internal variable.
Comment #92
catchI prefer a variable to a table (unless we wrap calls to that table in both a cache_get(), cache_set() + static caching) - check_markup() running on 300 comments or 30 nodes has to call this function for each.
@sun, nodereference and userreference have been dealing with this issue for years now - they just leave stale data lying around and don't display anything, taxonomy follows that pattern. This is obviously a problem we need to solve, and it'd be nice to do it retrospectively (or have the ability to do so) when we do, but I agree with Barry/the patch that we can't do it here, and no-one has opened critical bug reports for that generic problem that I know about.
Comment #93
yched commentedNot sure why catch set this back to CNW (not clear from the comment) - my personal remark is that comments lines wrap consistently before 80 chars :-).
Other than that, looks good to me.
Comment #94
webchickLooks like a straight-up cross-post with Barry to me.
Comment #95
catchsorry that was a crosspost with barry, just took ages to get back to the tab after starting it :(
Comment #96
David_Rothstein commentedThis is actually incorrect use of the term "fallback format" - in the filter system, "fallback" refers to http://api.drupal.org/api/function/filter_fallback_format/7.
I think we have existing code during filter deletion that (for historical reasons) uses this wrong term also, but that's a bug to be fixed and we probably shouldn't add to it here :) I'd suggest "replacement format" instead?
Patch looks good otherwise. (I'm trying to think if there are any other places in the code that would need to call these new functions, but can't think of any, at least off the top of my head - the places added in the patch seem to cover it.)
Comment #97
bjaspan commentedNew patch. Changes "fallback" to "replacement" everywhere that we talk about the format to be used in place of one that was deleted, so "fallback" now only means "the format that cannot be deleted and that everyone is able to use." Also, comments are now wrapped at 79 characters.
Comment #98
chx commentedlooks good, let's get our stubbornest critical out of the way.
Comment #99
sunwow, I think I've never seen design as ugly as this. This is most probably the worst Drupal API change since.. like ever? At the very least, Drupal modules tried to update values, so as to ensure data integrity. But not trying at all is just wow, überugly. This basically means we could as well resurrect the epic Deletion API issue.
The consequences of this patch are horrible to think through. Starting from potential endless recursion issues, to breaking data integrity, breaking table joins, complicating module updates, potentially introducing security vulnerabilities in contributed and custom modules, and certainly more I can't even think of yet. An insane recipe for introducing critical bugs.
I do not approve this Filter API change. From all possible options, this is the absolute no-go resolution. But anyway, at least trying to prevent some possible infinite recursion problems, and fixed various comments and names.
Comment #100
sunComment #102
bjaspan commentedre #99: "At the very least, Drupal modules tried to update values, so as to ensure data integrity." No, they don't. Try deleting a content type sometime.
Comment #103
bjaspan commentedIn #358437-23: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x I suggest that these two issues, while not exactly the same, are really best served by a single patch.
Comment #104
sunSorry for being asinine, but this adds a lot of long-term burden to Filter system/module maintainer(s).
I'm putting a new patch up for review. Simplicity: If we say we cannot ensure data integrity when a text format is deleted, then we simply cannot expose functionality to delete a text format.
Dead simple. It's illogical and utterly wrong to try to expose a feature that is and cannot be supported by the rest of the system.
You can make a text format unavailable by revoking user permissions. That's it, case closed.
Comment #106
sunAnother debate concluded that we're optimizing Drupal for 1% large-scale sites here. For 99%, i.e., all other more common sites, execution of various DELETE queries as well as flushing of various caches is no big deal at all. We can serve the needs of most users by implementing the originally suggested field_attach_bulk_update() + making the default implementation in field_sql_storage respond to it. If your site falls into that 1% of large-scale sites, then you can always replace field_sql_storage with a custom replacement module, or simply implement some alter hooks in a custom module to remove the functionality to delete a text format. But there's no reason why 99% of all sites should be discriminated, just because 1% is unable to do the same. Heck, I could even tag this issue with "accessibility"...
Comment #107
moshe weitzman commentedOK, lets send this to the committers. We're not going to come to a happy agreement quickly here.
#97 is RTBC and passes all tests. #104 is preferred by sun but does not pass tests yet.
Comment #108
dries commentedSun, I don't support the proposed patch in #104.
People that don't know text formats might try to create one only to see how it works, what it does, etc. I know I would, and I would hate it to get stuck with a text format called 'test test'. ;-)
There are other valid use cases where it is completely harmless to delete a text format. The confirmation page of the text format deletion option should warn people about the consequences (it already does, I believe), and then we let the administrator make his/her own decision at his/her own risk.
Removing the delete option because it can cause damage is a crazy pattern. Do you recommend we remove the delete button from content types too?
Administrators can cause damage on any IT system. Delete the wrong file on Microsoft Windows, and your entire system is hosed ... it doesn't stop hundreds of millions of people from using Windows.
Let's continue with #97.
Comment #109
dries commentedCross-post.
Comment #110
moshe weitzman commented@sun - please let us know if you intend to keep going with #100. I'm not sure what you are doing there but I notice that some test failures have crept in. If you prefer not to keep going, thats ok.
Comment #111
bjaspan commentedI agree that sun's changes in #100 are improvements on my #97. I will pick up his changes and re-roll the patch soon if he does not. Currently, I'm taking an unplanned detour through the western Denmark due to mis-reading the train maps. At least I have internet, though. :-)
Comment #112
bjaspan commentedI am now working on re-rolling this patch.
Comment #113
bjaspan commentedNew patch. I updated function names, code, code placement, and comments based on sun in #100 (except for the slams against Field API). I introduced two test fixes worth noting.
FilterSecurityTestCase creates a node, deletes its text format, and verifies that the node body renders as nothing. We now need this:
I also had to remove the last two lines in testTextFormatReplacements(), but I suspect something is wrong because this code passes before this patch (I assume), but I do not undersatnd how:
We're being kicked out of the code sprint room, so I'll look at that last issue again later tonight.
Comment #114
bjaspan commentedNo, sorry, my final comment in #113 is wrong. The last two lines I commented out do not work before this patch, because they are added by this patch. sun added a check for the format element in the form to all three cases in testTextFormatReplacements(). His third such test fails because there is no format form element when there is only one format allowed.
I do not see where in the code makes the select element not get rendered when it only has one option. So, either the element should not get rendered and I should change the test to verify it is not rendered, or the element should get rendered with one option and I should put hte test back the way sun wrote it.
So: Should the format select widget get rendered when it only has one option? If not, where is the code that makes it not?
Comment #115
bjaspan commentedI found the answer to my question in #114.
Comment #116
sunRe-rolled #100 + fixed the tests. Didn't look at the differences in #113 yet.
Comment #117
bjaspan commentedThe answer to my question is that the format selector is not rendered if there is only one format. This behavior comes from the #access attribute:
I have therefore removed the test for the format element from this patch when there is only one format.
I believe this patch is RTBC, but it contains some (minor) test fixes since the last RTBC and sun's patch, so it could stand another quick review.
Comment #118
bjaspan commentedReviewed my own patch, made two fixes:
Fixed whitespace.
Removed.
Powered by Dreditor.
Comment #119
sunIncorporated additional changes from #113.
Not sure whether this patch should go in. Given the thinking we seem to apply to the overall issue, we may as well just silently ignore the problem - just like we do for node types. In both cases it's irresponsible to expose a delete functionality at all. Just that.
Having to choose between irresponsibility and insanity, I think I'd prefer irresponsibility, since that doesn't have the consequence of having to maintain and clean up this mess later on.
Comment #120
dries commentedWith the patch, we enable people to test/innovate in contrib. Maybe it is no longer 'critical' though?
Comment #121
bjaspan commentedReplies to #119:
1. The adds back comments that "blame" Field API for being inadequate. I don't think we need one module slamming another module in its comments. There is a very real use case not touching Field API where this issue strikes: A module that is disabled when a text format is deleted. We're all one big happy family of code, right?
2. I simply disagree with sun that "late fixup" is a completely insane approach. Sure, it would be great to make all of our data perfectly consistent all the time. It is not feasible in core at this time, and the disabled-module use case means it may never be. The approach in #118 is an improvement on what we currently offer.
3. #119 adds a test for "NoFieldByName" checking that the format selector is gone. I don't really see why that belongs in this patch/test case, but it is harmless to have it.
Comment #122
moshe weitzman commentedHere is #18 re-uploaded. I have reviewed and it is RTBC. I think this merits critical priority still.
Comment #123
bjaspan commentedIf we decide to silently ignore the problem, we should stop letting users select a replacement text format. All text in a deleted format will then render to the empty string. We discussed this option at the D7 code sprint last week and I even advocated for it, but Moshe made the compelling argument against it: It is better to do what the site admin wants, because we can.
Comment #124
David_Rothstein commentedI would like to review the latest patch but am confused whether I should review #119 or #122.
Yes, this is definitely still a critical issue. Unless you like Drupal telling you "Content assigned to the deleted text format will be reassigned to the chosen one" but instead displaying all your content as an empty string instead, which is the current behavior of D7 HEAD :)
I am finding myself very sympathetic to @sun's comment in #106:
We need Barry's patch no matter what, but it does seem silly not to actually make some effort to update the data in addition to that, when on 99% of sites this won't really be more complicated than a db_update().
Barry's point, I think, is that that part could be handled in a non-critical followup, which I think may be true but since @sun feels very strongly about it I wonder if we won't just make everyone happy by dealing with it here :) We just need to be careful not to bite off too much. With this new hook, we'd need to name it and document in such a way so that it's crystal clear this is NOT for generic ("mission critical") bulk updates, but only for "housekeeping-related" ones. And that the calling code needs to accept the possibility that some field storage engines won't be able to respond to it.
Comment #125
sunre #120:
Contrib can innovate with or without this patch. Though that's kinda unlikely to happen, since not even Drupal core top contributors, ninjas, masters, and gurus were able to resolve this issue.
re #121:
1) I think you're mistaken, the test has been updated/corrected.
2) It's not blaming Field API. It's explaining reality.
3) Sure, you do not have to maintain this code. I'll be blamed for Filter API insanity.
4) #118 still uses a bad variable name and doesn't contain improved phpDoc.
To perhaps clarify #119:
I just meant to mark this issue won't fix and leave HEAD as is. People can delete formats, but like node types, existing data may or may not be updated.
#119 passed in the meantime. But I'd prefer to mark this issue won't fix in the meantime. The added code here is not an improvement, it makes things worse. The help text on the text format delete confirmation form may rather need a warning that not all of the existing content may be updated.
Comment #126
dries commentedI don't see how the patch in #119 makes things worse.
Even good code results in WTFs. See http://www.osnews.com/story/19266/WTFs_m ;-)
Comment #127
damien tournoud commentedAs I said in #72, I would strongly prefer we simply remove the "mass filter update feature" and just make sure that a content in a deleted text format will never be rendered.
This type of mass update requires a lot of thinking (you generally need to implement transformations of a text format into an other with partial rendered - for example from BBCode to Filtered HTML), and I don't believe this is a feature that should be in core in the first place.
Comment #128
bjaspan commentedOkay, I think as a group we have slightly reduced the set of things we communally think constitute the minimally viable change here. That's progress! Here are the contenders for "What is the least we can do to make this not critical and not make the situation worse":
1. Nothing.
2. Remove the ability to delete a text format completely.
3. Remove the ability to delete a text format completely if any content uses it.
4. Leave in deleting text formats, but remove the ability to select a replacement format and have the form say "all content in the format you are deleting will disappear."
5. The current patch: Leave in deleting text formats with a replacement map applied during check_markup().
My votes, in order: 5, 2, 3, 4. I think #1 is unacceptable because text in the UI is currently just a lie and is trivial to fix. #4 is my last choice because once the format is deleted with no trace, there is basically no way to ever recover. #3 is problematic because it makes assumptions about fields storage; it is easy for SQL, and easily implementable via our EntityFieldQuery, but could be very inefficient for NoSQL storage. #2 works and is logical, it is just bad SEX (Site Executive eXperience :-)). This is why I ended up coding #5, obviously.
While ranking my votes, I considered how I would vote on the similar question on node type deletion. The difference is that for node type deletion: we do not falsely promise that we do anything at all so at least #1 does not involve a lie and thus "do nothing" is #4, including the "never recover" part; #5 is nonsensical, we can't map one content type to another; and #3 is both easy and feasible because node storage is always SQL. Thus for node type deletion my votes would be: 3, 2, 4.
Note that in both cases I think we can do the real cleanup job we all want to do, in D8, using 'deleted' status columns "everywhere" and using the job queue. I just think it is too late for that for D7 because it is more complicated than people realize.
Comment #129
David_Rothstein commentedOne more contender, as per my comment above:
6. Same as #5, but additionally
To me this seems like the most reasonable compromise as it means we are making a "best effort" to actually maintain the integrity of the data within core (and leaving it up to other storage modules to decide whether they are able to do that or not).
#1-#4 don't make any sense to me. Again, we have working code for this replacement feature in Drupal 6 (http://api.drupal.org/api/function/filter_list_format/6) and Barry's patch basically just ports that to Drupal 7. I think it would take a pretty compelling argument to refuse to port that to Drupal 7, and instead rip out or completely change the text format deletion feature at this late date.
Comment #130
bjaspan commentedI think it would be a mistake to design core in a way we know is impossible to implement on non-SQL databases efficiently. Yes, today that is only 1% of sites (actually, today it is probably exactly 1 site), but we hope to see many much larger Drupal sites evolve during the lifetime of D7, and have Drupal's non-SQL ecosystem grow with it.
So I see option 6 as at best not required to solve the critical issue and at worst a step in the wrong direction. That said, a rough draft of that approach is in comment #37.
Can we agree to stop talking about option 6 or any other kind of bulk update until we commit/decide on something from option 1 to 5?
Comment #131
bjaspan commentedNew patch. This is my patch plus the variable-name fix from sun. The only remaining differences between our patches are:
1. Mine contains this and sun's doesn't:
The patch adds these args to the hook in filter.api.php. I'm sitting next to Angie and she says the hook should list the formal args even if they are not used.
2. Comments.
Comment #132
moshe weitzman commentedthanks for the summary. as you say, #5 for right now. wait for green before commit.
as for that db_update(), i'm not so sure. one could argue that it is an API violation of the entity to change a format without firing the appropriate hooks (i.e. hook_node_update, …)?
Comment #133
sunTo clarify again in case people don't realize it, this bears the consequence that any "upgrade", "migration", or fix for the underlying issue (performing mass-updates), will HAVE TO add an entirely new hook or similarly complex ONE-TIME system to be able to determine, where orphan formats are used. That's a serious afterburner in WTFs/m.
re #130: Why can't your NoSQL driver in contrib not simply hide the text format deletion feature (and similar other mass-updates)?
For Filter API, the most sane move forward would be either (ordered by preference):
1. Nothing. But clarify the help text to explain the "best effort". Due to #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x, all not updated content will be rendered as empty string now.
6. Make at least Field SQL Storage do a "best effort" for 99% of Drupal users. Those other 1% won't delete a format anyway, as they probably know what they're doing. (This only ranks 2nd, because I want to believe bjaspan that it's hard to implement.)
2. Remove the entire text format deletion functionality.
Since 1 + 2 are both valid ways out, nothing is really broken here. Hence, demoting to major. D6 did not have this mass-update nor mapping feature either, D6 is simply horribly broken in what it does with the fallback to the default format.
Comment #134
bjaspan commentedYou do not get to unilaterally de-prioritize the issue, taking it off the list everyone is looking at to decide what needs to be worked on.
I think you and I are done now. Dries will have to decide what to do.
Comment #135
sunLet's remind webchick of why it's a bad idea to spell out arguments that your hook implementation does not need at all:
#236657: Swapped arguments in system_clear_cache_submit function
#525570: system_clear_cache_submit signature changed from drupal-6.12 to drupal-6.13 with no obvious reason
#507668: Parameter order for system_clear_cache_submit changed in 6.13
Those needless arguments needlessly broke lots of Drupal sites.
Merged the improved comments from #131 with improved comments from #119. I'm still opposed to this patch, as it's the worst from all possible resolutions.
Comment #136
David_Rothstein commentedStill says "input formats" - change to "text formats".
Otherwise one or the other of the above two patches are probably fine - I don't have time to investigate both :) We can follow up post-commit with pursuing option #6 - that's fine.
*****
I don't think it's a step in the wrong direction for Drupal 7 (or at least not more than anything else here): Just like the rest of the patch, it acknowledges the current limitations and works with them. And by invoking a hook, we leave open the possibility that any/all field storage modules might eventually figure out how to implement it effectively.
Frankly, if non-SQL databases can't ever figure this out, I don't think they're going to have much of a future for storing Drupal field data (at least not outside highly-specialized situations).
If so, then we violate it a lot already for other entity properties (e.g. http://api.drupal.org/api/function/user_filter_format_delete/7) - no reason to stop now :) But I don't think so. We aren't actually updating the content of the entity itself, just the way in which it gets displayed when it gets rendered; I think there may be a distinction. And @sun had a good point of not wanting to trigger 10,000 email alerts saying that 10,000 nodes were updated, just because we deleted a text format.
Comment #137
bjaspan commentedI've asked Dries to hold off making a decision on this. Let's all just cool it for a few days and ponder silently.
Comment #138
sunThanks for that. :)
I would be more comfortable with this patch, if we were not storing this replacement mapping in a system variable. I've put a lot of energy into rewriting Filter module for D7, to modernize it and make it suck less. Which first and foremost means that it no longer pollutes the system variable storage with lots of data that's actually tied to configuration data properly stored elsewhere and only ever needed when a filtered piece of content is created or updated.
Therefore, restarting to pollute the global system variable space, right after the API was finally cleaned up, is what makes things worse. Contrary to that, I'm not strictly against keeping deleted or otherwise invalid records (instead of a replacement mapping in a system variable), as that basically maps to my proposal in #562694: Text formats should throw an error and refuse to process if a plugin goes missing - although that patch goes one step further and not only applies the paradigm to text formats, but also to each filter in a format.
However, the proper and clean way would be to keep deleted records forever, as basically being proposed in the epic #147723: Deletion API for core. If and only if considering any replacement mapping.
When not considering a replacement mapping (as that pretty much kills everything you've ever learned about data integrity and no amount of NoSQL or whatever hyped technology will ever make me feel comfortable with invalidating my data), then just simply doing nothing here, just like we do nothing for node types, will work until we find a working solution to perform mass updates and mass deletions -- which we should still implement as simple database update query for field_sql_storage, to at least cover the 99% use-case of Drupal.
Our future tasks are pretty clear:
- Introduce a global concept of maintenance updates; i.e., without triggering any kind of notifications, or whatever else could happen. (For D7, we could at least start with agreeing on and setting a simple flag like $entity->maintenance_update = TRUE to try to prevent this.)
- Introduce a reliable queue that can be processed through batch UI, cron, or other means.
- Introduce a standard way to perform mass updates and mass deletions.
Comment #139
bjaspan commentedI agree with all of your "future tasks" but think they are all for D8 or beyond. I do not think we want to add an API to "cover the 99% use-case of Drupal" (i.e. assume SQL) for very much the same reason you do not want to add Drupal variables-based logic back into the Filter system: it is not the future and we've just paid the price to move away from it.
However, I am moving towards is thinking that the variable-based mapping approach is not the right one.
Since we cannot yet implement "true" text format deletion in a way we like, I suggest we actually don't delete anything at all. Instead, the "delete" option should be replaced with a "hide" option that removes everyone's permission (except uid 1 of course) to view and edit that content in one fell swoop. Admins can then choose between that or just changing the text format into whatever they would have replaced it with instead. Sure that means they might end up with two similar formats, but frankly I don't care about that, and I'm not even sure it would happen anyway.
The "hide" option will just remove privs, so there cannot be an "unhide" option because it won't know what privs to put back (we should not even try to restore the previous set of privs because there is a good chance those won't ever be the right then to unhide to). So when you click "hide", the confirmation dialog should say, "to unhide this, go to the permissions page and decide who to give permission to."
Can we agree on this approach?
Incidentally, I like the same approach for content types. You can delete them only if there is no content using them (because we have a way to determine that). If there is content, you can "hide" it by stripping permissions. But that's a separate issue.
Comment #140
sun#562694: Text formats should throw an error and refuse to process if a plugin goes missing is a hard requirement to implement a {filter_format}.status for hiding/disabling text formats instead of deleting them. Although there is a complete and working patch, that issue was recently moved to D8 by reviewers though. Therefore, #562694 would have to be re-targeted against D7 and committed, to continue with the suggested approach here.
Comment #141
webchickBumping out of my queue for now. Thanks for taking a calm, collected look at the situation now post-DC. :)
Comment #142
David_Rothstein commentedI think I agree with @sun in #138 although not entirely sure which he is suggesting for D7 vs D8 :) In any case, keeping the approach of the current patch but switching from the variables table to storing the deleted/replacement info in the text format record itself seems pretty reasonable. It shouldn't affect performance, since calling filter_format_load() already loads and caches everything for all formats.
It really seems like it would be silly to not do the db_update() in the case of field_sql_storage, no? I don't think we're sacrificing anything for the future with that approach, but rather the opposite. If we've all agreed that we want core to handle such bulk updates generically in D8, then providing a hook in D7 and leaving it up to the storage engines to implement if they want to means that maybe a solution will work itself out in contrib even earlier than D8.
Regarding hiding text formats by taking away their permissions, we already have two UIs for that in core (the permission page, plus the individual format configuration page) so I wouldn't be too excited about adding a third :) Also, this still leaves this format available all over the admin interface. I don't think it's really acceptable to have things in core that people can create but never delete, especially when we had that ability in Drupal 6 and earlier already.
Comment #143
bjaspan commentedre: #140 and #142: No, no, no, no, no, no, no, no, no.
I've had it. Finish this issue yourselves.
Comment #144
BenK commentedAs one of three issues left tagged "beta blocker" I thought I'd check in and see where this issue stands. It looks like there has been some very heated discussion. What needs to be done to get this off of the beta blocker list?
--Ben
Comment #145
David_Rothstein commentedAnyone know why this issue is marked as a "beta blocker" to begin with?
Note that all the security aspects of this have since moved to #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x. The effect of this bug now is basically just "I deleted a text format and now all my site's content is hidden". A critical bug, but I don't think it should block a D7 beta release.
Also, it seems to me that any API change that comes out of this issue (if at all) would be small, and probably more of an API addition than anything...
Comment #146
webchickAgreed.
Comment #147
StuartJNCC commentedThis seems one of the more intractable issues still left in the critical queue so, whilst I don't claim to be able to help much in solving it, I thought it might be useful to summarise the very lengthy discussion:
The problem
If you delete a text format in current HEAD, you are offered an option to select a text format that will be used in its place (see screen shot in #65). This implies that existing references to the text format being deleted will be updated to the one chosen, but that isn't actually done.
In Drupal 6 there is a site-wide default text format (plain text) that is used if a text format can't be found. This has a potential problem: say the deleted text format was BBCode. Plain text won't recognise BBCode markup at all and will display it unfiltered. That may display stuff like an email address embedded in the markup that user's don't want revealed.
Drupal 7 added a feature in #635212: Fallback format is not configurable which allows the replacement text format to be user-selected. As noted by David_Rothstein in #65 "Of course, it's still true that if they make a bad choice, they can introduce a security risk." However, the fact that the update to the chosen text format is not actually done, left the situation somewhat worse. It seems that the filter that will actually get applied is the top one in the drop-down list of text formats that will be displayed on the page. For the Admin of a fresh Drupal 7 install that is "Full HTML"! The potential security hole that results has been tackled in #358437: Filter system security fixes from SA-2008-073 not applied to Drupal 7.x.
The options
These are summarised in #128:
1. Do nothing
Not tenable. Effectively D7 lies to the user.
2. Do not allow text formats to be deleted
Dries gives the reasons why this won't do in #108. Basically, if people cannot try out a text format and then delete it, it makes it very hard to test or develop text formats.
3. Do not allow text formats to be deleted if they are in use
This was first suggested by catch in #13 and is raised repeatedly throughout the discussions.
The two counter-arguments are:
1. Its not what users want. This is best articulated in #76 in a quote attributed to Moshe Weitzman:
2. It would possibly require a new API by which modules declare which text formats they use and be difficult to implement efficiently on a non-SQL storage engine. This is best expressed by bjaspan in #130
4. Fix it by updating all references to the deleted text format with the chosen replacement
What bjaspan calls "The dumb way" of doing this (#22) is "f_a_query() all the affected objects, then for each one, load it, change it, save it.". The problem with this approach is that it might work for SQL based storage providing the database is not too big, but is difficult to implement for non-SQL based storage and, whatever the storage engine, will likely hit the PHP timeout limit if there are many objects to update.
The alternative is to use some sort of batch processing to perform the update. One problem with this approach is how to implement it. Various possibilities were discussed, including suggestions for a new bulk update API.
Another problem concern what happens if a user edits or adds content based on the to-be-deleted text format during the period before the batch update has finished? It was suggested that, providing the text format is not actually deleted until the end of the update process, this should not actually cause problems. However, there could be race conditions where an add or edit was in progress as the last batch terminated and the text format was actually deleted. This could result in objects being left with the deleted text format. Therefore, option 5 might be needed as well.
5. Just in time approach
Keep a "replacement map" which remembers the text format that has been deleted and the one that was chosen to replace it. Use this to do the update on-the-fly when an object using the deleted text format is accessed. Over a period of use, the system gradually gets updated. This sort of approach was first suggested by yched in #4 and bjaspan produced a patch which was RBTC by Moshe Weitzman in #134.
Sun strongly objects to this patch (#99) and, throughout, sought to generalise the issue to cover the case of any item being deleted when it may be a foreign key referenced elsewhere in the system: for example, deleting a node type or a content type. He also has a basic objection to text being rendered in a format other than the one in which it was written (#33) "This is exactly what must not happen, never, under any circumstances." His immediate objection to the patch: "The chosen way to solve this is a total hack IMO and this will bite us badly." (#81) and that he replacement map is stored in a variable (#86). It is not clear (to me) what sun's preferred solution is.
(Just to put my cards on the table - mine would be 3: don't allow deletion of a text format that is in use.)
Comment #148
chx commentedDeleted this comment, will write another.
Comment #149
chx commentedHere is the solution. We punt on the bulk update API for D8 and for D7 we extend field storage with an update column operation and add text_filter_format_delete and let it call hook_field_storage_update_column. This is possible in SQL and MongoDB both. We will give an opportunity to modules to veto on deletion, they can do something else as they see fit. I talked to David Strauss and he is in agreement. This is not a perfect solution. This is a feasible solution that gets D7 out the door.
Comment #150
david strauss> I talked to David Strauss and he is in agreement. This is not a perfect solution. This is a feasible solution that gets D7 out the door.
Yes.
Comment #151
sunThanks for the issue summary, Stuart!
My preferred solution is to simply update the values properly. For simplicity, via Batch API. Implemented by every module, as usual, similar to the user cancellation process. If required, additionally providing a hook for certain field storage modules to intercept the batch and do whatever they think is right to do. The default implementation, field_sql_storage, can happily work like it always works.
Additionally, we have to prevent unexpected magic triggers, rules, and actions from firing during this update. By introducing a $entity->maintenance_update property.
--
x-post with chx... a simple column update would surely work for me, too. For this issue, that's totally sufficient -- although I hoped that we'd be able to also fix the following issues in one fell swoop with the Batch API approach:
#89181: Use queue API for node and comment, user, node multiple deletes
#355905: node_delete_multiple() has to use batch API
#355926: Add comment_mass_update()
#814990: node_mass_update() can't rely on a browser session
#832572: Add a standard worker queue
Comment #152
David_Rothstein commentedAbsolutely! But... we discussed this exact idea above. See #124, #129, #142 so on and so forth. In response, people said "no no no no" and "alternative storage engines can't do that". If the actual answer is "they can" or "some of them can", then even better.
And to clarify, the reason why a simple column update is fine is that it's what we do elsewhere. Nowhere else do we try to load the entities and resave them when a text format is deleted. For example:
http://api.drupal.org/api/function/user_filter_format_delete/7
http://api.drupal.org/api/function/block_filter_format_delete/7
(And as @sun pointed out several times above, loading and resaving entities would lead to undesirable side effects anyway.)
What's good enough for those modules is good enough for text.module too. And we have the backup method anyway for situations where the column cannot or does not get updated - we still do need to keep that (i.e. most of Barry's patch or some version thereof). But by invoking a hook, we let the storage engine handle the situation however it wants to, or however it best can. Let's do it :)
Comment #153
chx commentedDavid mentioned that internally non SQL engines might implement this by storing a replacement map and on load check the map, save it back gradually removing the need for a map. I know this was not liked for the filter system -- but David and I agree that this actually is an internal business to the storage engine. Many override points will be necessary -- even the SQL implementation might want an override which does this internal map thing to avoid lock contention. We might want to help storage engines in Drupal 8 with mapping but for Drupal 7 we just tell them to update the column, close our eyes and pray.
Comment #154
damien tournoud commentedOk for this approach. As #153 mentions, it's way better to push the complexity down the stack to the storage engine then to try to solve the scalability issues in the application layer.
Comment #155
David_Rothstein commentedPlease see lots - and lots and LOTS :) - of comments above, which explain why (even assuming a perfect world where field storage engines handle things perfectly) the filter module still needs to maintain a mapping of deleted text formats: to handle the case of disabled modules.
I don't see any reason to push off to Drupal 8 something that already existed in Drupal 6. We need both approaches to solve this issue.
Comment #156
chx commentedDavid, handling text formats in disabled module is another, not critical issue. It existed since eternal and whether we fix it in D7 or not is a whole another pile of hurt. THIS issue is about fixing field storage and nothing else.
Comment #157
sun@David: I agree with chx. That has always been the case, and if there is an issue to discuss and solve it, then that's #562694: Text formats should throw an error and refuse to process if a plugin goes missing
Comment #158
chx commentedAnd "we have the backup method anyway for situations where the column cannot or does not get updated " -- no we don't, as said above, a module can veto can the deletion and then it's up to the vetoing module to do something good. This whole issue have blown up about doing what Barry's patch did so that's something we definitely not do.
Comment #159
David_Rothstein commented@chx, @sun: No, it works absolutely fine in Drupal 6. Try it! (For example, try deleting a text format while CCK is disabled, then reenable CCK.)
Not fixing this would be a regression. And given our new method of handling content with an unrecognized text format in Drupal 7 - in particular, we don't display the content at all - it would IMHO be a critical regression.
And according to StuartJNCC's excellent summary, it has been part of this issue since somewhere around comment #4 :) This issue is about exactly what the title says it's about. We claim in the UI that when a text format is deleted, all modules convert their data (or at least have their data "converted" for them). We need to make that statement true. Or I suppose we could explicitly warn the user about cases where it does not, but that seems silly given that it would be confusing and a regression, and we already have Barry's working code to solve it.
Comment #160
sun@David: It only works, because D6 is broken. I already stated in http://drupal.org/node/562694 and elsewhere that D6 has this invalid and utterly broken fallback to the default format, but only just because the Filter API in D6 is not able at all to properly account for it. That is primarily because modules don't even have a chance to update their values, since there is no API hook invoked in the first place. Therefore, you cannot compare D6 and D7. Not at all. And, please, this issue is only about Text module (resp. Field [storage] API) not being able to perform an action that would otherwise be a simple 5 minute fix:
db_update('foo')->fields(array('format' => 3))->condition('format', 2)->execute(). Everything else we discuss in separate issues.Comment #161
chx commentedEnough!
Comment #162
chx commentedFind another fool who will find a solution , rally people behind the solution just to get someone drag up a totally new, hairy edge case blocking this issue.
Comment #163
chx commentedJust so people understand, because it's impossible to understand from what David said so far, his new issue is this:
Whatever. We do not want to release Drupal 7, I can see.
Comment #164
sunSpent the past hours with this patch. I don't understand at all what's deemed to be "impossible" here.
Someone more knowledgeable of OOP might want to finish this patch.
Comment #165
catchDavid's case (which as I understand it is exactly what chx outlines about in #163) needs its own issue, it's ridiculous to make that a prerequisite to allowing text module to do the bulk update.
On adding the hook and letting field storage engines do as much on their own as possible (including blocking delete), that's fine with me. In #7 I outlined the main problem with this approach in regards to mongodb - mongodb_field_storage stores everything in one collection, there's a limit to the number of indexes, no site is going to have an index on the format key for the body field, for example, since you never, ever need to query that except for this particular situation. However unindexed queries are no longer blocking as of mongodb 1.6, and if the hooks are flexible enough then it could be implemented in a way which doesn't require an unindexed query anyway, or just punt on it at this point.
I double checked text_field_schema() and that already has an index on format in SQL, so no need to add one.
I think the field update patch should move to a new issue which is less cursed than this one, and the text_field_format_delete() could be added here once that's in.
Comment #166
David_Rothstein commentedWow. We seriously all need to chill out here. (Again. There is something about this issue...)
I apologize if some of my recent replies came off as curt - I was writing them fast in the middle of doing other things. I also need to remember that not everyone followed all the complicated twists this issue took (and that some of the discussion may have taken place in other issues that were at one point intertwined with this one; the example I brought up is definitely not new, though). In any case, I thought we had a productive conversation earlier on IRC. @chx, I'm sorry if you felt differently.
However, please do not accuse me or anyone of "blocking this issue". It is hardly blocking an issue to point out that the existing (already written) patch in the issue solves certain problems that the new (proposed but not yet written) patch would not be able to solve.
And if we really wanted to get Drupal 7 released absolutely as fast as possible, we would simply commit Barry's patch as is, because it works and fixes the critical bug. Period :) It's fine to have other concerns beyond that, but acknowledge them for what they are.
There is widespread agreement that we want to introduce something like what @sun just posted. If anyone feels strongly that this issue should be split up somehow (rather than simply combining that patch with Barry's), feel free to move Barry's patch to another issue or vice versa. Whatever makes you happy.
Comment #168
catch@David, yes it's been discussed in depth as part of this issue, but there are two distinct problems to solve, this what I wanted to make clear.
I think we should kill this issue dead and split everything to new ones - one for the bulk update, one for the just-in-time replacement for re-enabled modules. So I'm going to do that right now.
1. EntityFieldUpdateQuery - I think everyone wants this, even if there are concerns about how it'll pan out, but not having it is worse. #914306: Create a batched framework for entity updates
2. Handling cases where modules are disabled, then enabled, and information they depended upon has disappeared.
#914316: Disabled modules don't get text format replacement run on their data
@sun, while a straight update doesn't solve the bulk delete/bulk unpublish issues you mention, it would eventually allow for things like merging two taxonomy terms together, removing references to deleted terms etc.. The maintenance update (we did something with a referenced object, so need to go and clear out everything) is different from an actual content or status change even if the problems are similar.
PS. If anyone thinks the two issues I created don't cover all the problems discussed here, please 1. open a new issue 2. link to it for this one 3. don't get too pissed off with me and re-open this one again, I really think we need to break this up into as small pieces as possible and not kill each other over this.
Comment #169
chx commentedAlthough I am out of the issue, I need to shut this down before it gets out of hand. It seems that noone read what I have outlined above. We are not trying to come up with generic solutions this late. We are coming up with one thing and that's updating integer columns. We can debate whether we want to allow updating string columns as well. I would rather not. It's too late, it's an ugly stopgap towards a bulk update API which is a lot of work. Getting this right is also a lot of work. Any assumption taken for the update capabilities of field storage engines is false.
However, this issue is a soul killer and that's why I won't fixed it. As I said yesterday, find another fool who writes this, because I had enough. See, after I came up with a compromise and got everyone behind it it was happily and totally ignored and something else was done.
Comment #170
catchI'm moving this back to duplicate on the assumption chx cross-posted. Agreed that int updates should be restricted (and this covers all 'reference' fields we currently have anyway).
Comment #171
chx commentedI did not cross post. There is no other issue for hook_field_storage_update_integer_column. I have moved EntityFieldUpdateQuery to Drupal 8 where it belongs.
Comment #172
chx commentedResetting title. Also note that while I did write field_sql_storage_update_integer_column the next step is a serious challenge, how do you expose this to the world? field attach API deals with entities. So I am in support of #914458: Remove the format delete reassignment "feature"
Comment #173
sunSerious waste of time, I see. Can we please stop to post arbitrary statements like
without any foundation? The latest patch doesn't need such a restriction.
Comment #174
chx commented#914458: Remove the format delete reassignment "feature"
Comment #175
gisleRemoving non-canonical duplicate tag. See #2426171: Multiple tags similar to 'API clean-up' for background.