$container->hierarchy not updated correctly, plus code clean-up.
| Project: | Category |
| Version: | 6.x-2.0-rc1 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | fixed |
The 'hierarchy' field on a container is actually a dead corpse of a field. It used to be a setting back on 5.x (both Category and Taxonomy), controlling how categories (terms) may be parented (root only, one parent, multiple parents), but it's removed from the UI now in 6.x (both Category and Taxonomy).
As for Taxonomy, the removal originated from #192242: Usability: Improve taxonomy add forms - originally suggested by Dries at #41, patch at #60 - although this one seems incomplete, obviously there was some other commit involved too. The field still exists in db schema, but it's only set by helper function taxonomy_check_vocabulary_hierarchy() according to real terms structure. On new and edited vocabularies it's set to zero (on edits that's a bug #580040: hierarchy field in vocabulary table resets to 0 when editing vocabulary). The hierarchy flag is only used (plus updated) on administrative listing (apart from possible legacy uses in contrib), and updates in form submit handler. This handler is where the term is actually saved in core Taxonomy module.
So, the field is now only just an internal summary flag to adjust some administrative displays.
Now in Category, we have almost the same code inherited. The setting is gone from the UI, and we have category_check_container_hierarchy(), called from a form submit handler category_form_category_submit() - that's all more or less the same as in Taxonomy. This is a big bloat of code, inherited from core Taxonomy, but in fact it's useless. There's a lot of sloppy playing with old&new category in form data prior to the category being actually saved, special casing of increase and decrease of parents count - and it currently doesn't work.
As I discovered while testing #251119: Allow various node types with category behavior to be used for free-tagging (per container), the flag is updated in a form submit handler (!), so it doesn't get updated on node_save() driven operations like free-tagging and Taxonomy legacy saves. While core Taxonomy creates it's data in a form submit handler, we have category_save_category() inside hook_nodeapi() for this purpose, so we can't just inherit the taxonomy code for this workflow. Also we currently don't handle deletions of categories at all.
Also, it isn't set correctly on newly created categories with one parent, because there seems to be the same number of parents detected as "old" state of the (new!) node. The 'new' node skeleton contains one parent with zero value, which is ignored in part of the processing, but still counts as one pre-existing parent elsewhere. One new parent is then considered as "no change", and the hierarchy flag doesn't get updated.
But rather than fixing this messy code (really, core didn't get a truly nice solution here), I would say: All we need is this:
$hierarchy = (int) db_result(db_query('SELECT COUNT(h.parent) AS parents FROM {category_hierarchy} h INNER JOIN {category} c ON h.cid = c.cid WHERE c.cnid = %d AND h.parent <> %d GROUP BY h.cid ORDER BY parents DESC', $cnid, $cnid));
$hierarchy = $hierarchy < 2 ? $hierarchy : 2;This retrieves the correct value for 'hierarchy' field (assuming that the new category have been already saved), and since this is the case of saving a category, performance shouldn't be a big concern. Put this into a helper function, along with a save to the container (preferrably only if changed), and call from category_save_category() [for both containers, if changed], and from category_del_category() too(!), and we're done with updates to the flag.
This change will both fix the result, and get rid of the messy code, mostly.
Just for record: The hierarchy flag is only used in hook_help() for administrative pages, on the administrative listing (seems similar to core taxonomy), and to hide "Add child" links - the last one making no sense now that the flag doesn't control category additions in any way (it's controlled by these additions - it's all the other way). [I'm going to re-roll #251119: Allow various node types with category behavior to be used for free-tagging (per container) because of that.] Taxonomy wrapper only just passes the value between container and vocabulary without any processing. (We also have taxonomy.admin.inc, which seems to be an unused mirror of the core file? But that's for another issue.)

#1
I rolled a patch, it was rather simple. I tested, it works fine, the hierarchy flag is now magically set to a correct value however I play with categories (so bug fixed), and code much, *much* simplified.
There's a small TODO in there: Taxonomy wrapper needs to update legacy data too. But that bit is not changed by my patch, so we're no worse than before. Also there are other places with the same problem (changes done through administrative overview screen, in particular), and IMO this is for larger discussion - maybe the taxonomy wrapper even needs some hook invoked from Category, to react on smaller updates to the database (non node_save). That's beyond this issue, though.
So, I believe that the small TODO is actually out of scope here.
#2
OK, let's re-introduce some of the optimizations, since it's simple and cheap. Now we accumulate parents count as we're saving in category_save_category(), and pass it into the hierarchy updater function. This new argument is basically a hierarchy setting suggested by the newly changed category itself, serving as a minimum for the hierarchy flag update on the container: For example, if we have a multiple hierarchy category at hand already, the setting on container may not be lower, so we can skip the query and save a bit of time. This happens for most cases where we increase or keep the amount of parents. If we're having smaller amount of parents [than container's maximum] after the change (which is also true for all deletions, and removals [i.e. moving to another container]), the query needs to run, but still we can avoid it, if the hierarchy on the container was already set to the minimum of 0.
This addition to the patch is rather minor optimization, though, as we're talking about the case of saving a new/updated category, which is rare. I did this bit because it's easy. I believe it's not worth the trouble to go further (although core taxonomy did), playing with additional checks against previously existing state (i.e. "did the amount of parents on this category really change? Is it going up?"), as that would be more complicated and with loading of previously existing parents, so we'll likely end up with much more complex code, and possibly the same average performance in the end.
Additionally, this patch changes one awkward ternary (introduced in my yesterday's patch) to cleaner min(), and removes one redundant !isset(), because empty() already handles that (and is used a few lines above in a similar way, already, BTW).
#3
I discovered one flaw in the above patch: The root [container] parent doesn't count if it's the only parent (i.e. having all in root is flat, not single hierarchy, but when there are multiple parents, then the root parent counts (i.e. having a category with root + some_category as parents is multiple hierarchy, not single). I reviewed the original core taxonomy code again (just to be sure), and indeed it works just like that.
As for other thoughts about the original code coming from core taxonomy - in the original core issue, there's not any discussion about why it was written the way it is. I mean, there's an obvious, yet unadressed, additional question why there's a php-loop processing the whole, potentially big, output from taxonomy_get_tree() [that is a beast with a joined query and a lot of recursive php processing by itself, cached only in a static]. My version defers all that to query level [with GROUP BY and a sort], which might scale better to big amounts of entries [than a dual php loop, that is], and avoids issues with the *old* state being cached [even more important with our persistent cache now in Category] and so need to pre-update the retrieved data in-sync with the actual term saving operation, making the code ugly. So, yeah, I still see it as better option, to retrieve the whole thing afresh on query-level post-save, as implemented in my patch, rather than looping and counting and compensating for this'n'that at php level.
For record: The hierarchy-updating code in core Taxonomy was added at #193333: Add drag and drop to taxonomy. It also changed the default on new vocabularies to zero (we have that already), but otherwise there's absolutely no discussion about this part on that other issue. (It's one of these large Drag-and-Drop patches from quicksketch, where a tens-of-kilobytes patch is lucky, if it gets more than one sentence of explanation...)
One unpleasant bit about the query in this patch is, that there's apparently a difference between MySQL and PgSQL limiting the implementation: I was tempted to write
SELECT .... MIN(h.parent = %d) AS is_cont_only .... GROUP BY ....to check whether not all rows match the condition, as MySQL doesn't have a BOOLEAN data type, and returns tinyint 0 or 1 from comparisons. But unfortunately, according to documentation, PgSQL is more strict, returns booleans, and MIN() doesn't accept booleans - there's a BOOL_AND() aggregate function instead, which in turn isn't in MySQL. Since I didn't want to run into the unsafe land of DB-type specific queries (and I can't test on PgSQL), I avoided that by writing:MAX(ABS(h.parent - %d)) AS not_cont_only, which is purely numeric and gives an usable result, too.I also see, that the original code doesn't change the hierarchy flag for free-tagging containers/vocabularies at all. Although this makes sense on first sight (free-tagging container doesn't need ANY hierarchy), I decided not to follow this route, because it's unreliable. What I see is that a container being changed to free-tagging later, may have some already existing categories (not neccessarily flat), and although new additions with free-tagging indeed won't change the hierarchy, deletions of old [more hierarchic] categories would, and won't be updated correctly if free-tagging is a global blocker for the update. And then again, on true free-tagging containers we won't gain anything by blocking the hierarchy update, as the flag would be zero all the time, so the updating code doesn't do anything already.
So in fact, adding a check for free-tagging setting would only just break certain edge cases (and add one unneeded line of code), without providing any benefit.
Another problem that I see in the original Category code [replaced by this patch] is the case, where the category got moved to another container. This case was completely broken, because it only just updated the new container if the number of parents newly set on the category was lower than before (which is in no way guaranteed, being purely user input, and likely to be 1 most of time). The other case (number of parents increased) only updated the new container, if the previous case also fired and populated $new_container (which never happens, as the conditions are opposite), and with the same number of parents assigned in new container, there was no update at all (although both the containers are in fact affected). Also if the new parents count was higher, the old container got updated to that higher value (although in fact we just removed the category from there).
All in all, that code simply just didn't work. This patch simplifies that, and I successfully tested it in all the cases I can think of.
----------------------
On a broader scope of discussion, the whole thing with removal of 'hierarchy' setting from the UI makes some sense in core Taxonomy (as explained on the core issue #192242: Usability: Improve taxonomy add forms), because setting up a vocabulary and adding terms through the full form with parents etc., are both administrative tasks, so choosing parents on terms is sufficient by itself (it doesn't make sense if the admin limits options [by vocabulary setting] only for himself [adding terms later]). In Category though, adding categories is just ordinary content submission (category is a node, also a replacement for book page, and now just any node type may be a category), so submitting that is definitely NOT an administrative task (as oposed to Taxonomy). So while in Taxonomy the removal only just fixed sort-of duplicity on administrative pages, in Category it's a true feature-removal, making it impossible for site admin [container node owner] to restrict the hiearchy model of user-submitted category-behaving posts, as it was possible before.
If I had a chance to vote on this before it was done (although I don't really need this feature, having category submissions restricted to trusted persons on my site), I would've said keep the feature. In the other hand, there's the need to keep category module more or less compatible/similar to evolving core Taxonomy [which is an interesting topic with implications for every single major version of Drupal, but out of scope here] - so this says follow the suit, and remove the feature. But anyway, now the latter is already done, so my opinion then is: Keep what we have, just fix it [that's what this patch attempts].
----------------------------
The patch is ready now, as far as I'm concerned.
#4
Thanks for this, JirkaRybka. Committed to HEAD.