Performance follow-up: Convert Taxonomy wrapper to the new caching pattern
| Project: | Category |
| Version: | 6.x-2.0-rc1 |
| Component: | Wrapper modules |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | fixed |
This is in fact a follow-up from #501378: PERFORMANCE! Central caching for category API functions, where I introduced a caching pattern for Category module. AFAICT, it works pretty well (apart from one typo #548544: Category navigation missing links), as it removed a whole heap of slowest queries on my site. So, I then moved my focus on to another optimizations across installed modules on the site (and alternative caching backend, etc.), but the slow queries list shown by devel module also took my attention to the Taxonomy wrapper. The legacy Taxonomy functions obviously also run (maybe because I run Forum module), and are also sources of slow queries (although not as critical as core Category was before the previous patch).
So my proposal here is to convert the legacy Taxonomy data-loading functions to the new caching pattern as well. I considered this in the initial patch already, but postponed this part, because I was unsure about possible consequences on these legacy core-tables. But now I've already converted two most expensive Taxonomy functions on my live site (as a part of my performance across-modules hacks), and it works just fine. I feared that Taxonomy tables may be written directly from elsewhere (bypassing the needed cache flush), but in fact there's no point for other modules to play with tables directly, when Taxonomy have API functions for the saves. And each such save with Taxonomy wrapper is a node_save() call, which then flushes caches properly in category.module - so we don't need to bother about flushing in the wrapper at all. Then again, if some module *did* save data to Taxonomy tables directly, then we're in trouble already, as that won't get forwarded to Category system, so the legacy tables will be out-of-sync then, and a little stale cache is least of the problems.
I did a grep search for taxonomy tables being referenced in my site's codebase, and it came back with nothing worse than unused devel_generate.module, unused core taxonomy, and our taxonomy wrapper; plus purely read access in Views, Token, Pathauto, img_assist, Forum, and a couple more occassions in core (which is all OK).
So, I believe this is OK to proceed. In fact, this is a simple task, because the central caching function is already in place, as well as the whole cache-flushing part, so we only just need to wrap the actual code in functions into a simple cache-fetching structure, and that's all.
I hope to tackle this as time permits, but any help is welcome.

#1
So, I rolled a patch for this. It's just a straightforward conversion of the taxonomy wrapper legacy data-loaders to the category caching pattern, very similar to what my pre-rc1 patch did in category.inc counterparts. This obviously means a lot of indentation changes (wrapping existing code into conditional blocks), but apart from a few static variables being replaced by our central caching (occasionally leading to simplified array-keys), there's no change in fact.
We don't need explicit cache flushing in taxonomy wrapper, because all saves are automatically accompanied by node_save() [as long as we're not in serious trouble with the wrapper in general].
---------------------
The patch also includes one additional fix in category.inc, which is a follow-up fix after my previous performance patch. There's a small weirdness in existing code, that forced me to adopt inefficient caching pattern in category_get_tree() on quick look, previously:
category_get_tree() have a strange condition in the first block of processing, whose results we store in cache:
if (!$distant && $depth == 0)This part of code is just some sort of sanity check, enforcing that the current container is a root parent of the category (even if it means a change to the parent property on the category). Bypassing that check in case of distant parenting is fine, but why check for depth? This is a recursive function, so we'll only do that check on first iteration - then the result is cached (and it always WAS cached, in a static at least), so the check for depth won't have any effect on further iterations anyway. It doesn't do anything unless we call the function with an arbitrary $depth - which is documented as "internal use only" argument.
Also, the check is already done in the query (through $distant_sql), as far as c.cnid is sufficient, and if we wanted to check for orphaned categories in the other hand, we might need to check all the way up to the root of the tree. So this check is quite weird to my eyes. I'm going to take a closer look if time permits (check where and why it was added, if I can find it).
Having $depth inside the loop, strictly taken, requires the $depth to be in the cache key too, and so short-circuits caching for the recursive processing [each recursion getting new cache entry] - and that's really bad. Since it obviously didn't work with the old static caching too, and since $depth is always zero on first (cached) iteration (unless "internal use only" argument got abused somehow), I'm removing the $depth from the cached part for now. All the existing calls to category_get_tree() in category package either pass nothing, or the default -1 value as $depth anyway (so cached data are generated with the $depth condition always true).
-------------------------
I'm going to further test this patch in a few days (also put it to my live site, along with my other patches). I'll report back then.
#2
Thanks, JirkaRybka. Please post follow-up patches if you have further improvements to this one.
Committed to HEAD.
#3
Thanks. This patch is running on my production site all the time since it was posted here, without any problems. So I believe it's really OK.