Using freetagging, we're having some scale issues with tagadelic. Simply put, the volume of taxonomy terms overwhelms the following query in tagadelic_get_weighted_tags():

  $result = db_query_range('SELECT COUNT(*) AS count, d.tid, d.name, d.vid FROM {term_data} d INNER JOIN {term_node} n ON d.tid = n.tid WHERE d.vid IN ('. substr(str_repeat('%d,', count($vids)), 0, -1) .') GROUP BY d.tid, d.name ORDER BY count DESC', $vids, 0, $size);

In my install, with 300+ terms, this query takes 2000+ ms to complete.

The attached patch solves this by creating a tagadelic table in the database, which stores the data needed to generate the tag clouds.

This data is updated via nodeapi on 'insert', 'update', and 'delete.' Term data is updated on hook_taxonomy 'delete' or 'update.'

One issue not fully tested: Since taxonomy_node_save() needs to execute before tagadelic updates the term count, I'm not sure this approach gets us the most accurate counts. There would be at least one additional way to do so.

On node save, store a varaible (tagadelic_node = $node->nid). Then put a non-cached menu check againast that variable. If present, reset it to zero and run the mnew tagadelic_index_node() function.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

agentrickard’s picture

FileSize
6.75 KB

Update to the patch that creates a tagadelic_update() function that converts term_node data to the new table structure.

agentrickard’s picture

FileSize
6.75 KB

Accidentally left a db_queryd in the last patch.

agentrickard’s picture

FileSize
6.77 KB

After more testing, added a default ORDER BY to the get_weighted tags query.

MySQL times for the query is now sub 1 ms.

See http://rubybaboon.com/tagadelic

agentrickard’s picture

FileSize
7.06 KB

Missed a condition in taxonomy_node_save where the term might be passed as an array of objects.

This patch corrects for that.

Bèr Kessels’s picture

I am just wondering why you chose for the route of a private table and not simply caching the retreived array with weighted tags in the cache table? It would make the code a lot cleaner afaik.

I might miss an obvious point though :)

agentrickard’s picture

Caching the array didn't occur to me until later ;-).

In our implementation, we are running auto-generated content that updates via cron, and our tag cloud changes frequently. We wanted the tag clouds to update without lag.

That said, using cache may be a better route.

In either case, a patch is needed IMO, because of scale issues.

Bèr Kessels’s picture

Yes a patch is certainly needed. But i prefer this module to remain tableless, and since we have a proper solution (cache) we can solve this much easier.

agentrickard, do you have the resources to rewrite this whole patch so it uses Drupals caching instead? If not, please let me know too, so I can have a look at the current patch more closely.

agentrickard’s picture

Patch vs. 4.7 or vs. HEAD?

I can do a cache patch vs. 4.7 easily.

agentrickard’s picture

FileSize
2.54 KB

I think this should work against 4.7 (// $Id: tagadelic.module,v 1.19.2.3 2006/06/13 19:35:35 ber Exp $).

The change is in tagadelic_get_weighted_tags().

Before running the db_query, I check for a cache object. Tagadelic cache objects are named 'tagadelic:' followed by a list of $vids. They are set to expire every half hour.

Tested on localhost only.

Bèr Kessels’s picture

Version: 4.7.x-1.x-dev » master

If possible against CVS. I am very reluctant to introduce new features on a stable branch.
If not possible, please let me know, so I can try to apply (by hand) to a CVS version. By now, I did not even try that, because you mention it explicitly for 4.7.

The code looks good. I love the fact that with only five (six?) changed lines (disgregarding the change in identation) you introduced this caching. Nice job!

Bèr

agentrickard’s picture

CVS changes would take a little while, since Dries just committed the cache table patch and my cvs version is out of date. (http://drupal.org/node/72290).

I'll see what I can do over the long (US) holiday weekend.

Bèr Kessels’s picture

Sorry! My mistake! no, I meant CVS as in "the cvs version of tagadelic". tagadelic HEAD is still completely 4.7-compatible. I will only start to upgrade tagadelic to core-head it that head becomes frozen (and maybe even then i'll wait a little).

So: I meant: use HEAD tagadlic, on a 4.7 Drupal, and make it work there. You will find that HEAD tagadelic has loads of features that 4.7 tagadelic has not. Yet both run fine on 4.7

(why not release it for 4.7 I hear you think! well: I prefer stability, incubation times and predictability. Example: on my host I needed to upgrade image module 4.7 to the new image module 4.7 (security issue) and by doing so, I broke 30 sites! Because image module 4.7 (new) was completely different from 4.7 (old). I think that is BAD. I prefer not to go there... (though I love the new split up image modules)

agentrickard’s picture

FileSize
2.55 KB

The whole CVS version thing confuses my little mind.

Attached is a Drupal 4.7.3, Tagadelic 4.7.cvs patch for caching the $tags generated by tagadelic_get_weighted_tags().

agentrickard’s picture

Looking at the patch, it may be necessary to add $cid = 'tagadelic: nullset' before the else on line 203.

This would force a null return on line 207 [$data = cache_get($cid);].

This is only needed if sending a blank $cid might return data from the cache table.

Setting the $cid to 'tagadelic: nullset' would ensure that no data match is found in the cache table.

Bèr Kessels’s picture

Status: Needs review » Needs work

From your last comment I assume that this patch needs some work.

Bèr Kessels’s picture

http://drupal.org/node/64022 was marked duplicate of this.

agentrickard’s picture

Status: Needs work » Needs review

No, I don't think that modification is necessary.

If $cid is not set, then cache_get should fail to return a result.

This would only be an issue if the cache table allowed cid to be NULL, which it does not.

That comment was simply the only flaw I could see in the logic.

I think the patch is ready for testing.

Bèr Kessels’s picture

I have no experience with benchmarks, so if someone strolls along who feels like doing some, please give me some stats on the performance before and after this patch?

Bèr Kessels’s picture

Title: Store tagadelic data in a table » cache tagadelic data
FileSize
0 bytes

This renewed patch is untested, but should work on HEAD. please try it and report if it worked for you. Thanks

mudanoman’s picture

Ber,

It doesn't look like the cache patch file was uploaded properly.

Thanks,

Ivan

Bèr Kessels’s picture

FileSize
0 bytes

retry

Bèr Kessels’s picture

Status: Needs review » Needs work

Hmm, somehow the file itself is empty, this means that I lost my work.

m3avrck’s picture

I ran some benchmarks on just caching the entire tagadelic block output in general and it's significantly faster up to 14% or so.

Depends on the number of terms, etc... but with 100 terms and showing 50 in 3 different tagadelic blocks, it was great.

I have not yet benchmarked this patch in particular but it seems like you'd see equal results. I'll have a more detailed review tomorrow.

alliax’s picture

What is the current state of this? Since you lost all your work on caching the results in the cache table, nothing has been done?
I am using tagadelic happily but as one site I have a freetagging and the catgory got bigger, with under 1000 visitors a day, 1and1 blocked my database and I had to change hosting company before I realised having 3 tagadelic blocks on every page view without block caching might be why the mysql consuming was too high. I removed the tagadelic blocks for now.

arthurf’s picture

FileSize
2.64 KB

The last two patches don't seem to exist any more, not sure what's happened. Anyway, I did a quick and dirty version of a caching scheme that has admin configurable time limits.

arthurf’s picture

FileSize
2.75 KB

Ignore my last patch. This one should work a bit better.

Bèr Kessels’s picture

I don't like the time limits. Drupal has all sorts of comlpex logic to determine if a cache needs wiping. We should not add more stuff to that.

Without the option the patch looks fine.

arthurf’s picture

FileSize
1.92 KB

Point well taken. Here's a patch minus the time stuff.

Bèr Kessels’s picture

Status: Needs work » Needs review

Good! But some teenyweeny thingsies. Not that I want to sound like an *ss, but I'd like to keep stuff clean, or else not make it messier then it is =)

* please remove the comment about caching, that is a hint/todo, wich would be done after this patch =)
* mind your spacing its if ($omething) { and not if( $something) . same for teh else followed by a wite line.

For the rest: good one and ready to go.

arthurf’s picture

FileSize
1.86 KB

Points well taken about style. I wasn't even paying attention. Here's the cleaned up version.

Bèr Kessels’s picture

Status: Needs review » Reviewed & tested by the community
Bèr Kessels’s picture

Committed to HEAD. Thanks a lot.

If anyone feels like backporting this to a stable branch, please open a new issue thread.

Bèr Kessels’s picture

Status: Reviewed & tested by the community » Fixed
Anonymous’s picture

Status: Fixed » Closed (fixed)
jayjaydluffy’s picture

I am about to apply this patch on my module in my site since I am having the same problem with my robust taxonomy. Should anyone of you here sum up what this patch does? This has been a long thread and me as a newbie could hardly follow the discussion.
I'm looking forward to your response to my request, and it's highly appreciated.
Thanks

alliax’s picture

You just have to update your module to the dev version, no need to apply a patch. Then, whenever the module maintainer will decide, you will be able to switch to the next release version.
That's what I understood, and when I upgraded to the latest dev version, the problem of the tagadelic bloc not updating when site cache was activated, suddenly disappeared.

jayjaydluffy’s picture

Thanks though I already have successfully patched my module with the latest patch above. Well, it goes perfect. Thanks again!

alliax’s picture

Excuse me, my last answer was completely out of context, I've mistaken two different issues and I was replying like I was talking about the other one. So please just ignore my last comment :-)

jayjaydluffy’s picture

I found something weird after applying this patch. After some deep observation on its performance, I found out that everytime a tag is inserted in a node, the cache is cleared and new data is set to cache. I want to disable this stuff since I've added admin control to automatically clear the cache after n seconds, where n depends on configured number of minutes. I've been examining it in the module but cannot locate where the stuff occurred. Please anyone can help me find it. Thanks!

David Stosik’s picture

Title: cache tagadelic data » cache tagadelic data (backport to Drupal 5 version)
Version: master » 5.x-1.x-dev
Assigned: Unassigned » David Stosik
Status: Closed (fixed) » Needs review
FileSize
1.91 KB

Hello,
Here is a patch that backports caching system on DRUPAL-5 branch.
Hope this can be applied.

By the way, if #391930: Tagadelic block cache displays wrong tags gets applied, I will need to backport this change (using cache_page instead of cache table).

Regards,
David

PS: don't use this patch, there is a forgotten dpr. uploadin new patch in next post.

David Stosik’s picture

Clean patch here.

Bèr Kessels’s picture

Status: Needs review » Reviewed & tested by the community

Looks good and solid. Tested on my blog (http://bler.webschuur.com/tags) and found no (degrading) issues.

If others can test this on large(r) databases, that would be great. Else it will go in to the next release this weekend.

Bèr Kessels’s picture

Status: Reviewed & tested by the community » Needs work

ber@yasmine:~/Documenten/TAG_tagadelic/unfuddle/tagadelic-5.x$ patch -p0 < tagadelic_cache_backport5_0.patch
(Stripping trailing CRs from patch.)
patching file tagadelic.module
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 204.
1 out of 1 hunk FAILED -- saving rejects to file tagadelic.module.rej

David Stosik’s picture

Could you explain more, please?

Bèr Kessels’s picture

Well, the patch fails on a DRUPAL-5 checkout. patch says that it sees changes already in there.

What version did you patch against?

David Stosik’s picture

Well, if I can remeber, it was the HEAD at the time I posted...

MacMannn’s picture

Version: 5.x-1.x-dev » 5.x-1.1
Assigned: David Stosik » MacMannn
Category: feature » bug
Status: Needs work » Needs review

I'm pretty new to Drupal, but I've been playing around with the Tagadelic module (// $Id: tagadelic.module,v 1.36.2.12 2009/03/29 12:52:40 ber Exp $). On my testsite I've created a menu item that leads to a page where the tagcloud for all the content on the site is being shown. At first this seemed to work OK, but after a while I noticed that the cloud was not updating as more tags were added. Weird. Upon further examination it turned out that the cloud was generated once, CACHED, and loaded from the cache ever after...

Having read through the discussion above I can understand the need for caching. When you load a cloud on every node as a block, that could indeed produce some serious performance issues. In my case there's very little chance of that, since the cloud will only be generated when the user visits the specific cloud page by clicking on a menu item. So, I tried to work around the issue by changing the code of the module. By commenting out the following lines, I've been able to take out the caching and made the module behave as I wanted to.

211 $cache_name = 'tagadelic_cache_'. $options;
212 $cache = cache_get($cache_name);

215 if (isset($cache->data)) {
216 $tags = unserialize($cache->data);
217 }
218 else {

227 cache_set($cache_name, 'cache', serialize($tags), CACHE_TEMPORARY);
228 }

I'm not sure if the fact that the cloud was generated only once and loaded from the cache ever after on my testsite is caused by my setup of if it's a flaw in the module. But if it's a bug, then this comment might be helpful to others running into the same trouble.

David Stosik’s picture

Version: 5.x-1.1 » 5.x-1.x-dev
Assigned: MacMannn » David Stosik
Category: bug » feature
Status: Needs review » Needs work

This thread is only about backporting a feature of the Drupal 6 version. If you think caching the tagcloud is a bug, please open your own issue, or find another issue dealing with this problem. ;)

David Stosik’s picture

By the way the issue #391930: Tagadelic block cache displays wrong tags already deals with the problem you raise.
Bèr Kessels, should it be backported to D5?

Bèr Kessels’s picture

Status: Needs work » Closed (won't fix)