cache tagadelic data (backport to Drupal 5 version)

agentrickard - August 14, 2006 - 14:48
Project:Tagadelic
Version:5.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:David Stosik
Status:needs work
Description

Using freetagging, we're having some scale issues with tagadelic. Simply put, the volume of taxonomy terms overwhelms the following query in tagadelic_get_weighted_tags():

  $result = db_query_range('SELECT COUNT(*) AS count, d.tid, d.name, d.vid FROM {term_data} d INNER JOIN {term_node} n ON d.tid = n.tid WHERE d.vid IN ('. substr(str_repeat('%d,', count($vids)), 0, -1) .') GROUP BY d.tid, d.name ORDER BY count DESC', $vids, 0, $size);

In my install, with 300+ terms, this query takes 2000+ ms to complete.

The attached patch solves this by creating a tagadelic table in the database, which stores the data needed to generate the tag clouds.

This data is updated via nodeapi on 'insert', 'update', and 'delete.' Term data is updated on hook_taxonomy 'delete' or 'update.'

One issue not fully tested: Since taxonomy_node_save() needs to execute before tagadelic updates the term count, I'm not sure this approach gets us the most accurate counts. There would be at least one additional way to do so.

On node save, store a varaible (tagadelic_node = $node->nid). Then put a non-cached menu check againast that variable. If present, reset it to zero and run the mnew tagadelic_index_node() function.

AttachmentSize
tagadelic_2.patch5.54 KB

#1

agentrickard - August 14, 2006 - 15:27

Update to the patch that creates a tagadelic_update() function that converts term_node data to the new table structure.

AttachmentSize
tagadelic2.patch 6.75 KB

#2

agentrickard - August 14, 2006 - 15:31

Accidentally left a db_queryd in the last patch.

AttachmentSize
tagadelic3.patch 6.75 KB

#3

agentrickard - August 14, 2006 - 15:40

After more testing, added a default ORDER BY to the get_weighted tags query.

MySQL times for the query is now sub 1 ms.

See http://rubybaboon.com/tagadelic

AttachmentSize
tagadelic4.patch 6.77 KB

#4

agentrickard - August 14, 2006 - 16:03

Missed a condition in taxonomy_node_save where the term might be passed as an array of objects.

This patch corrects for that.

AttachmentSize
tagadelic5.patch 7.06 KB

#5

Bèr Kessels - August 17, 2006 - 21:37

I am just wondering why you chose for the route of a private table and not simply caching the retreived array with weighted tags in the cache table? It would make the code a lot cleaner afaik.

I might miss an obvious point though :)

#6

agentrickard - August 18, 2006 - 13:35

Caching the array didn't occur to me until later ;-).

In our implementation, we are running auto-generated content that updates via cron, and our tag cloud changes frequently. We wanted the tag clouds to update without lag.

That said, using cache may be a better route.

In either case, a patch is needed IMO, because of scale issues.

#7

Bèr Kessels - August 19, 2006 - 20:42

Yes a patch is certainly needed. But i prefer this module to remain tableless, and since we have a proper solution (cache) we can solve this much easier.

agentrickard, do you have the resources to rewrite this whole patch so it uses Drupals caching instead? If not, please let me know too, so I can have a look at the current patch more closely.

#8

agentrickard - August 24, 2006 - 16:53

Patch vs. 4.7 or vs. HEAD?

I can do a cache patch vs. 4.7 easily.

#9

agentrickard - August 24, 2006 - 18:08

I think this should work against 4.7 (// $Id: tagadelic.module,v 1.19.2.3 2006/06/13 19:35:35 ber Exp $).

The change is in tagadelic_get_weighted_tags().

Before running the db_query, I check for a cache object. Tagadelic cache objects are named 'tagadelic:' followed by a list of $vids. They are set to expire every half hour.

Tested on localhost only.

AttachmentSize
tagadelic_cache.patch 2.54 KB

#10

Bèr Kessels - August 29, 2006 - 22:40
Version:4.7.x-1.x-dev» HEAD

If possible against CVS. I am very reluctant to introduce new features on a stable branch.
If not possible, please let me know, so I can try to apply (by hand) to a CVS version. By now, I did not even try that, because you mention it explicitly for 4.7.

The code looks good. I love the fact that with only five (six?) changed lines (disgregarding the change in identation) you introduced this caching. Nice job!

Bèr

#11

agentrickard - August 30, 2006 - 14:05

CVS changes would take a little while, since Dries just committed the cache table patch and my cvs version is out of date. (http://drupal.org/node/72290).

I'll see what I can do over the long (US) holiday weekend.

#12

Bèr Kessels - August 30, 2006 - 19:16

Sorry! My mistake! no, I meant CVS as in "the cvs version of tagadelic". tagadelic HEAD is still completely 4.7-compatible. I will only start to upgrade tagadelic to core-head it that head becomes frozen (and maybe even then i'll wait a little).

So: I meant: use HEAD tagadlic, on a 4.7 Drupal, and make it work there. You will find that HEAD tagadelic has loads of features that 4.7 tagadelic has not. Yet both run fine on 4.7

(why not release it for 4.7 I hear you think! well: I prefer stability, incubation times and predictability. Example: on my host I needed to upgrade image module 4.7 to the new image module 4.7 (security issue) and by doing so, I broke 30 sites! Because image module 4.7 (new) was completely different from 4.7 (old). I think that is BAD. I prefer not to go there... (though I love the new split up image modules)

#13

agentrickard - September 1, 2006 - 14:25

The whole CVS version thing confuses my little mind.

Attached is a Drupal 4.7.3, Tagadelic 4.7.cvs patch for caching the $tags generated by tagadelic_get_weighted_tags().

AttachmentSize
tagadelic_cvs_cache.patch 2.55 KB

#14

agentrickard - September 1, 2006 - 14:30

Looking at the patch, it may be necessary to add $cid = 'tagadelic: nullset' before the else on line 203.

This would force a null return on line 207 [$data = cache_get($cid);].

This is only needed if sending a blank $cid might return data from the cache table.

Setting the $cid to 'tagadelic: nullset' would ensure that no data match is found in the cache table.

#15

Bèr Kessels - September 15, 2006 - 12:40
Status:needs review» needs work

From your last comment I assume that this patch needs some work.

#16

Bèr Kessels - September 15, 2006 - 13:31

http://drupal.org/node/64022 was marked duplicate of this.

#17

agentrickard - September 15, 2006 - 14:06
Status:needs work» needs review

No, I don't think that modification is necessary.

If $cid is not set, then cache_get should fail to return a result.

This would only be an issue if the cache table allowed cid to be NULL, which it does not.

That comment was simply the only flaw I could see in the logic.

I think the patch is ready for testing.

#18

Bèr Kessels - September 15, 2006 - 14:22

I have no experience with benchmarks, so if someone strolls along who feels like doing some, please give me some stats on the performance before and after this patch?

#19

Bèr Kessels - September 26, 2006 - 21:04
Title:Store tagadelic data in a table» cache tagadelic data

This renewed patch is untested, but should work on HEAD. please try it and report if it worked for you. Thanks

AttachmentSize
tagadelic_cache.patch.txt 0 bytes

#20

mudanoman - September 29, 2006 - 08:44

Ber,

It doesn't look like the cache patch file was uploaded properly.

Thanks,

Ivan

#21

Bèr Kessels - October 3, 2006 - 13:42

retry

AttachmentSize
tagadelic_cache.patch_0.txt 0 bytes

#22

Bèr Kessels - October 3, 2006 - 13:44
Status:needs review» needs work

Hmm, somehow the file itself is empty, this means that I lost my work.

#23

m3avrck - November 10, 2006 - 05:37

I ran some benchmarks on just caching the entire tagadelic block output in general and it's significantly faster up to 14% or so.

Depends on the number of terms, etc... but with 100 terms and showing 50 in 3 different tagadelic blocks, it was great.

I have not yet benchmarked this patch in particular but it seems like you'd see equal results. I'll have a more detailed review tomorrow.

#24

alliax - June 1, 2007 - 21:20

What is the current state of this? Since you lost all your work on caching the results in the cache table, nothing has been done?
I am using tagadelic happily but as one site I have a freetagging and the catgory got bigger, with under 1000 visitors a day, 1and1 blocked my database and I had to change hosting company before I realised having 3 tagadelic blocks on every page view without block caching might be why the mysql consuming was too high. I removed the tagadelic blocks for now.

#25

arthurf - August 15, 2007 - 22:40

The last two patches don't seem to exist any more, not sure what's happened. Anyway, I did a quick and dirty version of a caching scheme that has admin configurable time limits.

AttachmentSize
cache_29.patch 2.64 KB

#26

arthurf - August 17, 2007 - 19:24

Ignore my last patch. This one should work a bit better.

AttachmentSize
cache_new.patch 2.75 KB

#27

Bèr Kessels - August 22, 2007 - 13:28

I don't like the time limits. Drupal has all sorts of comlpex logic to determine if a cache needs wiping. We should not add more stuff to that.

Without the option the patch looks fine.

#28

arthurf - August 31, 2007 - 15:54

Point well taken. Here's a patch minus the time stuff.

AttachmentSize
cache.new_.patch 1.92 KB

#29

Bèr Kessels - August 31, 2007 - 20:22
Status:needs work» needs review

Good! But some teenyweeny thingsies. Not that I want to sound like an *ss, but I'd like to keep stuff clean, or else not make it messier then it is =)

* please remove the comment about caching, that is a hint/todo, wich would be done after this patch =)
* mind your spacing its if ($omething) { and not if( $something) . same for teh else followed by a wite line.

For the rest: good one and ready to go.

#30

arthurf - September 3, 2007 - 21:57

Points well taken about style. I wasn't even paying attention. Here's the cleaned up version.

AttachmentSize
cache_new1.patch 1.86 KB

#31

Bèr Kessels - September 14, 2007 - 21:11
Status:needs review» reviewed & tested by the community

#32

Bèr Kessels - September 15, 2007 - 09:30

Committed to HEAD. Thanks a lot.

If anyone feels like backporting this to a stable branch, please open a new issue thread.

#33

Bèr Kessels - September 15, 2007 - 09:31
Status:reviewed & tested by the community» fixed

#34

Anonymous - September 29, 2007 - 09:32
Status:fixed» closed

#35

jayjaytheluffy - September 3, 2008 - 01:54

I am about to apply this patch on my module in my site since I am having the same problem with my robust taxonomy. Should anyone of you here sum up what this patch does? This has been a long thread and me as a newbie could hardly follow the discussion.
I'm looking forward to your response to my request, and it's highly appreciated.
Thanks

#36

alliax - September 3, 2008 - 05:10

You just have to update your module to the dev version, no need to apply a patch. Then, whenever the module maintainer will decide, you will be able to switch to the next release version.
That's what I understood, and when I upgraded to the latest dev version, the problem of the tagadelic bloc not updating when site cache was activated, suddenly disappeared.

#37

jayjaytheluffy - September 4, 2008 - 01:43

Thanks though I already have successfully patched my module with the latest patch above. Well, it goes perfect. Thanks again!

#38

alliax - September 4, 2008 - 02:47

Excuse me, my last answer was completely out of context, I've mistaken two different issues and I was replying like I was talking about the other one. So please just ignore my last comment :-)

#39

jayjaytheluffy - September 5, 2008 - 06:42

I found something weird after applying this patch. After some deep observation on its performance, I found out that everytime a tag is inserted in a node, the cache is cleared and new data is set to cache. I want to disable this stuff since I've added admin control to automatically clear the cache after n seconds, where n depends on configured number of minutes. I've been examining it in the module but cannot locate where the stuff occurred. Please anyone can help me find it. Thanks!

#40

David Stosik - March 18, 2009 - 16:24
Title:cache tagadelic data» cache tagadelic data (backport to Drupal 5 version)
Version:HEAD» 5.x-1.x-dev
Assigned to:Anonymous» David Stosik
Status:closed» needs review

Hello,
Here is a patch that backports caching system on DRUPAL-5 branch.
Hope this can be applied.

By the way, if #391930: Tagadelic block cache displays wrong tags gets applied, I will need to backport this change (using cache_page instead of cache table).

Regards,
David

PS: don't use this patch, there is a forgotten dpr. uploadin new patch in next post.

AttachmentSize
tagadelic_cache_backport5.patch 1.91 KB

#41

David Stosik - March 18, 2009 - 16:25

Clean patch here.

AttachmentSize
tagadelic_cache_backport5.patch 1.89 KB

#42

Bèr Kessels - March 26, 2009 - 17:34
Status:needs review» reviewed & tested by the community

Looks good and solid. Tested on my blog (http://bler.webschuur.com/tags) and found no (degrading) issues.

If others can test this on large(r) databases, that would be great. Else it will go in to the next release this weekend.

#43

Bèr Kessels - May 11, 2009 - 16:57
Status:reviewed & tested by the community» needs work

ber@yasmine:~/Documenten/TAG_tagadelic/unfuddle/tagadelic-5.x$ patch -p0 < tagadelic_cache_backport5_0.patch
(Stripping trailing CRs from patch.)
patching file tagadelic.module
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 204.
1 out of 1 hunk FAILED -- saving rejects to file tagadelic.module.rej

#44

David Stosik - May 13, 2009 - 10:51

Could you explain more, please?

#45

Bèr Kessels - May 13, 2009 - 16:31

Well, the patch fails on a DRUPAL-5 checkout. patch says that it sees changes already in there.

What version did you patch against?

#46

David Stosik - May 14, 2009 - 08:41

Well, if I can remeber, it was the HEAD at the time I posted...

#47

MacMannn - June 19, 2009 - 19:32
Version:5.x-1.x-dev» 5.x-1.1
Category:feature request» bug report
Assigned to:David Stosik» MacMannn
Status:needs work» needs review

I'm pretty new to Drupal, but I've been playing around with the Tagadelic module (// $Id: tagadelic.module,v 1.36.2.12 2009/03/29 12:52:40 ber Exp $). On my testsite I've created a menu item that leads to a page where the tagcloud for all the content on the site is being shown. At first this seemed to work OK, but after a while I noticed that the cloud was not updating as more tags were added. Weird. Upon further examination it turned out that the cloud was generated once, CACHED, and loaded from the cache ever after...

Having read through the discussion above I can understand the need for caching. When you load a cloud on every node as a block, that could indeed produce some serious performance issues. In my case there's very little chance of that, since the cloud will only be generated when the user visits the specific cloud page by clicking on a menu item. So, I tried to work around the issue by changing the code of the module. By commenting out the following lines, I've been able to take out the caching and made the module behave as I wanted to.

211 $cache_name = 'tagadelic_cache_'. $options;
212 $cache = cache_get($cache_name);

215 if (isset($cache->data)) {
216 $tags = unserialize($cache->data);
217 }
218 else {

227 cache_set($cache_name, 'cache', serialize($tags), CACHE_TEMPORARY);
228 }

I'm not sure if the fact that the cloud was generated only once and loaded from the cache ever after on my testsite is caused by my setup of if it's a flaw in the module. But if it's a bug, then this comment might be helpful to others running into the same trouble.

#48

David Stosik - June 22, 2009 - 08:46
Version:5.x-1.1» 5.x-1.x-dev
Category:bug report» feature request
Assigned to:MacMannn» David Stosik
Status:needs review» needs work

This thread is only about backporting a feature of the Drupal 6 version. If you think caching the tagcloud is a bug, please open your own issue, or find another issue dealing with this problem. ;)

#49

David Stosik - June 22, 2009 - 08:53

By the way the issue #391930: Tagadelic block cache displays wrong tags already deals with the problem you raise.
Bèr Kessels, should it be backported to D5?

 
 

Drupal is a registered trademark of Dries Buytaert.