Is there a way to cache the filtered list? If the list does not change frequently why keep the overhead of querying the database and sorting for every page request.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Kristen Pol’s picture

Assigned: Unassigned » Kristen Pol

Thanks for the reminder. This is on my to-do list but isn't currently implemented. I will likely use a drupal cache table for this.

In the meanwhile, perhaps turning on the system's block caching at:

[yourdomain]/admin/settings/performance

would be sufficient.

Kristen

ajayg’s picture

block caching is disabled at system level the moment you enable any node access module.

SO e.g on sites which uses organic groups with thousands of nodes, your suggestion won't work. It is painful to run one additional query for every page view even when you know it is going to return same thing over and over again. Using drupal cache table is right approach since you can further scale it using memcache or some alternate cache.

Kristen Pol’s picture

Good point. I've also seen weird behavior with block caching as well (showing logged in info to logged out users).

I have looked into using the drupal system cache table and it's pretty straightforward so this feature request is next in my queue... I'll *try* to get it done within the next couple weeks.

Thanks,
Kristen

ajayg’s picture

When you implement caching you need to implement per page + per user caching and not just per page caching, right?

Because if a page shows cached entries when someone else visited that page a while ago and the current user do not have access to those pages then you will be showing links which the current user do not have access.

Or may be not. Perhaps when access control module is turned on perhaps caching can't be used is right since it will require to query db for each page request as the chance of a person going back to the same page during session are limited.

Any thoughts?

Kristen Pol’s picture

Been sick with pneumonia :( so haven't gotten to looking to this further.

You make a good point though. The content that is to be shown depends on the page it's on. I guess I will have to see how views does its caching since, if you have arguments, it would be similar. I wonder if it really caches the view per page... seems like a lot of caching, but I guess that's the only way to deal with it.

I'm not sure the best way to deal with caching based on different roles and logged-in vs out... if you have to cache for *each* role and for anonymous on top of caching per page, that seems like a *lot* of data that being cached.

Kristen

Kristen Pol’s picture

I haven't forgotten about this one but have been scratching my head as to the best approach... I will probably end up doing the D7 migration before I figure this one out.

Kristen

Kristen Pol’s picture

Status: Active » Postponed
Kristen Pol’s picture

Version: 6.x-1.0-beta2 » 6.x-1.x-dev
Kristen Pol’s picture

Category: feature » task
xsanz’s picture

Hello everybody,

Few days ago I posted a feature request about caching, because of poor performance on a large number of nodes.

Kepol kindly noticed me about this issue.

So here I am, writing again about Performance subject.

After spending some days understanding the module and coding, here's my patch :)

A new table is created where store cache data (this table is created when module is installed, and deleted when uninstalled) so u don't have to worry about forgotten tables.

Cache Table's primary keys are block delta and node id.

There are two new options in 'Featured Content Block Configuration': 'sort_cache' And 'generate_sort_cache'.

First option means that when u create or modify a node that matches filter, it generates cache only for that node.

Second option means that when block is created or modified, cache data is generated for every filter matching's node.

When u delete a block, cache data is removed.

Data records are stored in table, trying first an update, if not, a new record is created.

On a 500 nodes, it spends about 300 seconds to generate all cache data. I've had to set up php.ini's max_execution_time to 600.

I'm thinking about implement also an action to do the same as 'generate_sort_cache', but as a cron task

I know maybe there are better ways to implement this, but my knowledge of Drupal it's limited ;)

Comments are welcome :)

Thanks everybody.

xsanz’s picture

Status: Postponed » Patch (to be ported)
FileSize
19.01 KB
93.27 KB

Hi, because of that was my first patch, I send again the same post, but with correct status.

Thx

Kristen Pol’s picture

Cool, I'm very excited to try this out. I did notice the patch didn't seem to be generated in an optimal way. The patch is showing a bunch of lines deleted but then the same lines added back in, so I assume it is due to whitespace. When you are creating the patch, can you not include changes that are just whitespace changes?

e.g.

diff -upbB file1 file22 > diff_file1_file2.patch

Thanks!!!
Kristen

xsanz’s picture

Sorry,

It was my first patch :P

Here u have.

Xavi

Kristen Pol’s picture

Thanks! I will check it out soon.

Kristen

xsanz’s picture

Hi,

kpol, i've found that when cache data is inserted, in some cases drupal_write_record in update mode does not work as expected.
So cache data isn't inserted in table, if u experience this u can comment where drupal_write_record updates (is in an if sentence), and leave drupal_write_record only.

Regards

Xavier

Kristen Pol’s picture

Hi Xavier,

Unfortunately I'm super slammed right now with many projects & with kids... & I still need to do my taxes (yes, I know).

At this rate, I'm guessing I won't be able to really review this until Oct 15th since I know my taxes will be done by then ;)

Sorry about that! I'm very happy you are helping out and hope to test this soon.

Kristen

xsanz’s picture

No problem Kristen,

So this way I can introduce some menu_callback features to call generate_sort_cache via cron ;)

And... I also was testing performance with 15.000 nodes (generating one node featured content against 14999 nodes) and it took almost 600 seconds to generate, so if I must trigger a cache generation for 15.000 nodes... It could take 100 days!!

So I would like to test some other sort algorithm. I've been 'googling', and i've found that 'MATCH...AGAINST' algorithm could be useful. I will do some tests, and tell u

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Happy to contribute!

Xavier

Kristen Pol’s picture

Btw, I thought I had linked to this before, but didn't notice it in the comments above. Here is some tips on how to hook into the regular drupal caching system:

http://www.lullabot.com/articles/a-beginners-guide-to-caching-data

I would guess that it makes the most sense to only cache each block right when needed rather than generating all caches at the same time. It's my understanding that's how the views caching works.

Thanks!
Kristen

xsanz’s picture

So as I told u before, I've added a MENU_CALLBACK page to trigger a cron action

U can call it whit something like this:

0 * * * * wget -O - -q -t 1 http://localhost/featured-content/cache/generate/1

Where (1) is the block delta

I also added a new form of sorting based on node title & teaser, using MATCH...AGAINST mySQL algorithm (I think it might be really useful).

I also improved performance by changing your select SQL in sort_by_terms,
U did a 'select' within a 'select', i changed this to a 'left join' sentence

I think, also noticed a bug in .module file:

     while ($row = db_fetch_object($results)) {
-      $tmp[$row->nid] = $node[$row->nid];
+      $tmp[$row->nid] = $nodes[$row->nid]; // I think there was a bug here.
     }

I've commented the following lines because they were necessary?

+    /*
     foreach ($nodes as $nid => $node) {
-      if (! in_array($nid, $tmp)) {
+      //if (! in_array($nid, $tmp)) { 
         // tack on the rest of the nodes
         $tmp[$nid] = $node;
+      //}
+    }
+    */

So performance test results were:

Number of nodes: 500

Test 1 Original:

# Cache Generation - Starting time: 26-09-2010 10:11:00
# Cache Generation - Ending time: 26-09-2010 10:20:12

Test 2: solving a bug in line #782 and eliminating "if in array" iteration

# Cache Generation - Starting time: 26-09-2010 10:36:38
# Cache Generation - Ending time: 26-09-2010 10:39:45

Test 3: changing SQL sentence to LEFT JOIN

# Cache Generation - Starting time: 26-09-2010 16:47:05
# Cache Generation - Ending time: 26-09-2010 16:48:35

As u can see final performance on sort_by_terms was about 90 seconds on a 500 node database, much better than the inital 500~600 seconds (~5.5x faster).

I'll try to test it on a larger database tomorrow monday

By the way here are the patches, they are applied to your original code, not to previous patches of mine :)

Regards,

Xavier Sanz

Kristen Pol’s picture

That is awesome that you are bench-testing this! Keep up the great work ;)

Kristen

xsanz’s picture

Thanks for the link Kristen,

The problem I am facing here is that first time user loads node page has to wait to generate the cache (almost 600 seconds on a +15.000 database record).

Because new nodes are added continously (3-4 at day) and all featured-content must be up-to-date, users could face often this situation;

The solution I've found here is: 'block data needs to be generated in 'background'', so:

1.When a new node is added or changed, block's data should be generated for new node, and old ones (just to include new one)

2.When a featured content block is added or changed, block's data should be generated for all nodes related to block.

I may be wrong but the reason is using a own 'cache' table schema could prevent u from anyone 'accidentally' deleting your 'drupal cached' data. Also I've read there were some problems with users viewing cached data.

Regards,

Xavier

Kristen Pol’s picture

It sounds like you are trying to be very clever in the caching. That's good! My understanding of the views caching is that it just shows the cached data until you clear the cache or update the view... though maybe it is more clever than that and updates when there are nodes that are updated that impact the view (though I don't know how it would know that unless it's a views argument).

I agree that generating the cache in the background is a good solution. If it's run during cron, it will run in the background, but I think it may time out if it's too long. At least that's the experience I had using xmlsitemap generation with cron in some cases. But, I haven't dug into the code to say for sure.

Maybe you have already added, but to have it run upon cron, add a hook_cron function:

featured_content_cron()

and then call your generate code from there.

Thanks,
Kristen

xsanz’s picture

Hi kristen,

I found a little bug of mine, causing not to show featured content not using cache.

Here's the patch.

This morning I did some test on a 15.000 nodes database, generation time was over 1~4 seconds each node. I didn't finish tests, but I think maybe could be acceptable not to using cache.

By the way, why don't we contact ajayg to act as a 'beta tester'? :)

Kristen Pol’s picture

xsanz - I have started looking through this code. There are a lot of changes so I would want to update the code slowly to start integrating this stuff in rather than do it in one big jump.

I first try to introduce your "sort by words" functionality, but I get a mysql warning when implementing:

http://drupal.org/node/969440#comment-3698738

so unfortunately that can't be added into the code.

I think the steps will be for me to start pulling out the use of arg(1) and such from the code to make those variables that are passed in. Thus, like in your patch, the node id can be passed into the function rather than assuming the data is coming from the URL.

I would still like to review the views caching and other caching mechanisms as well since there may be issues with user permissions viewing certain nodes. If we cache a list of nodes, not all users may have access to all those nodes.

xsanz’s picture

Ok! no problem :)

xsanz’s picture

the warning in your mysql it's because u must add a fulltext index in your node table :)

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Kristen Pol’s picture

Is there any performance downside to adding that fulltext index to the node table?

If that feature was to be added to the module, it would need to be optional because some people would not want to alter drupal core tables. If the feature was made to be enabled, then the module itself would need to do the index change.

Kristen

xsanz’s picture

There's no performance downside, only the time for generating the index, yeah i agree with u

xsanz’s picture

Hi kepol, You want me to implement this 'feature' as an optional?

Kristen Pol’s picture

Hi xsanz. It may be useful for others to have. But, I'm not sure I could get it committed to the code until January. I have a hard deadline to meet for early December and then am going on vacation for a couple weeks.

If you decide to make a patch, please do it based on the current state of the code (version 6.x-1.2).

Thanks!
Kristen

ajayg’s picture

Any update on this further?

Kristen Pol’s picture

Status: Patch (to be ported) » Postponed

I'm very sorry this never happened... :( It's obviously been on my list forever and has always slipped away. I have thought about it a lot. But, it's such a big patch and affects so many things and the code base has changed to deal with other features/issues so the patch will need to be updated (by me). Also, D7 has come out as well so I will need to patch that too.

My current thought is that I will need to create 6.x-2.x and 7.x-2.x branches to work on the performance/caching stuff. It's just too hard to do within the 1.x branches while other issues come up.

Again, my deep apologies to xsanz for not following through with this. I wish I can say when it will happen but I just really can't. Anytime I mean to dig into it, something happens with work or kids or life.

Kristen

Kristen Pol’s picture

I have created new 2.x branches in both D6 & D7 for the performance/caching stuff... at least that is a start. Now I just have to keep the momentum going!

Kristen

ajayg’s picture

Just so you are aware. When I moved this on production site , it barely worked (creating/editing features block) I always got browser warning the javascript on the page taking too long do you want to stop the script. I had to stop the script to proceed further.
So this is a great module for smaller sites. But for very large sites(greater than 40 K users or 30k+ nodes) even the admin UI has significant performance issue.

The performance for a visitor also has significant issue. NO surprise here that is why this issue is created. Site slowed down significantly and I had to disable the module in production.

For those lurking here. I will still recommend this to smaller sites.

Kristen Pol’s picture

Thanks for the data! Yes, I'm not surprised. I would like to deal with the performance issues as my next task. But, I'm going to be writing a book for the next couple+ months so...

Now that that there is a new 2.x branch for dealing with performance, perhaps someone wants to take a stab at modifying that code with xsanz patch. The module has changed since the patch so it may not be entirely obvious. If someone is interested in doing this, I can add the person as a co-maintainer assuming that any work is only done in the 2.x branches.

If anyone is interested, please let me know.

Thanks,
Kristen

Kristen Pol’s picture

I'm still finishing up my Drupal 7 Multilingual Sites book:

http://www.packtpub.com/drupal-7-multilingual-sites/book

which took twice as long as expected (which should have been expected!).

Then DrupalCon Denver this week... I'm annoyed at how long it's taking me to deal with this issue but I don't have a cloning machine :(

Kristen

xsanz’s picture

Category: task » support
Status: Postponed » Active

Hi Kristen!

I've been while away from a computer, now I'm back.

I've seen you've been dealing again with cache/performance issues.

Your module is one that I know a bit so I think I might be helpful, please drop me a line.

Regards!!

Xavier

Kristen Pol’s picture

Status: Active » Postponed

I won't be able to work on this though I guess that's pretty obvious since it hasn't happened yet :/

If someone wants to work on the 2.x branch to work on caching, I can add them as a co-maintainer. There aren't a lot of changes happening on the 1.x branch but it would be safer to do it in the 2.x branch. It would provide a safe place to work.

Kristen Pol’s picture

Assigned: Kristen Pol » Unassigned
Issue summary: View changes
Kristen Pol’s picture

Version: 6.x-1.x-dev » 7.x-2.x-dev

Doesn't make sense to rework 6.x code so switching this to 7.x.