It seems that the entire page cache is cleared whenever a node is created or an existing node is saved. I don't think this behaviour is by design. I'm not sure how to debug this; what could be the cause?
So to be clear: I get a "First Page Request" on every previously cached page whenever I save a generic node.
This makes the module useless, and I would really like it to work.
Thanks in advance for your help.

Comments

Jonah Ellison

Category: bug » feature

This is Drupal core behavior (see the last line of node_save(), which calls cache_clear_all()). The project page mentions this at the bottom, in the To-do list, under the possible feature of "Advanced Rulesets." This is not simple to implement, however. For example, if you have nodes on your homepage and create a new node, Authcache needs to know to clear both the homepage and the node page. If taxonomy terms are involved, it also needs to clear the related taxonomy pages. If Views are used to display the node, those pages need to be cleared as well, and so on.
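A minimal sketch of the dependency lookup described above, in plain PHP with no Drupal calls. The `$node` array shape (`nid`, `promote`, `tids`) and the hand-maintained `$view_paths` map are assumptions for illustration only; working that map out automatically is exactly the hard part of "Advanced Rulesets".

```php
<?php
/**
 * Hypothetical sketch (not an Authcache API): list the internal paths whose
 * cached pages may be stale after a node is created or updated.
 *
 * @param array  $node       Assumed shape: ['nid' => int, 'promote' => bool, 'tids' => int[]].
 * @param string $type       The node's content type.
 * @param array  $view_paths Assumed map: content type => list of Views page paths.
 * @return string[]          Unique internal paths to purge from the page cache.
 */
function affected_paths(array $node, string $type, array $view_paths): array {
  // The node's own page is always affected.
  $paths = ['node/' . $node['nid']];

  // Promoted nodes appear on the default front page listing ('node').
  if (!empty($node['promote'])) {
    $paths[] = 'node';
  }

  // Every taxonomy term listing the node is filed under.
  foreach ($node['tids'] as $tid) {
    $paths[] = 'taxonomy/term/' . $tid;
  }

  // Views pages that list this content type (supplied by hand in this sketch).
  foreach ($view_paths[$type] ?? [] as $path) {
    $paths[] = $path;
  }

  return array_values(array_unique($paths));
}
```

For a promoted "story" node 42 tagged with terms 3 and 7, and a "news" view listing stories, this yields `node/42`, `node`, `taxonomy/term/3`, `taxonomy/term/7` and `news`; everything else stays cached.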

eelke

Thanks for the quick reply. Although this is obviously not strictly an Authcache issue, I'm curious how this plays out on a site where content is constantly updated. When the cache is cleared, the visitor has to wait while the uncached page is generated. Is there a way to rebuild the page cache for a certain page or set of pages, so that site visitors won't notice the constant flushing of the caches? That way I could hook into node operations and rebuild the cache for a set of popular pages.
I realize this is outside the scope of this particular module, but I would really appreciate your advice on this.
Thanks in advance!

Jonah Ellison

I've seen rebuild functionality in Boost (check out the boost_crawler_run() function). I would caution that implementing a crawler like this might actually slow down a site: for example, if you want to rebuild 100 popular pages, it is equivalent to your site getting hit with 100 requests every time a node is created or updated.

eelke

Thanks, I found that crawler too and decided it was not a proper solution due to the server load.

Are you currently working on these "Advanced Rulesets"? Maybe I could help; I think it would be a killer feature!

Thanks for the suggestions so far.

vood002

Subscribing. Not sure if there is much I will be able to contribute, but I'm looking forward to seeing this module's effectiveness increased, and curious how it will be accomplished.

gausarts

Subscribing, for a better world. Thanks

vood002

I'm currently running Boost and Authcache together and it's humming along nicely. The benefit I'm getting from Authcache is relatively small because new content is created constantly, but since I typically have about a 10:1 anonymous-to-authenticated ratio, my overall CPU usage has dropped way down. I realize this is only loosely related, but I thought I'd share it in case people are looking for quick ways to improve Authcache performance.

allain

I'm running a very high load site and this has basically forced me to only make changes to the site during slow times.

If there were a crawler that rebuilt one page at a time as it crawled, I could clear the Authcache cache without a major performance slowdown.
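One way to get that one-page-at-a-time behaviour, and to bound the load warned about earlier, is to queue stale URLs and re-fetch only a few per run (for example, per cron run): warming 100 pages one per minute spreads 100 requests over 100 minutes instead of firing them at once. A minimal sketch, assuming a caller-supplied `$fetch` callback that requests a URL (in Drupal 6 that could wrap `drupal_http_request()`); the `CacheWarmQueue` class is hypothetical.

```php
<?php
/**
 * Hypothetical warm-up queue: re-fetches at most $batch_size URLs per run,
 * so rebuilding the cache never costs more than a few requests at a time.
 */
class CacheWarmQueue {
  /** @var string[] URLs waiting to be re-fetched, oldest first. */
  private $pending = [];
  /** @var int Maximum number of URLs fetched per run(). */
  private $batch_size;

  public function __construct(int $batch_size = 1) {
    $this->batch_size = max(1, $batch_size);
  }

  /** Queue a URL once; URLs already waiting are not queued twice. */
  public function enqueue(string $url): void {
    if (!in_array($url, $this->pending, true)) {
      $this->pending[] = $url;
    }
  }

  /**
   * Fetch up to $batch_size queued URLs and return the ones handled.
   * The fetch itself is what repopulates the page cache.
   */
  public function run(callable $fetch): array {
    $batch = array_splice($this->pending, 0, $this->batch_size);
    foreach ($batch as $url) {
      $fetch($url);
    }
    return $batch;
  }

  /** Number of URLs still waiting. */
  public function remaining(): int {
    return count($this->pending);
  }
}
```

Calling `run()` from cron with a batch size of 1 gives exactly the "rebuild one page at a time" pattern; raising the batch size trades server load for faster warming.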

a_c_m

Is there a development branch anywhere that we can help test?

This selective expiry of cached pages would be brilliant for both Authcache and Boost; maybe it could even be an API they both use?

Really interested in this.

Also - does the node save invalidation overrule the "Minimum cache lifetime" setting?

Jonah Ellison

No dev version, but I've been experimenting. Unfortunately, Drupal does not have a cache invalidation hook, so any type of selective expiration will require a hack to either the cache handler module (e.g. CacheRouter/Memcache) or Drupal core (if the database is used).

Boost already has some custom cache invalidation logic, since it doesn't use the Drupal cache system and instead saves pages directly to the file system (which lets it perform its own invalidation on node save, update, etc.). The maintainer has forked that code out into the Expire module, which is a great start and seems like the right direction. I'll need to add a hook to Authcache to support the Expire module and figure out the correct way to hack/modify the cache handler.

The "Minimum cache lifetime" setting should prevail and prevent cache invalidation. Of course, this means that if your site is updated, anonymous users won't see the update until the lifetime has been reached.
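For the database backend, the way a minimum cache lifetime defers a general flush can be modelled roughly as below. This is a simplified, in-memory sketch of Drupal 6 core behaviour as I understand it: the first flush request only starts a timer, and temporary entries are deleted by the first flush or cache read that arrives after the lifetime has passed. It ignores the per-user timestamp core also sets so that the user who triggered the flush sees fresh pages, and the `LifetimeCache` class is hypothetical, not Drupal code.

```php
<?php
const CACHE_PERMANENT = 0;
const CACHE_TEMPORARY = -1;

/**
 * Simplified model (not Drupal code) of a page cache that honours a
 * minimum cache lifetime before acting on a general flush.
 */
class LifetimeCache {
  private $entries = [];       // cid => expire value
  private $flush_started = 0;  // 0 means no flush is pending
  private $lifetime;           // minimum cache lifetime in seconds

  public function __construct(int $lifetime) {
    $this->lifetime = $lifetime;
  }

  public function set(string $cid, int $expire): void {
    $this->entries[$cid] = $expire;
  }

  /** Reads also run the deferred purge, as cache_get() does in D6. */
  public function has(string $cid, int $now): bool {
    $this->collect($now);
    return isset($this->entries[$cid]);
  }

  /** A general flush request, such as the one triggered by node_save(). */
  public function flushAll(int $now): void {
    if ($this->lifetime === 0) {
      $this->purge($now);          // no minimum lifetime: flush immediately
    }
    elseif ($this->flush_started === 0) {
      $this->flush_started = $now; // first request only starts the timer
    }
    else {
      $this->collect($now);
    }
  }

  private function collect(int $now): void {
    if ($this->flush_started && $now >= $this->flush_started + $this->lifetime) {
      $this->flush_started = 0;
      $this->purge($now);
    }
  }

  /** Delete non-permanent entries whose expiry is in the past. */
  private function purge(int $now): void {
    foreach ($this->entries as $cid => $expire) {
      if ($expire !== CACHE_PERMANENT && $expire < $now) {
        unset($this->entries[$cid]);
      }
    }
  }
}
```

With a one-hour lifetime, a node save at t=1000 leaves cached pages in place until the first request at or after t=4600, which is why anonymous users may see stale content for up to that long.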

kentr

It appears that I'm having the opposite behavior. When I save or add a node, the changes don't appear until I manually clear the caches.

Where can we find out how to use the advanced rules to selectively invalidate?

Update: I wondered if the above only happens when logged in as user 1. I just tried as another user and my changes were immediately visible. Nope, it happens as a standard user, too.

Jonah Ellison

kentr - There are no advanced rulesets. What caching engine are you using (Memcache, APC, etc.)? Try different engines. Also try running Authcache without any CacheRouter or Memcache settings (this makes Authcache use the database).

aimeerae

Subscribing. I'm interested in cache invalidation, expiry, and using Authcache with Memcache/CacheRouter on a large, busy site with many related nodes. Add Windows as the operating system on top of the multi-layered cache stack, and there are some very interesting challenges at play. I'd be happy to help test any further development of this module in both *nix and Windows environments. ;-)

kentr

@Jonah Ellison:

Thanks. I'm using the file engine. I'll try the DB, but the whole point of Authcache for me was to avoid using the DB :-)

puddyglum

I have a selective flush working for our site, but it depends on Block Cache Alter, and it only works because I've hacked _authcache_key() to return the same key no matter what (all users get the same cache). We have about 5,000 users and about 6,000 pages; about 200 of the users create content on the site. I set a minimum cache lifetime of 18 hours and use a web crawler to crawl the whole site at night. I then use modified blockcache_alter code to force re-caching of pages that need to be updated.

It only works for "Re-cache When Node of this type is Added/Updated/Deleted", but that's all we use on our site right now.

You could probably take the modifications to the blockcache_alter function and put them in authcache_nodeapi() or something similar. I really did this at the last minute because our site was so slow that we desperately needed something or nobody was going to use it... I'm immensely grateful to Jonah for all the work put into this module.

Modified authcache.helpers.inc _authcache_shutdown_save_page()

  // Save to cache
  $time = time() + (18 * 60 * 60); // Epoch time right now + Seconds in 18 hours
  //cache_set($key, $buffer, 'cache_page', CACHE_TEMPORARY, drupal_get_headers());
  cache_set($key, $buffer, 'cache_page', $time, drupal_get_headers());

Modified blockcache_alter.module _blockcache_alter_nodeapi()

  // RE-CACHE PAGES THAT USE AFFECTED BLOCKS
  // $node is the node being added/updated/deleted (hook_nodeapi() argument).
  global $theme_key, $base_url;

  // Get a list of all of the blocks that are configured to be re-cached when
  // a node of this content type is added/updated/deleted. The '<type>";s'
  // match relies on how the bc_relate_* variable values are serialized.
  // In D6 db_query(), a literal % must be written as %%.
  $cached_blocks = db_query("SELECT REPLACE(name, 'bc_relate_', '') AS block FROM {variable} WHERE name LIKE 'bc_relate_%%' AND value LIKE '%%%s%%'", $node->type . '";s');

  while ($cached_block = db_fetch_object($cached_blocks)) {
    if (substr($cached_block->block, 0, 6) == 'views_') {
      $module = 'views';
      $delta = substr($cached_block->block, 6);
    }
    else {
      $module = 'block';
      $delta = str_replace('block_', '', $cached_block->block);
    }
    // Clear this block's cache entry. The ':en' language suffix is
    // hard-coded for this site.
    cache_clear_all($module . ':' . $delta . ':' . $theme_key . ':en', 'cache_block');

    // Check for pages that use this block in the current theme.
    $block_pages = db_fetch_object(db_query("SELECT bid, visibility, pages
                                        FROM {blocks}
                                        WHERE module = '%s'
                                          AND delta = '%s'
                                          AND theme = '%s'",
                                      $module, $delta, $theme_key));

    // visibility == 1 means "show only on the listed pages". Skip blocks
    // with no {blocks} row or an empty page list.
    if ($block_pages && $block_pages->visibility == 1 && $block_pages->pages != '') {
      // Check every alias to see if it matches the block's pages. Note that
      // this scans the whole {url_alias} table on every node save.
      $paths = db_query("SELECT dst FROM {url_alias}");
      while ($path = db_fetch_object($paths)) {
        if (drupal_match_path($path->dst, $block_pages->pages)) {
          // It's a match: clear this page's cache. '00b4de' is the cache key
          // prefix on this particular site; it must match what your
          // _authcache_key() returns.
          cache_clear_all('00b4de' . $base_url . '/' . $path->dst, 'cache_page');
        }
      }
    }
  }
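The heart of the code above is matching every URL alias against a block's "show only on listed pages" patterns and turning each match into a page-cache cid. The standalone sketch below isolates that step so it can be tried outside Drupal. `match_path()` is a simplified re-implementation modelled on `drupal_match_path()` (one pattern per line, `*` as a wildcard, `<front>` as the front page path), and `cids_for_block()` plus its `$cid_prefix` parameter are hypothetical; the prefix stands in for the site-specific Authcache key.

```php
<?php
/**
 * Simplified stand-in for drupal_match_path(): one pattern per line,
 * '*' matches any run of characters, '<front>' means the front page path.
 */
function match_path(string $path, string $patterns, string $front = 'node'): bool {
  foreach (preg_split('/\r\n?|\n/', $patterns) as $pattern) {
    $pattern = trim($pattern);
    if ($pattern === '') {
      continue;
    }
    if ($pattern === '<front>') {
      $pattern = $front;
    }
    // Quote everything, then turn the escaped asterisk back into a wildcard.
    $regexp = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';
    if (preg_match($regexp, $path)) {
      return true;
    }
  }
  return false;
}

/**
 * Hypothetical helper: page-cache cids for every alias that a block with
 * these visibility patterns appears on.
 */
function cids_for_block(array $aliases, string $patterns, string $base_url, string $cid_prefix): array {
  $cids = [];
  foreach ($aliases as $alias) {
    if (match_path($alias, $patterns)) {
      $cids[] = $cid_prefix . $base_url . '/' . $alias;
    }
  }
  return $cids;
}
```

Scanning every alias on each save is the costly part; on a large site it may be worth caching the alias-to-block mapping instead of recomputing it.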
puddyglum

If anybody finds a better way to do this, please let me know, but a modified version of the code above is working wonders for us, pretty much all thanks to the AuthcacheAjax framework: 5,000 pages loading almost instantly, with three user-specific blocks on every page, and each page re-caching only when a block on it has been updated or the page itself has been re-saved.

Thanks Jonah!

DIMSKK

@jmonkfish: Thanks for the code. I don't know anything about PHP, HTML or CSS and am developing my website by guesswork. Could you or anyone else please explain the following (if anyone has spare time):

Note: I am using authcache+cacherouter in file mode.

1. How do I set the minimum cache lifetime to 3 days for Authcache? I can edit the time in the above code, but I don't want Authcache to clear the entire cache every time cron runs. Does this make Authcache independent of cron if cron runs every 6 hours?

2. What is the purpose of the crawler? Does it act like a logged-in user? Where can I get one?

3. And what is the impact of this?

I've hacked _authcache_key() to return the same key no matter what (all users get the same cache)

Note: I have already disallowed cache clearing for authcache on node update/insert/delete by hacking cache router.

Thanks in advance!
DIMSKK

puddyglum

1. I replaced the last line in _authcache_shutdown_save_page() (authcache/authcache.helpers.inc) with this:

  // Save to cache
  $time = time() + (18 * 60 * 60); // Seconds in 18 hours
  //cache_set($key, $buffer, 'cache_page', CACHE_TEMPORARY, drupal_get_headers());
  cache_set($key, $buffer, 'cache_page', $time, drupal_get_headers());

That changes the cache entry from CACHE_TEMPORARY (i.e. it expires whenever cache_clear_all() is run) to an 18-hour minimum.
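Why this works, as far as the Drupal 6 database cache goes: a general flush deletes rows whose `expire` is neither `CACHE_PERMANENT` (0) nor still in the future, so `CACHE_TEMPORARY` (-1) rows disappear immediately while a row stamped `time() + 18 hours` survives until that time. Clearing one exact cid, as the selective code above does, still removes it at once. The in-memory sketch below models only that rule; `flush_expired()` and `clear_cid()` are hypothetical names, not the real cache backend.

```php
<?php
const CACHE_PERMANENT = 0;
const CACHE_TEMPORARY = -1;

/**
 * General flush model: keep permanent rows and rows whose expiry timestamp
 * is still ahead of $now; drop temporary and already-expired rows.
 */
function flush_expired(array $rows, int $now): array {
  return array_filter($rows, function (int $expire) use ($now) {
    return $expire === CACHE_PERMANENT || $expire >= $now;
  });
}

/** Targeted clear of one cid: removed regardless of its expiry. */
function clear_cid(array $rows, string $cid): array {
  unset($rows[$cid]);
  return $rows;
}
```

So an 18-hour expiry protects pages from the flush that node_save() triggers, but pages past their 18 hours are dropped by the next general flush.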

2. The crawler downloads the entire site, which re-caches the entire site whenever it is run. This is the one I use: http://download.cnet.com/WinWSD-WebSite-Downloader/3000-2377_4-10562531....

3. I've hacked _authcache_key() because the original returns a different key depending on the user's roles. Our site has about 40 roles, so that would mean 40 different cached versions of each page. On our site it just returns "authcache", no matter what the role. There are other places in Authcache, outside _authcache_key(), that need to be modified to use "authcache" instead of $key; you just have to find them (search for $key).
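To see why a per-role key multiplies cached copies, here is a small sketch. `role_key()` is hypothetical: it hashes the sorted role IDs, which is NOT necessarily how Authcache's real `_authcache_key()` builds its key (that format is not shown in this thread). The point it illustrates is that every distinct role combination in use gets its own copy of a page, while a shared key collapses them to one.

```php
<?php
/**
 * Hypothetical role key, not Authcache's _authcache_key(): users with the
 * same set of role IDs share a key; $shared mimics the "one key for
 * everyone" hack described above.
 */
function role_key(array $rids, bool $shared = false): string {
  if ($shared) {
    return 'authcache';
  }
  sort($rids, SORT_NUMERIC);  // role order must not change the key
  return substr(md5(implode(',', $rids)), 0, 6);
}

/** Count how many cached variants of one page a set of users produces. */
function variants_per_page(array $users_rids, bool $shared = false): int {
  $keys = [];
  foreach ($users_rids as $rids) {
    $keys[role_key($rids, $shared)] = true;
  }
  return count($keys);
}
```

The trade-off of the shared key is that anything role- or user-specific must then be filled in afterwards (here, via the AJAX calls), since the cached HTML itself is identical for everyone.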

DIMSKK

Thanks a lot for the help. This was really helpful. Since Authcache works only for authenticated users, can you please explain how the above crawler is able to crawl the entire site like an authenticated user?

puddyglum

I crawl it anonymously. I've modified authcache so that even anonymous users and authenticated users use the same cache. Had to modify _authcache_key and some other places to make it work right. I rely on the AJAX to make sure user-specific/role-specific content is replaced correctly.

Without modifying authcache you can still crawl as an authenticated user using the webcrawler above. It is javascript and cookie enabled, and you can specify cookies for the crawler to use. In Firefox with Firebug console enabled, a GET request will appear in console giving information on the AJAX call that authcache makes. Copy the cookie string from the Header tab and you can paste that into the crawler, and then the crawler will crawl everything under that session.

I'm just imparting suggestions from my experience with authcache thus far. I'm still learning how to use it properly. Had I known about authcache when I started our project I probably would have done things differently.

DIMSKK

Oh, I see! Thanks a lot for the help, jmonkfish.

There are a lot of problems with Drupal's caching system. I think it would be better if there were separate caching systems for nodes, taxonomies and comments, each with unlimited or very long cache lifetimes for both authenticated and anonymous traffic (using AJAX).

The cache clearing would then be set up like this:

1. There is no cache_clear_all system.
2. If a node changes, only that node's page cache is cleared.
3. If a new node is added, only the corresponding taxonomy term page cache is cleared.
4. If a new comment is made, only the page cache of that comment's node is cleared.
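The proposed rules translate into a small event-to-paths mapping. The sketch below is hypothetical: the event names (`node_update`, `node_insert`, `comment_insert`) and the `$node` array shape are invented for illustration, and an unknown event deliberately clears nothing rather than everything.

```php
<?php
/**
 * Hypothetical mapping of the proposed clearing rules to page-cache paths.
 * $node is assumed to look like ['nid' => int, 'tids' => int[]].
 */
function paths_for_event(string $event, array $node): array {
  switch ($event) {
    case 'node_update':
    case 'comment_insert':
      // A changed node or a new comment: only that node's own page.
      return ['node/' . $node['nid']];

    case 'node_insert':
      // A new node: only the taxonomy term pages it is filed under.
      return array_map(function ($tid) {
        return 'taxonomy/term/' . $tid;
      }, $node['tids']);

    default:
      // No catch-all flush: anything else clears nothing.
      return [];
  }
}
```

Each returned path would still have to be turned into a full cache cid (base URL, and any Authcache key prefix) before calling cache_clear_all() on it.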

I am sorry if I have said something which is not possible or very very difficult to be done.

thanks
DIMSKK

chaps2

Here's a reliable way to clear a specific node's page cache without hacking the role key as described in #15. I'm trying this out as part of Community Tags integration with Authcache. The same technique could be used for other types of page.

The goal is to ensure up-to-date community tags are displayed on node pages following a CT tagging event. To do this, the node page of the node that has been tagged/untagged must be cleared from the page cache. Typically the node is not updated so a global cache clear does not occur.

The problem with selectively clearing node pages from the page cache is that the cache keys are constructed from the combined role keys of the users that triggered the cache store. The solution below is to maintain a list of the role key combinations and use these to re-construct the cache keys of the cached node pages.

/**
 * Maintain the set of role combinations. Store in page cache so that it gets 
 * reset when the page cache is cleared.
 * 
 * This should be called from the code that determines how the updated content 
 * is rendered in the authcached node page. e.g. hook_preprocess for a block
 * where $is_page_authcache is set. Or in the more general case could be called
 * by an implementation of hook_authcache_info() or hook_authcache_ajax() but might
 * result in more redundant calls to cache_clear_all().
 */
function _community_tags_authcache_update_role_keys() {
  global $user;

  // get the list of role keys used so far to cache pages where this function has been invoked.
  $cache_entry = cache_get('authcache_role_keys', 'cache_page');
  $keys = !empty($cache_entry->data) ? $cache_entry->data : array();

  // recreate the roles part of the key that is used to cache the current page.
  $key = _authcache_key($user);

  // update the list of roles keys if necessary
  if (!isset($keys[$key])) {
    $keys[$key] = $key;
    cache_set('authcache_role_keys', $keys, 'cache_page', CACHE_TEMPORARY);
  }
}

And then this is called when a node is tagged or un-tagged:

/**
 * Remove selected node pages that have been cached by authcache.
 */
function _community_tags_authcache_invalidate_nodepage_authcache($node) {
  global $base_root;

  // Invalidate cached node page for all roles
  $cache_entry = cache_get('authcache_role_keys', 'cache_page');

  if (!empty($cache_entry->data)) {
    $url = url('node/' . $node->nid);

    foreach ($cache_entry->data as $role_key) {
      $cache_key = $role_key . $base_root . $url;
      cache_clear_all($cache_key, 'cache_page');
    }
  }
}
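The same technique can be exercised without Drupal. This in-memory simulation is a sketch, not chaps2's code: arrays stand in for the `cache_page` table and the `authcache_role_keys` entry, and the role-key strings are made up. Storing a page records its role key; invalidating rebuilds "role key + base root + URL" for every recorded key and deletes just those cids. Unlike the real version, it does not model the key list itself being wiped by a global page-cache flush.

```php
<?php
/**
 * In-memory simulation of role-key tracking plus cid reconstruction.
 * Not Drupal code; the arrays stand in for cache tables.
 */
class RoleKeyRegistry {
  private $cache = [];     // stands in for cache_page: cid => page HTML
  private $role_keys = []; // stands in for the 'authcache_role_keys' entry

  /** Store a page for a role key and remember that key. */
  public function storePage(string $role_key, string $base_root, string $url, string $html): void {
    $this->role_keys[$role_key] = $role_key;
    $this->cache[$role_key . $base_root . $url] = $html;
  }

  /** Delete one page for every recorded role key; return how many were removed. */
  public function invalidate(string $base_root, string $url): int {
    $removed = 0;
    foreach ($this->role_keys as $role_key) {
      $cid = $role_key . $base_root . $url;
      if (isset($this->cache[$cid])) {
        unset($this->cache[$cid]);
        $removed++;
      }
    }
    return $removed;
  }

  public function has(string $cid): bool {
    return isset($this->cache[$cid]);
  }
}
```

Only the tagged node's page is dropped for every role combination; other cached pages stored under the same role keys are untouched.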
drupalninja99

Why do you say the cache gets cleared on every node save? I am using Authcache with the DB and this is not the case. Maybe you don't have a minimum cache lifetime set? I have the longest min/max lifetime and only clear specific cache_page cids using Rules.

That being said, our site is very big and traffic is erratic, which makes people hit a lot of uncached pages.

chaps2

@jaykali - There is a global page (and block) cache clear triggered by the cache_clear_all() call in node_save().

I'm intrigued by how you are using rules to clear pages in this particular scenario. How do you get the authcache generated cids for the page?

simg

Status: Active » Closed (fixed)

Worth checking out Authcache Actions (new modules)

http://drupal.org/project/authcacheactions

DIMSKK

Status: Closed (fixed) » Active

@simg: I think this module also does not solve the problem mentioned by OP in this issue.

The main problem is:

On any general node or comment save/update, the ENTIRE WEBSITE'S CACHE should NOT be cleared automatically unless the minimum cache lifetime has expired. Instead, only the cache of the specific saved/updated/commented node page should be cleared (which is something the Boost module does, but it's not available for authenticated users).

simg

>On any general node or comment save/update, ENTIRE WEBSITE'S CACHE shall NOT be cleared automatically,

No, of course not. I hadn't fully appreciated what this post was about.

I've just looked at the D6 node_save() code. W.T.F!

The D7 node_save() code works as you would hope - resets just the page cache for a single node - although even then I don't think you're *always* going to want to clear the cache on every node_save()

thehyperlink

@ #27
I am not convinced. Upon further investigation, I read this:
Where in D7 does the cache get cleared on content change? I see in D6 there's an empty call to cache_clear_all(), at the end of node_save, but nothing of the sort in D7.
...
In Drupal 7 those cache clears were moved to submit handlers, mainly so that things like mass imports of tens of thousands of nodes can handle cache clearing themselves at the end instead of once per node.
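The design choice quoted here is easy to see in miniature: if saving a node does not flush the cache itself, a bulk import can save everything and flush once. A sketch with hypothetical `$save` and `$flush` callbacks (in D7 core the per-form clear lives in node_form_submit(); an importer would call its own flush after the loop):

```php
<?php
/**
 * Hypothetical bulk import: saves never clear the cache themselves,
 * the caller clears once after the whole batch.
 */
function import_nodes(array $nodes, callable $save, callable $flush): void {
  foreach ($nodes as $node) {
    $save($node);  // no cache clear per node
  }
  if ($nodes) {
    $flush();      // a single clear once every node is saved
  }
}
```

Importing 1,000 nodes this way costs one cache clear instead of 1,000, which is the scenario the quoted explanation describes.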
...
Indeed, it happens in node_form_submit!
https://api.drupal.org/api/drupal/modules!node!node.pages.inc/function/n...
// Clear the page and block caches.
cache_clear_all();

znerol

Status: Active » Closed (outdated)