When a normal naive admin user of Drupal sets the "Cache lifetime" in the performance page to, say, 12 hours, they pretty much expect their Pages to be cached for, well, 12 hours.

Instead, they are wiped from the cache the next time cron.php is run.

The issue has been chewed over pretty thoroughly in these places:
#739320: page_get_cache sets all pages as temporary
#1094372: Page cache don't used cache_life_time variable
http://www.metaltoad.com/blog/how-drupals-cron-killing-you-your-sleep-si...

I am not going to attempt to repeat all of those, nor un-wind the backend cache implementations etc.

I am going to submit a one-line D6 bug-fix patch for your review that has Pages honoring the cache_lifetime as the naive admin user would expect, based on the GUI/words on the Performance page.

There is probably something inherently wrong with this patch, or it would have been done...

Either that, or it's sheer genius in its simplicity :-)

This is for D6 includes/common.inc

It worked for me in a crude ab -n 1000 -c 100 series of loadtests overnight to change a 99% failure rate to 100% success rate.

But our sites are stupid simple with not much dynamic content, so that's not really much of a test.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

RichardLynch’s picture

FileSize
473 bytes

Upload rejected because it has .inc. in the middle of the filename...

Re-uploading.

Status: Needs review » Needs work

The last submitted patch, common.patch, failed testing.

RichardLynch’s picture

Status: Needs work » Needs review

Uploading a git patch, or my best effort at one...

RichardLynch’s picture

Errr. The instructions say to use the comment number in the patch filename, but I don't have a comment number before I comment, and then I have no upload button... Feels like a race condition to me...

I'm using the first comment number then.

RichardLynch’s picture

Oooh. Maybe it will need the directory from Drupal root with /includes/ in there...
Try this too.

Status: Needs review » Needs work

The last submitted patch, page_honor_cache_lifetime-1279654-4989936.patch, failed testing.

RichardLynch’s picture

Status: Needs work » Needs review
RichardLynch’s picture

Version: 8.x-dev » 6.x-dev
Issue tags: -Needs backport to D6, -Needs backport to D7
FileSize
423 bytes

I am going to attempt to submit a completely untested patch against, ummm, HEAD, I think...

I started with this anyway, so whatever this is, D7, D8, whatever:
git clone 'git://git.drupal.org/project/drupal.git' 'checkout'

Status: Needs review » Needs work

The last submitted patch, page_honor_cache_lifetime-1279654-4998027.patch, failed testing.

BTMash’s picture

Version: 6.x-dev » 8.x-dev
Status: Needs work » Needs review
Issue tags: +Needs backport to D6, +Needs backport to D7

Changing the issue to be a D8 issue and then need backports to D7, D6 based on the kind of patch.

RichardLynch’s picture

Moving to D8 and auto-checking un-tested blind trivial patch there.
And marking backport for 6/7.

BTMash’s picture

RichardLynch’s picture

The D6 backport is #5 which passed automated testing.
I believe the exact same patch will work for D7, untested.

I apologize my un-familiarity with this tracker is making too many emails in your Inbox for something so simple.

RichardLynch’s picture

blind patch for D7

RichardLynch’s picture

Version: 6.x-dev » 8.x-dev
Issue tags: +Needs backport to D6, +Needs backport to D7
FileSize
21.68 KB

I'll never be a graphic artist, but here's a crude before/after scatter graph of ab -n 1000 -c 100 loadtests to show the difference.

The top part of the graph in the green polygon is largely irrelevant.
Static Images, CSS, JS are all "fast" either way.

The bottom part, where Drupal pages go from 99% FAIL on the left, with the three green bars at 0, to 100% 2xx @100 requests per second on the right is supposed to be the focus.

gary4gar’s picture

cache_lifetime is used as a minimum. Using it as the TTL seems like a bad idea because some site may only set it to extremely lower value, say 5-10mins

RichardLynch’s picture

The current TTL is "until cron runs" which is counter-intuitive and really really really bad for performance.

If there's a better value for TTL for a Page than cache_lifetime, I'd be happy to implement that instead...

But everybody seems to think that the Performance popup IS the TTL for a Page when caching is enabled, and it's not.

BTMash’s picture

Priority: Minor » Normal

I'm changing the status of this to normal since I don't think the issue should really be considered minor if the performance implications are what seem to be getting implied. Looking through the code inside expireon how expired items get flushed by DrupalDatabaseCache, the created column is not looked at to figure out which items to get rid of and solely relies on the expire column to denote what is expired and what is not (and the items are cleared out on a cron run as is with and without the patch). The patch looks sound to me though it needs some more reviews.

RichardLynch’s picture

There is a very complicated patch, which failed a UnitTest for poll module, that tried to look at `created` instead of `expireon` and all kinds of stuff tossed around for D8 around this issue...

Bottom line is that the "cache" for a Page is just not a cache at all.

It only lasts until cron runs, because cache_set_page uses CACHE_TEMPORARY.

I'm not even quite sure what would be so ephemeral that you'd only want it to last until cron runs, but that's the behaviour of CACHE_TEMPORARY.

A Page certainly should last for the "minimum" time set in the Performance choices by an admin.

It does not.

It might last forever, if you never run cron, which does the purging.

It might last 1 minute if that's when cron happens to run.

The field may be called `expireon`, but it seems to be a TTL added to time() in the cron.php cache.inc function that does the purging.

As far as I can tell.

RichardLynch’s picture

If `expire` is a time-stamp, as the cache_clear_all seems to imply at least sometimes, then perhaps this is the right patch:

Status: Needs review » Needs work

The last submitted patch, page_honor_cache_lifetime-1279654-20.patch, failed testing.

RichardLynch’s picture

Status: Needs work » Needs review

So...
Every page in Drupal can't be cached because a poll might maybe be used on one page?...
Am I reading the test results correctly?...

That seems like putting the tail wagging the dog to me...

catch’s picture

Status: Needs review » Closed (duplicate)
RichardLynch’s picture

My personal interest at this time is only in D6.
There isn't enough "oomph" in D7 to even upgrade to that, much less D8.
I will be trying to figure out why a "poll" unit test requires that NO page can be cached longer than the next cron run. That seems counter-intuitive.
I don't even have a poll on my site, nor am I ever likely to want to have a poll.
Why should I have to have all my cache nuked every hour for the sake of a poll feature?

firebus’s picture

+1, this needs to be fixed in D6, and a related grumble that the bodhisattva method of refusing to patch D6 until D7 and D8 fixes are committed is counter-productive, and is an artifact of the shortcomings of the issue queue at d.o., rather than a best practice for fixing bugs.

mr.j’s picture

I know this issue is closed but there is a lot of misinformation on this issue from RichardLynch and the linked blog article that needs to be corrected.

Look at the relevant part of the cache_clear_all function in cache.inc on Drupal 6 ($cid is NULL when cron is invoked):

  if (empty($cid)) {
    if (variable_get('cache_lifetime', 0)) {
      // We store the time in the current user's $user->cache variable which
      // will be saved into the sessions table by sess_write(). We then
      // simulate that the cache was flushed for this user by not returning
      // cached data that was cached before the timestamp.
      $user->cache = time();

      $cache_flush = variable_get('cache_flush_'. $table, 0);
      if ($cache_flush == 0) {
        // This is the first request to clear the cache, start a timer.
        variable_set('cache_flush_'. $table, time());
      }
      else if (time() > ($cache_flush + variable_get('cache_lifetime', 0))) {
        // Clear the cache for everyone, cache_lifetime seconds have
        // passed since the first request to clear the cache.
        db_query("DELETE FROM {". $table ."} WHERE expire != %d AND expire < %d", CACHE_PERMANENT, time());
        variable_set('cache_flush_'. $table, 0);
      }
    }
    else {
      // No minimum cache lifetime, flush all temporary cache entries now.
      db_query("DELETE FROM {". $table ."} WHERE expire != %d AND expire < %d", CACHE_PERMANENT, time());
    }
  }

The cache_page table is only cleared when the else if (time() > ($cache_flush + variable_get('cache_lifetime', 0))) condition is satisfied. Clearly this means that if you set your cache lifetime to a longer time period than your cron interval, then cron will not clear the cache_page table every time. i.e. if you set your cache lifetime to 6 hours and run cron every hour, that condition will only be satisfied at least 6 hours after the last time cache_clear_all was called. It will flush all pages, including those that were cached 5 minutes before the end of that period though because they are all stored with an expire timestamp of -1 (CACHE_TEMPORARY). If your cache is in fact being cleared every single time that cron is run, then I can only assume that you have set your cache lifetime to shorter than your cron interval, in which case I am left to wonder what you are complaining about.

Also note that the cache_flush_cache_page variable will be set back to 0 after the cache is flushed. I would have thought it more sensible to reset the variable to the current time instead of 0. This means that the next time the cache_clear_all is called, the page cache won't in fact be cleared because if ($cache_flush == 0) will be satisfied and nothing will happen apart from the variable being set to the current time. So if cron is the only thing flushing your cache (i.e. there are no new nodes or comments etc), it will only flush the second time cron is run after the cache lifetime has expired.

In effect the cache lifetime setting is the minimum time that must elapse before the entire cache is deleted, not a per page cache lifetime. And reading the info on the performance admin page, that is exactly what it says:

On high-traffic sites, it may be necessary to enforce a minimum cache lifetime. The minimum cache lifetime is the minimum amount of time that will elapse before the cache is emptied and recreated, and is applied to both page and block caches. A larger minimum cache lifetime offers better performance, but users will not see new content for a longer period of time.

It doesn't say anything about being a per page cache lifetime, so where you said "as the naive admin user would expect, based on the GUI/words on the Performance page" is based on your own interpretation and/or misunderstanding of the words right in front of you. It says the cache may be "emptied and recreated" after the lifetime expires. I don't know about you but "emptied and recreated" sounds to me like everything is deleted, but only after that cache lifetime period ends. And it works exactly like that.

If I sound annoyed it is because I am. Someone got me to look into this very thing this morning based on the alarmist stuff being posted here and I've wasted the better part of a couple of hours finding out that there is a mountain being made out of a molehill here.

In particular look at this line of code:
db_query("DELETE FROM {". $table ."} WHERE expire != %d AND expire < %d", CACHE_PERMANENT, time());

You see that expire is in fact treated as a timestamp in this statement? So all the patches you posted up until you realised that in #20 would not have done a single thing to improve things as you made the mistake of thinking that it was a TTL (your patch published on your blog site "rant" still has that error btw). It leads me to question the validity of the results you showed in your graph in #15 as your patches could not possibly have fixed anything unless you set your cache lifetime to an enormously high number that was in effect a timestamp from a recent date. Maybe you did something else that improved things in your graph?

I know a per-page cache lifetime may be better in some cases (not always though - if someone posts a new comment you may want it visible sooner on your home page for example) and to me that reset of the cache_flush_cache_page variable to 0 after a flush looks buggy as it could in fact prolong the cache lifetime longer than you would like. But being an alarmist and posting blogs and comments on threads everywhere saying "cron clears your entire cache every time it is run" and "experienced drupal developers are wrong and should know better" is simply wrong, counter-productive, and makes you look more than a bit silly.

RichardLynch’s picture

Perhaps you missed the Oct 11 blog post, and only read parts 1 and 2?

The default setting for cache_lifetime is 0, so MOST users for out of the box installs get their page cache flushed every time cron runs.

But admins can set cache_lifetime up to a max of 1 Day.

Great.

I can make my pages last a WHOLE DAY by setting that minimum lifetime.

Joy.

Most of the pages don't [bleeping] change for months on end.

Or, I can force cache_lifetime to be bigger than 1 Day with a custom settings.php, or a module with an init_hook...

So, yes, I did crank up cache_lifetime to a very large value, which made those graphs actually start to get decent performance.

Except then, the CSS and JS required for the page to render get nuked, leaving me with a lovely validated HTML document, with absolutely no styling.

So, ultimately, Drupal 6 and 7 caching is incredibly limited in scope and flexibility.

And the code is such a confusing jumble of conditions that it took you a couple hours to figure out what was really going on just for the PAGE cache.

How about blocks, views, CSS, JS, etc?
Do you really want to repeat your hours-long exercise for each of those?
I didn't. I just gave up.
Does this caching of inter-dependent content look like it was actually Architected to you?

Drupal caching is really only suitable for sites where the bulk of the pages are changing at least once per day. It's completely useless for a simple CMS that wants good performance.

Bottom Line:
Since my VP of Marketing isn't just going to accept the fact that he has bolted a billboard onto the back of his Maserati servers, and stop complaining "it's slow" every day he checks some random page nobody else has looked at yet that day, I have to crawl the whole site as part of the cron job and prime the cache after Drupal flushes it.

Whether that's every day or hour or whatever we've set cache_lifetime to is kind of irrelevant at that point.

It's actually easier to leave the cache_lifetime at 0 and prime the pump every time you surf to cron.php than to figure out when cron will purge what and be certain that all page requisites survive.

I'm sorry that you feel your 2 hours were wasted learning how Drupal's cache actually works.

I spent far more time than that on it, and consider the only "waste" is that Drupal's caching system clearly doesn't suit my use case (and I dare say the bulk of Drupal users' use cases) at all, and I have to "waste" resources crawling the site instead of using a caching system that would fit my use case.

mr.j’s picture

Status: Closed (duplicate) » Closed (works as designed)

I am not here to engage in a debate about the choices made in the development of Drupal's caching or its performance. I understand and accept the limitations inherent in the way Drupal's caching works. It works for me, but it won't be suitable for everyone. Use whatever works for you, I really don't care if you decide Drupal is best for you or not.

I just want it made very clear for anyone else who reads this train wreck of an issue that the claims made here that cron empties the page cache every time it is run regardless of the cache lifetime setting are simply wrong. End of story.

This kind of alarmism ends up wasting other people's (like myself) time who thought they knew exactly how Drupal's caching works, warts and all. And it turns out I did. My 2 hours were indeed wasted confirming what I already knew. Hopefully by posting this I can save someone else the trouble.

Also a tip: insulting the people who develop software that you can use freely is poor form. Its even worse when they are helpful and take the time to correct your mistakes and then you tell them that they are wrong and don't know what they are talking about. Like you did here (and amazingly you were still arguing with another person who corrected you again recently):
http://www.metaltoad.com/comment/390#comment-390

marcelovani’s picture

I know this issue has been closed and people got upset, but I think the main idea here is to share our problems and solutions.

I have been facing caching problems, where pages will not expire after the minimum lifetime is reached.
i.e. If I set the minimun cache lifetime to 1 hour, I would expect the page to be refreshed after 1 hour.
But it's not the case. It wont expire, unless the cache is flushed.

I was looking back into the history of different versions of Drupal to see how it improved:

In Drupal 6 the pages are set as CACHE_TEMPORARY
http://drupalcode.org/project/drupal.git/blob/refs/heads/6.x:/includes/c...

Then Pressflow fixed the caching problem, by using the page_cache_lifetime
https://github.com/pressflow/6/blob/master/includes/common.inc#L2713

Drupal 7 seems to have inherited the code from Pressflow 6 (very similar), apart from the use of page_cache_lifetime
http://drupalcode.org/project/drupal.git/blob/refs/heads/7.x:/includes/c...
On line 5141 it checks for a header 'expires', but from what I found, this header is not set anywhere in core, in a way that uses the page_cache_lifetime.

Drupal 8 has more or less the same logic as Drupal 7
http://drupalcode.org/project/drupal.git/blob/refs/heads/8.x:/core/inclu...

What I propose, is that we use the page_cache_lifetime if it is set.

One way of adding this information is to add this to any module:

/**
 * Implementation of hook_preprocess_page
 * Adds Expires header based on the configuration of Minimum Cache Lifetime
 */
function my_module_preprocess_page(&$variables) {
  global $user;
  
  if ($user->uid == 0) {
    if (variable_get('cache_lifetime', 0)) {
      $expire = REQUEST_TIME + variable_get('cache_lifetime', 0);
      drupal_add_http_header('Expires', gmdate(DATE_RFC1123, $expire));
    }
  }
}

I am adding here patch for Drupal 7 and 8, feel free to use if you think it makes sense.

jimyhuang’s picture

FileSize
453 bytes

For someone who don't want to hack core, may use module attached

jimyhuang’s picture

Issue summary: View changes

use issue filter for issue links

bkosborne’s picture

#29 - While I understand where you are coming from, the "minimum" cache lifetime really implies the minimum the cache exists just like the name says. It's not "make sure pages ONLY last this long"

dmitry.kazberovich’s picture

Patch for D7 is updated