The people at MayFirst.org have created a utility that makes static file copies of a Drupal site's home page and of its 20 most recent nodes (pieces of content). It's called "Drupal File Cache Generator" (dfcg). You can see the various files for this utility at:

You can also see a high-volume site that's using this file cache utility at AfterDowningStreet.org.

How It Works

Basically, you run a script via cron every few minutes. It fetches the home page and writes it out to an index.html file, gets the IDs of the 20 most recent nodes, and saves a file for each node's whole page (i.e., a page displaying the node content plus the header, footer, and sidebar blocks).
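As a rough illustration of that cron job, here is a minimal Python sketch. It is not dfcg's actual code: the file layout (index.html for the home page, node/<id>.html for each node), the function names, and the exact text of the "STATIC Generated:" comment are all assumptions, and the caller supplies the list of recent node IDs. The `fetch` parameter defaults to a plain urllib GET but can be swapped for a stub.

```python
import os
import time
import urllib.request
from typing import Callable, Iterable, List


def http_fetch(url: str) -> str:
    """Default fetcher: GET the URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def write_atomically(path: str, content: str) -> None:
    """Write to a temporary file, then rename it over the target, so the web
    server never serves a half-written page while cron is regenerating it."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp_path, path)


def generate_cache(base_url: str, cache_dir: str, recent_node_ids: Iterable[int],
                   fetch: Callable[[str], str] = http_fetch) -> List[str]:
    """Snapshot the home page and each recent node page; return the files written."""
    # Hypothetical marker text; the comment dfcg actually writes may differ.
    stamp = "\n<!-- STATIC Generated: %s -->\n" % time.strftime("%Y-%m-%d %H:%M:%S")
    written = []

    # Home page -> <cache_dir>/index.html
    home_path = os.path.join(cache_dir, "index.html")
    write_atomically(home_path, fetch(base_url + "/") + stamp)
    written.append(home_path)

    # Each recent node -> <cache_dir>/node/<nid>.html (hypothetical layout)
    for nid in recent_node_ids:
        node_path = os.path.join(cache_dir, "node", "%d.html" % nid)
        write_atomically(node_path, fetch("%s/?q=node/%d" % (base_url, nid)) + stamp)
        written.append(node_path)
    return written
```

How the 20 node IDs are obtained (for example, a database query) is left to the caller here. The write-then-rename step is a design choice of this sketch, not necessarily something dfcg does.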

The utility uses URL rewriting to check whether a file cache copy of the requested page exists. If it does, that file is returned to the client; otherwise, Drupal is called to generate the page.
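To make that decision concrete, here is a rough Python model of what the rewrite rules decide for each request. The real utility does this inside Apache with mod_rewrite; this sketch only mirrors the `^q=node/([0-9]+)$` condition quoted in the comments below, and it assumes a hypothetical layout of index.html for the home page and node/<id>.html for each node.

```python
import os
import re
from typing import Optional

# Same shape as the RewriteCond pattern ^q=node/([0-9]+)$ (fullmatch acts as ^...$).
NODE_QUERY = re.compile(r"q=node/([0-9]+)")


def cached_file_for(query_string: str, cache_dir: str) -> Optional[str]:
    """Return the static file to serve, or None to let Drupal build the page."""
    if query_string == "":
        # Bare home page request.
        candidate = os.path.join(cache_dir, "index.html")
    else:
        match = NODE_QUERY.fullmatch(query_string)
        if match is None:
            return None  # not a plain node URL: always go to Drupal
        candidate = os.path.join(cache_dir, "node", match.group(1) + ".html")
    # Like mod_rewrite's -f test: serve the copy only if the file exists.
    return candidate if os.path.isfile(candidate) else None
```

Because the match is anchored, a query string such as `q=node/5&` falls through to Drupal, which is presumably why appending an ampersand bypasses the cache.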

Why Not Drupal Caching?

This script was created after AfterDowningStreet.org started having lots of weird errors, slowdowns, and outages. Many of the errors seemed to be load-related and might have been race conditions (duplicate cache entries, etc.). In fact, we ended up turning off Drupal caching in an effort to calm the errors down (and, weirdly, we kept getting duplicate cache errors anyway).

It's possible that AfterDowningStreet.org had options set that kept changing the opening page (a "categories" taxonomy block showing story posting counts and the time since the last posting), causing the cache to keep getting flushed. It's also possible that referer spam was creating lots of garbage cache entries.

In any case, the people who were running the site (politically, not technically) were going crazy and were close to switching to a WordPress-like blogging solution using an Akamai-type distributed web caching infrastructure.

After switching to this file cache utility, page views per day went through the roof, all the errors disappeared, and the front page displays blazingly fast. You can see some old page view stats here.

In Conclusion

Obviously, this type of file-based caching is only useful in certain situations. You have to set up your site so that all users see the same opening page and node content pages, which isn't appropriate for many sites. But if you run a high-volume site, it's wonderful to have the opening page display nearly instantly.

This utility should probably be considered beta. There are a few weirdnesses with session IDs that are still being worked out (they show up when using curl vs. wget). Also, the script should probably be updated to pull all the node IDs displayed on the front page, rather than the 20 most recent nodes. If you want to display the uncached version of a page, append an ampersand (&) to the URL. To see whether you got a cached page, look for the "STATIC Generated:" HTML comment in the page source.
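Both checks are easy to script. The sketch below is a hypothetical helper (all names are illustrative, not part of dfcg) that fetches a page normally and with the trailing ampersand, and reports whether each response contains the "STATIC Generated:" marker. `fetch` is any function that returns a page body for a URL, so it can wrap urllib or a test stub; it assumes the URL already carries a query string such as `?q=node/5`.

```python
from typing import Callable, Tuple

MARKER = "STATIC Generated:"


def is_static_copy(html: str) -> bool:
    """True if the page source carries the file cache's HTML comment."""
    return MARKER in html


def bypass_cache_url(url: str) -> str:
    """Append '&' (once) so the cache's rewrite condition no longer matches."""
    return url if url.endswith("&") else url + "&"


def check_page(url: str, fetch: Callable[[str], str]) -> Tuple[bool, bool]:
    """Return (normal request was static, bypass request was static)."""
    normal = is_static_copy(fetch(url))
    bypassed = is_static_copy(fetch(bypass_cache_url(url)))
    return normal, bypassed
```

For a node that is currently in the file cache, a working setup should report `(True, False)`.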

Maybe this file-based caching utility won't be as necessary if newer alternative Drupal caching options are used. But it's going to be hard to get the owners of the ADS site to trust Drupal caching after the problems they've had. I'll have to make sure I understand all the details of Drupal's built-in caching in order to sell them on it. Specifically: What extra work can occur when a Drupal-cached page is accessed? How does Drupal know when to update a cached entry? Could the referer spam issue mentioned above cause problems for Drupal's caching?

Ben Slade
PublicMailbox at BenSlade dot com
(append 030516 to the subj line to bypass spam filters)
"Everything should be made as simple as possible, but not simpler"
Albert Einstein

Comments

bslade

From a comment in another node (http://drupal.org/node/11736) by killes@www.drop.org

File caching is not new to Drupal but never made it to core:

http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/jeremy/fileca...
--
Drupal services
My Drupal services

Hosting Geek

Switch to lighttpd, which has a lower memory footprint than Apache and can withstand a slashdotting (or, in your case, just being really popular), and use its mod_cml (Cache Meta Language) module.

I really wonder why Apache is so popular when there's something much better (lighty), with a whole lot of cool modules that Apache is missing, that would make it a lot more suitable for a CMS such as Drupal.

bslade

This utility makes cached pages look as if you're not logged in. So for websites with comment forms (and other interactive forms in the main content) that only display for logged-in users, this utility may not be appropriate.

When someone views a story, the page will look as if they're not logged in. So you have to make sure your website allows anonymous comments (from non-logged-in users) so that the file-cached pages show the post-comment form. But you'll want anonymous comments to require after-the-fact approval (er, moderation). You'll also want comments to require a preview before being submitted (logged-in users will then see the logged-in preview form).

PublicMailbox@benslade.com
"It's the mark of an educated mind to be moved by statistics"
Oscar Wilde

SimonVlc

Hello,

Could this be applied to anonymous users?

Regards, Simon.

cybe

I can't seem to get this to work with clean URLs. Is it possible to get it to work?

This works (along with the rest of the rules):

  RewriteCond %{QUERY_STRING} ^q=node/([0-9]+)$

but this doesn't:

  RewriteCond %{QUERY_STRING} ^/node/([0-9]+)$
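A likely explanation, not confirmed in this thread: with clean URLs the browser requests /node/5, so the node path sits in the URL path (what Apache exposes as %{REQUEST_URI}) and the query string is empty, leaving a %{QUERY_STRING} condition that starts with /node/ nothing to match. The Python sketch below uses only urllib.parse to show where the node ID lives in each URL style; the function name is hypothetical. In Apache terms, this suggests testing %{REQUEST_URI} against ^/node/([0-9]+)$ for clean URLs, which is untested here.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

NODE_IN_QUERY = re.compile(r"q=node/([0-9]+)")  # ?q=node/5 (clean URLs off)
NODE_IN_PATH = re.compile(r"/node/([0-9]+)")    # /node/5 (clean URLs on)


def node_id(url: str) -> Optional[str]:
    """Return the node ID from either URL style, or None if it isn't a node URL."""
    parts = urlsplit(url)
    # With clean URLs the query string is empty, so only the path test can match.
    match = NODE_IN_QUERY.fullmatch(parts.query) or NODE_IN_PATH.fullmatch(parts.path)
    return match.group(1) if match else None
```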