First, huge kudos to the Boost team. I had been toying with writing a cache replacement that would have done essentially the same thing as the Boost module. It was a very nice surprise to find this well-developed module ready to go.

I recently joined a large project that had Boost deployed. It is a high-traffic site that sees in excess of 170k unique visitors per month. A typical page makes anywhere from 50 to 100+ calls back to the server for static content such as images, JS and CSS files. I was amazed at the traffic in my Firebug Net panel: every page load was re-requesting every single file. On investigation I found that the headers recommended (and in fact needed) for the Boost module's cached pages were being applied to every static element on the server because of their placement in the .htaccess file in the docroot. The Cache-Control and Expires headers made these static elements uncacheable, so users' browsers were not allowed to cache them at all.

Since these headers are needed for the Boost cache files only, I moved the mod_headers directives into the cache directory itself as an independent .htaccess file. Wow... Bandwidth and connections per second immediately fell to approximately one third of what they had been before the change. In my Firebug Net panel I now saw requests for static elements only on the initial load of the site after clearing my cache, or when I used reload to send a no-cache request back to the server. Pages that had been taking 100+ requests were reduced in most cases to one or two requests.
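As a rough illustration of the move described above, a standalone .htaccess file placed in the Boost cache directory might look something like the following. This is only a sketch: the header values are the ones quoted elsewhere in this thread, while the file location (`cache/.htaccess`) and the exact extension list are assumptions that would need to match the actual installation.

```apache
# Hypothetical file: cache/.htaccess (inside the Boost cache directory).
# Because it lives here, it affects only files served from the cache
# directory, not images, CSS or JS elsewhere in the docroot.
<IfModule mod_headers.c>
  <FilesMatch "\.(html|xml|json)$">
    # Stop browsers from caching Boost-generated pages.
    Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
    Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
  </FilesMatch>
</IfModule>
```

With the headers scoped this way, the same `Header set` lines would be removed from the docroot .htaccess, leaving only the rewrite rules there.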

I would recommend that this be incorporated into the README.txt and split out into a separate example .htaccess file for the cache directory alone. That way it would be clear that the rewrite rules need to remain in the .htaccess file in the docroot of the Drupal installation, while the special headers needed only for Boost cache files go in a new .htaccess file in the cache directory.

This will allow server tuners to use other Apache modules like mod_expires to tune the cacheability of their static content.
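For example, once the no-cache headers no longer apply to the whole docroot, something along these lines could be added to the docroot .htaccess to let browsers keep static files. This is a minimal sketch, assuming mod_expires is loaded; the extension list and the two-week lifetime are illustrative choices, not values taken from Boost.

```apache
# Sketch for the docroot .htaccess; lifetimes are assumptions to be tuned per site.
<IfModule mod_expires.c>
  ExpiresActive On
  <FilesMatch "\.(ico|gif|png|jpe?g|css|js)$">
    # Let browsers reuse static files for two weeks after they fetch them.
    ExpiresDefault "access plus 2 weeks"
  </FilesMatch>
</IfModule>
```

Matching on file extension rather than MIME type sidesteps differences in how servers label JavaScript files.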

Thanks everyone for the great module.

Comments

mikeytown2’s picture

Category: bug » feature

Interesting. This isn't the first time I've seen this issue; see http://drupal.org/node/185075#comment-1560198. Back then Boost still had a lot of issues; now I think it could be ready for two different .htaccess files: one in the webroot and one in the cache dir. The cache dir is writable, so I can use PHP to create the .htaccess file and try to make this as painless as possible. What I might want to do is create a GUI so one can set the cache directive for ico, js, css, gif, png, jpg, xml, txt, flv, and swf files, place this at the bottom of the Boost rules, and include the js and css in the .htaccess of the cache dir. I'm going to bump this to a feature request, and hopefully I'll get this done sooner rather than later.

christefano’s picture

Thanks for posting this, sg3524. This is really great to know.

jcisio’s picture

It's weird, my .htaccess seems correct

  ### BOOST START ###
  AddDefaultCharset utf-8
  <FilesMatch "(\.html|\.xml|\.json)$">
    <IfModule mod_headers.c>
      Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
      Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
    </IfModule>
  </FilesMatch>
  ...

In Firebug, after the initial load, I see only a dozen requests, most of them are from the ads provider.

Is this issue for 6.x-1.18?

mikeytown2’s picture

Proposal:
.htaccess in the cache folder, auto-generated by PHP since the cache folder needs to be writable. If we are doing this, we might as well use SetHandler.

  AddDefaultCharset utf-8
  <FilesMatch "\.((html|xml|json)|((html|xml|json)\.gz))$">
    <IfModule mod_expires.c>
      ExpiresDefault A1
    </IfModule>
    <IfModule mod_headers.c>
      Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
      Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
    </IfModule>
  </FilesMatch>
  <IfModule mod_mime.c>
    AddCharset utf-8 .html
    AddCharset utf-8 .xml
    AddCharset utf-8 .json
    AddCharset utf-8 .css
    AddCharset utf-8 .js
    AddEncoding gzip .gz
  </IfModule>
  <FilesMatch "\.(html|html\.gz)$">
    ForceType text/html
  </FilesMatch>
  <FilesMatch "\.(xml|xml\.gz)$">
    ForceType text/xml
  </FilesMatch>
  <FilesMatch "\.((json|js)|((json|js)\.gz))$">
    ForceType text/javascript
  </FilesMatch>
  <FilesMatch "\.(css|css\.gz)$">
    ForceType text/css
  </FilesMatch>

htaccess in webroot

  ### BOOST START ###
  # Gzip Cookie Test
  RewriteRule boost-gzip-cookie-test\.html  cache/perm/boost-gzip-cookie-test\.html\.gz [L,T=text/html]

  # GZIP - Cached css & js files
  RewriteCond %{HTTP_COOKIE} !(boost-gzip)
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=2]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.css\.gz -s
  RewriteRule .* cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.css\.gz [L,QSA,T=text/css]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.js\.gz -s
  RewriteRule .* cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.js\.gz [L,QSA,T=text/javascript]

  # NORMAL - Cached css & js files
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.css -s
  RewriteRule .* cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.css [L,QSA,T=text/css]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.js -s
  RewriteRule .* cache/perm/%{SERVER_NAME}%{REQUEST_URI}_\.js [L,QSA,T=text/javascript]

  # Caching for anonymous users
  # Skip boost IF not get request OR uri has wrong dir OR cookie is set OR https request
  RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
  RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add))|(/(comment/reply|edit|user|user/(login|password|register))$) [OR]
  RewriteCond %{HTTP_COOKIE} DRUPAL_UID [OR]
  RewriteCond %{HTTPS} on
  RewriteRule .* - [S=7]

  # GZIP
  RewriteCond %{HTTP_COOKIE} !(boost-gzip)
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=3]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz -s
  RewriteRule .* cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz [L,T=text/html]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.xml\.gz -s
  RewriteRule .* cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.xml\.gz [L,T=text/xml]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.json\.gz -s
  RewriteRule .* cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.json\.gz [L,T=text/javascript]

  # NORMAL
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
  RewriteRule .* cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.xml -s
  RewriteRule .* cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.xml [L,T=text/xml]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.json -s
  RewriteRule .* cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.json [L,T=text/javascript]
  ### BOOST END ###

Should also create a guide for usage with httpd.conf only; no htaccess.
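As a starting point for such a guide, the cache-directory headers could be expressed in httpd.conf with a `<Directory>` block instead of a per-directory file. This is only a sketch: the filesystem path is hypothetical, and the rewrite rules would need separate adaptation, since in server or virtual-host context RewriteRule patterns are matched against the full URL path, including its leading slash.

```apache
# Hypothetical path; substitute the real location of the Boost cache directory.
<Directory "/var/www/drupal/cache">
  # With everything in httpd.conf, .htaccess lookups can be switched off here.
  AllowOverride None
  <IfModule mod_headers.c>
    <FilesMatch "\.(html|xml|json)(\.gz)?$">
      Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
      Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
    </FilesMatch>
  </IfModule>
</Directory>
```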

jmseigneur’s picture

Are the following lines mandatory for html?

Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"

Or can we change them if we want the client to cache the HTML for some time, even after Boost has regenerated it on the server side?

Thanks.

jcisio’s picture

We can change them, but mostly we shouldn't. Dynamic web pages should not be cached on the client side. It doesn't save that much bandwidth, and it's really annoying when you post something on the front page but visitors can't see it immediately.

That said, you are the one who decides ;-)

jmseigneur’s picture

"We can change, but mostly we shouldn't. Dynamic web should not be cache on the client side. It's not that much bandwidth saving. And it's really annoying when you post something on the frontpage but visitors can't not see it immediately."

If I understand Boost correctly, the page is cached as a static HTML page for a while, e.g., 1 day. During that day the client always receives the same page because it has not changed, so even if something new were posted, the cached page would not be updated until it is regenerated. If the client caching time is short compared to that day, there might be some benefit in doing that, especially on small servers that have to handle a lot of clients, although maybe not much, as you mention. Thanks for your reply anyway.
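To make the idea concrete, a short client-side lifetime for Boost's HTML files might be configured roughly as below, in place of the no-cache headers in whichever .htaccess applies to the cache directory. This is a sketch under assumptions: the five-minute value is arbitrary, and it accepts that visitors may see a page up to five minutes stale.

```apache
# Sketch: allow browsers to reuse cached HTML briefly instead of forbidding caching.
<IfModule mod_headers.c>
  <FilesMatch "\.html(\.gz)?$">
    # Drop any Expires header so that max-age alone governs freshness.
    Header unset Expires
    # 300 seconds = 5 minutes; an illustrative value, short relative to a 1-day Boost lifetime.
    Header set Cache-Control "max-age=300, must-revalidate"
  </FilesMatch>
</IfModule>
```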

jcisio’s picture

Serving a cached (static) page costs nothing. Apache can serve a thousand pages per second; nginx can serve a few tens of thousands of pages per second, or 100+ million page views per day ;-)

Not to mention that when a page is changed, Boost always updates (or tries to update) the cached page so that visitors never see an outdated page. However, I have disabled this feature because of high DB usage (in my case).