Google caches unstyled pages - links to obsolete bundles
| Project: | Support File Cache |
| Version: | 5.x-1.1-a |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
We have found that Google caches our pages complete with links to sfcache bundles, all of which have a hash/checksum in their names, e.g. bundle_79172574ac22_1.css.
Then, when you create a new bundle, the filename changes completely and sfcache deletes the old bundle.
So anyone who views a Google cache of your page that was created while you were using an old bundle sees only a plain, unstyled page, because the old stylesheet bundle no longer exists.
May I suggest the following solution:
Instead of changing the name completely each time the bundle is recreated, increment a version counter and append it as a query string. This tells the user's browser when to download a new version of the same file, e.g. bundle.css?1, bundle.css?2, etc.
Browsers will download the new bundle.css file if the query string has changed but will fetch from their local cache if it hasn't. Google will still get whatever version of the file currently exists, because if it requests bundle.css?1 and you're up to bundle.css?3, the server sends the latest file anyway (we used to do this manually on our sites before I found sfcache).
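To illustrate why this works, here is a minimal Python sketch (not sfcache code). The helper names `versioned_url` and `served_path` and the path /files/sf_css/bundle.css are made up for the example. It assumes the web server serves static files by path and ignores the query string, which is the usual default.

```python
from urllib.parse import urlsplit


def versioned_url(path: str, version: int) -> str:
    """Append the bundle's version counter as a query string."""
    return f"{path}?{version}"


def served_path(url: str) -> str:
    """Return the path component, which a static file server maps to a file."""
    return urlsplit(url).path


# Hypothetical bundle location, used only for illustration.
old = versioned_url("/files/sf_css/bundle.css", 1)
new = versioned_url("/files/sf_css/bundle.css", 3)

# The URLs differ, so a browser treats them as separate cache entries...
print(old != new)
# ...but both map to the same file on disk, so a stale link still gets styles.
print(served_path(old) == served_path(new))
```

The point is that the cache key (the full URL) changes with each version while the file on disk keeps one stable name.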
The other thing I thought of was 301-redirecting requests for the old files to the new ones, but I'd expect that would involve some ugly Drupal witchcraft if you wanted to avoid manually maintaining .htaccess.
I realise that Google fetching the latest version of a stylesheet may still not make an old page look as it should, but it is better than no style at all in the vast majority of cases.

#1
The problem with adding query strings to the file URLs is that some systems (e.g. reverse proxies, CDNs) do not cache such URLs by default (most can be configured to do so, but that might have other implications). Another idea is to let the user choose whether sf_cache should delete outdated bundles.
Redirecting outdated URLs might work on some occasions, but since sf_cache lets the user configure the URL from which files are loaded, it can't cover all cases.
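A rough Python sketch of what the "keep outdated bundles" option could look like. It is only an illustration, not how sf_cache is implemented: the function name `bundles_to_delete` and the `keep_outdated` flag are hypothetical.

```python
def bundles_to_delete(existing, current, keep_outdated):
    """Return the bundle files that a rebuild should remove.

    existing      -- filenames currently on disk
    current       -- filenames produced by the latest rebuild
    keep_outdated -- hypothetical user setting; True leaves old bundles alone
    """
    if keep_outdated:
        return []
    # Everything on disk that is not part of the current set is outdated.
    return sorted(set(existing) - set(current))


on_disk = ["core_79172574ac22_1.css", "core_9029c875a8a4_1.css"]
latest = ["core_9029c875a8a4_1.css"]
print(bundles_to_delete(on_disk, latest, keep_outdated=False))
print(bundles_to_delete(on_disk, latest, keep_outdated=True))
```

With the flag on, cached pages keep working with the stylesheet they were built against, at the cost of old files piling up on disk.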
#2
Hmm, doesn't sound too promising then.
Here is the manual process we are using in the meantime, in case anyone else has this same problem.
We add something like this to our .htaccess file, using the current filenames of the bundles.
This should redirect all requests for obsolete sf_cache files to the current versions.
# Redirect obsolete SF-Cache files to the current version
# DISABLE THIS BEFORE REGENERATING NEW FILES, then change to new filename and re-enable
RewriteCond %{REQUEST_FILENAME} !files/sf_css/core_9029c875a8a4_1\.css$
RewriteRule ^files/sf_css/core_[a-z0-9]+_1\.css$ files/sf_css/core_9029c875a8a4_1.css [R=301,L,NC]
RewriteCond %{REQUEST_FILENAME} !files/sf_css/theme_54bcd22c7896_1\.css$
RewriteRule ^files/sf_css/theme_[a-z0-9]+_1\.css$ files/sf_css/theme_54bcd22c7896_1.css [R=301,L,NC]
Now the important thing: during the update process, you must comment out those lines in .htaccess before regenerating the bundles. Otherwise your server will redirect requests for the new files to the old filenames, even though those files no longer exist.
When the regeneration is complete, the sf_cache module gives you the new bundle names in a message. Copy the new filenames into your .htaccess file, uncomment the lines, save it on your server and you're good to go. No more unstyled Google cache pages, or watchdog error messages when people with cached pages request out-of-date stylesheets.
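To make the copy-and-paste step less error-prone, the rules can be generated from the new bundle names. Below is a minimal Python sketch. The function `redirect_rules` is our own helper, not part of sf_cache, and it assumes bundle names always look like group_hash_part.css, with a group name made only of letters, as in the examples above. It also escapes the literal dots in the patterns.

```python
import re

# Assumed naming scheme, e.g. "core_9029c875a8a4_1.css": group, hash, part.
BUNDLE_NAME = re.compile(
    r"^(?P<group>[a-z]+)_(?P<hash>[a-z0-9]+)_(?P<part>\d+)\.css$"
)


def redirect_rules(current_bundles, directory="files/sf_css"):
    """Build RewriteCond/RewriteRule lines for the given current bundles."""
    lines = []
    for name in current_bundles:
        match = BUNDLE_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected bundle name: {name}")
        group, part = match["group"], match["part"]
        target = f"{directory}/{name}"
        # Escape dots so they match literally in the condition pattern.
        escaped = target.replace(".", r"\.")
        # Skip requests that already ask for the current bundle.
        lines.append(f"RewriteCond %{{REQUEST_FILENAME}} !{escaped}$")
        # Send any other hash for this group and part to the current bundle.
        lines.append(
            f"RewriteRule ^{directory}/{group}_[a-z0-9]+_{part}\\.css$ "
            f"{target} [R=301,L,NC]"
        )
    return "\n".join(lines)


print(redirect_rules(["core_9029c875a8a4_1.css", "theme_54bcd22c7896_1.css"]))
```

Paste the printed lines into .htaccess only after the regeneration has finished, for the reason given above.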