cache_set and cache_get base_url brokenosity
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | active |
We're having problems with the cache on a few sites. I originally thought it was because of ye olde "Apache and Drupal both compressing pages" problem, but now I'm not so sure.
I can repeat the following problem on a few sites running CVS and beta2. All of these sites are in subdirectories right now (this turns out to be important; see below), as they're on our new server:
- Empty the cache.
- Use http://web-sniffer.net/ to hit the front page of a site with "accept-encoding: gzip" disabled. This makes Drupal generate a cache entry in the database.
- Repeat the request with "accept-encoding: gzip" enabled. Binary is returned instead of HTML.
If I repeat this test on an empty cache starting with gzip enabled, then hit it again with it enabled, I get an error instead of binary:
Warning: gzinflate(): data error in /var/www/drupal-4.7.0-beta2/includes/bootstrap.inc on line 624
Warning: Cannot modify header information
I made the cache functions log debug info to watchdog about what they were trying to cache, and noticed that toggling the gzip header can make Drupal cache pages as plain HTML instead of compressed data. I assumed this was the cause of the errors, but couldn't figure out where the logic in the functions goes wrong. It seems that if a cached page is generated by a client that doesn't support gzip, it can't be viewed by a client that does support compression. The opposite is also true, but with different results.
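To make the failure mode concrete, here is a minimal Python model (not Drupal's PHP code) of a page cache whose stored format depends on which client created the entry, while the delivery path assumes the entry is always gzip-compressed. The `cache` dict and the `store()`/`deliver()` functions are hypothetical names used only for illustration.

```python
import gzip

# Hypothetical model, not Drupal code: the entry is stored in whatever form
# the request that created it produced, but delivery assumes gzip.
cache = {}


def store(path, body, client_accepts_gzip):
    # The faulty step: compression depends on the *first* client's headers.
    cache[path] = gzip.compress(body) if client_accepts_gzip else body


def deliver(path, client_accepts_gzip):
    data = cache[path]
    if client_accepts_gzip:
        # Sent unchanged with "Content-Encoding: gzip", whatever it contains.
        return data, "gzip"
    # Assumes the entry is compressed; raises if it is plain HTML, much like
    # gzinflate() warning about a "data error".
    return gzip.decompress(data), None


# A plain entry served to a gzip client: labelled gzip, but not gzip data.
store("/", b"<html>front page</html>", client_accepts_gzip=False)
body, encoding = deliver("/", client_accepts_gzip=True)
print(encoding, body[:2] == b"\x1f\x8b")  # gzip False
```

With the same plain entry, calling `deliver("/", client_accepts_gzip=False)` raises an `OSError` (`gzip.BadGzipFile` on Python 3.8+). In this model the symptoms depend entirely on which kind of client populated the cache first, which matches the pattern described above.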
Then we noticed that the "cid" field in the cache table was broken. Instead of a valid path for each cached page, we have:
http://<server_ip_here>/~leafish/test/~leafish/test/
This is due to the following call to cache_set:
cache_set($base_url . request_uri(), $data, CACHE_TEMPORARY, drupal_get_headers());
$base_url and request_uri() overlap if you're running a site in a subdirectory, which results in fruity paths in the database. This may be the cause of our problems, or a separate issue.
I wanted to provide a patch for this before creating the issue, but I think there are two or three different problems here that are confusing the hell out of me. I'm not sure what the best way to fix the $base_url problem is. I didn't want to hack something into bootstrap to fix it, but changing the function that generates this variable will probably break other things. Wah!
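The overlap can be reproduced outside Drupal with a short Python sketch. It assumes, as the doubled cid above suggests, that $base_url has no trailing slash and already contains the subdirectory, and that request_uri() returns a path that also starts with the subdirectory. `cid_without_overlap()` is a hypothetical fix for illustration, not an existing Drupal function, and the IP address is a documentation placeholder.

```python
from urllib.parse import urlsplit


def naive_cid(base_url, request_uri):
    # Same concatenation as cache_set($base_url . request_uri(), ...).
    return base_url + request_uri


def cid_without_overlap(base_url, request_uri):
    # Hypothetical fix: keep only scheme://host[:port] from base_url,
    # because request_uri already starts with the subdirectory.
    parts = urlsplit(base_url)
    return f"{parts.scheme}://{parts.netloc}{request_uri}"


base = "http://203.0.113.5/~leafish/test"  # placeholder IP, no trailing slash
uri = "/~leafish/test/"
print(naive_cid(base, uri))            # http://203.0.113.5/~leafish/test/~leafish/test/
print(cid_without_overlap(base, uri))  # http://203.0.113.5/~leafish/test/
```

The doubled key is ugly, but it is still consistent and unique for each page. That would fit the view that the cid problem is separate from the compression problem.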
The server is running:
Apache 2.0.52 (mod_deflate disabled)
PHP 4.4.1
MySQL 4.1.11
eAccelerator caching.

#1
First of all, thanks for the detailed report. :)
Second, request_uri() being broken when Drupal runs in a subdirectory is a long-standing bug which nobody seems to care about (http://drupal.org/node/10917). I do not think that it is related to the issue at hand.
The issue is that apparently uncompressed or scrambled data gets inserted into the cache table if a client which does not support gzip accesses the page in question before one that does.
This needs to be investigated. The code is in bootstrap.inc. I will try to have a look, but am a bit short on time.
#2
@killes: we probably need to write to the page cache only for gzip-capable browsers. But that means that many crawlers will never get cached. Of course, what we do today is even worse: we show the crawler garbage, which can't be good for PageRank.
#3
This needs to be investigated more closely. Can anyone reproduce this?
#4
OK, I've thought about it a bit and come to the conclusion that I cannot reproduce the problem.
I have created a patch that will create some debug info. Can somebody run it and report back?
If you have the problem, you should see "data compressed" entries in your logs. The patch would also fix the problem if it exists for some combination of Apache and PHP versions and settings.
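The debug patch itself is not reproduced in this thread. The following hypothetical Python sketch shows the kind of check it describes: before compressing a page for the cache, look for the two gzip magic bytes (0x1f 0x8b, defined in RFC 1952). If they are present, log "data compressed" and store the data unchanged rather than compressing it twice. The logger name and `prepare_for_cache()` are illustrative assumptions.

```python
import gzip
import logging

logger = logging.getLogger("page_cache")  # hypothetical logger name
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def prepare_for_cache(body):
    """Compress a rendered page for the cache exactly once (hypothetical)."""
    if body.startswith(GZIP_MAGIC):
        # The output was already gzipped before it reached the cache layer,
        # e.g. by an output handler. Log it and store it unchanged instead of
        # compressing it a second time.
        logger.warning("data compressed")
        return body
    return gzip.compress(body)
```

Because the check is idempotent, running the same page through it twice produces a single layer of compression. Log entries would then show which requests arrived already compressed.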
#5
I'll take a look at this patch when I get time, and report back.
I debugged the code in a similar way myself by watchdogging the data that's being cached. It alternated between compressed content and plain text, depending on what kind of request generated it.
#6
I can't reproduce this problem in HEAD, with or without the patch. Tested with HTTP 1.1/1.0, through a proxy, and with accept-encoding gzip on and off. Cache data is now always stored compressed, and delivered in the correct format described in the accept-encoding header.
Could still be an issue, but if nobody else can reproduce this for now then I'll close it. Lowering the priority until then.
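The behaviour described in #6 (always store compressed, decide at delivery time) can be sketched in Python as follows. This is an illustration, not the HEAD code. `accepts_gzip()` is a simplified, hypothetical Accept-Encoding parser that honours q-values but ignores the `*` wildcard. The `Vary: Accept-Encoding` header is an addition here, because proxies otherwise may hand one client's variant to another.

```python
import gzip

cache = {}


def accepts_gzip(header):
    # Simplified Accept-Encoding parsing: "gzip" or "x-gzip" with q > 0.
    # The "*" wildcard is deliberately ignored in this sketch.
    for item in header.split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if coding.lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def cache_page(key, html):
    # Always store the compressed form, whoever triggered the cache write.
    cache[key] = gzip.compress(html)


def serve(key, accept_encoding):
    data = cache[key]
    if accepts_gzip(accept_encoding):
        return data, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    # Non-gzip clients (many crawlers, some proxies) get plain HTML.
    return gzip.decompress(data), {"Vary": "Accept-Encoding"}
```

Since the stored format no longer depends on the first visitor, the order in which gzip and non-gzip clients hit a page should not matter. Crawlers still benefit from the cache, at the cost of one decompression per request.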
#7
I can reproduce.
Visit dev.newsphoto.nl/ with IE6 and load a page twice (the first visit creates a cache entry); the page served from the cache comes out blank.
www.newsphoto.nl/ has the same codebase, but with caching switched off.
Bèr
#8
#9
The previous post contains a log from my site, as killes requested. The problems start at about line 35 of the log; those lines are the headers sent.
#10
@Ber - those can't be all the request headers. No Accept-Encoding header or HTTP version (1.0/1.1) is specified.
#11
Can anyone please take a look at this also?
http://drupal.org/node/46272
#12
The attached patch is meant to be used to debug the problem, not for inclusion in core.
#13
I've come across a site that's getting these errors randomly. Will see if we can apply killes's debugging patch.
#14
Any news about this?
#15
The news is that it's still a problem, due to Drupal being a bit crap and issues never being resolved.
#16
Hm. I think you meant "Due to no one trying the debugging patch that Gerhard posted at http://drupal.org/node/43462#comment-367659 and reporting back with their full results on a site that was having this problem." :)
We need people who can reproduce the problem to give us enough information to fix it. Until then, it won't be.
#17
Putting this to critical so it gets the attention it needs...
Also critical since binary output and WSODs are not good!
(needs to be fixed before 7 ships...)