a few of us discussed boost's refusal to cache pages with querystrings. we agreed that querystrings must never be used to modify data (per the http spec) and nothing is wrong with caching them as usual. did we miss something?

Comments

bjaspan’s picture

Subscribe. I was one of "a few of us" that discussed this. ISTM that "http://site/?page=1" (i.e. paged views) ought to be cached just like the first page.

Hetta’s picture

+1 agreed.

firebus’s picture

agreed, and as i have an immediate need to cache comment pages (?page=1 style query strings) i'm going to write a patch.

it should be a straightforward extension of the existing code and htaccess. does anyone have advice on how to format the string? it seems like i just need to encode the ? and & in some way and append it to the file name before the extension...

firebus’s picture

StatusFileSize
new2.6 KB

that didn't work out quite as i'd planned.

i really wanted to escape the = and & from the query string somehow.

but i couldn't find a mod_rewrite way to do that, without using the external program type of RewriteMap which seemed a little heavyweight.

so i decided to go with the actual query string appended to the path, so my cache files have names like:

  cache/server/0/node/123_page=1&something=else.html

it turns out that boost also likes to escape special characters, so preg_replace in boost_file_path needs to be changed to allow & and = through.

i added an if statement in boost_init() to cache URLs with query strings. i think my routine to convert $_GET into a string is probably lame and someone could do a better job with $_SERVER['QUERY_STRING'] and one of the php string functions.

finally, i added a stanza to htaccess to handle rewrites for URLs with query strings.

there's probably a lot wrong with this, i haven't tested it very well, and i don't mean to step on anyone's toes, but i wanted to share.

finally, i should note that, with comments, if users are allowed to set comment display settings (thread/flat/asc/desc/etc) then $page=1 doesn't display the same thing for all users.

boost is already broken for this on the pages it does cache, and now it's broken on pages with query strings as well. whatever settings are in effect when the page is cached will apply to all anon users.

i just submitted a patch to advcache for comments that keeps separate cache files for users with different comment display settings and it's possible that the same logic could work here. however, i think i will just find a way to disable the settings for anonymous users for now.

thanks!

firebus’s picture

Status: Active » Needs review

updating status

Crimson’s picture

Status: Needs review » Reviewed & tested by the community

Works good.

firebus’s picture

Assigned: moshe weitzman » firebus
Status: Reviewed & tested by the community » Needs work

there's a little issue with this

the line
RewriteCond %{QUERY_STRING} ^(.*)$
matches even when the query string is empty.

there are some (bad) urls that can generate a cache file which will then be returned in place of a valid url due to boost's string rewriting rules.

let's say a user tries to access http://www.example.com/node\
boost transforms the backslash to an underscore
boost creates the cache file node_.html
next a user tries to access http://www.example.org.com/node
boost returns node_.html - page not found - instead of the actual node page.

fix is simple. change the RewriteCond to:
RewriteCond %{QUERY_STRING} ^(.+)$

i'll make a new patch for this...

moshe weitzman’s picture

@Arto and others - what do you think of caching these requests, and of this patch in specific?

moshe weitzman’s picture

Category: bug » feature
Arto’s picture

Title: cache pages with querystrings » Cache pages with query strings

@Moshe, not sure I will have a chance to look at this myself anytime soon, but would consider committing it if it gets moved to RTBC status.

moshe weitzman’s picture

Version: 5.x-1.x-dev » 6.x-1.x-dev

thanks arto ... has to go into d6 first.

hopefully firebus or other person has time to revive this.

alex s’s picture

@firebus, nice work

But you forgot about multiple pagers on one page, which uses comma separation, so:

$path = preg_replace('@[^/a-z0-9_\-&=,]@i', '_', $path);
moshe weitzman’s picture

There is code in boost_exit() that adds a querstring in order to defeat caching. We need a different way to signal that the following page should not be cached.


// Check that the page we're redirecting to really necessitates
      // special handling, i.e. it doesn't have a query string:
      extract(parse_url($destination));
      $path = ($path == base_path() ? '' : substr($path, strlen(base_path())));
      if (empty($query)) {
        // FIXME: call any remaining exit hooks since we're about to terminate?

        // If no query string was previously set, add one just to ensure we
        // don't serve a static copy of the page we're redirecting to, which
        // would prevent the session messages from showing up:
        $destination = url($path, 't=' . time(), $fragment, TRUE);

alex s’s picture

StatusFileSize
new4.69 KB

moshe, i solved this problem

chirale’s picture

If pager is the problem, Clean Pagination solves it on Drupal 5.x, changing the pager links into something like "frontpage/page/2" (Boost friendly). This patch can be anyway a good improvement to boost.

christefano’s picture

Status: Needs work » Needs review

#14 needs review.

mo6’s picture

subscribe

batman1983’s picture

Version: 6.x-1.x-dev » 5.x-1.0
Component: Apache integration » Code
Assigned: firebus » batman1983
Category: feature » support

Hey,
i installed the second patch and the files get stored as 2217_seite=0.1.html e.g. (in folder: node)
There is a little problem left. Every node creates a link to the node_number.html e.g.: artikel/2008/review1.html is a link to node/1122
The querystring pages like artikel/2008/review1_page_1 doesn't get saved in the folder artikel/2008. How can i fix this?
I'm sorry for my bad english.
Best regards from germany,
Batman1983

nbchip’s picture

subscribe

alex s’s picture

StatusFileSize
new5.15 KB

I rewrote the patch, so it should support path aliases.
There are still 2 small issues:
1) Wildcart is not implemented yet in boost_cache_expire(), so pages with nonempty query will be expired only by hook_cron().
2) Boost block doesn't distinguish pages with same path and different queries.

alex s’s picture

Version: 5.x-1.0 » 6.x-1.x-dev

Forgot to say, the patch is fo for 6.x-1.x-dev version

nbchip’s picture

Is there any chance for 5.x version of patch?

alex s’s picture

Yes, but it would be a good idea to test the patch for 6 version, that I would not have to do same work twice

mikeytown2’s picture

using 6.x
Patch works for me

btw couldn't you use a URL Alias for myview?page=x and set it to myview/pagex ?

alex s’s picture

mikeytown2, maybe this topic will help you: http://drupal.org/node/59362

Terko’s picture

All the day I try to implement this patch, but without success. The other pages of the node are cached, but not served. :( Maybe I am wrong with the htaccess?

alex s’s picture

Have you replaced your .htaccess with new boosted1.txt ?

Terko’s picture

Yes, everything, but my drupal is install in subfolder blog, so I edited my htaccess in this way:
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/cache
RewriteCond %{REQUEST_URI} !^/user/login
RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/blog/cache/uhaaa.com/$1_%{QUERY_STRING}.html -f
RewriteRule ^(.*)$ cache/uhaaa.com/$1_%{QUERY_STRING}.html [L]

Used the info from previous issue about subfolder installation (don't remember the name of the contributor).
In this manner everything works. I've comented and the line with the convertation of the symbols to _.
Again I've created and edition that catch the ?page=1 (pages of the first page)

RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/cache
RewriteCond %{REQUEST_URI} !^/user/login
RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/blog/cache/uhaaa.com/index_%{QUERY_STRING}.html -f
RewriteRule ^(.*)$ cache/uhaaa.com/index_%{QUERY_STRING}.html [L]
Now everything works fine. The only problem I have is how to make the URL Aliases language independent, because they are associated with "Bulgarian" language in default and I edit them by hand (after bulk edit in DB).
You can see the working site here http://uhaaa.com/blog

nbchip’s picture

Any progress on patch for 5.x?
thx.

dicreat’s picture

Also need patch for 5.x.
thx.

damienmckenna’s picture

StatusFileSize
new4.48 KB

Please give the attached file a spin with D5, based on the current 5.x dev release.

damienmckenna’s picture

firebus & alex_s: a hug thanks for the patches, they worked a treat and were simple to backport.

EvanDonovan’s picture

DamienMcKenna, thanks for the 5.x patch - it doesn't seem to work on 5.x-1.0, however. Is there anything I would need to change to get it to work on 5.x-1.0?

EvanDonovan’s picture

Note that I changed the htaccess rewrite rules as follows:

RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}_%{QUERY_STRING}.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1_%{QUERY_STRING}.html [L]

(since the 1.0 version has a path of /cache/server_name/0/filename.html)

Even with this change, no pages were loading through the Boost cache, though they were being created.

dbeall’s picture

applies to cache path with htaccess.. This is not the right way, but works on single domain. Change server_name to the cached folder_name. In my case, the cache mod produced a folder named /cache/bexleymarine.com/
RewriteRule ^(.*)$ cache/bexleymarine.com/0/$1_%{QUERY_STRING}.html [L]

mikeytown2’s picture

@dbeall
I think there could be a lot of issues in regards to running boost on a multi site. Cron clearing the static cache comes to mind.

dbeall’s picture

@mikeytown2
Off Topic: I know from the issue ques that you have done a lot of work on boost. You going to help get a new release rolled up for us dummes? The mod works fairly good after you find all the patches.

mikeytown2’s picture

Status: Needs review » Fixed

committed to 6.x dev

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

mikeytown2’s picture

Version: 6.x-1.x-dev » 5.x-1.x-dev
Category: support » task
Status: Closed (fixed) » Active
mikeytown2’s picture

Assigned: batman1983 » Unassigned
Status: Active » Needs review
SysOp’s picture

Category: task » bug
Status: Closed (fixed) » Needs review

This is true you can use it for your site. I did it on my: http://www.maniata.net/

mikeytown2’s picture

mikeytown2’s picture

Status: Needs review » Needs work
mikeytown2’s picture

Status: Needs work » Closed (fixed)

Closing all 5.x issues; will only reevaluate if someone steps up #454652: Looking for a co-maintainer - 5.x

Reason is 6.x has 10x as many users as 5.x; also last 5.x dev was over a year ago. The 5.x issue queue needs to go.

bgm’s picture

Issue summary: View changes
Status: Needs review » Closed (fixed)

Not sure why this is showing as 'needs review', when the last comment mentions it was closed.