Cache pages with query strings

moshe weitzman - October 11, 2007 - 16:03
Project:Boost
Version:5.x-1.x-dev
Component:Code
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work
Description

a few of us discussed boost's refusal to cache pages with querystrings. we agreed that querystrings must never be used to modify data (per the http spec) and nothing is wrong with caching them as usual. did we miss something?

#1

bjaspan - October 11, 2007 - 16:07

Subscribe. I was one of "a few of us" that discussed this. ISTM that "http://site/?page=1" (i.e. paged views) ought to be cached just like the first page.

#2

Hetta - October 11, 2007 - 18:00

+1 agreed.

#3

firebus - October 26, 2007 - 21:59

agreed, and as i have an immediate need to cache comment pages (?page=1 style query strings) i'm going to write a patch.

it should be a straightforward extension of the existing code and htaccess. does anyone have advice on how to format the string? it seems like i just need to encode the ? and & in some way and append it to the file name before the extension...

#4

firebus - October 27, 2007 - 01:47

that didn't work out quite as i'd planned.

i really wanted to escape the = and & from the query string somehow.

but i couldn't find a mod_rewrite way to do that, without using the external program type of RewriteMap which seemed a little heavyweight.

so i decided to go with the actual query string appended to the path, so my cache files have names like:

  cache/server/0/node/123_page=1&something=else.html

it turns out that boost also likes to escape special characters, so preg_replace in boost_file_path needs to be changed to allow & and = through.

i added an if statement in boost_init() to cache URLs with query strings. i think my routine to convert $_GET into a string is probably lame and someone could do a better job with $_SERVER['QUERY_STRING'] and one of the php string functions.

finally, i added a stanza to htaccess to handle rewrites for URLs with query strings.

there's probably a lot wrong with this, i haven't tested it very well, and i don't mean to step on anyone's toes, but i wanted to share.

finally, i should note that, with comments, if users are allowed to set comment display settings (thread/flat/asc/desc/etc) then $page=1 doesn't display the same thing for all users.

boost is already broken for this on the pages it does cache, and now it's broken on pages with query strings as well. whatever settings are in effect when the page is cached will apply to all anon users.

i just submitted a patch to advcache for comments that keeps separate cache files for users with different comment display settings and it's possible that the same logic could work here. however, i think i will just find a way to disable the settings for anonymous users for now.

thanks!

AttachmentSize
boost_querystring.patch 2.6 KB

#5

firebus - November 25, 2007 - 20:37
Status:active» needs review

updating status

#6

Crimson - November 27, 2007 - 14:16
Status:needs review» reviewed & tested by the community

Works good.

#7

firebus - April 5, 2008 - 16:36
Assigned to:moshe weitzman» firebus
Status:reviewed & tested by the community» needs work

there's a little issue with this

the line
RewriteCond %{QUERY_STRING} ^(.*)$
matches even when the query string is empty.

there are some (bad) urls that can generate a cache file which will then be returned in place of a valid url due to boost's string rewriting rules.

let's say a user tries to access http://www.example.com/node\
boost transforms the backslash to an underscore
boost creates the cache file node_.html
next a user tries to access http://www.example.org.com/node
boost returns node_.html - page not found - instead of the actual node page.

fix is simple. change the RewriteCond to:
RewriteCond %{QUERY_STRING} ^(.+)$

i'll make a new patch for this...

#8

moshe weitzman - October 29, 2008 - 14:28

@Arto and others - what do you think of caching these requests, and of this patch in specific?

#9

moshe weitzman - October 29, 2008 - 14:28
Category:bug report» feature request

#10

Arto - October 29, 2008 - 14:44
Title:cache pages with querystrings» Cache pages with query strings

@Moshe, not sure I will have a chance to look at this myself anytime soon, but would consider committing it if it gets moved to RTBC status.

#11

moshe weitzman - October 29, 2008 - 14:47
Version:5.x-1.x-dev» 6.x-1.x-dev

thanks arto ... has to go into d6 first.

hopefully firebus or other person has time to revive this.

#12

alex s - October 30, 2008 - 20:48

@firebus, nice work

But you forgot about multiple pagers on one page, which uses comma separation, so:

$path = preg_replace('@[^/a-z0-9_\-&=,]@i', '_', $path);

#13

moshe weitzman - October 30, 2008 - 21:35

There is code in boost_exit() that adds a querstring in order to defeat caching. We need a different way to signal that the following page should not be cached.

<?php
// Check that the page we're redirecting to really necessitates
      // special handling, i.e. it doesn't have a query string:
     
extract(parse_url($destination));
     
$path = ($path == base_path() ? '' : substr($path, strlen(base_path())));
      if (empty(
$query)) {
       
// FIXME: call any remaining exit hooks since we're about to terminate?

        // If no query string was previously set, add one just to ensure we
        // don't serve a static copy of the page we're redirecting to, which
        // would prevent the session messages from showing up:
       
$destination = url($path, 't=' . time(), $fragment, TRUE);
?>

#14

alex s - November 1, 2008 - 19:54

moshe, i solved this problem

AttachmentSize
boost_querystring.patch 4.69 KB

#15

chirale - November 3, 2008 - 12:23

If pager is the problem, Clean Pagination solves it on Drupal 5.x, changing the pager links into something like "frontpage/page/2" (Boost friendly). This patch can be anyway a good improvement to boost.

#16

christefano - December 4, 2008 - 07:53
Status:needs work» needs review

#14 needs review.

#17

george@dynapres.nl - December 9, 2008 - 11:19

subscribe

#18

batman1983 - December 14, 2008 - 17:45
Version:6.x-1.x-dev» 5.x-1.0
Component:Apache integration» Code
Category:feature request» support request
Assigned to:firebus» batman1983

Hey,
i installed the second patch and the files get stored as 2217_seite=0.1.html e.g. (in folder: node)
There is a little problem left. Every node creates a link to the node_number.html e.g.: artikel/2008/review1.html is a link to node/1122
The querystring pages like artikel/2008/review1_page_1 doesn't get saved in the folder artikel/2008. How can i fix this?
I'm sorry for my bad english.
Best regards from germany,
Batman1983

#19

nbchip - January 20, 2009 - 10:50

subscribe

#20

alex s - January 21, 2009 - 17:05

I rewrote the patch, so it should support path aliases.
There are still 2 small issues:
1) Wildcart is not implemented yet in boost_cache_expire(), so pages with nonempty query will be expired only by hook_cron().
2) Boost block doesn't distinguish pages with same path and different queries.

AttachmentSize
boost_querystring.patch 5.15 KB

#21

alex s - January 21, 2009 - 17:07
Version:5.x-1.0» 6.x-1.x-dev

Forgot to say, the patch is fo for 6.x-1.x-dev version

#22

nbchip - January 22, 2009 - 06:05

Is there any chance for 5.x version of patch?

#23

alex s - January 22, 2009 - 09:33

Yes, but it would be a good idea to test the patch for 6 version, that I would not have to do same work twice

#24

mikeytown2 - January 23, 2009 - 22:17

using 6.x
Patch works for me

btw couldn't you use a URL Alias for myview?page=x and set it to myview/pagex ?

#25

alex s - January 24, 2009 - 11:36

mikeytown2, maybe this topic will help you: http://drupal.org/node/59362

#26

teri@uhaaa.com - January 25, 2009 - 15:35

All the day I try to implement this patch, but without success. The other pages of the node are cached, but not served. :( Maybe I am wrong with the htaccess?

#27

alex s - January 25, 2009 - 18:43

Have you replaced your .htaccess with new boosted1.txt ?

#28

teri@uhaaa.com - January 25, 2009 - 18:52

Yes, everything, but my drupal is install in subfolder blog, so I edited my htaccess in this way:
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/cache
RewriteCond %{REQUEST_URI} !^/user/login
RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/blog/cache/uhaaa.com/$1_%{QUERY_STRING}.html -f
RewriteRule ^(.*)$ cache/uhaaa.com/$1_%{QUERY_STRING}.html [L]

Used the info from previous issue about subfolder installation (don't remember the name of the contributor).
In this manner everything works. I've comented and the line with the convertation of the symbols to _.
Again I've created and edition that catch the ?page=1 (pages of the first page)

RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/cache
RewriteCond %{REQUEST_URI} !^/user/login
RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/blog/cache/uhaaa.com/index_%{QUERY_STRING}.html -f
RewriteRule ^(.*)$ cache/uhaaa.com/index_%{QUERY_STRING}.html [L]
Now everything works fine. The only problem I have is how to make the URL Aliases language independent, because they are associated with "Bulgarian" language in default and I edit them by hand (after bulk edit in DB).
You can see the working site here http://uhaaa.com/blog

#29

nbchip - February 19, 2009 - 15:59

Any progress on patch for 5.x?
thx.

#30

dicreat - February 24, 2009 - 18:20

Also need patch for 5.x.
thx.

#31

DamienMcKenna - March 11, 2009 - 20:20

Please give the attached file a spin with D5, based on the current 5.x dev release.

AttachmentSize
boost-n182687-querystring_d5.patch 4.48 KB

#32

DamienMcKenna - March 11, 2009 - 20:33

firebus & alex_s: a hug thanks for the patches, they worked a treat and were simple to backport.

#33

EvanDonovan - March 18, 2009 - 18:53

DamienMcKenna, thanks for the 5.x patch - it doesn't seem to work on 5.x-1.0, however. Is there anything I would need to change to get it to work on 5.x-1.0?

#34

EvanDonovan - March 18, 2009 - 20:01

Note that I changed the htaccess rewrite rules as follows:

RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}_%{QUERY_STRING}.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1_%{QUERY_STRING}.html [L]

(since the 1.0 version has a path of /cache/server_name/0/filename.html)

Even with this change, no pages were loading through the Boost cache, though they were being created.

#35

dbeall - March 22, 2009 - 20:11

applies to cache path with htaccess.. This is not the right way, but works on single domain. Change server_name to the cached folder_name. In my case, the cache mod produced a folder named /cache/bexleymarine.com/
RewriteRule ^(.*)$ cache/bexleymarine.com/0/$1_%{QUERY_STRING}.html [L]

#36

mikeytown2 - March 22, 2009 - 21:10

@dbeall
I think there could be a lot of issues in regards to running boost on a multi site. Cron clearing the static cache comes to mind.

#37

dbeall - March 22, 2009 - 23:06

@mikeytown2
Off Topic: I know from the issue ques that you have done a lot of work on boost. You going to help get a new release rolled up for us dummes? The mod works fairly good after you find all the patches.

#38

mikeytown2 - April 7, 2009 - 01:17
Status:needs review» fixed

committed to 6.x dev

#39

System Message - April 21, 2009 - 01:20
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

#40

mikeytown2 - May 5, 2009 - 22:00
Version:6.x-1.x-dev» 5.x-1.x-dev
Category:support request» task
Status:closed» active

#31
http://drupal.org/node/182687#comment-1344568

#41

mikeytown2 - May 5, 2009 - 22:01
Assigned to:batman1983» Anonymous
Status:active» needs review

#42

SysOp - September 23, 2009 - 11:53
Category:task» bug report

This is true you can use it for your site. I did it on my: http://maniata.net/

#43

mikeytown2 - September 23, 2009 - 18:51
Category:bug report» task

#44

mikeytown2 - October 4, 2009 - 04:49
Status:needs review» needs work
 
 

Drupal is a registered trademark of Dries Buytaert.