Active
Project:
Boost
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
15 Nov 2013 at 00:42 UTC
Updated:
25 Nov 2013 at 22:24 UTC
URLs with commas in the query string aren't delivered as boosted. For instance, pager queries look like
http://x.com/node?page=0,1
This causes the page to always be rebuilt and never delivered as boosted.
Here's the watchdog entry.
Array
(
[scheme] => http
[host] => forbin
[path] => actual-proof-2011-06-25
[query] => page=0%2C1
[full_path] => actual-proof-2011-06-25
[base_path] => /gongawaremedia/
[query_array] => Array
(
[page] => 0,1
)
[query_extra] =>
[url_full] => forbin/gongawaremedia/actual-proof-2011-06-25_page=0%2C1
[url] => http://forbin/gongawaremedia/actual-proof-2011-06-25?page=0%2C1
[url_decoded] => /gongawaremedia/actual-proof-2011-06-25_page=0,1
[base_dir] => cache/normal/forbin/gongawaremedia/
[filename] => cache/normal/forbin/gongawaremedia/actual-proof-2011-06-25_page=0%2C1.html
[directory] => cache/normal/forbin/gongawaremedia
[normal_path] => node/68
[path_alias] => actual-proof-2011-06-25
[args] => Array
(
[0] => node
[1] => 68
[2] =>
)
[menu_item] => Array
(
[page_type] => node_gallery_gallery
[page_id] => 68
[path] => node/%
[load_functions] => Array
(
[1] => node_load
)
[to_arg_functions] =>
[access_callback] => node_access
[access_arguments] => a:2:{i:0;s:4:"view";i:1;i:1;}
[page_callback] => node
[delivery_callback] =>
[fit] => 2
[number_parts] => 2
[context] => 0
[tab_parent] =>
[tab_root] => node/%
[title] => Actual Proof - 2011-06-25
[title_callback] => node_page_title
[title_arguments] => a:1:{i:0;i:1;}
[theme_callback] =>
[theme_arguments] => Array
(
)
[type] => 6
[description] =>
[position] =>
[weight] => 0
[include_file] =>
[href] => node/68
[tab_root_href] => node/68
[tab_parent_href] =>
[options] => Array
(
)
[access] => 1
[localized_options] => Array
(
)
[original_map] => Array
(
[0] => node
[1] => 68
)
[status] => 200
[extra_arguments] =>
)
[is_cacheable] => 1
[header_info] => Array
(
[status] => 200 OK
[status-number] => 200
[content-type] => text/html; charset=utf-8
[content-type-basic] => text/html
[charset] => utf-8
[headers_sent] =>
)
[matched_header_info] => Array
(
[enabled] => 1
[gzip] => 1
[extension] => html
[lifetime_max] => 518400
[lifetime_min] => 0
[comment_start] => <!--
[comment_end] => -->
)
)
1. Use a "hack" in boost_exit() that creates two copies of the file. (See comment #4)
Comments
Comment #1
Anonymous (not verified) commented
Which module creates the paging? I am assuming that 0,1 is some kind of paging system where 1,2 would be the next one?
Comment #2
jaylotta commented
Views is creating the pager. I believe the encoding is that 0,1 maps to page 1 of pager ID 0.
Comment #3
jaylotta commented
If I manually change the filename in the boost cache directory and change the %2C to a comma, everything works fine.
I'm still trying to figure out how to make the boost module do this.
Comment #4
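jaylotta commented
I added this ugly hack to the boost_exit() code to fix it for me.
// Replace the URL-encoded comma (%2C) in the cache filename with a literal comma.
$_boost['filename'] = preg_replace('/%2C/', ',', $_boost['filename']);
// strpos() can return 0, so compare strictly against FALSE.
if (strpos($_boost['filename'], ',') !== FALSE) {
  // Write a second copy of the page under the decoded filename.
  boost_write_file($_boost['filename'], $data);
}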
Basically, I'm just making two copies of the file. YUCK!
Comment #5
Anonymous (not verified) commented
Could I have your PHP and Apache (or other web server) versions? Also, the preg_replace would probably be better replaced with urldecode(). If the file is not written twice, does it break any other part of boost, like the directory creation?
My concern is that boost has been around a long time, but Apache 2.4 changed some things in the rewrite handling and may be handling URLs differently. In that case it may be better if I craft a patch for "query string compatibility" as an option in the boost menu.
Comment #6
jaylotta commented
I'm using Apache 2.2.22 and PHP 5.4.21.
I've tried using the B option in the Boost .htaccess sections, and it still seems that Boost thinks the page should be rebuilt and that Apache is not rewriting the URL to find the undecoded URL.
Admittedly, I'm not the best person to debug this problem, as I'm neither an Apache rewrite expert nor a boost expert.
From the debug output I've tried, I haven't been able to figure out whether Apache is not matching the URL or Boost thinks the page should be rebuilt.
Comment #7
jaylotta commented
Sorry to not have answered your whole question, but no, if the page is not written twice everything works fine except that boost keeps rebuilding the page.
Comment #8
Anonymous (not verified) commented
What platform? Is this a live site issue or a Windows test? (I recently had someone testing on Windows, and the filename issue went away when the site was placed live.)
It appears that boost should be using the url_decoded or query_array variables from your watchdog entry. The continual page generation is "expected": although you've fixed the filename, all the logic behind it (is the page valid, should it have been expired?) is still going to report that the file with the %2C doesn't exist, which can be fixed quite easily.
I just want to be assured that fixing this is not going to have a negative impact on anything else, and I'm surprised that it has only just cropped up with Apache 2.2.x when the project has been established for a long time.
We've got the reason (incorrect query string parsing) and the fix, but not the underlying cause, which I'll have to investigate. If it is not found, an option will be added to the menu rather than changing the code base; otherwise there may be a lot of upset people when updating.
Comment #9
areke commented
Comment #10
jaylotta commented
This occurs on Linux and Windows servers, both test and live.
The other stuff you said is kinda Greek to me. Sorry for my limited understanding. I can definitely see, when I look at the code, why you think that this should be working. However, it remains that my ugly hack delivers the boosted page.
I wish I understood the Drupal page delivery mechanisms better so that I could help you.
Comment #11
bgm commented
It's weird because the query shows "page=0%2C1", but under Linux at least (although I doubt it's OS-specific), it shows "page=0,1".
Passing the string through urlencode() does encode "," as "%2C", so I'm not sure what's happening. Apache doesn't seem to expect %2C, and therefore doesn't find the file.
Just to be sure, are you using the latest version of Drupal core?
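The encoding behaviour described in this comment can be checked outside Drupal with plain PHP. The following is only a minimal sketch: the variable names are made up for illustration, and the values are taken from the report above and from the "été" example that appears later in this thread. It does not show what Boost does internally.

```php
<?php
// Illustrative values only; "0,1" is the pager value from the report.
$pager_value = '0,1';

// urlencode() escapes the comma as %2C.
$encoded = urlencode($pager_value);

// http_build_query() applies the same escaping to each value.
$query = http_build_query(array('page' => $pager_value));

// urldecode() restores the literal comma.
$decoded = urldecode($encoded);

// A non-ASCII path segment ("été" written as UTF-8 bytes) is escaped per byte.
$segment = rawurlencode("\xC3\xA9t\xC3\xA9");

echo implode("\n", array($encoded, $query, $decoded, $segment)), "\n";
```

This prints 0%2C1, page=0%2C1, 0,1 and %C3%A9t%C3%A9, which matches the encoded and decoded forms seen in the watchdog entry.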
Comment #12
Anonymous (not verified) commented
I've been trying to check off the Windows side of things because one of jaylotta's previous posts, #2135835: Node creation causes 'Undefined index: extension in boost_expire_cache()', indicated a Windows setup. I'm trying to dot all of the i's.
Having checked the Apache documentation for 2.2 and 2.4, the URLs are unescaped before checking (in both setups), so page=0,1 is valid (ref: B flag and having to re-encode rules for rewrite processing http://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_b ).
I may have the wrong end of the stick, but I believe that the string is not to be encoded for the file name, but decoded.
Comment #13
jaylotta commented
I just recently applied the 7.24 update, but the problem existed on 7.23 as well.
I think Phillip is onto it with this...
"I may have the wrong end of the stick but I believe that the string is not to be encoded for the file name, but decoded."
I'm pretty sure Apache is being asked for page=0,1, which is why I thought using the B option in the boost rewrite section would fix it and make Apache look for %2C, but I couldn't get it to work.
My hack pretty much does what Phillip is saying, and it seems to work.
Be aware though that if BOTH files are not present, then the page=0%2C1 file is rebuilt and a boost file is not delivered. This was the wrinkle that made me think the problem is not in Apache but somewhere in the way boost is interacting with the page delivery system. That was when I just wrote my little fix and gave up.
Comment #14
bgm commented
I agree we should isolate the specific environment where such a change is needed, so that it can be an admin setting (or automatic if it can be detected, and is a server issue, not a browser issue). @jaylotta: please post more info about your environment.
In the worst case, we should call str_replace() on the filename before it is written rather than duplicating the file.
Also, this issue is linked to non-ascii filenames too, such as:
http://example.org/été => %C3%A9t%C3%A9.html
This works fine with the standard rewrite rules:
RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
RewriteRule .* %{DOCUMENT_ROOT}/cache/normal/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
(i.e. to consider as a use-case to test if investigating the 'B' rewrite option)
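The str_replace() idea mentioned in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: example_boost_decode_commas() is a hypothetical helper, not a function that exists in Boost, and the filename is the one from the watchdog entry in the report. Where such a call would belong in boost_exit() is not settled in this thread.

```php
<?php
/**
 * Hypothetical helper (not part of Boost): turn URL-encoded commas in a
 * cache filename back into literal commas, leaving other escapes untouched.
 */
function example_boost_decode_commas($filename) {
  // str_ireplace() also catches a lowercase "%2c".
  return str_ireplace('%2C', ',', $filename);
}

// Filename taken from the watchdog entry in the report.
$filename = 'cache/normal/forbin/gongawaremedia/actual-proof-2011-06-25_page=0%2C1.html';
$decoded = example_boost_decode_commas($filename);

echo $decoded, "\n";
```

Only the comma is decoded here, rather than passing the whole name through urldecode(), because a full decode would also turn sequences such as %2F into "/" and change the directory structure of the cache path. Non-ASCII names like %C3%A9t%C3%A9.html are left exactly as they are.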
Comment #15
jaylotta commented
I use WAMP for testing and a shared Linux hosting service for the production environment. Both are running PHP 5.4 and Apache 2.2.
I am using the .htaccess rules boost supplied with the standard Drupal .htaccess file.
Let me know if you need more verbose information.
I've tried this on FF, Chrome, IE, Android. I've had testers from outside my network check it too. All have the same results.
I also tried with other non-ascii characters and had the same problem.
From what I can tell, Apache is looking for the page=0%2C1 file but Drupal is looking for the page=0,1 file and when it can't find it, hitting the boost_exit function. This is only my speculation though.