Posted by drumm on June 13, 2006 at 10:17pm
| Project: | Drupal core |
| Version: | 4.7.2 |
| Component: | search.module |
| Category: | bug report |
| Priority: | critical |
| Assigned: | drumm |
| Status: | closed (works as designed) |
Issue Summary
When you do a search containing '&', it is encoded as '%26' in the URL. mod_rewrite decodes this and puts anything after '&' in the query string. This patch encodes the '%' to '%25', giving us '%2526', a double-encoded ampersand.
Makes searches and other paths containing '&' work.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| common.inc.diff.txt | 783 bytes | Ignored: Check issue status. | None | None |
Comments
#1
The same problem exists for #. I was going to look into it, but I'd much rather prefer a fix in the rewrite rules rather than producing such ugly URLs. Unfortunately it might be hardcoded behaviour.
Shouldn't the comment say: "... to counter-act the extra decoding in mod_rewrite." ?
#2
I haven't looked at #. This is meant to solve &.
I spent some time reading over mod_rewrite rules and am fairly certain that this is not fixable within the confines of .htaccess. Here is the relevant bug over on Apache's side: http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c8 (doesn't seem too likely to be fixed.)
#3
You're right, it cannot be fixed in .htaccess. However, the patch on that Apache issue you linked to would not help. There is already a suitable RewriteMap to use to undo most of the damage (
escape), but in order to use it, you need a global RewriteMap directive in httpd.conf (not even per VirtualHost or Directory). That rules it out for us:# httpd.conf
RewriteMap escape int:escape
# .htaccess
RewriteRule ^(.*)$ index.php?q=${escape:$1} [L,QSA]
I updated that Apache Bugzilla report all the same. It's another shining example of why blindly applying text transformation functions is the best way to screw yourself over.
So, I did a test and aside from & and #, no other characters need to be escaped (*). Patch attached. I'm not sure why you had that odd (bool) cast and TRUE check. Isn't that what if ($var) implicitly does? I also merged the second str_replace into the first.
(*) For some values of 'no'.
#4
Looks okay to me. I was copying the clean URL test from earlier in the code, which I thought was a bit weird, but decided to be consistent with it. This is good too.
#5
<?php+ * - To avoid problems with mod_rewrites built-in unescaping, we double-escape
+ * ampersands and hashes, when clean URLs are used.
?>
Could we clarify 'problems' in the PHPdoc. It would be nice to be a little bit more specific (not verbose).
#6
How are these comments?
#7
Tested on a 4.7 install and works well.
#8
drumm: that's more clear, thanks.
I think "mod_rewrite's unescapes" should be "mod_rewrite unescapes" though (no "'s").
Feel free to commit.
#9
(I wonder how this affect IIS or Lighttpd, but I guess we'll figure that out ...)
#10
Committed to HEAD.
#11
#12
Hi,
why so much code, the patch at the attachment does the same with less lines?
#13
Creazion: your patch looks incomplete. You missed the '#'.
#14
Hi Dries,
sorry i made a wrong patch with diff on windows. The attached patch includes the '#'.
#15
- This misses ampersand encoding, which has the problem as #.
- The extra encoding should only happen when clean urls are on since this is a workaround for a bug in mod_rewrite.
- This needs to be a patch against HEAD.
#16
Creazion: your patch does not do the same as what was committed. The goal is to double encode ampersands and hashes. Yours does not encode them at all.
#17
drumm's patch was also committed to 4.7
#18
#19
Shouldn't this have looked more like:
function drupal_urlencode($text) {
if (variable_get('clean_url', '0')) {
return str_replace(array('%2F', '%26', '%23'),
array('/', '%2526', '%2523'),
urlencode($text));
}
else {
$text = str_replace('%2F', '/', urlencode($text));
return str_replace('%23', '#', urlencode($text));
}
}
Otherwise all the #'s are '%23, such as in the "login or register to post comments" links when clean urls is enabled.
#20
hmm... well that bit I posted above doesn't actually fix the problem, but on drupal.org, when you login from a link below a post, you get page not found because of the %23.
#21
changed this from task to bug since this committed patch created a bug in the login.
#22
Raising awareness (critical) for this is it's easily reproduced on d.o. and clearly broken.
#23
You guys are forgetting about "Named Anchors" double-encoded # mean you cant have a Named Anchor as a menu item.
I think when the requested page is the search page then double encode # but every where else leave #
<?phpfunction drupal_urlencode($text) {
$hash = (arg(0) == 'search' ? '%2523' : '#');
if (variable_get('clean_url', '0')) {
return str_replace(array('%2F', '%26', '%23'),
array('/', '%2526', $hash),
urlencode($text));
}
else {
return str_replace('%2F', '/', urlencode($text));
}
}
?>
The problem with this solution is that any Named Anchor menu links breaks on the search page and when clicked on goes to page not found. Maybe a better solution would be to look at replacing # with double-encoded # somehow only when it comes from the search form id=search_form and form id=search or in the search_menu function ('path' => 'search/'. $name . $keys).
#24
In 5.0, this code has been changed to:
<?phpfunction drupal_urlencode($text) {
if (variable_get('clean_url', '0')) {
return str_replace(array('%2F', '%26', '%23'),
array('/', '%2526', '%2523'),
urlencode($text));
}
else {
return str_replace('%2F', '/', urlencode($text));
}
}
?>
Does this need to be backported? Is there a problem still? Are we sure this is not just a case of some places lacking a call to drupal_urlencode() ?
#25
This patch makes it possible to both search for special characters (#,&) and allow named anchors in menu items (node/*#name). It is not very elegantly implemented and I suggest somebody that understands the workings of the core modules better than me look to implement it in a better way. But you can get the general idea of what I was trying to achieve.
This is the first patch I have created that involves multiple files, hope I did it correctly.
I hope you find it useful.
#26
This patch is a completely wrong approach to solving this issue. Search.module should receive no special treatment whatsoever.
Again: is there still a bug that needs to be addressed in the latest 4.7 release?
#27
Yes, this is an issue and can be seen now on drupal.org if you log out. The "login to comment" links look like:
http://drupal.org/user/login?destination=comment/reply/88728%2523comment...
Where the '#' has been encoded
#28
Instead of trying to change the drupal_urlencode function maybe we should be focusing on the search module as it seams that is where the problem lies when it comes to special characters.
Here is another go.
<?phpfunction drupal_urlencode($text) {
$search = array('%2F', '%23');
$replace = array('/', '#');
if (variable_get('clean_url', '0')) {
$search[] = '%26';
$replace[] ='%2526';
}
return str_replace($search, $replace, urlencode($text));
}
?>
But all these solutions means their is code in the drupal_urlencode function just for the search module maybe we should focus on the data the search module receives rather than all the current solutions.
<?phpfunction drupal_urlencode($text) {
return str_replace(array('%2F', '%23'), array('/', '#'), urlencode($text));
}
?>
#29
Did you bother to actually take a look at the URL in question?
http://drupal.org/user/login?destination=comment/reply/88728%2523comment...
This means: We are on the "user/login" page. After we finish logging in, we want to proceed to "comment/reply/88728#comment...". In other words, the #comment fragment identifier is part of the destination value, not part of the normal URL. If it was not escaped, it would be ignored by PHP.
By design.
#30
see:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12
That works for us in Gallery 2. We're using THE_REQUEST for a long time now, with success.
e.g.
RewriteCond %{THE_REQUEST} /gallery2/tag/([^?/]+)RewriteRule . /gallery2/main.php?g2_view=tags.VirtualAlbum&g2_tagName=%1 [QSA,L]
#31
Reverting title