Download & Extend

Handle ampersands in search queries and other URLs when clean URLs are on

Project:Drupal core
Version:4.7.2
Component:search.module
Category:bug report
Priority:critical
Assigned:drumm
Status:closed (works as designed)

Issue Summary

When you do a search containing '&', it is encoded as '%26' in the URL. mod_rewrite decodes this and puts anything after '&' in the query string. This patch encodes the '%' to '%25', giving us '%2526', a double-encoded ampersand.

Makes searches and other paths containing '&' work.

AttachmentSizeStatusTest resultOperations
common.inc.diff.txt783 bytesIgnored: Check issue status.NoneNone

Comments

#1

The same problem exists for #. I was going to look into it, but I'd much rather prefer a fix in the rewrite rules rather than producing such ugly URLs. Unfortunately it might be hardcoded behaviour.

Shouldn't the comment say: "... to counter-act the extra decoding in mod_rewrite." ?

#2

I haven't looked at #. This is meant to solve &.

I spent some time reading over mod_rewrite rules and am fairly certain that this is not fixable within the confines of .htaccess. Here is the relevant bug over on Apache's side: http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c8 (doesn't seem too likely to be fixed.)

#3

You're right, it cannot be fixed in .htaccess. However, the patch on that Apache issue you linked to would not help. There is already a suitable RewriteMap to use to undo most of the damage (escape), but in order to use it, you need a global RewriteMap directive in httpd.conf (not even per VirtualHost or Directory). That rules it out for us:

# httpd.conf
RewriteMap escape int:escape

# .htaccess
RewriteRule ^(.*)$ index.php?q=${escape:$1} [L,QSA]

I updated that Apache Bugzilla report all the same. It's another shining example of why blindly applying text transformation functions is the best way to screw yourself over.

So, I did a test and aside from & and #, no other characters need to be escaped (*). Patch attached. I'm not sure why you had that odd (bool) cast and TRUE check. Isn't that what if ($var) implicitly does? I also merged the second str_replace into the first.

(*) For some values of 'no'.

AttachmentSizeStatusTest resultOperations
mod_rewrite.sucks.patch1.16 KBIgnored: Check issue status.NoneNone

#4

Status:needs review» reviewed & tested by the community

Looks okay to me. I was copying the clean URL test from earlier in the code, which I thought was a bit weird, but decided to be consistent with it. This is good too.

#5

<?php
+ * - To avoid problems with mod_rewrites built-in unescaping, we double-escape
+ *   ampersands and hashes, when clean URLs are used.
?>

Could we clarify 'problems' in the PHPdoc. It would be nice to be a little bit more specific (not verbose).

#6

How are these comments?

AttachmentSizeStatusTest resultOperations
common.inc.diff_0.txt1.28 KBIgnored: Check issue status.NoneNone

#7

Tested on a 4.7 install and works well.

#8

drumm: that's more clear, thanks.

I think "mod_rewrite's unescapes" should be "mod_rewrite unescapes" though (no "'s").

Feel free to commit.

#9

(I wonder how this affect IIS or Lighttpd, but I guess we'll figure that out ...)

#10

Status:reviewed & tested by the community» fixed

Committed to HEAD.

#11

Status:fixed» closed (fixed)

#12

Version:<none>» 4.7.2
Status:closed (fixed)» needs review

Hi,

why so much code, the patch at the attachment does the same with less lines?

AttachmentSizeStatusTest resultOperations
common.inc_27.patch439 bytesIgnored: Check issue status.NoneNone

#13

Category:bug report» task
Status:needs review» fixed

Creazion: your patch looks incomplete. You missed the '#'.

#14

Status:fixed» needs review

Hi Dries,

sorry i made a wrong patch with diff on windows. The attached patch includes the '#'.

AttachmentSizeStatusTest resultOperations
common.inc.2.patch421 bytesIgnored: Check issue status.NoneNone

#15

Status:needs review» needs work

- This misses ampersand encoding, which has the problem as #.
- The extra encoding should only happen when clean urls are on since this is a workaround for a bug in mod_rewrite.
- This needs to be a patch against HEAD.

#16

Status:needs work» closed (won't fix)

Creazion: your patch does not do the same as what was committed. The goal is to double encode ampersands and hashes. Yours does not encode them at all.

#17

Status:closed (won't fix)» fixed

drumm's patch was also committed to 4.7

#18

Status:fixed» closed (fixed)

#19

Version:4.7.2» 4.7.3
Status:closed (fixed)» active

Shouldn't this have looked more like:

function drupal_urlencode($text) {
if (variable_get('clean_url', '0')) {
return str_replace(array('%2F', '%26', '%23'),
array('/', '%2526', '%2523'),
urlencode($text));
}
else {
$text = str_replace('%2F', '/', urlencode($text));
return str_replace('%23', '#', urlencode($text));

}
}

Otherwise all the #'s are '%23, such as in the "login or register to post comments" links when clean urls is enabled.

#20

hmm... well that bit I posted above doesn't actually fix the problem, but on drupal.org, when you login from a link below a post, you get page not found because of the %23.

#21

Category:task» bug report

changed this from task to bug since this committed patch created a bug in the login.

#22

Priority:normal» critical

Raising awareness (critical) for this is it's easily reproduced on d.o. and clearly broken.

#23

You guys are forgetting about "Named Anchors" double-encoded # mean you cant have a Named Anchor as a menu item.

I think when the requested page is the search page then double encode # but every where else leave #

<?php
function drupal_urlencode($text) {
 
$hash = (arg(0) == 'search' ? '%2523' : '#');
  if (
variable_get('clean_url', '0')) {
    return
str_replace(array('%2F', '%26', '%23'),
                       array(
'/', '%2526', $hash),
                      
urlencode($text));
  }
  else {
    return
str_replace('%2F', '/', urlencode($text));
  }
}
?>

The problem with this solution is that any Named Anchor menu links breaks on the search page and when clicked on goes to page not found. Maybe a better solution would be to look at replacing # with double-encoded # somehow only when it comes from the search form id=search_form and form id=search or in the search_menu function ('path' => 'search/'. $name . $keys).

#24

In 5.0, this code has been changed to:

<?php
function drupal_urlencode($text) {
  if (
variable_get('clean_url', '0')) {
    return
str_replace(array('%2F', '%26', '%23'),
                       array(
'/', '%2526', '%2523'),
                      
urlencode($text));
  }
  else {
    return
str_replace('%2F', '/', urlencode($text));
  }
}
?>

Does this need to be backported? Is there a problem still? Are we sure this is not just a case of some places lacking a call to drupal_urlencode() ?

#25

Status:active» needs work

This patch makes it possible to both search for special characters (#,&) and allow named anchors in menu items (node/*#name). It is not very elegantly implemented and I suggest somebody that understands the workings of the core modules better than me look to implement it in a better way. But you can get the general idea of what I was trying to achieve.

This is the first patch I have created that involves multiple files, hope I did it correctly.
I hope you find it useful.

AttachmentSizeStatusTest resultOperations
url_special_char.patch2.45 KBIgnored: Check issue status.NoneNone

#26

Status:needs work» active

This patch is a completely wrong approach to solving this issue. Search.module should receive no special treatment whatsoever.

Again: is there still a bug that needs to be addressed in the latest 4.7 release?

#27

Yes, this is an issue and can be seen now on drupal.org if you log out. The "login to comment" links look like:

http://drupal.org/user/login?destination=comment/reply/88728%2523comment...

Where the '#' has been encoded

#28

Instead of trying to change the drupal_urlencode function maybe we should be focusing on the search module as it seams that is where the problem lies when it comes to special characters.

Here is another go.

<?php
function drupal_urlencode($text) {
 
$search = array('%2F', '%23');
 
$replace = array('/', '#');
  if (
variable_get('clean_url', '0')) {
    
$search[] = '%26';
    
$replace[] ='%2526';
  }
  return
str_replace($search, $replace, urlencode($text));
}
?>

But all these solutions means their is code in the drupal_urlencode function just for the search module maybe we should focus on the data the search module receives rather than all the current solutions.

<?php
function drupal_urlencode($text) {
  return
str_replace(array('%2F', '%23'), array('/', '#'), urlencode($text));
}
?>

#29

Status:active» closed (works as designed)

Did you bother to actually take a look at the URL in question?

http://drupal.org/user/login?destination=comment/reply/88728%2523comment...

This means: We are on the "user/login" page. After we finish logging in, we want to proceed to "comment/reply/88728#comment...". In other words, the #comment fragment identifier is part of the destination value, not part of the normal URL. If it was not escaped, it would be ignored by PHP.

By design.

#30

Title:Handle ampersands in search queries and other URLs when clean URLs are on» Use THE_REQUEST

see:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12

If you need the unescaped uri with all its consequences, use the ENV
THE_REQUEST, which contains the full untouched request string like
GET /foo%20bar?foo=bar HTTP/1.1

That works for us in Gallery 2. We're using THE_REQUEST for a long time now, with success.

e.g.

    RewriteCond %{THE_REQUEST} /gallery2/tag/([^?/]+)
    RewriteRule .   /gallery2/main.php?g2_view=tags.VirtualAlbum&g2_tagName=%1   [QSA,L]

#31

Title:Use THE_REQUEST» Handle ampersands in search queries and other URLs when clean URLs are on

Reverting title

nobody click here