Handle ampersands in search queries and other URLs when clean URLs are on [#68886]

Comment	File	Size	Author
#25	url_special_char.patch	2.45 KB	NaX
#14	common.inc.2.patch	421 bytes	Creazion
#12	common.inc_27.patch	439 bytes	Creazion
#6	common.inc.diff_0.txt	1.28 KB	drumm
#3	mod_rewrite.sucks.patch	1.16 KB	Steven
	common.inc.diff.txt	783 bytes	drumm

Comment #1

Steven commented 14 June 2006 at 06:28

The same problem exists for #. I was going to look into it, but I would much prefer a fix in the rewrite rules to producing such ugly URLs. Unfortunately, it might be hard-coded behaviour.

Shouldn't the comment say: "... to counter-act the extra decoding in mod_rewrite." ?

Comment #2

drumm commented 21 June 2006 at 04:10

I haven't looked at #. This is meant to solve &.

I spent some time reading over the mod_rewrite rules and am fairly certain that this is not fixable within the confines of .htaccess. Here is the relevant bug on Apache's side: http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c8 (it doesn't seem likely to be fixed).

Comment #3

Steven commented 21 June 2006 at 13:34

Status	File	Size
new	mod_rewrite.sucks.patch	1.16 KB

You're right, it cannot be fixed in .htaccess. However, the patch on the Apache issue you linked to would not help. There is already a suitable RewriteMap (escape) that undoes most of the damage, but to use it you need a RewriteMap directive in the server configuration (httpd.conf or a VirtualHost; it is not allowed in .htaccess or per Directory). That rules it out for us:

# httpd.conf
RewriteMap escape int:escape

# .htaccess
RewriteRule ^(.*)$ index.php?q=${escape:$1} [L,QSA]
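A rough way to see the damage from PHP's side is to simulate the rewrite. This is a sketch under simplifying assumptions, not Apache's or Drupal's code: `simulate_rewrite()` is a hypothetical helper, `rawurldecode()` stands in for mod_rewrite unescaping the matched path before substituting it into `q=`, and `rawurlencode()` stands in for the `int:escape` map, whose exact character set may differ.

```php
<?php
// Hypothetical model of: RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
// Assumption: mod_rewrite hands $1 to the substitution already unescaped,
// which rawurldecode() approximates.
function simulate_rewrite(string $requested_path, bool $use_escape_map): array {
  $captured = rawurldecode($requested_path);
  if ($use_escape_map) {
    // Assumption: ${escape:$1} re-encodes special characters;
    // rawurlencode() stands in for Apache's int:escape here.
    $captured = rawurlencode($captured);
  }
  // PHP builds $_GET from the rewritten query string.
  parse_str('q=' . $captured, $params);
  return $params;
}

// Plain .htaccess rule: the decoded "&" splits q into two parameters.
print_r(simulate_rewrite('search/node/rock%26roll', false));

// With the httpd.conf escape map: q survives intact.
print_r(simulate_rewrite('search/node/rock%26roll', true));
```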

I updated that Apache Bugzilla report all the same. It's another shining example of why blindly applying text transformation functions is the best way to screw yourself over.

So, I did a test and aside from & and #, no other characters need to be escaped (*). Patch attached. I'm not sure why you had that odd (bool) cast and TRUE check. Isn't that what if ($var) implicitly does? I also merged the second str_replace into the first.

(*) For some values of 'no'.
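The double-escaping idea can be sketched as follows, assuming mod_rewrite unescapes the path exactly once (modelled here with `rawurldecode()`) before PHP decodes the query string again. `double_escape_path()` and `simulate_rewrite()` are hypothetical names; this illustrates the technique rather than reproducing the patch itself.

```php
<?php
// Hypothetical helper in the spirit of the patch: urlencode(), keep "/"
// readable, and double-escape "&" and "#" so that one round of
// unescaping by mod_rewrite still leaves them escaped.
function double_escape_path(string $text): string {
  return str_replace(
    array('%2F', '%26', '%23'),
    array('/', '%2526', '%2523'),
    urlencode($text)
  );
}

// Assumption: mod_rewrite unescapes the path once (rawurldecode()), then
// RewriteRule ^(.*)$ index.php?q=$1 hands it to PHP, which decodes again.
function simulate_rewrite(string $requested_path): array {
  parse_str('q=' . rawurldecode($requested_path), $params);
  return $params;
}

$path = double_escape_path('search/node/rock&roll');
echo $path, "\n";                  // the path as it appears in the link
print_r(simulate_rewrite($path));  // q comes back as the original text
```

The extra `%25` survives mod_rewrite's single round of unescaping, so PHP receives `%26` or `%23` and decodes it back to the original character.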

Comment #4

drumm commented 22 June 2006 at 08:57

Status:	Needs review	» Reviewed & tested by the community

Looks okay to me. I was copying the clean URL test from earlier in the code, which I thought was a bit weird, but decided to be consistent with it. This is good too.

Comment #5

dries commented 22 June 2006 at 12:30

+ * - To avoid problems with mod_rewrite's built-in unescaping, we double-escape
+ *   ampersands and hashes, when clean URLs are used.

Could we clarify 'problems' in the PHPdoc? It would be nice to be a little more specific (not verbose).

Comment #6

drumm commented 23 June 2006 at 08:48

Status	File	Size
new	common.inc.diff_0.txt	1.28 KB

How are these comments?

Comment #7

drumm

he/him

NY, US

commented 23 June 2006 at 22:32

Tested on a 4.7 install and works well.

Comment #8

dries commented 25 June 2006 at 15:03

drumm: that's more clear, thanks.

I think "mod_rewrite's unescapes" should be "mod_rewrite unescapes" though (no "'s").

Feel free to commit.

Comment #9

dries commented 25 June 2006 at 15:04

(I wonder how this affects IIS or Lighttpd, but I guess we'll figure that out ...)

Comment #10

drumm commented 2 July 2006 at 01:20

Status:	Reviewed & tested by the community	» Fixed

Committed to HEAD.

Comment #11

(not verified) commented 16 July 2006 at 01:34

Status:	Fixed	» Closed (fixed)

Comment #12

Creazion commented 22 July 2006 at 12:40

Version:		» 4.7.2
Status:	Closed (fixed)	» Needs review

Status	File	Size
new	common.inc_27.patch	439 bytes

Hi,

Why so much code? The attached patch does the same with fewer lines.

Comment #13

dries commented 22 July 2006 at 15:56

Category:	bug	» task
Status:	Needs review	» Fixed

Creazion: your patch looks incomplete. You missed the '#'.

Comment #14

Creazion commented 22 July 2006 at 20:07

Status:	Fixed	» Needs review

Status	File	Size
new	common.inc.2.patch	421 bytes

Hi Dries,

Sorry, I made a bad patch with diff on Windows. The attached patch includes the '#'.

Comment #15

drumm commented 22 July 2006 at 21:24

Status:	Needs review	» Needs work

- This misses ampersand encoding, which has the same problem as #.
- The extra encoding should only happen when clean URLs are on, since this is a workaround for a bug in mod_rewrite.
- This needs to be a patch against HEAD.
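The point about restricting the extra encoding to clean URLs can be illustrated with a small sketch. Without clean URLs the path travels in a real query string (`index.php?q=...`), so nothing unescapes it before PHP does, and double-escaping would leave a literal `%26` behind. `simulate_plain_request()` is a hypothetical helper, not Drupal code.

```php
<?php
// Hypothetical model of a non-clean URL: the path travels in a real query
// string (index.php?q=...), so only PHP's own decoding applies.
function simulate_plain_request(string $encoded_q): array {
  parse_str('q=' . $encoded_q, $params);
  return $params;
}

// Single escaping: urlencode() with "/" kept readable.
$single = str_replace('%2F', '/', urlencode('search/node/rock&roll'));
// Double escaping, as used for the clean-URL case.
$double = str_replace('%26', '%2526', $single);

print_r(simulate_plain_request($single)); // q is the original text
print_r(simulate_plain_request($double)); // q keeps a stray "%26"
```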

Comment #16

Steven commented 22 July 2006 at 21:32

Status:	Needs work	» Closed (won't fix)

Creazion: your patch does not do the same as what was committed. The goal is to double encode ampersands and hashes. Yours does not encode them at all.

Comment #17

killes@www.drop.org commented 6 August 2006 at 18:05

Status:	Closed (won't fix)	» Fixed

drumm's patch was also committed to 4.7

Comment #18

(not verified) commented 20 August 2006 at 18:15

Status:	Fixed	» Closed (fixed)

Comment #19

onionweb commented 21 August 2006 at 15:58

Version:	4.7.2	» 4.7.3
Status:	Closed (fixed)	» Active

Shouldn't this have looked more like:

Otherwise all the #'s become %23, such as in the "login or register to post comments" links when clean URLs are enabled.

Comment #20

onionweb commented 21 August 2006 at 16:16

Hmm... the bit I posted above doesn't actually fix the problem, but on drupal.org, when you log in from a link below a post, you get "page not found" because of the %23.

Comment #21

onionweb commented 22 August 2006 at 11:30

Category:	task	» bug

Changed this from task to bug, since the committed patch created a bug in the login links.

Comment #22

AjK commented 18 September 2006 at 10:55

Priority:	Normal	» Critical

Raising the priority to critical for awareness, as this is easily reproduced on d.o. and clearly broken.

Comment #23

NaX commented 28 September 2006 at 16:16

You're forgetting about named anchors: double-encoding # means you can't have a named anchor as a menu item.

I think # should be double-encoded only when the requested page is the search page, and left alone everywhere else:

function drupal_urlencode($text) {
  $hash = (arg(0) == 'search' ? '%2523' : '#');
  if (variable_get('clean_url', '0')) {
    return str_replace(array('%2F', '%26', '%23'),
                       array('/', '%2526', $hash),
                       urlencode($text));
  }
  else {
    return str_replace('%2F', '/', urlencode($text));
  }
}

The problem with this solution is that any named-anchor menu link breaks on the search page and leads to "page not found" when clicked. Maybe a better solution would be to replace # with a double-encoded # only when it comes from the search forms (form id=search_form and form id=search) or in the search_menu function ('path' => 'search/'. $name . $keys).
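The named-anchor breakage follows from the clean-URL branch of `drupal_urlencode()` as quoted in this thread. A sketch, assuming a hypothetical menu path `node/1#section` and a single round of unescaping by mod_rewrite (modelled with `rawurldecode()`); `clean_url_encode()` is a hypothetical stand-in for that branch, not the real function.

```php
<?php
// Hypothetical copy of the clean-URL branch of drupal_urlencode() as
// quoted in this thread: "#" is double-escaped along with "&".
function clean_url_encode(string $text): string {
  return str_replace(array('%2F', '%26', '%23'),
                     array('/', '%2526', '%2523'),
                     urlencode($text));
}

$href = clean_url_encode('node/1#section');

// The browser sees no literal "#", so there is no fragment: the whole
// string is requested as a path.
var_dump(parse_url('/' . $href, PHP_URL_FRAGMENT));

// Assumption: mod_rewrite unescapes once, then PHP decodes q once more.
parse_str('q=' . rawurldecode($href), $params);
echo $params['q'], "\n"; // "#section" is now part of the path itself
```

Because `#` is escaped, the browser never treats `section` as a fragment; the whole string ends up in `q`, which presumably matches no menu path and produces the "page not found" described above.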

Comment #24

Steven commented 29 September 2006 at 15:48

In 5.0, this code has been changed to:

function drupal_urlencode($text) {
  if (variable_get('clean_url', '0')) {
    return str_replace(array('%2F', '%26', '%23'),
                       array('/', '%2526', '%2523'),
                       urlencode($text));
  }
  else {
    return str_replace('%2F', '/', urlencode($text));
  }
}

Does this need to be backported? Is there a problem still? Are we sure this is not just a case of some places lacking a call to drupal_urlencode() ?

Comment #25

NaX commented 8 October 2006 at 22:09

Status:	Active	» Needs work

Status	File	Size
new	url_special_char.patch	2.45 KB

This patch makes it possible both to search for special characters (#, &) and to allow named anchors in menu items (node/*#name). It is not very elegantly implemented, and I suggest that somebody who understands the workings of the core modules better than I do implement it in a better way. But you can get the general idea of what I was trying to achieve.

This is the first patch I have created that involves multiple files; I hope I did it correctly.
I hope you find it useful.

Comment #26

Steven commented 10 October 2006 at 05:23

Status:	Needs work	» Active

This patch is a completely wrong approach to solving this issue. Search.module should receive no special treatment whatsoever.

Again: is there still a bug that needs to be addressed in the latest 4.7 release?

Comment #27

pwolanin commented 11 October 2006 at 23:56

Yes, this is an issue and can be seen now on drupal.org if you log out. The "login to comment" links look like:

http://drupal.org/user/login?destination=comment/reply/88728%2523comment...

where the '#' has been double-encoded (%2523).

Comment #28

NaX commented 12 October 2006 at 17:31

Instead of trying to change the drupal_urlencode function, maybe we should focus on the search module, as it seems that is where the problem lies when it comes to special characters.

Here is another go.

function drupal_urlencode($text) {
  $search = array('%2F', '%23');
  $replace = array('/', '#');
  if (variable_get('clean_url', '0')) {
    $search[] = '%26';
    $replace[] = '%2526';
  }
  return str_replace($search, $replace, urlencode($text));
}

But all these solutions mean there is code in the drupal_urlencode function just for the search module. Maybe we should focus on the data the search module receives rather than on any of the current solutions:

function drupal_urlencode($text) {
  return str_replace(array('%2F', '%23'), array('/', '#'), urlencode($text));
}

Comment #29

Steven commented 14 October 2006 at 04:27

Status:	Active	» Closed (works as designed)

Did you bother to actually take a look at the URL in question?

http://drupal.org/user/login?destination=comment/reply/88728%2523comment...

This means: we are on the "user/login" page. After we finish logging in, we want to proceed to "comment/reply/88728#comment...". In other words, the #comment fragment identifier is part of the destination value, not part of the login URL itself. If it were not escaped, the browser would treat it as the login URL's own fragment, and it would never reach PHP.

By design.
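Steven's point about the fragment can be checked with PHP's `parse_url()`. A sketch using a single level of `urlencode()` and a hypothetical fragment name `comment-1`; it does not reproduce the exact `%2523` seen in the drupal.org link, only why the `#` must be escaped at all.

```php
<?php
// Hypothetical login links; "comment-1" is an illustrative fragment name.
$unescaped = 'http://drupal.org/user/login?destination=comment/reply/88728#comment-1';
$escaped   = 'http://drupal.org/user/login?destination='
           . urlencode('comment/reply/88728#comment-1');

foreach (array($unescaped, $escaped) as $url) {
  // Only the query string is sent to the server; the fragment stays in
  // the browser, so it never reaches PHP.
  parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
  echo $params['destination'], "\n";
}
```

With the unescaped link, the destination PHP sees stops at `88728`; with the escaped one, the `#comment-1` part is preserved inside the destination value.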

Comment #30

valiant-1 commented 22 April 2007 at 05:12

Title:	Handle ampersands in search queries and other URLs when clean URLs are on	» Use THE_REQUEST

see:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12

If you need the unescaped uri with all its consequences, use the ENV THE_REQUEST, which contains the full untouched request string like
GET /foo%20bar?foo=bar HTTP/1.1

That works for us in Gallery 2. We have been using THE_REQUEST successfully for a long time now.

e.g.

    RewriteCond %{THE_REQUEST} /gallery2/tag/([^?/]+)
    RewriteRule .   /gallery2/main.php?g2_view=tags.VirtualAlbum&g2_tagName=%1   [QSA,L]
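A PHP sketch of what THE_REQUEST offers: the request line still carries the path exactly as the client sent it, so `%26` has not been decoded. `raw_path_from_request_line()` is a hypothetical helper and the tag name is illustrative. It also shows one detail of the regular expression: because THE_REQUEST ends in ` HTTP/1.1`, a class like `[^?/]+` keeps matching through the space up to the `/` in the protocol when the URL has no query string, so excluding the space as well looks safer.

```php
<?php
// Hypothetical helper: pull the still-escaped path out of a request line
// shaped like THE_REQUEST, e.g. "GET /foo%20bar?foo=bar HTTP/1.1".
function raw_path_from_request_line(string $request_line): ?string {
  $parts = explode(' ', $request_line, 3);
  if (count($parts) !== 3) {
    return null; // not a "METHOD target PROTOCOL" line
  }
  $path = parse_url($parts[1], PHP_URL_PATH);
  return is_string($path) ? $path : null;
}

$line = 'GET /gallery2/tag/rock%26roll HTTP/1.1';
echo raw_path_from_request_line($line), "\n"; // "%26" has not been decoded

// A capture class that also excludes spaces stops before " HTTP/1.1";
// [^?/]+ alone would run on until the "/" in the protocol.
preg_match('#/gallery2/tag/([^?/ ]+)#', $line, $m);
echo $m[1], "\n";
```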

Comment #31

asimmonds commented 22 April 2007 at 05:41

Title:	Use THE_REQUEST	» Handle ampersands in search queries and other URLs when clean URLs are on

Reverting title
