I am having problems with plus signs ("+") in URLs. Until recently I was running Drupal 4.7. I do not remember why, but I was using the plus sign as the word separator in all URLs (unfortunately, I can't remember how I set it up 3 years ago), e.g. www.mysite.com/new+york, and everything was working perfectly. Now I have upgraded to Drupal 6.19. When I look at the url_alias table in the database, I can see that the aliases contain no plus signs: the alias is stored as "new york" in the dst field, exactly as it was before the upgrade. However, when I load the page, the URL is now www.mysite.com/new%20york; it is no longer www.mysite.com/new+york as it was before the core upgrade. In an attempt to solve this, I installed the pathauto module, which lets the admin control the pattern of the URL. However, the module does not work when I select + as the URL separator. See the attached pictures. How can I troubleshoot this, please?

My setup: Drupal 6.19, Apache 2.2.15, mod_rewrite enabled, clean URLs enabled.

File | Size | Author
pic3.jpg | 32.46 KB | xjessie007
pic2.jpg | 14.51 KB | xjessie007
pic1.jpg | 19.77 KB | xjessie007

Comments

greggles wrote:

Project: Pathauto » Drupal core
Version: 6.x-1.4 » 6.x-dev
Component: Code » path.module
Category: bug » support
Status: Active » Closed (works as designed)

This is how the core path module works.

Try turning off the pathauto module and creating a path with a space in it; you should see the same results.

xjessie007 wrote:

I discovered that the pathauto module has nothing to do with this problem. The problem is in common.inc. The behavior is caused by the patch located at http://drupal.org/files/issues/rawurlencode_0.patch and referenced at http://drupal.org/node/191116 (post #4).

Index: includes/common.inc
===================================================================
RCS file: /cvs/drupal/drupal/includes/common.inc,v
retrieving revision 1.710
diff -u -r1.710 common.inc
--- includes/common.inc	4 Nov 2007 21:24:09 -0000	1.710
+++ includes/common.inc	10 Nov 2007 22:20:28 -0000
@@ -2272,10 +2272,10 @@
   if (variable_get('clean_url', '0')) {
     return str_replace(array('%2F', '%26', '%23', '//'),
                        array('/', '%2526', '%2523', '/%252F'),
-                       urlencode($text));
+                       rawurlencode($text));
   }
   else {
-    return str_replace('%2F', '/', urlencode($text));
+    return str_replace('%2F', '/', rawurlencode($text));
   }
 }

The patch is correct, but it changes the URLs generated for aliases that are stored with spaces in the url_alias table (storing spaces there is not correct in the first place). urlencode() changes spaces into + signs; rawurlencode() changes spaces into %20. I do not know how to fix the whole problem correctly, but changing rawurlencode back to urlencode at least makes it work again. Drupal 7 is said to handle + signs in URLs better (?). I hope this helps others. It took me 3 days to find the cause of the issue.
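
To illustrate the difference between the two functions, here is a minimal standalone PHP sketch. It is not Drupal code: the `$alias` value is just the example alias from this report, and the variable names are made up for illustration.

```php
<?php
// Hypothetical alias, as it would be stored (with a space) in url_alias.dst.
$alias = 'new york';

// urlencode() follows the form-encoding convention: a space becomes "+".
$form_style = urlencode($alias);

// rawurlencode() percent-encodes everything: a space becomes "%20".
$raw_style = rawurlencode($alias);

echo $form_style, "\n"; // new+york
echo $raw_style, "\n";  // new%20york

// Decoding is not symmetric either: urldecode() turns "+" back into a
// space, while rawurldecode() leaves a literal "+" untouched.
echo urldecode('new+york'), "\n";    // new york
echo rawurldecode('new+york'), "\n"; // new+york
```

This would explain why the same stored alias produced new+york before the patch and new%20york after it, without anything changing in the database.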

jp.stacey wrote:

Title: Plus sign in URL replaced by %20 » Spaces throughout URLs being replaced by plus signs, not %20 as standard
Status: Closed (works as designed) » Active

I think this is a serious and true bug, but I've reworded the title to be unambiguous based on what I think xjessie007 means.

As far as I can tell, spaces and plus symbols are not equivalent in URLs generally: they are only equivalent in the query string component, as a matter of convention. Everywhere else, spaces should be encoded as "%20". The special use of plus symbols in the query string part of URLs is discussed by the W3C here (compare "Conventional URI encoding scheme" versus "Query strings").

As xjessie007's patch suggests, the bug itself arises from misusing urlencode(). According to its documentation, this function is intended to be used as follows:

This function is convenient when encoding a string to be used in a query part of a URL, as a convenient way to pass variables to the next page.

That is, urlencode() should only be used for the query part, not the whole URL. Using the plus sign instead of "%20" in the non-query part of a URL definitely causes URLs to be misinterpreted by Apache (see #528452: Encoding URLs: spaces versus plus signs versus percent 20 for earlier reporting). So I think "%20" is right and "+" is strictly wrong.
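
As a rough sketch of that split, the following standalone PHP encodes path segments with rawurlencode() and leaves the query string to http_build_query(), which uses the "+" convention by default. The function name `example_build_url()` and the sample path are hypothetical; this is not how Drupal's url() is implemented, only an illustration of encoding each component by its own rules.

```php
<?php
/**
 * Hypothetical helper: encodes the path and the query string separately.
 */
function example_build_url($path, array $query = array()) {
  // Percent-encode each path segment on its own so that the "/"
  // separators survive and spaces become "%20".
  $segments = array_map('rawurlencode', explode('/', $path));
  $url = '/' . implode('/', $segments);

  // http_build_query() applies form-style encoding by default, so a
  // space inside a query value becomes "+".
  if ($query) {
    $url .= '?' . http_build_query($query);
  }
  return $url;
}

echo example_build_url('places/new york', array('q' => 'new york')), "\n";
// /places/new%20york?q=new+york
```

Note that with this approach a literal "+" in a path segment is emitted as "%2B", so it cannot be confused with an encoded space.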

greggles wrote:

Status: Active » Closed (works as designed)

Well, could you create a new issue for it? I'd rather not be part of this discussion just because the issue was erroneously placed in Pathauto and I review all Pathauto bugs...

jp.stacey wrote:

Status: Closed (works as designed) » Active

Sorry to re-open, greggles: I don't know where it should be reported if not under Drupal core > path.module, as is currently set on this issue. Could you advise?

greggles wrote:

Status: Active » Closed (works as designed)

It's fine to make it a core issue, but I'd prefer a new one since it seems likely to be a bikeshed and I'd rather not have it in my tracker.

jp.stacey wrote:

Makes sense. Anyway, I can no longer replicate it with Drupal 6.19, PHP 5.3.2-1ubuntu4.5 (5.2 recommended for Drupal but my home machine is a slightly funny environment), so it's possible something else has fixed this.

Here's what I've tried:

* Upload an attachment with spaces in its name (this is how we originally got the problematic paths)
* Set a node's path (without pathauto enabled) to have spaces in it
* Install pathauto, set the separator to be " ", and save a node with an automatic alias

All of these create paths with spaces in the database, which are then successfully encoded to %20 strings by url(). So, er, worksforme. Now it does, anyway.

It could be that this has been fixed incidentally by some other bugfix (not clear how that would be, as xjessie007 reports this bug in 6.19), or that our client's environment has a considerably older PHP version which was buggy in this function. In that case, if we see the bug again, we'll file specifics in a separate ticket.

xjessie007, if you can put together specific steps to reproduce this bug (if possible without pathauto, as I think that's a complication), then do ping me with an email if there's any progress on a separate ticket. Right now I just can't reproduce it :(