In Apachesolr, required words are marked with plus signs. For example, this search looks for the words "search" AND "plus":
+search +plus

When submitting the search, the browser gets redirected to http://drupal.org/search/apachesolr_search/%2Bsearch%20%2Bplus
-> the plus signs get URL-encoded as %2B.

Unfortunately, by the time the URL reaches Drupal, the plus signs have been replaced by spaces. Thus, the markers for the required words get lost, resulting in a search for
search plus
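
The loss of the plus signs can be modelled in a few lines of Python. This is only a sketch of the behaviour described in this report, not Apache or Drupal code: it assumes the web server unescapes the path once before rewriting it into a query string, and that the application then decodes the query string a second time, where a literal "+" means a space.

```python
from urllib.parse import quote, unquote, unquote_plus

keys = "+search +plus"

# What the search form redirects to: reserved characters are percent-encoded.
encoded = quote(keys, safe="")

# First decode (assumed): the server unescapes the path before rewriting it
# into index.php?q=...
after_rewrite = unquote(encoded)

# Second decode (assumed): the value is now part of a query string,
# where a literal '+' stands for a space.
seen_by_app = unquote_plus(after_rewrite)

print(encoded)
print(repr(seen_by_app))
```

The second decode is where the required-word markers disappear: the words survive, the plus signs do not.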

This can easily be reproduced with the Drupal.org search, which also uses Apachesolr.

See also the screenshot!

Comments

pwolanin’s picture

Version: 6.x-1.0-beta10 » 6.x-1.x-dev
Priority: Normal » Critical

This is pretty important to fix

pwolanin’s picture

This would appear to be a side-effect of Apache re-writing the clean URL.

Look at: http://drupal.org/?q=search/apachesolr_search/%2Bsearch%20%2Bplus

pwolanin’s picture

see also: http://api.drupal.org/api/function/drupal_urlencode/6

We can maybe similarly work around the Apache quirk by transforming %2B to %252B

http://drupal.org/search/apachesolr_search/%252Bsearch%20%252Bplus

Very annoying however. Apache 2.2 has a rewrite rule flag 'B' that might be an alternative. http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
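
A rough Python model of why the %252B form survives, under the assumption that the path is decoded twice (once by the server, once as a query-string value). The function name decode_twice is made up for this illustration.

```python
from urllib.parse import unquote, unquote_plus


def decode_twice(path_segment: str) -> str:
    """Model two decodes: a server-side unescape, then a query-string decode."""
    return unquote_plus(unquote(path_segment))


single = "%2Bsearch%20%2Bplus"
# Turn every %2B into %252B, i.e. encode the '%' of the escape itself.
double = single.replace("%2B", "%252B")

print(repr(decode_twice(single)))
print(repr(decode_twice(double)))
```

With the doubled escape, the first decode only turns %252B back into %2B, so the second decode still has an escaped plus to work with.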

pwolanin’s picture

An alternative (not sure if it's nice or not) is to tweak the submit function so that it uses the (deprecated) keys query param like:

http://drupal.org/search/apachesolr_search/?keys=%2Bsearch%20%2Bplus
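
For comparison, a small Python sketch of the keys variant. It assumes the query string reaches the application untouched and is decoded exactly once, which is why the plus signs would survive there.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://drupal.org/search/apachesolr_search/?keys=%2Bsearch%20%2Bplus"

# Only the query string carries the search keys in this variant.
query = urlsplit(url).query
keys = parse_qs(query)["keys"][0]

print(repr(keys))
```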

pwolanin’s picture

Title: Plus signs ("+") disappear in search » Plus signs ("+") disappear in search when using clean URLs with the Apache webserver

It would be good to find out whether any possible fix would break on lighty or nginx, etc. I guess it must not if drupal_urlencode works?

JacobSingh’s picture

Attachment: status new, size 2.83 KB

I looked into this a bit, and I'm really at a loss. As Peter wrote, I think there are only three options:

1. The [B] Flag

This does look promising, but it could affect intended decoding elsewhere, and it doesn't work on Apache < 2.2, so I think we can rule it out.

2. Double URL encoding

Damien Tournoud in email:

"Apache mod_rewrite and PHP have a lot of history in doing silly stuff with URLs.

I find drupal_urlencode() ugly: I never liked the idea of generating
wrong URLs just to cope with Apache decoding them before passing them
to PHP which decodes them another time.

For Drupal 7, I suggest we do our own parsing of
$_SERVER["REQUEST_URI"]. It should not be that much slower (parsing a URL
is really easy), and would definitely be much cleaner.
"

I agree that the double urlencoding is ugly and to be avoided. Still, until the D7 change you propose (I agree with it) is a reality, this seems like the best way to go IMO.

Attached is a patch which should do this.

3. Use the $keys

As bad as this sounds, it is still there in D6, and therefore can be used until we properly solve this problem in D7.
It does make our URLs not so pretty though, so I'm against it.

JacobSingh’s picture

Status: Active » Needs review

pwolanin’s picture

Status: Needs review » Needs work

There seems to be a ton of unrelated changes in the patch.

Also, we only want to double-encode '+' and only if clean URLs are enabled.
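
A minimal sketch of that rule in Python, for illustration only. The function name build_search_url and the clean_urls parameter are made up here; the only parts taken from this thread are the search path and the idea of substituting "+" with "%2B" before the normal URL encoding.

```python
from urllib.parse import quote

SEARCH_PATH = "search/apachesolr_search"  # path used in this thread


def build_search_url(keys: str, clean_urls: bool) -> str:
    """Build the relative search URL, double-encoding only '+' for clean URLs."""
    if clean_urls:
        # Pre-escape '+' so the normal encoding below turns it into %252B.
        keys = keys.replace("+", "%2B")
    encoded = quote(keys, safe="")
    if clean_urls:
        return f"{SEARCH_PATH}/{encoded}"
    # Without clean URLs the path is an ordinary query parameter.
    return f"?q={SEARCH_PATH}/{encoded}"


print(build_search_url("+search +plus", clean_urls=True))
print(build_search_url("+search +plus", clean_urls=False))
```

Every other character is still encoded only once, so the substitution stays limited to the one character that gets lost.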

pwolanin’s picture

Using 'keys' is maybe the "safest", though it would require a bit more rewriting.

JacobSingh’s picture

The patch looks big, but that's just because I moved a conditional to be more efficient. It's actually only a few lines.

I think we would want to double-urlencode everything for consistency, wouldn't we? What else does mod_rewrite zap? If someone wanted to search for "string/with/slash/in/it", should they be able to?

pwolanin’s picture

Drupal core already urlencodes links and does special casing for Apache quirks, so let's keep it simple. We will also need to fix the output of the facet links.

pwolanin’s picture

It looks like doing what Damien suggests would require a combination of $_SERVER["REQUEST_URI"] and $_SERVER['PHP_SELF'].

However, I'm not sure we can avoid some special urlencoding if Apache still segfaults on an encoded '/'
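
A rough Python sketch of that idea, not the Drupal 7 implementation. The function current_path and its two arguments are stand-ins for the two $_SERVER values: the base directory is taken from the script path, stripped from the raw request URI, and the remainder is decoded exactly once.

```python
from urllib.parse import unquote


def current_path(request_uri: str, php_self: str) -> str:
    """Derive the internal path from the raw request URI (illustrative only)."""
    # Base directory of the front controller, e.g. '/drupal/' for '/drupal/index.php'.
    base = php_self.rsplit("/", 1)[0] + "/"
    # Drop the query string; only the path part is of interest here.
    raw_path = request_uri.split("?", 1)[0]
    if raw_path.startswith(base):
        raw_path = raw_path[len(base):]
    # Decode once, without treating '+' as a space.
    return unquote(raw_path)


print(current_path("/search/apachesolr_search/%2Bsearch%20%2Bplus", "/index.php"))
```

Note that an encoded "/" (%2F) would decode to a plain slash in this sketch and become indistinguishable from a path separator.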

damien tournoud’s picture

I implemented #484554: Stop relying on Apache for determining the current path for Drupal 7. The only blocking point is the encoded '/', but in our current practice, we output those directly in the URL, and collapse several path parts in one. A search for "n/a" will thus return:

http://drupal.org/search/apachesolr_search/n/a

I don't see any need to change that for now.
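
The collapsing of path parts can be pictured like this in Python. The helper name search_keys_from_path is hypothetical; it simply treats everything after the search path as one string of keys.

```python
from typing import Optional


def search_keys_from_path(
    path: str, prefix: str = "search/apachesolr_search"
) -> Optional[str]:
    """Return everything after the search prefix as a single keys string."""
    if not path.startswith(prefix + "/"):
        return None
    # Remaining path parts are kept together, slashes included.
    return path[len(prefix) + 1:]


print(search_keys_from_path("search/apachesolr_search/n/a"))
```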

pwolanin’s picture

Status: Needs work » Needs review
Attachment: status new, size 6.14 KB

I think this works: it handles both the facet links and the form submission with a minimal substitution.

JacobSingh’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me. Funny, this is exactly what I did after your comment about not urlencoding everything, but I wasn't sure it was comprehensive. Seems like it's a good start though, and I think it should go in.

pwolanin’s picture

Status: Reviewed & tested by the community » Fixed

committed to 6.x

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

xjessie007’s picture

Title: Plus signs ("+") disappear in search when using clean URLs with the Apache webserver » Plus signs ("+") disappear in URL when using clean URLs with the Apache webserver
Status: Closed (fixed) » Needs work
Issue tags: +url, +url separator, +plus sign
Attachments: three new files (32.46 KB, 14.51 KB, 19.77 KB)

I think this needs to be opened again. I am having problems with plus signs ("+") in URLs. Let me explain the history first. I was running Drupal 4.7 until recently. I do not remember why, but I was using plus signs ("+") as the word separator in all URLs, i.e. www.mysite.com/new+york, and everything was working perfectly.

Now I have upgraded to Drupal 6.19. When I look at the url_alias table in the database, I can see that the aliases contain no pluses, so the alias is stored as "new york" in the dst field. This is the same as before the upgrade. When I load the page, the URL is now www.mysite.com/new%20york; it is no longer www.mysite.com/new+york as it used to be before the core upgrade. See the attached pictures.

Can anyone help, please? Which way should I go now? How can I troubleshoot this? Would I need to change something in path.inc, or what else can I do? Can this be fixed through a rewrite rule in .htaccess? Thanks.

My setup: mod_rewrite enabled, Apache 2.2.15, clean urls enabled, Drupal 6.19
PS: Changing URL separator from + to the - sign is probably not an option for me as I have very many hardcoded links.

janusman’s picture

Status: Needs work » Closed (fixed)

@xjessie007: Is your problem related to the Apache Solr Search Integration module? You seem to be talking about a problem with URL aliases and not paths starting with "search/apachesolr_search/" (which would be the only way your issue would have something to do with this module's issue queue) =)

Will close for now; if you think this really has something to do with Apache Solr and the changes discussed in this issue, then reopen.

etibmw’s picture

Hi,
We have the same problem here, but only on some of the links.
The links in the facets and the sort work (the + is replaced with %2B),
but the links in the pager and the suggestions are not fixed (the + is still there in the URL, so it is lost when the page loads).

We are using version 6.x-1.6+9-dev and can see the str_replace.

How did you guys solve this problem?

Thanks

etibmw’s picture

Hi All,

Well, since no one replied to this issue, I moved forward and fixed it.
First I added the B flag to my .htaccess, so now the last line looks like this:
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA,B]
Notice the ,B at the end of the line.
This solved the problem, but I then needed to remove the fixes added to the module, so in three places I removed the str_replace('+', '%2B',
and all is well.
Just be sure your Apache is 2.2+ (so you have the B flag option).

Hope this helps.

I am closing this issue now.
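
For readers wondering why the B flag helps, here is a loose Python model of the rewrite rule quoted above. It is an assumption-laden sketch, not Apache's actual algorithm: Apache's documentation describes B as escaping backreferences before they are applied, and quote_plus merely stands in for that escaping here. The function names rewrite and q_seen_by_php are invented for the illustration.

```python
from urllib.parse import quote_plus, unquote, unquote_plus


def rewrite(request_path: str, b_flag: bool) -> str:
    """Model 'RewriteRule ^(.*)$ index.php?q=$1' for one request path."""
    # The server unescapes the path before matching it (assumed).
    backref = unquote(request_path.lstrip("/"))
    if b_flag:
        # Stand-in for the B flag: re-escape the backreference.
        backref = quote_plus(backref, safe="/")
    return "index.php?q=" + backref


def q_seen_by_php(rewritten: str) -> str:
    """Decode the q parameter the way a query string is decoded."""
    return unquote_plus(rewritten.split("?q=", 1)[1])


path = "/search/apachesolr_search/%2Bsearch%20%2Bplus"
print(repr(q_seen_by_php(rewrite(path, b_flag=False))))
print(repr(q_seen_by_php(rewrite(path, b_flag=True))))
```

In this model the re-escaped backreference undoes the server's first decode, which would explain why the module's own "+" substitution had to be removed once the flag was in place.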