A search for required words is done with plus signs in Apachesolr. E.g., this search looks for the words "search" AND "plus":
+search +plus
When submitting the search, the browser gets redirected to http://drupal.org/search/apachesolr_search/%2Bsearch%20%2Bplus
-> the plus signs are getting URL-encoded.
Unfortunately, once the URL has been decoded, plus signs in it are interpreted as spaces. Thus, the markers for the required words get lost, resulting in a search for
search plus
This can easily be reproduced with the Drupal.org search, which uses Apachesolr too.
See also the screenshot!
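The loss of the plus signs can be illustrated outside Drupal. The following is a minimal Python sketch, not Apache's or Drupal's actual code. It assumes the path is percent-decoded once by the rewrite layer and that the result is then decoded a second time with query-string rules, where '+' means space. The function name decode_via_rewrite is hypothetical.

```python
from urllib.parse import unquote, unquote_plus

def decode_via_rewrite(path_segment: str) -> str:
    """Simulate two decoding passes (an assumed model of the behavior)."""
    # Pass 1: the rewrite layer percent-decodes the path: %2B -> '+', %20 -> ' '
    after_rewrite = unquote(path_segment)
    # Pass 2: the value is read from the query string (?q=...),
    # where a literal '+' is interpreted as a space
    return unquote_plus(after_rewrite)

result = decode_via_rewrite("%2Bsearch%20%2Bplus")
# The '+' markers are gone; only the bare words remain.
print(result.split())
```

After whitespace is collapsed, only the words "search" and "plus" are left, which matches the search described above.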
Comments
Comment #1
pwolanin commented
This is pretty important to fix.
Comment #2
pwolanin commented
This would appear to be a side-effect of Apache re-writing the clean URL.
Look at: http://drupal.org/?q=search/apachesolr_search/%2Bsearch%20%2Bplus
Comment #3
pwolanin commented
See also: http://api.drupal.org/api/function/drupal_urlencode/6
We can maybe similarly work around the Apache quirk by transforming %2B to %252B
http://drupal.org/search/apachesolr_search/%252Bsearch%20%252Bplus
Very annoying however. Apache 2.2 has a rewrite rule flag 'B' that might be an alternative. http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
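A rough Python sketch of the double-encoding idea follows. It is an illustration only: the helper names are hypothetical, and the two-pass decode is an assumed model of what happens between Apache and PHP, not their real code.

```python
from urllib.parse import quote, unquote, unquote_plus

def double_encode_plus(keys: str) -> str:
    """Percent-encode the search keys, then escape the '%' of each %2B again."""
    encoded = quote(keys, safe="")          # '+' -> %2B, ' ' -> %20
    return encoded.replace("%2B", "%252B")  # only '+' ends up double-encoded

def decode_twice(segment: str) -> str:
    """Assumed model: one plain percent-decode, then one query-style decode."""
    return unquote_plus(unquote(segment))

segment = double_encode_plus("+search +plus")
print(segment)                # the path segment to put in the clean URL
print(decode_twice(segment))  # what the search code would finally receive
```

Under these assumptions the first decode turns %252B back into %2B, and the second decode turns %2B into a literal '+', so the markers survive both passes.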
Comment #4
pwolanin commented
An alternative (not sure if it's nice or not) is to tweak the submit function so that it uses the (deprecated) keys query param, like:
http://drupal.org/search/apachesolr_search/?keys=%2Bsearch%20%2Bplus
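To show why a query parameter would sidestep the problem, here is a small Python sketch. It assumes the query string is passed through untouched by the rewrite and is therefore decoded exactly once. It uses only Python's standard library and does not reflect how Drupal reads the keys parameter internally.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://drupal.org/search/apachesolr_search/?keys=%2Bsearch%20%2Bplus"

# The query string is decoded a single time, so %2B becomes a literal '+'
# and is never reinterpreted as a space.
keys = parse_qs(urlsplit(url).query)["keys"][0]
print(keys)
```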
Comment #5
pwolanin commented
Would be good to find out if any possible fix would break on lighty or nginx, etc. I guess it must not if drupal_urlencode() works?
Comment #6
JacobSingh commented
I looked into this a bit, and I'm really at a loss. As Peter wrote, I think there are only 2 options:
1. The [B] Flag
This does look promising, but could affect intended decoding elsewhere, plus doesn't work on Apache < 2.2 so I think we can rule that out.
2. double url encoding
Damien Tournoud in email:
"Apache mod_rewrite and PHP have a lot of history in doing silly stuff with URLs.
I find drupal_urlencode() ugly: I never liked the idea of generating
wrong URLs just to cope with Apache decoding them before passing them
to PHP which decodes them another time.
For Drupal 7, I suggest we do our own parsing of
$_SERVER["REQUEST_URI"]. It should not be that slower (parsing an URL
is really easy), and would definitely be much cleaner.
"
I agree that the double urlencoding is ugly and to be avoided. Still, until the D7 change you propose (I agree with it) is a reality, this seems like the best way to go IMO.
Attached is a patch which should do this.
3. Use the $keys
As bad as this sounds, it is still there in D6, and therefore can be used until we properly solve this problem in D7.
It does make our URLs not so pretty though, so I'm against it.
Comment #7
JacobSingh commented
Comment #8
pwolanin commented
There seems to be a ton of unrelated changes in the patch.
Also, we only want to double-encode '+' and only if clean URLs are enabled.
Comment #9
pwolanin commented
Using 'keys' is maybe the "safest", though it would require a bit more rewriting.
Comment #10
JacobSingh commented
The patch looks big, but that's just because I moved a conditional to be more efficient. It's actually only a few lines.
I think we would want to 2x urlencode everything, wouldn't we for consistency? What else does mod_rewrite zap? If someone wanted to search for "string/with/slash/in/it" should they be able to?
Comment #11
pwolanin commented
Drupal core already urlencodes links and does special casing for Apache quirks, so let's keep it simple. We will also need to fix the output of the facet links.
Comment #12
pwolanin commented
Looks like doing what Damien suggests would require a combination of $_SERVER["REQUEST_URI"] and $_SERVER['PHP_SELF']. However, I'm not sure we can avoid some special urlencoding if Apache still segfaults on an encoded '/'.
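The idea of deriving the path from the raw request URI can be sketched as follows. This is an illustration in Python, not the Drupal 7 implementation. The function name is hypothetical, and script_name stands in for a PHP_SELF-like value that is assumed to end in the front controller (for example /index.php).

```python
from urllib.parse import unquote

def path_from_request_uri(request_uri: str, script_name: str) -> str:
    """Derive the internal path from the raw, still-encoded request URI."""
    # Drop the query string; only the path part is of interest here.
    path = request_uri.split("?", 1)[0]
    # The base path is the directory that contains the front controller.
    base = script_name.rsplit("/", 1)[0] + "/"
    if path.startswith(base):
        path = path[len(base):]
    # Decode exactly once, so %2B becomes a literal '+' and stays one.
    return unquote(path)

print(path_from_request_uri(
    "/search/apachesolr_search/%2Bsearch%20%2Bplus", "/index.php"))
```

Note that in this sketch an encoded slash (%2F) decodes to a plain '/', so it merges into the path separators, which is the open point about encoded slashes.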
Comment #13
damien tournoud commented
I implemented #484554: Stop relying on Apache for determining the current path for Drupal 7. The only blocking point is the encoded '/', but in our current practice, we output those directly in the URL, and collapse several path parts in one. A search for "n/a" will thus return:
I don't see any need to change that for now.
Comment #14
pwolanin commented
I think this works: it gets both the facet links and the form submission with a minimal substitution.
Comment #15
JacobSingh commented
Looks good to me. Funny, this is exactly what I did after your comment about not urlencoding everything, but I wasn't sure it was comprehensive. Seems like it's a good start though, and I think it should go in.
Comment #16
pwolanin commented
Committed to 6.x.
Comment #18
xjessie007 commented
I think this needs to be opened again. I am having problems with plus signs ("+") in URLs. Let me explain the history first. I was running Drupal 4.7 until recently. I do not remember why, but I was using plus signs ("+") as the word separator in all URLs, i.e. www.mysite.com/new+york, and everything was working perfectly.
Now I upgraded to Drupal 6.19. When I look at the url_alias table in the database, I can see that aliases are without pluses, so the new york is there stored as "new york" in the dst field. This is now the same as before the upgrade. When I load the page, the URL is www.mysite.com/new%20york now; it is no longer www.mysite.com/new+york as it used to be before the core upgrade. See the attached pictures.
Can anyone help, please, which way to go now? How can I troubleshoot this? Would I need to change something in the path.inc, or what can I do? Can this be fixed through rewrite rule in htaccess? Thanks.
My setup: mod_rewrite enabled, Apache 2.2.15, clean urls enabled, Drupal 6.19
PS: Changing URL separator from + to the - sign is probably not an option for me as I have very many hardcoded links.
Comment #19
janusman commented
@xjessie007: Is your problem related to the Apache Solr Search Integration module? You seem to be talking about a problem with URL aliases and not paths starting with "search/apachesolr_search/" (which would be the only way your issue would have something to do with this module's issue queue) =)
Will close for now, if you think this really has something to do with Apache Solr and the changes talked about in this issue, then reopen.
Comment #20
etibmw commented
Hi,
We have the same problem here, but only on some of the links.
The links in the facets and the sort work (the + is replaced with %2B),
but the links in the pager and the suggestion are not replaced (the + is still there, so the + is removed from the page).
We are using the 6.x-1.6+9-dev version and can see the str_replace.
How did you guys solve this problem?
Thanks
Comment #21
etibmw commented
Comment #22
etibmw commented
Hi all,
Well, since no one replied to this issue, I moved forward and fixed it.
First I added the B flag to my .htaccess, so now the last line looks like this:
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA,B]
Notice the ,B at the end of the line.
This solved the problem, but I needed to remove the fixes added to the module, so in three places I removed the str_replace('+', '%2B',
and all is well.
Just be sure your Apache is 2.2+ (so you have the B flag option).
Hope this helps.
I am closing this issue now.