400 Bad Status if URL length limit exceeded

Project: Apache Solr Search Integration
Version: 6.x-2.x-dev
Component: Code
Category: bug report
Priority: minor
Assigned: Unassigned
Status: closed (fixed)

Issue Summary

I'm working with SOLR on a software website where each node has 300 CCK field Yes/No settings (software features). I tried enabling all CCK filters for SOLR, but was getting "400 Status Bad Request" errors, though SOLR appeared to be working and indexing fine.

I imagine the problem is that SOLR filters / search requests are sent as $_GET[] parameters over HTTP, like ?filters=tid:11 tid:12 tid:13
Now imagine 300 CCK filters... The URL length limit is 2048 characters. I believe that is the cause of the "400 Status Bad Request": the query URL ?filters= is over 2048 characters long, and therefore doesn't work.

Is there a way to send the filters via a $_POST instead? Like saving in a $_SESSION? Anything instead of $_GET. Does anyone have a suggestion or workaround for 300+ CCK filters?

In my use case, users would be able to tick off 300 Yes/No filters and find their matching software. E.g. "SOLR Search integration: Yes/No", "Blog feature: Yes/No" etc. With SOLR it would make an extremely powerful software search machine!
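
To put a rough number on the URL-length concern described above, here is a minimal sketch. It assumes 300 filter terms of the form tid:N joined into a single "filters" parameter, following the example in the summary; the term names and count are illustrative, not taken from a real site.

```php
<?php
// Hypothetical illustration: 300 filter terms joined into one "filters"
// parameter, mirroring the ?filters=tid:11 tid:12 tid:13 example.
$terms = array();
for ($i = 1; $i <= 300; $i++) {
  $terms[] = 'tid:' . $i;
}

// http_build_query() URL-encodes the value (":" becomes "%3A", " " becomes "+").
$query_string = http_build_query(array('filters' => implode(' ', $terms)), '', '&');

$length = strlen($query_string);
echo "Query string length: $length characters\n";
echo $length > 2048 ? "Exceeds a 2048-character limit\n" : "Within a 2048-character limit\n";
```

Note that URL encoding makes the string longer than the raw filter text, so the limit is reached sooner than a count of the unencoded characters would suggest.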

Comments

#1

Likely this is the same problem as: #685924: Coherent Access + Apache Solr = Buffer Overflow

URL limits are 2048 or 4096 characters depending on the web server. There is no easy way around this; just enable the filters you need.

#2

OK, eventually I'll have to find a better solution. I'm planning on building a super-detailed software search machine, which requires at least 300 CCK "Yes/No" fields to work with SOLR. Anyway, maybe in Drupal 8...

#3

What if we call Apache_Solr_Service::search with Apache_Solr_Service::METHOD_POST as the last argument?

#4

Status: active » needs review

This seems to have no ill effect.

Attachment | Size | Status | Test result | Operations
761990-4-D6.patch | 2.67 KB | Ignored: Check issue status. | None | None

#5

Title: 400 Bad Status on Large number of CCK fields » 400 Bad Status if URL length limit exceeded

#6

Did you check the request to see if the parameters are actually going into the POST body?

#7

Yes, they are. From the search method in Service.php in SolrPhpClient:

<?php
if ($method == self::METHOD_GET) {
  return $this->_sendRawGet($this->_searchUrl . $this->_queryDelimiter . $queryString);
}
else if ($method == self::METHOD_POST) {
  return $this->_sendRawPost($this->_searchUrl, $queryString, FALSE, 'application/x-www-form-urlencoded');
}
?>

And the _sendRawPost method:

<?php
protected function _sendRawPost($url, $rawPost, $timeout = FALSE, $contentType = 'text/xml; charset=UTF-8')
{
  stream_context_set_option($this->_postContext, array(
      'http' => array(
        // set HTTP method
        'method' => 'POST',
        // Add our posted content type
        'header' => "Content-Type: $contentType",
        // the posted content
        'content' => $rawPost,
        // default timeout
        'timeout' => $this->_defaultTimeout
      )
    )
  );
  ...
}
?>
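
For anyone who wants to see the effect of those stream context options in isolation, here is a small standalone sketch. It only builds a POST context and prints what it contains; it does not contact a Solr server. The query parameters and the 60-second timeout are made-up values for illustration, not taken from SolrPhpClient.

```php
<?php
// Sketch only: builds (but does not send) a POST context similar to the
// one _sendRawPost() configures. The parameters are made-up examples.
$raw_post = http_build_query(array('q' => 'drupal', 'fq' => 'type:story'), '', '&');

$context = stream_context_create(array(
  'http' => array(
    'method' => 'POST',
    'header' => 'Content-Type: application/x-www-form-urlencoded',
    // The parameters travel in the request body, not in the URL.
    'content' => $raw_post,
    // Assumed timeout in seconds, for illustration.
    'timeout' => 60,
  ),
));

// Read the options back to confirm where the query parameters ended up.
$options = stream_context_get_options($context);
echo $options['http']['method'] . ' body: ' . $options['http']['content'] . "\n";
```

Passing such a context as the third argument to file_get_contents() is what would actually send the request. Since the parameters sit in 'content', the URL itself stays short no matter how many filters are added.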

#8

#9

some extra confirmation:

http://www.ibm.com/developerworks/java/library/j-solr1/

Solr accepts both HTTP GET and HTTP POST messages for queries.

Thanks for actually trying it James!

#10

Given that using POST requests has potential performance implications, I do not think it should be the default.

For example, the Solr servers for drupal.org are behind a varnish cache, so making all search requests via POST would typically bypass the cache.

Can we make this controlled by an opaque variable (i.e. one that is not set in the UI but only in settings.php or via devel etc.)? It would be good to have it there as an option for people that hit this problem, but 99% of users will never need it and should not enable it by mistake.

#11

Status: needs review » needs work

#12

Instead of a setting, how about using POST if the querystring is too long? No point in sending a GET request that will fail.

#13

The problem with that is that the max URL length may differ by server, so we'd still need a variable to control it.

#14

Status: needs work » closed (won't fix)

Ok, maybe this is not a bug in the module, but a server misconfiguration.

In Tomcat, it is possible to configure the maximum URL length using maxHttpHeaderSize: http://tomcat.apache.org/tomcat-5.5-doc/config/http.html

In Apache, use the LimitRequestLine directive: http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestline

It is also configurable in IIS: http://www.asp.net/Learn/whitepapers/aspnet4#0.2__Toc253429244

#15

Category: bug report » support request

#16

Status: closed (won't fix) » fixed

#17

Category: support request » bug report
Priority: normal » minor
Status: fixed » postponed

We use nginx in front of tomcat, but that can also be configured per:
http://forum.nginx.org/read.php?2,25207,25429 and http://wiki.nginx.org/NginxHttpCoreModule#large_client_header_buffers

So, I think it's still worth thinking about how to fix this in extreme cases - we'd probably override the search() method - but it's low priority.

#18

This *seems* to be the solution for those of us using the Jetty server included with Solr.

In etc/jetty.xml, add: *EDIT* See correct solution in comment #25 below =)

Needs testing, though.

#20

@#14: Isn't URL length also limited by the browser? I thought Internet Explorer had a maximum of 2048 characters.

#21

For GET requests, yes. But not for POST requests (or, at least, the limit for POST requests is so high that a Solr URL is unlikely to reach it).

#22

Wait, we are now spreading FUD. This problem has nothing to do with a web browser.

relevant lines from _sendRawGet

<?php
//$http_response_header is set by file_get_contents
$response = new Apache_Solr_Response(@file_get_contents($url, false, $this->_getContext), $http_response_header, $this->_createDocuments, $this->_collapseSingleValueArrays);
?>

As you can see, the web browser is not making the request. The user's web browser is not used here at all. The Apache web server is making the call to the Solr server.

#23

FUD is a strong word for miscommunication :P There is a separate issue that if a user selects a massive number of facets, for example, the browser URL may exceed Internet Explorer's limit. There's no solution for dealing with that right now. But, it is a tangential issue.

#24

Right, the problem where we've seen it in practice is from massive numbers of node access conditions, which are not exposed in the end-user URLs

#25

@Axol00 (who is a coworker) found out how to configure Jetty (bundled with Solr) to fix this =)

In jetty.xml, a line like

<Set name="headerBufferSize">65535</Set>

should be added within the <Call name="addConnector"> section. Like so:
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">50000</Set>
      <Set name="lowResourceMaxIdleTime">1500</Set>
      <!-- Next line added to handle incoming GET requests up to 64k in length -->
      <Set name="headerBufferSize">65535</Set>
    </New>
  </Arg>
</Call>

I'm wondering if we should include a modified jetty.xml file along with the module... don't think it would hurt if they are the same across different versions of Solr.

I'll add this info to the Troubleshooting section of the handbook.

#26

I do not agree with #10. Of course we can start to patch or edit configs for all available Solr servers, but I do not see any reason to. If the request is large, Varnish and other caches might turn out to be less performant than just passing the query to Solr.

Is there any other reason not to take the suggested patch from #4 and maybe incorporate a length check defaulting to 8 KB max or so?

The major advantage would be that even unforeseen large requests would work by default.

#27

Version: 6.x-2.x-dev » 6.x-1.x-dev

James's patch gives me fatal errors:

[16-Nov-2010 18:10:15] PHP Parse error:  syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM in apachesolr.module on line 1662
[16-Nov-2010 18:12:19] PHP Parse error:  syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM in apachesolr_search.module on line 1073

Here's a quick patch for testing purposes that uses the raw values of the constants.

Attachment | Size | Status | Test result | Operations
761990-method-post-27.patch | 1.66 KB | Ignored: Check issue status. | None | None

#28

Here's a rough stab at doing it from within the search method. Sadly, we can only do it correctly if we duplicate all the code from the base method. This is just a rough approximation based on param count.

Attachment | Size | Status | Test result | Operations
761990-method-post-28.patch | 940 bytes | Ignored: Check issue status. | None | None

#29

Subscribe

#30

Status: postponed » active

I think we can revisit this now.

#31

Version: 6.x-1.x-dev » 7.x-1.x-dev

Fix in HEAD first.

#32

I experienced this issue when enabling the apachesolr_text module. The patch in #28 fixes it, but only if I set the apachesolr_search_post_threshold to a very low number (site has about 20 text fields, using content permission module).

#33

We need to put together a real fix here - basically overriding the search() method of the underlying class so that we can determine the string length of the URL and decide on the method to use.

#34

I don't quite understand the difference between what you're saying in #33, and what the patch in #28 actually does. The patch does override the underlying search method.

#35

Yes, but not completely. It just counts the #param. We need the actual final string length to really get good behavior.

#36

Status: active » needs review

Here's a patch that overrides the Apache_Solr_Service::search method. Once the query string is built, it checks for the length, and if longer than 2000 characters, changes the method to use POST. The 2000 character limit is hard-coded. Not sure if that should be configurable (a little googling indicated that this was a rough limit, there's probably an actual limit somewhere in the Solr codebase).

Attachment | Size | Status | Test result | Operations
apachesolr.761990-36.patch | 2.41 KB | Ignored: Check issue status. | None | None

#37

Status: needs review » needs work

Looks like a reasonable start.

However, I think the limit should be 4000 by default, and be a variable. tomcat6, jetty, etc. have different defaults and are configurable themselves. The tomcat6 default is 4096 afaik.

Also, if we are bringing this code in, we should clean up the code style to Drupal standards.

#38

Status: needs work » needs review

Here is an updated patch that makes the threshold configurable (4k default), and also cleans up the coding standards (in the previous patch I left them as they were in the Apache_Solr_Service::search method, for ease of diffing future changes to that code).

Attachment | Size | Status | Test result | Operations
apachesolr.761990-38.patch | 2.64 KB | Ignored: Check issue status. | None | None
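
For readers following along without opening the patch, the idea can be sketched roughly as below. This is not the code from the attached patches: the function name and constants are hypothetical, and the 4000 default is an assumption based on the "4k default" mentioned above. In the module the threshold would presumably be read from the apachesolr_search_post_threshold variable rather than passed as an argument.

```php
<?php
// Hypothetical helper, not the code from the attached patches: pick the
// HTTP method based on the length of the final query string.
define('EXAMPLE_METHOD_GET', 'GET');
define('EXAMPLE_METHOD_POST', 'POST');

function example_choose_search_method($query_string, $threshold = 4000) {
  // Measure the fully built, URL-encoded string, since the encoded
  // length is what counts toward the server's URL limit.
  return strlen($query_string) > $threshold ? EXAMPLE_METHOD_POST : EXAMPLE_METHOD_GET;
}

$short = http_build_query(array('q' => 'solr'), '', '&');
$long = http_build_query(array('q' => 'solr', 'fq' => str_repeat('a', 5000)), '', '&');

echo example_choose_search_method($short) . "\n";
echo example_choose_search_method($long) . "\n";
```

Checking the final string length rather than the parameter count is the point raised in #35: a few long parameters can exceed the limit just as easily as many short ones.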

#39

Oops. Incorrectly formatted patch.

Attachment | Size | Status | Test result | Operations
apachesolr.761990-39.patch | 2.47 KB | Ignored: Check issue status. | None | None

#40

Should potentially merge with http://drupal.org/node/1107502

Also, it might make sense to have this be a per-server variable, since different servers might have different limits?

#41

The patch for 7 is now pretty small.

Attachment | Size | Status | Test result | Operations
761990-search-post-40-D7.patch | 859 bytes | Ignored: Check issue status. | None | None

#42

Version: 7.x-1.x-dev » 6.x-1.x-dev

Committed to 7.x.

#43

Should the variable be 'apachesolr_post_threshold' instead of 'apachesolr_search_post_threshold'? It seems like most of the server-level config is namespaced under apachesolr_, not apachesolr_search_.

#44

I guess the patch to review is #28.

#45

Well I was interpreting the name like "apachesolr: search post threshold", not "apachesolr search: post threshold"

#46

Here's a patch for 6.x-1.x rolled based on #39 with some minor cleanup.

Attachment | Size | Status | Test result | Operations
761990-46-D6.patch | 2.35 KB | Ignored: Check issue status. | None | None

#47

Status: needs review » fixed

#48

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#49

Version: 6.x-1.x-dev » 6.x-2.x-dev
Status: closed (fixed) » needs review

Looks like this didn't make it into the 6.x-2.x branch. Here's the patch from #46 rolled against 2.x.

Attachment | Size | Status | Test result | Operations
apachesolr-761990-49.patch | 2.35 KB | Ignored: Check issue status. | None | None

#50

Status: needs review » fixed

Fixed, thanks!

#51

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.