Searches fail for most terms in the index

slnm - October 20, 2009 - 19:42
Project: Apache Solr Search Integration
Version: 6.x-1.0-rc3
Component: Code
Category: bug report
Priority: critical
Assigned: Unassigned
Status: active
Description

I've had a Drupal site running apachesolr_search 6.x-1.0-rc2 that was working fine for a while. Now, most searches fail to yield any results. If I search Solr directly via its admin interface, I get results as I would expect. I do notice that I get lots of errors that look like this:

simplexml_load_string() [function.simplexml-load-string]: ^ in /web/drupal/sites/MYSITE/modules/apachesolr/Drupal_Apache_Solr_Service.php on line 97.

I also notice that in admin/reports/apachesolr it says there are 2450 documents in the index but then the number of terms in index is blank and it says there's no data on indexed fields.

Here's what I've tried:

  1. Upgraded to rc3. Yes, I ran update.php.
  2. Used the devel module to remove and install apachesolr and apachesolr_search.
  3. Cleared the cache. No, I didn't think that would really do anything.
  4. Pointed apachesolr_search to a different Solr server. No luck but I didn't expect anything because searching Solr directly works.
  5. Cloned the application and all its data and modules. The cloned copy doesn't have a problem, interestingly enough.

I'm stumped. How can I debug this?

#1

slnm - October 20, 2009 - 19:58

Oh, and I've deleted the records, dropped the index, reloaded records and rebuilt the index several times.

#2

pwolanin - October 21, 2009 - 15:40

Odd - you generally need the parsed XML to analyze the search results.

Could be a PHP problem - do you have an opcode cache?
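
For anyone debugging the simplexml_load_string() warnings quoted above, here is a minimal sketch of how the parser's own error messages could be collected instead of being left as bare warnings. The helper name parse_xml_verbose() and the sample XML are made up for illustration; they are not part of the apachesolr module and are not real Solr output.

```php
<?php
// Hypothetical helper, not part of apachesolr or Drupal core.
// Returns array($xml, $messages): the parsed document (or FALSE) plus
// any libxml error messages produced while parsing.
function parse_xml_verbose($raw) {
  // Collect libxml errors internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(TRUE);
  $xml = simplexml_load_string($raw);
  $messages = array();
  if ($xml === FALSE) {
    foreach (libxml_get_errors() as $error) {
      $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
  }
  // Restore the previous error-handling state.
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return array($xml, $messages);
}

// Made-up sample document, cut short to imitate a truncated response.
$complete = '<response><result numFound="2450"/></response>';
$truncated = substr($complete, 0, 30);

$good = parse_xml_verbose($complete);
$bad = parse_xml_verbose($truncated);

echo ($good[0] === FALSE ? 'complete: failed' : 'complete: parsed'), "\n";
echo ($bad[0] === FALSE ? 'truncated: failed' : 'truncated: parsed'), "\n";
foreach ($bad[1] as $message) {
  echo '  ', $message, "\n";
}
```

If the body handed to the parser is cut off partway through, the collected messages should point at the place where the XML stops, which is a hint that the HTTP layer rather than Solr is at fault.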

#3

slnm - October 22, 2009 - 14:37

There's no opcode cache in the broken system.

I'd appreciate suggestions on how to debug this further.

#4

pwolanin - October 22, 2009 - 15:03

Well, this comment:

Cloned the application and all its data and modules. The cloned copy doesn't have a problem,

makes me suspicious of some unaccounted local change.

#5

slnm - October 23, 2009 - 14:23

This ends up not being an Apache Solr Search bug after all.

One of the senior developers in our organization tracked the problem down to a bug in Drupal core: drupal_http_request() reads the response with fread(), which only reads 8K and then terminates. He identified the following piece of code as the culprit; changing the value of 1024 to something much larger did not help. The documented limit for fread() is 8192 bytes, and this causes problems when the data coming back from Solr is larger than 8K.

Here's the problematic piece of drupal_http_request code:

while (!feof($fp) && $chunk = fread($fp, 1024)) {
   $response .= $chunk;
}

And, here's the replacement that solves our search problems:

while (!feof($fp)) {
   $response .= fgets($fp);
}
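
To illustrate the difference between the two loops above, here is a minimal sketch using an in-memory stream. It shows one way the original condition can stop early: the loop ends on any falsy chunk, and in PHP both an empty string and the string "0" are falsy. This thread does not confirm that this is the exact failure we hit with Solr, so treat it as an assumption. The function names read_loose(), read_strict(), and make_stream() are made up for this example.

```php
<?php
// Hypothetical helpers for illustration; not part of Drupal core.

// Same loop condition as the core code quoted above:
// stops at EOF or as soon as fread() returns a falsy chunk.
function read_loose($fp) {
  $response = '';
  while (!feof($fp) && $chunk = fread($fp, 1024)) {
    $response .= $chunk;
  }
  return $response;
}

// Stops only at EOF or on an explicit read error (FALSE).
function read_strict($fp) {
  $response = '';
  while (!feof($fp)) {
    $chunk = fread($fp, 1024);
    if ($chunk === FALSE) {
      break;
    }
    $response .= $chunk;
  }
  return $response;
}

// Builds a readable in-memory stream holding $data.
function make_stream($data) {
  $fp = fopen('php://memory', 'w+');
  fwrite($fp, $data);
  rewind($fp);
  return $fp;
}

// 1024 bytes fill the first chunk, so the second chunk is just "0".
$data = str_repeat('a', 1024) . '0';
$loose = read_loose(make_stream($data));
$strict = read_strict(make_stream($data));

echo strlen($data), ' ', strlen($loose), ' ', strlen($strict), "\n";
// prints "1025 1024 1025"
```

The loose loop drops the final byte because the last chunk is the string "0". The fgets() replacement avoids this class of problem for the same reason read_strict() does: the loop condition depends only on feof(), not on the truthiness of the data just read.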

It's also interesting to note that we saw symptoms of this bug:
http://drupal.org/node/307879
which went away when we locally patched Drupal core.

I'll be filing a bug against core and pointing this bug to it, so please leave this bug open for a few days.

#6

pwolanin - October 26, 2009 - 16:58

Thanks for the investigation - this is certainly an important finding.

#7

David_Rothstein - October 28, 2009 - 18:52

It looks like #617126: drupal_http_request() fails is related to this?

 
 
