Hey all
I have a problem - the built-in search module is not indexing over half of my site's content.
It's skipping over thousands of nodes.
Can anyone help me understand how it determines what nodes need indexing? Or why it would skip over so many nodes?
Does it matter that my 2000+ nodes are "pages" and not "stories"?
I've cleared the index, dropped the related tables (in case there was leftover data), and reloaded the modules...
Every time I re-index the site and run cron.php once (setting it to do 10 nodes at a time) - search comes back saying it's indexed over 80% of the site.
When I check the search_dataset table, I see that nodes up to node #4000 (for example) are indexed - then it skips from node #4000 to node #15000+ (my nodes are all numbered quite high from adding and removing content over the years).
Aside from scouring through the search.module code - I'm at a loss :(
Any helpful insight is always appreciated - thanks!!
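For anyone wanting to list the gap rather than eyeball the search_dataset table, here is a rough sketch of the check described above. It uses Python with an in-memory SQLite database as a stand-in, so the tables are simplified assumptions (only nid, type and status on node; sid, type and data on search_dataset) and the sample rows are made up. On a real site the same LEFT JOIN would be run against the actual database.

```python
import sqlite3

# Simplified stand-ins for Drupal's node and search_dataset tables.
# Only the columns needed for this check are modelled (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, status INTEGER);
    CREATE TABLE search_dataset (sid INTEGER, type TEXT, data TEXT);
""")

# Made-up sample data: five published pages and one unpublished story.
conn.executemany(
    "INSERT INTO node (nid, type, status) VALUES (?, ?, ?)",
    [(3990, "page", 1), (4000, "page", 1), (4358, "page", 1),
     (9120, "story", 0), (15010, "page", 1), (15020, "page", 1)],
)
# Only some of the published nodes made it into the index.
conn.executemany(
    "INSERT INTO search_dataset (sid, type, data) VALUES (?, 'node', '')",
    [(3990,), (4000,), (15010,)],
)


def unindexed_nids(db):
    """Return nids of published nodes that have no row in search_dataset."""
    rows = db.execute("""
        SELECT n.nid
        FROM node n
        LEFT JOIN search_dataset d ON d.sid = n.nid AND d.type = 'node'
        WHERE n.status = 1 AND d.sid IS NULL
        ORDER BY n.nid
    """)
    return [nid for (nid,) in rows]


missing = unindexed_nids(conn)
print(missing)
```

Unpublished nodes are left out on purpose, since they would not be expected in the index anyway.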
Comments
A couple of things, I'd stay
A couple of things: I'd stay away from rebuilding the search index via MySQL commands; instead, just use the rebuild index button found at admin/settings/search. A number of Drupal variables need to be updated along with the tables; otherwise you can end up with an empty search dataset while Drupal thinks it has indexed everything already.
Now beyond that, I have a few questions:
1. Are any errors showing up in your system logs after cron runs? Sometimes servers have too little memory to handle large chunks of search data. On slower systems, the operations can time out as well.
2. I know this is a dumb question but I gotta ask, are nodes nid 4000-15000 visible, can you navigate to node/4358 for example, and see a full page with content?
3. It's generally a bad idea to drop system tables. Can you verify that they were all properly recreated?
4. Do you have any custom modules that affect search results (usually via nodeapi $op = 'update index') ?
5. Have you spot-checked your search results? For example, put a strange word like "pumpernickle" in a new piece of content, publish it, run cron.php, and see whether a search for pumpernickle returns any results. (This is how I test search on our company's website.)
--
"I'm not concerned about all hell breaking loose, but that a PART of hell will break loose... it'll be much harder to detect." - George Carlin
--
Personal: http://www.nicklewis.org
Work: http://www.onnetworks.com
--
"I'm not concerned about all hell breaking loose, but that a PART of hell will break loose... it'll be much harder to detect." - George Carlin
--
Personal: http://www.nicklewis.org
Work: http://www.zivtech.com
Thanks for the quick reply
Thanks for the quick reply Nick.
Here is what my troubleshooting had come up with:
1. No errors are reported in the logs.
2. No worries - when troubleshooting, there's no dumb question. I double checked and the missing nodes are definitely visible.
3. Good advice, I restored from a backup I made before I started poking around in the database to be sure it wasn't something I did.
4. I don't have any custom modules enabled that affect search. I'm not sure what this part means, though: (usually via nodeapi $op = 'update index')?
5. This is interesting: when I tried to spot-check my results, I noticed that new nodes are not getting picked up by the search system. It constantly says 100% indexed (even though 80% of it actually wasn't).
--
wOOge | axonz.com
--
wOOge | adrianjean.ca
So you've rebuilt the search
So you've rebuilt the search index using the button at admin/settings/search, correct? That will wipe out and rebuild everything correctly (in theory at least). If the problem still exists after that, then something truly bizarre is going on.
Yup, rebuilt it using the
Yup, rebuilt it using the "re-index" button/feature of the admin/settings/search pages. Trouble is it never indexes the entire site. :( ... hmm I was hoping there'd be an "easy" answer.
Does anyone know how
Does anyone know how search.module determines whether or not to index a node?
Is there some block of php logic in the search.module that I can comment out/modify to allow it to at least index once properly?
My site does not use permissions of any kind - just Taxonomy and Roles - so technically all the site's content is viewable to anyone.
Well to be exact
Well, to be exact, it's not search.module that decides. What it does is call module_invoke($module, 'update_index'), which invokes node_update_index() in node.module. node.module runs an SQL query, and this bug results from an error in that query. I have posted what fails, and how to fix it, in http://drupal.org/node/42277#comment-846833.
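To make the call chain a bit more concrete, here is a toy sketch of the dispatch pattern in Python. It is not Drupal's code: the FUNCTIONS table, the NODES data and the batch logic inside the stand-in node_update_index() are all hypothetical, and the real selection query (the part with the bug) is not reproduced here. The only point is that the search cron run just asks each module to update its own index, so whatever nodes that module's query fails to select are never handed to the indexer.

```python
# Toy data: nid -> published flag, plus the set of nids already indexed.
NODES = {1: True, 2: True, 3: False, 4: True, 5: True}
INDEXED = set()


def node_update_index(limit=2):
    """Stand-in for node.module's hook: index the next `limit` nodes.

    The real function picks nodes with an SQL query; this list
    comprehension is only a placeholder for that selection step.
    """
    todo = [nid for nid in sorted(NODES) if NODES[nid] and nid not in INDEXED]
    batch = todo[:limit]
    INDEXED.update(batch)
    return batch


# Name-based lookup table, mimicking a "<module>_<hook>" function lookup.
FUNCTIONS = {"node_update_index": node_update_index}


def module_invoke(module, hook, *args):
    """Call <module>_<hook> if the module implements it, else return None."""
    func = FUNCTIONS.get(f"{module}_{hook}")
    return func(*args) if func else None


def search_cron(modules):
    """Ask every module to update its part of the index, one batch each."""
    return {m: module_invoke(m, "update_index") for m in modules}


first = search_cron(["node", "taxonomy"])
second = search_cron(["node", "taxonomy"])
print(first, second, sorted(INDEXED))
```

In this sketch "taxonomy" has no update_index function, so it is simply skipped, and the unpublished node 3 is never selected. A mistake in the selection step would silently leave nodes out in the same way, without any error from the search side.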