Major search issues (cck fields, search index) in need of help
I'm having two major search issues.
1) No CCK fields are being indexed by search at all; only the title and body are indexed. Querying the database directly confirmed that the field data isn't in the index.
2) Some content which is definitely in the search index isn't being returned in search results on the site.
Here's the lowdown on the setup. It was originally a D5 site way back, which was upgraded to Acquia 6, and has since been updated to the latest Acquia builds as they are released. For search I am only using the core search module. No solr or anything else has ever been installed on this site.
I have re-indexed the site twice now and that has not resolved the issue. I have even gone so far as to truncate the search tables in the database, also with no result. Cron seems to be running fine, and there is nothing in the error logs. The site has about 3000 nodes.
Using multisite, I set up a fresh install on the same code base, and after creating a few content types and some content, search worked just fine. So there are no problems on a clean install.
Any advice anyone could give would be greatly appreciated. Search is an essential part of this internal portal site.

=-=
My advice is to use SOLR.
As much as I think SOLR is great, I think it's a little overkill considering core search will accomplish what I need it to, and I don't want to spend the time learning how to set up SOLR on our server.
---------------------------------------------------------------------
"I am a very model of a modern major general"
http://www.arvinsingla.com
http://www.wiiliketopodcast.com
=-=
My research points to the idea that core search doesn't search CCK fields. Thus, at this time core search doesn't do what you want it to do.
Edited by: VM; Corrected by Robert Douglass
Can you cite the source of that research?
I believe core is supposed to search on everything that gets visibly rendered in a node.
@tkamen: Do any of your nodes in this site have PHP in them? Or HTTP redirects?
- Robert Douglass
-----
my Drupal book | Twitter
Some progress
Hi Robert. Some of my content uses computed fields, but that's about it. No HTTP redirects or anything like that. However, I believe I have made some troubleshooting progress. I copied the site from my Ubuntu dev server to my Mac running MAMP, and after re-indexing all the content, search worked perfectly for all my fields. So I have a feeling it may have something to do with the server itself. Since discovering this I have upgraded all my Ubuntu packages, and even Ubuntu itself from 8.10 to 9.04, but I still have the same problem. Are there any known issues with higher versions of Apache, PHP, or MySQL?
Here are the version differences from the status page for the working and non-working servers.
Ubuntu Dev (Search not working)
Apache: Apache/2.2.11 (Ubuntu) DAV/2 SVN/1.5.4 PHP/5.2.6-3ubuntu4.1 with Suhosin-Patch mod_ssl/2.2.11 OpenSSL/0.9.8g
MySQL: 5.0.75
PHP: 5.2.6-3ubuntu4.1
MAMP (Search Working)
Apache: Apache/2.0.59 (Unix) PHP/5.2.3 DAV/2
MySQL: 5.0.41
PHP: 5.2.3
Any ideas would be greatly appreciated! Thanks
Some more progress
OK, so after a great deal of time and head banging I have made some small headway.
As far as I can see, part of the culprit is the "Content Permissions" module for CCK. I have run multiple tests, and as soon as this module is disabled and the site is re-indexed, the CCK fields are indexed properly and I can search them on the site.
I have no clue how this tiny module, which seemingly has nothing to do with search, is messing it up. Has anyone experienced anything like this, or been able to recreate it?
_
Once you enable the Content Permissions module, the default is to prevent viewing the fields on the site. You have to go to admin/user/permissions and check the view permissions for each field you wish to make viewable by a particular role. My guess is the fields were not viewable and were therefore either not indexed or not shown in the results.
_
Don't be a Help Vampire - read and abide the forum guidelines.
If you find my assistance useful, please pay it forward to your fellow drupalers.
Yes, that more than makes sense, so here's the million dollar question: will search index fields that are set to be viewable only by authenticated users? Because that is the way I have the permissions currently set.
If it doesn't do this, then the logic is severely broken. All content should be indexed, and the permissions should determine both what gets shown and which results show up in searches, depending on the user.
_
The only way to be sure would be to test it out. And I agree, if it doesn't work that way it probably should be reported to the issue queue as a bug.
_
Yup, looks like that's the way it works. After changing the view permissions from authenticated users to anonymous users, the items re-indexed with no problem.
---------------------------------------------------------------------
The problem is that CCK doesn't do anything special about indexing by itself. It simply relies on the way a node is indexed by default:
...
$text = '<h1>'. check_plain($node->title) .'</h1>'. $node->body;
...
// Update index
search_index($node->nid, 'node', $text);
As you can see, the node's content (title + body) is generated before being passed to the indexing engine.
By default, CCK appends the fields' content to the body's content, so the fields are indexed along with the body (except labels, which are automatically hidden).
Since indexing is done as an anonymous user, this means that only CCK fields viewable by anonymous users are indexed.
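To make the behaviour described above concrete, here is a minimal plain-PHP sketch that runs outside Drupal. It only models the idea; it is not CCK's or core's actual code. The permission table, the helper demo_index_text() and the sample node are all hypothetical names invented for illustration.

```php
<?php
// Hypothetical stand-in for Content Permissions: roles allowed to view each field.
$view_permissions = array(
  'field_department' => array('authenticated user'),
  'field_summary'    => array('anonymous user', 'authenticated user'),
);

// Hypothetical helper: builds the text that would be handed to the indexer
// when the node is rendered for a user holding the given role.
function demo_index_text($node, $role, $view_permissions) {
  // Title and body are always part of the rendered text.
  $text = '<h1>' . htmlspecialchars($node['title']) . '</h1>' . $node['body'];
  foreach ($node['fields'] as $name => $value) {
    $allowed = isset($view_permissions[$name]) ? $view_permissions[$name] : array();
    // A field is only rendered, and therefore only indexed, if the role may view it.
    if (in_array($role, $allowed)) {
      $text .= ' ' . $value;
    }
  }
  return $text;
}

$node = array(
  'title'  => 'Staff handbook',
  'body'   => 'General policies.',
  'fields' => array(
    'field_department' => 'Accounting',
    'field_summary'    => 'Vacation rules',
  ),
);

// Indexing renders the node as an anonymous user, so 'Accounting' is left out.
$indexed = demo_index_text($node, 'anonymous user', $view_permissions);
// Rendering for a logged-in user would have included both fields.
$as_member = demo_index_text($node, 'authenticated user', $view_permissions);
```

In this model, a search for a value stored in field_department can never match, because that text never reaches the index in the first place.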
A quick fix
To enable indexing of CCK fields even when they are not accessible to anonymous users,
edit /sites/all/modules/cck/content.module.
Inside the function content_field,
find this block:
$element = array('#type' => 'content_field',
'#title' => check_plain(t($field['widget']['label'])),
'#field_name' => $field['field_name'],
'#access' => $formatter_name != 'hidden' && content_access('view', $field),
'#label_display' => $label_display,
'#node' => $node,
'#teaser' => $teaser,
'#page' => $page,
'#context' => $context,
'#single' => $single,
'items' => array(),
);
On line 768, replace:
'#access' => $formatter_name != 'hidden' && content_access('view', $field),
by
'#access' => ($formatter_name != 'hidden') && (($context == NODE_BUILD_SEARCH_INDEX) || content_access('view', $field)),
Can someone build a patch with this fix?
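For anyone who wants to sanity-check the proposed condition before touching CCK, here is a small plain-PHP sketch of the two boolean expressions. It runs outside Drupal, so some things are assumptions: demo_access_original() and demo_access_patched() are hypothetical helper names, $can_view stands in for the result of content_access('view', $field), and NODE_BUILD_SEARCH_INDEX is defined locally with the value 2, which is what I recall from Drupal 6's node.module.

```php
<?php
// Defined here only so the sketch runs outside Drupal (value is an assumption).
if (!defined('NODE_BUILD_SEARCH_INDEX')) {
  define('NODE_BUILD_SEARCH_INDEX', 2);
}

// Original condition: the field is rendered only if it is not hidden
// and the current user may view it.
function demo_access_original($formatter_name, $can_view) {
  return $formatter_name != 'hidden' && $can_view;
}

// Proposed condition: when the node is being built for the search index,
// the per-field view permission is bypassed.
function demo_access_patched($formatter_name, $context, $can_view) {
  return ($formatter_name != 'hidden')
    && (($context == NODE_BUILD_SEARCH_INDEX) || $can_view);
}

// Anonymous indexing run, field restricted to authenticated users:
$before = demo_access_original('default', FALSE);
$after  = demo_access_patched('default', NODE_BUILD_SEARCH_INDEX, FALSE);

// Normal page view by a user without permission: still denied.
$page_view = demo_access_patched('default', 'full', FALSE);

// A field whose formatter is 'hidden' stays out of the index either way.
$hidden = demo_access_patched('hidden', NODE_BUILD_SEARCH_INDEX, TRUE);
```

One consequence worth weighing: with the patched condition, the text of restricted fields ends up in the shared search index, so anyone allowed to search could find a node by a value they are not allowed to view on the page.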