Content types that are disabled should also be removed from the search index. Otherwise, nodes of those types still show up in search results, even though you cannot specifically search for that content type.
A solution is to remove these nodes from the search index in an implementation of hook_update_index(). Sample code:
http://drupal.org/node/63028#comment-259315
http://drupal.org/node/84955#comment-162473
Although far from ideal (performance issues and a skewed word count), it is still better than nothing. Maybe a checkbox should control whether this gets applied.
Here is a possible implementation; just add this function to search_config.module:
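```php
/**
 * Implementation of hook_update_index().
 *
 * Removes nodes of disabled content types from the search index.
 */
function search_config_update_index() {
  // search_wipe() is defined by the core search module.
  if (function_exists('search_wipe')) {
    $remove = variable_get('search_config_disable_type', array());
    foreach ($remove as $type => $value) {
      // Only act on types whose setting is truthy (checked).
      if ($value) {
        $cnt = 0;
        // Pass $type as a placeholder argument instead of interpolating it.
        $result = db_query("SELECT nid FROM {node} WHERE type = '%s'", $type);
        // db_result() is meant for single-value queries; iterate rows instead.
        while ($node = db_fetch_object($result)) {
          search_wipe($node->nid, 'node');
          $cnt++;
        }
        // watchdog('search config', "Removed $cnt nodes of type $type from the search index.");
      }
    }
  }
}
```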
A proper fix is planned only for Drupal 7:
http://drupal.org/node/111744
| Comment | File | Size | Author |
|---|---|---|---|
| #6 | search_index.patch | 5 KB | NaX |
Comments
Comment #1
NaX commented:
I have also just come across this method of controlling what gets indexed.
Here is my take on this. The SQL only retrieves nodes that have been indexed.
I don’t think the watchdog message is necessary, but I do think this should be controlled by a setting; it would give admins more control.
I also don’t think the search_config module should wait for core to implement this. I consider this a critical feature.
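The patch itself is not reproduced here, so the exact query is unknown. One way to restrict the cleanup to nodes that still have index data is to join against the search dataset table. The sketch below is an assumption, not the patch: it runs that query shape against an in-memory SQLite database through PDO, with stand-in `node` and `search_dataset` tables reduced to the columns the query uses, and the helper `indexed_nids_of_type()` is hypothetical. In the module, the same query would go through `db_query()` with `{node}`, `{search_dataset}` and a `'%s'` placeholder.

```php
<?php
// Sketch only: SQLite stand-ins for Drupal's {node} and {search_dataset}
// tables, reduced to the columns the query uses (an assumption).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT)');
$db->exec('CREATE TABLE search_dataset (sid INTEGER, type TEXT, data TEXT)');

// Nodes 2 and 3 are forum posts, but only node 2 has been indexed so far.
$db->exec("INSERT INTO node (nid, type) VALUES (1, 'page'), (2, 'forum'), (3, 'forum')");
$db->exec("INSERT INTO search_dataset (sid, type, data) VALUES (1, 'node', 'about us'), (2, 'node', 'forest')");

// Hypothetical helper: nids of the given content type that still have index data.
function indexed_nids_of_type(PDO $db, $type) {
  $stmt = $db->prepare(
    "SELECT n.nid FROM node n
     INNER JOIN search_dataset d ON d.sid = n.nid AND d.type = 'node'
     WHERE n.type = ?
     ORDER BY n.nid"
  );
  $stmt->execute(array($type));
  return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN, 0));
}

$nids = indexed_nids_of_type($db, 'forum');   // only node 2

// Simulate (the dataset part of) search_wipe() for those nodes.
$wipe = $db->prepare("DELETE FROM search_dataset WHERE sid = ? AND type = 'node'");
foreach ($nids as $nid) {
  $wipe->execute(array($nid));
}

// On the next cron run the query finds nothing left to remove.
$after = indexed_nids_of_type($db, 'forum');
```

Because already-wiped nodes drop out of the join, repeated cron runs stop touching them, which is presumably why the modified SQL removes most of the performance concern.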
Comment #2
canen commented
Comment #3
canen commented:
I'm about to implement this and I'm looking for feedback on where exactly I should put the setting.
Option 1 is easier but can cause problems when an admin only wants to remove a node type from the display but not from the search index. Option 2 is more flexible, but it means that if you do want to remove a content type from both the search index and the search form, it takes one extra step.
I'm leaning towards option 2 at the moment. Any preference?
Comment #4
NaX commented:
I vote for option 2, if possible with a tick box under the index node settings called something like "Same as Search Form Node Types". That would reduce the extra admin work you were referring to.
To avoid confusion, the two node type settings need to be clearly labeled. My suggestions are "Search form node types" and "Search index node types".
There is one thing we need to take into consideration here: if a node type is set not to be indexed, it should also be removed from the search form.
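A small sketch of how the two settings, the suggested tick box, and the "not indexed implies not on the form" rule could combine. Everything here is hypothetical and not from the committed patch: the function name, the settings shape (arrays keyed by type, truthy meaning excluded, like the checkbox values the original hook reads), and the `$same_as_form` flag standing in for the tick box.

```php
<?php
// Hypothetical: compute which types are excluded from the search form and
// from the search index. Inputs are arrays keyed by type; a truthy value
// means "excluded". $same_as_form mirrors the suggested tick box.
function effective_excluded_types(array $form_excluded, array $index_excluded, $same_as_form) {
  $form = array_keys(array_filter($form_excluded));
  $index = $same_as_form ? $form : array_keys(array_filter($index_excluded));
  // A type that is not indexed can never return results, so hide it from
  // the form as well.
  $form = array_values(array_unique(array_merge($form, $index)));
  sort($form);
  sort($index);
  return array('form' => $form, 'index' => $index);
}

// 'page' is hidden from the form only; 'forum' is kept out of the index,
// which also hides it from the form.
$r = effective_excluded_types(
  array('page' => 'page', 'story' => 0),
  array('forum' => 'forum'),
  FALSE
);
print_r($r);
```

With the tick box enabled, the index list simply copies the form list, which is the "one extra step" the option 2 discussion wants to avoid.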
Comment #5
canen commented:
That was the idea. Will see how it turns out.
I'm not sure what you are referring to here.
Comment #6
NaX commented:
Here is a patch of what I was thinking.
Comment #7
canen commented:
This looks good, thanks. I'll have a more in-depth look when I get home. Which version of the module is this patch against?
Comment #8
NaX commented:
5.x-1.2
Comment #9
canen commented:
NaX,
I've committed your version of the patch in http://drupal.org/cvs?commit=81182. Thanks a lot. I'm sure there is more to be done. The approach I was taking was different (altering the content type form) but either way works for now.
I would really like some testing done on this to see if any issues pop up. My Drupal installation here is really minimal (I'm still recovering from a hard drive crash), so there is no content to speak of for testing. You might be surprised to know that I rarely use this module, so it's good to see other people using it and contributing :).
The version I'm developing against is 5.x-1.x-dev. There should be a package soon; if not, you can use the CVS version. If everything is OK after testing, I'll update the documentation and make a new release.
Thanks again.
Comment #10
NaX commented:
I am currently testing this patch on a site that has a lot of nodes.
I first tested the patch by running cron manually, with the devel module showing redirects and SQL queries.
It all seems fine, but I will keep monitoring things.
There is one thing mariuss said that I don’t understand: what is the problem with a skewed word count? The performance issue is no longer a concern with the modified SQL.
Comment #11
mariuss commented:
By performance issues I meant that some content is first indexed and then its index data is removed, which is extra work that is not needed. Ideally, nodes that are not supposed to be indexed should not be indexed in the first place. It seems the actual performance hit is negligible, so I guess this is fine.
By skewed word count I mean the fact that index data is removed but the word count in the search_total table is not updated accordingly. I could be wrong on this one though. As an example, let's assume that the word "forest" shows up once in two nodes, in a node that should be searchable and in another node that we configured not to show up in searches (two different node types). The search_total table will probably show a total of 2, but the actual word count should be 1.
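The "forest" example can be worked through with a deliberately simplified model: a plain per-word count of index entries, which is not how core actually computes its search_total values. The sketch only illustrates the effect described above, that wiping index rows leaves a stored total stale until it is recomputed. All names are illustrative.

```php
<?php
// Simplified model, not core's search_total formula: each index entry is
// (word, nid, node type) and a word's total is its number of entries.
$index = array(
  array('forest', 1, 'page'),   // searchable type
  array('forest', 2, 'forum'),  // type configured not to show up in searches
);

function word_totals(array $index) {
  $totals = array();
  foreach ($index as $entry) {
    list($word) = $entry;
    $totals[$word] = isset($totals[$word]) ? $totals[$word] + 1 : 1;
  }
  return $totals;
}

// Totals stored while both nodes were indexed.
$stale = word_totals($index);                 // forest => 2

// Wipe the index entries of excluded types, as the cleanup hook does...
$excluded = array('forum' => TRUE);
$index = array_values(array_filter($index, function ($e) use ($excluded) {
  return empty($excluded[$e[2]]);
}));

// ...but the stored total only matches reality if it is recomputed.
$fresh = word_totals($index);                 // forest => 1
printf("stored: %d, actual: %d\n", $stale['forest'], $fresh['forest']);
```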
Comment #12
canen commented
Comment #13
(not verified) commented:
Automatically closed -- issue fixed for two weeks with no activity.