Content types that are disabled should also be removed from the search index. Otherwise nodes of that type still show up in search results, even though you cannot search specifically for that content type.

One solution is to remove these nodes from the search index using hook_update_index(). Sample code:
http://drupal.org/node/63028#comment-259315
http://drupal.org/node/84955#comment-162473

Although far from ideal (performance issues and skewed word count), it is still better than nothing. Maybe a checkbox should control whether this gets applied.

Here is a possible implementation; just add this function to search_config.module:

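/**
 * Implementation of hook_update_index().
 *
 * Removes nodes of disabled content types from the search index.
 */
function search_config_update_index() {
  if (function_exists('search_wipe')) {
    $remove = variable_get('search_config_disable_type', array());
    foreach ($remove as $type => $value) {
      if ($value) {
        $cnt = 0;
        // Pass $type as a query argument instead of interpolating it into the SQL.
        $result = db_query("SELECT nid FROM {node} WHERE type = '%s'", $type);
        // db_fetch_object() steps through the result set one row at a time.
        while ($node = db_fetch_object($result)) {
          search_wipe($node->nid, 'node');
          $cnt++;
        }
//        watchdog('search config', "Removed $cnt nodes of type $type from the search index.");
      }
    }
  }
}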

A proper fix is planned only for Drupal 7:
http://drupal.org/node/111744

Comment  File                Size  Author
#6       search_index.patch  5 KB  NaX

Comments

NaX’s picture

Priority: Normal » Critical
Status: Active » Needs review

I have also just come across this method of controlling what gets indexed.
Here is my take on this. The SQL only retrieves nodes that have been indexed.
I don’t think the watchdog message is necessary, but I do think this should be managed from a setting, which would give admins more control.

I also don’t think the search_config module should wait for core to implement this; I consider it a critical feature.

/**
 * Implementation of hook_update_index()
 */
function search_config_update_index() {
  // This hook is only called when the search module is enabled
  // The function_exists check can be removed, but there is no harm in leaving it in
  if (function_exists('search_wipe')) {
    $remove = variable_get('search_config_disable_type', array());
    foreach ($remove as $type => $value) {
      if ($remove[$type]) {
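        // Only match node entries in the index; search_dataset can hold other item types with the same sid.
        $result = db_query("SELECT n.nid FROM {node} n INNER JOIN {search_dataset} s ON n.nid = s.sid AND s.type = 'node' WHERE n.type = '%s'", $type);
        // db_fetch_object() steps through the result set one row at a time.
        while ($node = db_fetch_object($result)) {
          search_wipe($node->nid, 'node');
        }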
      }
    }
  }
}

canen’s picture

Assigned: Unassigned » canen
canen’s picture

I'm about to implement this and I'm looking for some feedback on where exactly I should put the setting.

  1. Have a checkbox on the search settings page to automatically remove content types from the search index once they are removed from the display.
  2. Make the setting independent of the search settings and place it on each content type page.

Option 1 is easier but can cause issues when an admin only wants to remove the node type from the display but not from the search index. Option 2 is more flexible, but removing a content type from both the search index and the search form then takes one extra step.

I'm leaning towards 2 at the moment. Any preference?

NaX’s picture

I vote for option 2, if possible with a tick box under the index node settings called something like "Same as Search Form Node Types". That could reduce the extra admin you were referring to.

To avoid confusion the 2 different node type settings need to be clearly labeled. My suggestions are “Search form Node Types” and “Search index Node types”.

There is one thing we need to take into consideration here. If a node type is set to not be indexed then it should also be removed from the form.
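
The rule in the previous paragraph can be sketched as a small pure function. This is only an illustration, not code from the module or the patch: the function name and both arrays are hypothetical, and they assume the same shape as search_config_disable_type (keyed by node type, with a truthy value meaning "disabled").

```php
<?php
/**
 * Hypothetical helper (not part of search_config): returns the node types
 * that must be hidden from the search form.
 *
 * Both arguments are keyed by node type; a truthy value means "disabled".
 * A type excluded from the index is always hidden from the form as well.
 */
function example_types_hidden_from_form(array $form_disabled, array $index_disabled) {
  $hidden = array();
  foreach (array($form_disabled, $index_disabled) as $setting) {
    foreach ($setting as $type => $value) {
      if ($value) {
        $hidden[$type] = $type;
      }
    }
  }
  ksort($hidden);
  return array_values($hidden);
}

// 'page' is hidden from the form only; 'story' is excluded from the index.
$hidden = example_types_hidden_from_form(
  array('page' => 'page', 'story' => 0, 'forum' => 0),
  array('page' => 0, 'story' => 'story', 'forum' => 0)
);
print implode(', ', $hidden) . "\n";
```

Here both 'page' and 'story' end up hidden from the form, while 'forum' stays available in both places.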

canen’s picture

Priority: Critical » Normal

> There is one thing we need to take into consideration here. If a node type is set to not be indexed then it should also be removed from the form.

That was the idea. Will see how it turns out.

> ...settings called something like "Same as Search Form Node Types"...

I'm not sure what you are referring to here.

NaX’s picture

Status  File size
new     5 KB

Here is a patch of what I was thinking.

canen’s picture

This looks good, thanks. I'll have a more in-depth look when I get home. Which version of the module is this patch against?

NaX’s picture

5.x-1.2

canen’s picture

Version: 5.x-1.2 » 5.x-1.x-dev

NaX,

I've committed your version of the patch in http://drupal.org/cvs?commit=81182. Thanks a lot. I'm sure there is more to be done. The approach I was taking was different (altering the content type form) but either way works for now.

I really would like some testing done on this to see if any issues pop up. The installation of Drupal I have here is really minimal (still recovering from a HD crash) so no content to speak of for testing. You would be surprised to know that I rarely use this module so it's good to see other people using it and making contributions :).

I'm developing against the 5.x-1.x-dev version. There should be a package soon; if not, you can use the CVS version. If everything is OK after testing, I'll update the documentation and make a new release.

Thanks again.

NaX’s picture

I am currently testing this patch on a site that has a lot of nodes.
I first tested the patch by running cron manually with the devel module showing redirects and SQL queries.
It all seems to be fine, but I will keep monitoring things.

There is one thing mariuss said that I don’t understand:

> Although far from ideal (performance issues and skewed word count)

What is the problem with skewed word count? The performance issue is not an issue anymore with the modified SQL.

mariuss’s picture

By performance issues I meant the fact that some content is first indexed and then this index data is removed. Extra work that is not needed. Ideally nodes that are not supposed to be indexed should not be indexed in the first place. It seems that the actual performance hit is negligible, so I guess this is fine.

By skewed word count I mean the fact that index data is removed but the word count in the search_total table is not updated accordingly. I could be wrong on this one though. As an example, let's assume that the word "forest" shows up once in two nodes, in a node that should be searchable and in another node that we configured not to show up in searches (two different node types). The search_total table will probably show a total of 2, but the actual word count should be 1.
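
The "forest" example can be played through with a toy in-memory model. This is a sketch under stated assumptions, not Drupal's real schema or API: $index stands in for the per-node index data, $totals for the search_total table, and the helper name is made up. It only shows what happens if totals are not recalculated after a wipe; whether search_wipe() really leaves search_total stale is not confirmed in this thread.

```php
<?php
// Toy in-memory model, not Drupal's real schema or API.
// $index maps a node id to the words indexed for it.
$index = array(
  1 => array('forest', 'trail'),  // node of a searchable type
  2 => array('forest'),           // node of a type excluded from search
);

/** Hypothetical helper: counts how often each word occurs across all indexed nodes. */
function example_count_words(array $index) {
  $totals = array();
  foreach ($index as $words) {
    foreach ($words as $word) {
      $totals[$word] = isset($totals[$word]) ? $totals[$word] + 1 : 1;
    }
  }
  return $totals;
}

// Totals computed while both nodes are still indexed.
$totals = example_count_words($index);

// Wipe node 2 from the index without touching the stored totals.
unset($index[2]);

$stale = $totals['forest'];
$fresh_totals = example_count_words($index);
$fresh = $fresh_totals['forest'];
print "stored: $stale, actual: $fresh\n";
```

In this model the stored total for "forest" stays at 2 while a recount gives 1, which is the skew described above.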

canen’s picture

Status: Needs review » Fixed

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.