Exclude node types from search index
asimov - January 22, 2007 - 18:05
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | search.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | patch (code needs work) |
Description
The Drupal search module currently includes all node types in its search results. A feature request to add capabilities to exclude some node types. Please see http://drupal.org/node/82348#comment-161699 for more info.

#1
I think there's more than just excluding node types.
The real work gets donne in the node.module at node_update_index(). It simply takes every node, renders it and indexes the content. Via hook_nodeapi modules have the chance to provide extra information.
I see the following possibilities:
Of course the two possibilities can be combined.
What do you think?
#2
I agree it's probably best to have the option to prevent certain nodes from being indexed in the first place rather than just filtering them, although a combination is better since it also allows for exclusion of results that have previously been indexed.
I imagine the indexing prevention patch would apply to node_update_index() in the node.module, and the addition of filtering capabilities would need to be made within the search module inside search_index(), with a check for an allowed $type against the new hook_index_view that you describe?
#3
(New features go only in the dev release.)
Excluding content types is an important feature. Now that content types are so easy to add, and esp. with modules such as 'node as block', we must have a way to prevent 'internal', 'administrative' or 'cosmetic' content types from being indexed.
Attached patch presents a select box on the search settings page letting the admin choose the indexed content types.
Note that indexing is opt-out (new content types are indexed by default), so we store a list of excluded content types. However, having the user select the content types he wants to exclude makes a kind of convoluted brainer ('negative choice'), so the UI is opt-in style ('positive choice').
@plueless : a finer grained ('per field') indexing would be sweet as well, but let's take small steps :-). What that would require actually is a 'search index' rendering style (see http://drupal.org/node/144608 and http://www.lullabot.com/blog/trouble_nodes) - VERY interesting, but probably not for now...
#4
aaaand the patch.
#5
http://drupal.org/node/63028 was marked a duplicate
#6
and so was http://drupal.org/node/84962
#7
I applied the patch to the latest version of HEAD, and tested story content in a story node and page content in a page node by selectively setting the search index to index one or the other. I also tried searching for the content in the excluded node type. It works great, thanks.
#8
The feature looks sensible...
- non_indexed_content_types should be 'search_excluded_content_types' or 'search_excluded_node_types', see the variable name prefix, and shorter naming
- content_indexing_settings should be shorter IMHO, why do we need the 'settings' there? this is a settings form, we know
- why do you need to add the validation with altering?
- the help text on the setting should note that existing search index data is wiped for excluded types
#9
Thanks for the review, Gabor.
non_indexed_content_types should be 'search_excluded_content_types' or 'search_excluded_node_types', see the variable name prefix, and shorter naming
True :-) done.
content_indexing_settings should be shorter IMHO, why do we need the 'settings' there? this is a settings form, we know
That was to stay inline with search_module's $form['indexing_settings'] fieldset. But sure, why not.
why do you need to add the validation with altering?
because the 'per module settings' forms are array_merged (search.module line 257), not array_merge_recursive, so if I add the #validate directly in node_search($op = 'admin'), either the node specific or search generic #validate gets overwritten.
the help text on the setting should note that existing search index data is wiped for excluded types
True - done.
Updated patch attached.
#10
validation: why not fix the original problem then? or add a field #validate key to validate that value?
#11
Actually, I think the whole hook_search('admin') op should be ditched, and handled by hook_form_alter (I think it's the general trend, isn't it ?)
The advanced settings on the search form are added that way, so what's the point ?
#12
Updated patch does the above (move hook_search('admin') to _form_alter), plus changes the selection widget for indexed content types from multiple selectbox to checkboxes (less error prone)
Please also note that the 'reset to defaults' button is problematic for this form, does not play well with the code used to wipe the search index - this is already the case without this patch, though...
#13
After you reset to defaults, the search index should be gradually built up again, no?
#14
when you hit 'reset to defaults' without changing any of the values in the form, the current search_admin_settings_validate() and my new node_search_admin_settings_validate() functions do not 'catch' the fact that the values _will_ change, and thus do not take the relevant action (search_wipe() ).
Not sure of the best way to fix that yet. As I stated above, this bug already exists in D5 and is not strictly related to this patch.
#15
Updated patch, fixes the issues with 'Reset to defaults' by :
- storing the old values of the relevant variables in validate
- letting system_theme_settings_submit() actually update the variables (be it new form values or back to defaults)
- only then take action in our own submit handlers, based on the comparison between the old values and the newly updated ones.
This patch is now 2-in-1 : new 'exclude content types form indexing' feature + fix existing problem with 'Back to defaults'. I can roll two different patches if you prefer.
#16
Small fix : $form_state['storage'] instead of $form_state['store']
#17
Rerolled after deletion API patch.
Just retested, everything works OK.
#18
subscribing - patch still applies with offset.
#19
Cool! Just What I'm looking for (subscribing)
#20
Going through this in detail makes me feel as though this should be a property of each node type (i.e. a table column), perhaps, rather than stuffed into an array setting? In which case, each node type's add/edit form could have a checkbox, rather than having one large set of checkboxes on the search admin form.
In any case, the patch needs to add code to node_node_type() to handle changes in node type names.
#21
One checkbox on each content type's settings page vs. checkboxes on search admin settings :
I guess this could be discussed. Changing the 'excluded types' settings might require rebuilding the index, so I felt it more adequate to leave this grouped with the other settings that wight do so (currently only on the search settings form).
New patch - rerolled against latest D6, and reacts to hook_node_types('update')
#22
Hey there,
I've been thinking it'd be useful to also include a per-node opt-out option for searching. Have it be a checkbox flag similar to promote to front, and would (on cron) exclude any new/existing keywords under the node ID.
Would this be useful for anyone?
#23
subscribing
#24
I think this is now for 7.x, but can be implemented to a reasonable degree in contrib in 6.x thanks to this patch: http://drupal.org/node/152493
#25
True - but I think your finger slipped :-)
#26
A per-node-opt-out would be very handy, especially where you need to have the node in published status for certain users to access directly, but don't want others to find it when searching.
#27
I like this option being implemented. Is there some way there could also be a hook_search_filter such that I could control whether or not to show my content type?
My case: I have already implemented hook_update_index to exclude my content type, but I am not totally happy with that choice. I'd like certain users to be able to search and see this content (I do implement a permission to "access xyz"). So I'd like to set up a "filter" to check that access after the search has selected items, but before they are shown, so that I can exclude the results for users who don't have the correct permission.
#28
for 6.x you can use the node build_mode to accomplish this in a contrib module. In 5.x, I'm not sure it's possible.
#29
Oh, you didn't have to tell me that - now I HAVE to do it!
#30
So is it my understanding this works for Drupal 6 and 7? and does not work on 5.1? If this is the case sorry for asking here but where does one obtain a copy of 6?
Thanks in advance!
#31
You do not want drupal 6 - drupal 6 is under development, not ready for public use.
You *might* want to adapt this patch to the drupal 5 branch (should be quite straightforward), *but* then you'll be running a forked version of Drupal core, which is really not advised...
#32
Does that mean this can't be implemented in 5.x as a contrib? Let me know so I don't waste my time trying, lol. Thanks.
#33
Node_build_mode probably could not be back-ported - certainly not easily.
That, however, doesn't mean you can't do this in 5.x. I added additional checks to hook_view (above and beyond menu access checks) to determine if the current user is allowed to see the node. It may not be "pretty" but it seems to work, but I want to do some more testing before I put this in my contributed module.
#34
Actually-- seems someone has already done it!
Check out: http://drupal.org/project/search_block. So far from my testing it's correctly blocking items from being searched. It just doesn't seem to be changing the advanced search form which should be easily doable with an override. From what I can tell taking a quick look at the code, it's not attempting to do it. If I get it figured out for my site, i'll post a patch for it.
#35
I've been using Views Fast Search very successfully to reach the same end.
#36
Wow-- it never ceases to amaze me how much time i waste on things 1) either already implemented in a module or 2) already solved in a previous issue or forum thread. I"m not a lightweight searcher-- i typically spend hours and hours researching (both drupal.org and google) something before attempting to fix it myself. For whatever reason, I'm often not successful in finding the gems out there.
For anyone else needing this functionality now for the 5.x code the search_config module does ALL of this in a single module-- fixes the index and the adv search form. Brilliant.
Hopefully this will save someone else from wasting time on something that's already been done beautifully.
#37
Those both look like promising solutions for most people. However my use case is that I want these nodes to be searchable, but only by those who are authorized to do so. Since the indexing is not dynamic, it falls back to my view code to disallow it. Excluding on a user-by-user basis from the advanced search form is easy. The problem is that I'd also like the search results to not show anything either.
#38
ahhh... interesting use case. Very valuable functionality too I would think. So in your case if a user doesn't have access to read those content types, they also shouldn't see them in search results. That is a bit different then what these modules do because in your case all the content types will still need to be indexed normally (for those that can search) but you'll have to throttle access control at the search or view stage based on permissions. What about a custom search module that uses an SQL WHERE clause to limit the results returned?
#39
Unfortunately "type" for search is just "node" for everything, so this would require a lot of JOINing to get the real type.
Right now (at least until D6), I think the most "efficient" is to use hook_view to deny access. I'm not crazy about that because too much stuff might still show up in the search results list.
#40
What is the status of these "needs" now that D6 is out ?
How to best achieve it under D6?
(I reckon D6 does not cater for all these variants out-of-the-box.)
How about role-based display? Does D6 core let ACL filter what is available to search?
#41
The needs are still there but I have not had a chance to see what D6 brings to the table. Unfortunately, node.module's (IMHO) incorrect access checking doesn't help any.
#42
subscribing
#43
anyone up for a reroll?
#44
Hey nice solution i think but not able to implement in my project . Please could you send the modified files rather than patch files. becoz iam not able to implement. please
Thanks inadvance
#45
http://drupal.org/patch
#46
Updated the patch for Drupal 7.
Todo: re-index nodes after change in excluded content type.
#47
Thx for re-rolling, flobruit.
IIRC, the "Todo: re-index nodes after change in excluded content type" part is already taken care of in node_search_admin_settings_submit(). Logic is : "Delete the indexed entries for newly opted-out content types, rebuild the whole index if there are newly opted-in content types"
#48
Related patch: http://drupal.org/node/192089
#49
I've created a patch with some better wording (in my opinion) for the settings form, screenshot attached.
Also, the submit buttons on the Search settings page come before the last two fieldsets, but I'm not sure how to address this.