Exclude node types from search index

asimov - January 22, 2007 - 18:05
Project:Drupal
Version:7.x-dev
Component:search.module
Category:feature request
Priority:normal
Assigned:Unassigned
Status:patch (code needs work)
Description

The Drupal search module currently includes all node types in its search results. A feature request to add capabilities to exclude some node types. Please see http://drupal.org/node/82348#comment-161699 for more info.

#1

pluess - June 8, 2007 - 14:01
Title:Feature to exclude node types from search results» There's more than just excluding node types

I think there's more than just excluding node types.

The real work gets donne in the node.module at node_update_index(). It simply takes every node, renders it and indexes the content. Via hook_nodeapi modules have the chance to provide extra information.

I see the following possibilities:

  1. Add a possibility to exclude certain node types from beeing indexed to the search.module and make the node.module skipping nodes of this types.
  2. Add a hook_index_view. The node can return whatever it want's to get indexed (maybe NULL). If isnt' availabe hook_view is called (as it is to day). This requires node_update_index in node.module to be patched.

Of course the two possibilities can be combined.

What do you think?

#2

asimov - June 9, 2007 - 08:08

I agree it's probably best to have the option to prevent certain nodes from being indexed in the first place rather than just filtering them, although a combination is better since it also allows for exclusion of results that have previously been indexed.

I imagine the indexing prevention patch would apply to node_update_index() in the node.module, and the addition of filtering capabilities would need to be made within the search module inside search_index(), with a check for an allowed $type against the new hook_index_view that you describe?

#3

yched - June 15, 2007 - 23:04
Title:There's more than just excluding node types» Exclude node types from search index
Version:4.7.5» 6.x-dev
Status:active» patch (code needs review)

(New features go only in the dev release.)

Excluding content types is an important feature. Now that content types are so easy to add, and esp. with modules such as 'node as block', we must have a way to prevent 'internal', 'administrative' or 'cosmetic' content types from being indexed.

Attached patch presents a select box on the search settings page letting the admin choose the indexed content types.
Note that indexing is opt-out (new content types are indexed by default), so we store a list of excluded content types. However, having the user select the content types he wants to exclude makes a kind of convoluted brainer ('negative choice'), so the UI is opt-in style ('positive choice').

@plueless : a finer grained ('per field') indexing would be sweet as well, but let's take small steps :-). What that would require actually is a 'search index' rendering style (see http://drupal.org/node/144608 and http://www.lullabot.com/blog/trouble_nodes) - VERY interesting, but probably not for now...

#4

yched - June 15, 2007 - 23:05

aaaand the patch.

AttachmentSize
search_exclude_content_types.patch7.56 KB

#5

yched - June 16, 2007 - 13:01

http://drupal.org/node/63028 was marked a duplicate

#6

yched - June 16, 2007 - 13:02

#7

asimov - June 17, 2007 - 09:47

I applied the patch to the latest version of HEAD, and tested story content in a story node and page content in a page node by selectively setting the search index to index one or the other. I also tried searching for the content in the excluded node type. It works great, thanks.

#8

Gábor Hojtsy - June 17, 2007 - 15:59

The feature looks sensible...

- non_indexed_content_types should be 'search_excluded_content_types' or 'search_excluded_node_types', see the variable name prefix, and shorter naming
- content_indexing_settings should be shorter IMHO, why do we need the 'settings' there? this is a settings form, we know
- why do you need to add the validation with altering?
- the help text on the setting should note that existing search index data is wiped for excluded types

#9

yched - June 17, 2007 - 17:15

Thanks for the review, Gabor.

non_indexed_content_types should be 'search_excluded_content_types' or 'search_excluded_node_types', see the variable name prefix, and shorter naming

True :-) done.

content_indexing_settings should be shorter IMHO, why do we need the 'settings' there? this is a settings form, we know

That was to stay inline with search_module's $form['indexing_settings'] fieldset. But sure, why not.

why do you need to add the validation with altering?
because the 'per module settings' forms are array_merged (search.module line 257), not array_merge_recursive, so if I add the #validate directly in node_search($op = 'admin'), either the node specific or search generic #validate gets overwritten.

the help text on the setting should note that existing search index data is wiped for excluded types
True - done.

Updated patch attached.

AttachmentSize
search_exclude_content_types_02.patch7.99 KB

#10

Gábor Hojtsy - June 17, 2007 - 17:20

validation: why not fix the original problem then? or add a field #validate key to validate that value?

#11

yched - June 17, 2007 - 17:27

Actually, I think the whole hook_search('admin') op should be ditched, and handled by hook_form_alter (I think it's the general trend, isn't it ?)
The advanced settings on the search form are added that way, so what's the point ?

#12

yched - June 17, 2007 - 17:56

Updated patch does the above (move hook_search('admin') to _form_alter), plus changes the selection widget for indexed content types from multiple selectbox to checkboxes (less error prone)

Please also note that the 'reset to defaults' button is problematic for this form, does not play well with the code used to wipe the search index - this is already the case without this patch, though...

AttachmentSize
search_exclude_content_types_03.patch11.47 KB

#13

Gábor Hojtsy - June 17, 2007 - 18:14

After you reset to defaults, the search index should be gradually built up again, no?

#14

yched - June 17, 2007 - 18:55

when you hit 'reset to defaults' without changing any of the values in the form, the current search_admin_settings_validate() and my new node_search_admin_settings_validate() functions do not 'catch' the fact that the values _will_ change, and thus do not take the relevant action (search_wipe() ).

Not sure of the best way to fix that yet. As I stated above, this bug already exists in D5 and is not strictly related to this patch.

#15

yched - June 17, 2007 - 20:02

Updated patch, fixes the issues with 'Reset to defaults' by :
- storing the old values of the relevant variables in validate
- letting system_theme_settings_submit() actually update the variables (be it new form values or back to defaults)
- only then take action in our own submit handlers, based on the comparison between the old values and the newly updated ones.

This patch is now 2-in-1 : new 'exclude content types form indexing' feature + fix existing problem with 'Back to defaults'. I can roll two different patches if you prefer.

AttachmentSize
search_exclude_content_types_04.patch14.05 KB

#16

yched - June 18, 2007 - 00:26

Small fix : $form_state['storage'] instead of $form_state['store']

AttachmentSize
search_exclude_content_types_05.patch14.3 KB

#17

yched - June 22, 2007 - 20:56

Rerolled after deletion API patch.
Just retested, everything works OK.

AttachmentSize
search_exclude_content_types_06.patch13.91 KB

#18

pwolanin - June 27, 2007 - 01:54

subscribing - patch still applies with offset.

#19

Bevan - June 27, 2007 - 06:58

Cool! Just What I'm looking for (subscribing)

#20

pwolanin - June 27, 2007 - 20:37
Status:patch (code needs review)» patch (code needs work)

Going through this in detail makes me feel as though this should be a property of each node type (i.e. a table column), perhaps, rather than stuffed into an array setting? In which case, each node type's add/edit form could have a checkbox, rather than having one large set of checkboxes on the search admin form.

In any case, the patch needs to add code to node_node_type() to handle changes in node type names.

#21

yched - June 28, 2007 - 02:24
Status:patch (code needs work)» patch (code needs review)

One checkbox on each content type's settings page vs. checkboxes on search admin settings :

I guess this could be discussed. Changing the 'excluded types' settings might require rebuilding the index, so I felt it more adequate to leave this grouped with the other settings that wight do so (currently only on the search settings form).

New patch - rerolled against latest D6, and reacts to hook_node_types('update')

AttachmentSize
search_exclude_content_types_08.patch15.61 KB

#22

phillamb168 - July 9, 2007 - 19:16

Hey there,

I've been thinking it'd be useful to also include a per-node opt-out option for searching. Have it be a checkbox flag similar to promote to front, and would (on cron) exclude any new/existing keywords under the node ID.

Would this be useful for anyone?

#23

inforeto - July 10, 2007 - 00:16

subscribing

#24

pwolanin - July 10, 2007 - 00:30
Version:6.x-dev» 5.x-dev

I think this is now for 7.x, but can be implemented to a reasonable degree in contrib in 6.x thanks to this patch: http://drupal.org/node/152493

#25

yched - July 10, 2007 - 00:37
Version:5.x-dev» 7.x-dev

True - but I think your finger slipped :-)

#26

asimov - July 10, 2007 - 03:22

A per-node-opt-out would be very handy, especially where you need to have the node in published status for certain users to access directly, but don't want others to find it when searching.

#27

NancyDru - July 22, 2007 - 17:27

I like this option being implemented. Is there some way there could also be a hook_search_filter such that I could control whether or not to show my content type?

My case: I have already implemented hook_update_index to exclude my content type, but I am not totally happy with that choice. I'd like certain users to be able to search and see this content (I do implement a permission to "access xyz"). So I'd like to set up a "filter" to check that access after the search has selected items, but before they are shown, so that I can exclude the results for users who don't have the correct permission.

#28

pwolanin - July 22, 2007 - 18:47

for 6.x you can use the node build_mode to accomplish this in a contrib module. In 5.x, I'm not sure it's possible.

#29

NancyDru - July 23, 2007 - 03:55

Oh, you didn't have to tell me that - now I HAVE to do it!

#30

jschneider - July 24, 2007 - 19:43

So is it my understanding this works for Drupal 6 and 7? and does not work on 5.1? If this is the case sorry for asking here but where does one obtain a copy of 6?

Thanks in advance!

#31

yched - July 24, 2007 - 21:32

You do not want drupal 6 - drupal 6 is under development, not ready for public use.

You *might* want to adapt this patch to the drupal 5 branch (should be quite straightforward), *but* then you'll be running a forked version of Drupal core, which is really not advised...

#32

WorldFallz - October 10, 2007 - 02:00

Does that mean this can't be implemented in 5.x as a contrib? Let me know so I don't waste my time trying, lol. Thanks.

#33

NancyDru - October 19, 2007 - 14:40

Node_build_mode probably could not be back-ported - certainly not easily.

That, however, doesn't mean you can't do this in 5.x. I added additional checks to hook_view (above and beyond menu access checks) to determine if the current user is allowed to see the node. It may not be "pretty" but it seems to work, but I want to do some more testing before I put this in my contributed module.

#34

WorldFallz - October 22, 2007 - 17:33

Actually-- seems someone has already done it!

Check out: http://drupal.org/project/search_block. So far from my testing it's correctly blocking items from being searched. It just doesn't seem to be changing the advanced search form which should be easily doable with an override. From what I can tell taking a quick look at the code, it's not attempting to do it. If I get it figured out for my site, i'll post a patch for it.

#35

Bevan - October 22, 2007 - 23:18

I've been using Views Fast Search very successfully to reach the same end.

#36

WorldFallz - October 23, 2007 - 18:55

Wow-- it never ceases to amaze me how much time i waste on things 1) either already implemented in a module or 2) already solved in a previous issue or forum thread. I"m not a lightweight searcher-- i typically spend hours and hours researching (both drupal.org and google) something before attempting to fix it myself. For whatever reason, I'm often not successful in finding the gems out there.

For anyone else needing this functionality now for the 5.x code the search_config module does ALL of this in a single module-- fixes the index and the adv search form. Brilliant.

Hopefully this will save someone else from wasting time on something that's already been done beautifully.

#37

NancyDru - October 24, 2007 - 00:20

Those both look like promising solutions for most people. However my use case is that I want these nodes to be searchable, but only by those who are authorized to do so. Since the indexing is not dynamic, it falls back to my view code to disallow it. Excluding on a user-by-user basis from the advanced search form is easy. The problem is that I'd also like the search results to not show anything either.

#38

WorldFallz - October 25, 2007 - 13:47

ahhh... interesting use case. Very valuable functionality too I would think. So in your case if a user doesn't have access to read those content types, they also shouldn't see them in search results. That is a bit different then what these modules do because in your case all the content types will still need to be indexed normally (for those that can search) but you'll have to throttle access control at the search or view stage based on permissions. What about a custom search module that uses an SQL WHERE clause to limit the results returned?

#39

NancyDru - October 25, 2007 - 14:48

Unfortunately "type" for search is just "node" for everything, so this would require a lot of JOINing to get the real type.

Right now (at least until D6), I think the most "efficient" is to use hook_view to deny access. I'm not crazy about that because too much stuff might still show up in the search results list.

#40

DanielTheViking - February 16, 2008 - 14:37

What is the status of these "needs" now that D6 is out ?
How to best achieve it under D6?
(I reckon D6 does not cater for all these variants out-of-the-box.)

How about role-based display? Does D6 core let ACL filter what is available to search?

#41

NancyDru - February 16, 2008 - 15:53

The needs are still there but I have not had a chance to see what D6 brings to the table. Unfortunately, node.module's (IMHO) incorrect access checking doesn't help any.

#42

dave.trainer - February 21, 2008 - 19:12

subscribing

#43

moshe weitzman - February 22, 2008 - 16:11

anyone up for a reroll?

#44

challa.kamal - April 2, 2008 - 10:05

Hey nice solution i think but not able to implement in my project . Please could you send the modified files rather than patch files. becoz iam not able to implement. please

Thanks inadvance

#45

Bevan - April 3, 2008 - 10:32

#46

flobruit - April 4, 2008 - 01:06

Updated the patch for Drupal 7.

Todo: re-index nodes after change in excluded content type.

AttachmentSize
search_exclude_node_types-111744-46.patch11.65 KB

#47

yched - April 4, 2008 - 13:52

Thx for re-rolling, flobruit.

IIRC, the "Todo: re-index nodes after change in excluded content type" part is already taken care of in node_search_admin_settings_submit(). Logic is : "Delete the indexed entries for newly opted-out content types, rebuild the whole index if there are newly opted-in content types"

#48

robertDouglass - April 9, 2008 - 07:05

#49

Rowanw - April 25, 2008 - 05:11
Status:patch (code needs review)» patch (code needs work)

I've created a patch with some better wording (in my opinion) for the settings form, screenshot attached.

Also, the submit buttons on the Search settings page come before the last two fieldsets, but I'm not sure how to address this.

AttachmentSize
search_exclude_node_types.patch11.65 KB
content-indexing.png16.67 KB
 
 

Drupal is a registered trademark of Dries Buytaert.