Posted by asimov on January 22, 2007 at 6:05pm
| Project: | Drupal core |
| Version: | 8.x-dev |
| Component: | search.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Issue Summary
The Drupal search module currently includes all node types in its search results. A feature request to add capabilities to exclude some node types. Please see http://drupal.org/node/82348#comment-161699 for more info.
Comments
#1
I think there's more than just excluding node types.
The real work gets donne in the node.module at node_update_index(). It simply takes every node, renders it and indexes the content. Via hook_nodeapi modules have the chance to provide extra information.
I see the following possibilities:
Of course the two possibilities can be combined.
What do you think?
#2
I agree it's probably best to have the option to prevent certain nodes from being indexed in the first place rather than just filtering them, although a combination is better since it also allows for exclusion of results that have previously been indexed.
I imagine the indexing prevention patch would apply to node_update_index() in the node.module, and the addition of filtering capabilities would need to be made within the search module inside search_index(), with a check for an allowed $type against the new hook_index_view that you describe?
#3
(New features go only in the dev release.)
Excluding content types is an important feature. Now that content types are so easy to add, and esp. with modules such as 'node as block', we must have a way to prevent 'internal', 'administrative' or 'cosmetic' content types from being indexed.
Attached patch presents a select box on the search settings page letting the admin choose the indexed content types.
Note that indexing is opt-out (new content types are indexed by default), so we store a list of excluded content types. However, having the user select the content types he wants to exclude makes a kind of convoluted brainer ('negative choice'), so the UI is opt-in style ('positive choice').
@plueless : a finer grained ('per field') indexing would be sweet as well, but let's take small steps :-). What that would require actually is a 'search index' rendering style (see http://drupal.org/node/144608 and http://www.lullabot.com/blog/trouble_nodes) - VERY interesting, but probably not for now...
#4
aaaand the patch.
#5
http://drupal.org/node/63028 was marked a duplicate
#6
and so was http://drupal.org/node/84962
#7
I applied the patch to the latest version of HEAD, and tested story content in a story node and page content in a page node by selectively setting the search index to index one or the other. I also tried searching for the content in the excluded node type. It works great, thanks.
#8
The feature looks sensible...
- non_indexed_content_types should be 'search_excluded_content_types' or 'search_excluded_node_types', see the variable name prefix, and shorter naming
- content_indexing_settings should be shorter IMHO, why do we need the 'settings' there? this is a settings form, we know
- why do you need to add the validation with altering?
- the help text on the setting should note that existing search index data is wiped for excluded types
#9
Thanks for the review, Gabor.
non_indexed_content_types should be 'search_excluded_content_types' or 'search_excluded_node_types', see the variable name prefix, and shorter naming
True :-) done.
content_indexing_settings should be shorter IMHO, why do we need the 'settings' there? this is a settings form, we know
That was to stay inline with search_module's $form['indexing_settings'] fieldset. But sure, why not.
why do you need to add the validation with altering?
because the 'per module settings' forms are array_merged (search.module line 257), not array_merge_recursive, so if I add the #validate directly in node_search($op = 'admin'), either the node specific or search generic #validate gets overwritten.
the help text on the setting should note that existing search index data is wiped for excluded types
True - done.
Updated patch attached.
#10
validation: why not fix the original problem then? or add a field #validate key to validate that value?
#11
Actually, I think the whole hook_search('admin') op should be ditched, and handled by hook_form_alter (I think it's the general trend, isn't it ?)
The advanced settings on the search form are added that way, so what's the point ?
#12
Updated patch does the above (move hook_search('admin') to _form_alter), plus changes the selection widget for indexed content types from multiple selectbox to checkboxes (less error prone)
Please also note that the 'reset to defaults' button is problematic for this form, does not play well with the code used to wipe the search index - this is already the case without this patch, though...
#13
After you reset to defaults, the search index should be gradually built up again, no?
#14
when you hit 'reset to defaults' without changing any of the values in the form, the current search_admin_settings_validate() and my new node_search_admin_settings_validate() functions do not 'catch' the fact that the values _will_ change, and thus do not take the relevant action (search_wipe() ).
Not sure of the best way to fix that yet. As I stated above, this bug already exists in D5 and is not strictly related to this patch.
#15
Updated patch, fixes the issues with 'Reset to defaults' by :
- storing the old values of the relevant variables in validate
- letting system_theme_settings_submit() actually update the variables (be it new form values or back to defaults)
- only then take action in our own submit handlers, based on the comparison between the old values and the newly updated ones.
This patch is now 2-in-1 : new 'exclude content types form indexing' feature + fix existing problem with 'Back to defaults'. I can roll two different patches if you prefer.
#16
Small fix : $form_state['storage'] instead of $form_state['store']
#17
Rerolled after deletion API patch.
Just retested, everything works OK.
#18
subscribing - patch still applies with offset.
#19
Cool! Just What I'm looking for (subscribing)
#20
Going through this in detail makes me feel as though this should be a property of each node type (i.e. a table column), perhaps, rather than stuffed into an array setting? In which case, each node type's add/edit form could have a checkbox, rather than having one large set of checkboxes on the search admin form.
In any case, the patch needs to add code to node_node_type() to handle changes in node type names.
#21
One checkbox on each content type's settings page vs. checkboxes on search admin settings :
I guess this could be discussed. Changing the 'excluded types' settings might require rebuilding the index, so I felt it more adequate to leave this grouped with the other settings that wight do so (currently only on the search settings form).
New patch - rerolled against latest D6, and reacts to hook_node_types('update')
#22
Hey there,
I've been thinking it'd be useful to also include a per-node opt-out option for searching. Have it be a checkbox flag similar to promote to front, and would (on cron) exclude any new/existing keywords under the node ID.
Would this be useful for anyone?
#23
subscribing
#24
I think this is now for 7.x, but can be implemented to a reasonable degree in contrib in 6.x thanks to this patch: http://drupal.org/node/152493
#25
True - but I think your finger slipped :-)
#26
A per-node-opt-out would be very handy, especially where you need to have the node in published status for certain users to access directly, but don't want others to find it when searching.
#27
I like this option being implemented. Is there some way there could also be a hook_search_filter such that I could control whether or not to show my content type?
My case: I have already implemented hook_update_index to exclude my content type, but I am not totally happy with that choice. I'd like certain users to be able to search and see this content (I do implement a permission to "access xyz"). So I'd like to set up a "filter" to check that access after the search has selected items, but before they are shown, so that I can exclude the results for users who don't have the correct permission.
#28
for 6.x you can use the node build_mode to accomplish this in a contrib module. In 5.x, I'm not sure it's possible.
#29
Oh, you didn't have to tell me that - now I HAVE to do it!
#30
So is it my understanding this works for Drupal 6 and 7? and does not work on 5.1? If this is the case sorry for asking here but where does one obtain a copy of 6?
Thanks in advance!
#31
You do not want drupal 6 - drupal 6 is under development, not ready for public use.
You *might* want to adapt this patch to the drupal 5 branch (should be quite straightforward), *but* then you'll be running a forked version of Drupal core, which is really not advised...
#32
Does that mean this can't be implemented in 5.x as a contrib? Let me know so I don't waste my time trying, lol. Thanks.
#33
Node_build_mode probably could not be back-ported - certainly not easily.
That, however, doesn't mean you can't do this in 5.x. I added additional checks to hook_view (above and beyond menu access checks) to determine if the current user is allowed to see the node. It may not be "pretty" but it seems to work, but I want to do some more testing before I put this in my contributed module.
#34
Actually-- seems someone has already done it!
Check out: http://drupal.org/project/search_block. So far from my testing it's correctly blocking items from being searched. It just doesn't seem to be changing the advanced search form which should be easily doable with an override. From what I can tell taking a quick look at the code, it's not attempting to do it. If I get it figured out for my site, i'll post a patch for it.
#35
I've been using Views Fast Search very successfully to reach the same end.
#36
Wow-- it never ceases to amaze me how much time i waste on things 1) either already implemented in a module or 2) already solved in a previous issue or forum thread. I"m not a lightweight searcher-- i typically spend hours and hours researching (both drupal.org and google) something before attempting to fix it myself. For whatever reason, I'm often not successful in finding the gems out there.
For anyone else needing this functionality now for the 5.x code the search_config module does ALL of this in a single module-- fixes the index and the adv search form. Brilliant.
Hopefully this will save someone else from wasting time on something that's already been done beautifully.
#37
Those both look like promising solutions for most people. However my use case is that I want these nodes to be searchable, but only by those who are authorized to do so. Since the indexing is not dynamic, it falls back to my view code to disallow it. Excluding on a user-by-user basis from the advanced search form is easy. The problem is that I'd also like the search results to not show anything either.
#38
ahhh... interesting use case. Very valuable functionality too I would think. So in your case if a user doesn't have access to read those content types, they also shouldn't see them in search results. That is a bit different then what these modules do because in your case all the content types will still need to be indexed normally (for those that can search) but you'll have to throttle access control at the search or view stage based on permissions. What about a custom search module that uses an SQL WHERE clause to limit the results returned?
#39
Unfortunately "type" for search is just "node" for everything, so this would require a lot of JOINing to get the real type.
Right now (at least until D6), I think the most "efficient" is to use hook_view to deny access. I'm not crazy about that because too much stuff might still show up in the search results list.
#40
What is the status of these "needs" now that D6 is out ?
How to best achieve it under D6?
(I reckon D6 does not cater for all these variants out-of-the-box.)
How about role-based display? Does D6 core let ACL filter what is available to search?
#41
The needs are still there but I have not had a chance to see what D6 brings to the table. Unfortunately, node.module's (IMHO) incorrect access checking doesn't help any.
#42
subscribing
#43
anyone up for a reroll?
#44
Hey nice solution i think but not able to implement in my project . Please could you send the modified files rather than patch files. becoz iam not able to implement. please
Thanks inadvance
#45
http://drupal.org/patch
#46
Updated the patch for Drupal 7.
Todo: re-index nodes after change in excluded content type.
#47
Thx for re-rolling, flobruit.
IIRC, the "Todo: re-index nodes after change in excluded content type" part is already taken care of in node_search_admin_settings_submit(). Logic is : "Delete the indexed entries for newly opted-out content types, rebuild the whole index if there are newly opted-in content types"
#48
Related patch: http://drupal.org/node/192089
#49
I've created a patch with some better wording (in my opinion) for the settings form, screenshot attached.
Also, the submit buttons on the Search settings page come before the last two fieldsets, but I'm not sure how to address this.
#50
1 hunk of the patch didn't work with Drupal 6.4. Another update?
#51
Sub.
#52
I took the patch in #39 for a test drive on a fresh checkout of head and received the following error when navigating to the search settings page:
Fatal error: Uncaught exception 'PDOException' with message 'SELECT COUNT(*) FROM {node} n WHERE n.type NOT IN(?) AND n.status = 1 - Array ( ) SQLSTATE[HY093]: Invalid parameter number: no parameters were bound' in C:\wamp\www\drupal7\drupal\includes\database\database.inc:365 Stack trace: #0 C:\wamp\www\drupal7\drupal\includes\database\database.inc(1154): DatabaseConnection->query('SELECT COUNT(*)...', Array, Array) #1 C:\wamp\www\drupal7\drupal\modules\node\node.module(1237): db_query('SELECT COUNT(*)...') #2 [internal function]: node_search('status') #3 C:\wamp\www\drupal7\drupal\includes\module.inc(445): call_user_func_array('node_search', Array) #4 C:\wamp\www\drupal7\drupal\modules\search\search.admin.inc(42): module_invoke('node', 'search', 'status') #5 [internal function]: search_admin_settings(Array) #6 C:\wamp\www\drupal7\drupal\includes\form.inc(360): call_user_func_array('search_admin_se...', Array) #7 [internal function]: drupal_retrieve_form('search_admin_se...', Array) #8 C:\wamp\www\drupal7\drupal\include in C:\wamp\www\drupal7\drupal\includes\database\database.inc on line 365I've no idea what that means or where to begin troubleshooting.
#53
It means the patch needs updating for the new database layer which got in last night.
#54
subscribe
#55
I'm the maintainer of Restricted Search, and am subscribing as I'd obviously not like to have to maintain a module for something that could be in core instead :). We're in the process of launching the site I wrote this for, but hopefully I'll be able to work on this patch once it's launched.
--Andrew
#56
Not sure of the status on this one but subscribing and a big +1 from me. I think this feature is critical in core. I've been trying to get the Restricted Search module to work for me on a Drupal 6.5 site and I've had to resort to hacking core node module (not at all happy about that).
#57
subscribing
#58
FYI, the Search Config module has a D6 patch that already does node exclusion and other features. Maybe there is some overlap? It seems further along than Search Restrict and Restricted Search modules.
http://drupal.org/node/294696
#59
#286263: Make search indexing smart to handle nodes wth php redirects fixes one of the primary issues this issue was created to fix. I'm inclined to mark this as druplicate given that #294696: Port Drupal 6 version of Search Config and #286263: Make search indexing smart to handle nodes wth php redirects have everything covered.
#60
Bevan : this thread is about having the feature in Drupal core, where IMO it belongs.
Now, whether a core patch should be based on the patch here or from Search Config / Search Restrict / Restricted Search contrib modules can probably be discussed, I haven't really looked at those modules.
I don't intend to do more work on the issue myself, though :-).
#61
I didn't realize that was a contrib module.
#62
I have needed the possibility a few times now to exclude node types from search for some users, but still keep all content types in the search index so privileged users can do a full search.
This patch needs some work, but is a very simple shot at having exclusion of content types chosen in the search settings page (see the screenshot).
in my opinion we might need both the possibility to exclude node types entirely from the search index and the possibility to exclude node types from some searches. Explaining the difference to user that has to set this up is probably not going to be an easy job though...
#63
Naxoc, your patch has to do with filtering search results, not with controlling search indexing. You should file a separate issue with your patch.
#64
I wanted to accomplish more or less the same thing and did something very simple which works just fine, although it doesn't remove any checkboxes in advanced search nor does it prevent content types from being indexed. I simply wanted to avoid confusing users with unneeded search results, and enclosed everything in my search-result.tpl.php file with this:
<?php if ($info_split['type'] != 'Profil') : // Hide Profil content-type (human-readable name) from search results ?>// The rest of the template
<?php endif; ?>
Hackish and not perfect, but it gets the job done without needing to add another module.
#65
Moving to D8.
#66
#67
Sub.
#68
subscribe
#69
sub
#70
This feature request is well over three years old, and a contrib module now exists which implements the feature: http://drupal.org/project/search_config.
I think efforts would be better spent on maturing that module, and then potentially moving it to core if it proves popular enough.
#71
Please leave this open. It is under consideration for Drupal 8.
#72
And by the way, http://drupal.org/project/search_by_page also has this functionality.
#73
It seems that's incorrect. I just checked search_config-6.x-1.6, and the README.txt file explains this module no longer attempts to prevent any nodes from being indexed, it just hides them from search results.
#74
Looking through the code of search_config, it intercepts the search module as indexing occurs and prevents the unwanted node types from being indexed, as opposed to letting them be indexed and then later removing them from the index.
#75
Subscribing.
#76
The "Search Config" module was based on the following article: Hiding content from Drupal's search system.
It basically uses hook_db_rewrite_sql to append an exclusionary "where" clause to the search SQL generated in the node module's hook_search implementation.
#77
Subscribing
#78
This may not the place for this, but if anyone's looking for just a small snippet of code to achieve what the issue's title suggests (rather than installing the Search by Page module - nice though it is):
<?phpfunction [my_module]_node_view($node, $view_mode, $langcode) {
if ($view_mode === 'search_index') {
$excluded_types = array(['my_type']);
if (in_array($node->type, $excluded_types)) {
$node->title = '';
$node->content = array();
}
}
}
?>
If the content's already been added to the {search_dataset} then you need to set 'reindex' <> 0 to force it to be reindexed. You could do this by calling node_search_reset() or by manually updating the DB.
#79
subscribe
#80
For Drupal 6, none of the solutions I found worked for me.
I ended up with hacking node.module, which sucks, but it had to be done.
<?php
/**
* Implementation of hook_update_index().
*/
function node_update_index() {
$limit = (int)variable_get('search_cron_limit', 100);
// Store the maximum possible comments per thread (used for ranking by reply count)
variable_set('node_cron_comments_scale', 1.0 / max(1, db_result(db_query('SELECT MAX(comment_count) FROM {node_comment_statistics}'))));
variable_set('node_cron_views_scale', 1.0 / max(1, db_result(db_query('SELECT MAX(totalcount) FROM {node_counter}'))));
// HACK: This was hacked to exclude sos_banners from the index.
$result = db_query_range("SELECT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE n.type NOT LIKE 'sos_banner' AND (d.sid IS NULL OR d.reindex <> 0) ORDER BY d.reindex ASC, n.nid ASC", 0, $limit);
while ($node = db_fetch_object($result)) {
// HACK: Mentioning indexed stuff in the logs.
watchdog('search', 'Indexing ' . l("node/". $node->nid, 'node/'.$node->nid.'/edit'));
_node_index_node($node);
}
}
?>
#81
Note that hook_query_node_access_alter() can be used to reduce the results nicely without losing all of the search index information for admin users!
The D7 of version of the Search Config module uses a series of permissions for this. For example, a site with the two content types, webform and page has the following permissions:
Search all content types
Allow users to search any content, overriding content type permissions.
Online Form: Search content of this type
Page: Search content of this type
If no permissions are set, no content is returned. The "Search all content types" is given to all users when the module is installed.
The relevant code is:
<?php
/**
* Implements hook_permission().
*/
function search_config_permission() {
$permissions = array(
'search all content' => array(
'title' => t('Search all content types'),
'description' => t('Allow users to search any content, overriding content type permissions.'),
),
);
$ct = array_map('check_plain', node_type_get_names());
foreach ($ct as $type => $label) {
$permissions["search $type content"] = array(
'title' => t('%type_name: Search content of this type', array('%type_name' => $label)),
);
}
return $permissions;
}
/**
* Implements of hook_query_node_access_alter().
*/
function search_config_query_node_access_alter(QueryAlterableInterface $query) {
// A global override to prevent user 1 from being restricted.
global $user;
if ($user->uid == 1) {
return;
}
// Try and see if we are dealing with a node search
$search = FALSE;
$node = FALSE;
foreach ($query->getTables() as $alias => $table) {
if ($table['table'] == 'search_index') {
$search = $alias;
}
elseif ($table['table'] == 'node') {
$node = $alias;
}
}
if ($node && $search) {
// Standard search
if (!user_access('search all content')) {
$excluded_content_types = array();
$ct = array_map('check_plain', node_type_get_names());
foreach ($ct as $type => $label) {
if (!user_access("search $type content")) {
$excluded_content_types[] = $type;
}
}
$query->condition($node . '.type', array($excluded_content_types), 'NOT IN');
}
}
}
// Also form alters for the search form and install code, etc.
?>
This code would probably be even more compact if in the node module.
Note: While it would be cool to get this functionality into core, I personally see Search Restrict module as the best place for this functionality, but after delays / stonewall from that project, I took up Search Config and refactored it to use this permission based solution.