Path module should add URL alias to update index in nodapi
robertDouglass - April 14, 2008 - 10:26
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | path.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | robertDouglass |
| Status: | needs work |
Description
The aliases for nodes are not being indexed yet this would go a long way to increasing relevancy. It would make projects (search d.o. for jstools, for example), bubble higher to top because of their aliases. The solution is to implement an update index hook in path.module's nodeapi.

#1
Here it is. Path alias is fetched in a language specific manner, ascribed a high ranking value, and given to the indexer.
#2
very nice idea
#3
We need to address nodes that are specifically added to menus, by adding the menu label text and giving the node a rank boost.
#4
LIVE FROM THE MINNESOTA SEARCH SPRINT...
I talked to Robert about this and I'm not convinced yet that this is a great idea, especially for core. This patch will only improve the keyword relevancy on nodes that have a single word path aliases. For example, if the url alias is http://example/drupal_handbook, and we index
<H2>drupal_handbook</H2>, the search engine will convert "drupal_handbook" to "drupalhandbook", and searching for "drupal", "handbook", or "drupal handbook" will all fail.While this is a good idea, it's not going to give us much of a benefit until we improve the handling of hyphenations and underscores.
And if we don't do this here, any contrib module is free to implement the 'update index' op and add this code.
#5
Patch cannot apply against HEAD since #314244: remove $op from hook_nodeapi commit.
#6
@douggreen : we can use a simple replacement like
$output = str_replace('_', ' ', drupal_get_path_alias($path, $language));#7
@douggreen:
My opinion is that the path module should interact with the rest of Drupal core in an ideal way and not have to depend on a whole contrib module just to add a path alias to the search index.
As for the hyphenation handling, it's a bug in the search module, and while it inhibits the usefulness of this patch, I don't think it should be addressed here.
#8
Rerolled.
#9
If there is a need for
<?php+ $language = empty($node->language) ? '' : $node->language;
?>
shouldn't it be done in drupal_lookup_path or in node_save? I think this is fluff that can be removed.
#10
@earnie: Maybe the $node has the alias as an attribute when it comes in and I didn't notice. Otherwise, the signature for drupal_get_path_alias pretty much demands that if you want a localized path, you send in a language, which is an attribute on the node.
<?phpfunction drupal_get_path_alias($path, $path_language = '') {
$result = $path;
if ($alias = drupal_lookup_path('alias', $path, $path_language)) {
$result = $alias;
}
return $result;
}
?>
#11
Robert $node->language can be passed to drupal_get_path_alias without needing to check for its existence. The value of a non-existent class property is NULL. There isn't a need to protect against a non-existent property.
#12
Ah, you're totally correct. Now I understand.
#13
I think you might get a warning for trying to access a nonexistent member of an object. That's probably why path's node_load implementation looks like this:
<?phpfunction path_nodeapi_load(&$node, $arg) {
$language = isset($node->language) ? $node->language : '';
$path = 'node/' . $node->nid;
$alias = drupal_get_path_alias($path, $language);
if ($path != $alias) {
$node->path = $alias;
}
}
?>
This also means, however, that I have the language adjusted path already available, so I don't need to repeat the work.
#14
This may be the simplest implementation. Now however, I'm thinking that if a node has many path aliases (this happens sometimes), they should all get added. I think I'm going to roll another one that actually polls the database to get all of the aliases.
#15
But if I don't have time to re-roll, someone else should, and if they don't this patch is still an improvement.
#16
I'm not sure if every alias should be indexed. What if the node's title has changed, and the user isn't doing something smarter like a redirect for the old URL? Since 302 redirects aren't in core, I've seen quite a few non-technical users with sites where multiple path aliases exist just so links don't break.
Also, I think the suggestion up at #6 should be implemented. Pathauto has a set of characters which are user configurable to be considered punctuation. Is there something similar in core we can use?
#17
Is there something similar in core we can use?
Actually, the handling for putting things in the search index should handle punctuation better. There are several common characters that get stripped, (the minus is one of them - ) which should in fact either get replaced with a space, or both (added in two different forms). Therefore my recommendation is to let this patch go in without taking care of punctuation but then handling improved punctuation in core search in another patch.
#18
Centralising punctuation replacement/stripping in another patch seems good.
I think this should come with a test though.
#19
The last submitted patch failed testing.
#20
Simple feature with an interesting scope!
Patch still applies.
#21
1.) The hook is called "hook_node_update_index", not "hook_nodeapi_update_index".
2.) If underlines and hyphens aren't stripped, then what does the search module do with the forward slashes that alias paths contain by definition? Are those separated properly? Even more so than the underline or hyphens, this is essential. If we end up with compound words like "contenttitle" from content/title, or "projectdrupal" from project/drupal, then this patch adds nothing of value to the search index, and shouldn't go in without the stripping logic taken care of first (whether in a separate patch or here).