Path module should add URL alias to update index in nodapi

robertDouglass - April 14, 2008 - 10:26
Project:Drupal
Version:8.x-dev
Component:path.module
Category:feature request
Priority:normal
Assigned:robertDouglass
Status:needs work
Description

The aliases for nodes are not being indexed yet this would go a long way to increasing relevancy. It would make projects (search d.o. for jstools, for example), bubble higher to top because of their aliases. The solution is to implement an update index hook in path.module's nodeapi.

#1

robertDouglass - April 18, 2008 - 18:21
Component:search.module» path.module
Assigned to:Anonymous» robertDouglass
Status:active» needs review

Here it is. Path alias is fetched in a language specific manner, ascribed a high ranking value, and given to the indexer.

AttachmentSizeStatusTest resultOperations
search-path-alias.patch1015 bytesIdleFailed: Failed to apply patch.View details | Re-test

#2

moshe weitzman - April 19, 2008 - 02:33

very nice idea

#3

puregin - May 9, 2008 - 15:14

We need to address nodes that are specifically added to menus, by adding the menu label text and giving the node a rank boost.

#4

douggreen - May 9, 2008 - 17:11

LIVE FROM THE MINNESOTA SEARCH SPRINT...

I talked to Robert about this and I'm not convinced yet that this is a great idea, especially for core. This patch will only improve the keyword relevancy on nodes that have a single word path aliases. For example, if the url alias is http://example/drupal_handbook, and we index <H2>drupal_handbook</H2>, the search engine will convert "drupal_handbook" to "drupalhandbook", and searching for "drupal", "handbook", or "drupal handbook" will all fail.

While this is a good idea, it's not going to give us much of a benefit until we improve the handling of hyphenations and underscores.

And if we don't do this here, any contrib module is free to implement the 'update index' op and add this code.

#5

lilou - October 11, 2008 - 22:41
Status:needs review» needs work

Patch cannot apply against HEAD since #314244: remove $op from hook_nodeapi commit.

#6

lilou - October 11, 2008 - 22:35

@douggreen : we can use a simple replacement like

$output = str_replace('_', ' ', drupal_get_path_alias($path, $language));

#7

robertDouglass - October 12, 2008 - 07:51

@douggreen:

And if we don't do this here, any contrib module is free to implement the 'update index' op and add this code

My opinion is that the path module should interact with the rest of Drupal core in an ideal way and not have to depend on a whole contrib module just to add a path alias to the search index.

As for the hyphenation handling, it's a bug in the search module, and while it inhibits the usefulness of this patch, I don't think it should be addressed here.

#8

robertDouglass - October 12, 2008 - 08:53
Status:needs work» needs review

Rerolled.

AttachmentSizeStatusTest resultOperations
path_search.patch1.05 KBIdleFailed: 9172 passes, 3 fails, 3 exceptionsView details | Re-test

#9

earnie - October 12, 2008 - 15:57
Status:needs review» needs work

If there is a need for

<?php
$language = empty($node->language) ? '' : $node->language;
?>

shouldn't it be done in drupal_lookup_path or in node_save? I think this is fluff that can be removed.

#10

robertDouglass - October 12, 2008 - 19:44

@earnie: Maybe the $node has the alias as an attribute when it comes in and I didn't notice. Otherwise, the signature for drupal_get_path_alias pretty much demands that if you want a localized path, you send in a language, which is an attribute on the node.

<?php
function drupal_get_path_alias($path, $path_language = '') {
 
$result = $path;
  if (
$alias = drupal_lookup_path('alias', $path, $path_language)) {
   
$result = $alias;
  }
  return
$result;
}
?>

#11

earnie - October 13, 2008 - 12:37

Robert $node->language can be passed to drupal_get_path_alias without needing to check for its existence. The value of a non-existent class property is NULL. There isn't a need to protect against a non-existent property.

#12

robertDouglass - October 13, 2008 - 15:48

Ah, you're totally correct. Now I understand.

#13

robertDouglass - October 13, 2008 - 15:53

I think you might get a warning for trying to access a nonexistent member of an object. That's probably why path's node_load implementation looks like this:

<?php
function path_nodeapi_load(&$node, $arg) {
 
$language = isset($node->language) ? $node->language : '';
 
$path = 'node/' . $node->nid;
 
$alias = drupal_get_path_alias($path, $language);
  if (
$path != $alias) {
   
$node->path = $alias;
  }
}
?>

This also means, however, that I have the language adjusted path already available, so I don't need to repeat the work.

#14

robertDouglass - October 13, 2008 - 15:59

This may be the simplest implementation. Now however, I'm thinking that if a node has many path aliases (this happens sometimes), they should all get added. I think I'm going to roll another one that actually polls the database to get all of the aliases.

AttachmentSizeStatusTest resultOperations
path.patch818 bytesIdlePassed: 10908 passes, 0 fails, 0 exceptionsView details | Re-test

#15

robertDouglass - November 20, 2008 - 14:54
Status:needs work» needs review

But if I don't have time to re-roll, someone else should, and if they don't this patch is still an improvement.

#16

deviantintegral - November 20, 2008 - 17:39

I'm not sure if every alias should be indexed. What if the node's title has changed, and the user isn't doing something smarter like a redirect for the old URL? Since 302 redirects aren't in core, I've seen quite a few non-technical users with sites where multiple path aliases exist just so links don't break.

Also, I think the suggestion up at #6 should be implemented. Pathauto has a set of characters which are user configurable to be considered punctuation. Is there something similar in core we can use?

#17

robertDouglass - November 21, 2008 - 10:39

Is there something similar in core we can use?

Actually, the handling for putting things in the search index should handle punctuation better. There are several common characters that get stripped, (the minus is one of them - ) which should in fact either get replaced with a space, or both (added in two different forms). Therefore my recommendation is to let this patch go in without taking care of punctuation but then handling improved punctuation in core search in another patch.

#18

catch - November 21, 2008 - 11:56

Centralising punctuation replacement/stripping in another patch seems good.

I think this should come with a test though.

#19

System Message - January 21, 2009 - 09:55
Status:needs review» needs work

The last submitted patch failed testing.

#20

Gurpartap Singh - January 21, 2009 - 22:56
Status:needs work» needs review

Simple feature with an interesting scope!

Patch still applies.

#21

Arancaytar - April 25, 2009 - 20:58
Status:needs review» needs work

1.) The hook is called "hook_node_update_index", not "hook_nodeapi_update_index".

2.) If underlines and hyphens aren't stripped, then what does the search module do with the forward slashes that alias paths contain by definition? Are those separated properly? Even more so than the underline or hyphens, this is essential. If we end up with compound words like "contenttitle" from content/title, or "projectdrupal" from project/drupal, then this patch adds nothing of value to the search index, and shouldn't go in without the stripping logic taken care of first (whether in a separate patch or here).

#22

deviantintegral - November 14, 2009 - 17:27
Version:7.x-dev» 8.x-dev
 
 

Drupal is a registered trademark of Dries Buytaert.