Fix search index link handling for non-existent nodes
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | search.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | patch (code needs work) |
If the text of a node contains a link to another node and that node doesn't exist yet (e.g., http://www.example.com/node/9999999999 or something like that), a PHP error is generated when the search index is updated:
notice: Trying to get property of non-object in /web/modules/search/search.module on line 503.
notice: Trying to get property of non-object in /web/modules/search/search.module on line 505.The offending code is in search_index() and is the following:
if ($linknid > 0) {
// Note: ignore links to uncachable nodes to avoid redirect bugs.
$node = db_fetch_object(db_query('SELECT n.title, n.nid, n.vid, r.format FROM {node} n INNER JOIN {node_revisions} r ON n.vid = r.vid WHERE n.nid = %d', $linknid));
if (filter_format_allowcache($node->format)) {
$link = TRUE;
$linktitle = $node->title;
}
}Basically it tries to load the node and do something with it regardless of whether or not the node exists.
I would write a patch, but I'm not sure what the correct behavior should be. Should the search index just ignore links to nodes that haven't been created yet? (I'm not sure that's ideal; think of Wikipedia, for example, where links to articles that don't exist yet are all over the place). But if it doesn't ignore them, what should be done about the $linktitle and the uncachable nodes issue?

#1
I would think that if the node doesn't exist yet, then it's not using a format that can't be cached, so this check to prevent redirect bugs, could be skipped when the node doesn't exist (i.e., we could do
!isset($node) || filter_format_allowcache($node->format).Other possible problems:
elsestatement. We probably need logic probably needs to catch this case.SELECT ... ORDER BY r.timestamp DESCso that we get the most recent title?SELECT nid, vid).it would be nice to get Steven's input on this...
#2
#3
Subscribe
#4
Here is a stab at a patch.
If the node doesn't exist yet, I set the $linktitle to be equal to the node id (I imagine it probably doesn't matter much). And of course you're right that there's no need to worry about a nonexistent node being uncacheable -- I have no idea what I was thinking when I wrote that the first time ;)
The attached patch also fixes #3 in @douggreen's comment above. I think #1 and #2 aren't necessary (but please review):
(1) $linktitle is only used in the
elsestatement when $link is TRUE, so the current behavior (only setting $linktitle when $link is set to TRUE) seems okay.(2) The
SELECTstatement acts on the {node} table, which only stores the most recent version of the node and the most recent vid, so it should be okay as is, right?#5
Also note that there is a minor problem which this patch doesn't address (and which is not limited to the case of a nonexistent node):
If you create node/1 containing
http://www.example.com/node/2in the body, then a link to node 2 will be put in the search index (under 'title' if node/2 exists and has 'title' as its title, or under '2' if node 2 doesn't exist yet, as per the above patch).If you then change node 2's title to 'newtitle' (or create it with this title), the search index does not get correctly updated... if you search for 'title' you will still get references to node/2 (whereas searching for 'newtitle' will not correctly count the link from node 1). Things only get fixed if and when node/1 gets edited, so that the search reindexing for node/1 will get triggered.
This probably isn't a major problem, but if anyone can think of a simple way to deal with it, it could be added to this patch...
#6
Rerolled and tested. Works as advertised. One more review, please, but otherwise, should go in.
#7
I was only suggesting that we fix the php error. If the node does not exist, then I think we should just ignore the link. The entire point of the node linking is to score the title words from the linked to node, with the linking node, thus improving the search relevancy of the linked to node. But if the linked to node doesn't exist, then there's nothing to do...
#8
@douggreen, the use case for this is if you are adding pages sequentially. For example, you might create node 1:
Blah blah blah... See the next page for some information about <a href="/node/2">dinosaurs</a>.and then go ahead and create node 2 later. (I think that's how I discovered this bug in the first place, although I don't really remember.) This link should be counted, and searching for "dinosaurs" should pull up node 2... but with your method, it won't happen.
Where this would be even more interesting is if it worked with path aliases. Then you could have:
Some day this site will have a page about <a href="/dinosaurs">dinosaurs</a>, but it doesn't yet.As mentioned previously, the use case here is something like Wikipedia, which has those kinds of links all over the place. It would be nice if they could somehow be counted, although I don't see any obvious way to do this at the moment. However, if we put the infrastructure for dealing with nonexistent nodes in place, then maybe it will be possible eventually...
#9
@David, I don't see ever linking to a node that doesn't exist yet being anything but an error. How do you know what the node number will be of a future, yet to be created node?