I have only just started this but I wanted to post for opinion. In IRC I was pushed towards using hook_page_alter() instead of defining a new hook. However hook_page_alter() runs after the 404 header has been sent. I don't expect the code as is to get in. Largely I am posting this as a start because I work on large sites (Economist.com) and I don't want to have to use nfs to share the files directory as it adds another single point of failure in the stack.

Background for wanting hook_404()

The approach that Vignette and other large cm systems take is as follows.

Server X and Y are physical or logical Drupal servers that share the same codebase and database.

1) User A uploads an image ( sites/default/dog.png ) to server X to be included onto node/1234.
2) User B loads node/1234 on server Y. When the request comes into load sites/default/dog.png it fires a Drupal 404 page. If hook_404() exists a module of some sort could look up whether the file exists in the files table and go collect the image from the correct server, write the image locally and either serve the image or redirect to it. I am intentionally vague on how the module would collect the image from Server Y or Z or Q as this is an example use case.
2) User C loads node/1234 on server Y. When the request comes into load sites/default/dog.png the file exists and is served by the webserver. Apache never gets involved.

This approach doesn't need each upload to jump through hoops to put each file uploaded on each server which is fast for the person uploading. It doesn't require a cron job to push the relevant files around which would have time lag issues anyway.

ImageCache registers the callback for imagecache/% if I remember correctly but I think something using a hook_404() could capture any missing URL. In fact imagecache in D7 could well use this approach. I imagine I could have a module with the callback for sites/% to work in the same way as ImageCache but it seems less glamorous.

Opinions?

Files: 
CommentFileSizeAuthor
#8 444756_hook_page_not_found_8.patch5.01 KBdeviantintegral
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 444756_hook_page_not_found_8.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]
#6 444756_hook_page_not_found.patch2.51 KBdeviantintegral
Failed: Failed to apply patch.
[ View ]
#5 hook404.patch1.16 KBstewsnooze
Failed: Failed to apply patch.
[ View ]
hook404.patch1.14 KBstewsnooze
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch hook404.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Comments

Wow - this is great timing.

I need the same feature but for another reason. In this case, we have two sites, example.com and subsite.example.com. If users type example.com/some-path, and it 404's, we want to check to see if it exists on subsite.example.com, and redirect there. Since Drupal sets the header before calling the path in the site404 variable, it's impossible to redirect as a header has all ready been set.

Two thoughts at the moment:

  1. I think the hook should be called hook_page_not_found(), instead of hook_404()
  2. Do we want modules to be able to override the return value of menu_execute_active_handler()? In my specific case, it wouldn't matter since the module would end in a drupal_goto() call, but I could imagine other cases where a module may not want MENU_NOT_FOUND to be returned

Status:Active» Needs work

I agree with the remarks from #1.

Secondly, as far as I know none of the existing hook_*() functions are called directly. I believe you need to perform the actions in your hook_404() function directly and you can use module_invoke_all.

So it becomes something like:

<?php
function hook_page_not_found($path = NULL) {
 
// We can leave this empty. hook_access() has a basic implementation but it's not
  // called directly so I don't see the use.
}
// Invoke, possibly
$current_path = isset($path) ? $_GET['q'] : NULL;
module_invoke_all('page_not_found', $current_path);
?>

Title:Add a 404 hook to DrupalAdd a page_not_found() hook to Drupal
Status:Needs work» Needs review

I've changed the code to support the two comments. Additionally in the patch there is a space removed from the end of the line in theme_menu_tree() as my eclipse is set up to remove end of line spaces. I could pull that out if needed but I don't see the point yet.

"- * - leaf: The menu item has no submenu. "
"+ * - leaf: The menu item has no submenu."

Also I've created a very small sample module that would use the hook and can be edited to return different page states, like access denied or page not found e.t.c.

You can toy with that by downloading it from

http://www.stewsnooze.com/content/support-adding-hookpagenotfound-drupal-7

Can you attach the patch please?

StatusFileSize
new1.16 KB
Failed: Failed to apply patch.
[ View ]

ha ha. I am such a fool.

Patch attached.

StatusFileSize
new2.51 KB
Failed: Failed to apply patch.
[ View ]

Looks pretty good to me.

Here's an updated patch which contains hook API documentation. The example is just how I'm currently using this hook, but I'm open to making it a simpler example if someone has any ideas.

Status:Needs review» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review
StatusFileSize
new5.01 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 444756_hook_page_not_found_8.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Rerolled patch - only difference was the comment cleanup that got fixed in another commit.

Status:Needs review» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review

Status:Needs review» Closed (won't fix)

I'm able to write my module for D7 using the built-in API's now that we have drupal_static(). The only catch is that you have to remember to unset($_REQUEST['destination']) before calling drupal_goto(), otherwise you get redirected to the previous destination. The following seems to work fine when set as the callback for the site404 variable:

function offsite_redirect_execute() {
  $remote_url = variable_get('offsite_redirect_site', FALSE);
  if (!empty($remote_url) && isset($_REQUEST['destination'])) {
    $path = urlencode($_REQUEST['destination']);
    $result = drupal_http_request($remote_url . $path);
    if (floor($result->code / 100) * 100 == 200 || floor($result->code / 100) * 100 == 300) {
      // Unset the destination pararmeter as drupal_goto will redirect to it if it's present.
      unset($_REQUEST['destination']);
      drupal_goto($remote_url . $path);
    }
  }
  return MENU_NOT_FOUND;
}

I'm marking as won't fix as I believe all of the other cases mentioned in this issue can now be handled.

Status:Closed (won't fix)» Needs review

hook404.patch queued for re-testing.

#8: 444756_hook_page_not_found_8.patch queued for re-testing.

Status:Needs review» Needs work

The last submitted patch, 444756_hook_page_not_found_8.patch, failed testing.

Status:Needs work» Closed (won't fix)

As per #11 this is not needed and this issue has been closed for 3 years.

The general scheme in comment #11 works except that I had to use $_GET['destination'] instead of $_REQUEST['destination'].

EDIT: Also, doing the redirect as part of the default 404 page results in Drupal logging the original access as a page-not-found error even when we never return a 404 to the browser/client. Seems like a real hook would be better, as in the originally proposed patch.