While working on #90468: Only record unique hits in node counter stats, I found out that 403s and 404s are counted in {node_counter}. This further invalidates the node view counter.
I see no quick and easy solution, as the status is not available in $GLOBALS nor elsewhere. Querying the node title in {accesslog} would be ugly and slow, so we'd need this to be set somewhere. Or: we use (or add) another hook that resembles hook_exit but is not called in these cases.
Unfortunately I don't have the time right now to further investigate this. Maybe someone can throw in her/his knowledge?

Note: This is loosely connected to #330542: Exempt custom 403 and 404 pages from popular content - customerror could probably help.

Files: 
CommentFileSizeAuthor
#13 statistics_handle_403-404_13.patch4.44 KBPancho
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]
#11 statistics_handle_403-404_11.patch4.4 KBPancho
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]
#9 statistics_handle_403-404_9.patch2.66 KBPancho
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]
#7 statistics_handle_403-404_7.patch2.67 KBPancho
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]
#6 statistics_handle_403-404.patch2.7 KBPancho
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]
#4 statistics_handle_403-404.patch2.49 KBPancho
Failed: 7658 passes, 1 fail, 0 exceptions
[ View ]
#2 345133-dont-count-them-nodes-D7.patch1.04 KBDave Reid
Passed: 7641 passes, 0 fails, 0 exceptions
[ View ]

Comments

You are right. customerror will help in this case. As long as they are nodes, they have all the baggage that nodes have. The purpose of customerror is to make those pages non-nodes so they don't have all the side effects described here and in the other issue.

Assigned:Unassigned» Dave Reid
Status:Active» Needs review
StatusFileSize
new1.04 KB
Passed: 7641 passes, 0 fails, 0 exceptions
[ View ]

Steps to reproduce:
1. Go to an invalid node, e.g. node/1234567890
2. Look in the {node_counter} table in your database
3. Notice there has been a record added for an invalid nid '1234567890'

After some investigating, here's a patch that checks if it is a valid node and the user actually had access to view it. This will perform even better when we cache the data from node_load(). I couldn't think of any other solution.

Status:Needs review» Needs work

Arg... this doesn't work if the page is a cached view and hasn't loaded all modules.

node_load() requires node.module and common.inc (drupal_schema_fields_sql).
node_access() requires node.module and user.module.

Assigned:Dave Reid» Unassigned
Status:Needs work» Needs review
StatusFileSize
new2.49 KB
Failed: 7658 passes, 1 fail, 0 exceptions
[ View ]

I took another approach that should always work plus it also solves a bunch of other requests like #330542: Exempt custom 403 and 404 pages from popular content.
I'm extracting the '403' or '404' string out of the header. The value is then also written to a new 'status_code' field in the {accesslog} table, so 'popular content' lists or blocks can easily filter on that.
A variation of this would be: to do drupal_get_title() earlier and use it both for the 403/404 handling and for the title field in {accesslog}.
What do you think?

It's a good idea, although IMHO it should probably be a patch as a part of #330542: Exempt custom 403 and 404 pages from popular content since it's out of the context of this issue.

Plus this looks very dubious:

<?php
$status_code
= substr(drupal_get_headers(), strlen('Content-Type: text/html; charset=utf-8' . $_SERVER['SERVER_PROTOCOL']) +2, 3);
?>

StatusFileSize
new2.7 KB
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]

I think that this is very right here in this issue. I'm not fixing the UI for the other issues, that remains to be solved over there. I'm just fixing this issue here, and only by the way I'm tackling the underlying problem that prevents the other issues to be easily solved. We can decouple it though, if that makes more sense to the committers.

I agree the obscure string operation in patch #4 is not optimal. So the other suggestion proposed in #4 will probably be the more transparent way. Patch enclosed.

StatusFileSize
new2.67 KB
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]

Removed a leftover. Please test this one.

I feel that this is hackish, because it is an exception to the normal case. Exceptions and special cases are always problematic in the long run. Rather, I would see a cut down version of customerror in core. Simply two textareas and a menu path for each. Then they are not nodes and the problem goes away.

StatusFileSize
new2.66 KB
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]

I don't know why you think this was more of an exception than it should be, and thus problematic. In fact, 403s and 404 are exceptions, so we need to treat them as such.

Still, I think for a very different reason that #7 is still not optimal as we're checking the title string to find out the status. If there were any nodes with one of the two "magic titles", they wouldn't be counted either.

However, I don't see any other possibility unless we introduce another global (e.g. $http_status) that is set in drupal_not_found() and drupal_access_denied(). This is what I first thought, and probably it remains the best way.

Simply including a cut down version of customerror would IMHO not help. Surely non-nodes don't have the problem. But we are offering the admin to choose any node as 403 or 404 page, and whatever choice the admin takes, the error page shouldn't be counted. So we'd have to /replace/ this feature by customerror, and if we did that, I can already see the complaints, because customerror is not CCK-capable and whatever else.
So no - unfortunately we can't go this way.

Enclosed is another corrected version of patch #7 (I was too much in a hurry). You may continue reviewing that, but I'm gonna write another alternative solution tonight.

Recording the HTTP status in statistics module is a good idea. I stopped using this module a long time ago because of scalability issues though.

My objection is mainly that we are continuing to treat the 403/404 as nodes.

We are going around the main problem, using nodes where nodes is not a good solution, than trying to find solutions for this.

StatusFileSize
new4.4 KB
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]

Let's tackle this bug first, please, then we can still start a debate over the general idea! There would be a way to go before the non-node solution is as flexible as the the current solution, and I'm still not bought that this really is what we want.

How do you like the following alternative? IMHO it is the cleanest of the approaches we have seen.

Status:Needs review» Needs work

We don't need the 'description' key when adding a field:

+  $ret = array();
+  $field = array(
+    'type' => 'int',
+    'unsigned' => TRUE,
+    'not null' => FALSE,
+    'default' => 0,
+    'description' => 'HTTP/1.1 Status Code',
+  );
+  db_add_field($ret, 'accesslog', 'http_status', $field);
+  return $ret;
+}

Plus, you're adding an index in statistics_schema, but not providing an upgrade path to create the index. What need to we have to create an index on the status code field?

Status:Needs work» Needs review
StatusFileSize
new4.44 KB
Passed: 7659 passes, 0 fails, 0 exceptions
[ View ]

Oh, now you're getting picky like me... ;)
But you're absolutely right about the 'description' and the missing index upgrade path!
However, I think an index is very appropriate to be able to filter out 403s and 404 in the 'Popular content' block (see #330542: Exempt custom 403 and 404 pages from popular content and mark them yellow or red in the 'Recent hits', 'Top referrers' and 'Top pages' report. The UI part would be a separate issue though.

My remaining question with this is whether you guys think that drupal_initialize_variables() is the correct (or the best) place to initialize the $http_status global or if we should put it somewhere else...

This looks like a reasonable approach. Would be ideal if it came with some tests.

Note that Drupal sometimes emits other response codes, e.g. 304...

I like the idea, but I think this may also be better implemented as a function instead of introducing a whole new global variable:

<?php
function drupal_http_status($set_status = NULL) {
  static
$status = 200;
  if (isset(
$set_status) && is_int($set_status)) {
   
$status = $set_status;
   
// Maybe even send the status header here?
    // header($set_status, ...) we'd need an extra $reason = '' parameter to do this
 
}
  return
$status;
}
?>

So in drupal_access_denied():

<?php
drupal_http_status
(403);
?>

And in statistics_exit():

<?php
$status
= drupal_http_status();
?>

Agreed with Dries that this will def. need tests.

Status:Needs review» Needs work

Yeah, that makes sense, though for code readability I'd prefer to call it drupal_set_http_status($status_code) and add a separate wrapper function drupal_get_http_status().

Version:7.x-dev» 8.x-dev

Moving to 8.x. Might then be backported to 7.x.

Priority:Normal» Major

Bumping this to major priority. Can't believe it's still not fixed.

Priority:Major» Normal

While this is an old bug that would be great to have fixed, it doesn't classify as a major in the sense of having significant repercussions. Have a look at Priority levels of issues for the issue priority guidelines.

But anyway, would it be possible to get this issue going again by trying out Dave Reid's suggestion in #16?

My patch in #1209532: Count node views via AJAX in the statistics module moves the statistics node_counter too hook_node_view, therefore only real nodes will get counted, rather than 404 or 403 nodes.

Version:8.x-dev» 7.x-dev

Patch referenced in #22 is in 8.x; I've confirmed this no longer affects Drupal 8.x. Note that tests in #1326698: Proper test coverage for Statistics logging explicitly check for this.

As such, moving back to 7.x