While working on #90468: Only record unique hits in node counter stats, I found out that 403s and 404s are counted in {node_counter}. This further invalidates the node view counter.
I see no quick and easy solution, as the status is not available in $GLOBALS nor elsewhere. Querying the node title in {accesslog} would be ugly and slow, so we'd need this to be set somewhere. Or: we use (or add) another hook that resembles hook_exit but is not called in these cases.
Unfortunately I don't have the time right now to further investigate this. Maybe someone can throw in her/his knowledge?
Note: This is loosely connected to #330542: Exempt custom 403 and 404 pages from popular content - customerror could probably help.
Comment | File | Size | Author |
---|---|---|---|
#13 | statistics_handle_403-404_13.patch | 4.44 KB | Pancho |
#11 | statistics_handle_403-404_11.patch | 4.4 KB | Pancho |
#9 | statistics_handle_403-404_9.patch | 2.66 KB | Pancho |
#7 | statistics_handle_403-404_7.patch | 2.67 KB | Pancho |
#6 | statistics_handle_403-404.patch | 2.7 KB | Pancho |
Comments
Comment #1
kbahey CreditAttribution: kbahey commentedYou are right. customerror will help in this case. As long as they are nodes, they have all the baggage that nodes have. The purpose of customerror is to make those pages non-nodes so they don't have all the side effects described here and in the other issue.
Comment #2
Dave ReidSteps to reproduce:
1. Go to an invalid node, e.g. node/1234567890
2. Look in the {node_counter} table in your database
3. Notice there has been a record added for an invalid nid '1234567890'
After some investigating, here's a patch that checks if it is a valid node and the user actually had access to view it. This will perform even better when we cache the data from node_load(). I couldn't think of any other solution.
Comment #3
Dave ReidArg... this doesn't work if the page is a cached view and hasn't loaded all modules.
node_load() requires node.module and common.inc (drupal_schema_fields_sql).
node_access() requires node.module and user.module.
Comment #4
PanchoI took another approach that should always work plus it also solves a bunch of other requests like #330542: Exempt custom 403 and 404 pages from popular content.
I'm extracting the '403' or '404' string out of the header. The value is then also written to a new 'status_code' field in the {accesslog} table, so 'popular content' lists or blocks can easily filter on that.
A variation of this would be: to do drupal_get_title() earlier and use it both for the 403/404 handling and for the title field in {accesslog}.
What do you think?
Comment #5
Dave ReidIt's a good idea, although IMHO it should probably be a patch as a part of #330542: Exempt custom 403 and 404 pages from popular content since it's out of the context of this issue.
Plus this looks very dubious:
Comment #6
PanchoI think that this is very right here in this issue. I'm not fixing the UI for the other issues, that remains to be solved over there. I'm just fixing this issue here, and only by the way I'm tackling the underlying problem that prevents the other issues to be easily solved. We can decouple it though, if that makes more sense to the committers.
I agree the obscure string operation in patch #4 is not optimal. So the other suggestion proposed in #4 will probably be the more transparent way. Patch enclosed.
Comment #7
PanchoRemoved a leftover. Please test this one.
Comment #8
kbahey CreditAttribution: kbahey commentedI feel that this is hackish, because it is an exception to the normal case. Exceptions and special cases are always problematic in the long run. Rather, I would see a cut down version of customerror in core. Simply two textareas and a menu path for each. Then they are not nodes and the problem goes away.
Comment #9
PanchoI don't know why you think this was more of an exception than it should be, and thus problematic. In fact, 403s and 404 are exceptions, so we need to treat them as such.
Still, I think for a very different reason that #7 is still not optimal as we're checking the title string to find out the status. If there were any nodes with one of the two "magic titles", they wouldn't be counted either.
However, I don't see any other possibility unless we introduce another global (e.g. $http_status) that is set in drupal_not_found() and drupal_access_denied(). This is what I first thought, and probably it remains the best way.
Simply including a cut down version of customerror would IMHO not help. Surely non-nodes don't have the problem. But we are offering the admin to choose any node as 403 or 404 page, and whatever choice the admin takes, the error page shouldn't be counted. So we'd have to /replace/ this feature by customerror, and if we did that, I can already see the complaints, because customerror is not CCK-capable and whatever else.
So no - unfortunately we can't go this way.
Enclosed is another corrected version of patch #7 (I was too much in a hurry). You may continue reviewing that, but I'm gonna write another alternative solution tonight.
Comment #10
kbahey CreditAttribution: kbahey commentedRecording the HTTP status in statistics module is a good idea. I stopped using this module a long time ago because of scalability issues though.
My objection is mainly that we are continuing to treat the 403/404 as nodes.
We are going around the main problem, using nodes where nodes is not a good solution, than trying to find solutions for this.
Comment #11
PanchoLet's tackle this bug first, please, then we can still start a debate over the general idea! There would be a way to go before the non-node solution is as flexible as the the current solution, and I'm still not bought that this really is what we want.
How do you like the following alternative? IMHO it is the cleanest of the approaches we have seen.
Comment #12
Dave ReidWe don't need the 'description' key when adding a field:
Plus, you're adding an index in statistics_schema, but not providing an upgrade path to create the index. What need to we have to create an index on the status code field?
Comment #13
PanchoOh, now you're getting picky like me... ;)
But you're absolutely right about the 'description' and the missing index upgrade path!
However, I think an index is very appropriate to be able to filter out 403s and 404 in the 'Popular content' block (see #330542: Exempt custom 403 and 404 pages from popular content and mark them yellow or red in the 'Recent hits', 'Top referrers' and 'Top pages' report. The UI part would be a separate issue though.
My remaining question with this is whether you guys think that drupal_initialize_variables() is the correct (or the best) place to initialize the $http_status global or if we should put it somewhere else...
Comment #14
Dries CreditAttribution: Dries commentedThis looks like a reasonable approach. Would be ideal if it came with some tests.
Comment #15
gpk CreditAttribution: gpk commentedNote that Drupal sometimes emits other response codes, e.g. 304...
Comment #16
Dave ReidI like the idea, but I think this may also be better implemented as a function instead of introducing a whole new global variable:
So in drupal_access_denied():
And in statistics_exit():
Agreed with Dries that this will def. need tests.
Comment #17
PanchoYeah, that makes sense, though for code readability I'd prefer to call it drupal_set_http_status($status_code) and add a separate wrapper function drupal_get_http_status().
Comment #18
Dave ReidComment #19
PanchoMoving to 8.x. Might then be backported to 7.x.
Comment #20
PanchoBumping this to major priority. Can't believe it's still not fixed.
Comment #21
Tor Arne Thune CreditAttribution: Tor Arne Thune commentedWhile this is an old bug that would be great to have fixed, it doesn't classify as a major in the sense of having significant repercussions. Have a look at Priority levels of issues for the issue priority guidelines.
But anyway, would it be possible to get this issue going again by trying out Dave Reid's suggestion in #16?
Comment #22
timmillwoodMy patch in #1209532: Count node views via AJAX in the statistics module moves the statistics node_counter too hook_node_view, therefore only real nodes will get counted, rather than 404 or 403 nodes.
Comment #23
iamEAP CreditAttribution: iamEAP commentedPatch referenced in #22 is in 8.x; I've confirmed this no longer affects Drupal 8.x. Note that tests in #1326698: Proper test coverage for Statistics logging explicitly check for this.
As such, moving back to 7.x
Comment #24
vanessakovalsky CreditAttribution: vanessakovalsky as a volunteer commentedThis patch need to be backport to D7
Comment #25
poker10 CreditAttribution: poker10 at ActivIT s.r.o. commentedIf nothing was committed to D8+ (according to the comment #23), then the correct status is Needs Work, because we cannot port non-existing commit to D7 (see: https://www.drupal.org/docs/develop/issues/fields-and-other-parts-of-an-...).
However looking at this issue, the proposed changes are not small (including a change in the current behavior), so unless we will make a big progress in next 2 months, I am not sure this will have a chance to be fixed until the D7 EOL.