One of my biggest long-time annoyances with Drupal is the way that the hit counter for a node behaves. The hit counter records every single page view for a node. If you reload a node/x page 10 times, then 10 hits will be recorded for that page. This is ridiculous: the hit counter system in statistics.module needs to be just a little bit more intelligent than this. It needs to identify hits that are not unique (i.e. hits from the same user/host for the same page), and to stop such hits from bumping up the counter.

Attached patch implements this checking system. With this patch, only the first hit for page 'x' from user 'y' (which can of course be an anonymous user, i.e. user 0) on host IP 'z' will increment the counter. All subsequent hits will be recognised as duplicates, and will not result in the counter being incremented. This results in 1 extra query per node page view, but the performance impact is minimal (considering that more performance-intensive INSERT queries are being run anyway, whereas this is just an extra little SELECT query). This checking only occurs if the node access log feature is enabled, because otherwise there is no way to check for non-unique page views.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Jaza’s picture

Version: x.y.z » 5.x-dev
Category: task » bug

Actually, I think this qualifies as a bug. The patch above still applies and still works.

chx’s picture

Status: Needs review » Needs work

IMO it's the session id you need to check against. One IP != one anonymous user, could be several from the same IP.

Jaza’s picture

Status: Needs work » Needs review
FileSize
1.84 KB

Good point, chx. New patch that uses session ID instead of IP address. The check for user ID is still in there: it's not really needed, but you never know (there could be two different logged-in users with session IDs that are hash collisions). And besides, I doubt that it has any effect on the performance of the query.

Apart from that, same patch. Tested, and it has the same visible effect as the original patch.

drumm’s picture

How does this affect page load time?

drumm’s picture

Status: Needs review » Needs work
Lowell’s picture

Version: 5.x-dev » 5.1

I just tried this latest patch and don't seem to have any results. I did run the update.php script and it did modify the accesslog table.

Is anybody working on this issue?

It would be good if the tracker/counter did not log duplicate views, edits, and tracker views either.

Dave Reid’s picture

Status: Needs work » Needs review
FileSize
2.56 KB

Here's the patch I've tested and used on my own Drupal installation. I've added an option in the admin/logs/settings page to only count unique views. Also, if the users is currently logged in, the code will check the history table which logs when registered users last viewed nodes. Otherwise for anonymous users, it still checks the accesslog table with uid = 0 and the session id.

On the discussion of performance of this patch, I firmly believe that this has a minimal effect on performance. At minimum, pre-patch performs one UPDATE query, post-patch performs one SELECT query.

drumm’s picture

Version: 5.1 » 7.x-dev
Category: bug » feature

I would consider such setting additions to be features. This should be developed for the current development version of Drupal.

catch’s picture

Version: 7.x-dev » 6.x-dev
Category: feature » task
Status: Needs review » Needs work

...without the settings, it would reduce database load by eliminating a lot of writes to 'node_counter' (or possibly 'node' if it's merged back in - ameliorating some of the performance trade-off of that optimisation as well). So could maybe get in as a bug fix/performance enhancement?

Pancho’s picture

FileSize
2.15 KB

Unbelievable... patch #3 still applies with some offset - after one year and a quarter!
I just broke the code down a little bit to make it more digestible, and added proper documentation.

This bug renders the node statistics rather useless. Also it improves performance per node view. Therefore I keep it in the D6 queue.
Please test this patch thoroughly.

Pancho’s picture

Status: Needs work » Needs review
sdecabooter’s picture

Hi,

I tested this patch on Drupal 6 RC3, and the dev tarball.

I keep getting the following error:

Fatal error: Call to undefined function db_num_rows() in <...>\modules\statistics\statistics.module on line 59

The db_num_rows function seems to be removed from Drupal 6.

Also, if you set the 'Enable access log' setting to disabled, but 'Count content views' to enabled, it will still count every pageload.
Is this supposed to happen?

Pancho’s picture

Assigned: Jaza » Unassigned
Status: Needs review » Needs work

Thank you for testing this! I need to check this and will post an corrected patch asap.

However, it is in fact by design, that with disables access log every pageload is counted. Finally we depend on the access log to discriminate unique page loads by checking the Session ID. The only question is, whether we should disable 'Count content views' then as well to avoid confusion and as the results are rather useless. Anyway this needs to be documented better.

Pancho’s picture

Status: Needs work » Needs review
FileSize
2.16 KB

Okay, I removed db_num_rows() and use SELECT COUNT instead.
Also I removed checking uid in the access log, as the session id is unique enough. The only difference this makes except for performance is in the following case: an authenticated user views a page (which is counted as a page view) and logs out. If he now views the page again this is not counted again, as he keeps his session id. This is IMHO even preferrable.

This time I tested it thouroughly and it works now as advertised.

Surely we can further improve the mechanism determining whether a given page view is unique. We might want to discuss if we want to count only one page view, in case the user views a page anonymously and then again logged in (cross session ids). We might also elaborate on whether for all authenticated users, a page view should be counted only once - ever. Or within a given timespan. Finally we could somehow use the IP address. We might even want to add configuration options for that.

But these are details that would take up too much time right now. This is a stop-gap measure, rendering the page counter usable at all. Also we improve performance on every page view. Mission accomplished, IMHO.

Please test some more and then RTBC.

moshe weitzman’s picture

Version: 6.x-dev » 7.x-dev

Sorry, it is too late for D6 to change the meaning of these counters and change the mix of queries on every page view. D7 is open for business.

Dave Reid’s picture

For those following this issue, I created a module to fix this until something can get committed. Check it out at http://drupal.org/project/statistics_advanced

SeanBannister’s picture

You spelt Statistics wrong in the url, it should be http://drupal.org/project/statistics_advanced
I'd be interested to see this in D7

swentel’s picture

Patch in #14 still applies and works as intented. Should we add more features or setting it rtbc'd ?

Jamie Holly’s picture

This query can cause problems as the {accesslog} grows:

$count = db_result(db_query("SELECT COUNT(*) FROM {accesslog} WHERE path = '%s' AND sid = '%s'", $_GET['q'], session_id()));

To make it more effective, the accesslog should have an index added on path, sid (in that order). As it stands right now, neither field is indexed so every query requires a full table scan.

bsherwood’s picture

Is this to late for D7?

This seems like a good default behavior to deal with page views. It also helps prevent users from just clicking to view the node a hundred times to bump up the node views.

SeanBannister’s picture

No it's not to late for Drupal 7 however we do need to deal with the issue in #19

Dave Reid’s picture

Patch rolled against HEAD with DB API changes.
- Checks against history table if user is logged in (possibly faster than searching accesslog table)
- Adds accesslog_path_sid index in accesslog table for optimization
- Uses db_query_range and 'SELECT 1' to only return first result
- Also fixes some weird trailing space issues...

Patch failed to apply. More information can be found at http://testing.drupal.org/node/14001. If you need help with creating patches please look at http://drupal.org/patch/create

Dave Reid’s picture

Hold on...retrying patch...

Dave Reid’s picture

FileSize
3.51 KB

Alright, patch ready for testing.

Dave Reid’s picture

Patch re-rolled to HEAD (time() function was removed).

Dave Reid’s picture

Assigned: Unassigned » Dave Reid
Dave Reid’s picture

Status: Needs review » Needs work

I will need to make a test for this before it is ready.

Pasqualle’s picture

EDIT: wrong review..

I did not tested, but reading the patch, I think we are still incrementing the node counter when the tracker module is disabled and the node is viewed by an authenticated user..

cases of node counter:

disabled tracker, disabled statistics_enable_access_log
authenticated: increment always
anonymous: increment always

enabled tracker, disabled statistics_enable_access_log
authenticated: increment only if unique
anonymous: increment always

disabled tracker, enabled statistics_enable_access_log
authenticated: increment always
anonymous: increment only if unique

enabled tracker, enabled statistics_enable_access_log
authenticated: increment only if unique
anonymous: increment only if unique

conclusion: this works perfectly only when the tracker module and the statistics_enable_access_log variable are enabled..

I think it is ok to increment when we can't decide if it is unique or not, but it is not ok that the correct functionality depend on two settings (tracker, statistics_enable_access_log)..

Pasqualle’s picture

sorry, absolutely wrong review..
the {history} table does not depend on the tracker module..

Dave Reid’s picture

Actually, if the count content views is disabled (!variable_get('statistics_count_content_views', 0)) then it will never hit any logic to increment. It works best if the access log is enabled since we can test for unique anonymous hits, so maybe we should add a message under the count views option to let the user know that this feature will work best if the access log is enabled.

bflora’s picture

Hi there, I'm a bit of a drupal project newbie still and have yet to understand exactly how a lot of this works.

This appears to be a thread discussing how to improve the statistics module so that it's more efficient and doesn't record every single pageview sent its way (which I agree is lame and facile).

Somewhere in this thread, is there a patch for the D5 statistics module? Or is this all D7? I'm running a D5 site. If I want to make this change to my stats module there, does this thread have a way to do that in it? If not, could someone link me to a thread that does?

Pasqualle’s picture

@bflora: the patch in #7 is for D5, but you should NOT hack core if you want to avoid problems.. I think this patch won't be backported to D5, as this is a new feature..

Dave Reid’s picture

bflora, I should have a 5.x version of the Statistics Advanced module available soon which has this feature.

bflora’s picture

Very cool. I look forward to it.

wflorian’s picture

I really need a D5 patch. Which one should I take? I tried the first three patches, but none of them worked with my D5.10 version, the counter still counts every hit.

I would really appreciate your help.

I did try the patch in #7. Unfortunately even this does not work with my version. If logged in it does not count anything, even if the node has not been viewed before. Not logged in, every hit is counted.

Anybody any idea?

Pancho’s picture

Status: Needs work » Needs review
FileSize
3.08 KB

Last patch fails. Here's a rerolled version with clearified comments.
Please review.

I wonder if we shouldn't add an admin option to select after which time page views by the same user should be counted again. Nodes can be updated, and if the user reads the node again after some time, that page view might be worth being counted.
Possible set of options: "immediately", "after 5 minutes", "after 15 minutes", "after 1 hour", "after 1 day", "after 7 days", "after 1 month", "never".

Dave Reid’s picture

If that's the case, we can compare the node's updated timestamp to when the user last accessed the node, no admin option needed.

Pancho’s picture

That's yet another way of thinking about this. However we can't easily determine how much a node was changed if at all. So we can't generally assume whether this is a reason a page view should be counted again or not. This depends heavily on the content.
I think if someone can decide after which time page views should be counted again, then only the webmaster.

Pancho’s picture

Status: Needs review » Needs work

Some testing revealed that the code doesn't work at all: Authenticated users are not even counted once.

We had that working, but then in #25 we started checking against the history table, which is a mistake.
The problem is that the history record gets updated way earlier, so we can't know if the page has been viewed by this user before - we only find that the user has viewed the page right now (surprise, surprise). So a user viewing the page for the very first time, at this point already has a corresponding {history} record. Because we misinterpret this already as a repeated page view, we don't ever count a user's page view.
Only the {accesslog} table is reliable, as we are updating it later within the same hook.

Now, what can we do?
IMHO we should return to querying only the {access_log} table. We can further improve the indexing to make sure it's as fast as possible.
Even if we can fix the implementation of the query to the {history} table, there are some more reasons for this move:

Case 1: Access log is enabled
There is no point in adding up absolutely-unique page views (by authenticated users) with per-session-unique page views (by anonymous visitors). This handicaps authentificated users compared to anonymous visitors. The result is like adding apples and cherries.
For a default core behaviour it is also too complicated to explain where the data comes from in this case or in that case, and what exactly this does imply. Finally this is so difficult, because it doesn't really imply anything, but is only some kind of haltingly approximation to the popularity of a page.
Case 2: Access log is disabled
It also doesn't make much sense to count only authenticated users' page views. Surely there might be some use cases for this. But in most other cases the - in fact low - significance of such figures will be misinterpreted and overrated. So this is nothing we should ship as default core behavior. If we want, we can make this an option that the admin can decidedly choose.
General concept of Unique page-views
Finally, as already indicated in #37, I seriously question the concept of counting authenticated users' page views only once in a lifetime.
Google Analytics defines:
A unique pageview [...] aggregates pageviews that are generated by the same user during the same session.
[It] represents the number of sessions during which that page was viewed one or more times.
(see here.)
To comply with this definition, which makes a lot of sense and is the expected behaviour, our counts need to be session-based for both authenticated users and anonymous visitors. This means we need to rely on the {accesslog} data only.

From my point of view, this is a reasonable standard behavior that we can ship with D7. To allow for every other way of counting page-views, we should make sure this can be easily be overridden by a simple hook implementation in contrib.

If you guys follow my considerations, I'm going to rework this into a new patch ASAP. I also started work on some simpletests, which obviously need to be adapted to our new strategy, but otherwise are quite done.

Pancho’s picture

Another remark: We need to make sure the pageview is not counted if a http status code "403 Forbidden" or "404 Not Found" is returned. Probably we need to add a 'status_code' column to the {accesslog} table. Though we could also check the Page title in the record, this would be no good idea performance-wise.

Dave Reid’s picture

Page views are only counted in the {node_counter} table... They are not affected by 403s and 404s.

Pancho’s picture

Page views [...] are not affected by 403s and 404s.

No. This statement is wrong in two respects:

  1. Both 403s and 404s do get counted in {node_counter}!
  2. {accesslog} records are added for both 403s and 404s, so they are also checked against in statistics_exit(). So after we have fixed the first bug, the next two edge-cases will arise:
    a) Some user wants to see node/3, but he gets a "403 forbidden". Now he gets permission. But even if he is successful on retry, then no pageview will be counted at all.
    b) node/135 doesn't exist now and the user gets a "404 Not found". Now that it has been created, but the next (successful) try won't be counted as well.
    Why? In both cases an {accesslog} entry was written already at the failed pageview, so the next (successful) pageview is considered a duplicate.

The second problem is admittedly not really big (I said: edge-cases), it's just not nice.
The first one though is really uncool and needs to be fixed asap (see #345133: 403s and 404s are counted in {node_counter}). IMHO this also needs to be backported to D6.

If you still can't believe it: go and check it, please.

Dave Reid’s picture

Pancho. Wow. I did not believe that 403s and 404s were counted in {node_counter}. You were right. :) I posted a patch in that issue that I'm going to implement in my statistics advanced module, which continues to essentially be a D6 backport of this issue.

Dave Reid’s picture

yonailo’s picture

Any one has succesfully patched a D5 installation ? I am looking forward to a D5 fix and the Statistics Advance module is only for D6.

Dave Reid’s picture

Version: 7.x-dev » 8.x-dev
Category: task » feature

Bumping to D8.

wflorian’s picture

No I am sorry to tell you, that I have never been able in patching a D5 version of the module...I am also still looking for this in a D5 version!

CFW’s picture

Does unique_page_views_37.patch work on Drupal 7.2?

wedge’s picture

subscribing

Alex Andrascu’s picture

I'm amazed that this ancient issue is still going on. How hard is it to have a working core Statistics module ? There are patches dated 2008 in this thread. Wow.

Jessica A’s picture

subscribe

haggster’s picture

If you are using Google Analytics, I think this module can do the job: http://drupal.org/project/google_analytics_counter. It has an option to override drupals core node counter.

SeanBannister’s picture

@haggster Just keep in mind that Google Analytics is not a replacement if you need 100% accurate stats. Users are able to opt-out of Google Analytics in which case the hit won't be logged. I'm also concerned about the future of Google Analytics if they start respecting the Do Not Track header this could happen if there was pressure from the FTC, making the stats even less accurate.

Dave Reid’s picture

Assigned: Dave Reid » Unassigned
timmillwood’s picture

I don't think there is a good way todo this without bringing more performance issues, especially as I would like remove accesslog and only keep statistics.module as a node_counter.

Tsubo’s picture

Category: feature » support

Any movement on this? Statistics Filter (http://drupal.org/project/statistics_filter) helps with filtering out hits based on role, but are we still short of a D7 solution/fix for the core stats counter?

togbonna’s picture

Issue summary: View changes

Out of curiosity:

1) Was/Is the Statistics Advance module a suitable solution to this age-old issue for D6?

2) If the answer to 1) above is yes, why is the module still not ported to D7?

3) Has the issue become irrelevant/solved by the D7 core, categorically?

4) If no to 3), are there other solution options for D7?

5) Or (I don't believe this) the issue CANNOT be fixed.

Answers, anyone?

Elin Yordanov’s picture

Version: 8.x-dev » 9.x-dev
Category: Support request » Bug report
Priority: Normal » Critical

Bumping to D9. Since I don't believe that this issue will ever be solved.

BTW, statistics module is nothing but a scrap without this functionality.

catch’s picture

Version: 9.x-dev » 8.1.x-dev
Category: Bug report » Feature request
Priority: Critical » Normal

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Wim Leers’s picture

Status: Needs work » Needs review
Issue tags: +JavaScript
FileSize
2.36 KB

I don't think there is a good way todo this without bringing more performance issues, especially as I would like remove accesslog and only keep statistics.module as a node_counter.

I'm inclined to agree. accesslog is gone, and for good reason: no chance of privacy violations.

However … there actually is a way to do this without incurring any performance cost. By using the history module's client-side cache of when a node was last viewed! We can't let the statistics module require the history module, that'd be a BC break. But we can leverage it when it's available to make things better.

timmillwood’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

It'd be good to have a test which depends on history module to check this actually works.

  1. +++ b/core/modules/statistics/statistics.es6.js
    @@ -4,12 +4,22 @@
    +  console.log(lastReadTime);
    
    +++ b/core/modules/statistics/statistics.js
    @@ -6,12 +6,22 @@
    +  console.log(lastReadTime);
    

    Oops

  2. +++ b/core/modules/statistics/statistics.js
    @@ -6,12 +6,22 @@
    \ No newline at end of file
    

    New line

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

debasisnaskar’s picture

I have solved this issue. I have got the unique visitor count per node using "nodeviewcount.module". Just install this "nodeviewcount.module" module and then in my page.tpl.php I just fetched the result with the below code.

<!--Page visitor count-->
<?php
			
    $query = 'SELECT COUNT(DISTINCT uid) FROM {nodeviewcount} WHERE nid='.$node->nid;
    $unique_visitors = db_query($query)->fetchField();
?>
<div class="visitor-count"><span style="font-size:38px;"><?php print $unique_visitors; ?></span> views</div>

I think this might help you.

timmillwood’s picture

@debasisnaskar - Database queries in template files is very bad practice, so much so that it's not even possible in Drupal 8. Also COUNT and DISTINCT queries are going to be pretty slow. I was working on a site yesterday that had used Node View Count module, and the performance was so bad it took the site down.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

quietone’s picture

Status: Needs work » Postponed
Issue tags: -JavaScript +JavaScript

Statistics is approved for removal. See #3266457: [Policy] Deprecate Statistics module in D10 and move to contrib in D11

This is now Postponed. The status is set according to two policies. The Remove a core extension and move it to a contributed project and the Extensions approved for removal policies.

It will be moved to a contributed Statistics project once the project is created and the Drupal 11 branch is open.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.