One of my biggest long-time annoyances with Drupal is the way that the hit counter for a node behaves. The hit counter records every single page view for a node. If you reload a node/x
page 10 times, then 10 hits will be recorded for that page. This is ridiculous: the hit counter system in statistics.module needs to be just a little bit more intelligent than this. It needs to identify hits that are not unique (i.e. hits from the same user/host for the same page), and to stop such hits from bumping up the counter.
Attached patch implements this checking system. With this patch, only the first hit for page 'x' from user 'y' (which can of course be an anonymous user, i.e. user 0) on host IP 'z' will increment the counter. All subsequent hits will be recognised as duplicates, and will not result in the counter being incremented. This results in 1 extra query per node page view, but the performance impact is minimal (considering that more performance-intensive INSERT queries are being run anyway, whereas this is just an extra little SELECT query). This checking only occurs if the node access log feature is enabled, because otherwise there is no way to check for non-unique page views.
Comment | File | Size | Author |
---|---|---|---|
#66 | 90468-66.patch | 2.36 KB | Wim Leers |
#37 | unique_page_views_37.patch | 3.08 KB | Pancho |
#26 | 90468.statistics_unique.patch | 3.13 KB | Dave Reid |
#25 | 90468.patch | 3.51 KB | Dave Reid |
#14 | unique_page_views.patch | 2.16 KB | Pancho |
Comments
Comment #1
Jaza CreditAttribution: Jaza commentedActually, I think this qualifies as a bug. The patch above still applies and still works.
Comment #2
chx CreditAttribution: chx commentedIMO it's the session id you need to check against. One IP != one anonymous user, could be several from the same IP.
Comment #3
Jaza CreditAttribution: Jaza commentedGood point, chx. New patch that uses session ID instead of IP address. The check for user ID is still in there: it's not really needed, but you never know (there could be two different logged-in users with session IDs that are hash collisions). And besides, I doubt that it has any effect on the performance of the query.
Apart from that, same patch. Tested, and it has the same visible effect as the original patch.
Comment #4
drummHow does this affect page load time?
Comment #5
drummComment #6
Lowell CreditAttribution: Lowell commentedI just tried this latest patch and don't seem to have any results. I did run the update.php script and it did modify the accesslog table.
Is anybody working on this issue?
It would be good if the tracker/counter did not log duplicate views, edits, and tracker views either.
Comment #7
Dave ReidHere's the patch I've tested and used on my own Drupal installation. I've added an option in the admin/logs/settings page to only count unique views. Also, if the users is currently logged in, the code will check the history table which logs when registered users last viewed nodes. Otherwise for anonymous users, it still checks the accesslog table with uid = 0 and the session id.
On the discussion of performance of this patch, I firmly believe that this has a minimal effect on performance. At minimum, pre-patch performs one UPDATE query, post-patch performs one SELECT query.
Comment #8
drummI would consider such setting additions to be features. This should be developed for the current development version of Drupal.
Comment #9
catch...without the settings, it would reduce database load by eliminating a lot of writes to 'node_counter' (or possibly 'node' if it's merged back in - ameliorating some of the performance trade-off of that optimisation as well). So could maybe get in as a bug fix/performance enhancement?
Comment #10
PanchoUnbelievable... patch #3 still applies with some offset - after one year and a quarter!
I just broke the code down a little bit to make it more digestible, and added proper documentation.
This bug renders the node statistics rather useless. Also it improves performance per node view. Therefore I keep it in the D6 queue.
Please test this patch thoroughly.
Comment #11
PanchoComment #12
sdecabooter CreditAttribution: sdecabooter commentedHi,
I tested this patch on Drupal 6 RC3, and the dev tarball.
I keep getting the following error:
The db_num_rows function seems to be removed from Drupal 6.
Also, if you set the 'Enable access log' setting to disabled, but 'Count content views' to enabled, it will still count every pageload.
Is this supposed to happen?
Comment #13
PanchoThank you for testing this! I need to check this and will post an corrected patch asap.
However, it is in fact by design, that with disables access log every pageload is counted. Finally we depend on the access log to discriminate unique page loads by checking the Session ID. The only question is, whether we should disable 'Count content views' then as well to avoid confusion and as the results are rather useless. Anyway this needs to be documented better.
Comment #14
PanchoOkay, I removed db_num_rows() and use SELECT COUNT instead.
Also I removed checking uid in the access log, as the session id is unique enough. The only difference this makes except for performance is in the following case: an authenticated user views a page (which is counted as a page view) and logs out. If he now views the page again this is not counted again, as he keeps his session id. This is IMHO even preferrable.
This time I tested it thouroughly and it works now as advertised.
Surely we can further improve the mechanism determining whether a given page view is unique. We might want to discuss if we want to count only one page view, in case the user views a page anonymously and then again logged in (cross session ids). We might also elaborate on whether for all authenticated users, a page view should be counted only once - ever. Or within a given timespan. Finally we could somehow use the IP address. We might even want to add configuration options for that.
But these are details that would take up too much time right now. This is a stop-gap measure, rendering the page counter usable at all. Also we improve performance on every page view. Mission accomplished, IMHO.
Please test some more and then RTBC.
Comment #15
moshe weitzman CreditAttribution: moshe weitzman commentedSorry, it is too late for D6 to change the meaning of these counters and change the mix of queries on every page view. D7 is open for business.
Comment #16
Dave ReidFor those following this issue, I created a module to fix this until something can get committed. Check it out at http://drupal.org/project/statistics_advanced
Comment #17
SeanBannister CreditAttribution: SeanBannister commentedYou spelt Statistics wrong in the url, it should be http://drupal.org/project/statistics_advanced
I'd be interested to see this in D7
Comment #18
swentel CreditAttribution: swentel commentedPatch in #14 still applies and works as intented. Should we add more features or setting it rtbc'd ?
Comment #19
Jamie Holly CreditAttribution: Jamie Holly commentedThis query can cause problems as the {accesslog} grows:
$count = db_result(db_query("SELECT COUNT(*) FROM {accesslog} WHERE path = '%s' AND sid = '%s'", $_GET['q'], session_id()));
To make it more effective, the accesslog should have an index added on path, sid (in that order). As it stands right now, neither field is indexed so every query requires a full table scan.
Comment #20
bsherwood CreditAttribution: bsherwood commentedIs this to late for D7?
This seems like a good default behavior to deal with page views. It also helps prevent users from just clicking to view the node a hundred times to bump up the node views.
Comment #21
SeanBannister CreditAttribution: SeanBannister commentedNo it's not to late for Drupal 7 however we do need to deal with the issue in #19
Comment #22
Dave ReidPatch rolled against HEAD with DB API changes.
- Checks against history table if user is logged in (possibly faster than searching accesslog table)
- Adds accesslog_path_sid index in accesslog table for optimization
- Uses db_query_range and 'SELECT 1' to only return first result
- Also fixes some weird trailing space issues...
Comment #24
Dave ReidHold on...retrying patch...
Comment #25
Dave ReidAlright, patch ready for testing.
Comment #26
Dave ReidPatch re-rolled to HEAD (time() function was removed).
Comment #27
Dave ReidComment #28
Dave ReidI will need to make a test for this before it is ready.
Comment #29
PasqualleEDIT: wrong review..
I did not tested, but reading the patch, I think we are still incrementing the node counter when the tracker module is disabled and the node is viewed by an authenticated user..
cases of node counter:
disabled tracker, disabled statistics_enable_access_log
authenticated: increment always
anonymous: increment always
enabled tracker, disabled statistics_enable_access_log
authenticated: increment only if unique
anonymous: increment always
disabled tracker, enabled statistics_enable_access_log
authenticated: increment always
anonymous: increment only if unique
enabled tracker, enabled statistics_enable_access_log
authenticated: increment only if unique
anonymous: increment only if unique
conclusion: this works perfectly only when the tracker module and the statistics_enable_access_log variable are enabled..
I think it is ok to increment when we can't decide if it is unique or not, but it is not ok that the correct functionality depend on two settings (tracker, statistics_enable_access_log)..
Comment #30
Pasquallesorry, absolutely wrong review..
the {history} table does not depend on the tracker module..
Comment #31
Dave ReidActually, if the count content views is disabled (!variable_get('statistics_count_content_views', 0)) then it will never hit any logic to increment. It works best if the access log is enabled since we can test for unique anonymous hits, so maybe we should add a message under the count views option to let the user know that this feature will work best if the access log is enabled.
Comment #32
bflora CreditAttribution: bflora commentedHi there, I'm a bit of a drupal project newbie still and have yet to understand exactly how a lot of this works.
This appears to be a thread discussing how to improve the statistics module so that it's more efficient and doesn't record every single pageview sent its way (which I agree is lame and facile).
Somewhere in this thread, is there a patch for the D5 statistics module? Or is this all D7? I'm running a D5 site. If I want to make this change to my stats module there, does this thread have a way to do that in it? If not, could someone link me to a thread that does?
Comment #33
Pasqualle@bflora: the patch in #7 is for D5, but you should NOT hack core if you want to avoid problems.. I think this patch won't be backported to D5, as this is a new feature..
Comment #34
Dave Reidbflora, I should have a 5.x version of the Statistics Advanced module available soon which has this feature.
Comment #35
bflora CreditAttribution: bflora commentedVery cool. I look forward to it.
Comment #36
wflorian CreditAttribution: wflorian commentedI really need a D5 patch. Which one should I take? I tried the first three patches, but none of them worked with my D5.10 version, the counter still counts every hit.
I would really appreciate your help.
I did try the patch in #7. Unfortunately even this does not work with my version. If logged in it does not count anything, even if the node has not been viewed before. Not logged in, every hit is counted.
Anybody any idea?
Comment #37
PanchoLast patch fails. Here's a rerolled version with clearified comments.
Please review.
I wonder if we shouldn't add an admin option to select after which time page views by the same user should be counted again. Nodes can be updated, and if the user reads the node again after some time, that page view might be worth being counted.
Possible set of options: "immediately", "after 5 minutes", "after 15 minutes", "after 1 hour", "after 1 day", "after 7 days", "after 1 month", "never".
Comment #38
Dave ReidIf that's the case, we can compare the node's updated timestamp to when the user last accessed the node, no admin option needed.
Comment #39
PanchoThat's yet another way of thinking about this. However we can't easily determine how much a node was changed if at all. So we can't generally assume whether this is a reason a page view should be counted again or not. This depends heavily on the content.
I think if someone can decide after which time page views should be counted again, then only the webmaster.
Comment #40
PanchoSome testing revealed that the code doesn't work at all: Authenticated users are not even counted once.
We had that working, but then in #25 we started checking against the history table, which is a mistake.
The problem is that the history record gets updated way earlier, so we can't know if the page has been viewed by this user before - we only find that the user has viewed the page right now (surprise, surprise). So a user viewing the page for the very first time, at this point already has a corresponding {history} record. Because we misinterpret this already as a repeated page view, we don't ever count a user's page view.
Only the {accesslog} table is reliable, as we are updating it later within the same hook.
Now, what can we do?
IMHO we should return to querying only the {access_log} table. We can further improve the indexing to make sure it's as fast as possible.
Even if we can fix the implementation of the query to the {history} table, there are some more reasons for this move:
For a default core behaviour it is also too complicated to explain where the data comes from in this case or in that case, and what exactly this does imply. Finally this is so difficult, because it doesn't really imply anything, but is only some kind of haltingly approximation to the popularity of a page.
Google Analytics defines:
To comply with this definition, which makes a lot of sense and is the expected behaviour, our counts need to be session-based for both authenticated users and anonymous visitors. This means we need to rely on the {accesslog} data only.
From my point of view, this is a reasonable standard behavior that we can ship with D7. To allow for every other way of counting page-views, we should make sure this can be easily be overridden by a simple hook implementation in contrib.
If you guys follow my considerations, I'm going to rework this into a new patch ASAP. I also started work on some simpletests, which obviously need to be adapted to our new strategy, but otherwise are quite done.
Comment #41
PanchoAnother remark: We need to make sure the pageview is not counted if a http status code "403 Forbidden" or "404 Not Found" is returned. Probably we need to add a 'status_code' column to the {accesslog} table. Though we could also check the Page title in the record, this would be no good idea performance-wise.
Comment #42
Dave ReidPage views are only counted in the {node_counter} table... They are not affected by 403s and 404s.
Comment #43
PanchoNo. This statement is wrong in two respects:
a) Some user wants to see node/3, but he gets a "403 forbidden". Now he gets permission. But even if he is successful on retry, then no pageview will be counted at all.
b) node/135 doesn't exist now and the user gets a "404 Not found". Now that it has been created, but the next (successful) try won't be counted as well.
Why? In both cases an {accesslog} entry was written already at the failed pageview, so the next (successful) pageview is considered a duplicate.
The second problem is admittedly not really big (I said: edge-cases), it's just not nice.
The first one though is really uncool and needs to be fixed asap (see #345133: 403s and 404s are counted in {node_counter}). IMHO this also needs to be backported to D6.
If you still can't believe it: go and check it, please.
Comment #44
Dave ReidPancho. Wow. I did not believe that 403s and 404s were counted in {node_counter}. You were right. :) I posted a patch in that issue that I'm going to implement in my statistics advanced module, which continues to essentially be a D6 backport of this issue.
Comment #45
Dave ReidComment #46
yonailo CreditAttribution: yonailo commentedAny one has succesfully patched a D5 installation ? I am looking forward to a D5 fix and the Statistics Advance module is only for D6.
Comment #47
Dave ReidBumping to D8.
Comment #48
wflorian CreditAttribution: wflorian commentedNo I am sorry to tell you, that I have never been able in patching a D5 version of the module...I am also still looking for this in a D5 version!
Comment #49
CFW CreditAttribution: CFW commentedDoes unique_page_views_37.patch work on Drupal 7.2?
Comment #50
wedge CreditAttribution: wedge commentedsubscribing
Comment #51
Alex Andrascu CreditAttribution: Alex Andrascu commentedI'm amazed that this ancient issue is still going on. How hard is it to have a working core Statistics module ? There are patches dated 2008 in this thread. Wow.
Comment #52
Jessica A CreditAttribution: Jessica A commentedsubscribe
Comment #53
haggster CreditAttribution: haggster commentedIf you are using Google Analytics, I think this module can do the job: http://drupal.org/project/google_analytics_counter. It has an option to override drupals core node counter.
Comment #54
SeanBannister CreditAttribution: SeanBannister commented@haggster Just keep in mind that Google Analytics is not a replacement if you need 100% accurate stats. Users are able to opt-out of Google Analytics in which case the hit won't be logged. I'm also concerned about the future of Google Analytics if they start respecting the Do Not Track header this could happen if there was pressure from the FTC, making the stats even less accurate.
Comment #55
Dave ReidComment #56
timmillwoodI don't think there is a good way todo this without bringing more performance issues, especially as I would like remove accesslog and only keep statistics.module as a node_counter.
Comment #57
Tsubo CreditAttribution: Tsubo commentedAny movement on this? Statistics Filter (http://drupal.org/project/statistics_filter) helps with filtering out hits based on role, but are we still short of a D7 solution/fix for the core stats counter?
Comment #58
togbonna CreditAttribution: togbonna commentedOut of curiosity:
1) Was/Is the Statistics Advance module a suitable solution to this age-old issue for D6?
2) If the answer to 1) above is yes, why is the module still not ported to D7?
3) Has the issue become irrelevant/solved by the D7 core, categorically?
4) If no to 3), are there other solution options for D7?
5) Or (I don't believe this) the issue CANNOT be fixed.
Answers, anyone?
Comment #59
Elin Yordanov CreditAttribution: Elin Yordanov commentedBumping to D9. Since I don't believe that this issue will ever be solved.
BTW, statistics module is nothing but a scrap without this functionality.
Comment #60
catchComment #66
Wim LeersI'm inclined to agree.
accesslog
is gone, and for good reason: no chance of privacy violations.However … there actually is a way to do this without incurring any performance cost. By using the
history
module's client-side cache of when a node was last viewed! We can't let thestatistics
module require thehistory
module, that'd be a BC break. But we can leverage it when it's available to make things better.Comment #67
timmillwoodIt'd be good to have a test which depends on history module to check this actually works.
Oops
New line
Comment #69
debasisnaskar CreditAttribution: debasisnaskar as a volunteer and commentedI have solved this issue. I have got the unique visitor count per node using "nodeviewcount.module". Just install this "nodeviewcount.module" module and then in my page.tpl.php I just fetched the result with the below code.
I think this might help you.
Comment #70
timmillwood@debasisnaskar - Database queries in template files is very bad practice, so much so that it's not even possible in Drupal 8. Also
COUNT
andDISTINCT
queries are going to be pretty slow. I was working on a site yesterday that had used Node View Count module, and the performance was so bad it took the site down.Comment #79
quietone CreditAttribution: quietone at PreviousNext commentedStatistics is approved for removal. See #3266457: [Policy] Deprecate Statistics module in D10 and move to contrib in D11
This is now Postponed. The status is set according to two policies. The Remove a core extension and move it to a contributed project and the Extensions approved for removal policies.
It will be moved to a contributed Statistics project once the project is created and the Drupal 11 branch is open.