| Project: | Drupal core |
| Version: | 8.x-dev |
| Component: | statistics.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | Needs Review, Needs tests |
Issue Summary
Problem/Motivation
Add basic host and referrer filtering to statistics.module to allow site admins to better monitor site traffic without the noise created by their own activity.
Comment #14 by @MikeRyan
After contributing a statistics_filter module he adds "This functionality would better live in statistics.module (particularly the filtering, which works by undoing the logging already done by statistics.module, more-or-less doubling the overhead per page)."
Comment #19 by @Sun
After a query about whether the issue is still valid he says "Core should provide the least minimum: query_alter via DBTNG, and (optionally) also allow modules to alter the result set/output additionally via drupal_alter()."
Comment #40 by @StBorchert
Submits a patch which is then rerolled by Dereine in #42 but still fails testing. According to @Roger Saner in #45 this is due to this bug #463064: 'content' cannot be used as a $form element and breaks any page which uses it, such as admin/settings/statistics the patch for which @Moshe rejects, suggesting the problem is fixed more elegantly in #428744: Make the main page content a real block.
In Comment #50 @kattekrab posted the following summary:
Isn't statistics.module a candidate for removal from D8 ?
Should this be marked closed won't fix?
Summary:
To which @Sun replied in Comment #51
"As long as it's in core, it's in core. No need to preemptively block issues on a hypothetical removal."
Proposed resolution
Debate continues about the "best" solution.
- No solution required. Solved by contributed modules.
- statistics.module should be improved, simplified, made more flexible
- statistics.module should be removed entirely
Remaining tasks
- Reroll/Rewrite submitted patch for GIT/DBTNG/D8 - 696_access_log_filter_41.patch
- Resolve META debate about removing or improving statistics.module in Drupal8
- Review existing contrib modules
Remove Statistics module from D8 core
#1018716: Remove Statistics module from D8 core
Improve Statistics module for D8?
#1209532: Improve statistics module for D8?
Possible duplicates and related issues
#255769: statistics module needs basic log rules
Contains a patch that allows users to exclude certain traffic from statistics.
- system administrator (UID1)
- site administrators
- traffic that originates from the same IP as the site itself
- users
- IP addresses
- user agents
#345133: 403s and 404s are counted in {node_counter}
Contains a patch to handle 403 & 404 errors, but needs to be rerolled and requires tests. Got bumped to D8 in 5Feb2011
#90468: Only record unique hits in node counter stats
Contains a patch, but needs work.
#1277710: Radioactivity for Drupal 8 core.
Rather than improve or rewrite statistics.module - look at replacing it with radioactivity.
Related contrib modules
Statistics Filtering
http://drupal.org/project/statistics_filter
Statistics Advanced
http://drupal.org/project/statistics_advanced
Statistics Pro
http://drupal.org/project/statspro
Original report by ax
it would be nice if statistics.module would have an option to exclude certain hosts from the access logs. because right now, the log of my personal blog gets filled mainly with my own ip and the ip of the machine thats calling my cron.php :( would be handy to exclude bots and CodeRed'ded web servers as well.
same (allowing excluding certain urls) would be nice for "view referrers".
Comments
#1
i also would like to stress that this holds true for the reffer logging, i would like to be able to exclude my own host (has multiple fqdn's) from the refferre logging
#2
first implementation (by jeremy): http://lists.drupal.org/pipermail/drupal-devel/2002-November/020114.html (I added a new configuration option: "Filter current host". If this option is enabled, your current IP will be filtered when viewing the access log.)
second implementation (by me): http://lists.drupal.org/pipermail/drupal-devel/2003-January/020956.html (a patch that adds a "Filter a list of IPs (regular expressions) from being shown in the access log" feature to jeremys patch.)
now this just needs to go into cvs ...
#3
I was too impatient to wait on someone else to fix this so I've updated these patches mentioned above and created a diff for the current cvs. However I don't know how to submit it, so here is a link to it if someone wants to include it....
http://www.runlevelzero.com/statistics.diff
It works great, although an extension to it could perhaps be to also filter the top posts, but I haven't looked into if it's possible or not yet.
#4
On Wednesday, June 11, 2003 8:09 PM [GMT+1=CET],
rikard wrote:
> I was too impatient to wait on someone else to fix this so I've
> updated these patches mentioned above and created a diff for the
> current cvs. However I don't know how to submit it, so here is a link
> to it if someone wants to include it....
just send your patches to the devel mailing list (see
http://drupal.org/title/creating+and+sending+your+patches ). i am doing
this for you now ...
... because i really hope this patch finally makes it into cvs. there
seems to be quite some demand for this feature - there have even been 2
more feature requests (i had to mark DIPLICATE) ...
tested twice.
> It works great, although an extension to it could perhaps be to also
> filter the top posts, but I haven't looked into if it's possible or
> not yet.
feel free to do another patch ;)
#5
#6
Excluding certain hosts (cron, myself, search bots) from access logs is a slightly different problem than excluding them from the "who's online" listing. Although currently the "who's online" block is provided by the statistics module, it really does not need any of the logging or any other part of the statistics module to function. It works using the core required sessions table.
I suggest removing the "who's online" block from statistics.module, and then adding the capability to filter out a small list of hosts from the statistics access logs.
#7
changed title to reflect better what this feature is about
#8
It looks like this feature hibernated. I'd like to wake it up.
@ rikard (cool homepage btw!), the patch you provide on your site (
http://www.runlevelzero.com/statistics.diff) cannot be found anymore, it seems.
Would you be so kind to ditch it up and re-send it? You can now attach it to the comment in this thread.
And for the rest: What approach is best:
1) Do not log the list of hosts. (filter before insert)
2) Before presentation. (Filter before output)
1) has the downfall that the content is not available at all.
2) 's downside is that you are saving content that you don't want to see anyway.
Regards, Bèr
#9
I went for 1) because it's the easiest to implement :).
The patch was made for 4.3.1 I'll see if i find some time to port it to 4.4 (CVS, that is). If there's someone out there with no todo list, please feel free to take this.
Note that i'm still not very happy about this. It creates a lot of extra overhead. So i might still get to it, and mock up a patch that does some filtering before output. After all: now each pagehit requires some logic to be run. And in the case of 2)this logick is only required when you view the stats. This means far less overhead
But then again: the node_counter will have to have this too, so after all is said and done: I dunno.
#10
woops *blush*
#11
please only use 'patch' status when you have a patch against HEAD Drupal that is ready for CVS review team ... the submitted patch meets neither of these requirements.
#12
Here's a module that lets you:
1. Ignore hits from the builtin administrative user (uid == 1)
2. Ignore hits from users in any roles you choose
3. Ignore hits from crawlers
I don't think this would make an appropriate contributed module, because:
1. It basically takes (at least) twice the time an integrated statistics.module implementation would (it undoes what statistics.module does, after the fact)
2. It specifically undoes what the 4.5.1 statistics.module does, thus cannot be assumed to work on later versions without specific review of any statistics.module code changes
3. For crawler ignoring, it requires configuration of PHP's browscap configuration setting (disabled by default) and installlation in an appropriate place of browscap.ini, so is challenging to configure for non-technical users.
But, maybe some people will find it useful for now, and maybe it can help inspire incorporation of this functionality into the core statistics.module someday...
#13
Changing title back because the new one is not descriptive.
#14
I have now contributed the statistics_filter module. It can be used to filter out hits from the default administrative user, any specified user roles, and crawlers (as identified by browscap.ini). I've also added stats on user agents (also based on browscap.ini) - i.e., you can see that you've had 509 hits from IE 6.0, 2198 hits from Google, etc.
This functionality would better live in statistics.module (particularly the filtering, which works by undoing the logging already done by statistics.module, more-or-less doubling the overhead per page). This module can serve as a testbed for this functionality, as well as a temporary solution - once the functionality and implementation seem stable, I intend on submitting a patch to bring these features into the core statistics.module.
#15
I do not like the implementation of this much.
PLease consider updating xstatistics.module to:
* extend satistics.module,
* *not* remove the data from the database, by default, but just make it not show the data from spieders etc. IMO its bad to first log something, and then remove it. Feels too hackish to me.
* and have it in a central place. One statistics.module for normal use in Drupal, and one advanced contrib module to do all the cool stats magic.
You can also think of a way to add a hook to statiscs.module, so that it allows you to modify the logs, before they are inserted. Then you can make it so, that the robots, adminsor even a configurable list of IPs are not logged.
I would like that more, for, again, your idea is very nice, but the UPDATE appears a bit hackish to me.
#16
#17
See also http://drupal.org/node/21377
#18
This feature request is it still valid ?
#19
Core should provide the least minimum: query_alter via DBTNG, and (optionally) also allow modules to alter the result set/output additionally via drupal_alter().
#20
Here is another approach: why not add a filter form to statistics pages (as on admin/user/user)?
Optionally this filter can be saved (not done yet) so it is not session-related.
Stefan
#21
The last submitted patch failed testing.
#22
Retest.
#23
With a final solution, we'll definitely need tests for this.
#24
Ok, tests. Hm, I tried to write some but at some point it always fails. How do I take care of $_SESSION values?
This is my attempt:
function testStatisticsAdmin() {
// Create some log entries.
// access log
$access_text1 = $this->randomName();
$access_text2 = $this->randomName();
db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, 'node/1', 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);
db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, 'node/2', 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);
// Create admin user to filter the logs.
$admin_user = $this->drupalCreateUser(array('access statistics'));
$this->drupalLogin($admin_user);
$this->drupalGet('admin/reports/hits');
$this->assertText($access_text1, t("Found '%text' on recent hits page", array('%text' => $access_text1)));
$this->assertText($access_text2, t("Found '%text' on recent hits page", array('%text' => $access_text2)));
// Filter the access log by page.
$edit = array();
$edit['filter'] = 'page';
$edit['page'] = $access_text1;
$this->drupalPost('admin/reports/hits', $edit, t('Filter'));
// Check if the correct log items show up.
$this->assertNoText($access_text1, t("Found '%text' not on filtered by page on recent hits page", array('%text' => $access_text1)));
$this->assertText($access_text2, t("Found '%text' on filtered by page on recent hits page", array('%text' => $access_text2)));
}
Unfortunately (hm, maybe not) I found an issue in statistics.admin.inc with an unset session key in function statistics_filter_form() (will fix this with the next patch).
The test above fails on "$this->assertNoText($access_text1, ..." because the filter isn't saved in $_SESSION. Don't know why. Any suggestions on the test or the functionality of the patch?
Stefan
#25
You might want to take a look at how core uses filters like this in lists. I'm pretty sure they do not use $_SESSION.
#26
Core uses $_SESSION for filters.
e.g.
function user_filter_form() {if (!isset($_SESSION['user_overview_filter'])) {
drupal_set_session('user_overview_filter', array());
}
$session = &$_SESSION['user_overview_filter'];
(I've taken the code from user.module and modified to the needs of statistics.module.)
But I wonder why the test fails. I've done some more debugging: the session is set and
statistics_build_filter_query()returns the correct results.If checked the result list and the specific entry is no returned. But why does
$this->assertNoText()for this tring returnsFALSE.I don't get it. :-{
Unfortunately I can't create patches with Option
-urp(only-u) but here's a try with a small additional test.Stefan
#27
Hmm...my mistake about not using $_SESSION. Have you tried -uRp (uppercase R)?
#28
Yes. Unfortunately there is no way to tell eclipse or TortoiseCVS to use other options than "-u". And cygwin seems to have some problems on this computer (don't have my laptop at the moment...).
But I created the patch on my test server this morning so here it is. The new test fails at the last but one item, but I have no idea why.
The result list from db doesn't contain the string
$this->assertNoText(...)is searching but it always returns false.Would be great if you could have a look on it.
Stefan
#29
The last submitted patch failed testing.
#30
If someone could look at the tests and get it working... I didn't find the missing part (or the mistake).
Stefan
#31
#32
The last submitted patch failed testing.
#33
Re-rolled patched against current HEAD.
#34
The last submitted patch failed testing.
#35
Re-rolled patch to work with PDO.
#36
for example
<?php+class StatisticsAdminTestCase extends DrupalWebTestCase {
+ function getInfo() {
+ return array(
+ 'name' => t('Statistics administration'),
+ 'description' => t('Test admininstration page functionality.'),
+ 'group' => t('Statistics')
+ );
+ }
+ //here
?>
<?php+ return array('where' => $where,
+ 'join' => $join,
+ 'args' => $args,
+ );
?>
<?php+function theme_statistics_filters(&$form) {
?>
theme('item_list') is not used ?
what happens if +function statistics_filter_form($form_state, $type) { the type is empty?
Does it redirect on the submit part to "admin/reports/"?
So code needs work, sry :)
#37
Ah, thanks for the hint. Didn't see this.
Strange, indeed. Anyway; I formatted the array in an other way now.
Yes. One the one hand it isn't really a list (the
<li>is just a strange way to structure the form items) and on the other hand its exactly the way it is done in user.admin.inc (the place I copied the whole idea from) :-).You'll see an error message complaining about a missing required parameter if you try to open a page where the form is called without that param. The param is mandatory for the form to work because otherwise it doesn't know which filter options to display.
(btw: no need to apologize)
#38
patch needs review of course
#39
+ db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, $access_text1, 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);+ db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, $access_text2, 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);
You should use db_insert() for these, see: http://drupal.org/node/310079
+ $condition = db_and();+ foreach ($filter['where'] as $key => $where) {
+ $condition = $condition->condition($where['field'], $filter['args'][$key], $where['operator']);
+ }
+ $query->condition($condition);
I'm not sure about this... can't we add the conditions directly to $query? it will use AND to add them, so that should imho work. So this is imho just unecessary complex
#40
Updated patch with
db_insert()and replaced$condition = ...with$query->condition($where['field'], $filter['args'][$key], $where['operator']);.Didn't thought so but that worked and looks quite simple now.
#41
The last submitted patch failed testing.
#42
reroled to changes the theme('pager'), cannot find the issue at the moment.
#43
set to needs review
#44
The last submitted patch failed testing.
#45
dereine: bet you're mystified why your patch failed testing! It's because of a core bug: any form element called 'content' breaks drupal (and the statistics module has a few of those - it's the only code in core which does, which is maybe why this hasn't been picked up before). I've submitted a patch for that here - once it's committed your patch should pass testing.
#46
#47
#42: 696_access_log_filter_41.patch queued for re-testing.
#48
subscribe
#49
The last submitted patch, 696_access_log_filter_41.patch, failed testing.
#50
Isn't statistics.module a candidate for removal from D8 ?
Should this be marked closed won't fix?
Summary:
#51
As long as it's in core, it's in core. No need to preemptively block issues on a hypothetical removal.
#52
So is anyone still around to re-roll this patch?
The core bug blocker mentioned in #45 has been marked closed (duplicate) of #428744: Make the main page content a real block which is closed and fixed. Yay!
Or is the contrib solution mentioned in #12 the way to go here?
#53
This old old old feature request from 2002 relates to the following active discussions:
#1018716: Remove Statistics module from D8 core
#1209532: Improve statistics module for D8?
#1255674: [Meta] Make core maintainable
This is node 696!!! Can we deal with this in one way or another?
I posted a summary back at #50
#54
#55
Issue Summary has been updated - Patch at #42 needs to re-rolled for GIT / DBNTG / D8