Add filter to logs and referrer lists
ax - October 21, 2002 - 15:26
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | statistics.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | Needs tests |
Description
it would be nice if statistics.module would have an option to exclude certain hosts from the access logs. because right now, the log of my personal blog gets filled mainly with my own ip and the ip of the machine thats calling my cron.php :( would be handy to exclude bots and CodeRed'ded web servers as well.
same (allowing excluding certain urls) would be nice for "view referrers".

#1
i also would like to stress that this holds true for the reffer logging, i would like to be able to exclude my own host (has multiple fqdn's) from the refferre logging
#2
first implementation (by jeremy): http://lists.drupal.org/pipermail/drupal-devel/2002-November/020114.html (I added a new configuration option: "Filter current host". If this option is enabled, your current IP will be filtered when viewing the access log.)
second implementation (by me): http://lists.drupal.org/pipermail/drupal-devel/2003-January/020956.html (a patch that adds a "Filter a list of IPs (regular expressions) from being shown in the access log" feature to jeremys patch.)
now this just needs to go into cvs ...
#3
I was too impatient to wait on someone else to fix this so I've updated these patches mentioned above and created a diff for the current cvs. However I don't know how to submit it, so here is a link to it if someone wants to include it....
http://www.runlevelzero.com/statistics.diff
It works great, although an extension to it could perhaps be to also filter the top posts, but I haven't looked into if it's possible or not yet.
#4
On Wednesday, June 11, 2003 8:09 PM [GMT+1=CET],
rikard wrote:
> I was too impatient to wait on someone else to fix this so I've
> updated these patches mentioned above and created a diff for the
> current cvs. However I don't know how to submit it, so here is a link
> to it if someone wants to include it....
just send your patches to the devel mailing list (see
http://drupal.org/title/creating+and+sending+your+patches ). i am doing
this for you now ...
... because i really hope this patch finally makes it into cvs. there
seems to be quite some demand for this feature - there have even been 2
more feature requests (i had to mark DIPLICATE) ...
tested twice.
> It works great, although an extension to it could perhaps be to also
> filter the top posts, but I haven't looked into if it's possible or
> not yet.
feel free to do another patch ;)
#5
#6
Excluding certain hosts (cron, myself, search bots) from access logs is a slightly different problem than excluding them from the "who's online" listing. Although currently the "who's online" block is provided by the statistics module, it really does not need any of the logging or any other part of the statistics module to function. It works using the core required sessions table.
I suggest removing the "who's online" block from statistics.module, and then adding the capability to filter out a small list of hosts from the statistics access logs.
#7
changed title to reflect better what this feature is about
#8
It looks like this feature hibernated. I'd like to wake it up.
@ rikard (cool homepage btw!), the patch you provide on your site (
http://www.runlevelzero.com/statistics.diff) cannot be found anymore, it seems.
Would you be so kind to ditch it up and re-send it? You can now attach it to the comment in this thread.
And for the rest: What approach is best:
1) Do not log the list of hosts. (filter before insert)
2) Before presentation. (Filter before output)
1) has the downfall that the content is not available at all.
2) 's downside is that you are saving content that you don't want to see anyway.
Regards, Bèr
#9
I went for 1) because it's the easiest to implement :).
The patch was made for 4.3.1 I'll see if i find some time to port it to 4.4 (CVS, that is). If there's someone out there with no todo list, please feel free to take this.
Note that i'm still not very happy about this. It creates a lot of extra overhead. So i might still get to it, and mock up a patch that does some filtering before output. After all: now each pagehit requires some logic to be run. And in the case of 2)this logick is only required when you view the stats. This means far less overhead
But then again: the node_counter will have to have this too, so after all is said and done: I dunno.
#10
woops *blush*
#11
please only use 'patch' status when you have a patch against HEAD Drupal that is ready for CVS review team ... the submitted patch meets neither of these requirements.
#12
Here's a module that lets you:
1. Ignore hits from the builtin administrative user (uid == 1)
2. Ignore hits from users in any roles you choose
3. Ignore hits from crawlers
I don't think this would make an appropriate contributed module, because:
1. It basically takes (at least) twice the time an integrated statistics.module implementation would (it undoes what statistics.module does, after the fact)
2. It specifically undoes what the 4.5.1 statistics.module does, thus cannot be assumed to work on later versions without specific review of any statistics.module code changes
3. For crawler ignoring, it requires configuration of PHP's browscap configuration setting (disabled by default) and installlation in an appropriate place of browscap.ini, so is challenging to configure for non-technical users.
But, maybe some people will find it useful for now, and maybe it can help inspire incorporation of this functionality into the core statistics.module someday...
#13
Changing title back because the new one is not descriptive.
#14
I have now contributed the statistics_filter module. It can be used to filter out hits from the default administrative user, any specified user roles, and crawlers (as identified by browscap.ini). I've also added stats on user agents (also based on browscap.ini) - i.e., you can see that you've had 509 hits from IE 6.0, 2198 hits from Google, etc.
This functionality would better live in statistics.module (particularly the filtering, which works by undoing the logging already done by statistics.module, more-or-less doubling the overhead per page). This module can serve as a testbed for this functionality, as well as a temporary solution - once the functionality and implementation seem stable, I intend on submitting a patch to bring these features into the core statistics.module.
#15
I do not like the implementation of this much.
PLease consider updating xstatistics.module to:
* extend satistics.module,
* *not* remove the data from the database, by default, but just make it not show the data from spieders etc. IMO its bad to first log something, and then remove it. Feels too hackish to me.
* and have it in a central place. One statistics.module for normal use in Drupal, and one advanced contrib module to do all the cool stats magic.
You can also think of a way to add a hook to statiscs.module, so that it allows you to modify the logs, before they are inserted. Then you can make it so, that the robots, adminsor even a configurable list of IPs are not logged.
I would like that more, for, again, your idea is very nice, but the UPDATE appears a bit hackish to me.
#16
#17
See also http://drupal.org/node/21377
#18
This feature request is it still valid ?
#19
Core should provide the least minimum: query_alter via DBTNG, and (optionally) also allow modules to alter the result set/output additionally via drupal_alter().
#20
Here is another approach: why not add a filter form to statistics pages (as on admin/user/user)?
Optionally this filter can be saved (not done yet) so it is not session-related.
Stefan
#21
The last submitted patch failed testing.
#22
Retest.
#23
With a final solution, we'll definitely need tests for this.
#24
Ok, tests. Hm, I tried to write some but at some point it always fails. How do I take care of $_SESSION values?
This is my attempt:
function testStatisticsAdmin() {
// Create some log entries.
// access log
$access_text1 = $this->randomName();
$access_text2 = $this->randomName();
db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, 'node/1', 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);
db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, 'node/2', 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);
// Create admin user to filter the logs.
$admin_user = $this->drupalCreateUser(array('access statistics'));
$this->drupalLogin($admin_user);
$this->drupalGet('admin/reports/hits');
$this->assertText($access_text1, t("Found '%text' on recent hits page", array('%text' => $access_text1)));
$this->assertText($access_text2, t("Found '%text' on recent hits page", array('%text' => $access_text2)));
// Filter the access log by page.
$edit = array();
$edit['filter'] = 'page';
$edit['page'] = $access_text1;
$this->drupalPost('admin/reports/hits', $edit, t('Filter'));
// Check if the correct log items show up.
$this->assertNoText($access_text1, t("Found '%text' not on filtered by page on recent hits page", array('%text' => $access_text1)));
$this->assertText($access_text2, t("Found '%text' on filtered by page on recent hits page", array('%text' => $access_text2)));
}
Unfortunately (hm, maybe not) I found an issue in statistics.admin.inc with an unset session key in function statistics_filter_form() (will fix this with the next patch).
The test above fails on "$this->assertNoText($access_text1, ..." because the filter isn't saved in $_SESSION. Don't know why. Any suggestions on the test or the functionality of the patch?
Stefan
#25
You might want to take a look at how core uses filters like this in lists. I'm pretty sure they do not use $_SESSION.
#26
Core uses $_SESSION for filters.
e.g.
function user_filter_form() {if (!isset($_SESSION['user_overview_filter'])) {
drupal_set_session('user_overview_filter', array());
}
$session = &$_SESSION['user_overview_filter'];
(I've taken the code from user.module and modified to the needs of statistics.module.)
But I wonder why the test fails. I've done some more debugging: the session is set and
statistics_build_filter_query()returns the correct results.If checked the result list and the specific entry is no returned. But why does
$this->assertNoText()for this tring returnsFALSE.I don't get it. :-{
Unfortunately I can't create patches with Option
-urp(only-u) but here's a try with a small additional test.Stefan
#27
Hmm...my mistake about not using $_SESSION. Have you tried -uRp (uppercase R)?
#28
Yes. Unfortunately there is no way to tell eclipse or TortoiseCVS to use other options than "-u". And cygwin seems to have some problems on this computer (don't have my laptop at the moment...).
But I created the patch on my test server this morning so here it is. The new test fails at the last but one item, but I have no idea why.
The result list from db doesn't contain the string
$this->assertNoText(...)is searching but it always returns false.Would be great if you could have a look on it.
Stefan
#29
The last submitted patch failed testing.
#30
If someone could look at the tests and get it working... I didn't find the missing part (or the mistake).
Stefan
#31
#32
The last submitted patch failed testing.
#33
Re-rolled patched against current HEAD.
#34
The last submitted patch failed testing.
#35
Re-rolled patch to work with PDO.
#36
for example
<?php+class StatisticsAdminTestCase extends DrupalWebTestCase {
+ function getInfo() {
+ return array(
+ 'name' => t('Statistics administration'),
+ 'description' => t('Test admininstration page functionality.'),
+ 'group' => t('Statistics')
+ );
+ }
+ //here
?>
<?php+ return array('where' => $where,
+ 'join' => $join,
+ 'args' => $args,
+ );
?>
<?php+function theme_statistics_filters(&$form) {
?>
theme('item_list') is not used ?
what happens if +function statistics_filter_form($form_state, $type) { the type is empty?
Does it redirect on the submit part to "admin/reports/"?
So code needs work, sry :)
#37
Ah, thanks for the hint. Didn't see this.
Strange, indeed. Anyway; I formatted the array in an other way now.
Yes. One the one hand it isn't really a list (the
<li>is just a strange way to structure the form items) and on the other hand its exactly the way it is done in user.admin.inc (the place I copied the whole idea from) :-).You'll see an error message complaining about a missing required parameter if you try to open a page where the form is called without that param. The param is mandatory for the form to work because otherwise it doesn't know which filter options to display.
(btw: no need to apologize)
#38
patch needs review of course
#39
+ db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, $access_text1, 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);+ db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, $access_text2, 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);
You should use db_insert() for these, see: http://drupal.org/node/310079
+ $condition = db_and();+ foreach ($filter['where'] as $key => $where) {
+ $condition = $condition->condition($where['field'], $filter['args'][$key], $where['operator']);
+ }
+ $query->condition($condition);
I'm not sure about this... can't we add the conditions directly to $query? it will use AND to add them, so that should imho work. So this is imho just unecessary complex
#40
Updated patch with
db_insert()and replaced$condition = ...with$query->condition($where['field'], $filter['args'][$key], $where['operator']);.Didn't thought so but that worked and looks quite simple now.
#41
The last submitted patch failed testing.
#42
reroled to changes the theme('pager'), cannot find the issue at the moment.
#43
set to needs review
#44
The last submitted patch failed testing.
#45
dereine: bet you're mystified why your patch failed testing! It's because of a core bug: any form element called 'content' breaks drupal (and the statistics module has a few of those - it's the only code in core which does, which is maybe why this hasn't been picked up before). I've submitted a patch for that here - once it's committed your patch should pass testing.