Add filter to logs and referrer lists

ax - October 21, 2002 - 15:26
Project:Drupal
Version:8.x-dev
Component:statistics.module
Category:feature request
Priority:normal
Assigned:Unassigned
Status:needs work
Issue tags:Needs tests
Description

it would be nice if statistics.module would have an option to exclude certain hosts from the access logs. because right now, the log of my personal blog gets filled mainly with my own ip and the ip of the machine thats calling my cron.php :( would be handy to exclude bots and CodeRed'ded web servers as well.

same (allowing excluding certain urls) would be nice for "view referrers".

#1

bertboerland@ww... - October 21, 2002 - 16:16

i also would like to stress that this holds true for the reffer logging, i would like to be able to exclude my own host (has multiple fqdn's) from the refferre logging

#2

ax - June 9, 2003 - 02:13
Priority:» normal

first implementation (by jeremy): http://lists.drupal.org/pipermail/drupal-devel/2002-November/020114.html (I added a new configuration option: "Filter current host". If this option is enabled, your current IP will be filtered when viewing the access log.)

second implementation (by me): http://lists.drupal.org/pipermail/drupal-devel/2003-January/020956.html (a patch that adds a "Filter a list of IPs (regular expressions) from being shown in the access log" feature to jeremys patch.)

now this just needs to go into cvs ...

#3

rikard - June 11, 2003 - 18:09

I was too impatient to wait on someone else to fix this so I've updated these patches mentioned above and created a diff for the current cvs. However I don't know how to submit it, so here is a link to it if someone wants to include it....

http://www.runlevelzero.com/statistics.diff

It works great, although an extension to it could perhaps be to also filter the top posts, but I haven't looked into if it's possible or not yet.

#4

ax - June 11, 2003 - 19:12
Title:allow excluding certain hosts (myself, cron) from access logs» [PATCH] Re: [drupal-devel] feature #696 : allow excluding certain hosts (myself,cron) from access logs

On Wednesday, June 11, 2003 8:09 PM [GMT+1=CET],
rikard wrote:

> I was too impatient to wait on someone else to fix this so I've
> updated these patches mentioned above and created a diff for the
> current cvs. However I don't know how to submit it, so here is a link
> to it if someone wants to include it....

just send your patches to the devel mailing list (see
http://drupal.org/title/creating+and+sending+your+patches ). i am doing
this for you now ...

... because i really hope this patch finally makes it into cvs. there
seems to be quite some demand for this feature - there have even been 2
more feature requests (i had to mark DIPLICATE) ...

tested twice.

> It works great, although an extension to it could perhaps be to also
> filter the top posts, but I haven't looked into if it's possible or
> not yet.

feel free to do another patch ;)

#5

al - June 11, 2003 - 20:06
Title:[PATCH] Re: [drupal-devel] feature #696 : allow excluding certain hosts (myself,cron) from access logs» allow excluding certain hosts (myself,cron) from access logs

#6

Chris Johnson - November 13, 2003 - 02:06

Excluding certain hosts (cron, myself, search bots) from access logs is a slightly different problem than excluding them from the "who's online" listing. Although currently the "who's online" block is provided by the statistics module, it really does not need any of the logging or any other part of the statistics module to function. It works using the core required sessions table.

I suggest removing the "who's online" block from statistics.module, and then adding the capability to filter out a small list of hosts from the statistics access logs.

#7

ax - November 29, 2003 - 23:43
Title:allow excluding certain hosts (myself,cron) from access logs» access log / referrer filter

changed title to reflect better what this feature is about

#8

Bèr Kessels - February 4, 2004 - 13:42

It looks like this feature hibernated. I'd like to wake it up.

@ rikard (cool homepage btw!), the patch you provide on your site (
http://www.runlevelzero.com/statistics.diff) cannot be found anymore, it seems.

Would you be so kind to ditch it up and re-send it? You can now attach it to the comment in this thread.

And for the rest: What approach is best:

1) Do not log the list of hosts. (filter before insert)
2) Before presentation. (Filter before output)

1) has the downfall that the content is not available at all.
2) 's downside is that you are saving content that you don't want to see anyway.

Regards, Bèr

#9

Bèr Kessels - February 5, 2004 - 13:56

I went for 1) because it's the easiest to implement :).

The patch was made for 4.3.1 I'll see if i find some time to port it to 4.4 (CVS, that is). If there's someone out there with no todo list, please feel free to take this.

Note that i'm still not very happy about this. It creates a lot of extra overhead. So i might still get to it, and mock up a patch that does some filtering before output. After all: now each pagehit requires some logic to be run. And in the case of 2)this logick is only required when you view the stats. This means far less overhead

But then again: the node_counter will have to have this too, so after all is said and done: I dunno.

AttachmentSizeStatusTest resultOperations
statistics_exclude_hosts.patch5.14 KBIgnoredNoneNone

#10

Bèr Kessels - February 5, 2004 - 13:57

woops *blush*

#11

moshe weitzman - February 5, 2004 - 16:11

please only use 'patch' status when you have a patch against HEAD Drupal that is ready for CVS review team ... the submitted patch meets neither of these requirements.

#12

mikeryan - February 26, 2005 - 02:53
Title:access log / referrer filter» Module to do this

Here's a module that lets you:

1. Ignore hits from the builtin administrative user (uid == 1)
2. Ignore hits from users in any roles you choose
3. Ignore hits from crawlers

I don't think this would make an appropriate contributed module, because:

1. It basically takes (at least) twice the time an integrated statistics.module implementation would (it undoes what statistics.module does, after the fact)
2. It specifically undoes what the 4.5.1 statistics.module does, thus cannot be assumed to work on later versions without specific review of any statistics.module code changes
3. For crawler ignoring, it requires configuration of PHP's browscap configuration setting (disabled by default) and installlation in an appropriate place of browscap.ini, so is challenging to configure for non-technical users.

But, maybe some people will find it useful for now, and maybe it can help inspire incorporation of this functionality into the core statistics.module someday...

AttachmentSizeStatusTest resultOperations
statistics_filter.module3.94 KBIgnoredNoneNone

#13

tangent - February 26, 2005 - 17:33
Title:Module to do this» access log / referrer filter

Changing title back because the new one is not descriptive.

#14

mikeryan - February 26, 2005 - 21:24

I have now contributed the statistics_filter module. It can be used to filter out hits from the default administrative user, any specified user roles, and crawlers (as identified by browscap.ini). I've also added stats on user agents (also based on browscap.ini) - i.e., you can see that you've had 509 hits from IE 6.0, 2198 hits from Google, etc.

This functionality would better live in statistics.module (particularly the filtering, which works by undoing the logging already done by statistics.module, more-or-less doubling the overhead per page). This module can serve as a testbed for this functionality, as well as a temporary solution - once the functionality and implementation seem stable, I intend on submitting a patch to bring these features into the core statistics.module.

#15

Bèr Kessels - February 28, 2005 - 13:16

I do not like the implementation of this much.

PLease consider updating xstatistics.module to:
* extend satistics.module,
* *not* remove the data from the database, by default, but just make it not show the data from spieders etc. IMO its bad to first log something, and then remove it. Feels too hackish to me.
* and have it in a central place. One statistics.module for normal use in Drupal, and one advanced contrib module to do all the cool stats magic.

You can also think of a way to add a hook to statiscs.module, so that it allows you to modify the logs, before they are inserted. Then you can make it so, that the robots, adminsor even a configurable list of IPs are not logged.

I would like that more, for, again, your idea is very nice, but the UPDATE appears a bit hackish to me.

#16

forngren - August 7, 2006 - 17:51
Version:<none>» x.y.z

#17

magico - September 18, 2006 - 18:59

#18

lilou - August 13, 2008 - 20:06
Version:x.y.z» 7.x-dev

This feature request is it still valid ?

#19

sun - January 11, 2009 - 19:03

Core should provide the least minimum: query_alter via DBTNG, and (optionally) also allow modules to alter the result set/output additionally via drupal_alter().

#20

stBorchert - January 13, 2009 - 13:28
Status:active» needs review

Here is another approach: why not add a filter form to statistics pages (as on admin/user/user)?
Optionally this filter can be saved (not done yet) so it is not session-related.

Stefan

AttachmentSizeStatusTest resultOperations
696_access_log_filter_20_preview.png21.63 KBIgnoredNoneNone
696_access_log_filter_20.patch16.71 KBIdleFailed: 9253 passes, 3 fails, 3 exceptionsView details | Re-test

#21

System Message - January 22, 2009 - 15:25
Status:needs review» needs work

The last submitted patch failed testing.

#22

stBorchert - January 26, 2009 - 07:36
Status:needs work» needs review

Retest.

AttachmentSizeStatusTest resultOperations
696_access_log_filter_22.patch16.71 KBIdleFailed: 10546 passes, 11 fails, 2 exceptionsView details | Re-test

#23

Dave Reid - January 26, 2009 - 16:20

With a final solution, we'll definitely need tests for this.

#24

stBorchert - January 27, 2009 - 15:33

Ok, tests. Hm, I tried to write some but at some point it always fails. How do I take care of $_SESSION values?
This is my attempt:

  function testStatisticsAdmin() {
    // Create some log entries.
    // access log
    $access_text1 = $this->randomName();
    $access_text2 = $this->randomName();
    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, 'node/1', 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);
    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, 'node/2', 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);
   
    // Create admin user to filter the logs.
    $admin_user = $this->drupalCreateUser(array('access statistics'));
    $this->drupalLogin($admin_user);
    $this->drupalGet('admin/reports/hits');
    $this->assertText($access_text1, t("Found '%text' on recent hits page", array('%text' => $access_text1)));
    $this->assertText($access_text2, t("Found '%text' on recent hits page", array('%text' => $access_text2)));

    // Filter the access log by page.
    $edit = array();
    $edit['filter'] = 'page';
    $edit['page'] = $access_text1;
   
    $this->drupalPost('admin/reports/hits', $edit, t('Filter'));

    // Check if the correct log items show up.
    $this->assertNoText($access_text1, t("Found '%text' not on filtered by page on recent hits page", array('%text' => $access_text1)));
    $this->assertText($access_text2, t("Found '%text' on filtered by page on recent hits page", array('%text' => $access_text2)));
  }

Unfortunately (hm, maybe not) I found an issue in statistics.admin.inc with an unset session key in function statistics_filter_form() (will fix this with the next patch).

The test above fails on "$this->assertNoText($access_text1, ..." because the filter isn't saved in $_SESSION. Don't know why. Any suggestions on the test or the functionality of the patch?

 Stefan

#25

Dave Reid - January 27, 2009 - 15:50
Status:needs review» needs work

You might want to take a look at how core uses filters like this in lists. I'm pretty sure they do not use $_SESSION.

#26

stBorchert - January 27, 2009 - 20:30

Core uses $_SESSION for filters.
e.g.

function user_filter_form() {
  if (!isset($_SESSION['user_overview_filter'])) {
    drupal_set_session('user_overview_filter', array());
  }
  $session = &$_SESSION['user_overview_filter'];

(I've taken the code from user.module and modified to the needs of statistics.module.)
But I wonder why the test fails. I've done some more debugging: the session is set and statistics_build_filter_query() returns the correct results.
If checked the result list and the specific entry is no returned. But why does $this->assertNoText() for this tring returns FALSE.
I don't get it. :-{

Unfortunately I can't create patches with Option -urp (only -u) but here's a try with a small additional test.

 Stefan

AttachmentSizeStatusTest resultOperations
696_access_log_filter_26.patch20.3 KBIdleFailed: Invalid PHP syntax.View details | Re-test

#27

Dave Reid - January 27, 2009 - 21:31
Title:access log / referrer filter» Add filter to logs and referrer lists

Hmm...my mistake about not using $_SESSION. Have you tried -uRp (uppercase R)?

#28

stBorchert - January 28, 2009 - 07:45
Status:needs work» needs review

Yes. Unfortunately there is no way to tell eclipse or TortoiseCVS to use other options than "-u". And cygwin seems to have some problems on this computer (don't have my laptop at the moment...).
But I created the patch on my test server this morning so here it is. The new test fails at the last but one item, but I have no idea why.
The result list from db doesn't contain the string $this->assertNoText(...) is searching but it always returns false.

Would be great if you could have a look on it.

 Stefan

AttachmentSizeStatusTest resultOperations
696_access_log_filter_28.patch19.24 KBIdleFailed: 9364 passes, 1 fail, 0 exceptionsView details | Re-test

#29

System Message - January 28, 2009 - 08:20
Status:needs review» needs work

The last submitted patch failed testing.

#30

stBorchert - February 18, 2009 - 07:36

If someone could look at the tests and get it working... I didn't find the missing part (or the mistake).

Stefan

#31

stBorchert - March 22, 2009 - 22:22
Status:needs work» needs review
AttachmentSizeStatusTest resultOperations
696_access_log_filter_31.patch19.48 KBIdleFailed: Failed to apply patch.View details | Re-test

#32

System Message - April 2, 2009 - 15:45
Status:needs review» needs work

The last submitted patch failed testing.

#33

stBorchert - April 3, 2009 - 21:27
Status:needs work» needs review

Re-rolled patched against current HEAD.

AttachmentSizeStatusTest resultOperations
696_access_log_filter_33.patch19.44 KBIdleFailed: Failed to apply patch.View details | Re-test

#34

System Message - April 14, 2009 - 05:15
Status:needs review» needs work

The last submitted patch failed testing.

#35

stBorchert - April 18, 2009 - 11:11
Status:needs work» needs review

Re-rolled patch to work with PDO.

AttachmentSizeStatusTest resultOperations
696_access_log_filter_35.patch16.73 KBIdleFailed: Failed to apply patch.View details | Re-test

#36

dereine - April 25, 2009 - 11:37
Status:needs review» needs work
  1. Whitespace in tests, and on certain other places.

    for example

    <?php
    +class StatisticsAdminTestCase extends DrupalWebTestCase {
    +  function
    getInfo() {
    +    return array(
    +     
    'name' => t('Statistics administration'),
    +     
    'description' => t('Test admininstration page functionality.'),
    +     
    'group' => t('Statistics')
    +    );
    +  }
    //here
    ?>

  2. Strange codestyle :), with a single tab

    <?php
    +  return array('where' => $where,
    +              
    'join' => $join,
    +                
    'args' => $args,
    +               );
    ?>

  3. is there a reason why on:
    <?php
    +function theme_statistics_filters(&$form) {
    ?>

    theme('item_list') is not used ?
  4. One short question:
    what happens if +function statistics_filter_form($form_state, $type) { the type is empty?

    Does it redirect on the submit part to "admin/reports/"?

So code needs work, sry :)

#37

stBorchert - April 25, 2009 - 15:55

Whitespace in tests, and on certain other places.

Ah, thanks for the hint. Didn't see this.

Strange codestyle :), with a single tab

Strange, indeed. Anyway; I formatted the array in an other way now.

is there a reason why on theme_statistics_filters theme('item_list') is not used?

Yes. One the one hand it isn't really a list (the <li> is just a strange way to structure the form items) and on the other hand its exactly the way it is done in user.admin.inc (the place I copied the whole idea from) :-).

what happens if statistics_filter_form($form_state, $type) the type is empty?

You'll see an error message complaining about a missing required parameter if you try to open a page where the form is called without that param. The param is mandatory for the form to work because otherwise it doesn't know which filter options to display.

(btw: no need to apologize)

AttachmentSizeStatusTest resultOperations
696_access_log_filter_37.patch16.66 KBIdleFailed: Failed to apply patch.View details | Re-test

#38

stBorchert - April 25, 2009 - 15:56
Status:needs work» needs review

patch needs review of course

#39

Berdir - April 25, 2009 - 21:24
Status:needs review» needs work

+    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, $access_text1, 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);
+    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, $access_text2, 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);

You should use db_insert() for these, see: http://drupal.org/node/310079

+    $condition = db_and();
+    foreach ($filter['where'] as $key => $where) {
+      $condition = $condition->condition($where['field'], $filter['args'][$key], $where['operator']);
+    }
+    $query->condition($condition);

I'm not sure about this... can't we add the conditions directly to $query? it will use AND to add them, so that should imho work. So this is imho just unecessary complex

#40

stBorchert - April 25, 2009 - 22:39
Status:needs work» needs review

Updated patch with db_insert() and replaced $condition = ... with $query->condition($where['field'], $filter['args'][$key], $where['operator']);.
Didn't thought so but that worked and looks quite simple now.

AttachmentSizeStatusTest resultOperations
696_access_log_filter_40.patch16.71 KBIdleFailed: Failed to apply patch.View details | Re-test

#41

System Message - April 27, 2009 - 16:56
Status:needs review» needs work

The last submitted patch failed testing.

#42

dereine - April 27, 2009 - 18:15

reroled to changes the theme('pager'), cannot find the issue at the moment.

AttachmentSizeStatusTest resultOperations
696_access_log_filter_41.patch17.67 KBIdleFailed: 11248 passes, 5 fails, 0 exceptionsView details | Re-test

#43

dereine - April 27, 2009 - 18:16
Status:needs work» needs review

set to needs review

#44

System Message - May 13, 2009 - 07:55
Status:needs review» needs work

The last submitted patch failed testing.

#45

Roger Saner - May 14, 2009 - 23:04

dereine: bet you're mystified why your patch failed testing! It's because of a core bug: any form element called 'content' breaks drupal (and the statistics module has a few of those - it's the only code in core which does, which is maybe why this hasn't been picked up before). I've submitted a patch for that here - once it's committed your patch should pass testing.

#46

Dave Reid - January 21, 2010 - 19:29
Version:7.x-dev» 8.x-dev
 
 

Drupal is a registered trademark of Dries Buytaert.