Problem/Motivation

Add basic host and referrer filtering to statistics.module to allow site admins to better monitor site traffic without the noise created by their own activity.

Comment #14 by @MikeRyan
After contributing a statistics_filter module he adds "This functionality would better live in statistics.module (particularly the filtering, which works by undoing the logging already done by statistics.module, more-or-less doubling the overhead per page)."

Comment #19 by @Sun
After a query about whether the issue is still valid he says "Core should provide the least minimum: query_alter via DBTNG, and (optionally) also allow modules to alter the result set/output additionally via drupal_alter()."

Comment #40 by @StBorchert
Submits a patch which is then rerolled by Dereine in #42 but still fails testing. According to @Roger Saner in #45 this is due to this bug #463064: 'content' cannot be used as a $form element and breaks any page which uses it, such as admin/settings/statistics the patch for which @Moshe rejects, suggesting the problem is fixed more elegantly in #428744: Make the main page content a real block.

In Comment #50 @kattekrab posted the following summary:

Isn't statistics.module a candidate for removal from D8 ?
Should this be marked closed won't fix?
Summary:

  • 2002 - Issue created
  • 2005 - Contrib module suggested as solution #12
  • 2009 - Patch submitted, consistently fails tests #20
  • 2009 - Claim made patch failed testing due to a core bug in #45
  • 2010 - Moved to D8-dev in Jan 2010 #46

To which @Sun replied in Comment #51
"As long as it's in core, it's in core. No need to preemptively block issues on a hypothetical removal."

Proposed resolution

Debate continues about the "best" solution.

  1. No solution required. Solved by contributed modules.
  2. statistics.module should be improved, simplified, made more flexible
  3. statistics.module should be removed entirely

Remaining tasks

  1. Reroll/Rewrite submitted patch for GIT/DBTNG/D8 - 696_access_log_filter_41.patch
  2. Resolve META debate about removing or improving statistics.module in Drupal8
  3. Review existing contrib modules

Remove Statistics module from D8 core
#1018716: Remove Statistics module from D8 core

Improve Statistics module for D8?
#1209532: Count node views via AJAX in the statistics module

Possible duplicates and related issues

#255769: statistics module needs basic log rules
Contains a patch that allows users to exclude certain traffic from statistics.

  1. system administrator (UID1)
  2. site administrators
  3. traffic that originates from the same IP as the site itself
  4. users
  5. IP addresses
  6. user agents

#345133: 403s and 404s are counted in {node_counter}
Contains a patch to handle 403 & 404 errors, but needs to be rerolled and requires tests. Got bumped to D8 in 5Feb2011

#90468: Only record unique hits in node counter stats
Contains a patch, but needs work.

#1277710: Radioactivity for Drupal 8 core.
Rather than improve or rewrite statistics.module - look at replacing it with radioactivity.

Related contrib modules

Statistics Filtering
http://drupal.org/project/statistics_filter

Statistics Advanced
http://drupal.org/project/statistics_advanced

Statistics Pro
http://drupal.org/project/statspro

Original report by ax

it would be nice if statistics.module would have an option to exclude certain hosts from the access logs. because right now, the log of my personal blog gets filled mainly with my own ip and the ip of the machine thats calling my cron.php :( would be handy to exclude bots and CodeRed'ded web servers as well.

same (allowing excluding certain urls) would be nice for "view referrers".

Files: 
CommentFileSizeAuthor
#63 vdc-dblog-view-696-63.patch16.61 KBPsikik
#61 vdc-dblog-view-696-61.patch0 bytesPsikik
#42 696_access_log_filter_41.patch17.67 KBdawehner
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 696_access_log_filter_41.patch.
[ View ]
#40 696_access_log_filter_40.patch16.71 KBstBorchert
Failed: Failed to apply patch.
[ View ]
#37 696_access_log_filter_37.patch16.66 KBstBorchert
Failed: Failed to apply patch.
[ View ]
#35 696_access_log_filter_35.patch16.73 KBstBorchert
Failed: Failed to apply patch.
[ View ]
#33 696_access_log_filter_33.patch19.44 KBstBorchert
Failed: Failed to apply patch.
[ View ]
#31 696_access_log_filter_31.patch19.48 KBstBorchert
Failed: Failed to apply patch.
[ View ]
#28 696_access_log_filter_28.patch19.24 KBstBorchert
Failed: 9364 passes, 1 fail, 0 exceptions
[ View ]
#26 696_access_log_filter_26.patch20.3 KBstBorchert
Failed: Invalid PHP syntax.
[ View ]
#22 696_access_log_filter_22.patch16.71 KBstBorchert
Failed: 10546 passes, 11 fails, 2 exceptions
[ View ]
#20 696_access_log_filter_20_preview.png21.63 KBstBorchert
#20 696_access_log_filter_20.patch16.71 KBstBorchert
Failed: 9253 passes, 3 fails, 3 exceptions
[ View ]
#12 statistics_filter.module3.94 KBmikeryan
#9 statistics_exclude_hosts.patch5.14 KBBèr Kessels
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch statistics_exclude_hosts.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Comments

i also would like to stress that this holds true for the reffer logging, i would like to be able to exclude my own host (has multiple fqdn's) from the refferre logging

Priority:Major» Normal

first implementation (by jeremy): http://lists.drupal.org/pipermail/drupal-devel/2002-November/020114.html (I added a new configuration option: "Filter current host". If this option is enabled, your current IP will be filtered when viewing the access log.)

second implementation (by me): http://lists.drupal.org/pipermail/drupal-devel/2003-January/020956.html (a patch that adds a "Filter a list of IPs (regular expressions) from being shown in the access log" feature to jeremys patch.)

now this just needs to go into cvs ...

I was too impatient to wait on someone else to fix this so I've updated these patches mentioned above and created a diff for the current cvs. However I don't know how to submit it, so here is a link to it if someone wants to include it....

http://www.runlevelzero.com/statistics.diff

It works great, although an extension to it could perhaps be to also filter the top posts, but I haven't looked into if it's possible or not yet.

Title:allow excluding certain hosts (myself, cron) from access logs[PATCH] Re: [drupal-devel] feature #696 : allow excluding certain hosts (myself,cron) from access logs

On Wednesday, June 11, 2003 8:09 PM [GMT+1=CET],
rikard wrote:

> I was too impatient to wait on someone else to fix this so I've
> updated these patches mentioned above and created a diff for the
> current cvs. However I don't know how to submit it, so here is a link
> to it if someone wants to include it....

just send your patches to the devel mailing list (see
http://drupal.org/title/creating+and+sending+your+patches ). i am doing
this for you now ...

... because i really hope this patch finally makes it into cvs. there
seems to be quite some demand for this feature - there have even been 2
more feature requests (i had to mark DIPLICATE) ...

tested twice.

> It works great, although an extension to it could perhaps be to also
> filter the top posts, but I haven't looked into if it's possible or
> not yet.

feel free to do another patch ;)

Title:[PATCH] Re: [drupal-devel] feature #696 : allow excluding certain hosts (myself,cron) from access logsallow excluding certain hosts (myself,cron) from access logs

Excluding certain hosts (cron, myself, search bots) from access logs is a slightly different problem than excluding them from the "who's online" listing. Although currently the "who's online" block is provided by the statistics module, it really does not need any of the logging or any other part of the statistics module to function. It works using the core required sessions table.

I suggest removing the "who's online" block from statistics.module, and then adding the capability to filter out a small list of hosts from the statistics access logs.

Title:allow excluding certain hosts (myself,cron) from access logsaccess log / referrer filter

changed title to reflect better what this feature is about

It looks like this feature hibernated. I'd like to wake it up.

@ rikard (cool homepage btw!), the patch you provide on your site (
http://www.runlevelzero.com/statistics.diff) cannot be found anymore, it seems.

Would you be so kind to ditch it up and re-send it? You can now attach it to the comment in this thread.

And for the rest: What approach is best:

1) Do not log the list of hosts. (filter before insert)
2) Before presentation. (Filter before output)

1) has the downfall that the content is not available at all.
2) 's downside is that you are saving content that you don't want to see anyway.

Regards, Bèr

StatusFileSize
new5.14 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch statistics_exclude_hosts.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

I went for 1) because it's the easiest to implement :).

The patch was made for 4.3.1 I'll see if i find some time to port it to 4.4 (CVS, that is). If there's someone out there with no todo list, please feel free to take this.

Note that i'm still not very happy about this. It creates a lot of extra overhead. So i might still get to it, and mock up a patch that does some filtering before output. After all: now each pagehit requires some logic to be run. And in the case of 2)this logick is only required when you view the stats. This means far less overhead

But then again: the node_counter will have to have this too, so after all is said and done: I dunno.

woops *blush*

please only use 'patch' status when you have a patch against HEAD Drupal that is ready for CVS review team ... the submitted patch meets neither of these requirements.

Title:access log / referrer filterModule to do this
StatusFileSize
new3.94 KB

Here's a module that lets you:

1. Ignore hits from the builtin administrative user (uid == 1)
2. Ignore hits from users in any roles you choose
3. Ignore hits from crawlers

I don't think this would make an appropriate contributed module, because:

1. It basically takes (at least) twice the time an integrated statistics.module implementation would (it undoes what statistics.module does, after the fact)
2. It specifically undoes what the 4.5.1 statistics.module does, thus cannot be assumed to work on later versions without specific review of any statistics.module code changes
3. For crawler ignoring, it requires configuration of PHP's browscap configuration setting (disabled by default) and installlation in an appropriate place of browscap.ini, so is challenging to configure for non-technical users.

But, maybe some people will find it useful for now, and maybe it can help inspire incorporation of this functionality into the core statistics.module someday...

Title:Module to do thisaccess log / referrer filter

Changing title back because the new one is not descriptive.

I have now contributed the statistics_filter module. It can be used to filter out hits from the default administrative user, any specified user roles, and crawlers (as identified by browscap.ini). I've also added stats on user agents (also based on browscap.ini) - i.e., you can see that you've had 509 hits from IE 6.0, 2198 hits from Google, etc.

This functionality would better live in statistics.module (particularly the filtering, which works by undoing the logging already done by statistics.module, more-or-less doubling the overhead per page). This module can serve as a testbed for this functionality, as well as a temporary solution - once the functionality and implementation seem stable, I intend on submitting a patch to bring these features into the core statistics.module.

I do not like the implementation of this much.

PLease consider updating xstatistics.module to:
* extend satistics.module,
* *not* remove the data from the database, by default, but just make it not show the data from spieders etc. IMO its bad to first log something, and then remove it. Feels too hackish to me.
* and have it in a central place. One statistics.module for normal use in Drupal, and one advanced contrib module to do all the cool stats magic.

You can also think of a way to add a hook to statiscs.module, so that it allows you to modify the logs, before they are inserted. Then you can make it so, that the robots, adminsor even a configurable list of IPs are not logged.

I would like that more, for, again, your idea is very nice, but the UPDATE appears a bit hackish to me.

Version:» x.y.z

Version:x.y.z» 7.x-dev

This feature request is it still valid ?

Core should provide the least minimum: query_alter via DBTNG, and (optionally) also allow modules to alter the result set/output additionally via drupal_alter().

Status:Active» Needs review
StatusFileSize
new16.71 KB
Failed: 9253 passes, 3 fails, 3 exceptions
[ View ]
new21.63 KB

Here is another approach: why not add a filter form to statistics pages (as on admin/user/user)?
Optionally this filter can be saved (not done yet) so it is not session-related.

Stefan

Status:Needs review» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review
StatusFileSize
new16.71 KB
Failed: 10546 passes, 11 fails, 2 exceptions
[ View ]

Retest.

Issue tags:+Needs tests

With a final solution, we'll definitely need tests for this.

Ok, tests. Hm, I tried to write some but at some point it always fails. How do I take care of $_SESSION values?
This is my attempt:

  function testStatisticsAdmin() {
    // Create some log entries.
    // access log
    $access_text1 = $this->randomName();
    $access_text2 = $this->randomName();
    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, 'node/1', 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);
    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, 'node/2', 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);
    // Create admin user to filter the logs.
    $admin_user = $this->drupalCreateUser(array('access statistics'));
    $this->drupalLogin($admin_user);
    $this->drupalGet('admin/reports/hits');
    $this->assertText($access_text1, t("Found '%text' on recent hits page", array('%text' => $access_text1)));
    $this->assertText($access_text2, t("Found '%text' on recent hits page", array('%text' => $access_text2)));
    // Filter the access log by page.
    $edit = array();
    $edit['filter'] = 'page';
    $edit['page'] = $access_text1;
    $this->drupalPost('admin/reports/hits', $edit, t('Filter'));
    // Check if the correct log items show up.
    $this->assertNoText($access_text1, t("Found '%text' not on filtered by page on recent hits page", array('%text' => $access_text1)));
    $this->assertText($access_text2, t("Found '%text' on filtered by page on recent hits page", array('%text' => $access_text2)));
  }

Unfortunately (hm, maybe not) I found an issue in statistics.admin.inc with an unset session key in function statistics_filter_form() (will fix this with the next patch).

The test above fails on "$this->assertNoText($access_text1, ..." because the filter isn't saved in $_SESSION. Don't know why. Any suggestions on the test or the functionality of the patch?

 Stefan

Status:Needs review» Needs work

You might want to take a look at how core uses filters like this in lists. I'm pretty sure they do not use $_SESSION.

StatusFileSize
new20.3 KB
Failed: Invalid PHP syntax.
[ View ]

Core uses $_SESSION for filters.
e.g.

function user_filter_form() {
  if (!isset($_SESSION['user_overview_filter'])) {
    drupal_set_session('user_overview_filter', array());
  }
  $session = &$_SESSION['user_overview_filter'];

(I've taken the code from user.module and modified to the needs of statistics.module.)
But I wonder why the test fails. I've done some more debugging: the session is set and statistics_build_filter_query() returns the correct results.
If checked the result list and the specific entry is no returned. But why does $this->assertNoText() for this tring returns FALSE.
I don't get it. :-{

Unfortunately I can't create patches with Option -urp (only -u) but here's a try with a small additional test.

 Stefan

Title:access log / referrer filterAdd filter to logs and referrer lists

Hmm...my mistake about not using $_SESSION. Have you tried -uRp (uppercase R)?

Status:Needs work» Needs review
StatusFileSize
new19.24 KB
Failed: 9364 passes, 1 fail, 0 exceptions
[ View ]

Yes. Unfortunately there is no way to tell eclipse or TortoiseCVS to use other options than "-u". And cygwin seems to have some problems on this computer (don't have my laptop at the moment...).
But I created the patch on my test server this morning so here it is. The new test fails at the last but one item, but I have no idea why.
The result list from db doesn't contain the string $this->assertNoText(...) is searching but it always returns false.

Would be great if you could have a look on it.

 Stefan

Status:Needs review» Needs work

The last submitted patch failed testing.

If someone could look at the tests and get it working... I didn't find the missing part (or the mistake).

Stefan

Status:Needs work» Needs review
StatusFileSize
new19.48 KB
Failed: Failed to apply patch.
[ View ]

Status:Needs review» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review
StatusFileSize
new19.44 KB
Failed: Failed to apply patch.
[ View ]

Re-rolled patched against current HEAD.

Status:Needs review» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review
StatusFileSize
new16.73 KB
Failed: Failed to apply patch.
[ View ]

Re-rolled patch to work with PDO.

Status:Needs review» Needs work
  1. Whitespace in tests, and on certain other places.

    for example

    <?php
    +class StatisticsAdminTestCase extends DrupalWebTestCase {
    +  function
    getInfo() {
    +    return array(
    +     
    'name' => t('Statistics administration'),
    +     
    'description' => t('Test admininstration page functionality.'),
    +     
    'group' => t('Statistics')
    +    );
    +  }
    //here
    ?>
  2. Strange codestyle :), with a single tab

    <?php
    +  return array('where' => $where,
    +              
    'join' => $join,
    +                
    'args' => $args,
    +               );
    ?>
  3. is there a reason why on:
    <?php
    +function theme_statistics_filters(&$form) {
    ?>

    theme('item_list') is not used ?
  4. One short question:
    what happens if +function statistics_filter_form($form_state, $type) { the type is empty?

    Does it redirect on the submit part to "admin/reports/"?

So code needs work, sry :)

StatusFileSize
new16.66 KB
Failed: Failed to apply patch.
[ View ]

Whitespace in tests, and on certain other places.

Ah, thanks for the hint. Didn't see this.

Strange codestyle :), with a single tab

Strange, indeed. Anyway; I formatted the array in an other way now.

is there a reason why on theme_statistics_filters theme('item_list') is not used?

Yes. One the one hand it isn't really a list (the <li> is just a strange way to structure the form items) and on the other hand its exactly the way it is done in user.admin.inc (the place I copied the whole idea from) :-).

what happens if statistics_filter_form($form_state, $type) the type is empty?

You'll see an error message complaining about a missing required parameter if you try to open a page where the form is called without that param. The param is mandatory for the form to work because otherwise it doesn't know which filter options to display.

(btw: no need to apologize)

Status:Needs work» Needs review

patch needs review of course

Status:Needs review» Needs work

+    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text1, $access_text1, 'http://example.com', '192.168.1.1', '0', '10', '10', REQUEST_TIME);
+    db_query("INSERT INTO {accesslog} (title, path, url, hostname, uid, sid, timer, timestamp) values('%s', '%s', '%s', '%s', %d, '%s', %d, %d)", $access_text2, $access_text2, 'http://sub.example.com', '192.168.1.2', '0', '10', '10', REQUEST_TIME);

You should use db_insert() for these, see: http://drupal.org/node/310079

+    $condition = db_and();
+    foreach ($filter['where'] as $key => $where) {
+      $condition = $condition->condition($where['field'], $filter['args'][$key], $where['operator']);
+    }
+    $query->condition($condition);

I'm not sure about this... can't we add the conditions directly to $query? it will use AND to add them, so that should imho work. So this is imho just unecessary complex

Status:Needs work» Needs review
StatusFileSize
new16.71 KB
Failed: Failed to apply patch.
[ View ]

Updated patch with db_insert() and replaced $condition = ... with $query->condition($where['field'], $filter['args'][$key], $where['operator']);.
Didn't thought so but that worked and looks quite simple now.

Status:Needs review» Needs work

The last submitted patch failed testing.

StatusFileSize
new17.67 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 696_access_log_filter_41.patch.
[ View ]

reroled to changes the theme('pager'), cannot find the issue at the moment.

Status:Needs work» Needs review

set to needs review

Status:Needs review» Needs work

The last submitted patch failed testing.

dereine: bet you're mystified why your patch failed testing! It's because of a core bug: any form element called 'content' breaks drupal (and the statistics module has a few of those - it's the only code in core which does, which is maybe why this hasn't been picked up before). I've submitted a patch for that here - once it's committed your patch should pass testing.

Version:7.x-dev» 8.x-dev

Status:Needs work» Needs review

#42: 696_access_log_filter_41.patch queued for re-testing.

subscribe

Status:Needs review» Needs work

The last submitted patch, 696_access_log_filter_41.patch, failed testing.

Status:Needs work» Postponed (maintainer needs more info)

Isn't statistics.module a candidate for removal from D8 ?
Should this be marked closed won't fix?

Summary:

  • 2002 - Issue created
  • 2005 - Contrib module suggested as solution #12
  • 2009 - Patch submitted, consistently fails tests #20
  • 2009 - Claim made patch failed testing due to a core bug in #45
  • 2010 - Moved to D8-dev in Jan 2010 #46

Status:Postponed (maintainer needs more info)» Active

As long as it's in core, it's in core. No need to preemptively block issues on a hypothetical removal.

So is anyone still around to re-roll this patch?

The core bug blocker mentioned in #45 has been marked closed (duplicate) of #428744: Make the main page content a real block which is closed and fixed. Yay!

Or is the contrib solution mentioned in #12 the way to go here?

This old old old feature request from 2002 relates to the following active discussions:

#1018716: Remove Statistics module from D8 core

#1209532: Count node views via AJAX in the statistics module

#1255674: [meta] Make core maintainable

This is node 696!!! Can we deal with this in one way or another?

I posted a summary back at #50

Status:Active» Needs review

Issue summary:View changes

Posted an Issue Summary

Issue summary:View changes

added user ID url to [ax]

Issue summary:View changes

Updated issue summary.

Issue summary:View changes

Updated issue summary.

Issue summary:View changes

Updated issue summary.

Issue summary:View changes

Updated issue summary.

Status:Needs review» Needs work

Issue Summary has been updated - Patch at #42 needs to re-rolled for GIT / DBNTG / D8

Issue summary:View changes

Updated issue summary.

Issue summary:View changes

Adding possible dupes and related issues, plus links to patches in those issues.

Issue summary:View changes

Added emphasis.

Issue summary:View changes

Updated issue summary. - fixed typos

Issue summary:View changes

Updated issue summary. Added another related issue 1277710

Suppose this could be postponed to wait VDC to land

Status:Needs work» Postponed
Issue tags:+VDC

Status:Postponed» Active

it's in now. feature request and all, D9?

Issue summary:View changes

m

Assigned:Unassigned» Psikik

Grabbing this to convert in tandem with #1874534: Introduce a dblog / watchdog views integration

StatusFileSize
new0 bytes

Here's a first stab at the new view. I'll continue working on this.

Looks like the patch in #61 is empty -- @Psikik, can you re-roll the patch?

StatusFileSize
new16.61 KB

Here's the patch with the view. You'll need to apply the patch from 17 in #1874534: Introduce a dblog / watchdog views integration also.

Assigned:Psikik» Unassigned

Not sure the patch in #63, which focuses on adding Views integration to dblog, matches the issue description, which is about adding filtering capabilities to accesslog, which no longer exists: #1446956: Remove the accesslog from statistics

As such, would love to close this out or move back to 7.x.

@xjm et al., any qualms?

Agreed this needs to be done separately from #1874534: Introduce a dblog / watchdog views integration. This issue doesn't even apply to D8 since the accesslog is no more.

Version:8.x-dev» 7.x-dev
Issue tags:-VDC

Oh. Damn. I totally misunderstood this issue, thought it was about dblog not accesslog. My apologies @Psikik, let's just combine this into the other issue.

Okay, I'll merge the view into the dblog issue. Sorry for the confusion guys.

No problem, thanks for helping with #1874534: Introduce a dblog / watchdog views integration! It's greatly appreciated!

Sooo... no longer needed for D8 and moved back to D7...

Is it likely this will happen for D7 or is it now time, to close out the oldest feature request in the queue?

Is the approach back at #42 even still valid?

Probably needs an issue summary update now too.

Issue summary:View changes

add link to meta

Issue summary:View changes
Status:Active» Closed (won't fix)

Yeah, this is best addressed with contrib in D7.