Filtering Read Count stats by Agent?

garym@teledyn.com - January 9, 2005 - 02:09
Project: Drupal
Version: 7.x-dev
Component: other
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active
Description

I've noticed for some time that my sidebar Popular Content is largely noise, and after sifting through my referrer log it's easy to see why: these aren't nodes being read, they are primarily nodes being indexed (or referrer-spammed).

There was some talk (see access log / referrer filter) of allowing us to filter our own IPs from the logs. That is still useful, since it would exclude not only self-references but also testing from the dev sites, but this issue is different. I already exclude referrer spammers directly with "deny from" rules in the Apache .htaccess; what I want here is a means to keep spiders and other bots from polluting the Popular Content counts. If I want hard hit counts, I can still refer to my webserver logs, so it's no great loss if this also means losing all Stats-log page counts from spiders and bots. Removing those referrers would greatly improve the meaning of the Today's and especially the Last Viewed sidebar.

One potential problem: this may be a fairly large list of exemptions, as almost every web crawler has its own unique Agent string. The filter would need to be one or more regexes, since a list of exact-string exemptions is probably impractical. Since most non-MSIE browsers identify themselves as "Mozilla compatible" (or something like that), it may also be easier to specify a positive matching regex than to attempt any meaningful exclusion rule.
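To make the idea concrete, here is a rough sketch in Python of how such a filter might behave. It is not Drupal's statistics code. The patterns and the names `BOT_PATTERN`, `BROWSER_PATTERN`, `is_countable` and `count_reads` are all made up for illustration, and a real list of crawler Agent strings would be much longer. It combines both approaches, because some crawlers (Googlebot, for one) also send an Agent string that starts with "Mozilla/", so a positive match on its own would still let them through.

```python
import re

# Hypothetical exclusion pattern: substrings commonly seen in crawler
# Agent strings. An assumption for illustration, not a complete list.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

# Hypothetical positive pattern: agents that announce themselves the way
# ordinary browsers do.
BROWSER_PATTERN = re.compile(r"^(Mozilla|Opera)/")


def is_countable(agent):
    """Return True if a hit with this Agent string should count as a read."""
    if not agent:
        # No Agent header at all: treat as a non-reader.
        return False
    if BOT_PATTERN.search(agent):
        # Looks like a crawler, even if it also claims to be Mozilla.
        return False
    return bool(BROWSER_PATTERN.search(agent))


def count_reads(hits):
    """Tally reads per node from (node_id, agent) pairs, skipping bots."""
    counts = {}
    for node_id, agent in hits:
        if is_countable(agent):
            counts[node_id] = counts.get(node_id, 0) + 1
    return counts


if __name__ == "__main__":
    sample_hits = [
        (1, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
        (1, "Googlebot/2.1 (+http://www.google.com/bot.html)"),
        (2, "Mozilla/5.0 (compatible; Yahoo! Slurp)"),
    ]
    print(count_reads(sample_hits))
```

In this sketch only the MSIE hit on node 1 is counted. The webserver logs would still hold the full hit counts, so nothing is lost by dropping the bot hits from the Popular Content figures.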

#1

forngren - August 7, 2006 - 18:40
Version: 4.5.0 » 4.7.3

#2

magico - August 30, 2006 - 15:05
Version: 4.7.3 » x.y.z

#3

LAsan - April 7, 2008 - 09:04
Version: x.y.z » 7.x-dev

Feature requests go to CVS.

 
 
