I saw 50 guests online on my site. A look at the statistics shows several Googlebots (IPs: 64.86.80.X) visiting my site at once.

I'm proud of being indexed by Google, but we could filter out search engine bots and prevent them from being counted as visitors.

Comments

ax’s picture

dup of [Allow excluding certain hosts (myself,cron) from access logs | http://drupal.org/node/view/696]. please discuss this issue there. feel free to edit (the title) to better reflect the wanted feature.

ax’s picture

marking this ACTIVE again after feedback from jeremy (author of statistics.module):

For what it's worth, this is not a duplicate but a different feature
request.  The earlier request is regarding access logs.  This request is
regarding the Who's Online block, which no longer uses the access logs.
Adding a fix for the old request will not address this new one (and
wouldn't have even if the Who's online box still used the access logs).

Regardless, I still think the new request is a bad idea.
ax’s picture

another (earlier) opinion from jeremy (would be nice if you could reply via web to keep things together):

-1

Three reasons:

  1) It would be a major waste of time to try and maintain a search
engine blacklist.

  2) As far as Drupal is concerned, there really are 50 guests online at
that time...  That they're not human is a technicality, the fact of the
matter is that they are consuming site resources.

  3) You can configure the block to show active users over the past n
minutes.  Set n to something low like 5 minutes and you'll only see
activity for the past 5 minutes...  thereby minimizing any negative
effects you may perceive in the numbers reflected in the who's online box.
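
To illustrate reason 3, here is a minimal sketch of how a shorter window shrinks the guest count. It assumes the session data can be reduced to (uid, last access timestamp) pairs, with uid 0 meaning an anonymous visitor. The function name `count_online` and its parameters are hypothetical and are not taken from Drupal's code.

```python
import time


def count_online(sessions, window_minutes, now=None):
    """Count guests and registered users active within the last n minutes.

    sessions: iterable of (uid, last_access) pairs, last_access in seconds.
    uid 0 is treated as an anonymous guest (assumption for this sketch).
    """
    now = time.time() if now is None else now
    cutoff = now - window_minutes * 60
    # Every recent anonymous session counts as one guest.
    guests = sum(1 for uid, ts in sessions if uid == 0 and ts >= cutoff)
    # Registered users are counted once, however many sessions they hold.
    users = len({uid for uid, ts in sessions if uid != 0 and ts >= cutoff})
    return guests, users
```

With a 5 minute window, a crawler session that last touched the site 10 minutes ago no longer shows up as a guest, while a 15 minute window would still include it.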
ax’s picture

and another opinion from Chris Johnson, copied from the "Allow excluding certain hosts (myself,cron) from access logs" feature request:

Excluding certain hosts (cron, myself, search bots) from access logs is a slightly 
different problem than excluding them from the "who's online" listing. Although 
currently the "who's online" block is provided by the statistics module, it really 
does not need any of the logging or any other part of the statistics module to 
function. It works using the core required sessions table.

I suggest removing the "who's online" block from statistics.module, and then adding 
the capability to filter out a small list of hosts from the statistics access logs.

now who takes this? ;)

jeremy’s picture

Component: statistics.module » user.module

The "Who's online" block is no longer part of the statistics.module.

Reassigned to the 'user.module' which now generates the "Who's online" block.

ax’s picture

Title: filter searchengine-Bots in statistics / 'who is online' » filter searchengine-bots / cron.php / ... in 'who is online'
ceti’s picture

Version: » 4.6.1

Ah, so that's what that surge of users is. And I was beginning to worry that a full-time hack job was happening! This spidering ends up in the stats as well, so it does make it hard to tell what's going on with anonymous users. I guess there's no way of filtering this out by IP, but knowing the IPs that Google and other search engines crawl from would be nice. Does anyone have a list?

magico’s picture

Title: filter searchengine-bots / cron.php / ... in 'who is online' » Do not count searchengine-bots in the 'who is online' block
Version: 4.6.1 » x.y.z

I don't think that maintaining a list of IPs would be a solution.
Instead, if we could detect the "User-agent" when the session begins and mark the session as a "spider", we could then apply a filter.
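
A rough sketch of that idea, assuming each session can be stored as a small record with a flag set once at creation time. The marker list, the function names and the session layout are all hypothetical; none of this is taken from user.module, and a real marker list would need to be longer and kept up to date.

```python
# Hypothetical, deliberately short list of substrings that crawlers
# commonly put in their User-Agent header.
SPIDER_MARKERS = ("googlebot", "bingbot", "slurp", "spider", "crawler")


def is_spider(user_agent):
    """Return True if the User-Agent string looks like a search engine bot."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(marker in ua for marker in SPIDER_MARKERS)


def start_session(user_agent):
    """Create a session record, marking spiders once when the session begins."""
    return {"user_agent": user_agent, "spider": is_spider(user_agent)}


def count_guests(sessions):
    """Count only the sessions that were not marked as spiders."""
    return sum(1 for session in sessions if not session["spider"])
```

Checking the header once at session start keeps the filter cheap: the block only has to skip flagged sessions when counting. The header is easy to fake, though, so this only filters out bots that identify themselves.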

RobRoy’s picture

Version: x.y.z » 6.x-dev
Status: Active » Closed (duplicate)