Need to add clicktale to the list of allowed PHP files.

### 
### Allow some known php files (like serve.php in the ad module).
###
location ~* /(?:modules|libraries)/(?:contrib/)?(?:ad|clicktale|tinybrowser|f?ckeditor|tinymce|wysiwyg_spellcheck|ecc|civicrm|fbconnect|radioactivity)/.*\.php$ {
  tcp_nopush   off;
  keepalive_requests 0;
  access_log   off;
  if ($is_bot) {
    return 403;
  }
  try_files    $uri =404;
  fastcgi_pass 127.0.0.1:9000;
}

Comments

realityloop’s picture

Issue summary: View changes


realityloop’s picture

Title: Add clicktale to allowed php files » Add clicktale to allowed php files, and allow 'ClickTale bot' user agent to access server

Even after adding the allowed PHP exception, the ClickTale bot still gets a 403 Forbidden when trying to access the site.

Verified using the Firefox User Agent Switcher add-on.

omega8cc’s picture

So the problem is that, by default, we deny access for any known bots to any .php URLs, since those requests skip caching and are vulnerable to even unintended DoS attempts.

One possible workaround would be to add clicktale as another allowed exception in this location and then modify the service's user agent to exclude "bot", but I have no idea whether that is possible or configurable, as it is at Pingdom, for example. Could you confirm that? If not, things get more complicated, because we would need another map in the main Nginx config to whitelist this bot somehow. I don't like that approach, though, so I'm not sure what to do.
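If the whitelist-map route were taken anyway, a rough, untested sketch might look like the following. Note that the generic bot pattern here is a placeholder, not the pattern actually shipped in the stock config, and that regex entries in an nginx map are tested in order, so the ClickTale exemption must come first:

```nginx
### Hypothetical $is_bot map for the main Nginx config (sketch only).
### The ClickTale entry matches first and leaves the flag empty, so a
### later "if ($is_bot) { return 403; }" check never fires for it.
map $http_user_agent $is_bot {
  default              '';
  ~*ClickTale          '';        # whitelisted service bot
  ~*bot|crawl|spider   is_bot;    # placeholder generic bot matching
}
```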

Note however, that you could use custom location to override it completely: http://drupalcode.org/project/barracuda.git/blob/HEAD:/docs/HINTS.txt#l16

By using two-level locations matching you could easily stop requests to this file in your custom location, where you can simply use the copy of your modified location posted above, minus that if ($is_bot) {} check.
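As a rough sketch of that approach, assuming a custom vhost include loaded ahead of the stock location (per the HINTS.txt convention linked above), with the matching pattern trimmed from the location posted earlier to just the relevant module:

```nginx
### Custom override: same rules as the stock location, minus the
### if ($is_bot) check, so the ClickTale bot can reach the module's
### .php endpoints while all other locations keep denying bots.
location ~* /(?:modules|libraries)/(?:contrib/)?clicktale/.*\.php$ {
  tcp_nopush   off;
  keepalive_requests 0;
  access_log   off;
  try_files    $uri =404;
  fastcgi_pass 127.0.0.1:9000;
}
```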

realityloop’s picture

The bot needs to access all front-facing sites, so commenting out the deny only in that section doesn't work.

Commenting out the following section allows me to access the site with the user agent set to the ClickTale bot:

###
### Deny crawlers.
###
#if ($is_crawler) {
#  return 403;
#}

Note: this server isn't running HEAD, which has the override support.

omega8cc’s picture

Category: feature » bug
Status: Active » Needs work

Uh, that is because we use a probably too-generic "Click" pattern in the map below.

###
### Deny crawlers.
###
map $http_user_agent $is_crawler {
  default  '';
  ~*HTTrack|BrokenLinkCheck|2009042316.*Firefox.*3\.0\.10|MJ12|HTMLParser|libwww|PECL|Automatic|Click|SiteBot|BuzzTrack|Sistrix|Offline|Screaming|Nutch|Mireo|SWEB|Morfeus|GSLFbot  is_crawler;
}

This is too broad then.
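Since nginx map regexes are PCRE, one narrow fix (a sketch, untested) would be a negative lookahead, so generic "Click" spam bots are still caught while the ClickTale user agent is exempt:

```nginx
map $http_user_agent $is_crawler {
  default  '';
  ### Click(?!Tale) still denies generic "Click" spam bots,
  ### but no longer matches the ClickTale service user agent.
  ~*HTTrack|BrokenLinkCheck|2009042316.*Firefox.*3\.0\.10|MJ12|HTMLParser|libwww|PECL|Automatic|Click(?!Tale)|SiteBot|BuzzTrack|Sistrix|Offline|Screaming|Nutch|Mireo|SWEB|Morfeus|GSLFbot  is_crawler;
}
```

Because the map uses case-insensitive matching (`~*`), the lookahead exempts any casing of "clicktale" as well.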

omega8cc’s picture

Is it possible to modify/configure this bot's User-Agent identity? Note that "Click" is not part of any known real user agent string and is used mostly by various aggressive spam bots. Some lists of known UA identities: http://www.useragentstring.com/pages/useragentstring.php or http://www.user-agents.org

omega8cc’s picture

Category: bug » feature
Status: Needs work » Active
omega8cc’s picture

Component: Code » Nginx Server
Status: Active » Closed (duplicate)

Adding exceptions never ends. I guess we would need more customizable Nginx configuration in general. There are plans to address this on the Aegir level directly - see for reference: http://drupal.org/node/1635596#comment-6764146

omega8cc’s picture

Issue summary: View changes
