Need to add clicktale to the list of allowed PHP files.

### 
### Allow some known php files (like serve.php in the ad module).
###
location ~* /(?:modules|libraries)/(?:contrib/)?(?:ad|clicktale|tinybrowser|f?ckeditor|tinymce|wysiwyg_spellcheck|ecc|civicrm|fbconnect|radioactivity)/.*\.php$ {
  tcp_nopush   off;
  keepalive_requests 0;
  access_log   off;
  if ($is_bot) {
    return 403;
  }
  try_files    $uri =404;
  fastcgi_pass 127.0.0.1:9000;
}

Comments

realityloop’s picture

Issue summary: View changes


realityloop’s picture

Title: Add clicktale to allowed php files » Add clicktale to allowed php files, and allow 'ClickTale bot' user agent to access server

Even after adding the allowed PHP exception, the ClickTale bot still gets a 403 Forbidden when trying to access the site.

Verified using the Firefox User Agent Switcher add-on.

omega8cc’s picture

So the problem is that, by default, we deny access for any known bots to any .php URLs, since those requests skip caching and are vulnerable to even unintended DoS attempts.

One possible workaround would be to add clicktale as another allowed exception in this location and then modify the service's user agent to exclude "bot", but I have no idea whether that is possible or configurable, as it is at Pingdom, for example. Could you confirm that? If not, things get more complicated, because we would need another map in the main Nginx config to whitelist this bot somehow. I don't like that approach, though, so I'm not sure what to do.
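If the whitelist-map route were taken anyway, a rough, untested sketch might look like the following. Note that the generic bot pattern here is a placeholder, not the pattern actually shipped in the stock config, and that regex entries in an nginx map are tested in order, so the ClickTale exemption must come first:

```nginx
### Hypothetical $is_bot map for the main Nginx config (sketch only).
### The ClickTale entry matches first and leaves the flag empty, so a
### later "if ($is_bot) { return 403; }" check never fires for it.
map $http_user_agent $is_bot {
  default              '';
  ~*ClickTale          '';        # whitelisted service bot
  ~*bot|crawl|spider   is_bot;    # placeholder generic bot matching
}
```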

Note however, that you could use custom location to override it completely: http://drupalcode.org/project/barracuda.git/blob/HEAD:/docs/HINTS.txt#l16

By using two-level locations matching you could easily stop requests to this file in your custom location, where you can simply use the copy of your modified location posted above, minus that if ($is_bot) {} check.
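As a rough sketch of that approach, assuming a custom vhost include loaded ahead of the stock location (per the HINTS.txt convention linked above), with the matching pattern trimmed from the location posted earlier to just the relevant module:

```nginx
### Custom override: same rules as the stock location, minus the
### if ($is_bot) check, so the ClickTale bot can reach the module's
### .php endpoints while all other locations keep denying bots.
location ~* /(?:modules|libraries)/(?:contrib/)?clicktale/.*\.php$ {
  tcp_nopush   off;
  keepalive_requests 0;
  access_log   off;
  try_files    $uri =404;
  fastcgi_pass 127.0.0.1:9000;
}
```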

realityloop’s picture

The bot needs to access all front-facing sites, so commenting out the deny only in that section doesn't work.

Commenting out the following section allows me to access the site with the user agent set to the ClickTale bot:

###
### Deny crawlers.
###
#if ($is_crawler) {
#  return 403;
#}

Note: this server isn't running HEAD, which has the override support.

omega8cc’s picture

Category: feature » bug
Status: Active » Needs work

Uh, that is because we use a probably too-generic "Click" pattern in the map below.

###
### Deny crawlers.
###
map $http_user_agent $is_crawler {
  default  '';
  ~*HTTrack|BrokenLinkCheck|2009042316.*Firefox.*3\.0\.10|MJ12|HTMLParser|libwww|PECL|Automatic|Click|SiteBot|BuzzTrack|Sistrix|Offline|Screaming|Nutch|Mireo|SWEB|Morfeus|GSLFbot  is_crawler;
}

This is too broad then.
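Since nginx map regexes are PCRE, one narrow fix (a sketch, untested) would be a negative lookahead, so generic "Click" spam bots are still caught while the ClickTale user agent is exempt:

```nginx
map $http_user_agent $is_crawler {
  default  '';
  ### Click(?!Tale) still denies generic "Click" spam bots,
  ### but no longer matches the ClickTale service user agent.
  ~*HTTrack|BrokenLinkCheck|2009042316.*Firefox.*3\.0\.10|MJ12|HTMLParser|libwww|PECL|Automatic|Click(?!Tale)|SiteBot|BuzzTrack|Sistrix|Offline|Screaming|Nutch|Mireo|SWEB|Morfeus|GSLFbot  is_crawler;
}
```

Because the map uses case-insensitive matching (`~*`), the lookahead exempts any casing of "clicktale" as well.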

omega8cc’s picture

Is it possible to modify/configure this bot's User-Agent identity? Note that "Click" is not part of any known real user agent string and is used mostly by various aggressive spam bots. Some lists of known UA identities: http://www.useragentstring.com/pages/useragentstring.php or http://www.user-agents.org

omega8cc’s picture

Category: bug » feature
Status: Needs work » Active
omega8cc’s picture

Component: Code » Nginx Server
Status: Active » Closed (duplicate)

Adding exceptions never ends. I guess we would need more customizable Nginx configuration in general. There are plans to address this on the Aegir level directly - see for reference: http://drupal.org/node/1635596#comment-6764146

omega8cc’s picture

Issue summary: View changes
