Critical: robots.txt on a Multilingual Drupal CMS!
This really needs fixing and attention! The robots.txt file that ships with Drupal shields certain paths, such as /admin, from search engines. BUT if you have installed a multilingual setup in your Drupal CMS, all paths start with a language prefix such as /en, /es or /fr! This lets robots enter those forbidden paths, and even directories such as files, since on a multilingual site the files path also becomes /fr/files/images/ etc.
The developers of the translation module should make it adapt the robots.txt file automatically... but for one of my sites it is already too late :( All these paths are now indexed by Google. Grrr!
How to fix it manually:
Copy every section except the files section (the one with cron.php etc.; domain/en/cron.php neither works nor exists, so there is nothing to block there) and duplicate it for each added language.
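The node entry is written with NO trailing "/" because the aim is the /ca/node listing page rather than individual pages such as /ca/node/2. Be aware, though, that robots.txt rules match by prefix, so "Disallow: /ca/node" also blocks /ca/node/2. To block only the listing, crawlers that support the "$" end anchor (Googlebot does) can be given "Disallow: /ca/node$" instead.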
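Under the original robots.txt convention a Disallow value is compared as a prefix of the URL path, so whether the node line ends in "/" matters. Python's standard-library `urllib.robotparser` implements that plain prefix matching, which makes it a quick way to compare the two variants. This is a minimal sketch; `example.com` is a placeholder domain and `/ca/nodes-list` is a hypothetical path.

```python
from urllib.robotparser import RobotFileParser

# Two alternative rules for the Catalan prefix: without and with a trailing slash.
RULE_VARIANTS = {
    "no slash": ["User-agent: *", "Disallow: /ca/node"],
    "slash": ["User-agent: *", "Disallow: /ca/node/"],
}

# Sample paths; /ca/nodes-list is a hypothetical page alias.
PATHS = ["/ca/node", "/ca/node/2", "/ca/nodes-list", "/ca/about"]


def check(rules):
    """Return {path: allowed?} for a generic crawler under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules)
    return {p: parser.can_fetch("*", "http://example.com" + p) for p in PATHS}


if __name__ == "__main__":
    for name, rules in RULE_VARIANTS.items():
        print(name, check(rules))
```

Without the slash, the rule blocks /ca/node, /ca/node/2 and even /ca/nodes-list; with the slash, only /ca/node/2 is blocked among these samples.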
CA is for Catalan. Cover /ca both with clean URLs and, for unclean URLs, as /?q=ca/ instead:
Disallow: /ca/database/
Disallow: /ca/includes/
Disallow: /ca/misc/
Disallow: /ca/modules/
Disallow: /ca/sites/
Disallow: /ca/themes/
Disallow: /ca/scripts/
Disallow: /ca/updates/
Disallow: /ca/profiles/
Disallow: /ca/files/
Disallow: /ca/node
# Paths (clean URLs) CA
Disallow: /ca/admin/
Disallow: /ca/aggregator/
Disallow: /ca/comment/reply/
Disallow: /ca/contact/
Disallow: /ca/logout/
Disallow: /ca/node/add/
Disallow: /ca/search/
Disallow: /ca/user/register/
Disallow: /ca/user/password/
Disallow: /ca/user/login/
# Paths (no clean URLs)
Disallow: /?q=ca/admin/
Disallow: /?q=ca/aggregator/
Disallow: /?q=ca/comment/reply/
Disallow: /?q=ca/contact/
Disallow: /?q=ca/logout/
Disallow: /?q=ca/node/add/
Disallow: /?q=ca/search/
Disallow: /?q=ca/user/password/
Disallow: /?q=ca/user/register/
Disallow: /?q=ca/user/login/
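Instead of copying the sections by hand for every language, they can be generated from a single list of paths. This is a minimal Python sketch that assumes the layout of the CA block (directories and clean-URL paths as /xx/..., unclean URLs as /?q=xx/...); the helper name `language_rules` and the language list are hypothetical, so adjust both to your site.

```python
# Paths taken from the CA block (the files section with cron.php is skipped).
DIRECTORIES = ["database", "includes", "misc", "modules", "sites", "themes",
               "scripts", "updates", "profiles", "files"]
CLEAN_PATHS = ["admin/", "aggregator/", "comment/reply/", "contact/",
               "logout/", "node/add/", "search/", "user/register/",
               "user/password/", "user/login/"]


def language_rules(code):
    """Build the Disallow lines for one language prefix, e.g. 'ca'."""
    lines = [f"Disallow: /{code}/{d}/" for d in DIRECTORIES]
    # Bare node path without a trailing slash, as in the CA example.
    lines.append(f"Disallow: /{code}/node")
    lines.append(f"# Paths (clean URLs) {code.upper()}")
    lines += [f"Disallow: /{code}/{p}" for p in CLEAN_PATHS]
    lines.append("# Paths (no clean URLs)")
    # With unclean URLs the language prefix sits inside the q= parameter.
    lines += [f"Disallow: /?q={code}/{p}" for p in CLEAN_PATHS]
    return lines


if __name__ == "__main__":
    # Hypothetical set of enabled languages; append the output to robots.txt.
    for code in ["ca", "es", "fr"]:
        print("\n".join(language_rules(code)))
```

Each call returns the same 31 rules as the hand-written CA block with the prefix swapped in, so adding a language means adding one code to the list.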

This is how I wrote
This is how I wrote robots.txt for a multi-site installation with a two-language site:
User-agent: *
Crawl-delay: 10
# ############ DRUPAL
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /ehosting.php
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
# Radut
Disallow: /node/
Disallow: /book/export/
# ############ ENGLISH
# Directories
Disallow: /en/database/
Disallow: /en/includes/
Disallow: /en/misc/
Disallow: /en/modules/
Disallow: /en/sites/
Disallow: /en/themes/
Disallow: /en/scripts/
Disallow: /en/updates/
Disallow: /en/profiles/
# Files
Disallow: /en/xmlrpc.php
Disallow: /en/cron.php
Disallow: /en/update.php
Disallow: /en/ehosting.php
# Paths (clean URLs)
Disallow: /en/admin/
Disallow: /en/aggregator/
Disallow: /en/comment/reply/
Disallow: /en/contact/
Disallow: /en/logout/
Disallow: /en/node/add/
Disallow: /en/search/
Disallow: /en/user/register/
Disallow: /en/user/password/
Disallow: /en/user/login/
# Paths (no clean URLs)
Disallow: /en/?q=admin/
Disallow: /en/?q=aggregator/
Disallow: /en/?q=comment/reply/
Disallow: /en/?q=contact/
Disallow: /en/?q=logout/
Disallow: /en/?q=node/add/
Disallow: /en/?q=search/
Disallow: /en/?q=user/password/
Disallow: /en/?q=user/register/
Disallow: /en/?q=user/login/
# Radut
Disallow: /en/node/
Disallow: /en/book/export/
# ############ ROMANIAN
# Directories
Disallow: /ro/database/
Disallow: /ro/includes/
Disallow: /ro/misc/
Disallow: /ro/modules/
Disallow: /ro/sites/
Disallow: /ro/themes/
Disallow: /ro/scripts/
Disallow: /ro/updates/
Disallow: /ro/profiles/
# Files
Disallow: /ro/xmlrpc.php
Disallow: /ro/cron.php
Disallow: /ro/update.php
Disallow: /ro/ehosting.php
# Paths (clean URLs)
Disallow: /ro/admin/
Disallow: /ro/aggregator/
Disallow: /ro/comment/reply/
Disallow: /ro/contact/
Disallow: /ro/logout/
Disallow: /ro/node/add/
Disallow: /ro/search/
Disallow: /ro/user/register/
Disallow: /ro/user/password/
Disallow: /ro/user/login/
# Paths (no clean URLs)
Disallow: /ro/?q=admin/
Disallow: /ro/?q=aggregator/
Disallow: /ro/?q=comment/reply/
Disallow: /ro/?q=contact/
Disallow: /ro/?q=logout/
Disallow: /ro/?q=node/add/
Disallow: /ro/?q=search/
Disallow: /ro/?q=user/password/
Disallow: /ro/?q=user/register/
Disallow: /ro/?q=user/login/
# Radut
Disallow: /ro/node/
Disallow: /ro/book/export/
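A file this long is easy to get wrong by hand. One way to spot-check it is to feed it to Python's standard-library `urllib.robotparser` and ask whether sample URLs are blocked. The sketch uses a shortened excerpt of the file above; `example.com` and `/ro/some-article` are placeholders, and this parser only does plain prefix matching (no wildcards).

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the file above: admin and node rules per section.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /node/
Disallow: /en/admin/
Disallow: /en/node/
Disallow: /ro/admin/
Disallow: /ro/node/
"""

# Placeholder host and sample paths to test against the rules.
BASE = "http://example.com"
SAMPLES = ["/admin/settings", "/en/admin/settings", "/ro/admin/settings",
           "/en/node/12", "/ro/some-article"]


def load(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load(ROBOTS_TXT)
    for path in SAMPLES:
        verdict = "allowed" if rules.can_fetch("*", BASE + path) else "blocked"
        print(f"{path:20} {verdict}")
    # Crawl-delay is exposed separately (Python 3.6+).
    print("Crawl-delay:", rules.crawl_delay("*"))
```

With this excerpt, every sample except /ro/some-article should come back as blocked, and the crawl delay reads as 10.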
So, you are right. Fortunately I have used this robots.txt for all my sites from the beginning, and everything is fine with search engines.
Florian
Dr.Radut | Puzzle IT | EU Copyright Office | STReight
The Multilingual Module
The Multilingual Module should do this work, though, and Drupal core should check for any new subdirectories automatically...
For many enabled languages: wildcards now work with some bots!
According to http://tools.seobook.com/robots-txt/, Googlebot, Yahoo Slurp and Microsoft's crawler support wildcards, so if you have many languages enabled, try appending this to the default Drupal robots.txt:
# Multi-language with wildcards
# Paths (clean URLs)
Disallow: /*/admin/
Disallow: /*/comment/reply/
Disallow: /*/contact/
Disallow: /*/logout/
Disallow: /*/node/add/
Disallow: /*/search/
Disallow: /*/user/register/
Disallow: /*/user/password/
Disallow: /*/user/login/
# Paths (no clean URLs)
Disallow: /*/?q=admin/
Disallow: /*/?q=comment/reply/
Disallow: /*/?q=contact/
Disallow: /*/?q=logout/
Disallow: /*/?q=node/add/
Disallow: /*/?q=search/
Disallow: /*/?q=user/password/
Disallow: /*/?q=user/register/
Disallow: /*/?q=user/login/
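Google documents `*` in a robots.txt path as matching any sequence of characters and a trailing `$` as marking the end of the URL; Python's `urllib.robotparser` does not treat them as wildcards inside a path. To see what the wildcard lines actually cover, here is a small regex-based matcher written for this illustration (not a library API). It assumes Google-style semantics, ignores Allow lines and rule precedence, and the sample paths are hypothetical.

```python
import re

# A subset of the wildcard rules from the list above.
RULES = ["/*/admin/", "/*/user/login/", "/*/?q=admin/"]


def rule_to_regex(rule):
    """Translate a robots.txt path using '*' and a trailing '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal text, joining the pieces with '.*' wherever '*' appeared.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, rules):
    """True if any Disallow rule matches from the start of the path.

    Simplification: Allow lines and longest-match precedence are ignored.
    """
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    # Hypothetical sample paths.
    for path in ["/fr/admin/settings", "/de/user/login/", "/fr/about",
                 "/admin/", "/blog/tips/admin/"]:
        print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

Because `*` also matches `/`, a rule like `/*/admin/` catches deeper paths such as `/blog/tips/admin/` too, while it does not match the unprefixed `/admin/`. That is why these lines are appended to the default Drupal rules rather than replacing them.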