Looking through the issues related to multisite support, I wasn't able to find a good solution. Removing robots.txt is not an option, as my site is on a platform that hosts a couple of dozen other sites. Doing so would require coordination between different teams working within different organizations. The admin interface also does not offer much flexibility here.

The concept is simple: mark each site in the multisite setup in a way that Apache or Nginx can recognize, and have the server rewrite the request to a URL at which the module returns that site's robots.txt file. This requires a change to the rewrite rules. Marking and unmarking sites should be available from the admin interface. Additionally, we should be able to recognize aliases recorded in the sites.php file, and to respond to aliases not recorded in sites.php.

The provided patch is a start and allows all of the above, with the exception of Nginx support, which I am still working on.

Sites are marked via the admin interface, using marker files named as follows:

sites/mysite.com/.robotstxt
sites/mysite.alias.com.robotstxt            - for 'mysite.alias.com' alias
sites/mysite.another.alias.com.robotstxt    - for 'mysite.another.alias.com' alias
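
As a rough illustration of what marking and unmarking amount to on disk, here is a minimal Python sketch. It is not the module's code: the helper names (`marker_path`, `mark`, `unmark`) are hypothetical, and it assumes the marker can be an empty file, which seems reasonable only because the rewrite rules test for the file's existence and nothing else.

```python
from pathlib import Path

MARKER_SUFFIX = ".robotstxt"


def marker_path(drupal_root: Path, name: str, is_alias: bool = False) -> Path:
    """Return the marker file path for a site directory or for an alias.

    Site directories get 'sites/<name>/.robotstxt';
    aliases get 'sites/<name>.robotstxt'.
    """
    sites = Path(drupal_root) / "sites"
    if is_alias:
        return sites / f"{name}{MARKER_SUFFIX}"
    return sites / name / MARKER_SUFFIX


def mark(drupal_root: Path, name: str, is_alias: bool = False) -> Path:
    """Create the (empty) marker file; assumption: content is irrelevant."""
    path = marker_path(drupal_root, name, is_alias)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return path


def unmark(drupal_root: Path, name: str, is_alias: bool = False) -> None:
    """Remove the marker file if it exists."""
    marker_path(drupal_root, name, is_alias).unlink(missing_ok=True)
```

For example, `mark(Path("/var/www/drupal"), "mysite.alias.com", is_alias=True)` would create `sites/mysite.alias.com.robotstxt` under that (hypothetical) Drupal root.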

The Apache mod_rewrite additions to the .htaccess file and the Nginx rule updates are displayed in the admin interface as suggestions.

Apache rules as follows:

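# Only handle requests for /robots.txt (dot escaped so it matches literally).
RewriteCond %{REQUEST_URI} ^/robots\.txt$
# Pass the request to Drupal only if one of the marker files exists.
RewriteCond %{DOCUMENT_ROOT}/sites/%{HTTP_HOST}/.robotstxt -f [OR]
RewriteCond %{DOCUMENT_ROOT}/sites/%{SERVER_PORT}.%{HTTP_HOST}.robotstxt -f [OR]
RewriteCond %{DOCUMENT_ROOT}/sites/%{HTTP_HOST}.robotstxt -f
RewriteRule ^ index.php?q=robots.txt [L]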

Nginx rules as follows:

location = /robots.txt {
    allow all;
    log_not_found off;
    access_log off;
    if (-f $document_root/sites/$host/.robotstxt) {
        rewrite ^ /index.php?q=robots.txt;
    }
    if (-f $document_root/sites/$host.robotstxt) {
        rewrite ^ /index.php?q=robots.txt;
    }
    if (-f $document_root/sites/$server_port.$host.robotstxt) {
        rewrite ^ /index.php?q=robots.txt;
    }
}
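
The decision both rule sets make can be sketched in a few lines of Python. This is only an illustration of the three file checks listed above, not part of the patch; the function names and the example Drupal root are hypothetical.

```python
from pathlib import Path


def robots_marker_candidates(drupal_root, host, port):
    """List the marker files the rewrite rules test for a given request."""
    sites = Path(drupal_root) / "sites"
    return [
        sites / host / ".robotstxt",          # sites/<host>/.robotstxt
        sites / f"{port}.{host}.robotstxt",   # sites/<port>.<host>.robotstxt
        sites / f"{host}.robotstxt",          # sites/<host>.robotstxt
    ]


def is_marked(drupal_root, host, port=80):
    """True if any marker file exists, i.e. the request for /robots.txt
    would be rewritten to index.php?q=robots.txt."""
    return any(p.is_file() for p in robots_marker_candidates(drupal_root, host, port))
```

With a hypothetical root of `/var/www/drupal`, `is_marked("/var/www/drupal", "mysite.alias.com")` returns `True` as soon as `sites/mysite.alias.com.robotstxt` exists; if none of the three files is present, the request falls through to whatever the server would normally serve for /robots.txt.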

Comments

euk created an issue.

euk:

New patch with updated Nginx rules.

euk:

After some testing and further consideration, it turns out we are quite limited in what we can do here. The new patch is more conservative, but is still usable.
For this patch to work:

  • All sites in a multisite setup must have their public file system path follow the same naming pattern, e.g. 'sites/SITE-NAME/FILES-FOLDER-PATH'. The 'sites' part can be any path within the Drupal root, but it has to be the same across all sites.
  • The patch uses 'sites/default/files' as the files folder for the default site. This can be overridden through the 'robotstxt_default_override' variable.

kevinquillen:

Status: Needs review » Postponed

Postponing due to upcoming Drupal 7 EOL.