I see Aegir supports the robotstxt module, but really, why should one use a module when a per-site robots.txt can be achieved with simple rewrites? As an added benefit, a per-site robots.txt could be version-controlled. Also, keeping settings like this in the database generally goes against the recent configuration-in-code movement (CTools, exportables, Features, etc.).


Comments

crea’s picture

I propose storing the file in the sites/example.com folder. We could even have rewrites that do the following checks:
1) try per-site file
2) try global file
3) fall-back to drupal
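
The three checks above can be sketched outside the web server as a plain lookup. This is only an illustration of the proposed order, not the Aegir or Provision implementation; the function name resolve_robots and its parameters are made up for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_robots(platform_root: Path, host: str) -> Optional[Path]:
    """Return the static robots.txt to serve, or None to fall back to Drupal.

    Hypothetical helper: it mirrors the order proposed in this comment
    (per-site file, then global file, then Drupal).
    """
    candidates = [
        platform_root / "sites" / host / "robots.txt",  # 1) per-site file
        platform_root / "robots.txt",                   # 2) global file
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # 3) nothing static found: let Drupal answer the request
    return None
```

A None result stands for "pass the request on to Drupal", which is where a module such as robotstxt would answer.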

crea’s picture

Project: Hosting » Provision
Version: 6.x-0.4-alpha3 » 6.x-1.1
crea’s picture

Status: Active » Needs work

This is for nginx only, so needs work.

crea’s picture

dupe

crea’s picture

Hmm, that attachment is broken... second try.

omega8cc’s picture

The idea is nice, but the file shouldn't be stored directly in the site directory, because only the Aegir system user should have write access there, while any static file should be uploaded only to the files directory.

Example: try_files /sites/$host/files/robots.txt $uri @cache;

crea’s picture

In my opinion this is a sort of system setting (same as the global and local settings.php), and should live outside the files directory. Also, I want to put the file in VCS while the files directory should stay out of it, and I don't want to introduce a custom policy just for this file.

omega8cc’s picture

OK, then maybe:

try_files /sites/$host/robots.txt /sites/$host/files/robots.txt $uri @cache;

crea’s picture

Is it true that the file will be overwritten during migration? I mean, maybe it would be a better idea to store the file in the profile directory so that it becomes a part of the platform?

crea’s picture

We don't know the path to the profile inside nginx unfortunately.

omega8cc’s picture

No, the file will be moved with the site as-is and never touched by Aegir.

It shouldn't be a part of platform on the install profile level - it should be a part of the *site*, or there is no point in avoiding standard robots.txt file in the platform root.

crea’s picture

"No, the file will be moved with the site as-is and never touched by Aegir."

I meant exactly that: in a deployment scenario with the platform in VCS, the file won't be a part of the platform, so it's not possible to have it in the same VCS repo. If I roll out a new platform with an updated robots.txt, it will be overwritten with the old one.

crea’s picture

We know the profile name in the vhost template. Thus we can generate a dynamic per-host rewrite using PHP.
I agree that (generally) this should be a part of the site and not a part of the profile. However, in order to support platforms in VCS as a site deployment model, Aegir already suggests storing site parts in the profile: site-specific modules and themes (i.e. the separate-profile-per-site model). I already do that, and it works great. So I think it would be OK to store configuration files such as robots.txt at the profile level (not as a general rule but as an option).

omega8cc’s picture

Then maybe custom config overrides is the way to go, because we shouldn't add such non-standard locations in the generic setup.

I would never expect robots.txt to be a part of the *code* and part of the platform, by the way.

The problem with the standard multisite setup is that the default file in the platform root doesn't allow you to manage robots.txt per site, so if we want to introduce that, we need the current solution (support for the robotstxt module), maybe extended with support for a static file per site, but stored in the sites space.

anarcat’s picture

Status: Needs work » Needs review

Sounds like a great idea!! Why don't we do something like this in apache:

  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteRule ^robots.txt /sites/%{HTTP_HOST}/robots.txt [L]

Is there anything missing in the Nginx side?

anarcat’s picture

Status: Needs review » Needs work

Actually, we need a patch here :)

omega8cc’s picture

Well, I'm against allowing robots.txt in the sites/domain directory, because then you have to open write access to this directory for the group (as we do for modules/themes/libraries), which is a rather crazy idea, IMO.

There could be a rewrite added to support sites/domain/files/robots.txt - and we don't have it in the Nginx config yet, as we support standard, platform root level location for robots.txt and also the robotstxt module, so far.

Steven Jones’s picture

It should be noted that Drupal 7 includes a robots.txt at its root, so it seems that having a platform level robots.txt or using the robotstxt module is the 'Drupal way'.

I don't see why this couldn't be handled in a contrib, 'Aegir robotstxt' module?

anarcat’s picture

I have a problem with the robotstxt module: it bootstraps Drupal, so it's a huge performance hit.

I would rather have a static file generated.

And I agree it goes in files/robots.txt, that's alright for me.

And I don't mind having this in contrib, but it seems like such a simple task (one rewrite rule!) that we should get it in. Besides, if we want to prioritize the platform robots.txt, we can do that too...

omega8cc’s picture

Sounds good. So we could avoid robotstxt module and create/copy the robots.txt files from platform root to sites/domain/files/ by default, on site deploy maybe (not on verify to not overwrite it).

crea’s picture

A platform-level robots.txt doesn't make sense when you add multisite to the picture, and Aegir uses multisite heavily. It exists in Drupal simply for legacy reasons, and also because no patch with a better approach was submitted.
The question is simple: should we support something better, or stick with an inferior solution just because Drupal does it?

Steven Jones’s picture

@omega8cc aren't we proposing a simple rewrite rule, not actually copying the robots.txt over?

I guess if someone wants it, they can still install the robotstxt module and delete any robots.txt in their platform and it'll just work, but otherwise they can use the Aegir magic.

omega8cc’s picture

Sure, we can simply add a rewrite to support both legacy location in the platform root and in the sites/domain/files.

Then we can leave this for the server admin and site admin to use either legacy location - platform-wide (which still makes sense also in the multisite env, unless some site really requires custom robots.txt) or upload the file to the sites/domain/files directory (after it was deleted from the platform root).

So we only need a simple rewrite and a how-to entry in the handbook/docs to explain that it is a built-in feature.

omega8cc’s picture

Status: Needs work » Needs review
anarcat’s picture

Status: Needs review » Patch (to be ported)

I have committed the patch for Nginx and rolled my own patch for Apache. I am not sure they work the same way, though. In Apache, I first check whether the site-specific robots.txt file exists, and redirect there only if so; otherwise the normal process takes its course (i.e. the platform-level robots.txt gets served).

Is that the way nginx works?

Here's the apache patch:

http://drupalcode.org/project/provision.git/commitdiff/e7127de6027c54727...

Let's let this sit for a while in 2.x and merge when we have some more tests.

j0nathan’s picture

Subscribing.

omega8cc’s picture

In the Nginx configuration it checks for the platform-wide file first, then for the site-specific one if no platform-wide file exists, and then, if neither exists, it sends the request to Drupal via the php-fpm backend, so the robotstxt module is supported too.

We don't use any rewrite or redirect here, only the file check in the above order, which is also cached in the Nginx memory for better performance.

crea’s picture

The feature should check the site file first, then fall back to the platform-wide one. Otherwise we make removing the platform file an extra required step, and we also kill the fallback mechanism in the process, since there's nothing left to fall back to.

anarcat’s picture

Status: Patch (to be ported) » Needs work

Ok, we have a problem with the patch then because they work in opposite ways.

I believe we should allow users to override the platform robots.txt. Otherwise it means we need to add a robots.txt to *every site* if we want to customize *one*. Seems backwards.
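
To make the difference between the two orders concrete, here is a small sketch that models the lookup as "first existing path wins". It is not how either web server is configured; the names pick, SITE and PLATFORM and the host names are made up for the illustration, and the site path assumes the sites/domain/files location agreed on earlier in this thread.

```python
from typing import Iterable, Optional, Set

# Path templates relative to the platform root (illustrative only).
SITE = "sites/{host}/files/robots.txt"
PLATFORM = "robots.txt"


def pick(order: Iterable[str], existing: Set[str], host: str) -> Optional[str]:
    """Return the first path in `order` that exists, or None for Drupal."""
    for template in order:
        path = template.format(host=host)
        if path in existing:
            return path
    return None


# One platform-wide file plus a custom file for a single site.
existing = {"robots.txt", "sites/a.example.com/files/robots.txt"}

platform_first = [PLATFORM, SITE]  # platform-wide file checked first
site_first = [SITE, PLATFORM]      # site-specific file checked first

# Platform first: the custom file for a.example.com is never reached.
ignored = pick(platform_first, existing, "a.example.com")

# Site first: a.example.com gets its own file, and other sites
# still fall back to the platform-wide one.
custom = pick(site_first, existing, "a.example.com")
shared = pick(site_first, existing, "b.example.com")
```

With the platform-first order, the only way to customize one site is to delete the platform file, and then every other site needs its own copy as well.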

Can you change your patch so that the site-specific robots.txt has precedence?

omega8cc’s picture

Status: Needs work » Needs review
anarcat’s picture

Status: Needs review » Patch (to be ported)

Alright, patch applied, we are now making sense again. Thanks! :)

anarcat’s picture

Status: Patch (to be ported) » Fixed

pushed to 1.x.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

netpez’s picture

I currently have AEGIR 1.6...

I noticed the patch lines:

+
+ RewriteCond <?php print $this->root; ?>/sites/%{HTTP_HOST}/files/robots.txt -f
+ RewriteRule ^robots.txt /sites/%{HTTP_HOST}/files/robots.txt [L]

are in the template file within the .drush dir on my version....

So I created a robots.txt file that disallows all (/) on the manager (aegir) server in a specific /platform/site/files directory and I verified it so it would push out to all servers.

It is still not working :(

Anyone else still have issues?
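
One thing that may be worth checking: the condition in the patch lines above tests for a file at the platform root plus sites/, the request's Host header, and files/robots.txt. A minimal sketch for confirming that a file exists at exactly that location on the web server follows. It is a debugging aid under that assumption, not part of Aegir; the function names and arguments are made up for the example.

```python
from pathlib import Path


def expected_robots_path(platform_root: str, http_host: str) -> Path:
    """Build the path the rewrite condition would test (assumed layout)."""
    return Path(platform_root) / "sites" / http_host / "files" / "robots.txt"


def check_robots(platform_root: str, http_host: str) -> str:
    """Report whether a regular file exists at the expected location."""
    path = expected_robots_path(platform_root, http_host)
    status = "found" if path.is_file() else "missing"
    return f"{status}: {path}"
```

Calling check_robots with the platform root and the exact host name used in the browser shows which path has to exist for the rule to match.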

anarcat’s picture

Please open a new issue instead of posting in older ones. For such questions, you should probably use the community site. See also http://community.aegirproject.org/help

  • Commit 6cc9169 on 7.x-2.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x authored by omega8cc, committed by anarcat:
    Issue #1173954 - Nginx configuration - Add support for static robots.txt...
  • Commit e7127de on 7.x-2.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x by anarcat:
    #1173954 - use the site-specific robots.txt if present
    
    
  • Commit 42c39b1 on 7.x-2.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x authored by omega8cc, committed by anarcat:
    Issue #1173954 - Nginx configuration - Fix support for static robots.txt...
  • Commit ed77be2 on 6.x-1.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x authored by omega8cc, committed by anarcat:
    Issue #1173954 - Nginx configuration - Add support for static robots.txt...
  • Commit 291dfd1 on 6.x-1.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x by anarcat:
    #1173954 - use the site-specific robots.txt if present
    
    
  • Commit 334afd1 on 6.x-1.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x authored by omega8cc, committed by anarcat:
    Issue #1173954 - Nginx configuration - Fix support for static robots.txt...