
robots.txt: add wildcarded paths for multilingual sites

Project:Drupal core
Version:7.x-dev
Component:base system
Category:feature request
Priority:normal
Assigned:Unassigned
Status:closed (duplicate)

Issue Summary

Attached are patches against the latest D6 and D7 checkouts. Please review and give your opinion ;)

Attachment | Size | Test result
robots-txt-wildcard-paths-for-multilanguage-d7.patch | 860 bytes | Passed on all environments.
robots-txt-wildcard-paths-for-multilanguage-d6.patch | 852 bytes | Ignored: Check issue status.

Comments

#1

Status:needs review» closed (duplicate)

Marking as duplicate of
#180379: Fixing Robots.txt which also tries to reduce duplicate content.

#2

Status:closed (duplicate)» postponed (maintainer needs more info)

Well, that one was outdated and its patch no longer applied, so I thought that if I opened a clean issue with a clean patch that actually applies now, we could go from there. The other issue mixes several changes that were also still under discussion, whereas this patch simply duplicates the existing exclusion paths so they also apply to multilingual sites. That would save a lot of sites a lot of unnecessary server load, for free, right now!
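A minimal sketch of what "duplicating the existing exclusion paths" could look like, assuming the new rules rely on the widely supported `*` wildcard to stand in for a language prefix such as `/de/`. The patch itself is not reproduced here: the base paths are a sample of rules from Drupal's stock robots.txt, and `add_language_wildcards` is a hypothetical helper.

```python
# Hypothetical helper: for every existing "Disallow: /path" rule, emit an
# extra rule prefixed with "/*" so that language-prefixed URLs such as
# /de/admin/ are covered too. Assumes crawlers honour the "*" wildcard.
BASE_DISALLOWS = [
    "/admin/",
    "/comment/reply/",
    "/node/add/",
    "/search/",
    "/user/login/",
]


def add_language_wildcards(paths):
    """Return the original rules followed by wildcard-prefixed copies."""
    rules = [f"Disallow: {p}" for p in paths]
    # The originals are kept because "/*/admin/" does not match a bare "/admin/".
    rules += [f"Disallow: /*{p}" for p in paths]
    return rules


if __name__ == "__main__":
    print("User-agent: *")
    for line in add_language_wildcards(BASE_DISALLOWS):
        print(line)
```

For `/admin/` this yields both `Disallow: /admin/` and `Disallow: /*/admin/`.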

#3

This is invalid syntax - from http://www.robotstxt.org/robotstxt.html:

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

Unless we are going to add every possible language, or make robots.txt autogenerated by Drupal, I don't think there is any technical solution to this. I guess the best approach is to update the documentation to explain how people can add these rules themselves.
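Without wildcards, the remaining option is to spell out one rule per enabled language, which is what an autogenerated robots.txt would have to do. A minimal sketch, assuming the site's language path prefixes are known in advance; the `en`/`de`/`fr` list and `expand_per_language` are hypothetical.

```python
# Hypothetical sketch: expand each base Disallow path into explicit,
# wildcard-free rules for every enabled language prefix. Only plain prefix
# matching is needed, which the original robots.txt convention supports.
def expand_per_language(paths, languages):
    rules = []
    for p in paths:
        rules.append(f"Disallow: {p}")  # unprefixed (default language)
        for lang in languages:
            rules.append(f"Disallow: /{lang}{p}")  # e.g. /de/admin/
    return rules


if __name__ == "__main__":
    langs = ["en", "de", "fr"]  # assumed list of enabled language prefixes
    print("User-agent: *")
    for line in expand_per_language(["/admin/", "/user/login/"], langs):
        print(line)
```

The drawback is that the list has to be regenerated whenever a language is added, which is why this only works if Drupal generates the file or site builders maintain it themselves.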

#4

OK, reading the "fixing" issue I guess "*" is pretty commonly accepted.
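To illustrate why the wildcard rules work for crawlers that support them, here is a small matcher following the `*` (any sequence of characters) and trailing `$` (end of path) semantics described by Google's robots.txt documentation and later standardized in RFC 9309. It is a simplified sketch: it ignores Allow rules, longest-match precedence and percent-encoding, and `rule_matches` is a hypothetical name.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is a literal match against the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, giving robots.txt prefix semantics.
    return re.match(regex, path) is not None
```

With this, `rule_matches("/*/admin/", "/de/admin/settings")` is true while `rule_matches("/*/admin/", "/admin/settings")` is false, so the wildcard copies have to sit alongside the original rules rather than replace them.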

#5

Status:postponed (maintainer needs more info)» closed (duplicate)

Marking as duplicate of #180379: Fixing Robots.txt. You can follow up on that issue to track its status instead. If any information from this issue is missing in the other issue, please make sure you provide it over there.

However, thanks for taking the time to report this issue.

#6

OK, but IMHO this could and should have been committed to the D6 branch long ago, to stop crawlers hammering all the i18n sites in the wild... Anyway, at least it is no longer a problem for my site.