Please find attached patches against the latest D6 and D7 checkouts, and give your opinion ;)

Comments

catch’s picture

Status: Needs review » Closed (duplicate)

Marking as duplicate of #180379: Fix path matching in robots.txt, which also tries to reduce duplicate content.

eMPee584’s picture

Status: Closed (duplicate) » Postponed (maintainer needs more info)

Well, but that one was outdated and didn't apply, so I thought that if I opened a clean issue and attached a clean patch that can actually be applied now, we could go from there. The other issue mixes several things that were still under discussion, while this patch just duplicates the existing exclusion paths so that they also apply to multi-language sites. That would save a lot of sites a lot of unnecessary server load, for free, now!
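To illustrate why the existing exclusion paths miss multi-language sites, here is a minimal sketch using Python's standard `urllib.robotparser`, which applies plain prefix matching. The two Disallow lines are assumed to resemble the stock robots.txt, and the `de` language prefix is a hypothetical example, not taken from the patch.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt of a stock robots.txt; not copied from the patch.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /user/login/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Plain prefix matching: the unprefixed path is blocked ...
blocked_plain = not parser.can_fetch("*", "/admin/settings")
# ... but the same page under a (hypothetical) language prefix is not.
blocked_prefixed = not parser.can_fetch("*", "/de/admin/settings")

print(blocked_plain, blocked_prefixed)
```

Under prefix matching, a crawler that honors these rules would still request every language-prefixed copy of the excluded pages.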

owen barton’s picture

This is invalid syntax - from http://www.robotstxt.org/robotstxt.html:

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

Unless we are going to add every possible language, or have Drupal autogenerate robots.txt, I don't think there is any technical solution to this. I guess the best approach is to update the documentation to explain to people how to add these lines themselves.
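As a rough sketch of what "adding every language" would mean in practice, the following generates explicit Disallow lines without any wildcards. The function name `expand_for_languages`, the paths, and the language prefixes are all hypothetical and only serve as an illustration.

```python
def expand_for_languages(paths, prefixes):
    """Return Disallow lines for each path, plain and per language prefix."""
    lines = []
    for path in paths:
        # Keep the original, unprefixed rule.
        lines.append(f"Disallow: {path}")
        # Add one explicit copy per language prefix (no globbing needed).
        for prefix in prefixes:
            lines.append(f"Disallow: /{prefix}{path}")
    return lines


# Hypothetical paths and language codes, for illustration only.
example = expand_for_languages(["/admin/", "/user/login/"], ["de", "fr"])
print("\n".join(example))
```

The output grows with the number of paths times the number of enabled languages, which is why doing this by hand for every possible language does not scale.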

owen barton’s picture

OK, reading the "fixing" issue I guess "*" is pretty commonly accepted.
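For reference, a minimal sketch of how crawlers that accept "*" typically interpret it, assuming the commonly documented semantics: "*" matches any sequence of characters and a trailing "$" anchors the end of the path. The helper `rule_matches` and the pattern `/*/admin/` are hypothetical and are not necessarily what the patch uses.

```python
import re


def rule_matches(pattern, path):
    """Check a path against a Disallow pattern with '*' and trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each '*' match any character sequence.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        # '$' means the pattern must cover the whole path.
        return re.fullmatch(regex, path) is not None
    # Otherwise the pattern only has to match a prefix of the path.
    return re.match(regex, path) is not None


print(rule_matches("/*/admin/", "/de/admin/settings"))
print(rule_matches("/*/admin/", "/admin/settings"))
```

Note that under this interpretation a rule like `/*/admin/` does not cover the unprefixed `/admin/`, so the original lines would have to stay alongside the wildcard ones.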

sun.core’s picture

Status: Postponed (maintainer needs more info) » Closed (duplicate)

Marking as duplicate of #180379: Fix path matching in robots.txt. You can follow up on that issue to track its status instead. If any information from this issue is missing in the other issue, please make sure you provide it over there.

However, thanks for taking the time to report this issue.

eMPee584’s picture

OK, but IMHO this easily could and should have been committed to the D6 branch long ago, to stop crawlers from hammering all the i18n sites in the wild... Anyway, at least it's not a problem for my site anymore.

j0rd’s picture

Same problem. There appears to be very little discussion about the limitation of Drupal's default robots.txt when it comes to multi-language sites.

For others who have this problem and find this issue on Google, here's the best Drupal 6 robots.txt I've found.
#1317338: Improvements to the core file robots.txt in Drupal
https://wiki.koumbit.net/DrupalRobots