The default robots.txt does not work very well with multilingual sites: it blocks /admin/ but not /fr/admin, and more importantly, /search is covered but /xx/search isn't, which causes Googlebot to crawl our search results.

Since robots.txt only accepts * as a wildcard and the main issue is the crawling of search results, I propose adding /*/search to the disallow list.
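
As an illustration, a minimal sketch of the idea (the exact stock Disallow line varies by Drupal version, and the /fr/ and /nl/ prefixes are just example languages):

# Existing-style rule: only matches the unprefixed search path
Disallow: /search
# Proposed addition: * matches any language prefix, e.g. /fr/search or /nl/search
Disallow: /*/search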

Comment | File | Size | Author
#3 | drupal-robotstxt-2195283-3.patch | 360 bytes | stefan.r

Comments

stefan.r:

Assigned: bart.hanssens » bdeclerc

See #180379: Fix path matching in robots.txt.

To be complete, we can add:

Disallow: /*/search$
Disallow: /*/?q=search$
Disallow: /*/search/
Disallow: /*/?q=search/
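
For reference, a sketch of how Googlebot's * (wildcard) and $ (end-of-URL anchor) matching would apply to these rules; the /fr/ prefix and the example paths below are only illustrations, not part of the patch:

/fr/search          blocked by Disallow: /*/search$
/fr/search/node     blocked by Disallow: /*/search/
/fr/?q=search/node  blocked by Disallow: /*/?q=search/ (? is a literal character in these patterns)
/fr/searching       not blocked; the $ anchor and trailing slash keep the rules from over-matching
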
bart.hanssens:

Issue tags: +release1.6

Added tag.

stefan.r:

stefan.r:

Title: robots.txt does not cover language-specific search urls » Cover language-specific search URLs in robots.txt

Added to Openfed.

  • Commit 13112db on 7.x-1.x, 7.x-1.7 by stefan.r:
    Issue #2195283: Cover language-specific search URLs in robots.txt
    

  • Commit 13112db on 7.x-1.x, 7.x-1.8 by stefan.r:
    Issue #2195283: Cover language-specific search URLs in robots.txt
    
bart.hanssens:

Status: Active » Closed (fixed)