I read a good approach on how to manage robots.txt in this post:
http://drupal.org/node/22265#comment-98197
I followed the directions and damned if it didn't break my cron job. Really.
I had a problem and posted a reply there, but I think the thread may have run dry. So I'll risk being redundant in hopes of getting some experienced insight on the subject.
Background
We're using a multi-site setup: a single code base with clean URLs.
Disclaimer: my coding skills are very rusty. Sure, I used to use vi, but that was over 10 years ago, and I don't know PHP. So maybe it's just something stupid (it usually is, right? :-).
I created a "page", turned off rich text, and marked the input format as PHP. Then I pasted the following into it (without the code tags, of course):
<?php
Header('Content-type: text/plain');
?>
User-agent: *
Crawl-Delay: 10
Disallow: /tracker
Disallow: /comment/reply
Disallow: /node/add
Disallow: /user
Disallow: /search
Disallow: /book/print
<?php
die();
?>
I used a path alias of robots.txt.
I hit save. Then when I navigated to mydomain.com/robots.txt, I saw the appropriate robots text just fine.
But I started getting cron errors right after that. I tried navigating to:
mydomain.com/cron.php and what do you know, it showed me the robots.txt text.
Yet when I went on the file server and looked at cron.php, it was there and it was correct.
I looked for redirects and couldn't find any.
I decided to delete the Drupal node for robots.txt and, guess what, cron.php began working again.
Any ideas out there? Where am I being lame?
Comments
There really is no need to
There really is no need to create the robots.txt file in that manner. You could just create the .txt file in an editor and upload the actual file into your Drupal root, rather than creating a node and using it as a robots.txt file. I actually think that is easier, and it shouldn't interfere with anything at all.
but multisite?
OK, but I'm running multisite with a single codebase. I thought this approach would make it easy to keep different robots.txt files per domain/site.
Though... in the name of simplicity, and because, damn it, I want this working, perhaps I'll sacrifice that flexibility. Hell, most will have the same robots.txt settings anyway.
Thanks for the advice. I'll use it.
RobotsTxt module is what you need...
Please see the most excellent module by Robert Douglass that does everything that you want:
http://drupal.org/project/robotstxt
Its beauty is in its simplicity... enjoy!
Thanks! will check it out
Looks like that could do the trick.
RE: RobotsTxt module is what you need...
It is also true that, as of today, it contains a bug that prevents the 6.x-1.x-dev version of the module from working.
See issue #345252: "Apache with MultiViews configured: robots.txt is not displayed, instead the frontpage of my site is shown."