Allow admin to specify the default robots.txt file and add additional override checkbox
Dave Reid - October 30, 2009 - 22:45
| Project: | RobotsTxt |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Description
In my case I have a multi-site install that I want to enable robotstxt on all of them, but for some of them I want to be able to specify a different default robots.txt file than the one included with the module so I don't have to configure it on each site. So I'm proposing changing the admin settings to the following:
1. Textfield with path to a robots.txt file to use as the base (default to sites/all/modules/robotstxt/robots.txt).
2. Textarea with additions to include to the result robots.txt. If the user wants to change everything about the robots.txt result, they would leave the file textfield blank and only use this field.

#1
Patch attached for review implementing this feature with an upgrade path for existing users as well.
So basically, we get the same functionality as before, but make it more configurable. I can go into a few of my site's settings.php and add:
<?php$conf = array(
'robotstxt_custom' => 'Disallow: /images',
);
?>
#2
And the reason I've removed the root '/robots.txt' as the default file is because we expect that file to not exist to use this module, so it doesn't make sense to use it as the default. Plus people who have the file re-named to something like 'robots.txt.removed' can use the following in their site's settings.php:
<?php$conf = array(
'robotstxt_file' => 'robots.txt.removed',
);
?>
#3
thanks Dave, works great. didn't check the upgrade, only fresh install.
#4
Ok this one is working now. Last patch wouldn't allow you to save an empty file path and didn't properly get the variable for the default value. I tested this one thoroughly.
#5
robotstxt_get_content() is not an API function and is missing
_prefix. Aside I remember - I've said Dave some time ago - I do not like to read the disk for the robots.txt file after the installation has completed. I do not like to commit this as it overcomplicates this simple module without a benefit except ~100 new support requests.I may not fully understand the need for this as every site could also have it's own custom
sites/example.com/modules/robotstxt/robots.txtand otherwise there is a way to override the var via settings.php...#6
I fail to see how this would create 100 new support requests for you. The problem with your solution is that I'd have to keep a separate version of robots.txt in each subsite. I don't need that. Also I've said that setting something on install is not a Drupal way of doing things. I'll even do some benchmarks.
#7
Revised patch that does some filemtime checking to increase performance to a teeny bit that the actual current code in CVS. This even has a very smart update function. Basically it checks to see if the user hasn't made any changes to the robotstxt content (by checking if each line of sites/all/modules/robotstxt/robots.txt can be found in variable_get('robotstxt', '')), then it can safely use the default file and does a smart job of setting the custom lines. Otherwise, it puts the whole contents of the robotstxt variable into the 'custom lines' variable and leaves the default file blank, keeping the same functionality as before.
For testing:
- Went to admin/settings/robots.txt, clicked 'Reset to defaults', and then pressed 'Save configuration'
- The hook_robotstxt module invoke disabled (help narrow down exactly the difference between the old and new versions)
- Caching disabled
- Tested with ab -n 500 -c 1 http://mysql.drupal6dev.local/robots.txt
WITHOUT PATCH:
Requests per second: 5.35 [#/sec] (mean)
Time per request: 186.971 [ms] (mean)
Time per request: 186.971 [ms] (mean, across all concurrent requests)
Transfer rate: 11.71 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 181 187 5.9 186 253
Waiting: 181 187 5.9 186 253
Total: 181 187 5.9 186 253
WITH PATCH:
Requests per second: 5.37 [#/sec] (mean)
Time per request: 186.136 [ms] (mean)
Time per request: 186.136 [ms] (mean, across all concurrent requests)
Transfer rate: 11.45 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 181 186 4.9 184 213
Waiting: 181 186 4.9 184 213
Total: 181 186 4.9 184 213
I ran these tests three times each, so I posted the media result of each 'run'. And yes, with the patch, this is actually faster than the current code.
#8
Missed a variable_del() in robotstxt_uninstall().
#9
And in response to robotstxt_get_content() being a private function, I don't agree. This seems like a public function that any module could call if they wanted/needed to. There's nothing that screams 'NEVER CALL THIS' in the code. As a compromise, I can rename the function in a revised patch if you want.
#10
So, is this good to proceed? Yea? Nay?
#11
Need to find some time to look into...