I have a multi-site install on which I want to enable robotstxt for every site, but for some of them I want to be able to specify a different default robots.txt file than the one included with the module, so I don't have to configure it on each site. I'm proposing changing the admin settings to the following:
1. A textfield with the path to a robots.txt file to use as the base (defaulting to sites/all/modules/robotstxt/robots.txt).
2. A textarea with additions to include in the resulting robots.txt. Users who want to replace the robots.txt output entirely would leave the file textfield blank and use only this field.
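As a rough sketch, the proposed settings could be expressed in the Drupal 6 Form API like this (the variable names `robotstxt_default_file` and `robotstxt_custom_lines` are my invention, not necessarily what the patch uses):

```php
<?php
// Hypothetical sketch of the proposed admin settings form (Drupal 6).
function robotstxt_admin_settings() {
  $form['robotstxt_default_file'] = array(
    '#type' => 'textfield',
    '#title' => t('Path to the base robots.txt file'),
    '#default_value' => variable_get('robotstxt_default_file', drupal_get_path('module', 'robotstxt') . '/robots.txt'),
    '#description' => t('Leave blank to build the output from the additions below only.'),
  );
  $form['robotstxt_custom_lines'] = array(
    '#type' => 'textarea',
    '#title' => t('Additional robots.txt lines'),
    '#default_value' => variable_get('robotstxt_custom_lines', ''),
  );
  return system_settings_form($form);
}
```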
Comment | File | Size | Author |
---|---|---|---|
#8 | 619404-robotstxt-D6.patch | 6.88 KB | Dave Reid |
#7 | 619404-robotstxt-D6.patch | 6.84 KB | Dave Reid |
#4 | 619404-robotstxt-D6.patch | 4.85 KB | Dave Reid |
#1 | 619404-robotstxt-D6.patch | 4.81 KB | Dave Reid |
Comments
Comment #1
Dave Reid: Patch attached for review implementing this feature, with an upgrade path for existing users as well.
So basically, we get the same functionality as before, but more configurable. I can go into a few of my sites' settings.php files and add:
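The original comment's settings.php snippet was not captured here; a per-site override would presumably look something like this (the variable name and path are illustrative assumptions, not quoted from the patch):

```php
<?php
// sites/example.com/settings.php
// Hypothetical variable name; check the patch for the actual key.
$conf['robotstxt_default_file'] = 'sites/all/robots-common.txt';
```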
Comment #2
Dave Reid: The reason I've removed the root '/robots.txt' as the default file is that we expect that file not to exist when using this module, so it doesn't make sense to use it as the default. Plus, people who have the file renamed to something like 'robots.txt.removed' can use the following in their site's settings.php:
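The snippet referenced here was not captured either; under the same assumed variable name, it would presumably point at the renamed file (a sketch, not the patch's actual code):

```php
<?php
// sites/example.com/settings.php
// Hypothetical variable name; path relative to the Drupal root assumed.
$conf['robotstxt_default_file'] = 'robots.txt.removed';
```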
Comment #3
asak: Thanks Dave, works great. I didn't check the upgrade path, only a fresh install.
Comment #4
Dave Reid: OK, this one is working now. The last patch wouldn't allow you to save an empty file path and didn't properly get the variable for the default value. I've tested this one thoroughly.
Comment #5
hass: robotstxt_get_content() is not an API function and is missing a _ prefix. Aside from that, as I told Dave some time ago, I don't like reading the robots.txt file from disk after the installation has completed. I don't want to commit this, as it overcomplicates this simple module without a benefit, except for ~100 new support requests. I may not fully understand the need for this, as every site could also have its own custom sites/example.com/modules/robotstxt/robots.txt, and otherwise there is a way to override the variable via settings.php...
Comment #6
Dave Reid: I fail to see how this would create 100 new support requests for you. The problem with your solution is that I'd have to keep a separate copy of robots.txt in each subsite, which I don't need. Also, as I've said, setting something at install time is not the Drupal way of doing things. I'll even do some benchmarks.
Comment #7
Dave Reid: Revised patch that does some filemtime() checking, making it slightly faster than the current code in CVS. It also has a very smart update function. Basically, it checks whether the user has made any changes to the robots.txt content (by checking whether each line of sites/all/modules/robotstxt/robots.txt can be found in variable_get('robotstxt', '')); if not, it can safely use the default file and does a smart job of setting the custom lines. Otherwise, it puts the entire contents of the robotstxt variable into the 'custom lines' variable and leaves the default file blank, keeping the same functionality as before.
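The filemtime() approach described above could be sketched roughly as follows (names and variable keys are my assumptions; the actual patch will differ):

```php
<?php
// Hypothetical sketch: cache the file contents and re-read from disk
// only when the file's modification time changes.
function robotstxt_get_content() {
  $file = variable_get('robotstxt_default_file', '');
  $content = variable_get('robotstxt_file_cache', '');
  if ($file && is_file($file)) {
    $mtime = filemtime($file);
    if ($mtime != variable_get('robotstxt_file_mtime', 0)) {
      // The file changed since the last request; refresh the cache.
      $content = file_get_contents($file);
      variable_set('robotstxt_file_cache', $content);
      variable_set('robotstxt_file_mtime', $mtime);
    }
  }
  return $content . "\n" . variable_get('robotstxt_custom_lines', '');
}
```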
For testing:
- Went to admin/settings/robots.txt, clicked 'Reset to defaults', and then pressed 'Save configuration'
- Disabled the hook_robotstxt module invocation (to help narrow down exactly the difference between the old and new versions)
- Caching disabled
- Tested with ab -n 500 -c 1 http://mysql.drupal6dev.local/robots.txt
I ran these tests three times each and posted the median result of each run. And yes, with the patch, this is actually faster than the current code.
Comment #8
Dave Reid: Missed a variable_del() in robotstxt_uninstall().
Comment #9
Dave Reid: In response to robotstxt_get_content() being a private function: I don't agree. This seems like a public function that any module could call if they wanted or needed to; there's nothing in the code that screams 'never call this'. As a compromise, I can rename the function in a revised patch if you want.
Comment #10
Dave Reid: So, is this good to proceed? Yea or nay?
Comment #11
hass: I need to find some time to look into this...
Comment #12
Dave Reid: @hass Do you need more reviews? Help maintaining the module? Let me know, and I'm more than willing to help.
Comment #13
hass: No thanks.
Comment #14
Dave Reid: Fair enough.
Comment #15
perarnet: I need this change too; it's hell having to install, enable, and configure this module when you're running 20+ sites on a single Drupal installation. It would be good to know whether this can be implemented in the module, so I don't need to run a forked version.
Comment #16
hass: You are still able to have a default file without the patch.
Comment #17
Dave Reid: Not really, since it sets the file content on install.
@hass: I don't like to pester anyone, but I'd just like a response on whether or not you are going to consider/implement this, since this issue is still set to 'needs review'.
Comment #18
hass: perarnet complained about something missing, but he has not understood how the module install works. He said he needs to enable the module and then configure it on all sites by hand. This is wrong! He can add a pre-configuration in the module folder, enable the module, and the config becomes his pre-configuration. His issue is thereby non-existent.
Dave - you are asking for a change that may go in if there are reviews, but there seems to be no interest, or we would have some reviews and comments here about test results.
Comment #19
Dave Reid: @hass: I appreciate the update. We'll just let it lie awaiting reviews. I apologize for pestering for an update; I hate it when that happens in my own modules' issue queues.
Comment #20
AlexisWilke: Dave,
I see that you are much more persistent than I am; I just dropped the ball on my end. Hass does not seem to understand maintenance of a multi-site install, and 20 websites isn't very many...
I noticed today that most of my sites forbid access to user/<uid>, but Google will hit those paths because they are not in the default robots.txt (the reason being that you may allow anonymous users to access them). Note: Google knows about them because of Google Analytics, which should probably be turned off on those paths, but that doesn't change the need for a correct robots.txt file.
That means many more entries in the database log that don't need to be there, and false advertisement to search engines.
This being said, I reviewed your code changes, and in doing so I noticed we could add a hook in our own modules. Cool!
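The hook mentioned here could be implemented along these lines (assuming, as my reading of the patch suggests, that the hook returns extra lines to append to the output; the module and path names are illustrative):

```php
<?php
/**
 * Implements hook_robotstxt().
 *
 * Assumed contract: return an array of lines to append to robots.txt.
 * This would address the user/<uid> crawling mentioned above.
 */
function mymodule_robotstxt() {
  return array(
    'Disallow: /user/',
  );
}
```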
Of course, I'd have one more request... a way to remove entries from the default robots.txt file... (heh!) Not too important, though.
One thing, though: the $file parameter is used in the else branch even when it's empty. Either change it to an elseif () or do this (better optimized):
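The snippet Alexis posted was not captured here; the elseif-style guard he describes might look something like this (a sketch with invented variable names, not the patch's actual code):

```php
<?php
// Sketch only: $custom and $file stand in for the patch's variables.
if (!empty($custom)) {
  $content = $custom;
}
elseif (!empty($file) && is_file($file)) {
  // Only read from disk when a non-empty path was configured.
  $content = file_get_contents($file);
}
else {
  $content = '';
}
```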
The really cool thing with this is that you can upload a robots.txt file to your files folder and point this module at that file. This means any user can do it without having to change anything in the modules folder (you still need to remove the core robots.txt file, though).
Thank you.
Alexis Wilke
P.S. One more thing: the new approach prevents you from seeing the content of the resulting txt file on the admin screen. It would be nice to add one more widget showing the content as it will be served, so that on save you can check that it looks as you intended.