Closed (fixed)
Project:
Search 404
Version:
5.x-1.x-dev
Component:
Miscellaneous
Priority:
Minor
Category:
Support request
Assigned:
Reporter:
Created:
27 Dec 2007 at 20:23 UTC
Updated:
15 Jun 2009 at 11:00 UTC
Hello,
Googlebot and other search engine bots still visit hundreds of pages that were created by a translation script, even though I disabled the script and deleted the translated pages on the server.
Do you think Search 404 is the reason the bots still visit the translated pages, causing high load on the server?
I do not use the jump feature.
Thanks.
Comments
Comment #1
wwwoliondorcom commented
Any help? I still have the same problem. Thanks.
Comment #2
vsr commented
If the translations were in a dedicated directory, you could tell bots not to go there using the robots.txt file in the document root. If you know mod_rewrite, you might be able to have it return a 410 (Gone) response, or at least a 403 (Forbidden).
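As a rough sketch of the mod_rewrite idea, assuming the deleted translations all lived under a single directory (the /translated/ path here is hypothetical) and that the rule goes in the .htaccess file in the document root:

```apache
# Assumption: every deleted translation lived under /translated/ (hypothetical path).
<IfModule mod_rewrite.c>
  RewriteEngine On
  # In a root .htaccess the pattern is matched without the leading slash.
  # [G] answers with "410 Gone", which tells crawlers the page was removed on purpose.
  RewriteRule ^translated/ - [G]
</IfModule>
```

Apache answers such requests itself, before Drupal is loaded, so this should also reduce the load each bot request causes.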
Comment #3
wwwoliondorcom commented
Yes, but Chinese bots do not respect robots.txt, so will they respect mod_rewrite? (I haven't tried it yet.)
Thanks.
Comment #4
vsr commented
mod_rewrite is part of Apache and does what the rules tell it to do. Unlike robots.txt, it is enforced by the server, so a bot cannot choose to ignore it. You are in control when you use your .htaccess file to control access. You can block by IP address or by user agent; you can do a lot. If you wanted to, you could redirect the bots from China back to their own host using your .htaccess file. If you have a directory that you no longer want accessed, you can create a .htaccess file in that directory and put "deny from all" in it. Then any attempt to access that directory will return a 403 (Forbidden) response. Just make sure that you have a 403 page, or Apache will show its own default message.
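The per-directory block described above might look like the following sketch. It assumes the server allows these directives in .htaccess (AllowOverride), and the /403.html path is hypothetical; it should point at a page outside the blocked directory.

```apache
# .htaccess placed inside the directory that should no longer be accessible.
<IfModule mod_authz_core.c>
  # Apache 2.4 and later
  Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
  # Apache 2.2 and earlier
  Order allow,deny
  Deny from all
</IfModule>
# Custom 403 page; /403.html is a hypothetical path outside this directory.
ErrorDocument 403 /403.html
```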
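If you know the IP addresses, you could do something like this in your .htaccess file or server configuration:
<Limit GET POST>
# Allow rules are evaluated first, then Deny rules; a matching Deny wins.
Order allow,deny
Allow from all
# Placeholder address; replace it with the IP you want to block.
Deny from 203.0.113.45
</Limit>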
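Blocking by user agent can be sketched in a similar way with mod_rewrite. The bot names below are hypothetical placeholders, not real crawler names; you would substitute the strings that appear in your own access log.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine On
  # "BadBot" and "EvilCrawler" are hypothetical user-agent substrings.
  # [NC] makes the match case-insensitive.
  RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilCrawler) [NC]
  # [F] answers every matching request with "403 Forbidden".
  RewriteRule ^ - [F]
</IfModule>
```

Keep in mind that a bot can send any user-agent string it likes, so this only stops crawlers that identify themselves honestly.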
http://www.biyw.com/ has a nice little .htaccess cheat sheet and primer if you are not familiar with this. I am not sure which page it is, so look in the site's sitemap. http://apache.org/ also has a lot of information about this and more. Hope this helps.
Comment #5
zyxware commented
Can you please email me a URL that the bots are still visiting (use http://drupal.org/user/222163/contact)? Search 404 returns an "Error 404 page not found" response, which should take care of removing the page from the search engines' indexes. Perhaps you should wait a little longer for the indexes to reflect the change.
Comment #6
zyxware commented
This module is working perfectly fine on zyxware.com. Unless we get the URL of the site where the problem occurs, we cannot say anything more about this issue. Also, try upgrading to the latest version of the module.
Comment #7
zyxware commented
Since there has been no activity for over a year, I am closing this issue. If the problem persists, feel free to reopen it.
Regards
Zyxware