Hello,

Googlebot and other search engine bots still visit hundreds of pages that were created by a translation script, even though I disabled the script and deleted the translated pages from the server.

Do you think it is because of Search 404 that the bots still visit the translated pages, causing high load on the server?

I do not use the jump feature.

Thanks.

Comments

wwwoliondorcom’s picture

Any help? I still have the same problem. Thanks.

vsr’s picture

If the translations were in a special directory, you could tell the bots not to go there using the robots.txt file in the root document directory. If you know mod_rewrite, you might be able to have it return a status like 410 Gone, or at least a 403 Forbidden.
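As a rough sketch of the mod_rewrite idea, assuming the translated pages all lived under a hypothetical translate/ directory (substitute whatever path the script actually used), something like this in the root .htaccess should answer 410 Gone for them:

```apache
# Hypothetical path: replace "translate/" with the directory the script used.
RewriteEngine On
# The [G] flag makes Apache answer 410 Gone for every URL under that path.
RewriteRule ^translate/ - [G]
```

On a Drupal site this rule would need to sit above the existing rewrite rules that hand requests to index.php, otherwise it is never reached.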

wwwoliondorcom’s picture

Yes, but Chinese bots do not respect robots.txt at all, so will they respect mod_rewrite? (I haven't tried it yet.)

Thanks.

vsr’s picture

mod_rewrite is part of Apache and does what the rules tell it to do, so a bot cannot choose to ignore it the way it can ignore robots.txt. You are in control when you use your .htaccess file to control access: you can block by IP address or by user agent, and a lot more. If you wanted to, you could redirect the bots from China back to their own host using your .htaccess file. If you have a directory that you do not want accessed any more, you can just create a .htaccess file in that directory and put "deny from all" in it. Then any attempt to access that directory will give a 403 Forbidden message. You just have to make sure that you have a 403 page, or Apache will spit out its own default message.

If you know the IP addresses, you could do something like this in your server configuration or .htaccess file:
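For blocking by user agent, a minimal sketch for the root .htaccess could look like the following. "ExampleBot" and /403.html are made-up names for illustration; the real user-agent string would come from your access log.

```apache
RewriteEngine On
# "ExampleBot" is a hypothetical name; match the string seen in the access log.
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
# Let the bot fetch the error page itself, otherwise the 403 page is forbidden too.
RewriteCond %{REQUEST_URI} !^/403\.html$
# [F] answers 403 Forbidden without serving the requested page.
RewriteRule ^ - [F]
# Hypothetical custom page, so Apache does not fall back to its built-in message.
ErrorDocument 403 /403.html
```

Because the rule is applied by the server, it works whether or not the bot honours robots.txt, although a bot can still get around it by changing its user-agent string.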

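# Apache 2.2 syntax; 192.0.2.123 is a placeholder, use the bot's real address
<Limit GET POST>
# With Allow,Deny the Deny line is evaluated last, so it wins for that address
Order Allow,Deny
Allow from all
Deny from 192.0.2.123
</Limit>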
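The Order, Allow and Deny directives are Apache 2.2 syntax. On Apache 2.4 and later the same idea would normally be written with Require, roughly as sketched here (192.0.2.123 is only a placeholder address):

```apache
# Apache 2.4+ (mod_authz_core): allow everyone except one address.
<RequireAll>
    Require all granted
    # Placeholder address; replace with the bot's real IP or a CIDR range.
    Require not ip 192.0.2.123
</RequireAll>
```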

Actually, http://www.biyw.com/ has a nice little .htaccess cheat sheet and primer on this, if you do not know much about it. I am not sure which page it is, so look in the site's sitemap. http://apache.org/ also has a lot of information about this and more. Hope this helps you.

zyxware’s picture

Component: Documentation » Code
Assigned: Unassigned » zyxware

Can you please email me a URL that bots are still visiting (use http://drupal.org/user/222163/contact)? Search 404 returns an "Error 404 page not found" response, which should take care of removing the pages from the index. Perhaps you should wait a little longer for the indices to reflect the change.

zyxware’s picture

Component: Code » Miscellaneous
Priority: Normal » Minor
Status: Active » Postponed (maintainer needs more info)

This module is working perfectly fine on zyxware.com. Unless we get the URL of the site where the problem is seen, we cannot say anything about this issue. Also, try upgrading to the latest version of the module.

zyxware’s picture

Status: Postponed (maintainer needs more info) » Closed (fixed)

Since there has been no activity for over a year, I am closing this issue. If the problem persists, feel free to reopen it.

Regards
Zyxware