By Amarnath-1 on
Hi,
Can you suggest how to hide an intranet site from search engines/crawlers?
I am thinking of a private site that should not be searchable or viewable by the public.
Any suggestion is very much appreciated.
Comments
What exactly do you want to
What exactly do you want to hide?
Hiding the content is easy. Using access control, do not allow non-logged-in users to access any content. You may want to let anonymous users access a front page that includes a login box.
You can ask crawlers not to go through your site using robots.txt:
http://www.robotstxt.org/wc/faq.html#what
How do I prevent robots scanning my site?
Quote:
"The quick way to prevent robots visiting your site is put these two lines into the /robots.txt file on your server:
User-agent: *
Disallow: /
"
I'm not an authority on this. I don't think that would hide your site; it just tells the robot to go away, and I don't know how foolproof that is.
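To see what those two lines mean to a crawler that does honor robots.txt, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed, the user agent string, and the example.com URLs are made up for illustration. It only shows how a well-behaved robot would read the rules, not what any particular search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# The two lines quoted from the robotstxt.org FAQ.
ROBOTS_TXT = """User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a rule-abiding robot may fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URLs are placeholders, not real crawler names or sites.
    for url in ("http://example.com/", "http://example.com/node/1"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A robot that ignores robots.txt is not affected by any of this, which is why access control is still needed for content that must stay private.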
I can tell you that using
I can tell you that access control might or might not work. This is a bit anecdotal, as it was a while back, but I had a test site that was crawled by Google regardless of the user permissions I had set.
So something to investigate for sure!
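One way to investigate is to request a protected page the way a crawler would, with no login and no cookie, and look at the status code that comes back. The sketch below is only that, a sketch: the function names and the URL are hypothetical, and it assumes the site answers anonymous requests for protected content with 401 or 403. A site that redirects to a login page instead would show up as 200 here, so the check would need adjusting.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def anonymous_status(url, opener=urlopen):
    """Return the HTTP status code an anonymous (no cookie) request receives."""
    try:
        with opener(url) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the error.
        return err.code


def looks_protected(status):
    """Assumption: protected pages answer anonymous requests with 401 or 403."""
    return status in (401, 403)


if __name__ == "__main__":
    # Placeholder URL; replace it with a page that should require a login.
    status = anonymous_status("http://example.com/node/1")
    print(status, "protected" if looks_protected(status) else "publicly readable?")
```

If a page that should be private comes back as 200 with its real content, a crawler can read it too, whatever robots.txt says.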
I run a private site and I
I run a private site and I have never seen any evidence that anyone (including a search engine) could access content without privileges. Why would a search engine be able to access your database when any other normal HTTP request needs a password and a cookie to get anything but the front page?
Thanks Arzajac & Luyendao
Thanks Arzajac & Luyendao for the useful info.
Dear Arzajac, as you know, Drupal is very search engine friendly (I do agree with this). Basically, I do not want search engines to reach my personal content and display it to the public.
From your details, I understand that with proper access rights (that is, password-protected content pages) and robots.txt, we can protect the personal site from search engines and crawlers.
Any further suggestions are welcome!
Use a robots.txt file
Use a robots.txt file in your root that disallows crawling of the directory you want hidden. Most crawlers respect it, and it is the best way to keep them from crawling the files and folders you do not want indexed by search engines.
Creating it is super easy. Save this in a text file, upload it to the root of your server directory and you're done:
User-Agent: *
Disallow: /intranet
(Replace /intranet with whatever your folder name is.)
Learn more here.
http://www.outfront.net/tutorials_02/adv_tech/robots.htm
Hope it solves your problem.
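For anyone who wants to check which paths a folder rule like this covers, here is a small sketch with Python's standard urllib.robotparser. The helper blocked_paths, the user agent, and the sample paths are made up for illustration. Note that this parser treats the Disallow value as a path prefix, so /intranet also covers paths such as /intranet-old.

```python
from urllib.robotparser import RobotFileParser

# Same rule as above, with /intranet standing in for your folder name.
ROBOTS_TXT = """User-Agent: *
Disallow: /intranet
"""


def blocked_paths(robots_txt, paths, user_agent="ExampleBot"):
    """Return the paths a rule-abiding robot would skip (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    sample = ["/intranet", "/intranet/staff", "/intranet-old", "/blog"]
    print(blocked_paths(ROBOTS_TXT, sample))
```

Writing the rule as /intranet/ with a trailing slash limits it to the contents of that folder, if the prefix match is broader than you want.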
Oh. I didn't notice that
Oh. I didn't notice that robots.txt was also mentioned above. Sorry about the redundant info.