Is there a way to make sure that the user/login page doesn't show up in search engine results (e.g. Google)? I am using the XML Sitemap, SEO Checklist and Metatags modules, and was hoping there would be a way to tell search engine robots not to include a page when they crawl the site.

I have found a way with the Metatags module to edit individual pages and tell robots not to include them. How can I make that apply to the user/login page (since it isn't an actual page, and I can't edit it)?

Thanks

Comments

gpk’s picture

That's what the robots.txt file in the Drupal root folder is for. A quick internet search should tell you what you need.

gpk
----
www.alexoria.co.uk

chris_huh’s picture

Ah thanks. I tried searching, but didn't really know what to search for.

I noticed the Metatags module allows you to set the default for all pages to be to not include it (then I can obviously edit each page I want on there). But I will look at the robots.txt file.

gpk’s picture

>set the default for all pages to be to not include it
Ah, when I last used Metatags (rather a long time ago) it wasn't so flexible. However, if user and user/login are the only pages you don't want indexed, then robots.txt is probably still the way to go. :)


chris_huh’s picture

I checked my robots.txt file and it says that user/login should be disallowed already. I haven't changed the robots.txt file, so it is the default file that came with Drupal. It is in the root of the site.

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

gpk’s picture

Could it be the path "user" (which reduces to the same thing as "user/login" when not logged in) that Google is indexing?
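One way to test this idea without waiting for a recrawl is to run the quoted rules through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`, which applies plain prefix matching and may differ in detail from what Google's crawler does. The host `example.com`, the helper `is_allowed` and the `PATCHED_RULES` text are illustrative assumptions, not part of the site in question.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the default rules quoted above (clean-URL user paths only).
DEFAULT_RULES = """\
User-agent: *
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
"""

# Hypothetical patch: the same rules plus entries without a trailing slash.
PATCHED_RULES = DEFAULT_RULES + "Disallow: /user/login\nDisallow: /user\n"


def is_allowed(rules, path, agent="Googlebot"):
    """Return True if `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # example.com is a placeholder host; only the path is matched against rules.
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for label, rules in (("default", DEFAULT_RULES), ("patched", PATCHED_RULES)):
        for path in ("/user", "/user/login", "/user/login/"):
            print(label, path, "allowed" if is_allowed(rules, path) else "blocked")
```

Under this parser, the default rules block only `/user/login/` with the trailing slash, so both `/user` and `/user/login` count as crawlable. Because matching is by prefix, a bare `Disallow: /user` would also cover every other path beginning with `/user`, such as individual profile pages, which may or may not be wanted.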


chris_huh’s picture

Ah, I see what you mean. Perhaps it is. I will add user to the list of disallowed paths and see what happens. Thanks.

Right, I have added that, but I suppose I have to wait a while for the search engines to notice.

gpk’s picture

Fingers crossed! You might also be able to see the URL of the page that is in the search engine index.


chris_huh’s picture

The URL in Google (the green one on the search results page) shows the full user/login path, which doesn't bode well, but it might be alright.

gpk’s picture

Oh dear! Doesn't bode well!

I wonder if any of the Google webmaster tools might shed some light on this. Or perhaps the Metatags module is putting a tag in the user/login page itself that is deemed (e.g. by Google) to override the robots.txt.
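To see which robots directives a page actually carries, its HTML can be scanned for a `<meta name="robots">` tag. The following is a minimal sketch using Python's standard `html.parser`; the class `RobotsMetaParser`, the helper `robots_directives` and the sample markup are made up for illustration and say nothing about what the Metatags module really emits. Note that a crawler can only read such a tag on pages it is allowed to fetch.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def robots_directives(html):
    """Return the list of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical markup, not taken from a real Drupal page.
    sample = '<head><meta name="robots" content="noindex, follow" /></head>'
    print(robots_directives(sample))
```

Feeding it the saved source of the login page would show whether a `noindex` or an explicit `index` directive is present.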

The only other way I can think of is to write a custom module that checks for the Googlebot user agent in hook_init() and calls drupal_access_denied() or some such. That might interfere with caching, though. You could perhaps even do the check in settings.php, which might get round the caching problems.
