By chris_huh on
Is there a way to make sure that the user/login page doesn't show up in search engine results (e.g. Google)? I am using the XML Sitemap, SEO Checklist and Metatags modules, and was hoping there would be a way to tell them not to include a page when the search engine robots crawl the site.
I have found a way with the Metatags module to edit individual pages and tell robots not to index them. How can I make that apply to the user/login page (since it isn't an actual page, and I can't edit it)?
Thanks
Comments
That's what the robots.txt
That's what the robots.txt file in the Drupal root folder is for. A quick internet search should tell you what you need.
gpk
----
www.alexoria.co.uk
Ah thanks. I tried
Ah thanks. I tried searching, but didn't really know what to search for.
I noticed the Metatags module allows you to set the default for all pages to be to not include it (then I can obviously edit each page I want on there). But I will look at the robots.txt file.
>set the default for all
>set the default for all pages to be to not include it
Ah, when I last used Metatags (rather a long time ago) it wasn't so flexible. However, if user and user/login are the only pages you don't want to be indexed, then robots.txt is probably still the way to go. :)
gpk
----
www.alexoria.co.uk
I checked my robots.txt file
I checked my robots.txt file and it says that user/login should be disallowed already. I haven't changed the robots.txt file, so it is the default file that came with Drupal. It is in the root of the site.
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Could it be the path "user"
Could it be the path "user" (which reduces to the same thing as "user/login" when not logged in) that Google is indexing?
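One rough way to test that idea is to run the quoted rules through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser on a reduced copy of the rules above. The helper name is_blocked is made up for illustration, and Python's parser only approximates how a real crawler such as Googlebot matches paths, so treat the result as a hint rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Reduced copy of the user-related rules quoted above.
RULES = """\
User-agent: *
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
"""


def is_blocked(path, rules=RULES):
    """Return True if the rules disallow `path` for a generic crawler.

    `is_blocked` is a hypothetical helper, not part of Drupal or Python.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("*", path)


for path in ("/user", "/user/login", "/user/login/"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

With this parser only "/user/login/" (with the trailing slash) comes out as blocked. Rules are matched as path prefixes, so "Disallow: /user/login/" covers neither "/user" nor "/user/login" without the slash.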
gpk
----
www.alexoria.co.uk
Ah, i see what you mean.
Ah, I see what you mean. Perhaps it is. I will add user to the list of disallowed paths and see what happens. Thanks
Right, I have added that, but I suppose I have to wait a while for the search engines to realise.
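While waiting, the effect of the new rule can be estimated offline. The exact line added isn't shown in this thread, so the sketch below assumes it was either "Disallow: /user" or "Disallow: /user/" and compares the two using Python's standard urllib.robotparser. The helper blocked_paths and the sample paths are hypothetical, and Python's parser is only an approximation of a real crawler.

```python
from urllib.robotparser import RobotFileParser

# Sample paths for illustration only.
PATHS = ["/user", "/user/login", "/user/password", "/user/1", "/node/1"]


def blocked_paths(disallow_rule, paths):
    """Return the subset of `paths` that a single Disallow rule blocks.

    `blocked_paths` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return [p for p in paths if not parser.can_fetch("*", p)]


print(blocked_paths("/user", PATHS))   # rule without trailing slash
print(blocked_paths("/user/", PATHS))  # rule with trailing slash
```

Under these assumptions, "/user" blocks every sample path that starts with /user (including profile pages such as /user/1), while "/user/" leaves the bare "/user" page crawlable.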
Fingers crossed! You might
Fingers crossed! You might also be able to check the exact URL of the page that is in the search engine index.
gpk
----
www.alexoria.co.uk
The URL in google (the green
The URL in Google (the green one on the search results page) shows the full users/login page, which doesn't bode well, but it might be all right.
Oh dear! Doesn't bode
Oh dear! Doesn't bode well!
I wonder if any of the Google webmaster tools might shed some light on this. Or if the metatags module is putting a tag in the user/login page itself that is deemed (e.g. by Google) to override the robots.txt.
Only other way I can think of is to write a custom module to check for the Googlebot user agent in hook_init() and do a drupal_access_denied() or somesuch. Might interfere with caching though. You could even perhaps do the check in settings.php, which might get round caching problems.
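The decision such a module would make is roughly the following. This is only a sketch of the logic in Python, not Drupal code: should_deny and BLOCKED_PATHS are made-up names, and in a real module the check would sit in hook_init() and call drupal_access_denied() when it returns true.

```python
# Paths that should be refused to the crawler (assumed from this thread).
BLOCKED_PATHS = {"user", "user/login"}


def should_deny(user_agent, path):
    """Return True when a Googlebot request targets one of the blocked paths.

    Hypothetical helper: matches the "Googlebot" token case-insensitively
    and compares the path with leading/trailing slashes removed.
    """
    is_googlebot = "googlebot" in user_agent.lower()
    return is_googlebot and path.strip("/") in BLOCKED_PATHS


bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(should_deny(bot, "/user/login"))
print(should_deny("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0", "/user/login"))
```

Note that this only covers user agents containing "Googlebot"; other search engines would need their own tokens.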
gpk
----
www.alexoria.co.uk