By rajesh190888 on
Hi,
It was working fine before, but suddenly Google started indexing "www.mysite.com/comment/reply/4432" on my site instead of "www.mysite.com/actual-path".
My robots.txt clearly contains:
User-agent: *
Crawl-delay: 10
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
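As a sanity check, the rules above can be tested locally with Python's standard-library urllib.robotparser; this is just a quick sketch using the placeholder hostname from the post:

```python
from urllib import robotparser

# The robots.txt rules quoted above, as a string.
rules = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The comment-reply URL is indeed disallowed for all agents...
print(rp.can_fetch("*", "http://www.mysite.com/comment/reply/4432"))  # False
# ...while a normal page is still allowed.
print(rp.can_fetch("*", "http://www.mysite.com/actual-path"))  # True
```

So the rules themselves are valid; as the answers below explain, the problem is that Disallow only stops crawling, not indexing.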
But Google is still indexing those URLs, so what should I do?
Comments
The robots.txt tells Google
The robots.txt tells Google not to crawl those URLs. But the node pages are probably outputting links to those comment URLs, so Google will still notice that they exist and put them in the index despite not visiting them. Also see my answer to this here.
So basically, use a meta robots tag "noindex, follow" on the /comment/ paths instead of relying on robots.txt. You can do this with a bit of custom code as in the linked answer, or use the Metatag module with Context. Note that you'll have to allow the /comment/ URLs again in your robots.txt; otherwise the bot can't see your noindex tag.
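Whichever way it is wired up (custom code or Metatag + Context), the rendered head of a /comment/reply/ page should then contain something like:

```html
<meta name="robots" content="noindex, follow">
```

With this tag in place (and the path allowed again in robots.txt), Google can crawl the page, see the tag, and drop it from the index while still following the links on it.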
Yet another approach is to return an HTTP 410 (Gone) status to search bots only, as in this answer (see the Wikipedia entry on the 410 status). So technically this will work, but I don't know whether there are SEO drawbacks to pretending the resource is removed when it isn't actually removed for normal visitors.
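The bot-only 410 idea can be sketched in a few lines; the helper name, the bot list, and the user-agent sniffing below are all illustrative assumptions (in Drupal 7 this logic would live in something like hook_boot), not the linked answer's exact code:

```python
def status_for(path: str, user_agent: str) -> int:
    """Return the HTTP status to serve: 410 (Gone) for search bots
    requesting comment-reply URLs, 200 for everyone else.

    Hypothetical sketch; matching bots by User-Agent substring is
    crude and can be spoofed.
    """
    bots = ("googlebot", "bingbot", "slurp")
    is_bot = any(b in user_agent.lower() for b in bots)
    if is_bot and path.startswith("/comment/reply/"):
        return 410  # tell the bot the resource is gone
    return 200      # normal visitors see the page as usual

print(status_for("/comment/reply/4432", "Googlebot/2.1"))  # 410
print(status_for("/comment/reply/4432", "Mozilla/5.0"))    # 200
```

Regular visitors (and bots on any other path) still get a 200, which is exactly the "pretending it's removed" trade-off mentioned above.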
Problem is solved. I just put
Problem is solved.
I just put Disallow: /comment/* in robots.txt.