Hi webmasters,

I've noticed that Google is indexing the RSS feeds for issue queues. See the following search, for example:

http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen...

If you click on the first link, you are taken to a raw RSS feed, which is clearly not the intended result. This happens quite often: I have to check my Drupal.org search results carefully before I click, because many of them are RSS feeds.

Do you think you could add a rule to robots.txt to stop Google from spidering these feeds?

Thanks
Mark

Comments

mikey_p

Seems like it's more of a Google issue, to be honest. I could see cases where someone wants to search for RSS feeds in Google, but Google needs to do a better job of showing users what they are looking for.

A better solution may be to find a way for Google to know that a URL is an RSS feed. For example, make sure that all RSS feeds have an alias ending in rss.xml, or at least .xml, so that you could add "-filetype:xml" to your search to exclude them.
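As a rough sketch of that search trick, the snippet below builds a Google search URL with "-filetype:xml" appended. The helper name `search_url` and the example term are made up for illustration, and whether Google actually classifies the feeds as XML files depends on the aliasing described above, so treat this as an assumption rather than a guaranteed filter.

```python
from urllib.parse import urlencode

def search_url(terms: str, exclude_filetype: str = "xml") -> str:
    """Build a drupal.org-restricted Google query that excludes one file type.

    Hypothetical helper: it only assembles the query string; it does not
    check that Google honours the filter for feed URLs.
    """
    query = f"site:drupal.org {terms} -filetype:{exclude_filetype}"
    return "http://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    # "views" is just an example search term.
    print(search_url("views"))
```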

Mark Theunissen

Ok thanks ... will try some other approaches.

Fintan

From Google on this issue

" As a webmaster, you may have been concerned about your RSS/Atom feeds crowding out their associated HTML pages in Google's search results. By serving feeds, we could cause a poor user experience:

1. Feeds increase the likelihood that users see duplicate search results.
2. Users clicking on a feed may miss valuable content available only in the HTML page.

To address these concerns, we prevent feeds from being returned in Google's search results, with the exception of podcasts (feeds with multimedia enclosures). "

See full details here: http://googlewebmastercentral.blogspot.com/2007/12/taking-feeds-out-of-o...

Seems to me that they are already addressing this issue, and it's one they accept they need to fix.

Mark Theunissen

Ok cool, but the problem is that whatever they're doing doesn't work. See the following search:

http://www.google.co.za/search?q=inurl%3Arss%3Fprojects+site%3Adrupal.or...

I think Google is missing the fact that Drupal.org URLs containing /rss?projects= are RSS feeds.

Any way to tell Google this is the case?

Fintan

Options are:

Raise an issue here: http://groups.google.com/group/Google_Webmaster_Help-Indexing/topics (but don't hold your breath!)

Use the webmaster console to remove the RSS URLs, then add a noindex directive to the feeds from here on and/or add a robots.txt exclusion for those URLs.
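To make the robots.txt option concrete, here is a minimal sketch that checks a candidate rule with Python's standard `urllib.robotparser` before deploying it. The `/project/issues/rss` prefix, the two sample URLs, and the helper name `is_crawlable` are assumptions for illustration only; they are not taken from drupal.org's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: the /project/issues/rss prefix is an assumption about
# where the issue-queue feeds live, not drupal.org's actual configuration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /project/issues/rss
"""

def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in (
        "http://drupal.org/project/issues/rss?projects=123",  # made-up feed URL
        "http://drupal.org/project/issues/drupal",            # made-up HTML page
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

For the noindex half of the suggestion, note that feeds are XML and cannot carry an HTML meta tag; the usual route is an `X-Robots-Tag: noindex` response header, and a crawler can only see that header on URLs that robots.txt still lets it fetch.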

Mark Theunissen

Status: Active » Closed (fixed)