I have now noticed this on my third FeedAPI install, across a couple of versions. I am not sure whether this is related to FeedAPI at all:
Links from Drupal to the original articles or to the original site produce a "Stopped" message in Firefox and do nothing in other browsers (I haven't tested extensively). For example:
http://news.google.com/news/url?sa=T&ct=us/0-0&fd=R&url=http://gigaom.co...
Copying the URL and pasting it into the browser's address bar results in the same behaviour.
Using the last part of the URL works fine: http://gigaom.com/2007/10/26/productivity-goes-social-with-jive/&cid=0&e...
Comments
Comment #1
aron novak commented: If I do this, I see strange things. The first few times I got redirected ("301 Moved temporary"). After a few tries I got 403 Forbidden.
It may be a bug in news.google.com; I think it's definitely not a bug in FeedAPI.
Comment #2
alex_b commented: Is news.google.com starting to protect its RSS service from aggregators?
Comment #3
alex_b commented: ... thanks for checking this, btw. Alex
Comment #4
Jo Wouters commented: Yes, it looks like Google filters on the user agent:
wget "http://news.google.com/news?hl=en&ned=us&ie=UTF-8&q=drupal&output=rss"
gives a
wget --user-agent="testing" "http://news.google.com/news?hl=en&ned=us&ie=UTF-8&q=drupal&output=rss"
gives:
I did not find any information about this online, though: not in the Google News Terms of Use, nor in any articles describing it.
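To repeat the user-agent comparison above outside of wget, here is a minimal Python sketch using only the standard library. It sends the same feed request with an explicit User-Agent header and reports the HTTP status. The helper names `build_feed_request` and `probe_status` are hypothetical, the feed URL is the one from this thread, and Google's behaviour today may well differ from what was observed here.

```python
import urllib.error
import urllib.request

# Feed URL taken from the wget commands above.
FEED_URL = "http://news.google.com/news?hl=en&ned=us&ie=UTF-8&q=drupal&output=rss"


def build_feed_request(url, user_agent):
    """Build a GET request carrying an explicit User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server answers with for this user agent.

    urlopen raises HTTPError for 4xx/5xx responses (e.g. 403 Forbidden),
    so the status code is read from the exception in that case.
    """
    request = build_feed_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Usage: compare `probe_status(FEED_URL, "Wget/1.10.2")` with `probe_status(FEED_URL, "testing")`; the wget version string there is only an illustrative assumption.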
Comment #5
alex_b commented: Hi Jo,
thanks for checking this out... strange. It really looks like Google is starting to build walls. One way around this would be to extract the article's target URL from the news.google.com URL. That would work at least as long as Google News keeps embedding the original URL.
Any other ideas?
Alex
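The extraction proposed above amounts to reading the `url` query parameter of the news.google.com redirect link. A minimal Python sketch with the standard `urllib.parse` module follows; `extract_target_url` is a hypothetical name, and the sample link in the test is made up in the same shape as the one at the top of this thread. Note that the next comment argues this approach conflicts with the Google News terms of use. Also, because the target URL in these links is not percent-encoded, a target URL that itself contains `&` would be cut off at that point.

```python
from urllib.parse import parse_qs, urlsplit


def extract_target_url(google_news_url):
    """Return the value of the `url` query parameter, or None if it is absent."""
    query = urlsplit(google_news_url).query
    # parse_qs splits on '&' and maps each name to a list of values.
    values = parse_qs(query).get("url")
    return values[0] if values else None
```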
Comment #6
Jo Wouters commented: Alex,
That would not be a solution:
1) What I tested was fetching the RSS feed from news.google, and even that didn't work because they block the wget user agent (so they block both the RSS feed itself and the link to the original article).
2) Extracting the target URL would violate their terms of use (http://www.google.com/support/news/bin/answer.py?answer=59255&hl=en), which require you to "include a link to the Google News cluster of related articles for each news item, using the link provided in the Google News feed."
I think the right solution would be to use a user agent that is accepted by Google News. They must have a valid reason to block these kinds of requests.
I posted a question in the Google News Help group (http://groups.google.com/group/news-HelpUsers/browse_thread/thread/30315...)
Btw, blogsearch.google still seems to accept requests from wget (without a special user agent).
Comment #7
alex_b commented: Great that you posted this question to Google. I am curious to see their response...
Comment #8
aron novak commented: In my opinion, FeedAPI should not include ugly hacks for individual RSS publishers. If a site really needs to process such awkward feeds, that should be done by a separate parser.
Comment #9
AntiNSA commented: I am looking into how to parse Google feeds correctly... if anyone knows, I'd appreciate some leads...
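As a starting point for the question above, here is a minimal sketch that reads the title and link of each item from an RSS 2.0 document using Python's standard `xml.etree.ElementTree`. It assumes the feed XML has already been downloaded; `parse_rss_items` is a hypothetical helper name. Judging from this thread, the links in a Google News feed are news.google.com redirect URLs like the one at the top, not direct article links. For untrusted input, the Python documentation recommends a hardened parser such as `defusedxml`.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of {"title", "link"} dicts, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    # root.iter("item") walks all descendants, so it finds items under <channel>.
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items
```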