Google Desktop doesn't seem to "get" the URLs of the atom feed. It thinks they are absolute when they are relative, and it keeps adding "atom" to the existing "atom" URL. For example, I see
xx.130.111.103 - - [11/Dec/2005:04:53:49 +0100] "GET /myblog/node/atom/atom/atom/atom/atom/atom/atom/ .... etc... /atom/feed HTTP/1.1" 200 48062 "-" "Mozilla/4.0 (compatible; Google Desktop)"
in my log files. The next hit from this IP address will likely have an extra "atom" in the URL. And since Drupal returns a "200" status code, Google Desktop doesn't know this is an invalid URL. In fact, Drupal serves the standard node homepage; see for example willy.boerland.com/myblog/node/atom/atom/atom/feed
How should I solve this? Is it atom related? Should I add a rewrite in .htaccess from atom/atom/... to /atom/feed?
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | recursive_atom.module.patch | 1.36 KB | deekayen |
Comments
Comment #1
deekayen commented
It's not Google Desktop specific. I have the same problems when I try to crawl my site to spell check content. I don't see anything in the atom_menu() function that looks out of the ordinary. Perhaps the atom module callback somehow conflicts in core with the core feed.
Comment #2
FlemmingLeer commented
I think it's just a bug in Firefox 1.5.
I can bring up the recurring URL just by pressing "sort by latest"
in ../admin/logs/referrers
1st time it's this:
admin/logs/referrers?sort=asc&order=Seneste+bes%C3%B8g
2nd time it's this:
admin/logs/referrers?sort=asc&order=Seneste+bes%C3%B8g&order=Seneste%20bes%C3%B8g
3rd time:
admin/logs/referrers?sort=asc&order=Seneste+bes%C3%B8g&order=Seneste%20bes%C3%B8g&order=Seneste%20bes%C3%B8g
And that's just by pressing latest visitors.
So guess what an atom feed will do all by itself :/
I had some 210 MB traffic on that account last month all garbled up.
Comment #3
deekayen commented
It's not just a Firefox thing. I tried to spider http://deekayen.net/ with InSpyder InSite (checks for spelling errors and dead links) and I had to stop the spider because it kept getting caught in atom recursion.
Comment #4
killes@www.drop.org commented
Apparently many of our atom browser friends don't understand Drupal's base href tag. Alert the respective authors to this bug.
Comment #5
bertboerland commented
Sorry, marking this "won't fix, it is not our problem" is a bit too technocratic a way of answering this. It IS a problem and it CAN be fixed. The fact that the source of the problem is not on our side of the stick doesn't matter.
My system just went berserk because a bot (Gigabot) went berserk, lost in this recursive atom URL.
The load went to 50+. To give you an example
and
It is not just Google Desktop; it is 60% of all the bots and crawlers. So even if it is not "our" code that is to blame, we need to solve this. For the moment I have disabled my atom module...
Comment #6
deekayen commented
What about a preg_match in the menu hook to match recursive atom URLs? Anything beyond atom/feed, /blog/atom/feed, or /blog/#/atom/feed returns drupal_not_found().
Comment #7
deekayen commented
What about this (against HEAD)?
Comment #8
deekayen commented
I committed this patch and made the atom link tag in the HTML head an absolute URL in DRUPAL-4-6, DRUPAL-4-7, and HEAD. Anything that doesn't follow the absolute URLs and continues to recurse through 404 pages I would say is a feed reader bug, not the atom module.
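For readers following along, here is a minimal sketch of the kind of preg_match check proposed in comment #6. It is not the committed patch (recursive_atom.module.patch), and the helper name atom_is_valid_feed_path is hypothetical. It assumes the path is checked the way Drupal exposes it in $_GET['q'], without a leading slash, and that the "#" in blog/#/atom/feed stands for a numeric user ID.

```php
<?php
/**
 * Hypothetical helper (not part of the atom module): returns TRUE only for
 * the three feed paths named in comment #6.
 */
function atom_is_valid_feed_path($path) {
  // Optional "blog/" or "blog/<numeric id>/" prefix, then exactly "atom/feed".
  return preg_match('#^(blog/(\d+/)?)?atom/feed$#', $path) === 1;
}

// Sample paths, written as they would appear in $_GET['q'].
$paths = array(
  'atom/feed',
  'blog/atom/feed',
  'blog/1/atom/feed',
  'node/atom/atom/atom/feed',
  'blog/1/atom/feed/atom',
);

foreach ($paths as $path) {
  // In a module, the FALSE branch is where drupal_not_found() would be called.
  echo $path, ' => ', atom_is_valid_feed_path($path) ? 'serve feed' : '404', "\n";
}
```

Anchoring the pattern at both ends is the point of the sketch: any extra "atom" segment before or after the feed path fails the match, so a recursing crawler would get a 404 instead of a 200 page containing yet another relative link.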
Comment #9
deekayen commented
When I put the module on a live 4.6 production site, I got
array_merge() [function.array-merge]: Argument #1 is not an array in /users/home/deekayen/web/public/includes/menu.inc on line 351.
when visiting blog/1/atom/feed/atom. It doesn't make sense to me since blog/1/atom/feed, blog/atom/feed/atom and atom/feed/atom don't have the error. Could use some advice from someone with more knowledge about core on this.
Comment #10
bertboerland commented
Please open a new support request or bug report, and/or download the latest code, since this one is solved.
Comment #11
(not verified) commented