Google Desktop doesn't seem to "get" the URLs of the atom feed: it thinks they are absolute when they are relative, and it keeps adding "atom" to the existing "atom" URL. For example, I see

xx.130.111.103 - - [11/Dec/2005:04:53:49 +0100] "GET /myblog/node/atom/atom/atom/atom/atom/atom/atom/ .... etc... /atom/feed HTTP/1.1" 200 48062 "-" "Mozilla/4.0 (compatible; Google Desktop)"

in my log files. The next hit from this IP address will likely have an extra "atom" in the URL. And since Drupal returns a "200" status code, Google Desktop doesn't know that this is an invalid URL. In fact, Drupal serves the standard node homepage; see for example willy.boerland.com/myblog/node/atom/atom/atom/feed

How should I solve this? Is it atom related? Should I add a rewrite in .htaccess from atom/atom/... to /atom/feed?
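
To make the rewrite idea concrete, here is a minimal sketch in Python (not an actual .htaccess rule, and not part of the atom module) of the path normalization such a rewrite would perform: any run of repeated "atom" segments is collapsed to a single one. The function name `collapse_atom_run` is made up for illustration.

```python
import re

# Two or more consecutive "/atom" segments, followed by "/" or end of path.
# The lookahead keeps segments such as "/atomic" from being touched.
_REPEATED_ATOM = re.compile(r"(?:/atom){2,}(?=/|$)")


def collapse_atom_run(path: str) -> str:
    """Collapse /atom/atom/.../atom into a single /atom (hypothetical helper)."""
    return _REPEATED_ATOM.sub("/atom", path)


if __name__ == "__main__":
    print(collapse_atom_run("/myblog/node/atom/atom/atom/feed"))
```

A rewrite like this only hides the symptom: the crawler still requests the bogus URL first, so answering with a 404 or fixing the relative link may be the better cure.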

Comment | File | Size | Author
#7 | recursive_atom.module.patch | 1.36 KB | deekayen

Comments

deekayen’s picture

It's not Google desktop specific. I have the same problems when I try to crawl my site to spell check content. I don't see anything in the atom_menu() function that looks out of the ordinary. Perhaps the atom module callback somehow conflicts in core with the core feed.

FlemmingLeer’s picture

I think it's just a bug in Firefox 1.5.

I can bring the recurring URL forth just by pressing "sort by latest" in ../admin/logs/referrers

1st time it's this:
admin/logs/referrers?sort=asc&order=Seneste+bes%C3%B8g

2nd time it's this:
admin/logs/referrers?sort=asc&order=Seneste+bes%C3%B8g&order=Seneste%20bes%C3%B8g

3rd time:
admin/logs/referrers?sort=asc&order=Seneste+bes%C3%B8g&order=Seneste%20bes%C3%B8g&order=Seneste%20bes%C3%B8g

And that's just from pressing "latest visitors".

So guess what an atom feed will do all by itself :/

I had some 210 MB of traffic on that account last month, all garbled up.

deekayen’s picture

It's not just a Firefox thing. I tried to spider http://deekayen.net/ with InSpyder InSite (checks for spelling errors and dead links) and I had to stop the spider because it kept getting caught in atom recursion.

killes@www.drop.org’s picture

Status: Active » Closed (won't fix)

Apparently many of our atom browser friends don't understand Drupal's base href tag. Alert the respective authors to this bug.
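
For anyone wondering how ignoring the base href could produce the recursion, here is a small Python sketch. The host `example.com`, the base href value and the relative href `atom/feed` are assumptions for illustration, not values taken from the actual module output. A client that honors the base always lands on the same feed URL; a client that resolves the link against the current page URL gains one extra "atom" per hop.

```python
from urllib.parse import urljoin

# Hypothetical values, chosen only to illustrate the mechanism.
BASE_HREF = "http://example.com/myblog/"   # what <base href="..."> would declare
FEED_HREF = "atom/feed"                    # relative href of the feed link


def resolve(page_url: str, href: str, honor_base: bool) -> str:
    """Resolve href against the base href, or against the page URL if the
    client ignores the base tag."""
    base = BASE_HREF if honor_base else page_url
    return urljoin(base, href)


def crawl(start_url: str, hops: int, honor_base: bool) -> list:
    """Follow the feed link repeatedly, recording each URL reached."""
    urls = []
    url = start_url
    for _ in range(hops):
        url = resolve(url, FEED_HREF, honor_base)
        urls.append(url)
    return urls


if __name__ == "__main__":
    start = "http://example.com/myblog/node/atom/feed"
    for u in crawl(start, 3, honor_base=False):
        print(u)
```

Since Drupal answers each of those longer URLs with a 200 and a normal page, the ignoring client never gets a signal to stop.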

bertboerland’s picture

Status: Closed (won't fix) » Active

Sorry, marking this "won't fix, it is not our problem" is a bit too technocratic a way of answering this. It IS a problem and it CAN be fixed. The fact that the source of the problem is not on our side of the stick doesn't matter.

My system just went berserk because a bot (Gigabot) went berserk, lost in this recursive atom URL.
The load went to 50+. To give you an example:

-bash-3.00# tail -1000 /var/log/httpd/access_log | grep "atom/atom" | wc -l
453

and

17:10:01           14       157     20.76     12.21      9.09
17:20:04           23       128     11.62      7.80      7.89
17:30:44           24       177     26.33     18.49     12.56
17:40:24           33       182     28.09     25.47     18.94
17:50:01           25       211     25.04     22.14     19.89
18:05:12           17       309     39.65     39.89     32.99
18:11:15            2       303     63.15     52.27     39.47

It is not just Google Desktop, it is 60% of all the bots and crawlers. So even if "our" code is not to blame, we need to solve this. For the moment I have disabled my atom module...

deekayen’s picture

What about a preg_match in the menu hook to match recursive atom URLs? Anything beyond atom/feed, /blog/atom/feed, or /blog/#/atom/feed returns drupal_not_found().
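
A rough sketch of that allowlist idea, written in Python rather than as the actual PHP menu hook. It assumes the "#" in /blog/#/atom/feed stands for a numeric user ID; the names `ALLOWED_FEED_PATHS`, `is_valid_feed_path` and `status_for` are invented for the example. In the module, the "not found" branch would correspond to calling drupal_not_found().

```python
import re

# Only these shapes are accepted: atom/feed, blog/atom/feed, blog/<digits>/atom/feed.
# Assumption: the "#" placeholder in the thread means a numeric user ID.
ALLOWED_FEED_PATHS = re.compile(r"(?:blog/(?:\d+/)?)?atom/feed")


def is_valid_feed_path(path: str) -> bool:
    """True only if the whole path is one of the three known feed paths."""
    return ALLOWED_FEED_PATHS.fullmatch(path.strip("/")) is not None


def status_for(path: str) -> int:
    """HTTP status a handler could return: 200 for a real feed, 404 otherwise."""
    return 200 if is_valid_feed_path(path) else 404


if __name__ == "__main__":
    for p in ("atom/feed", "blog/1/atom/feed", "node/atom/atom/atom/feed"):
        print(p, status_for(p))
```

Matching the full path, rather than searching for "atom/feed" somewhere inside it, is what makes the recursive variants fall through to the 404.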

deekayen’s picture

Status: Active » Needs review
Status: new
File size: 1.36 KB

What about this (against HEAD)?

deekayen’s picture

Status: Needs review » Fixed

I committed this patch and made the atom link tag in the HTML head an absolute URL in DRUPAL-4-6, DRUPAL-4-7, and HEAD. Anything that doesn't follow the absolute URLs and continues to recurse through 404 pages I would say is a feed reader bug, not the atom module.

deekayen’s picture

Priority: Normal » Minor
Status: Fixed » Active

When I put the module on a live 4.6 production site, I got

array_merge() [function.array-merge]: Argument #1 is not an array in /users/home/deekayen/web/public/includes/menu.inc on line 351.

when visiting blog/1/atom/feed/atom. It doesn't make sense to me since blog/1/atom/feed, blog/atom/feed/atom and atom/feed/atom don't have the error. Could use some advice from someone with more knowledge about core on this.

bertboerland’s picture

Priority: Minor » Normal
Status: Active » Fixed

Please open a new support request or bug report, and/or download the latest code, since this one is solved.

Anonymous’s picture

Status: Fixed » Closed (fixed)