Drupal.org

Drupal.org aggregator stores news posts broken, Drupal Planet broken

Project:Drupal.org infrastructure
Component:Other
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

50% of aggregated items look weird on the http://drupal.org/planet page.
A post from Dries is also displayed incorrectly.

Attachment: planet_item.png (11.55 KB)

Comments

#1

That's odd...

#2

I have seen this before on posts, and it fixed itself after some time. All posts paging back seem to be affected, so it looks like a display issue. It looks like the HTML tag markers are removed, but the tag names and attribute values themselves are kept. I've seen this multiple times before but I don't know what causes it. Whenever it happened, it had solved itself by the time I raised it. Looks like we have a more lasting occurrence now, so maybe we will have the opportunity to debug. An example:

pimg src=http://www.glennburks.com/sites/default/files/planet-earth.jpg alt=planet-earth.jpg border=0 width=100 height=100 align=right / p
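For anyone who wants to scan stored items for this symptom, here is a rough sketch of a detector. It is only a heuristic, not anything from the aggregator module: the function name looks_stripped and the pattern are made up for illustration, and it only looks for "img src=" or "a href=" that is not preceded by an opening angle bracket, so false positives are possible.

```python
import re

# Hypothetical heuristic: "img src=" or "a href=" that is NOT directly
# preceded by "<" suggests the tag markers were stripped from the markup.
_ORPHAN_ATTR = re.compile(r"(?<!<)(?:img\s+src|a\s+href)=")


def looks_stripped(text: str) -> bool:
    """Return True if the text shows tag names without their angle brackets."""
    return _ORPHAN_ATTR.search(text) is not None


if __name__ == "__main__":
    broken = "pimg src=http://example.com/x.jpg alt=x border=0 / p"
    intact = '<p><img src="http://example.com/x.jpg" alt="x" border="0" /></p>'
    print(looks_stripped(broken), looks_stripped(intact))
```

Running something like this over the stored descriptions would at least show which feeds are currently affected.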

#3

Just to note this problem can be seen on RSS feeds as well as on site.

Sorry, no ideas on a solution.

#4

Looks like I got beaten to reporting this. But here is an additional data point.
Unlike Gabor, when I noticed the problem yesterday (and this is still true now), it was not impacting all entries being displayed. The last mangled entry currently is Bonneville, and the next one comes out with correct formatting.

Upcoming talk: Totally Rocking Your Drupal Development Environment

Sacha Chua - January 21, 2009 - 16:08

On February 12, at 12 EST, I'll be giving a teleconference presentation to the IBM Drupal Users Group. =) It's internal-only, but I wanted to post it here because I often need to look up my abstracts and bios. The abstract is the same as the talk I submitted to DrupalCon09 (Totally Rocking Your Development Environment), but I'll add some more IBM-specific tips.

Abstract:

#5

Looks like the formatting issue I've seen on some PIFR comments, e.g. Failed: 8789 passes, 53 fails, 8 exceptions a href=http://testing.drupal.org/pifr/file/1/file_30520_9.patchDetailed results/a

#6

Any chance the allowed tags got corrupted?

Here's what's listed at http://drupal.org/admin/content/aggregator/settings : http://drupalbin.com/5209

Alternately, if I manually go and refresh the individual feeds, the formatting is corrected. I tried this yesterday with a few feeds, but couldn't get Bert's post cleaned up. Today I just did this for Tao's post and it cleaned up.

Suggestions?

#7

I am going to cite some examples. Bert's feed: http://willy.boerland.com/myblog/taxonomy/term/60/0/feed

http://willy.boerland.com/myblog/ahold_using_drupal is stripped of tags. If I remove all of Bert's feed items and update the feed we see there's no change in the formatting on planet.

#8

Refreshing the http://www.digett.com/taxonomy/term/9/0/feed feed has no impact.

But when I remove all the items from the pingVision feed, http://pingv.com/taxonomy/term/92/0/feed, and update the feed, the formatting on Drupal Planet is just fine.

#9

I removed the items in John Forsythe's feed, http://blamcast.net/articles/drupal/feed, and updated it. Then that item worked.

#10

There were always some posts on the planet which were unformatted, but mostly from the same person. I think nodes with a special filter (maybe the Markdown filter) were always corrupt.
Here is a comment about the wrong format: http://rocktreesky.com/nice-menus-getting-nicer#comment-573

But as I remember, posts from willy.boerland.com were OK on the planet.

#11

Yes, mine have been messed up a few times (from rocktreesky.com) but I don't use any funky formats, only Filtered or Full HTML.

#12

As reported by Jamie Holly on the development mailing list, both those issues probably come from a bug in libxml2:

http://bugs.php.net/bug.php?id=45996
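To make the symptom concrete, here is a small simulation. It does not reproduce the libxml2 bug itself; it assumes, based only on the samples pasted in this thread, that the entities for angle brackets and quotes get dropped from the feed's character data. The feed item and the function names are invented for the example, and Python's standard XML parser stands in for the PHP one.

```python
import re
import xml.etree.ElementTree as ET

# Invented feed item: the HTML body is entity-escaped inside <description>,
# which is how feeds commonly carry markup.
FEED_ITEM = (
    "<item><description>"
    "&lt;p&gt;&lt;a href=&quot;http://example.com/&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;"
    "</description></item>"
)


def parse_description(xml_text: str) -> str:
    """Parse the item with a healthy parser; entities decode to real markup."""
    return ET.fromstring(xml_text).findtext("description")


def simulate_dropped_entities(xml_text: str) -> str:
    """Simulation only: delete the entities before parsing (assumed behaviour)."""
    damaged = re.sub(r"&(?:lt|gt|quot);", "", xml_text)
    return ET.fromstring(damaged).findtext("description")


if __name__ == "__main__":
    print(parse_description(FEED_ITEM))
    print(simulate_dropped_entities(FEED_ITEM))
```

The second result has the same shape as the broken items quoted in this issue: tag names and attribute values survive, the brackets and quotes do not.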

#13

#2 I'd like to agree with Gabor's assessment that this is a display issue. [edit: But we] won't be able to tell for sure what's going on without looking at the DB and at code, otherwise we're condemned to speculation. Who's got access and who's working on it?

#9 / amazon: did you try clearing the caches?

#14

Title:Planet: Missing input filter?» Drupal.org aggregator stores news posts broken, Drupal Planet broken
Project:Drupal.org webmasters» Drupal.org infrastructure

I've looked into the database table for aggregator_item, and it already has broken data:

pSo eventually people started to experiment building new tools on top of project issues to help manage project planning, overviews and checklists. Here is a hopefully comprehensive list of what tools people built on top:/p
pa href=http://hojtsy.hu/blog/2009-jan-23/selfmade-project-planning-tools-drupalorg target=_blankread more/a/p

Retitling for what looks like a bug in the system setup or a Drupal 5 core bug. I ran a diff on the drupal.org aggregator module code, and it showed no changes made to that module, so it looks like a bug in the infrastructure. Moving to that queue.

#15

I'm pretty sure most of the problem was because of the corrupted tags setting. See #6 above. Once those were fixed, we were able to fix (by delete/add) feeds that had stored bad data. It does need more investigation.

#16

Well, I've reviewed Kieran's "broken allowed tags list", but I didn't see what is broken with that. Now that the "broken setup" is fixed, resyncing the sources should all be fine, no?

I've tried deleting items and reupdating items from Dries, Ryan Szrama, Tim Millwood and myself. It worked right away for Ryan and myself, but for Dries and Tim, the first attempts ended up with the same broken items. For both, the second attempts ended up with good items.

To me this all sounds like it depends on which web-head (web server instance) the request is fulfilled on, since there are buggy instances and good instances as far as I can see. I would guess some system upgrade/tweak went on on one of them. This is not a definitive analysis, of course.

#17

Actually, removing items is not required. It is "easy" to unbreak feeds by running update on them if the update runs on the right web-head. I've fixed pingVision and Tao Starbow this way. It is equally easy to break feeds though. When I re-run the update for Ronan Berder multiple times, it sometimes ends up broken and sometimes ends up right. So whatever feeds we just happened to fix on the web UI will easily become broken again on the next round of automated cron runs, if that runs on a broken machine.
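A toy model of why manual fixes don't stick, under the assumptions stated here: each update overwrites the stored item with whatever the handling web-head produced, one head decodes entities correctly, and the other drops them. The head names head-a and head-b and all functions are hypothetical, not the real drupal.org setup.

```python
from typing import Callable, Dict, List

RAW = "&lt;p&gt;Hello&lt;/p&gt;"  # escaped HTML as it would arrive in a feed


def good_head(raw: str) -> str:
    """Healthy web-head: entities are decoded into real markup."""
    return raw.replace("&lt;", "<").replace("&gt;", ">")


def broken_head(raw: str) -> str:
    """Assumed faulty web-head: the entities are silently dropped."""
    return raw.replace("&lt;", "").replace("&gt;", "")


# Hypothetical host names, for illustration only.
HEADS: Dict[str, Callable[[str], str]] = {
    "head-a": good_head,
    "head-b": broken_head,
}


def run_updates(schedule: List[str]) -> List[str]:
    """Return the stored item after each update; the last update always wins."""
    history = []
    for name in schedule:
        stored = HEADS[name](RAW)  # each update overwrites the stored item
        history.append(stored)
    return history


if __name__ == "__main__":
    # A manual fix on head-a is undone as soon as cron lands on head-b.
    print(run_updates(["head-b", "head-a", "head-b"]))
```

If this model is right, the only lasting fix is on the broken machine itself, not in the feed data.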

#18

This problem seemed to crop up around the same time most of the infrastructure suffered an...unplanned restart. If this is an issue with PHP, this could have forced in a new PHP version. The webnodes that got restarted are www3 and 4 so I'll look for differences there.

#19

And indeed, several feeds, including mine and Dries's, just got broken again after going through an automated update. It is futile to try to fix these manually.

#20

We are also getting this problem on the Drupal association planet. http://association.drupal.org/aggregator/categories/1

@alex_b: I did not try clearing the caches.

#21

Status:active» fixed

Narayan downgraded libxml on the affected hosts. The feeds should fix themselves once they are auto-updated.

#22

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.