I noticed that the last update was over a week ago. Is the cron job not working?

Comments

Steven’s picture

The problem is with Feedster. One of the posts is from a gb2312-encoded feed, and it's being included verbatim in the results (which are UTF-8 encoded). The XML parser (correctly) stops because the feed contains many invalid characters.

I checked it out, and as far as I can see the gb2312 feed is correct. Maybe Feedster doesn't have a converter for that.

Doing a UTF-8 validator in Drupal is not impossible, but IMO it's not our problem. Suppose a feed suddenly contained NULL characters, would we have to be able to handle that too? IMO not.

Iangbruk’s picture

I ended up hitting "reply to this comment", but what I was hoping for was a button "take a tutorial to understand this comment". :-) It's great to have people of your technical know-how on board this project.

I feel that Drupal ultimately comes down to managing knowledge. So when one of the main blocks on the front page of the web site isn't working, I think it's best to fix it. "Drupal Talk" to me is a great value add as a monitor of weblog traffic concerning Drupal. If Feedster is not up to snuff in delivering this information, should an alternative be sought?

Steven’s picture

UTF-8 and gb2312 are two methods of encoding characters (others are ASCII, ANSI Codepages, ISO-8859-1, ...). UTF-8 is a transformation of Unicode, which is a universal character set. GB2312 is a Chinese-specific character set.

Data-collecting services such as Google or Feedster have to convert everything into one encoding to be able to show everything together: obviously it's best to pick a Unicode encoding so every character can still be represented.

Feedster's feed says it is encoded as UTF-8, but it contains GB2312 characters which are garbage when viewed as UTF-8. When Drupal receives the feed, it notices the feed contains garbage data and stops. The problem will go away when either Feedster fixes that particular issue or when the broken posts stop being part of the results.
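To make the mismatch concrete, here is a minimal Python sketch (not Drupal code) of what happens when GB2312 bytes are read as UTF-8. The sample string and the helper name `is_valid_utf8` are assumptions chosen for illustration.

```python
# Hypothetical sample text; any Chinese string would behave similarly.
text = "中文"

gb_bytes = text.encode("gb2312")    # bytes as the original GB2312 feed has them
utf8_bytes = text.encode("utf-8")   # bytes as a real UTF-8 feed would have them


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same characters produce different byte sequences in each encoding.
print(gb_bytes.hex(), is_valid_utf8(gb_bytes))
print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))

# Decoding with the right source encoding first recovers the text,
# which can then be re-encoded as UTF-8.
repaired = gb_bytes.decode("gb2312").encode("utf-8")
```

The GB2312 bytes fail the check because they do not follow UTF-8's lead-byte and continuation-byte pattern, which is the kind of "garbage" a strict XML parser rejects.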

boris mann’s picture

Might have better results. Or, fix the Drupal aggregator so it skips feeds and/or posts with problems.
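As a rough illustration of the "skip feeds with problems" idea, here is a minimal Python sketch (not the Drupal aggregator itself). The function name `parse_feeds` and the sample feeds are hypothetical. It skips whole feeds only; skipping individual posts would need a more lenient parser, since a strict XML parser rejects the entire document.

```python
import xml.etree.ElementTree as ET


def parse_feeds(feeds):
    """Parse each feed; collect titles from good feeds, record the bad ones.

    `feeds` maps a feed name to its raw bytes (hypothetical structure).
    """
    titles, skipped = [], []
    for name, raw in feeds.items():
        try:
            root = ET.fromstring(raw)
        except ET.ParseError:
            # One broken feed should not stop the remaining feeds.
            skipped.append(name)
            continue
        titles.extend(t.text for t in root.iter("title"))
    return titles, skipped


HEADER = b'<?xml version="1.0" encoding="UTF-8"?>'
good = HEADER + b"<rss><channel><item><title>Hello</title></item></channel></rss>"
# Declared as UTF-8, but the title holds GB2312 bytes.
bad = (HEADER + b"<rss><channel><item><title>"
       + "中文".encode("gb2312")
       + b"</title></item></channel></rss>")

titles, skipped = parse_feeds({"good": good, "bad": bad})
```

With this approach the broken feed is reported in `skipped` while the healthy feed still contributes its items.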

Ian is correct when he says that it needs to get fixed, *especially* on the front page.

rayg’s picture

the actual feed was broken for over a day as feedster was pushing out new changes, but i talked to scott@feedster and he fixed it. but the aggregator doesn't like the new feed. it's valid xml and no other errors (like the lame 'suspicious input data') are returned when i add the feed to my own drupal install. don't know what the previous version looked like so i can't compare.

druvision’s picture

I have similar problems with Drupal 4.6 when trying to parse external feeds. I am trying to parse RSS feeds in the windows-1255 format (e.g. this feed). From my point of view, Drupal should go one step further and automatically convert them to UTF-8 by using iconv or recode_string.

The question is how to do it. Where is the correct module? Was it solved by 4.7?
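A minimal sketch of that conversion step, written in Python rather than PHP, with Python's built-in codecs standing in for iconv. The function name `to_utf8` and the sample feed are assumptions for illustration; this is not how any Drupal version implements it.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding name in an XML declaration at the start of the feed.
DECL = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def to_utf8(raw: bytes, default: str = "utf-8") -> bytes:
    """Re-encode an XML feed as UTF-8 based on its declared encoding."""
    m = DECL.match(raw)
    declared = m.group(1).decode("ascii") if m else default
    text = raw.decode(declared)  # raises if the declaration is wrong
    # Update the declaration so it matches the new byte encoding.
    text = re.sub(r'(<\?xml[^>]*encoding=["\'])[^"\']+', r"\g<1>UTF-8",
                  text, count=1)
    return text.encode("utf-8")


# Hypothetical windows-1255 feed with a Hebrew title ("shalom").
title = "\u05e9\u05dc\u05d5\u05dd"
source = ('<?xml version="1.0" encoding="windows-1255"?>'
          "<rss><channel><title>" + title + "</title></channel></rss>")
raw = source.encode("cp1255")

converted = to_utf8(raw)
parsed_title = ET.fromstring(converted).find("channel/title").text
```

The sketch trusts the declared encoding, so a feed that mislabels itself would still fail at the decode step.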

Thanks

Amnon