I'm having a problem with Yahoo Finance RSS feeds.

My problem is that the date is not being parsed correctly from the article.

I have "Retrieve From Feed" checked in the settings and am using the "FeedAPI Node" processor. I use the built-in Common Syndication Parser because I have a lot of feeds, and the SimplePie parser runs into memory issues with the number of feeds I am parsing.

I can't find any information that helps me solve this problem.

Comments

aron novak’s picture

Cactii1’s picture

Mmmm... I see.

So I guess the question is: "Is there any way to fix this?"

Cactii1’s picture

In parser_common_syndication.module, in the function _parser_common_syndication_RSS20_parse, add this hack just before the timestamp is set:

    $item = new stdClass();
    $item->title = _parser_common_syndication_title($title, $body);
    $item->description = $body;
    $item->options = new stdClass();
    $item->options->original_author = $original_author;
    
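    // Hack for Yahoo!: replace the 'Etc/GMT' suffix on pubDate, which the
    // date parser does not accept, with a numeric offset.
    $date = $news['pubDate'];
    if (substr($date, -7) == 'Etc/GMT') {
      $date = substr($date, 0, -7) . '+0400';
    }
    $news['pubDate'] = $date;
    // End of hack for Yahoo!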

    $item->options->timestamp = _parser_common_syndication_parse_date($news['pubDate']);
    $item->options->original_url = $original_url;
    $item->options->guid = $guid;
    $item->options->domains = $additional_taxonomies['RSS Domains'];
    $item->options->tags = $additional_taxonomies['RSS Categories'];
    $parsed_source->items[] = $item;

gsnedders’s picture

For compatibility with both HTTP and RSS dates, when parsing RFC 822 dates you have to treat the "zone" field as optional and ignore any trailing garbage.
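
As a rough illustration of that advice, here is a minimal PHP sketch. The function name `lenient_rfc822_to_timestamp` is made up for this example and is not part of parser_common_syndication.module. Treating a missing or unrecognised zone as UTC is an assumption of the sketch, not something the RFC requires.

```php
<?php
/**
 * Hypothetical helper: parse an RFC 822-style date, treating the zone as
 * optional and ignoring anything after the recognised fields.
 * Returns a Unix timestamp, or FALSE if no date could be recognised.
 */
function lenient_rfc822_to_timestamp($raw) {
  // Named zones from RFC 822 mapped to numeric offsets.
  $zones = array(
    'UT' => '+0000', 'GMT' => '+0000', 'Z' => '+0000',
    'EST' => '-0500', 'EDT' => '-0400', 'CST' => '-0600', 'CDT' => '-0500',
    'MST' => '-0700', 'MDT' => '-0600', 'PST' => '-0800', 'PDT' => '-0700',
  );
  $pattern = '/^\s*(?:[A-Za-z]{3},\s*)?'             // optional day name
    . '(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4}|\d{2})\s+' // day, month, year
    . '(\d{2}):(\d{2})(?::(\d{2}))?'                 // hh:mm, optional :ss
    . '(?:\s+([+-]\d{4}|UT|GMT|Z|[ECMP][SD]T)(?=\s|$))?/'; // optional known zone
  if (!preg_match($pattern, $raw, $m)) {
    return FALSE;
  }
  // RFC 822 allows two-digit years; expand them (assumed pivot: 70).
  $year = (int) $m[3];
  if (strlen($m[3]) == 2) {
    $year += ($year < 70) ? 2000 : 1900;
  }
  $seconds = (isset($m[6]) && $m[6] !== '') ? $m[6] : '00';
  // Assumption: no zone, or a zone we do not recognise, means UTC.
  $zone = '+0000';
  if (isset($m[7]) && $m[7] !== '') {
    $zone = isset($zones[$m[7]]) ? $zones[$m[7]] : $m[7];
  }
  // Rebuild a clean string so trailing garbage never reaches strtotime().
  return strtotime(sprintf('%d %s %d %s:%s:%s %s',
    $m[1], $m[2], $year, $m[4], $m[5], $seconds, $zone));
}
```

With this sketch, a date ending in `GMT`, a date with no zone at all, and a date ending in an unrecognised suffix such as `Etc/GMT` would all resolve to the same timestamp, while a string with no recognisable date returns FALSE.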

don@robertson.net.nz’s picture

I am having a similar problem with a feed from Moodle. The feed item's pubDate passes the feed validator mentioned above (the feed fails some other checks though).

<pubDate>Mon, 23 Mar 2009 14:26:47 GMT</pubDate>

Can the feed validator handle the GMT timezone or should I use something similar to the above?

The feed url:
http://moodle.org/rss/file.php/1/1/forum/1/rss.xml

Validator results:
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fmoodle.org%2Frss%2Ff...

aron novak’s picture

I tried out that Moodle feed with the Common Syndication Parser and the date parses fine.
Before you evaluate the result, please be aware of the following:
The Drupal timezone setting is configurable, and it affects the date you see.

Cactii1’s picture

I've only ever found the problem with Yahoo! feeds. They don't specify their date correctly and the system does not like that.

The date in the above Moodle feed looks fine.

don@robertson.net.nz’s picture

Okay - so it is not the 'GMT' bit.

Everything from the feed is dated with the refresh time - regardless of the date in the feeds, which is often days ago. I've checked my timezones etc and it seems to be okay.

I am using the feedapi_feedmapper module. Should I be setting the options->timestamp: to 'Map to created (node)'?

I am getting the same thing on some other feeds as well - but other feeds - even from the same site - work fine.

Any suggestions would be appreciated. I am going to copy the site to another machine and try it, then turn off modules/themes to see if I find anything.

don

don@robertson.net.nz’s picture

Moodle Feed: I deleted all the feed items, set the feedapi node to 'Use time of download' and saved, then set it to 'Retrieve from feed' and saved, refreshed and it gets the correct time. WTF?

I have three other feeds, all from the same site, that are giving me problems. I think they must be created by hand, because they are not consistent. Sometimes a feed will use:

<pubDate>
	yyyy-mm-dd
</pubDate>

which is not valid, and other times it will use

<dc:date>
	yyyy-mm-dd
</dc:date>

which is valid, at least according to feedvalidator.org.

But it does give me some inconsistent results. A couple of examples:

http://www.educationcounts.govt.nz/__data/assets/file/0014/29003/statist...
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.educationcounts....

<item>
 <title>Homeschooling as at 1 July 2008</title>
 <link>http://www.educationcounts.govt.nz/statistics/schooling/homeschooling2/homeschooling/32587</link>
 <guid>http://www.educationcounts.govt.nz/statistics/schooling/homeschooling2/homeschooling/32587</guid>
 <description>At 1 July 2008 there were 6,501 home schooled students recorded on the Ministry of Education’s homeschooling database, which represents less than one per cent of total school enrolments at July 2008. These students belonged to 3,379 families. </description>
  <dc:date>
 2008-10-09
  </dc:date>
</item>

Gives me this in the feedapi-mapper Feed item example box:

Array
(
    [title] => Homeschooling as at 1 July 2008
    [description] => At 1 July 2008 there were 6,501 home schooled students recorded on the Ministry of Education’s homeschooling database, which represents less than one per cent of total school enrolments at July ...
    [options] => Array
        (
            [original_author] => Statistics on Education Counts
            [timestamp] => 1201777200
            [original_url] => http://www.educationcounts.govt.nz/statistics/ece/ece_staff_return/licensed_services_and_licence-exempt_groups/17812
            [guid] =>
            [domains] =>
            [tags] =>
        )

)

The timestamp from this is:

don@bassak:~/www$ date -d @1201777200 -u
Thu Jan 31 11:00:00 UTC 2008

And the output on the page:

Homeschooling as at 1 July 2008
Don Robertson on 31/03/2009 11:59:37 AM

At 1 July 2008 there were 6,501 home schooled students recorded on the Ministry of Education’s homeschooling database, which represents less than one per cent of total school enrolments at July 2008. These students belonged to 3,379 families.

i.e., three different times for the item.
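
For comparison, the `<dc:date>` value in the item above is a bare yyyy-mm-dd date surrounded by whitespace. The sketch below shows one way such a value could be turned into a timestamp. It is not what the Common Syndication Parser does: the function name `date_only_to_timestamp` is hypothetical, and interpreting a date with no time as midnight UTC is an assumption.

```php
<?php
/**
 * Hypothetical helper: convert a date-only value such as "2008-10-09",
 * possibly padded with whitespace or newlines, into a Unix timestamp.
 * Returns FALSE if the value is not a valid yyyy-mm-dd date.
 */
function date_only_to_timestamp($raw) {
  // Feeds often wrap the element content in newlines and indentation.
  $value = trim($raw);
  if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $value, $m)) {
    return FALSE;
  }
  $year = (int) $m[1];
  $month = (int) $m[2];
  $day = (int) $m[3];
  // Reject impossible dates such as 2008-13-40.
  if (!checkdate($month, $day, $year)) {
    return FALSE;
  }
  // Assumption: a date without a time means midnight UTC.
  return gmmktime(0, 0, 0, $month, $day, $year);
}

// The dc:date content from the item above, whitespace included.
$timestamp = date_only_to_timestamp("\n 2008-10-09\n  ");
echo gmdate('Y-m-d', $timestamp), "\n"; // prints 2008-10-09
```

Whether untrimmed whitespace plays any part in the results above is not established here; the sketch only shows that the value itself is straightforward to parse once it is trimmed.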

Example 2:
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.educationcounts....

http://www.educationcounts.govt.nz/__data/assets/file/0007/24388/publica...

<item>
  <title> Review of the International Student Levy
 </title>
  <link>http://www.educationcounts.govt.nz/publications/international/36227 </link>
   <guid>http://www.educationcounts.govt.nz/publications/international/36227 </guid>
 <description>
 This publication is an independent review of the International Student Levy (ISL or the Levy) which currently applies to primary and secondary state and state-integrated schools that receive tuition fees from international-fee paying students studying in New Zealand.
   </description>
  <dc:date>
 2009-03-25
  </dc:date>
</item>
Array
(
    [title] =>  Review of the International Student Levy

    [description] =>
 This publication is an independent review of the International Student Levy (ISL or the Levy) which currently applies to primary and secondary state and state-integrated schools that receive ...
    [options] => Array
        (
            [original_author] => Publications on Education Counts
            [timestamp] => 1238461948
            [original_url] => http://www.educationcounts.govt.nz/publications/tertiary_education/35980
            [guid] => http://www.educationcounts.govt.nz/publications/tertiary_education/35980
            [domains] =>
            [tags] =>
        )

)
don@bassak:~/www$ date -d @1238461948 -u
Tue Mar 31 01:12:28 UTC 2009

Review of the International Student Levy
Don Robertson on 31/03/2009 01:12:24 PM

This publication is an independent review of the International Student Levy (ISL or the Levy) which currently applies to primary and secondary state and state-integrated schools that receive tuition fees from international-fee paying students studying in New Zealand.

    * Education
    * Education Counts: Publications

    * Feed: Education Counts: Publications
    * Original article

NOTE: I am at UTC +12, so the timestamp does at least almost match the node creation time. The timestamp changes every time I view the 'Map' page.
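
The arithmetic behind that note can be checked with a few lines of PHP. This is only a sketch: `format_at_offset` is a hypothetical helper, and it assumes a fixed +12 hour offset rather than a named timezone with daylight saving rules.

```php
<?php
/**
 * Hypothetical helper: format a Unix timestamp as it would appear at a
 * fixed offset from UTC, using the same layout as the node output above.
 */
function format_at_offset($timestamp, $offset_hours) {
  // gmdate() formats in UTC, so shifting the timestamp applies the offset.
  return gmdate('d/m/Y h:i:s A', $timestamp + $offset_hours * 3600);
}

// Timestamp from the second feed item example.
echo gmdate('Y-m-d H:i:s', 1238461948), " UTC\n"; // 2009-03-31 01:12:28 UTC
echo format_at_offset(1238461948, 12), "\n";      // 31/03/2009 01:12:28 PM
```

The result is four seconds after the 01:12:24 PM shown on the node, which fits a timestamp generated when the 'Map' page was viewed rather than one read from the feed's 2009-03-25 date.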

Anyway - probably caused by the feed. I'll email the site, and see if they can fix the feeds, but otherwise I leave it in your capable hands.

aron novak’s picture

"Moodle Feed: I deleted all the feed items, set the feedapi node to 'Use time of download' and saved, then set it to 'Retrieve from feed' and saved, refreshed and it gets the correct time. WTF?"
It seems you simply misconfigured the module.
"Created date of item nodes" - I think this is clear enough. If it's not, please suggest a label for this setting that would make it obvious to users.

Cactii1’s picture

Yes... Welcome to the world of feed parsing magic. Very black box, fiddle through, oh help me jeeses kind of stuff.

The problem with feeds is that we're relying on the quality of other people's work.

socialnicheguru’s picture

Title: Date Not Being Parsed Correctly » Date Not Being Parsed Correctly - setting to Use time of download then back to Retrieve from feed might help solve the issue

Ok this is the WEIRDEST thing!

#10 I am right there with you.

I have spent a few weeks coming back to this issue during development.

I did what you suggested and voila... who the f*******K knew! OMG. I love Drupal but sometimes....

Chris

aron novak’s picture

Status: Active » Fixed

OK, I'll mark this as fixed then.

socialnicheguru’s picture

I don't know if this is a fix vs. a work around.

Maybe someone can put this on the readme and project page so people know what to do if this happens to them. How about reviewed and tested by community?

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.