If pubDate is 0 or empty, use current time

alex_b - October 29, 2009 - 14:01
Project:Feeds
Version:6.x-1.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:mcaudy
Status:closed
Issue tags:Novice
Description

#611578: Feeds from PubMed not getting parsed reports that feed http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=1RCQjEc-E... does not show items on Managing News. This is due to the fact that it does not have pubDates defined for the items.

Needs to be fixed on the parser level (SimplePie and Syndication Parser).

This is a novice task.

#1

mcaudy - October 30, 2009 - 00:17
Assigned to:Anonymous» mcaudy

I will work on this.

#2

mcaudy - October 30, 2009 - 10:09

alex_b: I added the following "if" statement after the original timestamp "get_date("U")" on line 48 of FeedsSimplePie parser.inc:

48 $item['timestamp'] = $simplepie_item->get_date("U");

// Set timestamp to current date if 0 or empty or NULL.
if ($item['timestamp'] == "") {
$item['timestamp'] = mktime(0, 0, 0, date("m"), date("d"), date("y"));
}

Everything seems to work fine now. I have successfully parsed feeds from PubMed, HubMed, and Science, all of which were given timestamps = 0 before, and would not display. They all are getting non-"0" (presumably current) timestamps now, and they all display properly.

I will create a patch to submit later, and then also do the corresponding changes for FeedsSyndicationParser.inc.

Thanks for helping me solve this problem, so that users can now parse most or all RSS/Atom feeds from PubMed and other scientific journals.

#3

mcaudy - October 30, 2009 - 17:38
Status:active» needs review

alex_b: does this revision seem likely to you to be sufficient? Or, do you think anything more would be required?

#4

alex_b - October 30, 2009 - 21:59
Status:needs review» needs work

I'd love to see a patch for FeedsSyndicationParser and FeedsSimplePieParser (may be FeedsSyndicationParser does not need any).

#5

mcaudy - October 31, 2009 - 14:54

alex_b: I will do the patch for FeedsSimplePieParser, and then look at the FeedsSyndicationParser to see if it needs any changes. I will try to do this by tomorrow, but I have some other commitments today.

#6

mcaudy - October 31, 2009 - 16:59

Here is the patch for FeedsSimplePieParser, to correct the pubDate = 0 problem with the parser.

This is the first patch that I have done for a project other than my own, so you might want to just look at the diff file. It is a very simple addition of a single "if" statement, so you can probably tell if diff file is correct just by looking at it.

The parser now works correctly for a variety of feeds that were not working before, so the patch seems to be OK.

#7

mcaudy - October 31, 2009 - 17:01

Here is the patch file attachment. It did not upload for some reason.

AttachmentSize
618018_FeedsSimplePieParser.patch 715 bytes

#8

alex_b - October 31, 2009 - 18:39

The approach seems solid. Some issues:

- Use empty() for the if clause http://php.net/manual/en/function.empty.php (Adjust comment accordingly)
- There is an indentation problem (each indentation should consist of 2 spaces)
- Use FEEDS_REQUEST_TIME instead of mktime() - instead of mktime you could use time() but even better is using the PHP constant FEEDS_REQUEST_TIME as it is cheaper than a call to time().

#9

mcaudy - November 1, 2009 - 03:52

alex_b: Thanks for your code improvement suggestions:

- I will use empty();
- I was aware of the 2 space indentation convention. I ran the diff in terminal and copied the output into TextMate without noticing that it added more spaces;
- I have googled for 'FEEDS_REQUEST_TIME' and also searched php.net and drupal.org. I cannot find this constant anywhere. Instead, I find the following, from drupal.org/node/429458:

    "Drupal 7 is using a constant REQUEST_TIME instead of time().
      if (!defined('REQUEST_TIME')) {
      define('REQUEST_TIME', time());
      }"

Is this good? If not, can you point me to where I can find 'FEEDS_REQUEST_TIME' defined?

#10

alex_b - November 1, 2009 - 15:21

FEEDS_REQUEST_TIME is defined in feeds.module

#11

mcaudy - November 1, 2009 - 20:39

OK, I see 'FEEDS_REQUEST_TIME' now. Since you said it is a 'PHP constant', I thought it was defined more generally, and looked for it outside the feeds module, at php.net.

#12

mcaudy - November 2, 2009 - 00:07
Status:needs work» needs review

Here is the patch with the revised use of "empty()" and "FEEDS_REQUEST_TIME" .

I have tested this with several new feeds from sources that were previously lacking timestamps, and the revised patch also works to set any empty timestamp values to the current time. Therefore, they display properly.

I next will test whether the FeedsSyndicationParser needs similar revisions, or whether it is working correctly regarding setting timestamps and displaying the feed items properly.

AttachmentSize
618018_FeedsSimplePieParser_v2.patch 952 bytes

#13

mcaudy - November 2, 2009 - 17:47

Preliminary Results for FeedsSyndicationParser: when FeedsSyndicationParser (with term extraction) is used, properly named feed nodes are created, but no feed-items are added to the feeds_data_syndication table, and the web page shows:

    HTTP - 500 Internal Server Error
    The page you are attempting to load returned an error.

The properly named feed nodes can be seen in the node table, and in tracker, but there are no feed items created and saved in the feeds_data_syndication table.

This testing was done on an app with the patched SimplePieParser. This should not affect the results for the FeedsSyndicationParser, but it is possible that it does. When I have time later, I will create a new virgin Managing News install (also with Feeds alpha5) and test the FeedsSyndicationParser again, but I suspect the results - and errors - will be the same.

If the errors are reproduced, I will created a new issue for FeedsSyndicationParser.

#14

alex_b - November 2, 2009 - 20:06
Status:needs review» fixed

I committed #12 with minor tweak to comment. Thank you.

#13: right, this would be a new issue.

#15

System Message - November 16, 2009 - 20:10
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.