Common syndication parser fails to parse some valid feeds
jeffmurphy - May 27, 2008 - 16:44
| Project: | FeedAPI |
| Version: | 5.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Aron Novak |
| Status: | active |
Description
Drupal 5.7. When I create a new feed and specify a URL, the URL is not saved. If I look at:
function _feedapi_insert(&$node, $teaser, $page) {
if (isset($node->feed->url) && isset($node->feed->feed_type)) {
The $node variable contains no feed member. I don't know enough about feedapi internals to figure out why. Can you help? I've uninstalled and reinstalled FeedAPI and SimplePie in an attempt to start with a clean slate, but that had no effect.
print_r says that $node contains:
stdClass Object ( [nid] => 285 [vid] => 285 [type] => feedapi_node [status] => 1 [created] => 1210550714 [changed] => 1210554368 [comment] => 2 [promote] => 1 [sticky] => 0 [revision_timestamp] => 1210554352 [title] => test [body] => test [teaser] => test [log] => [format] => 3 [uid] => 1 [name] => jcmurphy [picture] => [data] => a:7:{s:23:"subscriptions_subscribe";i:0;s:18:"subscriptions_auto";i:0;s:20:"subscriptions_teaser";i:0;s:17:"messaging_default";s:9:"html_mail";s:17:"mimemail_textonly";i:0;s:25:"notifications_send_method";s:4:"mail";s:27:"notifications_send_interval";s:1:"0";} [nodewords] => Array ( [description] => [keywords] => ) [attachments] => [0] => [last_comment_timestamp] => 1210550714 [last_comment_name] => [comment_count] => 0 [taxonomy] => Array ( [2] => Array ( [5] => 5 [3] => 3 [4] => 4 ) ) [date] => 2008-05-11 19:05:14 -0500 [revision] => 0 [preview] => Preview [op] => Submit [submit] => Submit [delete] => Delete [form_token] => e74fcc2c4d837c1bd927288a8441fda0 [form_id] => feedapi_node_node_form [menu] => Array ( [title] => [description] => [pid] => 1 [path] => [weight] => 0 [mid] => 0 [type] => 86 ) [path] => [feedapi] => Array ( [feedapi_url] => https://rhn.redhat.com/rpc/recent-errata.pxt [refresh_on_create] => 0 [update_existing] => 0 [skip] => 0 [items_delete] => 0 [processors] => Array ( [feedapi_inherit] => Array ( [inherit_taxonomy] => 1 ) [feedapi_node] => Array ( [content_type] => story [node_date] => feed [promote] => 0 [x_dedupe] => ) ) ) [feedapi_feed_object] => stdClass Object ( [link] => ) [feed] => stdClass Object ( [url] => [processors] => Array ( [2] => feedapi_node [3] => feedapi_inherit ) [parsers] => Array ( [0] => parser_simplepie ) [link] => ) [validated] => 1 [is_new] => )
#1
Same problem - tried both parsers and the dev snapshot from 2008-May-09 as well...
#2
Upon further looking at it, it can't possibly work for me. The data feedapi expects to make an insert into the {feedapi} table is simply not in the node object - why I don't know. I suspect that something on this current Drupal install is interfering, since I had this working previously on another install without problems.
Here's why it can't even get started:
<?php
// this is what feedapi expects upon submit to even try an insert...
// $node->feed->url
// $node->feed->feed_type
function _feedapi_insert(&$node, $teaser, $page) {
  if (isset($node->feed->url) && isset($node->feed->feed_type)) {
    db_query("INSERT INTO {feedapi} (
      nid, url, link, feed_type, processors,
      parsers, checked, settings) VALUES
      (%d, '%s', '%s', '%s', '%s', '%s', %d, '%s')",
      $node->nid,
      $node->feed->url,
      $node->feed->options->link,
      $node->feed->feed_type,
      serialize($node->feed->processors),
      serialize($node->feed->parsers),
      0,
      serialize(array())
    );
    // .... (rest of the function elided)
// but this is what gets submitted
/*
[feedapi] => Array
(
[feedapi_url] => feed://www.youtube.com/rss/user/youtube/videos.rss
[refresh_on_create] => 1
[update_existing] => 1
[skip] => 0
[items_delete] => 0
[processors] => Array
(
[feedapi_node] => Array
(
[content_type] => story
[node_date] => feed
[promote] => 0
[x_dedupe] =>
)
)
)
[feedapi_feed_object] => stdClass Object
(
[link] =>
)
[teaser] =>
[feed] => stdClass Object
(
[link] =>
)
*/
?>
Any ideas anybody?
jeffmurphy's node above has a few more bits in the feed object, but it is also missing the URL...
#3
I also tried feedapi-5.x-1.x-dev with the same results. It won't save the URL and the $node object doesn't contain the [feed] member.
#4
Me too --- I thought I understood this yesterday, because when I turned OFF the feedapi_mapper module, I could start saving URLs. But today it's mysteriously stopped working again. No error is logged at all, even with E_ALL turned on in PHP. But you can't save URLs at all. I don't know why this behavior seems to come and go at this point - but it's mostly not working.
#5
At least in my case, I've discovered that this seems to be totally feed-dependent.
For example, the feed from drupal.org will save just fine - http://drupal.org/node/feed
However - at least one Blogger feed will not save:
http://yanquimike.blogspot.com/feeds/posts/default (and yes, it also doesn't work if I use the alternate RSS feed).
So I think that (at least in my case) it works reasonably well except for Blogspot blogs.
I have ONE blogspot blog that is being run through feedburner - and after it goes through feedburner it works.
This is a pretty small sample of data, though, so it's entirely possible that something else is afoot.
#6
If feeds can't be read on saving (for whatever reason) the URL isn't saved.
I'm guessing that the difference in this case is that SimplePie isn't parsing the feed. At present the big difference between your feeds is that the one that works, drupal.org, is a valid feed ( http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fdrupal.org%2Fnode%2F... ) and at the moment the Blogspot one isn't ( http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fyanquimike.blogspot.... ). This would also explain why it works after it's been through FeedBurner, as they probably have a more relaxed parser and stricter, more correct output. I've used Yahoo! Pipes for this in the past as well.
#7
"If feeds can't be read on saving (for whatever reason) the URL isn't saved."
I think that sums up all our different scenarios. In my case the feeds are fine, but at my workplace it doesn't work because it's most likely a firewall issue - everything works fine from home.
An error message with a few suggestions (validate feed, make sure firewall doesn't block etc) probably will save confusion AND issue queue entries in the future.
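A minimal sketch of the kind of message suggested above, assuming the same two fields that _feedapi_insert() tests with isset(). The function names feedapi_missing_feed_fields() and feedapi_save_failure_message() and the message wording are hypothetical and not part of FeedAPI:

```php
<?php
/**
 * Hypothetical helper (not part of FeedAPI): list the fields that
 * _feedapi_insert() needs but that are absent or empty on the node.
 */
function feedapi_missing_feed_fields($node) {
  $missing = array();
  foreach (array('url', 'feed_type') as $field) {
    // An empty string passes isset() but is still useless for the insert.
    if (!isset($node->feed) || empty($node->feed->$field)) {
      $missing[] = $field;
    }
  }
  return $missing;
}

/**
 * Hypothetical helper: turn the missing fields into a user-facing hint.
 * Returns NULL when nothing is missing.
 */
function feedapi_save_failure_message($node) {
  $missing = feedapi_missing_feed_fields($node);
  if (!$missing) {
    return NULL;
  }
  return 'The feed could not be saved (missing: ' . implode(', ', $missing) . '). '
    . 'Check that the feed validates and that no firewall blocks the request.';
}
```

In a Drupal 5 module the returned string could be passed to drupal_set_message() so the failure is no longer silent; where exactly that call belongs in FeedAPI is left open here.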
#8
As a point of interest for people who may be stumbling through this -
after reading this note, I've discovered that many of the feeds that I could not read before (using the common syndication parser) seem to read just fine using SimplePie instead. So I'm going to shift over to using SimplePie.
By the way - as an enhancement, I think it would be REALLY nice if there were some sort of error message when it can't parse a feed, rather than just failing silently.
I sort of suspected that this was the case, but it's great to have it confirmed. Switching to Simplepie seems to be helping a lot, at least for me.
Thanks for the help.
#9
the feedburner work around works for me. the feed i'm trying to incorporate is:
https://rhn.redhat.com/rpc/recent-errata.pxt
but simplepie and common syndication parser don't like it. adding that feed to feedburner and then pointing to the feedburner url works.
i agree with other people's comments on this thread that more verbose error reporting would be very helpful. i changed this from a bug to a feature request.
also, as an aside, if i use common syndication parser, the feed is silently ignored. if i use simplepie, the following error is reported:
warning: in_array() [function.in-array]: Wrong datatype for second argument in /usr/local/apache_1.3.37/htdocs/sites/all/modules/feedapi/feedapi.module on line 932.
warning: in_array() [function.in-array]: Wrong datatype for second argument in /usr/local/apache_1.3.37/htdocs/sites/all/modules/feedapi/feedapi.module on line 932.
warning: Invalid argument supplied for foreach() in /usr/local/apache_1.3.37/htdocs/sites/all/modules/feedapi/feedapi.module on line 934.
that might help identify where it's choking on the feed.
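Both warnings are the ones PHP emits when a value that is not an array is passed to in_array() or foreach. The sketch below shows one way such a value could be guarded. It is not the actual code at feedapi.module line 932; the helper name feedapi_example_as_array() and the assumption that a failed parse yields FALSE are purely illustrative:

```php
<?php
/**
 * Hypothetical guard, not the actual code at feedapi.module line 932:
 * normalise a value that should be an array before using it with
 * in_array() or foreach.
 */
function feedapi_example_as_array($value) {
  return is_array($value) ? $value : array();
}

// Assumption for illustration: a parser that failed returned FALSE
// where a list of items was expected.
$items = FALSE;

$safe_items = feedapi_example_as_array($items);

// With an empty array neither call below raises a warning.
$found = in_array('some-guid', $safe_items);
$count = 0;
foreach ($safe_items as $item) {
  $count++;
}
```

A guard like this only hides the symptom; the parser returning nothing usable would still need to be reported to the user.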
#10
Actually, I'm thinking this is a bug in the code rather than a malformed feed, at least in some cases (perhaps related to the version of the XML RSS feed).
For example, Feed validator says the redhat feed is fine:
http://feedvalidator.org/check.cgi?url=http%3a%2f%2frhn.redhat.com%2frpc...
But FeedAPI won't accept it. For now, I'll go with the feed burner workaround.
#11
I have similar problems. I use feedapi for Drupal 5 and the php5 parser (simplepie throws errors). I can't use flickr feeds (http://api.flickr.com/services/feeds/photos_public.gne?tags=drupal&forma...), not even when they are feedburned (http://feeds.feedburner.com/RecentUploadsTaggedDrupal?format=xml)! Flickr feeds validate although they have some strange stuff inside of them (http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fapi.flickr.com%2Fser...).
The feed URLs just don't get saved when saving the new feed. From the above I deduce that's because they don't get parsed, although they are valid feeds.
#12
Okay, about the flickr feeds (see comment 11). They do work with the simplepie parser. (Don't forget to enable the simplepie parser in the feed content-type after you have enabled the simplepie parser module and disabled the php5 parser module!)
That turns this bug into: "Common syndication parser fails to parse some valid feeds (example feed)"
Quick solution: "Use simplepie parser instead"
#13
I tried to reproduce the problem, but I could add and refresh all of the example feeds (redhat, flickr, etc.) without problem.
Currently the ticket is assigned to version 1.2. I used the latest dev package.
Tips:
- FeedAPI had a nasty caching bug. Maybe it helps if you purge the cache directory of parser_common_syndication (files/parser_common_syndication_cache)
Please share the details:
- Which version do you use? If 1.2, please try it out also with the latest dev package.
- What about SimpleTest tests of FeedAPI?
- PHP version number
- Does purging the cache directory help?
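For the cache purge mentioned in the tip above, a minimal sketch in plain PHP follows. The helper name example_purge_cache_dir() is hypothetical; only the directory path comes from this thread, and it is assumed to be relative to the Drupal root:

```php
<?php
/**
 * Hypothetical helper: delete the regular files directly inside a
 * cache directory and return how many were removed.
 */
function example_purge_cache_dir($dir) {
  $removed = 0;
  if (!is_dir($dir)) {
    return $removed;
  }
  foreach (scandir($dir) as $name) {
    $path = $dir . '/' . $name;
    // is_file() skips "." and ".." and leaves subdirectories alone.
    if (is_file($path) && unlink($path)) {
      $removed++;
    }
  }
  return $removed;
}

// Path taken from the tip above, relative to the Drupal root:
// example_purge_cache_dir('files/parser_common_syndication_cache');
```

Deleting the files by hand or from a shell does the same thing; the function only makes the step repeatable.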
#14
Not sure if this is the issue, but check that the feed is delivered with an XML content type (for example application/xml) in the response headers. I had a feed that would not work on my standard webserver, but when placed into a directory designed for feeds the issue was resolved. I think that the only difference between the two scenarios was the content type headers.
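A rough sketch of such a content type check, assuming the header value has already been obtained (for example with PHP's get_headers()). The function name example_is_feed_content_type() and the list of accepted types are illustrative assumptions, not something FeedAPI or either parser is known to enforce:

```php
<?php
/**
 * Hypothetical helper: decide whether a Content-Type header value
 * names an XML media type commonly used for feeds.
 */
function example_is_feed_content_type($content_type) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  $parts = explode(';', $content_type);
  $type = strtolower(trim($parts[0]));
  $known = array(
    'application/rss+xml',
    'application/atom+xml',
    'application/xml',
    'text/xml',
  );
  return in_array($type, $known, TRUE);
}
```

A feed served as text/html or text/plain would fail this check even if its body is valid XML, which matches the scenario described above.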
#15
We're having similar problems with https feeds and simplepie. Seems that at least part of the problems here are related to https.
#16
I've set #279248: Common Syndication Parser does not work as a duplicate of this. They share the same problem regarding the Common Syndication Parser.