hook_feedapi_item_save($feed_item, $fid) seems to be called with a $feed_item that has only a few fields set (title, description, etc.).

How is an item processor supposed to get at all the other fields that were in the original feed (and are available in the parser)? Without those, we can't populate the node with all the information available in the feed (e.g. iTunes tags, enclosures, etc.).

Duplicating the parser, and adding your extra code would be an option of course, but that's cut'n'paste reuse :) Maybe the parsers could have a hook to allow modules to embellish the default behaviour implementation?

Comments

aron novak’s picture

Assigned: Unassigned » aron novak

As you may have seen, FeedAPI has a primary/secondary parser setup. The primary parser is responsible for filling all the "must-have" fields (title, description), and secondary parsers can add fields to $feed->options and $feed->items[n]->options. So when you write a parser that is only meant to be used as a secondary one, you don't have to write the code that takes care of the core fields. You may also have noticed that downloading the feed is not part of the parser either. You're right that there is a little duplication (XML processing), but the main concept was that the parsers are fully interchangeable. I'll consider putting hooks into the default parser so that external modules can add additional fields.
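
To make the primary/secondary split concrete, here is a minimal, self-contained sketch of that merge. It does not use FeedAPI's real hook signatures: example_primary_parse(), example_secondary_parse() and the itunes_author field are hypothetical names chosen for illustration, and the raw feed is modelled as a plain array.

```php
<?php
// Hypothetical sketch, not FeedAPI's actual API. The raw feed is a plain array.

/**
 * Primary parser: fills only the "must-have" fields.
 */
function example_primary_parse(array $raw) {
  $feed = new stdClass();
  $feed->title = $raw['title'];
  $feed->options = new stdClass();
  $feed->items = array();
  foreach ($raw['items'] as $raw_item) {
    $item = new stdClass();
    $item->title = $raw_item['title'];
    $item->description = $raw_item['description'];
    $item->options = new stdClass();
    $feed->items[] = $item;
  }
  return $feed;
}

/**
 * Secondary parser: adds extra fields under ->options only.
 */
function example_secondary_parse(array $raw, $feed) {
  foreach ($raw['items'] as $n => $raw_item) {
    if (isset($raw_item['itunes_author'])) {
      $feed->items[$n]->options->itunes_author = $raw_item['itunes_author'];
    }
  }
  return $feed;
}

// Invented sample data.
$raw = array(
  'title' => 'Example podcast',
  'items' => array(
    array('title' => 'Episode 1', 'description' => 'First', 'itunes_author' => 'Example Author'),
    array('title' => 'Episode 2', 'description' => 'Second'),
  ),
);

$feed = example_secondary_parse($raw, example_primary_parse($raw));
```

The secondary step never touches title or description, which is what keeps the two kinds of parser interchangeable in this model.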

lyricnz’s picture

Status: new
Size: 775 bytes

How's this? (see patch)

Here's what an implementation of the hooks might look like:

/**
 * Implementation of parser_simplepie_feed().
 * @param $feed The feed's url
 * @param $parser The instance of SimplePie
 * @param $parsed_source A reference to the object that will be returned from parser_simplepie_feedapi_parse()
 */
function motogpod_parser_simplepie_feed($feed, $parser, &$parsed_source) {
  drupal_set_message("motogpod_parser_simplepie_feed($parsed_source->title)");
  $parsed_source->title = 'FOO: ' . $parsed_source->title;
}

/**
 * Implementation of parser_simplepie_item().
 * @param $simplepie_item
 * @param $curr_item
 */
function motogpod_parser_simplepie_item($simplepie_item, &$curr_item) {
  drupal_set_message("motogpod_parser_simplepie_item($curr_item->title)");
  $curr_item->title = 'BAR: ' . $curr_item->title;
}
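
For context, a parser could invoke a hook like the item hook above roughly as sketched below. This is an assumption about the general shape of such a call, not the actual contents of the patch: example_invoke_item_hook() is a hypothetical helper, and module_implements() is stubbed so the snippet runs outside Drupal. The implementations are called directly rather than through module_invoke_all(), because module_invoke_all() does not forward its arguments by reference.

```php
<?php
// Stub so the sketch runs outside Drupal; inside Drupal, core provides this function.
if (!function_exists('module_implements')) {
  function module_implements($hook) {
    // Pretend a single module named "motogpod" implements every hook.
    return array('motogpod');
  }
}

// Example implementation, as above but without drupal_set_message().
function motogpod_parser_simplepie_item($simplepie_item, &$curr_item) {
  $curr_item->title = 'BAR: ' . $curr_item->title;
}

/**
 * Hypothetical helper: calls every implementation of the item hook.
 */
function example_invoke_item_hook($simplepie_item, &$curr_item) {
  foreach (module_implements('parser_simplepie_item') as $module) {
    $function = $module . '_parser_simplepie_item';
    if (function_exists($function)) {
      // $curr_item is passed by reference because the hook declares it so.
      $function($simplepie_item, $curr_item);
    }
  }
}

$curr_item = new stdClass();
$curr_item->title = 'Episode 1';
example_invoke_item_hook(NULL, $curr_item);
```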

lyricnz’s picture

Title: How is an item processor supposed to get the other fields in the item? » Enable modules to hook into existing parsers
Category: support » feature
Status: Active » Needs review

Updated title, category and status.

alex_b’s picture

Hi lyricnz,

The problem with your suggestion is that it requires add-on modules to implement a hook API and call those hooks. We should keep this functionality in FeedAPI, to keep add-on modules simple.

I see the problem that you are describing, though.

Doesn't the SimplePie parser pass its parsed result on to secondary parsers?

Alex

alex_b’s picture

Status: Needs review » Needs work

lyricnz’s picture

No, the parsers are independent.

The reason I had to patch simplepie was that simplepie doesn't keep track of any fields in the feed or item that it doesn't know about (e.g. enclosures, and some iTunes-specific stuff). This patch allows custom modules to save whatever they want, and use it later when creating the nodes.

FWIW, I also patched feedapi_item to allow modules to mess with the node being created, using the data saved in the parser. Both these changes were required so that I could perform a reasonably simple task: consume a podcast RSS feed, and create audio nodes from it:

- I used the first parser hook to call set_time_limit(0), since downloading the MP3s takes quite a while. IIRC, I also saved a couple of attributes so I could use them later when creating nodes.

- I used the second parser hook to add the enclosure information from the $simplepie_item into the $feed_item (so we can download it in the item processor)

- I used the extra hook mentioned above to customize $node after feedapi_item had created a default. In my case, I created an audio node using the audio API, then copied the pertinent fields from $mynode into $node.

I think if I was doing it again now, I'd also pass $feed into the item processor, which may reduce the need for the first hook.

alex_b’s picture

Could you post the patch as a cvs diff -u patch?

lyricnz’s picture

Status: new
Size: 864 bytes

The patch in this issue still applies correctly to DRUPAL-5 feedapi (with an offset). Or are you talking about the patch to feedapi_item? It's trivial, attached.

PS: I'm not especially attached to the hook names, or even the exact parameters; I just wanted to open a discussion about the ability to extend existing parsers/processors without duplicating them entirely.

lyricnz’s picture

FWIW, the podcast example I talked about above used roughly the following hooks in my module:

(simplepie item hook) This particular bit of code saves the enclosure from the item into the feed item options, and actually fiddles with the title/description (by pulling the first line of the RSS item description into the title, rather than using the title that was in the <item>).

/**
 * Implementation of parser_simplepie_item().
 * @param $simplepie_item The item as parsed by SimplePie
 * @param $feed_item A reference to the feed item being built by the parser
 */
function motogpod_parser_simplepie_item($simplepie_item, &$feed_item) {
  // save the enclosure from the simplepie item into the feed item options
  $feed_item->options->enclosure = $simplepie_item->get_enclosure();

  // fix up the title/description taken from the RSS feed
  if (preg_match('#^(Episode[^\r\n]*)#', $feed_item->description, $matches)) {
    $feed_item->title = $matches[1];
    $feed_item->description = preg_replace('#^Episode[^\r\n]*#', '', $feed_item->description);
    $feed_item->options->teaser = $feed_item->description;
  }
}
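
To see what the two regular expressions do, here is a small stand-alone run on a made-up description (the sample text is invented for illustration). Note that preg_replace() removes only the first line itself, so the line break that followed it stays at the start of the description. Trimming it is an optional extra step that is not part of the hook above.

```php
<?php
// Invented sample data, not taken from a real feed.
$description = "Episode 12: Season preview\nWe talk about the new season.";

$title = NULL;
if (preg_match('#^(Episode[^\r\n]*)#', $description, $matches)) {
  // The first line becomes the title.
  $title = $matches[1];
  // Strip that line from the description; the newline after it remains.
  $description = preg_replace('#^Episode[^\r\n]*#', '', $description);
}

// Optional: drop the leftover leading line break.
$trimmed = ltrim($description, "\r\n");
```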

(feedapi_item save hook) This bit of code uses the information saved above to download the MP3 enclosure, and create an audio node using Audio API. It then copies the information from the node it just created into $node, so when the caller calls node_save, it's really an update, not a create-new. Some error checking/etc removed for clarity.

function motogpod_feedapi_item_savehook(&$feed_item, &$node) {
  // only download the MP3 on a create, not an update
  if (!isset($feed_item->nid)) {

    // determine the URL of the mp3
    $mp3url = $feed_item->options->enclosure->link;

    // removed: fetch the mp3, save as $filename

    // create the new node using audio_api_insert()
    $new_node = audio_api_insert($filename, $feed_item->title);

    // copy information about the audio node into $node
    $node->nid = $new_node->nid;
    $node->audio_tags = $new_node->audio_tags;
    $node->audio_file = $new_node->audio_file;
    $node->audio_fileinfo = $new_node->audio_fileinfo;
  }

  // make other updates to the node
  $vocabs = taxonomy_get_vocabularies('audio');
  if ($vocabs) {
    $vocab = array_shift($vocabs);
    $node->taxonomy['tags'][$vocab->vid] = implode(',', $feed_item->options->enclosure->keywords);
  }
}

To be honest, I haven't updated this client to newer versions of FeedAPI, because I did their site before Aron established an upgrade path :)

aron novak’s picture

Status: Needs work » Closed (works as designed)

FeedAPI was designed differently.
See this:
http://drupal.org/project/new_aggregator
There, the processors have access to each other's results, so in most cases there is no need to hook in.