The RSS generation should be extensible so modules can add in various namespaced data, such as those listed at http://web.resource.org/rss/1.0/modules/proposed.html. In order for this to work there are cases where the existing RSS tags need to be modified, such as book pages need a different link and php static pages might want a different body.

I am convinced because of such cases that the XML must be passed around as an object. We are already doing this, but as an array. The array structure is not very flexible at all, so I plan to reimplement that section using the object structure of the upcoming SimpleXML functionality (http://www.php.net/manual/en/ref.simplexml.php). When PHP 5 hits we can ditch the code that makes the object into XML in favor of using php.

I plan on using this API extension immediately for the event module.

Comments

drumm’s picture

StatusFileSize
new4.39 KB

Here is an initial patch which shows the direction I want to go. Not everything is changed over, but the API remains the same for modules to make feeds. A patch to do that and the actual hook calling are coming.

drumm’s picture

StatusFileSize
new7.27 KB

Using SimpleXML proved to be more difficult than it was worth since it seems to use some tricky class coding and array trickery that I wasn't prepared to reimplement. So here is my revised patch with a remixed object/array strtucture for representing XML data. DOM or SimpleXML would definately be better, but this functionality is needed now. I have a patched version of event module ready to go, but I need to add a bit more before I submit it.

drumm’s picture

StatusFileSize
new7.39 KB

This is the final version, I think the only thing that I changed was upping the RSS version to 2.0. This is agaisnt 4.4, but it should apply to HEAD.

dries’s picture

Some comments:

  • For this to work, the format_rss_item() function has been changed to take a $node object, and has thus become less generic. Modules in the contributions repository (e.g. cvslog.module) use this function to generate RSS feeds of non-node data ...
  • The PHPdoc documentation seems to mix 'xml' and 'XML'. I suggest we stick to 'XML' (capital letters).
  • Can't we use module_invoke_all() instead of iterating over the modules and composing $function manually? Is that because $namespaces is passed by reference?
  • What's the point of format_w3c_date()? It does not seem to be used. Any reason why format_date() does not suffice?
drumm’s picture

StatusFileSize
new7.39 KB

format_rss_item wasn't referenced anywhere else in core, which is why I thought the arguments were alterable. Maybe keep the old function and make this new one specifically for nodes?

XML should all be capitalized in the PHPdoc.

Yes, this is because $namespaces is pass by reference. It could be done that way or by returning arrays which have to be merged. The current strategy allows for modules to dynamically check for namepace collisions. I did what is done everywhere else in Drupal where pass by reference is needed, wrote my own hook implementation.

format_w3c_date is used in a lot of rss namespaces. The need for this function is somewhat related to http://drupal.org/node/view/7058. Formatting the timezone properly is also not doable with the current format_date. Its certainly not the most convenient standard to implement, which makes it ideal for moving to another function.

Attached file has the case of XML fixed.

drumm’s picture

StatusFileSize
new7.39 KB

It would help if I actually made the patch.

killes@www.drop.org’s picture

Doesn't apply anymore.

drumm’s picture

StatusFileSize
new7.3 KB

To summarize:
-RSS 1.0
-Lacks documentation
-Documentation added upon request and increased chances of getting into core
-Huge
-Extensible as shown by the taxononomy.module section
-The RSS validates
-Requires http://drupal.org/node/7058
-Doesn't change any urls
-Doesn't change the RSS generation functions in common.inc
-Doesn't even use the RSS generation function in common.inc
-CivicSpace is using it

dries’s picture

I like the idea and acknowledge that this is much needed, but find the RSS generation code a _lot_ harder to read and use.

It is not clear what node_feed_add_xml() and node_feed_get_xml() are used for.

It is not clear what node_feed_add_namespaces() and node_feed_get_namespaces() are used for.

Why '.ATTRIBUTE'? Is that a common identifier in XML-land?

Do we really have to introduce a node_feed_invoke_all() function.

Steven’s picture

I really don't like that format_xml() function. Strip_chars strips out HTML but leaves entities untouched. This is bad. You should either convert it to plain-text completely (not advised) or simply escape it without touching the HTML.

In that case, you can use one of the existing check_() functions (i think check_form works).

I agree with Dries that '.ATTRIBUTES' is ugly. Also, there's a lot of voodoo magic going on on lines like this:

+ $rss['channel']['items']['rdf:Seq']['rdf:li'][]['.ATTRIBUTES'] = array('resource' => $item['link']);

I'm not sure if this is PHP5 safe, and maybe it should be cleaned up.

gábor hojtsy’s picture

Is this extensible? I thought that another layer will be added between the RSS functions and the node/taxonomy module, so node/taxonomy only needs to provide some data to serialize, and the RSS/Atom functions actually used would choose the format. Now this patch wires in a format into Drupal, and does not seem to be future proof to me. It seems to be completely impossible to plug in an Atom generator for this URL space. Or am I mistaken?

dries’s picture

Looks like another revision is needed. Marking this 'active'.

drumm’s picture