Hi, I've been working with the Views RSS module, and I've come across something a bit frustrating. If quotes or apostrophes are used in a node title (example: Stanley Kubrick's "2001"), the RSS output comes out like this:

Stanley Kubrick&#039;s "2001"

Is there a fix for this?

Comments

maciej.zgadzaj’s picture

Status: Active » Closed (works as designed)

Well, not really - this is an issue with Drupal's format_xml_elements() function, which Views RSS is using, rather than with Views RSS module - as this function calls check_plain() on all XML element values. You can see exactly the same thing happening to core Drupal feeds. (Nota bene, format_rss_item() is doing exactly the same thing.)
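To make the effect concrete, here is a minimal sketch that runs outside Drupal. It assumes the value reaching the XML formatter has already been escaped once (for example by field rendering), and it uses a stand-in function, check_plain_standin(), in place of Drupal 7's check_plain(), which wraps htmlspecialchars() with ENT_QUOTES and UTF-8. The stand-in name and the sample title are illustrative only.

```php
<?php
// Stand-in for Drupal 7's check_plain(), defined here so the sketch
// runs without Drupal. The function name is hypothetical.
function check_plain_standin(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$title = 'Stanley Kubrick\'s "2001"';

// First pass: the value is escaped once, as for normal HTML output.
$once = check_plain_standin($title);

// Second pass: the already-escaped value is escaped again, so every
// "&" that starts an entity turns into "&amp;".
$twice = check_plain_standin($once);

echo $once, PHP_EOL;
echo $twice, PHP_EOL;
```

A reader that decodes the second string only once shows the entity text literally instead of the apostrophe.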

Also, even with HTML-encoded quotes in the <title> elements your feed is still valid (W3C Feed Validator does not return an error, although it suggests not using HTML quotes there).

I am contemplating switching Views RSS to use SimpleXML at some point in the future. Until then, you could consider submitting an issue against Drupal core.

babbage’s picture

I'm guessing this may be associated with why I am seeing <h3> tags in my podcast episode titles in iTunes, even though I am instructing the feed to strip HTML tags?

Edit: Never mind... stupid too-inclusive field theming I'd apparently forgotten about!

modestmoes’s picture

A rough workaround I found is to use a PHP field.

It requires the views_php module.

Bensbury’s picture

Could you expand on what you do?

Bensbury’s picture

Here's a fix that appears to work, so if people test it, that'd be great.
It's based on a piece of code a guy wrote to solve the problem in D6 and I've used it on a lot of sites back then.
It's never caused me trouble.
For D7 things changed but here we go:

Go to views_rss/theme and open the theme.inc

Copy out the entire 'function template_preprocess_views_view_views_rss' function, and put it in your theme's template.php.
Change the function name to: function yourthemename_preprocess_views_view_views_rss

Then at line 200 in the original theme.inc, or where it reads '// Add XML element(s) to the item array', insert the following just above it.
I have included the if statement that comes before it to help you find the spot.

  if (empty($rss_elements)) continue;

  // Insert here -- clean up special characters.
  $rss_elements[0]['value'] = htmlspecialchars_decode(trim(strip_tags(decode_entities($rss_elements[0]['value'])), "\n\t\r\v\0\x0B\xC2\xA0 "));
  $rss_elements[0]['value'] = htmlspecialchars($rss_elements[0]['value'], ENT_COMPAT);
  // End of cleaning.

  // Add XML element(s) to the item array.
  $rss_item['value'] = array_merge($rss_item['value'], $rss_elements);
}

Check your RSS.... you might have to flush the cache a few times.
You can always check it works by hacking the theme.inc file, as I had a bit of trouble getting the theme_hook to work from the template file.

I'm testing to see how it works, as Twitter and Facebook use the feed to post socially.
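For anyone who wants to try the cleaning step on its own before touching theme.inc, here is a rough standalone sketch. It is not the module's code: clean_rss_value() is a hypothetical helper name, html_entity_decode() stands in for Drupal's decode_entities(), the flags are passed explicitly so the result does not depend on PHP version defaults, and the custom trim character list is replaced by plain trim().

```php
<?php
// Hypothetical helper illustrating the decode / strip / re-escape idea.
// html_entity_decode() is a stand-in for Drupal's decode_entities().
function clean_rss_value(string $value): string {
  // Undo one level of entity escaping, e.g. "&amp;#039;" -> "&#039;".
  $decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
  // Drop any markup and surrounding whitespace.
  $plain = trim(strip_tags($decoded));
  // Undo escaping of the remaining special characters.
  $plain = htmlspecialchars_decode($plain, ENT_QUOTES);
  // Escape exactly once; ENT_COMPAT leaves apostrophes untouched.
  return htmlspecialchars($plain, ENT_COMPAT, 'UTF-8');
}

echo clean_rss_value('Stanley Kubrick&amp;#039;s &amp;quot;2001&amp;quot;'), PHP_EOL;
echo clean_rss_value('  <b>Rock &amp; Roll</b>  '), PHP_EOL;
```

The value comes out escaped once, so it is safe to place in the feed as long as nothing escapes it again afterwards.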

oceanos’s picture

I use the views_php module (which I do anyway to construct a language-independent pubDate value on my multilingual site) for the RSS title field and just use str_replace() to solve the problem.

The code for the PHP field's value code setting is as follows (make sure that you load the node title field, even if you don't use it directly):

return str_replace("&amp;#", "&#", $data->node_title);
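Here is a small sketch of what that replacement does, assuming the value contains double-escaped entities. The $row object is a hypothetical stand-in for the $data object that views_php provides, and the sample titles are made up.

```php
<?php
// Hypothetical stand-in for the $data row object supplied by views_php.
$row = new stdClass();
$row->node_title = 'Stanley Kubrick&amp;#039;s 2001';

// Turn the double-escaped numeric entity back into a single-escaped one.
$fixed = str_replace('&amp;#', '&#', $row->node_title);
echo $fixed, PHP_EOL;

// Named entities such as &amp;quot; do not contain "&amp;#", so this
// replacement leaves them unchanged.
$named = str_replace('&amp;#', '&#', 'Kubrick &amp;quot;2001&amp;quot;');
echo $named, PHP_EOL;
```

So this covers apostrophes written as numeric entities, but double quotes escaped as &quot; would need a separate replacement.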

Sinan Erdem’s picture

Is this issue related to #779760: check_plain runs twice on title?

Also, I cannot see the same behavior in Drupal's default feeds. When I look at the source code:

Drupal's default feed has &amp;

Views RSS's feed has &amp;amp;

seaneffel’s picture


So there's this thing in PHP that's designed to encode whacked characters.

http://us1.php.net/htmlspecialchars

I am not a developer and I don't play one on television, but it seems to me that the output of the Views RSS fields could use this to force encoding on quotes, apostrophes, and ampersands.
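As a rough illustration of that function (plain PHP, not something Views RSS is known to do): htmlspecialchars() takes an optional fourth argument, double_encode, and passing false tells it to leave existing entities alone. The sample string is made up.

```php
<?php
$already_escaped = 'Kubrick&#039;s &quot;2001&quot; &amp; more';

// Default behaviour: entities that are already present get escaped again.
$double = htmlspecialchars($already_escaped, ENT_QUOTES, 'UTF-8');

// double_encode = false: existing entities are left as they are.
$single = htmlspecialchars($already_escaped, ENT_QUOTES, 'UTF-8', false);

// Raw text is still escaped normally when double_encode is false.
$raw = htmlspecialchars('Kubrick\'s "2001" & more', ENT_QUOTES, 'UTF-8', false);

echo $double, PHP_EOL;
echo $single, PHP_EOL;
echo $raw, PHP_EOL;
```

With double_encode set to false, already-escaped input and raw input end up as the same string.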

seaneffel’s picture

Perhaps the issue is related to the formatting of the fields before they are inserted into the Views RSS display type.

My title fields contain quotes and apostrophes. When I assigned a title field as the title value of the RSS item then the feed displayed poorly encoded quotes and apostrophes. I double checked the field configuration of the title field and saw that it was set to output as "default". When I changed the value of the title field to "plain text" then the characters output in a way that RSS readers could properly display the characters.

I also noticed the automatic preview generated by Views would show bad characters, but the RSS reader I was using parsed the characters correctly.

If this does not resolve the problem then reopen.

jacobstella’s picture

I'm having the same issues as OP.

I'm trying to set up an iTunes RSS feed for a podcast. I have a field in my podcast content type called "enclosure." When creating a new podcast I enter the enclosure including link, duration, and file type, in the format called for. I then link my enclosure field in my views RSS feed using field settings, and here's what it spits out:

<enclosure>url=&quot;http://www.podtrac.com/pts/redirect.mp3/www.stellaculinary.net/audio/stella-culinary-school-podcast/scs-001.mp3&quot; length=&quot;40225677&quot; type=&quot;audio/mpeg&quot; </enclosure>

The enclosure tag should read:

url="http://www.podtrac.com/pts/redirect.mp3/www.stellaculinary.net/audio/stella-culinary-school-podcast/scs-001.mp3" length="40225677" type="audio/mpeg"

I've set plain text on both the input field for the content type, and the field configuration in views.

Any help would be greatly appreciated.
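For reference, RSS 2.0 expects url, length and type to be attributes of the enclosure element rather than text inside it, which is why text that gets escaped as element content cannot produce the desired result. The sketch below is plain PHP with a hypothetical helper name, build_enclosure_tag(). It is not the Views RSS API; it only shows the intended shape, with each attribute value escaped on its own.

```php
<?php
// Hypothetical helper: build an <enclosure> element with attributes,
// escaping each attribute value individually.
function build_enclosure_tag(string $url, int $length, string $type): string {
  return sprintf(
    '<enclosure url="%s" length="%d" type="%s" />',
    htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
    $length,
    htmlspecialchars($type, ENT_QUOTES, 'UTF-8')
  );
}

echo build_enclosure_tag(
  'http://www.podtrac.com/pts/redirect.mp3/www.stellaculinary.net/audio/stella-culinary-school-podcast/scs-001.mp3',
  40225677,
  'audio/mpeg'
), PHP_EOL;
```

Only characters inside the attribute values are escaped; the quotes that delimit the attributes stay literal.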

willibd’s picture

The decode_entities function worked for me. Put this code in a custom template for the views field you want to correct (in my case, views-view-field--feed--title.tpl.php):

<?php
$decoded = decode_entities($output);
print($decoded);
?>

simon_s’s picture

Thanks,
the tip from willibd in #11 worked perfectly for me to eliminate &quot; in feed item titles.

dmetzcher’s picture

Solution in comment #11 by willibd worked perfectly.

mark_fullmer’s picture

Caveat emptor for using the workaround in #11.

As described in the API:

Decodes all HTML entities (including numerical ones) to regular UTF-8 bytes.

Double-escaped entities will only be decoded once ("&amp;lt;" becomes "&lt;", not "<"). Be careful when using this function, as decode_entities can revert previous sanitization efforts (&lt;script&gt; will become <script>).

In other words, you're re-allowing **all** HTML, not just reverting the encoded quotes. Separate from security/unintended content issues, in the context of an RSS feed, this could make your feed no longer valid XML.
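One possible narrower approach, offered only as a sketch: decode just the quote entities and leave everything else escaped. decode_quotes_only() is a hypothetical helper, not a Drupal function, and html_entity_decode() is used here as a stand-in to show what a full decode does by comparison.

```php
<?php
// Hypothetical helper: undo escaping for quote characters only, so that
// escaped markup such as &lt;script&gt; stays escaped.
function decode_quotes_only(string $text): string {
  return strtr($text, [
    '&quot;' => '"',
    '&#039;' => "'",
    '&#39;' => "'",
  ]);
}

$escaped = 'Kubrick&#039;s &quot;2001&quot; &lt;script&gt;';

// Targeted decode: quotes come back, the tag stays escaped.
echo decode_quotes_only($escaped), PHP_EOL;

// Full decode (stand-in for decode_entities()): the tag comes back too.
echo html_entity_decode($escaped, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

Literal quote characters are allowed in XML element text, so the targeted version keeps the feed well-formed while still fixing the titles.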