Characters such as double quotes (") and ampersands (&) are being converted to their HTML entities in titles (using FeedAPI) and when mapping tags (using FeedAPI mapper). For instance, the title "What the heck they were thinking" ancient versions becomes &quot;What the heck they were thinking&quot; ancient versions, and the tag q&a becomes q&amp;a.

Comment  File                        Size     Author
#15      sample_news_feed_title.png  3.88 KB  apennington

Comments

aron novak

Status: Active » Postponed (maintainer needs more info)

This bug is a nightmare; I occasionally kill it and it appears again and again :)
I'm sure you use the Common syndication parser, don't you?
Please provide me some example feed URLs.

wmostrey

This one has the "Don't want your stinking hook" post: http://arancaytar.ermarian.net/news/technology/web/drupal/feed
This one has the q&a tag in the "The Deal With Nodes" post: http://feeds.learnbythedrop.com/learnbythedrop?format=xml

I'm using the SimplePie parser and no other modules that might do an html replace.

wmostrey

Status: Postponed (maintainer needs more info) » Active

aron novak

I could not reproduce it with the arancaytar.... feed and simplepie. Which version of simplepie.inc do you use? Please double check that you really use the 1.5 version.

wmostrey

I'm using FeedAPI 6.x-1.5 and simplepie.inc 1.1.1.

likewhoa

Try using 1.1.3 or the development version of SimplePie; it works for me.

wmostrey

Status: Active » Fixed

OK I'm using 1.1.3. I'm marking this as fixed for now. I'll reopen when this behavior pops up again. Thanks!

wmostrey

Unfortunately the problem still happens.

Post title: "Drupal For Education And E-Learning" Book Review
Feed: http://www.civicactions.com/taxonomy/term/55
Node: http://www.drupaldigest.net/all
Node title: &quot;Drupal For Education And E-Learning&quot; Book Review

wmostrey

Status: Fixed » Active

gsnedders

Is FeedAPI assuming the output of SP (SimplePie) to be text/plain for the title? It should expect text/html. As far as I can tell, SP is doing what it should.

aron novak

Status: Active » Postponed (maintainer needs more info)

Well, I could not reproduce the problem at http://civicactions.com/taxonomy/term/55/0/feed using -dev.

likewhoa

Status: Postponed (maintainer needs more info) » Active

Try this feed: http://thinkmoult.com/?feed=rss2. That one and other feeds that bring in quotes in node titles fail; this is on drupal-6.11 with -dev modules. My issue is not that those quotes are converted to their HTML entities, but that links are broken because they contain quotes, which normally should fail. I think this could be an issue with pathauto not converting or removing those characters for you.

dwightaspinwall

@wmostrey: I have had the exact same problem. I'm using feedapi 6.x-1.6 and simplepie.inc 1.1.3. After several hours digging through the code I finally gave up and implemented a kludge to fix titles on their way into nodes. I put the following snippet in a hook_nodeapi function:

...
if ($op == 'presave') {
  $node->title = fix_title($node->title);
}
...

And the title fixing function:

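function fix_title($title) {
  // Replace the numeric apostrophe entity with a literal apostrophe.
  // (Assumed reconstruction: the entity in the posted pattern was lost in rendering.)
  $title = preg_replace('/&#0?39;/', "'", $title);
  // Decode any remaining HTML entities, including both quote styles, as UTF-8.
  $title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
  return $title;
}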

Of course it would be far preferable to fix the code.

alex_b

Version: 6.x-1.5 » 6.x-1.x-dev
Status: Active » Postponed (maintainer needs more info)

The feeds contain entity-encoded characters. When they're rendered in Drupal and thus run through check_plain() to avoid XSS attacks, they're double encoded. The solution is to run html_entity_decode() on feed items before storing them.
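
For anyone following along, here is a minimal sketch of the double encoding described above. It is not FeedAPI code: sketch_check_plain() is a hypothetical stand-in for Drupal's check_plain(), assumed here to behave like htmlspecialchars() with ENT_QUOTES, and the title is taken from the example reported earlier in this issue.

```php
<?php
// Hypothetical stand-in for Drupal's check_plain(), assumed to escape text
// for HTML output via htmlspecialchars(). Defined so the sketch runs alone.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// A title as it might arrive from a feed, already entity-encoded.
$feed_title = '&quot;Drupal For Education And E-Learning&quot; Book Review';

// Stored as-is, then escaped on output: the ampersands are encoded again.
$double_encoded = sketch_check_plain($feed_title);

// Decoded before storing, then escaped on output: encoded exactly once.
$stored = html_entity_decode($feed_title, ENT_QUOTES, 'UTF-8');
$single_encoded = sketch_check_plain($stored);

print $double_encoded . "\n";
print $single_encoded . "\n";
```

The first line printed contains &amp;quot;, which a browser shows as the literal text &quot;. The second contains &quot;, which a browser shows as a plain double quote.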

This should have been addressed with the introduction of _feedapi_process_text(). Apparently some items are still not being properly decoded. I'd love to see more digging by those affected on

- where exactly there are still HTML encoded characters stored to the database
- with which feed
- and what parser

Please only report on 6.x dev issues.

apennington

Status  File                        Size
new     sample_news_feed_title.png  3.88 KB

I'm running into a similar problem.

I've set up news feeds from a UAE newspaper called The National (http://www.thenational.ae/section/rsslist). Many of the RSS feeds look fine on their servers. The feed source has this encoding declaration at the top: <?xml version="1.0" encoding="ISO-8859-1"?>. Our Drupal database character encoding is UTF-8.

Here is one example. The feed shows two special characters (‘ and ’).

Job hunt ‘is toughest for the young’

http://www.thenational.ae/article/20090515/NATIONAL/705149825/1010/rss More than 80 per cent of unemployed Emiratis are young people between 15 and 24, according to a report. Fri, 15 May 2009 16:57:00 +0400

When it is parsed by FeedAPI (via ) the title comes over as
Job hunt ‘is toughest for the young’

I'm using FeedAPI as-is with no additional add-ons. The parser the feeds use is Common syndication parser.

Attached is a small screen capture to display what I see onscreen. Hope this information helps you find a solution to the problem. Thanks!
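
One possibility, not confirmed anywhere in this issue, is that the feed is really Windows-1252 but labeled ISO-8859-1: the curly quotes ‘ and ’ do not exist in ISO-8859-1, whereas Windows-1252 has them at bytes 0x91 and 0x92. The sketch below only illustrates that difference using PHP's iconv(); the raw bytes in $raw_title are an assumption about what the feed sends.

```php
<?php
// Assumed raw bytes: 0x91 and 0x92 are the curly single quotes in Windows-1252.
$raw_title = "Job hunt \x91is toughest for the young\x92";

// Trusting the ISO-8859-1 label maps those bytes to C1 control characters.
$as_latin1 = iconv('ISO-8859-1', 'UTF-8', $raw_title);

// Treating the same bytes as Windows-1252 yields the curly quotes in UTF-8.
$as_cp1252 = iconv('Windows-1252', 'UTF-8', $raw_title);

// Show the UTF-8 bytes produced for the first quote under the ISO-8859-1 reading.
print bin2hex(substr($as_latin1, 9, 2)) . "\n";
print $as_cp1252 . "\n";
```

If the second line printed shows the expected quotes, the declared encoding rather than the parser may be the thing to look at.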

robertdjung

subscribe.

TimG1

I think I have a related problem.

I'm trying to display <img> tags that are in a text field of a node created by feedapi and feed mapper. I'm trying to display them in a View, and &lt; &gt; are being displayed instead of <>. On the page /admin/settings/feedapi I have the "Allow all HTML tags" checkbox checked.

I'm using...
Drupal 6.13
FeedAPI 6.x-1.7-beta3
Feed Element Mapper 6.x-1.0-beta12
SimplePie 1.2 from (www.simplepie.org)

Am I overlooking something obvious?

Thanks!
-Tim

ben610

Having a similar issue with FeedAPI turning apostrophes into HTML special chars in the node title.

Using:
Drupal 6.13
FeedAPI 6.x-1.8
Common syndication parser 6.x-1.7
FeedAPI Node 6.x-1.7

Some examples:
http://eyebeam.org/reblog/09-08-13/how-big-is-it-how-big-you-want-it-awe...
http://eyebeam.org/reblog/09-08-13/bbc-news-europe-dutchman-builds-moder...

Help?
Thanks.

TimG1

I solved my problem by placing html_entity_decode() around all the variables in my node/view template files when displaying them. This is on the site that has the mapped content.

ben610, try calling html_entity_decode($title) in your node.tpl.php.
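
A rough sketch of that workaround, assuming the template receives a title that was entity-encoded in the database and then escaped again for output. The sample value of $title is made up for illustration. Passing ENT_QUOTES and 'UTF-8' explicitly matters because, before PHP 8.1, the default flags leave the apostrophe entity (&#039;) untouched.

```php
<?php
// Hypothetical value of $title as a node template might receive it when the
// stored title was already entity-encoded and then escaped again for output.
$title = 'It&amp;#039;s a &amp;quot;Q&amp;amp;A&amp;quot; session';

// Decode exactly one level. ENT_QUOTES also covers the apostrophe entity,
// and the explicit charset avoids depending on PHP's configured default.
$display_title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

// The result is still escaped once, so it stays safe to print into HTML.
print $display_title . "\n";
```

Note that html_entity_decode() makes a single pass, so this strips only one layer of encoding. Applying it to a title that was encoded just once would output raw characters, which is worth keeping in mind for content that comes from outside feeds.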

-Tim

aron novak

http://newsrss.bbc.co.uk/rss/newsonline_world_edition/europe/rss.xml - I tried that feed URL with the Common syndication parser and the apostrophes appear normal. Is this the feed URL that you use?

bcobin

Thank you - thank you, TimG. #19 solved the problem I was having with imported tags brilliantly.

wmostrey

Status: Postponed (maintainer needs more info) » Fixed

Fixed with the comment in #19.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

iceous

I have similar but different character parsing errors:
the characters '=' and '&' are rendered as '%3D' and '%26'.

I have no clue where to begin...
please help....

Thanks,
Joko
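
For reference, '%3D' and '%26' are URL percent-encodings rather than HTML entities, so html_entity_decode() will not touch them. A small sketch of how they arise and how they are reversed in plain PHP follows; the sample string is hypothetical, and nothing here identifies which code in FeedAPI or elsewhere is doing the encoding.

```php
<?php
// Hypothetical query-string fragment containing '=' and '&'.
$original = 'a=1&b=2';

// urlencode() percent-encodes reserved characters, producing %3D and %26.
$encoded = urlencode($original);

// rawurldecode() reverses the percent-encoding.
$decoded = rawurldecode($encoded);

print $encoded . "\n";
print $decoded . "\n";
```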