This applies to 4.6RC.

After upgrading my 4.5.2 site to 4.6, I seem to have some issues with the aggregator. The same feeds showed no issues at all in the 4.5.2 aggregator.

Here's an example of what I mean. Note the uninterpreted HTML tags in the feed:

<a href="http://sports.yahoo.com/mlb/recap?gid=250401124&prov=ap">With five scoreless innings in his first spring start</a>, Bruce Chen locked up the fifth spot in the rotation in a 3-1 win over the Cardinals yesterday.<p>

or this

I don't know whether to laugh or cry.<br /><br />If this is Lloyd McClendon's idea of an April Fools' Day joke, it's not very nice to Tike Redman. <br /><br />If it's not, this is the cockamamiest baseball idea of all time. The effect of this move will be to take plate appearances away from Jason Bay and Craig Wilson, the two best hitters on the team, and give them to Tike Redman, who's possibly the worst hitter on the team.<br /><br /><a href = http://www.pittsburghlive.com/x/tribune-review/sports/pirateslive/s_319809.html>Here's</a> the Trib article. <a href = http://www.postgazette.com/pg/05092/481810.stm>Here's</a> the Post-Gazette writeup. I'm posting them both because of they offer an amazing array of half-baked explanations and bizarre reasonings.<br /><br />

This is a fairly major issue; the aggregator is not usable this way.

Comment | File | Size | Author
#3 | 05_aggregator.module.patch | 887 bytes | Morbus Iff

Comments

Anonymous’s picture

For a quick fix, change line 1115 of aggregator.module from:

 $output .= '  <div class="description">'. check_plain($item->description) ."</div>\n";

to

 $output .= '  <div class="description">'. html_entity_decode(check_plain($item->description)) ."</div>\n";
jvincher’s picture

This seems to solve the problem, at least for now.

Thanks for helping out.

Morbus Iff’s picture

I'm more inclined to remove the check_plain() call entirely. The added html_entity_decode() call, while it "works", simply undoes what check_plain() accomplishes. I think the check_plain() here is actually wrong:

  • This isn't user-submitted content - it's outside content in an XML file.
  • In a perfect world, entities should already be fixed in an XML file.
  • In a perfect world, we shouldn't have to fix the entities in an XML file.

Attached patch removes the check_plain().

Drumm, UnconeD: can you double-check this?

Steven’s picture

check_plain() converts plain text to escaped HTML by escaping special characters as entities. The first suggestion decoded those entities again right after adding them, so it was indeed entirely redundant. Morbus is right.
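
To illustrate why the quick fix was redundant, here is a minimal standalone sketch. It assumes check_plain() behaves like htmlspecialchars() with ENT_QUOTES; the stand-in function check_plain_sketch() is hypothetical and exists only so the snippet runs outside Drupal.

```php
<?php
// Hypothetical stand-in for Drupal's check_plain(); assumed here to be
// equivalent to escaping HTML special characters, including both quote types.
function check_plain_sketch($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// A feed description containing markup, similar to the ones reported above.
$description = '<i>Anything\'s</i> better than SBC Park';

// Escaping turns the markup into literal text, which is why the tags
// showed up on the page instead of being rendered.
$escaped = check_plain_sketch($description);

// Decoding immediately afterwards restores the original string, so the
// pair of calls has no net effect on this input.
$round_trip = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

var_dump($round_trip === $description);
```

If the two calls cancel out like this, the output is the same as printing the description with no escaping at all, which is what the attached patch does directly.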

Aggregated HTML is in fact validated when it is saved to the database. It is unescaped from the XML, only a limited set of tags is allowed, and CSS/Javascript is removed. Thus it is safe to put into HTML.
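
The save-time cleanup can be pictured roughly as follows. This is only a sketch of the "unescape, then allow a limited set of tags" idea using PHP's built-in html_entity_decode() and strip_tags(). The function name sanitize_feed_html_sketch() and the tag list are assumptions for illustration, not the aggregator's actual code.

```php
<?php
// Hypothetical helper: decode entities coming from the feed XML, then keep
// only an allowlist of tags. The allowlist below is an assumed example.
function sanitize_feed_html_sketch($raw, $allowed = '<a><b><i><em><strong><p><br>') {
  $html = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
  // strip_tags() drops tags that are not in the allowlist but keeps their
  // text content, and it does not touch attributes on allowed tags.
  return strip_tags($html, $allowed);
}

// An escaped description as it might appear inside a feed item.
$raw = '&lt;p&gt;Hello &lt;script&gt;alert(1)&lt;/script&gt;&lt;i&gt;world&lt;/i&gt;&lt;/p&gt;';
echo sanitize_feed_html_sketch($raw), "\n";
```

Note that strip_tags() alone leaves attributes such as onclick or style in place, so a real filter has to do more work to remove CSS and JavaScript as described above.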

Committed to HEAD/4.6. Aggregator contained its own entity decoder, which I replaced with the recent Drupal function decode_entities(). I discovered a bug in that function while doing so, which is also fixed now.

jvincher’s picture

This is what I am seeing after downloading and installing a fresh 4.6RC this afternoon 4/8:

16:44
.... <i>Anything's</i> better than SBC Park
So if you're in the city and want to be a part of something meaningful, check out some OBM friends having their first Mays Field party of the year. To all the friends of Mays Field. Our first Mays Field...

In addition, an aggregator category view widens the center column (chameleon theme) and pushes the right column essentially out of sight. This does not happen in all categories, so it's tricky to reproduce.

Is this regular behavior?

inteja’s picture

I've had similar problems ever since installing 4.6rc1. I'm now running the latest CVS.

One of my feeds is displaying HTML tags in the title. See:
http://www.neocosm.net/aggregator/sources/1

My other two feeds are OK. I've contacted the maintainer of the website with the offending feed, but he says there have been no changes to his syndication software. He even reverted to an old backup for me, with the same result. So it must be something wrong with my aggregator.module, which is the latest CVS as of half an hour ago.

Brian.

jvincher’s picture

Status: Active » Fixed

Fixed a long time ago.

Anonymous’s picture

Status: Fixed » Closed (fixed)