This applies to 4.6RC.
After upgrading my 4.5.2 site to 4.6, I seem to have some issues with the aggregator. The same feeds showed no issues at all in the 4.5.2 aggregator.
Here's an example of what I mean. Note the uninterpreted HTML tags in the feed:
<a href="http://sports.yahoo.com/mlb/recap?gid=250401124&prov=ap">With five scoreless innings in his first spring start</a>, Bruce Chen locked up the fifth spot in the rotation in a 3-1 win over the Cardinals yesterday.<p>
or this
I don't know whether to laugh or cry.<br /><br />If this is Lloyd McClendon's idea of an April Fools' Day joke, it's not very nice to Tike Redman. <br /><br />If it's not, this is the cockamamiest baseball idea of all time. The effect of this move will be to take plate appearances away from Jason Bay and Craig Wilson, the two best hitters on the team, and give them to Tike Redman, who's possibly the worst hitter on the team.<br /><br /><a href = http://www.pittsburghlive.com/x/tribune-review/sports/pirateslive/s_319809.html>Here's</a> the Trib article. <a href = http://www.postgazette.com/pg/05092/481810.stm>Here's</a> the Post-Gazette writeup. I'm posting them both because of they offer an amazing array of half-baked explanations and bizarre reasonings.<br /><br />
This is a fairly major issue; the aggregator is not workable this way.
Comment | File | Size | Author |
---|---|---|---|
#3 | 05_aggregator.module.patch | 887 bytes | Morbus Iff |
Comments
Comment #1
(not verified) commented:
For a quick fix, in aggregator.module, change line 1115 from:
to
Comment #2
jvincher commented:
This seems to solve the problem, at least for now.
Thanks for helping out.
Comment #3
Morbus Iff commented:
I'm more inclined to remove the check_plain() call entirely. The added function, while it "works", simply undoes what the check_plain() call accomplishes. I think the check_plain() here is actually wrong:
Attached patch removes the check_plain().
Drumm, UnconeD: can you double-check this?
Comment #4
Steven commented:
check_plain() converts plain text to escaped HTML by escaping entities. The first suggestion removed that escaping again right after adding it, so it was indeed entirely redundant. Morbus is right.
Aggregated HTML is in fact validated when it is saved to the database. It is unescaped from the XML, only a limited set of tags is allowed, and CSS/JavaScript is removed. Thus it is safe to put into HTML.
Committed to HEAD/4.6. Aggregator contained its own entity decoder, which I replaced with the recent Drupal function decode_entities(). I discovered a bug in that function while doing so, which is also fixed now.
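To illustrate the point made in this comment, here is a rough Python sketch (not Drupal's PHP). It assumes `html.escape` as a stand-in for check_plain() and `html.unescape` as a stand-in for an entity decoder; the sample string and the function names are hypothetical and are not taken from aggregator.module.

```python
import html

# Hypothetical feed item text that was already filtered when it was saved.
stored = '<a href="http://example.com/recap">Recap</a><br />'


def render_escaped(text):
    # Treats the text as plain text: markup characters become entities,
    # so the browser shows the tags literally instead of interpreting them.
    return html.escape(text)


def render_quick_fix(text):
    # Escapes and then immediately decodes again: a redundant round trip.
    return html.unescape(html.escape(text))


def render_trusted(text):
    # Treats the text as already-safe HTML and emits it unchanged.
    return text


print(render_escaped(stored))    # tags appear as &lt;a ...&gt; entities
print(render_quick_fix(stored))  # same as the stored text
print(render_trusted(stored))    # same as the stored text
```

In this sketch the escape-then-decode path and the direct path produce identical output, which is why dropping the escaping call is the simpler change, provided the stored text really was filtered on save.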
Comment #5
jvincher commented:
This is what I am seeing after downloading and installing a fresh 4.6RC this afternoon (4/8):
In addition, an aggregator category view widens the center column (chameleon theme) and pushes the right column essentially out of sight. This does not happen in all categories, so it's tricky to reproduce.
Is this regular behavior?
Comment #6
inteja commented:
I've had similar problems ever since installing 4.6rc1. I am now running the latest CVS.
One of my feeds is displaying HTML tags in the title. See:
http://www.neocosm.net/aggregator/sources/1
My other two feeds are OK. I've contacted the maintainer of the website with the offending feed, but he says there have been no changes to his syndication software. He even reverted to an old backup for me, with the same result. So it must be something wrong with my aggregator.module, which is the latest CVS as of half an hour ago.
Brian.
Comment #7
jvincher commented:
Fixed a long time ago.