My installation of Aggregator is stripping out all < and > tags leaving all the HTML bare on the page. Pretty much gibberish.
I have tried to play with the allowed HTML tag setting, but it has no effect on the output.
Comments
Comment #1
jonwatson commented
I think I have identified where the stripping is occurring, but I don't know how to stop it. Line 915 of aggregator.modules.inc is:
I'm kind of confused by this function. It seems that the allowed_html_tags are hardcoded into this function rather than being taken from the Aggregator settings, but in either case I have modified the allowed tags in both this file and the aggregator functions to no avail.
Here is a sample of the output:
The problem seems to be that all of the < and > characters are being removed, so the result is no longer valid HTML and is displayed as-is.
That's about the extent of my skills, though. I can't find out where in the code this error is being produced.
Does anyone have any pointers for me?
Thanks
Jon
Comment #2
jonwatson commented
After much mucking about, I have positively identified line 717 as the culprit:
When the global $items array is populated by the call to xml_parse, the < and > characters are stripped out of it.
However, since xml_parse seems to be a built-in PHP function, I don't have a clue what to do about this issue.
Any help?
Jon
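One way to confirm whether the parser itself is at fault, independent of Aggregator, is a standalone script along these lines. This is only a minimal sketch: the sample feed fragment, the `EntityCheckCollector` class and the `entity_check_parse()` function are made up for illustration and are not part of Drupal.

```php
<?php
// Hypothetical helper class: collects the character data reported by the parser.
class EntityCheckCollector {
  public $text = '';

  // The handler can be called several times for one text node, so append.
  public function onData($parser, $data) {
    $this->text .= $data;
  }
}

// Hypothetical helper: parse an XML string and return its character data,
// or FALSE if xml_parse() reports a failure.
function entity_check_parse($xml) {
  $collector = new EntityCheckCollector();
  $parser = xml_parser_create();
  xml_set_character_data_handler($parser, array($collector, 'onData'));
  $ok = xml_parse($parser, $xml, TRUE);
  return $ok ? $collector->text : FALSE;
}

// Illustrative feed fragment with escaped HTML, as many feeds deliver it.
$sample = '<description>&lt;p&gt;Hello &amp; welcome&lt;/p&gt;</description>';
$result = entity_check_parse($sample);

// A healthy parser gives back the markup with its angle brackets intact.
// On an affected setup the brackets and the ampersand are missing.
echo $result, "\n";
```

If the printed line has lost its < and > characters, the problem is below Aggregator and changing the allowed HTML tags setting will not help.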
Comment #3
msielski
Subscribing to this. This is very much still an active bug in Drupal 6.9's Feed Aggregator. I too confirmed that it is PHP's xml_parse doing it, and am trying to isolate how I can fix it. In particular, Google's Blog and News RSS/Atom feeds make heavy use of embedded HTML, escaped with what are valid XML predefined entities (&amp; &lt; &gt; &quot;).
Comment #4
damien tournoud commented
I can't reproduce any of the behavior you are describing on PHP 5.2.4. If this is really an issue in PHP's XML parser, please report your PHP version.
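For anyone reporting back, a short script like the following gathers the relevant version details in one go. It is a sketch under assumptions: the `xml_report_versions()` name is made up, and `LIBXML_DOTTED_VERSION` is only defined when PHP's libxml extension is loaded, so the script falls back to a placeholder when it is missing.

```php
<?php
// Hypothetical helper: collect the version details useful for this report.
function xml_report_versions() {
  return array(
    'php' => PHP_VERSION,
    // Defined by the libxml extension; absent if that extension is not loaded.
    'libxml' => defined('LIBXML_DOTTED_VERSION') ? LIBXML_DOTTED_VERSION : 'not available',
    // The xml extension provides xml_parser_create() and xml_parse().
    'xml_extension' => extension_loaded('xml') ? 'loaded' : 'not loaded',
  );
}

foreach (xml_report_versions() as $name => $value) {
  echo $name, ': ', $value, "\n";
}
```

Posting the libxml line along with the PHP version makes it easier to compare affected and unaffected systems.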
Comment #5
jbsarma commented
PHP version 5.2.8 and Drupal 6.9. This is very much a problem. Appreciate urgent attention.
Comment #6
jbsarma commented
PHP version 5.2.8 and Drupal 6.9. This is very much a problem. Appreciate urgent attention.
Comment #7
dave reid
We had this same problem on drupal.org's aggregator, and I'm pretty sure it was identified as a problem with PHP's libxml. See #362294: Drupal.org aggregator stores news posts broken, Drupal Planet broken.
Comment #8
dave reid
See the PHP bug report at http://bugs.php.net/bug.php?id=45996 for the affected versions.
Comment #9
slimandslam commented
To clarify, the expat parser in PHP (this one: http://www.php.net/manual/en/book.xml.php) is broken in PHP 5.2.8. The bug is in libxml. The issue is that the parser ignores HTML entities during parsing, resulting in XML with the entities stripped out of the parsed content.
This means that any Drupal module that uses expat is broken if you're running under PHP 5.2.8. The aggregator module is broken (if your content has HTML entities in it). The fix is in PHP 5.2.9 (to be released).
More details: http://drupal.org/node/384060
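Where upgrading is not possible yet, one workaround sometimes described for this kind of entity stripping is to rewrite the named entities as numeric character references before the feed reaches xml_parse. The following is only a sketch of that idea, not something taken from Aggregator: the function name is hypothetical, it has not been checked against every affected PHP and libxml combination, and it will also alter any literal `&lt;`, `&gt;` or `&amp;` text inside CDATA sections.

```php
<?php
// Hypothetical pre-processing step: swap the named entities for their
// numeric character references, which represent the same characters in XML.
function aggregator_example_escape_entities($xml) {
  // The replacements run in order, and none of the outputs contains
  // "&lt;", "&gt;" or "&amp;", so one replacement cannot feed the next.
  return str_replace(
    array('&lt;', '&gt;', '&amp;'),
    array('&#60;', '&#62;', '&#38;'),
    $xml
  );
}

// Illustrative input; a real caller would pass the raw feed data instead.
$raw = '<description>&lt;b&gt;News&lt;/b&gt; &amp; more</description>';
$prepared = aggregator_example_escape_entities($raw);
echo $prepared, "\n";
```

The prepared string is then what would be handed to xml_parse in place of the raw feed data.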
Comment #10
slimandslam commented
PHP 5.2.9 was just released. This problem is fixed: http://www.php.net/ChangeLog-5.php#5.2.9 (Issue #45996)
Comment #11
Lakeside commented
Hmm... Upgrading to PHP 5.2.9 hasn't fixed the problem on my system.