If you're running PHP 5.2.8 and wondering why RSS feeds pulled in by the Aggregator module end up with broken links, you'll be happy to know that the problem is in PHP (libxml2, actually) and not in Drupal.

In fact, any Drupal module that uses the ext/xml parser under PHP 5.2.8 is busted if there are HTML entities in the content.

The details are here: http://www.cdatazone.org/index.php?/archives/49-Entities,-extxml-and-lib...

In a nutshell, issues with libxml2 2.7.0-2.7.2 cause "the entities, &amp;, &lt;, &gt;, &apos; and &quot; to never get passed to the user's callbacks." This means that XML parsing with ext/xml is broken under PHP 5.2.8 (and possibly some earlier versions?).
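If you want to check whether a given install shows the symptom, a minimal sketch along these lines should do it. It parses a tiny document with ext/xml and looks at what the character data callback receives. The class and function names (`TextCollector`, `parse_text_with_ext_xml`) are made up for this example, and it assumes the bug shows up as a missing `&` in the collected text, as the quote above describes.

```php
<?php
// Hypothetical helper: collects every piece of character data the parser reports.
class TextCollector {
  public $text = '';

  public function onText($parser, $data) {
    // The parser may call this several times for one text node, so append.
    $this->text .= $data;
  }
}

// Returns the text ext/xml passes to the character data callback,
// or FALSE if the document does not parse.
function parse_text_with_ext_xml($xml) {
  $collector = new TextCollector();
  $parser = xml_parser_create('UTF-8');
  xml_set_character_data_handler($parser, array($collector, 'onText'));
  $ok = xml_parse($parser, $xml, TRUE);
  return $ok ? $collector->text : FALSE;
}

$seen = parse_text_with_ext_xml('<link>a=1&amp;b=2</link>');
if ($seen === 'a=1&b=2') {
  print "Entities reach the callback; this install looks fine.\n";
}
else {
  print "Unexpected callback data: " . var_export($seen, TRUE) . "\n";
}
```

On a healthy install the callback data should be `a=1&b=2`; anything else suggests the entity is getting lost on the way to the callback.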

For me, my Google News links (which contain lots of &amp; entities) are all busted after being parsed by the Aggregator module.

You'll have to wait until PHP 5.2.9 for this fix.

I'm not sure which earlier versions of PHP this affects or what other Drupal modules, but if you're running PHP 5.2.8 and the Aggregator module, you're definitely hosed.
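As a rough first check, you can compare version numbers against the ranges mentioned in this post. This is only a sketch: the function name `is_possibly_affected` is made up, and the ranges (PHP older than 5.2.9, libxml2 2.7.0 through 2.7.2) come straight from the text above and are not independently verified. Note that `LIBXML_DOTTED_VERSION` reports the libxml2 version PHP was compiled against, which may differ from the library actually loaded at runtime.

```php
<?php
// Hypothetical helper: TRUE if the version pair falls inside the ranges
// named in this thread (PHP < 5.2.9 together with libxml2 2.7.0-2.7.2).
function is_possibly_affected($php_version, $libxml_version) {
  $old_php = version_compare($php_version, '5.2.9', '<');
  $bad_libxml = version_compare($libxml_version, '2.7.0', '>=')
    && version_compare($libxml_version, '2.7.2', '<=');
  return $old_php && $bad_libxml;
}

// Check the running install.
if (is_possibly_affected(PHP_VERSION, LIBXML_DOTTED_VERSION)) {
  print "PHP " . PHP_VERSION . " with libxml2 " . LIBXML_DOTTED_VERSION . " may be affected.\n";
}
else {
  print "This version combination is outside the ranges reported here.\n";
}
```

A version match only tells you where to look; actually parsing a string that contains an entity is the more reliable test.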

Does this affect the SimpleXML parser?
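I don't know the answer, but it is easy to test on your own install. The sketch below runs the same kind of link through SimpleXML and compares the result with what you'd expect. The function name `simplexml_link_text` and the example.com URL are just placeholders for the test.

```php
<?php
// Hypothetical helper: returns the text of <link> as SimpleXML sees it,
// or FALSE if the document does not load.
function simplexml_link_text($xml) {
  $item = simplexml_load_string($xml);
  if ($item === FALSE) {
    return FALSE;
  }
  return (string) $item->link;
}

$xml = '<item><link>http://example.com/?a=1&amp;b=2</link></item>';
$expected = 'http://example.com/?a=1&b=2';
$actual = simplexml_link_text($xml);

if ($actual === $expected) {
  print "SimpleXML returned the link intact.\n";
}
else {
  print "SimpleXML returned: " . var_export($actual, TRUE) . "\n";
}
```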

Comments

slimandslam

This appears to be a PHP bug report about the issue: http://bugs.php.net/bug.php?id=45996

slimandslam

PHP 5.2.9 was just released. This problem is fixed: http://www.php.net/ChangeLog-5.php#5.2.9 (Issue #45996)

drupal1957

Hi

I am not sure I agree this is fixed. Using v5.2.9 is a workaround; v5.2.8 is still broken. If you do not have direct control of your PHP version, e.g. your hosting service decides (and decides to stay on v5.2.8), then you remain stuffed.

Souvent22

I too ran into this issue. I made a quick patch as a temporary workaround. Though I'm not using Aggregator for my parsing, the same approach should still apply just fine.
The basic idea is to "re-namespace" the HTML entities before the data reaches the parser, and then convert them back when handling the parsed data.
Example:

<?php

    $this->xml_parser = xml_parser_create();
    xml_set_object($this->xml_parser, $this);
    xml_parser_set_option($this->xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
    xml_set_element_handler($this->xml_parser, "startElement", "endElement");
    xml_set_default_handler($this->xml_parser, "defaultHandler");
    xml_set_character_data_handler($this->xml_parser, "defaultCDataHandler");
    if (!($fp = fopen($this->source_path, "rb"))) {
        $this->setError("Unable to open XML file.");
        return FALSE;
    }
    
    // TODO: Nicer error handling.
    while ($data = fread($fp, 4096)) {
      if ($this->stop !== FALSE) {
        $error_msg = 'Received STOP signal.';
        break;
      }
      /**
       * Namespacing the entities due to this bug:
       * http://bugs.php.net/bug.php?id=45996
       * PHP BUG ID: 45996
       * LibXML Bug ID: Unknown
       * NOTE: THIS IS WHERE WE RE-NAMESPACE THE DATA AS WE READ CHUNKS OF IT.
       * WE ARE NAMESPACING AMPERSANDS, SO &amp; BECOMES [drupal-amp]amp;
       * AND &gt; BECOMES [drupal-amp]gt;
       */
      $data = str_replace('&', '[drupal-amp]', $data);
      
      if (!xml_parse($this->xml_parser, $data, feof($fp))) {
          $error_msg = sprintf("XML error: %s at line %d column %d",
                      xml_error_string(xml_get_error_code($this->xml_parser)),
                      xml_get_current_line_number($this->xml_parser),
                      xml_get_current_column_number($this->xml_parser)
                      );
          $this->error = $error_msg;
          drupal_set_message($error_msg, 'error');
          return FALSE;
      }
    }
    // Release the file handle along with the parser.
    fclose($fp);
    xml_parser_free($this->xml_parser);

  /************* EXAMPLE FILTER FUNCTION, APPLIED WHEN READING PARSED DATA *************/
  /** To be called on parsed data before acting on it. **/
  function filter_data($data) {
    // Restore the ampersands that were masked before parsing, then decode the entities.
    $data = str_replace('[drupal-amp]', '&', $data);
    $data = html_entity_decode($data);
    return trim($data);
  }
?>

Hopefully that pattern makes sense and helps someone.
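For anyone who wants to try the pattern outside of a class, here is a self-contained sketch of the full round trip: mask the ampersands, parse in chunks, then restore and decode. The names `AMP_TOKEN`, `MaskedTextCollector`, `mask_entities` and `parse_masked` are invented for this example, and the 16-byte chunk size only stands in for the `fread()` loop above. It collects all character data into one buffer before filtering, on the assumption that the parser may split a text node (and so the token) across several callback calls.

```php
<?php
// Placeholder token that stands in for '&' while the data passes through the parser.
define('AMP_TOKEN', '[drupal-amp]');

// Hypothetical helper: accumulates character data from the parser callbacks.
class MaskedTextCollector {
  public $text = '';

  public function onText($parser, $data) {
    $this->text .= $data;
  }
}

// Hide every ampersand so the parser never sees an entity.
// '&' is a single byte, so this is safe to apply chunk by chunk.
function mask_entities($data) {
  return str_replace('&', AMP_TOKEN, $data);
}

// Restore the ampersands, then decode the entities ourselves.
function filter_data($data) {
  $data = str_replace(AMP_TOKEN, '&', $data);
  return trim(html_entity_decode($data, ENT_QUOTES, 'UTF-8'));
}

// Parse $xml in small chunks and return the filtered text, or FALSE on a parse error.
function parse_masked($xml) {
  $collector = new MaskedTextCollector();
  $parser = xml_parser_create('UTF-8');
  xml_set_character_data_handler($parser, array($collector, 'onText'));

  $chunks = str_split($xml, 16);
  $last = count($chunks) - 1;
  foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, mask_entities($chunk), $i === $last)) {
      return FALSE;
    }
  }
  // Filter only once the whole text has been collected.
  return filter_data($collector->text);
}

print parse_masked('<link>http://example.com/?a=1&amp;b=2</link>') . "\n";
```

Two caveats with this approach: feed content that happens to contain the literal text `[drupal-amp]` will be altered, and entity-like text inside CDATA sections will be decoded even though it was meant literally.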

francisconi.org

Mr. Souvent22, could you please give more details about where I should add the code you wrote?
Excuse me, I am very much a novice.

greywolfsspirit

While using the filebrowser module, every time I created a zip file, the files would be created and the contents would show up, but the zip files were always corrupt. I upgraded to PHP 5.2.11 and now it works as it is supposed to. It seems there are quite a few bugs in PHP 5.2.8. So, for anyone running WAMP 2.0i, I suggest upgrading to the PHP 5.2.11 addon for it; you might see a lot of the problems disappear.