I've been poking around Drupal's XML-handling functions and don't see anything that provides simple XML-to-array conversion. To parse XML, it looks like you have to create your own parser, which means writing several handler functions for tags and so on. It would be nice to be able to do something like the following, which uses Technorati's REST API to retrieve a list of blogs that link to a URL:

  $request = "http://api.technorati.com/cosmos?key=$mykey&url=" . urlencode($url);
  $result = drupal_http_request($request);
  $vals = xml2array( $result->data );
  $items = $vals['tapi'][0]['document'][0]['item'];
  $links = array();
  foreach ($items as $item) {
    if ($item['nearestpermalink']) {
      $links[] = l($item['weblog'][0]['name'], $item['nearestpermalink']);
    }
  }

Here's the xml2array function that would do the heavy lifting:

function xml2array( $textXml )
{
   $regExElements = '/<(\w+)([^>]*)>(.*?)<\/\\1>/s';
   $regExAttributes = '/(\w+)="([^"]*)"/';
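   // Start from an empty result so text without elements returns an array.
   $arrayFinal = array();
   preg_match_all( $regExElements, $textXml, $matchElements );
   foreach ( $matchElements[1] as $keyElements=>$valElements ) {
       // Reset per element so attributes do not leak from the previous one.
       $arrayAttributes = array();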
       if ( $matchElements[2][$keyElements] )
       {
           preg_match_all( $regExAttributes, $matchElements[2][$keyElements], $matchAttributes );
           foreach ( $matchAttributes[0] as $keyAttributes=>$valAttributes )
           {
               $arrayAttributes[ $valElements.' attributes' ][$matchAttributes[1][ $keyAttributes ] ] = $matchAttributes[2][ $keyAttributes ];
           }
       }
       else
       {
           $arrayAttributes = null;
       }
       if ( preg_match( $regExElements, $matchElements[3][$keyElements]) ) {
           if ( $arrayAttributes )
           {
               $arrayFinal[ $valElements ][ $valElements.' attributes' ] = $arrayAttributes[ $valElements.' attributes' ];
           }
           $arrayFinal[ $valElements ][] = xml2array( $matchElements[3][$keyElements] );
       }
       else
       {
           $arrayFinal[ $valElements ] = $matchElements[3][ $keyElements ];
           // Merge only when attributes exist; array_merge() rejects null.
           if ( $arrayAttributes ) {
               $arrayFinal = array_merge( $arrayFinal, $arrayAttributes );
           }
       }
   }
   return $arrayFinal;
}
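
For comparison, PHP's bundled XML extension also offers xml_parse_into_struct(), which fills flat arrays from a document without any handler callbacks. The following is only a minimal sketch, assuming that extension is enabled; example_xml_tag_values() is a hypothetical helper name, not a Drupal function, and the sample XML simply reuses the element names from the example above.

```php
<?php
// Hypothetical helper (not part of Drupal): collect the text of every
// element with the given tag name using PHP's built-in XML parser.
function example_xml_tag_values($xml, $tag) {
  $parser = xml_parser_create();
  // Keep tag names as written instead of upper-casing them.
  xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
  // Ignore values that consist only of whitespace.
  xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
  $values = array();
  $index = array();
  $ok = xml_parse_into_struct($parser, $xml, $values, $index);
  if (!$ok || !isset($index[$tag])) {
    return array();
  }
  $found = array();
  // $index maps each tag name to its positions in $values.
  foreach ($index[$tag] as $i) {
    if ($values[$i]['type'] == 'complete' && isset($values[$i]['value'])) {
      $found[] = $values[$i]['value'];
    }
  }
  return $found;
}

$xml = '<tapi><document>'
     . '<item><nearestpermalink>http://example.com/a</nearestpermalink></item>'
     . '<item><nearestpermalink>http://example.com/b</nearestpermalink></item>'
     . '</document></tapi>';
$permalinks = example_xml_tag_values($xml, 'nearestpermalink');
```

This does not rebuild the nesting the way xml2array() does; it only shows that a real parser can do the tokenizing, which avoids the edge cases a regular expression misses (self-closing tags, an element nested inside another element of the same name, single-quoted attributes).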

Any chance of getting this added to core, or is there already some equivalent available that I don't know about?

Comments

nedjo

Here's a similar approach I've been looking at, cribbed (and slightly adapted) from some code in the Freja library. It handles both conversions (PHP array <> XML). I'm not sure what the CDATA part is for; maybe it's to avoid issues with illegal characters.

XML serializing/handling would probably work best as a contrib module initially.


class XML_Unserializer {
  var $stack;
  var $arr_output;
  var $null_token = "null";

  function unserialize($str_input_xml) {
    $p = xml_parser_create();
    xml_set_element_handler($p, array(&$this, 'start_handler'), array(&$this, 'end_handler'));
    xml_set_character_data_handler($p, array(&$this, 'data_handler'));
    $this->stack = array(
      array(
        'name' => 'document',
        'attributes' => array(),
        'children' => array(),
        'data' => ''
       )
    );
    if (!xml_parse($p, $str_input_xml)) {
      trigger_error(xml_error_string(xml_get_error_code($p)) ."\n". $str_input_xml, E_USER_NOTICE);
      xml_parser_free($p);
      return;
    }
    xml_parser_free($p);

    $tmp = $this->build_array($this->stack[0]);
    if (count($tmp) == 1) {
      $this->arr_output = array_pop($tmp);
    }
    else {
      $this->arr_output = array();
    }
    unset($this->stack);
    return $this->arr_output;
  }

  function get_unserialized_data() {
    return $this->arr_output;
  }

  function build_array($stack) {
    $result = array();
    if (count($stack['attributes']) > 0) {
      $result = array_merge($result, $stack['attributes']);
    }

    if (count($stack['children']) > 0) {
      if (count($stack['children']) == 1) {
        $key = array_keys($stack['children']);
        if ($stack['children'][$key[0]]['name'] === $this->null_token) {
          return NULL;
        }
      }
      $keycount = array();
      foreach ($stack['children'] as $child) {
        $keycount[] = $child['name'];
      }
      if (count(array_unique($keycount)) != count($keycount)) {
        // Duplicate child names: build a numerically indexed list.
        $children = array();
        foreach ($stack['children'] as $child) {
          $children[] = $this->build_array($child);
        }
      }
      else {
        // Unique child names: build an array keyed by element name.
        $children = array();
        foreach ($stack['children'] as $child) {
          $children[$child['name']] = $this->build_array($child);
        }
      }
      $result = array_merge($result, $children);
    }

    if (count($result) == 0) {
      return trim($stack['data']);
    }
    else {
      return $result;
    }
  }

  function start_handler($parser, $name, $attribs = array()) {
    $token = array();
    $token['name'] = strtolower($name);
    $token['attributes'] = $attribs;
    $token['data'] = '';
    $token['children'] = array();
    $this->stack[] = $token;
  }

  function end_handler($parser, $name, $attribs = array()) {
    $token = array_pop($this->stack);
    $this->stack[count($this->stack) - 1]['children'][] = $token;
  }

  function data_handler($parser, $data) {
    $this->stack[count($this->stack) - 1]['data'] .= $data;
  }
}

function xml_serialize($tagname, $data) {
  $xml = "<$tagname>";
  if (is_array($data)) {
    foreach ($data as $key => $value) {
      $xml .= xml_serialize($key, $value);
    }
  }
  else {
    $xml .= "<![CDATA[".$data."]]>";
  }
  $xml .= "</$tagname>\n";
  return $xml;
}
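
On the CDATA question: a CDATA section lets characters such as < and & appear in element text without being escaped, which is presumably why xml_serialize() wraps values in one. The one sequence that still breaks it is "]]>", because that closes the section early. A minimal sketch of the usual workaround follows; example_cdata() is a hypothetical helper name, and the round trip assumes PHP 5's SimpleXML extension is available.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section, splitting any "]]>"
// in the data across two sections so it cannot end the section early.
function example_cdata($text) {
  return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

$original = 'a < b && c ]]> d';
$wrapped = example_cdata($original);

// Round-trip through SimpleXML to check that the text survives intact.
// LIBXML_NOCDATA merges the CDATA sections into ordinary text nodes.
$element = simplexml_load_string('<value>' . $wrapped . '</value>', 'SimpleXMLElement', LIBXML_NOCDATA);
$text = (string) $element;
```

Note that xml_serialize() also uses array keys as tag names, so numerically indexed arrays would produce tags such as <0>, which are not valid XML names.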

maybourne

Thanks, nedjo, this is just what I need!

jerome@drupal.org

Another way to parse XML is to use SimpleXML.
A good demonstration, given at DrupalCon Barcelona, is available here: http://drupal.org/node/178374
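
As a rough illustration of what that looks like, here is a minimal sketch that assumes PHP 5 with the SimpleXML extension. The sample XML is made up and only borrows the element names from the Technorati example in the original post.

```php
<?php
// Made-up sample data using the element names from the original post.
$xml = '<tapi><document>'
     . '<item><weblog><name>Blog A</name></weblog>'
     . '<nearestpermalink>http://example.com/a</nearestpermalink></item>'
     . '<item><weblog><name>Blog B</name></weblog></item>'
     . '</document></tapi>';

// simplexml_load_string() returns FALSE if the XML is not well formed.
$doc = simplexml_load_string($xml);
$links = array();
if ($doc !== FALSE) {
  foreach ($doc->document->item as $item) {
    // A missing child element casts to an empty string.
    $permalink = (string) $item->nearestpermalink;
    if ($permalink !== '') {
      $links[(string) $item->weblog->name] = $permalink;
    }
  }
}
```

Child elements are read as object properties, so no handler functions or array conversion are needed; casting to string extracts the text of an element.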

Sheldon Rampton

Better yet, try QueryPath, a Drupal module that is simpler to use than SimpleXML and way more powerful:

http://drupal.org/project/querypath

----------------
Customer Support Engineer, Granicus
https://granicus.com