I've been poking around Drupal's XML-handling functions and don't see anything that provides simple XML-to-array conversion. To parse XML, it looks like you have to create your own parser, which means writing several functions for tag handling, etc. It would be nice to be able to do something like the following, which uses Technorati's REST API to retrieve a list of blogs that link to a URL:

  $request = "http://api.technorati.com/cosmos?key=$mykey&url=$url";
  $result = drupal_http_request($request);
  $vals = xml2array( $result->data );
  $items = $vals['tapi'][0]['document'][0]['item'];
  $links = array();
  foreach ($items as $item) {
    if ($item['nearestpermalink']) {
      $links[] = l($item['weblog'][0]['name'], $item['nearestpermalink']);
    }
  }

Here's the xml2array function that would do the heavy lifting:

function xml2array( $textXml )
{
   $regExElements = '/<(\w+)([^>]*)>(.*?)<\/\\1>/s';
   $regExAttributes = '/(\w+)="([^"]*)"/';
   $arrayFinal = array();
   preg_match_all( $regExElements, $textXml, $matchElements );
   foreach ( $matchElements[1] as $keyElements=>$valElements ) {
       // Reset on every pass so attributes from an earlier element
       // do not leak into this one.
       $arrayAttributes = array();
       if ( $matchElements[2][$keyElements] )
       {
           preg_match_all( $regExAttributes, $matchElements[2][$keyElements], $matchAttributes );
           foreach ( $matchAttributes[0] as $keyAttributes=>$valAttributes )
           {
               $arrayAttributes[ $valElements.' attributes' ][$matchAttributes[1][ $keyAttributes ] ] = $matchAttributes[2][ $keyAttributes ];
           }
       }
       if ( preg_match( $regExElements, $matchElements[3][$keyElements]) ) {
           // The element has child elements: recurse into its content.
           if ( $arrayAttributes )
           {
               $arrayFinal[ $valElements ][ $valElements.' attributes' ] = $arrayAttributes[ $valElements.' attributes' ];
           }
           $arrayFinal[ $valElements ][] = xml2array( $matchElements[3][$keyElements] );
       }
       else
       {
           // Leaf element: store its text, then merge in any attributes.
           $arrayFinal[ $valElements ] = $matchElements[3][ $keyElements ];
           $arrayFinal = array_merge( $arrayFinal, $arrayAttributes );
       }
   }
   return $arrayFinal;
}

Any chance of getting this added to core, or is there already some equivalent available that I don't know about?
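
One caveat with the regex approach: the non-greedy match pairs an opening tag with the first closing tag of the same name, so an element nested inside another element of the same name gets cut short, and self-closing tags such as <br/> are not matched at all. As a rough sketch of an alternative, assuming PHP 5's DOM extension is available, something like the following walks a parsed tree instead. The function name dom_element_to_array and the '@attributes' / '@text' keys are made up for illustration and are not a Drupal API.

```php
<?php
// Hypothetical helper, not a Drupal API: recursively convert a DOMElement
// into a nested array, similar in spirit to what xml2array() returns.
function dom_element_to_array(DOMElement $element) {
  $result = array();
  // Attributes go under their own key so they cannot collide with children.
  foreach ($element->attributes as $attribute) {
    $result['@attributes'][$attribute->name] = $attribute->value;
  }
  $text = '';
  foreach ($element->childNodes as $child) {
    if ($child instanceof DOMElement) {
      // Each child element is appended to a list keyed by its tag name.
      $result[$child->tagName][] = dom_element_to_array($child);
    }
    elseif ($child instanceof DOMText) {
      // DOMText also covers CDATA sections.
      $text .= $child->nodeValue;
    }
  }
  $text = trim($text);
  if (!$result) {
    // Leaf element without attributes: return just its text.
    return $text;
  }
  if ($text !== '') {
    $result['@text'] = $text;
  }
  return $result;
}

// An <item> nested inside another <item>, which trips up the regex version.
$xml = '<list><item id="1"><item>inner</item></item><item id="2">outer</item></list>';
$doc = new DOMDocument();
$doc->loadXML($xml);
$array = dom_element_to_array($doc->documentElement);
print_r($array);
```

Because every child element is appended under its tag name, repeated and nested elements keep the same access pattern as in the Technorati snippet, e.g. $array['item'][0]['item'][0].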

Comments

Here's a similar approach I've been looking at, cribbed (and slightly adapted) from some code in the Freja library. It handles conversion in both directions (PHP array <-> XML). I'm not sure what the CDATA part is about; maybe it's there to avoid issues with illegal characters.

XML serializing/handling will likely work best as a contrib module initially.

<?php
class XML_Unserializer {
  var $stack;
  var $arr_output;
  var $null_token = "null";

  function unserialize($str_input_xml) {
    $p = xml_parser_create();
    xml_set_element_handler($p, array(&$this, 'start_handler'), array(&$this, 'end_handler'));
    xml_set_character_data_handler($p, array(&$this, 'data_handler'));
    $this->stack = array(
      array(
        'name' => 'document',
        'attributes' => array(),
        'children' => array(),
        'data' => ''
      )
    );
    if (!xml_parse($p, $str_input_xml)) {
      trigger_error(xml_error_string(xml_get_error_code($p)) ."\n". $str_input_xml, E_USER_NOTICE);
      xml_parser_free($p);
      return;
    }
    xml_parser_free($p);
    $tmp = $this->build_array($this->stack[0]);
    if (count($tmp) == 1) {
      $this->arr_output = array_pop($tmp);
    }
    else {
      $this->arr_output = array();
    }
    unset($this->stack);
    return $this->arr_output;
  }

  function get_unserialized_data() {
    return $this->arr_output;
  }

  function build_array($stack) {
    $result = array();
    if (count($stack['attributes']) > 0) {
      $result = array_merge($result, $stack['attributes']);
    }
    if (count($stack['children']) > 0) {
      if (count($stack['children']) == 1) {
        $key = array_keys($stack['children']);
        if ($stack['children'][$key[0]]['name'] === $this->null_token) {
          return NULL;
        }
      }
      $keycount = array();
      foreach ($stack['children'] as $child) {
        $keycount[] = $child['name'];
      }
      if (count(array_unique($keycount)) != count($keycount)) {
        // enumerated array
        $children = array();
        foreach ($stack['children'] as $child) {
          $children[] = $this->build_array($child);
        }
      }
      else {
        // indexed array
        $children = array();
        foreach ($stack['children'] as $child) {
          $children[$child['name']] = $this->build_array($child);
        }
      }
      $result = array_merge($result, $children);
    }
    if (count($result) == 0) {
      return trim($stack['data']);
    }
    else {
      return $result;
    }
  }

  function start_handler($parser, $name, $attribs = array()) {
    $token = array();
    $token['name'] = strtolower($name);
    $token['attributes'] = $attribs;
    $token['data'] = '';
    $token['children'] = array();
    $this->stack[] = $token;
  }

  function end_handler($parser, $name, $attribs = array()) {
    $token = array_pop($this->stack);
    $this->stack[count($this->stack) - 1]['children'][] = $token;
  }

  function data_handler($parser, $data) {
    $this->stack[count($this->stack) - 1]['data'] .= $data;
  }
}

function xml_serialize($tagname, $data) {
  $xml = "<$tagname>";
  if (is_array($data)) {
    foreach ($data as $key => $value) {
      $xml .= xml_serialize($key, $value);
    }
  }
  else {
    $xml .= "<![CDATA[".$data."]]>";
  }
  $xml .= "</$tagname>\n";
  return $xml;
}
?>
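
On the CDATA question: as far as I can tell, the wrapper is there so that values containing characters like & or < don't break the generated XML. Here is a small sketch that compares raw text, a CDATA section, and entity escaping, assuming PHP 5's DOM extension. The helper element_text() is made up for this example and is not part of the Freja code.

```php
<?php
// Hypothetical helper: return the text content of a one-element XML
// document, or FALSE if the parser rejects the document.
function element_text($xml) {
  $previous = libxml_use_internal_errors(TRUE);  // Collect errors quietly.
  $doc = new DOMDocument();
  $ok = $doc->loadXML($xml);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return $ok ? $doc->documentElement->textContent : FALSE;
}

// A value that contains XML markup characters.
$value = 'Tom & Jerry <b>live</b>';

$raw     = element_text('<title>' . $value . '</title>');
$cdata   = element_text('<title><![CDATA[' . $value . ']]></title>');
$escaped = element_text('<title>' . htmlspecialchars($value) . '</title>');

var_dump($raw === FALSE);      // The bare "&" makes the document ill-formed.
var_dump($cdata === $value);   // CDATA passes the text through unchanged.
var_dump($escaped === $value); // Entity escaping round-trips as well.
```

One limitation to keep in mind: a value that itself contains ]]> closes the CDATA section early, so escaping with htmlspecialchars() may be the safer default for arbitrary input.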

Thanks, nedjo, this is just what I need!

Another way to parse XML is to use SimpleXML.
A good demonstration given at DrupalCon Barcelona is available here: http://drupal.org/node/178374
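
For anyone who wants to see roughly what that looks like, here is a minimal SimpleXML sketch, assuming PHP 5 (and 5.2 or later for the JSON shortcut at the end). The sample XML is made up and only mirrors the element names used in the array paths of the original post; it is not a real Technorati response.

```php
<?php
// Made-up sample that follows the element names from the original post.
$xml = '<tapi><document>'
     . '<item><weblog><name>Blog A</name></weblog>'
     . '<nearestpermalink>http://example.com/a</nearestpermalink></item>'
     . '<item><weblog><name>Blog B</name></weblog></item>'
     . '</document></tapi>';

$tapi = simplexml_load_string($xml);

$links = array();
foreach ($tapi->document->item as $item) {
  // Casting a SimpleXMLElement to string gives its text ('' if missing).
  $permalink = (string) $item->nearestpermalink;
  if ($permalink !== '') {
    $links[(string) $item->weblog->name] = $permalink;
  }
}
print_r($links);

// A common shortcut when a plain nested array is really wanted.
// Attributes, if any, end up under an '@attributes' key.
$array = json_decode(json_encode($tapi), TRUE);
```

In a Drupal module, the loop body would call l() on the name and permalink as in the original snippet; it is left out here so the example runs on its own.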

Better yet, try QueryPath, a Drupal module that is simpler to use than SimpleXML and way more powerful:

http://drupal.org/project/querypath

----------------
IT consultant, web designer, writer and researcher
http://www.sheldonrampton.com/portfolio