First, I'd like to start by giving a big thanks to danielb and James Andres for their work on Node Export. It looks like a very helpful module, and does almost exactly what I need.

What I would like to do is to develop an add-on module to Node Export (similar to node export file) that would allow users to export nodes in an XML format. I would appreciate some advice/guidance for cleanly integrating this functionality into the Node Export ecosystem.

My current plan is to provide an alternative to node_export_node_encode() that outputs nodes as XML. I'm comfortable with PHP, but I am still learning Drupal, especially hooks. Would it be possible to add a hook in node_export.module that would allow me to swap out the existing node_export_node_encode() function for my own?

My current attempt so far follows:

function node_export_xml_node_encode($var, $iteration = 0, $key = "") {
  // Indent two spaces per level of recursion.
  $tab = str_repeat("  ", $iteration);
  $iteration++;
  if (is_object($var)) {
    $var = (array)$var;
    // Marker copied from node_export_node_encode(). Note that a key starting
    // with '#' does not produce a valid XML element name.
    $var['#_export_node_encode_object'] = '1';
  }
  if (is_array($var)) {
    $empty = empty($var);
    // Initialise the accumulator before appending to it.
    $code = '';
    // Use a separate loop variable so the $key parameter is not overwritten.
    foreach ($var as $child_key => $value) {
      $prefix = "";
      $postfix = "";

      // Arrays and objects are wrapped here; scalar values wrap themselves in
      // the else branch below.
      if (is_array($value) || is_object($value)) {
        $prefix = "<". $child_key .">";
        $postfix = "</". $child_key .">";
      }

      $out = $tab . $prefix . node_export_xml_node_encode($value, $iteration, $child_key) ."$postfix \n";
      drupal_alter('node_export_xml_node_encode', $out, $tab, $child_key, $value, $iteration);
      $code .= $out;
    }
    $code .= ($empty ? '' : $tab);
    return $code;
  }
  else {
    // This is the final level of recursion: wrap $var in an XML element.
    $prefix = "<". $key .">";
    $postfix = "</". $key .">";

    if (is_string($var)) {
      // Escape XML special characters (&, <, >, quotes) instead of adding slashes.
      return $prefix . htmlspecialchars($var, ENT_QUOTES, 'UTF-8') . $postfix;
    }
    elseif (is_numeric($var)) {
      return $prefix . $var . $postfix;
    }
    elseif (is_bool($var)) {
      return $prefix . ($var ? 'TRUE' : 'FALSE') . $postfix;
    }
    else {
      return $prefix . 'NULL' . $postfix;
    }
  }
}

Any assistance/advice is appreciated. Thanks!

Comment | File | Size | Author
#5 | node_export_xml.zip | 2.15 KB | benford

Comments

danielb’s picture

The alter hook "node_export_node_encode" will be completely unsuitable for you to reformat the structure of the output - it will only allow you to make small changes to the values, but you would have to maintain the existing output structure.

To enable you to do what you want I would need to add an additional hook at the top of the function node_export_node_encode() to allow you to completely bypass the default code of that function and return your own values.

Then another corresponding change would need to be done on the 'import' side to allow your module to hook in and interpret the code.

sasconsul’s picture

This will be useful for one of my projects.

I want to use node_export to write and read in nodes for an external process to use. So, I would like the hook that @danielb suggests.

The problem I see is that there may be times when unaltered node export will be useful, for example to migrate content. I hope there is a way to invoke the hook for only some calls to the module.

benford’s picture

Sounds good. Thanks for the advice.

I'm planning on adding an option to the settings segment of this module so that the user may select the default export format. Would it also be beneficial to have a similar widget under the export tab of each node?

danielb’s picture

Have you written the code to generate the XML?
It would solve a few issues to completely switch this module over to something nice like XML.

benford’s picture

Status: new
File size: 2.15 KB

I have some code ready to generate XML. The attached zip is meant to be decompressed into the /path/to/node_export/modules/ directory. As usual the module will need to be enabled before being used.

node_export.module will need to be modified to call node_export_xml_node_encode() instead of node_export_node_encode(). I would appreciate some advice (or examples would be great) in how to craft a hook for this purpose.

This submodule also adds an option to the node_export settings page, allowing the user to export nodes in either the original format (PHP array) or XML. In the future I can also add encoding options so that HTML tags are not confused with XML structure.

I am making progress on the import side of things, but I do not have anything significant enough to share yet.

danielb’s picture

Hmm OK I will reconsider the design of node_export to make it easy for you to add the functionality when you're ready.

danielb’s picture

It would also be good to make a second default export option available, CSV export. I wouldn't even need to implement the import phase for CSV, just prompt the user to install Node Import for that.

The question is, do we force the user to pick the format for import, or should it be autodetected somehow?

benford’s picture

It would be extra work, but I say auto-detection of the input format would be an awesome way to go about this. For XML at least, we can do a test: if SimpleXML can parse the content without errors (i.e., simplexml_load_string() doesn't return FALSE), treat the content as XML.

Not quite sure about how to autodetect CSV or a php object, but I imagine similar methods can be used.

Edit: Never mind on importing CSV, as it has already been implemented by another module.
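
The XML test described above could look roughly like the following. This is only a sketch: example_looks_like_xml() is a hypothetical helper name, not part of Node Export, and it assumes PHP's SimpleXML and libxml extensions are available.

```php
<?php
/**
 * Guess whether an import string is XML.
 * Hypothetical helper for illustration; not part of Node Export.
 */
function example_looks_like_xml($string) {
  $string = trim($string);
  // Cheap pre-check: an XML document has to begin with '<'.
  if ($string === '' || $string[0] !== '<') {
    return FALSE;
  }
  // Collect parse errors internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(TRUE);
  $parsed = simplexml_load_string($string);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  // Compare strictly: an empty but valid element is "falsy" when cast to
  // boolean, yet it is not the FALSE that signals a parse failure.
  return $parsed !== FALSE;
}
```

Anything that fails this check could then be handed to the default PHP-array importer.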

danielb’s picture

So I'm doing what I originally said and adding this to the top of that function:

<?php
  // Allow other modules to take over this entire process.
  if ($iteration == 0) {
    $return = FALSE;
    drupal_alter('node_export_node_encode', $return, $var);
    if ($return) {
      return $return;
    }
  }
?>

Now I just have to figure out something similar for the import side.

Now I have to decide if there is some way I can pass down another 'option' or parameter from somewhere, where you actually pick the export method :/

benford wrote: "Never mind on importing CSV, as it has already been implemented by another module."

I know, I said that. But the export would be a good idea. Even though you can export CSV with views bonus, it's still a bit of a mission to set it up. I could just provide a quick dirty way of doing it.

danielb’s picture

Changed again slightly:

<?php
  // Allow other modules to take over this entire process before the first 
  // iteration.  Typically the module would respond if $format was set to
  // something it recognises.
  if ($iteration == 0) {
    $return = FALSE;
    drupal_alter('node_export_node_encode', $return, $var, $format);
    if ($return !== FALSE) {
      return $return;
    }
  }
?>

and for imports:

<?php
  // Allow other modules to take over this entire process.
  $return = FALSE;
  drupal_alter('node_export_node_decode', $return, $string);
  if ($return !== FALSE) {
    return $return;
  }
?>

and bulk exports

<?php
  // Allow other modules to take over this entire process. Typically the module 
  // would respond if $format was set to something it recognises.
  $node_code = FALSE;
  drupal_alter('node_export_node_bulk_encode', $node_code, $nodes, $format);
?>

So the idea so far is this...

For exports of single nodes you would implement the hook like so

<?php
function YOURMODULE_node_export_node_encode_alter(&$return, $var, $format) {
  // your code here
}
?>

Initially $return will be FALSE. If you change $return to something else, that value replaces node_export's default node code.
$var contains the node object.
You could decide to always override $return, but that would be a problem if several modules did so, since only one of them would take effect. It is better to intervene only when $format is set to a string that your module recognises. The idea is that visiting the URL node/40/node_export/xml would export node 40 with 'xml' as the $format. I don't know if that works yet, but that's the idea.
(You do not need to actually return anything; $return is passed by reference.)
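
As a rough illustration of the single-node hook, the sketch below only responds when $format is 'xml'. The module name "mymodule" and the helper mymodule_xml_encode() are hypothetical, and the sketch assumes every array key or property name is already a valid XML element name (numeric keys, for example, would not be).

```php
<?php
/**
 * Implements the node_export_node_encode alter hook described above.
 * "mymodule" is a hypothetical module name.
 */
function mymodule_node_export_node_encode_alter(&$return, $var, $format) {
  // Leave $return as FALSE unless the requested format is ours, so that
  // node_export (or another module) handles every other format.
  if ($format !== 'xml') {
    return;
  }
  $return = "<node>\n" . mymodule_xml_encode((array) $var, 1) . "</node>\n";
}

/**
 * Recursively turn an array into indented XML elements (hypothetical helper).
 */
function mymodule_xml_encode(array $values, $depth) {
  $out = '';
  $tab = str_repeat('  ', $depth);
  foreach ($values as $key => $value) {
    if (is_array($value) || is_object($value)) {
      // Nested structures become a wrapping element with indented children.
      $out .= "$tab<$key>\n" . mymodule_xml_encode((array) $value, $depth + 1) . "$tab</$key>\n";
    }
    else {
      // Escape XML special characters in scalar values.
      $out .= "$tab<$key>" . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . "</$key>\n";
    }
  }
  return $out;
}
```

Because the hook returns early for unknown formats, several format modules can coexist without overriding each other.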

For imports, the hook is implemented like this:

<?php
function YOURMODULE_node_export_node_decode_alter(&$return, $string) {
  // your code here
}
?>

No $format this time. I'm hoping we can automate that by detecting something in the node code, or by attempting to parse it and rejecting it if it's inappropriate.
For the import you would have to change $return to what node_export's default import handling would produce: an array of node objects, or just a single node object.
(Don't return anything.)
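
A minimal sketch of the import side, assuming the export used a <node> root element with one child element per flat property. "mymodule" and the <node> root are illustrative assumptions, not something node_export defines.

```php
<?php
/**
 * Implements the node_export_node_decode alter hook described above.
 * "mymodule" is a hypothetical module name.
 */
function mymodule_node_export_node_decode_alter(&$return, $string) {
  // Try to parse quietly; a parse failure means the code is not ours.
  $previous = libxml_use_internal_errors(TRUE);
  $xml = simplexml_load_string($string);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  // Leave $return untouched so the default importer can run.
  if ($xml === FALSE || $xml->getName() !== 'node') {
    return;
  }
  $node = new stdClass();
  foreach ($xml->children() as $name => $child) {
    // Flat properties only; nested elements would need recursion.
    $node->$name = (string) $child;
  }
  $return = $node;
}
```

Note that every value comes back as a string, so a real implementation would still have to decide how to restore arrays and other types.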

Now you also need to handle bulk exports for completeness

<?php
function YOURMODULE_node_export_node_bulk_encode_alter(&$node_code, $nodes, $format) {
  // your code here
}
?>

Again, $node_code starts off as FALSE; if you change it, you override the default.
Your code would ideally loop through $nodes, pass each one to the function you created earlier to export single nodes, and then concatenate the results into $node_code.
(Again, you don't actually return anything.)
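
The loop-and-concatenate idea might be sketched as follows. mymodule_encode_one() is a hypothetical stand-in for whatever single-node encoder the module already has, the <node_export> wrapper element is an assumption made so the result is one well-formed document, and nodes are assumed to have flat, scalar properties.

```php
<?php
/**
 * Stand-in for a single-node XML encoder (hypothetical; a real module would
 * reuse the function it wrote for single exports).
 */
function mymodule_encode_one($node) {
  $out = "  <node>\n";
  foreach ((array) $node as $key => $value) {
    $out .= "    <$key>" . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . "</$key>\n";
  }
  return $out . "  </node>\n";
}

/**
 * Implements the node_export_node_bulk_encode alter hook described above.
 */
function mymodule_node_export_node_bulk_encode_alter(&$node_code, $nodes, $format) {
  // Only take over when the requested format is ours.
  if ($format !== 'xml') {
    return;
  }
  // Wrap the per-node output in a single root element.
  $node_code = "<node_export>\n";
  foreach ($nodes as $node) {
    $node_code .= mymodule_encode_one($node);
  }
  $node_code .= "</node_export>\n";
}
```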

You should also implement your own hook_node_operations() following what node_export did, but changing the label slightly, and adding the 2nd callback argument that is your $format string. But keep the callback the same as node_export, don't replace that.
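
A hedged sketch of such a hook_node_operations() implementation for Drupal 6 follows. The callback name node_export_node_bulk is an assumption about what node_export registers and should be checked against node_export_node_operations(); "mymodule" and the label text are hypothetical.

```php
<?php
// Stub so the sketch runs outside Drupal; Drupal provides the real t().
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

/**
 * Implements hook_node_operations() (Drupal 6).
 */
function mymodule_node_operations() {
  return array(
    'mymodule_export_xml' => array(
      'label' => t('Export nodes (XML)'),
      // Assumed name of node_export's own bulk callback; verify it in
      // node_export_node_operations() before relying on it.
      'callback' => 'node_export_node_bulk',
      // Drupal passes the selected node IDs first and then these arguments,
      // so 'xml' arrives as the callback's second parameter.
      'callback arguments' => array('xml'),
    ),
  );
}
```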

These changes will appear in the next dev release.
Now I haven't tested any of this, so let me know if it's unsuitable or just doesn't work, etc...

benford’s picture

Thanks for doing this. Has this been added to the dev release yet? I'd like to see these changes inline with the existing code to ensure that I don't make any wrong assumptions. I have export and bulk export working just fine; I'll be adding filtering functionality next.

Importing is going a bit rough. I'll implement that as I have time but I'll be glad to share once I have exporting implemented with the new hooks mentioned in post #10.

danielb’s picture

Yeah they should be in dev

danielb’s picture

Was any more progress made on this? I am considering going ahead with this myself because it seems fairly easy? There are quite a lot of resources on converting associative arrays to xml and vice-versa, which should be all that is needed I think?

benford’s picture

Sorry, development time got cut for this particular project. I have not had time for it, and I failed to notify anyone. My apologies. If you wish to implement this, go right ahead.

Thanks.

danielb’s picture

Just a note to myself to remember to check/update the api file for changes I've made as a result of this thread.

edit: Done that now.

roball’s picture

Title: XML Export Format » XML and CSV Export Format
Version: 6.x-2.21 » 6.x-2.x-dev

So when do you think Node export will offer XML or CSV export format? That would be such an improvement! Thanks.

danielb’s picture

Title: XML and CSV Export Format » XML Export Format

Just focusing on XML at the moment. I've started looking into implementations. I think the best way would be to keep it generic but include as much knowledge as possible.

<node type="object" class="stdClass">
  <nid type="int">34</nid>
  <title type="string">My Node</title>
   etc...
</node>
danielb’s picture

Might be able to leverage this module http://drupal.org/project/xml_parser

danielb’s picture

Turns out the type reported by gettype() is very unreliable, as most numeric values are treated as strings, so I'm ditching that as it would be misleading more than anything.
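
A small illustration of the problem, assuming node values arrive as strings (as they commonly do when loaded from a database). The $row array is made-up sample data.

```php
<?php
// Sample values shaped like a database row: numbers stored as strings.
$row = array('nid' => '34', 'status' => '1', 'title' => 'My Node');

foreach ($row as $key => $value) {
  // gettype() reports how the value is stored, not what it represents.
  printf("%s: gettype=%s, is_numeric=%s\n", $key, gettype($value), var_export(is_numeric($value), TRUE));
}
```

Here gettype() says "string" for nid even though is_numeric() is TRUE, so a type="..." attribute derived from gettype() would label node IDs as strings.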

danielb’s picture

Status: Active » Fixed

I have committed an XML implementation, use with caution for now I guess.

roball’s picture

Great news - thanks for adding XML support. Will try it asap.

danielb’s picture

FYI have also added a CSV format, it should appear in the next snapshot

roball’s picture

Version: 6.x-2.x-dev » 6.x-3.x-dev
Component: Node Export » Node export
Status: Fixed » Needs work

Thanks a lot. I have installed the latest 6.x-3.x-dev (2010-Dec-07) of the Node export package and enabled the Node export and Node export CSV modules. I have CSV exported one simple page containing a newline within the body's HTML code.

The resulting CSV file contains the newline as is in both the body and teaser fields:

"'<p>Paragraph 1</p>
<p>Paragraph 2</p>'"

In addition, the last field - export_display - also contains several newlines:

'$display = new ;
$display->layout = \'\';
$display->layout_settings = \'\';
$display->panel_settings = \'\';
$display->cache = \'\';
$display->title = \'\';
$display->content = array();
$display->panels = array();
$display->hide_title = PANELS_TITLE_FIXED;
$display->title_pane = \'0\';
'

Of course a CSV file should only contain one single line per record (without any newlines).

danielb’s picture

Status: Needs work » Fixed

roball wrote: "Of course a CSV file should only contain one single line per record (without any newlines)."

New lines (\n) are perfectly fine in CSV data.

Additionally the CSV export rows are delimited by CRLF (\r\n) and any data fields containing CRLF are enclosed in double quotes as per RFC 4180.

http://tools.ietf.org/html/rfc4180

It will work in Excel and it should import back correctly as well.
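
For reference, RFC 4180-style quoting can be sketched as below. This is not the module's actual code: the function names are hypothetical, and the sketch deliberately quotes on a bare \n or \r as well as CRLF, which is more conservative than the behaviour described above.

```php
<?php
/**
 * Quote a single CSV field in the style of RFC 4180 (hypothetical helper).
 */
function example_csv_field($value) {
  $value = (string) $value;
  // Quote when the field contains a comma, a double quote, or any line break.
  if (strpbrk($value, ",\"\r\n") !== FALSE) {
    // Embedded double quotes are escaped by doubling them.
    return '"' . str_replace('"', '""', $value) . '"';
  }
  return $value;
}

/**
 * Build one CSV record, terminated by CRLF (hypothetical helper).
 */
function example_csv_row(array $fields) {
  return implode(',', array_map('example_csv_field', $fields)) . "\r\n";
}
```

A parser that follows the RFC treats the CRLF outside quotes as the record delimiter and keeps line breaks inside quoted fields as data.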

roball’s picture

OK, but look at the value of the 'export_display' field. It contains several newlines, but the whole value is *not* enclosed within double quotes. I think it would be better to open a new issue for this, though.

danielb’s picture

Your example doesn't contain CRLF line breaks, so it doesn't need double quotes.

roball’s picture

Note that \r\n line delimiters may not be recognized by all OSes (for example, MacOS < 10), so I would not count on \r\n being distinguished from a single \n on all platforms.

danielb’s picture

In fact Mac OS used to use \r. So \r\n will at least create a break on Windows, old Macs, and Unix; I think that is why \r\n was recommended by the RFC. It doesn't matter anyway: when interpreted by a CSV parser it should be treated as a record delimiter, not strictly a 'line break'.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.