xml2node module documentation

This documentation gives an overview of the XML structure that xml2node needs to import content, along with some use cases to inspire you.

XML Format

The main task of the module is to import content from XML files into Drupal nodes of any content type. Each content type can have many different field types (basic fields, CCK fields, etc.). Instead of converting XML elements into the required structure for each field type during the import (which would be a never-ending task), the XML file is formatted very similarly to the node object (e.g. as returned by node_load()).

Basic structure

The basic xml structure looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<items>
    <!-- ... contents ... -->
</items>

Representation of a content type

If you want to import content into a content type named "article", you would create the following element:

<article type="contenttype">
    <!-- ... field definitions ... -->
</article>

You can add as many content-type definitions as you want next to each other.

Adding basic fields

In a Drupal node object, a basic content-type field (e.g. title) looks like this:

[title] => A sample Title 1

In the XML file these fields are defined very similarly:

<title type="field">A sample Title 1</title>

Adding cck fields

cck text field

In a drupal node-object cck text fields look like this:

[field_lead] => Array (
	 [0] => Array (
	       [value] => This is my lead text.
	       [format] => 1
	  )
 )

In the XML file these fields are defined very similarly:

<field_lead type="field">
	<value>This is my lead text.</value>
	<format>1</format>
</field_lead>

All other cck fields are defined in the same way. You simply check the structure in the node-object and add it to the xml file. Here are some more examples to make it clear:

cck filefield

structure in the node-object

[field_file] => Array ( 
	[0] => Array ( 
	    [fid] => 37 
	    [data] => Array (
		[title] => …
	    )
	    [uid] => 1 
	    [filename] => file.pdf
	    [filepath] => …/file.pdf
	    [filemime] => …
	    [filesize] => 55597
	    [status] => 1 
	    [timestamp] => 1282224197
	)
 )

definition in the xml file

<field_file type="field">
    <filepath>http://…/file.pdf</filepath>
    <filename>file.pdf</filename>
    <data>
        <title>…</title>
        <description>….</description>
        <alt>…</alt>
    </data>
    <delete_origin>true</delete_origin>
</field_file>

In the XML file, filepath is the path to the file's original location. The file is downloaded and added to Drupal during the import process.

The "delete_origin" flag is optional. You can use it if your path is a local path (not a URL); xml2node will then delete the original file after it has been imported into Drupal.
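
If you generate the import file with a script, the rule "copy the node-object structure into the XML" can be sketched roughly as follows. This is a minimal Python sketch using the standard xml.etree.ElementTree module; the helper names build_field and _fill are hypothetical and not part of xml2node.

```python
import xml.etree.ElementTree as ET


def _fill(parent, value):
    """Copy a value into parent: dicts become child elements, anything else becomes text."""
    if isinstance(value, dict):
        for key, child_value in value.items():
            _fill(ET.SubElement(parent, key), child_value)
    else:
        parent.text = str(value)


def build_field(name, value):
    """Build a <name type="field"> element (hypothetical helper, not part of xml2node)."""
    field = ET.Element(name, {"type": "field"})
    _fill(field, value)
    return field


# Mirrors the field_lead example: the keys of the node-object array become child elements.
lead = build_field("field_lead", {"value": "This is my lead text.", "format": 1})
lead_xml = ET.tostring(lead, encoding="unicode")
print(lead_xml)
```

Nested arrays, such as the data array of a filefield, are handled by the same recursion.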

Node references

There are three ways to process cck node reference fields during the import:

Create and reference

Let's say you want to create 2 nodes. Node#1 is the parent node and Node#2 is the "child" that will be referenced in the parent node. Also you want to create those 2 nodes during one import. You would structure the xml file like this:

<content_type_node_1 type="contenttype">
    <!-- ... here are all your fields for Node#1 ... -->
    <!-- here comes the node you want to reference -->
    <!-- ref is the node reference field of the parent -->
    <referenced_content_type_2 type="contenttype" ref="field_reference">
        <!-- ... here are your fields for Node#2, in the same structure as always ... -->
        <!-- you can also reference other nodes here; the depth doesn't matter -->
    </referenced_content_type_2>
</content_type_node_1>

Normal reference (if you already know the nid)

If you already know the nid of the node you want to reference, you can set it directly:

<content_type_node_1 type="contenttype">
    <!-- ... other fields ... -->
    <!-- node reference field -->
    <field_reference type="field">
        <nid>12312</nid>
    </field_reference>
</content_type_node_1>

Search and Reference (NEW in 6.x-1.x-dev)

If you don't know the nid but have a criterion for which nodes should be referenced (e.g. all nodes that have the value "ASDF" in the CCK text field "field_identifier"), you can do it like this:

// ...other fields here...


ASDF

Note: At the moment, the value of the "ref_key" attribute can only be a CCK field.

Usually, when I migrate or import content from another system, I create a CCK text field in which I store the content's ID from the old system. I can then reference related articles during the import by letting xml2node search for a specific ID and reference that node.

Adding Taxonomy fields

If you want to import taxonomy fields, you can add the following code structure to your xml file:

<content_type type="contenttype">
    <taxonomy type="taxonomy">
        <term take_parents="1" vid="5">
            term_name
        </term>
    </taxonomy>
</content_type>

The "vid" attribute defines the vocabulary ID. The "take_parents" attribute can be 1 (true) if you want to add the term and all of its parent terms (e.g. in a hierarchical taxonomy) to the node, or 0 (false) if you only want the defined term to be added.

If the term already exists, it is simply referenced by the node. If the term doesn't exist in the vocabulary, it is created during the import.
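
When the term list comes from another system, the taxonomy element can be generated rather than written by hand. The following is a small Python sketch under that assumption; build_taxonomy and its tuple format are hypothetical and only produce the XML structure shown above.

```python
import xml.etree.ElementTree as ET


def build_taxonomy(terms):
    """Build a <taxonomy type="taxonomy"> element.

    terms is an iterable of (term_name, vid, take_parents) tuples
    (a hypothetical input format chosen for this sketch).
    """
    taxonomy = ET.Element("taxonomy", {"type": "taxonomy"})
    for name, vid, take_parents in terms:
        term = ET.SubElement(
            taxonomy,
            "term",
            {"take_parents": "1" if take_parents else "0", "vid": str(vid)},
        )
        term.text = name
    return taxonomy


taxonomy = build_taxonomy([("term_name", 5, True), ("other_term", 5, False)])
print(ET.tostring(taxonomy, encoding="unicode"))
```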

Creating redirects

If you migrate content from an old system, you may want to make sure that the URLs of the old system are redirected to the URL of the newly created Drupal node.

This can be done with the "path_redirect" module. With xml2node, however, you can create a path_redirect redirect directly during the import (if the path_redirect module is enabled). To do this, simply add the "redirect" attribute to the content-type definition in the XML:

<?xml version="1.0" encoding="UTF-8"?>
<items>
    <article type="contenttype" redirect="http://www.oldsite.com/old-url">
        <!-- ... your fields ... -->
    </article>
</items>
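
For a migration, the redirect attribute would typically be filled from the old system's URL for each record. A rough Python sketch of that idea follows; the build_import function and the record keys "title" and "old_url" are assumptions made for illustration, not something xml2node defines.

```python
import xml.etree.ElementTree as ET


def build_import(content_type, records):
    """Build an <items> document with one content-type element per record.

    Each record is a dict with a "title" and an optional "old_url"
    (hypothetical keys chosen for this sketch).
    """
    items = ET.Element("items")
    for record in records:
        attrs = {"type": "contenttype"}
        if record.get("old_url"):
            # Only add the redirect attribute when an old URL is known.
            attrs["redirect"] = record["old_url"]
        node = ET.SubElement(items, content_type, attrs)
        title = ET.SubElement(node, "title", {"type": "field"})
        title.text = record["title"]
    return items


items = build_import("article", [
    {"title": "A sample Title 1", "old_url": "http://www.oldsite.com/old-url"},
    {"title": "A sample Title 2"},
])
print(ET.tostring(items, encoding="unicode"))
```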

Comments (only 6.x-1.x-dev)

If you want to import comments along with the node, you can add comment elements inside the content-type element:

<?xml version="1.0" encoding="UTF-8"?>
<article type="contenttype">
    <!-- ... field definitions and stuff ... -->
    <!-- comment element definition -->
    <comment type="comment">
        <name>protyze</name>
        <subject>My comment's title</subject>
        <comment>The actual comment text</comment>
        <format>1</format>
        <mail>email@email.com</mail>
        <homepage>www.drupal.org</homepage>
        <!-- optionally you can define a timestamp for the comment date -->
        <!-- if timestamp is not defined, it will take now() -->
        <timestamp>123123123123</timestamp>

        <!-- you may also define child comments that are shown in threads -->
        <comment type="comment">
            <name>name</name>
            <subject>My Comment Child</subject>
            <comment>the text</comment>
            <format>1</format>
            <mail>email@email.com</mail>
            <homepage>www.google.ch</homepage>
            <!-- there can be other child comments here... -->
        </comment>
    </comment>
</article>
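
Because child comments simply nest inside their parent, a threaded comment tree maps naturally onto a recursive function. The Python sketch below assumes the comments are available as nested dicts with a "replies" list; that input format and the build_comment helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child elements copied from the dict, in the order used in the example above.
COMMENT_KEYS = ("name", "subject", "comment", "format", "mail", "homepage", "timestamp")


def build_comment(comment):
    """Build a <comment type="comment"> element, nesting replies recursively."""
    element = ET.Element("comment", {"type": "comment"})
    for key in COMMENT_KEYS:
        if key in comment:
            ET.SubElement(element, key).text = str(comment[key])
    for reply in comment.get("replies", []):
        element.append(build_comment(reply))
    return element


thread = build_comment({
    "name": "protyze",
    "subject": "My comment's title",
    "comment": "The actual comment text",
    "format": 1,
    "replies": [
        {"name": "name", "subject": "My Comment Child", "comment": "the text", "format": 1},
    ],
})
print(ET.tostring(thread, encoding="unicode"))
```

Note that the comment text and a child comment share the tag name comment; only the child comment carries type="comment".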

For every content-type field that is not defined in the XML file, xml2node fills in the default values defined in Drupal during the import.

Sample XML

Below is an example of an XML file that imports two nodes of the content type "article" with the fields title, field_lead, field_record_id and field_file.

<?xml version="1.0" encoding="UTF-8"?>
<items>
	<article type="contenttype">
		<title type="field">A sample Title 1</title>
		<field_lead type="field">
			<value>This is my lead text.</value>
			<format>1</format>
		</field_lead>
		<field_record_id type="field">
			<value>1</value>
		</field_record_id>
		<field_file type="field">
			<filepath>http://drupal.org/sites/all/themes/bluebeach/logos/drupal.org.png</filepath>
			<filename>Drupal_Logo.png</filename>
			<data>
				<title>the files title</title>
				<description>the files description</description>
				<alt>the files alt tag</alt>
			</data>
		</field_file>
	</article>
	<article type="contenttype">
		<title type="field">A sample Title 2</title>
		<field_lead type="field">
			<value>This is my lead text.</value>
			<format>1</format>
		</field_lead>
		<field_record_id type="field">
			<value>2</value>
		</field_record_id>
		<field_file type="field">
			<filepath>http://drupal.org/sites/all/themes/bluebeach/logos/drupal.org.png</filepath>
			<filename>Drupal_Logo.png</filename>
			<data>
				<title>the files title</title>
				<description>the files description</description>
				<alt>the files alt tag</alt>
			</data>
		</field_file>
		<taxonomy type="taxonomy">
			<term vid="1">Begriff</term>
			<term vid="1">Begriff 2</term>
		</taxonomy>
	</article>
</items>

Update or Delete imported nodes

Nodes that have already been imported can be updated or deleted during a later import. To do this, simply add the attributes "action", "key" and "value" to the content-type definition in the XML.

action: u (for update), d (for delete)
key: The field name of a CCK field (e.g. field_record_id)
value: The value of the key field

The key and value attributes identify the node to be updated or deleted, so the key should be a field whose values are unique and match a single node.

Here is an xml-example to update a node:

<?xml version="1.0" encoding="UTF-8"?>
<items>
	<article type="contenttype" action="u" key="field_record_id" value="1">
		<title type="field">A sample Title 1</title>
		<field_lead type="field">
			<value>This is my new lead text, which is longer than the old one.</value>
			<format>1</format>
		</field_lead>
		<field_record_id type="field">
			<value>1</value>
		</field_record_id>
		<field_file type="field">
			<filepath>http://drupal.org/sites/all/themes/bluebeach/logos/drupal.org.png</filepath>
			<filename>Drupal_Logo.png</filename>
			<data>
				<title>the files title</title>
				<description>the files description</description>
				<alt>the files alt tag</alt>
			</data>
		</field_file>
	</article>
</items>
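
If the source XML does not say whether a record is new, one option is to add the update attributes in a preprocessing step before the import. The Python sketch below assumes you already have the set of key values that exist in Drupal (how you obtain it is up to you); mark_updates and existing_values are hypothetical names.

```python
import xml.etree.ElementTree as ET


def mark_updates(items, key_field, existing_values):
    """Set action="u", key and value on elements whose key field value already exists.

    Elements whose value is not in existing_values are left untouched.
    Returns the number of elements marked.
    """
    marked = 0
    for node in items:
        if node.get("type") != "contenttype":
            continue
        value = node.findtext(f"{key_field}/value")
        if value is not None and value.strip() in existing_values:
            node.set("action", "u")
            node.set("key", key_field)
            node.set("value", value.strip())
            marked += 1
    return marked


SAMPLE = """<items>
  <article type="contenttype">
    <title type="field">A sample Title 1</title>
    <field_record_id type="field"><value>1</value></field_record_id>
  </article>
  <article type="contenttype">
    <title type="field">A sample Title 3</title>
    <field_record_id type="field"><value>3</value></field_record_id>
  </article>
</items>"""

items = ET.fromstring(SAMPLE)
count = mark_updates(items, "field_record_id", {"1", "2"})
print(count, ET.tostring(items, encoding="unicode"))
```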

Comments

bailz777 wrote:

I have used xml2Node to import tons of new data/nodes, but now some of these nodes need to be updated through an api.

I followed the instructions above, but updating is not working.

I am not using the CCK module to create my content types or additional fields on my forms. Could this be the problem, and is there a workaround?

I did a test and created a CCK form, and this I was able to update...

bailz777 wrote:

I had a look through the module for the section where the node ID is looked up and came across the SQL query. It relies solely on content having been created through the CCK module, where table names are prefixed with their various types.

This can be found in xml2node/includes/xml2node.nodecreator.inc
The function is public static function getNode($type, $key = NULL, $value = NULL)

I realized that I needed to do my own node population.

I needed to do two things.
I needed to wrap the $field variable in an if statement:

if (isset($contentTypeInfo['fields'][$key])) {
  $field = $contentTypeInfo['fields'][$key];
}

Then I added an elseif branch to the existing statement if (isset($field) && is_array($field)):

if (isset($field) && is_array($field)) {
        if ($field['db_storage'] == 0) {
          $sql = "SELECT node.nid AS nid FROM {node} LEFT JOIN {content_%s} AS field_table ON node.vid = field_table.vid WHERE (node.type in ('%s')) AND ((field_table.%s_value) = ('%s'))";
          $result = db_query($sql, $key, $type, $key, $value);
          $node = db_fetch_object($result);
        }
        elseif ($field['db_storage'] == 1) {
          $sql = "SELECT nid FROM {content_type_%s} WHERE %s='%s'";
          $result = db_query($sql, $type, $key . '_value', $value);
          $node = db_fetch_object($result);
        }

        if (isset($node) && is_object($node) && isset($node->nid)) {
          // clear node_load cache before loading the node
          $node = node_load($node->nid, NULL, TRUE);
          return $node;
        }
        else {
          return null;
        }
      }
      elseif ($contentTypeInfo['type'] == 'my_content_type') {
        // My own addition to the module: provides $node data for my content type my_content_type.
        // The key/value pair should always be key="mid" value="<value of the mid>";
        // mid is a unique value for each of my nodes of this content type.
        $sql = "SELECT nid FROM {my_table} WHERE %s='%s'";
        $result = db_query($sql, $key, $value);
        $node = db_fetch_object($result);
        if (isset($node) && is_object($node) && isset($node->nid)) {
          // clear node_load cache before loading the node
          $node = node_load($node->nid, NULL, TRUE);
          return $node;
        }
        else {
          return null;
        }
      }

This all came about because I need to do certain calculations and modifications on my field values at certain times. Because of this, I could not create my fields with the CCK module, so I defined my own content type and my own tables.

apienczy wrote:

This looks like an awesome and very useful module. I have one challenge in using it. The XML I intend to import into Drupal comes from an external provider, and its structure is already predefined. I can attempt to match node definitions to the structure itself; however, the file will not contain an action code (create or update). In fact, it will always be create if the node doesn't exist and update if it exists. Is the only way to address this to write code that transforms the XML to contain the action codes? It would probably have to query nodes first to determine whether the node exists and then create the proper action entry. Would anyone have any other suggestions?