Programmatically creating nodes with drupal_execute
This is an informative post. I've already solved my problem, and thought I would document it here so that future sufferers may have some hope of a solution via Google or similar.
I work for an academic department at a large university. Every semester, the office staff gather directory information for all the employees of the department: faculty, non-faculty teachers such as graduate students, staff, and so on. The head of the department wanted to publish that directory info to the Drupal-powered departmental web site, instead of distributing paper copies as they had been.
The office staff in charge of collecting this data have been storing it in a FileMaker Pro database for the last few years, which was entirely adequate to their needs. I was tasked with producing a Drupal module that would allow the office staff to export the directory data from FileMaker as an XML file, and then import it all into Drupal in one fell swoop.
Producing a basic module defining a new content type for directory entries was fairly straightforward. The challenge lay in automatically importing the data from the XML file, creating new nodes for any new records, and updating any existing ones. For a long time, I couldn't figure out how to do this programmatically. Finally, I stumbled across the following code snippet in the API documentation for drupal_execute():
<?php
// Create a new node
$form_state = array();
module_load_include('inc', 'node', 'node.pages');
$node = array('type' => 'story');
$form_state['values']['title'] = 'My node';
$form_state['values']['body'] = 'This is the body text!';
$form_state['values']['name'] = 'robo-user';
$form_state['values']['op'] = t('Save');
drupal_execute('story_node_form', $form_state, (object)$node);
?>
Once I had assembled an interface to upload the XML file, parsed the file using SimpleXML and saved each record's values into an array, I used a modified version of the above snippet to create the nodes. The FileMaker database has a field called "eid", which contains a unique ID for each employee. I used that to determine whether the record needed to be updated or created from scratch:
<?php
foreach($records as $record){
  $EID = $record['eid'];
  // Run a check to see if there's an existing node for this employee.
  $query = 'SELECT nid FROM {drw_directory} WHERE eid = "%s"';
  $result = db_result(db_query($query, $EID));
  if($result === false){
    // No existing node. Prep a new node object and execute the form to create one.
    global $user;
    $node = array(
      'uid' => (string) $user->uid,
      'name' => (string) $user->name,
      'type' => 'drw_directory',
      'language' => '',
      'body' => NULL,
      'title' => NULL,
      'format' => NULL,
      'status' => true,
      'promote' => false,
      'sticky' => false,
      'created' => time(),
      'revision' => false,
      'comment' => '0',
    );
    $node = (object) $node;
  } else {
    // Node exists. Load it.
    $nid = (int) $result;
    $node = node_load($nid);
  }
  // Import node functions.
  module_load_include('inc', 'node', 'node.pages');
  // Set the form_state values to those of the record, and set the title as the employee's full name.
  $form_state['values'] = $record;
  $form_state['values']['title'] = $record['FirstName'].' '.$record['LastName'];
  // And set some things for the form.
  $form_state['values']['name'] = $node->name;
  $form_state['values']['op'] = t('Save');
  // Finally, run the form in the background.
  drupal_execute('drw_directory_node_form', $form_state, $node);
}
?>
Having finished the function, I tried it out ... and it worked! Hooray, bounce bounce.
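For readers wondering how $records might be built: the parsing step isn't shown in this post, so the following is only a minimal sketch. It assumes a hypothetical export layout (a <directory> root with one <employee> element per record) and invented sample values; the real FileMaker XML will look different. The function name parse_directory_records() is made up for illustration. Only the field names eid, FirstName and LastName come from the loop above.

```php
<?php
// Hypothetical export layout and invented sample data, for illustration only.
$xml = <<<XML
<directory>
  <employee>
    <eid>1001</eid>
    <FirstName>Ann</FirstName>
    <LastName>Example</LastName>
  </employee>
  <employee>
    <eid>1100</eid>
    <FirstName>Bob</FirstName>
    <LastName>Sample</LastName>
  </employee>
</directory>
XML;

/**
 * Turn each <employee> element into a flat array of strings.
 * (Hypothetical helper, not part of Drupal or of the original module.)
 */
function parse_directory_records($xml_string) {
  $records = array();
  $doc = simplexml_load_string($xml_string);
  if ($doc === FALSE) {
    // Malformed XML: nothing to import.
    return $records;
  }
  foreach ($doc->employee as $employee) {
    $record = array();
    foreach ($employee->children() as $field) {
      // Cast to string so no SimpleXMLElement objects leak into $form_state.
      $record[$field->getName()] = trim((string) $field);
    }
    $records[] = $record;
  }
  return $records;
}

$records = parse_directory_records($xml);
```

Casting every value to a plain string matters here, because the whole record is later copied into $form_state['values'].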
For thoroughness, I went through three iterations of uploading the XML file (which had 100 records) in order to observe the results. It worked fine for initial creation of the nodes, but it behaved very oddly when updating existing nodes. When I uploaded the same file, it would say that it had updated all the nodes; then I noticed that the very first person in the directory, with the last name Alyea, was gone. Where that node had been, I had instead a duplicate copy of the data for the very LAST person in the XML file, by the last name Wolfe. What? And then when I uploaded the same file a third time, it updated the Wolfe record and then proceeded to create new, duplicate nodes for every entry in the XML file. I then had: three distinct nodes for the Wolfe entry, and two nodes for everyone else, except for Alyea, who only had one node because the initial Alyea node had gotten turned into a Wolfe node.
I thought maybe I was parsing the XML wrong, but that turned out not to be the case. I tried all sorts of stuff. The crucial clue came from examining the SQL generated by my implementation of hook_update(). As you know, Drupal uses a field called VID (version ID) to keep track of distinct versions of individual nodes. The SQL for the update was just fine, but at some point before my hook_update() function got called, the VID was being reset to the VID of the Alyea record, no matter which node was the correct one. Every time I uploaded the file, the existing Alyea node was getting updated a hundred times, and winding up containing the data from the last record in the XML file. But I didn't know where the VID was being reset, or why.
Eventually I resorted to stepping through the execution flow starting from my call to drupal_execute(), using drupal_set_message() to tell me what the VID was at strategic points in the code. Yes, this involved editing the core code, which is ordinarily a no-no. But I wasn't changing functionality, just making it spit out status messages.
Anyway, drupal_execute() called drupal_retrieve_form(), which called node_form(). It was in node_form() that I finally found the culprit. Apparently node_form() expects that the $form_state variable should contain a duplicate copy of the $node object, as an array. The first thing it does is overwrite the $node variable that's passed to it as an argument with the one that comes inside the $form_state array. I hadn't set such a thing on my $form_state variable, and so it was resetting it the way it thought the $node really OUGHT to be.
So the very simple solution was to add the following single line of code immediately before the call to drupal_execute():
<?php
$form_state['node'] = (array) $node;
?>
That fixed everything. It now updates smoothly, creates new nodes when it needs to, and doesn't create nodes when it doesn't have to.
And figuring out that I needed to put that line there only took seven hours. Sometimes, Drupal can be a serious pain in the ass.
I still don't know why exactly it kept choosing the VID of the first record in the XML file as the one to update. Nor do I know why a third upload of the file would cause it to create duplicate nodes. But I don't much care at this point. It works, and I've got to move on to other stuff.
It might be a good idea to update the documentation for drupal_execute() so that that code snippet sets a copy of the $node object in its $form_state, to head off anyone else running into this problem.
I hope someone else finds this account useful.

Whoops
Crud, I mis-wrote the URL for the API link to node_form(). The correct URL is:
http://api.drupal.org/api/function/node_form/6
Thanks for sharing!
Sure.
You're welcome.
Thank you very much for sharing your work with community.
:)
Beautifulmind
Bravo!
Great job sussing out that one! Pretty darn tricky...
Yep, I was doing the exact same type of import script but found myself ditching the need for updating the nodes after a few iterations.
I myself am quite aghast at the idea that the Drupal API pages contain everything BUT detailed and useful examples. Most programmers I know learn by example, not strictly by reverse engineering the function.
The Drupal API could learn a bit from the incredible usefulness of php.net API page comments and examples.....
...of course php.net could probably learn a few things from drupal ;)
--TechNinja
Oh, and one more thing: you probably shouldn't be importing the node functions (module_load_include('inc', 'node', 'node.pages');) for every iteration of the loop through the record rows. Doing it once above the foreach should have you covered ;)
Thanks for the bug squash
Ooo, that's exactly right! Good eye. Thanks! I'll go tweak that.
Thanks!
This saved me some major headaches, and is the best documentation for creating multiple nodes I have come across.
I had missed out the
<?php
$form_state['values']['op'] = t('Save');
?>
Thanks again for this.
Some issues I have found
It's been a while since I did this exact same thing. If I remember correctly, custom modules can have issues with their validate and submit handlers when the form is submitted programmatically.
Another issue: if the custom module's submit handler saved the node itself, it would run into an endless loop or submit the node twice.
Some points to watch out for.
Sorry, but I did it in a much more straightforward way:
/*
 * Generate the basic node information to save an empty node (copied from the node module)
 */
global $user;
$node = array(
  'uid' => $user->uid,
  'name' => (isset($user->name) ? $user->name : ''),
  'type' => 'your_type',
  'language' => '',
);
// load the node.pages.inc file if required
if (!function_exists('node_object_prepare')) {
  include_once(drupal_get_path('module', 'node') . '/node.pages.inc');
}
$node = (object) $node;
// Both functions already take $node by reference; a call-time "&" is a fatal error on PHP 5.4+.
node_object_prepare($node);
// Give the node a title
$node->title = 'title you want';
// fill in further values required and then
node_save($node);
Regards
Werner
That is certainly one way to do it.
There are two basic ways to create nodes programmatically.
1. the node functions
2. the node form
The advantage of the node form is that it gives you error checking if you enter a value of the wrong type, e.g. characters where numbers are expected. This is very valuable for custom modules that have, for example, a drop-down list of choices: if the submitted value is not one of the choices in the drop-down, the Forms API will throw an error before you even get to the validation functions.
With node_save() you bypass this functionality, so the code for that specific type has to sanitize and validate the values manually.
One more thing: using the form, the code tends to be cleaner, as you can see in the example given above.
One drawback is that it takes longer to process.
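To make the "validate manually" part concrete, here is a minimal sketch of the kind of check you would have to write yourself before calling node_save(). Everything in it is hypothetical: the function name example_validate_record(), the field names role and phone_ext, and the list of allowed values are invented for illustration and are not part of Drupal.

```php
<?php
/**
 * Hypothetical pre-save check; not part of Drupal.
 * Returns an array of error messages (empty when the record is acceptable).
 */
function example_validate_record(array $record, array $allowed_values) {
  $errors = array();
  // Mimic a drop-down: the value must be one of the listed choices.
  foreach ($allowed_values as $field => $options) {
    if (!isset($record[$field]) || !in_array($record[$field], $options, TRUE)) {
      $errors[] = "Illegal or missing value for $field";
    }
  }
  // Mimic a numeric field: digits only.
  if (isset($record['phone_ext']) && !preg_match('/^\d+$/', (string) $record['phone_ext'])) {
    $errors[] = 'phone_ext must contain digits only';
  }
  return $errors;
}

$allowed = array('role' => array('faculty', 'staff', 'graduate'));
$ok = example_validate_record(array('role' => 'staff', 'phone_ext' => '4821'), $allowed);
$bad = example_validate_record(array('role' => 'wizard', 'phone_ext' => '48a1'), $allowed);
```

Here $ok is an empty array, while $bad holds one message per failed check. With drupal_execute() the form's own validation covers this ground, which is the trade-off described above.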
I find it easier to use drupal_execute. Mostly because I like to use some fields that get pre-processed before they are entered.
For example, using drupal_execute with free tagging fields is much easier, you don't need to fuss with any business trying to figure out what term id the tag is or anything like that because the module is designed for you to just submit data.
Either way, I can't wait for Drupal 7 and the promise of a unified way to handle fields, so CCK fields aren't structured one way and taxonomy another, etc. My biggest headache is trying to figure out the exact structure of that $form_state array, with many modules adding fields to a node in many different ways.
Question regarding $form_state['node']
@wdmartin
Thanks for the example. I found this post while looking to answer the "updating with drupal_execute" question on a similar thread about programmatically creating nodes. I'm curious about the $form_state['node'] value and wanted to run my findings by you and see if we can come up with a definitive example for programmatically updating a node. Perhaps we can also look into updating the API documentation to show an example.
You pointed out the need for
<?php
$form_state['node'] = (array) $node;
?>
I did not seem to need to set this array value to update a single node. So I looked at the node_form API and noticed, as you did, that the function checks for the presence of $form_state['node']. If $form_state['node'] doesn't exist, it simply uses the $node parameter. If $form_state['node'] does exist, it merges it with the $node parameter cast to an array.
In your original update example, a copy of the $node variable is cast into an array called $form_state['node']. It seems that by the time we reach the node_form() function, the $node object and the $form_state['node'] array could be duplicates, aside from one being an object and the other an array. Since $form_state['node'] is set, it looks like the array merge
$node = $form_state['node'] + (array)$node;
is just merging two identical values, and we end up with yet another identical copy of the node's values as an array. Then, a few lines later, it casts back to an object.
I didn't dig in any further to understand the use case for merging these arrays, but it doesn't appear necessary for doing an update with drupal_execute(). We seem to be doing a couple of extra, expensive object-to-array and array-to-object casts.
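One detail worth keeping in mind about that merge: PHP's array + operator keeps the left-hand value whenever both arrays contain the same key, so whatever is already in $form_state['node'] overrides the $node argument. A minimal, Drupal-free sketch with made-up nid and vid values (the "stale copy" scenario is an assumption for illustration, not something confirmed in this thread):

```php
<?php
// Made-up values: a freshly loaded node, and an older copy sitting in $form_state.
$node = (object) array('nid' => 31, 'vid' => 45, 'title' => 'Current node');
$form_state = array('node' => array('nid' => 7, 'vid' => 9));

// Same expression as quoted from node_form(): for duplicate keys, the left operand wins.
$merged = $form_state['node'] + (array) $node;

// 'nid' and 'vid' now come from $form_state['node'];
// 'title' survives because only $node had that key.
```

When the two sides really are identical, the merge is harmless, as described above. It only makes a difference when $form_state['node'] holds values that differ from the node passed in.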
Also, I admit I didn't test looping through multiple updates at once, and my example content type do_dad uses only a small mix of CCK field types. Perhaps it is in one of these use cases that $form_state['node'] becomes necessary with drupal_execute().
In short, thank you again for pointing out a working example. I hope you will test my example below, which passes only the node_load() value as the third parameter to drupal_execute(). Here's an example that worked to load the existing node/31, update its title and body, and set it to unpublished.
<?php
$form_state = array();
module_load_include('inc', 'node', 'node.pages');
$node = node_load(31); // load node # 31
$form_state['values']['status'] = 0; // unpublish the node
$form_state['values']['title'] = 'My Newly Named Do Dad';
$form_state['values']['body'] = 'Blah Blah Body of an Edited Do Dad Goes Here';
$form_state['values']['name'] = 'admin';
$form_state['values']['op'] = t('Save'); // this seems to be a required value
$errs = drupal_execute('do_dad_node_form', $form_state, $node);
?>
OrangeCoat.com
Thanks
Just to say big thanks to author. It helped me a lot.
I did it with Taxonomy, Locations from CSV
// Author: Jingsheng Wang
// Email: skyred (^) live.com
// Date: Nov.18, 2008
<?php
define ( 'C_PPCSV_HEADER_RAW', 0 );
define ( 'C_PPCSV_HEADER_NICE', 1 );
class PaperPear_CSVParser {
  private $m_saHeader = array ();
  private $m_sFileName = '';
  private $m_fp = false;
  private $m_naHeaderMap = array ();
  private $m_saValues = array ();
  function __construct($sFileName) {
    //quick and dirty opening and processing.. you may wish to clean this up
    if ($this->m_fp = fopen ( $sFileName, 'r' )) {
      $this->processHeader ();
    }
  }
  function __call($sMethodName, $saArgs) {
    //check to see if this is a set() or get() request, and extract the name
    if (preg_match ( "/[sg]et(.*)/", $sMethodName, $saFound )) {
      //convert the name portion of the [gs]et to uppercase for header checking
      $sName = strtoupper ( $saFound [1] );
      //see if the entry exists in our named header-> index mapping
      if (array_key_exists ( $sName, $this->m_naHeaderMap )) {
        //it does.. so consult the header map for which index this header controls
        $nIndex = $this->m_naHeaderMap [$sName];
        //square brackets: the {0} string-offset syntax is gone in PHP 8
        if ($sMethodName [0] == 'g') {
          //return the value stored in the index associated with this name
          return $this->m_saValues [$nIndex];
        } else {
          //set the value
          $this->m_saValues [$nIndex] = $saArgs [0];
          return true;
        }
      }
    }
    //nothing we control so bail out with a false
    return false;
  }
  //get a nicely formatted header name. This will take product_id and make
  //it PRODUCTID in the header map. So now you won't need to worry about whether you need
  //to do a getProductID, or getproductid, or getProductId.. all will work.
  public static function GetNiceHeaderName($sName) {
    return strtoupper ( preg_replace ( '/[^A-Za-z0-9]/', '', $sName ) );
  }
  //process the header entry so we can map our named header fields to a numerical index, which
  //we'll use when we use fgetcsv().
  private function processHeader() {
    $sLine = fgets ( $this->m_fp );
    //you'll want to make this configurable
    //explode() replaces split(), which was removed in PHP 7
    $saFields = explode ( ",", $sLine );
    $nIndex = 0;
    foreach ( $saFields as $sField ) {
      //get the nice name to use for "get" and "set".
      $sField = trim ( $sField );
      $sNiceName = PaperPear_CSVParser::GetNiceHeaderName ( $sField );
      //track correlation of raw -> nice name so we don't have to do on-the-fly nice name checks
      $this->m_saHeader [$nIndex] = array (C_PPCSV_HEADER_RAW => $sField, C_PPCSV_HEADER_NICE => $sNiceName );
      $this->m_naHeaderMap [$sNiceName] = $nIndex;
      $nIndex ++;
    }
  }
  //read the next CSV entry
  public function getNext() {
    //this is a basic read, you will likely want to change this to accommodate what
    //you are using for CSV parameters (tabs, encapsulation, etc).
    if (($saValues = fgetcsv ( $this->m_fp )) !== false) {
      $this->m_saValues = $saValues;
      return true;
    }
    return false;
  }
}
$o = new PaperPear_CSVParser ( '/home/greatbre/public_html/sites/default/files/import/TJS Account List.csv' );
while ( $o->getNext () ) {
  //start from a fresh object for every row so nothing carries over from the previous node
  $node = new stdClass ();
  $node->field_account = array (array ('value' => $o->getAcct (), 'safe' => $o->getAcct () ) );
  $node->field_premise = array (array ('value' => $o->getOnOff (), 'safe' => $o->getOnOff () ) );
  $node->title = $o->getDBAname ();
  $node->created = time ();
  $node->status = 1; //published
  $node->promote = 0;
  $node->sticky = 0;
  $node->uid = 1;
  $node->format = 1;
  $node->readmore = FALSE;
  $node->body = '';
  $node->type = 'retailer'; //or whatever other content type you need
  $node->locations = array (array ('street' => $o->getAddress (), 'city' => $o->getCity (), 'province' => $o->getstate (), 'postal_code' => $o->getZipCode (), 'country' => 'us', 'phone' => $o->getPhone (), 'source' => LOCATION_LATLON_USER_SUBMITTED ) );
  node_save ( $node );
  location_save_locations ( $node->locations, array ('nid' => $node->nid, 'vid' => $node->vid ) );
}
echo 'done';
?>
Hi Jingsheng,
Do you also have the CSV file you used?
Do I just need to copy your code into a node with PHP enabled?
Thanks a lot in advance for your reply!
greetings,
Martijn
Here is a better version with Image Attach, Taxonomy, Noderefere
I am sorry that I cannot give you the CSV files I used, since they are my company's assets.
Yes, you just create a page, enable PHP, then visit the page. But keep in mind that you need to check the PHP timeout settings if you have a large amount of data.
<?php
// The two define() calls and the PaperPear_CSVParser class are identical to the code in the earlier CSV import comment above, so they are not repeated here.
$o = new PaperPear_CSVParser ( '/home/greatbre/public_html/sites/default/files/import/testing_products_updated.csv' );
while ( $o->getNext () ) {
  //start from a fresh object for every row so nothing carries over from the previous node
  $node = new stdClass ();
  $node->field_webid = array (array ('value' => $o->getwebid(), 'safe' => $o->getwebid() ) );
  $node->field_abv = array (array ('value' => $o->getabv () ) );
  $node->field_self_style = array (array ('value' => $o->getselfstyle(), 'safe' => $o->getselfstyle() ) );
  $node->field_best_with = array (array ('value' => $o->getbestwith(), 'safe' => $o->getbestwith() ) );
  $node->field_brand = array (array ('nid' => $o->getnid () ) );
  $node->field_availability = array (array ('value' => $o->getSeansonal(), 'safe' => $o->getSeansonal() ) );
  $node->field_speci_attr = array (array ('value' => $o->getNiche(), 'safe' => $o->getNiche()) ,array ('value' => $o->getNiche2(), 'safe' => $o->getNiche2()),array ('value' => $o->getNiche3(), 'safe' => $o->getNiche3()) );
  $node->field_cases = array (array ('value' => $o->getPackage1(), 'safe' => $o->getPackage1()) ,array ('value' => $o->getPackage2(), 'safe' => $o->getPackage2()),array ('value' => $o->getPackage3(), 'safe' => $o->getPackage3()) );
  $node->field_kegs = array (array ('value' => $o->getDraft1(), 'safe' => $o->getDraft1()) ,array ('value' => $o->getDraft2(), 'safe' => $o->getDraft2()),array ('value' => $o->getDraft3(), 'safe' => $o->getDraft3()) );
  $node->field_keg_conn = array (array ('value' => $o->getKeg(), 'safe' => $o->getKeg() ) );
  $node->field_upc = array (array ('barcode' => $o->getRetailPackage(), 'title' => 'Retail Package') ,array ('barcode' => $o->getCase(), 'title' =>'Case'),array ('barcode' => $o->getBottle(), 'title' =>'Bottle') );
  $node->title = $o->getProduct ();
  $node->taxonomy = array($o->gettid ());
  $node->iid = $o->getiid (); // Image Attach
  $node->created = time ();
  $node->status = 1; //published
  $node->promote = 0;
  $node->sticky = 0;
  $node->uid = 1;
  $node->format = 1;
  $node->readmore = FALSE;
  $node->body = $o->getprofile ();
  $node->type = 'product'; //or whatever other content type you need
  node_save ( $node );
}
echo 'done';
?>
Hi Jingsheng,
Thanks a lot for your quick reply.
Is this a module, please? Are the product functions from Drupal itself?
Could you please give the header and first line of the CSV?
That way I can build my own more easily, rather than working through the code by trial and error.
Thanks a lot in advance for considering this!
greetings,
Martijn
Getting the nid of the newly created node
The node's nid can be found at $form_state['nid'] after drupal_execute has successfully returned.
Took me FOREVER to find this, THANK YOU.
took me forever too!!
thank you!!!! $form_state['nid']!!!!
Here's a Version that will use CCK Defaults in D6
I kept running into the problem where no approach would prefill CCK fields with their defined defaults. This is how I ended up getting it to work. Might have worked through drupal_execute but after several hours I couldn't make that work so I went back to node_save and TADA!
$node = (object) array(
  'uid' => $uid,
  'name' => $username,
  'type' => $type,
  'title' => $title,
);
$contenttype = content_types($type);
foreach ($contenttype['fields'] as $fieldname => $field) {
  if (isset($field['widget']['default_value'])) {
    $node->$fieldname = $field['widget']['default_value'];
  }
}
node_save($node);
Thanks
Thank you,
I'm working on a very, VERY similar module to import people data from a legacy database into Drupal at a research centre. I had just managed to make it work when I found your post, but it helped reassure me that I was on the right track. I have not yet run into problems with duplicate or missing records.
Are you posting your work as a contributed module, or are you keeping it private?
If I end up with a customizable enough module, I will publish it on CVS. But it's still quite specific to my database.
Thank you again for your work and your post.
---
Antonio Jesús Sánchez Padial
Centre for Humanities and Social Sciences - Spanish National Research Council (CCHS- CSIC)
Madrid, Spain
re-using the same $form_state
I think your problem may be that you are not initializing $form_state before you use it.
$form_state is passed into drupal_execute() by reference, so any changes the rest of the system makes are left cluttering up that array, and some of them may be confusing the process.
The way your code is structured, $form_state is being recycled repeatedly, as if you were pressing 'save' repeatedly on the same page with different results, when you actually want to be submitting many separate forms.
Just setting
$form_state = array();
at the start of the loop (good programming practice anyway, since it protects you from PHP laziness) should have fixed that.
.dan.
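A small, Drupal-free sketch of the effect described here. The function fake_execute() is a made-up stand-in, loosely modelled on the merge quoted from node_form() earlier in the thread. It assumes that form processing leaves a copy of the node behind in $form_state, which is an assumption for illustration rather than something verified against core in this thread.

```php
<?php
/**
 * Hypothetical stand-in for a form handler that takes $form_state by
 * reference. Purely illustrative; this is not Drupal code.
 */
function fake_execute(array &$form_state, $node) {
  // A leftover $form_state['node'] wins over the $node argument.
  $merged = isset($form_state['node'])
    ? $form_state['node'] + (array) $node
    : (array) $node;
  // Assumption: processing leaves a copy of the node in $form_state.
  $form_state['node'] = $merged;
  return $merged['nid'];
}

$nodes = array(
  (object) array('nid' => 1),
  (object) array('nid' => 2),
  (object) array('nid' => 3),
);

// Recycled $form_state: every call after the first still sees the first nid.
$recycled = array();
$form_state = array();
foreach ($nodes as $node) {
  $recycled[] = fake_execute($form_state, $node);
}

// Fresh $form_state on each iteration: every call sees its own node.
$fresh = array();
foreach ($nodes as $node) {
  $form_state = array();
  $fresh[] = fake_execute($form_state, $node);
}
```

With the recycled array, all three calls act on nid 1; with the reset, they act on 1, 2 and 3. If Drupal behaves like the stand-in, that would be consistent with the first record absorbing every update in the original post.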