Using the Table Wizard http://drupal.org/project/tw and the Migrate module http://drupal.org/project/migrate I imported 3500 articles into OpenPublish (Drupal). It looked very intimidating, but I followed this tutorial carefully and it was easy: http://www.lullabot.com/articles/drupal-data-imports-migrate-and-table-w...
The best thing to do is to start with a CSV file with only a couple of articles. Import it through the Table Wizard and do a test run with the Migrate module. Then import the entire article database through phpMyAdmin, because the Table Wizard doesn't import more than 400 rows (on my system at least) and it doesn't handle UTF-8 encoding (in my experience).
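If the full export is already one big CSV file, the small test file can be cut from it with a few lines of PHP run from the command line. This is only a sketch: the function name and the file names are made up for illustration, and it assumes the first line of the CSV is a header row.

```php
<?php
/**
 * Copy the header row plus the first $count data rows of a CSV file.
 *
 * Hypothetical helper, not part of Table Wizard or Migrate.
 * Returns the number of data rows written.
 */
function make_sample_csv($source, $target, $count = 2) {
  $in = fopen($source, 'r');
  if ($in === FALSE) {
    return 0;
  }
  $out = fopen($target, 'w');
  if ($out === FALSE) {
    fclose($in);
    return 0;
  }
  // Start at -1 so the header row is copied but not counted as data.
  $written = -1;
  while ($written < $count && ($row = fgetcsv($in, 0, ',', '"', '\\')) !== FALSE) {
    fputcsv($out, $row, ',', '"', '\\');
    $written++;
  }
  fclose($in);
  fclose($out);
  return max($written, 0);
}
```

Calling make_sample_csv('articles.csv', 'articles-sample.csv', 2) would leave the original file untouched and write the header plus two articles to the second file.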
Fields -- columns in the CSV file -- that you need at minimum:
1. Author Name
2. Title
3. Body
The rest can be empty or you can set the default. Easy, easy, easy.
How I arranged the Migrate configuration of the node type:
1. ID# -- This can be empty, but putting the column in the CSV file saves one step later. Every field is imported as VARCHAR, including this one. In phpMyAdmin, hit the pencil icon and change the column to INT, then click the primary key icon. Once this is done, re-analyze the table with the Table Wizard.
2. Author -- I first needed to create a user with exactly the same name in the Drupal site. There needs to be a one-to-one correlation. I copied all the author names from the CSV file, then created a new table in Excel with Full Name, First Name, Last Name, User Name, Email Address (made up by appending an @ and my domain name to their usernames) and Password (all the same). I used the User Import module to bring in the 300 users, giving them the role of Author.
3. Published Date -- I think this is in the form YYYY-MM-DD HH:MM.
4. Published -- Change the default column to the number 1.
5. Node: Import Format -- Because of HTML code used in the import files I changed the default to the written value of advanced.
6. Language -- I just put the default value of en.
7. Taxonomy -- I created a taxonomy vocabulary before I imported the articles, using the exact terms that appear in the CSV file I'm importing. The characters have to match one for one.
8. Show author info -- I set this to 1.
9. Main Image source file path -- I dumped all the main images of the articles being imported into one folder at sites/default/files/imported/. In the old database each image was stored with just the image name, so I needed to prepend the path sites/default/files/imported/ to each file name in the CSV file before importing.
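A few of the steps above (the user list, the published date and the image path) can be scripted instead of done by hand in Excel. The following is a rough sketch in plain PHP, not something from the Migrate or User Import modules. The function names, the example.com domain, the shared password and the way the email address is derived from the name are all assumptions for illustration. The username is kept identical to the author name so that it still matches the CSV one for one.

```php
<?php
/**
 * Prepend the import folder to a bare image file name.
 * Empty values stay empty so articles without an image are left alone.
 */
function prefix_image_path($filename, $folder = 'sites/default/files/imported/') {
  $filename = trim($filename);
  if ($filename === '' || strpos($filename, $folder) === 0) {
    // Nothing to do: no image, or the folder is already there.
    return $filename;
  }
  return $folder . ltrim($filename, '/');
}

/**
 * Rewrite a date string as YYYY-MM-DD HH:MM.
 * Returns an empty string when PHP cannot parse the input.
 */
function format_published_date($value) {
  $timestamp = strtotime(trim($value));
  return $timestamp === FALSE ? '' : date('Y-m-d H:i', $timestamp);
}

/**
 * Build one row for the user import file from an author's full name.
 * Column order: Full Name, First Name, Last Name, User Name, Email, Password.
 */
function build_user_row($full_name, $domain = 'example.com', $password = 'changeme') {
  $full_name = trim($full_name);
  $parts = preg_split('/\s+/', $full_name);
  // Assumption: the last word is the last name, everything before it the first name.
  $last = count($parts) > 1 ? array_pop($parts) : '';
  $first = implode(' ', $parts);
  // Assumption: the made-up email uses the lowercased name with dots.
  $local = trim(strtolower(preg_replace('/[^A-Za-z0-9]+/', '.', $full_name)), '.');
  return array($full_name, $first, $last, $full_name, $local . '@' . $domain, $password);
}
```

For example, prefix_image_path('photo.jpg') gives sites/default/files/imported/photo.jpg, and running it a second time on that result leaves it unchanged.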
That was pretty much it. Oh, one more thing: if the image location given for an article doesn't match the actual image location, the Migrate module will not import that article and will show you an error at the end. However, if the Migrate module imports an article without an author, or the author's name in the CSV file does not correspond to any user name, the author is left blank. This is weird because you will see the teaser, but if you click on the link to see the article node/page you will get a page not found error. The way to find these articles without users attached to them is to use the Node Uid Cleanup module. Well, that module doesn't exist anymore.
I wish I knew where this code came from. Make a folder called nodeuidcleanup in the modules directory and put these two files in it.
Make a file nodeuidcleanup.module and put this PHP code in it.
<?php
// $Id$
/**
 * @file
 * Fix or remove nodes with a uid that has no corresponding user.
 * See http://drupal.org/node/259632 for more info.
 */
/**
 * Get list of broken nodes.
 */
function nodeuidcleanup_getNodes() {
  $sql = "SELECT n.nid, n.title, n.uid
    FROM {node} n
    LEFT JOIN {users} u
    ON n.uid = u.uid
    WHERE u.uid IS NULL";
  $query = db_query(db_rewrite_sql($sql));
  $result = array();
  while ($data = db_fetch_object($query)) {
    $result[] = $data;
  }
  return $result;
}
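/**
 * Reassign broken nodes to an existing user.
 *
 * Sets the uid to 5 (the account the "Staff Report" button refers to), not
 * to zero. Change the number to a uid that exists on your site.
 */
function nodeuidcleanup_updateuid() {
  // Table names go in curly braces so Drupal can apply a table prefix.
  $sql = "UPDATE {node} n
    LEFT JOIN {users} u
    ON n.uid = u.uid
    SET n.uid = 5
    WHERE u.uid IS NULL";
  db_query($sql);
}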
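/**
 * Implementation of hook_menu().
 */
function nodeuidcleanup_menu() {
  $items = array();
  $items['admin/settings/nodeuidcleanup'] = array(
    'title' => 'Node uid cleanup',
    'page callback' => 'nodeuidcleanup_page',
    // Restrict the page to administrators; 'access callback' => TRUE would
    // let any visitor reassign or delete nodes.
    'access arguments' => array('administer nodes'),
    'type' => MENU_NORMAL_ITEM,
  );
  return $items;
}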
/**
 * Menu callback.
 */
function nodeuidcleanup_page() {
  $nodes = nodeuidcleanup_getNodes();
  if ($nodes) {
    $headers = array('Node ID', 'Original User ID', 'Node Title');
    $rows = array();
    foreach ($nodes as $node) {
      $rows[] = array(
        $node->nid,
        $node->uid,
        // Escape the title before it is printed in the table.
        check_plain($node->title),
      );
    }
    $html = theme('table', $headers, $rows);
    $html .= drupal_get_form('nodeuidcleanup_setuserzero');
    $html .= '<br />';
    $html .= drupal_get_form('nodeuidcleanup_deletenodes');
    return $html;
  }
  else {
    return '<p>'. t('There are no broken nodes to fix.') .'</p>';
  }
}
/**
 * Implementation of forms.
 */
function nodeuidcleanup_setuserzero() {
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Set broken node uids to Staff Report'),
  );
  return $form;
}

function nodeuidcleanup_setuserzero_submit($form, &$form_state) {
  nodeuidcleanup_updateuid();
  drupal_set_message(t('Node uids updated.'));
  $form_state['redirect'] = 'admin/settings/nodeuidcleanup';
}

function nodeuidcleanup_deletenodes() {
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Delete broken nodes'),
  );
  return $form;
}
function nodeuidcleanup_deletenodes_submit($form, &$form_state) {
  $nodes = nodeuidcleanup_getNodes();
  // node_delete() won't work yet since it can't load the broken nodes,
  // so first we fix the uids.
  nodeuidcleanup_updateuid();
  foreach ($nodes as $node) {
    node_delete($node->nid);
  }
  $form_state['redirect'] = 'admin/settings/nodeuidcleanup';
}
Make a file nodeuidcleanup.info and put this in it (it is a plain settings file, not PHP).
; $Id$
name = Node uid cleanup
description = Fix or remove nodes with a uid that has no corresponding user.
version = ""
package = Other
core = 6.x
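The two problems described earlier (a missing image file, or an author with no matching user) can also be caught before running the import at all. Here is a minimal sketch of such a check in plain PHP, run outside Drupal. The function name and the 'author' and 'image' keys are hypothetical; use whatever your CSV columns are called, and pass in the list of usernames you created with User Import.

```php
<?php
/**
 * Report rows that are likely to cause trouble during the import.
 *
 * $rows:      one associative array per article, with hypothetical keys
 *             'author' and 'image' (image path relative to $base_dir).
 * $usernames: the user names that already exist in the Drupal site.
 * $base_dir:  the Drupal root folder on disk.
 */
function find_import_problems(array $rows, array $usernames, $base_dir = '.') {
  // Flip the list so each name can be looked up with isset().
  $known = array_flip($usernames);
  $problems = array();
  foreach ($rows as $line => $row) {
    $author = isset($row['author']) ? trim($row['author']) : '';
    if ($author === '' || !isset($known[$author])) {
      $problems[] = "Row $line: author '$author' has no matching user";
    }
    $image = isset($row['image']) ? trim($row['image']) : '';
    if ($image !== '' && !is_file($base_dir . '/' . $image)) {
      $problems[] = "Row $line: image '$image' not found";
    }
  }
  return $problems;
}
```

The author comparison is exact, with the same case and the same spaces, because the names have to correspond one to one. An empty result means none of the rows tripped either check.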
Comments
Comment #1
Adam S commented
Sorry if that last post was a little discombobulated.
I just discovered I made a minor mistake.
There are places in Drupal, such as search results, that use the Last Update date and not the Post date, so when importing into OpenPublish through the Migrate module it makes sense to also map the created date to the Last Update field.
Comment #2
mmorris commented
adamsohn,
Great tutorial! We're big fans of the Table Wizard and Migrate modules too. Thanks for posting.
Mike