Imports not working

styro - September 24, 2006 - 01:30
Project: Book Import
Version: 4.7.x-1.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: active
Description

I can't seem to get bookimport to work - even with small test books.

There seem to be two major issues:

1) The parents of each node aren't being set properly, so each node in the book appears as its own book when looking at the /book path.

2) Something else is screwy with the permissions on the new nodes: they can't even be seen (let alone edited) by uid 1. The site importing the book doesn't have any node_access modules installed, but the exporting site does have node_privacy_byrole.

From the DEBUG logs, the query in make_links that sets the nid and parent is looking for rows where vid=0 for some reason. It looks to me as though the earlier node_save step saved actual vids (which are currently the same as all my nids), so the make_links query doesn't end up changing anything. I haven't yet been able to figure out why make_links thinks the vids are 0, though.

e.g. a snippet from the DEBUG log:

'Parent of' array:Array
(
    [413] => 411
    [411] => 410
    [412] => 410
    [410] =>
)
Old nid => New nid:Array
(
    [413] => 677
    [411] => 678
    [412] => 679
    [410] => 680
)


UPDATE {book} SET   nid=677,   parent=678,   weight=0 WHERE  vid=0
parent[413 (677)] = 411 (678)
UPDATE {book} SET   nid=678,   parent=680,   weight=0 WHERE  vid=0
parent[411 (678)] = 410 (680)
UPDATE {book} SET   nid=679,   parent=680,   weight=0 WHERE  vid=0
parent[412 (679)] = 410 (680)
UPDATE {book} SET   nid=680,   parent=0,   weight=0 WHERE  vid=0
parent[410 (680)] =  ()

e.g. a sample node object (with vid) from the DEBUG log:

Array
(
    [type] => book
    [uid] => 3
    [status] => 1
    [created] => 1159054984
    [promote] => 0
    [moderate] => 0
    [changed] => 1159054984
    [sticky] => 0
    [vid] => 413
    [format] => 1
    [md5_body] => 872ccd9c6dce18ce6ea4d5106540f089
    [weight] => 0
    [depth] => 3
    [author] => Anton
)
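
To make the symptom concrete, here is a minimal sketch of why an UPDATE keyed on vid=0 changes nothing. It is not the bookimport code: the {book} table is replaced by a plain PHP array keyed by vid, update_book_row is a hypothetical helper, and the rows assume (as in the report) that the saved vids equal the new nids.

```php
<?php
// Hypothetical in-memory stand-in for the {book} table, keyed by vid.
// Assumption: node_save stored vids equal to the new nids (677..680).
$book_rows = array(
  677 => array('nid' => 677, 'parent' => 0, 'weight' => 0),
  678 => array('nid' => 678, 'parent' => 0, 'weight' => 0),
  679 => array('nid' => 679, 'parent' => 0, 'weight' => 0),
  680 => array('nid' => 680, 'parent' => 0, 'weight' => 0),
);

/**
 * Mimics "UPDATE {book} SET nid=..., parent=..., weight=... WHERE vid=..."
 * and returns the number of rows matched.
 */
function update_book_row(array &$rows, $vid, $nid, $parent, $weight) {
  if (!isset($rows[$vid])) {
    return 0; // No row has this vid, so nothing is changed.
  }
  $rows[$vid] = array('nid' => $nid, 'parent' => $parent, 'weight' => $weight);
  return 1;
}

// The first query from the log: keyed on vid=0, which no row has.
$matched_with_zero = update_book_row($book_rows, 0, 677, 678, 0);
// The same update keyed on the vid that was actually saved.
$matched_with_vid = update_book_row($book_rows, 677, 677, 678, 0);

echo "vid=0 matched $matched_with_zero row(s)\n";
echo "vid=677 matched $matched_with_vid row(s)\n";
```

With vid=0 the parent of node 677 stays at 0, which would be consistent with every node showing up as its own top-level book.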

#1

styro - September 24, 2006 - 03:36

Updates:

Issue 1)

If I hack the make_links function to use $new_value (the new nid) for $node->vid, the import sets up the parent nodes correctly. The two values match on the test books I'm using because there are no extra node revisions. It seems as though the node_load at the top of make_links isn't bringing in the correct $node->vid for some reason.

This isn't enough to get our large production book working, though. I suspect the issue might be in export_dxml: the number of nodes in the XML file is less than the number of nodes in the book. This book also triggers the "headers already sent" error reported here: http://drupal.org/node/72589 - I'm not sure yet whether this is related.

Issue 2)

Also regarding the node access issue - that was caused by the importing site not having the same author uids as the exporting site. Matching those up fixed that problem.

#2

puregin - September 24, 2006 - 10:57

Styro, thanks for your issue reports.

I'm glad that you made progress on the first issue, and solved the second. I'll need to add some checks to map UIDs or make content authored by 'alien' UIDs owned by admin. Now that I think about this, it seems to me that I did something like this, but it seems I didn't check that code in.
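
A minimal sketch of the UID check described above, not the module's actual code. map_import_uid and the $known_uids list are hypothetical; a real implementation would look the uids up in the importing site's users table rather than take them as an array.

```php
<?php
/**
 * Returns the uid an imported node should be saved with.
 * Uids unknown to the importing site fall back to $fallback_uid
 * (uid 1, the admin account, by default).
 */
function map_import_uid($exported_uid, array $known_uids, $fallback_uid = 1) {
  return in_array($exported_uid, $known_uids, true) ? $exported_uid : $fallback_uid;
}

// Hypothetical importing site that only has uids 1 and 2.
$known_uids = array(1, 2);
$node = array('type' => 'book', 'uid' => 3);

$node['uid'] = map_import_uid($node['uid'], $known_uids);
echo $node['uid'], "\n"; // uid 3 does not exist here, so this prints 1
```

Falling back to admin keeps the imported nodes reachable. Mapping each exported uid to a matching local account would preserve authorship, but needs some way to match users across sites.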

The headers already sent issue is likely caused by my hackish debugging output.

Could it be that some of your pages are not book pages but rather pages of other types included using the 'outline' tab?

I'm currently in Brussels at DrupalCON, but I'll have a look at this when I return to Vancouver in a day or two.

#3

styro - September 25, 2006 - 01:34

export_dxml is working properly after all. I'm not sure why an earlier export was missing nodes, but more recent ones have output everything they should.

puregin wrote: "Could it be that some of your pages are not book pages but rather pages of other types included using the 'outline' tab?"

Yep, I suspected it had something to do with that. The book in question has well over a hundred nodes authored with TinyMCE, and wasn't written by me. There is all kinds of nastiness lurking in it :)

To get a better handle on the book, I ran this code using the devel module....

<?php
// Visitor callback for book_recurse(): emits one table row per node.
function prefunc($node, $depth, $nid) {
  echo "<tr><td align=\"left\">";
  echo str_repeat("&nbsp;&nbsp;", $depth) . $node->title;
  echo "</td><td>$depth</td><td>$nid</td>";
  echo "<td>$node->parent</td></tr>";
}

print "<table>";
print "<tr><th>Title</th><th>depth</th><th>nid</th><th>parent</th></tr>";
// 75 is the nid of the book's root node; the callback name is passed as a string.
print book_recurse(75, 1, 'prefunc', "");
print "</table>";
?>

...and sure enough there were about 10 stories in there. The interesting thing (for me at least) was that even though book_recurse, export_dxml etc. handle them all properly, they had no $node->parent. I suspect this is a limitation of the book module's nodeapi handling (possibly by design?).

After changing them all to book nodes, things went better. Still missing one node though.

Also, only about 5% of the imported book nodes kept their original weight. All the others have been overwritten with 0, even though the XML file has the correct weights recorded. I'll be looking into that as well :)

#4

styro - September 26, 2006 - 02:22

Hmmm more strangeness...

I have two test import instances trying to import the book from hell. One is a practically empty clean site, the other is a copy of the production site the book needs to go into. Both sites are importing the same XML file.

On the empty site, the book ends up with one missing node and nearly all the node weights reset to zero.

On the 'production' site (a copy, not the real thing), the parents are all screwed up, although the weights have come through (not sure if they are all correct though). Nodes appear in other books and, even weirder, changing the parent nodes brings out some odd behaviour: the setting doesn't stick, or the same node id jumps to another node altogether. Something is seriously corrupted: the PHP code I used above to list the book contents goes into an infinite recursion that crashed my browser, even though the book outline view called from the same node doesn't.
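
For inspecting an outline that may be corrupted, a traversal that remembers which nids it has already visited cannot recurse forever. The following is a minimal sketch under stated assumptions: walk_outline is a hypothetical helper that works on a plain child => parent array (like the 'Parent of' array in the DEBUG log, but with new nids), not on the database, and the cycle in the second example is made up for illustration.

```php
<?php
/**
 * Walks an outline from $root_nid using a child => parent map and returns
 * the nids in visit order. Nids that were already seen are skipped, so a
 * cycle in the parent links ends the walk instead of recursing forever.
 */
function walk_outline(array $parent_of, $root_nid, array &$seen = array()) {
  if (isset($seen[$root_nid])) {
    return array(); // Already visited: a cycle or a duplicate link.
  }
  $seen[$root_nid] = true;
  $visited = array($root_nid);
  foreach ($parent_of as $nid => $parent) {
    if ($parent === $root_nid) {
      $visited = array_merge($visited, walk_outline($parent_of, $nid, $seen));
    }
  }
  return $visited;
}

// Healthy outline: 680 is the root (parent 0).
$healthy = array(677 => 678, 678 => 680, 679 => 680, 680 => 0);
// Made-up corruption: the root now points at one of its own descendants.
$cyclic = array(677 => 678, 678 => 680, 679 => 680, 680 => 677);

echo implode(', ', walk_outline($healthy, 680)), "\n";
echo implode(', ', walk_outline($cyclic, 680)), "\n";
```

Both calls terminate and list each nid once. With an unguarded recursion, the second outline would loop through 680, 678 and 677 indefinitely.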

I'm a bit lost. I'll have to try again and painstakingly compare the parent and new nid arrays from both sites, I suppose.

#5

styro - September 26, 2006 - 04:27

More updates:

Bringing the same XML into a completely empty (not just practically empty) out-of-the-box site with no existing nodes seems to have worked properly, at least judging by browsing through the book. The book's outline page doesn't seem to want to load though: after a long wait it just produces a blank page (probably after exhausting all the memory). There are no orphaned pages and only one book, and the printer-friendly version does load.

Importing that XML into a copy of our production site still produces the same random mess though. This makes me suspect some issue with the site the book is being imported into. Although I haven't found any yet, is it possible that things like duplicate node titles could cause issues?

#6

styro - September 26, 2006 - 06:18

Success:

The book corruption issue was caused by the hack I made in comment #1 to get the make_links function setting the parents and weights properly.

Basically, because all our book nodes had no extra revisions, I hard-coded the new vids to be the same as the new nids. This works OK on my mostly empty test sites. But on my production site the next revision id was higher than the next node id, so book nodes were getting set to the revisions of other nodes, hence the corrupt book.

A successful workaround (i.e. a hack on top of my original hack) was to change the sequences table so that the next node id was the same as the next revision id.
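
The failure mode can be reproduced without Drupal. Below is a minimal sketch that treats the node id and revision id counters as two plain integers. Both function names and the starting value 912 are hypothetical, chosen only to show what happens when the revision counter runs ahead of the node counter.

```php
<?php
/**
 * Simulates saving $count new nodes on a site whose node and revision
 * counters start at the given values. Returns nid => vid pairs.
 */
function simulate_node_saves($next_nid, $next_vid, $count) {
  $saved = array();
  for ($i = 0; $i < $count; $i++) {
    $saved[$next_nid++] = $next_vid++;
  }
  return $saved;
}

/**
 * Counts the nodes for which a hard-coded "vid equals nid" shortcut
 * would point at the wrong revision.
 */
function count_vid_mismatches(array $saved) {
  $mismatches = 0;
  foreach ($saved as $nid => $vid) {
    if ($nid !== $vid) {
      $mismatches++;
    }
  }
  return $mismatches;
}

$clean_site = simulate_node_saves(677, 677, 4); // counters in step
$busy_site = simulate_node_saves(677, 912, 4);  // revisions ahead of nodes

echo count_vid_mismatches($clean_site), "\n"; // prints 0
echo count_vid_mismatches($busy_site), "\n";  // prints 4
```

On the site with the counters in step the shortcut happens to be right for every node. On the other site it is wrong for all four, which matches the difference seen between the empty test site and the production copy.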

I might try to come up with a better fix for the original issue later :)

#7

shakey - April 13, 2007 - 16:35

Could someone post a simple book page for import so that I can check my system? I have not been able to import (I always get the "warning: Cannot modify header..."). I have exported a simple book using the export Drupal XML module. Then I modify it and try to import it, either by "updating" or by "creating a new" book (with the "Allow PHP content to be created, but set type to Full HTML" setting).

In the simplest case, I added a character to each title of the exported book, and then tried to import. I have used several editors, most recently OpenOffice.org, and saved (on an XP machine) as text: UTF-8 with CR, LF, and CR & LF, to no avail.

I openly admit to being a moron, so I appreciate any comments. thx

 
 
