Hello,

I'm currently working on a migration script which converts data from Drupal 4.6 to Drupal 5.3.

The overall structure of my script is:

$res = db_query("SELECT * FROM {table}");   // select the data to migrate
while ($o = db_fetch_object($res)) {
  $node = new stdClass();  // build a new node
  $node->type = $o->type;
  $node->title = $o->title;
  // define more node properties
  node_save($node);
}

NB. The tables that need to be migrated have been imported in the current database first.
NB2. I can't convert the tables in SQL directly, I need to do a node_save() because I need all the actions triggered by node_save() to happen.

If I limit the number of records in the SELECT statement (< 1000 records), everything goes fine.

But if I try to select them all (there are 3.5 million of them), the script halts after a while (about 10 minutes) saying PHP has no memory available. If I track the script's memory usage, it remains stable for a while (around 30 Mo) and then goes out of control (hundreds of Mo).

It seems that memory usage is proportional to the number of records I select. That's odd: shouldn't the records be processed one at a time and the memory usage remain stable?

Any idea how I should modify this script to make it work?

Thank you.

Vincent

Comments

criznach’s picture

I'm assuming you have some unique ID. Try something like this...

SELECT * FROM {table} WHERE `unique_id` > 'yyyyyy' LIMIT 100;

Then execute this multiple times. 'yyyyyy' is the highest unique_id from the previous iteration. You can even keep a running total of execution time and if you get close to the limit, save your state in a variable and redirect back to the same page to pick up where you left off.

vincentc’s picture

OK, will do. Thanks a lot.

------------
Web Developer in Lille, France
Création de sites Web Drupal - Tutoriels Drupal - Formation Drupal

jacquesm’s picture

I think I've tracked it down to a reference counting problem in PHP5, I'm not sure if the problem is platform dependent (I'm using a 64 bit machine), batching the imports seems to be the only workaround available right now.

If you dump all the variables that PHP knows about you'll see that that part is ok, but the memory consumption of the executable is not, which seems to indicate a leak in PHP.

I'd made a thread about this before I found your message here:

http://drupal.org/node/208052

The situation is very similar, I think that normally people don't call node_save frequently enough to encounter the problem. There is probably more to this than meets the eye, on my system just calling the bootstrap already consumes more than 16M!