Currently this module does everything in a single request, which can be a problem for people wanting to perform larger operations.
Both imports and exports should support batch processing, which would greatly improve support for integration modules.

Comments

bojanz’s picture

This should work by opening a file when the export starts and appending each node to it. That's how the "Create an archive of selected files" action in VBO changed from 6.x to 7.x.
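
Roughly, a sketch of how that could look with Drupal 7's Batch API; node_export_batch_export_op() and node_export_render_node() below are placeholder names for illustration, not existing Node export functions.

<?php
// Batch operation: export a small slice of nodes per pass and append the
// rendered output to a temporary file.
function node_export_batch_export_op($nids, &$context) {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($nids);
    // Create the (empty) temporary file once, when the batch starts.
    $context['results']['file'] = file_unmanaged_save_data('', 'temporary://node_export.txt');
  }
  // Only load and render 10 nodes per pass to stay under memory limits.
  $slice = array_slice($nids, $context['sandbox']['progress'], 10);
  foreach (node_load_multiple($slice) as $node) {
    file_put_contents($context['results']['file'], node_export_render_node($node), FILE_APPEND);
    $context['sandbox']['progress']++;
  }
  $context['finished'] = empty($context['sandbox']['max']) ? 1 : $context['sandbox']['progress'] / $context['sandbox']['max'];
}
?>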

danielb’s picture

Disagreed. We cannot assume the output will be in a file. Also, simply 'appending' to the previous code will be unsuitable for most (all?) output formats.

bojanz’s picture

How else are you going to batch the export then, without saving the output of each batch? I see no other way.

danielb’s picture

My plan isn't great, but it would involve generating the code for one batch of nodes at a time, each node separately, and storing it in the database to be fetched later when the final result is delivered. So there is still an overhead at the end proportional to how much code is being output, which still gives us an upper limit on how much can be done.

The problem is that the functionality that 'glues' it all together, and does things like CSV headers, has to be done all at once, and preferably after the node data has been 'looked at', so this has to happen at the end.
Some of the formats, like serialize(), may have to be changed a bit to make this work.
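
A rough sketch of that staging idea. {node_export_batch} is an assumed staging table, and node_export_render_node() / node_export_wrap_output() are placeholders for the per-node rendering and the final "glue" step.

<?php
// Batch operation: render each node's code and stage it in the database,
// to be assembled once the whole batch has finished.
function node_export_batch_stage_op($nids, $batch_id, &$context) {
  foreach (node_load_multiple($nids) as $node) {
    db_insert('node_export_batch')
      ->fields(array(
        'batch_id' => $batch_id,
        'nid' => $node->nid,
        'data' => node_export_render_node($node),
      ))
      ->execute();
  }
}

// Final pass: fetch all staged pieces and add the headers/wrapping.
// This last step is still proportional to the total output size.
function node_export_batch_assemble($batch_id) {
  $pieces = db_query('SELECT data FROM {node_export_batch} WHERE batch_id = :id ORDER BY nid', array(':id' => $batch_id))->fetchCol();
  return node_export_wrap_output($pieces);
}
?>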

The import would have to do the reverse: split the incoming code into the code for each node, so the batches can then convert each of those pieces into nodes.
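
For the import direction, a sketch along those lines; node_export_split_code() and node_export_import_node() are placeholder helpers for splitting the pasted code and decoding one chunk back into a node.

<?php
// Split the pasted code into per-node chunks up front, then let the Batch API
// convert and save a few chunks per request.
function node_export_batch_import($code) {
  $chunks = node_export_split_code($code);
  $batch = array(
    'title' => t('Importing nodes'),
    'operations' => array(),
  );
  foreach (array_chunk($chunks, 10) as $slice) {
    $batch['operations'][] = array('node_export_batch_import_op', array($slice));
  }
  batch_set($batch);
}

// Batch operation: decode and save one small slice of nodes per pass.
function node_export_batch_import_op($chunks, &$context) {
  foreach ($chunks as $chunk) {
    $node = node_export_import_node($chunk);
    node_save($node);
    $context['results'][] = $node->nid;
  }
}
?>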

But there should also be a way to bypass the batching, as some callers will not be able to support it; specifically, things like hook_node_operations cannot associate what happens to each node with the other nodes in the operation.

bojanz’s picture

I think you will gain very little by doing that. You're still loading a lot of data from the database, so the max_allowed_packet and memory_limit limitations still stand.
That's why I said you need the file.

danielb’s picture

In order to return the export to another function (for the integration modules), or to pass it to the Render or Form API to print it on a page, the export code is going to have to be put into a string at some point anyway, so there is always going to be a limit on how much you can do.

Anyway, I created this issue mainly with imports in mind, as that is where the bottleneck seems to be. I have heard a few complaints about limits being reached on imports, which implies those users were able to export without any problems.
I don't think it's the amount of code in the string either; it's the fact that hundreds or thousands of $node objects are being created and saved in one request.

I don't want to rush into it though, and I'll have a good think about what you've said.

There's some sort of idea floating around in my head about passing information around purely as files, and using an iframe or javascript or something to display the contents if it needs to be shown, but it isn't fully baked yet.

danielb’s picture

Just realised that batching will also heavily affect node_export_relation, which will attempt to iterate through references all in one hit. Not sure how to get around that.

danielb’s picture

Category: task » feature
danielb’s picture

Status: Active » Postponed

Since it is a pretty big change to accommodate this feature, I am postponing it for a future round of development. At this point I just want to focus on getting a full release out for the 3.x branch.

danielb’s picture

Issue summary: View changes

I think the way batching would work is that imported nodes are first stored as temporary data in the database. Any "after import" operations should occur on that stored data, rather than on data in memory. Then, once all that is done, the temporary data can be moved into permanent storage. This way the heavy lifting can occur via cron/AJAX, and via some kind of Drush method as well.
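
A sketch of that staged flow, assuming a hypothetical {node_export_import_queue} table where raw imported node data sits until it is promoted; hook_cron() (or a Drush command calling the same worker) processes a small batch of rows per run.

<?php
// Promote a small batch of staged rows into real nodes on each cron run.
function node_export_cron() {
  $rows = db_query_range('SELECT id, data FROM {node_export_import_queue} WHERE status = 0', 0, 25);
  foreach ($rows as $row) {
    // Assumes the staged data was stored as a serialized node object.
    $node = unserialize($row->data);
    node_save($node);
    // Mark the staged row as done and record the new nid.
    db_update('node_export_import_queue')
      ->fields(array('status' => 1, 'nid' => $node->nid))
      ->condition('id', $row->id)
      ->execute();
  }
}
?>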