Currently this module does everything in a single request, which can be a problem for people wanting to perform larger operations.
Both imports and exports should support batch processing, which would greatly improve support for integration modules.

Comments

bojanz’s picture

This should work by opening a file when the export starts and appending each node to it. That's how the "Create an archive of selected files" action in VBO changed from 6.x to 7.x.
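
Roughly, a sketch of how that could look with Drupal 7's Batch API; node_export_batch_export_op() and node_export_render_node() below are placeholder names for illustration, not existing Node export functions.

<?php
// Batch operation: export a small slice of nodes per pass and append the
// rendered output to a temporary file.
function node_export_batch_export_op($nids, &$context) {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($nids);
    // Create the (empty) temporary file once, when the batch starts.
    $context['results']['file'] = file_unmanaged_save_data('', 'temporary://node_export.txt');
  }
  // Only load and render 10 nodes per pass to stay under memory limits.
  $slice = array_slice($nids, $context['sandbox']['progress'], 10);
  foreach (node_load_multiple($slice) as $node) {
    file_put_contents($context['results']['file'], node_export_render_node($node), FILE_APPEND);
    $context['sandbox']['progress']++;
  }
  $context['finished'] = empty($context['sandbox']['max']) ? 1 : $context['sandbox']['progress'] / $context['sandbox']['max'];
}
?>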

danielb’s picture

Disagreed. We cannot assume the output will be in a file. Also, simply 'appending' to the previous code will be unsuitable for most (all?) output formats.

bojanz’s picture

How else are you going to batch the export then, without saving the output of each batch? I see no other way.

danielb’s picture

My plan isn't great, but it would involve generating the code for one batch of nodes at a time, each node separately, and storing it in the database to be fetched later when the final result is delivered. So there is still an overhead at the end proportional to how much code is being output, which still gives us an upper limit on how much can be done.

The problem is that the functionality that 'glues' it all together, and does things like CSV headers, has to be done all at once, and preferably after the node data has been 'looked at', so this has to happen at the end.
Some of the formats, like serialize(), may have to be changed a bit to make this work.
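
A rough sketch of that staging idea. {node_export_batch} is an assumed staging table, and node_export_render_node() / node_export_wrap_output() are placeholders for the per-node rendering and the final "glue" step.

<?php
// Batch operation: render each node's code and stage it in the database,
// to be assembled once the whole batch has finished.
function node_export_batch_stage_op($nids, $batch_id, &$context) {
  foreach (node_load_multiple($nids) as $node) {
    db_insert('node_export_batch')
      ->fields(array(
        'batch_id' => $batch_id,
        'nid' => $node->nid,
        'data' => node_export_render_node($node),
      ))
      ->execute();
  }
}

// Final pass: fetch all staged pieces and add the headers/wrapping.
// This last step is still proportional to the total output size.
function node_export_batch_assemble($batch_id) {
  $pieces = db_query('SELECT data FROM {node_export_batch} WHERE batch_id = :id ORDER BY nid', array(':id' => $batch_id))->fetchCol();
  return node_export_wrap_output($pieces);
}
?>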

The import would have to do the reverse: split the incoming code into the code for each node, so the batches can then convert each of those pieces into nodes.
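
For the import direction, a sketch along those lines; node_export_split_code() and node_export_import_node() are placeholder helpers for splitting the pasted code and decoding one chunk back into a node.

<?php
// Split the pasted code into per-node chunks up front, then let the Batch API
// convert and save a few chunks per request.
function node_export_batch_import($code) {
  $chunks = node_export_split_code($code);
  $batch = array(
    'title' => t('Importing nodes'),
    'operations' => array(),
  );
  foreach (array_chunk($chunks, 10) as $slice) {
    $batch['operations'][] = array('node_export_batch_import_op', array($slice));
  }
  batch_set($batch);
}

// Batch operation: decode and save one small slice of nodes per pass.
function node_export_batch_import_op($chunks, &$context) {
  foreach ($chunks as $chunk) {
    $node = node_export_import_node($chunk);
    node_save($node);
    $context['results'][] = $node->nid;
  }
}
?>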

But there should also be a way to bypass the batching, as some callers will not be able to support it; specifically, things like hook_node_operations cannot associate what happens to each node with the other nodes in the operation.

bojanz’s picture

I think you will gain very little by doing that. You're still loading a lot of data from the database, so the max_allowed_packet and memory_limit limitations still stand.
That's why I said you need the file.

danielb’s picture

In order to return the export to another function (for the integration modules), or to pass it to the Render or Form API to print it on a page, the export code is going to have to be put into a string at some point anyway, so there is always going to be a limit on how much you can do.

Anyway, I created this issue mainly with imports in mind, as that is where the bottleneck seems to be. I have heard a few complaints about limits being reached on imports, which implies those users were able to export without any problems.
I don't think it's the amount of code in the string either; it's the fact that hundreds or thousands of $node objects are being created and saved in one request.

I don't want to rush into it though, and I'll have a good think about what you've said.

There's some sort of idea floating around in my head about passing information around purely as files, and using an iframe or javascript or something to display the contents if it needs to be shown, but it isn't fully baked yet.

danielb’s picture

Just realised that batching will also heavily affect node_export_relation, which will attempt to iterate through references all in one hit. Not sure how to get around that.

danielb’s picture

Category: task » feature
danielb’s picture

Status: Active » Postponed

Since it is a pretty big change to accommodate this feature, I am postponing it for a future round of development. At this point I just want to focus on getting a full release out for the 3.x branch.

danielb’s picture

Issue summary: View changes

I think the way batching would work is that imported nodes are first stored as temporary data in the database. Any "after import" operations should occur on that stored data, rather than on data in memory. Then, once all that is done, the temporary data can be moved into permanent storage. This way the heavy lifting can occur via cron/AJAX, and via some kind of Drush method as well.
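
A sketch of that staged flow, assuming a hypothetical {node_export_import_queue} table where raw imported node data sits until it is promoted; hook_cron() (or a Drush command calling the same worker) processes a small batch of rows per run.

<?php
// Promote a small batch of staged rows into real nodes on each cron run.
function node_export_cron() {
  $rows = db_query_range('SELECT id, data FROM {node_export_import_queue} WHERE status = 0', 0, 25);
  foreach ($rows as $row) {
    // Assumes the staged data was stored as a serialized node object.
    $node = unserialize($row->data);
    node_save($node);
    // Mark the staged row as done and record the new nid.
    db_update('node_export_import_queue')
      ->fields(array('status' => 1, 'nid' => $node->nid))
      ->condition('id', $row->id)
      ->execute();
  }
}
?>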