Currently this module does everything in a single pass, which can be a problem for people wanting to perform larger operations.
Both imports and exports should support batch processing, which will greatly improve support for integration modules.
Comments
Comment #1
danielb commented:
#1273826: Add VBO / Actions support is postponed on this.
Comment #2
bojanz commented:
This should work by opening a file when the export starts and appending each node. That's how the "Create an archive of selected files" action in VBO changed from 6.x to 7.x.
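A minimal sketch of the append-to-file approach bojanz describes, runnable outside Drupal (the `export_batch_*` function names and the stand-in node objects are hypothetical, not the module's API): each batch operation appends one node's serialized form to a shared file, so no single request ever holds the whole export in memory.

```php
<?php
// Hypothetical sketch: append each node's export to a shared file,
// one node per batch operation, so memory use stays roughly constant.

function export_batch_start($file) {
    file_put_contents($file, "");  // create / truncate the export file
}

function export_batch_operation($node, $file) {
    // Serialize one node and append it as a single line.
    file_put_contents($file, serialize($node) . "\n", FILE_APPEND);
}

function export_batch_finished($file) {
    // Read the result back line by line; the nodes were never all
    // in memory during the batch itself.
    return array_map('unserialize', file($file, FILE_IGNORE_NEW_LINES));
}

// Fake "nodes" standing in for loaded Drupal node objects.
$nodes = array();
foreach (array(1, 2, 3) as $nid) {
    $node = new stdClass();
    $node->nid = $nid;
    $node->title = "Node $nid";
    $nodes[] = $node;
}

$file = tempnam(sys_get_temp_dir(), 'export');
export_batch_start($file);
foreach ($nodes as $node) {
    export_batch_operation($node, $file);  // one node per batch pass
}
$restored = export_batch_finished($file);
echo count($restored) . "\n";       // 3
echo $restored[2]->title . "\n";    // Node 3
unlink($file);
```

This works cleanly for line-oriented formats; as danielb points out below, formats that need surrounding structure cannot be built by appending alone.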
Comment #3
danielb commented:
Disagreed. We cannot assume the output will go to a file. Also, simply 'appending' to the previous code will be unsuitable for most (all?) output formats.
Comment #4
bojanz commented:
How else are you going to batch the export, without saving the output of each batch? I see no other way.
Comment #5
danielb commented:
My plan isn't great, but it would involve generating the code for each node in a batch separately and storing it in the database, to be fetched later when the final result is delivered. So there is still an overhead at the end that is proportional to how much code is being output, which still gives us an upper limit on how much can be done.
The problem is that the functionality that 'glues' it all together, and does things like CSV headers, has to be done all at once, preferably after the node data has been 'looked at', so it has to happen at the end.
Some of the formats, like serialize(), may have to be changed a bit to make this work.
The import would have to do the reverse: split the incoming code into the code for each node, then the batches can convert each of those pieces into nodes.
But there should also be a way to bypass the batching, as some callers will not be able to support it; specifically, callers like hook_node_operations cannot associate what happens to each node with the other nodes in the operation.
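The database-staging plan above could look roughly like this (all names are hypothetical, and a plain array stands in for the temporary database table): each batch operation converts one node and stages its fragment, and the final delivery step does the one-shot 'glue' work such as the CSV header row.

```php
<?php
// Hypothetical sketch of the staging plan: per-node output is stored
// in temporary storage during the batch, and the format "glue"
// (e.g. the CSV header) is applied only once, at the end.

$staging = array();  // stands in for a temporary database table

// Batch operation: convert ONE node to its CSV row and stage it.
function stage_node($node, &$staging) {
    $staging[$node->nid] = array($node->nid, $node->title, $node->type);
}

// Final step: the glue that must run once, after all nodes were looked at.
function deliver_csv($staging) {
    $out = fopen('php://temp', 'r+');
    fputcsv($out, array('nid', 'title', 'type'));  // CSV header row
    foreach ($staging as $row) {
        fputcsv($out, $row);
    }
    rewind($out);
    return stream_get_contents($out);
}

foreach (array(1, 2) as $nid) {
    $node = new stdClass();
    $node->nid = $nid;
    $node->title = "Node $nid";
    $node->type = 'page';
    stage_node($node, $staging);  // one node per batch pass
}
$csv = deliver_csv($staging);
echo $csv;
```

The final `deliver_csv()` step is exactly the end-of-batch overhead described above: it still touches every staged row, so it still bounds how large an export can get.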
Comment #6
bojanz commented:
I think you will gain very little by doing that. You're still loading a lot of stuff from the database, so the max_allowed_packet and memory_limit limitations still stand.
That's why I said you need the file.
Comment #7
danielb commented:
In order to return the export to another function (for the integration modules), or to pass it to Drupal's render system or Form API to print it on a page, the export code has to be put into a string at some point anyway, so there will always be a limit on how much you can do.
Anyway, I created this issue mainly with imports in mind, as that is where the bottleneck seems to be. I have heard a few complaints about limits being reached on imports, which implies those users were able to export without any problems.
I don't think it's the amount of code in the string, either; it's the fact that hundreds or thousands of $node objects are being created and saved in one request.
I don't want to rush into it though, and I'll have a good think about what you've said.
There's some sort of idea floating around in my head about passing information around purely as files, and using an iframe or JavaScript to display the contents if they need to be shown, but it isn't fully baked yet.
Comment #8
danielb commented:
Just realised that batching will also heavily affect node_export_relation, which attempts to iterate through references all in one hit. Not sure how to get around that.
Comment #9
danielb commented.
Comment #10
danielb commented:
Since accommodating this feature is a pretty big change, I am postponing it for a future round of development. At this point I just want to focus on getting a full release out for the 3.x branch.
Comment #11
danielb commented:
I think the way batching would work is that imported nodes are first stored as temporary data in the database. Any "after import" operations should occur on that stored data, rather than on data in memory. Then, once all that is done, the temporary data can be moved over into permanent storage. This way the heavy lifting can occur via cron/AJAX, and via some kind of drush method as well.
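That import staging could be sketched like this (all function names are hypothetical, and plain arrays stand in for the temporary and permanent database tables): each batch pass decodes one node into temporary storage, the "after import" operations run against that stored data, and only then is everything promoted.

```php
<?php
// Hypothetical sketch of the import plan: decoded nodes land in
// temporary storage first; "after import" fixups run against that
// stored data; only then is everything moved to permanent storage.

$temporary = array();  // stands in for a temporary database table
$permanent = array();  // stands in for real node storage

// Batch operation: decode ONE node's export code and store it temporarily.
function import_one($code, &$temporary) {
    $temporary[] = unserialize($code);
}

// "After import" operations work on the stored data, not on data held
// in memory from a single request -- e.g. publishing every imported node.
function after_import(&$temporary) {
    foreach ($temporary as $node) {
        $node->status = 1;  // example fixup only
    }
}

// Final step: move the temporary rows into permanent storage.
function promote(&$temporary, &$permanent) {
    $permanent = array_merge($permanent, $temporary);
    $temporary = array();
}

// Simulated incoming export code, one serialized node per entry.
$codes = array();
foreach (array(10, 11) as $nid) {
    $node = new stdClass();
    $node->nid = $nid;
    $node->status = 0;
    $codes[] = serialize($node);
}

foreach ($codes as $code) {
    import_one($code, $temporary);  // one node per pass (cron/AJAX/drush)
}
after_import($temporary);
promote($temporary, $permanent);
echo count($permanent) . "\n";  // 2
```

Because each pass touches only one node, the per-request work stays small regardless of how many nodes the import contains, which is the point of the proposal.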