I am trying to understand whether I have a problem with the new VBO approach using own checkbox numbering for eg. node views when the views results may vary over time.

My question:
How does VBO behave when actions are executed concurrently:

Lets say I have a view that displays unpublished nodes and an action to publish them. For demonstration purposes the nodes are sorted by descending nids,so the newest comes first.
1. Administrator opens vbo view described above
2. user adds an unpublished node, which would appear first in the view
3. Administrator executes action on selected nodes

My question:
Does VBO ensure that the action is applied to the previously selected nids, or is a new view executed and actions applied to the first, second ... xth selected element.

Comments

infojunkie’s picture

Version: 6.x-3.x-dev » 6.x-1.10-beta2
Category: support » bug
Priority: Normal » Critical

Great point. VBO doesn't account for this. Marking as critical bug for 6.x-1.10 (note that 6.x-1.10-beta2 is also compatible with Views 6.x-3.x).

infojunkie’s picture

I've run a couple of quick tests against this scenario.

When selecting a few nodes off the current page, I found that adding new nodes before reaching VBO execution does NOT alter the selection.

On the other hand, when selecting ALL nodes across all pages, then the new nodes are indeed included in the selection.

Can you confirm this behaviour?

digi24’s picture

Priority: Critical » Normal
Status: Active » Postponed

Yes I can confirm your observations for the test case. I have now used the current 6.1 dev. The behaviour for all nodes across several pages is perfectly acceptable from my point of view.

I probably should have experimented with the test case first. The reason for my support request, the question how the id handling works, was because of some inconsistencies when overlapping or interdependent batch actions were executed at the same time.

Anyway I think I found my problem. VBO is clever to use the session variable to store the necessary information, but as far as I see, for a given user only one batch API action should be running at a time.

I will try to solve this problem myself and post a patch here, but it should not affect too many users, so I lowered the priority and postponed the bug.

Bodo Maass’s picture

I support the notion that a given user should only be able to run one batch api / vbo job at a time. I currently have a problem where users get data corruption by starting a long-running job again before it has finished. This should not even be possible.

bojanz’s picture

Version: 6.x-1.10-beta2 » 6.x-1.x-dev
Status: Postponed » Active

Reactivating.

bojanz’s picture

Title: Concurrent actions on interdependent views » Concurrent Batch API operations.

Retitling.

bojanz’s picture

Okay, so the problem is that VBO has code that looks like this (7.x-3.x):

    // Save the options in the session because Batch API doesn't give a way to
    // send a parameter to the finished callback.
    $_SESSION['vbo_options']['display_result'] = $options['display_result'];
    $_SESSION['vbo_options']['operation'] = $operation;
    $_SESSION['vbo_options']['params'] = serialize($params);

    $batch = array(
      'operations' => array(
        array('_views_bulk_operations_batch_process', array($rows)),
      ),
      'finished' => '_views_bulk_operations_execute_finished',
      'progress_message' => '',
      'title' => t('Performing %action on selected rows...', array('%action' => $operation['label'])),
    );
    batch_set($batch);

We store the values we need in the "finished" callback in the session. In total I guess that's between 15 and 30 keys & values.
Once you have more than one Batch API job running, the session gets overwritten.

We have several options:
1) Start passing the required params to the operation (right next to $rows), then in _views_bulk_operations_batch_process assigning those values to $context['results'] so that they are available in the "finished" callback. Not sure how heavy this is for Batch API, because $rows might already have thousands of entries.
2) Key the session storage by batch id (get the $batch through batch_get() and use $batch['id']). Not sure I can get the batch id again in the "finished" callback, so maybe #1 is the only option.

In general, we need to prevent two parallel Batch API jobs on the same type of data (nodes, for example). Or prevent all concurrent VBO batch api jobs?

What makes baby Jesus cry the least?

damien tournoud’s picture

$batch is serialized, so you should be able to store everything you need in there. You can access the batch definition from a batch context using &batch_get()

yched’s picture

Batch API indeed doesn't include any mechanism to prevent simultaneous batches from the same user. That would be something the calling code needs to implement (e.g put a lock before setting the batch, release the lock in the finished callback).

As for passing arguments to the finished callback :
- your proposal 1) would work. The overhead is extra unserialisation of the 'display_result', 'operation' and 'params' entries on each call of your batch op. I don't know how big those might be, but I'm not sure this is much compared to the exising $rows.
- proposal 2) would probably not work, because $batch['id'] is only set in batch_process() (called either explicitly or, more likely in your case, automatically by Form API), which then redirects to the 'progress bar' processing page.
- @Damz proposal would work - a batch definition is just serialized / unserialized. Although that's not an official feature ("put any custom property in there, we'll hand it to you at the other end"), I doubt this will change within the D7 cycle. Overhead is also more unserialization (+ serialization), but only once per HTTP request, not once per batch op call.

Also, a couple gotchas in this last approach :
- The finished callback doesn't explicitly receive the $batch array. You'll have to get it with batch_get().
- The definition array you pass to batch_set() is only one of the 'sets' within the $batch returned by batch_get() (there might be several submit handlers that want to batch stuff for a given form submit).
In most reasonable cases, $batch['sets'][0] is probably the one you want, but strictly speaking, it would probably be safer to iterate on $batch['sets'] and find yours in there :-(.

infojunkie’s picture

The reason I put everything in the session instead of the operation params is that they took much space and exhausted the memory on too many installations. I would go with the approach in #8.

bojanz’s picture

We store much less in 7.x-3.x (no view, no rows unless asked for), so it should be better.
6.x-1.x still stores too much, but that can and should be changed.

So, #8 + acquire a lock for 10s when batch api starts, renew it at the end of the process function (after a group of entities was processed), release it when done. If the lock can't be acquired, show an error message to the user.

My only remaining dilemma is whether to have a lock for all VBO operations, or a lock per entity type (allowing you paralel operations on users and nodes, for example). 99.99% of users won't care, but it's worth a thought.

digi24’s picture

Regarding the lock type and selection storage, I would like to add my usage experience:
I would find it helpful to have locks based on views, not on entities. I frequently have a situation of a very long-running VBO-Action, essentially blocking one of the admin users for several hours. In the meantime I may want to do other, completely unrelated stuff and completely mess up the whole operation by accidently opening other VBO-views.

With the new selection storage this should be better, but I do not see any major reason, why not to include the name of the view in the lock-key and thus enable the user to have multiple node operations at the same time at the cost of some potential overlaps.

bojanz’s picture

digi24, your post makes me think we shouldn't lock at all.
The changes described in #8 will allow parallel operations to run without killing each other.
It is then up to the user to not modify the same data in parallel (or face the consequences).

Locking per view would not allow people to use the same view for two different things (different actions on different selected nodes) while still allowing overlapping operations on the same data (same entity type). So you get nothing.

Btw, if you have operations that run for hours, I'm guessing they should go through the Queue, not through Batch API.

bojanz’s picture

Status: Active » Patch (to be ported)

http://drupalcode.org/project/views_bulk_operations.git/commitdiff/f4e6e...

Committed a fix for concurrency to 7.x-3.x (and removed hook_views_bulk_operations_finished() at the same time, but that doesn't need to go into 6.x-1.x).
Decided not to do any locking after all. Since the use cases differ so much, I think it's better to leave the responsibility of knowing what you're doing to the user, rather than limiting sometimes more than needed.

bojanz’s picture

Version: 6.x-1.x-dev » 7.x-3.x-dev
Status: Patch (to be ported) » Closed (fixed)

At this point of the release cycle (D6 contrib is dead, VBO included), it's safe to say VBO 6.x-1.x won't be getting changes of this scale.