Following recent discussion on development ML : http://lists.drupal.org/archives/development/2007-03/msg00028.html

Attached patch is an initial attempt to abstract the 'progress-bar operations' implemented by update.php into a generic API, that can be used by core and contrib modules to safely perform potentially time consuming operations without dreading a php timeout and the db inconsistencies which might follow.

For now I called 'batch' these 'à la update.php progressive operations' - not sure it's the best name, this can be discussed, obviously.

Examples :
- http://drupal.org/node/124727 (core - node_access_rebuild)
- http://drupal.org/node/97861 (cck - content type deletion)
- http://drupal.org/node/123721 (cck - field add / remove with large datasets)
- .po file import, or more generally any import / export operation

The patch is currently not that much more than update.php refactored. It provides a functionnal engine, but the shape of the api, and the way to slip in drupal workflow still needs some work and thinking. I'm posting what I have so far as a starting base, more savvy people are welcome to jump in and hack the thing to death :-)

Further considerations and an example 'batch_test' module to come in the next post.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yched’s picture

FileSize
1.58 KB

Attached is a sample dummy module to illustrate the current use of batch.inc (zipped, remove the .txt extension)

Roughly :
- A batch is an associative array containing the list of function calls to be performed, plus some UI messages.
- A form _submit callback defines its batch, registers it with drupal_set_batch, and redirects to 'batch' page for processing. Ideally, it is possible that the _submit callback implements some logic to determine whether it does it job straight ahead, or deems wiser to batch it.
- The 'batch' page gets the batch in $_SESSION, iterates through it, and calls the batch's 'finished' callback at the end for result display.

The main issue I see is that _submit hooks can be chained : another module can add subsequent _submit callbacks using hook_form_alter. These additional callbacks possibly rely on the fact that the previous callbacks have done their job already - and they might want to batch their stuff too...

So there probably needs a tighter integration with Form API. A possible workflow could be :
- drupal_submit_form iterates the form's submit callbacks
- when a batch has been registered : stop looping, save the remaining callbacks, redirect to '?q=batch' for processing
- when the batch is finished processing, go on with the remaining submit callbacks.

I'm not sure of all the implications here, and I'm no FAPI guru, so I thought it best to pause and ask for feedback before I try to over-engineer this in the wrong direction.

yched’s picture

Forgot to add : programmatic form submissions might be troublesome too...

Gábor Hojtsy’s picture

Great start! The patch would be complete however if you modify update.php to use this batch processing, so we can see it doing its job. Otherwise a batch_id would be probably needed to support different batches at the same time on the same session, if required.

What would also come up if you look into converting update.php, is that some functions are themselfs required to run in multiple HTTP requests, and the number of times needed to run is not known up front. So a list of functions, from which the first function is popped when executed might not be enough. The function might need to be added back into the start of the list (or not popped out in the first place), and some return value would specify if the function was complete, or should be called again. The utf8 conversion was done this way as far as I remember, but you will see by looking into the updates.

yched’s picture

modify update.php to use this batch processing, so we can see it doing its job
The sample 'batch_test.module' I provided in comment #2 should let you see the thing in action, and provide a place for testing / hacking for now.
I'd rather not embark in the critical process of converting update.php _that_ early in the development of this patch - not even sure that should be done, BTW : update.php is sort of a special case, out of drupal regular bootstrap / menu workflow. It's better if we can unify things, but update.php is not the primary target of this patch.

Support for multi-pass callbacks :
It's already in there - See 'test batch 2' in 'batch_test.module'.

Gábor Hojtsy’s picture

IMHO, unless update.php is modified to use this, it is highly unlikely that this gets into Drupal core. Such a big amount of code duplication is not something to introduce there.

Also your no js version seems to be bypassed with the JS version in the patch.

yched’s picture

unless update.php is modified to use this, it is highly unlikely that this gets into Drupal core
Maybe so, that can be discussed. There are other areas to work on before this can be envisioned RTBC anyway.
I really don't think we should focus on update.php just right now.

your no js version seems to be bypassed with the JS version
Yes, that's another work-in-progress point. To decide whether to use an ajax progress bar (faster) or not, update.php uses a
hidden 'has-js' element in its submit form. Requiring this in any form that might potentially trigger a batch op on submit seems awkward - Still investigating a workaround (cookie based ?).

Dries’s picture

Gabor is right when he says that core should eat its own dogfood -- update.php will need to be converted.

Of course, you can use an example modules to simplify the work and testing. Nothing wrong with that.

Looks like a great start! I don't think we should support multiple batches -- how would that work from a UI point of view?

yched’s picture

not supporting multiple batches
Well, even so, I'm not sure of the best way _not_ to support it :-)
We still have to account for multiple '_submit' callbacks for a form, don't we ?
I think the workflow I proposed in comment #1 could do the trick, but It would be great to have some feedback on this before I start actually messing with drupal_form_submit.

+ here's a case for 'multiple batches' :
the initial reason for me to work on this was http://drupal.org/node/97861 (cck reacting to content type deletion)
In short, we should ideally execute the fields 'delete' callback for every field of every node of the content type :-)
CCK (5.0) currently reacts to a content-type deletion by adding its _submit callback to core's 'content_type_delete_confirm_submit' form. If we don't support multiple batches, and core decides to batch something in the regular submit for the form, we're screwed.
But maybe we are and I'm trying to achieve something too complicated ?

update update.php
I'll try to submit a version of update.php when the API is a little more ironed out. This will be a good workbench obviously, but I still think it's a bit early for that - from what I roughly tested, there should be no major issues.

I think a core-module feature example, like .po importing, or node access rebuild would be more appropriate right now. After all, update.php is currently working perfectly, we're aiming at providing the feature to modules.

Gábor Hojtsy’s picture

Let's clear up the terminology. 'Multiple batches' is a proper terms for multiple batch operations running on the same session but in different browser windows/tabs AFAIS. That's like importing PO files and content in the same time in multiple tabs. Since these batch operations are tied to sessions now, it is possible to have multiple users running different batches. Just the same user can run only one batch. There should be some protection, so if a batch is still running, starting of another batch throws an error to the user.

Multiple submit hooks can be added to the batch, can't they? They just need the form data as multistep forms. Maybe multistep form logic can be reused (or at least looked at for inspiration) to implement this part.

yched’s picture

'Multiple batches' = simultaneous batches in different browser tabs
OK, why not. This has to be either banned or accounted for (probably using a batch id), but it's more a side case IMO.

So let's call 'Chained _submit / chained batches' the case of multiple form _submit callbacks, that may want to provide their own batches.

Yes, I thought about simply appending the remaining _submit calls at the end of the $batch['operations'] array. This would probably be OK for 'regular' (non-batching) _submit callbacks.
But if these callbacks happen to batch something themselves, this would mean dynamically modifying the batch while it's being performed, in order to insert the added batch ops. Well, this might be feasible, I'm kind of afraid it turns into a can of worms.

It might also be strange from a user pov : jump back from '99% done' to '60% done' ?
And I don't know what it would mean regarding the UI messages (title, progress message...) - each batch is supposedly able to define its own - maybe this is minor ?
We might be better off starting with a fresh batch / fresh progress page...

I'll try to investigate, along with your multiform hint - i'm not familiar with these yet :-)

chx’s picture

Priority: Normal » Critical

This feature request is critical and if needed I will move it to critical bug report -- the reason is, without this core tries to iterate all nodes when you enable a node access module. This must not be.

Dries’s picture

Are self-modifying batches common? I doesn't look like it might be super-popular. If it is uncommon, then it's probably OK for the progress bar to be bumpy. Maybe we can show a 'Recalculating ...' when that happens? Doesn't seem to be a major issue.

yched’s picture

Are self-modifying batches common?
Do you mean : would that happen often ? That seems hard to tell in advance.
Wild-guessing : most cases of multiple submit callbacks for a form probably (?) are contrib modules plugging on core forms, so it probably depends on the number of core forms that would batch their processing if the feature went in.
I'm not sure node_access rebuild gets hijacked by many contribs; maybe mass node / user operations could be more impacted ?

Well, I'll try to take the dynamic batch road, see where it leads.

I'm also interested in a hint from JS-gurus out there :
is there a clean way to decide if a request comes from a JS-enabled browser ? (see my second point in comment #6 above)

Gábor Hojtsy’s picture

is there a clean way to decide if a request comes from a JS-enabled browser ? (see my second point in comment #6 above)

Modify the link/form leading to the HTTP request with JS on the page where the user can initiate the start of a batch. Add a new parameter to the URL or the form, so the server side code knows that the JS kicked in, and it is enabled. This is a matter of a small unobtrusive JavaScript.

yched’s picture

I'm doing some progress in the 'multiple _submit callback' and 'js detection' areas. I hope to come up with an updated patch soon.

However, I only just realized that the whole $_SESSION approach has a serious flaw : if the user keeps browsing the site while a batch is running (say, in a background firefox tab), he can seriously mess up the batch (and thus possibly his db).
Any page that changes $_SESSION (any form submit, now that we have a 'duplicate submit' check, or the mere display of a multistep form) sets the batch back to its state at the time the offending page execution started, and all the batch operations that were executed in between get executed again.

I guess it was OK with update.php, for you'd expect a site admin to stay still while running updates, but if this patch gets in, there's no real predictability on who might trigger batches and when.
And I don't think a "please don't browse while this is processing" message on the progress page is nearly enough, since I can hit submit on my form and move to another tab before the page reloads and I get to be warned (I do that a lot on d.o :-) ).

The only safe workaround I can think of for now is moving the storage of the batch out of $_SESSION and into a dedicated db table...

Any thoughts ?

Gábor Hojtsy’s picture

yced, indeed. This might become interesting. Using a database backend in update time is still doable (there are a few trick in update.php to have a proper database structure for bootstrap, which would be possible to use in this case).

yched’s picture

FileSize
20.52 KB

Attached is an updated patch :

- Stores the running batch in a db table as per my comment #15 above. The table scheme (very simple for now) is in system_install(), I did not write an update path yet.
Storing batches gets us closer to drumm's Job queues, there are possibly interesting (future) synergies here, even if I don't see them clearly right now.
Well, for now, my {batch} table stores the serialized $batch array, just like when we used $_SESSION. Does the job.

- successive _submit callbacks : I did not take the 'dynamic batches' road for now. when a _submit callback defines a batch, it gets executed right away, and subsequent _submit callbacks are delayed after processing. You get a new progress page if they declare batches too.

- Uses a session cookie to detect js, so you don't have to use $form['has-js'] tricks to make your batch ajax-progress bar. It is set on _every_ page calling drupal.js. Maybe it's not a right thing to do, I'm not sure. Seems to work pretty well, at least.
I included Klaus Hartl's Jquery cookie plugin in drupal.js, which might be overkill for what I'm doing...

- In order to allow more descriptive progress messages (update.php displays the currently updating module), I slightly reordered the elements inside theme_progress_bar.

TODO :
- some coding style / hacks here and there
- the end of the process (redirection / result display / finished_callback) is not very clear yet.
- implement a 'real life' (core) use case (node_access_rebuild ?)
- adapt update.php

yched’s picture

FileSize
2.42 KB

And an updated batch_test.module.
Illustrates and tests :
- simple batch form
- chained batches in successive _submit callbacks
- batches in a programmatically submitted form

Gábor Hojtsy’s picture

FileSize
21.44 KB

I started to look deeply into this issue, so we can actually get this change into Drupal 6 and implement at least the automatic locale import feature based on it. Got the latest patch, applied to latest head (was possible with a few offsets), then started to go through the code line-by line, getting to know how it works and trying to document what is happening and why are some awkward looking stuff used (like while/list/each constructs).

In the spirit of trying to help yched keep focus on core issues, and not arbitrary features (so we can get these changes into core sooner then later), I tried to strengthen what is already there, document, clean up, apply coding standards and so on.

So the batch API as architected by yched looks like this:
- instead of returning a redirect path, you return a redirect array in your form submit hook
- this array specifies the operations to run in the batch and other properties like a title for the batch to display
- from here, the form processing takes on, creating a batch for you in the database and running the operations with your callbacks
- once done, your 'finished' callback gets called (if specified), so you can provide a page with information on the batch being done

I did not change anything in the architecture, just tried to strengthen what was already in there, document, identify core todo items, and such.

I also grabbed the autolocale update started by Jakub Suchy (http://lists.drupal.org/pipermail/translations/2007-April/000344.html), cleaned that up and made it compatible with both more recent HEAD changes and the interfaces of the batch API. I am comitting the adapted version of autolocale to autolocale HEAD, so anyone can test. Only an admin interface initiated import works now with the batch import there, but that works nicely (the big page progress control looks sexy in Garland :).

What I did not like about the patch was mostly the inclusion of the big cookie lib in drupal.js, which is probably not desired. It also seemingly rewrites the whole document.cookie string, destroying existing cookies on the same site. As far as I see, I got it working only because I test in a subfolder, and the cookie is set for the whole domain (which is also unfortunate BTW). So that cookie things should be rethought. Otherwise the functionality seems to be quite stable.

Now as far as I see, it would be nice to get the direction of core adaptation with this patch. Batch processes should work in install (late install phase) and update time. Locale needs to import multiple translation files late in the install phase in a batch, and update is essentially a big batch process.

It would be really cool to get this into core as soon as possible. yched, keep up the good work!

yched’s picture

Gabor, thanks for your active interest in this, and sorry for not updating.

Although silently, I have kept working on this since my laste post.
I'm still fumbling on some FAPI integration issues, and I've been really trying in the last few days to polish the thing enough at least to publish an update here and ask for help / feedback on the remaining issues. I will really try to do so this week end. I will also look closely at your patch and integrate your comments / additions.

In short, the cookie lib was definitely overkill and is now gone, multiple submit callbacks should be handled more cleanly, and I have a working update.php.

What I'm fighting with is the integration in the FAPI workflow, and trying to account for all the features / subtleties (multistep, programmatic forms).
I'll try to sum this up when I upload my current code.

Dries’s picture

I had a look at this again and it looks really interesting. I'd like to see this in core.

I also agree with Gabor's assesments:

1. We need to use this patch in core -- update, installer, and eventually file uploads
2. The cookie handling seems clumpsy -- it would be good to get rid of that

Gábor Hojtsy’s picture

Status: Needs review » Needs work

@yched: please share your updated code as soon as you feel it is good for another round of reviews. It would be important to get this in as soon as possible so we can work on more functionality in core depending on this (node access rebuild, locale import and such).

yched’s picture

FileSize
47.85 KB

OK, here's an updated version :
- much simpler (1 line) JS cookie - not sure whether it solves the issues mentionned by Gabor
- API change : _submit callbacks now simply call batch_open($batch) to batch their stuff, and return redirect path
- reworked handling of successive "batched" _submit callbacks : they are now included in one batch, but operate in their own 'batch sets'.
This should be transparent API-wise, and allow for a clean separation of result sets and success status, while allowing proper redirection after the batch has performed.
This area still needs some polishing, the current implemenation is kind of clumsy (batch_add_operations / _batch_operations_insert_pending stuff should go away)
- Added a 'sandbox' for multipass operations to store their stuff (since the use on $_SESSION is not safe - see comment #15 above)
- Supports batches from standalone scripts (update.php...). See the separate patch for update.php in the next post for an example.

Now, the outstanding roadblocks :

- multisteps forms are not working yet (currently when the submit for step n triggers a batch, you get redirected to step 1 istead of step n+1 - see the batch_test.module below for an example)
While this is somewhat beyond me for now, I'm quite confident that this can be made OK with some love and mind-bending (and maybe the help from some FAPI guru ?)

- more problematic : programmatic forms
Basically, we are breaking the execution workflow, and i don't know yet how we should handle this.
More precisely, the problem is this :

  drupal_execute('some_form', $form_values);
  $foo = $bar;

If the submit callbacks for 'some_form' happen to batch something :
- either you execute the batch right away, but then '$foo = $bar' will never get executed,
- or you somehow defer the batch execution until after the page has been executed, but then you lose the execution order (and the code after drupal_execute might rely on the fact that the form submission is completed)

I probably should have realized that earlier on :-(. Any suggestions welcome here...

yched’s picture

FileSize
13.1 KB

And the patch for update.php.
I submit this here as a proof-of-concept - update.php being the critical beast that we know, this can't really be considered bullet-proof yet. Seems to work ok, though.

yched’s picture

FileSize
2.6 KB

And the zip for 'batch_test.module'.
The module defines several forms for testing (multiple submit callbacks, multistep and programmatic forms), and provides examples of the batch API in use.

If you installed the previous version of the module, you should menu_rebuild();

Gábor Hojtsy’s picture

Status: Needs work » Needs review

@yched: great, you are doing very well! I don't think we have a way to add in arbitrary code run after a programmatic submit (or any other submit for that matter) to the batch. If PHP would have some code to run after all the HTTP requests took place, it is unlikely that we would be able to provide a way to rebuild the environment and go to that exact line. So it seems to be logical to accept this as a limitation of batch operations. Unless batches unpredictably pop out from form submissions, developers should now what submission would end up in a batch operation.

yched’s picture

Unless batches unpredictably pop out from form submissions...
Well, unfortunately they sort of do.

Your contrib Foo.module might build some functionnality on drupal_execute('some_form'), knowing that the 'default' submit callbacks for 'some_form' do not batch.
Everything works OK until the admin installs my contrib Bar.module, that happens to add his own _submit callback to 'some_form'. That callback sometimes batches - et voila, my Bar module just broke your Foo module :-).

FAPI allows a high level of distant tweaking of forms processing - that's really powerful, but makes it difficult to add a new "progressive" method of form submission that stays consistent with the existing features. All the more frustrating that these conflicting cases will probably be quite rare in fact.

Dries, chx : you expressed some interest in this hitting core - do you have any suggestions ?

chx’s picture

I only have a little time now, but batches are interactive therefore I see no sense in starting a batch if $form['#programmed'] is TRUE.

yched’s picture

Agreed, but you still have to be able to do

drupal_execute('some_form');
// some other code
foo($bar):

without wondering whether some_form triggers a batch or not. For if it doesn't today, it could very well tomorrow if the admin installs a new module.

2 propositions :
1 - I think I could adapt my code so that drupal_execute of a 'batching form' performs the batched operations 'in one go', without progress bar page and stuff. You don't get the 'no PHP timeout' effect (and in fact the execution will probably take much longer than a non-batched version), but at least no functionality is broken, and everything gets executed in the expected order.

2 - And / or : I could add a batch_allowed() function that returns TRUE only if we are not in the execution of a programmatic form. Then, form_submit callbacks can (or must if we don't implement proposition 1) propose a 'regular' (non-batched) version of their processing.

"1 only" means programmatic batching forms possibly really slow - won't happen everyday, but kind of counter-productive wrt our goal here, which is to avoid php timeouts.
"2 only" means that you have to code two flavors of any batching form_submit callback - i fear many contribs probably won't bother, and drupal_execute becomes 'unpredictably not trust-worthy'.
"1+2" means the message to developpers is "you can safely batch, and with a little extra-work, you don't loose efficiency" - undoubtedly better, but, well, my time is limited, and I'd like to avoid 'code freeze timeout' as well :-)

Maybe we could go for "1 only" for now, polish and commit the thing, and then add "2" later on ? Like, er, after the code freeze ?

Gábor Hojtsy’s picture

@yched, you now block programmatic forms. chx pointed me to this part of your latest patch, once I got him to review:

 function drupal_redirect_form($form, $redirect = NULL) {
+  if (!$form['#programmed'] && ($_batch = batch_get()) && !isset($_batch['finished'])) {
+    // Redirect to batch page
+    batch_execute();
+  }

This is it. You are not starting a new batch if you encounter a programmed form as it seems. So drupal_execute() is predictable. It will not run any batch. That is one solution :)

It could also be possible that we run all operations in the same request, as you pointed out. It might be against the role of this whole batch thing, but as far as I see is a workable solution.

chx’s picture

batch.inc needs to be included on demand. waste of resources to get it all the time.

yched’s picture

 function drupal_redirect_form($form, $redirect = NULL) {
+  if (!$form['#programmed'] && ($_batch = batch_get()) && !isset($_batch['finished'])) {
...

Yes, which is a silly thing to do since drupal_redirect_form does not get called for programmatic forms :-)
Anyway, that would at best prevent a batch from being executed : you can continue execution after drupal_execute, but drupal_execute itself did not get the job done...
Since this can be triggered simply by installing a new module, I consider that to be 'unpredictable form breaking' - or am I missing something ?

Anyway, implementing my proposition 1 above ('all in one go') took me less time than actually writing the post. Works allright, and at first sight it does not seem to be that much of a performance drain.
I'll post an updated patch ASAP.

I consider programmatic forms resolved - now, multistep :-)

@chx : right, of course. Will be in the next patch.

yched’s picture

FileSize
23.03 KB

Updated patch (no API change) :
- rerolled against current HEAD
- reworked multiple submit callbacks management :
I moved those out of the regular 'batch operations' array (altered the 'total operations' count, which could be confusing, plus forced me to allow batch operations to add batch operations and go through some no-so-nice hoops), and into some sort of 'batch sets initializers'
- programmatic forms are cleanly supported :
I implemented my suggestion 1 in comment #29 above (drupal_execute processes all the batched operations in one step)

batch.inc is still loaded on every page, since some functions are called on every form submission. I need to reorganize some stuff. Will be for next iteration.

the example / test batch_test.module included in #25 should still apply. Not so sure about the update.php patch in #24.

moshe weitzman’s picture

subscribe ... Once this patch lands, we desperately need to apply it to node_access_rebuild().

yched’s picture

Had a small discussion with chx and Eaton on IRC. It appears they are going to have the multistep part of FAPI go through serious reworking during their FAPI summit this week-end.

it needs to be done, but if multistep stuff is your last remaining roadblock, i would say just say that the patch as it stands now is ready but has issues with multistep -- which needs revisiting anyhow.
I think it's perfectly valid, at this point, to say that batch ops + multistep is a known incompatability that will be/must be resolved as part of an inherent form system rewrite of multistep.

So I guess I'll post an updated patch that gets rid of the attempts to handle multistep for now, and with the raodblocks being solved or postponed, we can start focusing the reviews on coding style and 'core inclusion' polishing ?

Now is also probably a good time to start looking at the first use cases for batches in core - at least to find possible bugs / enhancements. Not sure if we should merge those with the current patch, though.

yched’s picture

Er, the second paragraph in my previous post was Eaton on IRC, but I was stupid enough to let the [eaton] quotation between < >

yched’s picture

new patch for update.php - the one in #24 was broken
(there _was_ a small API change in my last batch.inc.patch after all - only affected batches outside of regular bootstrap)

The only change is line 405 :
batch_open($batch, FALSE, $base_url .'/update.php', $base_url .'/update.php?op=results');
goes :
batch_open($batch, $base_url .'/update.php', $base_url .'/update.php?op=results');

yched’s picture

FileSize
12.62 KB

Forgot the patch.

yched’s picture

FileSize
38.87 KB

Updated patch :
- Moved the external batch API to form.inc, allowing for a more selective inclusion of batch.inc, as per chx's request in #31
- Renamed some of the API functions to be more consistent with other parts of drupal API
- Optimized the amount of data stored in the $batch array (and thus in the {batch} table) : no more 'callback' / 'arguments' array keys; store only the $form information actually needed.
- Added a 'session id' column to the batch table, so that only the user that triggered the batch can access the progress page
- Improved documentation - some more work in this area is probably needed...

The patch includes update.php, to make this thread easier to follow - I still think this should be committed separately, though.
Updated batch_test.module comes in my next post.

yched’s picture

FileSize
2.66 KB

batch_test.module zip file.
Includes a dummy update function to test the new update.php

The batch patch as stayed under "needs review" for some time, while actually needing work on some of the roadblocks.
As stated in #35 / #36, I think it is now ready for core consideration, and thus actually "needing review".

Dries’s picture

* "open batch" sounds funny -- maybe use batch_init() or batch_start()?

* Maybe rename drupal_batch_op() to batch_add_operation() or batch_enqueue_operation()? Either way, don't use the words "op" or "ops". They are not very descriptive. If you want a short word, maybe use "task" instead of "operation" but use is consistently.

* The batch operation can be prefixed with batch_ ... they don't have to be in the drupal_ namespace.

* Write $current_set or $current instead of $curr_set.

* In the examples, try not to use "func". In the code guidelines we advise against such abbrevations.

* In the example and update.php, why don't we call the batch context $batch instead of $a? $a sounds like a particularly bad variable name to use.

* The concept of sets doesn't seem to be documented in the code. We'll want to explain the difference between a set and an operation, and its motives somewhere.

* In update.php, it's probably a good idea to rename certain functions. For example, if you'd rename "update_finished_page" to "update_finished_batch" it would be more clear that that function was called from the batch API (if that is what happens).

* There are still some TODOs in the code.

Overall this looks excellent. There is some work left to do, but nearly all of my points are details. My biggest gripe is the API -- once we commit this, the API is not likely to change, so I encourage you/us to think it through really hard and to make it as intuitive as possible. If there is room for certain simplifications, I'd like to see those made. The API is somewhat complex.

Also, I wouldn't mind to see Gabor drive this patch home. I know Gabor is busy right now so if he doesn't have the time, I'd be happy to.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

I hope to be able to separate some quality time for this patch. In the meantime, waiting for the smaller issues pointed out by Dries to get fixed.

yched’s picture

Status: Needs work » Needs review
FileSize
41.6 KB

New patch after Dries comments :
- API renaming :
drupal_open_batch => batch_init
(I think batch_open was more accurate - does it really sound odd ? I wouldn't really know :-) )
drupal_batch_op => batch_operation
- 'batch context' variable in batch callbacks renamed form $a to ... $bc. I'd like to keep it short, if possible, it clutters the code :-)
- update_finished_page renamed to update_results_page (the batch is already 'finished' when we get there)
- added the $operations array (empty if everything went OK) to the data provided to the batch finished_callback's.
This was needed for update.php to provide feedback on which update caused an error.
I'm not quite sure how to test this - what kind of stuff in a hook_update is currently supposed to break
the update process ? (i'm not talking about sql errors caught by update_sql of course...)
- fixed the remaining todos, mainly :
non-js mode : delay drupal_set_messages until after the batch execution - as was (only) intended in current update.php.
This required adding a param $show_messages to theme('page') - a similar param existed for theme('maintanance_page'), but was not actually used.

- Yet to be done : finish documentation

The only remaining glitch I see is when you hit 'back' after completing a batch : you get the progress page, but there's no batch anymore so you get an error (JS only)
There's not much I can do about that, The solution would be to have the progressbar displayed in a JS popup without leaving the submittting page, but that's a whole different story...

About the API : it's actually not that complex.
The external API consists of two functions : batch_init to open 'batch mode', and batch_operation to add function calls. Then you have a 'syntax' for processing callbacks and finished callbacks that massage your results (er, not properly documented yet). FAPI / batch API take care of the rest.

All along I tried to shape the API to be as simple and non intrusive as possible (that's part of what took so long :-) ). That being said, I'm open to suggestions / improvements, of course.
I do have something in the back of my head that might soften the coding style barrier between writing 'batching' and 'non batching' code, but that needs further investigation, and I wanted to update the patch first. And it's late, and I've written too many long posts in this thread already :-)

yched’s picture

FileSize
2.65 KB

and the updated batch_test.module companion :-)

Dries’s picture

Thanks for the quick round-trip times! I'll leave this up to Gabor now, unless he bounces it back to me.

(If there is ever going to be a Drupal Pro book about Drupal 6, you'll want the API to be nice and clean so it's easy to explain to other people. ;-))

yched’s picture

Well, I guess all of this is sort of frozen until "FAPI 3.00" (http://drupal.org/node/138706 ) lands...

Gábor Hojtsy’s picture

Erm, we should not wait with batches for FAPI 3.0. So many things are dependent on batches in core (at least three major changes of which only one is covered here as an example implementation) The FAPI 3.0 patch is quite young, and we are five weeks close to the code freeze, so we should get this into core-fitting shape as soon as possible and get it in. Will review deeply in a few hours.

yched’s picture

So it's a 'first patch to get in forces the other one to adapt' case ? Fine by me.

Dries’s picture

I agree that we can go ahead.

You were first in line, dude. ;-) Joking aside, it's probably useful for yched to look at the proposed form api changes, and to provide them feedback about how they can better support batches.

Gábor Hojtsy’s picture

The progress (pun intended :) here is great! Done a deep code review with the following results:

- Document (in phpdoc) why $show_messages exist, what's the role of that (I understand it is for postponing display of status messages, but it is not documented there).

- Interesting to see your theme_password changes... It seems like you have some phpdoc aware editor fixing that up. Anyway, no problem with that in the patch.

- API docs should mention at the start that 'operations' and batch_operation() are both valid for adding operations... For a big set of operations, calling batch_operation() repeatedly is cumbersome.

- You should not add any explanation on the same line as @param (never).

- drupal_batch_op() is left in phpdoc.

- You should add dots at end of sentences in phpdoc.

- What does "crush the 'submit' key" mean, where would a 'submit' key come from?

- 'the error page' string not translatable in error message

- Why is there code to deal with $REQUEST['edit'] in drupal_process_batch()? AFAIK the edit array is eliminated from Drupal.

- I would rename drupal_process_batch() to batch_process() for consistency. Dries already pointed out that the drupal prefix is not required.

- I kinda understand what batch sets are about, but there is no documentation about this in the phpdoc. How does sets affect the user experience? What do users see from multiple sets?

- The cookie setting seems to overwrite any existing session cookie (or other cookies set on the site). It really should append that cookie, if not already there.

- System_batch_page() has a comment with a question mark, which seems to indicate unfinished code?!

- I would rename $bc to $context, $a was a bad name, but $bc is not much better...

- Update_do_one have $a and $$bc in multiple places, which need fixing

- You commented out drupal_add_js('misc/update.js'...)... Remove, if we don't need it, add it back, if we need it, but you only removed for testing.

- Update_create_batch_table() has this which seems to be a copy-paste error: "If cache_filter exists, update is not necessary"

- Batch_start() also has a note with a question mark. I don't understand what that note suggests anyway (how would a 'nojs' cookie get set?)

- The update.js comment in _batch_progress_page_js() does not seem to be right. Copy-paste or something magically happens in drupal_add_js()?

- In batch_do() write "HTTP POST" not "HTTP Post" (both in comment and error message.

- The comment on _batch_finished(): "allow for results massaging" is rather unclear. What about documenting what really happens? :) "call the 'finished' callback to allow custom handling of results" or something...

- Tried to run the update.php batch with no updates selected. Resulted in a white screen on http://localhost/batch/update.php?op=selection.

If possible, please also look into how batches can be executed as part of an install process. Drupal core will need this (for importing interface translations), as well as several install profiles (like an aggregator profile would grab some default feeds, so the site is already populated with live sample data).

After the above small API fixes are in, I will look into updating autolocale, to see how another practical batch implementation would fit with this solution. Thanks for your hard work, it is very appreciated!

yched’s picture

I'm on it, just takes more time than I wish, I've been over-busy the last few days.

Could you just provide additional info / steps to reproduce the js cookie problem ?
On my local setup, the 'has-js' cookie does not seem to interfere with the 'PHPSESSID' cookie.

About install : while I'm now pretty familiar with update.php (obviously - I stole the code !), I think I never even opened install.php / install.inc files, and I know very little, if anything, about profiles and their install process. Being under some deadlines, I'm not sure I'll be able to find some time to dig into this soon (or soon enough ?).
I can probably, however, provide some code for some of the first batch candidates identified in core so far (node access rebuild, maybe .po import / export but I think you have already done some work on this yourself.

Plus general question : getting this batch feature in is one thing - has to happen before code freeze, obviously.
Can't the possible uses of the feature, to actually batch parts of core, be considered as 'usability improvements', and therefore be allowed to happen after code freeze ?

Gábor Hojtsy’s picture

OK, I did not really check the cookie problem, just implied it. Seems like (because we make no use of PHPSESSID on the client side), replacing the cookie value of the document with 'has_js=1' does no hurt. But testing of the cookie functionality showed that you need to go to at least one page with drupal.js before the cookie is set, so if I log in and go directly to one of your batch module screens, the cookie is not set, so I get the plain reload behavior. (Possibly actual implementations of batches should ensure that the initiating form's HTML page had drupal.js included).

All right , do not worry about the installation stuff, it is probably only some form submission which we can piggy-back anyway. I will look into that later. We can get this patch in anyway :) Also I doubt that selling the batch improvements as usability changes is viable. Let's clean this up and get it in to Drupal. I will make sure the locale autoimport stuff gets into core once we have a polished batch API in.

yched’s picture

JS cookie :
OK, then there's no problem. document.cookie="name=value"; is the regular way to set a cookie in JS. Despite the syntax, it does not actually wipe the existing cookies, but adds a new one (or modifies it if it exists).

New patch probably later today.

yched’s picture

What could be missing, on the other hand, is that I should probably respect the session.cookie_domain variable. I'll look into that.

yched’s picture

OK, actually, that's probably not needed. The php cookie_domain setting is important for the session cookie (cross-site authentication, etc...), but much less for our 'has-js' cookie, and exposing the value to drupal.js would add unneeded cruft.
The current behaviour sets the 'has-js' cookie only for the current site, which should be OK.

yched’s picture

FileSize
47.49 KB

Updated patch. Should correct all the points Gabor raised in comment #50, and provide (I hope) satisfying documentation / code comments.

Additionally :
- updated all core themes (not only Garland) to account for the new $show_messages variable.
- strengthened the code to account for 'empty batches' (no operations)
Thus, running update.php with no updates selected now works, with the same behaviour as currently :
go through an 'empty' progressbar page, and then to an 'empty' results page.
- improve install-time compatibility ($t = get_t() instead of t() inside batch API functions)
- Moved the $redirect and $url params from batch_init() to batch_process() where they belong best.
- API changes (sorry...) :
The batch_operation / batch_operations API functions, although kind of nicely 'call_user_func_array'-like, were not such a good idea after all. The fact that you can batch_init() in one place and batch_operation() somewhere else (some nested function call, for instance...) opens the door all kinds of tricky situations that I don't think we can handle nor want to encourage, so I removed them for now : batch_init is the only place you can define the operations for your batch.
Therefore, batch_init() now becomes batch_set() ('set' can be understood as a verb or a substantive, since the function basically defines a 'batch set' :-) See below.), and the API is reduced to two functions (batch_set / batch_process). Make that only one (batch_set) for batches inside form _submit callbacks, which should be the most frequent use.

The api is now also more flexible : successive batch_set($batch) (not in separate form submit callbacks, I mean; that was already OK) do not override each other anymore, but define separate batch sets.

Both Gabor and Dries found the 'batch sets' concept somewhat obscure, I tried to provide a better description in the code.
Basically, 'batch sets' are separate sets of operations, processed sequentially, building separate sets of results, and calling separate 'finished' callbacks.
They allow clean code independency, ensuring batches defined in different places / modules can be happily unaware of each other. Each batch / each form _submit callback acts as if it were alone.

Visually, the user sees the progress bar start fresh again when execution goes from set n to set n+1 - no page reload (if using JS, that is). Batch sets can define their own UI messages (progress message, error message...) - the only glitch is that under JS mode, only the first set gets its title displayed (page title lies outside of the Ajax-updated markup). I think we can live with that.

In the process, the code added to form.inc (and thus loaded on every page) got reduced to the only two functions in the API : batch_set / batch_process.

The documentation could probably use a reread and possibly some rewording (my english is awkward at times...), but I think this is about ready to go.

yched’s picture

FileSize
2.69 KB

batch_test.module.
I added a test batch in a simple page callback (outside form submission workflow). Should probably be used sparingly, but works...

yched’s picture

FileSize
47.03 KB

Small update :
- removed global $batch variable, to be more inline with the forthcoming FAPI3 patch, that removes form globals altogether. (Re)introduced batch_get() instead.
- minor documentation updates.

Trying to, er, 'batchify' node_access_rebuild got me thinking. More on this tomorrow...

yched’s picture

FileSize
3.42 KB

So, working on node_access_rebuild as an exercise / use-case proof :
While I thought the batch patch was quite solid now for batching form submissions (things inside _form_submit), the interesting thing about node_access_rebuild is that it's an API function - it gets called in node_configure_rebuild_confirm_submit (the 'Rebuild permissions' button on admin/content/node-settings form), but is also virtually callable from anywhere in core / contrib code.

So here's the approach I came up with - this is of course not an actual patch for node.module, but rather proof of concept code as an illustration. Basically, we keep the infamous node_access_rebuild() function as is, but provide a new node_access_rebuild_batch(), which returns a ready-to-go $batch array.

Thus in _most_ places where you'd call node_access_rebuild(), you'd go batch_set(node_access_rebuild_batch()).
(I think it's important that you go through an explicit call to batch_set() - makes it clear that the stuff you ordered _will_ be delivered, but maybe _after_ the code following the batch_set() is executed).

I search through all of (D) contrib for calls to node_access_rebuild().
- Most of them happen in hook_enable/disable - batching there is OK since those are triggered in the modules form submit.
- Some happen in hook_update_n - batching there is OK since, and I was not even really aware of that, the latest versions of the patch allow batch operations to open new batches (they get added as new batch sets, executed right after the current one is finished).
- Some possibly tricky cases happen in (nodeaccess)_node_type, (taxonomy_access)_taxonomy, (taxonomy_access)_user.
Out of safety, those should probably stick to the old node_access_rebuild...

All in all, I do think the batch patch is ready as it currently is. 'Simple' form submit stuff works smoothly and with a fairly simple API, and more complex cases can be handled with a little extra care...

yched’s picture

I search through all of (D) contrib for calls to node_access_rebuild().
Er, that was :
I searched through all of (D5) contrib for calls to node_access_rebuild().

yched’s picture

updated : rerolled, plus fixed documentation (one sentence left unfinished...)

yched’s picture

FileSize
47.34 KB

with the patch

eaton’s picture

Just a heads up regarding the previous multistep/programmatic woes. The FormAPI 3 patch at http://drupal.org/node/111079 eliminates the use of sessions by Multistep, which should make things quite a bit simpler.

meba’s picture

batch_6.patch applied smoothly...

yched’s picture

FileSize
47.37 KB

Re-rolled - In the previous patch I commented out a few lines in function batch_process for testing, and forgot to uncomment them back.

meba’s picture

Please see autolocale module which now supports batch operations with this patch:

http://drupal.org/node/140886#comment-237725

It's an living example that batch operations do work.

Gábor Hojtsy’s picture

Status: Needs review » Fixed

OK, after careful review, testing with empty update.php operations, with some artifical update.php operations, fixing phpdoc comments for grammar mistakes, fixing some Drupal coding style issues, I have landed the patch in Drupal core.

Looking forward to upcoming patches around node access rebuilding and other batch related tasks! Please submit standalone issues with those patches. This issue should be kept for follow up issues and fixes about this functionality, if any.

My commit: http://drupal.org/cvs?commit=66322

#127539: progressive operation support, refactoring update.php code to a generic batch API to support runnning operations in multiple HTTP requests
    - update.php is already on the batch API
    - node access rebuilding is in the works
    - automatic locale importing is in the works
   
   Thanks to Yves Chedemois (yched) for the good code quality, very wide awareness of issues related to batches,
   and the fantastic turnaround times. Hats off.
yched’s picture

Status: Fixed » Needs review
FileSize
1.88 KB

Yay ! And thanks for the awesome CVS log, Gabor :-)

Small patch fixes the replacement token used in the error link (I just saw a post on the dev list about that today).

Plus a proposed simplification for the documentation about the 'finished' param for operation callbacks, which was a little confusing IMO.

I'll try to adapt the FAPI3 patch to the new batch parts - unfortunately, I'll be away from home for a few days...

yched’s picture

I posted a patch for a 'Call-time pass-by-reference has been deprecated' PHP warning in update.php ; http://drupal.org/node/141575.

Also note that we can now remove misc/update.js - batches use misc/batch.js instead
(I don't know how to submit a patch that removes a file...)

Gábor Hojtsy’s picture

Status: Needs review » Fixed

OK, comitted the fixes, as well as removed update.js.

bjaspan’s picture

Status: Fixed » Needs work

There is a syntax error in the pgsql CREATE TABLE {batch} statement in system.install: it has a comma after the "PRIMARY KEY (bid)" statement. I'd submit a patch but my system.install is all hacked up for another pending patch at the moment so the line numbers would not work out.

Change:

      db_query("CREATE TABLE {batch} (
        bid int NOT NULL default '0',
        sid varchar(64) NOT NULL default '',
        timestamp int NOT NULL default '0',
        batch text,
        PRIMARY KEY (bid),
      )");

to

      db_query("CREATE TABLE {batch} (
        bid int NOT NULL default '0',
        sid varchar(64) NOT NULL default '',
        timestamp int NOT NULL default '0',
        batch text,
        PRIMARY KEY (bid)
      )");
bjaspan’s picture

The same problem exists in update_create_batch_table() in update.php.

Gábor Hojtsy’s picture

Status: Needs work » Fixed

Committed the fix, thanks for reporting!

yched’s picture

Status: Fixed » Needs review
FileSize
1.85 KB

Patch for the $user->sid error when user not logged in (troublesome for update.php)

I'm not able to actually test this, I don't have access to my coding env. right now.

yched’s picture

OK, I have been able to actually test the above fix for $user->sid, and it works as expected.

Not setting to RTBC as it's my own patch, but this is ready, really...

Dries’s picture

When merlinofchaos worked on the installer code that just went in, he complained about the batch.inc code being hard to understand. Specifically, he mentioned that batch.inc matched an architectural overview/explanation. He gave up and bailed out. It would be great if we could take merlin's feedback and improve the code comments so it becomes more accessible to developers.

yched’s picture

Yes, I saw that and your request for a better doc of the batch engine.
I've put that on my todo - after I've finished some polishing the adaptation of the FAPI3 patch.

Dries’s picture

yched: thanks in advance. :)

yched’s picture

Status: Needs review » Reviewed & tested by the community

bump - the patch in #74 fixes 'unauthenticated users can't run batches' (and thus can't run update.php - not sure about the locales import on install)

Dries’s picture

Rather than using session_id(), I think you can use $user->session (not $user->sid).

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs review

Dries, $user->session has information about the *contents* of the sessions as far as I see, which is in effect the values stored in the session for the user. This does not seem to be enough to protect the same batch from running by multiple users at the same time (which the user session binding tries to protect against here).

Yched, the installer works with locale imports, because (in our proposed patch), the $user already has the sid when the batch starts (he is already logged in anyway).

yched’s picture

True. $user->session is session data, not unique, won't do it.
actually, session_id() is used by common.inc's drupal_get_token.
Maybe I should use drupal_get_token instead ?

yched’s picture

Attached patch does that.
Just pick between this one (drupal_get_token) and #74 (session_id) :-)
Both equally do the job, it's just a matter of coding style.

Throughout all core, session_id() is used only by drupal_get_token (well, and in session.inc, obviously), so maybe it's cleaner to consider that drupal_get_token is the 'drupal layer' above session_id, and should be used instead ?

The patch also consequently renames the 'sid' column to 'token' in the {batch} table - should I provide an update function as well, for people testing current HEAD ?

yched’s picture

FileSize
4.29 KB

Er, the patch...

Dries’s picture

db_set_active() is known to be a pain, and we've had lots of problems with other global variables being switched (i.e. $user object). So, I'd prefer a different approach.

yched’s picture

Dries, I think you meant that for another thread ?

Gábor Hojtsy’s picture

Yes, Dries definitely written this to a wrong window (but then copied it to the right one after all).

chx’s picture

Title: progressive operations (à la update.php) » node_access_rebuild takes forever
Category: feature » bug

This is not a feature request but a critical bug.

yched’s picture

OK, chx gets impatient about node_access_rebuild :-)
I'll post a patch about that asap.

There still is a batch bugfix to commit in either #74 or #84 above.

yched’s picture

Title: node_access_rebuild takes forever » batch - progressive operations (à la update.php)
Priority: Critical » Normal
FileSize
7.57 KB

updated patch after FAPI3 landed :
- rerolled version of #84 (anonymous user / drupal_get_token)
- fixes a batch bug I introduced in the fapi3 batch code (multiple submit handlers broken after one of them sets a batch)
- fixes PHP warnings when the last submit handler of a batching form does not define a batch itself (those warnings were already there before FAPI3).

Of course, it's easier for me to bundle this in a single patch, but I can provide separate patches if preferred.

@chx : I'll post the 'batchify node_access_rebuild' patch in a separate task - I'd rather keep this one focused on the initial batch feature.

bjaspan’s picture

Priority: Normal » Critical

The batch table in pgsql uses type 'int' instead of type 'serial' for column 'bid'. It then calls db_next_id('{batch}_bid') which results in all kinds of fun because without being a 'serial' the batch_bid_seq sequence is not created:

    * warning: pg_query() [function.pg-query]: Query failed: ERROR: relation "batch_bid_seq" does not exist in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 125.
    * user warning: query: SELECT nextval('batch_bid_seq') in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 144.
    * warning: pg_query() [function.pg-query]: Query failed: ERROR: duplicate key violates unique constraint "batch_pkey" in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 125.
    * user warning: query: INSERT INTO batch (bid, sid, timestamp, batch) VALUES (0, 4, 1179182941, 'a:8:{s:2:"id";b:0;s:4:"sets";a:1:{i:0;a:10:{s:7:"sandbox";a:0:{}s:7:"results";a:0:{}s:7:"success";b:0;s:10:"operations";a:1:{i:0;a:2:{i:0;s:13:"update_do_one";i:1;a:2:{i:0;s:6:"system";i:1;s:4:"6016";}}}s:5:"title";s:8:"Updating";s:12:"init_message";s:27:"Starting updates<br/>&nbsp;";s:13:"error_message";s:130:"An unrecoverable error has occured. You can find the error message below. It is advised to copy it to the clipboard for reference.";s:8:"finished";s:15:"update_finished";s:16:"progress_message";s:31:"Remaining @remaining of @total.";s:5:"total";i:1;}}s:11:"current_set";i:0;s:11:"progressive";b:1;s:3:"url";s:32:"http://head.localhost/update.php";s:11:"source_page";s:4:"node";s:8:"redirect";s:43:"http://head.localhost/update.php?op=results";s:13:"error_message";s:97:"Please continue to <a href="http://head.localhost/update.php?id=&amp;op=error">the error page</a>";}') in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 144.

The fix is to update the bid column from type 'int' to type 'serial'. db_change_column() in update.php can do this. Note that you will have to manually re-create the table's primary key, too. I suppose you could also manually create the sequence with CREATE SEQUENCE or whatever.

Teaser: What you *wish* you could do is update bid's 'type' field in the schema structure and write a very simple update function

function system_update_nnn() {
  $ret = array();
  db_update_field($ret, 'batch', 'bid');
  return $ret;
}

which automatically re-creates the column, copies the data, and restores the primary key, unique keys, and indexes. The code is written and coming to a core schema patch near you soon.

yched’s picture

FileSize
7.61 KB

I guess I stole the wrong pre-existing pgsql line. I definitely can't wait for that ddl patch to get in :-)

Same patch as in #90, plus the pgsql fix.
Once again, I can try to separate in standalone patches if required.

yched’s picture

FYI : patch for node_access_rebuild_batch() submitted here : http://drupal.org/node/144337

Dries’s picture

Status: Needs review » Fixed

I've committed these fixes to CVS HEAD. Thanks! :)

yched’s picture

Status: Fixed » Needs work

Setting back to CNW for doc task.

Gábor Hojtsy’s picture

yched, please also commit your batch_test module (preferably renaming it to batch_example) to the contrib doc folder. Like node examples, block examples, menu examples and so on, this could serve as a great condensed source documentation for batches of different kinds. It would be a pity, to let your example module vanish.

yched’s picture

Good idea - will do as soon as I come back home in a frew days.

I'm also not at all forgetting the code documentation Dries asked for batch.inc. Er, I've been really overbusy these days...

yched’s picture

@Gabor : I just committed the batch_example.module to http://cvs.drupal.org/viewcvs/drupal/contributions/docs/developer/examples/

My initial batch_test.module is more of an 'exhaustive' regression test, probably more confusing than anything else, so I simplified it to illustrate mainly :
- plain and simple batch
- multipart batch operations
- multipart update function using the new $sandbox param

Johan A’s picture

small thing, i'm trying out yched's batch_example on the head, 5th, july. I think it works as advertised accept I get this after every step:

notice: Undefined index: id in C:\Data\Sites\2007-06-05-drupal-6.x-dev\includes\form.inc on line 2041.

yched’s picture

Category: bug » task
Priority: Critical » Normal

@ Johan A :
This is fixed by the patch in http://drupal.org/node/149593 (pending reviews ;-) )

moshe weitzman’s picture

Status: Needs work » Closed (fixed)

seems like there is nothing to do here. please reopen if needed.