Along with locales auto installer, the rework of node_access_rebuild was the main reason batches were wanted in D6.

Here is a patch for this. A few notes (mainly quoting myself from http://drupal.org/node/127539#comment-236643 in the 'batch patch' thread) :

The batch API was designed to work primarily in a form submission context. What makes our task here non-trivial is that node_access_rebuild is an API function, virtually callable from any place in core / contrib code, where batching is not always possible / suitable.
So here's the approach I came up with : basically, we keep the infamous node_access_rebuild() function as is, but provide a new node_access_rebuild_batch(), which returns a ready-to-go $batch array. In _most_ places where you'd call node_access_rebuild(), you'd go batch_set(node_access_rebuild_batch()) (I think it's important that you go through an explicit call to batch_set() - makes it clear that the stuff you ordered _will_ be delivered, but maybe in the subsequent HTTP request.)

I searched through all of D5 contribs for calls to node_access_rebuild() :
- Most of them happen in hook_enable/disable or hook_update_n - node_access_rebuild_batch() is ok there, since basically you're in a form submission context.
- the other occurrences are in (nodeaccess)_node_type, (taxonomy_access)_taxonomy, (taxonomy_access)_user.
For those 3 I'm not so sure : are we 100% sure we're processing a form submission here ? If not, those cases should probably stick to the 'regular' node_access_rebuild().

PS : I transferred the 'critical' status set by chx for this task in http://drupal.org/node/127539#comment-243405.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yched’s picture

FileSize
3.01 KB

I updated my good friend batch_test.module to implement some dumb node access hooks.
Demonstrates node_access_rebuild_batch() in :
- hook_enable
- hook_disable
- hook_update_n

meba’s picture

Issuing a batch rebuild using admin/content/node-settings/rebuild works perfectly. Patch landed without issues.

But clicking "Rebuild permissions" at admin/content/node-settings just prints "Settings saved", doesn't trigger confirm page.

yched’s picture

Status: Active » Needs review

True, but not because of this patch :-)
The 'Rebuild permissions' button on this form is broken, does not redirect to admin/content/node-settings/rebuild as intended, even without the patch applied.

Probably a followup bug from the fapi3 commit. I suggest you report it in a new issue.

Forgot to set this to 'needs review', BTW.

yched’s picture

Updated :

- I finally decided it was OK to let node_access_rebuild_batch() actually _set_ the batch, instead of returning the $batch array. You just call node_access_rebuild_batch() instead of batch_set(node_access_rebuild_batch()) previously.

- I my initial post I forgot to say a word about the TODO that was left in the code, about a slightly convoluted workaround used for a race-condition on module disable (basically, the module is still 'active' when node_access_rebuild is called in its _disable hook, but is already 'disabled' when the batch gets actually processed on the following http request, possibly resulting in {node_access} table being left completely empty).

This new patch
1) comes with a much cleaner approach, introducing a new module_disabling() funtion in module.inc,
2) frees contrib developpers of the burden of implementing their own my_module_disabling() function so that their module does not declare grants while being disabled (see node_access_example on api.d.o, or forum access for instance).

yched’s picture

FileSize
8.75 KB

updated batch_test.module, with new / simpler hook_disable, hook_node_access_records (see the commented out bits)

yched’s picture

chx pointed on IRC that I should clarify about the batch_test.module above :
it's not in the patch , its just there to let people test with an actual module, since I guess there are not many D6-ready node access modules out there yet.

chx’s picture

Category: task » bug

This is not a task, this is a bug, and a critical one. I already expressed my feeling about releasing D5 with a loop that iterates all nodes -- this must not be repeated, that iteration can take several (tens of) minutes on a sufficiently large site. So, I am marking this a critical bug to stop D6 from releasing without it.

All that said, I am not yet familiar enough with batch to say +1 or -1 on the approach but reading the code does not reveal any glaring holes. module_disabling is a bit icky, yes, but how else do you get this information?

moshe weitzman’s picture

seems like we are hard coding 5 nids per request. thats far too few IMO. we'll never get to the end on big sites. i'd like to see a limit of 30 or 50.

code looks good to me.

catch’s picture

*subscribing* - happy to test when things get to beta.

yched’s picture

@moshe : I shoudl probably comment that better, chx also misunderstood this part on IRC

Actually it's not 5 nodes per request, its 'process all nodes, by groups of 5 - after 1 second execution time, report progress and process the rest in the next request'. So if you have 100 nodes and processing all of them takes less than 1 sec, everything is done in one request. Just like how 'multipart update functions' worked so far.

The '1 sec execution time' is hardcoded in the batch processing engine. I took that value straight out of the old update.php code.
I considered having this as a parameter you can set when defining a batch, and then thought it was probably not such a good idea.

Frando’s picture

What about a variable for the execution time, with a default of 1sec? Sites that want a higher time could then set it in their settings.php

yched’s picture

Why not - but you have to keep in mind that this also sets how often the progress bar gets updated and the user gets feedback.

This is not really related to this specific node_access_rebuild thread, though.
I suggest discussion about this happen in a separate thread.

yched’s picture

BTW - there is possibly an edge-case glitch when node access modules get activated on drupal install time (say, in an install profile).
I'm trying to grab merlinofchaos on IRC about that, as I think this is also a problem with current (non-batch) node_access_rebuild : basically, you have no nodes yet, and {node_access} comes out empty, _without_ the default 'grant view perm to all' record.

jlmeredith’s picture

Title: 'batchify' node_access_rebuild » *subscribing* - ready to test

This is currently a show stopper for a site I am building. Will test once committed. Thanks

jlmeredith’s picture

Title: *subscribing* - ready to test » 'batchify' node_access_rebuild

oops fixing title

yched’s picture

following some discussion with merlin on IRC, i've been working on a slightly different angle on this, where node access modules do not directly call node_acces_rebuild(_batch), but simply set a 'needs_rebuild' flag. This should provide a better control of when exactly rebuild processes are triggered.

Unfortunately, I'm under an insane deadline for a, well, paying project, and am not moving as fast as I'd like on this. New patch in a few days, I hope.

David Strauss’s picture

Title: 'batchify' node_access_rebuild » Amortize node_access_rebuild

A batch is just a group of common operations. This technique is called amortized analysis because it splits a long operation into fixed-size chunks for easy computational digestion.

This is a good idea for node_access_rebuild.

yched’s picture

Title: Amortize node_access_rebuild » node_access_rebuild in a batch

David, Thanks for the hint - alas, the D6 implementation already went in under the name 'batch', and the API functions are named batch_set, batch_process... I proposed that name when starting the work, and no-one objected or came up with a better term during the several weeks it took to get the thing right and committed ;-)

We _might_ want to rename the whole thing, but until then, I still think that keeping the 'batch' term in the issue title makes it clearer what it is actually about. But, OK 'batchify' was probably too barbaric...

jlmeredith’s picture

Follow-up note for those who find this thread in searching for a solution to 5.x installs that are timing out due to memory limits during: a) node access permission rebuilds b) the enabling / disabling of node access modules. There is a patch which has been committed to the current HEAD of 5.x which fixes the issue. I searched for some time before finding this. I am currently using the Node Access, Forum Access and ACL module along with Organic Groups with no issues. Here is the link to the 5.x patch info: http://drupal.org/node/108752

yched’s picture

OK, getting back at this, with the approach I mentioned in #16, suggested by chatting with Earl and Moshe on, IRC (i _think_ it was them, it's been a few weeks now...) :

- When node_access table needs to be rebuilt, a flag is set using node_access_needs_rebuild(TRUE);
This is done in module.inc's module_enable / module_disable, that check if node access modules are involved.
As opposed to D5, no special hook_enable/disable or my_module_disabling() dance is needed from n_a modules themselves.
- When the 'needs rebuild' flag is set, a message is displayed on all '/admin' pages, inviting the user to visit 'admin/content/node-settings/rebuild' (confirm form) - just like in dww's update.module patch.
This avoids unexpected long batch operations when enabling / disabling modules.

- No API change for node_access_rebuild() which works in 'dual' mode, using a $batch_mode param (defaults to FALSE).
- Modules can use node_access_rebuild($batch_mode = TRUE) in hook_update_n. The batch rebuild will be appended after the last update. They can also simply use node_access_needs_rebuild(TRUE); to set the flag and let the admin trigger the actual rebuild (we might *recommend* one of the two, but both should work).

The last thing that's missing AFAICT is about install profiles. Install profiles that enable n_a modules at install time should be able to rebuild the table at the end of installation - esp. if they created some nodes as well ? With the current patch, they can either :
- set the 'needs rebuild' flag and let the user finish the work (not that 'clean', probably...)
- use node_access_rebuild() ('old' mode, no batch), which should be more than enough if they did not define 100's of nodes.

If we really want batch mode for profiles as well, additional work the installer is needed to let profiles have an opportunity to batch some stuff if they need to. I do think this can be tackled in a subsequent patch, though.

The thread was already in 'needs review' state. Well, it actually does :-)

yched’s picture

FileSize
1022 bytes

Attached is an *example* node_access module, to let reviewers actually test the patch. The module also defines an update function to test node_access_rebuild() in updates.

Note : running the patch probably requires the batch fixes in http://drupal.org/node/149593.

yched’s picture

FileSize
1.02 KB

My sample na_example module in the previous post had some cruft left from previous attempts. Here it is again, cleaned up.
I kept and commented out the parts that na modules had to implement in D5 to work properly, and that are not needed anymore with the 'batch na_rebuild' D6 approach.

merlinofchaos’s picture

Subscribing as a reminder that I need to review this.

moshe weitzman’s picture

Status: Needs review » Needs work

i tested the example module along with this patch along with its dependancy patch. all worked as designed. minor comments below, then this is RTBC.

  • the last letter needs to change in the following phrase(2 occurances): "need to be rebuild" => "need to be rebuilt"
  • please add doxygen for node_access_needs_rebuild()
yched’s picture

Improved code comments as per moshe's suggestions.
Should also perform a tad faster by reducing the number of iterations (process by packs of 20 nodes instead of 5 by 5 previously - batch processing pauses after 1 sec anyway...)

A small correction / rewording to what I was saying about install profiles at the end of comment #20 :
- with current patch, install profiles that enable n_a modules will let the user see the 'node access needs rebuilding' message after install completes.
- to avoid that, they can call node_access_rebuild() ('old' mode, no batch) themselves in their custom install steps. That's ok if they do not define 100's of nodes.
- if we want install profiles to be able to use the new spiffy (and safe) batch mode, additional work on the installer is needed to let profiles have an opportunity to batch some stuff if they need to. I do think this can be tackled in a subsequent patch, though.

yched’s picture

Status: Needs work » Needs review
moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

thanks yched. moving to rtbc

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Is there a reason why we just want to 'flag' the table for rebuild, instead of rebuilding it right away? Performance? It would be good to document that in the PHPdoc of the rebuild function.

Also, looking at the PHPdoc comments, it's not clear when I'd want to $batch_mode to FALSE. What is a good reason not to use batch mode? Can a module developer realistically decide ahead of time whether an update might result in a timeout or not? Shouldn't we always use batch mode? If there is a strong reason, please document this.

yched’s picture

Status: Needs work » Needs review
FileSize
11.22 KB

Rerolled after node.module split, plus added comments requested by Dries.

Basically, the node_access_needs_rebuild() flag is a user friendly way to let the admin perform the setting changes he intends to do without going through a long rebuild on each step. Or to avoid 'enable an n_a module and disable another one means 2 full rebuilds'.

And the need for both batch and non-batch mode is about the trickiness of providing a drop in 'batch' replacement for an API function. In short, batching is kosher while in a FAPI submission workflow, but not in cron, for instance (you need an actual browser to perform the redirects).

A grep through all of current D5 contrib repository brought up only 3 problematic instances of node_access_rebuild() calls :
hook_node_type('insert') in nodeaccess.module
hook_taxonomy('delete') in taxonomy_access.module
hook_user('update' / 'insert') in taxonomy_access.module
For this type of context, the callstack might be all sorts of things : form submission, cron, direct call to user_save, install profile... There is AFAICT no way we can decide if we are in a safe context for batching, thus the need for a fallback non-batch mode.
The fact that 'non-batch' is the default mode also means no actual API change.

bdragon’s picture

Status: Needs review » Needs work

Needs a reroll...
I'll reroll.

bdragon’s picture

Status: Needs work » Needs review
FileSize
11.5 KB

OK.

bdragon’s picture

OK, after testing:

* Running in cron hook: Works great.

* Running in browser: Works great.

* Running in browser with JS disabled: Empties the table and sits there.

Is there a fallback for this new batching stuff to work when JS is disabled?

yched’s picture

Thx for the reroll.

About non-JS mode, this is not specific to this patch, but is related to the batch engine.
See http://drupal.org/node/168870#comment-291546. In short : it does work when JS is disabled - but *not* if you just disabled JS in order to test it, because the has_js session cookie is still here.

If you want to test non-JS batch execution, disable JS, close your browser, open it, and test. Maybe we want to improve that, but it's something to discuss on http://drupal.org/node/168870.

bdragon’s picture

OK, I suspected that might be the case.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Following up on the last few reports, this is done.

bdragon’s picture

I concur.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Although I think this is quite late in the release cycle to have such a new feature added, I understand that this is a major pain point for those using node permission modules and batch API was supposed to solve this. So as Dries' comments are solved and I see no problems (apart from the slightly hackish node_help() checks), committed.

yched’s picture

Status: Fixed » Needs work

Cool. Marking as 'needs work' until I update the "D5->D6 migration' handbook page, and example_n_a.module

yched’s picture

Anonymous’s picture

Status: Fixed » Closed (fixed)