
Scalability / Memory limit / Chunked execution of the karma mass recalculation

Project: User Karma
Version: 6.x-1.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

I am testing this on a copy of a site with around 15k users.
The karma mass recalculation easily hits the PHP memory_limit.
If it does not hit memory_limit first, it may also hit max_execution_time.

Would it be possible to do the mass recalculation in chunks, so that we only process 2000 users at a time and then wait for the next cron run? Or do it all in one request, but keep only 2000 users in memory at a time.
This could be done in different ways: in run i, process the users where i*2000 <= uid < (i+1)*2000, or the users where (uid+i) % n == 0, etc.
(2000 is just a number I picked; maybe you can find a more reasonable chunk size.)
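A minimal sketch of the two partitioning schemes, in plain PHP rather than Drupal code. The function names are hypothetical, and the array of uids only stands in for the {users} table; in the module, each chunk would be selected with a WHERE clause so that only one chunk is ever loaded into memory. The range version uses a half-open interval (<= on the lower bound), because with < on both sides uids that are exact multiples of 2000 would never be processed.

```php
<?php
// Sketch only: no Drupal dependencies. $uids stands in for a query on the
// {users} table; filtering in PHP like this would NOT save memory, a real
// implementation would put the same condition in the SQL WHERE clause.

/**
 * Range chunking (hypothetical): run $i covers $i * $size <= uid < ($i + 1) * $size.
 */
function karma_chunk_by_range(array $uids, $i, $size = 2000) {
  $low = $i * $size;
  $high = ($i + 1) * $size;
  return array_values(array_filter($uids, function ($uid) use ($low, $high) {
    return $uid >= $low && $uid < $high;
  }));
}

/**
 * Modulo chunking (hypothetical): run $i of $n covers users where ($uid + $i) % $n == 0.
 */
function karma_chunk_by_modulo(array $uids, $i, $n) {
  return array_values(array_filter($uids, function ($uid) use ($i, $n) {
    return ($uid + $i) % $n === 0;
  }));
}

// Every uid should land in exactly one range chunk and one modulo chunk.
$uids = range(1, 5000);
$range_total = 0;
for ($i = 0; $i * 2000 <= max($uids); $i++) {
  $range_total += count(karma_chunk_by_range($uids, $i));
}
$modulo_total = 0;
for ($i = 0; $i < 3; $i++) {
  $modulo_total += count(karma_chunk_by_modulo($uids, $i, 3));
}
echo "$range_total $modulo_total\n"; // 5000 5000
```

Range chunks can use an index on uid but may be uneven when uids have gaps; modulo chunks spread users evenly, but the condition cannot use an index, so every run scans the whole table.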

Thanks!

Comments

#1

Hm, the debug messages could be the cause. I will investigate.
For now I'm just waiting for the mass recalculation to complete... a progress bar would be nice.

#2

I did a bit of benchmarking on user_karma_karma_mass_recalculation().
- all sub-modules disabled
- debug messages disabled
- different checkbox settings on admin/settings/user_karma
- a fixed time limit of 20 seconds for user_karma_karma_mass_recalculation(), counting the number of users processed in that time.

Result:
- Between 230 and 400 users processed within the 20 seconds.
- This is roughly 11 to 20 users per second.
- For 15,000 users, that means roughly 12 to 22 minutes, around 16 minutes at a typical rate.
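The benchmark method above (a fixed 20-second budget, counting the users processed) and the extrapolation to 15,000 users can be sketched in plain PHP. karma_benchmark() and karma_estimate_minutes() are hypothetical helpers, not code from the module, and $process stands in for the per-user recalculation.

```php
<?php
// Sketch of a time-budgeted benchmark: process users until the deadline,
// return how many were done. $process is a hypothetical stand-in for the
// per-user karma recalculation.
function karma_benchmark(array $uids, callable $process, float $budget_seconds): int {
  $deadline = microtime(true) + $budget_seconds;
  $count = 0;
  foreach ($uids as $uid) {
    if (microtime(true) >= $deadline) {
      break;
    }
    $process($uid);
    $count++;
  }
  return $count;
}

/**
 * Extrapolates total runtime in minutes from a measured sample.
 */
function karma_estimate_minutes(int $processed, float $budget_seconds, int $total_users): float {
  if ($processed <= 0) {
    return INF; // Nothing finished within the budget, no meaningful estimate.
  }
  $per_second = $processed / $budget_seconds;
  return $total_users / $per_second / 60;
}

// Using the measured figures: 230 and 400 users in 20 seconds.
printf("%.1f\n", karma_estimate_minutes(230, 20, 15000)); // 21.7
printf("%.1f\n", karma_estimate_minutes(400, 20, 15000)); // 12.5
```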

Some remarks:
- With the debug messages disabled, I no longer hit the memory limit, but I never let a full run finish; I always stopped it out of impatience.
- It would probably be even slower with the sub-modules enabled.

I think chunking and a progress bar would be a good idea: plain chunking for cron runs, and AJAX + chunking + a progress bar for requests from the web browser.
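A rough sketch of the cron side, assuming the position is remembered between runs. In a Drupal 6 module the cursor could be kept with variable_get()/variable_set() and the next chunk selected with a query along the lines of WHERE uid > %d ORDER BY uid plus a limit; here a plain array stands in for both, and all names are hypothetical. Continuing from the last processed uid, rather than from an offset, means that users deleted between cron runs do not shift the remaining chunks.

```php
<?php
// Sketch: one cron run recalculates at most $chunk_size users and stores the
// last processed uid in $state so the next run continues from there.
// $state stands in for persistent storage (e.g. a Drupal variable) and
// $recalculate for the per-user recalculation; both are hypothetical.
function karma_cron_chunk(array $all_uids, array &$state, callable $recalculate, int $chunk_size = 2000): bool {
  $last = $state['last_uid'] ?? 0;
  // Users not yet processed, in uid order (the SQL query's job in a module).
  $pending = array_values(array_filter($all_uids, function ($uid) use ($last) {
    return $uid > $last;
  }));
  sort($pending);
  foreach (array_slice($pending, 0, $chunk_size) as $uid) {
    $recalculate($uid);
    $state['last_uid'] = $uid;
  }
  // Finished once this run reached the end of the pending users.
  $done = count($pending) <= $chunk_size;
  if ($done) {
    $state['last_uid'] = 0; // Reset so a future mass recalculation starts over.
  }
  return $done;
}

// Simulate consecutive cron runs until the recalculation is complete.
$uids = range(1, 5000);
$state = [];
$seen = [];
$runs = 0;
do {
  $runs++;
  $done = karma_cron_chunk($uids, $state, function ($uid) use (&$seen) {
    $seen[] = $uid;
  });
} while (!$done);
echo $runs, ' ', count($seen), "\n"; // 3 5000
```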

If you are interested, I could work on a patch, but I would like to hear your opinion first.

#3

By the way, when is the mass recalculation actually needed? I guess once on first install, and after that the system keeps itself up to date without it?

#4

Looking at user_karma_calculate_karma():
- module_invoke_all('user_karma_partial') -> has almost no effect on performance
- votingapi_set_votes($new_vote, $criteria); -> heavy performance impact
- user_karma_calculate_role($uid); -> heavy performance impact
- module_invoke_all('user_karma_recalculate_user') -> has almost no effect on performance

If I comment out votingapi_set_votes() and user_karma_calculate_role(), the 15k users are processed in less than 20 seconds.
With either of the two calls enabled, no more than 800 users are processed in the 20 seconds.
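Instead of commenting calls out one at a time, each step could be wrapped in a timer and the totals compared after a run. A minimal sketch; karma_timed() and the step names are hypothetical, and usleep() only simulates a slow call such as votingapi_set_votes() or user_karma_calculate_role().

```php
<?php
// Sketch: accumulate wall-clock time per named step across all users.
// karma_timed() is a hypothetical helper, not part of the module.
function karma_timed(array &$totals, string $step, callable $fn) {
  $start = microtime(true);
  $result = $fn();
  $totals[$step] = ($totals[$step] ?? 0.0) + (microtime(true) - $start);
  return $result;
}

$totals = [];
foreach (range(1, 200) as $uid) {
  // Cheap stand-in, comparable to the module_invoke_all() hooks.
  karma_timed($totals, 'cheap_step', function () use ($uid) {
    return $uid * 2;
  });
  // Slow stand-in (about 1 ms per user), comparable to the expensive calls.
  karma_timed($totals, 'slow_step', function () {
    usleep(1000);
  });
}
// Print the most expensive step first.
arsort($totals);
foreach ($totals as $step => $seconds) {
  printf("%-12s %.3f s\n", $step, $seconds);
}
```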

#5

I have the same problem when recalculating 5k users; in the end I get a 504 Gateway Timeout. Could this use the Batch API?

For the moment I use @set_time_limit(0); in user_karma.module as a workaround.
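set_time_limit(0) only lifts PHP's own execution limit; a proxy or web server in front of PHP may still give up and return a 504. A sketch of what a Batch API operation for the recalculation could look like, so that each HTTP request handles only a small chunk and the browser shows a progress bar. The $context keys ('sandbox', 'results', 'message', 'finished') follow the Drupal 6 Batch API operation contract; the function name, its arguments and the 100-user chunk size are assumptions, and the loop at the end only imitates the batch engine so that the sketch runs without Drupal.

```php
<?php
// Sketch of a Batch API-style operation callback (hypothetical name).
// In a module it would be registered with batch_set(). $recalculate stands
// in for the per-user recalculation; a real operation would call the module
// function directly, because batch data is stored serialized and closures
// cannot be serialized.
function karma_batch_recalculate(array $uids, callable $recalculate, &$context) {
  if (empty($context['sandbox'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($uids);
  }
  $limit = 100; // Users per request (assumed value).
  $chunk = array_slice($uids, $context['sandbox']['progress'], $limit);
  foreach ($chunk as $uid) {
    $recalculate($uid);
    $context['results'][] = $uid;
    $context['sandbox']['progress']++;
  }
  $context['message'] = sprintf('Recalculated karma for %d of %d users',
    $context['sandbox']['progress'], $context['sandbox']['max']);
  // A value below 1 tells the batch engine to call this operation again.
  $context['finished'] = $context['sandbox']['max'] > 0
    ? $context['sandbox']['progress'] / $context['sandbox']['max']
    : 1;
}

// Minimal stand-in for the batch engine: one call per HTTP request until
// 'finished' reaches 1.
$context = ['sandbox' => [], 'results' => [], 'message' => '', 'finished' => 0];
$requests = 0;
do {
  $context['finished'] = 1;
  karma_batch_recalculate(range(1, 250), function ($uid) {}, $context);
  $requests++;
} while ($context['finished'] < 1);
echo $requests, ' ', count($context['results']), "\n"; // 3 250
```

With the 10 to 20 users per second measured in #2, 100 users per call would take roughly 5 to 10 seconds per request, well below typical gateway timeouts.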
