Some sites need fine grained timing for cron runs. For example, a site that relies heavily on moblogging, mailhandler or any other email based function would want to connect with the email server pretty often, say ever 2-5 minutes. On another site I'm working on we generate flat files for all of a certain node type on a regular basis, though not every 2-5 minutes. Thus the need for cron tasks on different schedules. This patch allows you to make a list of modules to either include or exclude when running cron, so that several cron tasks for one site can be defined. The default behavior in the absence of both parameters is to run all, so the patch preserves backwards compatibility.
* Examples:
*
* runs all hooks
* http://mysite.com/cron.php
*
* runs all hooks except the search and aggregator modules
* http://mysite.com/cron.php?exclude=search,aggregator
*
* runs only search, archive and aggregator modules
* http://mysite.com/cron.php?includ=search,archive,aggregator
Comment | File | Size | Author |
---|---|---|---|
#66 | drupal_cron-19173-66-axyjo.patch | 12.05 KB | axyjo |
#63 | drupal_cron-19173-63-axyjo.patch | 11.13 KB | axyjo |
#61 | drupal_cron-19173-61-axyjo.patch | 11.31 KB | axyjo |
#59 | drupal_cron-19173-59-axyjo.patch | 11.14 KB | axyjo |
#57 | drupal_cron-19173-57-axyjo.patch | 10.1 KB | axyjo |
Comments
Comment #1
robertDouglass CreditAttribution: robertDouglass commentedComment #2
Dries CreditAttribution: Dries commentedIsn't it possible to run all cron functions every 2-5 minutes? The impact of that should be minimal.
Comment #3
moshe weitzman CreditAttribution: moshe weitzman commentedan alternative approach is to make a quick PHP page which simply calls mailhandler_cron and any others that need to be called more frequently than normal. then call that php page from cron every couple minutes ... not so sure this complexity is worth having.
Comment #4
robertDouglass CreditAttribution: robertDouglass commentedWhether or not this "complexity" is needed depends a lot on what you expect to do with cron. As soon as one starts aggregating not tens but several hundred RSS feeds or more, it is impossible to run cron every 2 minutes. Yet if the site depends on mailhandler keeping user moblog submissions current, 2 minutes is already about the maximum latency time acceptable. Thus the need for crons on different schedules.
In terms of usage, writing
?include=mailhandler,search,event
isn't so complex. On the code level, splitting that list on comma and looping over the resultant array also isn't really that complex. The patch defaults to normal behavior for people who don't need the feature, and I've seen more than one site that had to solve this problem.
Comment #5
Robert Castelo CreditAttribution: Robert Castelo commentedCould different cron runs be set to run different tasks based on a cron ID number....
For instance, set up 3 separate cron to run every hour, each one calls the cron.php page including a number variable which runs a certain set of tasks.
An Admin control panel would list all cron tasks and allow administrators to split them into separate cron runs based on the number variable passed to the script when calling cron.php.
Comment #6
robertDouglass CreditAttribution: robertDouglass commentedRobert,
the id parameter is already a practical necessity for this patch if cron_busy is to be effective. I'm open to suggestions as to how we would track various cron tasks and whether they are busy.
First things first, though: does anyone except me need this?
I'm not going to program an admin-configurable cron mechanism unless there is demand for it. Furthermore, such a system would really be incomplete unless there were some actual mechanism for setting these cron runs up from the admin interface as well. This wouldn't be such a bad idea, I know a lot of people who still type /cron.php into their browser every day or so to updated thier sites (yes I've mentioned poormanscron). Anyway, first reactions to this patch have been cold, so maybe nobody wants it.
Comment #7
Robert Castelo CreditAttribution: Robert Castelo commented+1 from me.
I'll be needing something like this once my email newsletter is released and starts hogging cron runs.
Comment #8
Chris Johnson CreditAttribution: Chris Johnson commentedIt does seem like a good idea to allow running different cron tasks at different intervals, so as to be efficient and reduce web server loads. There is little point in running tasks that do not need to be run. Many cron tasks only need to be run once to a few times per day.
Note also that cron job scheduling on most Unix-variants is only within a +/- 1 minute range, so trying to run a cron program every 2 minutes is approaching the crumbling edge of being "on time."
Comment #9
robertDouglass CreditAttribution: robertDouglass commentedI think I am leaning in favor of extending cron.php in the following way:
1) Add admin screen which scans all modules for existence of cron hook
2) set a cron_interval variable for each one
3) allow admin to set an interval
4) check whether the current cron run falls outside of each module's interval before invoking its cron hook
5) before a module's cron hook gets called a {module_name}_cron_busy variable gets set. This would function like the current cron_busy now, just at a modular level.
The admin would then set one cron tab to run at the smallest needed interval and could then decide on a module-by-module basis how often each is to run.
How does this sound to people?
Comment #10
Bèr Kessels CreditAttribution: Bèr Kessels commented-1 on this one.
I think the hook_cron should carry the logic whether or not it wants to be ran. Maybe aintroduce an API that allows theis checkto be done easier? drupal_elapsed_run($name, interval) that will check when $name was last updated, if it was longer then $interval ago, return TRUE, and update the last run of $name.
Comment #11
robertDouglass CreditAttribution: robertDouglass commentedBèr, how can you adjust the interval then? If it is hardcoded how often the cron should run, administrators loose all control and you can't have different sites have different schedules, unless of course you have cron configuration in the interface itself. I think cron scheduling should be completely done with the cron tool itself, and that the URL or command that is given should be able to handle the fine tuning of which module's hooks get called.
Comment #12
ezheidtmann CreditAttribution: ezheidtmann commentedWhen I last used aggregator (which was probably back in the 4.4 days), it let me set a refresh interval for each feed. Is this no longer the case?
Note I have nothing against this patch. It sounds like a good general solution for removing calls to hooks that are resource-heavy every time they run but still allowing some other hooks to be run more often. However, I also like Ber's idea of a convenience function to make delays easier to measure.
Comment #13
Uwe Hermann CreditAttribution: Uwe Hermann commented+1 for the include/exclude feature, I'd use this on multiple sites... Uwe.
Comment #14
kbahey CreditAttribution: kbahey commented+1 on an include/exclude mechanism.
However, I am not sure that having it in the URL is a good idea. This could expose things that we may not want exposed in the future. Anyone on the net can run your stuff more frequently than you want to.
what I would like to see is a cron settings page with all the modules that have a cron hook listed, and a frequency that can be changed.
cron.php can then check that setting per module and decide to run it or not.
Something along those lines would be better than URL based solutions which requires many entries in crontab, and may open exploits.
Comment #15
Dries CreditAttribution: Dries commentedNot sure this is really needed. Most (if not all) cron hooks are smart enough not to trash your system. If you have to do include/exclude-tricks something is wrong. I'm tempted to say '-1'.
There is a typo in the documentation: 'includ' -> 'include'. There is also a tab in the code.
Comment #16
moshe weitzman CreditAttribution: moshe weitzman commentedseems that interest has waned.
Comment #17
Bèr Kessels CreditAttribution: Bèr Kessels commentedassigning this to me. We need this too, for a mobile moblog alike thingy, where we need to check for mail etc every 30 seconds or so, but dont want drupal do collapse under being cronned every so manyu seconds.
Besides that. Drupal is not able to handle fast crons. A lot of cron functions are architectured like this:
if cron is fired *before* do_tough_server_stuff() (because some joker points his browser at yoursite.com/cron.php) your expensice thing is ran twice. Same for when your crontab fores it faster then that expensice stuff was finished.
Note: i am no php guru, so I guess this can be solved with forking and so.
/i/ really need finer grained cron runs; So I will surely come up with a new path. take it or leave it.
Comment #18
LAsan CreditAttribution: LAsan commentedFeature request moving to cvs.
Comment #19
wedge CreditAttribution: wedge commentedI needed to specify the interval for different module hook_crons as well. But when I looked at some code this turned out to be really easy to do inside my cron hook. I used some code from statistics.module
Thanks open source!
Comment #20
kbahey CreditAttribution: kbahey commented@wedge
Please do not stray off the topic at hand.
Your post is a separate topic, and an interesting one. It warrants its own separate issue. Please search the queue to to see if this has been filed as an existing issue, and if not, post a new issue about it.
Comment #21
Bèr Kessels CreditAttribution: Bèr Kessels commentedComment #22
moshe weitzman CreditAttribution: moshe weitzman commentedThe problem that Ber points out in #17 no longer exists. we a have a cron semaphore which prevents multiple crons at same time.
Comment #23
axyjo CreditAttribution: axyjo commentedIf Drupal would support this, it would be fantastic!
Comment #24
axyjo CreditAttribution: axyjo commentedHere's a patch that I made. cron.php now takes one of the following:
Comment #25
swentel CreditAttribution: swentel commentedI like the patch. Looks clean at first glance, but not yet tested. As also told on irc, I'd like to see a little change in it:
When cron is completed, a watchdog message is inserted with a simple dry 'Cron run completed' message. It might be interesting to also list included/excluded module names if applicable.
Also should we write tests for this right now? Or another testingpart08 issue mabye (if this patch get's in of course)
Comment #26
axyjo CreditAttribution: axyjo commentedOk, fixed the patch with swentel's (#25) comments. Also removed two unnecessary
return true;
statements.Comment #27
kbahey CreditAttribution: kbahey commented@axyjo
You are not using t(), and just concatenating messages, which is not the way we do things in Drupal.
But more importantly, if we go with this patch, it would be best if we including timing information, as per my patch here http://drupal.org/node/131536. Dries was worried about cluttering watchdog, but too often cron does not complete or takes too long. It would be helpful to see which module(s) is the slow one.
Oh, and my patch shows you an example of using t() for watchdog messages with arguments.
Comment #28
axyjo CreditAttribution: axyjo commentedHmm. The core itself didn't use t() for the watchdog messages, but I updated the patch anyways.
Comment #29
axyjo CreditAttribution: axyjo commentedForgot to mention that it does NOT allow both an include parameter and an exclude parameter.
Comment #30
axyjo CreditAttribution: axyjo commentedForgot to change this.
Comment #32
axyjo CreditAttribution: axyjo commentedRe-rolled patch against HEAD so that #31 won't fail.
Comment #34
axyjo CreditAttribution: axyjo commentedStupid me. I thought I already checked it with php -f. Definitely checked now.
Comment #36
axyjo CreditAttribution: axyjo commentedOkay. Added two more tests for the functionality I've added, and I've ensured that tests pass.
Comment #38
Dries CreditAttribution: Dries commentedI'm not sure this is the proper long term fix. The "job queue" might be both a better and an easier solution. This patch implements something which resembles a job queue, but it is not quite a job queue.
Comment #39
axyjo CreditAttribution: axyjo commentedA job queue, however, won't be run until regular cron is run. The cron would need to run every minute in order for that much granularity. Also, the job queue (if it is stored in the database, which it probably is) would have many duplicate entries for people who want to run things often, unless we come up with a cron-like syntax with which it'll be efficient. It would then need a user interface so that it would be customizable.
Is anything that I just said correct? Can anything be improved upon? I'm still learning the ropes around Drupal's code.
Comment #40
axyjo CreditAttribution: axyjo commentedRe-rolled against head.
Comment #41
axyjo CreditAttribution: axyjo commentedHopefully adding a tag might get more eyes on this.
Comment #42
mikeytown2 CreditAttribution: mikeytown2 commentedSupport for command line arguments would be nice as well
http://php.net/reserved.variables.argv
Post about this
http://www.opensourcery.com/blog/dylan-tack/choosier-cron-runs
Comment #43
axyjo CreditAttribution: axyjo commentedRerolled against HEAD. @mikeytown2: Wouldn't that need implementation in projects like Drush: http://drupal.org/project/drush?
Comment #44
mikeytown2 CreditAttribution: mikeytown2 commentedDepends on how you set cron. I call mine via a system call, not by url.
Comment #46
axyjo CreditAttribution: axyjo commentedI'll fix the exceptions tomorrow morning. I thought I already took care of that that in an earlier post.
Comment #47
axyjo CreditAttribution: axyjo commentedWhoops, an extraneous line got in the way. Yay for testing!
Comment #48
mikeytown2 CreditAttribution: mikeytown2 commentedComment #50
mikeytown2 CreditAttribution: mikeytown2 commentedOffsets where couple of hundred... re-submitting new patch with correct offsets now
Comment #51
axyjo CreditAttribution: axyjo commentedSome major change seems to have occurred on the 31st or the 1st that messed around with the offsets within the files.
Comment #52
mikeytown2 CreditAttribution: mikeytown2 commentedComment #53
webchickI too have had the need for this, mainly when performing long, expensive operations like data imports. Something like this would've helped me not have to write a separate script for importing Calais terms on a client site at #409630: Create a way to do mass-importing without batch API.
Here's a review of the patch, although Dries does not seem too hot on it, and his OK will be needed for this to get in. I too am not crazy about passing in modules to run via the URL, but with $cron_key there now this is a little less of an issue.
Let's change this to use the ternary operator:
This would clean up the code below so it could just do
if ($include && $exclude) { ... }
Let's default these to NULL. Makes the code below cleaner. (if (!$list && !$mode) rather than $list == ''..)
Also, passing in a delimiter by reference seems a little funny. Is there a use case for this? If not, I'd just document "use commas" and be done with it.
End comments with a period.
Needs to be a space between the elseif and the (
Why module_list() and not module_implements('cron')?
module_invoke is indented too much, needs to be space between foreach and (
Yuck. :) How about simply build implode() on the $run_array array?
Let's do if (!empty($cron_message)) which is more consistent with other places in core.
90% of the code above is re-copy/pasted here. I wonder if there's a way to handle this in the same loop above? If not, we may want to pull this out into a function that can be called in both places.
FALSE
Let's test multiple include/exclude parameters as well.
Beer-o-mania starts in 23 days! Don't drink and patch.
Comment #54
axyjo CreditAttribution: axyjo commentedOkay, here's a new and improved version:
* Takes care of all of the coding standards issues webchick was talking about (except one).
* Added more tests that check whether Drupal accepts invalid module names
* Added tests for multiple parameters (webchick's idea)
* Redid the code such that code isn't needlessly repeated. (again, webchick's idea)
* Added docblocks and fixed comments
I couldn't find the issue with "module_invoke is indented too much" though.
Comment #55
axyjo CreditAttribution: axyjo commentedArgh. Forgot to update HEAD before creating the patch. Hunks could be off.
Comment #56
moshe weitzman CreditAttribution: moshe weitzman commentedArguably, cron_last should now be an array keyed by module. It is kinda ambigous without that.
Comment #57
axyjo CreditAttribution: axyjo commentedSounds like a logical idea. Done with a test.
Comment #58
mikeytown2 CreditAttribution: mikeytown2 commentedIf going for an array, you need to change the status page code for cron last run. This is the only other place where I can find 'cron_last'.
http://api.drupal.org/api/function/system_requirements/7
Also to make reporting that easier, you might want to store the last run value under the key name 'cron' or 'drupal', or something like that. Pick a name that is a reserved name, so contrib will not be interfering.
Comment #59
axyjo CreditAttribution: axyjo commentedThe way I did it was that I used max() in system_requirements. Also, when running cron for all modules, I individually set the time for each module so that if the module wants, it can depend on that variable for the accurate time it last ran.
Comment #61
axyjo CreditAttribution: axyjo commentedFixes exceptions.
Comment #62
tic2000 CreditAttribution: tic2000 commentedThis comments need a little work. "If both arguments are provided, then the function includes or excludes the modules specified in the first argument." sounds better to me.
Extra space on the empty lines. And is like that on many places in the patch.
It should be ', ' for better readability of the output message (one space after the comma).
21 days to code freeze. Better review yourself.
Comment #63
axyjo CreditAttribution: axyjo commentedFixed with suggestions from #62
Comment #64
mikeytown2 CreditAttribution: mikeytown2 commented#63
+1 works for me!
Comment #66
axyjo CreditAttribution: axyjo commentedSame patch, essentially, but changed after the poormanscron patch got in.
Comment #68
Dave ReidMoving to new cron system component.
Comment #69
axyjo CreditAttribution: axyjo commentedNeeds a rewrite because #578676: Use queue for cron is in.
Comment #70
robertDouglass CreditAttribution: robertDouglass commentedHere's more evidence that this functionality is a good idea.
http://metaltoad.com/blog/how-drupals-cron-killing-you-your-sleep-simple...
Comment #71
greg.harveyShould this now be moved to D8? I notice un-committed D7 issues are slowly being moved to D8. Just a thought. (Plus subscribing.)
Comment #72
axyjo CreditAttribution: axyjo commentedYes. This is too late for core now.
Comment #73
Philw CreditAttribution: Philw commented#43: drupal_cron-19173-43-axyjo.patch queued for re-testing.
Ignore this as you probably will. Mistakenly posted.
Apologies.
Comment #74
lpalgarvio CreditAttribution: lpalgarvio commentedsub
Comment #75
amonteroLinking to relevant issue:
#154043: Cron should not remain monolithic (Implement a scheduling hook for cron)
Comment #76
thedavidmeister CreditAttribution: thedavidmeister commentedI've had a look at this patch, I've read over this thread and #154043: Cron should not remain monolithic (Implement a scheduling hook for cron) and #1442434: Do not port Elysia Cron, recommend Ultimate Cron for Drupal 8, I've personally used Elysia Cron in D6 and D7 on multiple sites for multiple use-cases and I really don't think this approach is the best (or even a good) solution to the problem.
Have a look at how modules like Elysia Cron and Ultimate Cron tackle this exact issue and we can see that putting an explicit schedule in a hook_cronapi() style configuration when the task is defined is the standard, rather than trying to mess with GET parameters in the URL to define what tasks are run for each execution of cron.
The EC/UC approach keeps the scheduling "inside Drupal" rather than requiring complex crontabs to be built - in the end we can build something more robust and easier for beginners to understand and for everyone to manage long term.
I'm closing this issue because I strongly believe the approach presented in patches in this thread is inferior to solutions provided elsewhere in contrib, and other outstanding core issues propose a better approach, but kudos to @robertDouglas for attempting to solve this problem right back in 2005!
I would hope and assume that the attitude of @Dries to finer-than-binary granularity in scheduled background tasks has changed to be more positive since 2005 and the official stance is no longer "you should be able to run all tasks every 2-5 minutes without any trouble", because that would have to be a joke for many sites in 2014 that schedule many different tasks that put rather varied loads on the server.
I'm using this review as #2094585: [policy, no patch] Core review bonus for #154043: Cron should not remain monolithic (Implement a scheduling hook for cron) as I think that's appropriate :)