At least since 4.6 and probably earlier, cron tasks have been (as I see them) an unclassified set of tasks that each module handles on every cron run. Each module has to take uncoordinated steps of its own to manage its scheduling, with the net result that a cron run has to fire all existing hook_cron implementations, potentially leading to memory exhaustion in many hosting situations (see the number of issues around this, notably regarding search).
One way to work around this would be to enable cron-based scheduling instead of module-based scheduling, and/or allow cron to be called to invoke specific cron-ed tasks instead of the whole lot, much like the original UNIX cron uses a crontab instead of relying on every cron job to know when it must be scheduled.
Upsides:
- one instance of scheduling code instead of many incompatible ones
- ...which can justify the work going into a nice scheduling UI
- ...without preventing modules from doing their scheduling themselves
- ...and can still work with the existing hook_cron specification
- ability to reduce the memory/CPU requirements on cron runs. For instance, long-running tasks like search indexing can be run on their own instead of along with the other tasks, reducing the probability of breakage
- ability to minimize the impact of choking tasks, like aggregator updates failing to obtain the upstream source, since these would be performed on their own runs instead of along other tasks
- probably still compatible with poormanscron
Downsides
- potential compatibility loss for some situations (which ones ?)
- some form of reentrancy would have to be considered: improperly scheduled cron tasks would be more susceptible than now to being invoked before the previous run has finished. A solution would probably be to handle the runs within cron as some form of critical section, not allowing a scheduled subtask to be started if the previous run has not returned (possibly with a failure situation)
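The critical-section idea in the last point could be sketched roughly as follows. This is a language-neutral illustration in Python, not Drupal code: `TaskLocks`, `run_task`, and the in-memory lock table are hypothetical names invented here, and a real implementation would need a persistent, atomic lock store. The expiry time is one assumed way of coping with a previous run that died without returning.

```python
import time
from typing import Callable, Dict, Optional


class TaskLocks:
    """In-memory stand-in for a persistent lock table (hypothetical)."""

    def __init__(self) -> None:
        self._expires: Dict[str, float] = {}

    def acquire(self, name: str, timeout: float, now: float) -> bool:
        # Refuse the lock while a previous holder's lease is still valid.
        expires = self._expires.get(name)
        if expires is not None and expires > now:
            return False
        self._expires[name] = now + timeout
        return True

    def release(self, name: str) -> None:
        self._expires.pop(name, None)


def run_task(locks: TaskLocks, name: str, task: Callable[[], object],
             timeout: float = 240.0, now: Optional[float] = None) -> str:
    """Run one scheduled task unless its previous run still holds the lock."""
    now = time.time() if now is None else now
    if not locks.acquire(name, timeout, now):
        return "skipped"  # previous run has not returned yet
    try:
        task()
        return "ok"
    except Exception:
        return "failed"  # one failing task does not abort the others
    finally:
        locks.release(name)
```

Because each task has its own lock, a stuck search indexing run would only block later search runs, not the aggregator or any other task.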
What do you think of it ?
Comments
Comment #1
fgm
Setting to future version.
Comment #2
cburschka
It would be nice to have Drupal emulate a "virtual crontab" (only that it can't work in real time, so it would have to be slightly different).
The problem with a simple solution - like having hook_cron() pass an argument to the module that tells it when it was last triggered - is that it's still relatively useless information to the module: It needs to know when it last did whatever it does, not when cron was last fired.
A better method might be a scheduling hook. Primitive example:
Comment #3
cburschka
Nice feature, this. Please don't forget to fix cron in D7.
Comment #4
EvanDonovan
This would be great, and would help a lot with the issues that cause #131536: Make cron watchdog more granular and informative to be necessary for cron debugging.
Comment #5
EvanDonovan
Added subtitle.
Comment #6
Dave Reid
Moving to new cron system component.
Comment #7
Anonymous (not verified)
Marked #246871: Flexibility in Drupal Cron scheduling a duplicate of this issue.
Comment #8
Anonymous (not verified)
What about a modification to hook_cron that would make it a registration of events to occur, returned in an array much like hook_theme? The data in the array would contain a default value for the cron task timings, and a UI would be created so that site admins can schedule those timings as they wish. For module foo, the hook_cron implementation would look something like the following, to be refined in the discussion to follow.
Obviously I'm whiteboarding here and the data returned from the hook_cron implementations needs to be refined. The cron.php script would be changed to call the registered tasks at the appropriate time instead of iterating through the list of hook_cron implementations.
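A minimal sketch of the registration idea, written here for illustration and not the commenter's own code: it is language-neutral Python rather than Drupal PHP, and the registry keys (`callback`, `interval`), the task names, and the `overrides` mapping standing in for UI-configured timings are all assumptions.

```python
from typing import Callable, Dict, List, Optional

Registry = Dict[str, dict]


def foo_cron() -> Registry:
    """What a registration-style hook_cron for module foo might return."""
    return {
        "foo_cleanup": {"callback": lambda: "cleanup done", "interval": 3600},
        "foo_digest": {"callback": lambda: "digest sent", "interval": 86400},
    }


def due_tasks(registry: Registry, overrides: Dict[str, int],
              last_run: Dict[str, float], now: float) -> List[str]:
    """Return the tasks whose own interval has elapsed since their own last run."""
    due = []
    for name, info in registry.items():
        # A UI-configured interval, if any, wins over the module default.
        interval = overrides.get(name, info["interval"])
        previous: Optional[float] = last_run.get(name)
        if previous is None or previous + interval <= now:
            due.append(name)
    return due


def run_due(registry: Registry, overrides: Dict[str, int],
            last_run: Dict[str, float], now: float) -> Dict[str, object]:
    """Run every due task and record a per-task last-run time."""
    results = {}
    for name in due_tasks(registry, overrides, last_run, now):
        results[name] = registry[name]["callback"]()
        last_run[name] = now
    return results
```

The last-run time is tracked per task rather than per cron invocation, which is the information a module actually needs to decide whether its work is due.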
Comment #9
skessler
It would seem to me that the best way would be to be able to weight tasks for cron and then set parameters for various weights that should run in a different job. Also, a specific module should be able to specify itself as its own cron job. So, for example, migrate module (http://drupal.org/project/migrate) could have its own cron.
Would there be one cron task that looks for something telling it which cron tasks to run, or should files be created that represent the various cron tasks to run? How does this integrate with the implementation of poormanscron in Drupal 7?
Comment #10
gielfeldt
I suggest hook_cron() to act like hook_menu() or hook_theme() (like already proposed). This also opens up for implementing a hook_cron_alter() like the hook_menu_alter().
Elysia Cron and Ultimate Cron already do this in some way or another, which could provide inspiration for a new and more versatile cron system in Drupal 8.
Comment #11
Anonymous (not verified)
I would love to see the likes of Elysia Cron as default install for core.
Comment #12
timhilliard
I would like to see a weighted cron system. Simply specifying an interval does not take into account the specific site implementing the cron; not all sites need the recommended interval set on cron runs. An interval could also be problematic because, if several cron passes have been missed, all the crons will fire at the same time, which should be OK but is probably undesirable. What I think would be better is for the cron system to take all of the cron hooks in the system (btw I like the idea of being able to declare multiple crons in one cron hook), look at each cron's weight (configurable in a cron admin page), look at how long that particular cron took to run on its last pass, and use an equation to decide which crons to run this time based on a calculated weighting. I also think implementing a hook_cron_info to declare cron functions would enable this to be backwards compatible.
@gielfeldt what would the hook_cron_alter be able to change? the weighting/interval? the callback? stop the callback?
I might have a crack at writing this system in the next few days.
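One possible reading of the weighted selection described above, sketched in language-neutral Python. The comment does not give an equation, so the scoring used here (weight minus the number of passes a job has been skipped) and the time budget are assumptions made purely for illustration; `pick_jobs` and the job fields are hypothetical names.

```python
from typing import Dict, List


def pick_jobs(jobs: List[Dict], budget_seconds: float) -> List[str]:
    """Choose which jobs to run on this pass.

    Lower weight runs first, as with other Drupal weights. Each skipped
    pass lowers a job's effective weight so that heavy jobs are not
    starved forever (an assumed rule, not taken from the thread).
    """
    ordered = sorted(
        jobs, key=lambda job: (job["weight"] - job["skipped"], job["name"])
    )
    chosen: List[str] = []
    remaining = budget_seconds
    for job in ordered:
        cost = job["last_duration"]  # how long the job took on its last pass
        # Always run at least the first job, even if it exceeds the budget.
        if cost <= remaining or not chosen:
            chosen.append(job["name"])
            remaining -= cost
    return chosen
```

The caller would be responsible for resetting `skipped` on the jobs that ran and incrementing it on the ones that did not. With this rule, several missed cron passes do not cause every job to fire at once: the pass still stops when the budget is spent.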
Comment #13
joachim
BTW, http://drupal.org/project/job_scheduler allows scheduled jobs to have a crontab syntax for determining when they are run.
Comment #14
gielfeldt
@timhilliard I would say that hook_cron_alter() should be able to change anything that has been declared in hook_cron(), including removing a job by unsetting it in the array.
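As a rough illustration of that alter step, in language-neutral Python rather than Drupal PHP: the declaring functions, the job names, the string callbacks, and the array shape below are hypothetical, chosen only to show an alter hook changing one job and unsetting another.

```python
from typing import Callable, Dict, List

Jobs = Dict[str, dict]


def search_cron() -> Jobs:
    # Hypothetical declaration by a search module.
    return {"search_index": {"interval": 3600, "callback": "search_update_index"}}


def update_cron() -> Jobs:
    # Hypothetical declaration by an update module.
    return {"update_fetch": {"interval": 86400, "callback": "update_fetch_data"}}


def mysite_cron_alter(jobs: Jobs) -> None:
    """Alter hook: may change anything that was declared, or remove a job."""
    jobs["search_index"]["interval"] = 300  # index in smaller, more frequent batches
    jobs.pop("update_fetch", None)          # the equivalent of unsetting the job


def collect_jobs(info_hooks: List[Callable[[], Jobs]],
                 alter_hooks: List[Callable[[Jobs], None]]) -> Jobs:
    """Gather all declared jobs, then give every alter hook a chance to edit them."""
    jobs: Jobs = {}
    for hook in info_hooks:
        jobs.update(hook())
    for alter in alter_hooks:
        alter(jobs)
    return jobs
```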
Comment #15
amontero
Let's suppose we have 3 cron hooks to be executed:
-node_cron
-system_cron
-search_cron
They are (as all cron hooks by default) included in the group "all" or "default". Let's call it a "channel".
Cron.php is called hourly from crontab as usual and the "all" channel is run by default.
Now we want to change search index scheduling:
We create a new channel "daily" and via some kind of drag interface we move search_cron from channel "all" to newly created channel "daily".
We add to the OS crontab another line like
cron.php?cron_key=KEY&channel=daily
with its own scheduling.
Channels do not have to match scheduling intervals: there can be channels like "housekeeping", "heavy_tasks", "offpeak" or whatever your particular installation requires, as long as you later add the specific channel crontab entry. Anyway, if you are tweaking cron, you should probably have at least the knowledge of how to add a crontab entry. I prefer some sort of weighted system instead of specifying time intervals from inside Drupal as mentioned earlier.
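The channel lookup described above might look roughly like this. It is a language-neutral Python sketch, not Drupal code: `hooks_for_request` and the `assignments` mapping (standing in for whatever the drag interface would store) are hypothetical, and checking the cron key is left out.

```python
from typing import Dict, List
from urllib.parse import parse_qs

DEFAULT_CHANNEL = "all"


def hooks_for_request(query_string: str, hooks: List[str],
                      assignments: Dict[str, str]) -> List[str]:
    """Return the cron hooks that belong to the channel named in the request.

    Hooks without an explicit assignment stay in the default channel, and a
    request without a channel parameter runs the default channel.
    """
    params = parse_qs(query_string)
    channel = params.get("channel", [DEFAULT_CHANNEL])[0]
    return [
        hook for hook in hooks
        if assignments.get(hook, DEFAULT_CHANNEL) == channel
    ]
```

With `search_cron` assigned to "daily", a request for `cron_key=KEY` would select `node_cron` and `system_cron`, while `cron_key=KEY&channel=daily` would select only `search_cron`.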
This respects the assumption that cron is simply invoked externally from Drupal without having any control or beforehand knowledge of how often it will be called as I think it is currently.
Perhaps this can better split responsibilities between the site admin and the host system admin on who decides/controls what as it is now, but that's a supposition since I usually wear both hats. This may or may not be desirable, too.
At crontab file level, it's "call cron.php often enough and let it decide the scheduling interval" vs. "call each channel with this exact scheduling and don't let Drupal [site admin] decide anything about the scheduling interval".
Note that I do not have a deep understanding of cron's implementation, so:
-Some base assumptions on my part may be plain wrong.
-I might be biased in favor of keeping it simpler or at least not much different from how it works currently.
What do you think about this system?
Comment #16
geerlingguy
Having found elysia_cron (or an equivalent) to be a necessity on anything but the simplest of Drupal sites, and having helped a ton of people with cron issues (usually having one module's cron task break cron on a site), I think at least having the ability to run different modules' cron tasks at different times would be helpful. The other thing I would love to see is the ability for a module to specify an array of cron tasks (rather than have to try to schedule different things to happen on different cron runs).
See #1442434-15: Do not port Elysia Cron, recommend Ultimate Cron for Drupal 8.
Comment #17
Anonymous (not verified)
Something like a registry of callback functions to execute with the registry entry giving the cron run frequency and time limit?
Comment #18
geerlingguy
Something along those lines, yes. Although I don't know if setting a frequency would be as flexible as I'd like. I often run cron every minute on a site, and I have some tasks run every minute, some every 5, some every hour, some twice a day, once a day, once a week, etc., so it'd be tough figuring out all those intervals (but much better than the current situation).
Comment #19
thedavidmeister
I'd definitely like to see this functionality in core and for the API to be close to how hook_cronapi() works for the latest versions of Elysia Cron.
Working off a "frequency" that requires you to know how the system cron is configured sounds harder to understand than necessary and error prone (what if your local dev cron is configured differently to the production environment?).
I personally have experienced that about 10-15% of the sites I've built need a custom cron schedule to stay alive without the server periodically falling over (often because of the system cron cache flush, but there are other reasons). Many of the sites could probably have benefited from a custom cron even if I didn't spend the time tweaking them carefully.
Usually what happens is I have one or more processes that need to be run somewhere between every 1 and 10 minutes, but I want all my other processes to be run less than once per hour, or even barely once per day.
Some examples I've come across that would tempt/require me to dial the cron frequency right up on a site:
- Setting search indexing to be many small batches greatly reduced server load for one site with very spiky content creation
- Need to send batches of 1000's of emails through a third party service (like Mandrill) by working with the queue API. These emails need to go out every minute.
- Pulling time sensitive data from a third party service
- Using Views Bulk Operations' "enqueue instead of execute directly" functionality relies on cron running regularly
I think maybe in 2007, or 2005 when #19173: Pass include and exclude parameters to cron.php for fine grained cron timing was opened, this wasn't such a big deal and was probably a "normal" feature request, but I feel that cron management is pretty important now that we're in 2014 and even "simple" sites have a decent chance to be doing time sensitive background processes but don't want their server to die from too-frequent, too-heavy cron runs.
Comment #20
thedavidmeister
https://drupal.org/comment/8463873#comment-8463873
Comment #21
joachim
> even "simple" sites have a decent chance to be doing time sensitive background processes but don't want their server to die from too-frequent, too-heavy cron runs.
And also, sites may want to run processing jobs frequently, without clearing their site's page cache every time.
Comment #22
thedavidmeister
@joachim - I agree, that particular task is very often a server killer and I've never quite understood why it does what it does to the cache, but there can be less obvious processes like large feed imports that we don't want to be running frequently.
Another "bad" cron task could be the update module polling for updates to modules.
Comment #23
jhedstrom
Moving to 8.1.x.
Comment #31
AaronMcHale
Wouldn't it be nice if we could revive this, maybe using a service and OO practices?
Comment #34
heddn
This could be an interesting feature to help with scheduling things, for say automatic updates. Tagging.
Comment #35
cburschka
In D8, I've indeed needed to implement a scheduling pattern using the state system quite often - something like
if ($state->get('last_run') + $interval < $now) run();
It'd be quite useful to have a common API for that.
Comment #36
AaronMcHale
@heddn @cburschka
Perhaps bringing some or all of https://www.drupal.org/project/advancedqueue into Core could be a good solution here, Advanced Queue seems like a good solution to scheduling tasks.
Comment #43
dpi
See also newer discussions in #3383487: Add CronSubscriberInterface so that services can execute cron tasks directly