At least since 4.6, and probably earlier, cron has been (as I see it) an unclassified set of tasks that modules handle on every cron run. Each module has to take uncoordinated steps to manage its own scheduling, and every cron run has to fire all existing hook_cron implementations, potentially leading to memory exhaustion in many hosting situations (see the number of issues around this, notably regarding search).

One way to work around this would be to enable cron-based scheduling instead of module-based scheduling, and/or allow cron to be called to invoke specific cron-ed tasks instead of the whole lot, much like the original UNIX cron uses a crontab instead of relying on every cron job to know when it must be scheduled.

Upsides:

  • one instance of scheduling code instead of many incompatible ones
  • ... which can justify the work going into a nice scheduling UI
  • ...without preventing modules from doing their scheduling themselves
  • ...and can still work with the existing hook_cron specification
  • ability to reduce the memory/CPU requirements of cron runs. For instance, long-running tasks like search indexing can be run on their own instead of along with the other tasks, reducing the probability of breakage
  • ability to minimize the impact of choking tasks, like aggregator updates failing to obtain the upstream source, since these would be performed in their own runs instead of alongside other tasks
  • probably still compatible with poormanscron

Downsides

  • potential compatibility loss for some situations (which ones ?)
  • some form of reentrancy would have to be considered: improperly scheduled cron tasks would be more susceptible than now to being invoked before the previous run has finished. A solution would probably be to handle the runs within cron as some form of critical section, not allowing a scheduled subtask to be started if the previous run has not returned (possibly reporting this as a failure)
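
A minimal sketch of the critical-section idea from the last point, assuming an advisory file lock per task is acceptable. The function name and the lock file naming are hypothetical and not part of any Drupal API; in core, the lock API (lock_acquire() / lock_release() in Drupal 7) would probably be the more natural tool.

```php
<?php
// Hypothetical helper: run one cron task only if no earlier run of the same
// task is still in progress. An advisory file lock acts as the critical
// section. Returns 'ran', 'skipped' or 'error'.
function example_cron_run_exclusive($task_name, callable $task, $lock_dir = NULL) {
  $lock_dir = $lock_dir ?: sys_get_temp_dir();
  // Mode 'c' creates the file if needed without truncating it.
  $handle = fopen($lock_dir . '/cron_' . md5($task_name) . '.lock', 'c');
  if ($handle === FALSE) {
    return 'error';
  }
  // LOCK_NB makes flock() return immediately instead of waiting.
  if (!flock($handle, LOCK_EX | LOCK_NB)) {
    fclose($handle);
    return 'skipped';
  }
  try {
    $task();
    return 'ran';
  }
  finally {
    // Always release the lock, even if the task throws.
    flock($handle, LOCK_UN);
    fclose($handle);
  }
}
```

A skipped run could then be logged as a failure situation, so that a task which never returns becomes visible instead of silently piling up.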

What do you think of it ?

Comments

fgm’s picture

Version: 6.x-dev » 7.x-dev

Setting to future version.

cburschka’s picture

It would be nice to have Drupal emulate a "virtual crontab" (only that it can't work in real time, so it would have to be slightly different).

The problem with a simple solution - like having hook_cron() pass an argument to the module that tells it when it was last triggered - is that it's still relatively useless information to the module: It needs to know when it last did whatever it does, not when cron was last fired.

A better method might be a scheduling hook. Primitive example:


function hook_cron($op, $delta = NULL) {
  switch ($op) {
    // Describe each task and how often (in seconds) it should run.
    case 'list':
      return array(
        0 => array('description' => t("Optimize table"), 'interval' => 3600),
        1 => array('description' => t("Rebuild search index"), 'interval' => 300),
      );

    // Run the single task identified by $delta.
    case 'run':
      switch ($delta) {
        case 0: return optimize_table();
        case 1: return rebuild_index();
      }
  }
}
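
For illustration, here is a rough sketch of how a cron runner might consume a hook of that 'list'/'run' shape. Everything named example_* is hypothetical, and the last-run times are kept in a plain array purely so the sketch runs on its own; a real implementation would have to persist them somewhere.

```php
<?php
// Hypothetical runner. $hook is any callable with the signature
// ($op, $delta); $last_run maps delta => timestamp of the previous run
// and is updated in place. Returns the deltas that were run.
function example_cron_run_due(callable $hook, array &$last_run, $now) {
  $ran = array();
  foreach ($hook('list', NULL) as $delta => $task) {
    $previous = isset($last_run[$delta]) ? $last_run[$delta] : 0;
    // A task is due once its own interval has elapsed since its own last run.
    if ($now - $previous >= $task['interval']) {
      $hook('run', $delta);
      $last_run[$delta] = $now;
      $ran[] = $delta;
    }
  }
  return $ran;
}

// Stand-in hook implementation used only for this example.
function example_demo_cron($op, $delta = NULL) {
  switch ($op) {
    case 'list':
      return array(
        0 => array('description' => 'Optimize table', 'interval' => 3600),
        1 => array('description' => 'Rebuild search index', 'interval' => 300),
      );

    case 'run':
      return TRUE;
  }
}

$last_run = array(0 => 1000, 1 => 1000);
// 400 seconds later only the 300-second task is due.
$ran = example_cron_run_due('example_demo_cron', $last_run, 1400);
```

The point is that the runner, not the module, tracks when each task last did its work, which addresses the objection about only knowing when cron itself last fired.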

cburschka’s picture

Nice feature, this. Please don't forget to fix cron in D7.

EvanDonovan’s picture

This would be great, and would help a lot with the issues that cause #131536: Make cron watchdog more granular and informative to be necessary for cron debugging.

EvanDonovan’s picture

Title: Cron should not remain monolithic » Cron should not remain monolithic (Implement a scheduling hook for cron)

Added subtitle.

Dave Reid’s picture

Component: other » cron system

Moving to new cron system component.

Anonymous’s picture

Version: 7.x-dev » 8.x-dev

Marked #246871: Flexibility in Drupal Cron scheduling a duplicate of this issue.

Anonymous’s picture

What about a modification to hook_cron that would make it a registration of events in an array, much like hook_theme? The data in the array would contain default values for the cron task timings, and a UI would be created so that site administrators can schedule those timings as they wish. For module foo, the hook_cron implementation would look something like the following, to be refined in the discussion to follow.

/**
 * Implementation of hook_cron
 */
function foo_cron() {
  $items = array();
  $items[] = array(
    'task' => 'foo_batch_task',
    'time' => array(
      'period' => 'hour',
      // Times of day are strings; a bare 08:00 is not valid PHP.
      'hours' => array(
        '08:00',
        '20:00',
      ),
    ),
  );
  return $items;
}

/**
 * A task to be performed in cron
 */
function cron_foo_batch_task() {
  ...
}

Obviously I'm whiteboarding here and the data returned from the hook_cron implementations needs to be refined. The cron.php script would be changed to call the registered tasks at the appropriate time instead of iterating through the list of hook_cron implementations.

skessler’s picture

It would seem to me that the best way would be to weight cron tasks and then set parameters so that tasks of given weights run in a different job. A specific module should also be able to declare itself as its own cron job. So, for example, the migrate module (http://drupal.org/project/migrate) could have its own cron.

Would there be one cron task that looks for something telling it which cron tasks to run, or should files be created that represent the various cron tasks to run? How does this integrate with the implementation of poormanscron in Drupal 7?

gielfeldt’s picture

I suggest that hook_cron() act like hook_menu() or hook_theme() (as already proposed). This also opens the door to implementing a hook_cron_alter(), like hook_menu_alter().

Elysia Cron and Ultimate Cron already do this in one way or another, which could provide inspiration for a new and more versatile cron system in Drupal 8.

Anonymous’s picture

I would love to see the likes of Elysia Cron as default install for core.

timhilliard’s picture

I would like to see a weighted cron system. Simply specifying an interval does not take into account the specific site implementing the cron; not all sites need the recommended interval set on cron runs. An interval could also be problematic because, if several cron passes have been missed, all the cron tasks will fire at the same time, which should be OK but is probably undesirable. What I think would be better is for the cron system to take all of the cron hooks in the system (btw I like the idea of being able to declare multiple cron tasks in one cron hook), look at each task's weight (configurable on a cron admin page), look at how long that task took to run on its last pass, and use an equation to decide which tasks to run this time based on a calculated weighting. I think implementing a hook_cron_info hook to declare cron functions would also allow this to be backwards compatible.

@gielfeldt what would the hook_cron_alter be able to change? the weighting/interval? the callback? stop the callback?

I might have a crack at writing this system in the next few days.

joachim’s picture

BTW, http://drupal.org/project/job_scheduler allows scheduled jobs to have a crontab syntax for determining when they are run.

gielfeldt’s picture

@timhilliard I would say that hook_cron_alter() should be able to change anything that has been declared in hook_cron(). Including removing the job by unsetting the job in the array.
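
A rough sketch of that info-plus-alter flow, with plain callables standing in for hook implementations. The function name, the task keys and the 'interval' property are all hypothetical; only the pattern (collect declarations, then pass them by reference to alter callbacks) is taken from how hook_menu_alter() style hooks work.

```php
<?php
// Hypothetical collector: gather task definitions from "info" callbacks,
// then let "alter" callbacks change or remove them by reference.
function example_cron_collect(array $info_callbacks, array $alter_callbacks) {
  $tasks = array();
  foreach ($info_callbacks as $callback) {
    $tasks = array_merge($tasks, $callback());
  }
  foreach ($alter_callbacks as $callback) {
    // Called directly so that a by-reference parameter is honoured.
    $callback($tasks);
  }
  return $tasks;
}

$tasks = example_cron_collect(
  array(
    function () {
      return array(
        'search_index' => array('callback' => 'search_cron', 'interval' => 300),
        'update_check' => array('callback' => 'update_cron', 'interval' => 86400),
      );
    },
  ),
  array(
    function (array &$tasks) {
      // Change a declared property...
      $tasks['search_index']['interval'] = 3600;
      // ...or remove the job entirely by unsetting it.
      unset($tasks['update_check']);
    },
  )
);
```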

amontero’s picture

Let's suppose we have 3 cron hooks to be execd:
-node_cron
-system_cron
-search_cron

They are (as all cron hooks by default) included in the group "all" or "default". Let's call it a "channel".
Cron.php is called hourly from crontab as usual and the "all" channel is run by default.

Now we want to change search index scheduling:
We create a new channel "daily" and via some kind of drag interface we move search_cron from channel "all" to newly created channel "daily".
We add to the OS crontab another line like cron.php?cron_key=KEY&channel=daily with its own scheduling.

Channels do not have to match scheduling intervals: there can be channels like "housekeeping", "heavy_tasks", "offpeak" or whatever your particular installation requires, as long as you later add the crontab entry for each channel. Anyway, if you are tweaking cron, you should probably at least know how to add a crontab entry. I prefer some sort of weighted system over specifying time intervals from inside Drupal, as mentioned earlier.
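
To make the channel idea concrete, here is a small sketch of the lookup cron.php would need. The function name and the shape of the assignment array are assumptions for illustration; the hook names and the channel query parameter are the ones from the example above.

```php
<?php
// Hypothetical channel lookup: $assignments maps a hook name to a channel;
// hooks without an entry stay in the default "all" channel.
function example_cron_hooks_for_channel(array $hooks, array $assignments, $channel = 'all') {
  $selected = array();
  foreach ($hooks as $hook) {
    $assigned = isset($assignments[$hook]) ? $assignments[$hook] : 'all';
    if ($assigned === $channel) {
      $selected[] = $hook;
    }
  }
  return $selected;
}

$hooks = array('node_cron', 'system_cron', 'search_cron');
// The admin moved search_cron from "all" to a new "daily" channel.
$assignments = array('search_cron' => 'daily');

// cron.php?cron_key=KEY                 runs the "all" channel.
$default_run = example_cron_hooks_for_channel($hooks, $assignments);
// cron.php?cron_key=KEY&channel=daily   runs only the "daily" channel.
$daily_run = example_cron_hooks_for_channel($hooks, $assignments, 'daily');
```

Drupal only decides which hooks belong to which channel; how often each channel fires stays entirely in the OS crontab.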

This respects the assumption that cron is simply invoked externally from Drupal without having any control or beforehand knowledge of how often it will be called as I think it is currently.
Perhaps this can better split responsibilities between the site admin and the host system admin on who decides/controls what as it is now, but that's a supposition since I usually wear both hats. This may or may not be desirable, too.
At the crontab file level, it's a "call cron.php often enough and let it decide the scheduling interval" vs. "call each channel with this exact scheduling and don't let Drupal [site admin] decide anything about the scheduling interval".
Note that I do not have a deep understanding of cron's implementation, so:
-Some base assumptions on my part may be plain wrong.
-I might be biased in favor of keeping it simpler or at least not much different of how it works currently.

What do you think about this system?

geerlingguy’s picture

Having found elysia_cron (or an equivalent) to be a necessity on anything but the simplest of Drupal sites, and having helped a ton of people with cron issues (usually one module's cron task breaking cron on a site), I think at least having the ability to run different modules' cron tasks at different times would be helpful. The other thing I would love to see is the ability for a module to specify an array of cron tasks (rather than have to try to schedule different things to happen on different cron runs).

See #1442434-15: Do not port Elysia Cron, recommend Ultimate Cron for Drupal 8.

Anonymous’s picture

> The other thing I would love to see is the ability for a module to specify an array of cron tasks (rather than have to try to schedule different things to happen on different cron runs).

Something like a registry of callback functions to execute with the registry entry giving the cron run frequency and time limit?

Something like
$registry[] = array(
  'callback' => 'mymod_foo',
  'parameters' => array(1, 2),
  'frequency' => 1,
  'time limit' => 60,
);

# 'frequency': 1 means every run, 2 means every 2nd run, 3 means every 3rd run, etc.
# 'time limit': an integer is the maximum time in seconds; a string of an integer followed by '%' is a percentage of the maximum time.

geerlingguy’s picture

Something along those lines, yes. Although I don't know if setting a frequency would be as flexible as I'd like. I often run cron every minute on a site, and I have some tasks run every minute, some every 5, some every hour, some twice a day, once a day, once a week, etc., so it'd be tough figuring out all those intervals (but much better than the current situation).
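
As a sketch of what the run-count 'frequency' proposal implies, assuming the system keeps a counter of cron runs (the function name and callbacks here are made up):

```php
<?php
// Hypothetical check for the run-count 'frequency' idea: a task with
// frequency N is due on every Nth cron run.
function example_cron_due_by_frequency(array $registry, $run_number) {
  $due = array();
  foreach ($registry as $entry) {
    if ($run_number % $entry['frequency'] === 0) {
      $due[] = $entry['callback'];
    }
  }
  return $due;
}

// With system cron firing every minute, "every 5 minutes" is 5 and
// "hourly" is 60. The same values mean something else if cron fires hourly.
$registry = array(
  array('callback' => 'mymod_every_run', 'frequency' => 1),
  array('callback' => 'mymod_every_fifth', 'frequency' => 5),
  array('callback' => 'mymod_every_sixtieth', 'frequency' => 60),
);
$due = example_cron_due_by_frequency($registry, 5);
```

The mapping from frequency to wall-clock time depends entirely on how often cron itself is invoked, which is what makes those intervals awkward to work out.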

thedavidmeister’s picture

Priority: Normal » Major

I'd definitely like to see this functionality in core and for the API to be close to how hook_cronapi() works for the latest versions of Elysia Cron.

Working off a "frequency" that requires you to know how the system cron is configured sounds harder to understand than necessary and error prone (what if your local dev cron is configured differently to the production environment?).

In my experience, about 10-15% of the sites I've built need a custom cron schedule to stay alive without the server periodically falling over (often because of the system cron cache flush, but there are other reasons). Many of the other sites could probably have benefited from a custom cron too, even though I didn't spend the time tweaking them carefully.

Usually what happens is I have one or more processes that need to be run somewhere between every 1 and 10 minutes, but I want all my other processes to be run less than once per hour, or even barely once per day.

Some examples I've come across that would tempt/require me to dial the cron frequency right up on a site:

- Setting search indexing to be many small batches greatly reduced server load for one site with very spiky content creation
- Needing to send batches of thousands of emails through a third party service (like Mandrill) by working with the queue API. These emails need to go out every minute.
- Pulling time sensitive data from a third party service
- Using Views Bulk Operations' "enqueue instead of execute directly" functionality, which relies on cron running regularly

I think maybe in 2007, or 2005 when #19173: Pass include and exclude parameters to cron.php for fine grained cron timing was opened, this wasn't such a big deal and was probably a "normal" feature request, but I feel that cron management is pretty important now that we're in 2014 and even "simple" sites have a decent chance to be doing time sensitive background processes but don't want their server to die from too-frequent, too-heavy cron runs.

joachim’s picture

> even "simple" sites have a decent chance to be doing time sensitive background processes but don't want their server to die from too-frequent, too-heavy cron runs.

And also, sites may want to run processing jobs frequently, without clearing their site's page cache every time.

thedavidmeister’s picture

@joachim - I agree, that particular task is very often a server killer and I've never quite understood why it does what it does to the cache, but there can be less obvious processes like large feed imports that we don't want to be running frequently.

Another "bad" cron task could be the update module polling for updates to modules.

jhedstrom’s picture

Version: 8.0.x-dev » 8.1.x-dev

Moving to 8.1.x.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

AaronMcHale’s picture

Wouldn't it be nice if we could revive this, maybe using a service and OO practices?

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

heddn’s picture

This could be an interesting feature to help with scheduling things, for, say, automatic updates. Tagging.

cburschka’s picture

In D8, I've indeed needed to implement a scheduling pattern using the state system quite often - something like if ($state->get('last_run') + $interval < $now) run();. It'd be quite useful to have a common API for that.
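
Spelled out a little further, that pattern might look like the sketch below. ExampleArrayState is only a stand-in that mimics the get() and set() methods of Drupal's state service so the code runs on its own; the helper name and the state key are hypothetical.

```php
<?php
// Stand-in for Drupal's state service; only get() and set() are mimicked.
class ExampleArrayState {
  private $values = array();

  public function get($key, $default = NULL) {
    return array_key_exists($key, $this->values) ? $this->values[$key] : $default;
  }

  public function set($key, $value) {
    $this->values[$key] = $value;
  }
}

// Hypothetical helper: run $task if more than $interval seconds have passed
// since the run recorded under $key, then record the new run time.
function example_run_if_due($state, $key, $interval, $now, callable $task) {
  if ($state->get($key, 0) + $interval < $now) {
    $task();
    $state->set($key, $now);
    return TRUE;
  }
  return FALSE;
}

$state = new ExampleArrayState();
$runs = 0;
$task = function () use (&$runs) {
  $runs++;
};
// First call runs, because nothing has been recorded yet.
example_run_if_due($state, 'mymodule.last_run', 300, 1000, $task);
// 200 seconds later the task is not yet due again.
example_run_if_due($state, 'mymodule.last_run', 300, 1200, $task);
```

Every module that needs its own interval ends up rewriting some version of this, which is the argument for a common API.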

AaronMcHale’s picture

@heddn @cburschka

Perhaps bringing some or all of https://www.drupal.org/project/advancedqueue into Core could work here; Advanced Queue seems like a good solution to scheduling tasks.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

dpi’s picture