Some sites need fine grained timing for cron runs. For example, a site that relies heavily on moblogging, mailhandler or any other email based function would want to connect with the email server pretty often, say ever 2-5 minutes. On another site I'm working on we generate flat files for all of a certain node type on a regular basis, though not every 2-5 minutes. Thus the need for cron tasks on different schedules. This patch allows you to make a list of modules to either include or exclude when running cron, so that several cron tasks for one site can be defined. The default behavior in the absence of both parameters is to run all, so the patch preserves backwards compatibility.

* Examples:
*
* runs all hooks
* http://mysite.com/cron.php
*
* runs all hooks except the search and aggregator modules
* http://mysite.com/cron.php?exclude=search,aggregator
*
* runs only search, archive and aggregator modules
* http://mysite.com/cron.php?includ=search,archive,aggregator

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

robertDouglass’s picture

Dries’s picture

Isn't it possible to run all cron functions every 2-5 minutes? The impact of that should be minimal.

moshe weitzman’s picture

an alternative approach is to make a quick PHP page which simply calls mailhandler_cron and any others that need to be called more frequently than normal. then call that php page from cron every couple minutes ... not so sure this complexity is worth having.

robertDouglass’s picture

Whether or not this "complexity" is needed depends a lot on what you expect to do with cron. As soon as one starts aggregating not tens but several hundred RSS feeds or more, it is impossible to run cron every 2 minutes. Yet if the site depends on mailhandler keeping user moblog submissions current, 2 minutes is already about the maximum latency time acceptable. Thus the need for crons on different schedules.

In terms of usage, writing

?include=mailhandler,search,event

isn't so complex. On the code level, splitting that list on comma and looping over the resultant array also isn't really that complex. The patch defaults to normal behavior for people who don't need the feature, and I've seen more than one site that had to solve this problem.

Robert Castelo’s picture

Could different cron runs be set to run different tasks based on a cron ID number....

For instance, set up 3 separate cron to run every hour, each one calls the cron.php page including a number variable which runs a certain set of tasks.

An Admin control panel would list all cron tasks and allow administrators to split them into separate cron runs based on the number variable passed to the script when calling cron.php.

robertDouglass’s picture

Robert,

the id parameter is already a practical necessity for this patch if cron_busy is to be effective. I'm open to suggestions as to how we would track various cron tasks and whether they are busy.

First things first, though: does anyone except me need this?

I'm not going to program an admin-configurable cron mechanism unless there is demand for it. Furthermore, such a system would really be incomplete unless there were some actual mechanism for setting these cron runs up from the admin interface as well. This wouldn't be such a bad idea, I know a lot of people who still type /cron.php into their browser every day or so to updated thier sites (yes I've mentioned poormanscron). Anyway, first reactions to this patch have been cold, so maybe nobody wants it.

Robert Castelo’s picture

+1 from me.

I'll be needing something like this once my email newsletter is released and starts hogging cron runs.

Chris Johnson’s picture

It does seem like a good idea to allow running different cron tasks at different intervals, so as to be efficient and reduce web server loads. There is little point in running tasks that do not need to be run. Many cron tasks only need to be run once to a few times per day.

Note also that cron job scheduling on most Unix-variants is only within a +/- 1 minute range, so trying to run a cron program every 2 minutes is approaching the crumbling edge of being "on time."

robertDouglass’s picture

I think I am leaning in favor of extending cron.php in the following way:

1) Add admin screen which scans all modules for existence of cron hook
2) set a cron_interval variable for each one
3) allow admin to set an interval
4) check whether the current cron run falls outside of each module's interval before invoking its cron hook
5) before a module's cron hook gets called a {module_name}_cron_busy variable gets set. This would function like the current cron_busy now, just at a modular level.

The admin would then set one cron tab to run at the smallest needed interval and could then decide on a module-by-module basis how often each is to run.

How does this sound to people?

Bèr Kessels’s picture

-1 on this one.
I think the hook_cron should carry the logic whether or not it wants to be ran. Maybe aintroduce an API that allows theis checkto be done easier? drupal_elapsed_run($name, interval) that will check when $name was last updated, if it was longer then $interval ago, return TRUE, and update the last run of $name.

robertDouglass’s picture

Bèr, how can you adjust the interval then? If it is hardcoded how often the cron should run, administrators loose all control and you can't have different sites have different schedules, unless of course you have cron configuration in the interface itself. I think cron scheduling should be completely done with the cron tool itself, and that the URL or command that is given should be able to handle the fine tuning of which module's hooks get called.

ezheidtmann’s picture

When I last used aggregator (which was probably back in the 4.4 days), it let me set a refresh interval for each feed. Is this no longer the case?

Note I have nothing against this patch. It sounds like a good general solution for removing calls to hooks that are resource-heavy every time they run but still allowing some other hooks to be run more often. However, I also like Ber's idea of a convenience function to make delays easier to measure.

Uwe Hermann’s picture

+1 for the include/exclude feature, I'd use this on multiple sites... Uwe.

kbahey’s picture

+1 on an include/exclude mechanism.

However, I am not sure that having it in the URL is a good idea. This could expose things that we may not want exposed in the future. Anyone on the net can run your stuff more frequently than you want to.

what I would like to see is a cron settings page with all the modules that have a cron hook listed, and a frequency that can be changed.

cron.php can then check that setting per module and decide to run it or not.

Something along those lines would be better than URL based solutions which requires many entries in crontab, and may open exploits.

Dries’s picture

Not sure this is really needed. Most (if not all) cron hooks are smart enough not to trash your system. If you have to do include/exclude-tricks something is wrong. I'm tempted to say '-1'.

There is a typo in the documentation: 'includ' -> 'include'. There is also a tab in the code.

moshe weitzman’s picture

Status: Needs review » Closed (works as designed)

seems that interest has waned.

Bèr Kessels’s picture

Assigned: robertDouglass » Bèr Kessels
Status: Closed (works as designed) » Active

assigning this to me. We need this too, for a mobile moblog alike thingy, where we need to check for mail etc every 30 seconds or so, but dont want drupal do collapse under being cronned every so manyu seconds.

Besides that. Drupal is not able to handle fast crons. A lot of cron functions are architectured like this:

function tough_stuff_cron() {
  if(variable_get('tough_stuff_last', now()) {
     while (do_tough_server_stuff() ) { //mencode all DVDs
       do_other_tough_stuff();
     }
  }
  variable_set('tough_stuff_last', now());
}

if cron is fired *before* do_tough_server_stuff() (because some joker points his browser at yoursite.com/cron.php) your expensice thing is ran twice. Same for when your crontab fores it faster then that expensice stuff was finished.

Note: i am no php guru, so I guess this can be solved with forking and so.

/i/ really need finer grained cron runs; So I will surely come up with a new path. take it or leave it.

LAsan’s picture

Version: x.y.z » 7.x-dev

Feature request moving to cvs.

wedge’s picture

I needed to specify the interval for different module hook_crons as well. But when I looked at some code this turned out to be really easy to do inside my cron hook. I used some code from statistics.module

function my_cron() {
  $my_timestamp = variable_get('my_timestamp', '');
  if ((time() - $my_timestamp) >= 86400) {
    /* do daily work */
    variable_set('my_timestamp', time());
  }
  /* do hourly work */
}

Thanks open source!

kbahey’s picture

@wedge

Please do not stray off the topic at hand.

Your post is a separate topic, and an interesting one. It warrants its own separate issue. Please search the queue to to see if this has been filed as an existing issue, and if not, post a new issue about it.

Bèr Kessels’s picture

Assigned: Bèr Kessels » Unassigned
moshe weitzman’s picture

The problem that Ber points out in #17 no longer exists. we a have a cron semaphore which prevents multiple crons at same time.

axyjo’s picture

If Drupal would support this, it would be fantastic!

axyjo’s picture

Status: Active » Needs review
FileSize
3.08 KB

Here's a patch that I made. cron.php now takes one of the following:

  • No parameters (runs all modules' cron)
  • An include parameter with a comma separated list of modules
  • An exclude parameter with a comma separated list of modules
swentel’s picture

I like the patch. Looks clean at first glance, but not yet tested. As also told on irc, I'd like to see a little change in it:

When cron is completed, a watchdog message is inserted with a simple dry 'Cron run completed' message. It might be interesting to also list included/excluded module names if applicable.

Also should we write tests for this right now? Or another testingpart08 issue mabye (if this patch get's in of course)

axyjo’s picture

Ok, fixed the patch with swentel's (#25) comments. Also removed two unnecessary return true; statements.

kbahey’s picture

Status: Needs review » Needs work

@axyjo

You are not using t(), and just concatenating messages, which is not the way we do things in Drupal.

But more importantly, if we go with this patch, it would be best if we including timing information, as per my patch here http://drupal.org/node/131536. Dries was worried about cluttering watchdog, but too often cron does not complete or takes too long. It would be helpful to see which module(s) is the slow one.

Oh, and my patch shows you an example of using t() for watchdog messages with arguments.

axyjo’s picture

Hmm. The core itself didn't use t() for the watchdog messages, but I updated the patch anyways.

axyjo’s picture

Forgot to mention that it does NOT allow both an include parameter and an exclude parameter.

axyjo’s picture

Status: Needs work » Needs review

Forgot to change this.

Status: Needs review » Needs work

The last submitted patch failed testing.

axyjo’s picture

Assigned: Unassigned » axyjo
Status: Needs work » Needs review
FileSize
3.43 KB

Re-rolled patch against HEAD so that #31 won't fail.

Status: Needs review » Needs work

The last submitted patch failed testing.

axyjo’s picture

Status: Needs work » Needs review
FileSize
3.47 KB

Stupid me. I thought I already checked it with php -f. Definitely checked now.

Status: Needs review » Needs work

The last submitted patch failed testing.

axyjo’s picture

Status: Needs work » Needs review
FileSize
4.39 KB

Okay. Added two more tests for the functionality I've added, and I've ensured that tests pass.

Status: Needs review » Needs work

The last submitted patch failed testing.

Dries’s picture

I'm not sure this is the proper long term fix. The "job queue" might be both a better and an easier solution. This patch implements something which resembles a job queue, but it is not quite a job queue.

axyjo’s picture

A job queue, however, won't be run until regular cron is run. The cron would need to run every minute in order for that much granularity. Also, the job queue (if it is stored in the database, which it probably is) would have many duplicate entries for people who want to run things often, unless we come up with a cron-like syntax with which it'll be efficient. It would then need a user interface so that it would be customizable.

Is anything that I just said correct? Can anything be improved upon? I'm still learning the ropes around Drupal's code.

axyjo’s picture

Re-rolled against head.

axyjo’s picture

Issue tags: +cron

Hopefully adding a tag might get more eyes on this.

mikeytown2’s picture

Support for command line arguments would be nice as well
http://php.net/reserved.variables.argv

Post about this
http://www.opensourcery.com/blog/dylan-tack/choosier-cron-runs

axyjo’s picture

Rerolled against HEAD. @mikeytown2: Wouldn't that need implementation in projects like Drush: http://drupal.org/project/drush?

mikeytown2’s picture

Status: Needs work » Needs review

Depends on how you set cron. I call mine via a system call, not by url.

Status: Needs review » Needs work

The last submitted patch failed testing.

axyjo’s picture

I'll fix the exceptions tomorrow morning. I thought I already took care of that that in an earlier post.

axyjo’s picture

Status: Needs work » Needs review
FileSize
5.92 KB

Whoops, an extraneous line got in the way. Yay for testing!

mikeytown2’s picture

Status: Needs review » Reviewed & tested by the community

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

mikeytown2’s picture

Status: Needs work » Needs review
FileSize
5.8 KB

Offsets where couple of hundred... re-submitting new patch with correct offsets now

axyjo’s picture

Some major change seems to have occurred on the 31st or the 1st that messed around with the offsets within the files.

mikeytown2’s picture

Status: Needs review » Reviewed & tested by the community
webchick’s picture

Status: Reviewed & tested by the community » Needs work

I too have had the need for this, mainly when performing long, expensive operations like data imports. Something like this would've helped me not have to write a separate script for importing Calais terms on a client site at #409630: Create a way to do mass-importing without batch API.

Here's a review of the patch, although Dries does not seem too hot on it, and his OK will be needed for this to get in. I too am not crazy about passing in modules to run via the URL, but with $cron_key there now this is a little less of an issue.

+++ cron.php	2 Aug 2009 04:07:09 -0000
@@ -14,7 +16,27 @@ define('DRUPAL_ROOT', getcwd());
+  if (isset($_GET['include'])) $include = $_GET['include'];
+  if (isset($_GET['exclude'])) $exclude = $_GET['exclude'];

Let's change this to use the ternary operator:

$include = (isset($_GET['include']) ? $_GET['include'] : '';

This would clean up the code below so it could just do if ($include && $exclude) { ... }

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3551,7 +3551,7 @@ function drupal_page_set_cache() {
+function drupal_cron_run($list = '', $mode = '', $delimiter = ',') {

Let's default these to NULL. Makes the code below cleaner. (if (!$list && !$mode) rather than $list == ''..)

Also, passing in a delimiter by reference seems a little funny. Is there a use case for this? If not, I'd just document "use commas" and be done with it.

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+      // No parameters were passed, so we're running all modules

End comments with a period.

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+    elseif($list != '' && $mode == 'include') {

Needs to be a space between the elseif and the (

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+      $module_list = module_list();

Why module_list() and not module_implements('cron')?

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+      foreach($run_array as $run_module) {
+	      module_invoke($run_module, 'cron');
+      	$cron_message .= t($run_module).', ';

module_invoke is indented too much, needs to be space between foreach and (

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+      	$cron_message .= t($run_module).', ';
+      }
+      // Remove the extra ', ' from the end of the string
+      $cron_message = substr($cron_message, 0, -2);

Yuck. :) How about simply build implode() on the $run_array array?

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+      if ($cron_message != '') {

Let's do if (!empty($cron_message)) which is more consistent with other places in core.

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+    elseif($list != '' && $mode == 'exclude') {

90% of the code above is re-copy/pasted here. I wonder if there's a way to handle this in the same loop above? If not, we may want to pull this out into a function that can be called in both places.

+++ includes/common.inc	2 Aug 2009 04:07:11 -0000
@@ -3583,13 +3583,59 @@ function drupal_cron_run() {
+      return false;

FALSE

+++ modules/system/system.test	2 Aug 2009 04:07:12 -0000
@@ -379,6 +379,14 @@ class CronRunTestCase extends DrupalWebT
+    // Run cron with an include parameter.
+    $this->drupalGet($base_url . '/cron.php', array('external' => TRUE, 'query' => array('cron_key' => $key, 'include' => 'system')));
+    $this->assertResponse(200);
+    
+    // Run cron with an exclude parameter.
+    $this->drupalGet($base_url . '/cron.php', array('external' => TRUE, 'query' => array('cron_key' => $key, 'exclude' => 'system')));
+    $this->assertResponse(200);

Let's test multiple include/exclude parameters as well.

Beer-o-mania starts in 23 days! Don't drink and patch.

axyjo’s picture

Status: Needs work » Needs review
FileSize
8.26 KB

Okay, here's a new and improved version:

* Takes care of all of the coding standards issues webchick was talking about (except one).
* Added more tests that check whether Drupal accepts invalid module names
* Added tests for multiple parameters (webchick's idea)
* Redid the code such that code isn't needlessly repeated. (again, webchick's idea)
* Added docblocks and fixed comments

I couldn't find the issue with "module_invoke is indented too much" though.

axyjo’s picture

Argh. Forgot to update HEAD before creating the patch. Hunks could be off.

moshe weitzman’s picture

+++ includes/common.inc	8 Aug 2009 22:21:58 -0000
@@ -3639,23 +3653,56 @@ function drupal_cron_run() {
     variable_set('cron_last', REQUEST_TIME);

Arguably, cron_last should now be an array keyed by module. It is kinda ambigous without that.

axyjo’s picture

Sounds like a logical idea. Done with a test.

mikeytown2’s picture

If going for an array, you need to change the status page code for cron last run. This is the only other place where I can find 'cron_last'.
http://api.drupal.org/api/function/system_requirements/7

Also to make reporting that easier, you might want to store the last run value under the key name 'cron' or 'drupal', or something like that. Pick a name that is a reserved name, so contrib will not be interfering.

axyjo’s picture

The way I did it was that I used max() in system_requirements. Also, when running cron for all modules, I individually set the time for each module so that if the module wants, it can depend on that variable for the accurate time it last ran.

Status: Needs review » Needs work

The last submitted patch failed testing.

axyjo’s picture

Status: Needs work » Needs review
FileSize
11.31 KB

Fixes exceptions.

tic2000’s picture

+++ includes/common.inc	10 Aug 2009 01:23:20 -0000
@@ -3610,11 +3610,25 @@ function drupal_page_set_cache() {
+ * If the function is called with no arguments, then cron for all modules is
+ * run. If only one argument is passed in, the result is FALSE. If both
+ * arguments are provided, then the function includes or excludes the specified
+ * modules in the first argument.

This comments need a little work. "If both arguments are provided, then the function includes or excludes the modules specified in the first argument." sounds better to me.

+++ includes/common.inc	10 Aug 2009 01:23:20 -0000
@@ -3639,23 +3653,63 @@ function drupal_cron_run() {
+    
+    // Create $cron_message so that it is defined.
+    $cron_message = '';
+    

Extra space on the empty lines. And is like that on many places in the patch.

+++ includes/common.inc	10 Aug 2009 01:23:20 -0000
@@ -3639,23 +3653,63 @@ function drupal_cron_run() {
+        $msg = implode(',', $run_array);

It should be ', ' for better readability of the output message (one space after the comma).

21 days to code freeze. Better review yourself.

axyjo’s picture

Fixed with suggestions from #62

mikeytown2’s picture

#63
+1 works for me!

Status: Needs review » Needs work

The last submitted patch failed testing.

axyjo’s picture

Status: Needs work » Needs review
FileSize
12.05 KB

Same patch, essentially, but changed after the poormanscron patch got in.

Status: Needs review » Needs work

The last submitted patch failed testing.

Dave Reid’s picture

Component: base system » cron system

Moving to new cron system component.

axyjo’s picture

Needs a rewrite because #578676: Use queue for cron is in.

robertDouglass’s picture

Here's more evidence that this functionality is a good idea.
http://metaltoad.com/blog/how-drupals-cron-killing-you-your-sleep-simple...

These graphs show what happens to a site that runs cron.php every six hours. Remember, system_cron() calls cache_clear_all(), and every time the cache is cleared the hit rate crashes to zero.

greg.harvey’s picture

Should this now be moved to D8? I notice un-committed D7 issues are slowly being moved to D8. Just a thought. (Plus subscribing.)

axyjo’s picture

Version: 7.x-dev » 8.x-dev

Yes. This is too late for core now.

Philw’s picture

Status: Needs work » Needs review

#43: drupal_cron-19173-43-axyjo.patch queued for re-testing.

Ignore this as you probably will. Mistakenly posted.

Apologies.

lpalgarvio’s picture

sub

amontero’s picture

thedavidmeister’s picture

Assigned: axyjo » Unassigned
Issue summary: View changes
Status: Needs review » Closed (works as designed)
Related issues: +#154043: Cron should not remain monolithic (Implement a scheduling hook for cron), +#1442434: Do not port Elysia Cron, recommend Ultimate Cron for Drupal 8

I've had a look at this patch, I've read over this thread and #154043: Cron should not remain monolithic (Implement a scheduling hook for cron) and #1442434: Do not port Elysia Cron, recommend Ultimate Cron for Drupal 8, I've personally used Elysia Cron in D6 and D7 on multiple sites for multiple use-cases and I really don't think this approach is the best (or even a good) solution to the problem.

Have a look at how modules like Elysia Cron and Ultimate Cron tackle this exact issue and we can see that putting an explicit schedule in a hook_cronapi() style configuration when the task is defined is the standard, rather than trying to mess with GET parameters in the URL to define what tasks are run for each execution of cron.

The EC/UC approach keeps the scheduling "inside Drupal" rather than requiring complex crontabs to be built - in the end we can build something more robust and easier for beginners to understand and for everyone to manage long term.

I'm closing this issue because I strongly believe the approach presented in patches in this thread is inferior to solutions provided elsewhere in contrib, and other outstanding core issues propose a better approach, but kudos to @robertDouglas for attempting to solve this problem right back in 2005!

I would hope and assume that the attitude of @Dries to finer-than-binary granularity in scheduled background tasks has changed to be more positive since 2005 and the official stance is no longer "you should be able to run all tasks every 2-5 minutes without any trouble", because that would have to be a joke for many sites in 2014 that schedule many different tasks that put rather varied loads on the server.

I'm using this review as #2094585: [policy, no patch] Core review bonus for #154043: Cron should not remain monolithic (Implement a scheduling hook for cron) as I think that's appropriate :)