The code in this module has a serious race condition:
1. Check time of last run
2. If > 1 hour, run cron
... time passes for cron to finish
3. Update db with current time.

This will result in multiple instances of the cron tasks running, depending on how many requests arrive between steps 1 and 3.

For example, say User 1 makes a page request and the script decides it is time to fire off the cron job.
For every other page request (from this user or anyone else) that arrives before this cron job finishes and the script falls through to update the time, *another* cron task will be started.

The solution is to use lock files (or, more appropriate for Drupal, a lock column in a db table - which also means you should move your last-run timestamp into the same table). So your check for the last run becomes:
SELECT lastrun FROM crontable WHERE lastrun < NOW()-3600 AND cronrunning=0
(horribly mysqlised - compute $then = time() - 3600; in PHP and use that instead of NOW()-3600).

If you get a row back, then you know that you *may* be running cron this time.
Next, you try to take the lock by updating the column:
UPDATE crontable SET lastrun=$timenow, cronrunning=1 WHERE cronrunning=0

If the number of affected rows == 1, then you got the lock; if it == 0, then some other process got the lock and this script can exit without running the cron tasks.
Finally, remember to set cronrunning back to 0 when the cron tasks finish.
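
In PHP terms, a minimal sketch of that flow might look like this (the crontable name and columns are illustrative, not an existing Drupal table):

// Illustrative only: assumes a crontable with lastrun (int) and
// cronrunning (int) columns.
$then = time() - 3600;

// 1. Is a run due, and does nobody currently hold the lock?
$due = db_result(db_query("SELECT COUNT(*) FROM {crontable} WHERE lastrun < %d AND cronrunning = 0", $then));

if ($due) {
  // 2. Try to take the lock. Only one request can flip cronrunning
  //    from 0 to 1, so the affected-row count tells us who won.
  db_query("UPDATE {crontable} SET lastrun = %d, cronrunning = 1 WHERE cronrunning = 0", time());
  if (db_affected_rows() == 1) {
    module_invoke_all('cron');   // run the cron tasks
    // 3. Release the lock once the tasks have finished.
    db_query("UPDATE {crontable} SET cronrunning = 0");
  }
  // 0 affected rows means another request got the lock first; just exit.
}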

If you want to run the cron tasks in the background, have PHP exec() a shell script containing "nohup wget -O - -q http://domain/cron.php &" (unix only) - if you find a wget equivalent for Windows you can use a batch script to launch it. The exec() should return almost immediately, giving control back to PHP so you can release the cron lock while the cron tasks continue to run in the background.
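
Inlined as a one-liner rather than a separate shell script, the launcher might look like this (the domain is a placeholder; redirecting the output is what lets exec() return straight away):

// Unix only: the shell backgrounds wget, so exec() returns immediately.
exec("nohup wget -O - -q http://domain/cron.php > /dev/null 2>&1 &");
// Control comes back to PHP here, so this request can release the cron
// lock and finish while cron.php carries on in the background.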

There is a risk that if the cron tasks take > 1 hour to complete then you will again have 2 simultaneous crons running, but then that is true of the regular cron system.

Hope this helps.

Paul.


Comments

greggles’s picture

Is this different from http://drupal.org/node/24001

I feel like this is a duplicate.

pgregg’s picture

It is different in that the other report appears to try to describe it in relation to a user session. The bug is the same; it's just that the other report's description is wrong.

James Harvard’s picture

Does PHP not have the ability to spawn a separate thread in which to execute some code? If so then the original thread (i.e. the user request) could continue and finish immediately, but the cron code could then run for as long as necessary in its new thread.

pgregg’s picture

Yes, I already described this in my original post: exec() the shell script with nohup and & to put it into the background.

James Harvard’s picture

Sorry, I actually meant a way to execute PHP code in a separate thread. As you pointed out, your exec() would only work on Unix. Also, would it work in a shared-hosting environment, or do ISPs disable exec()?

In the middleware I normally use (Lasso) one can very easily execute a block of code in a new thread. This is useful if you want to do something time-consuming (e.g. send a bunch of e-mails) and the current page doesn't need to wait for completion. However I can see that because of the way that (AFAIK) PHP is invoked for most web apps (embedded as an Apache module) this might not be possible.

Alternatively, is there a way for PHP to initiate an HTTP request for cron.php without waiting for the response? I assume that include('http://www.example.com/cron.php') would wait. Is there a lower-level TCP function that could be used to make the request but not wait for the response?
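
For what it's worth, a fire-and-forget request can be sketched with fsockopen() - the host and path here are placeholders, not part of any patch in this thread, and cron.php would need ignore_user_abort() so the run survives the dropped connection:

// Send the HTTP request and close the socket without reading the reply.
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 5);
if ($fp) {
  fwrite($fp, "GET /cron.php HTTP/1.0\r\nHost: www.example.com\r\n\r\n");
  fclose($fp);  // don't wait for the response
}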

dmuth’s picture

Title: Race Condition in the code. » I fixed this
Status: Active » Needs review

I got bit by a very real instance of this race condition wherein my site got hit by a bot which caused several copies of poormanscron to run at the same time. This in turn caused a higher load which slowed down those copies, which caused even more copies to get run on subsequent pageloads, which sent Apache into a death spiral.

So, I wrote a fix for this. In the process, since the code in poormanscron_exit() was a little complicated (and not commented so well), I refactored that function into a few smaller ones.

The fix uses MySQL's GET_LOCK() function, so for the time being it is not database-agnostic. Sorry.

Also, I added code to report how long the cron run took, and to print out details on each module's cron hook that gets executed.

Comments welcome!

-- Doug

dmuth’s picture

Title: I fixed this » Race Condition in the code.

Undoing my change to the title of the issue. I thought the title would be attached to my comment, and not the title of the bug. Sorry about that!

-- Doug

conkdg’s picture


I've attached a version of dmuth's module that's updated for the 4.7.x form API. It seems to work fine for me. I noticed that it doesn't register a cron run under the settings at http://www.yoursite.com/admin/settings but it does register all the info under watchdog.

Maybe someone out there knows how to fix it up so it registers under settings as well?

Cheers!

enxox’s picture

I need to solve the problem described, but the poormanscron_1.module doesn't work on my 4.7.5 site.
How can I obtain a release or a patch?
Thanks.

peterx’s picture

For PostgreSQL you could use a .install file to set up a table named poormanscron then use a table lock on that. You could also use a variable as a generic lock. I am using the following code in Drupal 5.0 and the code uses the MySQL lock when available. The code was tested first with the variable lock then with the MySQL lock.

Note that this code has limited testing and may not work on any computer that has a bit set to zero or any other value.

/* From PeterMoulding.com 2007
4/ Add lock processing.
*/
function poormanscron_lock() {
  if ($GLOBALS['db_type'] == 'mysql' or $GLOBALS['db_type'] == 'mysqli') {
    // Use MySQL's GET_LOCK(), keyed by conf_path() so each site in a
    // multisite install gets its own lock name. Wait at most 1 second.
    $query = db_query("SELECT GET_LOCK('poormanscron_lock_" . conf_path() . "', 1) AS get_lock");
    $result = db_fetch_array($query);
    if (empty($result['get_lock'])) {
      return FALSE;
    }
  }
  else {
    // Fallback for other databases: use a Drupal variable as the lock.
    $lock = variable_get('poormanscron_lock', FALSE);
    if ($lock) {
      return FALSE;
    }
    variable_set('poormanscron_lock', time());
  }
  return TRUE;
}

function poormanscron_unlock() {
  if ($GLOBALS['db_type'] == 'mysql' or $GLOBALS['db_type'] == 'mysqli') {
    db_query("SELECT RELEASE_LOCK('poormanscron_lock_" . conf_path() . "')");
  }
  else {
    variable_set('poormanscron_lock', FALSE);
  }
}
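
A caller would use these roughly like this (the cron invocation shown is illustrative):

// Only run the cron tasks if we obtained the lock, and always release it.
if (poormanscron_lock()) {
  module_invoke_all('cron');
  poormanscron_unlock();
}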

petermoulding.com/web_architect

Peter Bex’s picture

This patch ought to do the trick.

Simply reading the configuration variable is not enough because the configuration variables are all read at bootstrap time. Then the request is processed, which can take several seconds (or even minutes on very big sites). Only then will poormanscron be started, and in the meantime another request may be started which also reads in the variables (which are still unchanged) and also starts poormanscron after processing.

The correct solution is to lock the table, read the variable's current status, then make a decision (optionally writing a new value if poormanscron needs to be executed) and unlock the table.

Please at least do something about this; this ticket has been open for 2 years and it's still causing people massive problems. It brought our server to its knees because it started sending out many, many e-mails from the newsletter module on a very busy site.

Here's the patch, inline because I wasn't able to add an attachment (it simply didn't show up in my post):

===================================================================
--- modules/poormanscron/poormanscron.module    (revision 1522)
+++ modules/poormanscron/poormanscron.module    (working copy)
@@ -30,8 +30,14 @@
  */
 function poormanscron_exit() {
 
+  $lastrun = 0;
+  db_lock_table('variable');
+  $result = db_query("SELECT * FROM {variable} where name = 'poormanscron_interval'");
+  while ($variable = db_fetch_object($result)) {
+    $lastrun = unserialize($variable->value);
+  }
+
   // Calculate when the next poormanscron run is due.
-  $lastrun = variable_get('poormanscron_lastrun', 0);
   $nextrun = $lastrun + 60 * variable_get('poormanscron_interval', 60);
 
   // If the configured time has passed, start the next poormanscron run.
@@ -40,6 +46,7 @@
     // If this cron run fails to complete, wait a few minutes before retrying.
     variable_set('poormanscron_lastrun',
        $lastrun + 60 * variable_get('poormanscron_retry_interval', 10));
+    db_unlock_tables();
 
     // Get the current Drupal messages. Use drupal_set_message() so that
     // the messages aren't deleted in case the cron run fails and
@@ -91,6 +98,8 @@
       }
     }
 
+  } else {
+    db_unlock_tables();
   }
 }
Peter Bex’s picture

Trying again, as somehow the edits in my post do not show up when I log out. Looks like this bug-tracking system is broken.

This patch ought to do the trick.

Simply reading the configuration variable is not enough because the configuration variables are all read at bootstrap time. Then the request is processed, which can take several seconds (or even minutes on very big sites). Only then will poormanscron be started, and in the meantime another request may be started which also reads in the variables (which are still unchanged) and also starts poormanscron after processing.

The correct solution is to lock the table, read the variable's current status, then make a decision (optionally writing a new value if poormanscron needs to be executed) and unlock the table.

Please at least do something about this; this ticket has been open for 2 years and it's still causing people massive problems. It brought our server to its knees because it started sending out many, many e-mails from the newsletter module on a very busy site.

Here's the patch, inline because I wasn't able to add an attachment (it simply didn't show up in my post):

===================================================================
--- modules/poormanscron/poormanscron.module    (revision 1522)
+++ modules/poormanscron/poormanscron.module    (working copy)
@@ -30,8 +30,14 @@
  */
 function poormanscron_exit() {
 
+  $lastrun = 0;
+  db_lock_table('variable');
+  $result = db_query("SELECT * FROM {variable} where name = 'poormanscron_interval'");
+  while ($variable = db_fetch_object($result)) {
+    $lastrun = unserialize($variable->value);
+  }
+
   // Calculate when the next poormanscron run is due.
-  $lastrun = variable_get('poormanscron_lastrun', 0);
   $nextrun = $lastrun + 60 * variable_get('poormanscron_interval', 60);
 
   // If the configured time has passed, start the next poormanscron run.
@@ -40,6 +46,7 @@
     // If this cron run fails to complete, wait a few minutes before retrying.
     variable_set('poormanscron_lastrun',
        $lastrun + 60 * variable_get('poormanscron_retry_interval', 10));
+    db_unlock_tables();
 
     // Get the current Drupal messages. Use drupal_set_message() so that
     // the messages aren't deleted in case the cron run fails and
@@ -91,6 +98,8 @@
       }
     }
 
+  } else {
+    db_unlock_tables();
   }
 }
ericnielsen’s picture

I'm using poormanscron and simplenews with a 3000-subscriber list on a production server. I've had the 3-concurrent-cron-tasks problem several times, with each of my subscribers receiving 3 copies of each email message.

Peter, thanks for sharing your solution! I'm testing your code and it's working so far. It just needs one correction: the query should read the poormanscron_lastrun variable.

  $result = db_query("SELECT * FROM {variable} where name = 'poormanscron_lastrun'");

Greetings.

ericnielsen’s picture

Status: Needs review » Reviewed & tested by the community

Code above (#12 with the correction in #13) tested with simplenews using a 3000-subscriber list and sending 3 newsletters. Worked just fine!

peterx’s picture

Just a question or two.

How do Drupal cron runs handle race conditions? I have sites with cron running every minute and some cron runs occasionally exceeding a minute. They currently do not collide because of luck or my brilliant coding, with luck the far more likely answer. If they collide, they would copy themselves up their own input chutes and loop longer than Brad and Angelina. If standard Drupal cron stuff has a protection mechanism, we could reproduce it. If standard Drupal cron stuff does not use protection then we need to beat up the Drupal cron people, get them to build a standard something, and copy their something.

I am trying to avoid using the term doohickey here. Nothing more technical than doohickey.

Whatever we call it, there is justification for a standard approach to locks, race conditions, and similar curses in Drupal. The ideal would be a system-level lock function and a standard way of naming tables by module. The example module could automatically get a table named example purely for a module-related lock. Poormanscron should probably share a cron table with cron and use cron functions to control processing.

There is no reason why poormanscron should run only as an alternative to cron. In sites where cron runs every hour and we need something more current, poormanscron could run alongside cron to provide up-to-date processing, so long as they share the same processing lock.

dubitable’s picture


I couldn't get the inline patch to work without "malformed patch" errors. So, I made the changes manually, and created my own patch, which incorporates the inline patch from #12 with the tweak from #13. Please take a look and confirm that this is correct.

DD

RobLoach’s picture

This will run an extra database query on every page refresh. Is this the desired effect?

Peter Bex’s picture

@Rob: It's the simplest way to reduce the race condition boundaries, because there can be a very long time lag between the bootstrap process where the variables are read in and the time cron is invoked.

Theoretically it doesn't even solve the race condition 100% because two processes can do the SELECT simultaneously. If you have a site that's that high-traffic, I'm afraid the conclusion is you shouldn't be using poormanscron, but real system cron.

Kevin Rogers’s picture


As a simple alternative, would this patch do? Rather than introduce more SQL, setting the poormanscron_lastrun variable to time() + 60 * interval, rather than $lastrun + 60 * interval, minimises the problem.

The current last-run-time behaviour is probably a bug - if the last run was 2 hours ago (assume a low-traffic site), the last-run time is only updated to 1 hour 50 minutes ago ($lastrun plus the 10-minute retry interval). Any subsequent request that comes in before the first cron run finishes will trigger a second run...

In particular, the maximum script execution time for PHP is set to 30 seconds by default, so a script should generally have finished well within 10 minutes. The only problem with this code is if a second (or third) request comes in and executes variable_get('poormanscron_lastrun', 0); before the first request has executed variable_set('poormanscron_lastrun', $t + 60 * variable_get('poormanscron_retry_interval', 10));

I suspect it would be near impossible to eliminate race conditions entirely - there is always a very slim chance that more than one request will come in within a fraction of a second of another, leading to more than one job running at very similar times.
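
The change being described amounts to something like this (a sketch, not the attached patch itself):

// Base the provisional next-run time on now rather than on the stale
// $lastrun, so a long-overdue run cannot immediately trigger another.
variable_set('poormanscron_lastrun',
  time() + 60 * variable_get('poormanscron_retry_interval', 10));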

monti’s picture

This is an old discussion, but I just ran into this issue.

Two questions:
i) How come these fixes are not integrated into the module?

ii) The original post shows the race condition:

1. Check time of last run
2. If > 1 hour, run cron
... time passes for cron to finish
3. Update db with current time

Wouldn't it be easier to change the order, so that the db is updated before cron runs, this way:
1. Check time of last run
2. If > 1 hour, set a flag
3. Update db with current time
4. If the flag is on, run cron. Time passes for cron to finish. Set the flag back to off.

The select statement, the test (is it > 1 hour?), and the update are all done under a single lock.
I might be wrong (if you think I am, please explain), but I like to keep it simple.
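
As a rough sketch of that ordering (the lock and flag handling here are illustrative, not working module code):

// Decide and update the timestamp inside one critical section, then run
// the long cron tasks outside it.
$run_cron = FALSE;
// ... acquire lock ...
if (time() - $lastrun > 3600 && !$flag) {
  $flag = TRUE;            // 2. set the flag
  $lastrun = time();       // 3. update the db timestamp straight away
  $run_cron = TRUE;
}
// ... release lock ...
if ($run_cron) {
  module_invoke_all('cron');  // 4. run cron (time passes here)
  $flag = FALSE;              //    then clear the flag
}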

Peter Bex’s picture

Kevin's suggestion sounds excellent. I haven't checked to see if cron indeed acts this way but it sounds like a very real bug.

In practice you won't see it that much because a site is usually being hit constantly by web search crawlers and users, but it's good to fix it anyway! If you are using Drupal on a low-traffic intranet this bug could certainly manifest itself.

Monti's idea sounds like a good simplification.

This ticket is probably still open because poormanscron has gone for a while without a maintainer; the last development snapshot is from 2007! Also, see #426504: Request to help co-maintain Poormanscron.

Dave Reid’s picture

Version: 7.x-2.x-dev » 6.x-1.x-dev