The code in this module has a serious race condition:
1. Check time of last run
2. If > 1 hour, run cron
... time passes for cron to finish
3. Update db with current time.

This will result in multiple instances of the cron tasks running, depending on how many requests arrive between steps 1 and 3.

For example, say User 1 makes a page request and the script decides it is time to fire off the cron job.
For every other page request (from this user or anyone else) that arrives before this cron job finishes and the script falls through to update the time, *another* cron task will be started.

The solution is to use lock files (or, more appropriate for Drupal, a lock column in a db table - which also means you should move your last-run timestamp into the same table). So your check for the last run becomes:
SELECT lastrun FROM crontable WHERE lastrun < NOW()-3600 AND cronrunning=0
(horribly mysqlised - compute $then = time() - 3600; in PHP and use that instead of NOW()-3600).

If you get a row back, then you know that you *may* be running cron this time.
Next, you try to take the lock by updating the column:
UPDATE crontable SET lastrun=$timenow, cronrunning=1 WHERE cronrunning=0

If the number of affected rows == 1, then you got the lock; if it == 0, then some other process got the lock and this script can exit without running the cron tasks.
Finally, remember to set cronrunning back to 0 when the cron tasks finish.
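
In PHP terms, a minimal sketch of that flow might look like this (the crontable name and columns are illustrative, not an existing Drupal table):

// Illustrative only: assumes a crontable with lastrun (int) and
// cronrunning (int) columns.
$then = time() - 3600;

// 1. Is a run due, and does nobody currently hold the lock?
$due = db_result(db_query("SELECT COUNT(*) FROM {crontable} WHERE lastrun < %d AND cronrunning = 0", $then));

if ($due) {
  // 2. Try to take the lock. Only one request can flip cronrunning
  //    from 0 to 1, so the affected-row count tells us who won.
  db_query("UPDATE {crontable} SET lastrun = %d, cronrunning = 1 WHERE cronrunning = 0", time());
  if (db_affected_rows() == 1) {
    module_invoke_all('cron');   // run the cron tasks
    // 3. Release the lock once the tasks have finished.
    db_query("UPDATE {crontable} SET cronrunning = 0");
  }
  // 0 affected rows means another request got the lock first; just exit.
}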

If you want to run the cron tasks in the background, have PHP exec() a shell script containing "nohup wget -O - -q http://domain/cron.php &" (unix only) - if you find a wget equivalent for Windows you can use a batch script to launch it. The exec() should return almost immediately, giving control back to PHP so you can release the cron lock while the cron tasks continue to run in the background.
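
Inlined as a one-liner rather than a separate shell script, the launcher might look like this (the domain is a placeholder; redirecting the output is what lets exec() return straight away):

// Unix only: the shell backgrounds wget, so exec() returns immediately.
exec("nohup wget -O - -q http://domain/cron.php > /dev/null 2>&1 &");
// Control comes back to PHP here, so this request can release the cron
// lock and finish while cron.php carries on in the background.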

There is a risk that if the cron tasks take > 1 hour to complete then you will again have 2 simultaneous crons running, but then that is true of the regular cron system.

Hope this helps.

Paul.


Comments

greggles’s picture

Is this different from http://drupal.org/node/24001

I feel like this is a duplicate.

pgregg’s picture

It is different in that the other report appears to try to describe it in relation to a user session. The bug is the same; it's just that the other report's description is wrong.

James Harvard’s picture

Does PHP not have the ability to spawn a separate thread in which to execute some code? If so then the original thread (i.e. the user request) could continue and finish immediately, but the cron code could then run for as long as necessary in its new thread.

pgregg’s picture

Yes, I already described this in my original post: exec() the shell script with nohup and & to put it into the background.

James Harvard’s picture

Sorry, I actually meant a way to execute PHP code in a separate thread. As you pointed out, your exec() would only work on Unix. Also, would it work in a shared-hosting environment, or do ISPs disable exec()?

In the middleware I normally use (Lasso) one can very easily execute a block of code in a new thread. This is useful if you want to do something time-consuming (e.g. send a bunch of e-mails) and the current page doesn't need to wait for completion. However I can see that because of the way that (AFAIK) PHP is invoked for most web apps (embedded as an Apache module) this might not be possible.

Alternatively, is there a way for PHP to initiate an HTTP request for cron.php without waiting for the response? I assume that include('http://www.example.com/cron.php') would wait. Is there a lower-level TCP function that could be used to make the request but not wait for the response?
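
For what it's worth, a fire-and-forget request can be sketched with fsockopen() - the host and path here are placeholders, not part of any patch in this thread, and cron.php would need ignore_user_abort() so the run survives the dropped connection:

// Send the HTTP request and close the socket without reading the reply.
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 5);
if ($fp) {
  fwrite($fp, "GET /cron.php HTTP/1.0\r\nHost: www.example.com\r\n\r\n");
  fclose($fp);  // don't wait for the response
}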

dmuth’s picture

Title: Race Condition in the code. » I fixed this
Status: Active » Needs review

I got bit by a very real instance of this race condition wherein my site got hit by a bot which caused several copies of poormanscron to run at the same time. This in turn caused a higher load which slowed down those copies, which caused even more copies to get run on subsequent pageloads, which sent Apache into a death spiral.

So, I wrote a fix for this. In the process, since the code in poormanscron_exit() was a little complicated (and not commented so well), I refactored that function into a few smaller ones.

The fix uses MySQL's GET_LOCK() function, so for the time being it is not database-agnostic. Sorry.

Also, I added code to report how long the cron run took, and to print out details on each module's cron hook that gets executed.

Comments welcome!

-- Doug

dmuth’s picture

Title: I fixed this » Race Condition in the code.

Undoing my change to the title of the issue. I thought the title would be attached to my comment, and not the title of the bug. Sorry about that!

-- Doug

conkdg’s picture


I've attached a version of dmuth's module that's updated for the 4.7.x form API. It seems to work fine for me. I noticed that it doesn't register a cron run under the settings at http://www.yoursite.com/admin/settings but it does register all the info under watchdog.

Maybe someone out there knows how to fix it up so it registers under settings as well?

Cheers!

enxox’s picture

I need to solve the problem described, but the poormanscron_1.module doesn't work on my 4.7.5 site.
How can I obtain a release or a patch?
Thanks.

peterx’s picture

For PostgreSQL you could use a .install file to set up a table named poormanscron then use a table lock on that. You could also use a variable as a generic lock. I am using the following code in Drupal 5.0 and the code uses the MySQL lock when available. The code was tested first with the variable lock then with the MySQL lock.

Note that this code has limited testing and may not work on any computer that has a bit set to zero or any other value.

/* From PeterMoulding.com 2007
4/ Add lock processing.
*/
function poormanscron_lock() {
  if ($GLOBALS['db_type'] == 'mysql' or $GLOBALS['db_type'] == 'mysqli') {
    // Use MySQL's GET_LOCK(), keyed by conf_path() so each site in a
    // multisite install gets its own lock name. Wait at most 1 second.
    $query = db_query("SELECT GET_LOCK('poormanscron_lock_" . conf_path() . "', 1) AS get_lock");
    $result = db_fetch_array($query);
    if (empty($result['get_lock'])) {
      return FALSE;
    }
  }
  else {
    // Fallback for other databases: use a Drupal variable as the lock.
    $lock = variable_get('poormanscron_lock', FALSE);
    if ($lock) {
      return FALSE;
    }
    variable_set('poormanscron_lock', time());
  }
  return TRUE;
}

function poormanscron_unlock() {
  if ($GLOBALS['db_type'] == 'mysql' or $GLOBALS['db_type'] == 'mysqli') {
    db_query("SELECT RELEASE_LOCK('poormanscron_lock_" . conf_path() . "')");
  }
  else {
    variable_set('poormanscron_lock', FALSE);
  }
}
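
A caller would use these roughly like this (the cron invocation shown is illustrative):

// Only run the cron tasks if we obtained the lock, and always release it.
if (poormanscron_lock()) {
  module_invoke_all('cron');
  poormanscron_unlock();
}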

petermoulding.com/web_architect

Peter Bex’s picture

This patch ought to do the trick.

Simply reading the configuration variable is not enough because the configuration variables are all read at bootstrap time. Then the request is processed, which can take several seconds (or even minutes on very big sites). Only then will poormanscron be started, and in the meantime another request may be started which also reads in the variables (which are still unchanged) and also starts poormanscron after processing.

The correct solution is to lock the table, read the variable's current status, then make a decision (optionally writing a new value if poormanscron needs to be executed) and unlock the table.

Please at least do something about this; this ticket has been open for 2 years and it's still causing people massive problems. It brought our server to its knees because it started sending out many, many e-mails from the newsletter module on a very busy site.

Here's the patch, inline because I wasn't able to add an attachment (it simply didn't show up in my post):

===================================================================
--- modules/poormanscron/poormanscron.module    (revision 1522)
+++ modules/poormanscron/poormanscron.module    (working copy)
@@ -30,8 +30,14 @@
  */
 function poormanscron_exit() {
 
+  $lastrun = 0;
+  db_lock_table('variable');
+  $result = db_query("SELECT * FROM {variable} where name = 'poormanscron_interval'");
+  while ($variable = db_fetch_object($result)) {
+    $lastrun = unserialize($variable->value);
+  }
+
   // Calculate when the next poormanscron run is due.
-  $lastrun = variable_get('poormanscron_lastrun', 0);
   $nextrun = $lastrun + 60 * variable_get('poormanscron_interval', 60);
 
   // If the configured time has passed, start the next poormanscron run.
@@ -40,6 +46,7 @@
     // If this cron run fails to complete, wait a few minutes before retrying.
     variable_set('poormanscron_lastrun',
        $lastrun + 60 * variable_get('poormanscron_retry_interval', 10));
+    db_unlock_tables();
 
     // Get the current Drupal messages. Use drupal_set_message() so that
     // the messages aren't deleted in case the cron run fails and
@@ -91,6 +98,8 @@
       }
     }
 
+  } else {
+    db_unlock_tables();
   }
 }
Peter Bex’s picture

Trying again, as somehow the edits in my post do not show up when I log out. Looks like this bug-tracking system is broken.

This patch ought to do the trick.

Simply reading the configuration variable is not enough because the configuration variables are all read at bootstrap time. Then the request is processed, which can take several seconds (or even minutes on very big sites). Only then will poormanscron be started, and in the meantime another request may be started which also reads in the variables (which are still unchanged) and also starts poormanscron after processing.

The correct solution is to lock the table, read the variable's current status, then make a decision (optionally writing a new value if poormanscron needs to be executed) and unlock the table.

Please at least do something about this; this ticket has been open for 2 years and it's still causing people massive problems. It brought our server to its knees because it started sending out many, many e-mails from the newsletter module on a very busy site.

Here's the patch, inline because I wasn't able to add an attachment (it simply didn't show up in my post):

===================================================================
--- modules/poormanscron/poormanscron.module    (revision 1522)
+++ modules/poormanscron/poormanscron.module    (working copy)
@@ -30,8 +30,14 @@
  */
 function poormanscron_exit() {
 
+  $lastrun = 0;
+  db_lock_table('variable');
+  $result = db_query("SELECT * FROM {variable} where name = 'poormanscron_interval'");
+  while ($variable = db_fetch_object($result)) {
+    $lastrun = unserialize($variable->value);
+  }
+
   // Calculate when the next poormanscron run is due.
-  $lastrun = variable_get('poormanscron_lastrun', 0);
   $nextrun = $lastrun + 60 * variable_get('poormanscron_interval', 60);
 
   // If the configured time has passed, start the next poormanscron run.
@@ -40,6 +46,7 @@
     // If this cron run fails to complete, wait a few minutes before retrying.
     variable_set('poormanscron_lastrun',
        $lastrun + 60 * variable_get('poormanscron_retry_interval', 10));
+    db_unlock_tables();
 
     // Get the current Drupal messages. Use drupal_set_message() so that
     // the messages aren't deleted in case the cron run fails and
@@ -91,6 +98,8 @@
       }
     }
 
+  } else {
+    db_unlock_tables();
   }
 }
ericnielsen’s picture

I'm using poormanscron and simplenews with a 3000-subscriber list on a production server. I've had the 3-concurrent-cron-tasks problem several times, with each of my subscribers receiving 3 copies of each email message.

Peter, thanks for sharing your solution! I'm testing your code and it's working so far. It just needs one correction: the query should read the poormanscron_lastrun variable.

  $result = db_query("SELECT * FROM {variable} where name = 'poormanscron_lastrun'");

Greetings.

ericnielsen’s picture

Status: Needs review » Reviewed & tested by the community

Code above (#12 with the correction in #13) tested with simplenews using a 3000-subscriber list and sending 3 newsletters. Worked just fine!

peterx’s picture

Just a question or two.

How do Drupal cron runs handle race conditions? I have sites with cron running every minute and some cron runs occasionally exceeding a minute. They currently do not collide because of luck or my brilliant coding, with luck the far more likely answer. If they collide, they would copy themselves up their own input chutes and loop longer than Brad and Angelina. If standard Drupal cron stuff has a protection mechanism, we could reproduce it. If standard Drupal cron stuff does not use protection then we need to beat up the Drupal cron people, get them to build a standard something, and copy their something.

I am trying to avoid using the term doohickey here. Nothing more technical than doohickey.

Whatever we call it, there is justification for a standard approach to locks, race conditions, and similar curses in Drupal. The ideal would be a system-level lock function and a standard way of naming tables by module. The example module could automatically get a table named example purely for a module-related lock. Poormanscron should probably share a cron table with cron and use cron functions to control processing.

There is no reason why poormanscron should run only as an alternative to cron. In sites where cron runs every hour and we need something more current, poormanscron could run alongside cron to provide up-to-date processing, so long as they share the same processing lock.

dubitable’s picture


I couldn't get the inline patch to work without "malformed patch" errors. So, I made the changes manually, and created my own patch, which incorporates the inline patch from #12 with the tweak from #13. Please take a look and confirm that this is correct.

DD

RobLoach’s picture

This will run an extra database query on every page refresh. Is this the desired effect?

Peter Bex’s picture

@Rob: It's the simplest way to reduce the race condition boundaries, because there can be a very long time lag between the bootstrap process where the variables are read in and the time cron is invoked.

Theoretically it doesn't even solve the race condition 100% because two processes can do the SELECT simultaneously. If you have a site that's that high-traffic, I'm afraid the conclusion is you shouldn't be using poormanscron, but real system cron.

Kevin Rogers’s picture


As a simple alternative, would this patch do? Rather than introduce more SQL, setting the poormanscron_lastrun variable to time() + 60 * interval, rather than $lastrun + 60 * interval, minimises the problem.

The current last-run-time behaviour is probably a bug - if the last run was 2 hours ago (assume a low-traffic site), the last-run time is only updated to 1 hour 50 minutes ago ($lastrun plus the 10-minute retry interval). Any subsequent request that comes in before the first cron run finishes will trigger a second run...

In particular, the maximum script execution time for PHP is set to 30 seconds by default, so a script should generally have finished well within 10 minutes. The only problem with this code is if a second (or third) request comes in and executes variable_get('poormanscron_lastrun', 0); before the first request has executed variable_set('poormanscron_lastrun', $t + 60 * variable_get('poormanscron_retry_interval', 10));

I suspect it would be near impossible to eliminate race conditions entirely - there is always a very slim chance that more than one request will come in within a fraction of a second of another, leading to more than one job running at very similar times.
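
The change being described amounts to something like this (a sketch, not the attached patch itself):

// Base the provisional next-run time on now rather than on the stale
// $lastrun, so a long-overdue run cannot immediately trigger another.
variable_set('poormanscron_lastrun',
  time() + 60 * variable_get('poormanscron_retry_interval', 10));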

monti’s picture

This is an old discussion, but I just ran into this issue.

Two questions:
i) How come these fixes are not integrated into the module?

ii) The original post shows the race condition:

1. Check time of last run
2. If > 1 hour, run cron
... time passes for cron to finish
3. Update db with current time

Wouldn't it be easier to change the order, so that the db is updated before cron runs, this way:
1. Check time of last run
2. If > 1 hour, set a flag
3. Update db with current time
4. If the flag is on, run cron. Time passes for cron to finish. Set the flag back to off.

The select statement, the test (is it > 1 hour?), and the update are all done under a single lock.
I might be wrong (if you think I am, please explain), but I like to keep it simple.
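
As a rough sketch of that ordering (the lock and flag handling here are illustrative, not working module code):

// Decide and update the timestamp inside one critical section, then run
// the long cron tasks outside it.
$run_cron = FALSE;
// ... acquire lock ...
if (time() - $lastrun > 3600 && !$flag) {
  $flag = TRUE;            // 2. set the flag
  $lastrun = time();       // 3. update the db timestamp straight away
  $run_cron = TRUE;
}
// ... release lock ...
if ($run_cron) {
  module_invoke_all('cron');  // 4. run cron (time passes here)
  $flag = FALSE;              //    then clear the flag
}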

Peter Bex’s picture

Kevin's suggestion sounds excellent. I haven't checked to see if cron indeed acts this way but it sounds like a very real bug.

In practice you won't see it that much because a site is usually being hit constantly by web search crawlers and users, but it's good to fix it anyway! If you are using Drupal on a low-traffic intranet this bug could certainly manifest itself.

Monti's idea sounds like a good simplification.

This ticket is probably still open because poormanscron has gone for a while without a maintainer; the last development snapshot is from 2007! Also, see #426504: Request to help co-maintain Poormanscron.

Dave Reid’s picture

Version: 7.x-2.x-dev » 6.x-1.x-dev