How do you run custom SQL statements at the beginning of the cron cycle?

Sam308 - November 27, 2008 - 17:04

I have a few heavily aggregated 6.x websites that use both the standard built-in Drupal aggregator and the FeedAPI aggregator modules. The problem is that I get cron errors that state "MySQL server has gone away query" and "Cron has been running for more than an hour and is most likely stuck."

From researching the forums, I found out that these problems can be solved by deleting the cron_semaphore and cron_last items from the variable table and deleting the variables item from the cache table within the database.

Below are the three SQL statements needed to perform the tasks:

Deletes the cron_semaphore variable from the variable table:
DELETE FROM `variable` WHERE name = 'cron_semaphore';

Deletes the cron_last variable from the variable table:
DELETE FROM `variable` WHERE name = 'cron_last';

Deletes the variables entry from the cache table (the key column in cache is cid):
DELETE FROM `cache` WHERE cid = 'variables';

In order to solve this problem, I need to run these custom SQL statements at the beginning of each cron cycle.

I am currently using the add-on module called SQL Crons, which lets you execute custom SQL statements with each cron run, but it seems that this module runs the custom statements at the end of the cron cycle.

Does anyone know an alternative way to run custom SQL statements at the beginning of the cron cycle?

Thanks in advance,

Sam

If you set the weight of SQL

gpk - November 27, 2008 - 18:28

If you set the weight of SQL cron module in the {system} table to be less than the lowest weight of all the other modules that implement the cron hook (-1 will probably do the trick) then it will run first.
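
One way to change the weight is a one-off query through Drupal 6's db_query(). This is only a sketch: the helper name custom_set_module_weight() is made up for illustration, and 'sql_cron' is an assumed machine name, so check the name column of {system} for the real one first.

```php
<?php
// Sketch only, for Drupal 6. custom_set_module_weight() is a hypothetical
// helper, not part of core or of the SQL Crons module.
function custom_set_module_weight($module, $weight) {
  // %d and '%s' are Drupal 6 db_query() placeholders; {system} is expanded
  // to the (possibly prefixed) system table name.
  return db_query("UPDATE {system} SET weight = %d WHERE name = '%s' AND type = 'module'", $weight, $module);
}
```

Call it once, for example custom_set_module_weight('sql_cron', -1); from a throwaway script that has bootstrapped Drupal. As far as I can tell, enabled modules are listed by weight and then by filename, so a weight below all the others should put that module's cron hook first.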

Note that by deleting cron_semaphore while cron.php is actually still running you might cause simultaneous cron runs to occur. At best this probably won't work cleanly and you'll get loads of error messages in watchdog. I'd suggest trying to find out what is causing cron to stick (having first run those 3 queries against the DB). The easiest way to do this might be to run it from the link on the status page - at least any errors should be reported on-screen. If however it just hangs then you'll have to dig deeper.

gpk
----
www.alexoria.co.uk

Thanks, I will give it a try

Sam308 - November 27, 2008 - 19:09

Thanks, I will give it a try.

Sam

BTW if cron is often timing

gpk - November 27, 2008 - 22:11

BTW if cron is often timing out then you may just need to make it run more often so that there's less to do each time. But you'd probably thought of that. ;)

gpk
----
www.alexoria.co.uk

RE: BTW if cron is often timing out

Sam308 - November 28, 2008 - 15:35

Thanks for the suggestion. In the past I tried running the cron jobs 2 and 3 times per hour, but this did not help. Currently, I am running the cron jobs once per hour.

I think the reason the server is having problems is because I have so many RSS feeds coming into the websites.

Site 1 has 1403 RSS feeds converted to nodes using the FeedAPI module. The newly created nodes are associated with over 800 Taxonomy terms.

Site 2 has 868 RSS feeds using the standard built-in Drupal aggregator.

I don't know of any other way around this or how to fix the cron error problems.

Thanks,

Sam

Ah yes I see what you mean,

gpk - November 28, 2008 - 16:46

Ah yes I see what you mean, with that many feeds running cron more frequently probably just makes things worse.

As you can see, http://api.drupal.org/api/file/cron.php/6/source is pretty simple. Quite possibly you are going to have to roll your own cron2.php script which is a bit cleverer. Exactly what will work will depend on your server and where your cron runs are getting stuck.

Some random thoughts: in your situation I'd try setting a cron run off and then monitoring the watchdog table to see how it's progressing with the feed updates, trying to notice if it times out after a certain time or anything like that. Is there anything in your error log?

http://api.drupal.org/api/function/drupal_cron_run/6 tries to increase the PHP execution time to 240 seconds, but that's probably nothing like enough, given that some of the feeds being processed will doubtless be slow to respond. You might have success by increasing the execution time further, or by hacking http://api.drupal.org/api/function/aggregator_cron/6 (or, better, making your own version in a custom module) to process only a certain number of feeds in a single cron run. You might also want to hack http://api.drupal.org/api/function/aggregator_refresh/6 during your testing so that in the 304 case (no new syndicated content) it generates a watchdog message as well as a DSM (drupal_set_message). The best way forward will depend a bit on the specifics of your situation though.

gpk
----
www.alexoria.co.uk

Thanks for the suggestions

Sam308 - November 28, 2008 - 17:08

Thanks for the suggestions. I do appreciate the time you have taken to help me out.

I will try some of your suggestions to see if any of these things will work.

I will first try to increase the cron time from 240 to 340 seconds by changing the code shown on the http://api.drupal.org/api/function/drupal_cron_run/6 page.

Then I will move on to the other items if this does not work.

I don't quite understand how to make my own version of http://api.drupal.org/api/function/aggregator_cron/6 to process only a certain number of feeds in a single cron run. I do understand a little PHP, but not enough to modify the piece of code on that page to process only a certain number of feeds per cron run. I don't see or understand how the system would know where to start, end, and pick up again on processing the RSS feeds on subsequent runs.

Thank you very much for all your help,

Sam

I'd strongly suggest also

gpk - November 28, 2008 - 20:15

I'd strongly suggest also trying to work out where the cron run is crashing out, and, if possible, why. --> error logs.

>I dont quite understand how to make my own version of aggregator_cron() to process only a certain number of feeds
Well yes it's probably slightly involved. First of all you'd have to bypass/not use cron.php at all (i.e. make sure you don't invoke it from anywhere!!) since I can't see any way to modify already-defined cron behaviour/hooks.

So copy cron.php to custom_cron.php, and in the latter change drupal_cron_run(); to module_invoke('custom_cron', 'run');

Now we need to create the module custom_cron. Create the folder sites/all/modules/custom_cron, then in it the file custom_cron.info, which will need the following (see http://drupal.org/node/231276):

; $Id$
name = Custom cron
description = "An alternative to the built-in cron. Heavily based on drupal_cron_run and the standard cron hooks, but customized so as to deal with large numbers of aggregator.module and feedapi.module feeds."
core = 6.x

And custom_cron.module (but omit the closing ?> tag)

<?php
// $Id$

/**
 * Perform a cron run, customised so that large numbers of feeds are dealt with reliably.
 *
 * This is basically a rehashed version of drupal_cron_run().
 */
function custom_cron_run() {
  // If not in 'safe mode', increase the maximum execution time:
  if (!ini_get('safe_mode')) {
    set_time_limit(240);
  }

  // Fetch the cron semaphore
  $semaphore = variable_get('cron_semaphore', FALSE);

  if ($semaphore) {
    if (time() - $semaphore > 3600) {
      // Either cron has been running for more than an hour or the semaphore
      // was not reset due to a database error.
      watchdog('cron', 'Cron has been running for more than an hour and is most likely stuck.', array(), WATCHDOG_ERROR);

      // Release cron semaphore
      variable_del('cron_semaphore');
    }
    else {
      // Cron is still running normally.
      watchdog('cron', 'Attempting to re-run cron while it is already running.', array(), WATCHDOG_WARNING);
    }
  }
  else {
    // Register shutdown callback
    register_shutdown_function('drupal_cron_cleanup');

    // Lock cron semaphore
    variable_set('cron_semaphore', time());

    // Iterate through the modules calling their cron handlers (if any).
    // We need to do this manually to customize cron handling for aggregator.module
    // and feedapi.module.
    foreach (module_implements('cron') as $module) {
      if ($module == 'aggregator' || $module == 'feedapi') {
        // Use our own replacements: custom_cron_aggregator() and custom_cron_feedapi().
        call_user_func('custom_cron_'. $module);
      }
      else {
        $function = $module .'_cron';
        call_user_func($function);
      }
    }

    // Record cron time
    variable_set('cron_last', time());
    watchdog('cron', 'Cron run completed.', array(), WATCHDOG_NOTICE);

    // Release cron semaphore
    variable_del('cron_semaphore');

    // Return TRUE so other functions can check if it did run successfully
    return TRUE;
  }
}

/**
 * Do a customized cron job for aggregator.module.
 *
 * This is basically a rehashed version of aggregator_cron().
 */
function custom_cron_aggregator() {
  // Process at most say 100 feeds at a time.
  $result = db_query_range('SELECT * FROM {aggregator_feed} WHERE checked + refresh < %d', time(), 0, 100);
  while ($feed = db_fetch_array($result)) {
    aggregator_refresh($feed);
  }
}
?>

Obviously function custom_cron_feedapi() also needs to be written, but hopefully this is enough to get you started. You may find 100 is too many, or too few. You will have to experiment.
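
On the question of how a capped run knows where to pick up again: it doesn't need to remember anything, as long as each processed feed gets its checked timestamp updated (which aggregator_refresh() appears to do). A feed that has just been refreshed no longer matches checked + refresh < now, so the next run naturally selects other feeds. The stand-alone sketch below models this with plain arrays instead of the {aggregator_feed} table; refresh_due_feeds() and the 250 feeds are made up purely for illustration.

```php
<?php
// Stand-alone model of the capped query; no Drupal required.
// Returns the fids handled in this run and updates 'checked' in place.
function refresh_due_feeds(array &$feeds, $now, $limit) {
  $done = array();
  foreach ($feeds as $fid => $feed) {
    if (count($done) >= $limit) {
      break;
    }
    // Same test as the SQL: checked + refresh < now.
    if ($feed['checked'] + $feed['refresh'] < $now) {
      // Assumption: refreshing a feed stamps it with the current time.
      $feeds[$fid]['checked'] = $now;
      $done[] = $fid;
    }
  }
  return $done;
}

// 250 hypothetical feeds, all due, each with a one-hour refresh interval.
$feeds = array();
for ($fid = 1; $fid <= 250; $fid++) {
  $feeds[$fid] = array('checked' => 0, 'refresh' => 3600);
}

$now = 10000;
$run1 = refresh_due_feeds($feeds, $now, 100);        // fids 1..100
$run2 = refresh_due_feeds($feeds, $now + 600, 100);  // fids 101..200
$run3 = refresh_due_feeds($feeds, $now + 1200, 100); // fids 201..250
echo count($run1), ' ', count($run2), ' ', count($run3), "\n"; // 100 100 50
```

One caveat: in this model every feed gets stamped. If a real feed fails in a way that leaves checked untouched, it stays due and will be picked up again on the next run.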

In some ways it might appear simpler just to hack Drupal core and also feedapi.module, but keeping track of your mods and reimplementing them when you upgrade your site with new security releases of core etc. is a pretty thankless task.

BTW none of this has been tested ...

gpk
----
www.alexoria.co.uk

RE: I'd strongly suggest also

Sam308 - November 29, 2008 - 02:32

I can't thank you enough for all your help. You have gone beyond the call of duty. You should get an award for being so helpful.

I will definitely take a look at the custom cron script modifications and go from there. This is a great step forward.

Since I increased the cron time from the default 240 seconds to 340 seconds, using the method you suggested earlier, I have not experienced any cron errors for the last 9 hours on either site, with cron scheduled to run every hour. So far there have been 9 consecutive successful cron runs for each of the two heavily aggregated 6.x websites.

Hopefully this will continue to work without cron errors.

By the way, over the last 9 cron runs for both sites, I kept the following three SQL statements in the site using the SQL Crons module. I don't know if they helped at all, but I guess I can try to remove them and see what happens.

Deletes the cron_semaphore variable from the variable table:
DELETE FROM `variable` WHERE name = 'cron_semaphore';

Deletes the cron_last variable from the variable table:
DELETE FROM `variable` WHERE name = 'cron_last';

Deletes the variables entry from the cache table (the key column in cache is cid):
DELETE FROM `cache` WHERE cid = 'variables';

This whole cron error thing has been a problem for me for months, and your solution seems to be the best one out there.

As a thanks for all your help, you can choose any software program I sell at my http://sam308.com or http://xlecom.com site, and I will send you a registration code for free. Just send an email to developer + @ + sam308.com and I will send you the registration code.

Thank you again,

Sam Raheb (Sam308)

Thanks, you're very kind,

gpk - November 29, 2008 - 14:03

Thanks, you're very kind, I'll have a look some time.

Maybe the SQL Crons trick is sufficient. It's unlikely that a previous cron will actually still be running after an hour, so it may be pretty safe after all to release the cron_semaphore etc. http://api.drupal.org/api/function/drupal_cron_run/6 actually releases the semaphore itself if it appears to have been running for more than an hour, but in this situation it won't actually do the run till the next invocation. Hence you would then get 2 hours between runs (or possibly 3 hours if timing is slightly awry), which gives the run more work to do --> more chance of it falling over.
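
To make the 2-hour versus 3-hour gap concrete, here is a small stand-alone model of the semaphore branch in drupal_cron_run(). It is a sketch for reasoning only: cron_decision() and simulate() are invented names, times are plain seconds starting from a crashed run at t=0, and the model pretends every run that does start finishes cleanly.

```php
<?php
// Simplified model of the semaphore check; not core code.
// Returns array($outcome, $new_semaphore).
function cron_decision($semaphore, $now) {
  // Strict comparison because this model uses t=0 as a valid lock time.
  if ($semaphore !== FALSE) {
    if ($now - $semaphore > 3600) {
      // Stale lock: it is released, but this invocation does no work.
      return array('released', FALSE);
    }
    // Lock still looks fresh: cron is assumed to be running already.
    return array('busy', $semaphore);
  }
  // No lock: the run happens (and, in this model, completes cleanly).
  return array('run', FALSE);
}

// A run that started at t=0 crashed and left its lock behind.
function simulate(array $times) {
  $semaphore = 0;
  $log = array();
  foreach ($times as $now) {
    list($outcome, $semaphore) = cron_decision($semaphore, $now);
    $log[$now] = $outcome;
  }
  return $log;
}

// Cron fires a second late each hour: released at 3601, runs at 7201.
print_r(simulate(array(3601, 7201)));
// Cron fires exactly on the hour: busy at 3600, released at 7200, runs at 10800.
print_r(simulate(array(3600, 7200, 10800)));
```

The second timeline is the awkward one: at exactly 3600 seconds the lock is not yet older than an hour, so one invocation is wasted on the warning and another on the release.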

I guess your main challenge is to keep everything ticking over nicely, and avoid accidentally invoking simultaneous cron runs, though it's hard to say whether that would do anything much worse than possibly (or not!) spew out error messages.

gpk
----
www.alexoria.co.uk
