Currently, the wget line included in INSTALL.txt instructs people to run their cron task on the full hour.

While that works, it also causes some not-so-nice spikes in our access graphs. During the 5-minute interval around the full hour we have about 150% more requests than in any other 5-minute interval.

I would therefore like to see this line changed.

Current line:
0 * * * * wget -O - -q -t 1 http://www.example.com/cron.php?cron_key=RANDOMTEXT

Suggestion:
[some number between 1 and 59] * * * * wget -O - -q -t 1 http://www.example.com/cron.php?cron_key=RANDOMTEXT
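
One way to pick that number without relying on a human's idea of "random" is to generate the line. The following is only a minimal sketch in Python: the helper name `hourly_crontab_line` is hypothetical, and the wget command is copied unchanged from the suggestion above, placeholder URL and cron key included.

```python
import random

# Command copied from the issue text; the URL and cron_key are placeholders.
WGET_COMMAND = "wget -O - -q -t 1 http://www.example.com/cron.php?cron_key=RANDOMTEXT"


def hourly_crontab_line(command=WGET_COMMAND, rng=random):
    """Return an hourly crontab entry that fires at a random minute from 1 to 59.

    Hypothetical helper for illustration; it is not part of Drupal.
    """
    minute = rng.randint(1, 59)  # skip 0 so the job never lands on the full hour
    return f"{minute} * * * * {command}"


if __name__ == "__main__":
    print(hourly_crontab_line())
```

The output still has to be pasted into the crontab by hand, so this only helps sites whose admins edit the crontab directly.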


Comments

Dave Reid’s picture

Or we could just remove this because of the poormanscron in D7 and just link to the cron documentation.

YesCT’s picture

Priority: Critical » Normal

Is this really critical? I don't think anything is "broken"...

mfer’s picture

Priority: Normal » Critical

@YesCT This is critical because drupal.org has an access spike. As Drupal grows this is going to get much more significant. We need to fix this before it breaks drupal.org every hour or requires a lot more infrastructure at that point in the hour (which costs money).

Damien Tournoud’s picture

All the update status requests are served 100% from Varnish. I don't believe this is critical.

catch’s picture

Priority: Critical » Normal

We can fix INSTALL.txt during any point release of Drupal 7, hence this doesn't block release, demoting from critical.

I think this should probably be replaced with a link to http://drupal.org/cron anyway (which also uses 0 in the example).

jhodgdon’s picture

Title: Improve wget instructions in INSTALL.txt » wget instructions in INSTALL.txt - suggest a different time for running cron
Priority: Normal » Critical

If you change it to some other number in the suggested crontab, won't it just make a spike at that other time?

Also, as the vast majority of Drupal sites are small sites on shared hosting, I doubt most of them are entering the crontab line directly. Most shared hosting accounts have a "control panel" that makes entering a cron job a bit easier, but doesn't give you fine control over when the jobs are run. So I am not convinced that changing this crontab line will really do much.

I'm also changing the subject line here. Changing the wget instructions in INSTALL.txt is not improving anything for the Drupal user. It is instead a (I think doomed) attempt to change the load at drupal.org.

Maybe a better idea would be to change the way the update module checks for updates -- for instance, does it really need to check every single cron run?

Garrett Albright’s picture

FileSize
1.44 KB

95% of Drupal sites don't need to run cron hourly (also, 86% of statistics, etc). And let's let the human decide on a random minute value.

Garrett Albright’s picture

Status: Active » Needs review

jhodgdon’s picture

Priority: Critical » Normal

Sorry I cross-posted on the priority.

dww’s picture

@jhodgdon: Re: "Maybe a better idea would be to change the way the update module checks for updates -- for instance, does it really need to check every single cron run?"

It doesn't. I believe the access spike Gerhard is talking about is the 1/24th of all sites that do their daily check for updates at the start of any given hour. We'd already have a much bigger problem if this was happening on every site every hour.

This issue is an attempt to spread out the requests even more by asking site admins to randomize which minute of each hour their cron run happens on so as to spread out the 1/24th of all sites hitting simultaneously to more like 1/1440th. ;)
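
The arithmetic behind those two fractions can be checked in a few lines. This is a back-of-the-envelope sketch, assuming the daily checks are spread evenly over the 24 hours and, in the randomized case, evenly over all 60 minute values (the original suggestion of 1 to 59 would give 59). The function name `share_per_slot` is made up for illustration.

```python
from fractions import Fraction

HOURS_PER_DAY = 24
MINUTES_PER_HOUR = 60


def share_per_slot(distinct_minutes):
    """Fraction of all sites whose daily check lands in one (hour, minute) slot.

    Assumes sites are spread evenly over the hours of the day and evenly
    over `distinct_minutes` different minute values within the hour.
    """
    return Fraction(1, HOURS_PER_DAY * distinct_minutes)


everyone_on_the_hour = share_per_slot(1)           # all sites use minute 0
random_minute = share_per_slot(MINUTES_PER_HOUR)   # minutes chosen at random

print(everyone_on_the_hour, random_minute, everyone_on_the_hour / random_minute)
# prints: 1/24 1/1440 60
```

So under these assumptions the best case is a 60-fold reduction of the per-minute peak, and only if every site actually picks its minute uniformly.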

However, I don't think this is the best solution to this problem. Seems like it'd be better to encourage people to configure cron to run every minute, then make hook_cron() implementations smarter about the frequency that they actually do their work vs. when they're invoked. Any code that's written to just blindly launch on every hook_cron() without its own throttle is already a bug, since folks can hit cron.php directly, etc. This way, update (status|manager) could do smarter things like add its own random minute offset for each "daily" update check. Then, it'd be truly random. This documentation change seems unlikely to have any noticeable effect on its own.
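
The idea in the previous paragraph (cron fires often, each task decides for itself whether it is due, and the daily check gets its own random offset) might look roughly like the sketch below. It is written in Python purely to show the logic; it is not Drupal code, and `ThrottledDailyTask`, `on_cron` and `installed_at` are hypothetical names, not an existing API.

```python
import random

DAY = 24 * 60 * 60  # seconds in a day


class ThrottledDailyTask:
    """A task that may be invoked on every cron run but works at most once a day.

    The time of day is fixed by a random per-site offset chosen once, so
    different sites end up doing their daily work at different moments.
    """

    def __init__(self, installed_at, rng=random):
        # Pick the per-site offset once; afterwards the schedule is stable.
        self.next_run = installed_at + rng.randrange(DAY)
        self.runs = 0

    def on_cron(self, now):
        """Called on every cron invocation; returns True if work was done."""
        if now < self.next_run:
            return False  # not due yet, so this invocation costs almost nothing
        self.runs += 1  # the real work (e.g. the update check) would go here
        # Advance past `now` so missed days are skipped rather than replayed.
        while self.next_run <= now:
            self.next_run += DAY
        return True


if __name__ == "__main__":
    task = ThrottledDailyTask(installed_at=0)
    for now in range(0, 7 * DAY, 60):  # cron invoked every minute for a week
        task.on_cron(now)
    print(task.runs)
```

Because the throttle lives in the task, it also protects against someone hitting cron.php directly, which a crontab change cannot do.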

jhodgdon’s picture

Hmmm.

Every minute... Well, take a look at search_cron(), which definitely does something on every cron run (though after the search index is built, that may not be a big deal, since it will realize it doesn't need to add any new content). I'm not sure how many other core (not to mention contrib) modules are also doing that, but given that the core modules are often used as a model, I would imagine a lot. So if we encouraged people to run cron every minute, I suspect that would require a lot of changes to a lot of modules.

Also, for the vast majority of sites running on shared hosting, many of the hosting control panels I've seen don't allow you to run cron. And I'm not convinced most small sites would need to run cron every minute, or gain any real benefit from the added server load incurred. So if we changed the advice in this way, I think maybe we would need to give different advice for different types of sites.

The more I think about it, the more I think the best course of action is to not even have the crontab in INSTALL.txt and instead point people to the Handbook page http://drupal.org/cron, which we can update more easily and cover more bases.

Garrett Albright’s picture

FileSize
1.96 KB

I concur that telling users to run cron every minute is a horrible idea and we should force dww to wear a silly hat for even suggesting it. That being said, as there's technically nothing stopping users from configuring this anyway, the idea of suggesting that implementations of hook_cron() keep their own internal timers to make sure expensive operations are not happening too frequently is sound. (I myself do this in PIRETS, which has very expensive cron tasks.)

Here's a patch which sends people to the online handbook.

dww’s picture

No, if we had a reasonable cron (and queue scheduling) API in core, running cron every minute to poll if there's anything to do makes perfect sense. I'm talking about how things should be, not necessarily how they are right now. ;) I said it might take a bit of work to fix all the broken hook_cron() implementations that blindly assume they're only being spawned every 1 hour. Perhaps it's too late in D7 to fix this. I certainly wasn't suggesting to document "run cron every minute" independently of fixing hook_cron() implementations. ;)

The bigger point is that if this issue is about trying to load balance when every Drupal site phones home to updates.d.o, relying on humans to "randomly" configure cron based on some updated documentation seems like an extremely weak "solution". I was laying out an architecture that would solve this (and potentially many other problems), although perhaps it's too late to make this work before 7.0. Depends on how seriously the infra team's concerns will be taken by the D7 maintainers, and/or if a better solution is proposed.

jhodgdon’s picture

dww: I think you're talking D8 material here. Separate issue?

I am in favor of the patch in #13 BUT only after the cron page has been updated with the correct information for D7. Currently it is suggesting that the URL is http://www.example.com/drupal/cron.php, which I think will not work for D7, correct?

catch’s picture

We have a nice queue system in core, along with a nice lock system, so it'd make sense to do an audit of cron implementations now anyway. This doesn't mean we should suggest this now (although every 10 minutes shouldn't hurt too much - at least if we fix system_cron() not to clear caches every time).

Dries’s picture

I'd prefer to keep basic documentation in INSTALL.txt instead of moving it to drupal.org. Sometimes people work in off-line mode, you know. :)

jhodgdon’s picture

Status: Needs review » Needs work

#17 Point taken.

In which case the patch in #13 is not OK.

dww’s picture

@catch: Re: "We have a nice queue system in core, along with a nice lock system"

Right, but we don't have a nice cron/queue scheduling system in core (yet). That's one of these issues, depending on your perspective:

#410656: Job scheduler
#154043: Cron should not remain monolithic (Implement a scheduling hook for cron)
#28797: Consolidate cron scheduling and administration

Anyway, point being I'm very pessimistic that this documentation change will have any measurable effect on the problem killes is trying to solve. If we actually think this is a real problem, we should work on a real solution...

jhodgdon’s picture

#19: Agreed. Again, keep in mind that in terms of numbers of sites, the vast majority are small sites running on shared hosting, who probably don't have (or exercise) fine-grained control of their crontabs -- they'll just be setting up a cron job from the hosting control panel to run hourly, daily, or whatever. For these sites, nothing you put in the crontab instructions is going to affect when they run cron.

Also, that section about cron in the INSTALL.txt is kind of a mess -- needs an edit -- and maybe it should say something useful to those not using crontab? My guess is that most of the people who actually are reading INSTALL.txt are in the camp of hosting control panels rather than command-line crontab, right?

Dave Reid’s picture

I'm still not sure *why* we need to have anything about setting up cron since we have poormanscron in core and enabled to run by default.

jhodgdon’s picture

Well, that would certainly resolve all the questions about the cron section if we just removed it. :)

Um. So. Why do we think some people might prefer to use real cron rather than poor man's? If there's a good reason, then we should state it in that section and document how to do it. But I really don't think we actually need to document how to set up a cron job in general, with crontab or control panel or whatever. We don't document how to use tar, gzip, or other basic system tools. In my opinion, what we should be documenting is:
a) Why you would want to set up cron instead of using poormanscron.
b) How to make sure poormanscron is being used (is it possible some install profiles would not enable it?), if you want to go that route.
c) What web page needs to be visited in the cron job, if you are using real cron.

catch’s picture

Agreed with jhodgdon except I'd prefer if we didn't go out of our way to recommend poormanscron. However in terms of randomizing requests to Drupal.org, what could be better!

Also while I appreciate offline docs sometimes too, I don't know which planet it is where people set up cron tabs on web servers without an internet connection...

Garrett Albright’s picture

"Also while I appreciate offline docs sometimes too, I don't know which planet it is where people set up cron tabs on web servers without an internet connection..."

Agreed. But given poormanscron in core, maybe we can meet in the middle with something in INSTALL.txt that says something like "Cron jobs will be taken care of for you, but for more control, check out the handbook page at drupal.org/cron." n00bs can skip over it without worry, but l33ts and l33ts-in-training can check out the page for more info.

DevElCuy’s picture

Update manager relies on a service provided by drupal.org, so what about changing the Update module logic? It could display a warning saying something like "For faster cron runs, change your cron settings to check for updates at a time other than minute 0; please read [link] for more information". I think something like this would spread the word faster.

DevElCuy’s picture

Component: documentation » update.module
Status: Needs work » Active

dww’s picture

Component: update.module » documentation
Category: bug » task

a) This is certainly not an update manager bug. ;)

b) I've already suggested what update manager *should* do to solve this problem, but it's going to require deeper changes than are probably going to be allowed at this stage in D7's life cycle.

c) The status report about when cron runs seems like the "right" place to add this check if you wanted a "warning" in the admin UI somewhere. I'm just not sure that's a good idea, nor what such a "warning" would actually say...