Closed (fixed)
Project:
Hostmaster (Aegir)
Version:
6.x-1.2
Component:
Code
Priority:
Critical
Category:
Bug report
Assigned:
Issue tags:
Reporter:
Created:
18 Jul 2011 at 15:39 UTC
Updated:
30 May 2014 at 10:04 UTC
Comments
Comment #1
anarcat commented
Do you see something in the watchdog?
Is it just the site, or does Aegir also believe that cron hasn't run?
Comment #2
omega8cc commented
When running it manually, it always displays weird info for hosting-cron:
found running tasks, starting 1 out of 1 items:
And this is on a fresh install, with one site created.
Comment #3
omega8cc commented
This time it is also Aegir. It hasn't run cron for the sites for, say, 1 day, and every site node correctly shows that cron was last run 1 day ago, including the hostmaster site.
Comment #4
anarcat commented
Can you try running hosting-cron --debug to see what happens there?
Comment #5
omega8cc commented
Hah, it shows more:
Comment #6
omega8cc commented
Now, after I ran
drush hosting-cron --debug
on this server with over 110 sites hosted, the cron queue unlocked and is running again after being locked for 2 days and 2 hours. I will monitor the web server log to see when/if it stops again.
Comment #7
anarcat commented
Okay, let us know how that goes.
For the record, ever since we implemented that patch at koumbit, cron has run reliably everywhere, and the frontend and the sites have at least stayed consistent whenever cron wouldn't run for some reason.
Comment #8
omega8cc commented
OK, that was quick.
When the queue hits the first site that is switched to offline mode (so error 403), it gets stuck again and tries to run cron *only for this site*, on every new run, forever:
Comment #9
anarcat commented
I see, this is indeed what the code does.
If you try to run more than one task at a time, a single broken site will not block the queue anymore, so that's a first workaround. Oddly enough, disabling the site will also fix that problem.
Note that this was a solution to another fairly annoying bug (but not as critical): #1197048: cron uses external command; cron only run if canonical hostname exists.
I'll work on a patch now.
Comment #10
anarcat commented
Please try commit [f46c7ce63d761ee933d548dab462dfa850086789], which I just pushed to head; it should fix the problem.
Comment #11
omega8cc commented
This doesn't help at all. The queue is still locked and runs the same site's cron with the same error 403, every minute.
I don't think it is related to the number of crons set to run at once. There is no such setting for cron queue in Aegir anyway, only for tasks queue. In this case it normally runs cron for 20 sites at every run, when I set the frequency to 1 minute. Yet, it doesn't help.
I believe that we need to properly handle any non-200 response to fix this issue.
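The failure mode described here can be illustrated with a small model. This is only a sketch, not Hostmaster's code: the `Site` class, the `run_queue` function, and the `record_failures` flag are hypothetical names, and it assumes the queue always picks the least recently attempted sites and, in the broken case, only records an attempt when the response is 200.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    status: int            # simulated HTTP status returned by the site's cron URL
    last_attempt: int = 0  # logical time of the last recorded cron attempt


def run_queue(sites, batch_size, now, record_failures):
    """Run 'cron' for the least recently attempted sites (hypothetical model).

    With record_failures=False, a non-200 site never gets its timestamp
    updated, so it stays at the front of the queue on every run.
    """
    batch = sorted(sites, key=lambda s: s.last_attempt)[:batch_size]
    for site in batch:
        if site.status == 200 or record_failures:
            site.last_attempt = now
    return [s.name for s in batch]


def simulate(runs, record_failures):
    """Run the queue `runs` times, one site per run, and list the sites picked."""
    sites = [Site("offline", 403), Site("a", 200), Site("b", 200)]
    picked = []
    for now in range(1, runs + 1):
        picked += run_queue(sites, 1, now, record_failures)
    return picked


print(simulate(3, record_failures=False))  # the 403 site is picked every time
print(simulate(4, record_failures=True))   # the queue moves past the 403 site
```

In this model, recording the attempt regardless of the response status is what lets the queue move on to the next site.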
Comment #12
anarcat commented
I committed a new fix for this.
Comment #13
omega8cc commented
This commit http://drupalcode.org/project/hostmaster.git/patch/cb0c2d2 fixes the issue.
Now the queue simply continues after hitting any site with a 403 response.
Thanks!
Comment #14
anarcat commented
Merged into 1.x.
Comment #15
anarcat commented
For those stumbling upon this problem in 1.2, you can apply this patch directly:
http://drupalcode.org/project/hostmaster.git/patch/734d49b07e4dc174bcd07...