add cron job to restart apache every N hours? [#135176]

so, once again, d.o was dying with "server not responding", etc. lots of folks were complaining in IRC, and it was dying for me, too. so, i just restarted apache on both d1 and d2. i've had to do this quite a bit in the last few weeks. :( also, i notice that the webmaster issue queue and contact page are getting hit a lot with reports of internal server errors on the project download pages and when replying to issues. those problems also seem to go away after an apache restart.

i know it'd be nice to understand what's actually going wrong and fix the underlying problem(s). however, until that time, what about a cron job on d1 and d2 that just restarts apache every N hours (N==24? 12?) automatically? i have interest and ability to make this happen, i just need someone else (Dries, killes, etc) to agree it's a good idea.

thanks,
-derek

Comments

Comment #1

dww

we/he/they

commented 10 April 2007 at 18:44

update: d.o fell over *again* 15 minutes after i restarted apache. just kicked it again. we desperately need to figure out what's going on here. ;)

Comment #2

david strauss

he/him

commented 22 May 2007 at 16:09

How about a cron job that checks if Drupal.org is online and only restarts if it's not?

wget | grep -> apache restart if necessary
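
A minimal sketch of the wget | grep idea above, not anything that was actually deployed. The marker string, the timeout, and the use of `apachectl restart` are all assumptions; adjust them for the real servers.

```shell
#!/bin/sh
# Hypothetical health check: fetch a page, look for an expected string,
# and restart Apache only when the string is missing.

URL="${URL:-http://drupal.org/}"                 # page to probe
MARKER="${MARKER:-Drupal}"                       # assumed: text a healthy page contains
RESTART_CMD="${RESTART_CMD:-apachectl restart}"  # assumed restart command

# Succeeds if the page body read from stdin contains the marker.
page_is_healthy() {
    grep -q -- "$MARKER"
}

# Fetch the page (10 second timeout, one try). If the fetch fails or the
# marker is missing, log a line to stderr and run the restart command.
check_and_restart() {
    if wget -q -T 10 -t 1 -O - "$URL" | page_is_healthy; then
        return 0
    fi
    echo "$(date): $URL looks unhealthy, running: $RESTART_CMD" >&2
    $RESTART_CMD
}
```

To use it from cron, the script would end with a call to `check_and_restart` and be scheduled every few minutes. Note that a page served from a cache or an error page that happens to contain the marker would defeat the check, so the marker should be something only a healthy page prints.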

Comment #3

dww

we/he/they

commented 22 May 2007 at 16:37

sure, that sounds good to me.

Comment #4

kbahey commented 23 May 2007 at 03:04

Do we get a seg fault error in Apache's error.log?

Like so:

[Sun May 20 22:42:48 2007] [notice] child pid 12747 exit signal Segmentation fault (11)

If we do, then there is a solution more elegant than this: a script that checks for that every 45 seconds (or 60 or whatever) and restarts Apache when this happens. The down time is a minute or less.

You can grab the script (actually one .sh and one .php) from here.
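
For illustration only, here is a rough shell sketch of the approach described above. It is not the linked .sh/.php script. The log path, the state file, and the restart command are assumptions; the 45 second interval and the log message come from the comment.

```shell
#!/bin/sh
# Hypothetical watcher: restart Apache when new segfault lines show up
# in the error log since the previous check.

LOG="${LOG:-/var/log/apache2/error.log}"         # assumed log location
STATE="${STATE:-/tmp/apache-segfault-count}"     # hypothetical state file
RESTART_CMD="${RESTART_CMD:-apachectl restart}"  # assumed restart command

# Print how many segfault lines were added to log $1 since the last call,
# remembering the previous total in state file $2.
new_segfaults() {
    log="$1"; state="$2"
    now=$(grep -c 'exit signal Segmentation fault' "$log" 2>/dev/null || true)
    now=${now:-0}
    prev=$(cat "$state" 2>/dev/null || true)
    prev=${prev:-0}
    printf '%s\n' "$now" > "$state"
    # A smaller total means the log was rotated; count everything as new.
    if [ "$now" -lt "$prev" ]; then prev=0; fi
    echo $((now - prev))
}

# Poll forever, restarting Apache whenever new segfaults appear.
watch_segfaults() {
    while true; do
        if [ "$(new_segfaults "$LOG" "$STATE")" -gt 0 ]; then
            $RESTART_CMD
        fi
        sleep 45
    done
}
```

Running `watch_segfaults` in the background (or from an init script) gives the sub-minute reaction time mentioned above, which a plain cron entry cannot do since cron's finest granularity is one minute.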

Comment #5

emsearcy commented 25 May 2007 at 17:01

On May 23, 2007, at 12:06 PM, Matt Rae wrote:

> If I remember right, the issue was that apache processes would start
> to segfault after a random time. The problem was attributed to APC
> which has become more stable since the work around was implemented.
>
> Apache wasn't really going down, it would just throw a larger amount
> of http 500s.
>
> The osuosl has drupal.org monitored through nagios, but having a cron
> job to restart apache when it dies makes sense, especially if admins
> are asleep.
>
> Matt Rae

This would better be done through a cfengine `services' operation. I'll commit a change into our management repository to do this.

However, like you say Matt, Apache wasn't really going down of late (in the process sense), and in my experience restarting Apache when there *are* 500s is putting the cart before the horse, as the 500s were the result of intended activity, went away on their own, and restarting the server meant leaving important cron-jobs that were causing them (like the search index rebuild) only half-done.

I'm glad that we've worked towards fixing the problem (table locks and long queries), rather than just hacking at the symptom.

I'd suggest we resolve the ticket?

--
Eric Searcy
OSU Open Source Lab

Comment #6

jlambert commented 11 June 2007 at 22:11

Based on my experience, this is usually related to the accelerator taking a crap.

Install our script here:
http://fisheye.firebright.com/browse/firebright_public/logwatcher

This will watch the apache logs and bounce Apache as necessary when a 500 occurs.

It's not a permanent fix (I'm still looking for one, let me know), but it works, and it means you don't have 14 minutes of downtime on a cron that runs every 15 minutes. Cron is not a viable solution here, unless you do something like this.
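
The general idea of watching for 500s can be sketched in shell as below. This is not the logwatcher script linked above, whose internals are not shown in this thread. It assumes the access log uses the common/combined log format, where the status code is the ninth whitespace-separated field; the function names and the threshold are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: count HTTP 500 responses in the tail of an
# access log and decide whether Apache should be bounced.

# Print the number of 500 responses among the last $2 lines of log $1.
# Assumes common/combined log format (status code in field 9).
count_recent_500s() {
    tail -n "$2" "$1" | awk '$9 == 500 { n++ } END { print n + 0 }'
}

# Succeed if the last $2 lines of log $1 contain at least $3 responses
# with status 500.
should_bounce() {
    [ "$(count_recent_500s "$1" "$2")" -ge "$3" ]
}
```

A wrapper loop could call something like `should_bounce /var/log/apache2/access.log 200 1` every few seconds and restart Apache when it succeeds (the path and numbers here are placeholders). As noted elsewhere in this thread, some 500s come from expected activity, so a threshold higher than 1 may be safer.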

Let me know if you have questions.

Jonathan Lambert
Principal, WorkHabit

http://www.WorkHabit.com/
A FireBright Company

Comment #7

emsearcy commented 15 August 2007 at 23:17

Assigned:	dww	» emsearcy
Status:	Active	» Fixed

We have been running a similar python script for the last few months, but thanks to APC tweaks these errors do not emerge any more. I'm assigning to myself and resolving the task.

Comment #8

(not verified) commented 29 August 2007 at 23:19

Status:	Fixed	» Closed (fixed)

Comment #9

21 August 2014 at 21:00

Component:	Webserver	» Servers
