Every 2 or 3 days we are getting a very high server load (5 minutes average up to 70), the server starts to swap, and mysql reports some slow queries. After 15 to 20 minutes monit restarts the apache server because it does not respond anymore and after about 25 minutes everything is normal again. I was not able to find out the reason why this happens. There seem to be only 1 to 2 anonymous users and no logged-in users when this happens.
Drupal 5.2 is running on a dedicated server with 700 Mhz and 250 Mb RAM (Linux Sarge). The main server load is coming from Drupal.
I wrote a script to log the output of „top“ when this happens. Here is the result of top:
top - 14:47:58 up 3 days, 5:43, 0 users, load average: 9.37, 3.67, 1.53
Tasks: 86 total, 4 running, 82 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.0% us, 3.8% sy, 3.0% ni, 87.1% id, 1.9% wa, 0.0% hi, 0.0% si
Mem: 255380k total, 253268k used, 2112k free, 992k buffers
Swap: 2097136k total, 175396k used, 1921740k free, 12524k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24399 drupal 17 0 25396 9864 20m R 3.1 3.9 0:00.32 php
24407 drupal 16 0 25396 9844 20m D 3.1 3.9 0:00.29 php
24412 drupal 17 0 25436 9764 20m R 2.0 3.8 0:00.30 php
24414 drupal 17 0 25396 9864 20m R 2.0 3.9 0:00.29 php
47 root 15 0 0 0 0 S 1.0 0.0 0:27.87 kswapd0
24360 drupal 16 0 40020 17m 20m D 1.0 7.2 0:00.97 php
24364 drupal 16 0 40000 19m 20m D 1.0 7.8 0:00.93 php
24367 drupal 16 0 40024 18m 20m D 1.0 7.6 0:00.95 php
24417 root 15 0 2060 944 1852 R 1.0 0.4 0:00.02 top
1 root 16 0 1504 400 1352 S 0.0 0.2 0:01.12 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.20 ksoftirqd/0
4 root 5 -10 0 0 0 S 0.0 0.0 0:04.86 events/0
5 root 10 -10 0 0 0 S 0.0 0.0 0:00.00 khelper
6 root 14 -10 0 0 0 S 0.0 0.0 0:00.00 kacpid
35 root 5 -10 0 0 0 S 0.0 0.0 0:07.12 kblockd/0
48 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 aio/0
184 root 25 0 0 0 0 S 0.0 0.0 0:00.00 kseriod
313 root 15 0 0 0 0 S 0.0 0.0 0:18.89 kjournald
795 root 15 0 0 0 0 S 0.0 0.0 0:00.00 khubd
1690 daemon 16 0 1612 260 1440 S 0.0 0.1 0:00.00 portmap
2072 root 16 0 2260 636 2092 S 0.0 0.2 0:05.68 syslogd
2075 root 15 0 2384 428 1344 S 0.0 0.2 0:00.63 klogd
2083 bind 18 0 30856 2180 4252 S 0.0 0.9 0:00.00 named
2132 root 16 0 2628 408 2192 S 0.0 0.2 0:00.00 rpc.dracd
2138 root 19 0 2240 388 2084 S 0.0 0.2 0:00.00 inetd
2154 root 25 0 2508 404 2380 S 0.0 0.2 0:00.01 mysqld_safe
2191 mysql 15 0 122m 14m 8920 S 0.0 6.0 0:02.84 mysqld
2192 root 15 0 1488 364 1332 S 0.0 0.1 0:00.00 logger
2236 root 16 0 4416 1072 2624 S 0.0 0.4 0:11.35 logdog.pl
2243 root 19 0 6556 488 6168 S 0.0 0.2 0:00.00 saslauthd
2245 root 19 0 6556 488 6168 S 0.0 0.2 0:00.00 saslauthd
2246 root 19 0 6556 488 6168 S 0.0 0.2 0:00.00 saslauthd
2247 root 19 0 6556 488 6168 S 0.0 0.2 0:00.00 saslauthd
2248 root 19 0 6556 488 6168 S 0.0 0.2 0:00.00 saslauthd
2253 root 16 0 3468 520 3092 S 0.0 0.2 0:00.59 sshd
2334 root 16 0 7596 1040 6564 S 0.0 0.4 0:00.24 sendmail-mta
2349 root 15 0 1764 552 1576 S 0.0 0.2 0:00.53 cron
2367 root 16 0 6884 1172 3636 S 0.0 0.5 0:01.81 munin-node
2399 root 16 0 11712 588 3156 S 0.0 0.2 0:10.81 monit
2477 root 16 0 9672 1160 5300 S 0.0 0.5 0:00.21 miniserv.pl
27667 root 16 0 7076 1544 5556 S 0.0 0.6 0:00.92 apache
14953 www-data 15 0 7212 1600 5556 S 0.0 0.6 0:00.85 apache
14958 www-data 16 0 7344 1580 5556 S 0.0 0.6 0:00.97 apache
13307 www-data 15 0 7216 1672 5556 S 0.0 0.7 0:00.65 apache
16247 www-data 16 0 7348 1580 5556 S 0.0 0.6 0:00.27 apache
18378 www-data 16 0 7216 1600 5556 S 0.0 0.6 0:00.25 apache
18684 www-data 16 0 7216 1672 5556 S 0.0 0.7 0:00.27 apache
20386 www-data 16 0 7348 1580 5556 S 0.0 0.6 0:00.21 apache
22473 www-data 15 0 7216 1600 5556 S 0.0 0.6 0:00.09 apache
22566 www-data 16 0 7216 1732 5556 S 0.0 0.7 0:00.05 apache
23574 www-data 17 0 7212 1792 5556 S 0.0 0.7 0:00.02 apache
24337 drupal 16 0 39996 14m 20m D 0.0 6.0 0:00.94 php
24338 drupal 16 0 39732 11m 20m D 0.0 4.8 0:00.91 php
24339 drupal 16 0 41296 10m 20m D 0.0 4.2 0:01.08 php
24340 drupal 16 0 39996 14m 20m D 0.0 5.8 0:00.92 php
24341 www-data 15 0 7212 1596 5556 S 0.0 0.6 0:00.00 apache
24347 drupal 16 0 39996 15m 20m D 0.0 6.4 0:00.94 php
24348 www-data 17 0 7212 1596 5556 S 0.0 0.6 0:00.00 apache
24349 www-data 16 0 7212 1596 5556 S 0.0 0.6 0:00.00 apache
24351 drupal 16 0 39996 17m 20m D 0.0 7.1 0:00.96 php
24353 www-data 16 0 7212 1596 5556 S 0.0 0.6 0:00.00 apache
24354 www-data 15 0 7212 1596 5556 S 0.0 0.6 0:00.00 apache
24355 www-data 15 0 7212 1596 5556 S 0.0 0.6 0:00.00 apache
24356 www-data 17 0 7212 1816 5556 S 0.0 0.7 0:00.02 apache
24357 drupal 16 0 41296 11m 20m D 0.0 4.8 0:01.06 php
24359 drupal 16 0 39996 17m 20m D 0.0 7.0 0:00.94 php
24361 www-data 17 0 7208 1768 5556 S 0.0 0.7 0:00.05 apache
24365 www-data 16 0 7212 1884 5556 S 0.0 0.7 0:00.00 apache
24366 www-data 16 0 7212 1884 5556 S 0.0 0.7 0:00.00 apache
24370 drupal 16 0 39864 15m 20m D 0.0 6.1 0:00.94 php
24371 www-data 17 0 7212 1884 5556 S 0.0 0.7 0:00.00 apache
24372 www-data 16 0 7212 1884 5556 S 0.0 0.7 0:00.00 apache
24373 www-data 18 0 7076 1524 5556 S 0.0 0.6 0:00.00 apache
24374 www-data 17 0 7076 1536 5556 S 0.0 0.6 0:00.00 apache
24380 root 18 0 2052 768 1660 S 0.0 0.3 0:00.00 cron
24381 root 16 0 2732 1260 2528 S 0.0 0.5 0:00.03 load-average
24382 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
24408 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
24416 root 15 0 8204 2972 6580 S 0.0 1.2 0:00.00 sendmail-mta
During the high load-average, there are up to 20 php processes! I can't find out what causes this to happen. There were no users logged into drupal and the only unusual I could find in the apache access log were some strange calls to nonexisting files, like e.g.
196.0.7.129 - - [31/Aug/2007:14:44:39 +0200] "GET /effects-of-global-warming HTTP/1.1" 200 6150 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-"
196.0.7.129 - - [31/Aug/2007:14:44:44 +0200] "GET /Effects%20of%20global%20warming%20%20Time%20for%20change_files/aggregator.css HTTP/1.1" 404 6068 "http://example.com/effects-of-global-warming" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-"
196.0.7.129 - - [31/Aug/2007:14:44:48 +0200] "GET /Effects%20of%20global%20warming%20%20Time%20for%20change_files/book.css HTTP/1.1" 404 6068 "http://example.com/effects-of-global-warming" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-"
196.0.7.129 - - [31/Aug/2007:14:44:50 +0200] "GET /Effects%20of%20global%20warming%20%20Time%20for%20change_files/forum.css HTTP/1.1" 404 6068 "http://example.com/effects-of-global-warming" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-"
122.2.92.219 - - [31/Aug/2007:14:44:51 +0200] "GET /modules/aggregator/aggregator.
196.0.7.129 - - [31/Aug/2007:14:47:49 +0200] "GET /adsa_files/book.css HTTP/1.1" 404 6068 "http://example.com/some-page" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-"
What can I do to find out what happens and how can I prevent this?
All hints will be much appreciated.
Juerg
Comments
Calls to non-existing pages probably causing overload
Installation of the module badbehavior and modifications to .htaccess in order to let Apache be the handler for non-existing flat files (404 error) improved the situation.
However about three to five times per week I am still getting a server overload due to 10 to 20 calls for non-existing pages or files within a short time. Usually the same (non-existing) page is called 10 to 20 times within a few seconds. This seems to cause the overloads.
I would appreciate any comments or suggestions how I could avoid the overload situations!
Juerg
I have had bots/open proxy
I have had bots/open proxy hacker attacks cause this problem.
Google for the pages that are being called to see if anyone else is talking about that pattern. If so, it is likely a bot. Also, traceroute the IP address doing the page calls