When using the Apache benchmark utility (ab), what is a fair number of concurrent connections to test a Drupal install with?

I see most people testing with numbers around 5-10.

I can safely benchmark our server with up to around 40 concurrent connections, but if I go higher, say into the hundreds, the server starts choking and hangs. The load averages soar, and if I can't get the Apache processes killed, we end up having to reboot the server.
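One rough way to choose a concurrency level for ab, instead of copying the 5 to 10 that others use, is Little's law: the average number of requests in flight equals the request arrival rate multiplied by the average response time. The sketch below assumes you can estimate your peak dynamic page views per second from your access logs; the figures in it are hypothetical.

```python
import math


def required_concurrency(peak_requests_per_sec: float, avg_response_sec: float) -> int:
    """Little's law (L = lambda * W), rounded up to whole connections."""
    if peak_requests_per_sec < 0 or avg_response_sec < 0:
        raise ValueError("rate and response time must be non-negative")
    return math.ceil(peak_requests_per_sec * avg_response_sec)


# Hypothetical figures: 40 dynamic page views per second at peak,
# each taking about half a second to generate.
print(required_concurrency(40, 0.5))  # 20
```

Running something like `ab -n 1000 -c 20` around that figure tests the load you actually expect; levels in the hundreds mostly show how the server behaves once it is overloaded.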

The reason I am testing with such high numbers is that our server is coming to a standstill, and we are not sure whether load is the cause. However, we can reproduce it in a test environment by holding down F5 (refresh) in Firefox 3 or IE7 while browsing the site with caching turned off. As an authenticated user, the page cache doesn't apply, so we can take the site down with constant refreshing. Each refresh from the browser opens another connection on the server before the previous one has finished; I can see this in Apache's server-status, where all the processes are working. I don't understand why this happens or how we can prevent it. What I do know is that it doesn't happen with a plain vanilla Drupal install. However, once we put our site and its modules on it, there is more to process, so things are naturally slower.

When we have the issue, the number of httpd processes climbs up to the MaxClients setting of Apache's prefork MPM. The mysqld process eats up a lot of CPU, and we see a MySQL connection for each Apache process; most of those connections are sleeping (monitored with mysqladmin processlist). During these high loads, Apache's server-status page shows all the processes in the W (Sending Reply) state.
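To track the worker states over time rather than eyeballing the page, the machine-readable form of mod_status (`/server-status?auto`, assuming mod_status is enabled at that usual location) includes a `Scoreboard:` line with one character per worker slot. A small Python sketch that tallies it:

```python
from collections import Counter
from urllib.request import urlopen

# Meanings of the mod_status scoreboard characters.
SCOREBOARD_STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def fetch_status(url: str = "http://localhost/server-status?auto") -> str:
    """Download the ?auto status text (the URL is an assumption; adjust it)."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("ascii", errors="replace")


def summarize_scoreboard(status_text: str) -> Counter:
    """Count worker states from the text returned by /server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_STATES.get(ch, "unknown") for ch in board)
    raise ValueError("no Scoreboard line found")


sample = "Total Accesses: 10\nBusyWorkers: 4\nScoreboard: WWWK__..\n"
print(summarize_scoreboard(sample))
```

Calling `summarize_scoreboard(fetch_status())` every few seconds during a test shows how quickly the "sending reply" count reaches MaxClients.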

We lowered MaxClients in Apache's prefork MPM to around 40 (the default was, I think, 256). That keeps the server alive during these high loads, but the website suffers because client browsers eventually time out. Still, it is a plus that we don't have to power cycle the server.
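A common rule of thumb for prefork is to cap MaxClients so that every child fits in physical RAM at the same time, since a box that starts swapping httpd children can stall in a similar way. A sketch, where the reserved memory and per-child size are hypothetical values you would replace with measurements (for example, the resident size of httpd children in `ps` or `top` under load):

```python
def max_clients_for_memory(total_ram_mb: float, reserved_mb: float, per_child_mb: float) -> int:
    """Largest MaxClients that keeps every prefork child in physical RAM.

    reserved_mb covers MySQL, the OS and anything else on the box.
    per_child_mb is the typical resident size of one httpd child; resident
    size also counts shared pages, so the result errs on the safe side.
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(0, int(available // per_child_mb))


# Hypothetical: 4096 MB box, 1536 MB set aside for MySQL and the OS,
# children growing to about 40 MB each with Drupal and APC loaded.
print(max_clients_for_memory(4096, 1536, 40))  # 64
```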

Some things we have tried:
* RHEL 4, RHEL 5, Ubuntu 8
* MySQL 4 and 5
* PHP 4 and 5
* Accessing the site locally with Firefox 3 from the server's GUI (to eliminate the network)
* Apache KeepAlive on
* Hardware is Intel Xeon 2.4GHz HT w/ 4 GB RAM, RAID 0 SCSI (Dell PE2600)
We have also tried other hardware with various flavors of Linux.

Any thoughts would be greatly appreciated. Thanks guys,

Matt

Comments

Are you already running a production site? What are your actual loads? Is it reasonable to expect 100s of simultaneous connections or are you just capacity planning?

Pretty beefy hardware, so it should be able to take a good beating. I'm just going to throw out some obvious scenarios to get the dialogue moving.

First, I think MaxClients is going to be a balancing act, but you really need to find the underlying cause of the problem. My guess would be MySQL straining under the load and then refusing connections. I'd turn on some logging to see which queries take the longest, then see if there is something you can optimize or cache to reduce that load. Speaking of caching, it is one of the first things to look at if you really expect hundreds of simultaneous connections. Maybe look into memcached, as it can speed things up greatly.
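The memcached suggestion follows the cache-aside pattern: look up a key, and only on a miss run the expensive code and store the result with an expiry. The sketch below uses a plain in-process dictionary as a hypothetical stand-in for memcached so that it runs on its own; with a real memcached client, the `get` and `set` calls would go to the memcached server instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for memcached: values expire after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def cached(cache: TTLCache, key: str, compute: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or compute it once and store it.

    None is treated as a miss, so computations that return None are never cached.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value
```

With a 60-second TTL, a block that costs hundreds of milliseconds to build is generated once a minute instead of on every request, no matter how fast someone refreshes.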

Hope that helps out a little.

This is supposed to be in production, but we pulled it and put our previous site back up until we can get the issues resolved. More or less we are capacity planning at a late stage.

The actual production server has even more horsepower. I have this dev box up to work this out as I have tried various configurations with no luck.

We lowered MaxClients but still see the issue. We also raised MinSpareServers and StartServers because we saw a lot of delay while new processes were being created. I am thinking of trying the worker MPM to see if that helps.

We tried turning on the MySQL query cache. It helps greatly, and we plan on leaving it on. We also fired up the Drupal devel module and see that the biggest queries in the log come from the Views module. Unfortunately, we need Views for a handful of pages, including our front page. The biggest hog is a query that randomizes the images in a block. If we turn the views off on the pages that cause the most server load (in fact, these pages crash the server if we sit and hold refresh for more than 10 seconds), those pages load significantly faster (according to the devel module) and usually do not crash the server, although it does become very sluggish for a while.

This is a bit off topic, but the problem with the randomization is that the query uses the MySQL RAND() function. Looking through the Drupal code, we noticed that it is well written for modularity, but not with database performance in mind when randomization is used. Because the query uses RAND(), MySQL never caches it. A better approach would probably have been to select all the IDs in the table into a PHP array, use PHP's rand function to pick a random array member, and then select that ID in MySQL for the results. Alternatively, if the number of rows in the table were known, the LIMIT clause could be given a random offset. Food for thought for anyone using the randomization routines in the Views module: performance suffers because the database won't cache those queries. We looked at modifying the code, but it would be rather difficult without tearing into a lot of code whose connections we don't understand.
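Both alternatives can be sketched in Python using the standard-library `sqlite3` module, with a hypothetical `images` table standing in for the MySQL table behind the view. SQLite has no query cache, so this only shows the query shapes; the point is that neither query contains RAND(), so MySQL's query cache is allowed to store them (a cached result is still invalidated whenever the table changes).

```python
import random
import sqlite3


def random_row_by_id(conn, rng=random):
    """Fetch every ID, pick one in application code, then fetch that row by key."""
    ids = [row[0] for row in conn.execute("SELECT id FROM images")]
    if not ids:
        return None
    chosen = rng.choice(ids)
    return conn.execute("SELECT id, path FROM images WHERE id = ?", (chosen,)).fetchone()


def random_row_by_offset(conn, rng=random):
    """Count the rows, then read a single row at a random offset."""
    (count,) = conn.execute("SELECT COUNT(*) FROM images").fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)
    return conn.execute(
        "SELECT id, path FROM images ORDER BY id LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()


# Demo with an in-memory table standing in for the real image table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany("INSERT INTO images (path) VALUES (?)", [(f"img{i}.jpg",) for i in range(10)])
print(random_row_by_id(conn))
print(random_row_by_offset(conn))
```

The OFFSET version is simpler, but the database has to step over `offset` rows on each request; fetching the full ID list is cheap for a small image table but grows with it.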

So the biggest thing is this: we fire up a browser, hit the site, and hold down F5 to refresh. The server starts working, opens a new process for each new connection, and eventually things die. It's basically a DoS attack from a single client. However, holding F5 the same way does not crash simple PHP scripts or other plain vanilla Drupal sites on the same server. My guess is that because our Drupal site is so heavy, the server is simply overwhelmed.
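The held-F5 collapse can be modeled as a queue. If each refresh starts a new request while the earlier ones keep running on the server (as server-status suggested), and the busy requests share the CPUs, then once refreshes arrive faster than the box can finish pages, the number of busy workers grows until it hits MaxClients. A toy simulation under those assumptions, with hypothetical numbers:

```python
def simulate_refresh_storm(interval, service, cores, max_clients, duration, step=0.001):
    """Simulate one client re-requesting a page every `interval` seconds.

    Each request needs `service` seconds of CPU. In-flight requests share
    `cores` CPUs equally (processor sharing). Arrivals that find all
    `max_clients` workers busy are counted as refused.
    Returns (peak busy workers, refused requests).
    """
    work_left = []  # CPU seconds still needed by each in-flight request
    next_arrival = 0.0
    peak = refused = 0
    for tick in range(int(round(duration / step))):
        now = tick * step
        # Admit every refresh that has arrived by now.
        while next_arrival <= now:
            if len(work_left) < max_clients:
                work_left.append(service)
            else:
                refused += 1
            next_arrival += interval
        if work_left:
            peak = max(peak, len(work_left))
            # Each busy worker gets an equal share of the CPUs this tick.
            share = step * min(1.0, cores / len(work_left))
            work_left = [w - share for w in work_left if w - share > 1e-12]
    return peak, refused


# Hypothetical: 0.4 s of CPU per page on 2 cores is about 5 pages/s of capacity.
# Refreshing twice a second is absorbed; ten times a second is not.
print(simulate_refresh_storm(interval=0.5, service=0.4, cores=2, max_clients=40, duration=10))  # (1, 0)
print(simulate_refresh_storm(interval=0.1, service=0.4, cores=2, max_clients=40, duration=30))
```

In the second run the busy count climbs to max_clients and later refreshes are refused, which matches the pattern of every process sitting in W while the browser times out.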

I started looking into DoS prevention for Apache and tried mod_evasive (aka mod_dosevasive). However, it had no effect, even on static pages. server-info shows it installed, along with the options I configured in the Apache configuration, but it never blocks the client that is doing tons of refreshes. The only explanation I can think of is that the module is somewhat old, was written for an older revision of Apache 2.0, and something in newer Apache revisions broke it.
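For reference when testing, the per-page rule that mod_evasive is meant to enforce (its DOSPageCount, DOSPageInterval and DOSBlockingPeriod directives) can be modeled roughly as below. This is a simplified sliding-window sketch, not the module's code: a client that requests the same URI more than page_count times within page_interval seconds is refused for blocking_period seconds, and, in this sketch, requests made while blocked extend the block. The parameter values are examples only.

```python
from collections import defaultdict, deque


class PageRateLimiter:
    """Simplified model of a per-client, per-page request limit."""

    def __init__(self, page_count=2, page_interval=1.0, blocking_period=10.0):
        self.page_count = page_count
        self.page_interval = page_interval
        self.blocking_period = blocking_period
        self.hits = defaultdict(deque)  # (ip, uri) -> recent request times
        self.blocked_until = {}  # ip -> time the block expires

    def allow(self, ip: str, uri: str, now: float) -> bool:
        if self.blocked_until.get(ip, float("-inf")) > now:
            # Still hammering while blocked: push the expiry out again.
            self.blocked_until[ip] = now + self.blocking_period
            return False
        window = self.hits[(ip, uri)]
        window.append(now)
        # Drop hits that fall outside the interval.
        while window and window[0] <= now - self.page_interval:
            window.popleft()
        if len(window) > self.page_count:
            self.blocked_until[ip] = now + self.blocking_period
            return False
        return True
```

Holding F5 against a rule like this should produce refusals within the first second, which is a quick way to tell whether the module is active at all.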

Thanks for the reply, it is greatly appreciated. I will paste our relevant configurations below. If anyone wants to see anything else let me know.

Matt

Current my.cnf:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
max_allowed_packet=32M

max_connections = 100
max_user_connections = 100
key_buffer = 36M
myisam_sort_buffer_size = 64M
join_buffer_size = 2M
read_buffer_size = 2M
sort_buffer_size = 3M
table_cache = 1024
thread_cache_size = 286
interactive_timeout = 25
wait_timeout = 5
connect_timeout = 60
max_connect_errors = 999999
query_cache_limit = 4M
query_cache_size = 512M
query_cache_type = 1
tmp_table_size = 16M

# Log slow queries
#long_query_time = 1
#log-slow-queries = /var/log/mysql/mysql-slow.log

# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
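A rough way to see how much of the 4 GB MySQL alone could claim with this my.cnf is to add the global buffers to max_connections times the per-connection buffers. This is a common back-of-the-envelope estimate rather than an exact figure: per-connection buffers are only allocated when a query needs them, and the estimate ignores thread stacks and other overhead.

```python
# Values copied from the my.cnf above, in MB.
global_buffers_mb = {"key_buffer": 36, "query_cache_size": 512}
per_connection_mb = {"read_buffer_size": 2, "sort_buffer_size": 3, "join_buffer_size": 2}
max_connections = 100


def worst_case_mb(global_buffers, per_connection, connections, tmp_table_mb=0):
    """Global buffers plus every connection using all of its buffers at once."""
    per_conn = sum(per_connection.values()) + tmp_table_mb
    return sum(global_buffers.values()) + connections * per_conn


print(worst_case_mb(global_buffers_mb, per_connection_mb, max_connections))  # 1248
# Also assume each connection builds an in-memory temporary table of tmp_table_size.
print(worst_case_mb(global_buffers_mb, per_connection_mb, max_connections, tmp_table_mb=16))  # 2848
```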

Current httpd.conf relevant config:
Timeout 10
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15

StartServers 30
MinSpareServers 30
MaxSpareServers 30
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000

Relevant php.ini lines:
extension=apc.so
max_execution_time = 10 ; Maximum execution time of each script, in seconds
max_input_time = 10 ; Maximum amount of time each script may spend parsing request data
memory_limit = 32M ; Maximum amount of memory a script may consume (8MB)

PS: Here are the devel stats for a page that really bogs the server down, taken with both the MySQL query cache and APC on. The stats were taken with no other load on the server (I was the only one hitting it).

With no sort criteria on the view:
Executed 89 queries in 59.83 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 399.99 ms.
Memory used at devel_init(): 1.16 MB
Memory used at devel_shutdown(): 11.5 MB

When the view sort criteria is set to random:
Executed 89 queries in 175.88 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 593.67 ms.
Memory used at devel_init(): 1.16 MB
Memory used at devel_shutdown(): 11.5 MB
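The two devel runs can be compared directly. The sketch below computes how much the random sort adds and a rough pages-per-second ceiling, assuming page time is mostly CPU-bound and using a hypothetical `cores` value, since the number of usable cores isn't stated.

```python
# devel figures quoted above, in milliseconds.
baseline = {"queries_ms": 59.83, "page_ms": 399.99}
random_sort = {"queries_ms": 175.88, "page_ms": 593.67}

extra_query_ms = random_sort["queries_ms"] - baseline["queries_ms"]
extra_page_ms = random_sort["page_ms"] - baseline["page_ms"]


def pages_per_second(page_ms: float, cores: int) -> float:
    """Rough ceiling if each page keeps one core busy for page_ms milliseconds."""
    return cores * 1000.0 / page_ms


print(round(extra_query_ms, 2))  # 116.05
print(round(extra_page_ms, 2))  # 193.68
# Hypothetical 2 usable cores: compare the ceilings with and without random sort.
print(pages_per_second(baseline["page_ms"], 2), pages_per_second(random_sort["page_ms"], 2))
```

Even without concurrency effects, the random sort alone cuts the achievable pages per second by roughly a third on the same hardware.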
