I wasn't very happy with the performance of my Drupal websites, and I/O seemed to be the culprit. I'm running the Drupal sites on a collection of virtual guests. Each Drupal site runs off a web server and a separate database server, both virtual guests. Each guest has 4 cores and 1GB RAM. Each request to the site(s) took somewhere between 4 and 10 seconds. The same guests also host phpBB sites, for which each request takes less than 2 seconds.
I ran iostat on both the web server and the database server and then issued an HTTP request to one of the Drupal sites. The database server reported only about 9 IOPS for the request, while the web server reported somewhere around 2000 IOPS per request. I was expecting the database server to be doing most of the I/O, but apparently the web server accounts for most of it. Wondering which files were actually being accessed, I ran inotifywatch on the web server against /etc and various other directories and, as expected, all the Drupal PHP files etc. are accessed for each request. Somewhere between 150 and 200 files are accessed per request. None of the files are written to, so one would expect the disk cache to make it unnecessary to hit the disk for all these files.
Whatever the reason, I put the Drupal files on a ramdisk (tmpfs) and now all my sites are much more responsive. Each request now takes no more than 4 seconds. Everything feels much snappier.
I've set up a shadow directory which contains the Drupal files on disk. An hourly cron job rsyncs the Drupal files from the ramdisk to its corresponding shadow directory. When the machine boots, it creates the Drupal ramdisk(s) and rsyncs the data from the shadow directories to the corresponding Drupal directories.
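The setup above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the paths, the tmpfs size, the script layout and the function names mount_ramdisk and restore_fs are all assumptions (only save_fs is a name mentioned later in this thread).

```shell
#!/bin/sh
# Sketch only: the paths and size below are hypothetical defaults.
RAMDISK_DIR="${RAMDISK_DIR:-/var/www/drupal}"        # tmpfs mount point served by the web server
SHADOW_DIR="${SHADOW_DIR:-/var/www/drupal.shadow}"   # persistent on-disk copy
RAMDISK_SIZE="${RAMDISK_SIZE:-256m}"                 # assumed size, adjust to the site

# Mount the tmpfs if it is not mounted yet (needs root).
mount_ramdisk() {
    mkdir -p "$RAMDISK_DIR"
    mountpoint -q "$RAMDISK_DIR" ||
        mount -t tmpfs -o "size=$RAMDISK_SIZE" tmpfs "$RAMDISK_DIR"
}

# Copy the on-disk shadow into the ramdisk (run at boot).
restore_fs() {
    rsync -a --delete "$SHADOW_DIR/" "$RAMDISK_DIR/"
}

# Copy the ramdisk back to the shadow (run from cron and at shutdown).
# Refuse to run when the ramdisk is empty, so that an unmounted or
# freshly created tmpfs cannot wipe the shadow copy via --delete.
save_fs() {
    if [ -z "$(ls -A "$RAMDISK_DIR" 2>/dev/null)" ]; then
        echo "save_fs: $RAMDISK_DIR is empty, not syncing" >&2
        return 1
    fi
    rsync -a --delete "$RAMDISK_DIR/" "$SHADOW_DIR/"
}

# Dispatch: "start" at boot, "save" from cron, "stop" at shutdown.
case "${1:-}" in
    start)     mount_ramdisk && restore_fs ;;
    save|stop) save_fs ;;
esac
```

If the script were installed as, say, /usr/local/sbin/drupal-ramdisk (a made-up path), an hourly crontab entry could look like `0 * * * * /usr/local/sbin/drupal-ramdisk save`. The empty-directory guard in save_fs is a precaution I would add; it is not part of the original description.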
Initially, I had been seriously considering getting SSDs, but that's not necessary with these improvements. I wonder if SSDs would even have improved I/O in this case. Previously I had tried memcached, but it didn't really help and came nowhere near the improvement I get from running Drupal from a ramdisk. I realise I may lose up to 1 hour of settings in the event of a crash, but that's acceptable considering the improvement. A more elaborate setup could attach some replicated filesystem with eventual consistency to the ramdisk, but running rsync every hour is fine for me at the moment.
Comments
APC better than a RAMdisk
Not necessarily. With only 1GB of memory it's possible that you're dumping disk cache and may even have some process swapping going on. That will kill your performance faster than anything.
We recently took a server from 4-8 seconds down to 0.1-0.2 seconds. Here's the recipe: APC will cache all of the PHP files in memory in precompiled form and significantly improve performance. Install APC+Pressflow+CacheRouter with the cache tables in APC shared memory, consolidate your servers to make better use of your RAM with 2-4 GB minimum, install Boost to move all anonymous traffic away from PHP and MySQL, and make sure MySQL has enough memory for the query cache and isn't doing anything performance-killing like running binlog.
The www server still reports
The www server still reports ~250MB of free RAM. It's not swapping. The Drupal files amount to about 80 megabytes.
The caching of precompiled PHP sounds very attractive, but I already had APC enabled. I tried out CacheRouter with the following settings, which I've since disabled. I noticed that with CacheRouter enabled (and memcached disabled), the data displayed in the Drupal website becomes incorrect. Certain entries which should show up in a view don't show up. They show up in another view with incorrect attribute values. I'm the only one using the site, so the discrepancies can't come from someone else's changes, and I don't know what CacheRouter is doing.
#$conf['cache_inc'] = './sites/all/modules/cacherouter/cacherouter.inc';
#$conf['cacherouter'] = array(
#  'default' => array(
#    'engine' => 'apc',
#    'servers' => array(),
#    'shared' => TRUE,
#    'prefix' => '',
#    'path' => 'sites/sitename/files/filecache',
#    'static' => FALSE,
#    'fast_cache' => TRUE,
#  ),
#);
Regarding file caching, it would seem that if even one of the Drupal files can't be fetched from the cache, performance suddenly drops because of the disk access. That's the only explanation I can think of for the difference between having the Drupal files on disk and having the same files on a ramdisk.
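A rough way to check that hypothesis would be to time how long it takes to read the whole Drupal tree once with a cold cache and once right afterwards. The sketch below is only an illustration: the DRUPAL_DIR path and the helper names count_files and time_read are made up, and timing is in whole seconds, so it only shows large differences.

```shell
#!/bin/sh
# Hypothetical default path; point it at the real Drupal docroot.
DRUPAL_DIR="${DRUPAL_DIR:-/var/www/drupal}"

# Print the number of regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Read every regular file under a directory once and
# print the elapsed time in whole seconds.
time_read() {
    start=$(date +%s)
    find "$1" -type f -exec cat {} + > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# For a cold-cache run, first drop the page cache as root:
#   sync; echo 3 > /proc/sys/vm/drop_caches
# Then run this script twice in a row and compare the two timings.
if [ -d "$DRUPAL_DIR" ]; then
    echo "files:     $(count_files "$DRUPAL_DIR")"
    echo "read time: $(time_read "$DRUPAL_DIR")s"
fi
```

If the second run is much faster than the first, the page cache is holding the files. If both runs are slow, something is evicting them or the cache is not being used for that filesystem.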
I'll have to investigate more thoroughly what's going on when using CacheRouter. Thanks for the insights.
Cron
I set up my new dev box to use RAM disk for MySQL based on these instructions:
http://wolfgangziegler.net/ubuntu-11.04-simpletest-performance-upstart-m...
and it would be useful to know how you did your cron if you are able to share those steps.
Thanks!
-Kristen
Profile: https://www.linkedin.com/in/kristenpol
Drupal 7 Multilingual Sites: http://kristen.org/book
I'm not managing that server
I'm not managing that server setup anymore, and I'm not really doing much Drupal (system) management lately. I don't have a record of how I had set up cron, but I had put together the following init script. The crontab would have to be set up to call the save_fs function.
This was on Fedora so it'll likely be somewhat different for Ubuntu:
Thanks!
Thanks for sharing that... it might help with the Ubuntu script. Cheers!
-Kristen