performance question - getting regular load spikes

reikiman - October 17, 2009 - 22:32

I'm getting regular load spikes on my server - CPU percentage is generally idle (10%) but every half hour there's a spike to about 80% CPU. The spike roughly corresponds to when I run cron.php - That is the server has multiple sites (virtual hosting) and for each virtual site I run cron.php once an hour, and schedule the cron.php runs at 10 minute intervals. There are two spikes per hour and the server has two large sites and a couple less large sites.

The CPU spike had formerly been over 150% and the hosting company contacted me that my VPS had been consuming 100% of the host server resources and could I do something about that.

Poking around the mysql slow query log showed some slow queries which were on tables where some of the fields did not have keys. Executing the following alters to add some keys did decrease the size of these spikes.

create index xmlsitemaploc on xmlsitemap (loc);
create index ernid on encl_remotes (nid);

Looking at the other slow queries didn't lead me to any other simple changes.

But it gives me a thought to explore mysql performance tuning. Am curious if there are good Drupal oriented tutorials for this, as the typical mysql performance tutorials are too broad.

For example an InnoDB versus MyISAM recommendation in a general tutorial may not be a good recommendation for Drupal. I just searched d.o and it seems there's some controversy over whether to use InnoDB. The mysqltuner.pl script tells me InnoDB is enabled in MySQL but isn't being used in any tables.

Obviously I have mysqltuner.pl and also the tuning-primer.sh and also the High Performance MySQL book and also maatkit is installed on the server .. plenty of tools and reference material.

Ah, some good results have

reikiman - October 19, 2009 - 00:27

Ah, some good results have come from using mysqltuner.pl and tuning-primer.sh ...

Between the two I increased
- key_buffer
- table_cache
- query_cache_size

And added values for the following which weren't in my.cnf
- tmp_table_size
- max_heap_table_size

Restarted MySQL and for about 4 hours there were zero CPU spikes and then they started again. Clearly these values can be increased further.

Currently,
- the table cache hit rate is 3%
- 47% of temporary tables are being created on disk
- the max_connections is being recommended to decrease since only 8% is being used but I'd be concerned about connections required when traffic is higher
- key_buffer free ratio of 0% indicates it's way over used

I increased the table_cache and key_buffer sizes for now.

FWIW I've continued cranking

reikiman - October 20, 2009 - 18:28

FWIW I've continued cranking the settings upwards. I noticed while editing my.cnf that disabling InnoDB is claimed to save perhaps 100MB of memory, and it's simply done by commenting out the "skip innodb" line in the file.

My server is still having the same pattern of load spikes so clearly I haven't found the configuration that accommodates whatever it is that's happening.

e.g. the table cache hit rate is still well under 10% indicating that MySQL is doing a lot of churn of closing and reopening tables. e.g. more than 50% of temporary tables are still being created on disk, and I gather from the book that temporary tables are used in joins (and Drupal does a lot of joins).

I'm focusing on increasing those two items. The key buffer may be a little small still.

Thanks, your findings are

steveadamo - October 20, 2009 - 19:40

Thanks, your findings are very helpful... i'll be watching for any updates! :)

Oct 21 ... Last night I

reikiman - October 21, 2009 - 23:50

Oct 21 ...

Last night I started to bump up some numbers but thought to take a look at the fragmented tables. Both mysqltuner and tuning-primer.sh said I had a lot of fragmented tables. I learned from the MySQL documentation that this is only true for MyISAM tables (due to B-Tree details) and that the way to fix it is with the OPTIMIZE TABLE command. What wasn't clear was how to find which tables needed to be optimized. By reading the source to mysqltuner I came up with this SQL query:

SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql') AND Data_free > 0 AND NOT ENGINE='MEMORY';

This lists the database (TABLE_SCHEMA) & table (TABLE_NAME) of each table that's fragmented. This ended up at around 100 tables. After a few rounds of doing OPTIMIZE TABLE they were all defragmented, right? Well, not so simple. It seems that some of the tables become fragmented simply through use. I ran the above command twice within a couple minutes and the number of fragmented tables increased. Clearly over time the tables will fragment.

mysql> SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql') AND Data_free > 0 AND NOT ENGINE='MEMORY';
+---------------+------------------+
| TABLE_SCHEMA  | TABLE_NAME       |
+---------------+------------------+
| energyhealing | captcha_sessions |
| energyhealing | variable         |
| energyhealing | watchdog         |
+---------------+------------------+
3 rows in set (0.08 sec)

mysql> SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql') AND Data_free > 0 AND NOT ENGINE='MEMORY';
+---------------+------------------+
| TABLE_SCHEMA  | TABLE_NAME       |
+---------------+------------------+
| energyhealing | cache            |
| energyhealing | captcha_sessions |
| energyhealing | variable         |
| energyhealing | watchdog         |
+---------------+------------------+
4 rows in set (0.05 sec)

Further, after a day of that the system is still seeing load spikes. Curiously the base CPU load was a little higher. Mebbe this made a positive change that allowed more activity to occur. But it didn't erase the thing I'm focused on (those load spikes) whereas over the weekend they did disappear for about 3 hours, due to some change to increase buffer sizes.

I just did some jiggering of the numbers again.

On tmp_table_size and max_heap_table_size the script reports this:

Of 612402 temp tables, 42% were created on disk

While that seems like a low percentage IIRC the last time I increased those numbers did not produce more temp tables created in memory. I saw a note somewhere that some of the temp tables (e.g. if the join includes a BLOB or TEXT) are forced to do the temp table on disk. Under that theory I dropped the size of this buffer a bit because I'm guessing this memory isn't being spent well. It'd been raised to 128M and I've dropped it back to 108M

I had increased table_cache to 3500 but the script now reports only 1450 open tables. Clearly this could be dropped a bit but I haven't done so.

With query_cache=48M and query cache used of 32M it's 67.77% utilization and the script says that's fine.

With Current MyISAM index space = 169 M and Current key_buffer_size = 74 M the tuning-primer.sh script says "Key buffer free ratio = 0 %" and recommends increasing key_buffer. On the other hand mysqltuner says "Key buffer hit rate: 100.0% (305M cached / 73K reads)" and says this is okay. The two are in conflict but I am thinking increasing key_buffer will help so have put it at 148M.

Mysqltuner recommended increasing query_buffer and tmp_table_size and max_heap_table_size but as explained above I've left them alone or decreased them.

Took a trip.. that let the

reikiman - October 27, 2009 - 23:45

Took a trip.. that let the server soak a bit with the last config change.

I see the maximum number of open connections was 14% of the configured allowable amount. That tells me it's likely possible to decrease this number a bit to recover some memory that can be used for other things.

"Slow queries" is 0% right now (80K/45M) so maybe decreasing the number of slow queries is not the thing to pursue. The load spikes are occuring in the same pattern as before.

Key Buffer is 148.0M max with 176.5M of indexes. The hit ratio is 100.0% (1B cached / 101K reads). Tuning-primer is still suggesting to increase the key buffer and says there is 0% free.

Query cache efficiency: 32.4% (14M cached / 44M selects) and mysqltuner puts red exclamations next to the number of query cache prunes. However tuning-primer.sh seems happy with the setting for this but notes "MySQL won't cache query results that are larger than query_cache_limit in size" .. query_cache_limit = 1 M so I wonder whether any of them are bigger than 1M?

I had decreased max_heap_table_size to 108 M and tmp_table_size to 108 M and 44% are being created on disk. The percentage of temp tables created on disk isn't changing regardless of the size of these buffers. Hence it seems increasing this buffer size is not a wise use of memory, despite the scripts telling me to do so.

At the moment I'm most curious about: join_buffer_size = 132.00 K. tuning-primer reports: 123 queries where a join could not use an index properly

CHANGED:

#WAS: max_connections        = 100
max_connections        = 70

join_buffer_size = 1M

#WAS: query_cache_limit       = 1M
query_cache_limit       = 2M

Hmm.. Apache?

reikiman - November 2, 2009 - 21:22

Interesting observation came to light. I'd needed to restart apache on my server for a different reason (added a new virtual host) and a little while later noticed the memory use on the server looked like it had fallen off a cliff. Coincidentally performance on the server had been eexxttrreemmeellyy ssllooooooooww up until restarting the server, and then became really good afterward. Coincidentally the CPU load monitor widget showed flat CPU load for about two hours, no CPU spikes. But then it returned to the regular CPU load spike every half hour.

I haven't made an MySQL change in over five days. But have restarted Apache a couple times. Each time it resulted in a small decrease in memory use, plus it looks like there's a gentle upslope in memory use over time.

Obviously this indicates a resource leak in Apache being triggered by something. But... what? Very curious.

The core behavior (regular CPU load spikes) has remained the same. The behavior has abated somewhat with the spikes remaining well under 100% CPU and the baseline remaining about 20% CPU. The spikes appear to coincide with cron runs.

I've been focusing on MySQL tuning because, well, not sure why. One reason is it's not likely to be an apache issue corresponding to cron runs since cron runs involve a single fetch of the cron.php URL so Apache isn't very much involved with it any more than the single request of the URL.

Out of curiosity .. the traditional way to execute cron is using wget to fetch cron.php. Nowadays drush can execute cron hooks so I just changed crontab this way and will report back in a couple hours:

10 * * * * (cd /var/vhosts/example.com ; /usr/local/bin/drush cron)
# 10 * * * * /usr/bin/wget -O- http://example.com/cron.php >/dev/null 2>&1

 
 

Drupal is a registered trademark of Dries Buytaert.