Database locking up under high load
Hi all
We developed a site at the end of last year. This site is become more and more popular as time goes on.
This site is running on a shared server and uses a lot of modules including views and CCK and has many users.
The problem is that every once in a while (mostly middle of the day) the server’s load goes through the roof and mySQL runs out of connections. The only way to get the server working normally again is to get the server admin to restart mySQL.
I have tried finding the problem and every time I think I found it a few days later (some times a week later) I get proved wrong.
The last time it happened I got the server admin to send me the mySQL process list (see bottom of post).
I have gotten the server admin’s to change the mysql max connections to 250. I had caching enable and disabled and it has happened with both settings.
The only thing left that I can think of is table locking. I found this patch http://drupal.org/node/55516#comment-875125 but I am not sure about it at all. The other thing I can think of is converting to InnoDB. I have never used InnoDB before and I don’t fully understand the consequences of running a Drupal site off InnoDB.
I don’t want to convert to InnoDB unless I am sure that is the problem and InnoDB will solve the problem.
I actually don’t even know if the server supports InnoDB and don’t really know how to check.
Any help would be greatly appreciated.
Thanks.
+------+------------+------------------------------------------------------------------------------------------------------+
| Time | State | Info |
+------+------------+------------------------------------------------------------------------------------------------------+
| 6226 | statistics | SELECT count(DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_expi |
| 6184 | Locked | DELETE FROM term_node WHERE nid = 1829 |
| 6115 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 6111 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 6032 | Locked | SELECT count(DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_expi |
| 5978 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5937 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5910 | Locked | DELETE FROM term_node WHERE nid = 1830 |
| 5799 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5741 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 5614 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5491 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 5439 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 5398 | Locked | SELECT count(DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_expi |
| 5369 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 5346 | Locked | SELECT count(DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_expi |
| 5319 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 5308 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5274 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5269 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5215 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 5199 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 5095 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 4975 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 4858 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4726 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4605 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4590 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4563 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 4369 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4303 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4277 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4149 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 4027 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 3897 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 3656 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 3564 | Locked | DELETE FROM term_node WHERE nid = 1831 |
| 3543 | Locked | DELETE FROM term_node WHERE nid = 1832 |
| 3517 | Locked | DELETE FROM term_node WHERE nid = 1833 |
| 3503 | Locked | DELETE FROM term_node WHERE nid = 1834 |
| 3463 | Locked | DELETE FROM term_node WHERE nid = 1835 |
| 3283 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2918 | Locked | DELETE FROM term_node WHERE nid = 1836 |
| 2824 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2820 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2801 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2738 | Locked | DELETE FROM term_node WHERE nid = 1837 |
| 2726 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2653 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2628 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2523 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2349 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2086 | Locked | DELETE FROM term_node WHERE nid = 1838 |
| 2061 | Locked | DELETE FROM term_node WHERE nid = 1839 |
| 2033 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 2012 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1932 | Locked | DELETE FROM term_node WHERE nid = 1840 |
| 1930 | Locked | DELETE FROM term_node WHERE nid = 1841 |
| 1928 | Locked | DELETE FROM term_node WHERE nid = 1842 |
| 1750 | Locked | DELETE FROM term_node WHERE nid = 1843 |
| 1750 | Locked | DELETE FROM term_node WHERE nid = 1844 |
| 1749 | Locked | DELETE FROM term_node WHERE nid = 1845 |
| 1746 | Locked | DELETE FROM term_node WHERE nid = 1846 |
| 1676 | Locked | DELETE FROM term_node WHERE nid = 1847 |
| 1670 | Locked | DELETE FROM term_node WHERE nid = 1848 |
| 1629 | Locked | DELETE FROM term_node WHERE nid = 1849 |
| 1248 | Locked | SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN content_type_job node_data_field_job_exp |
| 1110 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1108 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1106 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1104 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1050 | Locked | DELETE FROM term_node WHERE nid = 1850 |
| 1034 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1033 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1032 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1032 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1019 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1018 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1018 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1017 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1015 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1012 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 1009 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 975 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 934 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 928 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 903 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 892 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 853 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 718 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 566 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 556 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 552 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 543 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 528 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 520 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 250 | Locked | SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid |
| 0 | | show processlist |
+------+------------+------------------------------------------------------------------------------------------------------+
Deletes
Are you doing some sort of large-scale deletion when this happens? Sure looks that way from your post. You don't want to be expiring content during peak access hours.
We've had similar issues. The typical symptom is that the servers are happily chugging along until there's a sudden and catastrophic traffic jam at the database level.
We've mitigated it by moving node-related tables to innodb (row locking) from MyISAM (table locking). We also installed the Boost module to take most anonymous traffic off the database, turned off Organic Groups (which creates some heinous queries), configured MySQL to use a RAMdisk for temp tables, moved all cache tables into Memcached ... there's always one more bottleneck behind the one you remove.
Besides the cron we are not
Besides the cron we are not running any scripts against the database. The site is an employment website were users have profile nodes and users are able to add and delete CV’s. Recruiters are able to add and remove Job postings.
So there are a lot of nodes being added and updated. And it seems that if 2 users are trying to do something at the same time it hangs everything. I have tried to search d.org for possible race conditions but I have only found and patch one in the session.inc file.
I have traced the top SELECT to a view that is using views fast search. The “DELETE FROM term_node WHERE nid =” query I have traced to the taxonomy module for when a node is being deleted.
I am considering innodb but I don’t know how to check if the server supports it and changing mySQL storage engines on a busy site makes me nervous.
InnoDB is supported (show
InnoDB is supported (
show engines;). Does any body know what the recommended practices are for changing storage engines? Does the site need to be taken offline? Should I only do it to the term tables? Is their any problems with joins when one table is InnoDB and the other is MyISAM.I am still not 100% convinced this problem will be fixed with row-level locking, but it is the only thing I can think of at the moment.
Not only traffic jam
I hve to confirm... I've started to have the same problem when "improved" a bigger MySQL query cache, mainly when updating a node, the delete in term_node causes a HUGH lock. Until like to 50 secs, record is released and thing goes smooth again (is there any mysql variable about locking 50secs?).
Query cache is flushed daily at night but at midnight is very fragmentated, like 110k to 200k blocks and this causes a performance degradation instead of an improvement. Also using Drupal's agressive cache.
Going to decrease query cache memory and confirm in few days.
Gustavo
I converted term_data and
I converted term_data and term_node to InnoDB and since then (touch wood) I have not had any problems.
I dont know much about mySQL query cache or if InnoDB would make any difference.
What I do know is you can check the settings by running
show variables like 'query%';This page could also prove helpful http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/
I wasn't the cache
Hi Nax, I would try also your solution. Is it as simple as changing the engine for the tables or something else?
thanks
Seems to work
Now, before moving these tables to innodb, I've commented the lock and unlock in cache.inc, the heaviest traffic table, and things seems to work. This is not easy to test because didn't happen all the time, but after the change never happen. I'll back with updates
Similar Database Locking Problem
I am administering several Drupal sites and I have started to experience a similar issue with my database locking. Best I can tell, the problem is triggered when users list all content via the Admin Menu >Content >List. Unfortunately, I did not capture output when the problem was happening (and I don't want to take down my sites until later in the day/weekend), but here is one of the queries that I know was locking the database:
SELECT n.*, u.name FROM node n INNER JOIN term_node tn0 ON n.nid = tn0.nid INNER JOIN term_node tn1 ON n.nid = tn1.nid INNER JOIN term_node tn2 ON n.nid = tn2.nid INNER JOIN term_node tn3 ON n.nid = tn3.nid INNER JOIN term_node tn4 ON n.nid = tn4.nid INNER JOIN term_node tn5 ON n.nid = tn5.nid INNER JOIN term_node tn6 ON n.nid = tn6.nid INNER JOIN term_node tn7 ON n.nid = tn7.nid INNER JOIN term_node tn8 ON n.nid = tn8.nid INNER JOIN term_node tn9 ON n.nid = tn9.nid INNER JOIN DATABASENAME.users u ON n.uid = u.uid WHERE tn0.tid = 443 AND tn1.tid = 443 AND tn2.tid = 443 AND tn3.tid = 610 AND tn4.tid = 445 AND tn5.tid = 445 AND tn6.tid = 611 AND tn7.tid = 455 AND tn8.tid = 455 AND tn9.tid = 455 ORDER BY n.changed DESC LIMIT 0, 50;
When I manually run this query, it takes 53 seconds to complete. However, within the mysqladmin processlist, I see the process stalled for quite a while longer than that, with a state of statistics. After killing this and the below query, everything returned to normal.
I was able to grab another partial query that seems to be causing a similar locking issue: SELECT COUNT(*) FROM node n INNER JOIN term_node tn0 ON n.n
I am not opposed to switching my tables over to use innoDB, however, I want to make sure that this will resolve my problem. My site is NOT under heavy load so I'm looking to get some recommendations of how to approach this. Thanks in advance!!
InnoDB follow-up
Originally I was having the lockup problem every week. I have since converted my term tables to InnoDB http://drupal.org/node/410974#comment-1612922
Sine the conversion my database has only locked up a few times in months. I do believe there are some mySQL settings that can improve the situation or make it worse, but I have not had much time to investigate this.
Because converting a few tables to InnoDB had a positive affect, I am considering converting the rest of the database or maybe just a few more key tables (node). My biggest problem is that this site does get a large amount of traffic and people will definitely notice if I put the site into off-line mode.