Below is a summary of the top 20 slowest SQL queries on drupal.org, sorted by number of occurrences (count). You can see the average time per SQL query as well as the accumulated execution time. This information is probably a good starting point for optimizations, and the offending queries are not necessarily specific to drupal.org, so using this information to optimize Drupal should save days' worth of processing power.

Feel free to share your site's slow queries. Be careful: copy-pasting your slow-query log can be tricky because of the < and > signs. The drupal.org results were obtained using MySQL's slow query log feature and the slow query log analyzer that ships with MySQL, mysqldumpslow. We used the following command line to generate this list:
mysqldumpslow mysql-slow.log -t 20 -s=c -v
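For anyone wanting to reproduce this on their own site: the slow query log is enabled in my.cnf. A minimal sketch for MySQL 4.x (the path and threshold here are examples, not drupal.org's actual settings):

```ini
[mysqld]
# Write queries slower than long_query_time (in seconds) to this file.
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time  = 1
# MySQL 4.1+: also log queries that perform full scans.
log-queries-not-using-indexes
```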

Count: 8589  Time=13.09s (112389s)  Lock=0.20s (1688s)  Rows=59.8 (513405), drupal[drupal]@3hosts
  SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid  INNER JOIN users u ON n.uid = u.uid WHERE n.status = N AND (i.word LIKE 'S')
Count: 2454  Time=12.18s (29887s)  Lock=0.18s (447s)  Rows=7.1 (17542), 2users@3hosts
  SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid  INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = N AND (i.word LIKE 'S') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT N, N
Count: 374  Time=13.13s (4910s)  Lock=0.00s (0s)  Rows=0.0 (0), 2users@3hosts
  INSERT INTO cache (cid, data, created, expire, headers) VALUES ('S', 'S', N, N, 'S')
Count: 345  Time=12.36s (4265s)  Lock=0.01s (2s)  Rows=7489.0 (2583705), drupal[drupal]@3hosts
  SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid  INNER JOIN users u ON n.uid = u.uid WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S')
Count: 236  Time=13.43s (3170s)  Lock=0.07s (17s)  Rows=13.5 (3196), drupal[drupal]@3hosts
  SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid  INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT N, N
Count: 194  Time=13.19s (2559s)  Lock=0.00s (0s)  Rows=0.0 (0), drupal[drupal]@3hosts
  INSERT INTO cache (cid, data, created, expire, headers) VALUES ('S', 'S', N, -N, 'S')
Count: 103  Time=18.09s (1863s)  Lock=1.57s (162s)  Rows=0.0 (0), drupal[drupal]@3hosts
  DELETE FROM cache WHERE expire != N AND expire <= N
Count: 99  Time=19.71s (1951s)  Lock=0.15s (15s)  Rows=30.0 (2970), drupal[drupal]@3hosts
  SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = 'S' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname ORDER BY  hits DESC LIMIT N, N
Count: 71  Time=21.21s (1506s)  Lock=0.87s (62s)  Rows=11.0 (782), drupal[drupal]@2hosts
  SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid = c.nid AND (c.status = N OR c.status IS NULL) WHERE n.status = N AND (n.uid = N OR c.uid = N) ORDER BY last_post DESC LIMIT N, N
Count: 42  Time=12.64s (531s)  Lock=0.02s (1s)  Rows=19792.2 (831274), drupal[drupal]@3hosts
  SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid  INNER JOIN users u ON n.uid = u.uid WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S' OR i.word LIKE 'S')
Count: 33  Time=15.33s (506s)  Lock=0.21s (7s)  Rows=14.6 (483), drupal[drupal]@3hosts
  SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid  INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S' OR i.word LIKE 'S') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT N, N
Count: 33  Time=11.85s (391s)  Lock=0.06s (2s)  Rows=0.8 (26), drupal[drupal]@3hosts
  SELECT u.*, s.* FROM users u INNER JOIN sessions s ON u.uid = s.uid WHERE s.sid = 'S' AND u.status < N LIMIT N, N
Count: 29  Time=12.03s (349s)  Lock=1.79s (52s)  Rows=0.0 (0), drupal[drupal]@3hosts
  DELETE FROM cache WHERE cid = 'S'
Count: 27  Time=14.11s (381s)  Lock=0.19s (5s)  Rows=9.9 (267), drupal[drupal]@2hosts
  SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_files f ON m.cid = f.cid INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid  ORDER BY m.created DESC LIMIT N, N
Count: 23  Time=14.26s (328s)  Lock=2.22s (51s)  Rows=0.0 (0), drupal[drupal]@2hosts
  DELETE FROM cache WHERE expire != N AND expire < N
Count: 22  Time=12.45s (274s)  Lock=5.77s (127s)  Rows=25.0 (550), drupal[drupal]@2hosts
  SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status = N ORDER BY last_post DESC LIMIT N, N
Count: 20  Time=19.00s (380s)  Lock=0.00s (0s)  Rows=30.0 (600), drupal[drupal]@3hosts
  SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = 'S' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname ORDER BY  total DESC LIMIT N, N
Count: 13  Time=56.46s (734s)  Lock=0.15s (2s)  Rows=0.0 (0), drupal[drupal]@3hosts
  DELETE FROM cache WHERE cid LIKE 'S'
Count: 11  Time=14.18s (156s)  Lock=0.00s (0s)  Rows=1.0 (11), 2users@2hosts
  SELECT COUNT(DISTINCT(n.nid)) FROM node n LEFT JOIN comments c ON n.nid = c.nid AND (c.status = N OR c.status IS NULL) WHERE n.status = N AND (n.uid = N OR c.uid = N)
Count: 11  Time=12.73s (140s)  Lock=0.00s (0s)  Rows=0.0 (0), 2users@3hosts
  DELETE FROM search_index WHERE sid = N AND type = 'S'

Comments

killes@www.drop.org’s picture

I just want to add that if these queries are from the current drupal.org (not scratch.drupal.org), they relate to the CVS code from a couple of months ago. The search queries, for example, should have been improved (or at least changed) due to Steven's search improvements.

It would be helpful to know over which period of time the query log was accumulated, in order to assess the significance of 200 slow inserts into the cache table. I've said it a long time ago and I'll say it again: splitting up the cache table into smaller parts would probably be beneficial. There are parts that hardly change (produced by the filters) and parts that change every few minutes (the page cache).

Dries’s picture

The slow query log has been accumulated over the past couple of months (e.g. we spent 112389 seconds, or 31 hours (!), executing 8589 search queries).

It is probably worth doing a second experiment over a shorter time span with a much lower threshold to study day-to-day SQL behavior. In fact, by making the devel.module _store_ its query information rather than _sending_ it to the user, one could compute which queries contribute the most SQL overhead across all page views _and_ over a longer period of time. It would certainly make the devel.module more useful, and it would allow us to understand and optimize SQL bottlenecks that are not necessarily outliers in terms of execution time.
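The aggregation described here, abstracting away literals and summing time per query pattern, is essentially what mysqldumpslow does. A minimal standalone sketch of the idea (in Python rather than the devel.module's PHP, purely for illustration; the sample log data is made up):

```python
import re
from collections import defaultdict

def normalize(query):
    """Abstract literals the way mysqldumpslow does: strings -> 'S', numbers -> N."""
    query = re.sub(r"'[^']*'", "'S'", query)
    query = re.sub(r"\b\d+\b", "N", query)
    return query

def aggregate(log):
    """log: iterable of (query, seconds) pairs; returns (total, count, pattern) rows."""
    stats = defaultdict(lambda: [0, 0.0])  # pattern -> [count, total_time]
    for query, seconds in log:
        entry = stats[normalize(query)]
        entry[0] += 1
        entry[1] += seconds
    # Sort by accumulated time, the metric highlighted above.
    return sorted(((total, count, q) for q, (count, total) in stats.items()),
                  reverse=True)

# Tiny fabricated example log.
log = [
    ("SELECT dst FROM url_alias WHERE src = 'node/1'", 0.01),
    ("SELECT dst FROM url_alias WHERE src = 'node/2'", 0.02),
    ("DELETE FROM cache WHERE cid = 'menu'", 0.5),
]
for total, count, pattern in aggregate(log):
    print("%5d  %8.3fs  %s" % (count, total, pattern))
```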

Bèr Kessels’s picture

Shouldn't we rather use the MySQL slow log and normal logs for that? They are timestamped, so it should be possible to find out how long queries take. In addition, we can use this to find non-indexed queries by comparing against a log generated with the --log-queries-not-using-indexes option.

More here: http://dev.mysql.com/doc/refman/4.1/en/log-files.html

Bèr Kessels’s picture

Ignore my post. I read the thread in the mail, and somehow did not receive the first two posts until just now. I saw the last one about using devel.module. Sorry.
/me drinks his coffee instead.

Dries’s picture

There are many ways to profile an application. The slow-query log provides only one possible mechanism to help understand the behavior of Drupal. Let's not talk about how one could or should profile, but let's focus on the data at hand. Some of these queries are clearly a problem, and the goal of this issue is to fix some of those.

Killes: I might be mistaken but I think the slow query log is not specific to drupal.org. It may contain slow queries from scratch.drupal.org. Of course, scratch.drupal.org is hardly used so it is less likely to show up in these logs. It is fair to assume that they are from drupal.org.

handelaar’s picture

Can I ask what storage engine's being used?

Dries’s picture

handelaar: we're using MyISAM for all tables. It might make sense to switch some tables to InnoDB. I haven't really experimented with that.

Amazon’s picture

InnoDB will greatly increase the size of the database; some reports indicate by a factor of 4 or larger. InnoDB offers row-level locking for transactions, compared to MyISAM's table-level locking.

I recommend InnoDB because every time someone submits content to Drupal.org the entire node table locks while all the pending writes for new content are done. Reads of the node table are suspended until the writes are done. Hence, for fresh content there can be very long delays as all the writes are done. Here are the instructions for converting: http://dev.mysql.com/doc/refman/4.1/en/converting-tables-to-innodb.html
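A sketch of what the conversion in that manual page boils down to, one ALTER per table (the table choice here is illustrative; TYPE= is the MySQL 4.0 spelling, later versions use ENGINE=):

```sql
-- Convert the write-heavy tables first to get row-level locking where it matters.
ALTER TABLE node     TYPE=InnoDB;
ALTER TABLE comments TYPE=InnoDB;
```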

Could you also publish the output of MySQL's SHOW STATUS so that we can look at the buffer pools, the MySQL memory cache, and the thread cache? I am particularly interested in threads created versus threads cached.

Are you using pconnect or connect?

I'll respond to the slow queries shortly.

Kieran

Amazon’s picture

Could we get the table schemas for all the tables in these queries?

Desc search_index, node, user, score, search_total, cache, accesslog, access, node_comment_statistics, comments, sessions, cvs_messages, cvs_files, cvs_repositories.

Also, do you keep Drupal.org in version control so we can know exactly what versions of the code you are running? Once we move to the right engine and make sure we have the correct table schemas, next we will want to optimize the queries by looking at the code to see if we can write the queries more efficiently, or call them less frequently.

It would also help to know which modules are calling these queries. If you know off the top of your head, that's useful. If not, we can start looking through HEAD.

Kieran

Amazon’s picture

It would be useful to run ANALYZE TABLE for each of the tables in those queries: search_index, node, user, score, search_total, cache, accesslog, access, node_comment_statistics, comments, sessions, cvs_messages, cvs_files, cvs_repositories.
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Also, an EXPLAIN EXTENDED followed by SHOW WARNINGS for each of the slow queries will help us understand why this specific version of MySQL, with these specific table schemas, executes the queries the way it does.

It might simply be that the order a query is written in causes the tables to be joined inefficiently, in which case we can rewrite the query slightly to let MySQL execute it much faster.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

Dries’s picture

Amazon: I'll try my best to provide the requested data in the next few days. That, or after Vancouver.

Amazon’s picture

Let me see if I can get the queries written in a PHP page that you can just cut and paste to get the answers.

killes@www.drop.org’s picture

FileSize
16.06 KB

I've had some fun with devel module...

The code is mysql specific.

killes@www.drop.org’s picture

FileSize
464 bytes

Here's the SQL schema.

The regexps that do the inverse sprintf could probably be improved.

kbahey’s picture

If you look at the top 5 or so offending queries, it is obvious that the search is the main culprit here.

I am not sure what a LIKE on a single character would achieve:
... LIKE 'S'

Search is an area to investigate for sure.

The second offending query has to do with cache:

INSERT INTO cache (cid, data, created, expire, headers) VALUES ('S', 'S', N, N, 'S')

Why on earth would that take 13 seconds to execute? Locking is not the issue here, since it says it took 0.00 seconds for locking.

deekayen’s picture

I'll use the word 'might' a lot because I haven't actually benchmarked this analysis. It looks like some of this stuff could be sped up by simply indexing some of the joined and WHERE'd columns.

It looks like the first one might just need an index on sid in search_index. That might speed up all occurrences of n.nid = i.sid. Maybe also experiment with a key that indexes both nid and uid in the node table.

The insert into cache, I'm guessing, is slow because of the sheer size of the strings. I can only think to gzip-compress the string at level 1 or something, but I don't know what that would do to CPU overhead. Otherwise, maybe (optionally) strip whitespace from the HTML, but that can sometimes mess up a template, ergo the optional part.

I don't know what the CVS tables look like, but I'd still look at adding indexes.

For the Count: 33 query, index status in the users table.

For the 4th to last query, index type in the access table.

For the second to last query, index status in the comments table.
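These suggestions are easy to sanity-check in miniature. A sketch using Python's sqlite3 (not MySQL, so the optimizer differs, but the before/after effect of an index on a filtered column is the same idea; the table and column names are borrowed from the queries above, the data is fabricated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_index (word TEXT, sid INTEGER, type TEXT)")
conn.executemany("INSERT INTO search_index VALUES (?, ?, ?)",
                 [("w%d" % i, i, "node") for i in range(1000)])

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN describes the access strategy.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT type FROM search_index WHERE sid = 42"
before = plan(query)                       # full table scan
conn.execute("CREATE INDEX search_index_sid ON search_index (sid)")
after = plan(query)                        # indexed lookup
print(before)
print(after)
```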

moshe weitzman’s picture

INSERT INTO cache (cid, data, created, expire, headers) VALUES ('S', 'S', N, -N, 'S')

I suspect this one took so long because some other query was blocking. By itself, this one has no value, right?

jadwigo’s picture

I suppose the 'S' and N stand for a string and a number, not a string containing S.

merlinofchaos’s picture

It seems to me there might be some value in setting up a test server with some 'test users' whose activity on the site we can control. The bulk of the users can just be some form of crawler, and a few of the users could actually post nodes and comments basically at random. By controlling the number and rate of those, we can collect significant amounts of data without having to accumulate it over months and, more importantly, can easily test the scenarios we think may or may not help performance.

Very simple versions of these could be set up with just wget scripts, too; and these scripts can record how long the operations took from their perspective, giving us additional data points.
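A sketch of such a harness in Python (the fetch function is injected, so the same loop can wrap urllib, wget via a subprocess, or, as in this dry run, a stub; all names here are made up for illustration):

```python
import random
import time

def run_session(fetch, paths, requests=100, seed=0):
    """Simulate one 'test user': fetch random paths and record elapsed times.

    fetch(path) is whatever transport you choose; it is injected so the
    harness itself stays transport-agnostic.
    """
    rng = random.Random(seed)
    timings = []
    for _ in range(requests):
        path = rng.choice(paths)
        start = time.monotonic()
        fetch(path)
        timings.append((path, time.monotonic() - start))
    return timings

def summarize(timings):
    # Aggregate the client-side view: request count, total wall time, worst hit.
    total = sum(elapsed for _, elapsed in timings)
    worst = max(timings, key=lambda pair: pair[1])
    return {"requests": len(timings), "total": total, "worst": worst}

# Dry run against a stub "server" that just sleeps briefly.
stats = summarize(run_session(lambda path: time.sleep(0.001), ["/node/1", "/tracker"]))
print(stats)
```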

Gerhard Killesreiter’s picture

I've installed my extended version of the devel module on scratch.drupal.org. It is collecting data.

PunkCut25’s picture


$queryarray = array("EXPLAIN SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid WHERE n.status = N AND (i.word LIKE 'S')",
"EXPLAIN SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = N AND (i.word LIKE 'S') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT N, N",
"EXPLAIN INSERT INTO cache (cid, data, created, expire, headers) VALUES ('S', 'S', N, N, 'S')",
"EXPLAIN SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S')",
"EXPLAIN SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT N, N",
"EXPLAIN INSERT INTO cache (cid, data, created, expire, headers) VALUES ('S', 'S', N, -N, 'S')",
"EXPLAIN DELETE FROM cache WHERE expire != N AND expire <= N",
"EXPLAIN SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = 'S' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname ORDER BY hits DESC LIMIT N, N",
"EXPLAIN SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid = c.nid AND (c.status = N OR c.status IS NULL) WHERE n.status = N AND (n.uid = N OR c.uid = N) ORDER BY last_post DESC LIMIT N, N",
"EXPLAIN SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S' OR i.word LIKE 'S')",
"EXPLAIN SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = N AND (i.word LIKE 'S' OR i.word LIKE 'S' OR i.word LIKE 'S') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT N, N",
"EXPLAIN SELECT u.*, s.* FROM users u INNER JOIN sessions s ON u.uid = s.uid WHERE s.sid = 'S' AND u.status < N LIMIT N, N",
"EXPLAIN DELETE FROM cache WHERE cid = 'S'",
"EXPLAIN SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_files f ON m.cid = f.cid INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid ORDER BY m.created DESC LIMIT N, N",
"EXPLAIN DELETE FROM cache WHERE expire != N AND expire < N",
"EXPLAIN SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status = N ORDER BY last_post DESC LIMIT N, N",
"EXPLAIN SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = 'S' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname ORDER BY total DESC LIMIT N, N",
"EXPLAIN DELETE FROM cache WHERE cid LIKE 'S'",
"EXPLAIN SELECT COUNT(DISTINCT(n.nid)) FROM node n LEFT JOIN comments c ON n.nid = c.nid AND (c.status = N OR c.status IS NULL) WHERE n.status = N AND (n.uid = N OR c.uid = N)",
"EXPLAIN DELETE FROM search_index WHERE sid = N AND type = 'S'");

foreach ($queryarray as $query) {
  // Note: in MySQL 4.x, EXPLAIN only works on SELECT statements, so the
  // EXPLAIN INSERT/DELETE entries above would fail; skip them.
  if (!preg_match('/^EXPLAIN\s+SELECT/i', $query)) {
    continue;
  }
  $output = db_query($query);
  while ($array = db_fetch_array($output)) {
    print_r($array);
  }
}

PunkCut25’s picture


$queryarray = array("DESCRIBE search_index","DESCRIBE node","DESCRIBE users","DESCRIBE search_total","DESCRIBE cache","DESCRIBE accesslog","DESCRIBE access","DESCRIBE comments","DESCRIBE cvs_messages","DESCRIBE cvs_files","DESCRIBE cvs_repositories","DESCRIBE node_comment_statistics");

foreach ($queryarray as $query) {
  $output = db_query($query);
  while ($array = db_fetch_array($output)) {
    print_r($array);
  }
}

Fool2’s picture

The insert into cache, I'm guessing, is slow because of the sheer size of the strings. I can only think to gzip-compress the string at level 1 or something, but I don't know what that would do to CPU overhead. Otherwise, maybe (optionally) strip whitespace from the HTML, but that can sometimes mess up a template, ergo the optional part.

I am looking at the cache code and pages are gzipped.

See http://drupaldocs.org/api/head/function/page_set_cache
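For what it's worth, the CPU-versus-size tradeoff deekayen raised is easy to measure directly. A small sketch with Python's zlib (the sample HTML is synthetic; Drupal's page cache uses PHP's gzip functions, this is just an illustration of level 1 versus level 9):

```python
import zlib

# A synthetic, repetitive page body, roughly like cached HTML.
html = b"<html><body>" + b"<p>Lorem ipsum dolor sit amet.</p>" * 500 + b"</body></html>"

for level in (1, 6, 9):
    compressed = zlib.compress(html, level)
    assert zlib.decompress(compressed) == html  # lossless round-trip
    print("level %d: %d -> %d bytes (%.1f%%)"
          % (level, len(html), len(compressed), 100.0 * len(compressed) / len(html)))
```

On repetitive HTML the ratios at level 1 and level 9 are often close, while level 1 costs considerably less CPU.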

Cvbge’s picture

Isn't the MySQL slow query log logging only queries longer than one second? If that's the case, it's not very useful IMO... We don't know how much time the "quick" queries take.

Dries’s picture

Cvbge: we're also measuring that. As I said above, there are many ways to obtain insight into the behavior of Drupal. This is just one of them, and a commonly used one.

killes@www.drop.org’s picture

I've installed my version of devel.module on scratch.drupal.org. The results are not representative as this site is only used by a few people to evaluate the upcoming version of Drupal.org. The test ran for about 12 hours.

Here are the top five queries sorted by average execution time:

1.305 SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status = D

1.168 SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid =

0.529 SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid ORDER BY m.created DESC LIMIT D

0.431 CREATE TEMPORARY TABLE temp_search_sids SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM search_index i INNER JOIN search_total t ON i.word = t.word INNER JOIN node n ON n.nid = i.sid INNER JOIN term_node tn ON n.nid =

0.407 CREATE TEMPORARY TABLE temp_search_sids SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM search_index i INNER JOIN search_total t ON i.word = t.word INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid

killes@www.drop.org’s picture

Here are the top five by accumulated execution time:

31.54 SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid =

24.041 CREATE TEMPORARY TABLE temp_search_sids SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM search_index i INNER JOIN search_total t ON i.word = t.word INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid

12.274 SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status =

11.066 CREATE TEMPORARY TABLE temp_search_results SELECT i.type, i.sid, F

10.244 SELECT dst FROM url_alias WHERE src = S

killes@www.drop.org’s picture

And here by number of executions:

35176 SELECT dst FROM url_alias WHERE src = S

10608 SELECT data, created, headers, expire FROM cache WHERE cid = S

3988 SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid = v.vid WHERE r.nid = D

3026 SELECT t.tid, t.* FROM term_hierarchy h, term_data t WHERE h.parent = t.tid AND h.tid = D

2998 SELECT * FROM term_data WHERE tid = D

2432 UNLOCK TABLES

I can provide more info if requested, but copy and pasting all this is really a pain.

Dries’s picture

By the way: scratch.drupal.org is using CVS HEAD.

webchick’s picture

Here is the output of EXPLAIN on some of those queries on Drupal.org:

Documentation on reading EXPLAIN statements.



Running MySQL 4.0.25

EXPLAIN SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid WHERE n.status = 0 AND (i.word LIKE 'search')


table type possible_keys key key_len ref rows Extra
i range sid,word word 50 4314 Using where; Using temporary
n eq_ref PRIMARY,uid,node_status_type PRIMARY 4 i.sid 1 Using where; Distinct
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where; Using index; Distinct



EXPLAIN SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = 0 AND (i.word LIKE 'search') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
i range sid,word word 50 4314 Using where; Using temporary; Using filesort
n eq_ref PRIMARY,uid,node_status_type PRIMARY 4 i.sid 1 Using where
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where; Using index
t eq_ref PRIMARY PRIMARY 50 i.word 1



EXPLAIN SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid WHERE n.status = 0 AND (i.word LIKE 'search1' OR i.word LIKE 'search2')


table type possible_keys key key_len ref rows Extra
i range sid,word word 50 2 Using where; Using temporary
n eq_ref PRIMARY,uid,node_status_type PRIMARY 4 i.sid 1 Using where; Distinct
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where; Using index; Distinct



EXPLAIN SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = 0 AND (i.word LIKE 'search1' OR i.word LIKE 'search2') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
i range sid,word word 50 2 Using where; Using temporary; Using filesort
n eq_ref PRIMARY,uid,node_status_type PRIMARY 4 i.sid 1 Using where
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where; Using index
t eq_ref PRIMARY PRIMARY 50 i.word 1



EXPLAIN SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = '0' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname ORDER BY hits DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
a ALL 421049 Using temporary; Using filesort
ac ALL 40
u eq_ref PRIMARY PRIMARY 4 a.uid 1



EXPLAIN SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid = c.nid AND (c.status = 0 OR c.status IS NULL) WHERE n.status = 0 AND (n.uid = 0 OR c.uid = 0) ORDER BY last_post DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
n ref PRIMARY,uid,node_status_type node_status_type 4 const 561 Using where; Using temporary; Using filesort
l eq_ref PRIMARY PRIMARY 4 n.nid 1
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where
c ref lid lid 4 n.nid 4 Using where; Distinct



EXPLAIN SELECT DISTINCT i.sid, i.type FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid WHERE n.status = 0 AND (i.word LIKE 'search1' OR i.word LIKE 'search2' OR i.word LIKE 'search3')


table type possible_keys key key_len ref rows Extra
i range sid,word word 50 3 Using where; Using temporary
n eq_ref PRIMARY,uid,node_status_type PRIMARY 4 i.sid 1 Using where; Distinct
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where; Using index; Distinct



EXPLAIN SELECT i.type, i.sid, SUM(i.score/t.count) AS score FROM search_index i INNER JOIN node n ON n.nid = i.sid INNER JOIN users u ON n.uid = u.uid INNER JOIN search_total t ON i.word = t.word WHERE n.status = 0 AND (i.word LIKE 'search1' OR i.word LIKE 'search2' OR i.word LIKE 'search3') GROUP BY i.type, i.sid ORDER BY score DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
i range sid,word word 50 3 Using where; Using temporary; Using filesort
n eq_ref PRIMARY,uid,node_status_type PRIMARY 4 i.sid 1 Using where
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where; Using index
t eq_ref PRIMARY PRIMARY 50 i.word 1



EXPLAIN SELECT u.*, s.* FROM users u INNER JOIN sessions s ON u.uid = s.uid WHERE s.sid = '0' AND u.status < 1 LIMIT 0, 10


Comment
Impossible WHERE noticed after reading const tables



EXPLAIN SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_files f ON m.cid = f.cid INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid ORDER BY m.created DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
m ALL PRIMARY,uid,cid,rid 24954 Using temporary; Using filesort
f ref cid cid 4 m.cid 3 Using index
r ALL PRIMARY 2 Using where
u eq_ref PRIMARY PRIMARY 4 m.uid 1



EXPLAIN SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp AS last_post, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status = 0 ORDER BY last_post DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
n ref PRIMARY,uid,node_status_type node_status_type 4 const 561 Using where; Using temporary; Using filesort
u eq_ref PRIMARY PRIMARY 4 n.uid 1 Using where
l eq_ref PRIMARY PRIMARY 4 n.nid 1



EXPLAIN SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM accesslog a LEFT JOIN access ac ON ac.type = '0' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN users u ON a.uid = u.uid GROUP BY a.hostname ORDER BY total DESC LIMIT 0, 10


table type possible_keys key key_len ref rows Extra
a ALL 421049 Using temporary; Using filesort
ac ALL 40
u eq_ref PRIMARY PRIMARY 4 a.uid 1



EXPLAIN SELECT COUNT(DISTINCT(n.nid)) FROM node n LEFT JOIN comments c ON n.nid = c.nid AND (c.status = 0 OR c.status IS NULL) WHERE n.status = 0 AND (n.uid = 1 OR c.uid = 1)


table type possible_keys key key_len ref rows Extra
n ref uid,node_status_type node_status_type 4 const 561 Using where
c ref lid lid 4 n.nid 4 Using where



webchick’s picture

Ahem. Scratch that. ;) Try this instead:

http://drupal.org/node/47915

killes@www.drop.org’s picture

Today I've collected query data from drupal.org over the course of about 12 hours (I've stopped now). Everybody who wants the resulting SQL files is welcome to mail me. The results aren't too different from Dries', but they include data from fast queries too. Every tenth page view was sampled at random for queries. The result is a bzipped SQL file of 5 MB.

Amazon’s picture

Here is the cardinality data for the tables, to help assess the quality of the table indexes:

Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
search_index 1 sid 1 sid A 41226 NULL NULL BTREE
search_index 1 word 1 word A 204547 NULL NULL BTREE
search_index 1 fromsid 1 fromsid A 6036 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
node 0 PRIMARY 1 nid A 46001 NULL NULL BTREE
node 1 uid 1 uid A 9200 NULL NULL BTREE
node 1 node_title_type 1 title A 46001 NULL NULL BTREE
node 1 node_title_type 2 type A 46001 4 NULL BTREE
node 1 node_moderate 1 moderate A 2 NULL NULL BTREE
node 1 node_promote_status 1 promote A 2 NULL NULL BTREE
node 1 node_promote_status 2 status A 5 NULL NULL BTREE
node 1 node_type 1 type A 11 10 NULL BTREE
node 1 created 1 created A 46001 NULL NULL BTREE
node 1 node_created 1 created A 46001 NULL NULL BTREE
node 1 node_changed 1 changed A 46001 NULL NULL BTREE
node 1 node_status_type 1 status A 2 NULL NULL BTREE
node 1 node_status_type 2 type A 20 NULL NULL BTREE
node 1 node_status_type 3 nid A 46001 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
users 0 PRIMARY 1 uid A 48332 NULL NULL BTREE
users 0 name 1 name A 48332 NULL NULL BTREE
users 1 timestamp 1 access A 48332 NULL NULL BTREE
users 1 timestamp_2 1 access A 48332 NULL NULL BTREE
users 1 timestamp_3 1 access A 48332 NULL NULL BTREE
users 1 mail 1 mail A 48332 NULL NULL YES BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
search_total 0 PRIMARY 1 word A 186600 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
cache 0 PRIMARY 1 cid A 31894 NULL NULL BTREE
cache 1 expire 1 expire A 4556 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
accesslog 0 PRIMARY 1 aid A 432282 NULL NULL BTREE
accesslog 1 accesslog_timestamp 1 timestamp A 144094 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
access 0 PRIMARY 1 aid A 40 NULL NULL BTREE
access 0 mask 1 mask A 40 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
node_comment_statistics 0 PRIMARY 1 nid A 45984 NULL NULL BTREE
node_comment_statistics 1 node_comment_timestamp 1 last_comment_timestamp A 45984 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
comments 0 PRIMARY 1 cid A 87553 NULL NULL BTREE
comments 1 lid 1 nid A 21888 NULL NULL BTREE
comments 1 timestamp 1 timestamp A 87553 NULL NULL BTREE
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
sessions 0 PRIMARY 1 sid A 512101 NULL NULL BTREE
sessions 1 uid 1 uid A 4492 NULL NULL BTREE
sessions 1 sid 1 sid A 51210 4 NULL BTREE
sessions 1 timestamp 1 timestamp A 256050 NULL NULL BTREE

Amazon’s picture

For people trying to decode the SHOW INDEX above read the following: http://sunsite.mff.cuni.cz/MIRRORS/ftp.mysql.com/doc/en/SHOW_DATABASE_IN...

Table Name of the table.
Non_unique 0 if the index can't contain duplicates.
Key_name Name of the index.
Seq_in_index Column sequence number in index, starting with 1.
Column_name Column name.
Collation How the column is sorted in the index. In MySQL, this can have the values 'A' (Ascending) or NULL (Not sorted).
Cardinality Number of unique values in the index. This is updated by running isamchk -a.
Sub_part Number of indexed characters if the column is only partly indexed. NULL if the entire key is indexed.
Null Contains 'YES' if the column may contain NULL.
Index_type Index method used.
Comment Various remarks. In MySQL < 4.0.2, it tells whether the index is FULLTEXT or not.

Amazon’s picture

Brief summary of what's going on in this thread.

1) We used MySQL's EXPLAIN capabilities to help us understand what is being done with the slowest queries and how the queries intersect between tables. We have focused on the SELECT statements for now, because EXPLAIN only works on SELECTs, not on DELETEs or INSERTs. These queries are slow because they run across three or four tables.

2) The EXPLAINs revealed that we are using temporary tables, which happens when the GROUP BY and ORDER BY clauses do not match. It's highly likely that MySQL's memory settings for temporary tables are not high enough for these tables to be built in memory. If so, we are building tables on the hard drive. Disk I/O is the slowest bottleneck on the server, so it's no surprise that these queries have become the slowest performers. We could increase the temporary table settings, but we would be much better off optimizing these queries and using the memory more effectively for other parts of the database.

3) The EXPLAINs revealed that we are using filesorts. A filesort requires a second pass over the tables, which in effect gives us a worst-case scenario. Filesorts were optimized in MySQL 4.1, but we are using MySQL 4.0.25 on Drupal.org; upgrading Drupal.org's version of MySQL would likely yield a performance improvement. You can read more about filesorts here: http://dev.mysql.com/doc/internals/en/filesort.html.
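For reference, this is the kind of check described in points 1–3 (a sketch only: the N and 'S' placeholders from the slow-query log are replaced with illustrative literals, not real site values):

```sql
-- Hypothetical EXPLAIN of the top slow query from the log, with
-- placeholder literals substituted for the log's N and 'S' markers.
EXPLAIN SELECT DISTINCT i.sid, i.type
FROM search_index i
INNER JOIN node n ON n.nid = i.sid
INNER JOIN users u ON n.uid = u.uid
WHERE n.status = 1 AND (i.word LIKE 'drupal');
-- In the output, "Using temporary" and "Using filesort" in the Extra
-- column are the two symptoms described above; the rows column gives
-- the per-table row estimates discussed in point 5.
```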

4) We posted the table schemas so that we can make sure that queries done across tables have an index. We have also shown the indexes on that page so we can understand their cardinality (the number of unique values in the index).

5) Look at the total number of rows examined by the query. The product of the per-table rows values tells us how large the query is. Ideally, the rows value for the second, third, or fourth table is one. If the subsequent row values are not one, we should look at the queries to see whether more restrictive permutations are possible.

6) The key field indicates which index is used to join two tables. Many of Drupal's tables have multiple indexes, so there is some ambiguity in how the query is executed. If we discover that the wrong index is being used to join two tables, we can qualify the query to be more explicit. We need to be careful when selecting an index: some join clauses are significantly slower than others, so we need to benchmark before choosing the faster-performing clause.
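One way to make the index choice explicit is MySQL's index hint syntax. A sketch (whether node_status_type is actually the better index for any given query is an assumption that would need benchmarking):

```sql
-- Hypothetical: force the optimizer toward a specific index with
-- USE INDEX, so the join/filter path is no longer ambiguous.
SELECT n.nid, n.title
FROM node n USE INDEX (node_status_type)
WHERE n.status = 1 AND n.type = 'forum';
```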

Amazon’s picture

7. A quick review of the possible_keys and key columns for the table joins indicates that we need keys on access and accesslog. The lack of indexing there means 16 million rows must be checked. CVS_messages has 4 keys, yet none is selected for the join.

8. The join between cvs_files f and cvs_repositories r does not have a key for the join.
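If benchmarking confirms the join is unindexed, the fix would look something like this (a sketch only: the join column name rid is an assumption, since the actual cvs_files/cvs_repositories schema isn't shown in this thread):

```sql
-- Hypothetical: index the column cvs_files joins to cvs_repositories
-- on, so the join no longer scans the table. "rid" is an assumed name.
ALTER TABLE cvs_files ADD INDEX cvs_files_rid (rid);
```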

moshe weitzman’s picture

we INSERT into accesslog on every page view. adding an index there has consequences. please benchmark before submitting for CVS review. without caution, we could optimize the statistics pages at the expense of every other page.

i suspect the best thing to do for stats is to write into a dedicated short term table and then move records every night or so into an archive table for reporting.
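That rotation idea could be sketched as a nightly cron job along these lines (the accesslog_archive table name is hypothetical, and the one-day cutoff is illustrative):

```sql
-- Hypothetical nightly rotation: copy day-old rows into an archive
-- table used only for reporting, then purge them from the hot table.
-- Computing the cutoff once keeps the two statements consistent.
SET @cutoff = UNIX_TIMESTAMP() - 86400;
INSERT INTO accesslog_archive SELECT * FROM accesslog WHERE timestamp < @cutoff;
DELETE FROM accesslog WHERE timestamp < @cutoff;
```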

deekayen’s picture

is to write into a dedicated short term table and then move records every night or so into an archive table

I wouldn't mind losing some stats on a reboot for a little extra speed:
http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html

Literally using the ARCHIVE table type in MySQL might not be the best idea, though, as I suspect decompressing the data for viewing won't be a speed improvement for the queries we're talking about:
http://dev.mysql.com/doc/refman/5.0/en/archive-storage-engine.html

Amazon’s picture

http://dev.mysql.com/doc/refman/4.1/en/delete.html

Five of the slowest queries are DELETE queries, which are performed row by row. There are four modifiers we can consider using with DELETE: LOW_PRIORITY, IGNORE, QUICK, and LIMIT row_count.
- LOW_PRIORITY means the delete runs when there isn't a lot of other activity. This seems logical for something like the cache, but others should review.
- IGNORE means ignore errors during the delete and keep deleting. We should review whether we have been getting delete errors.

DELETE QUICK is probably the most promising option. The QUICK modifier affects whether index leaves are merged during delete operations. The cache table and the search_index table do not use AUTO_INCREMENT in their indexes and are therefore good candidates for DELETE QUICK. This should be followed by an OPTIMIZE TABLE on the table the query was performed on. If OPTIMIZE TABLE is too slow we can run myisamchk.

If we want deletes to be quick but not complete, we can use DELETE ... LIMIT row_count: deletion is a per-row operation, linear in the number of rows deleted, so capping the row count caps the time.

Deletions touch the indexes, so it would be best if all the indexes fit in memory, as set by key_buffer_size and determined by the total size of all the .MYI files (the MyISAM index files).

I think this gives us five variables to experiment with to speed up the delete queries.
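Putting the most promising pieces together, a pruning pass might look like this (a sketch: the expire condition is an assumption about how the cache table marks expirable rows, and OPTIMIZE TABLE should be scheduled off-peak):

```sql
-- Hypothetical: prune expired cache rows at low priority without
-- merging index leaves, then reclaim the fragmented space.
DELETE LOW_PRIORITY QUICK FROM cache
WHERE expire <> 0 AND expire < UNIX_TIMESTAMP();
OPTIMIZE TABLE cache;
```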

Amazon’s picture

http://dev.mysql.com/doc/refman/4.1/en/insert.html

Two of the slowest queries are INSERTs into the cache table. INSERT has several options whose usefulness depends on how the cache mechanism works today; two of them are DELAYED and LOW_PRIORITY. With the DELAYED keyword, the server puts the row or rows to be inserted into a buffer, and the client issuing the INSERT DELAYED statement can continue immediately. With the LOW_PRIORITY keyword, execution of the INSERT is delayed until no other clients are reading from the table (this applies to storage engines that use table-level locking, such as MyISAM).
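For the cache INSERT quoted in the slow-query log, the DELAYED variant would look like this (values are placeholders; INSERT DELAYED only works for MyISAM, MEMORY, and ARCHIVE tables):

```sql
-- Hypothetical: buffer the cache write server-side and return to the
-- client immediately instead of waiting for the row to hit the table.
INSERT DELAYED INTO cache (cid, data, created, expire, headers)
VALUES ('some:cid', '...', UNIX_TIMESTAMP(), 0, '');
```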

Amazon’s picture

http://dev.mysql.com/doc/refman/4.1/en/insert-speed.html

bulk_insert_buffer_size should be tuned to help improve insertions. key_buffer_size should be tuned as well.
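As a starting point for that tuning, the relevant my.cnf fragment would be something like this (the values are purely illustrative, not recommendations for Drupal.org's hardware):

```sql
-- my.cnf fragment (illustrative values only):
-- [mysqld]
-- bulk_insert_buffer_size = 16M
-- key_buffer_size         = 256M
```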

I am not sure how cache writes from Drupal.org appear to the DB, but I assume they appear as many clients. If the cache writes come from multiple clients, then INSERT DELAYED will improve performance.

If the cache does 5 or more inserts at a time, it should lock the table and write them together. Perhaps cache writes should be pooled and written in chunks rather than individually.
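Pooled writes could use MySQL's multi-row INSERT form, which amortizes the per-statement overhead (values are placeholders; whether Drupal's cache layer can batch like this is an open question in this thread):

```sql
-- Hypothetical: one multi-row INSERT instead of three separate
-- statements, so the index is updated under a single lock.
INSERT INTO cache (cid, data, created, expire, headers) VALUES
  ('cid:1', '...', 1130000000, 0, ''),
  ('cid:2', '...', 1130000000, 0, ''),
  ('cid:3', '...', 1130000000, 0, '');
```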

Amazon’s picture

Upon further review, the top 4 queries (all related to search indexing) and one additional query have been rewritten in the latest versions of Drupal and therefore do not need to be optimized.

This leaves us to optimize INSERT, DELETE, and non-search SELECT queries.

Kieran

beginner’s picture

Component: web site » Other

Killes has done a lot of work on slow queries since the last follow-up.
Is the above still relevant?

killes@www.drop.org’s picture

Status: Active » Closed (fixed)

Closing this. There are still a lot of slow queries, but they should be discussed in the context of their own modules.