This query does not scale to large sets:

SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
50 rows in set (3 min 22.24 sec)

It is because it has to do the join and a sort - David Strauss can explain it much better. In any case, on a 2 core relatively modern machine and 4G of RAM, this query takes 3 minutes with 4 million nodes.

I suggest that we reduce the query to this:

SELECT * FROM node ORDER BY changed DESC LIMIT 0, 50;
50 rows in set (0.14 sec)

And then we grab the user name with 50 separate queries while building the table. This guarantees a relatively flat performance no matter what the size of the node table.

I've submitted this for D6 but chances are it affects D7.

Comments

robertdouglass’s picture

Status: Active » Needs review
StatusFileSize
new1.69 KB

here's a sample patch that follows the above suggestion and removes the 3 minute page load time I was experiencing.

gábor hojtsy’s picture

It would make sense to document this then in the code.

gerhard killesreiter’s picture

mysql> explain SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+--------+------------------+--------------+---------+-----------------+--------+-------------+
| id | select_type | table | type   | possible_keys    | key          | key_len | ref             | rows   | Extra       |
+----+-------------+-------+--------+------------------+--------------+---------+-----------------+--------+-------------+
|  1 | SIMPLE      | n     | index  | uid,tracker_user | node_changed | 4       | NULL            | 356921 |             | 
|  1 | SIMPLE      | u     | eq_ref | PRIMARY          | PRIMARY      | 4       | drupalorg.n.uid |      1 | Using where | 
+----+-------------+-------+--------+------------------+--------------+---------+-----------------+--------+-------------+

I get this explain on a copy of d.o, the query itself executes almost instantly.

robertdouglass’s picture

Gerhard, have you added indexes that I don't seem to have?

mysql> explain SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+-------+---------------+------+---------+----------------------+-------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref                  | rows  | Extra                                        |
+----+-------------+-------+-------+---------------+------+---------+----------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY       | name | 182     | NULL                 |     2 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | n     | ref   | uid           | uid  | 4       | drupal.u.uid         | 77839 | Using where                                  | 
+----+-------------+-------+-------+---------------+------+---------+----------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

robertdouglass’s picture

Does EXPLAIN return different results depending on the size of memory available? ie. if I truncate my table to 100K rows, maybe I'd get the same EXPLAIN you do?

robertdouglass’s picture

Sorry - I completely don't understand this. Why does your explain start from n as the first table and mine starts with u?

robertdouglass’s picture

mysql> show keys in node;
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY             |            1 | nid         | A         |     3901948 |     NULL | NULL   |      | BTREE      |         | 
| node  |          0 | vid                 |            1 | vid         | A         |     3901948 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_changed        |            1 | changed     | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_created        |            1 | created     | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_moderate       |            1 | moderate    | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status |            1 | promote     | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status |            2 | status      | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type    |            1 | status      | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type    |            2 | type        | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type    |            3 | nid         | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type     |            1 | title       | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type     |            2 | type        | A         |        NULL |        4 | NULL   |      | BTREE      |         | 
| node  |          1 | node_type           |            1 | type        | A         |        NULL |        4 | NULL   |      | BTREE      |         | 
| node  |          1 | uid                 |            1 | uid         | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tnid                |            1 | tnid        | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | translate           |            1 | translate   | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> show keys in users;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| users |          0 | PRIMARY  |            1 | uid         | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          0 | name     |            1 | name        | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          1 | access   |            1 | access      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          1 | created  |            1 | created     | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          1 | mail     |            1 | mail        | A         |           2 |     NULL | NULL   | YES  | BTREE      |         | 
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
robertdouglass’s picture

mysqladmin Ver 8.41 Distrib 5.0.51a, for debian-linux-gnu on i486

Tresler’s picture

Took me a bit to find it online, Robert. Page 165 in this book http://www.scribd.com/doc/3380730/High-Performance-MySQL-Chapter-4 This is a case of the MySQL optimizer that is built into the MySQL preprocessor making 2 different decisions. I think that chapter (it's been a while since I read it' tells you how to override the MySQL optimizer to compare the same query /lookup/ on different architectures.

nicholasthompson’s picture

On www.thingy-ma-jig.co.uk (although I dont even have 1,000 nodes or users) this is my EXPLAIN/Key results... I'm running DRUPAL-6-9...

mysql> EXPLAIN SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | n     | ALL  | uid           | NULL | NULL    | NULL |  186 | Using temporary; Using filesort |
|  1 | SIMPLE      | u     | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where; Using join buffer  |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)

mysql> show keys in node;
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY             |            1 | nid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          0 | vid                 |            1 | vid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_type           |            1 | type        | A         |           5 |        4 | NULL   |      | BTREE      |         |
| node  |          1 | uid                 |            1 | uid         | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_moderate       |            1 | moderate    | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_promote_status |            1 | promote     | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_promote_status |            2 | status      | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_created        |            1 | created     | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_changed        |            1 | changed     | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            2 | type        | A         |           7 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            3 | nid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | nid                 |            1 | nid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | tnid                |            1 | tnid        | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | translate           |            1 | translate   | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_title_type     |            1 | title       | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_title_type     |            2 | type        | A         |         186 |        4 | NULL   |      | BTREE      |         |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
17 rows in set (0.00 sec)

mysql> show keys in users;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| users |          0 | PRIMARY  |            1 | uid         | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          0 | name     |            1 | name        | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          1 | access   |            1 | access      | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          1 | created  |            1 | created     | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          1 | mail     |            1 | mail        | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.01 sec)

MySQL Info...

$ rpm -q mysql-server
mysql-server-5.1.31-1.el5.remi
misty3’s picture

Subscribed.

gerhard killesreiter’s picture

A difference might be that d.o runs innoDB.

mysql> show keys in node;
+-------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY              |            1 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          0 | vid                  |            1 | vid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | uid                  |            1 | uid         | A         |       59486 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_moderate        |            1 | moderate    | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status  |            1 | promote     | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status  |            2 | status      | A         |           4 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_type            |            1 | type        | A         |          11 |       10 | NULL   |      | BTREE      |         | 
| node  |          1 | node_created         |            1 | created     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_changed         |            1 | changed     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type     |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type     |            2 | type        | A         |          22 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type     |            3 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | nid                  |            1 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_user         |            1 | uid         | A         |       59486 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_user         |            2 | status      | A         |       59486 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_user         |            3 | changed     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_global       |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_global       |            2 | changed     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            2 | type        | A         |          22 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            3 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            4 | uid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tnid                 |            1 | tnid        | A         |           1 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | translate            |            1 | translate   | A         |           1 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type      |            1 | title       | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type      |            2 | type        | A         |      356921 |        4 | NULL   |      | BTREE      |         | 
+-------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
26 rows in set (0.00 sec)
nicholasthompson’s picture

Changing it to a LEFT JOIN appears to be better

mysql> EXPLAIN SELECT n.*, u.name FROM node n LEFT JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------+
|  1 | SIMPLE      | n     | ALL    | NULL          | NULL    | NULL    | NULL             |  186 | Using filesort |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | drupal_tmj.n.uid |    1 |                |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------+
2 rows in set (0.00 sec)
catch’s picture

This patch grabs the nids then does node_load_multiple() for the rest of the fields (because it needs the full node object for other reasons). #301902: Allow more users to see the node admin page

If a version of that goes into D7 at some point, then this should stay as D6.

robertdouglass’s picture

StatusFileSize
new1.96 KB

This patch avoids duplicate user lookups.

Pedro Lozano’s picture

Could it be faster to store the first query results in a temporary table and do a join of that with the user table?

robertdouglass’s picture

If using a temp table were faster MySQL would be doing it already (might be). Temp tables have earned a bad rep in Drupal.

david strauss’s picture

It is because it has to do the join and a sort - David Strauss can explain it much better.

This is not one of those cases, at least in the sense I've talked about. Cases where you split WHERE and ORDER BY criteria over a JOIN are impossible to fully optimize on MySQL because there's no data structure you can traverse to narrow your working set quickly.

On the contrary, this seems to be a case of MySQL's optimizer being not perfectly smart, though I can't blame it too much for not making this optimization.

For the case of SELECT * FROM node ORDER BY changed DESC LIMIT 0, 50;, we have a fairly simple, well-optimized query type. MySQL can traverse the index on "changed", starting with the largest values, until it has 50 rows.

For the case of SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;, we're dealing with something quite a bit more complex because INNER JOINs can increase or decrease the number of results rows returned for each row from the left-hand table. (No matching row in the right-hand table prunes the result row. Multiple matching rows in the right-hand table multiplies the result rows.)

Because we're JOINing on the primary key of the users table, we can assume that the INNER JOIN won't increase the number of results in the query. MySQL may know this too, but it turns out not to matter because of the pruning problem.

Despite the fact that *we* know that every node author has a corresponding row in the users table, MySQL doesn't know that, nor could it from the schema alone. Without the knowledge that every node row has exactly one corresponding user row, MySQL could try fetching rows probabilistically.

But MySQL *will not* do this sort of probabilistic query execution plan:

  1. Pull 50 rows from the node table.
  2. Attempt to match each node row against rows in the users table. Prune any results without matching user records.
  3. Do we have 50 rows? No? Then go back to step 1, and pull N more node rows.
  4. Prune any result rows after the first 50.
  5. Return the results.

The only alternative -- given that MySQL won't probabilistically pull rows and go back for more -- is for MySQL to get all the rows from the node table, perform the join, and then lop off the top 50.

This is why a LEFT JOIN could be much faster. With the knowledge that rows from the node table will *never* be pruned based on whether there's a user match (even though there always will be one in Drupal right now), MySQL can comfortably pull 50 rows from the node table, perform the LEFT JOIN against the users table, and return the results. There's no risk of having to go back and fetch more node rows. I'm not 100% sure MySQL makes this optimization, but the empirical results in #13 suggests that it does.

Other alternatives that could avoid hitting the INNER JOIN problem above:
* Selecting 50 node rows into a temporary table, and then joining the temp table against users. I think this is messy.
* Pulling just the node rows and fetching user records in separate queries. There's a patch above for this, but I don't like this approach because of the added latency of more queries.
* Fetching user records using correlated subqueries.

david strauss’s picture

Also, temp tables aren't inherently bad. They're just *often* bad because they result in MySQL writing huge working set tables out to disk. If your tables are big enough that temp tables no longer fit in RAM, MySQL switches to using the disk, and it's like getting kicked when you're already down.

It's not terrible to use temp tables as small, fixed-size working sets, which is what one person is suggesting above.

robertdouglass’s picture

Version: 6.x-dev » 7.x-dev
StatusFileSize
new835 bytes

So, once again, adding the right index solves this. Gerhard's EXPLAIN works better because Drupal.org is running tracker2, which apparently adds an index on {node} (uid, status, changed). Here's a patch for D7 that adds both the {node} language and the {node} (uid, status, changed) indexes. These two indexes make the content administration page nearly uniformly fast on my site with 100,000 nodes and my site with 4,000,000 nodes.

robertdouglass’s picture

StatusFileSize
new806 bytes

Here's a D6 version.

gábor hojtsy’s picture

Status: Needs review » Needs work

In both #21 and #20, this line does not seem to be right. It does not list actual field names:

db_add_index($ret, 'node', 'node_uid_status_changed', array('uid_status_changed'));

Did you mean 'uid', 'status', 'changed'?

damien tournoud’s picture

Status: Needs work » Needs review

The content filter uses conditions on the following columns on the node table: promote, status, sticky, type and language. What is the minimal set of indexes we need to optimize all those queries?

Is there a way to compute that cleanly? Here is my gut feeling:

With four filters, we need:
 - promote, status, sticky, type, language

With three filters, we also need:
 - promote, status, language
 - promote, sticky, language
 - promote, type, language
 - status, sticky, language
 - status, type, language
 - sticky, type, language

With two filters, we also need:
 - language, promote
 - language, status

With one filter, we don't need any additional index.
robertdouglass’s picture

Status: Needs review » Needs work
StatusFileSize
new808 bytes
new837 bytes

Oh, yes. Oops. Rerolled.

Here's the explain from above now with the new index:

mysql> explain SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+-------+------------------------+--------------------+---------+----------------------+-------+----------------------------------------------+
| id | select_type | table | type  | possible_keys          | key                | key_len | ref                  | rows  | Extra                                        |
+----+-------------+-------+-------+------------------------+--------------------+---------+----------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY                | name               | 182     | NULL                 |     5 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | n     | ref   | uid,uid_status_changed | uid_status_changed | 4       | drupal.u.uid         | 39020 | Using where                                  | 
+----+-------------+-------+-------+------------------------+--------------------+---------+----------------------+-------+----------------------------------------------+
2 rows in set (0.02 sec)
robertdouglass’s picture

But I'm not sure that this brings the performance I claimed. Note to self - turn query cache off when testing these things. I don't think the indexes help much. I haven't tried the temp table approach. The "many smaller queries" from #15 is the fastest option, imo.

robertdouglass’s picture

Here's the difference #15 makes:

## This one is without the uid, changed, status index
192252.64	1	pager_query	SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50
30717.69	1	pager_query SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid
                                                     
## This one is WITH the uid, changed, status index
213864.14	1	pager_query	SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50
91895.82	1	pager_query	SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid   
                                                  
## This one is with the patch from #15
257.96		1	pager_query	SELECT n.* FROM node n ORDER BY n.changed DESC LIMIT 0, 50
1.02		1	node_admin_nodes	SELECT name FROM users WHERE uid = 1
david strauss’s picture

It doesn't really make sense for an index on {node} (uid, status, changed) to help here, anyway. :-)

Such an index is useless unless uid is part of WHERE or ORDER BY criteria. MySQL cannot skip the first part of an index. Even the second part of the index wouldn't help because we're not filtering or sorting by published status. The third part of the index would be useful if MySQL could skip the first two parts, but we already have an index on the "changed" value.

The extra index on the node table for Drupal.org is not for Tracker 2. It was for an earlier attempt at a fast Tracker.

david strauss’s picture

@robertDouglass How does a LEFT JOIN affect your results?

mikey_p’s picture

Just checking, but won't this break the tablesort in D7?

robertdouglass’s picture

@David Strauss - LEFT JOIN didn't speed anything up. It appeared to slow things down at smaller table sizes, and was similar at larger. I haven't tried temporary tables yet.

robertdouglass’s picture

@mikey_p: Hmm. Yes. You'd no longer be able to sort the table on user. I'd argue that getting the page to load in under 1 minute is more important than sorting on user (you only get the beginning or the end of the list of users... table sorting is only marginally useful, imo). This will definitely make my initial approach more difficult to get committed, though.

catch’s picture

Can't see anyone actually using sort by user on this page to be honest. Did we add that in Drupal 7? If so having this load nice and snappy (and improving the filters) seems like a much bigger win.

david strauss’s picture

This would be the "correlated subquery" option:
SELECT n.*, (SELECT u.name FROM users u WHERE u.uid = n.uid) AS name FROM node n ORDER BY n.changed DESC LIMIT 0, 50

How does that test on your system?

mikey_p’s picture

To be honest, I could care less about the tablesort on author as well, I was just throwing that out for consideration. I would much rather get the filters fixed to be able to do all kinds of nifty filters (such as a filter on author with author name as an autocomplete field) than worry about keeping the tablesort for every field.

Also keep in mind, to get the better filters this should be ported to use a dynamic query as soon as #299267: Add "Extender" support to SELECT query builder Assigned to: Crell lands.

Will porting this to a dynamic query affect the work done optimizing this query here?

andreiashu’s picture

subscribing.
about #33: is mysql caching subqueries by default ?

david strauss’s picture

Will porting this to a dynamic query affect the work done optimizing this query here?

No.

is mysql caching subqueries by default ?

MySQL caches entire result sets for full (non-sub) queries. It does not care how you generated said results. I do not believe it performs any specific subquery-level caching.

nerkn’s picture

This query locks tables and other queries wait. Wonderful life saver for me. After applying patch every thing is fine now.

killes@www.drop.org’s picture

Title: Performance: admin/content/node doesn't scale to large sets of nodes. » Add indices to node table
Status: Needs work » Needs review

Changing title and status (to get a retest).

catch’s picture

Why can't we do one query to get the nodes, loop to get uids, then a query to get the names based on those uids? That'd save the lots of little queries complaint above.

robertdouglass’s picture

Title: Add indices to node table » Fix slow query on node table

I like catch's suggestion in #39. It seems like a sensible refinement of the approach in #15 which gave the best performance results. I'm changing the title again because it's not to be assumed that adding indexes is the right solution here.

david strauss’s picture

Why aren't we using the correlated subquery option?

damien tournoud’s picture

Other question: can we compare MySQL version of Drupal.org and the one Robert is using? That's the only reasonable explanation that I have.

david strauss’s picture

@Damien

For Drupal.org: Server version: 5.0.70-log Gentoo Linux mysql-5.0.70-r1

robertdouglass’s picture

sorry - that setup is no longer available to me.

noahterp’s picture

When testing on one server (Drupal 5.18, MySQL 5.1.x), adding a simple WHERE n.changed > 0 to that node.module line solved my problem. The original query would run ad-nauseam and simply lock-up MySQL. But adding that extraneous WHERE must have altered how MySQL sorts the temp table, and it runs the query in a few hundred milliseconds.

But, this solution worked on only one version of my site. Our development server copy with a different MySQL 5.0.x configuration was not affected -- although the original query only took 8-10 seconds there versus forever.

damien tournoud’s picture

Ok, apparently the MySQL query planner is being really dumb in that case. Because this query uses a INNER JOIN on a primary key, the planner can choose to execute it starting from node or starting from users. The problem that some people here are facing is that it sometimes chooses to start from users.

This is apparently the case (looking at #4) when the users table has a very low cardinality (in Robert example, there are only 2 users). It looks like the planner is not taking the cost of the sort into account.

What explains #45 is that when you are adding a condition on the node table, the query becomes asymetric again, and the planner has no choice then to start from the node table.

The max_seeks_for_key MySQL configuration parameter could help the planner make a smarter choice, but it looks like it's time to open a bug against MySQL.

Tresler’s picture

Before filing a mysql bug can we confirm that this is an optimizer problem by forcing the optimizer on a slow machine to see if it fixes the problem? To me that would confirm the issue as being in MySQL's pre-processor (supposedly better in MySQL 6...)

http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

For that matter, I have no idea if index hints are built into the database abstraction layer? are they?

catch’s picture

Status: Needs review » Postponed (maintainer needs more info)

We now do a straight query to get nids, then node_load_multiple() in node_admin_nodes() since each row of that table is now checked for node_access(). Is there anything else needed here?

jbrown’s picture

Yes - you can still sort by author, which results in:

PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'u.name' in 'order clause': SELECT n.nid AS nid FROM {node} n WHERE (n.type = :db_condition_placeholder_0) ORDER BY u.name ASC LIMIT 50 OFFSET 0; Array ( [:db_condition_placeholder_0] => page ) in PagerDefault->execute() (line 93 of /home/jonny/sites/drupal-7.x-dev/includes/pager.inc).

jbrown’s picture

Status: Postponed (maintainer needs more info) » Needs review
StatusFileSize
new774 bytes
catch’s picture

Title: Fix slow query on node table » admin/content still has sortable author column resulting in SQL errors
Priority: Normal » Critical
Status: Needs review » Reviewed & tested by the community

Ooh, nice find.

jbrown’s picture

can we get this committed and out of the way?

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Ok.

Status: Fixed » Closed (fixed)
Issue tags: -Performance

Automatically closed -- issue fixed for 2 weeks with no activity.