admin/content still has sortable author column resulting in SQL errors [#373897]

This query does not scale to large sets of nodes:

SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
50 rows in set (3 min 22.24 sec)

It is because it has to do the join and a sort - David Strauss can explain it much better. In any case, on a relatively modern 2-core machine with 4 GB of RAM, this query takes more than 3 minutes with 4 million nodes.

I suggest that we reduce the query to this:

SELECT * FROM node ORDER BY changed DESC LIMIT 0, 50;
50 rows in set (0.14 sec)

We would then fetch each user name with a separate query (at most 50 of them) while building the table. This keeps performance relatively flat regardless of the size of the node table.
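Below is a minimal sketch of this two-step approach. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The schema is a heavily simplified, assumed version of {node} and {users}. The helpers `make_db` and `admin_rows_separate_queries` are hypothetical and are not Drupal code.

```python
import sqlite3


def make_db(n_nodes=200, n_users=5):
    """Toy in-memory stand-in for {node} and {users} (simplified, assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, title TEXT, changed INTEGER)"
    )
    db.execute("CREATE INDEX node_changed ON node (changed)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(uid, f"user{uid}") for uid in range(1, n_users + 1)],
    )
    db.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [(nid, nid % n_users + 1, f"node {nid}", 1000 + nid) for nid in range(1, n_nodes + 1)],
    )
    return db


def admin_rows_separate_queries(db, limit=50):
    # Query 1: sort on the node table alone, which an index on "changed" can serve.
    nodes = db.execute(
        "SELECT nid, uid, title, changed FROM node ORDER BY changed DESC LIMIT ?",
        (limit,),
    ).fetchall()
    rows = []
    for nid, uid, title, changed in nodes:
        # One primary-key lookup per node row: at most `limit` extra queries.
        user = db.execute("SELECT name FROM users WHERE uid = ?", (uid,)).fetchone()
        rows.append((nid, title, changed, user[0] if user else None))
    return rows


if __name__ == "__main__":
    rows = admin_rows_separate_queries(make_db())
    print(len(rows), rows[0])  # 50 (200, 'node 200', 1200, 'user1')
```

The number of queries grows with the page size, not with the size of the node table.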

I've filed this against D6, but chances are it affects D7 too.

Comment	File	Size	Author
#50	remove-author-sort.patch	774 bytes	jbrown
#24	node-indexes-D7.patch	837 bytes	robertdouglass
#24	node-indexes-D6.patch	808 bytes	robertdouglass
#21	node-indexes-D6.patch	806 bytes	robertdouglass
#20	node-indexes.patch	835 bytes	robertdouglass
#15	content-admin.patch	1.96 KB	robertdouglass
#1	content-admin.patch	1.69 KB	robertdouglass

Comments

Comment #1

robertdouglass commented 13 February 2009 at 12:37

Status:

Active

» Needs review

Status	File	Size
new	content-admin.patch	1.69 KB

Here's a sample patch that follows the suggestion above and eliminates the 3-minute page load I was experiencing.

Comment #2

gábor hojtsy commented 13 February 2009 at 12:53

If so, it would make sense to document this in the code.

Comment #3

gerhard killesreiter commented 13 February 2009 at 12:55

mysql> explain SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+--------+------------------+--------------+---------+-----------------+--------+-------------+
| id | select_type | table | type   | possible_keys    | key          | key_len | ref             | rows   | Extra       |
+----+-------------+-------+--------+------------------+--------------+---------+-----------------+--------+-------------+
|  1 | SIMPLE      | n     | index  | uid,tracker_user | node_changed | 4       | NULL            | 356921 |             | 
|  1 | SIMPLE      | u     | eq_ref | PRIMARY          | PRIMARY      | 4       | drupalorg.n.uid |      1 | Using where | 
+----+-------------+-------+--------+------------------+--------------+---------+-----------------+--------+-------------+

I get this EXPLAIN on a copy of d.o; the query itself executes almost instantly.

Comment #4

robertdouglass commented 13 February 2009 at 14:28

Gerhard, have you added indexes that I don't seem to have?

mysql> explain SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+-------+---------------+------+---------+----------------------+-------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref                  | rows  | Extra                                        |
+----+-------------+-------+-------+---------------+------+---------+----------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY       | name | 182     | NULL                 |     2 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | n     | ref   | uid           | uid  | 4       | drupal.u.uid         | 77839 | Using where                                  | 
+----+-------------+-------+-------+---------------+------+---------+----------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

Comment #5

robertdouglass commented 13 February 2009 at 14:33

Does EXPLAIN return different results depending on the amount of memory available? I.e., if I truncated my table to 100K rows, would I get the same EXPLAIN you do?

Comment #6

robertdouglass commented 13 February 2009 at 14:34

Sorry, I really don't understand this. Why does your EXPLAIN start with n as the first table while mine starts with u?

Comment #7

robertdouglass commented 13 February 2009 at 14:35

mysql> show keys in node;
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY             |            1 | nid         | A         |     3901948 |     NULL | NULL   |      | BTREE      |         | 
| node  |          0 | vid                 |            1 | vid         | A         |     3901948 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_changed        |            1 | changed     | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_created        |            1 | created     | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_moderate       |            1 | moderate    | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status |            1 | promote     | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status |            2 | status      | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type    |            1 | status      | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type    |            2 | type        | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type    |            3 | nid         | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type     |            1 | title       | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type     |            2 | type        | A         |        NULL |        4 | NULL   |      | BTREE      |         | 
| node  |          1 | node_type           |            1 | type        | A         |        NULL |        4 | NULL   |      | BTREE      |         | 
| node  |          1 | uid                 |            1 | uid         | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tnid                |            1 | tnid        | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | translate           |            1 | translate   | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> show keys in users;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| users |          0 | PRIMARY  |            1 | uid         | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          0 | name     |            1 | name        | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          1 | access   |            1 | access      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          1 | created  |            1 | created     | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| users |          1 | mail     |            1 | mail        | A         |           2 |     NULL | NULL   | YES  | BTREE      |         | 
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Comment #8

robertdouglass commented 13 February 2009 at 14:38

mysqladmin Ver 8.41 Distrib 5.0.51a, for debian-linux-gnu on i486

Comment #9

Tresler commented 13 February 2009 at 14:56

It took me a bit to find it online, Robert, but see page 165 in this book (http://www.scribd.com/doc/3380730/High-Performance-MySQL-Chapter-4). This is a case of the MySQL optimizer, which is built into the MySQL preprocessor, making two different decisions. I think that chapter (it's been a while since I read it) explains how to override the MySQL optimizer so you can compare the same query *lookup* on different architectures.

Comment #10

nicholasthompson commented 13 February 2009 at 14:57

On www.thingy-ma-jig.co.uk (although I don't even have 1,000 nodes or users), these are my EXPLAIN and key results. I'm running DRUPAL-6-9.

mysql> EXPLAIN SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | n     | ALL  | uid           | NULL | NULL    | NULL |  186 | Using temporary; Using filesort |
|  1 | SIMPLE      | u     | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where; Using join buffer  |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)

mysql> show keys in node;
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY             |            1 | nid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          0 | vid                 |            1 | vid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_type           |            1 | type        | A         |           5 |        4 | NULL   |      | BTREE      |         |
| node  |          1 | uid                 |            1 | uid         | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_moderate       |            1 | moderate    | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_promote_status |            1 | promote     | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_promote_status |            2 | status      | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_created        |            1 | created     | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_changed        |            1 | changed     | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            2 | type        | A         |           7 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            3 | nid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | nid                 |            1 | nid         | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | tnid                |            1 | tnid        | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | translate           |            1 | translate   | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_title_type     |            1 | title       | A         |         186 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_title_type     |            2 | type        | A         |         186 |        4 | NULL   |      | BTREE      |         |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
17 rows in set (0.00 sec)

mysql> show keys in users;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| users |          0 | PRIMARY  |            1 | uid         | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          0 | name     |            1 | name        | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          1 | access   |            1 | access      | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          1 | created  |            1 | created     | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| users |          1 | mail     |            1 | mail        | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.01 sec)

MySQL info:

$ rpm -q mysql-server
mysql-server-5.1.31-1.el5.remi

Comment #11

misty3 commented 13 February 2009 at 15:05

Subscribed.

Comment #12

gerhard killesreiter commented 13 February 2009 at 15:13

One difference might be that d.o runs InnoDB.

mysql> show keys in node;
+-------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY              |            1 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          0 | vid                  |            1 | vid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | uid                  |            1 | uid         | A         |       59486 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_moderate        |            1 | moderate    | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status  |            1 | promote     | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_promote_status  |            2 | status      | A         |           4 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_type            |            1 | type        | A         |          11 |       10 | NULL   |      | BTREE      |         | 
| node  |          1 | node_created         |            1 | created     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_changed         |            1 | changed     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type     |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type     |            2 | type        | A         |          22 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type     |            3 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | nid                  |            1 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_user         |            1 | uid         | A         |       59486 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_user         |            2 | status      | A         |       59486 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_user         |            3 | changed     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_global       |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tracker_global       |            2 | changed     | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            2 | type        | A         |          22 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            3 | nid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_status_type_uid |            4 | uid         | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | tnid                 |            1 | tnid        | A         |           1 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | translate            |            1 | translate   | A         |           1 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type      |            1 | title       | A         |      356921 |     NULL | NULL   |      | BTREE      |         | 
| node  |          1 | node_title_type      |            2 | type        | A         |      356921 |        4 | NULL   |      | BTREE      |         | 
+-------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
26 rows in set (0.00 sec)

Comment #13

nicholasthompson commented 13 February 2009 at 15:16

Changing it to a LEFT JOIN appears to be better:

mysql> EXPLAIN SELECT n.*, u.name FROM node n LEFT JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------+
|  1 | SIMPLE      | n     | ALL    | NULL          | NULL    | NULL    | NULL             |  186 | Using filesort |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | drupal_tmj.n.uid |    1 |                |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------+
2 rows in set (0.00 sec)

Comment #14

catch commented 13 February 2009 at 15:29

The patch in #301902: Allow more users to see the node admin page grabs the nids, then calls node_load_multiple() for the rest of the fields (because it needs the full node object for other reasons).

If a version of that goes into D7 at some point, then this should stay as D6.

Comment #15

robertdouglass commented 13 February 2009 at 17:34

Status	File	Size
new	content-admin.patch	1.96 KB

This patch avoids duplicate user lookups.
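One way to avoid duplicate lookups is a per-page cache keyed by uid, so that each distinct author is queried only once. This illustrates the idea only and is not the code in the patch. It assumes a sqlite3 stand-in for MySQL, a simplified schema and hypothetical helper names.

```python
import sqlite3


def make_db(n_nodes=200, n_users=5):
    """Toy stand-in for {node} and {users}; the schema is a simplified assumption."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(u, f"user{u}") for u in range(1, n_users + 1)])
    db.executemany("INSERT INTO node VALUES (?, ?, ?)",
                   [(n, n % n_users + 1, 1000 + n) for n in range(1, n_nodes + 1)])
    return db


def admin_rows_cached(db, limit=50):
    """Return (rows, number_of_user_queries) for the newest `limit` nodes."""
    nodes = db.execute(
        "SELECT nid, uid FROM node ORDER BY changed DESC LIMIT ?", (limit,)
    ).fetchall()
    names = {}  # uid -> name, so each distinct author is queried only once
    user_queries = 0
    rows = []
    for nid, uid in nodes:
        if uid not in names:
            found = db.execute("SELECT name FROM users WHERE uid = ?", (uid,)).fetchone()
            names[uid] = found[0] if found else None
            user_queries += 1
        rows.append((nid, names[uid]))
    return rows, user_queries


if __name__ == "__main__":
    rows, user_queries = admin_rows_cached(make_db())
    print(len(rows), user_queries)  # 50 5
```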

Comment #16

Pedro Lozano commented 14 February 2009 at 00:32

Would it be faster to store the results of the first query in a temporary table and join that against the users table?

Comment #17

robertdouglass commented 14 February 2009 at 09:54

If using a temp table were faster, MySQL would probably already be doing it (it might be). Temp tables have earned a bad reputation in Drupal.

Comment #18

david strauss commented 15 February 2009 at 23:46

"It is because it has to do the join and a sort - David Strauss can explain it much better."

This is not one of those cases, at least in the sense I've talked about. Cases where you split WHERE and ORDER BY criteria over a JOIN are impossible to fully optimize on MySQL because there's no data structure you can traverse to narrow your working set quickly.

Rather, this seems to be a case of MySQL's optimizer not being perfectly smart, though I can't blame it too much for missing this optimization.

For SELECT * FROM node ORDER BY changed DESC LIMIT 0, 50, we have a fairly simple, well-optimized query type. MySQL can traverse the index on "changed", starting with the largest values, until it has 50 rows.
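The single-table case can be reproduced in SQLite, which is used here only as a convenient stand-in. Its planner is not MySQL's, and the wording of EXPLAIN QUERY PLAN varies between SQLite versions. With an index on changed, the plan should walk that index instead of building a temporary B-tree for sorting. The table below is a toy schema and `plan_for` is a hypothetical helper.

```python
import sqlite3


def plan_for(db, sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    # The detail text is the last column in both older and newer SQLite versions.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
db.execute("CREATE INDEX node_changed ON node (changed)")

plan = plan_for(db, "SELECT * FROM node ORDER BY changed DESC LIMIT 50")
print(plan)  # expected to mention node_changed; exact wording depends on the version
```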

For SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50, we're dealing with something quite a bit more complex, because an INNER JOIN can increase or decrease the number of result rows returned for each row from the left-hand table. (No matching row in the right-hand table prunes the result row; multiple matching rows in the right-hand table multiply the result rows.)
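Two toy tables make the row-count effect of an INNER JOIN easy to see. The names `lhs` and `rhs` are hypothetical, the join key is deliberately non-unique, and sqlite3 stands in for MySQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lhs (id INTEGER PRIMARY KEY, k INTEGER)")
db.execute("CREATE TABLE rhs (k INTEGER, v TEXT)")  # k is deliberately NOT unique
db.executemany("INSERT INTO lhs VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
# k=10 matches once, k=20 matches twice, k=30 matches nothing.
db.executemany("INSERT INTO rhs VALUES (?, ?)", [(10, "a"), (20, "b"), (20, "c")])

rows = db.execute(
    "SELECT lhs.id, rhs.v FROM lhs INNER JOIN rhs ON lhs.k = rhs.k ORDER BY lhs.id, rhs.v"
).fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (2, 'c')]: id 2 is multiplied, id 3 is pruned
```

Joining on a primary key, as with users.uid, rules out the multiplication but not the pruning.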

Because we're JOINing on the primary key of the users table, we can assume that the INNER JOIN won't increase the number of results in the query. MySQL may know this too, but it turns out not to matter because of the pruning problem.

Despite the fact that *we* know that every node author has a corresponding row in the users table, MySQL doesn't know that, nor could it from the schema alone. Without the knowledge that every node row has exactly one corresponding user row, MySQL could try fetching rows probabilistically.

But MySQL *will not* do this sort of probabilistic query execution plan:

1. Pull 50 rows from the node table.
2. Attempt to match each node row against rows in the users table. Prune any results without matching user records.
3. Do we have 50 rows? No? Then go back to step 1, and pull N more node rows.
4. Prune any result rows after the first 50.
5. Return the results.
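The plan MySQL won't choose can still be written out by hand, which makes the steps concrete. The sketch below is in Python with sqlite3. It pulls node rows in batches with LIMIT/OFFSET and checks the result against a plain INNER JOIN. The toy schema, the orphaned uid 99 and the helper names are all assumptions.

```python
import sqlite3


def make_db():
    """Toy {node}/{users} stand-in (assumed schema). Every 7th node is an orphan."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
    db.execute("CREATE INDEX node_changed ON node (changed)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(u, f"user{u}") for u in range(1, 6)])
    # uid 99 has no users row, so those nodes must be pruned by the INNER JOIN.
    db.executemany("INSERT INTO node VALUES (?, ?, ?)",
                   [(nid, 99 if nid % 7 == 0 else nid % 5 + 1, 1000 + nid)
                    for nid in range(1, 201)])
    return db


def probabilistic_top_n(db, n=50, batch=50):
    results, offset = [], 0
    while len(results) < n:
        # Step 1: pull the next batch of node rows in ORDER BY order
        # (nid breaks ties so that OFFSET paging is stable).
        nodes = db.execute(
            "SELECT nid, uid, changed FROM node ORDER BY changed DESC, nid DESC "
            "LIMIT ? OFFSET ?", (batch, offset)).fetchall()
        if not nodes:
            break  # the node table is exhausted
        offset += len(nodes)
        # Step 2: match each node row against users; prune rows with no match.
        for nid, uid, changed in nodes:
            user = db.execute("SELECT name FROM users WHERE uid = ?", (uid,)).fetchone()
            if user is not None:
                results.append((nid, user[0], changed))
        # Step 3: the while condition sends us back for more rows if still short.
    # Steps 4 and 5: drop anything past the first n and return the rest.
    return results[:n]


if __name__ == "__main__":
    db = make_db()
    expected = db.execute(
        "SELECT n.nid, u.name, n.changed FROM node n INNER JOIN users u ON n.uid = u.uid "
        "ORDER BY n.changed DESC, n.nid DESC LIMIT 50").fetchall()
    print(probabilistic_top_n(db) == expected)  # True
```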

The only alternative -- given that MySQL won't probabilistically pull rows and go back for more -- is for MySQL to get all the rows from the node table, perform the join, and then lop off the top 50.

This is why a LEFT JOIN could be much faster. With the knowledge that rows from the node table will *never* be pruned based on whether there's a user match (even though there always will be one in Drupal right now), MySQL can comfortably pull 50 rows from the node table, perform the LEFT JOIN against the users table, and return the results. There's no risk of having to go back and fetch more node rows. I'm not 100% sure MySQL makes this optimization, but the empirical results in #13 suggest that it does.
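The following scaled-down illustration shows why a LEFT JOIN never needs extra node rows. It returns 5 rows instead of 50 and assumes a toy schema, with sqlite3 standing in for MySQL. The first five LEFT JOIN results are exactly the five newest node rows. The INNER JOIN, by contrast, has to read past the orphaned node.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
# Node 7 points at uid 99, which has no users row.
db.executemany("INSERT INTO node VALUES (?, ?, ?)",
               [(nid, 99 if nid == 7 else nid % 2 + 1, 1000 + nid) for nid in range(1, 11)])

top = [nid for (nid,) in db.execute("SELECT nid FROM node ORDER BY changed DESC LIMIT 5")]
left = db.execute(
    "SELECT n.nid, u.name FROM node n LEFT JOIN users u ON n.uid = u.uid "
    "ORDER BY n.changed DESC LIMIT 5").fetchall()
inner = db.execute(
    "SELECT n.nid, u.name FROM node n INNER JOIN users u ON n.uid = u.uid "
    "ORDER BY n.changed DESC LIMIT 5").fetchall()

# LEFT JOIN: exactly the top node rows, with None (NULL) for the missing author.
print(left)   # [(10, 'alice'), (9, 'bob'), (8, 'alice'), (7, None), (6, 'alice')]
# INNER JOIN: node 7 is pruned, so a sixth node row had to be read to fill the page.
print(inner)  # [(10, 'alice'), (9, 'bob'), (8, 'alice'), (6, 'alice'), (5, 'bob')]
```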

Other alternatives that could avoid hitting the INNER JOIN problem above:
* Selecting 50 node rows into a temporary table, and then joining the temp table against users. I think this is messy.
* Pulling just the node rows and fetching user records in separate queries. There's a patch above for this, but I don't like this approach because of the added latency of more queries.
* Fetching user records using correlated subqueries.

Comment #19

david strauss commented 15 February 2009 at 23:54

Also, temp tables aren't inherently bad. They're just *often* bad because they result in MySQL writing huge working set tables out to disk. If your tables are big enough that temp tables no longer fit in RAM, MySQL switches to using the disk, and it's like getting kicked when you're already down.

It's not terrible to use temp tables as small, fixed-size working sets, which is what Pedro suggested in #16.
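Below is a minimal sketch of the temp-table approach as a small, fixed-size working set, with sqlite3 as a stand-in. The table name `recent_nodes`, the schema and the helpers are hypothetical. MySQL's switch between in-memory and on-disk temp tables is not modeled.

```python
import sqlite3


def make_db(n_nodes=200, n_users=5):
    """Toy stand-in for {node} and {users}; the schema is a simplified assumption."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(u, f"user{u}") for u in range(1, n_users + 1)])
    db.executemany("INSERT INTO node VALUES (?, ?, ?)",
                   [(n, n % n_users + 1, 1000 + n) for n in range(1, n_nodes + 1)])
    return db


def admin_rows_temp_table(db, limit=50):
    # Fixed-size working set: at most `limit` rows, however big {node} gets.
    db.execute("CREATE TEMP TABLE recent_nodes "
               "(nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
    try:
        db.execute("INSERT INTO recent_nodes SELECT nid, uid, changed FROM node "
                   "ORDER BY changed DESC LIMIT ?", (limit,))
        # Join only the small temp table against users. With an INNER JOIN, a node
        # without a users row would still be dropped, leaving the page short.
        rows = db.execute(
            "SELECT r.nid, u.name FROM recent_nodes r INNER JOIN users u ON r.uid = u.uid "
            "ORDER BY r.changed DESC").fetchall()
    finally:
        db.execute("DROP TABLE recent_nodes")
    return rows


if __name__ == "__main__":
    print(admin_rows_temp_table(make_db())[:2])  # [(200, 'user1'), (199, 'user5')]
```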

Comment #20

robertdouglass commented 16 February 2009 at 12:14

Version:

6.x-dev

» 7.x-dev

Status	File	Size
new	node-indexes.patch	835 bytes

So, once again, adding the right index solves this. Gerhard's EXPLAIN works better because Drupal.org is running tracker2, which apparently adds an index on {node} (uid, status, changed). Here's a patch for D7 that adds both the {node} language and the {node} (uid, status, changed) indexes. These two indexes make the content administration page nearly uniformly fast on my site with 100,000 nodes and my site with 4,000,000 nodes.

Comment #21

robertdouglass commented 16 February 2009 at 12:18

Status	File	Size
new	node-indexes-D6.patch	806 bytes

Here's a D6 version.

Comment #22

gábor hojtsy commented 16 February 2009 at 12:23

Status:

Needs review

» Needs work

In both #21 and #20, this line does not seem to be right. It does not list actual field names:

db_add_index($ret, 'node', 'node_uid_status_changed', array('uid_status_changed'));

Did you mean 'uid', 'status', 'changed'?
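For comparison, the intended index lists three real columns in order. With Drupal's schema API, that would presumably mean passing array('uid', 'status', 'changed'). The sketch below uses sqlite3 rather than MySQL, on a toy schema. It shows that a bogus single column name is rejected, while the three-column index is stored with its columns in the expected order.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, "
           "status INTEGER, changed INTEGER)")

# A made-up column name such as 'uid_status_changed' does not exist in the table.
bad_index_error = None
try:
    db.execute("CREATE INDEX node_bad ON node (uid_status_changed)")
except sqlite3.OperationalError as exc:
    bad_index_error = str(exc)  # rejected: the column does not exist

# The intended index names the three real columns, in order.
db.execute("CREATE INDEX node_uid_status_changed ON node (uid, status, changed)")
# PRAGMA index_info yields (seqno, cid, column name) for each indexed column.
columns = [row[2] for row in db.execute("PRAGMA index_info(node_uid_status_changed)")]
print(columns)  # ['uid', 'status', 'changed']
```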

Comment #23

damien tournoud commented 16 February 2009 at 12:32

Status:

Needs work

» Needs review

The content filter uses conditions on the following columns on the node table: promote, status, sticky, type and language. What is the minimal set of indexes we need to optimize all those queries?

Is there a way to compute that cleanly? Here is my gut feeling:

With four filters, we need:
 - promote, status, sticky, type, language

With three filters, we also need:
 - promote, status, language
 - promote, sticky, language
 - promote, type, language
 - status, sticky, language
 - status, type, language
 - sticky, type, language

With two filters, we also need:
 - language, promote
 - language, status

With one filter, we don't need any additional index.

Comment #24

robertdouglass commented 16 February 2009 at 12:33

Status:

Needs review

» Needs work

Status	File	Size
new	node-indexes-D6.patch	808 bytes
new	node-indexes-D7.patch	837 bytes

Oh, yes. Oops. Rerolled.

Here's the explain from above now with the new index:

mysql> explain SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
+----+-------------+-------+-------+------------------------+--------------------+---------+----------------------+-------+----------------------------------------------+
| id | select_type | table | type  | possible_keys          | key                | key_len | ref                  | rows  | Extra                                        |
+----+-------------+-------+-------+------------------------+--------------------+---------+----------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY                | name               | 182     | NULL                 |     5 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | n     | ref   | uid,uid_status_changed | uid_status_changed | 4       | drupal.u.uid         | 39020 | Using where                                  | 
+----+-------------+-------+-------+------------------------+--------------------+---------+----------------------+-------+----------------------------------------------+
2 rows in set (0.02 sec)

Comment #25

robertdouglass commented 16 February 2009 at 12:39

But I'm not sure this delivers the performance I claimed. Note to self: turn the query cache off when testing these things. I don't think the indexes help much. I haven't tried the temp table approach. The "many smaller queries" approach from #15 is the fastest option, IMO.

Comment #26

robertdouglass commented 16 February 2009 at 12:46

Here's the difference #15 makes (query times in ms, from the query log):

## This one is without the uid, changed, status index
192252.64	1	pager_query	SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50
30717.69	1	pager_query	SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid

## This one is WITH the uid, changed, status index
213864.14	1	pager_query	SELECT n.*, u.name FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50
91895.82	1	pager_query	SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid

## This one is with the patch from #15
257.96	1	pager_query	SELECT n.* FROM node n ORDER BY n.changed DESC LIMIT 0, 50
1.02	1	node_admin_nodes	SELECT name FROM users WHERE uid = 1

Comment #27

david strauss commented 16 February 2009 at 16:21

It doesn't really make sense for an index on {node} (uid, status, changed) to help here, anyway. :-)

Such an index is useless unless uid is part of WHERE or ORDER BY criteria. MySQL cannot skip the first part of an index. Even the second part of the index wouldn't help because we're not filtering or sorting by published status. The third part of the index would be useful if MySQL could skip the first two parts, but we already have an index on the "changed" value.
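The leftmost-prefix rule behind this can be modeled in a few lines of Python. The model is deliberately simplified: it covers equality predicates only, with no range scans, covering indexes or cost estimates. The function and constant names are hypothetical.

```python
def usable_prefix(index_cols, equality_cols):
    """Number of leading index columns an equality-only WHERE clause can use.

    Simplified model of the B-tree leftmost-prefix rule: walk the index
    columns in order and stop at the first one the query does not constrain.
    """
    used = 0
    for col in index_cols:
        if col not in equality_cols:
            break
        used += 1
    return used


def index_orders_by(index_cols, equality_cols, order_col):
    """True if, after the constrained prefix, the next index column is order_col,
    so rows come out of the index already sorted by it."""
    k = usable_prefix(index_cols, equality_cols)
    return k < len(index_cols) and index_cols[k] == order_col


UID_STATUS_CHANGED = ("uid", "status", "changed")
NODE_CHANGED = ("changed",)

# The admin/content query: no WHERE clause, ORDER BY changed.
print(index_orders_by(UID_STATUS_CHANGED, set(), "changed"))  # False
print(index_orders_by(NODE_CHANGED, set(), "changed"))        # True
# A query with uid = ? AND status = ? ... ORDER BY changed could use it.
print(index_orders_by(UID_STATUS_CHANGED, {"uid", "status"}, "changed"))  # True
```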

The extra index on the node table for Drupal.org is not for Tracker 2. It was for an earlier attempt at a fast Tracker.

Comment #28

david strauss commented 16 February 2009 at 16:27

@robertDouglass How does a LEFT JOIN affect your results?

Comment #29

mikey_p commented 16 February 2009 at 19:55

Just checking, but won't this break the tablesort in D7?

Comment #30

robertdouglass commented 16 February 2009 at 23:17

@David Strauss - LEFT JOIN didn't speed anything up. It appeared to slow things down at smaller table sizes, and was similar at larger. I haven't tried temporary tables yet.

Comment #31

robertdouglass commented 16 February 2009 at 23:22

@mikey_p: Hmm. Yes. You'd no longer be able to sort the table on user. I'd argue that getting the page to load in under 1 minute is more important than sorting on user (you only get the beginning or the end of the list of users... table sorting is only marginally useful, imo). This will definitely make my initial approach more difficult to get committed, though.

Comment #32

catch commented 16 February 2009 at 23:37

To be honest, I can't see anyone actually using sort by user on this page. Did we add that in Drupal 7? If so, having this page load quickly (and improving the filters) seems like a much bigger win.

Comment #33

david strauss commented 17 February 2009 at 04:02

This would be the "correlated subquery" option:
SELECT n.*, (SELECT u.name FROM users u WHERE u.uid = n.uid) AS name FROM node n ORDER BY n.changed DESC LIMIT 0, 50

How does that test on your system?
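The correlated subquery can be tried out on a toy schema with sqlite3. This is only a stand-in: MySQL 5.0 handles correlated subqueries differently, so its performance may differ. A node whose author row is missing gets a NULL name, just as it would with a LEFT JOIN.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
db.execute("CREATE INDEX node_changed ON node (changed)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
# Node 7 references uid 99, which has no users row.
db.executemany("INSERT INTO node VALUES (?, ?, ?)",
               [(nid, 99 if nid == 7 else nid % 2 + 1, 1000 + nid) for nid in range(1, 11)])

correlated = db.execute(
    "SELECT n.nid, (SELECT u.name FROM users u WHERE u.uid = n.uid) AS name "
    "FROM node n ORDER BY n.changed DESC LIMIT 5").fetchall()
print(correlated)  # [(10, 'alice'), (9, 'bob'), (8, 'alice'), (7, None), (6, 'alice')]
```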

Comment #34

mikey_p commented 17 February 2009 at 04:53

To be honest, I couldn't care less about the tablesort on author either; I was just throwing that out for consideration. I would much rather get the filters fixed so we can offer all kinds of nifty filters (such as a filter on author with an autocomplete field for the author name) than worry about keeping the tablesort for every field.

Also keep in mind that, to get the better filters, this should be ported to a dynamic query as soon as #299267: Add "Extender" support to SELECT query builder lands.

Will porting this to a dynamic query affect the work done optimizing this query here?

Comment #35

andreiashu commented 17 February 2009 at 05:00

Subscribing.
About #33: does MySQL cache subqueries by default?

Comment #36

david strauss commented 17 February 2009 at 05:35

"Will porting this to a dynamic query affect the work done optimizing this query here?"

No.

"Is MySQL caching subqueries by default?"

MySQL caches entire result sets for full (non-sub) queries. It does not care how you generated said results. I do not believe it performs any specific subquery-level caching.

Comment #37

nerkn commented 29 March 2009 at 01:09

This query locks tables, and other queries have to wait. The patch was a lifesaver for me; after applying it, everything is fine now.

Comment #38

killes@www.drop.org commented 15 June 2009 at 12:19

Title:	Performance: admin/content/node doesn't scale to large sets of nodes.	» Add indices to node table
Status:	Needs work	» Needs review

Changing title and status (to get a retest).

Comment #39

catch commented 15 June 2009 at 13:48

Why can't we do one query to get the nodes, loop over them to collect the uids, and then run one query to get the names for those uids? That would address the "lots of little queries" complaint above.
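catch's suggestion can be sketched as follows. The sketch assumes a hypothetical admin_content_page() helper and a simplified SQLite schema standing in for MySQL. The page is fetched from node alone, and all author names for that page then come from a single IN query, so the page costs two queries regardless of its size.

```python
import sqlite3


def admin_content_page(conn, limit=50, offset=0):
    """Hypothetical helper: one query for the page of nodes, one for the names."""
    nodes = conn.execute(
        "SELECT nid, uid, title, changed FROM node ORDER BY changed DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    # Collect the distinct author uids on this page (at most `limit` of them).
    uids = sorted({uid for _, uid, _, _ in nodes})
    names = {}
    if uids:
        placeholders = ", ".join("?" for _ in uids)
        names = dict(conn.execute(
            f"SELECT uid, name FROM users WHERE uid IN ({placeholders})", uids
        ).fetchall())
    # Stitch the names back onto the rows; a uid with no user row gets None.
    return [(nid, title, names.get(uid)) for nid, uid, title, _ in nodes]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER, title TEXT);
    CREATE INDEX idx_node_changed ON node (changed);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(0, ""), (1, "admin"), (2, "editor")])
conn.executemany(
    "INSERT INTO node (uid, changed, title) VALUES (?, ?, ?)",
    [(i % 3, 1000 + i, f"node {i}") for i in range(200)],
)

page = admin_content_page(conn)
print(len(page), page[0])  # 50 (200, 'node 199', 'admin')
```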

Comment #40

robertdouglass commented 15 June 2009 at 14:25

Title:	Add indices to node table	» Fix slow query on node table

I like catch's suggestion in #39. It seems like a sensible refinement of the approach in #15, which gave the best performance results. I'm changing the title again because we shouldn't assume that adding indexes is the right solution here.

Comment #41

david strauss commented 29 June 2009 at 05:54

Why aren't we using the correlated subquery option?

Comment #42

damien tournoud commented 29 June 2009 at 08:40

Another question: can we compare the MySQL version on Drupal.org with the one Robert is using? A version difference is the only reasonable explanation I have.

Comment #43

david strauss commented 29 June 2009 at 10:32

@Damien

For Drupal.org: Server version: 5.0.70-log Gentoo Linux mysql-5.0.70-r1

Comment #44

robertdouglass commented 29 June 2009 at 11:28

Sorry, that setup is no longer available to me.

Comment #45

noahterp commented 20 July 2009 at 21:31

When testing on one server (Drupal 5.18, MySQL 5.1.x), adding a simple WHERE n.changed > 0 to that line in node.module solved my problem. The original query would run ad nauseam and simply lock up MySQL. Adding that extraneous WHERE apparently altered how MySQL sorts the temporary table, and the query now runs in a few hundred milliseconds.

However, this fix worked on only one copy of my site. On our development server, which has a different MySQL 5.0.x configuration, it made no difference, although there the original query took only 8-10 seconds rather than running forever.

Comment #46

damien tournoud commented 20 July 2009 at 22:24

OK, apparently the MySQL query planner is being really dumb in this case. Because this query uses an INNER JOIN on a primary key, the planner can choose to execute it starting from either node or users. The problem some people here are facing is that it sometimes chooses to start from users.

This apparently happens (looking at #4) when the users table has very low cardinality (in Robert's example, there are only 2 users). It looks like the planner is not taking the cost of the sort into account.

What explains #45 is that when you add a condition on the node table, the query becomes asymmetric again, and the planner has no choice but to start from the node table.

The max_seeks_for_key MySQL configuration parameter could help the planner make a smarter choice, but it looks like it's time to open a bug against MySQL.

Comment #47

Tresler commented 21 July 2009 at 12:03

Before filing a MySQL bug, can we confirm that this is an optimizer problem by forcing the optimizer's choice on a slow machine and checking whether that fixes it? To me, that would confirm the issue lies in MySQL's optimizer (supposedly better in MySQL 6...).

http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

For that matter, does the database abstraction layer support index hints at all?
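For the forcing experiment, MySQL offers the index hints linked above and the STRAIGHT_JOIN modifier, which makes the optimizer join tables in the order they are listed. The sketch below uses SQLite's analogue, CROSS JOIN, only because it runs in Python's sqlite3 module. It shows how to confirm from the query plan that node is read first. The schema is a hypothetical simplification, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER);
    CREATE INDEX idx_node_changed ON node (changed);
""")

# In SQLite the left table of a CROSS JOIN is always the outer loop, roughly
# what MySQL's STRAIGHT_JOIN enforces: read node first, then probe users.
forced = plan(conn, """
    SELECT node.nid, users.name
    FROM node CROSS JOIN users ON node.uid = users.uid
    ORDER BY node.changed DESC LIMIT 50
""")
for line in forced:
    print(line)
```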

Comment #48

catch commented 1 December 2009 at 06:18

Status:	Needs review	» Postponed (maintainer needs more info)

node_admin_nodes() now runs a straight query to get the nids and then calls node_load_multiple(), since each row of that table is now checked with node_access(). Is there anything else needed here?

Comment #49

jbrown commented 31 December 2009 at 21:16

Yes: you can still sort by author, which results in:

PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'u.name' in 'order clause': SELECT n.nid AS nid FROM {node} n WHERE (n.type = :db_condition_placeholder_0) ORDER BY u.name ASC LIMIT 50 OFFSET 0; Array ( [:db_condition_placeholder_0] => page ) in PagerDefault->execute() (line 93 of /home/jonny/sites/drupal-7.x-dev/includes/pager.inc).
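The error arises because the Author column still asks for ORDER BY u.name while the query no longer joins users. Here is a small Python sketch of the general rule, with hypothetical names (HEADER, order_by_clause): only columns that carry a field present in the query are sortable, and any other request falls back to the default sort instead of producing invalid SQL. The dictionary layout only loosely mirrors a tablesort header. It is not Drupal's API and not the contents of the patch.

```python
# Hypothetical header definition: only entries with a "field" are sortable,
# and every field must be a column the query actually selects from.
HEADER = [
    {"label": "Title", "field": "n.title"},
    {"label": "Type", "field": "n.type"},
    {"label": "Author"},  # not sortable: users is no longer joined
    {"label": "Updated", "field": "n.changed", "default": "DESC"},
]


def order_by_clause(header, requested_label, requested_dir="ASC"):
    """Build an ORDER BY clause, falling back to the default column."""
    direction = "DESC" if requested_dir.upper() == "DESC" else "ASC"
    for col in header:
        if col["label"] == requested_label and "field" in col:
            return f"ORDER BY {col['field']} {direction}"
    # Unknown or non-sortable column: use the header's default sort.
    default = next(c for c in header if "default" in c)
    return f"ORDER BY {default['field']} {default['default']}"


print(order_by_clause(HEADER, "Author", "ASC"))  # ORDER BY n.changed DESC
print(order_by_clause(HEADER, "Title", "desc"))  # ORDER BY n.title DESC
```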

Comment #50

jbrown commented 31 December 2009 at 21:18

Status:	Postponed (maintainer needs more info)	» Needs review

Status	File	Size
new	remove-author-sort.patch	774 bytes

Comment #51

catch commented 1 January 2010 at 02:32

Title:	Fix slow query on node table	» admin/content still has sortable author column resulting in SQL errors
Priority:	Normal	» Critical
Status:	Needs review	» Reviewed & tested by the community