mysql> explain SELECT COUNT(*) FROM (SELECT DISTINCT(node.nid) AS nid FROM node node LEFT JOIN workflow_node workflow_node ON node.nid = workflow_node.nid ) count_alias; +----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+ | 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | | 2 | DERIVED | node | index | NULL | PRIMARY | 4 | NULL | 3901949 | Using index; Using temporary | | 2 | DERIVED | workflow_node | eq_ref | PRIMARY,nid | PRIMARY | 4 | drupal.node.nid | 1 | Using index; Distinct | +----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+ 3 rows in set (2 min 1.22 sec)
Notice that with 3.9 million nodes the EXPLAIN query takes over 2 minutes to execute. The real query takes several more minutes.
Comments
Comment #1
robertdouglass commentedWhy can't this query be reduced to
SELECT count(*) FROM node;?Comment #2
robertdouglass commentedOk, it's because the query is coming from views. Grr.
Comment #3
jvandyk commentedOK, I'll stop grepping workflow and scratching my head now. :)
Comment #4
robertdouglass commentedI'm going to ask Earl for advice here.
Comment #5
robertdouglass commentedIf anyone wants to look at the View definition: http://tinyurl.com/av63ga
Comment #6
jcisio commentedNightmare here, too. Any clue about this? Workflow on 14k nodes = die :(
Comment #7
Nathan Goulding commentedDoes anyone have any suggestions? I've got 188k nodes and I'm dying.
Comment #8
Nathan Goulding commentedI worked around this by doing the following (MySQL):
Added an `sid` column to the node table:
ALTER TABLE `node` ADD `sid` int;Created a database trigger so that when the workflow_node table is updated, it also updates the node with the proper sid:
Then I created the proper indexes. The first one because I always need to pull where workflow state = X and status <> 0 and type = 'some_type' so I wanted to optimize that:
CREATE INDEX `sid` ON `node`(`sid`,`status`,`type`);I also often sort by node.created, so I needed to optimize those queries as well. Another index to do that:
CREATE INDEX `sid_created` ON `node`(`sid`,`created`);I had 188,000 nodes that needed to be updated, so I ran this to populate the node table with the proper workflow state id:
UPDATE `node` set `node`.`sid` = (SELECT `workflow_node`.`sid` FROM `workflow_node` WHERE `workflow_node`.`nid` = `node`.`nid`);My site is built using views, so I needed to alter the node.install file to account for the new column and indexes, so views could use it:
(***Please note that the line numbers won't match up here because I added another non-default index "type_created_status" to optimize other common queries I need***)
Then I modified the node.views.inc file to allow me to add a filter based on the workflow state ID:
Finally, I cleared cache and changed my view to filter on "Node: Workflow State". Query time went from ~5 seconds down to .45 seconds.
***This is for ADVANCED users. Use at your own risk.***
Comment #9
Nathan Goulding commentedThese triggers should actually be:
Comment #10
Bastlynn commentedI'll take a deeper look at this once the rest of the queue has been knocked down a little. Given views is involved simplifying code may not be feasible.
Comment #11
johnvClosing this issue. The proposed changes include changing the node table. Doesn't sound right to me.
Also, D6 version is unmaintained, and code in D7 has changed drastically.