mysql> explain SELECT COUNT(*) FROM (SELECT DISTINCT(node.nid) AS nid FROM node node LEFT JOIN workflow_node workflow_node ON node.nid = workflow_node.nid ) count_alias;
+----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+
| id | select_type | table         | type   | possible_keys | key     | key_len | ref                     | rows    | Extra                        |
+----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+
|  1 | PRIMARY     | NULL          | NULL   | NULL          | NULL    | NULL    | NULL                    |    NULL | Select tables optimized away | 
|  2 | DERIVED     | node          | index  | NULL          | PRIMARY | 4       | NULL                    | 3901949 | Using index; Using temporary | 
|  2 | DERIVED     | workflow_node | eq_ref | PRIMARY,nid   | PRIMARY | 4       | drupal.node.nid         |       1 | Using index; Distinct        | 
+----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+
3 rows in set (2 min 1.22 sec)

Notice that with 3.9 million nodes the EXPLAIN query takes over 2 minutes to execute. The real query takes several more minutes.

Comments

robertdouglass’s picture

Why can't this query be reduced to SELECT count(*) FROM node;?

robertdouglass’s picture

Ok, it's because the query is coming from views. Grr.

jvandyk’s picture

OK, I'll stop grepping workflow and scratching my head now. :)

robertdouglass’s picture

I'm going to ask Earl for advice here.

robertdouglass’s picture

If anyone wants to look at the View definition: http://tinyurl.com/av63ga

jcisio’s picture

Nightmare here, too. Any clue about this? Workflow on 14k nodes = die :(

Nathan Goulding’s picture

Does anyone have any suggestions? I've got 188k nodes and I'm dying.

Nathan Goulding’s picture

Title: Performance: q=workflow nightmare query » Performance: workflow nightmare query

I worked around this by doing the following (MySQL):

Added an `sid` column to the node table:
ALTER TABLE `node` ADD `sid` int;

Created a database trigger so that when the workflow_node table is updated, it also updates the node with the proper sid:

delimiter //
CREATE TRIGGER workflow_trigger AFTER UPDATE ON workflow_node
FOR EACH ROW
BEGIN
	UPDATE node SET sid = NEW.sid WHERE nid = NEW.nid;
END;//
delimiter ;

Then I created the proper indexes. The first one because I always need to pull where workflow state = X and status <> 0 and type = 'some_type' so I wanted to optimize that:
CREATE INDEX `sid` ON `node`(`sid`,`status`,`type`);

I also often sort by node.created, so I needed to optimize those queries as well. Another index to do that:
CREATE INDEX `sid_created` ON `node`(`sid`,`created`);

I had 188,000 nodes that needed to be updated, so I ran this to populate the node table with the proper workflow state id:
UPDATE `node` set `node`.`sid` = (SELECT `workflow_node`.`sid` FROM `workflow_node` WHERE `workflow_node`.`nid` = `node`.`nid`);

My site is built using views, so I needed to alter the node.install file to account for the new column and indexes, so views could use it:

(***Please note that the line numbers won't match up here because I added another non-default index "type_created_status" to optimize other common queries I need***)

Index: node/node.install
===================================================================
--- node/node.install   (revision 309)
+++ node/node.install   (working copy)
@@ -88,6 +88,12 @@
         'type' => 'int',
         'not null' => TRUE,
         'default' => 0),
+      'sid' => array(
+        'description' => 'The current Workflow state.',
+        'type' => 'int',
+        'unsigned' => TRUE,
+        'not null' => FALSE,
+        'default' => 0),
       ),
     'indexes' => array(
       'node_changed'        => array('changed'),
@@ -101,6 +107,8 @@
       'tnid'                => array('tnid'),
       'translate'           => array('translate'),
       'type_created_status' => array('created', 'status', 'type'),
+      'sid'                 => array('sid', 'status', 'type'),
+      'sid_created'         => array('sid', 'created'),
       ),
     'unique keys' => array(
       'vid'     => array('vid')

Then I modified the node.views.inc file to allow me to add a filter based on the workflow state ID:

Index: views/modules/node.views.inc
===================================================================
--- views/modules/node.views.inc        (revision 309)
+++ views/modules/node.views.inc        (working copy)
@@ -154,6 +154,25 @@
     ),
   );

+  // Workflow state
+  $data['node']['sid'] = array(
+    'title' => t('Workflow State'),
+    'help' => t('The current workflow state'),
+    'field' => array(
+      'handler' => 'views_handler_field_numeric',
+      'click sortable' => TRUE,
+    ),
+    'filter' => array(
+      'handler' => 'views_handler_filter_numeric',
+    ),
+    'sort' => array(
+      'handler' => 'views_handler_sort',
+    ),
+    'argument' => array(
+      'handler' => 'views_handler_argument',
+    ),
+  );
+
   // published status
   $data['node']['status'] = array(
     'title' => t('Published'),

Finally, I cleared cache and changed my view to filter on "Node: Workflow State". Query time went from ~5 seconds down to .45 seconds.

***This is for ADVANCED users. Use at your own risk.***

Nathan Goulding’s picture

These triggers should actually be:

delimiter //
CREATE TRIGGER workflow_node_insert AFTER INSERT ON workflow_node
FOR EACH ROW
BEGIN
  UPDATE node SET node.sid = NEW.sid WHERE node.nid = NEW.nid;
END;//
delimiter ;

delimiter //
CREATE TRIGGER workflow_node_update AFTER UPDATE ON workflow_node
FOR EACH ROW
BEGIN
  UPDATE node SET node.sid = NEW.sid WHERE node.nid = NEW.nid;
END;//
delimiter ;

delimiter //
CREATE TRIGGER node_before_insert BEFORE INSERT ON node
FOR EACH ROW
BEGIN
   SET NEW.sid = (SELECT workflow_node.sid FROM workflow_node WHERE workflow_node.nid = NEW.nid);
END;//
delimiter ;

delimiter //
CREATE TRIGGER node_before_update BEFORE UPDATE ON node
FOR EACH ROW
BEGIN
   SET NEW.sid = (SELECT workflow_node.sid FROM workflow_node WHERE workflow_node.nid = NEW.nid);
END;//
delimiter ;
Bastlynn’s picture

I'll take a deeper look at this once the rest of the queue has been knocked down a little. Given views is involved simplifying code may not be feasible.

johnv’s picture

Component: Code » Performance
Issue summary: View changes
Status: Active » Closed (won't fix)

Closing this issue. The proposed changes include changing the node table. Doesn't sound right to me.
Also, D6 version is unmaintained, and code in D7 has changed drastically.