Performance: workflow nightmare query [#374139]

mysql> explain SELECT COUNT(*) FROM (SELECT DISTINCT(node.nid) AS nid FROM node node LEFT JOIN workflow_node workflow_node ON node.nid = workflow_node.nid ) count_alias;
+----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+
| id | select_type | table         | type   | possible_keys | key     | key_len | ref                     | rows    | Extra                        |
+----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+
|  1 | PRIMARY     | NULL          | NULL   | NULL          | NULL    | NULL    | NULL                    |    NULL | Select tables optimized away | 
|  2 | DERIVED     | node          | index  | NULL          | PRIMARY | 4       | NULL                    | 3901949 | Using index; Using temporary | 
|  2 | DERIVED     | workflow_node | eq_ref | PRIMARY,nid   | PRIMARY | 4       | drupal.node.nid         |       1 | Using index; Distinct        | 
+----+-------------+---------------+--------+---------------+---------+---------+-------------------------+---------+------------------------------+
3 rows in set (2 min 1.22 sec)

Notice that with 3.9 million nodes the EXPLAIN query takes over 2 minutes to execute. The real query takes several more minutes.

Comments

Comment #1

robertdouglass commented 13 February 2009 at 21:14

Why can't this query be reduced to SELECT count(*) FROM node;?

Comment #2

robertdouglass commented 13 February 2009 at 21:27

Ok, it's because the query is coming from views. Grr.

Comment #3

jvandyk commented 13 February 2009 at 21:41

OK, I'll stop grepping workflow and scratching my head now. :)

Comment #4

robertdouglass commented 13 February 2009 at 21:42

I'm going to ask Earl for advice here.

Comment #5

robertdouglass commented 13 February 2009 at 21:45

If anyone wants to look at the View definition: http://tinyurl.com/av63ga

Comment #6

jcisio commented 22 February 2010 at 16:30

Nightmare here, too. Any clue about this? Workflow on 14k nodes = die :(

Comment #7

Nathan Goulding commented 26 May 2010 at 18:14

Does anyone have any suggestions? I've got 188k nodes and I'm dying.

Comment #8

Nathan Goulding commented 27 May 2010 at 14:25

Title:

Performance: q=workflow nightmare query

» Performance: workflow nightmare query

I worked around this by doing the following (MySQL):

Added an `sid` column to the node table:
ALTER TABLE `node` ADD `sid` int;

Created a database trigger so that when the workflow_node table is updated, it also updates the node with the proper sid:

delimiter //
CREATE TRIGGER workflow_trigger AFTER UPDATE ON workflow_node
FOR EACH ROW
BEGIN
	UPDATE node SET sid = NEW.sid WHERE nid = NEW.nid;
END;//
delimiter ;

Then I created the proper indexes. The first one because I always need to pull where workflow state = X and status <> 0 and type = 'some_type' so I wanted to optimize that:
CREATE INDEX `sid` ON `node`(`sid`,`status`,`type`);

I also often sort by node.created, so I needed to optimize those queries as well. Another index to do that:
CREATE INDEX `sid_created` ON `node`(`sid`,`created`);

I had 188,000 nodes that needed to be updated, so I ran this to populate the node table with the proper workflow state id:
UPDATE `node` set `node`.`sid` = (SELECT `workflow_node`.`sid` FROM `workflow_node` WHERE `workflow_node`.`nid` = `node`.`nid`);

My site is built using views, so I needed to alter the node.install file to account for the new column and indexes, so views could use it:

(***Please note that the line numbers won't match up here because I added another non-default index "type_created_status" to optimize other common queries I need***)

Index: node/node.install
===================================================================
--- node/node.install   (revision 309)
+++ node/node.install   (working copy)
@@ -88,6 +88,12 @@
         'type' => 'int',
         'not null' => TRUE,
         'default' => 0),
+      'sid' => array(
+        'description' => 'The current Workflow state.',
+        'type' => 'int',
+        'unsigned' => TRUE,
+        'not null' => FALSE,
+        'default' => 0),
       ),
     'indexes' => array(
       'node_changed'        => array('changed'),
@@ -101,6 +107,8 @@
       'tnid'                => array('tnid'),
       'translate'           => array('translate'),
       'type_created_status' => array('created', 'status', 'type'),
+      'sid'                 => array('sid', 'status', 'type'),
+      'sid_created'         => array('sid', 'created'),
       ),
     'unique keys' => array(
       'vid'     => array('vid')

Then I modified the node.views.inc file to allow me to add a filter based on the workflow state ID:

Index: views/modules/node.views.inc
===================================================================
--- views/modules/node.views.inc        (revision 309)
+++ views/modules/node.views.inc        (working copy)
@@ -154,6 +154,25 @@
     ),
   );

+  // Workflow state
+  $data['node']['sid'] = array(
+    'title' => t('Workflow State'),
+    'help' => t('The current workflow state'),
+    'field' => array(
+      'handler' => 'views_handler_field_numeric',
+      'click sortable' => TRUE,
+    ),
+    'filter' => array(
+      'handler' => 'views_handler_filter_numeric',
+    ),
+    'sort' => array(
+      'handler' => 'views_handler_sort',
+    ),
+    'argument' => array(
+      'handler' => 'views_handler_argument',
+    ),
+  );
+
   // published status
   $data['node']['status'] = array(
     'title' => t('Published'),

Finally, I cleared cache and changed my view to filter on "Node: Workflow State". Query time went from ~5 seconds down to .45 seconds.

***This is for ADVANCED users. Use at your own risk.***

Comment #9

Nathan Goulding commented 2 June 2010 at 17:37

These triggers should actually be:

delimiter //
CREATE TRIGGER workflow_node_insert AFTER INSERT ON workflow_node
FOR EACH ROW
BEGIN
  UPDATE node SET node.sid = NEW.sid WHERE node.nid = NEW.nid;
END;//
delimiter ;

delimiter //
CREATE TRIGGER workflow_node_update AFTER UPDATE ON workflow_node
FOR EACH ROW
BEGIN
  UPDATE node SET node.sid = NEW.sid WHERE node.nid = NEW.nid;
END;//
delimiter ;

delimiter //
CREATE TRIGGER node_before_insert BEFORE INSERT ON node
FOR EACH ROW
BEGIN
   SET NEW.sid = (SELECT workflow_node.sid FROM workflow_node WHERE workflow_node.nid = NEW.nid);
END;//
delimiter ;

delimiter //
CREATE TRIGGER node_before_update BEFORE UPDATE ON node
FOR EACH ROW
BEGIN
   SET NEW.sid = (SELECT workflow_node.sid FROM workflow_node WHERE workflow_node.nid = NEW.nid);
END;//
delimiter ;

Comment #10

Bastlynn commented 14 November 2011 at 19:16

I'll take a deeper look at this once the rest of the queue has been knocked down a little. Given views is involved simplifying code may not be feasible.

Comment #11

johnv

Dutch

commented 26 November 2013 at 19:09

Component:	Code	» Performance
Issue summary:	View changes
Status:	Active	» Closed (won't fix)

Closing this issue. The proposed changes include changing the node table. Doesn't sound right to me.
Also, D6 version is unmaintained, and code in D7 has changed drastically.

Performance: workflow nightmare query