Severe performance problem with unecessary LEFT JOIN on a CONCAT field

kbahey - March 3, 2007 - 03:37
Project:Google Sitemap
Version:5.x-1.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:Darren Oh
Status:closed
Description

There is a severe performance degradation due to a wrong join.

This query in gsitemap_output_chunk()

$result = db_query(db_rewrite_sql(sprintf("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/',n.nid) WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ") AND n.nid >= %d AND n.nid < %d", $range->low, $range->high), 'n'));

Does not use any of the columns in the u.* table, and it is a LEFT JOIN on a concatenated field.

As an example, a site with 6000+ nodes takes 39 seconds. With the part LEFT JOIN {url_alias} u ON src=CONCAT('node/',n.nid) removed, the query takes only 0.13 seconds.

This is in 4.7 and 5.x.

Any valid reason why this join is there?

#1

kbahey - March 3, 2007 - 14:15

There is another query too.

Here is a patch against 5.x

retrieving revision 1.56.2.1
diff -u -F^f -r1.56.2.1 gsitemap.module
--- gsitemap.module     11 Jan 2007 07:04:16 -0000      1.56.2.1
+++ gsitemap.module     3 Mar 2007 14:14:45 -0000
@@ -331,7 +331,7 @@ function gsitemap_output() {
       }
     }
     $maxcomments = db_fetch_object(db_query("SELECT MAX(comment_count) AS max_comments FROM {node_comment_statistics}"));
-    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, s.comment_count, n.changed, g.previously_changed, s.last_comment_timestamp, g.previous_comment, g.priority_override FROM {node} n LEFT JOIN {node_comment_statistics} s ON n.nid=s.nid LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/', n.nid) WHERE n.status > 0 AND (g.priority_override >= 0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
+    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, s.comment_count, n.changed, g.previously_changed, s.last_comment_timestamp, g.previous_comment, g.priority_override FROM {node} n LEFT JOIN {node_comment_statistics} s ON n.nid=s.nid LEFT JOIN {gsitemap} g ON n.nid=g.nid WHERE n.status > 0 AND (g.priority_override >= 0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
   }
   else {
     if ($count[0]->count != $count[1]->count) {
@@ -340,7 +340,7 @@ function gsitemap_output() {
         db_query("INSERT INTO {gsitemap} (nid, last_changed) VALUES(%d, %d)", $node->nid, $node->changed);
       }
     }
-    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/', n.nid) WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
+    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
   }
   while($node = db_fetch_object($result)) {
     $pri = _gsitemap_calc_priority($node, $maxcomments->max_comments);

#2

gbear - March 7, 2007 - 16:19

Works for me, similar performance improvement on 4500 nodes. Thanks for the patch kb.

-Greg

#3

Darren Oh - April 18, 2007 - 05:50
Version:HEAD» 5.x-1.x-dev
Status:patch (reviewed & tested by the community)» fixed

Fixed in CVS commit 64516.

#4

Darren Oh - April 23, 2007 - 14:54
Assigned to:Anonymous» Darren Oh
Status:fixed» active

I set issue 113230 as a duplicate of this issue. According to Alexander Hass, using the url() function on every link causes a much worse performance problem. If this is confirmed, I will restore the original query and eliminate the use of url().

#5

Darren Oh - May 3, 2007 - 01:31
Status:active» fixed

Fixed in CVS commit 66110.

#6

Anonymous - May 17, 2007 - 02:28
Status:fixed» closed
 
 

Drupal is a registered trademark of Dries Buytaert.