There is a severe performance degradation due to a wrong join.

This query in gsitemap_output_chunk()

$result = db_query(db_rewrite_sql(sprintf("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/',n.nid) WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ") AND n.nid >= %d AND n.nid < %d", $range->low, $range->high), 'n'));

Does not use any of the columns in the u.* table, and it is a LEFT JOIN on a concatenated field.

As an example, a site with 6000+ nodes takes 39 seconds. With the part LEFT JOIN {url_alias} u ON src=CONCAT('node/',n.nid) removed, the query takes only 0.13 seconds.

This is in 4.7 and 5.x.

Any valid reason why this join is there?

Comments

kbahey’s picture

There is another query too.

Here is a patch against 5.x

retrieving revision 1.56.2.1
diff -u -F^f -r1.56.2.1 gsitemap.module
--- gsitemap.module     11 Jan 2007 07:04:16 -0000      1.56.2.1
+++ gsitemap.module     3 Mar 2007 14:14:45 -0000
@@ -331,7 +331,7 @@ function gsitemap_output() {
       }
     }
     $maxcomments = db_fetch_object(db_query("SELECT MAX(comment_count) AS max_comments FROM {node_comment_statistics}"));
-    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, s.comment_count, n.changed, g.previously_changed, s.last_comment_timestamp, g.previous_comment, g.priority_override FROM {node} n LEFT JOIN {node_comment_statistics} s ON n.nid=s.nid LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/', n.nid) WHERE n.status > 0 AND (g.priority_override >= 0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
+    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, s.comment_count, n.changed, g.previously_changed, s.last_comment_timestamp, g.previous_comment, g.priority_override FROM {node} n LEFT JOIN {node_comment_statistics} s ON n.nid=s.nid LEFT JOIN {gsitemap} g ON n.nid=g.nid WHERE n.status > 0 AND (g.priority_override >= 0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
   }
   else {
     if ($count[0]->count != $count[1]->count) {
@@ -340,7 +340,7 @@ function gsitemap_output() {
         db_query("INSERT INTO {gsitemap} (nid, last_changed) VALUES(%d, %d)", $node->nid, $node->changed);
       }
     }
-    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/', n.nid) WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
+    $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
   }
   while($node = db_fetch_object($result)) {
     $pri = _gsitemap_calc_priority($node, $maxcomments->max_comments);
greg@beargroup.com’s picture

Works for me, similar performance improvement on 4500 nodes. Thanks for the patch kb.

-Greg

darren oh’s picture

Version: master » 5.x-1.x-dev
Status: Reviewed & tested by the community » Fixed

Fixed in CVS commit 64516.

darren oh’s picture

Assigned: Unassigned » darren oh
Status: Fixed » Active

I set issue 113230 as a duplicate of this issue. According to Alexander Hass, using the url() function on every link causes a much worse performance problem. If this is confirmed, I will restore the original query and eliminate the use of url().

darren oh’s picture

Status: Active » Fixed

Fixed in CVS commit 66110.

Anonymous’s picture

Status: Fixed » Closed (fixed)