Severe performance problem with unecessary LEFT JOIN on a CONCAT field
| Project: | Google Sitemap |
| Version: | 5.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Darren Oh |
| Status: | closed |
Jump to:
There is a severe performance degradation due to a wrong join.
This query in gsitemap_output_chunk()
$result = db_query(db_rewrite_sql(sprintf("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/',n.nid) WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ") AND n.nid >= %d AND n.nid < %d", $range->low, $range->high), 'n'));
Does not use any of the columns in the u.* table, and it is a LEFT JOIN on a concatenated field.
As an example, a site with 6000+ nodes takes 39 seconds. With the part LEFT JOIN {url_alias} u ON src=CONCAT('node/',n.nid) removed, the query takes only 0.13 seconds.
This is in 4.7 and 5.x.
Any valid reason why this join is there?

#1
There is another query too.
Here is a patch against 5.x
retrieving revision 1.56.2.1diff -u -F^f -r1.56.2.1 gsitemap.module
--- gsitemap.module 11 Jan 2007 07:04:16 -0000 1.56.2.1
+++ gsitemap.module 3 Mar 2007 14:14:45 -0000
@@ -331,7 +331,7 @@ function gsitemap_output() {
}
}
$maxcomments = db_fetch_object(db_query("SELECT MAX(comment_count) AS max_comments FROM {node_comment_statistics}"));
- $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, s.comment_count, n.changed, g.previously_changed, s.last_comment_timestamp, g.previous_comment, g.priority_override FROM {node} n LEFT JOIN {node_comment_statistics} s ON n.nid=s.nid LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/', n.nid) WHERE n.status > 0 AND (g.priority_override >= 0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
+ $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, s.comment_count, n.changed, g.previously_changed, s.last_comment_timestamp, g.previous_comment, g.priority_override FROM {node} n LEFT JOIN {node_comment_statistics} s ON n.nid=s.nid LEFT JOIN {gsitemap} g ON n.nid=g.nid WHERE n.status > 0 AND (g.priority_override >= 0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
}
else {
if ($count[0]->count != $count[1]->count) {
@@ -340,7 +340,7 @@ function gsitemap_output() {
db_query("INSERT INTO {gsitemap} (nid, last_changed) VALUES(%d, %d)", $node->nid, $node->changed);
}
}
- $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid LEFT JOIN {url_alias} u ON src=CONCAT('node/', n.nid) WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
+ $result = db_query(db_rewrite_sql("SELECT DISTINCT(n.nid), n.type, n.status, n.promote, n.changed, g.previously_changed, g.priority_override FROM {node} n LEFT JOIN {gsitemap} g ON n.nid=g.nid WHERE n.status > 0 AND (g.priority_override >=0 OR g.priority_override IS NULL) AND n.type NOT IN (" . $excludes . ")", 'n'));
}
while($node = db_fetch_object($result)) {
$pri = _gsitemap_calc_priority($node, $maxcomments->max_comments);
#2
Works for me, similar performance improvement on 4500 nodes. Thanks for the patch kb.
-Greg
#3
Fixed in CVS commit 64516.
#4
I set issue 113230 as a duplicate of this issue. According to Alexander Hass, using the url() function on every link causes a much worse performance problem. If this is confirmed, I will restore the original query and eliminate the use of url().
#5
Fixed in CVS commit 66110.
#6