With Drupal 5.6.
Bulk generation of nodes gives the error "The SELECT would examine more than MAX_JOIN_SIZE rows"
user warning: The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay query: SELECT nid, type, title, uid, created, src, dst, vid FROM drupal_node LEFT JOIN drupal_url_alias ON CONCAT('node/', nid) = src WHERE src IS NULL AND (type = 'forum' OR type = 'poll' OR type = 'page' OR type = 'story') LIMIT 0, 50 in /mnt/w0511/d44/s22/b02bd24d/www/max-pt.net/includes/database.mysql.inc on line 172.
Is this error coming from the module's code?
Comments
Comment #1
greggles commented
What host do you use? And how many records do you have in your drupal_node table and your drupal_url_alias table?
I'm leaving this critical because this is important functionality, but frankly I imagine this is a misconfiguration problem and/or a low-quality host, and that the solution is to reconfigure your server.
Comment #2
int commented
Drupal
3600 nodes
519 drupal_url_alias
Host: http://www.netfirms.com/web-hosting/web-hosting-advantage/web-hosting-ad... ($9.95/mo)
Comment #3
int commented
In phpMyAdmin:
SQL query:
SELECT @@max_join_size
@@max_join_size
300000
I think that is more than enough.
In phpMyAdmin:
SQL query:
MySQL said:
The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
So, something is wrong with the SQL command.
Comment #4
int commented
I remember that I updated some old nodes with bulk update, and now this error happens...
Comment #5
greggles commented
Well, you are the first person to experience this problem, and I don't have any ideas on how to solve it other than increasing that number.
If you (or anyone else) have ideas on how to make the query better, they are certainly welcome.
Comment #6
int commented
Can the SQL command be replaced with this?
I don't know why you join the two tables; to exclude the rows that already exist in drupal_url_alias, you can just filter them out when selecting from drupal_node.
Comment #7
greggles commented
Those seem functionally equivalent - a LEFT JOIN with a WHERE ... IS NULL is a fairly common practice.
If you can benchmark the two (to show that this is actually faster and not just a matter of sneaking around max_join_size) then I would love to update the query to a faster one. But I'm not sure yours would be any faster.
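For readers following along, the two query shapes being compared look roughly like this (simplified from the query in the original error message; the second form is a sketch of the join-free approach being proposed, and it needs subquery support, i.e. MySQL 4.1+):

```sql
-- Current pathauto query: anti-join via LEFT JOIN ... IS NULL
SELECT nid, type, title, uid, created
FROM drupal_node
LEFT JOIN drupal_url_alias ON CONCAT('node/', nid) = src
WHERE src IS NULL;

-- Join-free alternative (sketch): exclude aliased nodes with a subquery
SELECT nid, type, title, uid, created
FROM drupal_node
WHERE CONCAT('node/', nid) NOT IN (SELECT src FROM drupal_url_alias);
```

Both return the nodes that have no alias yet; they differ only in how the optimizer executes them.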
Comment #8
int commented
It is logical that LEFT JOIN is a very slow and heavy SQL operation.
And my SQL is the same as yours without the LEFT JOIN, so it is faster.
You are taking one BIG TABLE, LEFT JOINing a second BIG TABLE, and then using the WHERE to remove the rows from the second BIG TABLE.
I just take one BIG TABLE and remove the rows that are in the second.
And LEFT JOIN isn't 100% identical in PostgreSQL, either.
Comment #9
int commented
"I simply removed LEFT JOIN {url_alias} u ON src = CONCAT('node/', n.nid) as it is actually not used and its this part that causes the tremendous delay..."
http://drupal.org/node/190068#comment-625771
"Slow left join"
http://forums.mysql.com/read.php?108,136194,136194
"slow query left join"
http://www.dbforums.com/showthread.php?t=1215892
"slow query (JOIN JOIN LEFT JOIN ... HAVING)"
http://www.jsw4.net/info/listserv_archives/mysql/04-wk42/msg00361.html
Comment #10
greggles commented
Well, first, we cannot use subqueries in Drupal 5 because we have to support older database versions.
Second, it's great that you found out that left joins are slow. This is a known fact. But the real question is whether they are slower than "IN (subquery)" queries. I don't personally think so (http://drupal.org/node/157180#comment-578052 has a similar opinion, for example).
This is why I asked you for a benchmark.
Comment #11
int commented
IN (subquery) requires PHP 4.1 or higher.
Drupal 5.6 requires PHP 4.3.5 or higher.
So we only need a benchmark.
Comment #12
greggles commented
From http://drupal.org/requirements:
MySQL doesn't support subqueries prior to 4.1.
Comment #13
int commented
Ok, we are doing a single SELECT query.
The best performance is with EXISTS:
SELECT nid, type, title, uid, created FROM drupal_node WHERE NOT EXISTS ( SELECT src FROM drupal_url_alias WHERE src = CONCAT('node/', drupal_node.nid) ) AND ( type = 'forum' OR type = 'poll' OR type = 'page' OR type = 'story' )
But EXISTS is only supported in MySQL 4.1 and up.
Comment #14
int commented
"Drupal 6 will require MySQL 4.1 or higher."
From http://drupal.org/requirements
So we can improve performance with EXISTS and resolve some problems with the large left join.
Comment #15
greggles commented
Yes, we are doing "SELECT...SELECT", not "INSERT...SELECT", so it is not supported.
You haven't provided any benchmarks.
This is going to be my last comment on this thread until 1) benchmarks or 2) some new valuable information is shared.
Comment #16
int commented
JOIN vs IN vs EXISTS - the logical difference
So, the best performance is with EXISTS:
SELECT nid, type, title, uid, created FROM drupal_node WHERE NOT EXISTS ( SELECT src FROM drupal_url_alias WHERE src = CONCAT('node/', drupal_node.nid) ) AND ( type = 'forum' OR type = 'poll' OR type = 'page' OR type = 'story' )
But I don't have a way to run benchmarks, so I'm offering this as a suggestion/tip. Use it if you want; if not, close this.
Comment #17
neclimdul commented
I've run into a similar problem. Actually, my situation was quite different, but the solution might be the same. My site had a couple hundred thousand nodes to update. That CONCAT and LEFT JOIN just won't run on that, and it locked up the database machine forever.
My solution was something like the patch attached below. Basically, we query all nodes fitting our criteria blindly. Then we loop through them, skipping the ones that are already set up. This is effectively the same as the LEFT JOIN / IS NULL query, only PHP is more likely to be able to whip through the answers. Since we're not building an A x B sized result set and then sorting through it like the LEFT JOIN has to do, our memory footprint will be much smaller as well, even if there are many more queries and more PHP processing.
Any thoughts?
Comment #18
greggles commented
I'm not sure that this would ever finish. Currently, bulk update relies on MySQL to give us a small set of nodes that are guaranteed to need aliases. If we use your mechanism then we are back to "try to bulk update every single node on the site and if we fail, oh well". Right?
An alternative would be to create a variable that keeps track of which nid was the last one we did the last time that bulk update was run. However, then there will need to be some way to reset that id which requires more UI.
Only one person has ever reported the issue of it failing completely. With the old bulk update method we had dozens of people reporting that bulk update would never finish for their sites. So, I don't think this is a problem that really deserves our attention.
Comment #19
neclimdul commented
(Technically two, because I had a different but similar death.)
You're right, this does just move from one problem to another. In my case, the MySQL problem was unsolvable, whereas the long runtime was solvable. I thought I would put the option out for review.
Comment #20
greggles commented
I just ran an EXPLAIN on the three proposed queries using MySQL. It appears that the IN and EXISTS queries can use the src index (while the index is "range checked for each record" using the current left join).
In all three cases it says that it will examine every row in both tables. This isn't definitive, but provides some sense of hope that the subqueries could be faster. We still can't use them in 5.x of course. I wanted to do this since it is easy and since it would at least give an indication of whether or not this could be worthwhile.
My personal hope and motivation lies in making the url_alias_extra table work since that will allow us to do even faster joins (i.e. not using the concat, which I suspect is the biggest problem with this query).
Comment #21
int commented
Patch with NOT EXISTS.
Much faster, compatible with Drupal 6, and it works with all DB servers.
Comment #22
greggles commented
Can you quantify, please?
Comment #23
neclimdul commented
Yes, please be specific. Part of the problem with this issue is that different solutions have different characteristics based on the size of your alias table, how full it is, how many nodes don't have aliases, etc.
Comment #24
stennie commented
The main problem with the current pathauto query (and the suggested patch in #21) is that they do not use the index on the url_alias table .. so they require a temporary table equivalent to the product of the two tables. This is why the original poster is getting a MAX_JOIN_SIZE error, and others are finding the query fails or sucks the life out of their DB server.
I find the performance of the current query and the #21 patch to be equivalent for my data set, which is a DB with 8499 nodes and 8819 aliases; in either case there is a massive temporary table to create and query.
The fix for indexing is to CAST() the result of the CONCAT:
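(The sample queries themselves did not survive in this archive; based on the patch discussed in #29 and the EXPLAIN output in #31, the indexed form of the node query is presumably along these lines:)

```sql
-- CAST() lets MySQL use the index on url_alias.src for the join
SELECT nid, type, title, uid, created, src, dst
FROM drupal_node
LEFT JOIN drupal_url_alias
  ON CAST(CONCAT('node/', nid) AS CHAR) = src
WHERE src IS NULL;
```

With the CAST in place, MySQL can do an indexed lookup on url_alias.src instead of range-checking every row of the product.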
The smaller result set now fits into memory :)
Have only tested this on MySQL 5.x, but the CAST() function should work similarly with MySQL 4.x.
Cheers,
Stephen
Comment #25
stennie commented
Should note that my sample queries in #24 omitted a node type restriction for brevity. The pathauto query should already be using the index on node.type in most cases *except* when one of your types will return more than some percentage of the total row count (iirc >50% .. it's mentioned in the MySQL docs somewhere as an indexing optimization).
In my use case, one of the node types with pathauto enabled accounts for about 98% of the node table .. so this always reverts to a full scan of the node table. That's not a problem as long as the joins are sensible ;).
Cheers,
Stephen
Comment #26
stennie commented
from #3:
> Drupal
> 3600 nodes
> 519 drupal_url_alias
from #4:
> @@max_join_size
> 300000
>
> I think that is more than enough.
3600 x 519 = 1,868,400 rows to inspect using full tables for the join (worst case).
Cheers,
Stephen
Comment #27
greggles commented
Great - thanks for the idea and the benchmarking, stennie. Any chance this will work on PostgreSQL? If so, can you roll a patch? I'd love to commit your fix as long as it works on PostgreSQL.
Comment #28
stennie commented
The CAST() is standard SQL, but I will have to test against Postgres
to ensure this has the desired outcome on the query plan.
Looks like I'll need to install Postgres 7.3 to test the Drupal
minimum requirements .. I'm currently on 8.3.
Will take me a bit longer to get Postgres setup but will
work on a patch ;).
Cheers,
Stephen
Comment #29
greggles commented
If it works on 8.3 then that's good enough for me, even if it doesn't improve the query plan there.
Attached is a patch which implements the CAST(... AS CHAR) function in all of the bulk updates that pathauto is in charge of.
I tested this only on MySQL 5.0.x.
Comment #30
jaydub commented
Testing in PostgreSQL 8.1, all of these queries save one work fine. The one query that does not work uses a CONCAT with more than 2 arguments.
See http://drupal.org/node/83974 for mention of this problem.
Drupal creates a CONCAT function for PostgreSQL (since PostgreSQL natively uses a different syntax for concatenation), but it is only defined for 2 parameters.
Here is a workaround:
From
SELECT uid, name, src, dst FROM {users} LEFT JOIN {url_alias} ON CAST(CONCAT('user/', uid,'/track') AS CHAR) = src WHERE uid > 0 AND src IS NULL
To
SELECT uid, name, src, dst FROM {users} LEFT JOIN {url_alias} ON CAST(CONCAT(CONCAT('user/', uid), '/track') AS CHAR) = src WHERE uid > 0 AND src IS NULL
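(For context, the 2-parameter concat() that Drupal defines for PostgreSQL is roughly the following; this is a sketch, and the exact definition in your Drupal version's system.install may differ:)

```sql
-- Emulate MySQL's CONCAT() using PostgreSQL's || operator,
-- but only for exactly two text arguments
CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
  'SELECT $1 || $2;'
LANGUAGE 'sql';
```

Since only the (text, text) signature exists, a 3-argument CONCAT('user/', uid, '/track') matches no function, which is why nesting two 2-argument calls works around the problem.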
Comment #31
jgoldfeder commented
Patch #29 had a huge impact on my site with 250,975 nodes. According to devel, the query time went from 35445.53 ms to 1285.202 ms. It was becoming impossible to run batch updates, despite them being necessary.
I tried to run the query to benchmark it, and it failed. Here is what I did:
EXPLAIN SELECT nid, type, title, uid, created, src, dst, vid
FROM node
LEFT JOIN url_alias ON CAST( CONCAT( 'node/', nid ) AS CHAR ) = src
WHERE src IS NULL
MySQL said:
#1267 - Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation '='
...in other news, it would be nice to have an option to bulk update only published content... which I obviously hard-coded...
Comment #32
aufumy commented
As per greggles (http://drupal.org/node/234374#comment-770294), adding my two cents:
I used my local MAMP install with no other MySQL processes running.
The SQL using the LEFT JOIN took 14 min 23.59 s.
The SQL using a subquery took 4 min 21.87 s.
An earlier test had 4 min for the LEFT JOIN and 13 s for the subquery.
explain of left join
+----+-------------+-----------+-------+---------------+-----------+---------+------+-------+------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------+---------+------+-------+------------------------------------------------------------+
| 1 | SIMPLE | node | range | node_type | node_type | 14 | NULL | 12010 | Using where |
| 1 | SIMPLE | url_alias | ALL | src | NULL | NULL | NULL | 70232 | Range checked for each record (index map: 0x4); Not exists |
+----+-------------+-----------+-------+---------------+-----------+---------+------+-------+------------------------------------------------------------+
explain of subquery
+----+--------------------+-----------+-------+---------------+-----------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------+-------+---------------+-----------+---------+------+-------+--------------------------+
| 1 | PRIMARY | node | range | node_type | node_type | 14 | NULL | 12010 | Using where |
| 2 | DEPENDENT SUBQUERY | url_alias | index | src | src | 386 | NULL | 70232 | Using where; Using index |
+----+--------------------+-----------+-------+---------------+-----------+---------+------+-------+--------------------------+
Comment #33
stennie commented
@aufumy: can you also benchmark with the CAST() patch from comment #29 (or just the query from #24): http://drupal.org/node/212327#comment-729893
Subquery or join, you're still inspecting a large number of rows from the url_alias table according to your current EXPLAINs. I suspect the casting should improve this considerably.
Cheers,
Stephen
Comment #34
aufumy commented
Stephen, the casting did indeed improve the speed much more:
Cast of left join: 2.66s
Cast of subquery: 0.78s
explain of left join with cast
+----+-------------+-----------+-------+---------------+-----------+---------+------+-------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------+---------+------+-------+-------------------------+
| 1 | SIMPLE | node | range | node_type | node_type | 14 | NULL | 12010 | Using where |
| 1 | SIMPLE | url_alias | ref | src | src | 386 | func | 1 | Using where; Not exists |
+----+-------------+-----------+-------+---------------+-----------+---------+------+-------+-------------------------+
explain of subquery with cast
+----+--------------------+-----------+-------+---------------+-----------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------+-------+---------------+-----------+---------+------+-------+--------------------------+
| 1 | PRIMARY | node | range | node_type | node_type | 14 | NULL | 12010 | Using where |
| 2 | DEPENDENT SUBQUERY | url_alias | ref | src | src | 386 | func | 1 | Using where; Using index |
+----+--------------------+-----------+-------+---------------+-----------+---------+------+-------+--------------------------+
Thanks.
Comment #35
Anonymous (not verified) commented
The revised function in comment #30 works fine on my Drupal + Postgres 8.3 install.
Comment #36
FiReaNGeL commented
The patch in #30 allowed me to bulk generate my url_alias table - I have close to 50,000 nodes on that site, and without the patch MySQL just hard locks for hours (at least, I killed it before it finished). I patched 5.x-2.1, btw.
Comment #37
greggles commented
@FiReaNG3L - there is no patch in #30. If you've applied one, can you please roll the actual patch?
Comment #38
FiReaNGeL commented
Sorry, I meant #29.
Comment #39
greggles commented
Ok, updated patch. If a couple of people could review this I would appreciate it.
I did some basic testing, but especially a PGSQL test on 8.(1|2) and 8.3 would help.
Comment #40
greggles commented
Well, I accidentally committed this to the 5.x branch. Either way - I guess that makes it easier to test now, since people can just wait for the -dev tarball to be updated and then test that ;)
@stennie - thanks very much for your help with this patch. I gave you credit in the commit for 6.x and wish I could have for 5.x as well.
Thanks also to all the testers and people who provided ideas for this. If there are problems from this commit then we can re-open this issue and investigate...
Comment #41
aufumy commented
I happened to decide today to install Drupal with Postgres (8.2.4), just for the reason of testing this patch.
The following error would appear when bulk-generating users, nodes, and taxonomy:
"Query failed: ERROR: function concat(unknown, integer) does not exist
No function matches the given name and argument types. You might need to add explicit type casts."
system.install creates a stored Postgres FUNCTION "concat"(text, text).
I had to cast each argument supplied to the concat function to char so that the types matched. Patch enclosed, tested against Postgres as well as MySQL.
Comment #42
andremolnar commented
Didn't realize that I had posted a duplicate issue.
Taking a look at these patches, I question why the RESULT of the concat is being cast. Would it not make more sense to cast the numeric value to a char before concatenation?
e.g. CONCAT('node/', CAST(nid AS CHAR)) vs CAST(CONCAT('node/', nid) AS CHAR)
Some additional reading: http://www.mysqlperformanceblog.com/2007/10/16/be-careful-when-joining-o...
andre
Comment #43
greggles commented
@aufumy - you've got a couple of "CAST('taxonomy/term/' AS CHAR)" and similar things. Is that on purpose/necessary? It seems extraneous and bloaty to me (on a line that is already long and painful to read...)
Comment #44
greggles commented
Also, your patch needs review, right?
@andremolnar - can you please review aufumy's patch? I read that page already but its not as useful to me as having multiple people review and comment on a patch.
Comment #45
andremolnar commented
Sorry if I wasn't clear, but that was my review of the patch, i.e. that the query code is not precise. Much like your comment on the code, there are unneeded casts in some places. The main idea is that what needs casting is the numeric field values prior to concatenation.
The link was for the benefit of those that might not have seen it - and to provide additional context to the issue.
andre
Comment #46
aufumy commented
Re-rolled the patch, removing the unnecessary casts and leaving the cast of the integer inside the concat.
Tested on MySQL and PostgreSQL Drupal installs.
Comment #47
andremolnar commented
The latest patch looks good. I can only confirm that the node query is light-years faster based on my test data set, but there is no reason to believe that the other queries wouldn't be just as fast with large numbers of users, taxonomy terms, etc.
Comment #48
emok commented
I have not used the feature or run the code. But when I updated my home version with the changes found in the tarball "6.x-1.x-dev" released 2008-Apr-06, I think I spotted an unintended change.
Look at what happens in the function user_pathauto_bulkupdate() (third diff from end of patches).
Old version:
CONCAT('user/', uid)
After #39:
CAST(CONCAT(CONCAT('user/', uid), '/track') AS CHAR)
After #46:
CONCAT('user/', CAST(uid AS CHAR))
#39 seems to append a '/track' which wasn't there before; it is then removed again by #46. So my guess is that if #46 were not to be committed, an intermediate patch should be created to remove the '/track' which is currently in CVS (at least in the tarball).
Just my 2 cents, and I have not really read through what this whole issue is about...
Comment #49
greggles commented
Thanks for finding that, emok. The patch in #46 looks good to me aside from the fact that it uses != style comparisons instead of the ANSI standard <> (which were just fixed recently).
I also added back the comment about the double concat (in the right place this time!) and committed it to both DRUPAL-5--2 and HEAD (for 6.x).
Thanks aufumy, andremolnar, and emok for your help with this.
Comment #50
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.
Comment #51
giorgio79 commented
Hallelujah, thanks Greg for referring me here! (http://drupal.org/node/293422#comment-961721)
One question about #46 though: I have the latest 5.x dev version from August, but I cannot see the queries from the patch... Any ideas why?
Anyway, I am facing the same issues as most here, so I will apply this patch manually :)
I also tried patching, but it failed:
patch < cast_pg_2_212327.patch
patching file pathauto_node.inc
Hunk #1 FAILED at 64.
1 out of 1 hunk FAILED -- saving rejects to file pathauto_node.inc.rej
patching file pathauto_taxonomy.inc
Hunk #1 FAILED at 49.
Hunk #2 FAILED at 128.
Hunk #3 FAILED at 140.
3 out of 3 hunks FAILED -- saving rejects to file pathauto_taxonomy.inc.rej
patching file pathauto_user.inc
Hunk #1 FAILED at 84.
Hunk #2 FAILED at 107.
Hunk #3 FAILED at 129.
3 out of 3 hunks FAILED -- saving rejects to file pathauto_user.inc.rej
Comment #52
giorgio79 commented
#46
Praise the Lord! It works!! :)
I copied and pasted the stuff manually from #46 :) My computer does not hang anymore. Before this patch it choked at 100% CPU usage and ran out of memory, but with this it is humming like a colibri :D and, of course, updating my nice little paths.
Comment #53
greggles commented
#46 was already applied to CVS. So, if the patch applied cleanly to your copy then you weren't using the latest Pathauto :/
Comment #54
kenorb commented
#361967: Increase MAX_JOIN_SIZE and MAX_ALLOWED_PACKET settings in system.install
Comment #55
kenorb commented
#361967: Increase MAX_JOIN_SIZE and MAX_ALLOWED_PACKET settings in system.install for 6.x