Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Similar to http://drupal.org/node/257281
It turns out this becomes really important with 100k+ nodes and older versions of MySQL (ex: 5.0.51a).
Please credit to:
--author="gcassie <gcassie@80260.no-reply.drupal.org>"
Patch against 7.x-1.x HEAD coming shortly.
Comment | File | Size | Author |
---|---|---|---|
#1 | pathauto-castconcats-1415930-1.patch | 2.29 KB | Raines37 |
Comments
Comment #1
Raines37 CreditAttribution: Raines37 commentedPatch for comments / review. Thanks!
http://drupal.org/files/pathauto-castconcats-1415930-1.patch
Comment #2
bago CreditAttribution: bago commentedI tested #1 and changed my life. bulk updating stopped working since I loaded many thousands nodes as the first query to find "unaliased" nodes tooks more than 2 minutes (but I never investigated the issue). However, applying this patch the query takes less than a second and the bulk update works again! Thank you!
Comment #3
bago CreditAttribution: bago commentedComment #4
Dave ReidWe would have to confirm #1 works on pgsql and sqlite as well. This issue cannot be RTBC until that happens.
Comment #5
bago CreditAttribution: bago commented@Dave AFAIK the same kind of change has already landed the code in alternative issues (http://drupal.org/node/257281) so I don't see why we should block this issue waiting for some sqlite/pgsql people to appear here (and if they don't have this issue I don't see how we can expect them to appear here and be willing to test this patch).
This is a BLOCKING issue for people using mysql 5.0 as it break bulk updating...
Comment #6
bago CreditAttribution: bago commented@Dave can you give me more informations about the workflow you ask? Why do you think the "CAST" will break anything? The same cast is used in other pathauto queries. Why should we wait for less used DB users to take care of testing a patch they don't need? I put this again to "RTBC" so maybe you see my questions. (I didn't find any policy about testing postgre or sqlite before RTBC, but I may have missed it.)
Comment #7
alliax CreditAttribution: alliax commentedSimply make a special IF case for those few not using mysql and be done with it, it's important to speed up this bulk process, thank you.
Comment #8
MathieuMa CreditAttribution: MathieuMa commentedThanks a lot, this patch saved my day !
I was not aware about that MySQL performance issue, the difference is just amazing ...
(now re-gerating path for a 250000 node site)
Comment #9
maikeru CreditAttribution: maikeru commentedPostgres user here.
Sort of have a related issue, in that not having CAST means that bulk generation doesn't work and returns an error:
Ambiguous function: 7 ERROR: function concat(text, integer) is not unique LINE 6: LEFT OUTER JOIN url_alias ua ON CONCAT('taxonomy/term
The above patch removes the error, but the URL alias isn't actually generated.
Instead of just 'CHAR', using 'CHAR(10)' seems to get the alias created nicely.
So from:
$query->leftJoin('url_alias', 'ua', "CONCAT('taxonomy/term/', CAST(td.tid AS CHAR)) = ua.source");
to:
$query->leftJoin('url_alias', 'ua', "CONCAT('taxonomy/term/', CAST(td.tid AS CHAR(10))) = ua.source");
Not sure if that'll affect mysql?
Tested it by deleting a term alias, and then regenerating using the bulk option - haven't tried it with 1000s of nodes.
Comment #10
Countzero CreditAttribution: Countzero commentedI can confirm the patch in #1 works like magic.
Also, it's worth noting that optimizing the relevant tables (taxonomy_term_data, taxonomy_index, etc.) can greatly speed up everything, especially if you imported huge volumes.
Another note : you don't have to generate huge amounts of aliases to hit the performance problem. The bulk generation stalled on a dev site with 29 aliases to create. I'm talking about MySQL here.
Thanks to all contributors in this thread, especially to Raines37 for the patch.
Comment #11
Dave ReidComment in #9 indicates this needs work for PostgreSQL, which is why I asked in #4 for it to be tested in all three database system supported by Drupal core to ensure compatibility.
Comment #12
thiagogalvao CreditAttribution: thiagogalvao commentedThis query without CAST is a problem in mysql version minor of 5.5.
Comment #13
jjchinquist CreditAttribution: jjchinquist commentedHi Dave,
Could not the table structure be changed to reflect fields a bit more? Such as adding columns for entity_bundle, entity_id, entity_revision, etc?
It takes extra space, but then lookups could be done much faster and in a more stable format.
- Jeremy
Comment #14
jweowu CreditAttribution: jweowu commented(n.b. No actual changes to the issue summary, that I can discern. That was unintentional.)
So in postgres (I'm using version 9.3)
CONCAT('taxonomy/term/', CAST(td.tid AS CHAR)) = ua.source
results in a query condition likeIndex Cond: (('taxonomy/term/'::text || ((td.tid)::character(1))::text) = (source)::text)
Note the
character(1)
-- this is using only the first digit of the term ID.The suggested (#9) change to:
CONCAT('taxonomy/term/', CAST(td.tid AS CHAR(10))) = ua.source
will handle numbers up to 10 digits long, which is sufficient for a 32-bit SIGNED (or UNSIGNED) INT column (limits of 2,147,483,647 or 4,294,967,295 respectively). In practice that's probably fine(?), but at minimum it's ugly to have that constraint.Note that it's only working because the subsequent
::text
type-cast is eliminating the trailing spaces on all but the longest numbers. My recollection of MySQL is that by default it uses irritating non-standard CHAR behaviour that doesn't produce the expected padding, so it probably bypasses that particular issue, but I'm unsure whether this is a non-issue for all databases.To me it seems more correct to use:
CONCAT('taxonomy/term/', CAST(td.tid AS VARCHAR)) = ua.source
to eliminate the issue of the digit counts and padding entirely.I don't know whether that's equivalent in MySQL, or if there are other concerns with using VARCHAR? Perhaps we need to specify a
VARCHAR(N)
for some databases? MySQL has (or once had) a limit of 255, so it may as well be that if we need one.MySQL's aforementioned non-standard behaviour for CHAR may or may not be dependent on the database configuration. (For CHAR columns it makes a difference; I just don't know whether the same applies to dynamic type-casting.)
https://dev.mysql.com/doc/refman/5.7/en/char.html says:
It would be useful to ascertain whether "strict SQL mode" and/or PAD_CHAR_TO_FULL_LENGTH causes any issues with either
CAST(td.tid AS CHAR)
orCAST(td.tid AS CHAR(10))