Similar to http://drupal.org/node/257281

It turns out this becomes really important with 100k+ nodes and older versions of MySQL (ex: 5.0.51a).

Please credit to:

--author="gcassie <gcassie@80260.no-reply.drupal.org>"

Patch against 7.x-1.x HEAD coming shortly.

CommentFileSizeAuthor
#1 pathauto-castconcats-1415930-1.patch2.29 KBRaines37
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Raines37’s picture

Status: Active » Needs review
FileSize
2.29 KB
bago’s picture

I tested #1 and changed my life. bulk updating stopped working since I loaded many thousands nodes as the first query to find "unaliased" nodes tooks more than 2 minutes (but I never investigated the issue). However, applying this patch the query takes less than a second and the bulk update works again! Thank you!

bago’s picture

Status: Needs review » Reviewed & tested by the community
Dave Reid’s picture

Status: Reviewed & tested by the community » Needs review

We would have to confirm #1 works on pgsql and sqlite as well. This issue cannot be RTBC until that happens.

bago’s picture

@Dave AFAIK the same kind of change has already landed the code in alternative issues (http://drupal.org/node/257281) so I don't see why we should block this issue waiting for some sqlite/pgsql people to appear here (and if they don't have this issue I don't see how we can expect them to appear here and be willing to test this patch).

This is a BLOCKING issue for people using mysql 5.0 as it break bulk updating...

bago’s picture

Status: Needs review » Reviewed & tested by the community

@Dave can you give me more informations about the workflow you ask? Why do you think the "CAST" will break anything? The same cast is used in other pathauto queries. Why should we wait for less used DB users to take care of testing a patch they don't need? I put this again to "RTBC" so maybe you see my questions. (I didn't find any policy about testing postgre or sqlite before RTBC, but I may have missed it.)

alliax’s picture

Simply make a special IF case for those few not using mysql and be done with it, it's important to speed up this bulk process, thank you.

MathieuMa’s picture

Thanks a lot, this patch saved my day !
I was not aware about that MySQL performance issue, the difference is just amazing ...
(now re-gerating path for a 250000 node site)

maikeru’s picture

Postgres user here.

Sort of have a related issue, in that not having CAST means that bulk generation doesn't work and returns an error:

Ambiguous function: 7 ERROR: function concat(text, integer) is not unique LINE 6: LEFT OUTER JOIN url_alias ua ON CONCAT('taxonomy/term

The above patch removes the error, but the URL alias isn't actually generated.

Instead of just 'CHAR', using 'CHAR(10)' seems to get the alias created nicely.

So from:
$query->leftJoin('url_alias', 'ua', "CONCAT('taxonomy/term/', CAST(td.tid AS CHAR)) = ua.source");

to:
$query->leftJoin('url_alias', 'ua', "CONCAT('taxonomy/term/', CAST(td.tid AS CHAR(10))) = ua.source");

Not sure if that'll affect mysql?

Tested it by deleting a term alias, and then regenerating using the bulk option - haven't tried it with 1000s of nodes.

Countzero’s picture

I can confirm the patch in #1 works like magic.

Also, it's worth noting that optimizing the relevant tables (taxonomy_term_data, taxonomy_index, etc.) can greatly speed up everything, especially if you imported huge volumes.

Another note : you don't have to generate huge amounts of aliases to hit the performance problem. The bulk generation stalled on a dev site with 29 aliases to create. I'm talking about MySQL here.

Thanks to all contributors in this thread, especially to Raines37 for the patch.

Dave Reid’s picture

Status: Reviewed & tested by the community » Needs work

Comment in #9 indicates this needs work for PostgreSQL, which is why I asked in #4 for it to be tested in all three database system supported by Drupal core to ensure compatibility.

thiagogalvao’s picture

This query without CAST is a problem in mysql version minor of 5.5.

jjchinquist’s picture

Hi Dave,

Could not the table structure be changed to reflect fields a bit more? Such as adding columns for entity_bundle, entity_id, entity_revision, etc?
It takes extra space, but then lookups could be done much faster and in a more stable format.

- Jeremy

jweowu’s picture

Issue summary: View changes

(n.b. No actual changes to the issue summary, that I can discern. That was unintentional.)

So in postgres (I'm using version 9.3) CONCAT('taxonomy/term/', CAST(td.tid AS CHAR)) = ua.source results in a query condition like Index Cond: (('taxonomy/term/'::text || ((td.tid)::character(1))::text) = (source)::text)

Note the character(1) -- this is using only the first digit of the term ID.

The suggested (#9) change to: CONCAT('taxonomy/term/', CAST(td.tid AS CHAR(10))) = ua.source will handle numbers up to 10 digits long, which is sufficient for a 32-bit SIGNED (or UNSIGNED) INT column (limits of 2,147,483,647 or 4,294,967,295 respectively). In practice that's probably fine(?), but at minimum it's ugly to have that constraint.

Note that it's only working because the subsequent ::text type-cast is eliminating the trailing spaces on all but the longest numbers. My recollection of MySQL is that by default it uses irritating non-standard CHAR behaviour that doesn't produce the expected padding, so it probably bypasses that particular issue, but I'm unsure whether this is a non-issue for all databases.

To me it seems more correct to use: CONCAT('taxonomy/term/', CAST(td.tid AS VARCHAR)) = ua.source to eliminate the issue of the digit counts and padding entirely.

I don't know whether that's equivalent in MySQL, or if there are other concerns with using VARCHAR? Perhaps we need to specify a VARCHAR(N) for some databases? MySQL has (or once had) a limit of 255, so it may as well be that if we need one.

MySQL's aforementioned non-standard behaviour for CHAR may or may not be dependent on the database configuration. (For CHAR columns it makes a difference; I just don't know whether the same applies to dynamic type-casting.)

https://dev.mysql.com/doc/refman/5.7/en/char.html says:

If strict SQL mode is not enabled and you assign a value to a CHAR or VARCHAR column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of nonspace characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 6.1.8, “Server SQL Modes”.

It would be useful to ascertain whether "strict SQL mode" and/or PAD_CHAR_TO_FULL_LENGTH causes any issues with either CAST(td.tid AS CHAR) or CAST(td.tid AS CHAR(10))