I'm running a test on a server for a live site conversion that's supposed to happen very soon, and the alias regeneration is stuck at the Initializing step. It's re-creating approximately 110k aliases, so it's not an enormous amount. I've run it fine before on my local machine; it took a while, but it completed. On the server, however, MySQL is using 95% of only one processor, even though there are four on the box. As it stands now, it's been in the Initializing phase for over an hour. Any thoughts on what I can do or look at to see why it's getting stuck?
Thanks.
Comment | File | Size | Author
---|---|---|---
#9 | 1240760-bulk-update-index-table.patch | 7.65 KB | wonder95
#3 | full-table-scan-query-1240760-3.patch | 5.99 KB | wonder95
Comments
Comment #1
wonder95 CreditAttribution: wonder95 commented
For what it's worth, here is what the query looks like in a profiler.
Comment #2
wonder95 CreditAttribution: wonder95 commented
Per a tweet from Steven Merrill, it appears that having a function (CONCAT() in this case) in the query is what causes the full table scan. The options appear to be:
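For reference, the pattern being discussed looks roughly like this (a reconstruction for illustration, not the exact Pathauto query; written against the Drupal 6 {url_alias} schema). Because the join condition wraps values in CONCAT(), MySQL cannot use the index on url_alias.src and must scan every row:

```php
<?php
// Illustrative sketch only: finding nodes with no alias by joining on a
// computed value. The CONCAT() on the join column defeats the index on
// {url_alias}.src, so MySQL falls back to a full table scan.
$result = db_query("
  SELECT n.nid
  FROM {node} n
  LEFT JOIN {url_alias} ua ON ua.src = CONCAT('node/', n.nid)
  WHERE ua.src IS NULL");
```

Running EXPLAIN on a query of this shape should show type ALL (full scan) on the url_alias side, which matches the profiler observations above.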
Comment #3
wonder95 CreditAttribution: wonder95 commented
Changing title and category to match the issue.
As demonstrated by the profiler screenshots here and here, the queries used in taxonomy_pathauto_bulk_update_batch_process() and node_pathauto_bulk_update_batch_process() cause full table scans because of the CONCAT() MySQL function. The attached patch offloads that processing to PHP using arrays. With this code, the batch gets right to the updating step and doesn't get stuck on initialization.
I'm sure there are ways this could be improved in terms of memory management, so I welcome suggestions. I looked at ways to do it in MySQL, but because the "node/" and "taxonomy/term/" strings have to be concatenated with nids and tids, I'm not sure there's another way to do it in the database. I do remember some kind of construct in MSSQL and Oracle that lets a query loop through records (a cursor), but I'd have to do some research on that. Even so, I think CONCAT() would still have to be used, unless you create the whole src value first and then run an individual query for each row.
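A minimal sketch of the PHP-side approach described above (illustrative only, using Drupal 6 API calls; the actual patch differs in structure):

```php
<?php
// Sketch of offloading the CONCAT() work to PHP: load the existing
// aliased sources into an array once, then test membership per node.
// This avoids the per-row CONCAT() that defeats the index on {url_alias}.
$aliased = array();
$result = db_query("SELECT src FROM {url_alias} WHERE src LIKE 'node/%%'");
while ($row = db_fetch_object($result)) {
  $aliased[$row->src] = TRUE;
}

$nids = array();
$result = db_query("SELECT nid FROM {node}");
while ($row = db_fetch_object($result)) {
  if (!isset($aliased['node/' . $row->nid])) {
    $nids[] = $row->nid;  // This node still needs an alias.
  }
}
```

The trade-off, as noted, is memory: the $aliased array holds one entry per existing alias, which is the memory-management concern mentioned above when the table has 100k+ rows.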
Anyway, I'm open to suggestions as to how this could be improved (or why it should be thrown out).
As a side note, the different process functions (taxonomy, node, forum, user, etc.) are similar enough that I think they could be combined into one function. I'll look at that when I get a chance.
Thanks.
Comment #4
Dave Reid
Yeah, unfortunately we're stuck with a crappy core table. I've been thinking about creating a url_alias_entity_index sub-module that could help perform this indexing against {url_alias} and could optionally be used by Pathauto.
Comment #5
sp3boy CreditAttribution: sp3boy commented
I have a site with >100,000 aliases to rebuild, so I'm interested in this issue.
In the patch above, is it necessary to re-query url_alias every time another batch of 25 nodes is processed?
(Also, don't you need {...} around table names in your SQL text?)
Comment #6
jscm CreditAttribution: jscm commented
+1000000 subscribe
I really need a solution to this problem.
Comment #7
jscm CreditAttribution: jscm commented
The suggested patch works better than the original pathauto.pathauto.inc queries on my test website. It's faster and doesn't require as many server resources.
Comment #8
jscm CreditAttribution: jscm commented
I tried to patch pathauto.pathauto.inc on my production website. I got an error message with severity 3 that shows a query that doesn't exist.
Comment #9
wonder95 CreditAttribution: wonder95 commented
Here's a completely different approach that worked for updating ~80k nodes. It involves three parts:
The idea is to run the drush command to fill the index table with any combination of the three types (node, user, and term), and then run the bulk update at admin/build/path/update_bulk (after deleting the appropriate aliases at admin/build/path/delete_bulk). It can take a while to run, but it works.
I only used this process for nodes and terms, but I modified user_pathauto_bulk_update_batch_process() anyway, even though I haven't actually tested it. It should work, though.
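As a rough sketch of the index-table idea (the table name pathauto_index and its columns here are hypothetical placeholders, not the actual schema defined by the patch in #9):

```php
<?php
// Hypothetical sketch: populate an index table mapping entity IDs to
// their system paths ahead of time, so the bulk update can look rows up
// by an indexed column instead of computing CONCAT() per row.
// {pathauto_index} and its columns are illustrative names only.
$result = db_query("SELECT nid FROM {node}");
while ($row = db_fetch_object($result)) {
  db_query("INSERT INTO {pathauto_index} (entity_type, entity_id, src)
            VALUES ('%s', %d, '%s')",
           'node', $row->nid, 'node/' . $row->nid);
}
```

With the src values precomputed and indexed, the subsequent bulk update at admin/build/path/update_bulk can join on plain indexed columns, which is consistent with the speedup reported above.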
Possible improvements:
Comment #10
kenorb CreditAttribution: kenorb commented

Comment #13
kenorb CreditAttribution: kenorb commented
Closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.