I'm running a test on a server for a live site conversion that's supposed to happen very soon, and the regeneration process is getting stuck at the Initializing step. It's re-creating approximately 110k aliases, so it's not an incredibly huge amount. I've run it fine before on my local machine; it took a while, but it ran. On the server, however, MySQL is using 95% of only one processor, even though there are four on the box. As it sits now, it's been at the Initializing phase for over an hour. Any thoughts on anything I can do or look at to see why it's getting stuck?

Thanks.

Comments

wonder95’s picture

For what it's worth, here's what the query looks like in a profiler.

wonder95’s picture

Per a tweet from Steven Merrill, it appears that having a function (CONCAT() in this case) in the query is what causes a full table scan. The options appear to be:

  • Rewrite the query so it doesn't use a function.
  • Get the data into PHP and use it to do a count.
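As a generic illustration of why the first option matters (a minimal sketch using Python and sqlite as a stand-in for MySQL, with made-up table contents; this is not Pathauto's actual query), wrapping an indexed column in a function typically prevents the planner from seeking the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT);
    CREATE INDEX idx_src ON url_alias (src);
""")
conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [(f"node/{i}", f"content/page-{i}") for i in range(1000)],
)

def plan(sql):
    # Concatenate the planner's description of how the query is resolved.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Plain equality on the indexed column: the planner can seek the index
# (the plan mentions SEARCH ... USING INDEX).
plan_indexed = plan("SELECT dst FROM url_alias WHERE src = 'node/42'")
print(plan_indexed)

# Wrapping the indexed column in a function: the index no longer applies,
# so the planner falls back to scanning every row (the plan mentions SCAN).
plan_function = plan("SELECT dst FROM url_alias WHERE lower(src) = 'node/42'")
print(plan_function)
```

The same principle is what the tweet describes for MySQL: once a function has to be evaluated per row, the optimizer can no longer use the index to narrow the search.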
wonder95’s picture

Title: Stuck on initialization when bulk updating aliases » Initialization process uses query that causes full table scan
Category: support » bug
FileSize
5.99 KB

Changing title and category to match issue.

As demonstrated by the profiler screenshots here and here, the queries in taxonomy_pathauto_bulk_update_batch_process() and node_pathauto_bulk_update_batch_process() cause full table scans because they use the CONCAT() MySQL function. The attached patch offloads all of that processing to PHP using arrays. With this code, the process gets right to the updating functionality and doesn't get stuck on the initialization part.

I'm sure there are ways this could be improved in terms of memory management, so I welcome suggestions on how to improve it. I looked at ways to do it in MySQL, but because the "node/" and "taxonomy/term/" strings have to be concatenated with the tids and nids, I'm not sure there's another way to do it in the database. I do remember a query type in MSSQL and Oracle that is basically a loop through records, but I'd have to do some research on that. Even so, I think CONCAT() would still have to be used, unless you create the whole src value first and then run an individual query for each row.
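The patch's general idea, doing the anti-join ("which entities have no alias yet?") in application code instead of in SQL, can be sketched roughly like this. This is a simplified stand-in (Python with sqlite instead of the PHP/MySQL original, and toy table shapes), not the patch itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY);
    CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT);
""")
conn.executemany("INSERT INTO node (nid) VALUES (?)", [(i,) for i in range(1, 11)])
# Pretend nodes 1, 2, 5, and 9 already have aliases.
conn.executemany("INSERT INTO url_alias (src) VALUES (?)",
                 [(f"node/{i}",) for i in (1, 2, 5, 9)])

# Pull the existing src values into memory once, then let the application
# build the "node/<nid>" string and test membership, instead of asking the
# database to evaluate CONCAT('node/', nid) row by row.
aliased = {row[0] for row in conn.execute("SELECT src FROM url_alias")}
unaliased = [nid for (nid,) in conn.execute("SELECT nid FROM node ORDER BY nid")
             if f"node/{nid}" not in aliased]
print(unaliased)  # the nids that still need an alias
```

The trade-off is exactly the memory-management concern raised above: the set of existing aliases has to fit in memory, which at ~110k rows of short strings is usually fine but worth keeping in mind.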

Anyway, I'm open to suggestions as to how this could be improved (or why it should be thrown out).

As a side note, in looking at the different process functions (taxonomy, node, forum, user, etc.) they are similar enough that I'm thinking they could be combined into one function. I'll look at that when I get a chance.

Thanks.

Dave Reid’s picture

Yeah, unfortunately we're stuck with a crappy core table. I've been thinking about creating a url_alias_entity_index sub-module that can help perform this indexing against {url_alias} and could optionally be used by Pathauto.

sp3boy’s picture

I have a site with >100,000 aliases to rebuild, so I'm interested in this issue.

In the patch above, is it necessary to re-query url_alias every time another batch of 25 nodes is processed?

(Also, don't you need {...} around table names in your SQL text?)

jscm’s picture

+1000000 subscribe
I really need a solution to this problem.

jscm’s picture

The suggested patch works better than the original pathauto.pathauto.inc queries on my test website.

It's faster and doesn't require as many resources from the server.

jscm’s picture

I tried the patch on pathauto.pathauto.inc on my production website and got an error message with severity 3 showing a query that doesn't exist.

wonder95’s picture

Here's a completely different approach that worked for updating ~80k nodes. It involves three parts:

  1. An index table to store the nodes/users/terms to be aliased.
  2. A drush command to fill the table.
  3. Modified queries in the *_pathauto_bulk_update_batch_process() functions to use the index table.

The idea is to run the drush command to fill the index table with any combination of the three types (node, user, and term), and then run the bulk update at admin/build/path/update_bulk (after deleting the appropriate aliases at admin/build/path/delete_bulk). It can take a while to run, but it works.

I only used this process for nodes and terms, but I modified user_pathauto_bulk_update_batch_process() anyway, even though I haven't actually tested it. It should work, though.
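Under the same stand-in assumptions as before (Python/sqlite instead of PHP/MySQL, simplified schemas, and a hypothetical pathauto_index table name; the real patch's table and drush command may differ), the three parts might look like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY);
    CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT);
    -- Part 1: the index table of entities awaiting aliases.
    CREATE TABLE pathauto_index (etype TEXT, eid INTEGER, PRIMARY KEY (etype, eid));
""")
conn.executemany("INSERT INTO node (nid) VALUES (?)", [(i,) for i in range(1, 101)])
# Pretend the odd-numbered nodes already have aliases.
conn.executemany("INSERT INTO url_alias (src) VALUES (?)",
                 [(f"node/{i}",) for i in range(1, 101, 2)])

# Part 2: fill the index table once up front (the drush command's job), so the
# expensive "which entities lack an alias" check runs a single time instead of
# on every batch.
conn.execute("""
    INSERT INTO pathauto_index (etype, eid)
    SELECT 'node', nid FROM node
    WHERE 'node/' || nid NOT IN (SELECT src FROM url_alias)
""")

# Part 3: each bulk-update batch then reads cheaply from the index table,
# e.g. 25 entities at a time.
batch = [eid for (eid,) in conn.execute(
    "SELECT eid FROM pathauto_index WHERE etype = 'node' ORDER BY eid LIMIT 25")]
print(batch)
```

The point of the design is that the per-batch query becomes a plain indexed lookup on the index table, so the batch loop never repeats the full-table-scan work that caused the original Initializing stall.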

Possible improvements:

  • Instead of a drush command, implement the index table filling as part of the generation process.
  • Integrate it all together so that it deletes existing aliases and generates URL redirects (which would have been ideal in my case, since I had to go back and write some code to create new aliases for all those nodes), similar to the option at admin/build/path/settings.
kenorb’s picture

Issue summary: View changes
Status: Active » Needs review

The last submitted patch, 3: full-table-scan-query-1240760-3.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 9: 1240760-bulk-update-index-table.patch, failed testing.

kenorb’s picture

Status: Needs work » Closed (outdated)

Closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.