There is already code in hook_nodeapi to ensure that pathauto only does work if it needs to. We should add similar code to hook_taxonomy.

Comments

BenK’s picture

Subscribing.... this issue is important if you're using a tool like Open Calais which can generate a lot of taxonomy terms. Path aliasing all of those terms can really bog down your server.

--Ben

dman’s picture

StatusFileSize
new34.73 KB

This is awful :-B

I've just run benchmarks for my taxonomy_xml (bulk imports) that among other things
- makes an HTTP request to a remote server
- does XML parsing, RDF analysis
- term creation, re-linking, synonyms
- database lookups for pre-existing terms
- progress feedback and heaps of other goodies
- in a batch environment

... so it's working pretty hard, and pushes timeouts sometimes.
I looked for the bottleneck:

pathauto is responsible for over 50% of my execution time
EVEN WHEN IT'S DISABLED

O_o

This is what happens even with taxonomy pattern blank:

On an import that only had 200 terms.
over 200,000(!) of those t() calls there were from pathauto_punctuation_chars()

It's all because the tokens are generated even when they are never used, and the token generation seems to be outrageously expensive.
[200 calls to ] pathauto_taxonomy()
-> _taxonomy_pathauto_alias()
-> pathauto_get_placeholders()
-> [6,400 calls to] pathauto_clean_token_values()
-> pathauto_cleanstring()
... 230 000 calls to variable get, the same to t() all attributed to this chain of events.

And only after all that does pathauto_create_alias() realize the pattern is blank and return nothing.

Holy ****!

I'm going to have to get taxonomy_xml declare an incompatability with pathauto and brutally disable it if found :-)

IF, however, we were to abstract the first few lines of pathauto_create_alias
// Retrieve and apply the pattern for this content type
...

down to a smaller utility function, we could call that check before all this heavy lifting.

greggles’s picture

That's a lot of ranting for someone who didn't even provide a patch.

dman’s picture

Yep,
I sorta thought that one string manipulation callback taking more processing than the rest of Drupal core put together was indeed rant-worthy. If you were not at all entertained or convinced that it's an issue, *shrug*

But following that, I went to the trouble of tracing it all, and suggested precisely where I thought the fix could go.
Clearly proposing the patch before writing it and pushing it out for review seemed like a good start.

Not much point patching randomly if the approach is not going to be accepted or anyone else knows the internals of the system better and has a better fix.

If you are saying that the patch I described above is the best fix and will be accepted, then I could throw one out.
I strongly suspected that someone else may have a better idea than mine. I guess I was wrong. :-B

greggles’s picture

The hook_nodeapi handles this well, and yes, we could do that even better by abstracting the code from the top of pathauto_create_alias into its own function and re-using that in each of the functions that calls pathauto_create_alias before hand.

For the 6.x-2.x/7.x version of this patch we should change the function signatures around a bit so we're only calling that function once and then using the return value as an argument to pathauto_create_alias.

dman’s picture

Yeah, it was either repeat the code or a change to the function signatures as far as I could see too. Neither desirable.
But still better to run the same code twice - from a new utility function - than loop into those dark depths the profiler found.

I just looked at pathauto_nodeapi, and I see what you mean.
we just need a pathauto_get_pattern() function called in a few places and it'll be tidier for everyone.

I gotta get some sleep, but I'll see if I can do that soonish.

dave reid’s picture

Assigned: Unassigned » dave reid

Picking this up. We could benefit from a general 'load pattern' function that I've been working on for the 7.x revamp:

function pathauto_pattern_load_by_entity($entity, $bundle, $language = LANGUAGE_NONE) {
  $patterns = array();
  if ($language != LANGUAGE_NONE) {
    // Only check language pattern if it is not the default langauge.
    $patterns[] = "pathauto_{$entity}_{$bundle}_{$language}_pattern";
  }
  $patterns[] = "pathauto_{$entity}_{$bundle}_pattern";
  $patterns[] = "pathauto_{$entity}_pattern";

  foreach ($patterns as $variable) {
    if ($pattern = trim(variable_get($variable, ''))) {
      return $pattern;
    }
  }

  return '';
}
dave reid’s picture

Status: Active » Needs review
StatusFileSize
new5.77 KB
new6.06 KB

Patches for 7.x-1.x and 6.x-2.x for review.

dave reid’s picture

This helps cut down a lot of the code we end up re-using.

dave reid’s picture

Quick revision after checking some of the logic to get the node's language in pathauto_form_alter, that $node->language is used for both $form['language']['#value'] and $form['language']['#default_value'].

dave reid’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
dave reid’s picture

Status: Needs review » Fixed

Committed #10 with tests to 7.x-1.x and 6.x-2.x.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.