Closed (fixed)
Project:
Pathauto
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Task
Assigned:
Reporter:
Created:
9 Sep 2009 at 13:22 UTC
Updated:
16 Mar 2010 at 23:10 UTC
Jump to comment: Most recent, Most recent file
There is already code in hook_nodeapi to ensure that pathauto only does work if it needs to. We should add similar code to hook_taxonomy.
| Comment | File | Size | Author |
|---|---|---|---|
| #10 | 572604-pathauto-load-pattern-D7.patch | 5.73 KB | dave reid |
| #10 | 572604-pathauto-load-pattern-D6.patch | 5.96 KB | dave reid |
| #8 | 572604-pathauto-load-pattern-D6.patch | 6.06 KB | dave reid |
| #8 | 572604-pathauto-load-pattern-D7.patch | 5.77 KB | dave reid |
| #2 | pathauto_cachegrind.png | 34.73 KB | dman |
Comments
Comment #1
BenK commentedSubscribing.... this issue is important if you're using a tool like Open Calais which can generate a lot of taxonomy terms. Path aliasing all of those terms can really bog down your server.
--Ben
Comment #2
dman commentedThis is awful :-B
I've just run benchmarks for my taxonomy_xml (bulk imports) that among other things
- makes an HTTP request to a remote server
- does XML parsing, RDF analysis
- term creation, re-linking, synonyms
- database lookups for pre-existing terms
- progress feedback and heaps of other goodies
- in a batch environment
... so it's working pretty hard, and pushes timeouts sometimes.
I looked for the bottleneck:
pathauto is responsible for over 50% of my execution time
EVEN WHEN IT'S DISABLED
O_o
This is what happens even with taxonomy pattern blank:

On an import that only had 200 terms.
over 200,000(!) of those t() calls there were from pathauto_punctuation_chars()
It's all because the tokens are generated even when they are never used, and the token generation seems to be outrageously expensive.
[200 calls to ] pathauto_taxonomy()
-> _taxonomy_pathauto_alias()
-> pathauto_get_placeholders()
-> [6,400 calls to] pathauto_clean_token_values()
-> pathauto_cleanstring()
... 230 000 calls to variable get, the same to t() all attributed to this chain of events.
And only after all that does pathauto_create_alias() realize the pattern is blank and return nothing.
Holy ****!
I'm going to have to get taxonomy_xml declare an incompatability with pathauto and brutally disable it if found :-)
IF, however, we were to abstract the first few lines of pathauto_create_alias
// Retrieve and apply the pattern for this content type
...
down to a smaller utility function, we could call that check before all this heavy lifting.
Comment #3
gregglesThat's a lot of ranting for someone who didn't even provide a patch.
Comment #4
dman commentedYep,
I sorta thought that one string manipulation callback taking more processing than the rest of Drupal core put together was indeed rant-worthy. If you were not at all entertained or convinced that it's an issue, *shrug*
But following that, I went to the trouble of tracing it all, and suggested precisely where I thought the fix could go.
Clearly proposing the patch before writing it and pushing it out for review seemed like a good start.
Not much point patching randomly if the approach is not going to be accepted or anyone else knows the internals of the system better and has a better fix.
If you are saying that the patch I described above is the best fix and will be accepted, then I could throw one out.
I strongly suspected that someone else may have a better idea than mine. I guess I was wrong. :-B
Comment #5
gregglesThe hook_nodeapi handles this well, and yes, we could do that even better by abstracting the code from the top of pathauto_create_alias into its own function and re-using that in each of the functions that calls pathauto_create_alias before hand.
For the 6.x-2.x/7.x version of this patch we should change the function signatures around a bit so we're only calling that function once and then using the return value as an argument to pathauto_create_alias.
Comment #6
dman commentedYeah, it was either repeat the code or a change to the function signatures as far as I could see too. Neither desirable.
But still better to run the same code twice - from a new utility function - than loop into those dark depths the profiler found.
I just looked at pathauto_nodeapi, and I see what you mean.
we just need a pathauto_get_pattern() function called in a few places and it'll be tidier for everyone.
I gotta get some sleep, but I'll see if I can do that soonish.
Comment #7
dave reidPicking this up. We could benefit from a general 'load pattern' function that I've been working on for the 7.x revamp:
Comment #8
dave reidPatches for 7.x-1.x and 6.x-2.x for review.
Comment #9
dave reidThis helps cut down a lot of the code we end up re-using.
Comment #10
dave reidQuick revision after checking some of the logic to get the node's language in pathauto_form_alter, that $node->language is used for both $form['language']['#value'] and $form['language']['#default_value'].
Comment #11
dave reidComment #12
dave reidCommitted #10 with tests to 7.x-1.x and 6.x-2.x.