If there is no taxonomy pattern, don't do any work
greggles - September 9, 2009 - 13:22
| Project: | Pathauto |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
There is already code in hook_nodeapi to ensure that pathauto only does work if it needs to. We should add similar code to hook_taxonomy.

#1
Subscribing.... this issue is important if you're using a tool like Open Calais which can generate a lot of taxonomy terms. Path aliasing all of those terms can really bog down your server.
--Ben
#2
This is awful :-B
I've just run benchmarks for my taxonomy_xml (bulk imports) that among other things
- makes an HTTP request to a remote server
- does XML parsing, RDF analysis
- term creation, re-linking, synonyms
- database lookups for pre-existing terms
- progress feedback and heaps of other goodies
- in a batch environment
... so it's working pretty hard, and pushes timeouts sometimes.
I looked for the bottleneck:
pathauto is responsible for over 50% of my execution time
EVEN WHEN IT'S DISABLED
O_o
This is what happens even with taxonomy pattern blank:

On an import that only had 200 terms.
over 200,000(!) of those t() calls there were from pathauto_punctuation_chars()
It's all because the tokens are generated even when they are never used, and the token generation seems to be outrageously expensive.
[200 calls to ] pathauto_taxonomy()
-> _taxonomy_pathauto_alias()
-> pathauto_get_placeholders()
-> [6,400 calls to] pathauto_clean_token_values()
-> pathauto_cleanstring()
... 230 000 calls to variable get, the same to t() all attributed to this chain of events.
And only after all that does pathauto_create_alias() realize the pattern is blank and return nothing.
Holy ****!
I'm going to have to get taxonomy_xml declare an incompatability with pathauto and brutally disable it if found :-)
IF, however, we were to abstract the first few lines of pathauto_create_alias
// Retrieve and apply the pattern for this content type
...
down to a smaller utility function, we could call that check before all this heavy lifting.
#3
That's a lot of ranting for someone who didn't even provide a patch.
#4
Yep,
I sorta thought that one string manipulation callback taking more processing than the rest of Drupal core put together was indeed rant-worthy. If you were not at all entertained or convinced that it's an issue, *shrug*
But following that, I went to the trouble of tracing it all, and suggested precisely where I thought the fix could go.
Clearly proposing the patch before writing it and pushing it out for review seemed like a good start.
Not much point patching randomly if the approach is not going to be accepted or anyone else knows the internals of the system better and has a better fix.
If you are saying that the patch I described above is the best fix and will be accepted, then I could throw one out.
I strongly suspected that someone else may have a better idea than mine. I guess I was wrong. :-B
#5
The hook_nodeapi handles this well, and yes, we could do that even better by abstracting the code from the top of pathauto_create_alias into its own function and re-using that in each of the functions that calls pathauto_create_alias before hand.
For the 6.x-2.x/7.x version of this patch we should change the function signatures around a bit so we're only calling that function once and then using the return value as an argument to pathauto_create_alias.
#6
Yeah, it was either repeat the code or a change to the function signatures as far as I could see too. Neither desirable.
But still better to run the same code twice - from a new utility function - than loop into those dark depths the profiler found.
I just looked at pathauto_nodeapi, and I see what you mean.
we just need a pathauto_get_pattern() function called in a few places and it'll be tidier for everyone.
I gotta get some sleep, but I'll see if I can do that soonish.