If there is no taxonomy pattern, don't do any work

greggles - September 9, 2009 - 13:22
Project:Pathauto
Version:6.x-1.x-dev
Component:Code
Category:task
Priority:normal
Assigned:Unassigned
Status:active
Description

There is already code in hook_nodeapi to ensure that pathauto only does work if it needs to. We should add similar code to hook_taxonomy.

#1

BenK - October 25, 2009 - 18:54

Subscribing.... this issue is important if you're using a tool like Open Calais which can generate a lot of taxonomy terms. Path aliasing all of those terms can really bog down your server.

--Ben

#2

dman - November 24, 2009 - 01:04

This is awful :-B

I've just run benchmarks for my taxonomy_xml (bulk imports) that among other things
- makes an HTTP request to a remote server
- does XML parsing, RDF analysis
- term creation, re-linking, synonyms
- database lookups for pre-existing terms
- progress feedback and heaps of other goodies
- in a batch environment

... so it's working pretty hard, and pushes timeouts sometimes.
I looked for the bottleneck:

pathauto is responsible for over 50% of my execution time
EVEN WHEN IT'S DISABLED

O_o

This is what happens even with taxonomy pattern blank:

On an import that only had 200 terms.
over 200,000(!) of those t() calls there were from pathauto_punctuation_chars()

It's all because the tokens are generated even when they are never used, and the token generation seems to be outrageously expensive.
[200 calls to ] pathauto_taxonomy()
-> _taxonomy_pathauto_alias()
-> pathauto_get_placeholders()
-> [6,400 calls to] pathauto_clean_token_values()
-> pathauto_cleanstring()
... 230 000 calls to variable get, the same to t() all attributed to this chain of events.

And only after all that does pathauto_create_alias() realize the pattern is blank and return nothing.

Holy ****!

I'm going to have to get taxonomy_xml declare an incompatability with pathauto and brutally disable it if found :-)

IF, however, we were to abstract the first few lines of pathauto_create_alias
// Retrieve and apply the pattern for this content type
...

down to a smaller utility function, we could call that check before all this heavy lifting.

AttachmentSize
pathauto_cachegrind.png 34.73 KB

#3

greggles - November 24, 2009 - 04:05

That's a lot of ranting for someone who didn't even provide a patch.

#4

dman - November 24, 2009 - 06:05

Yep,
I sorta thought that one string manipulation callback taking more processing than the rest of Drupal core put together was indeed rant-worthy. If you were not at all entertained or convinced that it's an issue, *shrug*

But following that, I went to the trouble of tracing it all, and suggested precisely where I thought the fix could go.
Clearly proposing the patch before writing it and pushing it out for review seemed like a good start.

Not much point patching randomly if the approach is not going to be accepted or anyone else knows the internals of the system better and has a better fix.

If you are saying that the patch I described above is the best fix and will be accepted, then I could throw one out.
I strongly suspected that someone else may have a better idea than mine. I guess I was wrong. :-B

#5

greggles - November 24, 2009 - 12:45

The hook_nodeapi handles this well, and yes, we could do that even better by abstracting the code from the top of pathauto_create_alias into its own function and re-using that in each of the functions that calls pathauto_create_alias before hand.

For the 6.x-2.x/7.x version of this patch we should change the function signatures around a bit so we're only calling that function once and then using the return value as an argument to pathauto_create_alias.

#6

dman - November 24, 2009 - 15:11

Yeah, it was either repeat the code or a change to the function signatures as far as I could see too. Neither desirable.
But still better to run the same code twice - from a new utility function - than loop into those dark depths the profiler found.

I just looked at pathauto_nodeapi, and I see what you mean.
we just need a pathauto_get_pattern() function called in a few places and it'll be tidier for everyone.

I gotta get some sleep, but I'll see if I can do that soonish.

 
 

Drupal is a registered trademark of Dries Buytaert.