I'm currently using 'apachesolr' module as an API module for complex business stuff. I need to call a lot of times the apachesolr_cck_fields().
Until now this functions is totally right, there is a static internal cache. But I think it could be even faster using cache_set() and cache_get() to store this result.
The fact is when my users so complex queries, this function will call the hook_apachesolr_cck_fields_alter() on all modules each time I need to get fields information. Deeper, this means it will be called at each request by my custom module. It's possible to avoid such intensive algorithm. Plus, it could also speed up the 'apachesolr_search' module which build faceted blocks at each page that displays it.
I'd prefer this complex array computed only once, even if it means the module has to wipe out this cache each time a new CCK field is created/altered or a new module enabled or an old one disable.
I attached an imcomplete patch, the only thing missing is an automatic cache wipeout on module enable/disable or cck field alteration.
Comments
Comment #1
robertdouglass commentedCaching is good. Please add the cache wipe if you can.
Comment #2
pounardI'll do, as soon as I can free some time to do it:)
The main problem here is I was not able to find any CCK hooks to tell modules that CCK fields have been modified. The last solution to do it is implementing hook_form_alter() on CCK field edit forms and add a custom submit callback. I think this is ugly, but it should work.
BTW, all the CCK fields handling could be in a submodule, it would make the main apachesolr API cleaner, what do you think?
Comment #3
robertdouglass commentedThe submodule is a good idea in the respect that it forces the main api to have the right flexibility. However, the proliferation of 1,000 small modules is not particularly nice, so maybe an .inc file that gets conditionally included is another option. Reworking the whole CCK integration is a good idea, though.
Comment #4
pounardThat's a good point, I agree with the fact a submodule would be too much. Is it possible to make the CCK features working using main API hooks instead of embedded code? It would be a nice mixed up solution to get a lighter API - I mean it would avoid specific code in generic API and it'll centralize the CCK feature code to some hooks implementation. Using an include file to put these functions is a good idea.
Comment #5
pounardFound the right hook, see attached patch.
Comment #6
drewish commentedWhat's the rationale for using = &?:
Seems like $fields = $cached->data would suffice.
Comment #7
pounardI don't remember the nature of $cached, but if it's not an object, PHP would copy the variable instead of using the reference.
EDIT: and because my variable's goal only for convenience, I prefer to minimize the impact on performances.
RE-EDIT: and $data->cached is a big array with string as content, if we don't use the &, it will cause PHP to recopy the full array, this copy will be useless because there is no need to protect the $data->cached from changes.
Comment #8
claudiu.cristeaHere's a patch for DRUPAL-5--2. It really speeds up the site according to some brief testings.
Unfortunately I had to workaround cache wiping. No hook_content_fieldapi in D5 :-(
The patch adds also an optional argument to the main function that allows returning a single CCK field.
Comment #9
claudiu.cristeaRemoved the prefix "apachesolr_" from CID. While ApacheSolr has its own cache table this is obsolete - true also for other CIDs (apachesolr:luke:*, apachesolr:stats:*, etc...)
Comment #10
claudiu.cristeaThe patch from #5 may be slower compared with code without cache because it doesn't take advantage from microcaching (static caching). You should include the cache_get() code snippet inside
if (is_null($fields)) {statement together with and "else" or a "return". Something like...Comment #11
claudiu.cristeaHere's a patch against DRUPAL-6--2.
I've added also that API improvement so that custom modules can obtain a single CCK field mapping without loading all the list.
Comment #12
claudiu.cristeaSmall fix... added also
return is_null($field) ? $fields : $fields[$field];to the first return.Comment #13
pounardI think you should have only one 'return' statement within the function, it's I think more readable.
Comment #14
claudiu.cristeaI didn't want to create a bigger patch... A single return mean introducing an "else" clause...
Anyway... here's the patch with only one "return" and with a standard cache resetting mechanism.
Comment #15
pounardAttached a patch:
Notice: the fact I added an else statement changes the function's body indentation, so the patch is kinda heavy, but the whole function's business part remains unchanged.EDIT: is seems our posts have crossed each other.
Comment #16
pounardFixed a typo error.
Also changed the return statement replacing is_null() for $field variable to empty() function call (maybe more secure, people could have passed FALSE or '' in method signature instead of NULL constant.
Comment #17
claudiu.cristea@pounard
We don't need this... Simply "static $fields;" and then test with isset() - like I did in my patch #14.
Using is_null() seems more clear...
Agree...
Hmmm.... This is an extra precaution... I don't think that is necessary... I vote again for is_null()... It's more redable...
The patch is missing a cache refreshing mechanism...
I remain on #14.
Comment #18
pounardI prefer the empty() because the $field is user input. It could be provided by a HTTP request in case it could be just an empty string instead of the NULL constant. This extra precaution is I think really necessary because else where in Drupal core and modules NULL input is pretty much the same as '' or 0.
The NULL init is a coding style, in many languages it's always recommended to initialize all variables, but in PHP as you say there the 'isset' keyword so why not.
EDIT: Plus, I prefer the === operator which is a generic operator that says "Is the same instance as". I think using the is_null() is wrong because as it is not an operator even a keyword, but an API function, it makes it less clear that using a real operator.A real operator is much clear than an API function call which is, a function (which implies a function call btw) not a keyword nor an operator.The patch IS NOT missing the cache refresh, see at end of file.
Re-EDIT: stroked some text which has no point since I'm OK with your static non init and the isset() usage.
Comment #19
pounard@claudiu.cristea
Reading what you said, I did another patch, I reverted the static $fields init / isset() usage stuff.
I let the empty() stuff in the return statement (with the isset, there is no '===' operator usage at all, so it resolves this point).
Comment #20
pounard@claudiu.cristea
What about the:
I think we should keep the '&' to avoid the full array copy.
Comment #21
claudiu.cristea"Cache refresh" is not the same thing as "cache wipe". The final part of the patch is about cache clearing. There are cases when you want to force loading the real value not the chached ones. This is the reason for the additional $reset argument. Setting $reset to TRUE will force rebuilding the CCK mappings and will refresh the cache. You can also use apachesolr_cck_fields(NULL, TRUE) when you want to refresh the static variable in the middle of a request...
I cannot see an immediate usage of the $reset argument but is classical approach to caching in Drupal.
About $fields = &$cache->data; I don't know what to say... It may be a memory saving. Other opinions?
Comment #22
pounardOh sorry, I misread the refresh part.
Indeed there is no urge, but it would be a good thing to do it right now, while we are talking about this.
Comment #23
pounardSame patch with refresh, keeping array recopy instead of using reference.
Comment #24
pounardEven better, using () in there would be better:
Instead of:
Comment #25
claudiu.cristeaWith
or with
it's RTBC... we'll let the branch maintainer to decide this while on the other points we all are on the same solution.
Comment #26
pounardAgree, thanks for your time.
Comment #27
robertdouglass commentedBranch maintainer is paying attention. I'm working on code for cck date facets that touches this and solves similar problems, so thanks for the work... I'll be looking at it closely in that context, soon.
Comment #28
claudiu.cristeaComment #29
claudiu.cristeaThe attached patch was tested and committed against DRUPAL-5--2 in #326750 and #326752.
Switching to 6.x-2.x-dev for porting.
Comment #30
robertdouglass commentedNeeds a re-roll.
Comment #31
jpmckinney commentedI assume this should also go in to 6-1.
6-2 has parts of this due to the fubar in #551582: Show value instead of key in CCK facets
Comment #32
jpmckinney commentedDo we want this? It has only been applied to 5.x so far.
Comment #33
pounardD7 API probably has changed too much to set this issue for 7.x release. I probably think that, if the same problem is spotted on D7 then it should have its own issue. This particular bug was CCK specific when created.
Comment #34
pwolanin commentedWell, I think it's valid to ask whether this can be cached fully built to avoid all the extra field info invocations in 7, and cck in 6.
The main pain point seems to be properly catching the cases that require the cache to be cleared.
Comment #35
pounardMy patch, a year ago for D6 took care of the cache clear. I can't tell you with D7 and Field API, but there hooks, thousands of hooks, millions of hooks in D7, there is one out there that must help you for this! Basically you have to clear your cache at cache clear all time and when fields are being modified.
Comment #36
nick_vhClosing this because of it's ago and non-relevance of the issue towards the D7 version. If one feels that this is still relevant, please open a new issue with a summary of the problem directed towards D7