I'm currently using 'apachesolr' module as an API module for complex business stuff. I need to call a lot of times the apachesolr_cck_fields().

Until now this functions is totally right, there is a static internal cache. But I think it could be even faster using cache_set() and cache_get() to store this result.

The fact is when my users so complex queries, this function will call the hook_apachesolr_cck_fields_alter() on all modules each time I need to get fields information. Deeper, this means it will be called at each request by my custom module. It's possible to avoid such intensive algorithm. Plus, it could also speed up the 'apachesolr_search' module which build faceted blocks at each page that displays it.

I'd prefer this complex array computed only once, even if it means the module has to wipe out this cache each time a new CCK field is created/altered or a new module enabled or an old one disable.

I attached an imcomplete patch, the only thing missing is an automatic cache wipeout on module enable/disable or cck field alteration.

Comments

robertdouglass’s picture

Status: Needs review » Needs work

Caching is good. Please add the cache wipe if you can.

pounard’s picture

I'll do, as soon as I can free some time to do it:)
The main problem here is I was not able to find any CCK hooks to tell modules that CCK fields have been modified. The last solution to do it is implementing hook_form_alter() on CCK field edit forms and add a custom submit callback. I think this is ugly, but it should work.

BTW, all the CCK fields handling could be in a submodule, it would make the main apachesolr API cleaner, what do you think?

robertdouglass’s picture

The submodule is a good idea in the respect that it forces the main api to have the right flexibility. However, the proliferation of 1,000 small modules is not particularly nice, so maybe an .inc file that gets conditionally included is another option. Reworking the whole CCK integration is a good idea, though.

pounard’s picture

That's a good point, I agree with the fact a submodule would be too much. Is it possible to make the CCK features working using main API hooks instead of embedded code? It would be a nice mixed up solution to get a lighter API - I mean it would avoid specific code in generic API and it'll centralize the CCK feature code to some hooks implementation. Using an include file to put these functions is a good idea.

pounard’s picture

Status: Needs work » Needs review
StatusFileSize
new1.01 KB

Found the right hook, see attached patch.

drewish’s picture

What's the rationale for using = &?:

+  if ($cached = cache_get('apachesolr_cck_fields', 'cache_apachesolr')) {
+    $fields = &$cached->data;
+  }

Seems like $fields = $cached->data would suffice.

pounard’s picture

I don't remember the nature of $cached, but if it's not an object, PHP would copy the variable instead of using the reference.

EDIT: and because my variable's goal only for convenience, I prefer to minimize the impact on performances.

RE-EDIT: and $data->cached is a big array with string as content, if we don't use the &, it will cause PHP to recopy the full array, this copy will be useless because there is no need to protect the $data->cached from changes.

claudiu.cristea’s picture

Title: Simple caching using cache_get() to speed up some pieces of code » Simple caching CCK fields
Version: 6.x-2.x-dev » 5.x-2.x-dev
StatusFileSize
new2.86 KB

Here's a patch for DRUPAL-5--2. It really speeds up the site according to some brief testings.

Unfortunately I had to workaround cache wiping. No hook_content_fieldapi in D5 :-(

The patch adds also an optional argument to the main function that allows returning a single CCK field.

claudiu.cristea’s picture

Removed the prefix "apachesolr_" from CID. While ApacheSolr has its own cache table this is obsolete - true also for other CIDs (apachesolr:luke:*, apachesolr:stats:*, etc...)

claudiu.cristea’s picture

The patch from #5 may be slower compared with code without cache because it doesn't take advantage from microcaching (static caching). You should include the cache_get() code snippet inside if (is_null($fields)) { statement together with and "else" or a "return". Something like...

  if (is_null($fields)) {
    if ($cached = cache_get('apachesolr_cck_fields', 'cache_apachesolr')) {
      $fields = &$cached->data;
    }
    else {
    .......
claudiu.cristea’s picture

Title: Simple caching CCK fields » Cache CCK fields and small API improvement
Version: 5.x-2.x-dev » 6.x-2.x-dev
StatusFileSize
new2.08 KB

Here's a patch against DRUPAL-6--2.

I've added also that API improvement so that custom modules can obtain a single CCK field mapping without loading all the list.

claudiu.cristea’s picture

Small fix... added also return is_null($field) ? $fields : $fields[$field]; to the first return.

pounard’s picture

I think you should have only one 'return' statement within the function, it's I think more readable.

claudiu.cristea’s picture

I didn't want to create a bigger patch... A single return mean introducing an "else" clause...

Anyway... here's the patch with only one "return" and with a standard cache resetting mechanism.

pounard’s picture

Attached a patch:

  • Added some '\n' for readability (more air in the code)
  • Added static variable explicit initialization to 'NULL' constant (sometimes, PHP does affect random memory values to non initialized variables, I already had this bug)
  • Replaced is_null() function call with '===' operator, which is exactly the same thing, without a function call (I don't really know if it makes a difference for PHP)
  • Added a 'else' statement instead of an early return (non mandatory) to have only one return statement in the full method (more readable and if the signature changes in the future, there is only one place to change it), also there is no performance cost at all here

Notice: the fact I added an else statement changes the function's body indentation, so the patch is kinda heavy, but the whole function's business part remains unchanged.

EDIT: is seems our posts have crossed each other.

pounard’s picture

Fixed a typo error.

Also changed the return statement replacing is_null() for $field variable to empty() function call (maybe more secure, people could have passed FALSE or '' in method signature instead of NULL constant.

claudiu.cristea’s picture

@pounard

Added static variable explicit initialization to 'NULL' constant (sometimes, PHP does affect random memory values to non initialized variables, I already had this bug)

We don't need this... Simply "static $fields;" and then test with isset() - like I did in my patch #14.

Replaced is_null() function call with '===' operator, which is exactly the same thing, without a function call (I don't really know if it makes a difference for PHP)

Using is_null() seems more clear...

Added a 'else' statement instead of an early return (non mandatory) to have only one return statement in the full method (more readable and if the signature changes in the future, there is only one place to change it), also there is no performance cost at all here

Agree...

Also changed the return statement replacing is_null() for $field variable to empty() function call (maybe more secure, people could have passed FALSE or '' in method signature instead of NULL constant.
Attachment

Hmmm.... This is an extra precaution... I don't think that is necessary... I vote again for is_null()... It's more redable...

The patch is missing a cache refreshing mechanism...

I remain on #14.

pounard’s picture

I prefer the empty() because the $field is user input. It could be provided by a HTTP request in case it could be just an empty string instead of the NULL constant. This extra precaution is I think really necessary because else where in Drupal core and modules NULL input is pretty much the same as '' or 0.

The NULL init is a coding style, in many languages it's always recommended to initialize all variables, but in PHP as you say there the 'isset' keyword so why not.

EDIT: Plus, I prefer the === operator which is a generic operator that says "Is the same instance as". I think using the is_null() is wrong because as it is not an operator even a keyword, but an API function, it makes it less clear that using a real operator.

A real operator is much clear than an API function call which is, a function (which implies a function call btw) not a keyword nor an operator.

The patch IS NOT missing the cache refresh, see at end of file.

Re-EDIT: stroked some text which has no point since I'm OK with your static non init and the isset() usage.

pounard’s picture

@claudiu.cristea
Reading what you said, I did another patch, I reverted the static $fields init / isset() usage stuff.
I let the empty() stuff in the return statement (with the isset, there is no '===' operator usage at all, so it resolves this point).

pounard’s picture

@claudiu.cristea
What about the:

    if ($cache = cache_get('cck_fields', 'cache_apachesolr')) {
      $fields = &$cache->data;
    }

I think we should keep the '&' to avoid the full array copy.

claudiu.cristea’s picture

The patch IS NOT missing the cache refresh, see at end of file.

"Cache refresh" is not the same thing as "cache wipe". The final part of the patch is about cache clearing. There are cases when you want to force loading the real value not the chached ones. This is the reason for the additional $reset argument. Setting $reset to TRUE will force rebuilding the CCK mappings and will refresh the cache. You can also use apachesolr_cck_fields(NULL, TRUE) when you want to refresh the static variable in the middle of a request...

I cannot see an immediate usage of the $reset argument but is classical approach to caching in Drupal.

About $fields = &$cache->data; I don't know what to say... It may be a memory saving. Other opinions?

pounard’s picture

Oh sorry, I misread the refresh part.
Indeed there is no urge, but it would be a good thing to do it right now, while we are talking about this.

pounard’s picture

Same patch with refresh, keeping array recopy instead of using reference.

pounard’s picture

Even better, using () in there would be better:

if (! $reset && ($cache = cache_get('cck_fields', 'cache_apachesolr'))) {

Instead of:

if (! $reset && $cache = cache_get('cck_fields', 'cache_apachesolr')) {
claudiu.cristea’s picture

Status: Needs review » Reviewed & tested by the community

With

  return empty($field) ? $fields : $fields[$field];

or with

  return is_null($field) ? $fields : $fields[$field];

it's RTBC... we'll let the branch maintainer to decide this while on the other points we all are on the same solution.

pounard’s picture

Agree, thanks for your time.

robertdouglass’s picture

Branch maintainer is paying attention. I'm working on code for cck date facets that touches this and solves similar problems, so thanks for the work... I'll be looking at it closely in that context, soon.

claudiu.cristea’s picture

Version: 6.x-2.x-dev » 5.x-2.x-dev
claudiu.cristea’s picture

Version: 5.x-2.x-dev » 6.x-2.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)
StatusFileSize
new9.87 KB

The attached patch was tested and committed against DRUPAL-5--2 in #326750 and #326752.

Switching to 6.x-2.x-dev for porting.

robertdouglass’s picture

Status: Patch (to be ported) » Needs work

Needs a re-roll.

jpmckinney’s picture

Status: Needs work » Patch (to be ported)

I assume this should also go in to 6-1.

6-2 has parts of this due to the fubar in #551582: Show value instead of key in CCK facets

jpmckinney’s picture

Title: Cache CCK fields and small API improvement » Cache fields info (and small API improvement)
Version: 6.x-2.x-dev » 7.x-1.x-dev

Do we want this? It has only been applied to 5.x so far.

pounard’s picture

D7 API probably has changed too much to set this issue for 7.x release. I probably think that, if the same problem is spotted on D7 then it should have its own issue. This particular bug was CCK specific when created.

pwolanin’s picture

Well, I think it's valid to ask whether this can be cached fully built to avoid all the extra field info invocations in 7, and cck in 6.

The main pain point seems to be properly catching the cases that require the cache to be cleared.

pounard’s picture

My patch, a year ago for D6 took care of the cache clear. I can't tell you with D7 and Field API, but there hooks, thousands of hooks, millions of hooks in D7, there is one out there that must help you for this! Basically you have to clear your cache at cache clear all time and when fields are being modified.

nick_vh’s picture

Status: Patch (to be ported) » Closed (won't fix)

Closing this because of it's ago and non-relevance of the issue towards the D7 version. If one feels that this is still relevant, please open a new issue with a summary of the problem directed towards D7