Scan: Cache results

sun - May 18, 2009 - 16:41
Project:Libraries API
Version:7.x-1.x-dev
Component:Code
Category:task
Priority:normal
Assigned:Unassigned
Status:active
Issue tags:Libraries
Description

#320562-46: Libraries API - we want to cache the results of available libraries.

But when is the cache invalidated?

#1

markus_petrux - May 18, 2009 - 17:25

hmm... let's say modules that need a library need to implement hook_libraries_info(), maybe something like this:

<?php
function wysiwyg_libraries_info() {
  return array(
    'fckeditor',
    'tinymce',
  );
}
?>

Now Libraries API knows which libraries it will be asked to find at one time or another. For example, Wysiwyg API, when setting up the node edit form (or any other form), could ask for the path to 'tinymce' like this:

<?php
$path = libraries_get_path('tinymce');
?>

Here, Libraries API would first check the cached repository. If no match is found, a lookup against the file system is triggered (this lookup would search for the library in sites/all, sites/domain, etc.). If that scan fails, libraries_get_path() returns FALSE. If that scan succeeds, it caches the result and returns the path.

If a match is found in the libraries cache, it could try a simple file_exists() to make sure the library is still there. If not, then the cached information would be removed, and FALSE would be returned. If a match is found in the cache and confirmed by file_exists(), then the path would be returned.

With this method, every invocation of libraries_get_path() would cost at minimum one request to the file system (file_exists()), and at maximum a scan of sites/all/thing, sites/domain/thing.

libraries_get_path() could also have a second argument so that the caller can bypass the file_exists() confirmation. Or the other way around: by default, the file_exists() confirmation is not performed when a match is found in the cache, and the file system lookup can only be forced by the calling module, on the admin side or similar.
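To make the flow above concrete, here is a minimal standalone sketch. It is not Libraries API code: the function names, the in-memory $cache array (standing in for a persistent cache), the explicit $search_dirs list, and the $verify flag are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch only; names and signatures are assumptions.

// Scan the given directories in order and return the first match, or FALSE.
function example_libraries_scan($name, array $search_dirs) {
  foreach ($search_dirs as $dir) {
    $candidate = $dir . '/' . $name;
    if (is_dir($candidate)) {
      return $candidate;
    }
  }
  return FALSE;
}

// Cache first, optional file_exists() confirmation, file system scan last.
function example_libraries_get_path($name, array &$cache, array $search_dirs, $verify = TRUE) {
  if (isset($cache[$name])) {
    if (!$verify || file_exists($cache[$name])) {
      return $cache[$name];
    }
    // The cached entry is stale: remove it and report failure.
    unset($cache[$name]);
    return FALSE;
  }
  $path = example_libraries_scan($name, $search_dirs);
  if ($path !== FALSE) {
    // Only successful scans are cached.
    $cache[$name] = $path;
  }
  return $path;
}
```

Passing FALSE as the last argument would give the cheaper variant, where a cached match is trusted without touching the file system.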

Finally, this needs some kind of garbage collector to clear cached entries that will never be requested again and that are no longer present on the file system. Maybe Libraries API could detect when a module is disabled and, if it was implementing hook_libraries_info(), try to clean up the related libraries. Or maybe use a cron task that runs once a day or so.
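A rough sketch of what such a garbage collector might look like, whether run from cron or when a module is disabled. The function name and the plain arrays are hypothetical; $declared stands for the library names still returned by the hook_libraries_info() implementations of enabled modules.

```php
<?php
// Hypothetical sketch only; not part of any existing API.

// Drop cache entries that no enabled module declares any more and whose
// path no longer exists on the file system. Returns the cleaned cache.
function example_libraries_gc(array $cache, array $declared) {
  foreach ($cache as $name => $path) {
    if (!in_array($name, $declared, TRUE) && !file_exists($path)) {
      unset($cache[$name]);
    }
  }
  return $cache;
}
```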

Not sure if I'm missing something. :-|

#2

sun - May 19, 2009 - 01:12

- We probably want to cache the currently available libraries permanently until an administration page is visited. In Wysiwyg API's case, that would be admin/settings/wysiwyg/*. I'd do something like drupal_flush_libraries() [ideal example] there.

- We want to prepare this cache, so a minimum of additional processing is done at (regular) runtime.

- We probably want to allow partial cache flushes, i.e. drupal_flush_libraries('wysiwyg'), which invokes wysiwyg_libraries_info(), removes only those entries from the cache, and cache_set()'s back the remaining result (via hook_exit() ?).

- libraries_get_library('foo') gets the current cache of all libraries; if no cache exists, it uses libraries_get_path() to scan for that library, determines its version, performs dependency checks, executes an optional 'load callback' (or similar; to allow modules like Wysiwyg to attach further properties) and stores the result back in the cache.

- libraries_get_path() probably does not have to be cached, because it's a simple file system operation, and combined with a cached registry of libraries, it's probably invoked only occasionally. (That patch for Wysiwyg API removed all instances but those in editor library definitions.)

- Debatable: Whether to cache this registry in one BLOB, or cache each library separately.

- Debatable: Whether library info should be an object of class DrupalLibrary.
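The partial flush idea from the list above could be sketched roughly as follows. This is only an illustration: example_flush_libraries() and the examplemod module are hypothetical, the registry is a plain array rather than a real cache bin, and the hook is assumed to return a flat list of library names as in #1.

```php
<?php
// Hypothetical sketch only; a real version would read and write a cache.

// A made-up module's hook_libraries_info() implementation.
function examplemod_libraries_info() {
  return array('fckeditor', 'tinymce');
}

// Flush everything when no module is given; otherwise remove only the
// entries that the given module declares and keep the rest.
function example_flush_libraries(array $registry, $module = NULL) {
  if ($module === NULL) {
    return array();
  }
  $hook = $module . '_libraries_info';
  if (!function_exists($hook)) {
    return $registry;
  }
  return array_diff_key($registry, array_flip($hook()));
}
```

The remaining entries would then be written back in one go, which fits the single-BLOB variant; caching each library separately would turn this into one delete per declared name instead.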

Relevant code from wysiwyg.module:

<?php
  $editors = wysiwyg_load_includes('editors', 'editor');
  foreach ($editors as $editor => $properties) {
    // Fill in required properties.
    $editors[$editor] += array(
      'title' => '',
      'vendor url' => '',
      'download url' => '',
      'editor path' => wysiwyg_get_path($editors[$editor]['name']),
      'library path' => wysiwyg_get_path($editors[$editor]['name']),
      'libraries' => array(),
      'version callback' => NULL,
      'themes callback' => NULL,
      'settings callback' => NULL,
      'plugin callback' => NULL,
      'plugin settings callback' => NULL,
      'versions' => array(),
      'js path' => $editors[$editor]['path'] . '/js',
      'css path' => $editors[$editor]['path'] . '/css',
    );
    // Check whether library is present.
    if (!($editors[$editor]['installed'] = file_exists($editors[$editor]['library path']))) {
      continue;
    }
    // Detect library version.
    if (function_exists($editors[$editor]['version callback'])) {
      $editors[$editor]['installed version'] = $editors[$editor]['version callback']($editors[$editor]);
    }
    if (empty($editors[$editor]['installed version'])) {
      $editors[$editor]['error'] = t('The version of %editor could not be detected.', array('%editor' => $properties['title']));
      $editors[$editor]['installed'] = FALSE;
      continue;
    }
    // Determine to which supported version the installed version maps.
    ksort($editors[$editor]['versions']);
    $version = 0;
    foreach ($editors[$editor]['versions'] as $supported_version => $version_properties) {
      if (version_compare($editors[$editor]['installed version'], $supported_version, '>=')) {
        $version = $supported_version;
      }
    }
    if (!$version) {
      $editors[$editor]['error'] = t('The installed version %version of %editor is not supported.', array('%version' => $editors[$editor]['installed version'], '%editor' => $editors[$editor]['title']));
      $editors[$editor]['installed'] = FALSE;
      continue;
    }
    // Apply library version specific definitions and overrides.
    $editors[$editor] = array_merge($editors[$editor], $editors[$editor]['versions'][$version]);
    unset($editors[$editor]['versions']);
  }
  return $editors;
?>
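The version-mapping step in that excerpt could be isolated roughly like this. The function name is hypothetical, and one detail deliberately differs from the excerpt: the sketch sorts with uksort() and version_compare() instead of ksort(), on the assumption that keys such as '2.9' and '2.10' need to be ordered as version strings rather than as plain array keys.

```php
<?php
// Hypothetical sketch only; not the wysiwyg.module code.

// Return the highest supported version that is less than or equal to the
// installed version, or FALSE if the installed version is older than all
// supported versions.
function example_map_version($installed, array $supported) {
  // Sort the keys as version strings, lowest first.
  uksort($supported, function ($a, $b) {
    return version_compare((string) $a, (string) $b);
  });
  $match = FALSE;
  foreach ($supported as $supported_version => $properties) {
    if (version_compare($installed, (string) $supported_version, '>=')) {
      $match = (string) $supported_version;
    }
  }
  return $match;
}
```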

#3

markus_petrux - May 19, 2009 - 07:58

hmm... it seems to me that this approach complicates things too much.

The common problem that needs to be solved, I think, is that modules need to know the location of certain libraries. Users need to know where they can install libraries required by certain modules in a directory that's independent from the module location.

For example, module_a needs library_a:

1) User can install library_a at sites/all/libraries/library_a, sites/domain.1/libraries/library_a, sites/domain.2/libraries/library_a, and so on...

That's easy to explain, easy to manage for the user. And it's similar in concept on how module/theme locations work.

2) All module_a needs to do is call libraries_get_path('library_a'), and it will get the location of the library_a that the user wants to have enabled for the current site. I.e., if running on domain.1, it will get "sites/domain.1/libraries/library_a"; if running on domain.2, it will get "sites/domain.2/libraries/library_a"; if running on domain.3, it will get "sites/all/libraries/library_a".

3) Libraries don't need .info files. Libraries cannot be nested in subdirectories. Only at "sites/(all|domain)/libraries/library_a".
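The lookup order described in points 1) to 3) can be sketched in a few lines. This is an illustration under stated assumptions, not an existing function: example_library_path() is a made-up name, and the Drupal root and the current site directory are passed in explicitly instead of being detected.

```php
<?php
// Hypothetical sketch only; the site-specific directory wins over sites/all.

function example_library_path($root, $site, $name) {
  foreach (array("sites/$site", 'sites/all') as $base) {
    $path = "$base/libraries/$name";
    if (is_dir("$root/$path")) {
      return $path;
    }
  }
  // Not installed in any of the allowed locations.
  return FALSE;
}
```

With library_a installed under sites/domain.1 and sites/all only, a request on domain.1 would resolve to the site-specific copy, while domain.3 would fall back to sites/all.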

Isn't it simple and nice?

If that's not enough, then I think further information (and methods) attached to each library will vary too much from one library to another, so that could be implemented by each separate module, independently.

If time and experience reveal more common requirements related to library management, those could be added at any time. But right now it seems quite complex to accomplish, because we don't know what other libraries/modules need.

#4

sun - July 5, 2009 - 14:12
Version:<none>» 7.x-1.x-dev

Tagging.

#5

markus_petrux - August 12, 2009 - 00:06

For the record: here's a good reason to keep 3rd party libraries out of the modules directories: #546584: allow to exclude folders from drupal_system_listing.

 
 
