It would be very useful to have the cached files distributed among subdirectories. On a large site, the number of files in a single directory can overwhelm standard Unix utilities and scripts (e.g. 'rm').

Perhaps something very lightweight would do, like one or two levels of directories keyed on the first and second characters of the filename. This would keep the number of files in any given directory manageable, even for larger sites.
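As a rough illustration of the idea above, here is a minimal sketch of a one- or two-level layout. The function name sharded_cache_path() and its parameters are hypothetical and not part of the module; the sketch assumes file names are md5() hashes, so each level is one of 16 hex digits.

```php
<?php
// Hypothetical helper: build a cache file path with $levels directory
// levels, each named after one leading character of the md5 hash.
function sharded_cache_path($base, $cid, $levels = 2) {
  $hash = md5($cid);
  $path = $base;
  for ($i = 0; $i < $levels; $i++) {
    // Level 0 uses the first hash character, level 1 the second, and so on.
    $path .= '/' . $hash[$i];
  }
  return $path . '/' . $hash;
}
```

With one level the files are spread over 16 directories, with two levels over 256, so each directory holds roughly 1/16 or 1/256 of the files.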

Comments

crunchywelch’s picture

add this to the install file:

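global $file_cache;
// Create one subdirectory per hex digit (a-f, then 0-9), matching the
// first character of the md5() hash used for cache file names.
$ivalfrom = ord("a");
$ivalto = ord("f");
for ($i = $ivalfrom; $i <= $ivalto; $i++) {
  mkdir($file_cache .'/'. chr($i), 0777);
}

for ($i = 0; $i < 10; $i++) {
  mkdir($file_cache .'/'. $i, 0777);
}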

and in cache.fs.inc change this:

function cache_filename($cid) {
  return variable_get('fastpath_fscache_path', FASTPATH_FSCACHE_PATH) .'/'. md5($cid);
}

to this

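function cache_filename($cid) {
  $hash = md5($cid);
  // Use the first hex digit of the hash as the subdirectory name.
  // $hash[0] replaces $hash{0}, which newer PHP versions no longer accept.
  return variable_get('fastpath_fscache_path', FASTPATH_FSCACHE_PATH) .'/'. $hash[0] .'/'. $hash;
}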

I'm working on some load testing for this scheme on a 4.7 site now, but it should definitely help. Drupal's file scan will break when it hits the kernel memory limit for file descriptors, which also prevents you from doing a manual rm * from the command line. This *will* happen on a busy site with lots of content.
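The install step could also be written so that it is safe to run more than once. This is only a sketch: create_cache_shards() is a hypothetical name, and it relies on str_split() and the recursive flag of mkdir(), which need PHP 5 or later.

```php
<?php
// Hypothetical helper: create the 16 hex-digit subdirectories under $base,
// skipping any that already exist. Returns the list of directories created.
function create_cache_shards($base, $mode = 0777) {
  $created = array();
  foreach (str_split('0123456789abcdef') as $digit) {
    $dir = $base . '/' . $digit;
    // The third argument lets mkdir() create $base itself if it is missing.
    if (!is_dir($dir) && mkdir($dir, $mode, TRUE)) {
      $created[] = $dir;
    }
  }
  return $created;
}
```

Calling it a second time returns an empty array instead of raising "File exists" warnings.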

crunchywelch’s picture

Also, we need to ensure the subdirectories are scanned for purging. In the 4.7 implementation I have done this in system.module:

/**
 * Implementation of hook_cron().
 */
function system_cron() {
  global $file_cache, $cache_lifetime;

  // if using file-based caching, perform routine garbage collection
  if ($file_cache && is_dir($file_cache)) {
    // delete all files older than the last call to cache_flush_all()
    if (variable_get('cache_files_expired', 0)) {

      $ivalfrom = ord("a");
      $ivalto = ord("f");
      for($i = $ivalfrom; $i <= $ivalto; $i++) {
        system_file_cache_purge($file_cache .'/'. chr($i));
      }

      for($i = 0; $i<10; $i++) {
        system_file_cache_purge($file_cache .'/'. $i);
      }
    }
    variable_set('cache_files_expired', 0);
  }
}

function system_file_cache_purge($dir) {
  global $cache_lifetime;
  $files = file_scan_directory($dir, '.', array('.', '..', 'CVS'));
  foreach ($files as $file) {
    if (filemtime($file->filename) < (time() - $cache_lifetime)) {
      if ($fp = fopen($file->filename, 'r')) {
        // We need an exclusive lock, but don't block if we can't get it as
        // we can simply try again next time cron is run.
        if (flock($fp, LOCK_EX|LOCK_NB)) {
          unlink($file->filename);
        }
        // Close the handle (this also releases the lock) so that a large
        // purge does not run out of open file descriptors.
        fclose($fp);
      }
    }
  }
}

This scales nicely, and has not caused the kernel-based exhausted memory error in PHP on the busy site where I am testing it.
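For comparison, the per-directory purge can also be sketched without Drupal's file_scan_directory(), reading entries one at a time so the full listing is never held in memory. purge_expired_files() is a hypothetical standalone function, not the code that was committed, and it assumes a Unix-like system where an open file can be unlinked.

```php
<?php
// Hypothetical standalone purge: delete regular files in $dir whose mtime
// is older than $lifetime seconds. Returns the number of files removed.
function purge_expired_files($dir, $lifetime) {
  $removed = 0;
  $cutoff = time() - $lifetime;
  $handle = opendir($dir);
  if ($handle === FALSE) {
    return $removed;
  }
  while (($name = readdir($handle)) !== FALSE) {
    $path = $dir . '/' . $name;
    // Skip '.', '..', subdirectories, and files that are still fresh.
    if (!is_file($path) || filemtime($path) >= $cutoff) {
      continue;
    }
    if ($fp = fopen($path, 'r')) {
      // Non-blocking exclusive lock: a busy file is left for the next run.
      if (flock($fp, LOCK_EX | LOCK_NB) && unlink($path)) {
        $removed++;
      }
      // Always close the handle, which also releases the lock.
      fclose($fp);
    }
  }
  closedir($handle);
  return $removed;
}
```

A cron hook could call this once per hex-digit subdirectory, in the same way the loops above call system_file_cache_purge().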

moshe weitzman’s picture

I committed #1 with modifications. Will look at #2 shortly.

jeremy’s picture

It would seem logical to have the following cache subdirectories: 'filter', 'menu', and 'page', structuring things like the database.

jeremy’s picture

(Which of course you'd already done, when I finally looked at the HEAD branch.)

moshe weitzman’s picture

Status: Active » Fixed

Anonymous’s picture

Status: Fixed » Closed (fixed)