create separate directories for each cache bins to overcome number of files in a single directory and increase performance.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

W32’s picture

this is a good proposition.

vinoth.3v’s picture

Status: Active » Needs work

adding these lines to filecache.inc works great to me.


  function __construct($bin) {
	$this->bin = $bin;
	$filecache_directory = filecache_directory();
	$t = get_t();

	$this->directory = $filecache_directory .'/'. $bin ;

	if (!file_exists($this->directory)) {
		if (!mkdir($this->directory, 644, TRUE)) {
			// handle error here...
		}
	}
	$this->prefix = $this->directory . '/';
	//.....
lismail’s picture

It works with me too. However, in the long run I think better to have it done on settings.php, just like cacherouter or memcache module do.
I will try and share it.

W32’s picture

This proposition is not compatible with filecache_cron() function.
Does anybody know where DrupalFileCache::delete_expired() is called ?

W32’s picture

Dear ogi,
can you realize this proposition ? Because, if your module installed with entitycache module than filecache folder is very huge! Subdividing it on sub folders by bins is good solution.

ogi’s picture

Assigned: Unassigned » ogi

Ability to have separate directory for any cache bin is definately needed. I didn't consider as important so I postponed it but it looks like beta2 have to include it, and it will.

roderik’s picture

Status: Needs work » Needs review
FileSize
7.34 KB
7.15 KB

I dived into the code to see how to do this best, and ended up introducing some helper functions, and getting rid of the 'prefix' property & all 'chdir' commands. This works for me.
(I guess it's functionally equivalent to what is proposed above, but I needed to dive into the code for some other stuff, which is coming up later.)

But while looking at this, I also discovered a little bug. The clear() command will currently also delete cache entries from other bins, since everything is in the same directory. This patch fixes the bug.
---

There's a separate patch introducing only whitespace changes, to make it adhere to coding standards. I don't want to be pedantic, but I was afraid that otherwise my patch would not be readable because of too many lines.

roderik’s picture

FileSize
15.53 KB
7.12 KB

....and I thought that this still would not be enough. Because e.g. the directory for the cache_field bin will still be huge.
Actually, that should be a multi-level directory with ':' as the separator.

But.... that doesn't work well for other cache bins, which have relatively few entries. It's not useful to split those into a multi-level directory.

So... I hacked different 'encoding methods' into the encode_cid() function. Plus a set_encoding() method, which defaults to setting varying encoding methods for different bins.

See 1211166-7-3.patch.
1211166-7-ALL.patch is patches 1/2/3 rolled into one, in case someone just wants to update their sources to the latest work I did.

roderik’s picture

FileSize
20.98 KB
12.76 KB

Here's an extension of what I posted in #8.

This adds more comments, and more 'encoding methods' - which are used by default for cache_entity_* bins. (The site which I'll install this on, has a lot of comments/nodes, and I'm afraid of storing them all in one flat directory.)

(FYI: I provided a default for 'cache_entity_comment' which seems sane to me for most sites, but I want something else on my site - and successfully tested the hook_init() override which is documented in the patch's comments.)

Status:

  • If you trust my code and just want to test, apply 1211166-9-ALL.patch only. I'll be using it on a live site from tomorrow.
  • If you want to review code and/or apply separate commits to this module's repo, then
    • take the two patches in #7 first. They are code cleanup/bugfix, and take each cache bin's files into its own separate directory, which is what this issue is originally about. 1211166-7-2 also changes some characters in the stored filename (no more ':') - and is ready to be committed IMHO.
    • After this, you can apply 1211166-9-3 which implements different 'encoding schemes', a.k.a. ways to change a cache bin from a flat directory in a tree, in different ways. IMHO this isa a necessity for e.g. entity cache on a large site.
  • While all is ready for review (and use on my production site), I just realized there's one thing missing from 1211166-9-3. After a wildcard clear(), files are deleted but empty directories aren't. Not a biggie, but that should be implemented. I don't have time for it at this moment, though.
SocialNicheGuru’s picture

Error when apply patch

git apply 1211166-9-ALL.patch
1211166-9-ALL.patch:48: space before tab in indent.
$hint = $t('%dir is a directory but PHP cannot write to it.', array('%dir' => $dir));
1211166-9-ALL.patch:313: space before tab in indent.
return FALSE;
1211166-9-ALL.patch:346: space before tab in indent.
($cache_flush && $cache->created < $cache_flush))) {
1211166-9-ALL.patch:399: space before tab in indent.
$this->delete_expired();
1211166-9-ALL.patch:406: space before tab in indent.
$this->delete_expired();
error: patch failed: filecache.inc:12
error: filecache.inc: patch does not apply

roderik’s picture

Status: Needs review » Needs work

Ah, more whitespace errors. ogi uses tabs in his file and sets his tab width to 8, which does not work well for my editor (or Drupal Coding Standards). I thought I squashed all tabs in 1211166-7-1.patch, but didn't.

I'll take care of that later. For now, use --ignore-whitespace flag for git apply.

Also:
I discovered filecache_cron() is 'broken': since cache bins are now in their own directories, and it checks the 'main' dir, files never expire. You can still apply the patch if you don't mind that. I'll try to get to this, soon.

SocialNicheGuru’s picture

still doesn't apply

if I use git apply I get the same error as in #9

patch -p0 --ignore-whitespace < 1*ALL*patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/filecache.inc b/filecache.inc
|index 39d1c9d..3757724 100644
|--- a/filecache.inc
|+++ b/filecache.inc
--------------------------
File to patch: ./filecache.inc
patching file ./filecache.inc
Hunk #1 FAILED at 12.
1 out of 8 hunks FAILED -- saving rejects to file ./filecache.inc.rej

ogi’s picture

roderik, thank you for all your efforts :-) Separate directories are in the TODO list since the beginning.

I'm thinking about optionally adding a new directory level like /var/spool/postfix queues.

roderik’s picture

Swamped now.
I'll have some new patches for review soon. Planning to take the "separate styles of directory structures" (a.k.a. patches with a -3 behind them) into a separate issue, because it's gotten larger than I thought.

Danny Englander’s picture

Title: create ceparate directories for each cache bins » Create separate directories for each cache bins

I tested the "all" patch in #9 now and indeed File Cache now creates separate directories but I am still plagued by this error which I am sure is probably something different. Note - fixing typo in issue title.

ogi’s picture

I just want to point out that splitting into directories will decrease performance instead of increasing it. All modern file systems implement directory hashing and it's faster to lookup in one giant directory than in many smaller directories. Of course, for humans it's still more convenient to have this split so it will be part of filecache.

roderik’s picture

Status: Needs work » Needs review
FileSize
11.79 KB
13.32 KB

New patches.

Since this has gotten bigger, I will split out the '-3' portion into a separate issue. Remark #16 can be discussed there. This issue then will be purely what the title said: one different directory per cache bin. This is definitely necessary, because right now the different bins are (potentially) overwriting and deleting each other's files!

This patch has grown larger since #9... in hook_cron(), expired files were just deleted from the one large 'global' cache directory. Since this is not the case anymore, hook_cron() should be able to know which cache directories are actually being used. And Drupal Core doesn't implement a 'registry' of cache bins.
I could have gone for "assume all directories within the 'global parent' are cache bin directories", but it seemed better to implement a kind of 'registry' in the FileCache module.

The deletion code used by hook_cron() has been moved into DrupalFileCache::delete_expired(). Note that followup patches will remove more code duplication (by removing delete_flushed()) - but this was not part of the functionality for this issue.

Rehash:

* patch 1 makes the code adhere to Drupal coding standards because otherwise patch 2 would get unreadable. But I also added something else: watchdog logging for delete_expired() and delete_wildcard(). This should be useful for testing stuff later on. delete_flushed() is unfortunately called way too often to add the watchdog call to.

* patch 2 is the actual patch that fixes the issue. It also has some other general things: checks for empty $bin on __construct() and for empty $cid on get() / set(). I encountered these several times while testing.

roderik’s picture

FileSize
11.85 KB
13.54 KB

...and some minor code reshuffling that would otherwise needlessly end up in #1488870. I'm getting really tired of working with 4 interdependent patches simultaneously...

roderik’s picture

The '-3' patch from #9 has been moved to #1488976: Possibility to have directory trees.

ogi’s picture

PHP files are untabified in d3a4614.

SocialNicheGuru’s picture

Status: Needs review » Closed (fixed)
vinoth.3v’s picture

Are you sure?
last commit on 2012-07-25 ?

vinoth.3v’s picture

Status: Closed (fixed) » Needs work
SocialNicheGuru’s picture

Check the link. It's in the commit log

vinoth.3v’s picture

Issue number is there, Not the patch!

4kant’s picture

Issue summary: View changes

I´m having 75 GB of data in the cache directory.
Since this issue is pretty old: has there been any progress - am I missing something? Perhaps any configuration steps?

Thanks!

cockers’s picture

Will this feature be committed?

  • ogi committed 2af5f7b on 7.x-1.x
    Issue #1211166: Always use separate directory for each cache bins....
ogi’s picture

Status: Needs work » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.