Cache system should delete outputs that are no longer active
Mgccl - January 9, 2008 - 06:57
| Project: | Mathematics Filter |
| Version: | 5.x-1.1 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
When I change the TeX, a new image is generated, but the old one is still present in the system. Is there some way to get rid of the old images?
Maybe a setting such as: during each cron run, delete all TeX images that have not been shown for more than x days.

#1
I could probably delete them, but then I need to keep track of each TeX file for every single node in the database, which I think is a significant performance hit. Otherwise you have a 1-2 KB file cluttering up your files directory. The cache just checks to see if that particular file has been made already; it doesn't 'remember' which files have been made and which ones have not.
Can you give me a compelling reason why you want those files deleted?
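To illustrate why old images pile up, here is a minimal sketch of a cache that only checks whether a file already exists. It is written in Python rather than the module's PHP, and the names `cached_image_path` and `get_or_render`, the MD5 hash, and the `.gif` naming are all assumptions for illustration, not the module's actual code.

```python
import hashlib
from pathlib import Path


def cached_image_path(cache_dir, tex):
    """Map a TeX string to a file named after its hash (MD5 is an assumption)."""
    digest = hashlib.md5(tex.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.gif"


def get_or_render(cache_dir, tex, render):
    """Return the cached image for `tex`, calling `render` only on a cache miss.

    `render` is a hypothetical callable that turns TeX into image bytes.
    """
    path = cached_image_path(cache_dir, tex)
    if not path.exists():
        # Cache miss: render once and store under the hash-derived name.
        path.write_bytes(render(tex))
    return path
```

Because the file name depends only on the TeX, editing an equation produces a new file and nothing ever points back at the old one, which is exactly the leftover this issue is about.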
#2
For a site with a lot of equations, the old files:
1. Take up space.
2. Slow things down, since too many files in the same directory are slow on some file systems (from what I've heard).
I usually write the TeX inside Drupal, so I have to preview a few times to get it right. If I generate 1000 equations, I end up with over 4000 useless images.
Deleting the equations that haven't been accessed for 30 days is a good idea (there is a last-access date in *nix file systems, right? fileatime() should work). That way there is no need to keep track of each TeX input. Of course, there is a possibility of deleting images that are still in use (what are the odds of a file not being accessed for 30 days? 1 in 10,000?), but regenerating them doesn't take a huge amount of CPU, and most cached files should be visited at least once in a while on a site with reasonable traffic.
If fileatime() does not work because last-access timestamps are disabled, then this feature can be turned off.
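The access-time cleanup described above could look roughly like the sketch below. It is Python standing in for the PHP a real patch would use (`st_atime` plays the role of fileatime()); the function name `purge_stale_images`, the `*.gif` pattern, and the flat directory layout are assumptions.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def purge_stale_images(cache_dir, max_age_days=30, now=None):
    """Delete .gif files not accessed within max_age_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in Path(cache_dir).glob("*.gif"):
        # st_atime is the last-access time; it is not trustworthy on file
        # systems mounted with access-time updates disabled (e.g. noatime).
        if path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A cron job would call this once per run. An image that is deleted while still in use simply gets regenerated on the next view.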
#3
Well I am willing to add this as a feature for the next version, if you are willing to provide a patch. I'm a bit too busy to work on this myself atm.
Dave
#4
Sorry, it occurred to me that another alternative is to store the cached files in subdirectories of the mathfilter directory, named by node. Then if a node is deleted, I can delete all of the GIFs associated with that node at once. However, this would lead to some minor inefficiencies, as you may have two or more nodes with the same equation, resulting in duplicate files. This is probably a good start on the problem. You would end up with far fewer files in each directory, but a lot more subdirectories.
Is this worth doing?
#5
Putting them in different folders is certainly better, but one folder per node might be too much. It's easier to take the first character of the hash and put the files in folders named after it, so each folder holds roughly 1/16 of the usual amount. If that is still a problem, use the first 2 characters for 1/256 of the original amount.
Writing a deletion feature sounds tough... maybe I can work on that in the summer.
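The hash-prefix layout from this comment can be sketched as follows, in Python for illustration. The function name `sharded_path` and the use of a hexadecimal MD5 digest are assumptions, though a hex digest is what makes the 1/16 and 1/256 figures work out (16 possible first characters, 256 possible two-character prefixes).

```python
import hashlib
from pathlib import Path


def sharded_path(cache_dir, tex, prefix_len=1):
    """Place the image in a subfolder named after the first hash characters."""
    digest = hashlib.md5(tex.encode("utf-8")).hexdigest()
    # prefix_len=1 gives up to 16 folders, prefix_len=2 up to 256.
    return Path(cache_dir) / digest[:prefix_len] / f"{digest}.gif"
```

Identical equations on different nodes still share one file under this layout, which is the duplication that a folder per node would introduce.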
#6
Another idea: periodically (at cron run), remove cache files that have not been accessed for a certain amount of time. It would be easy to implement, and the code could be reused later for any other module that caches to files. Mmm... that deserves a new module of its own.
#7
I think a folder per node is the best option right now, since if a node is deleted, I can remove the folder using hook_nodeapi, which means that at least this module will do a little bit of clean-up.
Also, maybe I need to look into using the revision API a bit more: if you revise the page, maybe the old cached images should be removed? The easiest way to do this is to clear the cache for a node on each revision.
Heck, I could even clear the cache for a node on each edit. The cache isn't exactly hard to rebuild; it would be rebuilt as soon as you viewed the node.
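A rough sketch of the folder-per-node idea, in Python for illustration only: the helper names `node_cache_dir` and `clear_node_cache` are hypothetical, and in the module the clearing step would presumably be triggered from hook_nodeapi when a node is deleted or edited.

```python
import shutil
from pathlib import Path


def node_cache_dir(cache_root, node_id):
    """Directory holding every cached image that belongs to one node."""
    return Path(cache_root) / str(node_id)


def clear_node_cache(cache_root, node_id):
    """Remove all cached images for a node; return True if anything was removed."""
    directory = node_cache_dir(cache_root, node_id)
    if directory.is_dir():
        # The whole folder goes at once; images are re-rendered on next view.
        shutil.rmtree(directory)
        return True
    return False
```

Calling `clear_node_cache` on every edit trades a little re-rendering for never leaving stale images behind for that node.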
Does anyone have any opinions on this?