Is there an automated way to erase unused images from the server? I mean:
1-) Someone creates a node with an image attached to it;
2-) That person or the admin deletes the node;
3-) The image is still stored on the server, unused;
4-) The admin has to erase the unused images manually from the server in order to have sufficient space to run the site properly (only 300MB, let's say).

It would be great if a module or something could search for those images and erase them automatically (when the node is deleted, let's say). Maybe a query could scan all nodes on the site, get the addresses of all used images and delete all the others.

If there's a module or code that already does this, please let me know.

Is anyone else interested in this functionality? Has anyone else had this problem?

Comments

stevenc’s picture

There are two main challenges with your problem.

1) In real life, there isn't necessarily a one-to-one relationship between nodes and images. i.e. multiple nodes could include the same image. So, you would need to know when ALL nodes are no longer using an image, and that can be very tricky.

2) There are several different modules for handling image uploads. Detecting unused images would depend on how they were created.

Which Image upload module(s) are you using? Knowing that is a first step to solving your problem.

---------------------------------
Steven Wright

Slalom

Artese’s picture

A-)
I'm using these modules to upload images:
1-) imagefield
2-) image

B-)
Maybe a way to solve your first point:

A module or code, triggered manually or periodically, that takes each image's address and scans all the nodes to check whether any of them uses it. If no node uses it, the image is deleted. As soon as the code finds the first node that uses the image, it skips to the next image.
I would be very happy if something like this exists or is doable by module developers (who are awesome, by the way). I don't mind if I have to hit some button in the admin area that says "erase unused images". As long as I don't need to check image by image manually, I'm happy!
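
The scan described above can be sketched in a few lines. This is only an illustration in standalone Python, not Drupal code: the function name `find_unused_images` and the sample data are made up, and it assumes an image counts as "used" only when its path appears literally in a node body, so it would miss images referenced any other way.

```python
def find_unused_images(image_paths, node_bodies):
    """Return the image paths that no node body mentions."""
    unused = []
    for path in image_paths:
        # any() stops at the first node that contains the path,
        # then the loop moves on to the next image.
        if not any(path in body for body in node_bodies):
            unused.append(path)
    return unused


# Hypothetical sample data: two stored images, two node bodies.
images = ["files/images/cat.jpg", "files/images/old-banner.png"]
bodies = [
    '<p>Intro</p><img src="/files/images/cat.jpg" />',
    "<p>A node without images.</p>",
]
print(find_unused_images(images, bodies))
```

A real cleanup would list the candidates first and delete only after a manual check or a backup.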

WorldFallz’s picture

The image and imagefield modules handle images in different ways, but of course what you describe can be done. I don't know of any module that already does this though.

stevenc’s picture

There are several other challenges to trying to automate something like this.

1) Scanning all nodes can be very time consuming. The "Body" field of a node can store 4GB of data - not that you're likely to encounter such nodes, but the point is that they can be quite large.

Now, take the number and size of the nodes that you want to scan, and multiply it by the number of images you are searching for. Since the point of your exercise is to find unused images, your code would have no starting point as to what images ARE used. Therefore, if your site has 100 images and you're trying to find the ones that aren't used, the code would have to look for all 100 of them on EVERY node.

That could be a lot of processing.

2) Images can be in other fields. If you use CCK, you may have fields other than the "body" which can contain images. So, take item #1 and now multiply it by the number of image-capable fields in your content types.

3) If your module is creating thumbnails or other derivatives, you'd need to check for these too. Again, multiply 1 and 2 by this and you're adding more required processing.

4) Modules embed images into nodes in different ways. For example, image assist can use either a token syntax like "[img_assist|nid=40|title=Drupal Association|desc=|link=none|align=left|width=100|height=100]" or regular HTML syntax (an <img> tag).

None of this makes a solution impossible; it's just fairly complicated. These layers of complexity are likely why there isn't a module to do this already: it would be inordinately difficult to code something effective (and lightweight) enough for general adoption.

I did find a documentation note about finding orphaned images using a View. It may be helpful if you match the situation that it describes:
http://drupal.org/node/588362

Finally, I thought of one other possible technique that could help. If you have access to your web server logs, they should contain every URL called to the server, which should indicate calls to specific image files. If you were to reset the log and spider your ENTIRE site, in theory the log would show you all of the images that were used. This would give you a white list of images to which you can compare a list of all of your stored images, and therefore deduce the unused ones. This idea is a bit of a hack, because not all content would necessarily be called during a spidering of your site (depending on user roles, conditional blocks and Views, AJAX techniques, etc.) so you'd have to be very careful using it.

In other words, the technique could be best used as a starting point to identify images which are DEFINITELY in use, which narrows the list of possible unused images.
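
A rough sketch of that log comparison, in standalone Python. It assumes the web server writes Apache/Nginx "combined"-style access log lines; the function name, the sample log lines and the stored paths are all hypothetical. Anything left in `candidates` is only a possibly unused file, for the reasons given above.

```python
import re
from urllib.parse import urlsplit

# Matches the request and status part of a combined-format log line,
# e.g. "GET /path HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def images_seen_in_log(log_lines):
    """Collect image paths that were served successfully according to the log."""
    seen = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        target, status = match.groups()
        path = urlsplit(target).path  # drop any ?query string
        if status in ("200", "304") and path.lower().endswith(IMAGE_EXTENSIONS):
            seen.add(path)
    return seen


# Hypothetical log excerpt and list of stored images.
log = [
    '192.0.2.1 - - [01/Jan/2012:00:00:01 +0000] "GET /sites/default/files/images/cat.jpg?x=1 HTTP/1.1" 200 5120',
    '192.0.2.1 - - [01/Jan/2012:00:00:02 +0000] "GET /node/12 HTTP/1.1" 200 9000',
    '192.0.2.1 - - [01/Jan/2012:00:00:03 +0000] "GET /sites/default/files/images/gone.png HTTP/1.1" 404 312',
]
stored = {
    "/sites/default/files/images/cat.jpg",
    "/sites/default/files/images/dog.jpg",
    "/sites/default/files/images/gone.png",
}
in_use = images_seen_in_log(log)
candidates = sorted(stored - in_use)
print(candidates)
```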

Artese’s picture

OK, it's a difficult task and way beyond my programming skills, but is it doable? Is anyone willing to take on the challenge (or already working on something like it)?

Am I the only one with this problem?

varisvitols’s picture

It could be done easier this way...

1. search for all the files and their paths under the Files base directory, and store the results in the DB
2. search all possible nodes or other records in the database (Upload, Image, FileField, ImageField references, if they make any) for references to files with the chosen extensions (i.e. .jpg, .jpeg, .png, .doc, etc...), along with their paths. Store the results in the DB again.
3. take each record in the files table we just created, and compare it against every record in the references table we created.
4. And then there could be a simple interface with a checkbox for every unreferenced file, to delete the selected ones. (A backup of all files would be recommended before this operation, in case something goes wrong.)

This doesn't even have to be a Drupal module, it could be a standalone program, maybe...

Do you guys think this would be possible? Does anyone have the skill and knowledge to do this? I think it might work, but I don't have what it takes to get this done yet.

If anyone succeeds, please let us know...
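
As a starting point, steps 1 and 3 above could look roughly like this as a standalone Python program. It is a sketch under assumptions: `list_files` and `unreferenced` are made-up names, the demo runs against a throwaway temporary directory, and the `references` list stands in for step 2, since how to collect references from the database depends on which upload modules are installed.

```python
import os
import tempfile

EXTENSIONS = (".jpg", ".jpeg", ".png", ".doc")


def list_files(base_dir):
    """Step 1: every matching file under base_dir, as paths relative to it."""
    found = set()
    for root, _dirs, names in os.walk(base_dir):
        for name in names:
            if name.lower().endswith(EXTENSIONS):
                full = os.path.join(root, name)
                found.add(os.path.relpath(full, base_dir).replace(os.sep, "/"))
    return found


def unreferenced(files_on_disk, referenced_paths):
    """Step 3: files on disk that no database record refers to."""
    return sorted(files_on_disk - set(referenced_paths))


# Demo with a throwaway directory; `references` stands in for step 2,
# i.e. whatever paths were collected from the database.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "images"))
    for name in ("images/used.jpg", "images/orphan.png", "notes.txt"):
        open(os.path.join(base, name), "w").close()
    references = ["images/used.jpg"]
    report = unreferenced(list_files(base), references)
    print(report)
```

Comparing two sets avoids checking every file against every reference one by one. The report only lists files; deleting them is left as a separate, reviewed step (step 4).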

Dharanic’s picture

Can anyone tell me how to set up an event calendar?

stefan81’s picture

That would be a fantastic module! My server is filling up with orphaned images.

stevenc’s picture

I came across this module today. It's limited to comparing the {files} table, but it may be a good start to help.

http://drupal.org/project/auditfiles

Note that I haven't tried it out yet, so I'm curious what your results are.

varisvitols’s picture

This only works for files uploaded with the Upload module.
For ImageField we need something else...

Anonymous’s picture

You can find orphaned files by running the following MySQL query:

SELECT fm.*
FROM file_managed AS fm
LEFT OUTER JOIN file_usage AS fu ON (fm.fid = fu.fid)
LEFT OUTER JOIN node AS n ON (fu.id = n.nid) 
WHERE fu.type = 'node' AND n.nid IS NULL

This returns all files whose file_usage entry points to a node that no longer exists. I'm not sure if it's safe to delete the returned rows and files; it probably also depends on your module setup. Use at your own risk!

(Any word from more experienced Drupal developers on this?)

daviddr’s picture

The query only works for files that have at some point really been used somewhere. I had a browser crash after uploading a bunch of files to a node that wasn't saved yet, resulting in all those files being in the file system and in the file_managed table, but not in the file_usage table. I modified the query to filter out these files as well (fu.type is NULL in this case):

SELECT fm.*
FROM file_managed AS fm
LEFT OUTER JOIN file_usage AS fu ON ( fm.fid = fu.fid )
LEFT OUTER JOIN node AS n ON ( fu.id = n.nid )
WHERE (fu.type = 'node' OR fu.type IS NULL) AND n.nid IS NULL

According to this comment it's safe to set the status to zero.

If there are no references to the file in the file_usage table, and you don't need the file any more, it's perfectly safe to simply delete the line from the file_managed table. Alternatively you could set the status for that file to zero in the table, and let it be deleted automatically on Drupal's cron run. The advantage of doing it that way is that the physical file is also deleted automatically so you don't have to clean up yourself.

I had to wait about 6 hours for cron to act, which matches the default DRUPAL_MAXIMUM_TEMP_FILE_AGE (21600 seconds), as discussed here. The files themselves and the entries in the file_managed table were indeed gone then.
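
To see the difference between the two queries without touching a live site, they can be run against a toy database. The sketch below uses Python's built-in sqlite3 module with heavily simplified tables (the real Drupal 7 tables have more columns) and four made-up files: one in use, one whose node was deleted, one that was uploaded but never saved, and one used by a non-node entity.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, uri TEXT);
    CREATE TABLE file_usage (fid INTEGER, type TEXT, id INTEGER);
    CREATE TABLE node (nid INTEGER PRIMARY KEY);
    INSERT INTO node VALUES (10);
    INSERT INTO file_managed VALUES
        (1, 'public://in-use.jpg'),
        (2, 'public://node-deleted.jpg'),
        (3, 'public://never-saved.jpg'),
        (4, 'public://user-picture.jpg');
    INSERT INTO file_usage VALUES
        (1, 'node', 10),
        (2, 'node', 20),
        (4, 'user', 5);
""")

# Same joins as in the thread; only the type filter differs.
QUERY = """
    SELECT fm.fid
    FROM file_managed AS fm
    LEFT OUTER JOIN file_usage AS fu ON (fm.fid = fu.fid)
    LEFT OUTER JOIN node AS n ON (fu.id = n.nid)
    WHERE {type_filter} AND n.nid IS NULL
    ORDER BY fm.fid
"""

first = [row[0] for row in db.execute(QUERY.format(type_filter="fu.type = 'node'"))]
second = [row[0] for row in db.execute(
    QUERY.format(type_filter="(fu.type = 'node' OR fu.type IS NULL)"))]
print(first)   # original query
print(second)  # modified query
```

On this sample data the original query finds only file 2, while the modified one also finds file 3. One caveat follows from the joins themselves: a file is returned as soon as any one of its node usage rows points to a missing node, even if another row points to a node that still exists, so the result is a list of candidates to review.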

hwasem’s picture

This would be a fantastic module to have! I'm pretty new and don't have much expertise yet, but would be happy to test anything in a development environment. I'm in D6 and use IMCE and IMCE Image for image uploads.

ergophobe’s picture

This certainly isn't the comprehensive script/module that folks are looking for, but you can ID images that are in the file_managed table but are not assigned to a node using dro0x's solution from Feb 9, 2012.

This is for the other end of the problem - finding files on the file system that are missing from the file_managed table.

This is really rough and rudimentary and all it does is print the info to the screen. It's just to get a quick look and then manually fix the problem. In my case, it IDed files that are actually present on the server and in the table, but that had an extra slash in the table between category and postname as in:
public://images/category//postname/photo.jpg

(I'm using the filefield_paths module).

So it helps a bit with the audit, but only works in a default situation and does only the most rudimentary checking.


<?php

define('PROTOCOL', 'public://'); // this will usually be public://
define('IMAGE_DIR', '/home/yosemite/public_html/sites/default/files/images');
define('IMAGE_DIR_PREFIX', 'sites/default/files/'); // this may take some tweaking
define('DB_HOST', '127.0.0.1');
define('DB_NAME',  'db_name');
define('DB_USER', 'db_user');
define('DB_PASS', 'password');

$image_files = find_file_paths(IMAGE_DIR);
sort($image_files);

$missing = check_image_usage($image_files);
$num_missing = count($missing);
$num_files = count($image_files);

preprint($missing, "MISSING - missing $num_missing out of $num_files files");
preprint ($image_files, "ALL IMAGE FILES");


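function check_image_usage($image_files)
{
	// The mysql_* functions were removed in PHP 7, so this uses mysqli.
	$link = mysqli_connect(DB_HOST, DB_USER, DB_PASS, DB_NAME);
	if (!$link)
	{
		die('Could not connect: ' . mysqli_connect_error());
	}
	// Exact match on uri. With LIKE, an underscore in a file name would act
	// as a wildcard; a prepared statement also avoids quoting problems.
	$stmt = mysqli_prepare($link, 'SELECT COUNT(*) FROM `file_managed` WHERE `uri` = ?');
	if (!$stmt)
	{
		die('Invalid query: ' . mysqli_error($link));
	}
	$missing = array();
	foreach ($image_files as $image)
	{
		mysqli_stmt_bind_param($stmt, 's', $image);
		if (!mysqli_stmt_execute($stmt))
		{
			die('Invalid query: ' . mysqli_stmt_error($stmt));
		}
		mysqli_stmt_bind_result($stmt, $count);
		mysqli_stmt_fetch($stmt);
		mysqli_stmt_free_result($stmt);
		// A count of zero means the file exists on disk but not in file_managed.
		if (!$count)
		{
			$missing[] = $image;
		}
	}
	mysqli_stmt_close($stmt);
	mysqli_close($link);
	return $missing;
}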


function find_file_paths($dir, $files=NULL)
{
    $dir_contents = scandir($dir);
	$cwd = getcwd();
	if(!is_array($files))
	{
	  $files = array();
	}

	foreach($dir_contents as $key => $item)
	{
	  $full_path = realpath($dir . '/' . $item);
	  if ($item =='.' || $item == '..')
	  {
//	    print " is current or parent dir";
	  }
	  elseif(is_dir($full_path))
	  {
		$files = find_file_paths($full_path, $files);
	  }
	  elseif(is_file($full_path))
	  {
	    $rel_path = str_replace($cwd, '', $full_path);
		$rel_path = ltrim(str_replace('\\', '/', $rel_path), '/');
		$rel_path = str_replace(IMAGE_DIR_PREFIX, '', $rel_path);
	    $files[] = PROTOCOL . $rel_path;		
	  }
	}
	return $files;
}

function preprint($var, $title='')
{
	print "<h2>$title</h2><pre>";
	print_r($var);
	print "</pre><p> ====>>DONE $title</p>"; 
	
}
TipiT’s picture

This won't work for D6 because there is no file_managed table.

On the other hand, this script seems like the right way to start looking for orphaned files.

WillGFP’s picture

Here's the code I use to delete all unused files in D7:

// db_query to find all managed files with no entry in file_usage:
$result = db_query("SELECT fid FROM file_managed WHERE NOT EXISTS (SELECT * FROM file_usage WHERE file_managed.fid = file_usage.fid) ");

// Delete the file and its database entry.
foreach ($result as $record) {
  $file = file_load($record->fid);
  // file_load() returns FALSE when the file record cannot be loaded.
  if ($file) {
    file_delete($file);
  }
}
mauriziopinotti’s picture

I would fix the above query like this to make it work with db prefixed tables:

SELECT fid FROM {file_managed} m WHERE NOT EXISTS (SELECT * FROM {file_usage} u WHERE m.fid = u.fid)

---------------

Mobimentum ||| let your apps run, be in Mobimentum |||

Shyghar’s picture

Thank you for your code ^^

drupalninja99’s picture

Why wouldn't you do a LEFT JOIN and then add a WHERE clause 'WHERE file_usage.fid IS NULL'? Seems easier to me.
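
The two forms can be compared on a toy database. This sketch uses Python's built-in sqlite3 module with simplified, made-up tables, and assumes the NULL check is applied to the file_usage side of the join, since that is the side that is empty when a file has no usage rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE file_managed (fid INTEGER PRIMARY KEY);
    CREATE TABLE file_usage (fid INTEGER, type TEXT, id INTEGER);
    INSERT INTO file_managed VALUES (1), (2), (3);
    INSERT INTO file_usage VALUES (1, 'node', 10), (1, 'node', 11), (3, 'user', 5);
""")

# Form used earlier in the thread.
not_exists = db.execute("""
    SELECT m.fid FROM file_managed m
    WHERE NOT EXISTS (SELECT 1 FROM file_usage u WHERE m.fid = u.fid)
    ORDER BY m.fid
""").fetchall()

# LEFT JOIN form: unmatched files get NULLs on the file_usage side.
left_join = db.execute("""
    SELECT m.fid FROM file_managed m
    LEFT JOIN file_usage u ON m.fid = u.fid
    WHERE u.fid IS NULL
    ORDER BY m.fid
""").fetchall()

print(not_exists, left_join)
```

Both queries return only file 2 here, so on this data the choice is a matter of readability.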

jobjol’s picture

This code handles the case where nodes were deleted but the corresponding file/image field entries were not.
Change $force to TRUE if you also want to delete files that are still flagged as "in use" even though the corresponding node no longer exists.

<?php
$force = FALSE;
// db_query to find files whose file_usage entry points to a node that no longer exists:
$result = db_query("SELECT DISTINCT fu.fid FROM {file_usage} fu WHERE fu.type = 'node' AND NOT EXISTS (SELECT 1 FROM {node} n WHERE fu.id = n.nid)");

// Delete the file and its database entry.
foreach ($result as $record) {
  $file = file_load($record->fid);
  if ($file) {
    file_delete($file, $force);
  }
}
?>
aloknarwaria’s picture

Please follow the URL below to delete the unused files:
https://www.drupal.org/sandbox/osopolar/2470465

knalstaaf’s picture

I'm on a D7.34 installation, and when I delete nodes, the related images are gone from their proper files folder as well (e.g. files/news-images), even without running cron. The same goes for the styles folders: the images are removed from there as well.

osopolar’s picture

I created a sandbox module called File Delete for this, which provides the drush command drush file-delete-unused --all to delete all files reported in file_managed which aren't in file_usage. It deletes the file itself and the database-entry.

labboy0276’s picture

I made this for a university project a while back. After reading this, I put it up on DO for the world to have. It is similar to your project, osopolar:

https://www.drupal.org/project/fancy_file_delete

John Ouellet
Sales Engineering Manager
https://thinktandem.io/

osopolar’s picture

I looked at your module. It seems that it deletes files that are still registered in the file_usage table but whose nodes no longer exist, right? Normally, if the node gets deleted, the entry should also be removed from the file_usage table; if not, I guess something is not working well on node deletion. My module only checks whether there are files in file_managed that are not used anymore (not in file_usage).

labboy0276’s picture

Correct. We use this on a couple different clients. Most clients will have a few thousand remnant files when we run this deletion. It usually appears when you have some sort of node version control, workbench, revision, etc etc.

I did try your code, but the number of files it wanted to delete seemed way too high, so I didn't go through with it. On one site we had 25k+ files, 19k in file usage, but your code wanted to delete 14k rows. That didn't seem right to me.

Valter Bengtsson’s picture

Hi, I would like to use the options for deleting Unmanaged and/or Orphaned files with Fancy File Delete, but I can't figure out how to use it. After installing the module I have one tab for INFO and another for MANUAL. How do I use the options for Unmanaged and Orphaned?

drilauri’s picture

I had the same problem until I manually flushed my caches. Of course now I have a problem with insufficient permissions, but looks like it's related to VBO.

aloknarwaria’s picture

Please refer to the link below to delete unused files in Drupal 7 using a drush command.
https://www.drupal.org/sandbox/osopolar/2470465

GiorgosK’s picture

This module looks promising
https://www.drupal.org/project/fancy_file_delete
I am about to try it

------
GiorgosK
Web Development

Geus’s picture

Hi GiorgosK, did you manage to make it work?

I tried to use it and I get a lot of unmanaged files, but when I try to delete them, after the batch is complete, it says that nothing was deleted.

dr.admin’s picture

I wrote a post about unused files on Drupal sites. This simple script lets you find all unused images on a site.

Perhaps it will be useful to someone.

--
Drupal sites and server support - https://drupal-admin.com

mohsinkhanmca’s picture

$result = db_query("SELECT fm.*
FROM {file_managed} AS fm
LEFT OUTER JOIN {file_usage} AS fu ON ( fm.fid = fu.fid )
LEFT OUTER JOIN {node} AS n ON ( fu.id = n.nid )
WHERE (fu.type = 'node' OR fu.type IS NULL) AND n.nid IS NULL
ORDER BY `fm`.`fid`  DESC");

//Delete file & database entry
foreach ($result as $delta => $record) {
     file_delete($record->fid);
}
seb-ksl’s picture

Thank you so much :-)

Nick Hope’s picture

Novice question: Where/how does one run this code? I'm running Drupal 8 with XAMPP.

weweblog’s picture

I don't know how to modify and run this code either. Did you find out how later on?

mmjvb’s picture

You might use devel => Execute php code.

Or you can put it in your own module. As long as it executes in a drupal environment because it uses drupal code.

weweblog’s picture

It works.

dakruchko’s picture

How does it work? file_delete accepts a file object, not a file ID.

mmjvb’s picture

./web/core/includes/file.inc:function file_delete($fid) {

./web/core/includes/file.inc:function file_delete_multiple(array $fids) {

It is deprecated in D8.7 however, to be removed.

dakruchko’s picture

Thanks, I see now. I just did not notice that this is D8 :)

adechiaro’s picture

I have orphaned files on a site that I'd like to remove and this module seems like it would work, but it requires the FID of the files. I don't know where to get this from, and I don't have access to the database. Is that the only way that this could work? Are there other options? Thanks!

adechiaro’s picture

Never mind! I didn't see the views that were created.