Problem/Motivation

Core's file fields have very hard-coded assumptions that if a file is no longer used in a file reference field, then it should be deleted. This is because there's no way to have any kind of media "library" like http://drupal.org/project/media. Once we shift into a library sort of UX then auto-deleting files becomes unacceptable since it's perfectly valid for those files to sit un-used and then down the road re-used without having to re-upload the file. We are dealing with so many bug reports and issues in the Media issue queue due to this "feature" in core. It would be nice if the Media module could force this behavior to be optional.

Also this behavior depends on files being tracked with usage anywhere they can, while the file usage is only meant for entities. We are unable to add usage for media embedded into WYSIWYG for one big reason: filters lack context about the entity that has the text field being changed/viewed.

Proposed resolution

We propose to make file cleanup optional by introducing a configuration variable and UI configuration interface that:
1. Makes people aware of cleanup being a feature of Drupal core
2. Allows them to enable/disable it

Remaining tasks

- create variable
- create UI configuration

User interface changes

- adds new form element under admin/config/media/file-system

API changes

- no API changes

CommentFileSizeAuthor
#316 cleanup-files-1399846-316.patch18.38 KBzeyus
#315 cleanup-files-1399846-315.patch18.24 KBjyraya
#310 cleanup-files-1399846-306.patch17.48 KBPol
#306 cleanup-files-1399846-306.patch18.22 KBPol
#291 cleanup-files-1399846-291.patch18.39 KBpfrenssen
#286 cleanup-files-1399846-286.patch18.2 KBcathsens
#285 cleanup-files-1399846-283.patch18.2 KBcathsens
#282 cleanup-files-1399846-282.patch18.39 KBcathsens
#280 interdiff.txt930 bytespfrenssen
#280 1399846-280.patch18.39 KBpfrenssen
#278 interdiff.txt853 bytespfrenssen
#278 1399846-278.patch18.39 KBpfrenssen
#275 interdiff.txt3.11 KBpfrenssen
#275 1399846-275.patch17.55 KBpfrenssen
#263 cleanup-1399846-263.patch14.61 KBkmoll
#263 interdiff.txt986 byteskmoll
#259 cleanup-1399846-258.patch14.3 KBkmoll
#259 interdiff.txt2.19 KBkmoll
#257 cleanup-1399846-257.patch12.11 KBkmoll
#257 interdiff.txt846 byteskmoll
#256 interdiff.txt621 bytesgarphy
#256 cleanup-1399846-256.patch11.28 KBgarphy
#248 cleanup-1399846_248.patch11.28 KBDevin Carlson
#241 cleanup-1399846_241.patch15.61 KBsaltednut
#241 interdiff.txt2.57 KBsaltednut
#240 cleanup-1399846_240.patch15.64 KBsaltednut
#240 interdiff.txt1.31 KBsaltednut
#236 cleanup-1399846_236.patch15.69 KBmartin107
#233 1399846_233.patch15.7 KBvisabhishek
#233 interdiff-1399846-231-233.txt922 bytesvisabhishek
#231 1399846_231.patch15.68 KBvisabhishek
#227 1399846_227-interdiff.txt3.19 KBBerdir
#227 1399846_227.patch17.94 KBBerdir
#225 1399846_225-interdiff.txt2.91 KBBerdir
#225 1399846_225.patch17.86 KBBerdir
#223 1399846_223.patch17.15 KBBerdir
#213 1399846_213.patch17.84 KBSteffenR
#204 1399846_204.patch18.08 KBDavid_Rothstein
#204 interdiff-203-204.txt1.1 KBDavid_Rothstein
#203 1399846_203.patch18.08 KBcweagans
#203 interdiff.txt1.81 KBcweagans
#188 1399846_188.patch17.4 KBslashrsm
#188 interdiff.txt1 KBslashrsm
#173 1399846_172.patch17.41 KBslashrsm
#173 interdiff.txt5.39 KBslashrsm
#168 1399846_168.patch17.16 KBslashrsm
#168 interdiff.txt726 bytesslashrsm
#159 1399846_159.patch17.16 KBslashrsm
#159 interdiff.txt1.79 KBslashrsm
#157 1399846_157.patch17.26 KBslashrsm
#157 interdiff.txt4.03 KBslashrsm
#152 1399846_151.patch17.5 KBslashrsm
#152 interdiff.txt6.9 KBslashrsm
#137 1399846_137.patch16.62 KBslashrsm
#137 interdiff.txt4.07 KBslashrsm
#132 1399846_132.patch16.31 KBslashrsm
#132 interdiff.txt5.17 KBslashrsm
#129 1399846_129.patch13.06 KBslashrsm
#129 interdiff.txt5.85 KBslashrsm
#122 1399846_d8_temporary_files_age_fixed.png12.78 KBtsvenson
#121 1399846_121.patch9.86 KBslashrsm
#121 interdiff.txt787 bytesslashrsm
#120 1399846_d8_temporary_files_age.png6.76 KBtsvenson
#119 1399846_119.patch9.86 KBslashrsm
#119 interdiff.txt1.46 KBslashrsm
#117 1399846-temp-file-delete-window-116.patch9.74 KBGábor Hojtsy
#117 interdiff.txt4.63 KBGábor Hojtsy
#115 1399846-temp-file-delete-window-115.patch4.47 KBGábor Hojtsy
#113 interdiff.txt2.61 KBGábor Hojtsy
#113 1399846-temp-file-delete-window-113.patch4.57 KBGábor Hojtsy
#110 interdiff.txt2.87 KBGábor Hojtsy
#110 1399846-temp-file-delete-window-110.patch4.57 KBGábor Hojtsy
#106 1399846-temp-file-delete-window-106.patch4.47 KBGábor Hojtsy
#87 1399846-temp-file-delete-window-87.patch4.48 KBBerdir
#87 1399846-temp-file-delete-window-87-interdiff.txt4.2 KBBerdir
#70 1399846-temp-file-delete-window.patch6.19 KBDave Reid
#45 LOLwalrus - Mah Bucket.jpg79.99 KBDamienMcKenna
#38 interdiff.txt843 bytesslashrsm
#38 make_file_cleanup_optional_1399846_38.patch2.55 KBslashrsm
#18 interdiff.txt404 bytesslashrsm
#18 make_file_cleanup_optional_1399846_18.patch2.48 KBslashrsm
#15 make_file_cleanup_optional_1399846_15.patch2.48 KBslashrsm
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

tsvenson’s picture

Priority: Normal » Critical

From a usability point I grade this as critical since normal users have very little control over this unless they are extremely careful in what they do.

Worst case scenario is this:

  1. Someone has used the multi-import feature of images provided by Media/File Entity and added x files to the library.
  2. Someone else then browse the library and selects one of the images for a node they are working on.
  3. However, they quickly change their mind and removes it before even saving the node.
  4. Bang, the image is then removed both from the node and the file system!

Also +1 for backporting this to Drupal 7.

Dave Reid’s picture

Priority: Critical » Major

Let's use major then. This doesn't cause Drupal to WSOD or not install, which is what critical is reserved for in the core issue queue.

sun’s picture

Status: Active » Closed (works as designed)

You're dealing with those bugs, because Media module allows to upload/import (managed) files without registering their usage in any way.

The file usage API was deliberately added to D7. It is required to figure out when a managed file is obsolete.

You will not be able to reproduce this bug with core functionality.

This means that Media module wants Drupal to manage files without letting it know how or where the managed file is used. Thus, Media module should either not use managed files for its purpose, or it should properly register their usage. (For the latter, see #1268116: WYSIWYG does not record file usage and some more issues in the Media queue.)

EDIT: If you treat "files like nodes", then you assume that there's a defined entity in the system. If the file itself is the entity, then the entity itself declares a file usage.

Dave Reid’s picture

Status: Closed (works as designed) » Active

Hell no, this is not working as designed because core treats files differently from any other entity types, which is wrong. Please allow us to provide a patch and discuss.

sun’s picture

Category: task » feature

This definitely works as designed. The issue you're experiencing in Media module's #1268116: WYSIWYG does not record file usage is a unfortunate effect of telling the system to manage a file without also telling it that you're using it.

We can surely discuss to change the garbage collection of File API for managed files, but the essential question that needs to be answered is how you're going to perform garbage collection in a system that doesn't know whether a file is used in any way or not.

If the site administrator needs to decide, then we need a full blown user interface for (manual) garbage collection of orphan files.

But, again, even if we'd do that, then that is module behavior, not API-level behavior. Not every possible managed file in Drupal needs or is supposed to be managed as if it was part of a file library that never forgets.

rickvug’s picture

I think I'm on the same page as sun is here. I wrote down some ideas on the way forward here: http://drupal.org/node/1239056#comment-5491500.

tsvenson’s picture

My biggest reason for that this needs to be changed is that it leaves the site owner, administrators, content editors and users without any kind of choice and control over this. We have already seen far to many reports from users being hit with this problem filing bug reports and so on.

I have nothing against Drupal having cleanup routines, but in this case its execution is without control by anyone but Drupal itself.

This needs to be optional and per default turned off so the site administrators can make the choice of how it works!

drzraf’s picture

It's a shame it hasn't been fixed. It's a major bug : you can't manage files without having Drupal messing around with them, don't you care about user data ?

Just a sample of what happens now:
A content-type has 2 fields: 1) field_file ; 2) field_image (notice the order)
$node n°460 field_image's is still empty, field_file has 1 (or more) element(s).

$node = node_load(460);
// switch the first file to the image field
array_push($node->field_image[LANGUAGE_NONE],
           array_shift($node->field_file[$node->language] ) );
node_save($node);

Believe it or not : you'll never fill field_image because the file has been removed from the filesystem, {file_managed} and {file_usage} before file_field_update() get called about field_image.

I once saw the proposal of a cron/script to clear orphaned files. It's probably the way to go (if it's not black magic) : Drupal must not delete files without an explicit decision of the users.

[edit] workaround to hijack this buggy behavior which remove user files (can be used as a cron or a hook):
UPDATE {file_usage} SET count = count + 100 where count < 100;

tsvenson’s picture

@drzraf: A better workaround is to use the http://drupal.org/project/file_lock module. Then you can have it to own every file and thus never risk them getting to 0.

drzraf’s picture

someone else got beaten too. Hopefully this will be finally fixed, ref: #1141912: Changing node language by moving down language list deletes files in file field

Dave Reid’s picture

Title: Make the file deletion in file_field_delete_file() optional » Make unused file 'cleanup' optional

The problem for Drupal 8 now seems to reside in FileUsageBase::delete(), there at least doesn't seem to be anything in file.field.inc that specifically deletes a file when it's removed, it only removes usage.

  /**
   * Implements Drupal\file\FileUsage\FileUsageInterface::delete().
   */
  public function delete(File $file, $module, $type = NULL, $id = NULL, $count = 1) {
    // If there are no more remaining usages of this file, mark it as temporary,
    // which result in a delete through system_cron().
    $usage = file_usage()->listUsage($file);
    if (empty($usage)) {
      $file->status = 0;
      $file->save();
    }
  }
Dave Reid’s picture

I had a really great idea on how to solve this for D7, by adding a setting to the file field itself for 'delete_unused' that defaults to TRUE, so you could disable this behavior in the UI for your individual fields. This would be a fully-backwards compatible solution that wouldn't break existing site installs, but the downside is that the solution would not apply to Drupal 8 since the logic has moved to the file usage service.

Dave Reid’s picture

Possibility for D8: Adding a config option checkbox on admin/config/media/file-system for auto-removing files

Dave Reid’s picture

I guess another option would be to add a parameter to FileUsageInterface::delete() for not setting files back to temporary might be a good option. This way we could pass in the setting from D8 file's usages of file_usage()->delete().

Thoughts? Options?

slashrsm’s picture

Category: feature » task
Status: Active » Needs review
FileSize
2.48 KB

I agree that we have an unnecessary assumption here. Nobody says that we do not need file usage tracking. Of course we need to know if some file looses all references pointing to it. Completely deleting it from the system without letting site's user/owner know it is a different thing.

Attached patch still leaves clean-up functionality in core but makes it optional using a configuration variable.

I do not think this is a feature request. It just resolves huge UX issue that a lot of people experience by allowing them to decide on whether to preform automatic clean-ups or not. We had a long discussion about this at today's code sprint in Portland and everybody agreed with that.

aspilicious’s picture

I think it's bets to set it to 1 and modules like media can change it it 0. :)
That way the current behavior doesn't change.

hass’s picture

Status: Needs review » Needs work

Default should be cleanup = 1

As long as I do not have any software like ICME or media installed that allow a user to select a file for attachment it makes no sense to keep files that cannot found nor reused in any way. I do not like to get my disk filled with garbage by default. Most files are never reused anywhere. I also teach my users to delete unused files to free diskspace. We cannot keep all garbage for all times.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
2.48 KB
404 bytes

Make sense. Here we go.

tsvenson’s picture

No no no, this option should not be enabled as default!

Only yesterday at DrupalCampl Gothenburg I mentioned this problem in a discussion with two random Drupal users. Neither of them where aware of Drupal automatically deleting files. After I had given them a description of when this is happened, both quickly realized they have had client sites this has happened to and that had not only caused them, and their clients, a lot of pain neither had been able to track down the reason for it.

Both are very skilled Drupal developers an both agree 100% that it is a totally crazy default behaviour.

Thus far no one has been able to give me even a hit of a good use case/argument to why Drupal shall delete files automatically.

This must be turned off by default!

From my understanding of update hooks, it should be easy to add one that preserves the current behaviour when updating existing sites. Thus only new installations will have this turned off by default.

+++ b/core/modules/system/lib/Drupal/system/Form/FileSystemForm.php
@@ -68,6 +68,13 @@ public function buildForm(array $form, array &$form_state) {
+      '#description' => t('Unused files will be deleted from the system if checked.'),

This text is way to vague. Suggest something like this instead:
"Warning: When enabled Drupal will automatically, and permanently, delete files from the system when their use count reaches zero"

Would it be possible to also add a warning triangle for it?

Particularly for the backport to Drupal 7 there introduction of this option will need to be addressed in the release notes.

hass’s picture

@tsvenson: Why are do you like to keep files on a disk if you cannot reuse them? This is a no-brainer without Media or ICME. Managed files that I have uploaded via node forms must be deleted by default and NOT kept on disk. Nobody can access nor reuse them after the node where these files are attached has been deleted. It only wastes server disk (stale files).

tsvenson’s picture

@hass: Files are content and Drupal is not qualified to make decisions about what content a site owner or user want to keep or not. If you want this behaviour then you can simply click the checkbox and turn it on. But having this behaviour out-of-the box is just insane!

Particularly since a very small minority of Drupal users actually are aware of how this works. When they become aware of it the accident has already happened and in most cases they at first will have absolutely no idea why their files have disappeared.

So, please explain to me why it is a good thing to have such a bad default behaviour!

plach’s picture

IMHO both @hass and @tstevenson have a point here: in stock Drupal there is no way for a file to get more than one usage per node, hence if you remove it from the node or delete the node itself there is no way to reference it again, unless Media is subsequently installed. However this last "unless" makes me think that probably the correct default should be keeping files. AAMOF if one decides to install Media after deleting some nodes she can recover the orphaned files, while if we default to remove them this would be impossible. By defaulting to the "keep" behavior, more resource-caring site builders can make an aware choice and switch to "delete", but this won't affect unaware users.

hass’s picture

I'm missing an answer to my main question. Why do you like to keep files on a disk if you cannot reuse them?

I have zero plans to install media module nor icme or any other. I believe you cannot manage a large site with 5.000.000 files with media module or icme. I guess you only talk about extreme small sites with only a very few files (<100-1000) Every file I'm uploading manually is also located on my local machine and can be uploaded again; if required. I do not like to see millions of stale files on my server disk. If I delete a node I expect it is really deleted including all useless attachments. *Not* like facebook - only hidden in a black hole. If I delete, this is intentional! Otherwise I create a new revision and all is kept for rollbacks. There are a lot of sites that have no need to keep files (user generated content). Other sites that require Media module may auto-enable the keep all files forever feature, but this are really only edge case sites from my point of view and not the majority.

plach’s picture

I'm missing an answer to my main question. Why do you like to keep files on a disk if you cannot reuse them?

To avoid finding myself screwed if I change my mind later :)

Other sites that require Media module may auto-enable the keep all files forever feature, but this are really only edge case sites from my point of view and not the majority.

I don't think large sites out there are a bigger number than small sites, my proposal was assuming exactly the opposite. That said we'd need actual numbers to support a decision based on this argument :)

hass’s picture

How are you manage to prevent that users nervertheless upload everything 10 times to different folders? :-)

For ages I wonder that the filesystem on d.o does not collapse... Or there is no backup? 1 folder with millions of files inside... But that's a different story, too.

Aside, if a button is named Delete I expect that it really does only this. If we do not clean the filesystem we also need conditional buttons. E.g. "Detach file" and otherwise "Delete" in all forms.

rooby’s picture

I would agree with hass on this one.

Default drupal behaviour doesn't re-use files, so I would want and expect files to be gone when they are not used. I also don't want unused files on the server.

What if someone has a file with sensitive information and they attach it to a node.

They delete the node and expect that the file is also gone, but the file with all the sensitive information is still lying around on the server forever and they don't realise.

Now if you have proper access control not many people can see it, but it is still a bad situation.

I think once you install something like the media module then that module should display a message to the user then the module is installed, to tell them to review that setting.
Alternatively that module could even change the setting and give the user a message to tell them that has happened and they can change it back if they like.

I also agree with what sun said earlier, that the file entity module should be recording file usage.
Then that file would remain on the file system until the user actually deletes the file entity.
That would make perfect sense.

It makes sense because file usage is usage of the file itself, not usage of the file entity.
If we want usage of the file entity that should be something separate that is handled by the file entity module.

tsvenson’s picture

@plach:

in stock Drupal there is no way for a file to get more than one usage per node

I'm not too sure about that. From the explanation I have got from those who have looked at the code involved each revision a file is used in as also adding a +1.

So, take this example:

  1. I create a new node and add a file to it.
  2. File usage is now 1
  3. I edit the same node, remove the file and save a new revision.
  4. File usage is still 1 even though the file isn't displayed at all.

Similarly, for every revision I keep the file, it will add to the file usage.

The main problem I have with the current default behaviour in Drupal 7 and 8 is that it was designed from a purely technical point of view where all sorts of wrong assumptions was made. Starting with - "When file usage is 0 the file is not used and should be deleted".

This was also poorly documented and practically no awareness about this was created for the user base. The fact there is no option to turn this off reaffirms that.

This is, for me, similar to garbage collection in various programming languages. Its used to free up memory, or storage in this case. The garbage collection in those languages are removing waste that is generated from a source. The source will still be there and when needed the application can recreate things with it.

However, here Drupal isn't removing the waste - it is removing, without pardon, the original content.

And it is doing it in secret without anyone knowing about it.

This is a prime example of why we need to have a proper UX strategy in the community. Then we could have easily looked at several scenarios from a USE perspective and been able to identify better what sites needs and what the default behaviour should be.

This is also a prime example where we are first building the functionality and API's and then we try to figure out how to build the UI and UX to fit those.

That is - We build the UX/UI to fit the API's, the machine, and not the humans who are going to use it.

Last time I checked it isn't the machines you need UX or UI's

So, from a UX perspective there is no other conclusion that this behaviour has to be disabled by default. Then those who want it can simply turn it on and the rest wont experience any nasty, and costly, surprises!

plach’s picture

I almost completely agree with #27. I was just saying that @hass has a point when he says that in stock Drupal you have no way to reuse a file and so there is no point in keeping it since it can no longer be referenced. Things change completely when you install a module like Media letting you reference orphaned files. Overall I'd support a more conservative approach here, if we leave the ability to opt-out from it. This way people building complex sites, which are arguably more experienced, can do a conscious choice and decide to always remove orphaned files.

rooby’s picture

@tsvenson:

Even with revisions that is still a one to one relationship with a file and a node with stock drupal.
It is still one piece of content and when you delete it usage goes to zero and the file is gone.
And it should be gone with stock drupal because there is no way to reuse that file from that point, and if you upload it again you have duplicate files on the server.
Then you have users not realising that their file name now has a number appended to the end of it so they accidentally link to the old version of the file (which in some situations can be a very bad thing), which they now have no way of removing or changing.

From plach's comment:

... Things change completely when you install a module like Media letting you reference orphaned files ...

Exactly, so why is it not the job of such modules to set the value to what suits them best?

Why have a bunch of orphaned files lying around on sites that doesn't use an enhanced file handling module and whose users don't understand that they should undo the default setting on stock drupal if they aren't using such modules?

On a stock drupal site, I cannot think of a reason why you would want orphaned files on the server that can't be reused, changed or removed (by the content administrator).

slashrsm’s picture

I understand arguments from both sides here and everybody has a point. I, however, agree more with @hass here. Since Drupal has no central file management I see no point in preservation of files that have lost all usable references.

The fact that this patch introduces UI configuration for this feature will also make people aware of this behaviour. People not knowing about Drupal delete-ing files is the biggest issue here, IMO. Your files disappear and you have no idea what is going on. This should be fixed with this patch no matter which behaviour we decide to be default.

I also agree with @tsvenson about description below configuration check-box. We should make it visible and informative enough to make everyone aware about this behaviour. Please suggest description's content.

IF we back-port this to D7 (some people might argue) we definitely need to keep existing behaviour (leaving clean-up enabled by default).

slashrsm’s picture

I created another issue that might be related: #2005166: Create simple file listing under admin/content/file.

tsvenson’s picture

@slashrsm: Having automatic deletion of unused files on by default is wrong.

This is an all or nothing option. It is a site wide behaviour that either will delete every file that gets its use counter to 0 or not. Yes, there are use cases when this is wanted, but there are tons of use cases where this is not wanted.

We are really good at creating these all or nothing functionality in Drupal and making assumptions for users we don't know! The problem is that very few sites works that way. The sites need these functions to work differently at different parts of the same site.

Drupal is amazing to build rich content sites with. Sites where the owners want great tools to produce new content as well as reuse existing content. They also want to be able to tailor the various site features to fit their needs. Some of them will have the need to delete files when those files are no longer needed, but for other features they want the files to remain in the library so they later can be reused easily.

To allow this to be possible Drupal core needs to support this correctly to start with. That is not the case here.

This was architectured using garbage collection as a model and based on the assumption that a file with use count 0 will never ever be needed on the site again. For site owners and users that logic doesn't make sense.

What's worse, this was implemented without any choice for site owners and users. It was just assumed that 0 = never needed again and delete without warning.

Adding and UI option to disable this default behaviour is not the right solution. It will still be a site wide all or nothing choice.

The arguments about stock Drupal not being designed for reusing of files is also a bad argument. No one builds a site on just stock Drupal with zero contrib modules. We all know that Drupal sucks hard when it comes to file management, that debate has been going on even longer that not having a WYSIWYG editor in core.

So using that as an argument for this being on by default is incorrect.

Again, this is a prime example where even the most basic UX principles have been ignored!

Those who do want Drupal to delete their files only have to enable this. When they do they will know exactly what they get and Drupal will do it for all existing files with use count 0 instantly.

Just because you want it, don't assume everyone else do to! Don't let those who want to keep their files have to discover this when their files are gone! When it is happening, few will instantly know they had forgot to disable this, they will spend tons of time trying to figure out what the h_ll happened.

aspilicious’s picture

We should move 15 to rtbc when it's green again (it's sended to the bot). That way a maintainer will arrive and judge :)

slashrsm’s picture

@tsvenson I don't want it. I will disable it for every site. I think that we must behave pragmatic here. There is practicly no chance to get any sort of file library in D8 core. That is a fact. Files behaviour will stay the same as it is now.

When we choose default value for variable introduced in this patch we must think in context of core. Which should be the best default value for a site that installs no contrib modules? We should not make any assumptions on which contrib modules will be installed afterwards.

hass’s picture

#15 is not RTBC!

aspilicious’s picture

Calm down, i just want more opinions from the comitters.

tsvenson’s picture

@slashrsm: Yes, I am aware that there is little possibility to rearchitect this functionality at this stage.

However, that still leaves us with two options for how the existing functionality shall work by default:

  • Enabled - Content in the form of files will automatically be deleted when their use count reaches 0. There is no warning or other notification at all as and when a file gets permanently deleted. It is also known that very few users actually knows about this for Drupal 7 and there has been many reports from users complaining about this. In many cases users have had to spend considerable time debugging their sites before finding out the cause. As a result users will have to manually upload the deleted files and in some cases that isn't possible if for example the content is old and its hard to track down the originals. Yes, this introduces a visible configuration option in the UI, but that is not a guarantee users will find and understand how it works before its too late.
  • Disabled - This is a content safe option. No files will be deleted. Site owners will have to make an informed decision when enabling it. Users that want files automatically deleted can change to enabled at any time and Drupal will instantly remove all existing files that meet the 0 use count criteria.

My vote is on disabled for reasons given in my previous comments. It will avoid bad surprises for a lot of users who simply have been made aware about this and it is just bad practice to have default settings that deletes content. That is something users should have to manually configure.

I would also like to know if this cleanup functionality can be replaced with more advanced and easier to tailor functionality in contrib? For example one that gives site admins and/or the file owners a list of unused files they then can select from that will be permanently deleted from the system.

Lastly, did you see my suggestion to new description for this option in #19? I think you maybe missed that one.

slashrsm’s picture

Updated description.

rooby’s picture

@tsvenson:

You are making the assumption that our assumptions are wrong and your assumptions are right.

Most of my points seem to have so far been ignored, for example:
#26, regarding sensitive files being undeletable, which I would argue could be easily as bad as accidentally losing files.
#29, asking why is it so terrible for modules like media to set the value themselves? Seeing as this isn't an issue with drupal core unless you have such modules and since when do we put things in drupal core specifically for other contrib modules? We just give them the ability to do it themselves.

The arguments about stock Drupal not being designed for reusing of files is also a bad argument. No one builds a site on just stock Drupal with zero contrib modules.

It is not a bad argument. Drupal core should configure itself assuming no other modules are installed. Core cannot assume any contrib module are going to be installed so it has to be configured based on that.
Core's job is to provide the means for contrib modules to do what they want, not to try to do what they think some contrib modules would want, especially seeing as there are probably other modules that won't want it too.

Plus, in the case of the media group of modules, why on earth isn't file entity storing file usage?

I would also like to know if this cleanup functionality can be replaced with more advanced and easier to tailor functionality in contrib? For example one that gives site admins and/or the file owners a list of unused files they then can select from that will be permanently deleted from the system.

Do you mean the admin manually deletes files or that they flag files as being "delete allowed", so that if they have no usage they get cleaned up?

It should already be possible to make a contrib module that shows a list of files and whether or not they are being used anywhere.
This should be based on the file usage table in the database though, which still means you have issues with modules that don't record file usage, like file entity and up until recently, WYSIWYG.

Currently you could use hook_file_delete() to stop a file being deleted but it is a bit hacky and it would interact badly with other modules' implementations of the same hook. Another pre delete hook could be added where modules could stop the deletion but then there could be arguments from modules that want to delete a file but are being stopped at this point. In that case you could have a force delete parameter, but then the other side would argue they should have the right to stop any delete. Which side gets final say?

tsvenson’s picture

@rooby: Let me ask you this in relation to your arguments:

  • What functionality and behaviour will you loose if this option would be set to disabled as default?
  • What would be required for you to get that functionality and behaviour back?
hass’s picture

Answering questions with questions... Great.

We may have sensitive or copyrighted data where we do have ownership in public. This can cause serious troubles to the site owner. We must delete in such a case.

On your sites you can install Media and media will set the value to disabled to keep all stuff for you and may bring you into serious law troubles. Other users get the cleanup and may have backups and the world is ok for both. Better safe than sorry.

rooby’s picture

What functionality and behaviour will you loose if this option would be set to disabled as default?

The cleanup of files that are no longer used, leaving me with orphaned files on the server.

What would be required for you to get that functionality and behaviour back?

Changing the setting.

The same can be said for your case, if the setting was enabled by default.

So the question is: what should the default be? (for drupal 8, I think changing the default in drupal 7 at this stage in the lifecycle is crazy talk)

Aside from the fact that as per #3, if modules were properly recording usage there would be no problem, the problem is that on a stock version of drupal (whether or not this is uncommon is not the point) there is no use case (that I am aware of) for this to be disabled by default.

Drupal is providing the means for contrib to configure. Its defaults should only concern drupal core, not whatever contrib modules the user might install later.

I just can't see the negative side of having the default as enabled if contrib modules that need it to be disabled have the power to change that.
However I can see the negative in the opposite case, where no contrib file related modules are used and the site ends up with a bunch of orphaned files.

I don't feel I can add anymore to the conversation without more going round in circles so I'm happy to await some committer input.

tsvenson’s picture

@hass: I have the outermost respect for you as a developer and are using many modules you helped with myself. That includes respecting and listening to yours and others arguments. Unfortunately I rarely see the same mutual respect from developers for us who have other skills and experiences, such as UX, including usability and UI for example.

In my case I have almost 30 years experience in this field, yes even from before the term User eXperience was even used.

When this behaviour of Drupal core was discovered, there where no option, and still are none as this patch aims to fix that, for enabling and disabling the automatic deletion of files.

Those who are responsible for originally designing and implemented this functionality made several assumptions. One of them being that any file with the use count 0 will never ever be needed again on the site. Other assumptions was that no user ever needed to be warned about this, nor notified that files would be deleted. It was also poorly documented as few other developers in the community realised this.

The reason for this is simple. It was designed from a technical point of view and with garbage collection as a model.

Even more importantly, as I have been pointing out, not even the most basic UX principles was applied to the design phase. Not even the most simple scenarios can have been looked at for when files will be deleted and so on.

Now saying that the Media module should change this is a bad argument. It is an afterthought to validate a poorly designed and implemented functionality from the start. You don't fix a problem by adding more complexity too it, that is a hackish workaround in my view! Unfortunately that is the how the-Drupal-way really works for a lot of things.

This is a far more complexed issue that you may think. The most complex problem to solve is the embedding of files in text areas, WYSIWYG editing. There is no easy way to handle that as those embeds are just text. You need complex code to parse revisions and compare them to first identify if a file belongs to your site or not. Then if it has been added or removed in the text area and so on. Since there are many ways to embed those files, the problem gets even more complex.

And no, the Media module isn't the only module that deals with files in Drupal. The Insert module is another and when you embed an image using it, the use count isn't upped. Should the Insert module then keep track on those embeds and then make sure use is decreased when they are removed? What happens if you then disable the insert module, how will then the existing embeds be handled.

You also have FileField Sources that is a simple little module that makes it very easy to reuse files anywhere the file field can be used.

In Drupal 8, the WYSIWYG improvements will include also managing increasing and decreasing the use count. At least that is what Nate Haug said in his DrupalCon presentation I embedded in my http://www.tsvenson.com/blog/2013/05/drupal-8-wysiwyg-and-file-cleanup post.

I am going to be bold and take full credit for that happening. I was the one who initially discovered this, then thanks to @fabsor we found out that it is Drupal Core doing it. Since then I have tirelessly advocated about that this has to change. I also want to thank @Dave Reid and others who have supported the changes needed since the original discovery.

So you see, you can't just look at this functionality isolated, you have to look at scenarios, figure out how this is going to affect sites and user.

That is something I as a UX strategist have lots of skills and experiences about. So please show a little respect for those skills too. If you do you will quickly learn just how much we can help improving things. Including your own user experience as a developer!

I, and other UX people like me, only want to help making Drupal better by working with you, not having to spend the majority of our time and energy on fighting, over convincing, for and about every little UX improvement just because developers don't respect our skills, experience and all the great things we can contribute with to the community.

Sorry for the ranting, but I am frankly speaking sick and tired of this situation and how large part of the developers in the community conveniently and for the wrong reasons chose to ignore the severe UX problems Drupal has!

tsvenson’s picture

@rooby:

The same can be said for your case, if the setting was enabled by default.

No it is not because there is a fundamental difference.

Enabled it will delete files without warning. It is a known fact that many users has and will discover this after the accident already occurred. In almost all cases it will also take them a long time to figure out why it is happening. When they do they will have to somehow restore those filen.

Disabled will keep unused files. Yes, those files can still be accessed, but only if you know the URL. Also, if the file is used in an earlier revision of a node, it will not be deleted if its removed in a newer revision.

If you need files to be deleted you can't put your full trust in this core behaviour. You also know you need this behaviour and thus you will, hopefully, find this setting and can turn it on. This because it is a requirement for the site feature you are building.

Everyone I have talked to that has had the experience of files being deleted have all said they never expected this. Therefore very few will assume that is happening and look for a way to turn this off.

DamienMcKenna’s picture

FileSize
79.99 KB

A problem with automatically deleting files is that there is currently no indication given within the site as to what will happen with a file's usage drops to 0 - the user doesn't get a message saying the file was deleted and, without this patch, there's no way of telling the system "no, don't delete my files."

I has a bucket.

I would vote to err on the side of not deleting the user's content, I would wager this would be a preferable default scenario than people wondering where their files went to.

At the very least, has anyone examined the current D8 process if a term is no longer referenced by any content - is the term deleted?

DamienMcKenna’s picture

I recommend we either change the default to not silently delete files or we improve the UX so that a message is displayed when an image will be removed & if the user has the appropriate permissions indicate that the default can be changed via the file settings page.

jcisio’s picture

I think if modules do not want their files to be deleted, just add a file usage record. Point.

Look at what modules like IMCE do. There is no problem. Files uploaded in a node form are deleted when no longer used. I even think this issue should be "works as designed" as sun pointed out in #5.

tsvenson’s picture

@jcisio: No, code should not have a vote on when content should be deleted or not and then just go ahead and do it without any warning or notification.

Then every module that is adding and removing files to Drupal will either do that or not. Then it is the modules that are in charge of the content and not the content owners and/or users!

cweagans’s picture

@tsvenson, I think you may be misinterpreting the intent of the code that powers this feature. The entire purpose is about managing expectations. If I create a node with a file field on it, attach a file, and then delete that piece of content, I expect that the content is deleted. In core, since we don't have a way to manage files, "content" includes the file attached to the node I'm deleting. We put file_usage in as a way to determine if this is a safe operation or not. Modules like Insert, Media, et al should increment the file_usage counter.

If you want to put in a checkbox to determine whether or not this behavior is enabled, that's fine, but the default in D7 can't be changed at this point, and I believe that having file purging as a default behavior in D8 is appropriate (at least in the standard profile, where the guiding logic is "sane defaults". Minimal...sure, turn it off. Everything is turned off in Minimal). For sites with hundreds of thousands of files, garbage collection is important. We are trying to conserve disk space where possible, and contrary to your talks with people, my clients have enjoyed that feature.

If a module depends on having this behavior disabled, it can easy disable file purging while being enabled.

It also sounds like a documentation page about this feature needs to be written somewhere. My opinion as a core dev may be a little biased, but I'm not sure how you can not know about this feature.

tsvenson’s picture

@cweagans:

I think you may be misinterpreting the intent of the code that powers this feature.

No, I don't believe I have. I believe it is working exactly how it was intended to work. However that is also exactly the problem as extremely narrow use case(s) was used to design the intent.

Consider this, using just Drupal core:

  • I create a node and add an image to its image field. This particular content type also has the the setting create new revisions automatically, something users can't avoid. Use count is now 1
  • Then after some time that image is reused in an article by manually embedding using the <img> tag in the text area. The image is now used in two places, but the use count is still only 1.
  • Then a few months later I decide to replace the image in the first node. Thus I edit it, remove the existing image and upload the new. Since revision is automatic the old image reference, and use count, still exists. Thus the old image is still visible in my article as it should.
  • However, a few months later I decide to delete the node completely as it is never ever needed again. I only see the last image that was added, however the cleanup routine will decrease both its and the image in the revisions use count and both will now be deleted from the system.
  • The image in my article will at the same time disappear too. But I wont know that until I view that article myself or someone reports that something is missing there.

If this is a site with many users working and producing and editing content the confusion will be even greater.

If the content type didn't have revision the image would have disappeared when I replaced the existing and chose not to manually trigger a new revision.

That also means that unused files still referenced to only in older revisions still will be URL accessible even if they never will be used again. Thus right here the intended use of the code fails!

If proper UX principles and strategy had been used and applied when designing this functionality it would have been very easy to identify many other use cases, some even simpler than the above. Then using that knowledge learned from those to better understand the full implications and scope of the task ahead. Including a much better risk assessment for users with this kind on blunt functionality.

I am convinced the resulting functionality would have been much different and without question not as badly designed as this one!

cweagans’s picture

Then after some time that image is reused in an article by manually embedding using the <img> tag in the text area. The image is now used in two places, but the use count is still only 1.

Ah, so this is the real problem. Please open an issue for it. File usage tracking should track all usages of a file. IMO, if we fix that, this problem goes away. That is, if file usage is actually tracking ALL usage of a file, then the problem that you described is no longer a problem.

If proper UX principles and strategy had been used and applied when designing this functionality it would have been very easy to identify many other use cases, some even simpler than the above. Then using that knowledge learned from those to better understand the full implications and scope of the task ahead. Including a much better risk assessment for users with this kind on blunt functionality.

I am convinced the resulting functionality would have been much different and without question not as badly designed as this one!

If it's such a badly designed thing, please propose an alternative to this functionality. Having stale files laying around is a very poor solution to this problem - one that I would not be happy with at all. If the feature itself is designed on poor principles, I'm much more in favor of removing it altogether than making it optional, but I don't think that's the case here. I think the general idea behind file usage tracking is a good one, even if our implementation of it has some gaps that need filled.

Making the functionality optional and allowing a site user to pick which ones to delete isn't really a good solution here either, since the behavior on most bulk action screens like that goes something like this:

* select all
* click submit
* panic when something goes wrong

That's not any better than the situation that you're running into today.

cweagans’s picture

Also, I think the situation that you described is impossible, since a bug in core prevents file_usage from being decremented when node revisions are deleted: #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger

jcisio’s picture

Then after some time that image is reused in an article by manually embedding using the Only local images are allowed. tag in the text area. The image is now used in two places, but the use count is still only 1.

While the would be some work on file usage, the described situation is common in core. In the textarea, if you link to another node, then delete that node, you'll get a 404 link. The same problem happens even with file field, user reference field etc. Even worse, you embed a derivatived image, then remove that preset, that derivatived image becomes a 404. Or rename the file name, then the embed file in all textareas become 404.

If you want to reuse images properly, use a module that do that. IMCE is the easiest and it works out of the box. There are also quite a few modules for advanced use cases.

The core behavior IMHO is actually good. If file usage does not increase/decrease after a revision add/remove, then it is a bug. I do think those bugs exist, because there are a lot of @todo in Field and File modules.

hass’s picture

@tsvenson: I'm always thinking about usability very first. It makes me often not implementing something just for UX reasons :-). Since I'm using drupal I'm fixing more usability issues than any other issues. I'm not part of UX team, but things I implement are well documented, have good and self explaining descriptions (!) and try to be solveable by reading without guessing or beeing a computer hero and without using handbooks...

Just to give any other comparable example. I also do not expect that my images embedded in MS Word documents are not breaking if I miss to enable the embed image into document checkbox. This is expierience and knowledge about the product they are using. Every word beginner may fail here, too. This is 100% comparable, but Drupal do not have this embed/attach checkbox yet. cweagans has a very good point here. We need to track these embedded files, too. I have never seen myself only one customer who forbid deleting and moving files on his fileserver.

And with ICME I can move files between folders and turn your site into file linking death. There is no chance to prevent this as I know and this is why I removed ICME from my sites, too! It's more risky to use such tools than not using them. Aside of this I'm personally not re-using files very often, but this may be an exception. If I teach my users to attach every file to the node I make sure the files will never get lost and content is always consistent and will never use files from other content. Negative side effect is I may have files twice on disk, but this outweights the risk of destroyed content from my point of view. At least as long as we are able to make wonders happen and track wysiwyg embedded files. I say "wonder" as I know how difficult it is to extract links from wysiwyg and tons of other fields as I developped linkchecker.

cweagans’s picture

This is only tangentially related, but I'd also like to note that a lot of clients that I've done work for actually forbid putting images into their node bodies, because they prefer to keep those images separate as structured data. It makes changing design/layout/etc down the road a lot easier than having to go through and edit a bunch of nodes because they were constructed with a certain design assumption in mind.

hass’s picture

#55 Correct, but there are a lot who like WYSIWYG and you may cannot say no I don't implement this :-).

cweagans’s picture

Of course - just thought I'd mention it.

slashrsm’s picture

A lot have been said. A lot of arguments, most (if not all of them) perfectly justified. We still have to decide on one solution, so I'd urge all participants of this thread to be proactive.

Let's try to converge our views and achieve solution that will be OK for all.

cweagans’s picture

Okay, here's the solution that I propose:

- We close this issue (won't fix)
- We open an issue for "File usage tracking should track all usages of a file", including files referenced in markup.

I strongly feel that removing or allowing disabling of the file cleanup is a mistake. If we must make it an option, I'd prefer to leave the UI in contrib (there's some module that I can't remember that exposes a settings form for settings that core only has through variable_get/variable_set in d7, but I can't remember what it's called). We should let core do the smart thing and have some way for site owners to turn it off if they feel they must.

slashrsm’s picture

I would not agree with #59. Primary purpose of this issue/patch is to make easier for contrib to modify cleanup's behaviour. I think we should achieve at least that (introduce variable without exposed UI or something similar).

I am also convinced that most of the people (users, site builders and even developers) have no idea about this feature. Exposing it in the UI would help people to know what is going on in their systems. Same result could be possibly achieved using some other approach, but I am not sure how.

It is quite clear that there are use cases that prefer cleanup and also those that definitely try to avoid it (just read all the comments above), which makes me think that there is no single solution possible here. I believe that we should consider that fact when we think about the solution for this issue and leave enough flexibility to cover both sides of the story.

fuzzy76’s picture

1. If I directly link an image from node 2 in node 3, I would actually expect the image to stop working if node 2 is deleted. That's how linking works. Novice users may not necessarily realize this, but novice users don't know how to link images across nodes either.

2. If I delete a node, I would expect files uploaded to that node to be deleted as well.

This is how Drupal without special case 3rd party modules would be expected to work by most users. That is what UX is about, and that is why disabling this garbage collection by default would be the most confusing setting for users.

And I'm not even sure contrib should be able to prevent this. If I attach a sensitive Word-document to a node, I would NOT expect the media module to prevent it from being deleted when I delete the node. This could actually send me into legal trouble (my employer work in the health care sector).

jcisio’s picture

#60 #61 contrib should be able or not, they could now. Modules like file_lock can easily enable/disable the garbage collection.

mallezie’s picture

If i read this issue, it looks to me there are multiple issues combined. And i think those could better be discussed seperate.

  1. The UX-problem according to me is that drupal core deletes files without warning. (Without the discussion if it should do that).
  2. Would it be possible that drupal warn's you, when it does that. If you delete a node, you come on an confirmation delete page, where Drupal also mentions which files will be deleted. (Without an option to delete the node, without the files.). Something like when you enable a module which has dependencies, drupal says which modules it will enable as well. (This could even be backported to D7 perhaps).

  3. Further drupal doesn't count file usage correct, which is described in #50 with related issue probably #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger, which could fix everything, when the file usage is counted everywhere, but probably very complex in core to find the embedded image (core only, no imce or media involved.)
  4. Further there is the make cleanup optional. (patch in this issue). This could be finished here, with default to cleanup (since that's the case now). A new issue could be created to change this default behavior. (Perhaps this should be postponed on the first option.)

Just trying to debug the discussion here, and make it possibly to advance in the discussion.

Greetings

rooby’s picture

#47:

I think if modules do not want their files to be deleted, just add a file usage record. Point.

Exactly. The file usage system provides the ability to do what this issue is trying to do. This issue is just a way to mask that a bunch of contrib modules are not properly using the drupal API.

#49:

It also sounds like a documentation page about this feature needs to be written somewhere.

Absolutely.

#50:

Then after some time that image is reused in an article by manually embedding using the Only local images are allowed. tag in the text area. The image is now used in two places, but the use count is still only 1.

This is a problem, however I think it is a user training problem.
As mentioned in #53, this is a much further reaching problem and I think it comes down to knowing the tool you are using (as per #54), whether that be by having your developer give you adequate training, or if you are making your own site, learning for yourself.
The tool in this case is the internet.
As mentioned in #61, this is how hyperlinking works everywhere. If I link to a page on joe bloggs' blog and he deletes that page the link breaks. If I use remote images and those images get deleted at the source then the images are broken. This is the same thing except that what you are linking to is within the same site (it's still on a different page - if it is on the same node and it gets deleted the broken link goes to so that isn't an issue). It pays for people making or using websites to understand how things work. And I'm not talking crazy complicated programming things here, it's hyperlinking, one of the most fundamental aspects of the internet.
It reminds me somewhat of Jeff Eaton's blog post, which talks about users needing to know how things work (to some extent) under the hood of their fancy UX features, to avoid getting themselves into trouble.
I'm a huge advocate of UX, however I don't think any amount of UX can replace all learning/training (and if it somehow did that would be a terrible thing for the world IMO).

The UX argument goes both ways IMO anyway, because if I have a file field and I remove that file from that node or delete that node I would expect the file to be gone. The fact that it is not gone and I have not been notified of that fact is an equal and opposite UX issue to the one of deleting files without notice.

#59:

Okay, here's the solution that I propose:

- We close this issue (won't fix)
- We open an issue for "File usage tracking should track all usages of a file", including files referenced in markup.

I tend to agree with this, because technically this issue is about masking the problem that contrib modules are misusing (or not using to be more specific) the file_usage table.
What we should be trying to do is fix that problem, not mask it.

If we mask it with this and have some sites where a module can't delete files even when it is very important that the files be deleted, and contrib moudules don't start using file_usage properly, we could end up with more mess than we have now.

I think that the file lock module does what we are asking for here the proper way (according to the current drupal API).
That module could even add more admiistrative areas where you could view all locked/unlocked files and lock or unlock specific files.
This could be done using views bulk operations and can have rules integration, etc.

Maybe there is a way to make it so that whenever a user is about to set file usage to zero with some action they are warned.

This would not be a straightforward thing to do though. Possibly a pipe dream.

PS.
@tsvenson: I'm a fan of what you do and your blog posts. I almost always agree with what you say in your blog posts and comments, just in this specific case I disagree (I don't disagree with UX but I disagree that your solution is the correct one in this case). So I'm not disagreeing with UX and I'm not disagreeable towards you in general or trying to make you frustrated.
Understand that I am just as frustrated from the opposite site.
:)

slashrsm’s picture

I like ideas that @mallezie posted in #63, specially the one that proposes notification on entity delete. This would definitely increase user's awareness about this feature in core, which can only be good.

It would also be nice to have usage tracking for files referenced from textarea fields, but this is complex problem that would need separate issue and a lot of discussion to make it right. And I rather have nothing that something that only partly works.

I am really happy to see this discussion going on. A lot of arguments were presented and we saw lot of views that I was not aware of them. I can say that this issue improved my understanding of this topic a lot. However, I still believe that we need to make this optional. This would allow site owner to decide whether to use clean-up or not. Contrib modules like file lock could still be used for sites that need more fine-grained control ("lock" based on filename pattern, mime type, ...).

tsvenson’s picture

Hey guys,

Awesome feedback the last few days. So happy to see so many caring for finding a solution to this problem. I have been a little busy the last couple of days and need to mull over all the new information a bit. As some has pointed out, there are several issues being discussed here. However, they are all related and dependent on each other. Therefore, just breaking them up in separate issues and solve them separately is not the solution. We need a to have a full understanding of all the things happening and a clear goal to what kind of solution we need.

I'll try and find some time this evening to go over all the new info and see if I can compile some form of proposal out of it.

Dave Reid’s picture

Assigned: Unassigned » Dave Reid

Ok I'll step in here finally after a bit. The major issue I have with temporary file cleanup is that file are removed only 6 hours after they've been marked as temporary. I really don't have an issue with temporary file cleanup. It's a good thing, but the window of opportunity to re-use a file is so short. I can't go back and try to use that file I refered to yesterday, because now it's gone.

We need to ignore issues of file usage in WYSIWYG. That is a bug in modules that don't track usage properly and has nothing to do with this issue. Media has essentially solved this for its use case already. I've talked with quicksketch at DrupalCon that we have a need for a unified way to 'embed' references to entities/files in text areas and it would be nice if core could also provide the usage tracking for those items. That will be a separate issue.

What I'm going to propose is changing the temporary cleanup window to a variable/config that by default is the 6 hours. This way, a contrib module is free to increase this window to something more reasonable, like 1 week before a file is removed. The window could be set to 0 to have temporary files deleted immediately. It could be set to -1 to mean that temporary files should never be deleted. This should also be exposed in the UI, similar to the watchdog record expiration.

This way we don't actually change any of the functionality and defaults that core has, but we made it alterable in a better way.

I am working on a patch currently for this.

cweagans’s picture

I think I can get behind that approach. Thanks for the proposal (and forthcoming patch), Dave!

Dave Reid’s picture

Just a note that I also had a chance to catch up with quicksketch at DrupalCon. He informed me he is working on adding a very simple Media browser to the CKeditor WYSIWYG integration in D8, which by default shows temporary files. So it could be a very real thing that we might need to investigate increasing the default window when that lands. I edited #67 to also point that we talked about a need for a unified way to embed or at least refer to embedded entities/files/images in textarea/WYSIWYG content (using data attributes) and that if that were possible then we could have core take care of tracking file usage of those.

Dave Reid’s picture

Status: Needs review » Needs work
FileSize
6.19 KB

Here's the first revision of what I'm proposing. I haven't been able to actually test this but it gives a good idea of how it changes the file settings form, file_cron() and FileStorageController::retrieveTemporaryFiles().

I'm not quite sure what to do with the DRUPAL_MAXIMUM_TEMP_FILE_AGE constant, likely it needs to be removed in favor of the config variable. And I know I need to at least add some tests for the new variable, and adjust any existing tests.

Berdir’s picture

Status: Needs work » Needs review

I like that idea as well. I think the main reason for the conflicts is how the concept of files changes if you have just core or file_entity. In core, files are secondary content, they're just attachments of other entities. So once a file is no longer used, it makes a lot of sense to delete it after a few hours. However, if you have file_entity, then files become primary content, you have lists of files, media libraries, can add them on their own and so on. Then it is a strange idea to delete files just because they're not used right now. Making it a setting as suggested, possibly with a longer default works for core and file_entity could even change it by default.

Also not that we are in a good situation to implement that, very much unlike in 7.x with file fields. What happens there is that they manually check the file usage and if it's going to be removed and the file usage is 0, they immediately delete the file. So the file actually doesn't even exists for those 6 hours, it's immediately gone. So I think it's tricky to backport this as we would have to change that logic if we want it to work for file fields too.

The patch that I got into 8.x after a lot of back and forth made it so that file fields no longer care. They just decrement the file usage count. The file usage API then internally marks the file as status 0 if there are no usages left. And then file_cron() asks the file storage controller for files that can be deleted using retrieveTemporaryFiles(). Which I think is incorrectly named and maybe we should kill that method and instead just use EFQ.

Anyway, that means there's a single place which is currently a constant that we need to change to config and we're done.

And even if that doesn't happen, it would still be possible to prevent file_cron() from running (unlike system_cron() in 7.x, it does only that) and implement a different logic, especially if we move the retrieve logic out of the storage controller. And if not, you could also alter that one and implement a different logic there. Or you could replace the file usage service and no longer mark files as temporary when usage is removed. Lots of options ;)

Berdir’s picture

I don't think immediately delete is safe option. Files are also temporary when you upload them. So if cron runs while you are creating new content, it might delete your files before you have a chance to save them :)

Dave Reid’s picture

Yeah for D7 we still would have to fix the fact that the fields themselves attempt to delete the files. If we were to backport, which I think we can, we can backport the variable, disable the deleting from the fields, and just rely on system_cron() for cleanup.

David_Rothstein’s picture

Also not that we are in a good situation to implement that, very much unlike in 7.x with file fields. What happens there is that they manually check the file usage and if it's going to be removed and the file usage is 0, they immediately delete the file. So the file actually doesn't even exists for those 6 hours, it's immediately gone. So I think it's tricky to backport this as we would have to change that logic if we want it to work for file fields too.
.....
Yeah for D7 we still would have to fix the fact that the fields themselves attempt to delete the files. If we were to backport, which I think we can, we can backport the variable, disable the deleting from the fields, and just rely on system_cron() for cleanup.

That doesn't sound safe to me - I thought immediately deleting the file in D7 is a security feature? (And therefore potentially a security hole in D8 already, if it now waits to delete them.)

The reason is, if you have a private node with a private file attached to it, the File module will extend the node's access control onto the file as well. But as soon as you remove the file from the node, that won't happen anymore, thereby potentially exposing the file to people who shouldn't be able to see it (at least in some cases; it depends what other file access control modules you have installed). And since it's an existing file which has been around for a while, the URL may be readily known and therefore the 6 hour window is still dangerous.

Also, in D7 I thought the 6 hour window for temporary files was chosen to match the form cache lifetime? Because it's for situations where you start creating content and upload a file on the content creation form, but then change your mind and never save the content. Drupal in general gives you 6 hours to make up your mind on that kind of thing, and therefore also keeps your newly-uploaded file around for 6 hours in case you decide to save it.

This issue is overall a real UX problem but unfortunately tied up in these gnarly security issues :( I wonder if part of the solution is for Drupal to have clearer conceptual distinction, in both the code and user interface, between (a) files that are "only" attachments and (b) files that have an independent life on their own? Because the first makes sense to delete automatically and the second doesn't. From a site building perspective, if there were some kind of option to "add this file to the media library" when uploading and the two different kinds of files were presented differently everywhere in the admin interface, I think it would make more sense too. That's a big change but I'm not sure how else to get around some of these issues because so much of the security aspects revolve around what the intended purpose of the file actually is in relation to the content it's attached to, and that is something that only the site builder or content creator really knows the answer to.

Dave Reid’s picture

Despite being a member of the security team, I'm not actually aware of anything related to temporary file deleting aside with regards to security. It all falls back to file usage. If a private file isn't declared to be used by anything, it should not be accessible (maybe unless you were the person who uploaded it). If it still exposes it somehow (which I'm not aware of), then that's definitely a separate issue from this. For all I know this deletion "feature" was added to cleanup files for disk space. I will attempt to go back through logs to confirm it.

DamienMcKenna’s picture

if there were some kind of option to "add this file to the media library" when uploading and the two different kinds of files were presented differently everywhere in the admin interface, I think it would make more sense too.

*cough* scald. ;-)

Dave Reid’s picture

Removing temporary files in system_cron() was added in #115267: Simplify File Uploads, Centralize File Validation and Quotas, Fix File Previews.... which references nothing about security when deleting temporary files, but ironically the original issue post:

The benefits of this patch

Files won't be deleted when nodes are deleted.
You can have a filebrowser to manage files without needing nodes.
Simplify preview and upload handling for nodes with attached files.

@David_Rothstein, unless you can point to anything specifically about temporary files and security concerns, I have no idea where you got that.

I'm opposed to any solution that forces used to classify their files as media or not.

DamienMcKenna’s picture

@Dave: I mentioned scald more to point out that there was an existing (contrib) module to provide a separate data structure for "media" files if site builders specifically wanted it, not that I'm specifically advocating for it.

It seems this whole thing has looped over itself and gone back to the original request: provide a way to tell the system to not automatically delete files when the file usage drops to 0 - that's why the current file deletion scheme was originally added.

driki_’s picture

I know this is not the right place to discuss about Media and Scald, I'll probably post an issue in media module queue or in the media group, but I just wanted to clarify one point regarding to this :

I'm opposed to any solution that forces used to classify their files as media or not.

We are advocating for a solution that maximises the possibility for the site builders and not assume what usage they should have with their drupal / media solution. If they want to have media assets separately from other files in their drupal, that's fine. If they want to see all their files in the scald media library that is fine also. There will be soon I think one scald provider ( ~ scald_fileentity ) to allow this usage.
Sorry for the noise, we'll continue I guess this discussion in a more appropriate place.

Status: Needs review » Needs work
Issue tags: -D7UX usability, -D8UX usability, -Needs backport to D7, -Media Initiative

The last submitted patch, 1399846-temp-file-delete-window.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: +D7UX usability, +D8UX usability, +Needs backport to D7, +Media Initiative

The last submitted patch, 1399846-temp-file-delete-window.patch, failed testing.

David_Rothstein’s picture

Removing temporary files in system_cron() was added in #115267: Simplify File Uploads, Centralize File Validation and Quotas, Fix File Previews.... which references nothing about security when deleting temporary files

I was actually referring to the immediate deletion of files by the File module in file_field_delete_file(), which seems to date back to the FileField module in Drupal 6 (if not earlier)... I wasn't able to track down when it was originally added, but regardless of whether it had a security rationale then, I believe it may have one now.

I'll have to check this further and do some testing in Drupal 8. It's not an issue that would be reproducible with core alone, though. I'll file a separate issue if so, but the way it could affect this one is if it turns out that nothing but immediate deletion is safe.

I'm opposed to any solution that forces used to classify their files as media or not.

It wouldn't necessarily have to be the content creators doing this. It could be a choice made by the site builder when configuring the file field instead. For example, we're building a site now where we essentially try to make this distinction based on the field widget. If a file was uploaded via the media browser widget it goes in the media library and is also "locked" so that it never gets automatically deleted. But if it was uploaded via a regular file field widget then it doesn't go in the media library and gets treated as an attachment instead, and Drupal is allowed to automatically delete it when it's no longer attached to anything.

Dave Reid’s picture

If a file was uploaded via the media browser widget it goes in the media library and is also "locked" so that it never gets automatically deleted. But if it was uploaded via a regular file field widget then it doesn't go in the media library and gets treated as an attachment instead, and Drupal is allowed to automatically delete it when it's no longer attached to anything.

This is interesting and might be worth a new feature request for D9 and investigation for contrib for D8. I think I would still like to move forward with this proposal since it 1. does not change any behavior in core and 2. provides flexibility for contrib.

Berdir’s picture

Tempory files only allow access to the file owner, so I don't think this is a security problem.

file_file_download():

  // Stop processing if there are no references in order to avoid returning
  // headers for files controlled by other modules. Make an exception for
  // temporary files where the host entity has not yet been saved (for example,
  // an image preview on a node/add form) in which case, allow download by the
  // file's owner.
  if (empty($references) && ($file->status == FILE_STATUS_PERMANENT || $file->uid != $user->uid)) {
    return;
  }
tsvenson’s picture

Something I hear a lot of enthusiasm about is the opportunity a new major version of Drupal gives when it comes to break API's, rip out stuff that wasn't that cleverly implemented and replace it with something much smarter.

Here, in my view, we have such opportunity. An excellent opportunity to show we take things seriously in the community and will replace a UX black hole with something that works and gives back control to those who should not have lost it in the first place. Something that moves control over content from an arbitrary number to those who design and work with the actual site and its unique features and needs.

Yes, this is a black hole functionality. As it currently works in D7, and still in D8, it is a piece of code that monitors the use count number of files, then whenever that number hits 0 it sucks in each file without pardon or check. I have already given several examples of how uncertain the use count number is and that it simply is not a trustworthy decision maker verifying that a file a; is not in use any longer and b; never ever going to be reused again.

This issue alone is, at least for me, proof that the proper risk assessment of using 0 as decision maker was not performed. As shown not even the most simple use cases using core alone where thought out to test using 0 as the trigger for deleting files.

Now we have soon been busy for about 2 years trying to figure out how to fix this and still use 0 as the delete trigger.

What's even worse, this was, and still is, a silent black hole that gives users no warning whatsoever about that files will be deleted. Neither are there still no real work or suggestions made for giving control over the file-content back to those who never should have lost it in the first place.

I am not against Drupal providing functionality that aids of keeping sites tidy, including deleting files. But it simply can't be untrustworthy functionality like this. Functionality that has absolutely no clue and/or gives the site owner limited chance to decide about their own content.

I want to challenge everyone here to show me why 0 use count is a trustworthy and safe trigger for deleting content in Drupal. Not just safe, but also not requiring a lot of exceptions etc to work properly. Thus far I haven't seen any that passes even the simplest UX tests.

There are simply to many unknown factors involved here that makes this really problematic!

Berdir’s picture

Status: Needs work » Needs review
FileSize
4.2 KB
4.48 KB

Here's a patch that uses EFQ and age 0 == never delete. Previous patch didn't work because the method returned a result object instead of an array of fid's.

Dave Reid’s picture

Title: Make unused file 'cleanup' optional » Make unused file 'cleanup' configurable
tkoleary’s picture

I'd like to add a task to this issue:

Add the following text:

"If the content that contains this file is deleted the file will also be deleted from your site. Disable file cleanup."

To the tool tip for the image file upload field, the CKeditor image modal and the help region of admin/content/files (if that UI gets in to core)

jcisio’s picture

This should be for image/file fields only.

CKEditor images are never deleted: it does not know if the uploaded file is used in the textarea that is part of a field in an entity, and files uploaded with CKEditor is "owned" by CKEditor (the same way with IMCE).

jcisio’s picture

Clarification for #90 it was my guess, as CKEditor does not handle image upload yet.

hass’s picture

Just as a note. Since D8 there are data attributes used to reference images and files. We no longer have full path's in there. I have read it somewhere... These look like UUIDs... Since we have these UUIDs in source code it should be extremly easy to set the file counter +1 if there is a reference in wysiwyg fields. We just need to make sure this counting works properly.

azinck’s picture

@hass: latest 2.x-dev of media module already parses rich-text fields for media "macros" to update the file usage table. This works well, even with revisioning. Core could learn from what's been done there.

Gábor Hojtsy’s picture

I'd like to take a bit of a higher picture. Drupal core has lots of data objects. Nodes, terms, users, files, etc. Only files have a usage counter and related cleanup. If a user did not log in in a year or a term is not used anywhere in relation to anything else, that is not subject to automated removal. Only files. Then there is the question of "file use". If I make a view of files based on some criteria, is that use? Yup. Is that counted as use? No. So when I delete a node, the file will disappear from the view, even though I *used* the file in the view. This is yet another example of the blurry definition of use and the definite limitation of the counter feature and how it disallows file use based on site builder intent. Files are treated like menu items on nodes, if I remove the node, the attached data goes away with it, but reality is people have a wholly different understanding of how files on their system should behave. The fact that usage counting exists highlights a bit of an understanding that files are not sub-data of a node, but instead their own thing possibly related to multiple other data objects. Let's make a small step forward on this lane and embrace it!

tkoleary’s picture

@Gábor Hojtsy

Thank you for so succinctly articulating the problem space and potential benefit here. This is not a small issue and it deserves serious consideration.

slashrsm’s picture

Status: Needs review » Needs work

Good work so far. I really like the way this is going to. I would only suggest to remove immediate removal (temporary_maximum_age == 0) as this could result in problems when working with forms (cron could delete temporary file before form would be submitted).

#89 could be implemented in a separate patch. I don't think it is so related to this one that we should combine them.

Berdir’s picture

@slashrsm: That's exactly what my patch did, 0 no longer means immediate delete for the exact reason you pointed out but never delet.

slashrsm’s picture

Sorry.... my bad. Then we also need to change the way this value appears in configuration drop down (0 will be passed to format_interval resulting in "0 sec" as a label, which is not very descriptive).

Gábor Hojtsy’s picture

Also just wanted to call #2005166: Create simple file listing under admin/content/file to attention of the followers again, which should make the (re)use of files (or lack thereof) on the site apparent and more possible than before.

Berdir’s picture

@slashrsm

+++ b/core/modules/system/lib/Drupal/system/Form/FileSystemForm.phpundefined
@@ -68,6 +68,16 @@ public function buildForm(array $form, array &$form_state) {
+    $period = drupal_map_assoc(array(0, 3600, 10800, 21600, 32400, 43200, 86400, 604800), 'format_interval');
+    $period[0] = t('Do not delete temporary files');

That's why the 0 key is replaced with a different label :)

slashrsm’s picture

Status: Needs work » Reviewed & tested by the community

Can we RTBC this then?

Berdir’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs tests

No, because we need tests :)

drzraf’s picture

will the patch fix the issue mentioned in comment #8...

Berdir’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests, +D7UX usability, +D8UX usability, +Needs backport to D7, +Media Initiative

The last submitted patch, 1399846-temp-file-delete-window-87.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
4.47 KB

Did not apply anymore. Let's see a quick reroll first.

Status: Needs review » Needs work
Issue tags: -Needs tests, -D7UX usability, -D8UX usability, -Needs backport to D7, -Media Initiative

The last submitted patch, 1399846-temp-file-delete-window-106.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: +Needs tests, +D7UX usability, +D8UX usability, +Needs backport to D7, +Media Initiative

The last submitted patch, 1399846-temp-file-delete-window-106.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
4.57 KB
2.87 KB

So the install fails, because FileStorageControllerInterface mandates the retrieveTemporaryFiles() method. I think it makes sense to keep it on the storage interface. I've been pondering to inject config or just pass on the max age requested. I think it makes sense to not make the file storage controller dependent on config this way, so I decided passing in the max age makes more sense. Updated the interface and the implementation. This also required us to roll back a few lines of the changes in file.module to rely on the storage controller (which makes sense IMHO).

Berdir’s picture

I don't see why this is necessary :) Adding an argument is just as much an API change as removing it. I would just remove that method from the interface.

This is a trivial entity field query, that's what we have it for. We should only add custom methods for things that are not supported by EFQ.

Gábor Hojtsy’s picture

I think having storage responsibilities inside the storage controller makes sense :) But its equally possible to just remove it from the storage controller interface and not want storage controllers to define it. It used to not be on the interface or we did not have an interface, so I thought we have stronger feelings to keep the responsibility with the storage controller. No? :)

Gábor Hojtsy’s picture

Here is a version with the method removed from the interface and implementation as per @Berdir.

slashrsm’s picture

I personally prefer to have it in StorageController. I think it belongs there (just as all other storage-related operations do).

Gábor Hojtsy’s picture

It helps to include your changes in the patch too, not just the interdiff :D This should have been 113.

Status: Needs review » Needs work

The last submitted patch, 1399846-temp-file-delete-window-115.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
4.63 KB
9.74 KB

Included some untested tests. We need to specifically provide age of files so we can test for scenarios where the default deletes them, the no-delete scenario and when we set the lifetime to delete younger files as well. I *may* have my basic maths backwards here, but I'm in a hurry and wanted to post anyway. Please help review/correct :)

Status: Needs review » Needs work

The last submitted patch, 1399846-temp-file-delete-window-116.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
1.46 KB
9.86 KB

This should fix tests...

tsvenson’s picture

I just did a quick test of the patch in #119 and there seems to be something goofy going on with the values in the dropdown:
1399846_d8_temporary_files_age.png

slashrsm’s picture

FileSize
787 bytes
9.86 KB

This should fix it.

tsvenson’s picture

@slashrsm: Yes it did.

1399846_d8_temporary_files_age_fixed.png

Gábor Hojtsy’s picture

What else do reviewers spot? I suspect we should do something about the constant at the bottom of the patch. Should we keep it and remove the TODO, rename it or remove it?

slashrsm’s picture

I think we can delete it. It is used only in the code that we change with this patch and in some tests. It is also used in update_delete_file_if_stale(), but we should probably localize this just to that function as it doesen't deal with temporary files at all.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Right, agreed :)

slashrsm’s picture

Assigned: Dave Reid » slashrsm

I can do that.

Should we replace appearances in tests/update_delete_file_if_stale() with hardcoded values?

Gábor Hojtsy’s picture

We can also rename the constant so it applies to update staleness and keep using it. If it appears still multiple times in tests, then it is better kept as a constant.

slashrsm’s picture

OK. I'll see how many usages can be replaced with configuration value and then decide based on number appearances where this is not possible.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
5.85 KB
13.06 KB

Here we go....

slashrsm’s picture

Assigned: slashrsm » Unassigned
jcisio’s picture

Status: Needs review » Needs work

I still see DRUPAL_MAXIMUM_TEMP_FILE_AGE in the comment.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
5.17 KB
16.31 KB

My bad. Grep now looks to be clean.

tstoeckler’s picture

Status: Needs review » Needs work
+++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldRevisionTest.php
@@ -133,7 +133,7 @@ function testRevisions() {
-    // of the file is older than DRUPAL_MAXIMUM_TEMP_FILE_AGE.
+    // of the file is older than system.file.temporary_maximum_age.

I think it's not clear from the context here that system.file.temporary_maximum_age is a config object name. That applies to most of the changes in the interdiff.

slashrsm’s picture

Any suggestion on how to form that? "of the file is older than config value in system.file.temporary_maximum_age."?

tstoeckler’s picture

Yeah, something like that. Along those lines I would suggest "...than the system.file.temporary_maximum_age configuration value."

Gábor Hojtsy’s picture

Agreed with @tstoeckler.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
4.07 KB
16.62 KB

Here we go.

Status: Needs review » Needs work

The last submitted patch, 1399846_137.patch, failed testing.

tstoeckler’s picture

+++ b/core/includes/file.inc
@@ -95,8 +95,8 @@
+ *   temporary (0) or permanent (1). Temporary files older than the
+ *   system.file.temporary_maximum_age value will be removed during cron runs.

@@ -128,9 +128,9 @@
+ * Temporary files older than the system.file.temporary_maximum_age value will
+ * be removed during cron runs, but permanent files will not be removed during
+ * the file garbage collection process.

+++ b/core/modules/file/file.module
@@ -702,9 +702,15 @@ function file_file_download($uri, $field_type = 'file') {
+  $age = config('system.file')->get('temporary_maximum_age');
+  if ($age) {

This is not true anymore, at least not unconditionally. We should note the ability to disable the garbage collection.

Also the actual if ($age) check could use a code comment.

+++ b/core/modules/file/lib/Drupal/file/Tests/DeleteTest.php
@@ -61,10 +61,10 @@ function testInUse() {
-    // of the file is older than DRUPAL_MAXIMUM_TEMP_FILE_AGE.
+    // of the file is older than the system.file.temporary_maximum_age value.

Sorry to be a picky about this, but the word "config" is still missing in this sentence. Can we do "value" -> "config value" or better "configuration value". I think that is important.

+++ b/core/modules/system/lib/Drupal/system/Form/FileSystemForm.php
@@ -68,6 +68,16 @@ public function buildForm(array $form, array &$form_state) {
+    $period = drupal_map_assoc(array(0, 3600, 10800, 21600, 32400, 43200, 86400, 604800), 'format_interval');
+    $period[0] = t('Do not delete temporary files');

This is where the conceptual problem with this patch surfaces, IMO. "Do not delete temporary files" is a paradox. Being temporary (in the context of a file) means having a certain (usually short) life expectancy and being deleted afterwards. Opting to not delete them makes no sense per definition. I get the use-case, but this solution is quite simply wrong. I think we should find a different solution to this problem, i.e. allowing files to be set as permanent without having registered usage. Or we should make our file usage API more proficient, so that files rendered in views, actually are registered as used, etc.

If people disagree with this, and the patch continues with this approach, we should at the very least have the usability team take a look at this and verify that it's OK.

Gábor Hojtsy’s picture

@tstoeckler: I agree the "Do not delete temporary files" option is confusing. What makes them temporary if they are never deleted? I think we can easily overcome this by the right labeling. Eg. Instead of "Temporary file maximum age" we can say "Delete unused files" and instead of "Do not delete temporary files", we can say "Never", while all times can say "after X hours", etc.

plach’s picture

+100 for #140

I found the screenshot in #122 very confusing for exactly the same reasons Gabor is mentioning (and I already had a cursory look the patches earlier).

tsvenson’s picture

@Gábor Hojtsy: Completely agree that the term "temporary" is extremely confusing here. I think most people think of temporary stuff as leaning to garbage, or things that is needed to use a source to produce the resulting file/output. Since we here are in fact going to delete the only copy of the files, the wording is highly important to get right.

I am personally not happy with "unused" either as there are to many unknowns still about what makes a file unused or not.

Maybe something like "Automatic file cleanup" would be more suitable. Then it is implied that Drupal will in fact automatically clean out files and that will make most people look twice at what it does. Then together with a emphasised warning and verbose description it would help to explain things.

By using "Automatic file cleanup" we can also change the "Do not delete temporary files" to simply "Disabled" instead.

Lastly, just wondering if there aren't too many options for the first 24h (6 in total) and then just one bigger - a week. The last one is quite a big step. Would like to suggest the 9h is removed and then options for 2 and 4 weeks are added.

Gábor Hojtsy’s picture

@tsvenson: I'm fine with those too. Sounds good. The description of the form element is supposed to explain the unused file concept already. We can clarify that there.

tsvenson’s picture

Al right, how about something like this:

Label: Automatic removal of orphaned files after

Options:

  • Disabled
  • 1 hour
  • 3 hours
  • 6 hours
  • 12 hours
  • 1 day
  • 1 week
  • 2 weeks
  • 4 weeks

Personally I would have preferred it to be fully configurable for the users instead of only static options.

Description:
-----
Orphaned files are files that is no longer visible for site visitors. However they are still available for content editors to be re-used in existing or new content.

Warning: If enabled, Drupal will permanently delete orphaned files from the file storage!
-----

OK, that's a start. Feel free to improve it.

tstoeckler’s picture

Well, I'm not going to fight this (at least not very hard :-)) but I think the same confusion arises in the API level. We reference "temporary files" there as well, except they're sometimes not removed. That's why I would advocate a different solution to this problem.

tsvenson’s picture

Come to think of it, they are indeed accessible for anyone with the URL. Removing that part from the description...

tsvenson’s picture

@tstoeckler: If "orphaned" is agreed to be used for this, couldn't it be used in the function names, instead of temporary, too then?

From my point of view it makes more sense. After all, a file might not become orphaned in years...

Gábor Hojtsy’s picture

@tstoeckler: at this point in the release cycle, unless we can argue this is a critical problem in core, we need to choose between keeping the API intact *as much as possible* or moving this issue to Drupal 9.

klonos’s picture

...we could keep the API intact (if it actually is an issue at this point), then file follow-up for cleaning up the function names in D9.

tsvenson’s picture

To be blunt, from a sitebuilder and admin role point of view, I don't really care for the API names as long as the UX is top notch in the UI.

slashrsm’s picture

Status: Needs work » Needs review

Attached patch includes comments from #139 and subsequent comments.

I changed configuration form label to the one proposed in #144, but changed first value to "Never". It seemed a bit more appropriate for the current label.

I would not agree with #150. Naming uniformity is very important even in API, as it prevents confusion. It may not directly influence site user and/or admin, but it definitely does indirectly.

slashrsm’s picture

FileSize
6.9 KB
17.5 KB
Gábor Hojtsy’s picture

Status: Needs review » Needs work
+++ b/core/modules/file/lib/Drupal/file/Tests/DeleteTest.phpundefined
@@ -7,6 +7,8 @@
+use Drupal;
+

+++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldRevisionTest.phpundefined
@@ -7,6 +7,7 @@
 
+use Drupal;
 use Drupal\Core\Language\Language;

+++ b/core/modules/file/lib/Drupal/file/Tests/UsageTest.phpundefined
@@ -7,6 +7,8 @@
 
+use Drupal;

Adding use Drupal all over is unrelated to what this patch needs. This makes it hard to review. Let's not add unrelated changes to this patch.

+++ b/core/modules/system/lib/Drupal/system/Form/FileSystemForm.phpundefined
@@ -69,13 +69,13 @@ public function buildForm(array $form, array &$form_state) {
-      '#description' => t('The maximum age a temporary, unused file must be before it will be deleted. Files that are no longer used or referred to anywhere are marked as temporary.'),
+      '#description' => t("Orphaned files are files that is no longer visible for site visitors. However they are still available for content editors to be re-used in existing or new content. <strong>Warning:</strong> If enabled, Drupal will permanently delete orphaned files from the file storage!"),

It is not really true that the file is not visible anymore.

Also "files that is..." is not good English :)

I'd say "files that are not referenced from other content anymore", which is how the counter works.

slashrsm’s picture

Adding use Drupal all over is unrelated to what this patch needs. This makes it hard to review. Let's not add unrelated changes to this patch.

I changed config() calls to Drupal::config() and tests failed if namespace was not included. Should we leave it to config()?

Gábor Hojtsy’s picture

Yeah changes like that are unrelated to the goal of the patch IMHO. Let's focus on changes only needed for this patch.

Berdir’s picture

Global classes like Drupal and PDO shouldn't be "use"'d but called like this: "\Drupal::config()".

slashrsm’s picture

Status: Needs work » Needs review
FileSize
4.03 KB
17.26 KB

I actually did not change config() for Drupal::config(). I replaced DRUPAL_MAXIMUM_TEMP_FILE_AGE with those calls. Using approach that @Berdir mentioned in #156. Thanks for that suggestion.

Also updated conf form description.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Looks almost all good, except some leftover use Drupal in the patch :)

slashrsm’s picture

Status: Needs work » Needs review
FileSize
1.79 KB
17.16 KB

Grrr... :)

Gábor Hojtsy’s picture

Looks all good to me. This makes minor API changes, but only what is required. I'm not sure I'm eligible to RTBC since I've been involved to some degree :) Anybody else? :)

tsvenson’s picture

@Gábor Hojtsy: Lets hold off from RTBC just yet, I still believe we need to work a bit on the UI UX and also discuss the available options and the default setting a bit.

The changed label and description is a good improvement, but the UX problem for me is that this setting still isn't sticking out as much as I would like. I had a look around in other configurations and found the admin/config/media/image-toolkit settings.

There the JPEG quality is boxed in and that feels like a good solution here too. Especially as it then will allow us to place the Warning: text on its own line and still tie it to this setting.

Suggested frame label: Manage orphaned files

Then simply put the config group inside it and insert line break before the Warning: so it is on its own line with whitespace to the main description. The frame will make sure the user understand this all belongs together.

Options

Besides the change of the disable label to Never the timer options are still the same and with a week as maximum.

To the least it should be improved to the time options listed in #144.

However, locking this to just static options in not optimal. It really should be possible to set a custom value here.

With the date module in core, I believe it could provide help here, or?

Default setting

Lastly, default setting is 6 hours. I would much rather have Never as default and then users must make an informed change of the behaviour of this. I.e. the user has to make the decision that Drupal will automatically remove orphaned files.

This is actually very important. Settings like this should never default to a destructive mode that is irrevocable.

tsvenson’s picture

Oh, one thing more.

Like for the "Private file system path" setting on the same page, we should also add a link to a d.o page that provides more detailed information to how Drupal is managing orphaned files.

Gábor Hojtsy’s picture

@tsvenson: I'm afraid that opening yet more avenues of related issues here would drive this too long and eventually postponed to Drupal 9. It would be great to be reassured that the basic change is at least in Drupal 8 AFAIS. Eg. being able to set custom values for the time is not something we do *ever* anywhere else in Drupal core, and date module does not really do any favors for us for that. It would be great to get at least the basic change into Drupal 8.

slashrsm’s picture

100% agree with @Gábor.

tsvenson’s picture

@Gábor Hojtsy: Setting custom values is actually coming to D8, see: #680546: Add a custom 'number of values' for 'multiple / multi-value' field. (e.g. nodereference, imagefield)

But, OK I can live without it, but then we need to have a better list of static options. Particularly options for a longer period than just 1 week.

How about (modifying my previous suggestion):

  • Never
  • 1 hour
  • 3 hours
  • 6 hours
  • 12 hours
  • 1 day
  • 1 week
  • 4 weeks
  • 3 months

This change would cover the majority of needs for users.

I still would like to see the other UI changes, plus the default setting as described in #161 though.

Berdir’s picture

Anything below 6 hours isn't save and should IMHO not be in that list. It is 6 hours because that matches with the form state cache. If you set it to 1h, and upload a file and then submit that form after 2 hours, your temporary file will be gone.

tsvenson’s picture

@Berdir: Great point. I'm fine with ditching both 1h and 3h.

slashrsm’s picture

FileSize
726 bytes
17.16 KB

Attached patch removed 1h and 3h from drop-down and adds 4 weeks and 3 months.

I'd suggest that we stick to the basic focus of this issue, which is making clean-up optional. Other stuff (UI, default value) can be discussed in the follow-ups, which would allow this patch to go in. To many problems discussed in one issue could make this thread go on and on, which could even result in this patch not being committed at the end. I believe that nobody wants that.

tsvenson’s picture

Status: Needs review » Reviewed & tested by the community

@slashrsm: Agreed, I will create a follow up issue covering the remaining UX/UI bits later.

Did a quick test of the patch in #168 and it seems to work as it should:

  • admin/config/media/file-system is OK for now (will file follow up as mentioned above)
  • Created an article with an image and it showed up as Permanent in admin/content/files
  • Then deleted the article and the image changed to Temporary in the list

So from a usability point of view it seems this is working. I couldn't actually test if files are being deleted by Drupal as I was using simplytest.me and it only allow a sandbox for 30 minutes.

Setting to RTBC

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review
+      '#description' => t("Orphaned files are files that are not referenced from other content anymore. However they are still available for content editors to be re-used in existing or new content. 

This seems misleading in the case of private files. No one besides the original uploader will be able to actually see the file once it's orphaned; everyone else will get access denied.

(This is related to the D8 security issue I mentioned above, which I still haven't bothered filing but will eventually. Actually, the security issue only comes into play when combined with potential D8 contrib modules; with core alone the current behavior in D8 is already as described above: everyone except one person will get access denied.)

So should we reference the private files situation in the above description? I think this patch is a clear improvement in the case of public files and that's probably good enough, but it still leaves things a mess for private files and people will be confused.

Also:

+      '#title' => t('Automatic removal of orphaned files after'),

This is the wrong tense (compare to "Run cron every" on the cron settings page). I'd suggest "Automatically remove orphaned files after".

Warning: If enabled, Drupal will permanently delete orphaned files from the file storage!

The exclamation point is overkill, and I think the "Warning" is too. These descriptions should just be informative. I would suggest removing this line entirely; if we haven't conveyed this crucial point in the title of the form element then we need to improve the title instead! (For example, something like "Permanently delete orphaned files after" rather than "Automatically remove orphaned files after"?)

andypost’s picture

+++ b/core/modules/file/file.moduleundefined
@@ -702,9 +702,18 @@ function file_file_download($uri, $field_type = 'file') {
+  $age = config('system.file')->get('temporary_maximum_age');

Drupal::config()

+++ b/core/modules/file/lib/Drupal/file/Tests/DeleteTest.phpundefined
@@ -61,10 +62,11 @@ function testInUse() {
+        'timestamp' => REQUEST_TIME - (\Drupal::config('system.file')->get('temporary_maximum_age') + 1),

+++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldRevisionTest.phpundefined
@@ -118,10 +118,11 @@ function testRevisions() {
+        'timestamp' => REQUEST_TIME - (\Drupal::config('system.file')->get('temporary_maximum_age') + 1),

@@ -133,10 +134,11 @@ function testRevisions() {
+        'timestamp' => REQUEST_TIME - (\Drupal::config('system.file')->get('temporary_maximum_age') + 1),

+++ b/core/modules/file/lib/Drupal/file/Tests/UsageTest.phpundefined
@@ -121,24 +121,24 @@ function testRemoveUsage() {
+        'timestamp' => $_SERVER['REQUEST_TIME'] - \Drupal::config('system.file')->get('temporary_maximum_age') - 1,

@@ -146,18 +146,27 @@ function testTempFileCleanup() {
+      ->fields(array('timestamp' => $_SERVER['REQUEST_TIME'] - \Drupal::config('system.file')->get('temporary_maximum_age') - 1))

@@ -165,4 +174,39 @@ function testTempFileCleanup() {
+    \Drupal::config('system.file')->set('temporary_maximum_age', 21600 + 2);

+++ b/core/modules/user/lib/Drupal/user/Tests/UserPictureTest.phpundefined
@@ -69,10 +69,11 @@ function testCreateDeletePicture() {
+        'timestamp' => REQUEST_TIME - (\Drupal::config('system.file')->get('temporary_maximum_age') + 1),

$this->container->get()

slashrsm’s picture

Includes comments form #171.

@David_Rothstein: maybe we could create a help page and cover file usage more in depth there?

slashrsm’s picture

FileSize
5.39 KB
17.41 KB

Ahhh....

tsvenson’s picture

@David_Rothstein: I don't see the Warning line as overkill. This is one of very few, maybe even the only one, features in Drupal that will automatically delete content when enabled.

There is no middle ground here, either orphaned files are kept (Never setting) or they are deleted automatically. There is no option for "Notify admin about files ready to be deleted".

tsvenson’s picture

@slashrsm: Agree we need a help page for this. I'm working on a mockup for a follow up issue to target the UI design of this. Hope to get time to finalize and post it later today.

tsvenson’s picture

I have create #2052831: Improved UI layout and UX for the file clean-up configuration as follow up for the UI/UX needs as recommended in #168.

It also includes a better label and options list as I am not very fond (even though it was originally my suggestion) of the label ending with "after"...

Gábor Hojtsy’s picture

Status: Needs review » Needs work

@David: I'm not sure we can reference the private file situation in a sane way. Do you have a suggestion? "However they are still available for content editors to be re-used in existing or new content." applies to the situation you explain as well, so long as the content creator is the same as the user who uploaded the file. We do not state that all files are accessible to all content creators. We state that files are available to content creators until deleted. Also I don't think core has a widget to reuse these files, so with core alone, it is somewhat of an exaggeration that they can be reused. They can be reused so long as you know the URL and inject with direct HTML markup. File widgets will not let you pick a prior uploaded file for example in core. So there are many ways this description is not accurate, but it does not need to cover all bases, neither it can do that I think.

I agree with the "Permanently delete orphaned files after" wording suggestion and don't have a strong opinion either way about the warning. We also put warnings on dangerous permissions in similar ways. I don't think the removal or inclusion of that should hold this patch up.

Marking as needs work only for retitling to "Permanently delete orphaned files after".

tsvenson’s picture

@Gábor Hojtsy @David_Rothstein:

Totally missed the private file stuff, sorry guys.

Lets see if I understand it correct (after reading #177):

  • Private files are also marked as orphaned (currently "temporary" in Drupal lingo)
  • Orphaned private files does not show up in the core file list, except for the file uploader/owner (author)
  • Orphaned private files will be automatically deleted with this option too

Also:

They can be reused so long as you know the URL and inject with direct HTML markup.

But that wont add to the file usage count?

Gábor Hojtsy’s picture

@tsvenson: the whole point of the issue is that uses of images in HTML or inclusion in views lists will not count them (and some uses are impossible to count). Therefore the need to make it possible to not delete things that you use in ways that are not counted (or want to reuse later).

tsvenson’s picture

Issue tags: -Needs tests

@Gábor Hojtsy:

I'm quite aware of the somewhat blind sword of this feature, which is why I have been pushing for this to be more predictable, and thus more reliable, for site owners.

Grateful if you can confirm/clear up my bullets re the private files as I would like to better understand how they are affected by this functionality.

tsvenson’s picture

Adding back removed tag...

tsvenson’s picture

Someone please explain why d.o is doing this to tags? Back you go!

jcisio’s picture

Issue tags: -Needs tests

#179 this issue is just to make the cleanup configurable, it does not fix anything else.

Drupal core does not have a mechanism for file upload (at least when #1932652: Add image uploading to WYSIWYGs through editor.module has not landed). Besides, use a file field (in a node) to embed into another node is broken. It is as broken as embedding a derivative image then the site builder delete/rename the image style. A dedicate file upload, like #1932652: Add image uploading to WYSIWYGs through editor.module or IMCE, always makes file status as permanent and it is not affected by this issue.

#180 private files are files. So private file cleanup is file cleanup, managed private files are managed files, by design.

Gábor Hojtsy’s picture

Issue tags: +Needs tests

@tsvenson: yes, private-ness and temporary file state are orthogonal, they can both happen at the same time to a file :) So there can be files that are private but will be deleted (eg. if your file field is set to private, any file you uploaded to include with the entity if you don't save the entity after all). Those will both be private and temporary.

Gábor Hojtsy’s picture

Issue tags: -Needs tests

Cross-post.

tsvenson’s picture

Issue tags: +Needs tests

@Gábor Hojtsy:

Hm, this new info opens up a whole new can of worms form me really. Its quite a difference between public and private files. Main one, as I understand it, being that orphaned public files are still accessible via a normal URL, while private are not.

Ideally we should have two settings for the cleanup, one for public and one for private. Would this be a difficult thing to add at this stage? As Drupal is using the "public://" vs "private://" my hunch is that this might be rather easy?

I'm on my way out now, but tonight I am going to make new mockups and post in #2052831: Improved UI layout and UX for the file clean-up configuration. I have an idea on how the UX can be radically improved there.

Aslo adding back the "Needs test" tag as it was accidentally removed in #180.

Gábor Hojtsy’s picture

@tsvenson: I don't see the distinction between private and public here. Private files are only ever accessed by a non-"normal URL" to borrow your terminology. That URL and access rules do not change whether it is temporary or not AFAIK. Am I missing something here? Orphaned public files and orphaned private files are accessible on the same URL they were accessible on prior to being orphaned under the same file access restrictions. None of these would change by becoming orphans AFAIK. I don't see the reason to hold this up over this as I don't see the problem there.

slashrsm’s picture

FileSize
1 KB
17.4 KB

Title corrected. I'm personally a bit more on the side of warning in description, since it makes this behaviour a bit more visible (I am sure that 80%+ of people have no idea about this happening in core). I, however, agree with @Gabor that this is not something that should hold this patch.

@tsvenson: I don't think separate configuration for private:// and public:// is good idea. Do not forget that we allow modules to introduce even more stream wrappers, which could result in a complete configuration mess.

slashrsm’s picture

Status: Needs work » Needs review
tsvenson’s picture

@Gábor Hojtsy:

Private files are only ever accessed by a non-"normal URL" to borrow your terminology. That URL and access rules do not change whether it is temporary or not AFAIK.

Exactly and that is also completely different to public files. Public files are still accessible when knowing the URL when orphaned.

One of the biggest arguments, which I don't fully agree with btw, for this cleanup core function has from the start been that it is about removing files no longer used in content from being accessible if you have the URL. That is something that doesn't apply to private files.

I would argue that the use cases for public and private files in most cases are vastly different, thus most likely also have different requirements for automatic cleanup.

@slashrsm:

I'm quite sure we can avoid creating a mess here. I hope I will be able to come up with new mockups for the file system config page either today or latest tomorrow.

To be honest I much rather have options to tailor the cleanup individually on a per schema basis than only having an all or nothing setting. With all or nothing it will force site owners to make a compromise when they shouldn't have to.

Also, if this is implemented to it can be used on a per schema basis, then it should be pretty simple for them to reuse the core feature. Or?

Gábor Hojtsy’s picture

Exactly and that is also completely different to public files. Public files are still accessible when knowing the URL when orphaned.

Same is true for private files. Replace public to private in the second sentence and it will still be true. No difference I believe. Please indicate how it would be different.

rooby’s picture

One of the biggest arguments, which I don't fully agree with btw, for this cleanup core function has from the start been that it is about removing files no longer used in content from being accessible if you have the URL. That is something that doesn't apply to private files.

This does still apply to private files.

Users with the required permissions can still get the private files that should have been cleaned up.
It is the same argument where you might have legal documents that are out of date and should have been deleted but haven't been and users could be able to access those documents.

Gábor Hojtsy’s picture

@rooby: It is true that even if an entity is deleted, only the usage counter is decremented and no file delete happens directly, relying on the cleanup mechanism currently to do that job. (https://api.drupal.org/api/drupal/core%21modules%21file%21file.field.inc...). So the modified code would keep those files around. I think the expectation of deleting an entity may be different for the related files depending on whether you understand those files as media assets or direct data belonging only to that entity. From this light, it sounds messier than I thought :/

slashrsm’s picture

It is clear that we have pretty complicated stuff here. It looks like we have so many use cases and and different situations that we'll have problems handling all of them. It seems like there is no "correct" configuration here. It really depends on every specific use case. That makes me think that we should not try to handle everything in core. We need to provide some simple and basic configuration and allow contrib to work on top of that if needed.

I think it would make sense to expose a hook/event that would allow modules to decide about orphaned files deletion on per-file basis. That would allow contrib to provide much more complex logic, which is not needed on most of the simple sites. This kind of event would also allow Rules (or anything similar) integration, which would allow site admins to configure a bit more complex logic around orphaned files without any coding.

Gábor Hojtsy’s picture

At least at the moment contrib can override the file usage counter service (https://api.drupal.org/api/drupal/core%21modules%21file%21file.module/fu...) and fake a count for files that don't have counts anymore. Eg https://api.drupal.org/api/drupal/core%21modules%21file%21lib%21Drupal%2... gets the file object and if overriden, can decide on a file by file basis essentially if it marks the file temporary. That seems somewhat heavy-handed to need to override the file usage service, but that is at least one method for D8 contrib at the moment.

tsvenson’s picture

@slashrsm

It is clear that we have pretty complicated stuff here. It looks like we have so many use cases and and different situations that we'll have problems handling all of them.

Yes, that is a good analysis of the situation. This is a good (bad?) example of what happens when proper risk assessment wasn't made before code was written. Now we are stuck in a situation where we are forced to implement a solution that gives back control to the users, but that has to work on top of that.

More often than not the end result becomes rather messy. In this case though I believe we can actually sort that out, but it requires expanding the scope a little. I'm still working on those mockups, but I believe it will help simplify and clear up the UX quite a bit.

And yes, hooks are needed so that contrib can easily replace this with custom logic. Preferable with as detailed control as possible.

From what I understand, the only thing that really differs between different schema storages is basically the schema name itself. Then there are functions to do various things to the files, such as upload, attach to content, delete from content, increase/decrease usage and ultimately delete the file if needed? If so, wouldn't it then be possible to write this cleanup config option so it can be easily reusable for each schema?

Maybe even with an option to select if the cleanup should be one global setting or on a per schema basis?

I do realize that means more work to fix this, but to really fix this and give back control of files to users I believe we must give users that detailed control.

And just for the sake of it, here is another use case:
If this would be easy to customize through hooks, then the File Entity module could allow for customized file cleanup setting on a per file type bases? Think of being able to create a "Photo Album" file type ala Flickr. Then users can just dump photos to it, fill in fields (tags included) and make them available for display and/or embedding into other content. And that without having to worry about them being deleted, while at the same time other file type files will be cleaned out because they have different needs.

rooby’s picture

If this would be easy to customize through hooks, then the File Entity module could allow for customized file cleanup setting on a per file type bases? Think of being able to create a "Photo Album" file type ala Flickr. Then users can just dump photos to it, fill in fields (tags included) and make them available for display and/or embedding into other content. And that without having to worry about them being deleted, while at the same time other file type files will be cleaned out because they have different needs.

This sounds good.
There is definitely the need for different behaviour depending on what is being uploaded, even within the same site.
This kind of solution should allow the flexibility that is required to suit everyone's needs.

jcisio’s picture

What on earth a Flickr photo album is a file? Are nodes files (node://nid for example)? Is the situation is not complex enough?

Was the purpose not to make "21600 seconds" configurable?

With #195, so now file.usage service is pluggable, this issue is even could be "works as design". But it is not very constructive...

David_Rothstein’s picture

Exactly and that is also completely different to public files. Public files are still accessible when knowing the URL when orphaned.

Same is true for private files. Replace public to private in the second sentence and it will still be true. No difference I believe. Please indicate how it would be different.

It's very different. If you have a private image field attached to articles, then create an article, upload an image, and delete the article, the image will not be accessible anymore (the URL will give access denied) except by the person who originally uploaded it.

Again, that doesn't directly impact this issue, and there are even similar ways to trigger that without the file actually becoming orphaned... It's just that the UI being added here currently kind of implies that all files can be reused after they are orphaned, and that's misleading.

Private files really are the special ones, so dealing with this problem shouldn't necessarily require handling every single stream wrapper differently or anything like that (though I'm interested in seeing those mockups anyway). It could just be something for private files specifically.

But I mostly agree with #177 and for starters I'd be happy if "However they are still available for content editors to be re-used in existing or new content" became something more like "However, they remain on the site and may appear in administrative listings or other contexts", and then I don't think we're misleading anyone so it may be possible to avoid mentioning private files at all.

rooby’s picture

It's very different. If you have a private image field attached to articles, then create an article, upload an image, and delete the article, the image will not be accessible anymore (the URL will give access denied) except by the person who originally uploaded it.

Only for files handled by file fields.
Other modules implementing hook_file_download() can potentially serve such files to people.

I agree that this is a side track though.
I don't see why private and public files should be treated differently in this issue.

tsvenson’s picture

Just want to leave a quick comment and say sorry for the delay of the mockups. Have had some personal issues to deal with the last week. Starting to ease up now and should have the mockups done by the weekend.

Dave Reid’s picture

+++ b/core/modules/file/lib/Drupal/file/Tests/DeleteTest.phpundefined
@@ -61,10 +62,11 @@ function testInUse() {
     // Call system_cron() to clean up the file. Make sure the timestamp
-    // of the file is older than DRUPAL_MAXIMUM_TEMP_FILE_AGE.

This needs to be updated to file_cron().

+++ b/core/modules/user/lib/Drupal/user/Tests/UserPictureTest.phpundefined
@@ -69,10 +69,11 @@ function testCreateDeletePicture() {
     // Call system_cron() to clean up the file. Make sure the timestamp
-    // of the file is older than DRUPAL_MAXIMUM_TEMP_FILE_AGE.

Same here.

I'm 100% in favor of #188 with the fixes here. I don't think we should continue to hold up such a major, necessary change for mockups or discussions on public vs private which don't actually determine the cause or outcome of this issue. This is getting frustrating.

cweagans’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
1.81 KB
18.08 KB

Rerolling with changes from #202. This isn't really my patch, so I feel comfortable with RTBCing it (pending bot reply).

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.1 KB
18.08 KB

I still think the above patch has a highly misleading UI, but would be happy enough with the small text change from #199.

Here is a reroll with that change.

David_Rothstein’s picture

<strong>Warning:</strong> If enabled, Drupal will permanently delete orphaned files from the file storage!

This warning-with-an-exclamation point also still looks well outside Drupal's normal UI patterns to me... but that's someone else's fight to wage, not mine :)

tstoeckler’s picture

Especially because that's how Drupal has always worked and noone had a problem with it until Media Library came along. We could just as well add a "Warning: If disabled, Drupal will leave orphaned files on your file system, without you ever being able to delete them!"

tkoleary’s picture

@tstoeckler

without you ever being able to delete them!

This is not the case is it? My understanding was that the file listing gave you the ability to delete.

tstoeckler’s picture

Well a contrib media library might add that capability but just with core, no, there is no way to delete the files. (That is one reason why I and others have argued against this change above.)

Also see the current discussion at #2060405: Impossible to uninstall any module using the file usage service, because it is impossible to delete all file usages by a module over this problem-space.

tkoleary’s picture

hmm. Well that's not good

tkoleary’s picture

Issue summary: View changes

Create proper issue summary.

webchick’s picture

Issue summary: View changes
Issue tags: +beta blocker

According to Dave Reid, the assumptions about this functionality which are baked into core is something that's actively biting every site that uses Media module and trying to re-use files in multiple instances. Because this is a major issue and it involves a schema change, tentatively tagging as a beta blocker.

dawehner’s picture

204: 1399846_204.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 204: 1399846_204.patch, failed testing.

SteffenR’s picture

FileSize
17.84 KB

I just recreated the patch against the latest DEV of Drupal - all tests were fine on my local testing machine - i hope i can help pushing the issue forward.

SteffenR

slashrsm’s picture

Status: Needs work » Needs review

Let's send this to testbot.

Status: Needs review » Needs work

The last submitted patch, 213: 1399846_213.patch, failed testing.

slashrsm’s picture

I was thinking about this again.... I still agree that we need to fix this, but I'm not sure if it is really a beta blocker and/or a major issue.

It was practicly impossible to override this behaviour in D7. But D8 is a completely different beast, which is much more flexible. file.usage is a service that can be altered. That means that we can completely change it's logic in contrib w/o any ugly hacks. Modules like media will be able to provide own implementation of file.usage service, which would keep unused files in place.

Berdir’s picture

Agreed about this not being a beta blocker, as there are possibilities to work around it but those workarounds are kind of hacky (I'd actually override storage controller and then prevent it from returning anything in that method) but this still seems like a nice clean-up to me and removes an unecessary method on the file storage controller.

Also I don't think the beta blocker tag has been vetted by a core committer, which afaik needs to happen. Misread, it has been, that was the timestamp issue afaik.

Can't reproduce the test fail and patch still applies, so going to request a re-test of this.

Berdir’s picture

Status: Needs work » Needs review

213: 1399846_213.patch queued for re-testing.

webchick’s picture

Issue tags: -beta blocker

Works for me!

martin107’s picture

213: 1399846_213.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 213: 1399846_213.patch, failed testing.

martin107’s picture

Issue tags: +Needs reroll
Berdir’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests, -Needs reroll
FileSize
17.15 KB

Re-roll. What do we need to do to get this RTBC?

Status: Needs review » Needs work

The last submitted patch, 223: 1399846_223.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
17.86 KB
2.91 KB

Interesting, using a non-existing field in an entity query doesn't fail ;)

Status: Needs review » Needs work

The last submitted patch, 225: 1399846_225.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
17.94 KB
3.19 KB

Hm, somehow this tests in UsageTest were wrong? Pretty sure it should be like this?

slashrsm’s picture

Status: Needs review » Needs work
  1. +++ b/core/includes/file.inc
    @@ -108,8 +108,10 @@
    + *   system.file.temporary_maximum_age configuration value will be removed
    
    @@ -141,9 +143,10 @@
    + * Temporary files older than the system.file.temporary_maximum_age
    + * configuration value will be, if clean-up not disabled, removed during cron
    + * runs, but permanent files will not be removed during the file garbage
    + * collection process.
      */
    
    +++ b/core/modules/user/lib/Drupal/user/Tests/UserPictureTest.php
    @@ -65,11 +65,12 @@ function testCreateDeletePicture() {
    +    // Call file_cron() to clean up the file. Make sure the timestamp
    

    This is quite hard to understand. Could we simply write something like:

    "Permanent files will not be deleted during the file garbage collection process."

    And point people to the explanation above?

  2. +++ b/core/modules/update/update.module
    @@ -756,7 +756,7 @@ function update_clear_update_disk_cache() {
    -    if (REQUEST_TIME - $filectime > DRUPAL_MAXIMUM_TEMP_FILE_AGE || (preg_match('/.*-dev\.(tar\.gz|zip)/i', $path) && REQUEST_TIME - $filectime > 300)) {
    +    if (REQUEST_TIME - $filectime > 21600 || (preg_match('/.*-dev\.(tar\.gz|zip)/i', $path) && REQUEST_TIME - $filectime > 300)) {
           file_unmanaged_delete_recursive($path);
    

    Would it make sense to keep using a const here?

gaele’s picture

Just rewrite:

Temporary files will be removed during cron runs if they are older than the configuration value "system.file.temporary_maximum_age", and if clean-up is enabled. Permanent files will not be removed.

martin107’s picture

Issue tags: +Needs reroll

Patch no longer applies .. needs reroll

this issue from file.module is

<<<<<<< HEAD
$result = \Drupal::entityManager()->getStorage('file')->retrieveTemporaryFiles();
foreach ($result as $row) {
if ($file = file_load($row->fid)) {
=======
$age = \Drupal::config('system.file')->get('temporary_maximum_age');

// Only delete temporary files if older than $age. Note that automatic cleanup
// is disabled if $age set to 0.
if ($age) {
$fids = Drupal::entityQuery('file')
->condition('status', FILE_STATUS_PERMANENT, '<>')
->condition('changed', REQUEST_TIME - $age, '<')
->range(0, 100)
->execute();
$files = file_load_multiple($fids);
foreach ($files as $file) {
>>>>>>> PATCH

visabhishek’s picture

Status: Needs work » Needs review
FileSize
15.68 KB

Simply Reroll

slashrsm’s picture

Status: Needs review » Needs work

Stuff from #228 has not been addressed yet.

visabhishek’s picture

Status: Needs work » Needs review
FileSize
922 bytes
15.7 KB

I am updating the patch as per #228

For first issue i am following #229,

diff -u b/core/includes/file.inc b/core/includes/file.inc
--- b/core/includes/file.inc
+++ b/core/includes/file.inc
@@ -109,10 +109,10 @@
  * - filesize: The size of the file in bytes.
  * - status: A bitmapped field indicating the status of the file. The first 8
  *   bits are reserved for Drupal core. The least significant bit indicates
- *   temporary (0) or permanent (1). Temporary files older than the
- *   system.file.temporary_maximum_age configuration value will be removed
- *   during cron runs. Note that clean-up is disabled if configuration value
- *   set to 0.
+ *   temporary (0) or permanent (1). Temporary files will be removed during
+ *   cron runs if they are older than the configuration value
+ *   "system.file.temporary_maximum_age", and if clean-up is enabled. Permanent
+ *   files will not be removed.
  * - timestamp: UNIX timestamp for the date the file was added to the database.
  */

and for Second Issue, no need to change any thing

function update_delete_file_if_stale($path) {
  if (file_exists($path)) {
    $filectime = filectime($path);
    if (REQUEST_TIME - $filectime > DRUPAL_MAXIMUM_TEMP_FILE_AGE || (preg_match('/.*-dev\.(tar\.gz|zip)/i', $path) && REQUEST_TIME - $filectime > 300)) {
      file_unmanaged_delete_recursive($path);
    }
  }
}
saltednut’s picture

Issue summary: View changes
saltednut’s picture

Status: Needs review » Needs work

error: patch failed: core/modules/system/system.module:16
error: core/modules/system/system.module: patch does not apply

martin107’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
FileSize
15.69 KB

Rerolled.

pwolanin’s picture

As we discussed at nyccamp, this seems fine to make the window configurable, but really marking files that were once permanent back as temporary seems like a bug.

Berdir’s picture

As discussed before, that has been a logical thing to do for core because files were always second-class citizens, they only existed as attachments to other things, and when they were removed from there, they had to be automatically removed as there was no way to do it manually.

That has changed since the media and similar projects in 7.x contrib and partially also in 8.x core.

Thinking about it, this doesn't really change with media_entity in 8.x, because it's the media_entity that is the manually managed media element, an image on it's own is still a simple thing that you never interact with directly.. so kind of wondering if this is even a problem for 8.x?

Dave Reid’s picture

  1. +++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldRevisionTest.php
    @@ -115,11 +115,12 @@ function testRevisions() {
    +    // Call system_cron() to clean up the file. Make sure the changed timestamp
    

    Should this reference file_cron() now?

  2. +++ b/core/modules/file/lib/Drupal/file/Tests/FileFieldRevisionTest.php
    @@ -130,11 +131,12 @@ function testRevisions() {
    +    // Call system_cron() to clean up the file. Make sure the changed timestamp
    

    Same here.

  3. +++ b/core/modules/file/lib/Drupal/file/Tests/UsageTest.php
    @@ -123,24 +123,24 @@ function testRemoveUsage() {
    +        'changed' => $_SERVER['REQUEST_TIME'] - $this->container->get('config.factory')->get('system.file')->get('temporary_maximum_age') - 1,
    

    Should this be REQUEST_TIME?

  4. +++ b/core/modules/file/lib/Drupal/file/Tests/UsageTest.php
    @@ -148,17 +148,25 @@ function testTempFileCleanup() {
    +      ->fields(array('changed' => $_SERVER['REQUEST_TIME'] - $this->container->get('config.factory')->get('system.file')->get('temporary_maximum_age') - 1))
    

    I think this should be REQUEST_TIME as well.

+++ b/core/modules/system/lib/Drupal/system/Form/FileSystemForm.php
@@ -68,6 +68,17 @@ public function buildForm(array $form, array &$form_state) {
+      '#title' => t('Permanently delete orphaned files after'),

I'm going to end the bikeshed on the title and descriptions right here, right now.

#title = t('Delete orphaned files after')

#description = t('Orphaned files are not referenced from any content but remain in the file system and may appear in administrative listings. Warning: If enabled, orphaned files will be permanently deleted and may not be recoverable.')

@Berdir: We are still planning on porting File entity to D8, so yes we still need this in D8.

saltednut’s picture

FileSize
1.31 KB
15.64 KB

I've manually tested this doing the following:

With default settings (clean install after patch applied)

  1. Created an article attaching a jpg (hippo.jpg)
  2. Edited the database table file_managed and changed the UNIX timestamps to be older than 6 hours for created and changed.
  3. Deleted the article node.
  4. hippo.jpg then shows up as no longer in use (it is an orphan)
  5. Run cron - hippo.jpg is deleted from the file_managed table and is no longer in the local filesystem.

With modified settings (set "Permanently delete orphaned files after" to "Never")

  1. Created an article attaching hippo.jpg
  2. Edited the db table file_managed and changed the UNIX timestamps to be older than 6 hours for created and changed
  3. Deleted the article node.
  4. hippo.jpg shows up as no longer in use (it is an orphan)
  5. Run cron - hippo.jpg persists both in the file_managed table and is still in the local filesystem

One thing I noticed about the current patch was the text in the actual "File system" form - the text was rather confusing and a bit redundant. Talked about this with Dave Reid at NYC Camp and we agreed on some new text. See attached patch and interdiff.

saltednut’s picture

FileSize
2.57 KB
15.61 KB

The other cleanup mentioned in #239 is in this new patch.

saltednut’s picture

Dave Reid’s picture

@pwolanin makes a valid point about temporary status meaning two different things. I've filed #2239977: File temporary status is ambiguous in response which can happen after this issue.

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

I think this is finally ready.

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Awesome! This looks like it addresses the immediate problem, which is that there's no way outside of ugly hacks for media library-esque modules in contrib to keep files around for a reasonable amount of time. By making this both configurable and a variable it can be set to whatever timeframe works best for a site.

We discussed this at NYCCamp, and talked about for Drupal 8, a much better approach would be to make a separate delineation between a "temporary" file (meaning something that never truly got saved due to error, abandonment, etc.) versus an "un-used file" or whatever, meaning a file that was de-referenced, but may be used in the future. Looks like that will be addressed in #2239977: File temporary status is ambiguous. In the meantime though, this is a simple change that looks totally back-portable to me anyway, and we can build from it further in D8.

Committed and pushed to 8.x. Thanks! Moving down to 7.x for backport.

  • Commit 80f9e58 on 8.x by webchick:
    Issue #1399846 by slashrsm, Gábor Hojtsy, Berdir, brantwynn, visabhishek...
Dave Reid’s picture

Devin Carlson’s picture

Status: Patch (to be ported) » Needs review
FileSize
11.28 KB

A backport of #241 with a few minor changes:

  • Used file_temporary_maximum_age instead of temporary_maximum_age for the variable name as it isn't as obvious that it relates to files in D7 since you don't see that it is owned by system.file.
  • The DRUPAL_MAXIMUM_TEMP_FILE_AGE constant is left in for backwards compatibility and is used as the default value for file_temporary_maximum_age.
  • Only backported test changes that had a D7 equivalent.
  • Added an update function to accommodate the status column's description change.

Status: Needs review » Needs work

The last submitted patch, 248: cleanup-1399846_248.patch, failed testing.

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 248: cleanup-1399846_248.patch, failed testing.

tsvenson’s picture

Late to the party, but I am 100% in agreement with @Dave Reid in regards to comment #247

The last submitted patch, 241: cleanup-1399846_241.patch, failed testing.

garphy’s picture

Assigned: Unassigned » garphy
garphy’s picture

Status: Needs work » Needs review
FileSize
11.28 KB
621 bytes
    variable_set('file_temporary_maximum_age', DRUPAL_MAXIMUM_TEMP_FILE_AGE + 2);

2 seconds seems too short. I replaced with 60 seconds.

kmoll’s picture

FileSize
846 bytes
12.11 KB

In order for this to work we need to modify what happens when a file is removed from the field. If it is removed an there is no other usage the file is deleted automatically. We need to change that so that if there is no other usage the file status is set to temporary, then it can be cleaned up by cron on the interval, or not removed if it is configured to never delete. I have updated the patch in #256 with this change.

Status: Needs review » Needs work

The last submitted patch, 257: cleanup-1399846-257.patch, failed testing.

kmoll’s picture

FileSize
2.19 KB
14.3 KB

Had to update the tests to reflect the change in #258

kmoll’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 259: cleanup-1399846-258.patch, failed testing.

Dave Reid’s picture

+++ b/modules/file/tests/file.test
@@ -218,6 +218,14 @@ class FileFieldTestCase extends DrupalWebTestCase {
+    $this->assertTrue($file->status != FILE_STATUS_PERMANENT, $message);

Rather than relying on the $file object here, which may be stale, I think it might be better to query the database: $status = db_query("SELECT status FROM {file_managed} WHERE fid = :fid", array(':fid' => $file->fid))->fetchField(). I think that would resolve the test failures.

kmoll’s picture

FileSize
986 bytes
14.61 KB

I fixed the assert function to pull the status from the database and now the test pass.

kmoll’s picture

Status: Needs work » Needs review
Devin Carlson’s picture

The patch in #263 looks good and I've been using it on a number of D7 sites for about a month now without issue. Not marking as RTBC since I contributed to the backport in #248.

This would be great to get this into 7.34/7.35 as it benefits the 200,000+ sites with Media installed.

iMiksu’s picture

Assigned: garphy » Unassigned
Status: Needs review » Reviewed & tested by the community

I tested and seems to fix at least one of my issues, which is related to migration.

I have migration done with migrate.module and I was having one file field in destination which was not migratable field. Due to unmapped field and the hardcoded Drupal core cleanup, my migration updates was causing data loss during migration (removes the files).

I would vote for RTBC! Thanks!

iMiksu queued 263: cleanup-1399846-263.patch for re-testing.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 263: cleanup-1399846-263.patch, failed testing.

Devin Carlson’s picture

Status: Needs work » Reviewed & tested by the community

Random test failure.

mikeytown2’s picture

Last failure was on
"Autocomplete returns term aPRkCI9A after typing the first 3 letters."

Also just tested #263 and this sorta fixes the issue I had. File is still there but is now marked for deletion. #1483736-11: field_attach_update deletes file fields (content & file) In entity regardless of if they are included in the entity object. fixes the root cause; will be working on a file_usage sync module to add files removed from that table by accident.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs work

I tested this patch and am able to reproduce the security issue I mentioned earlier (see #74 and #83 above). I created a new issue for this in Drupal 8 (after verifying that the problem does exist there also), and also included instructions for how to reproduce the issue in Drupal 7 with this patch applied (#2461845: Private files that are no longer attached to an entity should not suddenly become accessible to people who couldn't see them before). Not really sure the best way to deal with this.

For Drupal 7 backport, what is the main goal here - do we really even care about making the time to delete configurable (e.g. 6 hours vs 12 hours), or do we mainly care about the difference between "delete right away" and "never delete"? If the latter, maybe what we actually would want is something like a simplified version of https://www.drupal.org/project/file_lock in core? That could introduce security issues too if done incorrectly, but at least it would be more of an opt-in thing, and whether or not there's actually a security issue would probably depend on how you're using the system.. so some sites might want to use it as long as the consequences are carefully explained.

David_Rothstein’s picture

pfrenssen’s picture

Status: Needs work » Needs review
Issue tags: -Needs backport to D7
FileSize
17.55 KB
3.11 KB
pfrenssen’s picture

Updating attribution.

Status: Needs review » Needs work

The last submitted patch, 275: 1399846-275.patch, failed testing.

pfrenssen’s picture

Status: Needs work » Needs review
FileSize
18.39 KB
853 bytes

Also backporting the hunk that will take care of this failure :)

marcvangend’s picture

Status: Needs review » Needs work
+++ b/modules/file/tests/file.test
@@ -855,13 +867,11 @@ class FileFieldRevisionTestCase extends FileFieldTestCase {
+    $this->assertFileIsTemporary($node_file_r3, 'Second file is now set to temprary after deleting third revision, since it is no longer being used by any other nodes.');

Minor typo: "temprary".

pfrenssen’s picture

Status: Needs work » Needs review
FileSize
18.39 KB
930 bytes

Thanks @marcvangend!

dgtlmoon’s picture

I'm arriving at this issue from #2058025: Missing files in forward revisions and #1084436: doesn't seem to handle file attachments at all, this patch seemed to arrive just in time and fix my issues, I can't really comment on the approach but as long as Drupal *never* deletes files that are marked as permanent I'm very happy

cathsens’s picture

The patch is not compatible with the release 7.40.
system_update_7080 is declared two times.
I propose a patch updated.

pfrenssen’s picture

Thanks @cathsens!

pfrenssen’s picture

Updating attribution.

cathsens’s picture

FileSize
18.2 KB

I updated the previous patch for 7.41 release : it's better to set the code in hook_update_7080, instead of incremented in hook_update_7081.

cathsens’s picture

FileSize
18.2 KB

@pfrenssen, I didn't seen your comment! Now I have a gap between comment number and file name!
I re-upload to correct it. ;)

pfrenssen’s picture

Status: Needs review » Needs work
+++ b/modules/system/system.install
@@ -3183,6 +3183,15 @@
 function system_update_7080() {
   $spec = array(
+    'A field indicating the status of the file. Two status are defined in core: temporary (0) and permanent (1). Temporary files older than file_temporary_maximum_age will be removed during a cron run.',
+    'type' => 'int',
+    'not null' => TRUE,
+    'default' => 0,
+    'size' => 'tiny',
+  );
+  db_change_field('file_managed', 'status', 'status', $spec);
+
+  $spec = array(

+++ b/modules/system/system.test
@@ -882,6 +889,40 @@ class CronRunTestCase extends DrupalWebTestCase {
   /**

We should not change an existing update hook, since then this code would not be executed on websites that are running Drupal 7.41. In fact the approach from patch #282 was correct.

cathsens’s picture

cathsens’s picture

Status: Needs work » Needs review

Needs review on patch 282

pfrenssen’s picture

FileSize
18.39 KB

Reuploading the patch from #282 to avoid confusion. People often download the latest patch in the queue without reading all comments.

ademarco’s picture

Status: Needs review » Reviewed & tested by the community

Patch #291 looks good, I've manually tested it by setting the "Delete orphaned files after" to "Never" and files are correctly kept in "file_managed" table. If "Delete orphaned files after" is set to a specific amount of time then files are correctly deleted when running cron, after the set time has elapsed.

marcoka’s picture

update doesnt run on drupal 7.38

pfrenssen’s picture

@marcoka, please check that you have no other patches applied to the file 'modules/system/system.install'. The update hook number seems to be correct.

marcoka’s picture

ok i applied the patch the wrong way sorry.

sam.spinoy@gmail.com’s picture

Applied patch, can confirm that it is working.

rickvug’s picture

Proposed resolution

We propose to make file cleanup optional by introducing a configuration variable and UI configuration interface that:
1. Makes people aware of cleanup being a feature of Drupal core
2. Allows them to enable/disable it

Am I the only one that thinks this is the wrong solution? On most any site that I build I actually want a mix of behaviours, not a sitewide boolean on file cleanup. On something like a profile field I want cleanup on the unused photo. Ideally that cleanup happens quickly but hey, disk is cheap. On the other hand, I may want all photos on an article or used in a Slider to be re-useable and stored until explicitly deleted.

For a solution how about an API addition (ie. this can make it into 8.1, 8.2 etc) that marks a file as cleanup safe (ie. re-usable). Files with this attribute would never be deleted on cleanup. They would only be explicitly deleted. Further, Media Browsers would use this filter to ensure that single use files such as profile fields didn't make it into the general media library.

azinck’s picture

@rickvug: I'm basically on the same page with you. Without a media browser solution in core, this ends up being a solution that's pretty specifically tailored to the needs of Media/File Entity rather than generalized for all use-cases.

I would rather see the file usage API fleshed out a bit more (particularly to allow tracking at the entity revision level). Maybe there's some other info we need to put in file usage for non-entity-based usage-tracking scenarios? But whatever the case, there's no reason that media browser solutions cannot solve this problem entirely on their own using an approach like what File Lock does. Alternatively, modules like Scald and Asset don't struggle with this because they have "wrapper" entities that lay claim to the files. The status-based approach being proposed in this issue feels to me just like a workaround to modules not being able to express themselves clearly enough using the usage API, though I'll admit I don't quite understand why Media doesn't just bake the File Lock approach into its own code.

At least with File Lock I can see "oh, this file has been claimed by File Lock" and understand roughly why it's not being deleted. If we just have a status property that doesn't shed any insight at all into what module set the status and why it may have been set.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review
diff --git a/modules/file/file.module b/modules/file/file.module
index ae452a6..13bff7f 100644
--- a/modules/file/file.module
+++ b/modules/file/file.module
@@ -139,6 +139,20 @@ function file_file_download($uri, $field_type = 'file') {
     return;
   }
 
+  // Find out if a temporary file is still used in the system.
+  if ($file->status != FILE_STATUS_PERMANENT) {
+    $usage = file_usage_list($file);
+    if (empty($usage) && $file->uid != $user->uid) {
+      // Deny access to temporary files without usage that are not owned by the
+      // same user. This prevents the security issue that a private file that
+      // was protected by field permissions becomes available after its usage
+      // was removed and before it is actually deleted from the file system.
+      // Modules that depend on this behavior should make the file permanent
+      // instead.
+      return -1;
+    }
+  }

Having this security feature in the File module does not look right to me, since the File module can be turned off if the site is no longer using it - in that case I think there would still be a security hole. (I checked Drupal 8 before posting this publicly, but it does not appear to be a problem there because in Drupal 8 you are prevented from ever turning off the File module if any files exist on the site.)

David_Rothstein’s picture

Am I the only one that thinks this is the wrong solution?
...
@rickvug: I'm basically on the same page with you. Without a media browser solution in core, this ends up being a solution that's pretty specifically tailored to the needs of Media/File Entity rather than generalized for all use-cases.

Overall I agree with these concerns. I think the approach here needs more discussion, especially before we put it in Drupal 7. (See also my earlier questions in #273 which are related.)

Basically from what I can tell, the approach in this patch (which also is the approach currently in Drupal 8) means:

  1. There is no way to configure your site to maintain the current Drupal 7 behavior of simply deleteing files immediately after they are orphaned (even though that's a very reasonable behavior for many sites).
  2. If you want to make your orphaned files stick around forever, you also have to make your abandoned files stick around forever (e.g. a file that was uploaded on the node form but then someone changed their mind and decided not to create the node). I can't think of any good reason these concepts should be tied together.

It looks like earlier in this issue some other ideas (e.g. a per-field checkbox for "don't delete files in this field when they are orphaned", or maybe a sitewide checkbox) were discussed, and maybe they were abandoned too soon... because those do seem more sensible overall.

I am supportive of the idea of trying to solve this in core (especially because the security issues surrounding leaving these files non-deleted are difficult to get right) but it doesn't feel like the approach is fully ironed out yet.

I'm not sure what the right next step is. Should people who are concerned about this create a new issue seeing if it's possible to change the way Drupal 8 does this currently, and then hold off backporting anything to Drupal 7 until that has had a little more time to play out?

azinck’s picture

I'm not sure what the right next step is. Should people who are concerned about this create a new issue seeing if it's possible to change the way Drupal 8 does this currently, and then hold off backporting anything to Drupal 7 until that has had a little more time to play out?

I remain unconvinced that *any* change needs to happen. Core's current behavior is reasonable since it has no reason to keep orphaned files.

Many sites do have potentially complex logic around which files to keep and which not to keep, and these use cases can be solved via modules. Some modules might infer intent from a variety of factors while others might do it explicitly (file_lock or Scald or Asset). You can even imagine a module which extends the file upload form to add a checkbox to "keep this file". But regardless of the business logic that drives the decision, if a module determines that it wants to prevent a file from being deleted it need only lay claim to it in file_usage.

So count me in with the others (sun, jcisio, cweagans, rooby, etc.) who've already made the same argument in this thread years ago. I'd argue that the current behavior works as designed.

dgtlmoon’s picture

I fear the thread has changed from talking about UNUSED files to ORPHANED files which are totally different things

Orphaned - Not joined to any revision of any field of any revision of any node

Unused - Not joined to the current published revision of a node, but could be part of a revision that is the future or past (drafts/workbench) of any node revision or field revision

I think we should stick to the original subject here, which is Unused Files

  • webchick committed 80f9e58 on 8.3.x
    Issue #1399846 by slashrsm, Gábor Hojtsy, Berdir, brantwynn, visabhishek...

  • webchick committed 80f9e58 on 8.3.x
    Issue #1399846 by slashrsm, Gábor Hojtsy, Berdir, brantwynn, visabhishek...
Gábor Hojtsy’s picture

Issue tags: -Media Initiative +D8Media

Swap media tags to the right one.

Pol’s picture

There's a collision with the new system_update_7081().

Here's an updated patch for Drupal 7.51.

Berdir’s picture

See #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary , we might want to consider to backport that issue instead of this, actually.

Pol’s picture

And also, renaming the system_update_7082 from this patch to system_update_7083 is not the right solution, it means that sites using this patch won't get the system_update_7082 from Drupal 7.51.

I'm going to check the issue you mentionned now.

DamienMcKenna’s picture

renaming the system_update_7082 from this patch to system_update_7083 is not the right solution,

Yes, it is, Drupal doesn't have another mechanism to handle this scenario, site builders just have to be aware of what they're doing.

Pol’s picture

Yes we should be aware and that's why I raised the issue.

I think that each hook_update_N() should be named by incrementing N by 10 instead of 1.

This could also be a way to verify if a schema has been altered or not:

$schema_altered = ($schema_version % 10 == 0) ? FALSE : TRUE;

Of course, this won't solve the problem but it will give these patches a way to live in our makefiles just the time it gets into core if they are eligible to.

Here's the reroll of the patch without the system_update_7082(), like you said, it's up to the site builders to be aware that they have to add it to their custom module.

Here's the content of the system_update_7082() from this patch:

 /**
 * Update the description of the 'status' column in {file_managed} to
 * accommodate the introduction of a configurable maximum age for temporary
 * files.
 */
function system_update_7082() {
  $spec = array(
    'A field indicating the status of the file. Two status are defined in core: temporary (0) and permanent (1). Temporary files older than ile_temporary_maximum_age will be removed during a cron run.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
    'size' => 'tiny',
  );
  db_change_field('file_managed', 'status', 'status', $spec);
}

Status: Needs review » Needs work

The last submitted patch, 310: cleanup-files-1399846-306.patch, failed testing.

Pol’s picture

Status: Needs work » Needs review

Hi,

Following the recommandations of @Berdir, I've created a new patch based on the patch from D8.

Feedbacks are welcome.

  • webchick committed 80f9e58 on 8.4.x
    Issue #1399846 by slashrsm, Gábor Hojtsy, Berdir, brantwynn, visabhishek...

  • webchick committed 80f9e58 on 8.4.x
    Issue #1399846 by slashrsm, Gábor Hojtsy, Berdir, brantwynn, visabhishek...
jyraya’s picture

The last available patch cannot be applied with 7.56. My team and me tried on several environment without success.

I post then an update of it applicable on 7.56.

Now, what is missing to close this issue once and for all?

zeyus’s picture

Just an update...there's an issue on deleting files that are somehow referenced from a field that no longer exist.

Thought it was a core issue so I raised it here. https://www.drupal.org/node/2903556#comment-12228925\

Here's an updated patch file.

NWOM’s picture

Does #316 also handle deleting orphaned file folders? According to the Filefield Paths maintainer, this would be a core issue: #1483702: Empty folders not removed.

antims’s picture

Drupal often unexpectedly deleted some images I uploaded with ckeditor IMCE module.

poker10’s picture

Version: 7.x-dev » 8.0.x-dev
Status: Needs review » Fixed

Thanks everyone for the discussion about the possible D7 solution.

I think this should be a Won't fix for D7 (and therefore I am going to close this as Fixed for D8). There were multiple concerns raised in comments #3, #26, #297, #298, #300, etc.., that it would not be a good idea to backport this to D7. Let's summarize the most important concerns here:

  • D8/9/10 has a Media module in core, so this change (together with #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary ) was a step to support a correct media workflow - D7 on the other side does not have Media module built-in and this patch would therefore introduce changes practically only for one contrib module, which is questionable
  • The patch introduces a big behavioral change from "delete all files" to the opposite
  • If we implement this, it would not be possible for sites to keep the current behavior (for BC) - files deleted from fields are currectly deleted immediately, if they are no longer referenced
  • If you want to make your orphaned files stick around forever, you also have to make your abandoned files stick around forever
  • This can be solved by contrib modules right now, see https://www.drupal.org/project/file_lock
  • This can be solved by the Media module itself if needed (everything that is needed is to maintain a single file usage)
  • If we implement this and you are left with some unused/orphaned files, you will be not able to delete them anymore

Looking at the points above it clearly seems to me that it is not a good idea to make such change and especially in this D7 phase. Regarding what @Berdir proposed in #307 (#2801777: Give users the option to prevent drupal from automatically marking unused files as temporary ), I think it could be a more feasible approach if we would like to do something with this in D7. But as mentioned, there is already a File Lock module doing the work..

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.