Problem/Motivation

Currently, file_delete() has both verifications that the $file->uri is valid and that there are no remaining references to it (this check can be overridden with an optional $force parameter). Both those checks and the additional argument aren't compatible with the delete() interface provided by the Entity system.

Proposed resolution

Initially, the discussion centered around simply removing those checks. However, there have been arguments raised that this makes the already complicated file API harder to use. Various solutions have been discussed and proposed, including keeping some of the checks and aborting silently, throwing exceptions or providing helper/wrapper functions that do these checks.

The current proposal from #45 changes the existing file_usage_add()/file_usage_delete() functions to automatically handle the file temporary/permament status, which makes manual status changes and file deletions in most cases unecessary. Implementing modules only need to make sure to properly add and remove their usage to/from the uploaded files. It is still possible to explicitly call flle_delete() but modules which do so are expected to understand the consequences and do the necessary checks on their own. One such example is system_cron(), which deletes temporary files.

Remaining tasks

We need an agreement on the proposed solution and finish the patch, which means identifying and implementing necessary changes to the api documentation to explain the proper usage.

It also nether to be discussed if the file_usage_*() functions should be renamed to file_managed_*() and if that should happen in this issue or a follow-up.

User interface changes

None.

API changes

Modules relying on the file usage API do no longer need to manually change the file status or call file_delete(). Modules which use files without relying on file usage need to add the necessary checks. It is strongly recommended to rely on the file usage API.

Original report by [Dave Reid]

Spun off from #1361226: Make the file entity a classed object it was pointed out that we really should not have "custom" logic in file_delete() checking if the file is used or not. That should be the responsibility of the code that calls file_delete().

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

exlin’s picture

Removed logic and unnecessary $force parameter, but everything other is intact.

exlin’s picture

Status: Active » Needs review
FileSize
1.2 KB

new try

Status: Needs review » Needs work

The last submitted patch, RemoveCustomUsageCheck-1401558-1.patch, failed testing.

exlin’s picture

Status: Needs work » Needs review
FileSize
1.89 KB

Modified simpletest, expecting less errors.

Status: Needs review » Needs work

The last submitted patch, RemoveCustomUsageCheck-1401558-4.patch, failed testing.

exlin’s picture

Status: Needs work » Needs review
FileSize
3.81 KB

More modifications to tests, i believe removed lines are not needed anymore since responsibility about checking is moved to function calling file_delete().

Should pass this time.

Status: Needs review » Needs work

The last submitted patch, RemoveCustomUsageCheck-1401558-6.patch, failed testing.

exlin’s picture

Status: Needs work » Needs review
FileSize
3.9 KB
exlin’s picture

Removed one part of test

Status: Needs review » Needs work

The last submitted patch, RemoveCustomUsageCheck-1401558-9.patch, failed testing.

exlin’s picture

Status: Needs work » Needs review
FileSize
4.74 KB

Status: Needs review » Needs work

The last submitted patch, RemoveCustomUsageCheck-1401558-11.patch, failed testing.

exlin’s picture

Status: Needs work » Needs review
FileSize
4.66 KB
exlin’s picture

Comment // documentation for file_delete

Status: Needs review » Needs work
Issue tags: -Media Initiative

The last submitted patch, RemoveCustomUsageCheck-1401558-14.patch, failed testing.

cosmicdreams’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: +Media Initiative

The last submitted patch, RemoveCustomUsageCheck-1401558-14.patch, failed testing.

Berdir’s picture

There have been too many test assertions removed, the separate test method can probably be removed but the test assertions need to stay, the code there is currently wrong.

We can't just remove the check and continue calling file_delete() like we did before :)

I think there are multiple possible ways to continue here:

1. Require all callers of file_delete() to manually check the file usage.
2. Automatically call file_delete() if the usage count of a file goes zero.
3. Provide a file_delete_if_unused($file) function or something like that.

I kinda like the thought of the second option, saving does happen automatically for managed files, you just need to add your usage so it would make sense to have the file deleted once it's unused?

Having to do the check without providing some sort of help (2 or 3) would imho most certainly lead to bugs.

Might also be possible to do that in system_cron(), but we'd need a changed timestamp or something that get's updated when file usage changes (or on file usage itself) to prevent deleting files which shouldn't be (think of migrating file usage from nodes to something else and then cron runs in the middle of the process or something like that) so that would be quite a bit more complicated I guess.

Thougts?

Berdir’s picture

Status: Needs work » Needs review
FileSize
5.34 KB

The attached patch reverts the test changes and implemented suggestion 2 from my post above. I think this is quite a nice solution.

Yet another alternative would be to mark these files as temporary instead of deleting and let system_cron() deal with them.

I also converted the testInUse() to testUnusedFiles() to verify that the autodeletion works as expected by just removing the file_delete() calls.

Berdir’s picture

Note that there's more to deal with in file_delete().

The current entity_delete() function does not have a return value at all. file_delete() currently still has the ability to return FALSE if the file was not found. That's for example used in system_cron() (although that's kinda strange because we end up with *two* watchdog entries).

So, here's what we can do:

- Change entity_delete() to support a return value
- Throw an exception instead.
- Don't delete, add a watchdog entry, return nothing

Suggestions?

Dave Reid’s picture

Issue tags: +sprint
Berdir’s picture

Priority: Normal » Major

Raising to major, since this is blocking #1361226: Make the file entity a classed object which is major and therefore also the criticial #1368394: Convert all core entities to classed objects.

Reviews, plz? :)

chx’s picture

Status: Needs review » Needs work

Previously file_delete was a lie because it was file_delete_or_maybe_not but now file_usage_delete lies because it is file_usage_and_perhaps_file_too_delete.

aspilicious’s picture

Ok each function should have a clear task:

file_usage: provide info about the usage of the file
file_delete: just kill the file

To protect people from doing stupid things we can make a superwrappers that combine those.
Either way, doc blocks should contain clear examples and have to pass the "aspilicious" test ;)

ps: aspilicious test ==> "If it's understandable by aspilicious it's ok"

Berdir’s picture

Status: Needs work » Needs review
FileSize
6.62 KB

Ok, sounds like we have a plan, here we go.

- file_delete() is now a no-questions-asked-delete-whatever-happens.
- There is now a fle_delete_unused(), which does a file usage check before calling delete.
- Updated the file.filed to use the new function and the tests as well.
- Updated system_cron() to check file existence itself before deleting. I'm not sure about that stuff, as I don't see how a file that happened to not exist anymore should suddenly re-appear and then delete would work. Sounds like this would only fill up the watchdog table.

Feedback?

Berdir’s picture

Note that the file_valid_uri() stuff comes originally from #874326: Invalid scheme fails to halt processing of a stream request. We need to check what exactly the reasons for adding it were I think.

Berdir’s picture

chx’s picture

Yes exceptions would be better but until then please keep the sanity checks in place. It's really ugh if you dont get any warnings cos you try to delete something invalid.

Dave Reid’s picture

chx: Then we should prevent an invalid file from being added to {file_managed} in the first place right? We shouldn't care on delete. It should just delete. We don't run validation if the node is valid in node_delete().

Berdir’s picture

@chx: file_delete() is a API function and should not print messages using drupal_set_message(). And throwing an exception is red and scary. The patch continues to log a watchdog exception but *does* delete the file_managed entry. Unless there are cases where a file uri could suddenly become valid at a later point, we can't do anything with it at a later point anyway, so why not just delete it?

If we want checks before deleting a file in some cases, they should be outside of file_delete() and called directly by e.g. the form submit function, which can re-act better to this.

chx’s picture

So there are two known problems: a) the original issue wanted to allow only Drupal-defined streamwrappers . Whether it's adequate to do this on creation time can be debated b) the reason I was against removing the checks is -- what happens if you delete an URI which schema module is disabled?? The file is gone from the DB but the storage got no chance to react to this.

rfay’s picture

So the reason for doing a file_valid_uri() in file_delete() in #874326: Invalid scheme fails to halt processing of a stream request was:

The design decision was that we would not accept non-Drupal streamwrappers as limited in file_valid_uri(). If you accept any PHP streamwrapper, then bare filepaths are acceptable (because the PHP default streamwrapper is file://) as are a number of other streamwrappers.

I didn't really agree with this; My opinion was that we should accept any valid streamwrapper, but probably exclude bare paths. But the decision was to run it through file_valid_uri(), which is (as I remember) which bare paths are unusable in D7.

xjm’s picture

Priority: Major » Critical

This is apparently blocking #1361226: Make the file entity a classed object, which is the last issue to close out #1368394: Convert all core entities to classed objects. So, as much as I hate to do this when all our other thresholds look so fantastic, I think this needs to be critical.

Berdir’s picture

Ok, so...

Attempting to delete a file that belongs to a disabled stream wrapper is a valid point. However, what currently happens is that system_cron() tries to delete such temporary files that don't exist anymore on every cron run, resulting in 2-3 watchdog messages each time. And drupal_set_messages()...

I think we agree that the file reference stuff is ok to be removed and checked in the parent function. For file_valid_uri(), we have a bunch of options:

1. Keep the check, abort silently
2. Keep the check, change entity_delete() and similar functions to allow a return FALSE
3. Keep the check, throw an exception. That will need to be handled by all functions which are currently calling file_delete() to avoid failing hard.
4. Delete check, delete managed file row (Delete 'em all!)
5. Delete check, make calling code responsible for checking file_valid_uri() and let them decide.
6. Differentiate between "invalid stream wrapper" (throw exception) and non-existing files (allow delete)

What should we do and why?

chx’s picture

I love 6. That's what we should do, it's the closest mapping of reality to the situation. If the stream wrapper is not there then we have no sane way to continue, it's an exception but if the file gone away, go ahead and clean up.

1.: Aborting silently is never a good thing. 2.: that's not really helpful and out of scope. 3.: 6. is better 4.: NOPE. 5.: uh of course, pass the buck? NOPE.

Dave Reid’s picture

Yeah 6 seems best. Throw an exception on invalid stream wrapper (with a try/catch like in node_save()). Watchdog only on file no longer exists but allow it to continue without error.

sun’s picture

Please think of innocent coders and module developers.

We want a smart File API that performs actions intelligently. file_some_cryptic_name_delete() is the opposite of smart and intelligent.

As a module developer, I don't care whether File API's file_delete() lies to me as long as it performs a safe operation. If it only deletes usage but not the actual file, then that is a smart behavior. If my code doesn't account for possible file usage by other modules, then my code doesn't account for a modular system and thus is broken and I better fix it.

For internal usage and/or special cases, we can introduce a file_force_delete() or whatever is necessary, but pretty please think of the average API consumer, who just wants to save or delete a file. Exposing all the file usage stuff to them is inane and makes the API unnecessarily hard to use. Requiring API consumers to separately deal with file usage records, and even more ridiculous temporary/permanent flags, led to serious data loss bugs all over the place in core and contrib already. Caused by a File API that is inanely hard to learn and to get right for the 80% use-case. Make it smart.

Dave Reid’s picture

I'd rather not have a wrapper that checks usage and deletes or forces file deletion. If we provide the proper API then it's easy enough to just "use":

if ($file->isUsed()) {
  // Sorry, can't delete the file.
}
else {
  $file->delete();
}
Berdir’s picture

Ok, so discussed this with sun in irc and he is basically for keeping it as it is now which means keeping all three checks in place. However, what we can not keep are the return values, these have to go to be compatible with entity_delete().

His argument is that we are making the primary File API harder to use. In my opinion the real reason for that being hard to use is that there are in fact two parallel primary File API's and you need to use both of them.

Let's say the 80% use case is displaying a file managed widget, adding a file usage to that file when one is uploaded and at some point later, remove file usage again and delete the file. Right now, both things require *two* API calls. Although create is managed by the managed file API, you still need to make it permament with a file_save() call *and* call file_usage_add(). And to delete a file, you need remove the file usage *and* call file_delete(). Why?

We could say that for *managed* files, the primary API is file_usage_*(). That would mean that file_usage_add() would make a file status permament and removing all usages would make it temporary again, which in turn would then again mean that it will be deleted on system_cron(). Then we could safely assume that everyone who is messing with file status and/or file_delete() directly needs to know what he is doing. Like system_cron() for example, which *is* doing it correctly and checks references explicitly. And we can remove those checks.

That, however, is not within the scope of this issue. The goal of this issue is to get rid of those additional arguments and return values so that we can convert the file API to the entity system.

sun’s picture

FWIW, if you guys insist on file_load(), file_save(), file_delete() being pure file entity API functions, then I'd be fine with introducing separate file_managed_load(), file_managed_save(), and file_managed_delete() functions, which provide a consistent API for developers working with files in their modules and have all the nitty gritty details about usage and status inlined. In a nutshell, we'd replace most of the current file_*() calls in modules with file_managed_*() calls. (OO equivalents might be ->managedDelete())

Crell’s picture

I would be OK with #40 or similar. Functions/methods that "smartly" do what they say on the tin "and oh yeah this other thing too unless situation Y which we think you probably want" are a sure-fire way to produce an API that no one can use, because we almost always guess wrong.

Having a wrapping function/method that does a file_I'm_Done_With_This_So_Delete_If_You_Want() that internally calls file_delete() is fine, and is the correct way to do it. But the raw "I said delete and that means delete and nothing else, kthxbye" operation must be available in an obvious way.

Berdir’s picture

Mh, I'm not sure how file_managed_*() functions would help. First, I fail to see how they could replace the file_usage_*() functions so we end up with 3 API's instead of 2. Second, I have no idea what file_managed_save/load() would do differently than file_save() and file_load().

We already have a managed file API imho, the file_usage_* functions.

Kevin Morse’s picture

As a relatively new to Drupal user (< 1 year) who just started developing my first module a month ago I have this to offer:

The less lines of code that are required the better - especially to new developers who, if they are anything like me, are having a hard enough time trying to figure out how to "code in Drupal" as it is.

Every additional step that's required to do something simple (add or delete a file) is another place for me to make a mistake that could wreck something. Its also more code to maintain and makes understanding the code more difficult for newish users looking to get more involved. So far I haven't needed to work with the File API but seeing all this talk about the several hoops one has to jump through to work with files is not making me want to start anytime soon.

My, completely ignorant, preference would be something like the following (note I have NO experience working with this API):

  • file_load => Load the file and register it as being used.
  • file_save => Save the file and make sure its registered as being used.
  • file_safe_delete or file_unload => Should unregister the file and delete it if nothing else is using it or just leave the file there and let cron delete it.
  • file delete => Exactly what its name suggests. It needs to be STRONGLY suggested that the "safe version" should be used by everyone unless they absolutely require the "unsafe" version. Also mentions that this function is safe to call need to be axed from everywhere.

Are there any cases where you would want to mark a file as being unused but not want it to be deleted? If no, then file_usage_delete and the file_unload suggested above could be the same thing.

I assume file_unmanaged functions are for working with files and not marking them as being used? If that's the case then could the suggested file_load not also replace file_usage_add and the file_unload not also replace file_usage_delete?

Sorry if this post isn't helpful. If that's the case, feel free to disregard it! Hopefully though, it will provide at least a small bit of insight into how the uneducated masses see things.

Berdir’s picture

@Kevin Morse: Thanks for your feedback, but some of that doesn't make much sense to me. file_load() should just load the file, it shouldn't save or register anything. file_load() is called every time the file is displayed, not when it's uploaded.

These are the three API's that we have:

- file_unmanaged_*: Directly working with files on the file system, low-level API.

- file_*: Working with file entities, higher level API. Only necessary parts when using the managed file widget are saving as persistent and deleting.

- file_usage_*: Working with file usages. IMHO, it would make sense if these would become the actual "managed file" API. As mentioned above, the only thing that needs to be changed is adding file usages should make the file persistent (system_cron() already pretty much does that by not deleting temporary files with usages) and making them temporary when all usages are removed. We *could* rename them to file_managed_* although I have no clue what a file_managed_load() would do different than a file_load(). But it could be a simple wrapper, so that you always can use file_managed_* functions and we can simply say "Do not use file_* functions unless you know what you are doing. And no, you probably don't know!".

Just not sure why the table is called file_managed, as the API functions are not, nor is the entity :)

Not sure how to proceed here. Do we need to have that file_usage/managed_* functions in place to be able to remove the checks in this issue, should we introduce/change those functions here or do it as a follow-up so that we can proceed? This is currently a critical task and is block another one...

Berdir’s picture

Here's some example code to show what I mean..

- file_usage_add() marks a file as permanent. This removes the need for file_field_presave().
- file_usage_delete() marks a file as temporary if all usages are removed. We could alternatively also immediately delete it like one of my previous patches. This removes the need for file.module 's custom file field delete wrapper function.
- The only things a module that uses a managed file now needs to care about is adding and removing usages. Everything else is "managed" automatically.
- I haven't renamed any of the functions for now to keep things simple. We could now rename the add/remove functions to file_managed_add/remove and also introduce a file_managed_load wrapper for file_load().

The problem with the temporary is that files will only be deleted after 6h+ when cron runs, so the existing tests that check for deleted files need some tweaking. I've updated FileDeleteTest::testInUse() and FileFieldRevisionTestCase but others will probably fail.

The best way to see how this affects modules which are using managed files is looking at the changes in file.field.inc.

catch’s picture

+  // Make sure that a used file is not permament.
+  if ($file->status != FILE_STATUS_PERMANENT) {
+    $file->status = FILE_STATUS_PERMANENT;
+    file_save($file);
+  }

Isn't this ensuring that a used file is permanent if it's not already?

Patch looks good to me in general otherwise.

Berdir’s picture

Yes, sure, the comment is the wrong way round.

aspilicious’s picture

Ok besides of the comment is something wrong with this?
It's important to get this reviewed fast and good so we can unblock the entity issues.

Thnk you Berdir for all your work on this!

catch’s picture

Yeah I think this is a good way to go, and it's great we can remove entire hook implementations from file module. Not sure on immediately deleting the file vs. waiting the six hours, renaming to file_managed_*() also works for me, could be here or a follow-up.

tim.plunkett’s picture

FileSize
16.46 KB

Fixed that comment, since there was no other actionable feedback.

-  // Make sure that a used file is not permament.
+  // Make sure that a used file is permanent.

s/not permament/permanent

cfennell’s picture

I just tested this manually and was able to add a file field, upload a .txt file, attach it to a node and remove it from the node. I can't seem to get simpletest to work (for anything), so I can't run the file API tests, but the above testbot is happy.

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

After doing a review of the new APIs and direction, I'm happy with #50. I've also had several people do UI/manual testing for this at the Twin Cities DrupalCamp sprint and we encountered no problems.

chx’s picture

My heart sings at the beauty of this patch. Revocable delete, simplified APIs, so so very nice.

Berdir’s picture

Yay, thanks for the RTBC. Might not be quite there yet, though. While working on updating the file entity patch based on these changes, I found a few issues that might need to be merged into this:

- user picture stuff. Will however be replaced with a filed soon.
- Some documentation references, e.g. in file_usage_add() and file_save_upload(). Also not sure if we have a (managed) file api overview docblock/group somewhere that might need to be updated (If we don't, then we might want to add one in another issue)

Berdir’s picture

Ok, here is a small re-roll. Documentation changes only, I did not touch the user picture stuff, I suggest to deal with that in #1292470: Convert user pictures to Image Field as that will remove the relevant code pieces anyway.

Leaving at RTBC, attaching an interdiff with the changes.

aspilicious’s picture

Yeah this is ok

catch’s picture

Title: Remove the usage handling logic from file_delete() » Change notice for: Remove the usage handling logic from file_delete()
Status: Reviewed & tested by the community » Active

Yeah this is great, also 7 files changed, 84 insertions(+), 125 deletions(-).

Committed/pushed to 8.x, thanks!

Will need a change notice.

Berdir’s picture

Status: Active » Needs review

Ok, here is a change notice: http://drupal.org/node/1593970

aspilicious’s picture

Looks good to me

sun’s picture

Status: Needs review » Needs work
+++ b/core/includes/file.inc
@@ -731,14 +728,17 @@ function file_usage_add(stdClass $file, $module, $type, $id, $count = 1) {
+  // Make sure that a used file is not permament.
+  if ($file->status != FILE_STATUS_PERMANENT) {
+    $file->status = FILE_STATUS_PERMANENT;

This comment still says the opposite of what the code is doing. ;)

Otherwise, lovely patch. Thanks for all of your hard work on this!

Berdir’s picture

Status: Needs work » Needs review
FileSize
550 bytes

Uh sorry, I forgot to include that fix into my patch.

sun’s picture

Status: Needs review » Reviewed & tested by the community
catch’s picture

Title: Change notice for: Remove the usage handling logic from file_delete() » Remove the usage handling logic from file_delete()
Status: Reviewed & tested by the community » Fixed

Heh well spotted sun. Committed/pushed the followup. Change notice looks good to me, so marking fixed.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.

David_Rothstein’s picture