Handling overwriting of managed files (with unittests!)
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | file system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | closed |
| Issue tags: | D7FileAPIWishlist |
While trying to move the user.module's user picture into the managed file framework I came across an oversight in the Files API. When you use FILE_EXISTS_REPLACE to overwrite an existing, managed, file using file_copy(), file_move() or file_save_data() the target file's database record isn't affected.
The case of file_copy() is relatively straightforward to remedy, if replace was specified just look for an existing file with the resulting path:
<?php
$file = clone $source;
$file->fid = NULL;
$file->filename = basename($filepath);
$file->filepath = $filepath;
?>becomes:
<?php
$file = clone $source;
if ($replace == FILE_EXISTS_REPLACE && ($existing = file_load(array('filepath' => $filepath)))) {
$file->fid = $existing->fid;
}
else {
$file->fid = NULL;
}
$file->filepath = $filepath;
?>And logically this makes sense, you start with two files--the source and target--and end with two files by changing the contents of the later. hook_update get fired and modules know to update themselves accordingly.
file_save_data() is also very simple we just try to load the resulting file path and if a file is found use its file id.
file_move() is a bit trickier. You're starting with two files--the source and target--but should only have one at the end. The filesystem bit is straightforward--just overwrite the file. The question really boils down to which file is deleted and which updated? Modules need to be informed that one file has been updated and one deleted.
I think that the most common case for using file_move() with overwriting is to replace an existing file with a new upload. In this case the best course of action is to load the destination record and update that, then after hook_file_move() is invoked call a soft delete (file_delete($source, FALSE)) so that the original is removed. That way any modules using the destination file won't be disturbed.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| FILE_COPY_MOVE_FIXED.PATCH | 30.23 KB | Ignored | None | None |

#1
#2
re-rolled from CVS rather than git.
#3
The last submitted patch failed testing.
#4
i think this got caught by the testbot melt down.
#5
actually that last one had some of #330633: Temporary file cleanup needs some love (UnitTest included!) mixed in.
#6
drewish encouraged me to review this. I am not an expert in the file API, but here are some comments:
The patch modifies the behaviour of file_move() and file_copy() wrt
{files}.filenameof the destination file. Previouslyfilenamewas always set tobasename(filepath), i.e. it included the _nn suffix in case of overlapping filenames. This is no longer the case - and that is probably a good thing (afaik the reason to save bothfilenameandfilepathis to allow multiple files to have the same "public" name). However, file_save() isn't updated to the new behaviour - I suggest changing that too.When file_move() is used to rename a file (i.e. $destination is a filename rather than a directory), filename still reflects the old name. Or should destination always be a directory? It says so in the Doxygen comment, but in the testNormal() unit test $desired_filepath is a filepath including filename.
I think $file->filename should be checked in the unit tests as well.
I suggest renaming $delete_original to $delete_source (to reflect that it refers to $source).
+ $this->assertEqual($a->fid, $b->fid, t("File's id is the same."), 'File Unchanged');Is that proper English? The same as what? I suggest rephrasing to "The files' ids are identical" or "File ids are identical".
#7
c960657, i think you're being modest about your experience with the files api.
actually the filename has (at least since D5) been the name the file was uploaded with--rather than just the basename--for exactly the reasons you've suggested. it was really more of a shortcoming in the first versions of file_move() and file_copy() that it wasn't preserved during the operation. not sure what you mean about file_save() not being updated... it should save what ever filename is provided.
Yeah that was a docs bug, it's actually always let you specify the filename, it's just not been documented. I've fixed that for file_copy(), file_move(), file_unmanaged_copy(), and file_unmanaged_moved().
I'm not quite sure where you mean for that.
Good call, changed that and fixed the comment above the file_delete() to explain that it's not a forced delete.
Yeah, I can see your point... I thinking since it's comparing the same file it's one file's id... I've changed the variable names from $a and $b to $before and $after to make that clearer and removed the possessive bit so it's now:
<?php$this->assertEqual($before->fid, $after->fid, t("File id is the same."), 'File Unchanged');
?>
#8
:-)
Sorry, I meant file_save_data(). The following adds two rows with the filename column set to foo.txt and foo_0.txt, respectively. They should both be foo.txt.
file_save_data('lorem', 'foo.txt'); file_save_data('ipsum', 'foo.txt');I suggest expanding the tests so that the contents of the filename column is also verified, e.g. making sure that the above example generates the expected result.
I'll take a closer look at this later.
#9
The patch changes the behaviour of {files}.filename, but it still doesn't always match what I would expect. Here are some examples. Do you agree that these should be changed?
(In the following, "Expected/actual filenames" refer to the expected and actual values of the {files}.filename column for each of the mentioned files. All examples should be run with an empty files directory.)
When copying or moving to a complete file path, I think {files}.filename should reflect basename($destination):
<?phpfile_save_data('foo', 'foo.txt');
file_save_data('bar', 'bar.txt');
$foo = file_load(array('filepath' => file_directory_path() . '/foo.txt'));
file_copy($foo, 'bar.txt', FILE_EXISTS_RENAME);
?>
Actual filenames without patch: bar_0.txt bar.txt foo_0.txt
Actual filenames with patch: foo.txt bar.txt foo_0.txt
<?phpfile_save_data('foo', 'foo.txt');
file_save_data('bar', 'bar.txt');
$foo = file_load(array('filepath' => file_directory_path() . '/foo.txt'));
var_dump(file_move($foo, 'bar.txt', FILE_EXISTS_RENAME));
?>
Actual filenames without patch: bar_0.txt bar.txt
Actual filenames with patch: foo.txt bar.txt
When saving a file with a specified name, I think this should be saved in {files}.filename, even though the file is renamed to something else in the files directory.
<?phpfile_save_data('lorem', 'foo.txt', FILE_EXISTS_RENAME);
file_save_data('ipsum', 'foo.txt', FILE_EXISTS_RENAME);
?>
Actual filenames with and without patch: foo.txt foo_0.txt
It looks like these changes weren't included in the patch.
+ $this->assertEqual($before->fid, $after->fid, t("File id is the same."), 'File Unchanged');Use single quotes instead of double quotes.
+ function assertFileUnchanged($before, $after) {+ $this->assertEqual($a->fid, $b->fid, t("Files have the same ids %a-fid == %b-fid.", array('%a-fid' => $a->fid, '%b-fid' => $b->fid)), 'Same File');This message seems very elaborate (includes the actual fids) compared to most others in the bug. A left-over from debugging?+ /**+ * Implementation of getInfo().
+ */
+ 'description' => t('Filename munging and unmunging tests.'),+ function setUp() {+ $this->bad_extension = 'php';
+ $this->name = $this->randomName() . '.' . $this->bad_extension . '.txt';
+ }
In the munging tests, how about asserting the actual munged name, e.g. like this:
$name = $this->randomName();$bad_name = $name . '.' . $bad_extension . '.txt';
$munged_name = file_munge_filename($bad_name, '');
$this->assertEqual($munged_name, $name . '._' . $bad_extension . '.txt');
#10
c960657, thanks, first, for such a through review, and second for so patiently explaining the issue. you're absolutely right that i'd over looked aspects of the filenames.
in the case of file_save_data() i think it'd make sense to use the basename of their original destination filename. i think we're in agreement on this point.
with file_move() and file_copy() i'd like to give it some thought before i commit to anything. i'll be on the plane all day tomorrow so i'll have some time to try to flush out a fix for this.
#11
here's a re-roll that gets most of c960657's issues with the tests and other nitpicking points. doesn't hit the higherlevel filename issues.
#12
okay did some work on this to try to address the filenames in a sane, consistent manner. this looks at the replacement mode and if the destination was a a file (rather than a directory) and if it's a renaming around an existing file then it uses that for the filename.
i ran out of time before i could get the copy and move tests fully updated so this might need a bit of work but i'd say it's ready for a review in other respects.
#13
Adding to my review queue. Thanks drewish.
#14
+ // If we're replacing an existing file re-use it's database record."it's" should be "its" (in several comments).
file_copy() contains the following (some lines omitted):
$file->filename = basename($filepath);if ($replace == FILE_EXISTS_REPLACE && ($existing = file_load(array('filepath' => $filepath)))) {
$file->filename = $existing->filename;
}
else if ($replace == FILE_EXISTS_RENAME && is_file($destination)) {
$file->filename = basename($destination);
}
+ // Now exampine what actually made it into the database.I didn't look too closely at the tests yet.
#15
addressed all of c960657's issues
#16
The last submitted patch failed testing.
#17
i think the test was run on pgsql database and was failing for other reasons.
#18
Would be good to get another review from, c960657. The patch introduces some additional complexity, but it seems useful. It matches the behavior of 'cp' and 'mv' on Linux systems, which is what I think we should aim for.
#19
AFAICT none of the three examples in #9 yield the "expected" result - is that intended? It looks like $destination can be a path relative to either DRUPAL_ROOT or file_directory_path. The patch only handles the former (I think - I haven't looked to closely). I am not sure whether the latter is expected to work. The documentation isn't clear on how relative paths are resolved.
In the file_move() tests, you may want to verify that $source->filepath no longer exists.
function testNormal() {...
- $this->assertTrue(file_exists($source_file->filepath), t('The original file still exists.'));
- $this->assertTrue(file_exists($file->filepath), t('The copied file exists.'));
+ $this->assertNotEqual($source->fid, $result->fid, t("A new file id was created."));
+ $this->assertNotEqual($source->filepath, $result->filepath, t("A new filepath was created."));
+ $this->assertFalse($result, t("File move failed."));Should use single quotes (I'm not sure whether this is actually required by the "official" coding conventions?). Several occurences.
+ // Verify that was was returned is what's in the database.One was was one too many :-) Several occurences.
+ // Ensure that the existing file wasn't ovewritten.
"ovewritten"
#20
c960657, it would be helpful if you could rephrase what you think the expected vs actual are from #9, you list several files but it's not clear which is which. at this point i feel that the names are correctly handled. what don't you like?
in regards to the directories, IMHO we shouldn't be modifying any files outside drupal's files directory but we definitely don't test them. and due to the permissions issues i don't think it's realistic to try.
FileMoveTest should be checking that the files are removed. testExistingRename(), testExistingReplace() and testNormal() all call:
<?php$this->assertFalse(file_exists($source->filepath));
?>
In testExistingError() and testExistingReplaceSelf() the file isn't removed.
Fixed the typos pointed out.
#21
I agree. My point is that both
file_save_data('lorem', 'asdf.txt')andfile_save_data('lorem', file_directory_path() . '/asdf.txt')are equivalent, i.e. they both create a file in the files directory.<?php
file_save_data('lorem', 'foo.txt', FILE_EXISTS_RENAME);
file_save_data('ipsum', 'foo.txt', FILE_EXISTS_RENAME);
file_save_data('lorem', file_directory_path() . '/bar.txt', FILE_EXISTS_RENAME);
file_save_data('ipsum', file_directory_path() . '/bar.txt', FILE_EXISTS_RENAME);
?>
This example creates four files in the files:
example.org/files/foo.txt filename=foo.txtexample.org/files/foo_0.txt filename=foo_0.txt
example.org/files/bar.txt filename=bar.txt
example.org/files/bar_0.txt filename=bar.txt
Note how filename is modified with _0 only when $destination is specified without the file_directory_path() prefix. This seems inconsistent.
Hmm, don't know how I missed that.
#22
ah, good point. i'll look into that. here's a clean re-roll in the mean time.
#23
dropped the unrelated unit tests that had been merged in from #276280: Tests needed: file.inc. also got all the tests passing now that file_load_multiple() is the correct way to load by filepath.
#24
c960657, got the renaming names fixed.
#25
re-rolling now that #353207: Remove FILE_STATUS_TEMPORARY was committed.
#26
#27
re-rolling around some other commits.
#28
A few more nits:
<?php$file = new stdClass();
$file->fid = NULL;
$file->filepath = $filepath;
$file->filename = basename($filepath);
?>
<?php$file = (object) array(
'fid' => NULL,
'filepath' => $filepath,
'filename' => basename($filepath);
);
?>
+ // Verify that was was returned is what's in the database.For consistency with FileMoveTest and FileCopyTest, I suggest renaming FileSaveDataTest::testOverwriteError() to testExistingError().
/*** Test replacement when moving onto a file that already exists.
*/
function testExistingReplaceSelf() {
#29
c960657, I think that #348448 is more concerned with cases where
new stdClasswasn't being called. We're not getting any warnings and there's other places in core where this style is followed.Corrected all the other points though.
#30
it occurs to me that since file_save_upload() also has a replace parameter we should have this code there as well.
#31
Fixed some bugs when calling file_save_upload() with $replace = FILE_EXISTS_ERROR.
#32
The last submitted patch failed testing.
#33
The patch applies cleanly. There was an issue with one of the testing servers. Also, all the file tests run cleanly with the installed patch.
On a quick review I noticed
// If we're replacing an existing file re-use its database record.There are multiple places we're are used. Conjunctions should be removed. This should be 'we are'.
I'll try and give it a better review tomorrow.
#34
removed the contractions.
#35
+ * - FILE_EXISTS_REPLACE - Replace the existing file. If a managed file with+ * the destination name exists then its database entry will be updated and
+ * file_delete() called on the source file after hook_file_move is called.
+ * If no database entry is found then the source files record will be
+ * updated.
While this sounds like an accurate description of what the code below does, I'm left wondering "Why is this so important that you spent 5 lines talking about it?" An extra sentence with a summary of the implications would be good to add here, I think.
+ // If file_destination() returns FALSE then $replace == FILE_EXISTS_ERROR and+ // there's an existing file so we need to bail.
Heh. ;)
+ function assertDifferentFile($a, $b) {I like that assertFileUnchanged() uses parameter names that are indicative of what they're doing. Could we do the same here? $file1, $file2 would be an improvement.
Same for assertSameFile()
+ $contents = "file_put_contents() doesn't seem to appreciate empty strings so lets put in some data.";let's
+ * Test renaming when moveing onto a file that already exists.moving
That's it for an initial pass. Now that I know how big this patch is I'll set aside time accordingly. :\ Sorry, tried to do this and something else at the same time and it didn't really work well. :(
#36
addressed webchick's comments.
#37
A more thorough review this time. These are some wonderful tests to have, and a great improvement overall. I can tell that lots of work went into this.
A few minor issues, from the bottom, just to make things saucy! Once these are resolved, I'll do another quick pass but then I think this is good to go.
// Validate the uploaded picture.- $file = file_save_upload('file_test_upload', array('file_validate_is_image' => array()));
+ $file = file_save_upload('file_test_upload', array('file_validate_is_image' => array()), FALSE, $form_state['values']['file_test_replace']);
Could we expand that comment a bit more, since the new code is less readable than the old version? Something like "// Validate the uploaded picture, using the type of replacement specified by the form."
+ $this->assertEqual($contents, file_get_contents($existing->filepath), t('Contents of existing file were unchagned.'));unchanged
How come sometimes we do...
+ $existing = $this->createFile();+ $contents = $this->randomName(8);
+
+ $result = file_save_data($contents, $existing->filepath, FILE_EXISTS_REPLACE);
...and other times we do...
+ $contents = $this->randomName(8);
+ $existing = $this->createFile(NULL, $contents);
// Check the overwrite error.
+ $result = file_save_data('asdf', $existing->filepath, FILE_EXISTS_ERROR);
Should these not be consistent throughout the tests?
+ $this->assertFileHookCalled('update');+ $this->assertFileHookCalled('insert', 0);
+ $this->assertFileHookCalled('delete', 0);
Although I realize that $expected_count in assertFileHookCalled() defaults to 1, I wonder if we should specify 1 here anyway. This kind of looks like a mistake. Just musing ... feel free to ignore this comment if you'd like. ;)
I'd love to see an inline comment or two break up the code in testWithFilename(), testExistingRename(), testExistingReplace() and testExistingError(). Roughly one line of comments for every 4-5 lines of code (where they don't currently exist). For example, instead of:
+ function testExistingError() {
+ $contents = $this->randomName(8);
+ $existing = $this->createFile(NULL, $contents);
// Check the overwrite error.
- $file = file_save_data($contents, 'asdf.txt', FILE_EXISTS_ERROR);
- $this->assertFalse($file, t("Overwriting a file fails when FILE_EXISTS_ERROR is specified."));
+ $result = file_save_data('asdf', $existing->filepath, FILE_EXISTS_ERROR);
+ $this->assertFalse($result, t('Overwriting a file fails when FILE_EXISTS_ERROR is specified.'));
+ $this->assertEqual($contents, file_get_contents($existing->filepath), t('Contents of existing file were unchagned.'));
+ $this->assertFileHookCalled('insert', 0);
+ $this->assertFileHookCalled('update', 0);
+ $this->assertFileHookCalled('delete', 0);
+
+ // Ensure that the existing file wasn't overwritten.
+ $this->assertFileUnchanged($existing, file_load($existing->fid, TRUE));
}
Maybe (only comment changes):
+ function testExistingError() {
+ $contents = $this->randomName(8);
+ $existing = $this->createFile(NULL, $contents);
// Attempt to overwrite an existing file.
- $file = file_save_data($contents, 'asdf.txt', FILE_EXISTS_ERROR);
- $this->assertFalse($file, t("Overwriting a file fails when FILE_EXISTS_ERROR is specified."));
+ $result = file_save_data('asdf', $existing->filepath, FILE_EXISTS_ERROR);
+ $this->assertFalse($result, t('Overwriting a file fails when FILE_EXISTS_ERROR is specified.'));
// Ensure that the file contents are unchanged.
+ $this->assertEqual($contents, file_get_contents($existing->filepath), t('Contents of existing file were unchagned.'));
+ $this->assertFileHookCalled('insert', 0);
+ $this->assertFileHookCalled('update', 0);
+ $this->assertFileHookCalled('delete', 0);
+
+ // Ensure that the existing file wasn't overwritten.
+ $this->assertFileUnchanged($existing, file_load($existing->fid, TRUE));
}
I realize that one can read the assertions to get a good idea of what's going on, but I find the code much more scannable when this information is left-aligned with the code in the form of an inline comment. I think it would also help a new developer to understand at a glance what the test is doing, even if they don't understand SimpleTest.
+ // Ensure that the existing file wasn't ovewritten.overwritten.. this looks like it might've been copy/pasted a couple times.
+ $this->assertEqual($result->filename, $existing->filename, t("Filename was set to the basename of the source."));I do not grok this assertion. :) What does that mean? The code part just looks like "the file names were the same" but obviously there is some nuance here.
+ $result = file_copy(clone $source, $target->filepath, FILE_EXISTS_REPLACE);Why clone? Could we have a little one-liner here to explain?
+ // Look at the results.+ $this->assertTrue($result, t('File copied sucessfully.'));
+ $this->assertEqual($contents, file_get_contents($result->filepath), t('Contents of file were overwritten.'));
...
And while we're at it, a more descriptive comment here? Applies to a few sections of the code. I'd love it if each of the "Look at the results" actually explained what the code below was testing. I know it's not quite as copy/paste friendly, but it's much easier to read through.
+ $loaded_result = file_load($result->fid, TRUE);+ // Verify that the source file wasn't changed.
+ $this->assertFileUnchanged($source, $loaded_source);
+ // Verify that what was returned is what's in the database.
+ $this->assertFileUnchanged($result, $loaded_result);
+ // Make sure we end up with three distinct files afterwards.
+ $this->assertDifferentFile($loaded_source, $loaded_target);
*very* nit-picky, but could we please have a blank line above each of these comments, for consistency with elsewhere in this file? Applies to a few sections of the code.
A couple in-liners to break up testExistingError() and testExistingReplaceSelf() would also be good.
Also, there are a couple of inline comments that are missing periods. Tsk, tsk. ;)
+ function assertDifferentFile($file1, $file2) {+ $this->assertNotEqual($file1->fid, $file2->fid, t('Files have different ids.'), 'Different file');
+ $this->assertNotEqual($file1->filepath, $file2->filepath, t('Files have different paths.'), 'Different file');
+ }
function assertSameFile() logs the file IDs/paths in the assertion (%file1-fid == %file2-fid). This function probably should as well. Then again, assertFileUnchanged() does NOT do this for all of the properties it's checking, so maybe assertSameFile() ought not to either.
Is there a reason you logged it there and not the others?
Other than that, I couldn't find anything. :)
#38
Responding to your comments in reverse order to just desaucy things a bit so they don't get out of hand...
Added id/filepath from assertDifferentFile() to assertSameFile() and assertFileUnchanged().
Got sick of trying to get all the hooks checked correctly so I went ahead and added assertFileHooksCalled() which takes an array of hooks and ensures that only those file hooks are called.
Well which ones were they?
Not really, we need to ensure they've got different contents to ensure that the file wasn't overwritten.
So on the whole the comments should be a lot better.
#39
- // Look at the results+ // Look at the results.
Was the only missing period I found.
There were two places where the comments were longer than 80 characters, so I slit them to a new line.
- // Clone the object so we don't have to worry about the function changing our reference copy.+ // Clone the object so we don't have to worry about the function changing
+ // our reference copy.
- // Clone the object so we don't have to worry about the function changing our reference copy.+ // Clone the object so we don't have to worry about the function changing
+ // our reference copy.
Those are the only changes for the attached patch, and it should still apply.
#40
The last submitted patch failed testing.
#41
rerolled after making the changes listed in noahb's comments.
#42
fixed two comment issues webchick spotted.
#43
Excellent clean-up work!
Committed to HEAD. :)
#44
Automatically closed -- issue fixed for two weeks with no activity.