Problem/Motivation
There is a unique constraint on the uri column in {file_managed}. When inserting a new row it is only checked that the URI is unique on the filesystem, but not in the database. If the file has been deleted from the filesystem, but the entity still exists, this results in a PDO exception.
Proposed resolution
The code is already too complicated and buggy - previous patches posted in this issue were making it more complicated: refactor the code related to creating unique URIs.
Managed file operations, e.g. file_copy() / file_move() should not have the option to "take over" an existing file entity. It should be hardcoded to make sure the new URI is unique in database and filesystem. A file entity's URI can change, but the contents of a file entity's file must never change.
Replace constants:
FILE_EXISTS_RENAME, FILE_EXISTS_REPLACE, FILE_EXISTS_ERROR
should be:
FILE_CHECK_EXISTS_FALSE, FILE_CHECK_EXISTS_FS, FILE_CHECK_EXISTS_DB_FS
We do not need FILE_EXISTS_ERROR - code that needs this functionality can check itself.
The database should be checked for an existing URI before the filesystem as communicating with the filesystem can be very expensive, e.g. it could be in the cloud.
There is no situation where it is necessary to check database for an existing URI but not the filesystem.
Combine file_destination() and file_create_filepath() into file_uri_prepare() - a single function that ensures a URI is suitable for a new file.
Calls to file_valid_uri() just get in the way as it is perfectly valid to pass a filepath instead of a URI.
file_move() shouldn't return an entity as it is just modifying the entity that is passed to it.
file_copy() / file_move() / file_save_data() should set the filename of the resultant entity to the basename of the desired destination (provided it isn't a directory), not the basename of the resultant URI.
Remaining tasks
Review patch.
User interface changes
none
API changes
file_copy(File $source, $destination = NULL, $replace = FILE_EXISTS_RENAME) =>
file_copy(File $file, $destination = NULL)
file_unmanaged_copy($source, $destination = NULL, $replace = FILE_EXISTS_RENAME) =>
file_unmanaged_copy($source, $destination = NULL, $check_exists = FILE_CHECK_EXISTS_FS)
file_destination($destination, $replace) and file_create_filename($basename, $directory) =>
file_uri_prepare($uri, $check_exists = FILE_CHECK_EXISTS_FS)
file_move(File $source, $destination = NULL, $replace = FILE_EXISTS_RENAME) =>
file_move(File $file, $destination = NULL)
file_unmanaged_move($source, $destination = NULL, $replace = FILE_EXISTS_RENAME) =>
file_unmanaged_move($source, $destination = NULL, $check_exists = FILE_CHECK_EXISTS_FS)
drupal_file_exists($uri, $check_db = FALSE)
file_save_upload($source, $validators = array(), $destination = FALSE, $replace = FILE_EXISTS_RENAME) =>
file_save_upload($source, $validators = array(), $destination = NULL)
file_save_data($data, $destination = NULL, $replace = FILE_EXISTS_RENAME) =>
file_save_data($data, $destination = NULL)
file_unmanaged_save_data($data, $destination = NULL, $replace = FILE_EXISTS_RENAME) =>
file_unmanaged_save_data($data, $destination = NULL, $check_exists = FILE_CHECK_EXISTS_FS)
system_retrieve_file($url, $destination = NULL, $managed = FALSE, $replace = FILE_EXISTS_RENAME) =>
system_retrieve_file($url, $destination = NULL, $check_exists = FILE_CHECK_EXISTS_FS)
Original report by pwolanin
Found this in Drupal 7, but the fix should be the same for both 7 and 8 (and maybe 6?).
Discussed the problem with aaronwinborn and effulgensia and the it seems pretty clear we have a bug in terms of file_destination() because for a managed file it only checks if the file exists on disk.
This can lead to an SQL error if the URI exists in the {file_managed} table. For example, if I have 2 webnodes with local /tmp directories, a file being saved to temporary:// may not exist on disk, but may exist in the database. This can also happen if entries are not correctly purged when a file is deleted off disk.
There is a second bug which is a renaming race condition. If 2 users are trying to upload a file at about the same time, the simple incrementing logic may give both users the same file name.
Proposed fixes:
#1: when saving a managed file, check the URI in the database as well as on disk.
#2: when generating a suffix in the case of a duplicate file to be renamed, use a random 4-6 character string, rather than just incrementing.
Beta phase evaluation
| Issue category | Bug because Drupal throws a PDO exception and gets into a broken state that can only be resolved by editing the database directly. This prevents people from adding files. |
|---|---|
| Issue priority | Not critical because it only affects sites where the database and filesystem are no longer have consensus and the problem can be resolved by manually editing the database to match the filesystem. Major because this bug prevents file uploads from working and is encountered quite commonly, e.g. due to staging. |
| Prioritized changes | This bug is prioritized because it is a bug fix, it needs to be backported to D7, and because the fix reduces fragility and narrows the API. |
| Disruption | Disruptive because contrib modules using these functions will need to be updated. However, these changes are very minor. |
| BC layer | A bc layer is not required because it would introduce a lot of technical debt. The patch really simplifies the API. There is no benefit in retaining the broken API for compatibility purposes. The patch removes 216 lines of code. |
| Can it be postponed | If this change was postponed until 8.1 then we would need a BC layer which would increase technical debt. |
| Comment | File | Size | Author |
|---|---|---|---|
| #133 | drupal-refactor_file_inc-1556396-133-D7.patch | 7.77 KB | oadaeh |
| #120 | interdiff.txt | 26.9 KB | sriharsha.uppuluri |
| #120 | refactor_file_inc_to-1556396-120.patch | 92.4 KB | sriharsha.uppuluri |
| #116 | refactor_file_inc_to-1556396-116.patch | 92.42 KB | mgifford |
| #108 | refactor_file_inc_to-1556396-108.patch | 94.69 KB | sidharrell |
Comments
Comment #1
dave reidIt looks like file_create_filename() is also going to have this problem too.
Comment #2
pwolanin commentedIt's not clear if this is a problem with the normal core file module, but at least will be a problem with user pictures in user_validate_picture()
Comment #3
effulgentsia commentedI think the bug is real, but to make progress, we should find a nonbuggy way of triggering it. Which means, we either need a plausible scenario under which a public:// (or other non-temporary) file can be in {file_managed} but not on disk, or else a plausible scenario under which temporary:// files can exist (even if briefly) in {file_managed} (given a multiple web node scenario, we should always assume that temporary:// files might not be on disk at any particular time).
As per #2, a regular file field upload does not save an intermediary temporary:// file to {file_managed}, but possibly user module, media module, and other modules do, but the question is, is doing so ever legitimate, or is that a bug in itself?
Comment #4
pwolanin commentedYes, basically file_create_filename() is where the problem lies.
Possible we need to create an alternate version of that, or pass in a flag for managed files.
Comment #5
pwolanin commenteddiscussed w/ effulgensia - a simple way to reproduce (and write a test) is to delete the file off disk and then try to upload the same file again.
It's a #fail if for some reason the sysadmin deletes a file off disk (e.g. via SFTP) and then it's impossible to ever upload a replacement via the UI.
Comment #6
effulgentsia commentedI can confirm that #5 happens. Retitling and downgrading to normal, because I don't think it's major unless there's a way to reproduce it from Drupal interaction exclusively.
Comment #7
pwolanin commented@effulgensia - for user profiles it can happen for sure?
Comment #8
dave reidConsidering this is the first I've heard of this with over a year of Drupal 7, I'm inclined to say it's more of a un-typical server configuration or unintentionally malicious admin removing files, which to me says this is normal.
Comment #9
effulgentsia commentedHow about this? This does #1 from the issue summary. I think randomizing suffix for race conditions belongs in a separate issue.
Comment #10
pwolanin commentedLooks like a reasonable approach to me.
Comment #11
dries commentedThis patch looks good so I'm tempted to mark it RTBC.
The only question that came to mind is whether we need all these options? Can't we always check for the file to exist in both the database and on the file system? It seems odd that we would want to support inconsistent database tables.
Comment #12
effulgentsia commentedThe reason for FILE_CHECK_EXISTS_FILE is to make drupal_file_exists() by default be BC compatible with file_exists(), because:
1) I think we generally try to do that for drupal_*() versions of native PHP functions
2) So this patch can be backported to D7
3) So that when file_destination() is called from file_unmanaged_copy(), it works (unmanaged files are ones that don't exist in the {file_managed}).
The reason for FILE_CHECK_EXISTS_FILE_OR_ENTITY is to support the goal of this issue, which is to rename an upload to not clash with either something on disk, or something in the db. While we don't want to encourage inconsistency between db and disk, there are conditions in which it can happen (e.g., someone deletes a file from disk via the command line, but leaves the db record around, or an unmanaged file is copied to disk, and by definition isn't meant to have a db entry), and if we're saving an upload, we just want a name that's free both with respect to disk and with respect to db.
FILE_CHECK_EXISTS_FILE_AND_ENTITY is used by the test in this patch for asserting a file exists in both.
FILE_CHECK_EXISTS_ENTITY has no use-case so far, but is just there for symmetry.
Comment #13
pwolanin commentedThe function is also used for unmanged files - i.e. they are never listed in the database.
I guess the DB query is a no-op in that case, but it's a query we don't need to do.
Comment #14
pwolanin commentedI backported this to D7 and verified that it prevents the error there also when the patch is applied.
Comment #15
pwolanin commentedHere's the D7 patch - the .test file is in a different directory.
Comment #16
gábor hojtsyThe const keyword is not available in PHP 5.2.4/5.2.5 which is the minimim requirement for Drupal 7 as per http://drupal.org/drupal-7.0 and http://drupal.org/requirements
Comment #17
gábor hojtsyWups, sorry, the issue is still on D8. Here is an updated D7 patch though. This should not be committed to D8. I'm setting back to RTBC for the D8 patch. I did not test either patch.
Comment #18
dave reidRather than having an internal helper function, we should convert this to file_load_by_uri() so that it actually returs a file object like user_load_by_name() does and therefore can be re-used by core and other modules. I think the use of EFQ is unnecessary here too. Just use a normal entity_load() because file entities are always in the database.
Comment #19
pwolanin commented@Dave Reid - that's an easy change to make it like the user function, but note that the doxygen at http://api.drupal.org/api/drupal/includes!file.inc/function/file_load_mu... indicates that using the conditions is deprecated.
Comment #21
pwolanin commentedWell, that causes some interesting additional test failures.
Based on the deprecation comments and the fact that fully loading the entity breaks the tests, I think Alex's original patch was closer to right. If we want a re-usable function, we should make it return the fid for a URI, as suggested in the doxygen.
Comment #22
dave reidHrm, could we maybe get more information about what exactly was causing the fails since it wasn't a drastic number of errors?
Comment #23
pwolanin commented@Dave Reid - seems like we should, in any case, not use a deprecated parameter?
Comment #24
catchTagging.
Comment #25
lasbreyn commented#21: 1556396-zombies-21.patch queued for re-testing.
Comment #27
tim.plunkettRerolled for PSR-0 tests.
Comment #28.0
dave reidcomplete sentence
Comment #31
jbrown commentedThis patch is for the new issue summary and contains the zombie test from #9.
Comment #32
jbrown commentedComment #32.0
jbrown commentedIssue summary
Comment #33
jbrown commentedAdded some logic and tests for file_copy() / file_move() / file_save_data() so that filename of resultant entity is updated to the basename of the desired destination (provided it isn't a directory).The problem was identified here: #1415858: file_move() doesn't rename files correctly when the name of the file changes.
Comment #33.0
jbrown commentedFix typo
Comment #33.1
jbrown commentedadd info about entity filename
Comment #34
Letharion commentedPossibly an issue on it's own, please let me know if so:
Is there any reason why file_unmanaged_copy uses file_exists when it could do
fopen($source, "r")instead and get remote file support as well?Edit: Spun of into #1736312: Let file_unmanaged_copy read remote files as well.
Comment #35
jbrown commented#33: refactor-check-files-exist.patch queued for re-testing.
Comment #37
jbrown commented#33: refactor-check-files-exist.patch queued for re-testing.
Comment #38
dave reidComment #39
jbrown commentedreroll
Comment #40
berdirThis is going to conflict heavily with #1468328: Move file entity info, managed file, and file usage functionality into File module, which is now RTBC. Sorry about that :) I guess you want to wait with further re-rolls until that one is in, and then the easiest way to re-roll might be to copy the new functions over to file.module.
Comment #41
jbrown commentedYeah - I know ;-). I'll reroll again when the other one gets in.
Comment #43
jbrown commentedComment #44
mgiffordMarking related issue - #1163740: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '' for key 2: INSERT INTO {file_managed} - I'm not sure if it's a duplicate, but worth pointing out here. This is definitely the master thread.
Comment #45
jbrown commentedThat isn't a duplicate. There are many ways that "Integrity constraint violation" can occur.
Comment #46
jbrown commentedComment #47
jbrown commentedComment #49
webchickThis sounds like more a task than a bug to me, so moving it there to put us back under thresholds.
Comment #50
jbrown commentedComment #51
jbrown commentedreroll
Comment #51.0
jbrown commentedminor
Comment #53
jbrown commentedComment #54
yesct commented#53: refactor-check-files-exist.patch queued for re-testing.
Comment #56
jbrown commentedComment #58
jbrown commentedComment #59
yesct commented@jbrown thanks for looking after this issue so well.
To aid getting reviews, it's helpful if you post a sentence describing what changes you made, and an interdiff. If you want a hand with making the interdiff, let me know or ask in irc.
Comment #60
jbrown commentedI've just been re-rolling the patch since #33 (August 15, 2012) - there aren't any changes.
Comment #62
yesct commentedOH! no wonder I was having trouble figuring out the differences. well, thanks again. :)
Comment #62.0
yesct commentedclarify
Comment #63
jbrown commentedFixed a test.
Comment #63.0
jbrown commentedClarify summary.
Comment #63.1
jbrown commentedSummary clarification.
Comment #64
yesct commentedthis has been being re-rolled for quite a while now.
is there any previous feedback that needs to be resolved?
Does this need a total re-review, or just a quick check to rtbc it?
I'm not sure what the next steps are to stop jbrown from re-roll-perpetuality. :)
Comment #65
pwolanin commentedpatch needs a re-roll
1 out of 13 hunks FAILED -- saving rejects to file core/includes/file.inc
Comment #66
jbrown commentedreroll
Comment #67
jbrown commentedChanged the entity query in drupal_file_exists() to use count().
Comment #68
jbrown commented#67: refactor-check-files-exist.patch queued for re-testing.
Comment #70
jbrown commentedComment #71
jbrown commentedComment #72
jbrown commentedReroll.
Comment #72.0
jbrown commentedUpdate summary.
Comment #73
jbrown commentedRemoving tag.
Comment #74.0
(not verified) commentedAdd comma.
Comment #75
pwolanin commentednot sure why this tag was removed - this is certainly a bug in D7
moving back to bug - I don't think this is just a task
Comment #76
pwolanin commentedpatch needs a major re-roll for D8 now
Comment #77
Blooniverse commentedWho is going to do this (D8 re-rolling)? Since it is quite a hassle to have outdated entries in the table file_managed which refer to deleted files (manually from the file system).
Comment #78
Blooniverse commented72: refactor-check-files-exist.patch queued for re-testing.
Comment #80
parasolx commented72: refactor-check-files-exist.patch queued for re-testing.
Comment #82
mgiffordNeeds a serious re-roll.
Comment #83
marthinal commentedI have started the re-roll but I need time. Let me continue and add the patch.
Comment #84
deanflory commentedAnything on this for Drupal 7 as I'm getting this when editing and changing nothing, then saving an Article:
Comment #85
star-szrUnassigning since it's been quite a while. This would be quite a challenging reroll and it may be easier to redo the patch with the most recent patch in mind.
Comment #86
sidharrell commentedmake testbot angry.
Comment #88
sidharrell commentedfixed some stuff.
Comment #90
sidharrell commentedmakin progress.
Comment #92
sidharrell commentedmore progress
Comment #94
sidharrell commentedI think I got all 8 testing green locally.
testbot, I know I've been hard on you lately, but it was for the greater good.
please be kind.
Comment #95
sidharrell commentedI just want to point out that the last time that patch was green was 2 years ago, and it's a 100kb patch.
Not to toot my own horn, but can I get a high five from someone?
That was some mean, nasty, thankless grunt work.
Where's my next "Needs reroll" tagged issue?
Comment #96
jbrown commentedThanks so much Sidney for taking the time to get this passing again - I spent the best part of a week in the summer of 2012 working on this.
The issue summary explains everything. I think the patch satisfies what @Dries asked for in #11.
The key point is that with this patch Drupal does not assume that what is on the filesystem necessarily matches what is in the database. They can diverge very easily, e.g. due to staging issues. Drupal should be able to handle this in a graceful manner instead of throwing a PDO exception that requires the developer to perform surgery in the database to resolve.
I was able to make the file API considerably simpler while solving this issue. From Sidney's latest patch:
28 files changed, 391 insertions(+), 607 deletions(-)
That's 216 fewer lines of code in core with a vastly more robust file API.
I have reviewed the patch and everything is in order. Because I am the original author of the patch I can't mark RTBC.
I not sure what the policy is at this stage, but this is a major bug that can only be resolved my fixing the API. my recommendation is for this too be committed. It makes the whole sub-system considerably simpler.
Can someone please review?
Comment #97
jbrown commentedComment #98
jbrown commentedComment #99
berdirYes, awesome work with that reroll.
I can't say if this can be committed to 8.0.x still. But to find that out, we need to make sure that the issue summary is up to date, prepare a change record and beta evaluation to argue why the API break would be worth it.
Another thing would be to check if we can find ways to make the API change smaller or keep a BC layer. Parts of file.inc recently moved to a FileSystem service. Maybe we can add the new API's there or make a new service for them, and keep the old functions ans depreated wrappers that still work as they did before.
See #2050759: Move drupal_chmod and other code in file.inc to a Drupal\Core\File\FileSystem class and the meta issue #2229865: [meta] Modernize File/StreamWrapper API.
Comment #100
jbrown commentedI think having a BC layer is a bad idea because the old API is fundamentally broken and this patch simplifies everything.
I'm happy to move these functions into a service, but this can happen in a follow up.
I'm going to go ahead and great a change record / beta evaluation.
Comment #101
sidharrell commentedAll that is above my paygrade.
I'm not familiar enough with the API to be able to weigh in. I've only been in Drupal for a couple weeks, and am just trolling the "Needs reroll" tickets, cause it doesn't actually require very much familiarity with Drupal, and lets me leverage my general php skills. I'll leave all the weighty discussion to my betters and move on to the next ticket.
At least you people have a good base to work off of, now. I'll be curious to see what comes of it.
Cheers.
Comment #102
jbrown commentedComment #103
jbrown commentedOkay - I've added the beta phase evaluation and created the draft change record: https://www.drupal.org/node/2426725
Can someone review them?
Comment #104
crotown commentedWe have an Elite customer at Acquia who may be experiencing this same PDOException using Drupal 7.32. What are the prospects of someone re-rolling this patch for Drupal 7? We could immediately test the rerolled patch on a live site (if it indeed solves their issue) and help to move the patch toward RTBC status.
Comment #105
sidharrell commented@crotown I rolled #17 (the last D7 version of the patch that I could see) against 7.32 and ran the full test suite against it locally overnight, came up with 51 fails. Gonna keep working on it. I should be able to get something posted today.
Comment #106
sidharrell commentedThat straight reroll of #17 against 7.32. Have to upload the patch here, so I can have a link to feed to simplytest.me.
Trying to setup a sandbox there.
Comment #107
sidharrell commentedsorry about the noise. I did ask in #drupal-infrastructure if there is an alternative way for my to use testbot outside of the issue, but haven't heard back. I'll change the version back to 8.0-dev soon.
Comment #108
sidharrell commentedchanging version back to 8-0-dev
requeing D8 patch to top of deck. interdiff 94-108 was 0 bytes.
Comment #109
sidharrell commentedComment #110
crotown commentedThe patch #107 seems to have solved the issue for our D7 client, I'll report after it gets serious live use but it looks good so far.
@sidharrell: I cannot thank you enough! But thanks much, even though it is not enough.
The Drupal community is a wonder to behold!
Comment #111
jhedstromPatch no longer applies.
Comment #112
pektinasen commentedI'll work on it and reroll the patch.
Comment #113
danylevskyiI'm a mentor. Helping @pektinasen to work on the issue.
Comment #114
pektinasen commentedThis reroll seems too big. Too many lines of code changed and file conflicts. I stop working on it.
If anyone wants to help me with this issue I'm at the re-roll table in the mentored sprint room.
Comment #115
danylevskyiComment #116
mgiffordI don't know what happened to:
core/modules/migrate/src/Plugin/migrate/destination/EntityFile.php
Mostly just a reroll though.
Comment #119
zaporylieFile core/modules/migrate/src/Plugin/migrate/destination/EntityFile.php was deleted in #2534012: Move module-specific migration support into the file module.
Comment #120
sriharsha.uppuluri commentedJust rerolled to apply patch successfully.
Comment #121
ndf commentedComment #124
Gaelan commentedStarting reroll.
Comment #125
Gaelan commented…or not, looks like it's already done.
Comment #127
Chris CharltonIs D7 patch stalled?
Comment #133
oadaeh commentedHello.
I have an updated D7 patch for the one in #107.
I originally created it back in Oct/Nov of last year, but I wasn't sure it would actually work for the problems we were seeing. It turned out that it did, and it's been in production since mid-Nov. However, I forgot to submit it to this issue after realizing it was a keeper.
With the recent security update, the patch for this issue needs some work, since they both operate in some of the same functions.
So, attached is the updated patch with the security release accounted for.
I could not get an interdiff created, so I did not include one.
I kept getting this error:
Comment #134
oadaeh commentedI'm going to go ahead and adjust the issue for testing the D7 patch. Feel free to revert back to D8 afterwards.
Comment #135
roderikThanks for the updated D7 patch. Verified manually that it's still essentially the same as the one in #15 - #107.
Back to 8.x.
Comment #141
mfbI think this issue could be closed as outdated, or at a minimum it needs an issue summary update. The original issue is no longer relevant as there is no unique constraint on file_managed.uri since #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols); and all functions in file.inc are deprecated.
Comment #142
larowlanAgree that
means this is no longer relevant