In the post-#142995: Add hook_file and make files into a 1st class Drupal object world the usefullness {files} table would be much greater if it had a single entry for each filepath.

There are several problems with the current setup:

  • Call file_load(array('filepath' => 'foo/bar.txt')); when there are two entries with 'foo/bar.txt', you could end up with a different filepath for multiple calls.
  • Call file_delete() on one of those objects. The file would be deleted by the other entry in the {files} table would still exist and be loadable. And contrib modules would only be notified about one of the deletions.

The big complication is what to do with the duplicates. We could easily merge the duplicate records and update the upload.module's tables but contrib modules would be left with dangling fids. There needs to be some mechanism to provide a mapping from duplicate to remaining fid. We could either use a table or a variable, the table would be handy for doing SQL level manipulations but realistically that'd be adding another table to the schema for a temporary purpose and when would it get removed? The variable seems better because it's not as heavy and can be removed easily.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drewish’s picture

ps. there's currently no index on filepath which we hit frequently.

chx’s picture

Rename old table, create new. Variable not heavy?? You obviously have not worked with a site that's old (ie coming up since 4.5) and have many millions of files. Don't even try to conjure a map. Keep the old table, create a happy new one, provide SQL and PHP snippets in the update docs and that's about it.

drewish’s picture

I gave this some more thought and considered how you'd resolve two identical records that differ only in fid and say MIME type or uid differs. it might be best to just rename the table to {old_files} and then recreate {files} with a unique filepath index and let each module migrate the files as part of their D7 update function... that way the module that used the file could be more intelligent about which is the correct file... or it could duplicate the physical file if need be.

drewish’s picture

Issue tags: +D7FileAPIWishlist
drewish’s picture

tagging

drewish’s picture

I think we can take a hit from #140860: Consistent table names and database handling of table names and just create a new {file} table and have each module migrate it's files there.

drewish’s picture

Title: Add a unique key to the {files}.filepath column » Rename {files} to {file} and add unique key to the filepath column
Status: Active » Needs review
FileSize
22.5 KB

Here's the start of a patch.

Status: Needs review » Needs work

The last submitted patch failed testing.

drewish’s picture

Status: Needs work » Needs review
FileSize
22.42 KB

here's an untested re-roll

c960657’s picture

+  $limit = 500;
+  $result = db_query_range("SELECT DISTINCT u.fid FROM {upload} u ORDER BY u.vid", array(), 0, $limit);
...
+    // If the fid has changed we need to update the {upload} record to use the
+    // new id.

AFAICT there is a problem with doing this in several passes. In the second and later passes, {upload}.fid contains both references to {files}.fid and {file}.fid, but the code always looks up the row in {files}. Isn't that a problem?

I think I would prefer to simply copy all rows from {files} to {file} (i.e. preserve fid) and make duplicate filenames into duplicate files. This would ease the burden of writing upgrade scripts for module authors. But I don't have much insight into how modules use managed files in practice, so this is just a gut feeling.

drewish’s picture

Status: Needs review » Needs work

c960657, giving this a closer look you're correct that the current code is broken. i think the fact that $new_file is being loaded from {files} rather than {file} is a bug. it also looks like the think that the update query on {upload} should have a condition to restrict the change to the current vid. that way we can work our way sequentially thorough the table. it might duplicate some work but it'll ensure that the changes are correct.

as for duplicating the files, i'm not sure i understand what you mean... the whole point of this is to remove duplicate filepaths.

c960657’s picture

Re: duplicating files:
If {files} contains two rows where filepath is sites/example.com/files/foo.txt, I suggest that these are converted to two rows in {file} with filepath set to sites/example.com/files/foo.txt and sites/example.com/files/foo_0.txt respectively, and that the latter file is created on the disk as a copy of the original file. This suggestion is based on the assumption that most installs will have very few duplicate rows - I don't know if this is actually the case.

aaron’s picture

subscribe

jmstacey’s picture

Keep an eye on #227232: Support PHP stream wrappers as this issue is now related.

jmstacey’s picture

files_table_rename_329301.patch is now in GitHub commit 4b0d6e76 and forward. Prepared for overhauls of filepath to uri and the removal of the status column.

drewish’s picture

Status: Needs work » Closed (duplicate)

marking this as a duplicate of #517814: File API Stream Wrapper Conversion