hook_file_references() was not designed for a highly flexible field storage

drewish - January 2, 2009 - 11:07
Project:Drupal
Version:7.x-dev
Component:file system
Category:bug report
Priority:critical
Assigned:Unassigned
Status:needs work
Issue tags:API clean-up, D7 API clean-up, D7FileAPIWishlist
Description

I propose we remove hook_file_references() and instead store the data in a table.

I was chatting with merlinofchaos and he was complaining about the decision to remove the nid from the files table in D6. The change was made to remove the hard coding that only allowed files to be associated with nodes. Instead each module now needs to provide its own table to joining files to content. The downside for Views is that there's no easy way to query for all the files associated with a node now.

His comments reminded me of another issue (#329311: hook_file_references() should have flag to return links to usages) that I'd been working on which was trying to find a way to build a list of links to pages where a file is used. It occurred to me that it might be possible to take care of both by having modules register their usage with core and write that to a table. That would provide data for Views to query for associating files with nodes, structured data that could be converted into links, and would also serve file_delete() a way to determine if a file is unused and can be safely deleted.

The table would need to store the file id, module name, a table name and the id in the table. Earl suggested 'object' and 'oid' as the names for the latter two columns. The object column is 64 characters long which as far as I could tell is longer than PostgreSQL allows.

The patch adds three new functions: file_get_usage(), file_add_usage() and file_add_usage() which hide the details of querying the table.

AttachmentSizeStatusTest resultOperations
file_usage.patch21.94 KBIdleFailed: Failed to install HEAD.View details | Re-test

#1

drewish - January 2, 2009 - 11:14
Version:6.x-dev» 7.x-dev

#2

Crell - January 2, 2009 - 16:12
Component:database system» file system
Status:needs review» needs work

Makes sense, but this isn't really a database issue. It's a file subsystem issue. Refiling.

Object and oid are both reserved words for PostgreSQL at least IIRC, if not in ANSI SQL, so -1 on using those.

#3

drewish - January 2, 2009 - 22:25
Status:needs work» needs review

According to http://drupal.org/node/141051 oid is okay but object is reserved. Changing it to 'type' and 'id' instead.

Just tested that all the file and upload pass on pgsql.

AttachmentSizeStatusTest resultOperations
file_353458.patch25.13 KBIdleFailed: Failed to install HEAD.View details | Re-test

#4

drewish - January 2, 2009 - 22:26

whoops, that last patch included #349671: Make the postgresql driver independant of the schema which is necessary to get pgsql working.

AttachmentSizeStatusTest resultOperations
file_353458.patch19.8 KBIdleFailed: Failed to install HEAD.View details | Re-test

#5

drewish - January 5, 2009 - 00:21

Trying out the new tagging system...

#6

drewish - January 5, 2009 - 04:15

fixed some problems with the comments and PHPDoc.

AttachmentSizeStatusTest resultOperations
file_353458.patch20.7 KBIdleFailed: Failed to install HEAD.View details | Re-test

#7

drewish - January 5, 2009 - 05:49

webchick:
drewish_, so in general, +1 to http://drupal.org/node/353458. My main worry is keeping that type column consistent among all the various modules.
[9:11pm] Druplicon:
http://drupal.org/node/353458 => Drop hook_file_references() in favor of a table based approach => Drupal, file system, normal, patch (code needs review), 4 IRC mentions
[9:12pm] webchick:
Also tying it to a table / key might be limiting.
[9:12pm] webchick:
But if we make it more "open" like "object" and "oid" then it gets *really* hard to drive consistency.
[9:12pm] drewish_:
webchick: yeah, for node, node_revision, comment, term it would work fine
[9:12pm] drewish_:
webchick: but for other types of ids it could be problematic
[9:12pm] webchick:
right.

#8

merlinofchaos - January 6, 2009 - 03:40

I'm in favor of this concept, but since the code is all DBTNG which I haven't gotten familiar with and testing framework which I haven't gotten familiar with, I can't actually offer a code review.

#9

System Message - January 8, 2009 - 01:05
Status:needs review» needs work

The last submitted patch failed testing.

#10

drewish - January 19, 2009 - 21:47

One other important aspect of this patch that just occurred to me is that theme's can't implement hook_file_references() and there for can't prevent files from being deleted. On the other hand they can easily insert database records reporting on useage.

#11

drewish - January 20, 2009 - 21:04
AttachmentSizeStatusTest resultOperations
file_353458.patch24.08 KBIdleFailed: Failed to apply patch.View details | Re-test

#12

catch - January 21, 2009 - 12:52

Couldn't the query in file_get_usage() be static? Doesn't seem to be any need for db_select() there.

#13

drewish - March 6, 2009 - 19:56
Status:needs work» needs review

catch, yeah probably should be... changed that.

revisting this because it'll really help #391330: File Field for Core get in since we can no longer query fields to see if any have a given file id.

bumped the update's version number. it also occurred to me that we'll need to make it clear that as part of the module upgrade to D7 they'll need to call file_add_usage() for each of their files.

AttachmentSizeStatusTest resultOperations
file_usage_353458.patch21.38 KBIdleFailed: 10458 passes, 3 fails, 4 exceptionsView details | Re-test

#14

System Message - March 6, 2009 - 20:40
Status:needs review» needs work

The last submitted patch failed testing.

#15

grendzy - March 7, 2009 - 17:19

subscribing

#16

jpetso - March 7, 2009 - 21:16
AttachmentSizeStatusTest resultOperations
file-usage-table.patch26.01 KBIdleFailed: 10461 passes, 2 fails, 0 exceptionsView details | Re-test

#17

jpetso - March 7, 2009 - 21:18

Oh man? d.o just swallowed my comment... good thing that Konqueror can recover it with "Back". So.

I had a look at it, and I think that when the last reference is removed, the file should be automatically deleted - there's no reason why modules should be calling both file_remove_usage() *and* file_delete(), because there's really only a use case for file_remove_usage() when file_delete() is called directly afterwards.

Removing the usage can't be done in file_delete() because it hasn't got all the arguments that file_remove_usage() needs, so I took the other road and put file_delete() into file_remove_usage() if no usages remain. As a bonus, that also makes it possible to simplify the semantics of file_delete(): it now always behaves like it did before with $force == TRUE, and the "soft delete" is now solely covered by file_remove_usage().

Other minor points tackled by this change of the patch:

  • Bug fixed: $file in file_add_usage() in user_save() doesn't exist (anymore?), that's $picture instead.
  • In user_save(), neither file_delete() nor file_remove_usage() is required for moved files, because the fid stays the same for the existing filepath when using file_move() with FILE_EXISTS_REPLACE, and file_move() also takes care of the remaining file_delete() stuff for the source file.
  • user_file_references() is not required anymore... removed.

I think my SimpleTest setup is kinda broken because totally unrelated tests break, so here's the updated patch without making sure that all the tests pass.

#18

jpetso - March 9, 2009 - 02:25
Status:needs work» needs review

The status change also got swallowed yesterday. Now, code needs review.

#19

System Message - March 9, 2009 - 03:05
Status:needs review» needs work

The last submitted patch failed testing.

#20

jpetso - March 18, 2009 - 13:34

Maybe I get around to set up a proper installation where the tests work as they should, and find out what goes wrong. (Can't be too difficult, actually.) But maybe someone else can get around to do that sooner than me :P

#21

jpetso - May 15, 2009 - 14:29
Status:needs work» needs review

Seems like no one has picked this up yet, and I'm back again. So, bugfixing time!
FileDeleteTest::testInUse(), bugs in system_cron(), and upload.module's form handling are now fixed - with a little luck, the test bot likes us now.

Feedback and reviews appreciated.

#22

jpetso - May 15, 2009 - 14:29

with patch.

AttachmentSizeStatusTest resultOperations
file-usage-table-try2.patch25.23 KBIdleFailed: Failed to apply patch.View details | Re-test

#23

System Message - May 15, 2009 - 14:36
Status:needs review» needs work

The last submitted patch failed testing.

#24

jpetso - May 15, 2009 - 22:43
Status:needs work» needs review

Nobody told the test bot that it can use -p1 as option for patch. Freaking standard Git patch format.

AttachmentSizeStatusTest resultOperations
file-usage-table-try2.patch25.19 KBIdleFailed: Failed to apply patch.View details | Re-test

#25

sun - May 16, 2009 - 19:27

Subscribing for later review, if no one beats me to it.

#26

System Message - May 16, 2009 - 23:20
Status:needs review» needs work

The last submitted patch failed testing.

#27

jpetso - May 28, 2009 - 13:13
Status:needs work» needs review

Rerolling to current HEAD. Also, improving hook_file_move() docs and example implementation. Also, implementing hook_file_move() for user.module and upload.module (using the example implementation) so that file references remain valid. That would fix the one TODO item that was still in there, I believe it's now ready for wide appeal.

AttachmentSizeStatusTest resultOperations
file-usage-table-try3.patch25.6 KBIdleFailed: Failed to install HEAD.View details | Re-test

#28

drewish - May 28, 2009 - 15:04
Status:needs review» needs work

testInUse() seems really wonky. The comment talks about deleting but that doesn't seem to be happening.

I'm also not so sure about the 'count' value. I kind of feel like it doesn't really matter how many times a module is using the file on the same type, id pair. The important thing is knowing where the file is used and being able to build some links to the usages: type='node', id=234 gives you url('node/' . 234).

#29

jpetso - May 28, 2009 - 19:30

- testInUse(): yup, good point - those strings must be leftovers from previous versions where file_remove_usage() didn't delete files when they weren't used anymore. I'll look into improving the explanations.

- I'd like to emphasize that you introduced the 'count' value by yourself, and while it's not that helpful for building backlinks, it frees modules from the need of counting the usages by themselves. Of course one might wonder if modules will at all need to use a single file on the same type+id multiple times, but in case we assume so, the 'count' value does have its merits.

#30

aaron - June 9, 2009 - 15:15

subscribe

#31

jpetso - June 22, 2009 - 15:39
Status:needs work» needs review

New version: fixes the test messages that drewish mentioned, and updates the update number in system.install. (Sorry for the long wait for just a few string fixes.)

My argument for the usefulness of the 'count' value has not yet been addressed, please comment on that. (Personally, I would expect an RTBC this time.)

AttachmentSizeStatusTest resultOperations
file-usage-table-try4.patch26.73 KBIdleFailed: Failed to install HEAD.View details | Re-test

#32

System Message - June 24, 2009 - 01:05
Status:needs review» needs work

The last submitted patch failed testing.

#33

tstoeckler - June 25, 2009 - 18:53
Status:needs work» needs review

Resending for test run.

#34

System Message - June 27, 2009 - 20:10
Status:needs review» needs work

The last submitted patch failed testing.

#35

drewish - August 19, 2009 - 23:35

i'm guessing this got wacked by the stream wrapper patch.

#36

sun - September 10, 2009 - 16:57
Issue tags:+API clean-up

Introducing a new tag for feature freeze: API clean-up.

#37

sun - October 9, 2009 - 13:19
Priority:normal» critical
Issue tags:+D7 API clean-up

Tagging absolutely critical clean-ups for D7. Do not touch this tag. Please either help with this or one of the other patches having this tag.

#38

sun - October 27, 2009 - 18:08
Status:needs work» needs review

Re-rolled against HEAD.

This still looks ready to fly...

AttachmentSizeStatusTest resultOperations
drupal.file-usage.38.patch28.6 KBIdleFailed: 14459 passes, 6 fails, 7 exceptionsView details | Re-test

#39

System Message - October 27, 2009 - 18:25
Status:needs review» needs work

The last submitted patch failed testing.

#40

sun - October 28, 2009 - 21:53

1) ok, the tests fail, because the new File module in core implements *heavy* workarounds and really awkward functions to make hook_file_references() properly work with file fields. I'll try to get the tests working first, afterwards we can think about further clean-ups in there (and it looks like a couple of functions could be simplified or even removed). This patch also allows to remove a major @todo from file.module.

That alone justifies this patch to land even after API freeze.

2) A second thought I had may be deferred to D8, but: Those file usage functions basically have the potential to entirely eliminate the {file}.status column. A file that isn't used anywhere should be deleted during garbage collection, no? :)

#41

effulgentsia - October 28, 2009 - 22:34

subscribing

#42

sun - October 29, 2009 - 03:38

1) It turns out that the workarounds for hook_file_references() implemented in File field are quite heavy, so I'm attaching a work-in-progress patch here.

2) Still seems very valid to me. Probably really too late, but oh, nice conclusion. But then again, please also note:

3) All files uploaded via File field will be deleted after 6 hours. That is, because {file}.status is not updated properly. (And since that does not happen in File field at all so far, I'm heavily struggling to find the proper place to call file_add_usage()....)

Any feedback/help from someone who knows all this stuff more than me would be highly appreciated.

AttachmentSizeStatusTest resultOperations
drupal.file-usage.41.patch37.16 KBIgnoredNoneNone

#43

yched - November 1, 2009 - 10:57

Might be good to ping quicksketch about this...

#44

quicksketch - November 8, 2009 - 08:04

I'm heavily struggling to find the proper place to call file_add_usage()....

The call to file_add_usage() (or setting the status column to 1, if we don't get this in) should be done in file_field_update(). This function is called when the node is actually saved. It should also be done on insert, but since file_field_insert() calls file_field_update(), we can add it in just that one place.

Generally my feeling on this patch is a huge +1. File module *does* have to do all kinds of wonky querying to get file references, especially with Field API's abstraction layer over performing actual queries. File module right now is doing individual queries on every file field on the Drupal site, retrieving entire rows, then just calling count() on the complete rows. It's ridiculously inefficient, so I'd love to see this added if possible.

#45

sun - November 8, 2009 - 17:54
Title:Drop hook_file_references() in favor of a table based approach» hook_file_references() was not designed for a highly flexible field storage
Category:task» bug report

Considering all kind of possible field storage engines, I see no way around fixing this, in the way this patch attempts to.

#47

scroogie - November 12, 2009 - 19:55

If I understand correctly, this is also needed for modules that write files to the files table without being referenced anywhere (like the various file managers for D6 do at the moment), because File would otherwise delete them?

 
 

Drupal is a registered trademark of Dries Buytaert.