hook_file_references() was not designed for a highly flexible field storage
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | file system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | API clean-up, D7 API clean-up, D7FileAPIWishlist |
I propose we remove hook_file_references() and instead store the data in a table.
I was chatting with merlinofchaos and he was complaining about the decision to remove the nid from the files table in D6. The change was made to remove the hard coding that only allowed files to be associated with nodes. Instead each module now needs to provide its own table to joining files to content. The downside for Views is that there's no easy way to query for all the files associated with a node now.
His comments reminded me of another issue (#329311: hook_file_references() should have flag to return links to usages) that I'd been working on which was trying to find a way to build a list of links to pages where a file is used. It occurred to me that it might be possible to take care of both by having modules register their usage with core and write that to a table. That would provide data for Views to query for associating files with nodes, structured data that could be converted into links, and would also serve file_delete() a way to determine if a file is unused and can be safely deleted.
The table would need to store the file id, module name, a table name and the id in the table. Earl suggested 'object' and 'oid' as the names for the latter two columns. The object column is 64 characters long which as far as I could tell is longer than PostgreSQL allows.
The patch adds three new functions: file_get_usage(), file_add_usage() and file_add_usage() which hide the details of querying the table.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| file_usage.patch | 21.94 KB | Idle | Failed: Failed to install HEAD. | View details | Re-test |

#1
#2
Makes sense, but this isn't really a database issue. It's a file subsystem issue. Refiling.
Object and oid are both reserved words for PostgreSQL at least IIRC, if not in ANSI SQL, so -1 on using those.
#3
According to http://drupal.org/node/141051 oid is okay but object is reserved. Changing it to 'type' and 'id' instead.
Just tested that all the file and upload pass on pgsql.
#4
whoops, that last patch included #349671: Make the postgresql driver independant of the schema which is necessary to get pgsql working.
#5
Trying out the new tagging system...
#6
fixed some problems with the comments and PHPDoc.
#7
webchick:drewish_, so in general, +1 to http://drupal.org/node/353458. My main worry is keeping that type column consistent among all the various modules.
[9:11pm] Druplicon:
http://drupal.org/node/353458 => Drop hook_file_references() in favor of a table based approach => Drupal, file system, normal, patch (code needs review), 4 IRC mentions
[9:12pm] webchick:
Also tying it to a table / key might be limiting.
[9:12pm] webchick:
But if we make it more "open" like "object" and "oid" then it gets *really* hard to drive consistency.
[9:12pm] drewish_:
webchick: yeah, for node, node_revision, comment, term it would work fine
[9:12pm] drewish_:
webchick: but for other types of ids it could be problematic
[9:12pm] webchick:
right.
#8
I'm in favor of this concept, but since the code is all DBTNG which I haven't gotten familiar with and testing framework which I haven't gotten familiar with, I can't actually offer a code review.
#9
The last submitted patch failed testing.
#10
One other important aspect of this patch that just occurred to me is that theme's can't implement hook_file_references() and there for can't prevent files from being deleted. On the other hand they can easily insert database records reporting on useage.
#11
#12
Couldn't the query in file_get_usage() be static? Doesn't seem to be any need for db_select() there.
#13
catch, yeah probably should be... changed that.
revisting this because it'll really help #391330: File Field for Core get in since we can no longer query fields to see if any have a given file id.
bumped the update's version number. it also occurred to me that we'll need to make it clear that as part of the module upgrade to D7 they'll need to call file_add_usage() for each of their files.
#14
The last submitted patch failed testing.
#15
subscribing
#16
#17
Oh man? d.o just swallowed my comment... good thing that Konqueror can recover it with "Back". So.
I had a look at it, and I think that when the last reference is removed, the file should be automatically deleted - there's no reason why modules should be calling both file_remove_usage() *and* file_delete(), because there's really only a use case for file_remove_usage() when file_delete() is called directly afterwards.
Removing the usage can't be done in file_delete() because it hasn't got all the arguments that file_remove_usage() needs, so I took the other road and put file_delete() into file_remove_usage() if no usages remain. As a bonus, that also makes it possible to simplify the semantics of file_delete(): it now always behaves like it did before with $force == TRUE, and the "soft delete" is now solely covered by file_remove_usage().
Other minor points tackled by this change of the patch:
I think my SimpleTest setup is kinda broken because totally unrelated tests break, so here's the updated patch without making sure that all the tests pass.
#18
The status change also got swallowed yesterday. Now, code needs review.
#19
The last submitted patch failed testing.
#20
Maybe I get around to set up a proper installation where the tests work as they should, and find out what goes wrong. (Can't be too difficult, actually.) But maybe someone else can get around to do that sooner than me :P
#21
Seems like no one has picked this up yet, and I'm back again. So, bugfixing time!
FileDeleteTest::testInUse(), bugs in system_cron(), and upload.module's form handling are now fixed - with a little luck, the test bot likes us now.
Feedback and reviews appreciated.
#22
with patch.
#23
The last submitted patch failed testing.
#24
Nobody told the test bot that it can use -p1 as option for patch. Freaking standard Git patch format.
#25
Subscribing for later review, if no one beats me to it.
#26
The last submitted patch failed testing.
#27
Rerolling to current HEAD. Also, improving hook_file_move() docs and example implementation. Also, implementing hook_file_move() for user.module and upload.module (using the example implementation) so that file references remain valid. That would fix the one TODO item that was still in there, I believe it's now ready for wide appeal.
#28
testInUse() seems really wonky. The comment talks about deleting but that doesn't seem to be happening.
I'm also not so sure about the 'count' value. I kind of feel like it doesn't really matter how many times a module is using the file on the same type, id pair. The important thing is knowing where the file is used and being able to build some links to the usages: type='node', id=234 gives you url('node/' . 234).
#29
- testInUse(): yup, good point - those strings must be leftovers from previous versions where file_remove_usage() didn't delete files when they weren't used anymore. I'll look into improving the explanations.
- I'd like to emphasize that you introduced the 'count' value by yourself, and while it's not that helpful for building backlinks, it frees modules from the need of counting the usages by themselves. Of course one might wonder if modules will at all need to use a single file on the same type+id multiple times, but in case we assume so, the 'count' value does have its merits.
#30
subscribe
#31
New version: fixes the test messages that drewish mentioned, and updates the update number in system.install. (Sorry for the long wait for just a few string fixes.)
My argument for the usefulness of the 'count' value has not yet been addressed, please comment on that. (Personally, I would expect an RTBC this time.)
#32
The last submitted patch failed testing.
#33
Resending for test run.
#34
The last submitted patch failed testing.
#35
i'm guessing this got wacked by the stream wrapper patch.
#36
Introducing a new tag for feature freeze: API clean-up.
#37
Tagging absolutely critical clean-ups for D7. Do not touch this tag. Please either help with this or one of the other patches having this tag.
#38
Re-rolled against HEAD.
This still looks ready to fly...
#39
The last submitted patch failed testing.
#40
1) ok, the tests fail, because the new File module in core implements *heavy* workarounds and really awkward functions to make hook_file_references() properly work with file fields. I'll try to get the tests working first, afterwards we can think about further clean-ups in there (and it looks like a couple of functions could be simplified or even removed). This patch also allows to remove a major @todo from file.module.
That alone justifies this patch to land even after API freeze.
2) A second thought I had may be deferred to D8, but: Those file usage functions basically have the potential to entirely eliminate the {file}.status column. A file that isn't used anywhere should be deleted during garbage collection, no? :)
#41
subscribing
#42
1) It turns out that the workarounds for hook_file_references() implemented in File field are quite heavy, so I'm attaching a work-in-progress patch here.
2) Still seems very valid to me. Probably really too late, but oh, nice conclusion. But then again, please also note:
3) All files uploaded via File field will be deleted after 6 hours. That is, because {file}.status is not updated properly. (And since that does not happen in File field at all so far, I'm heavily struggling to find the proper place to call file_add_usage()....)
Any feedback/help from someone who knows all this stuff more than me would be highly appreciated.
#43
Might be good to ping quicksketch about this...
#44
The call to file_add_usage() (or setting the status column to 1, if we don't get this in) should be done in file_field_update(). This function is called when the node is actually saved. It should also be done on insert, but since file_field_insert() calls file_field_update(), we can add it in just that one place.
Generally my feeling on this patch is a huge +1. File module *does* have to do all kinds of wonky querying to get file references, especially with Field API's abstraction layer over performing actual queries. File module right now is doing individual queries on every file field on the Drupal site, retrieving entire rows, then just calling count() on the complete rows. It's ridiculously inefficient, so I'd love to see this added if possible.
#45
Considering all kind of possible field storage engines, I see no way around fixing this, in the way this patch attempts to.
#47
If I understand correctly, this is also needed for modules that write files to the files table without being referenced anywhere (like the various file managers for D6 do at the moment), because File would otherwise delete them?