Use Mime Detect mimetype when validating
isaac77 - May 20, 2009 - 00:56
| Project: | FileField |
| Version: | 6.x-3.0 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | postponed |
Description
Thanks for an important module! I know that in the 5.x version the mimedetect module determined the filemime value stored in the files table in the database. Is this no longer the case?
It seems that mimedetect module is now only used to determine whether or not a file should be allowed for upload. The core file_get_mimetype function seems to be used to set the actual filemime value set in the database.
Is this correct? If so, might it be possible to allow mimedetect to (if present) set the filemime value stored in the database? Or is that impossible now that core functions seem to be taking care of the insert?
Appreciate any clarification you can give.

#1
You're correct in your assessment: Drupal core now handles insertion into the database table and determines the mime type of the file (based on the extension). Mime Detect is only used to confirm that the extension and mime type are correct based on the contents of the file. It's definitely possible to override this behavior by implementing hook_file_insert() and changing the value of the $file->filemime property. Note that the file has already been inserted into the database by the time hook_file_insert() is triggered, so you'd need to do an update to the existing record.
Implementation details aside, why would you want to use Mime Detect for the mime type anyway? As far as I know, the mime type list included with core is the same as the mime type list used by Mime Detect, so it wouldn't make any difference if Mime Detect were used.
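For readers who want to try the hook_file_insert() approach described above, here is a minimal sketch for Drupal 6. It assumes a hypothetical custom module named "example", reuses the mimedetect_mime($file) call quoted later in this thread, and assumes the record lives in the {files} table keyed by fid. It has not been tested against any particular FileField release, so treat it as a starting point rather than a finished implementation.

```php
<?php
/**
 * Implementation of hook_file_insert() for a hypothetical "example" module.
 *
 * By the time this hook runs, core has already written the row to {files},
 * so the detected mime type has to be saved with a separate UPDATE.
 */
function example_file_insert($file) {
  // Do nothing unless Mime Detect is installed and enabled.
  if (!module_exists('mimedetect')) {
    return;
  }

  // Ask Mime Detect for a content-based mime type.
  $detected = mimedetect_mime($file);

  // Only touch the database when the detected value is usable and differs
  // from the extension-based value that core stored.
  if (!empty($detected) && $detected != $file->filemime) {
    $file->filemime = $detected;
    db_query("UPDATE {files} SET filemime = '%s' WHERE fid = %d", $file->filemime, $file->fid);
  }
}
```

The function only defines the hook; Drupal calls it when FileField saves a new file, so it does nothing outside a Drupal 6 site.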
#2
This causes a problem in some cases.
The way FileField checks that the extension matches the mime type is by running Mimedetect.module on the file and then comparing that mime type to the one that is already saved to the database (see filefield_validate_extensions()). The problem is that the two don't always match up. For example, Drupal core uses "application/rtf" for RTF files while the file command on Linux (which Mimedetect uses) usually uses "text/rtf". So FileField complains that the file extension doesn't match the content of the file, although it does.
I changed the file /usr/share/file/magic.mime (and magic.mime.mgc) on my Ubuntu box to fix this. A better solution would be if FileField used a single method for detecting mime types rather than two. Maybe it's MimeDetect that should implement hook_file_insert() and always set the mime type for new files?
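To illustrate the mismatch, here is a small standalone sketch in plain PHP. It is not FileField's or Mime Detect's actual code; the function names and the alias table are hypothetical, and the only alias listed is the RTF pair mentioned above. It shows why a strict string comparison rejects a valid RTF upload, and how a comparison that maps known aliases to one canonical name would accept it.

```php
<?php
// Hypothetical alias table: alternative mime type names mapped to the name
// the stored value is expected to use. Only the RTF pair from this thread
// is listed; a real table would need more entries.
function example_canonical_mime($mime) {
  $aliases = array(
    'text/rtf' => 'application/rtf',
  );
  $mime = strtolower(trim($mime));
  return isset($aliases[$mime]) ? $aliases[$mime] : $mime;
}

// Compare the stored (extension-based) type with the detected
// (content-based) type after mapping both to their canonical names.
function example_mime_matches($stored, $detected) {
  return example_canonical_mime($stored) === example_canonical_mime($detected);
}

// A strict comparison fails for RTF, the alias-aware one does not.
var_dump('application/rtf' === 'text/rtf');
var_dump(example_mime_matches('application/rtf', 'text/rtf'));
```

The first var_dump() prints bool(false) and the second prints bool(true). Whether such an alias table belongs in FileField or in Mime Detect is exactly the open question in this issue.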
#3
Thanks for the clarifications. zoo33's concern about consistency seems more important, but one more question:
Isn't mimedetect module more robust than file_get_mimetype? I thought that file_get_mimetype only looks at the file extension, while mimedetect resorts to other methods (the magic db, etc) if the file extension does not map.
If that is the case, it might be ideal to use mimedetect to assign the mime value for the database (if mimedetect is present). But I certainly defer to your judgement if you feel this would introduce unneeded complexity.
FYI, the particular concern that brought me to this is the inability of file_get_mimetype to identify a .flv file as flash video.
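The difference between the two approaches can be sketched in plain PHP. This is only an illustration of the idea: the function names and the tiny extension map are hypothetical, the generic "application/octet-stream" is used as the "unknown" value by assumption, and PHP's fileinfo extension stands in for whatever Mime Detect does internally. The lookup by extension comes first, and the file contents are inspected only when the extension is not in the map.

```php
<?php
// Extension-based lookup. $map is a hypothetical extension => mime type
// list; unknown extensions get the generic "application/octet-stream".
function example_mime_from_extension($filename, array $map) {
  $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
  return isset($map[$extension]) ? $map[$extension] : 'application/octet-stream';
}

// Content-based lookup through PHP's fileinfo extension, which reads a
// magic database much like the file command does.
function example_mime_from_content($path) {
  $finfo = new finfo(FILEINFO_MIME_TYPE);
  $mime = $finfo->file($path);
  return $mime === FALSE ? 'application/octet-stream' : $mime;
}

// Try the extension first and fall back to the file contents only when
// the extension is unknown and the file can actually be read.
function example_detect_mime($path, array $map) {
  $mime = example_mime_from_extension($path, $map);
  if ($mime === 'application/octet-stream' && is_file($path)) {
    $mime = example_mime_from_content($path);
  }
  return $mime;
}
```

With a map that lacks an "flv" entry, example_mime_from_extension('clip.flv', $map) returns the generic type, which is the situation described above. What the content-based fallback then reports depends on the magic database installed on the server.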
#4
Yeah, we had the same problem with zip files for a long time, see #444724: not recognizing the ZIP mime-type (application/zip). Updating the magic file is definitely a good way to fix this temporarily. Let's change this to a bug report and I'll see if there's a way that we can simply disregard the $file->filemime provided by core and do the checking based on MimeDetect's extension checking. Note that this still won't affect the value stored in the database (which really isn't that important) but will just use MimeDetect's mimetype when it comes to validation.
#5
#6
Actually! Looking at our code, we're ALREADY using MimeDetect for the database value:
$file->filemime = module_exists('mimedetect') ? mimedetect_mime($file) : file_get_mimetype($file->filename);
So the problem is really that MimeDetect isn't compatible with itself in some cases (such as the .flv example) when the mime type is not in the magic file.
#7
You're right, that line seems like it should do it. But the function that contains the line, field_file_save_file, does not seem to be called from anywhere (except filefield.devel.inc). Am I missing something? Thanks again!
#8
Oh right, silly me. The mimetype is officially saved in field_file_save_upload(), which calls file_save_upload() (part of core) which makes the entry into the files table. The only way we can change the mime type is to update the table record after it's been inserted.
So again, though, using the mime type when inserting into the database isn't the important thing, is it? We just want Mime Detect to be used both for finding the mimetype based on the extension and for checking it on upload.
#9
Thanks for the clarification! Using mimedetect to set the database value might be an advantage in cases where the core file_get_mimetype function fails, and mimedetect could use the magic file (or some other trickery) to find the appropriate mimetype.
Or is that such a rare occurrence that it's not worth the added complexity of inserting new values after file_save_upload() has created the db record?
#10
I think updating files in the database to use Mime Detect's mimetype might be out of scope for FileField. Maybe Mime Detect should do that by default, but this issue would be better solved by using a coherent way of validating file extensions against the files' actual content. Unfortunately, I don't think Mime Detect has an interface for looking up file extensions.
The reason I'm back in this issue again is that I've been bitten by this once more. Apparently, Mime Detect (or the file command) interprets Excel files as being Word files. Changing the magic file was trickier this time, as there are a number of different content strings that file uses to detect Word files.
But that's more of a Mime Detect issue, I guess.
I think my solution for now will be to just comment out the Mime Detect validation in FileField. I'm not sure how to solve this properly, but I think it will require changes to both FileField and Mime Detect.
#11
Well after re-reading this post a few times, I can't think of any changes that need to be made to FileField. Any suggestions would be welcome, but I think the problem is mostly Mime Detect's magic file being inaccurate in some cases, not a problem with FileField. Unless there's something specific that FileField can do, this will probably be moved to won't fix.
#12
I think a relevant change would be to not use MimeDetect for this at all. The mismatch between core's mime detection and that of MimeDetect is what makes this a problem.
The other alternative would be to somehow use MimeDetect for the actual file records, but that's handled by core, so...
#13
By "this" do you mean validation? If FileField doesn't use MimeDetect, is there much point in having MimeDetect installed at all?
I really think this is just a matter of MimeDetect's magic file being incorrect in certain cases. I don't think there's any need to change FileField, because MimeDetect *does* use the same list as core (it actually calls the function directly). The only way the comparison differs is if MimeDetect returns a type (via the magic file) that isn't in the core list. In that case, it's a bug in the magic file that should be fixed in MimeDetect.
#14
OK, that is not the way I seem to remember it, but I'll do some homework and come back if I have something to add.