I've been doing a LOT of big gallery uploads and imports on behalf of a client who's trying to supply captions as fast as they can. I also need to force naming conventions on the images, and renaming images while matching them up against lists of captions is a terror.
I'm now using a flat-file syntax to drop the relevant info into my directories as we move them around. I'm choosing to use a version popularized years ago by the ACDSee image browser, as I've since seen it used by other image software as well.
Now, when importing my galleries, all the fields are filled out for me!

In line with the very important goal of keeping the image package as simple as possible, all I've done is introduce a single hook that lets a third-party module provide metadata to the image_import form.
It looks like this:
function image_import_form() {
...
..
$files = file_scan_directory($dirpath, '.*');
+ // Allow other modules to supply metadata about the images being imported
+ // hook_mediadescribe() returns a metadata array to add extra information, possibly including a description or title.
+ $metadata = module_invoke_all('describemedia', $dirpath);
...
..
foreach ($files as $file) {
$info = image_get_info($file->filename);
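+ // Not every file will have metadata, so fall back to an empty description.
+ $meta = (isset($metadata[$file->filename]) ? $metadata[$file->filename] : array()) + array('description' => '');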
...
..
$form['files']['body'][] = array(
'#type' => 'textfield',
+ '#default_value' => $meta['description'],
'#size' => 20,
);
}
Then I worked my logic into a small module that reads the standard description-file metadata and feeds it back to image_import. It then occurred to me that if I made the hook useful enough, more modules could use it, both as consumers and suppliers, so I now have hook_describemedia()!
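One thing to keep in mind once several modules act as suppliers: in Drupal 6, module_invoke_all() combines array results with array_merge_recursive(), so two modules describing the same file produce a list of values rather than a single string. A rough sketch of that behaviour in plain PHP follows. The two input arrays are made-up module output, and example_first_value() is a hypothetical helper, not part of image_import or mediadescriber.

```php
<?php
// Hypothetical return values from two modules implementing the hook.
$from_description_file = array(
  'files/images/cat.jpg' => array('description' => 'Caption from descript.ion'),
);
$from_exif = array(
  'files/images/cat.jpg' => array(
    'title' => 'Title from EXIF',
    'description' => 'Caption from EXIF',
  ),
);

// module_invoke_all() combines array results with array_merge_recursive(),
// so the two descriptions end up side by side in a nested array.
$metadata = array_merge_recursive($from_description_file, $from_exif);

/**
 * Hypothetical helper: reduce a possibly merged value to a single string.
 */
function example_first_value($value) {
  if (is_array($value)) {
    $value = reset($value);
  }
  return is_string($value) ? $value : '';
}

$meta = $metadata['files/images/cat.jpg'];
print example_first_value($meta['description']) . "\n";
print example_first_value($meta['title']) . "\n";
```

A consumer such as the import form would probably want something like this before using a value as a textfield default, since it cannot know how many suppliers answered.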
I'm thinking of releasing this import/export module (yes, it exports image galleries also). At the moment it's only really an add-on to image_import.module, but I like the way the code is going.
I don't mind rolling it for D6 also, but I'd like to request this hook be made available in image_import.module. What are the chances? Adding hooks is good mojo :)
Releasing a module with a patch that needs to be applied to another one is not :(
My module does the following:
mediadescriber.module README
/**
* @file mediadescriber.module
*
* Accesses flat-file descriptions alongside images (or other managed files).
*
* In the first case, this is designed to enhance image_import functionality, to
* enable rapid uploading of image galleries prepared offline, and to enable
* those galleries to be migrated between Drupal sites easily.
*
* This module creates and updates 'descript.ion' files as used by some old-
* school file metadata managers - specifically ACDSee.
*
* It also declares a hook for image_import to use to extract that info when
* scanning directories.
*
* This module provides no interface or options, it just enhances contextual
* metadata behind-the-scenes when importing or modifying image resources.
* It does NOT re-scan image directories for modifications made after import, it
* just assists filling out the input fields the first time around.
*
* Currently it takes ACDSee-type format of
*
* {(optionally quoted) filename}{whitespace}{caption}{newline}
*
* It should also handle an equivalent simple CSV file,
* replacing [whitespace] with [,]
*
* WHY READ - to assist adding captions to images when importing them in bulk
* from other places ... like a desktop image management tool.
*
* WHY WRITE - so that files/image directories can be accessed using other
* means, or exported/archived to disk with at least some of the extra metadata
* retained.
*
* WHY? - Because flat-file storage of information about files is still one of
* the most robust and portable ways of keeping information about them available
* when archiving, zipping, emailing, or transferring them. Databases are more
* powerful for real classifications, but you are always locked into using the
* database to get that information out again. Flat-file information from 10
* years ago is still available if you dig up the old disks. Not so with any
* CMS or software package.
*
* This type of functionality (the hook_describemedia()) can be re-implemented
* in other modules to read, for example:
*
* = metadata.xml as found in some fileshare apps,
* = playlist.m3u MP3 Playlist information
* = Embedded EXIF data or ID3 tags
* = RSS media or podcast formats
* = xmlsitemap information about files or URLs
*/
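To make the line format described in the README concrete, here is a minimal sketch of a parser for it. It is not the code from mediadescriber.module: the function name example_parse_descriptions() and its $csv switch are assumptions for illustration, and it only handles double-quoted or bare filenames.

```php
<?php
/**
 * Hypothetical helper: turn the text of a descript.ion file into an array
 * of filename => caption.
 *
 * @param $text
 *   The raw file contents.
 * @param $csv
 *   TRUE to split on a comma instead of whitespace.
 */
function example_parse_descriptions($text, $csv = FALSE) {
  // A bare filename ends at the separator; a quoted one may contain it.
  $bare = $csv ? '[^,"]+' : '[^\s"]+';
  $sep = $csv ? '\s*,\s*' : '\s+';
  $pattern = '/^(?:"([^"]+)"|(' . $bare . '))' . $sep . '(.*)$/';

  $captions = array();
  foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
    $line = trim($line);
    if ($line === '') {
      continue;
    }
    if (preg_match($pattern, $line, $m)) {
      // Group 1 is the quoted filename, group 2 the bare one.
      $filename = ($m[1] !== '') ? $m[1] : $m[2];
      $captions[trim($filename)] = $m[3];
    }
  }
  return $captions;
}
```

Lines that do not match the pattern (for example a filename with no caption) are skipped silently in this sketch; a real implementation might prefer to keep them with an empty caption.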
hook_describemedia
/**
* A mechanism for modules to extract and return information (metadata or
* content) about given files, eg media images.
*
* Possibly even URLs one day.
* ###########################################
*
* $metadata = hook_describemedia('files/images');
* OR
* $metadata = module_invoke_all('mediadescribe', 'files/images');
*
* $metadata = array(
* 'files/images/cat.jpg' => array(
* 'title' => 'LOLCats #42',
* 'description' => 'I can haz cheezburger',
* ),
*
* 'files/images/hello.jpg' => array(
* 'title' => 'The Goatse',
* 'description' => 'You\'ve been goatse\'d!',
* )
* );
*
* ###########################################
*
* @see mediadescriber_describemedia()
*
* @param $dirpath a folder to scan, recursively.
* - may also be a single file getting information requested about it
*
* @return $metadata an array, keyed by filename, that may contain attributes,
* such as title, caption, or other relevant values.
*
*/
function hook_describemedia($dir) {

| Comment | File | Size | Author |
|---|---|---|---|
| #5 | image.image-import-file-metadata.patch | 1.97 KB | sun |
| | mediadescriber-20080508.tar_.gz | 4.04 KB | dman |
| | mediadescriber.png | 13.37 KB | dman |
| | image_import-hook_describemedia.patch | 1.79 KB | dman |

Comments
Comment #1
dman commented
Small typos in the doc, s/mediadescribe/describemedia/, as I changed the function name to a better verb towards the end. It's hook_describemedia(). My module is mediadescriber.module :-)
Comment #2
sun
If you change that hook name to 'file_metadata', I'll probably commit the patch for image*. However, the mediadescriber module definitely does not belong in the Image package.
Comment #3
dman commented
I'm certainly happy for mediadescriber to have a life of its own, maybe working with the EXIF module instead. That's why I am simply proposing a hook here where it doesn't hurt. The code was provided as a proof of concept.
In the (long) meantime I've actually continued research into file metadata via RDF and XMP as well as EXIF, and am still quite happy with the approach I outlined here. I'll review my changes from the interim, rename (I agree) and re-submit. May not be this week.
.dan.
Comment #4
sun
New features go into HEAD first.
Comment #5
sun
Committed attached patch to 6.x.
I guess we could rename that into hook_file_load() or similar in D7. I quickly scanned file.inc, but there does not seem to be any code related to file metadata yet.
Comment #6
dman commented
Thanks for this (I've been slow about following up, but did use the hook in a recent thing).
FYI, my mediadescriber is now in a sandbox
http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/dman/mediad...
- reads descript.ion, EXIF and also XMP metadata.
Unstable, not exactly a ready-to-use release, but looking forward.
Comment #7
sun
Wow! That looks very promising! :)
I was not sure whether invoking the hook for a single file would be sufficient, but it made most sense to me as a first implementation. If you want, we could also add another hook_file_metadata_multiple() or similar, because it looks like your code could run faster if it were passed a batch of files instead of just one.
Comment #8
sun
If I'm allowed to, I would suggest a better/common namespace for your modules. Something with a "meta" and/or "file" prefix, possibly "meta_description" and "meta_exif". And "mediadescriber" would become just "meta" or similar, forming the starting ground and namespace for many more modules in contrib (maybe not even limited to files in the long run).
btw, couldn't your code use PHP's own EXIF functions (http://php.net/manual/en/book.exif.php) if they are available?
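For reference, a fallback on PHP's built-in EXIF support could look roughly like the sketch below. It assumes the caption lives in the standard ImageDescription tag; example_exif_description() is a hypothetical name, not a function from any of the modules discussed here.

```php
<?php
/**
 * Hypothetical helper: read a caption from a file's EXIF header, if possible.
 *
 * Returns NULL when the exif extension is missing, the file cannot be read,
 * or no ImageDescription tag is present.
 */
function example_exif_description($path) {
  if (!function_exists('exif_read_data') || !is_readable($path)) {
    return NULL;
  }
  // Suppress the warning PHP raises for files without a usable EXIF header.
  $exif = @exif_read_data($path);
  if (!is_array($exif) || !isset($exif['ImageDescription'])) {
    return NULL;
  }
  $caption = $exif['ImageDescription'];
  return is_string($caption) ? trim($caption) : NULL;
}
```

This only covers a single well-known tag; it does nothing about vendor-specific tags or XMP.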
Comment #9
dman commented
I did start with a generic namespace, thinking broadly that the approach could/should spread to video/audio files without too much trouble.
As it stands, yes the namespace is in flux, although 'meta' is maybe a bit meta for folk to find.
The mediadescriber is a bit ambitious right now, trying to do too much at once, when all I really intended was to enhance the image_import screen. But I found that there was no passthrough way to get more data in there, so I ended up catching the node_save, re-reading the metadata and adding extra fields (taxonomy) there.
Messy, like I say.
Half the time, yes my code works better on a folder than a file, but that's because of the descript.ion file format. In other cases like embedded XMP, then it's just whatever.
So my code does read a folder whenever it's asked for anything, statically caches the result and returns per-file results on request. Works OK for now, and means I don't have to choose if per-file or per-dir is more appropriate. Thus metadata_multiple is a hidden implementation detail.
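The read-the-folder-once approach described above could be sketched along these lines. This is an illustration under simplifying assumptions, not the sandbox code: example_describe_file() is a hypothetical name, and the parsing only handles unquoted filenames followed by whitespace and a caption.

```php
<?php
/**
 * Hypothetical sketch: describe one file, reading its folder's descript.ion
 * at most once per request.
 */
function example_describe_file($filepath) {
  // Parsed folders, keyed by directory path; persists between calls.
  static $cache = array();

  $dir = dirname($filepath);
  if (!isset($cache[$dir])) {
    $cache[$dir] = array();
    $listing = $dir . '/descript.ion';
    if (is_readable($listing)) {
      $lines = file($listing, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
      foreach ($lines as $line) {
        // Simplified: unquoted filename, whitespace, then the caption.
        $parts = preg_split('/\s+/', trim($line), 2);
        if (count($parts) == 2) {
          $cache[$dir][$parts[0]] = array('description' => $parts[1]);
        }
      }
    }
  }

  // Per-file answers come straight from the cached folder listing.
  $name = basename($filepath);
  return isset($cache[$dir][$name]) ? $cache[$dir][$name] : array();
}
```

Because the cache is keyed by directory, the caller never has to decide between per-file and per-folder requests; asking about a second file in the same folder costs only an array lookup.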
I looked at PHP EXIF, but it's a bit raw. Looking at real EXIF data, there are hundreds of tagnames and crap in use. The library I'm using here seems to account for a lot of the vagaries. Plus big big happiness with its XMP support, which comes for free and which I'm starting to love.
.dan.
Comment #10
sun
Heh, thanks for following up. :)
Honestly, I'm normally opposed to overly broad namespaces, but given the idea and (first) implementation, I think that a new meta namespace would be beneficial for the Drupal community. There are countless ways to retrieve metadata for something, and it would be a royal pain if all related modules had completely different names.
Advancing on "meta" information retrieval (and update/storage?) - by appending the object type in question (e.g. meta_file ...) and the provider (e.g. meta_file_exif), this may form a solid namespace for modules?
Sorry, just wild ideas here. Hope you can get something out of it. :)
Comment #11
dman commented
meta+file+exif makes sense to me.
meta_url_header
meta_user_linkedin ???
Comment #12
sun
Well, I'm not sure. But let's bear in mind that we're talking module names only...? If I found those modules on drupal.org or in my sites/all/modules directory, I would assume:
meta_url_header: Provides meta information for URLs based on the request header?
meta_user_linkedin: Provides meta information for users based on some weird LinkedIn FOAF webservice?
meta_node_content: Provides meta information for nodes based on their content field values?
Comment #13
dman commented
Yep, that was pretty much what I was thinking. :-)
Just spitballing, I'm not planning to build these or even propose them. But yeah, linkedin provides metadata about an individual over some RPC.
Actually, I tried thinking about the data syntax we would pass back in an attempt to be consistent. As it is, different resources have seriously different requirements. Even Adobe Bridge file management suite hasn't been able to solve that usefully.
So I need image, video, document and audio (at least) docs to be quite different.
Internally, I'm a big fan of RDF, so I'll be (as much as possible) converting attribute names down to Dublin Core types.
But if there is any commonality, I guess it will be supplied on a per-syntax basis, since that's where the code will be gathered.
so instead of
meta_file_exif()
or
meta_image_exif()
I'm thinking
meta_exif_image
meta_id3_audio
and still double up with a generic:
meta_file_xmp
or
meta_resource_xmp
??? I dunno. I should stop thinking and get back to test cases.