(this stems from a #drupal-dojo discussion)

Meta-data will be stored as resource or file ID, key name ("Author" etc.), and value. There will be a second table that lists what resource types can have each key (YouTube video ID applies to YouTube videos but not private image files).

On install, a resource type module should add allowed keys to the database. The idea is that these will be customizable (through an interface eventually), but we still want to have some fields that are standard for other modules to reference as needed.

Some kind of resource type hierarchy is desirable, so that a YouTube video is an Image + some extra metadata and a Flickr photo is an Image + some extra metadata.
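
A minimal sketch of how the allowed-keys table and the type hierarchy could fit together, assuming a simple parent pointer per type. The function names, type names, and keys below are hypothetical and only illustrate the idea; nothing here is existing module code.

```php
<?php
// Hypothetical registry: each resource type names its parent and the
// metadata keys it adds. Type and key names are illustrative only.
function example_resource_types() {
  return array(
    'image' => array('parent' => NULL, 'keys' => array('title', 'author')),
    'youtube_video' => array('parent' => 'image', 'keys' => array('youtube_id')),
    'flickr_photo' => array('parent' => 'image', 'keys' => array('flickr_id')),
  );
}

// Collect the keys allowed for a type by walking up the parent chain.
function example_allowed_keys($type) {
  $types = example_resource_types();
  $keys = array();
  while ($type !== NULL && isset($types[$type])) {
    // Parent keys are placed before the keys of the more specific type.
    $keys = array_merge($types[$type]['keys'], $keys);
    $type = $types[$type]['parent'];
  }
  return array_values(array_unique($keys));
}

// A metadata row is (resource ID, key, value). Returns FALSE when the
// key is not allowed for the given type.
function example_metadata_row($resource_id, $type, $key, $value) {
  if (!in_array($key, example_allowed_keys($type), TRUE)) {
    return FALSE;
  }
  return array('resource_id' => $resource_id, 'key' => $key, 'value' => $value);
}
```

With this shape, a YouTube video inherits the base keys and adds its own, while a Flickr photo cannot be given a YouTube video ID.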

Interface and workflow for adding metadata to a resource:

* Meta-data form will contain a type select (Image, Video, etc.), and fields to enter the title, genre, etc. metadata associated with the type
* Tags and genres seem to make more sense as autocomplete and select fields, respectively, rather than having all metadata be text fields
* Choose a file for upload. While it is transferring, we make an educated guess about file type based on extension. Pre-select the type (Image, etc.) from the metadata select.
* If our initial guess is wrong, user selects a different type. If we don't have enough information to make a guess, user still selects a type. No real penalty for being wrong.
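
The extension-based guess from the workflow above could be sketched roughly as follows. The function name and the extension map are assumptions made for illustration; a real list would be longer and probably configurable.

```php
<?php
// Hypothetical extension-to-type map used to pre-select the type.
function example_guess_type($filename) {
  $map = array(
    'jpg' => 'Image', 'jpeg' => 'Image', 'png' => 'Image', 'gif' => 'Image',
    'mp4' => 'Video', 'mov' => 'Video', 'avi' => 'Video',
    'mp3' => 'Audio', 'wav' => 'Audio',
  );
  $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
  // NULL means "no guess": the user picks the type from the select.
  return isset($map[$extension]) ? $map[$extension] : NULL;
}
```

Because the user can always override the pre-selected type, a wrong or missing guess costs nothing beyond one extra click.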

Comments

kwinters’s picture

Some more thoughts on the data types for meta data:

* It seems like we should integrate the core tagging system into a resource, and have that be consistent with node tagging
* A "Genre" field would need key, valid resource types, data type (select / autocomplete / text), plus a list of all valid genres (the ID3 list)

kwinters’s picture

Keep-alive, I'll be working on this after DCDC with any luck.

aaron’s picture

You coming to DCDC then, kwinters?

kwinters’s picture

Sadly I can't go, but Coalmarch is sending someone else (Sheena), who might stop in.

Brian@brianpuccio.net’s picture

I am very interested in metadata handling with images. Here's some relevant information:

Not only am I interested, I'm an (entry-level) PHP person, so I'm more than willing to learn what's needed and help out. I just may need some direction from time to time to make sure whatever I'm working on integrates nicely with all the other multimedia solutions.

kwinters’s picture

I believe this coming week we'll be working on the code for this. What we could use the most is sensible and organized data at first (so we have a good idea of what to make core functionality) and then after the basics are done you could work on something like geotagging as a sub-module.

aaron’s picture

after a lengthy discussion w/ arthurf, alexua & drewish, we decided to forgo trying to store metadata ourselves entirely, since there are already better things in place, notably the rdf module.

instead, we'll simply provide a mapping from form tabs to mime types, and allow implementations to plug into that, defining the metadata they want and storing it themselves. we'll provide a default, probably just 'title' for now, and leave it at that.

a new issue will be to provide an rdf-media bridge module.

aaron’s picture

Status: Active » Needs work

there's now a placeholder form in place for metadata, currently a rough fake tabset at $form['media_browser']['media_browser_metadata'] in media.module. this needs to pull in the real forms built by registered modules (and altered by helper modules, such as imagefield-media or rdf-media bridges).

kwinters’s picture

I think we'll still want to provide a reasonable framework and API ASAP, which was really the point of this request. A useful set of metadata tags is somewhat ancillary and could be easily done by a sub-module at a later date, but still important to have a minimal standard (author + title + description?) to make sub-module development easier.

Some kind of structure for meta-data needs to be standard equipment for the media module, so that all the media "player" sub-modules work nice with the media "field" ones. If some people use RDF and others use Bob's Metadata Emporium but those use different Media API structures, it means more development time to support both and in all likelihood poor support all around.

You might still be OK if you get RDF integration as "core" or "official" media functionality, since whatever solution just needs to be unified, flexible, and easy to develop for.

aaron’s picture

Yes, I agree wholeheartedly. We had a long discussion about this at the media sprint. I leaned on the side of including official meta-data support in the module, while arthurf & drewish argued on the other side. In the end, we agreed that creating yet another metadata solution probably wasn't the best way to go, and to let other modules do the heavy lifting for this.

With the current drive to put RDF in core, RDF seems the best place to look. Thus, I opened the #438388: RDF - Media Bridge Module issue.

However, although I favor the media module leading the way somehow, at the same time, I don't want to require the RDF module be enabled, if an administrator simply wants the current imagefield behavior of title/alt being the extent of stored metadata.

Thus, at the end of our discussion, we decided to implement a few bridge solutions. So if the filefield/imagefield media bridge module is installed, then title/alt will be provided (and stored in the current system as is). If the RDF module (and its bridge) is installed as well, then metadata storage & handling will be deferred to that module. Media will provide a hook for these modules to insert and modify their respective forms, and no more.

kwinters’s picture

I think the following would be reasonable behavior:

* Both bridges come as part of the standard install, but enabling one or the other is up to the admin.
* Admins that want just basic functionality (to avoid overhead most likely) will choose the CCK bridge.
* Admins who want more and extensible functionality will use RDF.
* The bridge modules should be interchangeable (that is, if it were OOP they would implement an interface). The only difference is the data source.
* Sub-modules that want some meta-data access will act the same regardless of what bridge is chosen. If it's the CCK one, "genre" etc. would just always be blank, but the method to get the data is the same.
* Sub-modules that really need the extended meta-data will simply require the RDF bridge (a jukebox-type module or discography for examples).
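
To make the "interchangeable bridges" idea above concrete, here is a rough sketch assuming the bridges were written as classes behind a shared interface. All class and function names are hypothetical, and the two classes are in-memory stand-ins rather than real CCK or RDF integrations.

```php
<?php
// Hypothetical interface: both bridges answer the same question,
// "what is the value of $key for file $fid?"
interface ExampleMetadataBridge {
  public function get($fid, $key);
}

// Stand-in for the basic (CCK-style) bridge: only title and alt exist.
class ExampleBasicBridge implements ExampleMetadataBridge {
  private $rows;
  public function __construct(array $rows) {
    $this->rows = $rows;
  }
  public function get($fid, $key) {
    if (!in_array($key, array('title', 'alt'), TRUE)) {
      // Extended keys such as "genre" are always blank here.
      return '';
    }
    return isset($this->rows[$fid][$key]) ? $this->rows[$fid][$key] : '';
  }
}

// Stand-in for the extended (RDF-style) bridge: any key may be stored.
class ExampleExtendedBridge implements ExampleMetadataBridge {
  private $rows;
  public function __construct(array $rows) {
    $this->rows = $rows;
  }
  public function get($fid, $key) {
    return isset($this->rows[$fid][$key]) ? $this->rows[$fid][$key] : '';
  }
}

// A sub-module reads metadata the same way whichever bridge is enabled.
function example_describe(ExampleMetadataBridge $bridge, $fid) {
  return $bridge->get($fid, 'title') . ' [' . $bridge->get($fid, 'genre') . ']';
}
```

The only difference a caller sees is that the basic bridge returns an empty string for keys it does not store.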

The other issue is consistent data mapping (making sure that "band" used by your jukebox is the same DB field used by discography as "author", etc.). Failure here could mean entering the band's name once per module that wants to use the file, and that kind of mess is one of the reasons we needed Media to start with.

A good overall solution would be to provide a way for the modules using the metadata to map an internal field name to the actual name for that field in the database (like the way you can map an RSS field to a CCK field in http://drupal.org/project/feedapi). Even though Jukebox and Discography call the field something different in the code, the site admin could say that they are equivalent in some way.
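
One possible shape for such a mapping, sketched with plain arrays. The module names come from the example above, but the functions and the mapping format are assumptions, not an existing API.

```php
<?php
// Hypothetical admin-configured mapping:
// module => (internal field name => stored metadata key).
function example_field_map() {
  return array(
    'jukebox' => array('band' => 'author'),
    'discography' => array('author' => 'author'),
  );
}

// Resolve a module's internal field name against the shared metadata.
function example_mapped_value($module, $internal_name, array $metadata) {
  $map = example_field_map();
  // Unmapped names fall through unchanged.
  $stored_key = isset($map[$module][$internal_name])
    ? $map[$module][$internal_name]
    : $internal_name;
  return isset($metadata[$stored_key]) ? $metadata[$stored_key] : NULL;
}
```

Here Jukebox asks for "band" and Discography asks for "author", yet both read the same stored value, so the name only has to be entered once.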

Failing that, having a unified base set of metadata fields would also do the trick. Jukebox and Discography could both require Media_Meta_Base or whatever, and they will inherently have normalized data. However, both coming up with a good set and getting wide adoption would be a challenge.

kwinters’s picture

The CCK bridge might work out a little strange because of the connection to nodes. If you put the same file on two nodes, could you have different meta-data on each of those nodes for the same resource? That seems undesirable. Having access to some really basic lists of fields is a good idea but this may not be the way to do it.

If we still have any desire to only display meta-data fields that are relevant to a file type (image, music, etc.) then that is also something to consider, since that means a layer of organization to the meta-data.

nadavoid’s picture

Updated Implementation Plan, from discussion with @aaron at Drupal Camp Colorado

media_metadata (table structure)
-----------------
mid (serial) | fid (int, relates to 'file' table) | key (string) | value (string)
1 | 21 | genre | documentary
2 | 21 | genre | action
3 | 21 | title | Awesome Video
4 | 21 | license | CC by SA

Two new hooks
----------------

/*
 * @return list of keys that this module cares about
 */
function hook_media_metadata_keys($mimetype) {
  return array();
  // return array('genre', 'license'); // if module cares about these two
}

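/*
 * @param $file is a file object
 * @return array: field(s) in accordance with Form API.
 */
function hook_media_metadata_form($file) {
  $form = array();
  // Assumes the file object carries its full mime type (e.g. 'video/mp4')
  // in $file->filemime, so match on the 'video/' prefix.
  if (strpos($file->filemime, 'video/') === 0) {
    $form['genre'] = array(
      '#type' => 'select',
      '#title' => t('Genre'),
      '#options' => my_module_genre_options(),
    );
    $form['duration'] = array(
      '#type' => 'item',
      '#value' => my_module_duration($file),
    );
  }
  // Always return an array, even when no fields apply.
  return $form;
}

aaron’s picture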

"@return array: field(s) in accordance with field api."

actually, that's FAPI (Form API).

nadavoid’s picture

Thanks Aaron, I've corrected that now. (#13)

davebv’s picture

I do not know if this is the right place, but could this be integrated with, or somehow related to, the getid3 module, to get metadata filled in automatically for audio files, for example?

jmstacey’s picture

davebv: The short answer is yes. We were actually thinking about modules like getid3 when deciding whether or not to implement this rather than attempt building on RDF. We settled on this simple system because we didn't want to force the complexity of RDF on users. In contrast, this simple key/value system will generally be transparent to the user.

Edit: So this is a win/win situation. It's not initially complex, but leaves the door open to those that do want RDF and other modules with the additional complexity of configuration and mapping.

aaron’s picture

I've begun implementing this now, in media_metadata.module (available in the latest d6 dev from cvs).

aaron’s picture

needs some work -- for instance, i currently have it build a unique array, but now i remember from discussions that we need to respect multiple modules who want to implement 'title', for instance.

aaron’s picture

that means we'll need to add 'module' to the table, and invoke each module individually when building the array so we can associate it with the module, probably.
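
a rough sketch of what invoking each module individually might look like, assuming the hook naming from the plan above. the two "modules" and the collector function are hypothetical stand-ins written in plain PHP, not the actual media_metadata.module code.

```php
<?php
// Stand-ins for two modules' hook_media_metadata_keys() implementations.
function example_audio_media_metadata_keys($mimetype) {
  return array('title', 'genre');
}
function example_license_media_metadata_keys($mimetype) {
  return array('title', 'license');
}

// Call each module's hook separately and remember which module asked
// for which key, instead of merging everything into one unique list.
function example_collect_keys(array $modules, $mimetype) {
  $keys = array();
  foreach ($modules as $module) {
    $function = $module . '_media_metadata_keys';
    // Modules that do not implement the hook are skipped.
    if (function_exists($function)) {
      foreach ($function($mimetype) as $key) {
        $keys[$key][] = $module;
      }
    }
  }
  return $keys;
}
```

the result maps each key to the list of modules that claimed it, so two modules can both implement 'title' without one overwriting the other.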

jmstacey’s picture

Assigned: kwinters » Unassigned
Issue tags: +gsoc, +gsoc2009, +gsoc2009-jmstacey

I made a copy of aaron's work in HEAD (Drupal 7). What API functions shall we provide?

Here is my draft:

/**
 * Returns an array of file IDs that match the given key value pair.
 *
 * Given a key name or data (or both), the related file IDs will be returned.
 *
 * If there is data to be returned, the associative array will always contain
 * mid, fid, name, and data.
 *
 * @param $name
 *   An optional key name.
 * @param $data
 *   An optional data value.
 * @param $unhandled
 *   TRUE returns ALL key value pairs even if they are no longer managed. By default only
 *   key value pairs that have a handler will be returned. This is to protect against
 *   metadata added by another module that no longer exists.
 * @return
 *   Returns an array of file IDs ({file}.fid) that match the given key value pair.
 *   An empty array is returned if there are no results.
 */
function media_metadata_by_pair($name = NULL, $data = NULL, $unhandled = FALSE) {}

/**
 * Returns the key value pairs of the given file ID.
 *
 * Given a file ID, this function will return an associative array containing
 * its metadata.
 *
 * If there is data to be returned, the associative array will always contain
 * mid, fid, name, and data.
 *
 * @param $fid
 *   A file ID.
 * @param $unhandled
 *   TRUE returns ALL key value pairs even if they are no longer managed. By default only
 *   key value pairs that have a handler will be returned. This is to protect against
 *   metadata added by another module that no longer exists.
 * @return
 *   Returns an array of key value pairs (metadata rows) for the given file ID.
 *   An empty array is returned if there are no results.
 */
function media_metadata_by_fid($fid, $unhandled = FALSE) {}

/**
 * Returns the key value pairs of the given URI.
 *
 * This is a convenience method. If you already have the file ID use
 * media_metadata_by_fid(), otherwise unnecessary resources will be wasted.
 *
 * If there is data to be returned, the associative array will always contain
 * mid, fid, name, and data.
 *
 * @param $uri
 *   A stream such as public://foobar.txt
 * @param $unhandled
 *   TRUE returns ALL key value pairs even if they are no longer managed. By default only
 *   key value pairs that have a handler will be returned. This is to protect against
 *   metadata added by another module that no longer exists.
 * @return
 *   Returns an array of key value pairs (metadata rows) for the given URI.
 *   An empty array is returned if there are no results.
 */
function media_metadata_by_uri($uri, $unhandled = FALSE) {}

/**
 * Add a new metadata key value pair to the file ID.
 *
 * @param $fid
 *   The file ID.
 * @param $name
 *   The key name.
 * @param $data
 *   The data value.
 * @return
 *   Returns the new mid of the key value pair on success, or FALSE on failure.
 */
function media_metadata_add($fid, $name, $data) {}

/**
 * Deletes a metadata key value pair.
 *
 * @param $mid
 *   The metadata ID.
 * @return
 *   Returns TRUE on success, or FALSE on failure.
 */
function media_metadata_delete($mid) {}

This should cover general use and anything more complex would probably need to be done through a custom query. Are there any other functions that we need?
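
For illustration only, the drafted behavior can be tried out against an in-memory stand-in for the table. The class name and method names below are hypothetical; the $unhandled flag and the URI lookup are left out, and a real implementation would query the database instead.

```php
<?php
// Hypothetical in-memory stand-in for the metadata table.
class ExampleMetadataStore {
  private $rows = array();
  private $next_mid = 1;

  // Mirrors the "add" draft: stores a row and returns its new mid.
  public function add($fid, $name, $data) {
    $mid = $this->next_mid++;
    $this->rows[$mid] = array('mid' => $mid, 'fid' => $fid, 'name' => $name, 'data' => $data);
    return $mid;
  }

  // Mirrors the "delete" draft: TRUE if a row was removed, else FALSE.
  public function delete($mid) {
    if (!isset($this->rows[$mid])) {
      return FALSE;
    }
    unset($this->rows[$mid]);
    return TRUE;
  }

  // Mirrors the "by fid" draft: all rows stored for one file.
  public function byFid($fid) {
    $matches = array();
    foreach ($this->rows as $row) {
      if ($row['fid'] == $fid) {
        $matches[] = $row;
      }
    }
    return $matches;
  }

  // Mirrors the "by pair" draft: unique fids matching name and/or data.
  public function byPair($name = NULL, $data = NULL) {
    $fids = array();
    foreach ($this->rows as $row) {
      $name_ok = ($name === NULL || $row['name'] === $name);
      $data_ok = ($data === NULL || $row['data'] === $data);
      if ($name_ok && $data_ok) {
        $fids[$row['fid']] = $row['fid'];
      }
    }
    return array_values($fids);
  }
}
```

Note that this sketch allows several rows with the same key for one file, which is one way to store multiple genres without serializing an array.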

naught101’s picture

Subscribe

I'm wondering why not store file url+meta data as a node, the way the image module does? That way you'd have the taxonomy features, it'd be searchable, and you wouldn't have problems with multiple out-of-sync meta data stores (@#12).

There's a module that already basically does this, without all the other good stuff that Media plans to have: http://drupal.org/project/filenode

aaron’s picture

it's a fine balance, discovering metadata that would always be stored w/ a file (such as GetID3's duration, artist, title, etc), and what would be stored with a specific instance (such as imagefield's title/alt). i definitely am leaning in the direction of the first, which is what this would implement. it doesn't force the issue, and modules such as filenode and imagefield are still free to create their own instantiated metadata. however, i definitely do not want to require everyone to create an entire node for each file. doing that defeats the purpose and intent of this module. and frankly, you still have the same issue of "multiple out-of-sync meta data stores (@#12)", because the whole intent of filenode is to allow files + metadata to be attached to other nodes, so you just sugarcoat the issue. it just adds all the baggage of another node to a file. forcing the hand here would tell folks, 'you can't really use upload or filefield w/o creating a new node', which is bad mojo.

you bring up a good point of searchability though. we'll definitely want to hook this into the search system.

aaron’s picture

i've implemented media_metadata_add() from #21. i wonder, however, if we should allow multiple values for the same key of a file? currently, this is the behavior.

i suppose that yes, we should: that allows for storing, say, multiple video genres without needing to serialize the array.

it means that we leave it up to implementing modules to determine if a title exists, for instance, so that the key can be overwritten. we'll need to add that to the api.

aaron’s picture

http://api.drupal.org/api/function/db_delete/7 returns 'A new DeleteQuery object for this connection.' I'm not sure how to turn that into the TRUE/FALSE required by media_metadata_delete($mid), so for now I'm going to not return anything.
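
One way to get a TRUE/FALSE result, assuming that calling execute() on the delete query returns the number of rows removed (worth confirming against the Drupal 7 API documentation). The class below is a hypothetical stand-in that only mimics that shape; it is not Drupal's DeleteQuery, and the function name is made up for this sketch.

```php
<?php
// Stand-in for a delete query whose execute() reports how many rows it
// removed. NOT Drupal's DeleteQuery; it only mimics the assumed shape.
class ExampleDeleteQuery {
  private $rows;
  private $mid = NULL;

  public function __construct(array &$rows) {
    // Keep a reference so deletions are visible to the caller.
    $this->rows = &$rows;
  }

  public function condition($field, $value) {
    // Only a condition on "mid" is supported in this sketch.
    $this->mid = $value;
    return $this;
  }

  public function execute() {
    if (isset($this->rows[$this->mid])) {
      unset($this->rows[$this->mid]);
      return 1;
    }
    return 0;
  }
}

// Turn the affected-row count into the TRUE/FALSE the draft API promises.
function example_metadata_delete(array &$rows, $mid) {
  $query = new ExampleDeleteQuery($rows);
  return $query->condition('mid', $mid)->execute() > 0;
}
```

Under that assumption, comparing the returned count against zero is enough: deleting an existing mid yields TRUE, and deleting a missing one yields FALSE.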

aaron’s picture

with RDFa going into core #493030: RDF #1: core RDF module we should examine that before going forward.

pwolanin’s picture

Version: 6.x-1.x-dev »

arthurf’s picture

I've created File Metadata module to store metadata attached to a file. Using the new D7 file hooks, we are able to load and save data easily.

Your module merely needs to add data to the file:


$file->metadata = array(
  MODULE_NAME => array(
    IDENTIFIER => array(
      KEY1 => VALUE1,
      KEY2 => VALUE2,
    ),
  ),
);

Here, MODULE_NAME is your module name and IDENTIFIER is a unique ID (useful if your module implements separate kinds of metadata, e.g. one set for video and one set for audio). KEY is the name of the data you are storing and VALUE is the value.

The module hasn't even been tested yet, but in the interests of moving things along......

aaron’s picture

Status: Needs work » Fixed

fixed; using fields now.

davebv’s picture

I am testing the media module, and I would like to use the id3 metadata. I am a bit confused on how to get the metadata, I do not find how to get the "fields" for that.

Any help would be much appreciated.

JacobSingh’s picture

@davebv: There is nothing implemented for this presently. My guess is it could be implemented in a couple of ways. One is as a bridge between fields, so that fields could be created for ID3-related data and, as the file is updated, the fields get copies of those values.

Alternately, it could be loaded as needed through field hooks.

Basically, it needs to be built.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

aaron’s picture

Status: Closed (fixed) » Active
Issue tags: -gsoc, -gsoc2009, -gsoc2009-jmstacey +beta blocker

this was closed prematurely, and needs to be dealt with.

gusaus’s picture

Version: » 7.x-1.0-beta5

Looks like this issue wasn't important enough to block the beta? Any progress on this over the past year?

gusaus’s picture

Seems like #780848: Merge getID3 with FileField Meta in Drupal 7 should fit into the plan somewhere?

arthurf’s picture

Status: Active » Closed (fixed)

We've got fields on files, I'm closing this.

aendra’s picture

@arthurf -- Mind providing a link to some documentation so users arriving at this page from Google have an idea where to start? Thanks!

firfin’s picture

Documentation can be found at http://drupal.org/documentation/modules/media. Like most documented projects, this is mentioned on the project page.
Of special interest for this issue are probably:

* Using file fields in content types
* Media 2.x Quick Start Guide
* Using fields on file types
* Displaying media