Problem/Motivation

In #1447790: image_load makes admin/content/media unusably slow, the {image_dimensions} table was added to cache image dimensions and avoid re-reading an image file each time its dimensions are needed.

Instead of doing this only for image dimensions, the File entity module should provide a generic framework for its consumers to retrieve file meta-data. This way we could prevent the same kind of performance issue with other kinds of meta-data (EXIF, ID3, etc.).

Proposed resolution

On file insert, update, and load, the meta-data for a file should be read, stored in some kind of generic store, and added as an array in a property of the $file object (something like $file->metadata).

Retrieving the meta-data of a file should be pluggable. Additional modules should be able to provide meta-data to File entity.

The meta-data storage should be flexible enough for any type of meta-data (strings and numbers, but also more complex types).
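To make the three requirements above more concrete, here is a minimal sketch in plain PHP (no Drupal bootstrap, closures need PHP 5.3+). It is not the File entity API: the `$providers` registry and every `example_*` function are hypothetical names invented for illustration, and the in-memory `$store` array merely stands in for whatever generic store ends up being used. Values are serialized per (fid, name) pair so that scalars and more complex types can be kept the same way.

```php
<?php
// Illustrative sketch only; every example_* name and the $providers
// registry are hypothetical and not part of File entity.

// Pluggable providers: each takes a file URI and returns name => value pairs,
// or an empty array when it does not apply to the file.
$providers = array(
  'image_dimensions' => function ($uri) {
    // getimagesize() is built into PHP; it returns FALSE for non-images.
    $info = @getimagesize($uri);
    return $info ? array('width' => $info[0], 'height' => $info[1]) : array();
  },
  'filesize' => function ($uri) {
    return is_file($uri) ? array('filesize' => filesize($uri)) : array();
  },
);

// Asks every registered provider for meta-data and merges the results.
function example_collect_metadata($uri, array $providers) {
  $metadata = array();
  foreach ($providers as $provider) {
    $metadata = array_merge($metadata, call_user_func($provider, $uri));
  }
  return $metadata;
}

// Generic store stand-in: one serialized value per (fid, name) pair, so
// strings, numbers and nested arrays can all be kept the same way.
function example_save_metadata(array &$store, $fid, array $metadata) {
  foreach ($metadata as $name => $value) {
    $store[$fid][$name] = serialize($value);
  }
}

// Rebuilds the array that would be attached as $file->metadata on load.
function example_load_metadata(array $store, $fid) {
  $metadata = array();
  if (isset($store[$fid])) {
    foreach ($store[$fid] as $name => $value) {
      $metadata[$name] = unserialize($value);
    }
  }
  return $metadata;
}

// Demo: write a 1x1 GIF to a temporary file and run it through the sketch.
$uri = tempnam(sys_get_temp_dir(), 'md');
file_put_contents($uri, base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'));

$store = array();
example_save_metadata($store, 1, example_collect_metadata($uri, $providers));

$file = new stdClass();
$file->fid = 1;
$file->uri = $uri;
$file->metadata = example_load_metadata($store, $file->fid);
unlink($uri);

print_r($file->metadata);
```

In a real module the registry would presumably be built from a hook or plugin and the store would be a database table; the sketch only shows how the read, store and attach steps could fit together.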

The main goal should be to provide a unified solution for consumers of file entities to efficiently retrieve meta-data when using file_load and file_load_multiple. To keep things simple, providing efficient querying over the meta-data should be the concern of separate, dedicated modules. For instance, one module could store selected ID3 audio and video meta-data as field values, while another could store selected EXIF meta-data in a dedicated table exposed to Views.

An alternative solution may be to lazy-load meta-data when requested, once we have a FileEntity class (needs #1361226: Make the file entity a classed object, which is blocked by #1401558: Remove the usage handling logic from file_delete()). This way we avoid the overhead of loading too much rarely used data on file loads.

The two approaches could also be combined: basic meta-data would be loaded on file load, while extended meta-data would be lazy-loaded. For instance, image dimensions and basic ID3 tags (title, artist, album, etc.) could be loaded on file load, while things like video/audio play time, bitrate, number of channels, codec, etc. could be lazy-loaded (in groups) when requested.
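A rough sketch of what the combined approach might look like, again in plain PHP and under stated assumptions: the `ExampleLazyFile` class, its `getExtendedMetadata()` method and the `audio_stream` group name are all hypothetical, and the values returned by the loader are placeholders rather than data read from a real file. Basic meta-data is set when the object is built; each extended group is fetched by its loader at most once, the first time it is requested.

```php
<?php
// Illustrative sketch only; the class, method and group names are
// hypothetical and do not come from File entity.
class ExampleLazyFile {
  public $uri;
  // Basic meta-data, populated eagerly when the file is loaded.
  public $metadata = array();
  // Loader callables for extended groups, keyed by group name.
  private $loaders = array();
  // Extended groups that have already been loaded.
  private $extended = array();

  public function __construct($uri, array $basic, array $loaders) {
    $this->uri = $uri;
    $this->metadata = $basic;
    $this->loaders = $loaders;
  }

  // Returns one extended group, running its loader only on first request.
  public function getExtendedMetadata($group) {
    if (!array_key_exists($group, $this->extended)) {
      if (!isset($this->loaders[$group])) {
        throw new InvalidArgumentException("Unknown metadata group: $group");
      }
      $this->extended[$group] = call_user_func($this->loaders[$group], $this->uri);
    }
    return $this->extended[$group];
  }
}

// Demo: count how often the (placeholder) loader actually runs.
$calls = 0;
$loaders = array(
  'audio_stream' => function ($uri) use (&$calls) {
    $calls++;
    // Placeholder values; a real loader would parse the file at $uri.
    return array('bitrate' => 128000, 'channels' => 2);
  },
);

$file = new ExampleLazyFile(
  'public://example.mp3',
  array('title' => 'Example title', 'artist' => 'Example artist'),
  $loaders
);

$calls_after_load = $calls;
$first = $file->getExtendedMetadata('audio_stream');
$second = $file->getExtendedMetadata('audio_stream');

echo "Loader runs after load: $calls_after_load, after two requests: $calls\n";
```

Grouping the extended values means one expensive parse of the file can fill several related properties at once, instead of one parse per property.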

API changes

A new hook or plugin should be added to allow modules to provide their own meta-data.

Comments

pbuyle’s picture

Issue summary: View changes

typo

pbuyle’s picture

Issue summary: View changes

Add lazy-loading idea.

Dave Reid’s picture

Issue tags: +sprint, +Media Initiative
Dave Reid’s picture

Assigned: Unassigned » Dave Reid

We are going to go ahead with a {file_metadata} table and convert the {image_dimensions} table to use the new format since it will be compatible with the new Entity Property API for D8. The only things that should be stored in this table are things that can be extracted from the raw file in the file system without any kind of Drupal context: things like image dimensions, audio track length, video size and length, etc.

aaron’s picture

This was discussed in very early meetings, as far back as 2008. It is a very good idea that should allow for all kinds of cool uses, such as integration with GetID3, storing YouTube information such as title and duration, and lots of other useful tidbits.

Dave Reid’s picture

Issue tags: +7.x-2.0 alpha blocker

I would really like to add this to the blocker list: it is something to push for in D8 before freeze, and since it adds an API, I want people to be able to start using it.

Devin Carlson’s picture

Dave Reid’s picture

Status: Active » Needs work
FileSize
10.94 KB

Initial patch adding a metadata API.

Dave Reid’s picture

FileSize
10.94 KB

Revised patch, still need to remove some references to image_dimensions.

Dave Reid’s picture

Status: Needs work » Needs review
FileSize
13.83 KB

One more version.

Status: Needs review » Needs work

The last submitted patch, 1496942-metadata-api.patch, failed testing.

Dave Reid’s picture

Status: Needs work » Needs review
FileSize
14.52 KB

Revised patch that fixes a major bug in the new function.

Status: Needs review » Needs work
Issue tags: -sprint, -Media Initiative, -7.x-2.0 alpha blocker

The last submitted patch, 1496942-metadata-api.patch, failed testing.

Devin Carlson’s picture

Status: Needs work » Needs review

#10: 1496942-metadata-api.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +sprint, +Media Initiative, +7.x-2.0 alpha blocker

The last submitted patch, 1496942-metadata-api.patch, failed testing.

aaron’s picture

aaron’s picture

Status: Needs work » Needs review
aaron’s picture

aaron’s picture

Status: Needs review » Needs work

The last submitted patch, 1496942-metadata-api-17.patch, failed testing.

Dave Reid’s picture

Category: feature » task
Devin Carlson’s picture

Status: Needs work » Needs review
FileSize
19.95 KB

A patch to address a number of small issues with #10 (mainly using db_merge instead of db_insert and accommodating changes in tests).

aaron’s picture

Status: Needs review » Reviewed & tested by the community

bravo!

Devin Carlson’s picture

Status: Reviewed & tested by the community » Fixed

Committed #20 to File entity 7.x-2.x. Thanks everyone!

Please file separate follow-up issues for any enhancements you require, places where the API could be implemented, problems you come across with the upgrade path, etc.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Add related issue references.