Files inserted by the WYSIWYG need to create a record in the Drupal {file_usage} table. Patch from #101 provides code that handles the insertion of images and properly accounts for node revisions as well as providing tests for various use cases raised in this issue.

How to test

To review this patch, please run the tests included with it with an eye for the conditions identified in #86, #89, #102, and #104

Notes

Further consideration may be needed for non-node entities.

Original description: file_usage_list() does not reflect media added through WYSIWYG. it does when added through file, image, or media element fields.

CommentFileSizeAuthor
#153 media-fix_missing_placeholder_regex-1268116-153.patch809 bytesPascalAnimateur
#151 media-fix_missing_placeholder_regex-1268116-151.patch809 bytesPascalAnimateur
#147 add-missing-test-1268116-147.patch9.58 KBDevin Carlson
#144 media-1268116.144.patch7.16 KBtobiasb
#136 1268116-file_usage_count_revisions_test-136.patch17.05 KBtreksler
#134 1268116-file_usage_count_revisions_test-134.patch17 KBtreksler
#132 1268116-file_usage_count_revisions_test-132.patch17 KBaaron
#130 1268116-file_usage_count_revisions_test-130.patch17 KBaaron
#127 1268116-file_usage_count_revisions_test-127.patch7.46 KBaaron
#119 1268116-file_usage_count_revisions_test-119.patch16.99 KBParisLiakos
#117 1268116-file_usage_count_revisions_test.diff17 KBarthurf
#116 1268116-file_usage_count_revisions_test.diff9.56 KBarthurf
#101 1268116-file_usage_count_revisions_test-100.patch15.18 KBbbinkovitz
#100 1268116-file_usage_count_revisions_test-100.patch15.18 KBbbinkovitz
#99 1268116-file_usage_count_revisions_test-99.patch15 KBbbinkovitz
#97 1268116-file_usage_count_revisions_test-97.patch7.49 KBbbinkovitz
#96 1268116-file_usage_count_revisions_test-95.patch15.65 KBbbinkovitz
#95 file_usage_count_revisions_4.diff6.99 KBarthurf
#93 file_usage_count_revisions_3.diff6.91 KBarthurf
#89 file_usage_count_revisions_1.diff6.9 KBarthurf
#85 1268116-file_usage_count_revisions_test-85.patch11.51 KBbbinkovitz
#82 1268116-file_usage_count_revisions_test-82.patch11.52 KBbbinkovitz
#80 6912728-file_usage_count_revisions_test-80.patch5.38 KBbbinkovitz
#74 file_usage_count_revisions.diff5.88 KBarthurf
#71 file_usage_count_revisions.diff5.87 KBarthurf
#70 file_usage_count.diff4.89 KBarthurf
#52 media-filter-file-usage-1268116-52.patch4.61 KBbbinkovitz
#44 media-filter-file-usage-1268116-44.patch4.55 KBDavid_Rothstein
#44 interdiff-40-44.txt1.75 KBDavid_Rothstein
#41 Screen Shot 2012-08-24 at 10.36.45.png17.65 KBmrfelton
#40 media-filter-file-usage-1268116-40.patch4.5 KBJackinloadup
#37 media-filter-file-usage-1268116-37.patch4.15 KBaaron
#36 media-filter-file-usage-1268116-36.patch4.21 KBDavid_Rothstein
#34 media-filter-file_usage-1268116-34.patch3.42 KBtheunraveler
#32 media-filter-file_usage-1268116-32.patch3.47 KBtheunraveler
#29 media-filter-file_usage-1268116-29.patch3.21 KBtheunraveler
#28 media-filter-file_usage-1268116-28.patch3.21 KBtheunraveler
#25 media-filter-file_usage-1268116-25.patch3.21 KBtheunraveler
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

aaron’s picture

also see #1222576: file_usage_list() is defective (might return incomplete results), which will make this information even more crucial (especially as a views filter).

Dave Reid’s picture

It's not just about file_usage_add() (which would probably require our filter system to provide context information - #226963: Context-aware text filters (provide more meta information to the filter system)), but ensuring we can also call file_usage_delete() when appropriate which I'm not sure we'd be able to do.

aaron’s picture

this likely depends on #226963: Context-aware text filters (provide more meta information to the filter system), although Dave Reid pointed out on IRC we might be able to discover the context with debug_backtrace(), if we wanted to put it through the wringer... :P

Dave Reid’s picture

Assigned: Unassigned » aaron
Issue tags: +Media Sprint 2011
Dave Reid’s picture

Issue tags: +file usage
tsvenson’s picture

Combined with Drupal's "annoying" behavior of automatically, without warning, deleting files/things when it thinks its not used anymore this is a big problem.

I started a discussion about this on g.d.o yesterday: http://groups.drupal.org/node/191183 and also added a great use case of how bad it can get today.

To sort this out is tricky though. But letting Drupal decide what should be deleted or not is definitely not the right way of doing it!

Dave Reid’s picture

Priority: Normal » Major

Bumping priority

szantog’s picture

When I started my work three months ago with media, I saw this problem, I started a sandbox project designed by tracking an "advanced files usage" of media module. Since then the media module changed a lot, but this was my idea to solve this:

1. Using hook_node_insert to add file_usage record to database.
2. Using hook_node_update to compare embed media files in old and new version of node, and update file_usage if necessary.
3. In hook_node_delete remove the file_usage record.

To do this, we doesn't need context aware filters.

das-peter’s picture

If this goes in, media would also be the right place for this patch I've created: #1193036-13: Firefox handling of Drag & Drop images: inserted as binary blobs in WYSIWYG editors.
Goal is to enable drag & drop / copy & paste in the WYSIWYG editor.
I've talked to sun and as far as I understand him the main issue he sees with the patch is that the file usage wouldn't be tracked.

tsvenson’s picture

@szantog: Like your idea, was actually discussing the exact same approach with one of our developers last Friday.

Two questions:

  • What happens if an embedded media file exists in an earlier revision of a node, but has been deleted from the currently published node. How is that counted?
  • What if a module such as Workbench Moderation is used. Then you can both have a revision that is live (published) while at the same time work on a new revision (draft). If I then delete for an image in the new draft, what will then happen with the one that should still be shown in the currently published revision?
szantog’s picture

@tsvenson, Yes, on the revision question was that, where my work paused - I completely misunderstood the file_usage system. Now, I think it should be work exactly the same, like a filefield.

I saw, where the file_usage_add funtion is used: The http://api.drupal.org/api/drupal/modules--file--file.field.inc/function/... and http://api.drupal.org/api/drupal/modules--file--file.field.inc/function/...

Based on this, if a new node is created, lets start play with file usage:
1. We use two files, with fid 1 and 2, in node 1.
The file usage table is this:

Fid  Module Type Id Count
 1    media node  1     1
 2    media node  1     1

2. Update the node and create new revision, all of two files stay in use.

Fid  Module Type Id Count
 1    media node  1     2
 2    media node  1     2

3. Update the node and create new revision, delete the FID 1, and add fid 3.

Fid  Module Type Id Count
 1    media node  1     2
 2    media node  1     3
 3    media node  1     1

4. Switch back to an earlier revision (a new revision is created!):

Fid  Module Type Id Count
 1    media node  1     3
 2    media node  1     4
 3    media node  1     2

5. Delete the last revision in 3:

Fid  Module Type Id Count
 1    media node  1     2
 2    media node  1     3
 3    media node  1     1

6. Delete the last revision in 4:

Fid  Module Type Id Count
 1    media node  1     1
 2    media node  1     2

I don't know, what about workbanch moderation, but I think, if this behavior (what is the same, as a normal filefield) isn't good for some other module, this module should fix it. As I think there shouldn't be problem with workbanch moderation. What revision is live - it's irrelevant for file_usage table, only need decrease or increase the counter, when revision is added/deleted.

If the line of idea is good, I can work with this on the next week.

aaron’s picture

This is a complete hack, but here it is to get the ball rolling.


/**
 * Implements hook_filter_info().
 */
function altvideo_filter_info() {
  $filters['altvideo_filter'] = array(
    'title' => t('Record WYSIWYG usage of media'),
    'description' => t('This filter will record any file usage in WYSIWYG, in the context of a node. Must be before the %convert filter. Provided by the custom altvideo module.', array('%convert' => t('Converts Media tags to Markup'))),
    'process callback' => 'altvideo_filter',
    'tips callback' => 'media_filter_tips',
  );

  return $filters;
}

/**
 * Discover & record any embedded media in the {file_usage} table.
 */
function altvideo_filter($original) {
  $nid = &drupal_static('altvideo_nid');

  // Get the stack of function calls.
  $backtrace = debug_backtrace();
  // We are looking for the node_load function call to get the NID
  // that is being processed.
  foreach ($backtrace as $function_call) {
    // Log nodes
    if ($function_call['function'] == 'node_load') {
      if ($nid != $function_call['args'][0]) {
        // Extract the nid.
        $nid = $function_call['args'][0];
        // Delete any prior file usage from this embed.
        db_delete('file_usage')
          ->condition('module', 'altvideo')
          ->condition('id', $nid)
          ->execute();
      }
      $text = ' ' . $original . ' ';
      preg_replace_callback("/\[\[.*? ]]/s", 'altvideo_token_to_markup', $text); // Remove the space after the ?
    }
  }
  return $original;
}

/**
 * Replace callback to convert tag into markup
 * @param string $match
 * Takes a match of tag code
 * @return
 * Return the replaced markup
 */
function altvideo_token_to_markup($match) {
  $match = str_replace("[[", "", $match);
  $match = str_replace("]]", "", $match);
  $tag = $match[0];

  try {
    if (!is_string($tag)) {
      throw new Exception('Unable to find matching tag');
    }

    $tag_info = drupal_json_decode($tag);

    if (!isset($tag_info['fid'])) {
      throw new Exception('No file Id');
    }
    if (!isset($tag_info['view_mode'])) {
      // Should we log or throw an exception here instead?
      // Do we need to validate the view mode for fields API?
      $tag_info['view_mode'] = media_variable_get('wysiwyg_default_view_mode');
    }
    $file = file_load($tag_info['fid']);
    if (!$file) {
      throw new Exception('Could not load media object');
    }

    // Track the fid of this file in the {file_usage} table.
    altvideo_track_usage($file->fid);
  }
  catch (Exception $e) {
  }
}

function altvideo_track_usage($fid) {
  $nid = &drupal_static('altvideo_nid');
  $file = file_load($fid);
  file_usage_add($file, 'altvideo', 'node', $nid);
}
blainelang’s picture

This approach looks real clever specially using backtrace as I had not ever considered using backtrace this way. I've recently been looking into how the file_usage table is used for another module 'filedepot' that I'm porting over to D7 and wanted to note there is a file_usage_delete() and it could be used instead of the direct SQL to delete the usage record in the above code.

sun’s picture

Something is a bit off here...

If media module allows to upload files via an editor, then the resulting files are still part of media, and thus, part of your site's "media library."

Hence, since a file in a library inherently defines usage on its own, media module should declare a usage.

Dave Reid’s picture

No. Wrong.

You have a node reference field. You create nodes meant to be referenced. Those nodes are data and do not yet have any "usage". Reference some of those nodes. Save. De-reference some of those nodes. Save.

Do those nodes still exist? YES.

Repeat with any other entity reference type EXCEPT FILES. We shouldn't have to track usage for something that is uploaded into the "Media library" (aka just file_managed) that is not yet associated with any content.

sun’s picture

You nicely repeated what I stated in #1399846-3: Make unused file 'cleanup' configurable already:

The file usage API was deliberately added to D7. It is required to figure out when a managed file is obsolete.

This means that Media module wants Drupal to manage files without letting it know how or where the managed file is used. Thus, Media module should either not use managed files for its purpose, or it should properly register their usage.

If you treat "files like nodes", then you assume that there's a defined entity in the system. If the file itself is the entity, then the entity itself declares a file usage.

In that case, the file entity might have further data (or even fields) attached to it. It cannot be deleted if it's still in use.

The file usage API is essentially what makes up a managed file. If you don't want that "managed" behavior, then don't use managed files.

But it sounds pretty clear to me that you actually do want to have that managed file behavior (because you explicitly want to treat files as entities, and thus, attach further data to them, and also retain them, even in the case when they are deleted).

However, both use-cases formally are a file usage on their own. The file is in use when any additional data is attached to it, and the file is also used permanently if it should be retained forever (as in a "media library" concept).

Thus, the root cause is that no usage is registered for a file. Register it.

It's not the job of a low-level File API to care for the diversity of possible use-cases. All it knows about is: "This is a file I'm supposed to manage. Is it still used anywhere? No? Alright, then it's garbage and can be deleted."

rickvug’s picture

I think Sun may has a point here about Media needing to register files. The default delete functionality that we have now is exactly what is desired for use cases such as profile photos, where the upload should be deleted if the user changes their photo or cancels their account. These images probably shouldn't be shown in the Media library either. Perhaps what Media needs is configuration that defines which files it should track (marking as used by media). Files that are not tracked by Media shouldn't be shown in the browser and should be deleted when removed from a file field, as per now.

On the same token, WYSIWYG file usage should also be tracked. This would prevent problems where someone wants to remove a file from the media library and accidentally breaks node(s) where the file is embedded into the body.

How does this sound?

agentrickard’s picture

By sun's logic it sounds as if file_entity should be claiming ownership of files-as-entities, which would then get folded in to Drupal 8.

I think, however, that discussion is not related to this actual issue, which is that a file added to a node via WYSIWYG is, in fact, managed by that node and needs to be tracked as such.

aaron’s picture

Yes, I agree with that, agentrickard. sun's argument has its merits, but belongs more properly at #1239056: Remove Media at content edit page cause the file to be completely deleted from server/database?. It has no bearing on whether to mark a usage for an embedded media instance.

Codecaster-2’s picture

Is this the feature I need if I want to know where was the image posted originally, so I can add a link in the image gallery that says something like "image originally posted at X"?

Is there any other way to link the images to the nodes they were attached to in the first place?

Thanks.

theunraveler’s picture

Until filters are context-aware, is this something that could be done on hook_entity_{insert/update/delete}? If this seems like a sound approach, I can write a patch.

agentrickard’s picture

I think hook_field_attach_insert / update / delete, actually, which give you more detailed data about the field.

See http://api.drupal.org/api/drupal/modules!field!field.api.php/function/ho...

Dave Reid’s picture

Yes this is actually possible to do with hook_field_attach_insert/update. I have some code that I would like to eventually upload to this issue but I'm currently unable to do so because of client code release policy. If anyone wants to work on it until I can resolve it they should be more than welcome to do so, but I also don't want people to waste time on something that I've already kinda solved. :/

das-peter’s picture

As soon as this is fixed I'd like to port this patch to the media module: #1193036-13: Firefox handling of Drag & Drop images: inserted as binary blobs in WYSIWYG editors.
At least if there are no objections.

theunraveler’s picture

OK, here's an initial pass at it. A couple notes:

1. I am likely doing the field stuff wrong, particularly lines 57-68 in the patch. Any suggestions?
2. I'm a little concerned about the "media" namespace in the file_usage table. It seems like we could inadvertently wipe out other file_usage entries from elsewhere in the media module. Should this be something like "media_filter"?

theunraveler’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, media-filter-file_usage-1268116-25.patch, failed testing.

theunraveler’s picture

Status: Needs work » Needs review
FileSize
3.21 KB

Sorry, this patch should pass.

theunraveler’s picture

Whoops, blew away the updated regex.

aaron’s picture

Status: Needs review » Needs work

I like that approach. Unfortunately, it looks as though it will not remove file usage when an entity embedding the media has been deleted.

aaron’s picture

maybe that is a simple matter of implementing hook_field_attach_delete().

theunraveler’s picture

Seems like you could just implement hook_entity_delete() to remove all entries in the file_usage table. Does this seem good?

theunraveler’s picture

Status: Needs work » Needs review
theunraveler’s picture

Cleaned up the field stuff a bit. I'm sure it's still wrong.

Dave Reid’s picture

David_Rothstein’s picture

Rerolled with a few small cleanups. (Mainly what I did is break _media_filter_parse_from_fields() up into smaller helper functions, since I need to reuse this code in another patch.)

I think the field code looks pretty reasonable. I'm not sure if using field_get_items() is technically correct here (since it will use the current language that the field would happen to be displayed in, which can change) but in practice it might not matter much since the media tags are likely to be the same across different translations of the text.

aaron’s picture

Jackinloadup’s picture

Tested patch in #37. Works as expected. Great work!

When adding the same file to the same node twice file usage reflects that.

I'm using the interface provided by #1286508-12: Provide a file usage report tab at file/%file/usage

Jackinloadup’s picture

This issue is affected by #1451316: Clean up wysiwyg-media.js

After applying images still work great but due to changes "applications" and I would assume other non image types will most likely not be tracked. Heads up to all you great peoples.

Jackinloadup’s picture

The attached patch works with #1451316: Clean up wysiwyg-media.js and corrects the issue I brought up in #39.
All new code in this patch is related to MEDIA_TOKEN_REGEX_ALT

Please review.

mrfelton’s picture

Status: Needs review » Needs work
FileSize
17.65 KB

Slight issue with this. I created a node and attached an image to an image field. The usage count at /file/1/usage (provided by #1286508: Provide a file usage report tab at file/%file/usage) showed the file was used once on my node. Then, I edited the same node and added the same file to the body through the WYSIWYG. I then had two entries for the same file at /file/1/usage both showing a count of 1. I then edited the same node again and added the same file yet again through the WYSIWYG. This time one of the file usage entries incremented by one.

So the end result is like in this screenshot.

Screen Shot 2012-08-24 at 10.36.45.png

I would have thought that it show just one entry with a count of 3. Essentially it appears to be counting usage of the file through the WYSIWYG separate from usage of the same file attached to a field on the same node.

Dave Reid’s picture

@mrfelton: I would actually say that it's working as designed. We aren't exposing the 'Type' information properly. It should be returning 'Node' and 'Media' for the two different rows' first column. Or #1286508: Provide a file usage report tab at file/%file/usage needs to be fixed to group all of the same entity type + ID together.

mrfelton’s picture

mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
|   1 | file   | node |  1 |     1 |
|   1 | media  | node |  1 |     2 |
+-----+--------+------+----+-------+

So I have 'file' and 'media' for the one added to the image field (using the media widget), and 'media' and 'node' for the two added to the node body through wysiwyg.

David_Rothstein’s picture

Quick and trivial reroll of #40 to get rid of some annoying PHP notices caused by this patch (due to the fact that field_get_items() can sometimes return FALSE rather than an empty array).

ParisLiakos’s picture

I am not sure, where this got stuck?
@mrfelton is dave's answer enough to get this back to NR?

HyperGlide’s picture

@mrfelton and @David_Rothstein any thoughts per #45.

bbinkovitz’s picture

media-filter-file-usage-1238116-44.patch works for me.

mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid      | module    | type    | id     | count     |
+-----+--------+------+----+-------+
|   1      | media      | node   |  3     |     1       |
|   2      | media      | node   |  4     |     1       |
+-----+--------+------+----+-------+

2 rows in set (0.00 sec)

bbinkovitz’s picture

Status: Needs work » Needs review
arthurf’s picture

I tested the following with success with the patch on #44
* adding via wysiwyg to the body field
* adding multiple to the wysiwyg field
* node in publish and unpublished state
* deleting the node

What other considerations do we need to take into account?

agentrickard’s picture

Not storing false positives on the regex?

tsvenson’s picture

@arthurf: Have you tested it with revisions?

AFAIK a file that is deleted from a file/image field (core) in later revisions is still being counted. So, we need to comply with how that works too.

bbinkovitz’s picture

Removing the file markup from the node without deleting the node removes the file from the usage table. This is good.

Only FIDs that correspond to existing files are added to file_usage. Also good.

Changing "type" from "media" to something else (I used "dave", "image", and "panda" as tests) does not prevent inclusion in the usage table. This seems bad? Here's a version of the patch that only includes if type="media".

arthurf’s picture

@tsvenson ok, so from what I understand to deal with revisions I think we can't delete the usage here. The delete query that's done on each update should be removed from the patch. However we're going to need to query the db to see if this fid is already counted. We also need to be aware of how many times this fid is on this node to ensure that we can increment correctly. So probably we need to do something like:

$file = file_load($file_reference['fid']);
$uses= file_usage_list($file);
$count = 0;
if (! empty($usage['media'][$entity_type][$entity_id])) {
  $count = $uses['media'][$entity_type][$entity_id];
}
// Is the number of uses on this entity more than what is currently stored?
// If so the usage has to be incremented to the current count.
if ($count < $usage[$file_reference['fid']) {
  file_usage_add($file, 'media', $entity_type, $entity_id, $usage[$file_reference['fid'] - $count );
}
arthurf’s picture

@agentrickard I think bbinkovitz's patch on #52 should address the regex concern because it's checking the type of the insert code based on the string- it isn't improving the regex but it is using the data itself.

tsvenson’s picture

@arthurf: Yeah, its a tricky one this how it should be counted. If the current revision is the only revision, then the count should go down if a file is removed. But if there exists earlier revisions, they need to be parsed as well to check if a file was used in any of them.

I suppose the easies is to simply parse all revisions of an entity and build a file use table from that.

How is used files kept track on? Is it capable of knowing if a file is used in both a file field and the text are for example?

On tsvenson.com I have used the model to upload images to a file field and then embedded them in the text area. Thus, those files are used "twice" on the same node/revision.

arthurf’s picture

@tevenson: from what I understand of revisions we have to keep track of the usage on revisions no matter what. So what I'm thinking is that we simply never decrement the usage of a file because of this. However, as you note, there maybe instances where there is only one revision in which case we should.

So I think then the code has to check to see if there are revisions of this particular field (is this simply version id == entity id?) and in this case we can decrement the file usage, otherwise we only increase the file usage count if the current usage is greater than what is returned by file_usage_list() which is what the proto code I posted above is intended to do.

Does that seem to address both cases?

Also, per your comment about file upload with wysiwyg, the usage table tracks these separately based on the module that is using the file, so from my perspective that is correct. Are you thinking that the usage should only be one in this case?

agentrickard’s picture

Introducing parsing of all revisions is, IMO, waaaaaaay out of scope for this patch.

If you need that behavior, use File Lock.

arthurf’s picture

@agentrickard - is it sufficient to simply not decrement file usage based on the current entity save operation? If I understand things correctly this should keep the file usage for previous revisions even if the file is removed from the field for this current revision. The only time we would need to increment the file usage is if the current file usage for file (ie: added more times to this field) is greater than the current file usage count. I think this will protect revisions which use this file while handling a file added more than once to a field. I think.

agentrickard’s picture

Given that the {file_usage} table is used for garbage collection, I would argue that any non-binary value in that table for an nid is meaningless.

arthurf’s picture

@agentrickard- so are you in agreement that we should not decrement file_usage unless the current revision is the only revision of this entity? I'm a bit unclear on how you're thinking we should proceed.

tsvenson’s picture

@agentrickard: The problem with getting this right is that the file usage table and particularly the "garbage collection" was implemented without proper analysis of the consequences. In my view it has caused a lot more problems than it solved.

A massive amount of valuable resources has now been spent on compensating for that behaviour and coming up with methods, such as this one, to try and make sure that Drupal doesn't decide that a file is not relevant for the site any longer. Promptly delete it without any warning whatsoever as a result.

Based on that, I don't actually agree that parsing older revisions is out of the scope here. Simply because the goal is to implement a correct counting of file usage in text area fields. The only way, as I see it with my limited knowledge, to do that, and comply with how the file/image fields work, is to parse the various versions of the content that exists.

arthurf’s picture

From what I can tell the patch in #52 will correctly increment the current usage of files on this entity. The question we need to figure out is how and how much to deal with revisions.

I am proposing that we do not need to worry about revisions so long as we capture the initial usage. Once a revision exists, the file's usage is captured. I think we can safely never decrease the file usage quantity for this entity, we only have to increase it if this file is added a subsequent time (ie: two copies in the text) to this entity.

There maybe an exception to this if the current entity revision is the only revision- in this case we might have to decrement the file usage if the file is removed. Somebody with good knowledge about versioning should chime in here.

The only way to do this properly that I see is to add a version id to the file_usage table. I don't think that is going to happen and more importantly it would also require significant amount of work up stream. I think for now we're better off writing a wrapper around file_usage_add() which takes a version id argument. This might be appropriate for file entity- no idea, just throwing the idea out there.

@agentrickard I take your point that the usage can be thought of as binary as the file is either in use on this particular entity or it isn't. While I'm not sure if that's the right approach, I don't think your point would alter what I'm proposing above significantly- the code would still be basically the same.

I think we're actually pretty close on this- the code is mostly here in this thread, it's really just an issue of figuring out if we have correctly captured the use cases.

John Pitcairn’s picture

What happens when:

1 - A file is added to an existing item which already has revisions.
2 - The file is removed in a later revision.
3 - Earlier revisions containing the file are manually or automatically deleted, but there is still more than one revision.

If you do not worry about revision-tracking, recorded usage will be incorrect, since the file is no longer used anywhere.

I can see this being a followup issue however.

arthurf’s picture

@John Pitcairn I think you might have misunderstood me. I'm suggesting that we do not need to parse through versions of entities so long as we do not decrement the file usage count. There is one possible exception to this if the current revision is the only revision.

As per your third point, we can't address that without tracking vids in the file usage table and there would be a significant amount of code that would need to be written to support this for other use cases as well.

agentrickard’s picture

For the use case in #63, just use File Lock. That's what the module is for.

These discussions (#61 and #63) are out of scope of WYSIWYG tracking. Revision tracking in {file_usage} is a separate issue.

arthurf’s picture

@agentrickard - I disagree. For one we don't need a second module to handle this. Secondly the proposed behavior is what is expected of modules that implement file_usage- if there is agreement the proposed functionality correctly handles revisions. Unless this is not the case I don't think yet another module is the answer.

agentrickard’s picture

Increment / decrement if you must, but do so on the current revision. Don't try to parse the entire version history. That way lies madness.

arthurf’s picture

@agentrickard #67 does not address the solution that I've proposed.

agentrickard’s picture

Ideally, we would be incrementing/decrementing the file usage based on the revision being saved. If removing a file turns out to be the only 'version' using the file, and file_usage drops to 0, that sounds like proper behavior.

But we would only be parsing the current change, not the entire revision history.

arthurf’s picture

FileSize
4.89 KB

The following I think does what I've proposed in the thread above. This patch includes #42, #44, #52 from above. I've tested under the following conditions:
* content type revisions off: image add, image delete, image add multiple, image delete multiple.
* content type revisions on: image add, image add multiple, image delete, image delete multiple.

In each case I'm seeing what I think should be the correct behavior.

arthurf’s picture

Here is a revised version of the above patch which I believe should handle revision deletion correctly. Here's why I'm think we maybe able to do this safely:

* when a revision is created we count all the files in the field and increment the file_usage value
* the value in file_usage reflects the total number of file uses across all of this entity's revisions
* when a revision is deleted we can count the number of file uses in that revision the same way we do for saving them
* we can delete this file count from the file_usage table

I think this technique will actually reflect the proper number of files in use and allow for deletion of versions without leaving cruft in the file_usage table.

I've tested this a bit myself but it would be good to hear others thoughts on this.

tsvenson’s picture

After a short IRC discussion with @arthurf the way this patch works as outlined in #71 seem to cover all needs. Great work all.

I haven't had the ability to actually test the patch so can't set it to RTBC (yet).

agentrickard’s picture

Should this last bit be an IF, to ensure we pass a proper $file object:

+function media_field_attach_delete_revision($entity_type, $entity) {
+  list($entity_id) = entity_extract_ids($type, $entity);
+  $files = media_entity_field_count_files($entity_type, $entity);
+  foreach ($files as $fid => $count) {
+    $file = file_load($fid);
+    file_usage_delete($file, 'media', $entity_type , $entity_id, $count);
+  }
+}

To:

+function media_field_attach_delete_revision($entity_type, $entity) {
+  list($entity_id) = entity_extract_ids($type, $entity);
+  $files = media_entity_field_count_files($entity_type, $entity);
+  foreach ($files as $fid => $count) {
+    if ($file = file_load($fid)) {
+      file_usage_delete($file, 'media', $entity_type , $entity_id, $count);
+    }
+  }
+}
arthurf’s picture

Good point, re-rolled.

bbinkovitz’s picture

So, am I understanding correctly what this latest patch does? I am trying to write some tests for this.

* Upon save, if a new revision is not being created and a file is being removed from the body of the node, the usage table will decrement immediately.
* Upon save, if a new revision is being created and a file is being removed from the body of the node, the usage table will not decrement.

Is that correct?

arthurf’s picture

@bbinkovitz both of those statements are correct.

HyperGlide’s picture

@arthurf do you feel this patch is ready for testing on a wider scale? I

Great work so far!

arthurf’s picture

@HyperGlide - yes, it would be great to have more testing under different circumstances. I tested what were the obvious cases to me but the more eyes the better.

bbinkovitz’s picture

@arthurf, Is it your intention for the following behavior to occur?

  1. Create node with two usages of a file. File usage increments by two.
  2. Edit node, removing one usage and creating new revision. File usage increments by one.

In other words, the usages of a given file on a given node = the sum total of all usages on all revisions.

The incrementing in step 2 was surprising to me, although I'm not sure what behavior I expected. I just wanted to make sure I understood your intention correctly.

bbinkovitz’s picture

I've attached patch containing a test that ascertains four things:

  1. That file usage is incremented for each occurrence of the file in the markup
  2. That the file usage is decremented for each occurence of the file removed from the markup within a revision
  3. That the file usage is not decremented when all usages are removed in a new revision
  4. That the file usage is incremented by n when a new revision is created with n occurrences of the file in the markup.

Status: Needs review » Needs work

The last submitted patch, 6912728-file_usage_count_revisions_test-80.patch, failed testing.

bbinkovitz’s picture

D'oh. Re-rolled. Includes arthurf's patch and the tests for it. And has the right issue number now too.

bbinkovitz’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 1268116-file_usage_count_revisions_test-82.patch, failed testing.

bbinkovitz’s picture

Status: Needs work » Needs review
FileSize
11.51 KB

Third time's the charm.

arthurf’s picture

@bbinkovitz- I think this is a good start. There are some code style issues that need to be addressed and some additional tests particularly for revisions I'd like to see:

  • new node with one file inserted, vid == 1, file usage == 1
  • resave same node, vid == 2, file usage == 2
  • resave node with additional file in body, vid == 3, file usage == 4
  • resave node with no file in body, vid == 4, file usage == 4
  • resave node with one instance of the file in body, vid == 5, file usage == 5
  • delete revision 2, file usage == 4
  • delete revision 5, file usage == 4
  • delete revision 3, file usage == 2
  • delete node, file usage == 0
bbinkovitz’s picture

It appears that deleting the node doesn't set usages to zero.

arthurf’s picture

@bbinkovitz - I just tested the patch from #74 with nodes with and without revisions and every case that I've tested with a node delete it correctly deletes the entries from file_usage. Is your comment specific to the tests you've written?

arthurf’s picture

bbinkovitz identified a case where the patch on #74 fails- when a node which is revisioned is saved not as a new revision the usage count was still incremented. In this case the usage has to be adjusted based on the previous revision compared to the current.

Status: Needs review » Needs work
Issue tags: -sprint, -file usage, -Media Initiative

The last submitted patch, file_usage_count_revisions_1.diff, failed testing.

arthurf’s picture

Status: Needs work » Needs review

#89: file_usage_count_revisions_1.diff queued for re-testing.

Status: Needs review » Needs work
Issue tags: +sprint, +file usage, +Media Initiative

The last submitted patch, file_usage_count_revisions_1.diff, failed testing.

arthurf’s picture

Status: Needs work » Needs review
FileSize
6.91 KB

Rerolled patch

Status: Needs review » Needs work

The last submitted patch, file_usage_count_revisions_3.diff, failed testing.

arthurf’s picture

Status: Needs work » Needs review
FileSize
6.99 KB

One more go

bbinkovitz’s picture

Argh. Ok, well just because I already spent a bunch of time on this, here's my (probably way less efficient) solution to this problem.

Also, arthurf, your patch works great. However, when I use it I get errors such as:

Notice: Undefined offset: 3 in _media_filter_add_file_usage_from_fields() (line 77 of /sites/all/modules/media/includes/media.filter.inc)

bbinkovitz’s picture

Here's arthurf's latest patch, rerolled with the same tests that I wrote for mine. I also corrected what I think may have been a small typo on line 202 that looked like it might have been throwing some of the errors.

ParisLiakos’s picture

seems test file is missing from last patch

bbinkovitz’s picture

So it is. Here's another go at actually including the tests.

bbinkovitz’s picture

New version of the previous patch to fix an undefined index error upon decrementing file usages within the same revision number.

bbinkovitz’s picture

Somehow the last comment posted twice. Sorry about that.

wizonesolutions’s picture

Status: Needs review » Needs work

Did not work for my test case:

1. Edited a node.
2. Inserted an existing PDF (Document file type) with fid 285 into the node. Prior to insertion, file_usage showed its count at 1.
3. Saved.
4. Checked file_usage again. Count was still 1.

Generic file formatter. The filter needs to look at the data attribute apparently because this formatter doesn't use the Media-style filter tags.

bbinkovitz’s picture

I'm having trouble even testing this because I'm running into #1743040: "You have not selected anything!" error message, when a selection has clearly been made. It seems problematic to try to fix this because then my patch would rely on another patch. How to proceed?

arthurf’s picture

Status: Needs work » Needs review

@wizonesolutions

I've tested this and have not been able to confirm your experience. Here's what I did:

I created a new node and inserted a pdf into the body area with the wyswig. When I disable the the wysiwyg I see the following markup:

<p><span class="file media-element file-default" data-file_info="%7B%22fid%22:%224%22,%22view_mode%22:%22default%22,%22type%22:%22media%22%7D"><img alt="" class="file-icon" src="/24b6.net/www/modules/file/icons/application-pdf.png" title="application/pdf" /> <a href="http://localhost/24b6.net/www/sites/default/files/test_0.pdf" type="application/pdf; length=77346">test.pdf</a></span></p>

When saved, file usage reports expected results:

mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
|   4 | media  | node |  3 |     1 |
+-----+--------+------+----+-------+

I edit the node again, copying the markup above so that there are two instances of the same PDF markup in this body area. Code looks like:

<p><span class="file media-element file-default" data-file_info="%7B%22fid%22:%224%22,%22view_mode%22:%22default%22,%22type%22:%22media%22%7D"><img alt="" class="file-icon" src="/24b6.net/www/modules/file/icons/application-pdf.png" title="application/pdf" /> <a href="http://localhost/24b6.net/www/sites/default/files/test_0.pdf" type="application/pdf; length=77346">test.pdf</a></span></p>

<p><span class="file media-element file-default" data-file_info="%7B%22fid%22:%224%22,%22view_mode%22:%22default%22,%22type%22:%22media%22%7D"><img alt="" class="file-icon" src="/24b6.net/www/modules/file/icons/application-pdf.png" title="application/pdf" /> <a href="http://localhost/24b6.net/www/sites/default/files/test_0.pdf" type="application/pdf; length=77346">test.pdf</a></span></p>

When I save the node, I see:

mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
|   4 | media  | node |  3 |     2 |
+-----+--------+------+----+-------+

This is to be expected because I did not create a new revision of the node. Now I edit the node again and do not change anything other than to check the "create a new revision" box. I see:

mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
|   4 | media  | node |  3 |     4 |
+-----+--------+------+----+-------+

These are the expected results.

@wizonesolutions it seems like either the patch didn't apply correctly or your markup is somehow wrong. Can you confirm that the markup that you're using looks something like the above? Also what view mode are you choosing to insert the file- in this example I'm using the default.

ParisLiakos’s picture

Assigned: aaron » Unassigned
Issue tags: -file usage +Needs issue summary update

100+ comments..we need an issue summary here

arthurf’s picture

Summary:

Files inserted by the WYSIWYG need to create a record in the Drupal {file_usage} table. Patch from #101 provides code that handles the insertion of images and properly accounts for node revisions as well as providing tests for various use cases raised in this issue.

To review this patch, please run the tests included with it with an eye for the conditions identified in #86, #89, #102, and #104

Further consideration may be needed for non-node entities.

wizonesolutions’s picture

Issue summary: View changes

Update summary.

wizonesolutions’s picture

I put this into the issue summary.

arthurf: I will look at your notes and respond in a bit.

bbinkovitz’s picture

I am not able to reproduce the problem either.

I am curious about this: "Generic file formatter. The filter needs to look at the data attribute apparently because this formatter doesn't use the Media-style filter tags."

The generic file formatter is a field display, not a view mode. If I'm understanding this issue correctly, field display settings are applied separately, and shouldn't affect file usage tracking. Am I misunderstanding something here? Picking a file formatter directly instead of a view mode is not standard behavior and is not supported.

Dave Reid’s picture

I'm curious as to why _media_filter_add_file_usage_from_fields() is so complex and why we can't do something more like the following:

list($entity_id) = entity_extract_ids($entity_type, $entity);

// Clear out usage for this entity.
db_delete('file_usage')
  ->condition('module', 'media')
  ->condition('type', $entity_type)
  ->condition('id', $entity_id)
  ->execute();

$file_counts = media_entity_field_count_files($entity_type, $entity);
$files = file_load_multiple(array_keys($file_counts)); // Load the files in one operation instead of separately with file_load()
foreach ($file_counts as $fid => $file_count) {
  if (!empty($files[$fid])) {
    file_usage_add($files[$fid], 'media', $entity_type, $entity_id, $file_count);
  }
}

If we can at all avoid having file_usage_delete() that would be good since it would prevent files from getting deleted unnecessarily.

arthurf’s picture

@Dave Reid - The reason why I changed the code in the original patch is to support revision handling, I might be missing something but it seems to me that if you delete from file usage you can't keep track of the total number of files that are used across all revisions. That was my thinking anyway.

Dave Reid’s picture

Then how can we account for existing nodes with lots of revisions that have embedded files before the user updates to a version with this patch?

arthurf’s picture

@Dave Reid - I tried to explain my thinking in #71 . Though that code is wrong, the fixes and tests further down the thread seem to confirm that this does work.

bbinkovitz’s picture

@Dave Reid depending on how many revisions exist, this could be pretty heavy. You'd be parsing the entire revision history on every save, which means essentially loading the full node as many times as there are revisions.

It also only works for nodes, until other entities start supporting revisions, so everything else would have to continue to be parsed the same way.

A better option would be a button that can be accessed from Media's administration page or some such, that, when pressed, searches through all nodes whose last update was within a selected date range, and parses through their revision histories, rewriting their file_usage table rows as needed. That way the user will understand the performance toll involved and can schedule that maintenance at their convenience. It also makes sense to bulk update everything if we're going to be updating anything.

wizonesolutions’s picture

There are already other entities with revision support - Field Collection Item (the actual entity that gets created and associated with a Field Collection field), for example. Entity API supports revisions, so it should be assumed that any entity might support revisions. There are functions to check for this IIRC.

arthurf’s picture

The method for finding media data in fields is handled in media_filter_parse_from_fields() and _media_filter_fields_with_text_filtering(). _media_filter_fields_with_text_filtering will get all the fields from an entity and parse each field that has text-filtering enabled. These functions do not have any node specific functionality. The major concern is the conditional _media_filter_add_file_usage_from_fields() which attempts to figure out if the entity being saved is a new revision or not.

if (empty($entity->revision) && empty($entity->old_vid) && empty($entity->is_new) && ! empty($entity->original)) {

My hunch is that the patch makes some node specific assumptions about the data being passed in .

@wizonesolutions can you test this patch with a field collection?

arthurf’s picture

This updates the patch from #101. I wanted to more explicitly test for file deletion against file usage. I've done some abstraction of the tests and some code style cleanup.

arthurf’s picture

Sorry, last diff didn't include the other changes, reroll.

aaron’s picture

Status: Needs review » Reviewed & tested by the community

I have tested this against all the conditions listed in #86, #89, #102, and #104. I have also tested it with field collections. Everything checks out fine from my end.

ParisLiakos’s picture

+++ b/tests/media.file.usage.testundefined
@@ -0,0 +1,269 @@
+    $nid = $node->nid;
+    return $node->nid;

no need for $nid variable.

+++ b/tests/media.file.usage.testundefined
@@ -0,0 +1,269 @@
+    $node->body['und'][0]['value'] = $this->generateFileMarkup($fid, 2);

we should use LANGUAGE_NONE here instead of und.This happens in several places

Sending to bot for testing

ParisLiakos’s picture

Version: 7.x-2.x-dev » 7.x-1.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

I went through the entire thread and patch..i cant find anything else besides some docs fixes, but i wont hold this patch for this..also we have really good test coverage here..congrats all and good job!:)
I have committed this.

We should consider backporting it to 1.x i think

HyperGlide’s picture

Much needed patch. ++

With all the recent commits should there be a new release?

ParisLiakos’s picture

gmclelland’s picture

does this mean we don't need to use http://drupal.org/project/file_lock anymore?

arthurf’s picture

@gmclelland yes- media should record file usage properly now so files won't be deleted if they are in use in a text field.

gmclelland’s picture

@arthurf - thank you for the clarification

ParisLiakos’s picture

aaron’s picture

Here you go, I have re-rolled this patch for version 7.x-1.x. This just needs to be manually tested, which I have not done yet.

ParisLiakos’s picture

Status: Patch (to be ported) » Needs review
treksler’s picture

the patch in #127 adds a reference to test/media.file.usage.test in media.info but omits the actual tests

aaron’s picture

Oops! Sorry about that.

treksler’s picture

ok the one in #130 refers to test/media.file.usage.test but puts the file in tests/media.file.usage.test
once, I move the file to test/ I get: Class 'MediaTestHelper' not found.

it seems to me that perhaps future versions of media module have a wrapper for DrupalWebTestCase?
is that necessary here? I changed it to DrupalWebTestCase, and all tests passed.

sorry, I am new to Drupal 7, I would submit a patch, but I have to read up on how to do that properly first.

aaron’s picture

I'm sorry about the misnamed directory. This patch fixes that. I don't know offhand about the wrapper for the DrupalWebTestCase but I will let the test bot take care of checking that out.

Status: Needs review » Needs work

The last submitted patch, 1268116-file_usage_count_revisions_test-132.patch, failed testing.

treksler’s picture

Status: Needs work » Needs review
FileSize
17 KB

I took the last patch and swapped out MediaTestHelper with DrupalWebTestCase

Status: Needs review » Needs work

The last submitted patch, 1268116-file_usage_count_revisions_test-134.patch, failed testing.

treksler’s picture

looks like there were some exceptions in creating the test files
Undefined property: stdClass::$filemime Notice media.types.inc 196 media_is_type()

I am pretty sure that I did not get these exceptions last time, but anyway..

I have to admit that since I am not intimately familiar with Drupal 7 or the Media module codebase yet, I have no idea what is going on there.
BUT, upon taking a closer look, it seems that the other tests that are already there are setting up the test files in a round about way..

Eg. testQueryMedia() in media.entity.test calls:
$file = file_uri_to_object($files[0]->uri);
instead of:
$file = $files[0];

no Idea why, but making that change works.
maybe it fills in a default missing media type? just a guess.

I am uploading a patch that should work.

treksler’s picture

Status: Needs work » Needs review
Dave Reid’s picture

Issue tags: +7.x-2.0 alpha blocker
ParisLiakos’s picture

Issue tags: -7.x-2.0 alpha blocker

this is already in 2.x
this issue is open for backport only

Dave Reid’s picture

Blargh, thanks ParisLiakos.

treksler’s picture

Status: Needs review » Reviewed & tested by the community

the only review that was needed was to verify whether it was OK for the test to extend DrupalWebTestCase directly (instead of MediaTestHelper) since the tests pass, I think the review is unneccessary. correct me if I am wrong.

treksler’s picture

the odds of this ever getting committed are what exactly? should I bother testing patches in the future?

ParisLiakos’s picture

@treksler given this is a major bug, i would say it is gonna committed rather soon and be in next 1.x version
thanks for testing!
A maintainer will care for it once he finds time

tobiasb’s picture

FileSize
7.16 KB

This is a patch for the current version "2.0-unstable7". Useful for drush make or something.

aaron’s picture

Status: Reviewed & tested by the community » Fixed

I've committed this. Thanks for the hard work everyone!

David_Rothstein’s picture

Status: Fixed » Reviewed & tested by the community

It looks like the test file (test/media.file.usage.test) was left out of the commit?

Devin Carlson’s picture

Assigned: Unassigned » Devin Carlson
Status: Reviewed & tested by the community » Needs review
Issue tags: +needs backport to 1.x
FileSize
9.58 KB

A patch to add the missing test file (just checking that it's still passing).

Devin Carlson’s picture

Status: Needs review » Fixed

Committed #147 to 7.x-1.x.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.

PascalAnimateur’s picture

Issue summary: View changes

I can only get the file usage to work on image files, not on document / audio / video. Is this normal?

I'm using the following:
media 7.x-2.0-alpha4+26-dev
file_entity 7.x-2.0-beta1+22-dev
wysiwyg 2.x-dev
CKEditor 4.4.7

Even with CKEditor's ACF disabled, the media token [[...]] isn't stored in the database for audio/video/document.

Am I missing something?

Update: It turns out the JS function replacePlaceholderWithToken didn't recognize the needed <audio> / <video> / <iframe> tags. The problem is around line 97 of media_wysiwyg.filter.js, where the code expects the media-element class to be found on either img or span tag. I managed to fix this by adding the following lines after the regex for span:

regex += '|<audio[^>]+' + classRegex + '[^>]*?>[\\S\\s]+?</audio>';
regex += '|<video[^>]+' + classRegex + '[^>]*?>[\\S\\s]+?</video>';
regex += '|<iframe[^>]+' + classRegex + '[^>]*?>[\\S\\s]+?</iframe>';
PascalAnimateur’s picture

cclafferty’s picture

Thanks for this PascalAnimateur your solution almost fixed this problem for me.

I found two problems with the regex in the replacePlaceholderWithToken code:

  • Closing tags weren't being honoured correctly because forward slashes weren't being escaped. </span> needs it's forward slash escaped like this <\/span>
  • Whitespace wasn't being matched correctly: [\\S\\s]+? needed to be replaced with [\\S\\s]*?

Here is my final working combined combined code:

regex += '|<span[^>]+' + classRegex + '[^>]*?>[\\S\\s]*?<\/span>';
regex += '|<audio[^>]+' + classRegex + '[^>]*?>[\\S\\s]*?<\/audio>';
regex += '|<video[^>]+' + classRegex + '[^>]*?>[\\S\\s]*?<\/video>';
regex += '|<iframe[^>]+' + classRegex + '[^>]*?>[\\S\\s]*?<\/iframe>';
PascalAnimateur’s picture

Re-rolled the patch with information from #152.

Thanks @cclafferty !

rooby’s picture

Doing [\\S\\s] seems unusual. It means any non-whitespace or any whitespace character, which means any character, so why not just do .*??

cclafferty’s picture

@rooby I don't think . matches line breaks whereas [\\S\\s] does.
I was having a play around with the regex here if you want to have a look: https://regex101.com/r/bY2kY1/1

rooby’s picture

Ah very true. My mistake.

rdentry’s picture

I'm still having trouble with the File Usage not updating for image added via wysiwyg. I am currently using 7.x-2.0-rc3+3-dev . Does anyone know if this patch has since been rolled into 7.x-2.x-dev? I see that the patch proposed in #101 is for 7.x-1.x-dev.

Reproducing my issue is as follows:

  • Add an image through the "media browser" link in the "ckeditor module wysiwyg editor"
  • Publish Node
  • Check the "Used In" column here: /admin/content/file/list . And the image shows up in the list, but says that it is "Used In" 0 places

Any help with this or guidance to finding the solution would be much appreciated.

darrell_ulm’s picture

Closed issue, but am seeing same as #157 https://www.drupal.org/node/1268116#comment-11885837

joseph.olstad’s picture

@darrell_ulm , open a new issue with your findings and link this one to it . We won't re-open this closed issue but feel free to make a new issue.

darrell_ulm’s picture

Thank you for that. Solved now by looking at config more.