Currently the media filter turns off text format caching, which means that any text format which contains this filter will not be cached by the Filter module.

This is something of a performance hit - although mitigated by the fact that most filtered content passes through the Field API (which has its own cache), it's still not good because it means any other filtered text (e.g., user signatures) using a format with this filter in it will not be cached at all...

The patch to change this is simple at first (attached), but it may have side effects. When the media filter is run, it does depend on some of the current site settings; e.g., first media_variable_get('wysiwyg_default_view_mode') is called, then media_load(), then media_get_file_without_label(), etc. If the output of those change, then the filtered media content can change too, so unless the filter cache is being appropriately cleared at those times it might lead to stale data.

I have not tested this, but if that's the case then actually making it work correctly may be more trouble than it's worth :(

Comments

catch’s picture

Priority: Normal » Major

The field API does not cache uncached filters - it entirely respects hook_filter_info() - see http://drupalcontrib.org/api/function/text_field_load/7

So this is very much needed. The field content cache and/or entity cache are very easy to clear if needed. This is taking 12% of the request on the front page of site I'm profiling, and that front page is only rendering the teasers of four nodes.

JacobSingh’s picture

Status: Needs review » Needs work

I'm in favor of this, but we should also include code to invalidate the caches which are affected when they are changed.

-J

catch’s picture

Started looking at this.

There doesn't appear to be a UI for media_variable_get('wysiwyg_default_view_mode'); so not sure how that would change? If the default changes the caches could be cleared in an update. But I may be missing something since this is the first time I've had a proper look at media module, and I've only been looking at these bits.

More problematic is media_load(). I can't see any way to track which textareas (let alone what they're attached to) reference which media, but disabling check_markup() caching is sufficiently expensive that it might be worth trying to work around this.

In _media_markup() we have $media['fid'] available - that's the only way to know which media are referenced anywhere.

When this runs, we can record in a tracking table that the media is referenced by the filter - a table with a single column of 'fid' would work. Just check if it's there, if it's not add a row, delete rows when media are deleted, consider wiping the table in hook_flush_caches() since it's fine if it's empty in a cold start situation (apart from the expense of building it), and this would save media that are no longer referenced being left in there indefinitely.

Then when media are updated, check in this table whether the fid is there, and if it is, clear the field and filter caches (entirely). The 30% hit from check_markup() from just four node teasers is sufficiently bad that I think this could be worth the effort.

On a normal media update, if it's not in a textarea, then that's just going to be a well-indexed select. If it is referenced, then you'll have a full field cache clear, but unless a site is embedding all their media and updating it all the time, that should still be a net win.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new8.09 KB

Here's a patch, not tested yet.

Adds a media_filter_usage table, this holds just fid and timestamp.

When an fid is found by _media_markup(), calls media_filter_usage_add(). This does a db_merge() on that table, but only if we haven't updated it in the past month to avoid too many db_merge() happening.

When media is updated, check if there's an entry in this table, and if there is clear the entire filter + field caches, more discussion of this in the code comments.

In hook_flush_caches() delete records from media_file_usage() that have a last updated timestamp of more than 120 days ago - this prevents fids staying in there forever, even if the text that referenced them has long since been changed. Since it's done during hook_flush_caches(), the filter and field caches have just been emptied, so valid entries will be put back in the table as those get built again.

I think if we went for just flushing everything on media update, that could easily lead to expiring the field and filter caches much too often and could be counter productive. On the other hand trying to figure out which specific caches to flush would be expensive in itself, and even more complicated.

Will test this tomorrow, update the patch for any bugs found and post some profiling comparisons - put posting up for the general approach now.

catch’s picture

StatusFileSize
new99.51 KB
new107.4 KB
new8.1 KB

Updated patch fixes a notice - was calling media_filter_usage($fid); when it should be media_filter_usage($media['fid']);

More interestingly, when you change a filter from cache => 0 to cache => 1, core doesn't make any changes to the {filter_format} table even when doing a full cache flush. For here we can likely load all the filters and save them again in an update, and that ought to set things correctly, will write a patch for that tomorrow, but feels like a core bug.

Also attaching before/after profile output.

catch’s picture

StatusFileSize
new8.36 KB

Same patch now with an update to work around the core bug, opened an issue for that at #993230: Text format cacheability is not updated.

JacobSingh’s picture

I didn't test this, but assume the following:

1. Embed some media using view_mode=media_large
2. Change the properties of the media_large view mode (or the style it references).

If I understand the patch correctly, this won't take effect until a cache clear. I didn't bother to code dive, but I doubt core would just clear this up. Or would it? Does the styles module need to clear all filter caches when a definition is updated?

Does core do that for format changes?

catch’s picture

The output of check_markup() is now cached as part of the field cache via text_field_load(), bypassing the normal filter cache. So for the case of a view mode changing, field_update_instance() would take care of the cache clearing.

It looks like there are two cases that need to be covered then:

* Clear the filter cache in hook_update_instance() - for when the media filter is used in a non text field like custom blocks etc.
* Have styles module (or media module via a hook) clear the filter and field caches when a style is updated.

Will start looking at that now.

catch’s picture

StatusFileSize
new9.81 KB

New patch implements hook_field_formatter_update() and hook_file_style_flush() to clear the field and filter caches on those two events. Also makes some minor changes elsewhere in the patch.

catch’s picture

StatusFileSize
new10.08 KB

filter_format_save() without manually setting $format->filters will wipe out existing filter settings, updated patch that accounts for that -background at #993230: Text format cacheability is not updated

effulgentsia’s picture

Issue tags: +Performance

Thanks for the work on this. I'm reviewing it now. Please update this issue if there's any new info I should be aware of.

catch’s picture

No new changes here, should be ready for review as far as I know.

effulgentsia’s picture

Status: Needs review » Fixed
StatusFileSize
new11.15 KB

#10 looks really good. I did some minor cleanup, and committed this patch to CVS HEAD. Thanks again.

effulgentsia’s picture

StatusFileSize
new1.22 KB

Committed this as well. Not strictly necessary with default core config, since image width and height aren't added to the markup by default, but I think it's a good idea, in case someone decides to do so.

Status: Fixed » Closed (fixed)
Issue tags: -Performance

Automatically closed -- issue fixed for 2 weeks with no activity.