Context-aware filters

rötzi - February 26, 2008 - 13:54
Project:Drupal
Version:7.x-dev
Component:filter.module
Category:feature request
Priority:normal
Assigned:cwgordon7
Status:patch (code needs work)
Description

At the moment, a filter only gets a piece of text without any information about where this text comes from. Sometimes it can be useful to know where the text comes from to generate appropriate output. The example I have is the following:
You have your nodes grouped (e.g. by organic groups) and encounter a wikilink [[Page name]]. Now you would like to know in which group the current node is to link to the corresponding page in the same group.

The patch changes this. You can pass the context to the filter as a new argument in check_markup and hook_filter. If no context is available, just pass NULL and you have the same behaviour as now. A filter can then check if a context is set and behave differently depending on the context.

Also I changed the name of 'check_markup' to 'apply_filters'. This function does a lot more than just "checking", i.e. filtering and caching.

The patch has still some issues, for example should the cache id be different depending on the context, and the documentation probably needs to be adapted as well. But I first want to know if this feature will be considered at all.

AttachmentSize
filters_with_context.patch8.34 KB

#1

cwgordon7 - February 27, 2008 - 18:13

Big +1! This would be awesome!!

#2

Susurrus - March 2, 2008 - 02:58
Status:active» patch (code needs review)

I think makes a lot of sense. +1 here and a status update.

#3

moshe weitzman - March 2, 2008 - 12:51

Subscribe. I have felt the pain of missing context many times. +1.

#4

Wim Leers - March 3, 2008 - 13:14

Subscribing.

#5

chx - March 3, 2008 - 14:24

I am inclined to won't fix. This is a task for nodeapi op view not filters. The very concept of filters is not being content aware. What will you do with something that's not a node...?

#6

moshe weitzman - March 3, 2008 - 14:31

The problem is that nodeapi view is not cached. We don't want to run nasty filter code on every request.

#7

cwgordon7 - March 4, 2008 - 01:10

I can see other use cases beyond nodes. For example, comments, block descriptions, etc., are all filtered; these could potentially all benefit from this.

And if this is node-specific, then we may need to reconsider how nodes are filtered by the filter system, because the current version just doesn't work.

#8

chx - March 4, 2008 - 03:30

Whatever. Feel free to ignore me on this.

#9

cwgordon7 - May 10, 2008 - 16:51
Status:patch (code needs review)» patch (code needs work)

patch no longer applies cleanly on HEAD.

#10

sun - June 22, 2008 - 11:34

+1 on the feature.
-1 on renaming check_markup(), additional arguments are sufficient.
-1 on limiting this to nodes.

As outlined in Add context for check_markup() at g.d.o, check_markup() is used for blocks, comments, nodes, profile fields, and users in core currently (see actual occurrences over there).

The goal should be to pass any type of content (a string) along with the object itself (most often an object) to check_markup(). Leaving conditional code for handling a certain object to filters that care.

#11

moshe weitzman - July 3, 2008 - 21:43

I suppose that a module can use nodeapi('view') and do its own caching of the result. Same for non module operations. You don't *have* to use the filter system.

#12

mariuss - July 3, 2008 - 21:44

Here is an example where you need context, and it is not for a node but a taxonomy term:

The Glossary module uses taxonomies to store the glossary data. It provides a filter that will highlight and link words that are part of the glossary. Even glossary term descriptions have this filter applied. Ideally when looking at a glossary term you don't want that term to be highlighted in its own definition, it looks silly. But without some context it is impossible for the glossary filter to figure when words to ignore and when.

See: http://drupal.org/node/276198

#13

sun - July 3, 2008 - 22:55

#8000: Filter contexts (and fix some small issues in filtering) has been marked as duplicate of this issue. Includes a patch as well.

Well, now that I recognize the issue number of that issue, I would rather want to mark this one duplicate... ;)

#14

mfer - July 4, 2008 - 15:23

Let's think about the scalability of this solution in the database. Currently, for every instance we have a database entry. If contexts were added in we would have to multiply that by the number of contexts for that instance in the most basic of cases. If two or more filters took advantage of contexts it would be a much larger growth rate than simple multiplication. This just doesn't seem scalable and could quickly grow out of control.

I can understand the issue of contexts. It's something I've personally struggled with.

What if, instead, we added a postprocess op to hook_filter (or a hook_filter_postprocess). When content is processed through filters we record the filters applied to it. When it's loaded we pass in the content to the postprocess op for the applied filters. This would let us do something similar to the hook_nodeapi view operation everywhere and let the filters define it for themselves. Thoughts?

#15

sun - August 2, 2008 - 02:01

@mfer: There seems to be a misunderstanding - the goal of this issue is to pass information about the filtered text to all invoked filters. This does not change input formats or the number of filter per input format at all. But filters would finally be able to use the passed-in information intelligently, f.e. using $node->nid or $user->uid to render some markup that is tied to the object itself.

Steven also noted that filter caching could be a caveat, if some dumb filters try to use data outside of the cached content - however, for my use-case (Inline API) I just want to get some information passed in, so the contents of a filtered field are able to query some data and/or directly use values from the passed in object.

#16

cwgordon7 - August 2, 2008 - 03:03
Assigned to:Anonymous» cwgordon7

Due to the comment by Steven, I feel I have no choice but to roll this patch to prove that I am not just subscribing and +1'ing and "fully expect[ing] others in the community to solve it, while I reap the benefits." :(

Patch later tonight. Not to be a reroll.

#17

cwgordon7 - August 2, 2008 - 05:15
Status:patch (code needs work)» patch (code needs review)

This changes the filtering system to cache everything on a per-context basis.

AttachmentSize
filter_context_01.patch14.56 KB

#18

mfer - August 2, 2008 - 16:18

@cwgordon7, thanks for the patch. First, I like the idea of knowing more about the object in a filter. It's more a matter of making sure the solution doesn't break other parts of drupal.

One thing to note about this patch is that if in hook_nodeapi there is a conditional addition to the node object in the load operation there may be (at least) two different rows in the cache_filter table (which might, already, be the table with the most rows in drupals db). This wouldn't affect all conditional includes but it could for some. This leaves a window open for sloppy or unknowing developers to over fill the filter_cache table.

Is this an acceptable risk?

#19

cwgordon7 - August 2, 2008 - 18:09

@#18 - This is a definite and unavoidable side-effect, imo. By the very nature of being context-aware, if a node is loaded differently in two different requests, the filtered text itself may result in different text as well.

Another option would be to add a separate op to hook_nodeapi (case 'filter') in which only modules intending to use filtering on nodes with node contexts would add their stuff. I don't like that, however, as it doesn't really allow modules to use other modules' node properties effectively.

A third option would be to cache based on id (e.g. nid, cid, or bid). However, with this option, if a node or comment or block etc changes, the filter is still cached. I kind of like this option better; the only problem, really, is having to delete from the cache_filter table after node/comment/etc. updates where like '%%:%%:type:%d', and that's relatively easy to accomplish.

Other thoughts?

#20

mfer - August 2, 2008 - 19:32
Status:patch (code needs review)» patch (code needs work)

As a proof of the issue from #18, in comment_nodeapi a comment count is added to the node object. Each time there is a new comment added a new comment count will show up on the node and a new row will end up in cache_filter. On a large site like d.o this would be very bad.

If we attempt cache invalidation like the 3rd option in #19 what would the affects (performance) be and what would be the best way? Especially on a mysql myisam table with table level locking.

@cwgordon7 don't feel discouraged. This is a very tough problem to solve.

“There are only two hard things in Computer Science: cache invalidation and naming things” --Phil Karlton

#21

cwgordon7 - August 2, 2008 - 20:05

What if a filter relies on the comment count? :P

No, but the thing is, I agree with you. If something relies on the comment count, it can just case 'no cache': return TRUE;. Or we could add another hook_filter op, e.g. 'context dependent': return (bool);.

Thoughts?

#22

NancyDru - October 2, 2008 - 19:07

+1 for the idea, and possibly for the rename.

BUTS:

  1. I would rather see the context param moved to the end. By your own description, this is an optional parameter. There are going to be lots of changes in contribs just from the renaming; let's at least make them work with a global search-and-replace.
  2. As Marius pointed out, I am very interested in filtering terms as well.
  3. Perhaps it is unnecessary to mention this here, but filters do not operate on nodes; they operate on text. That text may come from nodes, but is not limited to that.
  4. Can we not apply a cache lifetime to the filter cache to assist in limiting this table?

#23

mfer - October 2, 2008 - 19:55

@NancyDru Context aware filters are quite a complicated issue. The solution in the patch above needs some major rework.

Adding a cache lifetime isn't a solution to the problem of cache invalidation in this case. It's a kind of work around or hack that doesn't address how the system can get out of hand and where this is flawed. Even with a cache invalidation of 1 hour I can come up with cases that could kill the filter cache on a medium sized site. It's a design flaw. A simple modification of the anonymous filtering system won't work. If there was no caching this would work. The issue is the filter caching.

Think about it this way. If module loads (via hook_nodeapi) anything that's custom per user, time, or anything that changes regularly the caching system will blow up. The information stored in an object needs to stay static or relatively static. This is a major limiting factor.

If the caching system were out of the loop on some of this context information the issue wouldn't be there. Maybe the solution is run the filter system like it does now for caching. Then, after it loads a cached piece of information allow a hook or function to modify the filtered text. A kind of 2 stage filtering system where the context part of it is not cached. It would work similar to how hook_nodeapi with the view op is used now to alter text on nodes but it wouldn't be limited just nodes.

Does this make sense? Thoughts?

 
 

Drupal is a registered trademark of Dries Buytaert.