Project:Drupal core
Version:8.x-dev
Component:filter.module
Category:feature request
Priority:normal
Assigned:cwgordon7
Status:needs review
Issue tags:FilterSystemRevamp, token

Issue Summary

At the moment, a filter only gets a piece of text without any information about where this text comes from. Sometimes it can be useful to know where the text comes from to generate appropriate output. The example I have is the following:
You have your nodes grouped (e.g. by organic groups) and encounter a wikilink [[Page name]]. Now you would like to know in which group the current node is to link to the corresponding page in the same group.

The patch changes this. You can pass the context to the filter as a new argument in check_markup and hook_filter. If no context is available, just pass NULL and you have the same behaviour as now. A filter can then check if a context is set and behave differently depending on the context.

Also I changed the name of 'check_markup' to 'apply_filters'. This function does a lot more than just "checking", i.e. filtering and caching.

The patch has still some issues, for example should the cache id be different depending on the context, and the documentation probably needs to be adapted as well. But I first want to know if this feature will be considered at all.

AttachmentSizeStatusTest resultOperations
filters_with_context.patch8.34 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch filters_with_context.patch.View details | Re-test

Comments

#1

Big +1! This would be awesome!!

#2

Status:active» needs review

I think makes a lot of sense. +1 here and a status update.

#3

Subscribe. I have felt the pain of missing context many times. +1.

#4

Subscribing.

#5

I am inclined to won't fix. This is a task for nodeapi op view not filters. The very concept of filters is not being content aware. What will you do with something that's not a node...?

#6

The problem is that nodeapi view is not cached. We don't want to run nasty filter code on every request.

#7

I can see other use cases beyond nodes. For example, comments, block descriptions, etc., are all filtered; these could potentially all benefit from this.

And if this is node-specific, then we may need to reconsider how nodes are filtered by the filter system, because the current version just doesn't work.

#8

Whatever. Feel free to ignore me on this.

#9

Status:needs review» needs work

patch no longer applies cleanly on HEAD.

#10

+1 on the feature.
-1 on renaming check_markup(), additional arguments are sufficient.
-1 on limiting this to nodes.

As outlined in Add context for check_markup() at g.d.o, check_markup() is used for blocks, comments, nodes, profile fields, and users in core currently (see actual occurrences over there).

The goal should be to pass any type of content (a string) along with the object itself (most often an object) to check_markup(). Leaving conditional code for handling a certain object to filters that care.

#11

I suppose that a module can use nodeapi('view') and do its own caching of the result. Same for non module operations. You don't *have* to use the filter system.

#12

Here is an example where you need context, and it is not for a node but a taxonomy term:

The Glossary module uses taxonomies to store the glossary data. It provides a filter that will highlight and link words that are part of the glossary. Even glossary term descriptions have this filter applied. Ideally when looking at a glossary term you don't want that term to be highlighted in its own definition, it looks silly. But without some context it is impossible for the glossary filter to figure when words to ignore and when.

See: http://drupal.org/node/276198

#13

#8000: Filter contexts (and fix some small issues in filtering) has been marked as duplicate of this issue. Includes a patch as well.

Well, now that I recognize the issue number of that issue, I would rather want to mark this one duplicate... ;)

#14

Let's think about the scalability of this solution in the database. Currently, for every instance we have a database entry. If contexts were added in we would have to multiply that by the number of contexts for that instance in the most basic of cases. If two or more filters took advantage of contexts it would be a much larger growth rate than simple multiplication. This just doesn't seem scalable and could quickly grow out of control.

I can understand the issue of contexts. It's something I've personally struggled with.

What if, instead, we added a postprocess op to hook_filter (or a hook_filter_postprocess). When content is processed through filters we record the filters applied to it. When it's loaded we pass in the content to the postprocess op for the applied filters. This would let us do something similar to the hook_nodeapi view operation everywhere and let the filters define it for themselves. Thoughts?

#15

@mfer: There seems to be a misunderstanding - the goal of this issue is to pass information about the filtered text to all invoked filters. This does not change input formats or the number of filter per input format at all. But filters would finally be able to use the passed-in information intelligently, f.e. using $node->nid or $user->uid to render some markup that is tied to the object itself.

Steven also noted that filter caching could be a caveat, if some dumb filters try to use data outside of the cached content - however, for my use-case (Inline API) I just want to get some information passed in, so the contents of a filtered field are able to query some data and/or directly use values from the passed in object.

#16

Assigned to:Anonymous» cwgordon7

Due to the comment by Steven, I feel I have no choice but to roll this patch to prove that I am not just subscribing and +1'ing and "fully expect[ing] others in the community to solve it, while I reap the benefits." :(

Patch later tonight. Not to be a reroll.

#17

Status:needs work» needs review

This changes the filtering system to cache everything on a per-context basis.

AttachmentSizeStatusTest resultOperations
filter_context_01.patch14.56 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid patch format in filter_context_01.patch.View details | Re-test

#18

@cwgordon7, thanks for the patch. First, I like the idea of knowing more about the object in a filter. It's more a matter of making sure the solution doesn't break other parts of drupal.

One thing to note about this patch is that if in hook_nodeapi there is a conditional addition to the node object in the load operation there may be (at least) two different rows in the cache_filter table (which might, already, be the table with the most rows in drupals db). This wouldn't affect all conditional includes but it could for some. This leaves a window open for sloppy or unknowing developers to over fill the filter_cache table.

Is this an acceptable risk?

#19

@#18 - This is a definite and unavoidable side-effect, imo. By the very nature of being context-aware, if a node is loaded differently in two different requests, the filtered text itself may result in different text as well.

Another option would be to add a separate op to hook_nodeapi (case 'filter') in which only modules intending to use filtering on nodes with node contexts would add their stuff. I don't like that, however, as it doesn't really allow modules to use other modules' node properties effectively.

A third option would be to cache based on id (e.g. nid, cid, or bid). However, with this option, if a node or comment or block etc changes, the filter is still cached. I kind of like this option better; the only problem, really, is having to delete from the cache_filter table after node/comment/etc. updates where like '%%:%%:type:%d', and that's relatively easy to accomplish.

Other thoughts?

#20

Status:needs review» needs work

As a proof of the issue from #18, in comment_nodeapi a comment count is added to the node object. Each time there is a new comment added a new comment count will show up on the node and a new row will end up in cache_filter. On a large site like d.o this would be very bad.

If we attempt cache invalidation like the 3rd option in #19 what would the affects (performance) be and what would be the best way? Especially on a mysql myisam table with table level locking.

@cwgordon7 don't feel discouraged. This is a very tough problem to solve.

“There are only two hard things in Computer Science: cache invalidation and naming things” --Phil Karlton

#21

What if a filter relies on the comment count? :P

No, but the thing is, I agree with you. If something relies on the comment count, it can just case 'no cache': return TRUE;. Or we could add another hook_filter op, e.g. 'context dependent': return (bool);.

Thoughts?

#22

+1 for the idea, and possibly for the rename.

BUTS:

  1. I would rather see the context param moved to the end. By your own description, this is an optional parameter. There are going to be lots of changes in contribs just from the renaming; let's at least make them work with a global search-and-replace.
  2. As Marius pointed out, I am very interested in filtering terms as well.
  3. Perhaps it is unnecessary to mention this here, but filters do not operate on nodes; they operate on text. That text may come from nodes, but is not limited to that.
  4. Can we not apply a cache lifetime to the filter cache to assist in limiting this table?

#23

@NancyDru Context aware filters are quite a complicated issue. The solution in the patch above needs some major rework.

Adding a cache lifetime isn't a solution to the problem of cache invalidation in this case. It's a kind of work around or hack that doesn't address how the system can get out of hand and where this is flawed. Even with a cache invalidation of 1 hour I can come up with cases that could kill the filter cache on a medium sized site. It's a design flaw. A simple modification of the anonymous filtering system won't work. If there was no caching this would work. The issue is the filter caching.

Think about it this way. If module loads (via hook_nodeapi) anything that's custom per user, time, or anything that changes regularly the caching system will blow up. The information stored in an object needs to stay static or relatively static. This is a major limiting factor.

If the caching system were out of the loop on some of this context information the issue wouldn't be there. Maybe the solution is run the filter system like it does now for caching. Then, after it loads a cached piece of information allow a hook or function to modify the filtered text. A kind of 2 stage filtering system where the context part of it is not cached. It would work similar to how hook_nodeapi with the view op is used now to alter text on nodes but it wouldn't be limited just nodes.

Does this make sense? Thoughts?

#24

.

#25

Version:7.x-dev» 8.x-dev

#26

Issue tags:+token

Tagging as this blocks creating an easy, simple token filter.
#141430: Merge Token and Rep[lacement]Tags module

#27

#28

Issue tags:+token

WTF. It's a valid tag since it's blocking a potential token issue.

#29

To anyone concerned: You need to completely read and understand #8000: Filter contexts (and fix some small issues in filtering)

I've spent the last years tinkering about this challenge and as stated in that issue, it's one of the most critical problems for filter module authors, but we yet have to find a solid, and most importantly, reliable and secure solution.

Before attempting to write patches, we need very very creative and innovate ideas. If you have one, you can reach me anytime, everywhere.

#30

Subscribing.

Haven't checked the patches yet, but one thing that got me stuck when writing a filter once, was that the filter doesn't know who's content it's filtering. Therefore my whole plan to convert /me is having lunch to *Xano is having lunch* didn't work out.

#31

Status:needs work» needs review

Here's a thought: introduce a 'context callback' in hook_filter_info() that takes $type and $object as parameters, where $type is something like 'node', 'comment', or 'block', and $object is the $node, $comment, or $block object. The hook would then return an array of contextual data that it needs in order to filter. The caching is then handled based on changes in the contextual data: if the contextual data changes, the cache needs to be skipped. This effectively creates a new layer in which filters tell us when re-application of filters becomes necessary. Comes with a test to confirm that it's working as advertised. To confirm that this meets our needs, let's examine some use cases:

1. Converting /me is having lunch to *[Username] is having lunch*.

<?php
function example_filter_context_callback($text, $filter, $format, $type, $object) {
  switch (
$type) {
    case
'node':
    case
'comment':
     
$account = user_load($object->uid);
     
// Filter will have to recache when the account name changes, either when the author of the comment/node changes or when the user changes their name.
     
return array(
       
'user_name' => $account->name,
      );
  }
}
?>

2. Converting [[Image:1]] to the first image attached to the node.

<?php
function example_filter_context_callback($text, $filter, $format, $type, $object) {
  switch (
$type) {
    case
'node':
      if (!empty(
$node->images)) {
       
// Filter will have to recache when the images on a node change at all; i.e., they're added, removed, changed, or reordered.
       
return array(
         
'inline_images' => $node->images,
        );
      }
      break;
  }
}
?>

3. Creating wikilinks based on organic group:

<?php
function example_filter_context_callback($text, $filter, $format, $type, $object) {
  switch (
$type) {
    case
'comment':
     
$object = node_load($object->nid);
    case
'node':
      if (!empty(
$object->og) && $group = node_load($object->og)) {
       
// Filter will have to recache when the node's organic group changes (or is removed, or added, etc.), and also when the organic group's title changes.
       
return array(
         
'og_nid' => $group->nid,
         
'og_title' => $group->title,
        );
      }
  }
}
?>

4. Pirate filter needs to know if it's September 19th or not.

<?php
function example_filter_context_callback($text, $filter, $format, $type, $object) {
 
// Caching is invalidated when the date changes to September 19th (pirate turns on), and also when the date changes to September 20th (pirate turns off).
 
return array(
   
'pirate_day' => (date('md') == '0919'),
  );
}
?>

Note: these are all just code samples, they may not be correct in the particulars but the ideas behind them should be clear.

Even though this won't go in until 8.x, please let me know what you think!

Thanks!
Charlie

AttachmentSizeStatusTest resultOperations
filter_context_02.patch29.28 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch filter_context_02.patch.View details | Re-test

#32

Rerolled the patch to keep up with HEAD.

AttachmentSizeStatusTest resultOperations
filter_context_03.patch27.6 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch filter_context_03.patch.View details | Re-test

#33

Subscribing.

#34

Patch no longer applied, rerolled.

I'd appreciate some reviews on this, please. Thanks!

AttachmentSizeStatusTest resultOperations
filter-context-226963-34.patch26.83 KBIdleFAILED: [[SimpleTest]]: [MySQL] 29,940 pass(es), 23 fail(s), and 996 exception(es).View details | Re-test

#35

Status:needs review» needs work

The last submitted patch, filter-context-226963-34.patch, failed testing.

#36

Thanks for keeping this rollin', @cwgordon7.

However, unless someone beats me to it, I need to find some time to do an in-depth review of #8000: Filter contexts (and fix some small issues in filtering) and related/duplicate issues, especially with regard to @unconed's strong objections (the original author of Drupal's text format and filter system), and pull out + summarize all of that into a separate, public post somewhere, possibly g.d.o. This will have to catch all and possibly hidden details and has to be 100% unbiased.

Without that, this issue won't go anywhere.

So, to clarify: We don't need further patches or patch reviews. This issue attempts to invalidate and ignore the objections of the person who originally wrote and designed the API you are trying to change. That API hasn't changed since it was invented, so it's not clear why we believe that all of those objections would suddenly not exist anymore. We need an absolutely rock solid and informed base, which allows us to prove that this change can be done and none of @unconed's counter-arguments still stand.

#37

Based on my quick read, it seemed that unconed's primary objection was that he wanted to solve the larger, overall context problem and not implementing something in the filter system which could then be adapted to a larger context system (*ahem*...Butler). In addition, it's SEVEN years after Steven's concerns about this. Drupal has changed a lot (yes, including the filter API), mostly for the better. We have tokens in core and this issue would help pave the way for a token filter. I don't think we should have to 100% disprove everything Steven said in order for this to fly.

#38

Steven's objections:

- Filters are used for more than just nodes (e.g. comments, but also profile fields and such). Any filter functionality which requires node access will have a high hack-factor IMO and would probably benefit from more extensive filter-system modifications. I know this is a rather annoying, but persistent fact of the filter system.

This system takes this into account and does not special case nodes, in fact blocks, which are not entities, can be used as contextual information as well. The 'type' parameter is passed in.

- Just giving access to the node object doesn't achieve much IMO: the filtering module has no ownership over the node, so any advanced filtering functionality would have to be implemented in odd and non- or badly configurable ways. At most, some extra node info can be read out, but IMO your mentioned use for this patch is rather unorthodox.

I have trouble understanding what this means. Perhaps this doesn't apply anymore, but made sense at the time the comment was written? If someone can clarify, please do so. In any case, we allow modules to provide context callbacks to extract the pieces of content they're going to need.

The biggest problems I see with the filter system is that it is 'stupid'. It treats anonymous pieces of text. Your patch clearly addresses this by giving contextual info by passing the node along. However, many other filter ideas require other kinds of contextual info, such as admins being able to bypass HTML tag limitations (user/role info), allowing user-chosen format options such as HTML vs Plain-text (text origin metadata) and such.

IMHO we shouldn't be considering a user's roles while filtering. Users should either have access to a filter, or not. Keeping the system simple is the way to minimize the potential for security holes. I'm not sure what was meant by "allowing user-chosen format options"; if someone can clarify, again, please do so.

Still, IMO we need to work out a proper context system for situations like this: people often want access to the current node from a block or from the theme.

Yes, it would be awesome if we had some sort of global contextual system (i.e. butler) but here we are 7 years later and we don't. This is a tangible, achievable feature and I don't think it should be forced to wait on a global contextual system; if and when such a system were introduced, it does not look like there would become any harder to change the filtering system over to such a system with or without this patch.

Plus, what about filtering comments? Maybe we need to pass the $comment object along, so filters can operate on that. And shouldn't we then also include the $node that the comment is replying to (remember, node_load is a db query)? The list goes on

This problem is solved by allowing modules to specify the contextual data they want in the context callback. If they need a comment's node, they can pass it along.

I've thought about defining what the filter system should and should not handle. I don't think it should handle only transformations unknown to the user, but I do think that it should only occupy itself with transforming the actual content that was typed: adding links, glossary tips, paragraph tags, smileys, etc. These all replace what the user typed with a more extensive /equivalent/.

I think it's a perfectly legitimate use case of a filtering system to know some contextual data. Filters do this all the time - the pirate filter, for example uses the contextual data of whether or not it's September 19th to change the way it filters. A filter that replaces "~~~" with the author's username and link to the author's page would be relying on contextual information, but would be perfectly legitimate.

I don't think this is all of Steven's objections, but that's at least most of what was on the other issue sun linked to.

This patch should fix at least some of the failures from earlier.

AttachmentSizeStatusTest resultOperations
filter-context-226963-38.patch26.84 KBIdleFAILED: [[SimpleTest]]: [MySQL] 29,939 pass(es), 22 fail(s), and 49 exception(es).View details | Re-test

#39

Status:needs work» needs review

#40

Status:needs review» needs work

The last submitted patch, filter-context-226963-38.patch, failed testing.

#41

I've got 2 concerns / questions about this system:

  1. The context variable passed into each of the filters has the total context for all the filters. It's not a setting/info for the filter but a context for all filters being applied to the text. Should it function this way or allow for each filter to build its own context?
  2. This system opens a potential to blow up the filter caching system. For example, fields cache the text of body on a node. What about the case where a filter adds the node object to the context on a site where nodes get a fair number of comments. For every variation in the node object (not the fields value) there will be a new row in the filter cache. This would be nice for the case were you use a part of the node object on the filter. But, it makes keeping the cache clean difficult and creates a high potential to make this table full of not only a lot of rows but many that aren't being used.

    Right now when text is updated the cache key for the filter is the same so it replaces the old cached row with a new one. With this change, and figuring a node entity is attached, the updated value would change on the node. This would change the cache key. So, the old cached row would remain and a new one would be added.

    For sites with much activity there is quite a potential for this setup to cause the filter table to grow in size quite a bit. Imagine a site like Drupal.org and the number of comments on a node. We would have to be very careful about filter selection and the contexts included to make sure the cache_filter table/bin doesn't grow to many times the size it is now.

    I'm not opposed to the change. We just need to make warning signs so people know they are jumping off a cliff and for the lemmings that follow that first person to also know there is a cliff ahead.

    There are only two hard things in Computer Science: cache invalidation and naming things.
    -- Phil Karlton

#42

Status:needs work» needs review

To address mfer's questions / concerns:

1. That's an interesting question. I'm honestly not sure which is better, but I did it this way because I felt there was more harm in providing too little information to the hook than providing too much. In addition, the langcode is added to the context, which all filters may need to check to function properly, and is logically grouped with contextual data. I'm open to changing this, though; it's not very hard to switch the way this works.

2. Good point. I've added a cache_clear_all() call to check_markup so that no more than one entry will ever exist in the cache_filter bin for any given $text. That is, if the contextual data changes, the cached filter entry will be cleared for that text, and replaced by a new one for the new contextual data.

This patch will hopefully pass tests.

AttachmentSizeStatusTest resultOperations
filter-context-226963-42.patch30.77 KBIdleFAILED: [[SimpleTest]]: [MySQL] 29,730 pass(es), 64 fail(s), and 51 exception(es).View details | Re-test

#43

Forgot to add files.

AttachmentSizeStatusTest resultOperations
filter-context-226963-43.patch32.43 KBIdleFAILED: [[SimpleTest]]: [MySQL] 29,699 pass(es), 44 fail(s), and 37 exception(es).View details | Re-test

#44

FWIW, the filter cache was nearly removed at #569238: Make check_markup() not cache by default and probably will be removed in D8. The idea is that Field API does sufficient caching of its own.

#45

Status:needs review» needs work

The last submitted patch, filter-context-226963-43.patch, failed testing.

#46

Status:needs work» needs review

One more try.

AttachmentSizeStatusTest resultOperations
filter-context-226963-46.patch32.45 KBIdlePASSED: [[SimpleTest]]: [MySQL] 29,937 pass(es).View details | Re-test

#47

subscribing. this would make #1268116: WYSIWYG does not record file usage a piece of cake.