Comments

Assigned: Unassigned » Wim Leers

Let's get this issue going.

@effulgentsia suggested in #1782838-38: WYSIWYG in core: round one — filter types that filter types should use flags instead of constants. I commented on that in #1782838-39: WYSIWYG in core: round one — filter types. @chx said in #1782838-41: WYSIWYG in core: round one — filter types and #1782838-43: WYSIWYG in core: round one — filter types that we'd still have to do a fair amount of bikeshedding before everybody agrees on the flag names, and @Crell said in #1782838-49: WYSIWYG in core: round one — filter types that we really need to get rid of the current constants.

To hopefully give a complete, concise and clear big picture of the current filter types approach, please read this change notice: http://drupal.org/node/1817474. Then let's move this forward :)

Issue tags: +plugins

I think the invisimail example in the change notice clearly explains why this is all unclear. :-) I could also imagine code that reversibly transforms Markdown or a similar syntax (e.g. a regex that turns <li> tags into * bullets). So the idea that those four cases are definitive is wrong, IMO. At minimum, they should be bitflags.
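To illustrate the bitflags idea, here is a minimal sketch in Python (not Drupal's PHP) using the standard `enum.Flag`. The trait names are hypothetical and only loosely modelled on the four cases in the change notice; the point is that one filter can carry several traits at once, which a single type constant cannot express.

```python
from enum import Flag, auto


class FilterTraits(Flag):
    """Hypothetical filter traits as bitflags: a filter may combine several."""

    NONE = 0
    MARKUP_LANGUAGE = auto()         # converts a non-HTML markup language to HTML
    HTML_RESTRICTOR = auto()         # strips disallowed tags/attributes
    TRANSFORM_REVERSIBLE = auto()    # changes the text in a way that can be undone
    TRANSFORM_IRREVERSIBLE = auto()  # changes the text in a way that cannot be undone


def is_reversible(traits: FilterTraits) -> bool:
    """Treat a filter as reversible unless it declares an irreversible change."""
    return FilterTraits.TRANSFORM_IRREVERSIBLE not in traits


# A reversible Markdown-like converter: impossible to express with one constant.
reversible_markdown = FilterTraits.MARKUP_LANGUAGE | FilterTraits.TRANSFORM_REVERSIBLE
print(FilterTraits.MARKUP_LANGUAGE in reversible_markdown)  # True
print(is_reversible(reversible_markdown))  # True
```

Bitflags solve the "one type per filter" problem, but the set of flags is still defined in one place, so two people extending it independently can still collide.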

For more background on why a fixed non-binary list of options is a problem, see Eaton's talk from DrupalCon DC (good API design) or my talk from DrupalCon Chicago (Aphorisms of API Design). Short version: It always, always bites you in the butt when two different people try to extend your list.

Of course, the best solution is to convert filters to plugins, which means they could have an arbitrary number of isWhatever() methods on them or some other more robust interface. Filters have long been discussed as a good use case for plugins; just no one has gotten around to doing that conversion yet. :-)
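A rough sketch of the plugin alternative, again in Python rather than PHP: capabilities become `is_*()` query methods on a base class instead of entries in a fixed list. `FilterPlugin`, its methods, and `EmailObfuscator` are hypothetical names; `EmailObfuscator` is only an invisimail-style stand-in, not invisimail's actual code.

```python
from abc import ABC, abstractmethod


class FilterPlugin(ABC):
    """Hypothetical plugin base class: capabilities are methods, not a type constant."""

    @abstractmethod
    def process(self, text: str) -> str:
        """Transform the input text."""

    # Capability queries default to False, so a new query method can be added
    # later without breaking plugins that never override it.
    def is_markup_language(self) -> bool:
        return False

    def is_html_restrictor(self) -> bool:
        return False

    def is_reversible(self) -> bool:
        return False


class EmailObfuscator(FilterPlugin):
    """Illustrative invisimail-style filter: encodes "@" as a character reference."""

    def process(self, text: str) -> str:
        return text.replace("@", "&#64;")

    def is_reversible(self) -> bool:
        return True


f = EmailObfuscator()
print(f.process("me@example.com"))  # me&#64;example.com
print(f.is_reversible(), f.is_html_restrictor())  # True False
```

A contributed module could subclass `FilterPlugin` and add its own `is_*()` method without touching a central list, which is the extensibility the constants lack.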

I'd say Markdown is the one example that is provably irreversible. Markdown allows you to mix regular HTML with Markdown syntax, so if the output contains <strong>this is strong</strong>, you could never know whether the original was **this is strong** or <strong>this is strong</strong>. But that's beside the point, really.
I agree that the list should be extensible. At the same time, I think we should get as close as possible to a definitive list.
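The Markdown irreversibility argument can be shown with a toy converter. `toy_markdown` is a hypothetical stand-in that only handles `**bold**` and, like Markdown, passes raw HTML through untouched; two different sources end up as the same output, so no inverse can recover "the" original.

```python
import re


def toy_markdown(text: str) -> str:
    """Tiny stand-in for a Markdown converter: only **bold** is handled.

    Raw HTML passes through unchanged, as Markdown allows.
    """
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)


a = toy_markdown("**this is strong**")
b = toy_markdown("<strong>this is strong</strong>")
# Two different sources, one output: the conversion is not injective,
# so it cannot be reversed.
print(a == b)  # True
```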

Can converting filters to plugins be a post-feature-freeze task?

Where is the invisimail example unclear? Would clarifying the wording about which filter type to use when be a sufficient solution?


Overall, the facts are really simple: the filter system allows you to do anything. That's great, except when you need to know in advance what a filter can do. That's why it's probably impossible to make this absolutely, 100% clear: the filter types stuff is an afterthought instead of part of the design. The filter system's current design? "Apply a series of functions that transform an arbitrary string of Unicode characters into another string of Unicode characters."

Imagine the more organized case: a 3-stage filter system. Stage 1 is a *single*, *optional* "non-HTML markup language to HTML" conversion (not a "filter", but a "markup_language"). Stage 2 is an (un?)ordered list of transformations (not a "filter", but a "transformation", though this is very close to the main purpose of "filter"s). Stage 3 is built into the system and lets the user define a list of allowed tags and, for each tag, a list of allowed attributes. That would give better control over the eventual HTML, which we'd need anyway if we want all HTML to be as optimal as possible.
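The 3-stage idea could be sketched roughly as below, in Python using the standard `html.parser` module. `run_pipeline`, `AllowlistFilter`, `bold_only`, and `expand_public` are hypothetical names; the stage-3 allowlist is a simplified illustration (for instance, it does not special-case void elements like `<br>`), not a production HTML sanitizer.

```python
import re
from html import escape
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set

Transform = Callable[[str], str]


class AllowlistFilter(HTMLParser):
    """Stage 3: keep only allowed tags and, per tag, only allowed attributes."""

    def __init__(self, allowed: Dict[str, Set[str]]) -> None:
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # drop the disallowed tag; its text content is still kept
        parts = [tag]
        for name, value in attrs:
            if name in self.allowed[tag]:
                parts.append('%s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<%s>" % " ".join(parts))

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape text so decoded entities cannot turn back into markup.
        self.out.append(escape(data, quote=False))


def run_pipeline(
    text: str,
    markup_language: Optional[Transform],
    transformations: List[Transform],
    allowed: Dict[str, Set[str]],
) -> str:
    if markup_language is not None:  # stage 1: a single, optional converter
        text = markup_language(text)
    for transform in transformations:  # stage 2: applied in list order
        text = transform(text)
    sanitizer = AllowlistFilter(allowed)  # stage 3: built-in allowlist
    sanitizer.feed(text)
    sanitizer.close()
    return "".join(sanitizer.out)


def bold_only(text: str) -> str:
    """Toy stage-1 converter: **bold** becomes <strong>bold</strong>."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)


def expand_public(text: str) -> str:
    """Toy stage-2 transformation: rewrite public:// to an assumed base URL."""
    return text.replace("public://", "http://www.example.com/sites/default/files/")


html_out = run_pipeline(
    '**hi** <a href="public://f.png" onclick="evil()">f</a>',
    markup_language=bold_only,
    transformations=[expand_public],
    allowed={"strong": set(), "a": {"href"}},
)
print(html_out)
```

Because stage 3 always runs last and is not a user-orderable filter, the "which filter must run before which" question only remains for stage 2.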

After all, why don't we enforce HTML output? The filter system is only used to generate HTML, so why not enforce that? Are there use cases where we don't want to generate HTML? If so, which ones, and why do they need the filter system at all?

(For the record: this is not meant to be an attack, or a rant. It's intended to be a genuine question.)

Right now we don't have any non-HTML filter use cases that I know of. However, if we start sending out JSON-LD strings a lot more as part of the WSCCI/Entity/Serialization fun, those will often need filtering as well, and that filtering will overlap with the HTML case. The body of a field in a JSON string may be serialized HTML, but it may also be sanitized some other way. E.g., invisimail would likely still be needed for email addresses in JSON-LD strings (though only the entity encoding option would be usable, not the JavaScript one), and turning public://myfile.png into http://www.example.com/sites/default/files/myfile.png would still need to be done.
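A sketch of the overlap described above: the same two transformations applied to a value that ends up in JSON rather than in an HTML page. All function names are hypothetical; the email encoding is an invisimail-style numeric character reference scheme, not invisimail's code, and the email regex is deliberately simplistic. Whether a JSON-LD consumer decodes those entities is an assumption that depends on how it treats the field.

```python
import json
import re

# Assumed site base URL, taken from the example in the comment above.
PUBLIC_BASE = "http://www.example.com/sites/default/files/"
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def encode_emails(text: str) -> str:
    """Entity-encode every character of each email address (no JavaScript needed)."""
    return EMAIL.sub(lambda m: "".join("&#%d;" % ord(c) for c in m.group(0)), text)


def expand_public_uris(text: str) -> str:
    """Rewrite public:// URIs to absolute URLs under PUBLIC_BASE."""
    return text.replace("public://", PUBLIC_BASE)


def filter_value(text: str) -> str:
    """Apply both transformations; neither depends on the output being HTML."""
    return expand_public_uris(encode_emails(text))


payload = json.dumps({"body": filter_value("Mail me@example.com or see public://myfile.png")})
print(payload)
```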

A more rigid pipeline is an interesting approach. That may help clear up the constant confusion over which filters need to run before/after some other filter to avoid breaking. Right now, that's trial and error or black magic. That could also justify and clarify a fixed, non-changeable list of possible types.

My understanding is that a straight port of system X to plugins is on the table for Slush.