Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
#1809702: WYSIWYG: Add Aloha Editor to core adds support for WYSIWYG editing of core text fields! A requirement of that issue is to add two new capabilities to the filter system:
- Have check_markup() run some, but not all, text filters when sending text to a WYSIWYG editor. In particular, we want to run all filters that protect against XSS exploit or otherwise limit the allowed HTML. But we do not want to run token expansion or similar kinds of transformation filters, instead leaving it up to client-side WYSIWYG plugins to handle those.
- Enable the Aloha module (or alternate contrib module) to determine when to not enable WYSIWYG editing for a text field. For example, some filters (e.g., Markdown) are (to varying degrees) incompatible with WYSIWYG editing.
This issue is about addressing the Filter system changes needed to support the above.
Proposed resolution
With great help and inspiration from #807996: [meta] Input filters and text formats, this patch proposes adding a 'type' key to hook_filter_info() information that must be set to one of the following:
- FILTER_TYPE_MARKUP_LANGUAGE: The filter converts something that's not HTML to HTML in a way that is not compatible with WYSIWYG editing.
- FILTER_TYPE_HTML_RESTRICTOR: The filter restricts the HTML allowed, for example, to protect against XSS.
- FILTER_TYPE_TRANSFORM_REVERSIBLE: The filter performs a transform for which a WYSIWYG plugin exists to perform the same transformation (and its reverse) client-side. For example,
<img data-caption="Druplicon">
may be (reversibly!) transformed to<figure><img><figcaption>Druplicon</figcaption></figure>
. - FILTER_TYPE_TRANSFORM_IRREVERSIBLE: The filter performs a transform for which a WYSIWYG plugin does not exist to perform the transformation client-side (especially, the reverse of it); instead, the WYSIWYG editor displays the unfiltered text. For example, Token Filter.
User interface changes
No direct UI changes.
API changes
- New function
filter_get_filter_types_by_format()
- New optional $filter_types_to_skip parameter in
check_markup()
- New required key 'type' in
hook_filter_info()
Caveats
- A design goal for this issue was to classify filter types as generically as possible, rather than focus on just a WYSIWYG (or particular WYSIWYG editor) use case (see #37). However, a perfect decoupling of type from use case has not been achieved. Whether a filter is FILTER_TYPE_MARKUP_LANGUAGE, FILTER_TYPE_TRANSFORM_REVERSIBLE, or FILTER_TYPE_TRANSFORM_IRREVERSIBLE depends on the desired WYSIWYG experience, and the WYSIWYG plugins available (see #38). Other ways of classifying filters were explored in this issue, but none were considered substantially better.
- When using a WYSIWYG editor, any filter of type FILTER_TYPE_HTML_RESTRICTOR runs both on "output" (from server to browser) and on "input" (by the WYSIWYG editor prior to submitting the form). This results in data loss (if disallowed HTML is in the text to begin with, the process of editing it in WYSIWYG strips it from what is then saved back). But, we think that losing disallowed HTML is a reasonable and logical consequence of WYSIWYG editing.
Comment | File | Size | Author |
---|---|---|---|
#58 | drupal_wysiwyg-in-core-round-one-1782838-58.patch | 14.47 KB | Wim Leers |
#46 | interdiff_32_45.txt | 19.34 KB | chx |
#45 | 1782838-45-wysiwyg-round-one.patch | 14.09 KB | Lars Toomre |
#45 | interdiff-1782838-44-45.txt | 5.5 KB | Lars Toomre |
#44 | drupal_wysiwyg-in-core-round-one-1782838-44.patch | 14.31 KB | Wim Leers |
Comments
Comment #1
Wim LeersAnd … patch!
It is possible to test in combination with the 8.x version of the Aloha module. Steps:
- apply the above patch
- perform a fresh D8 install, with the standard install profile
- apply the #1664602-4: Allow attributes to be passed to drupal_add_[css|js] (SRI) patch
- install the 8.x Aloha module: http://drupalcode.org/project/aloha.git/shortlog/refs/heads/8.x-2.x
Now Aloha Editor should appear wherever you have a processed text field: in the comments, on your node body, etc.
(Pushing Aloha 8.x code right now, it needs a circular reference to this issue… :))
UPDATE: Aloha 8.x code online — relevant issue: #1782326: Port Aloha 7.x-2.x to Drupal 8 (i.e. a 8.x-2.x version).
Comment #2
Wim LeersComment #3
Wim LeersAfter planning everything so carefully, of course I forget to add the new file with new unit tests to the patch.
Comment #5
Wim LeersLooks like testbot is unable to deal with the newish method of adding new files in diffs (http://stackoverflow.com/a/857696).
Trying the old skool way.
Comment #7
Wim LeersFirst failing test can't be us. Second failing test fixed. (One-line change.)
Comment #8
chx CreditAttribution: chx commentedjust return $filter->status;
if (!$filter->status) perhaps?
if (!isset($result). I must say this needs a ton of comments. I am happy for using array_reduce, heaven knows I am, but you must realize this is the first reduce-with-lambda in core so it needs a bit more explanation.
Why all this dance? Why can't we just default filter_types_to_skip to an empty array? I never see a === FALSE check . It'd simplify the code and the doxygen.
So now the security filter is the last and so we need to enable p and br? Weird-ish, why this wasn't a problem before?
Comment #9
sun@Wim Leers: Can we split the "filter types" chapter into an own issue, please? There's a long backlog of issues related to that, and I'd like to make sure that we're taking all prior art, considerations, and concerns properly into account. (I'll try to collect issues)
Comment #10
larowlanComment #11
webchickThis is all plumbing-related stuff so far, so I'm not sure we really need accessibility review on this particular patch. But of course, accessibility review on http://drupal.org/project/aloha is very welcome and encouraged. We're tracking a few issues in the Edit module queue, which can be moved over here when/if necessary.
Comment #12
larowlanRemoving tag, sorry for being over-zealous - we're seeing some AU Govt agencies starting to enquire about ATAG compliance so I'm being vigilant, but perhaps too much in this case. Will wait for round two.
@mgifford asked the question here https://getsatisfaction.com/aloha_editor/topics/how_accessible_is_aloha but not response yet.
Comment #13
Wim Leers#8:
filter_autop
andfilter_autourl
filters, which both generate HTML. In the issue summary, I've explained why that is a problem for WYSIWYG editing.<p>
and<br>
were indeed not allowed before, even though they implicitly were allowed! If you'd use those tags yourself in the "Filtered HTML" format, they'd be filtered away, but they'd be added by thefilter_autop
filter, which is executed later.#9: Without that, this issue would just be about "allowed tags", which is a tiny scope. I'd prefer to keep all API changes to the filter system in one patch, because they relate to the same problem space (as explained in the issue summary).
If you feel strongly about this though, I'll of course do that. I'm not aware of a long backlog, I only know about #807996: [meta] Input filters and text formats and the other issues that one links to.
#12: AE's accessibility is being discussed over at https://github.com/alohaeditor/Aloha-Editor/issues/590. I've just had a call with the AE guys two days ago, and they're arranging to hire Everett Zufelt as a consultant to ensure accessibility is top-notch. They actually want to go further than any other WYSIWYG editor in terms of accessibility :) You're welcome to join the discussion there!
Comment #15
Wim Leers#13: drupal_wysiwyg-in-core-round-one-1782838-13.patch queued for re-testing.
Comment #16
Crell CreditAttribution: Crell commentedWim, slap me if this is too far off topic, but one long-standing gripe with the filter system is that it's global, rather than per-field. Certain fields should be plain text only, regardless of the user, while on others an epic WYSIWYG makes sense, while on others you want just a limited set of manual tags. That has always been missing in core; filters are either all-on or all-off, and user-based access controlled. In D5 and D6 there was the filerbynodetype module (maintained by me) that sort of worked for body fields; in D7, we finally just recently got that functionality into better_formats, per-field.
It seems like that capability dovetails with what's being done here, and is an implication of it; certain filters only make sense in certain situations, one of which is "will there be a WYSIWYG on it", but another is simply "does the business case even exist for this filter to even be possible here". Is that something that makes sense to incorporate in this work, or is it actually separate? (It feels like it's part and parcel here, but but I could be wrong.)
Comment #17
Wim LeersAFAICT that's a separate issue: this one is about providing plumbing that is absolutely necessary for "true WYSIWYG" editors to do their work, no matter if it's on a per-format or per-field basis. The changes you're suggesting have — again, AFAICT — significantly more far-reaching implications (and API changes).
Comment #18
geerlingguy CreditAttribution: geerlingguy commentedI use the Wysiwyg Linebreaks module in tandem with a Wysiwyg editor to get text to and from an HTML-like format; it was originally adapted from JS used by Wordpress for their Wysiwyg integration (don't know if they still use it anymore) to make sure that when you switch from a plain textarea with no tags to a WYSIWYG (which requires
<p>
,<br />
and the like), and optionally, back to a plain textarea, the text line spacing and readability is preserved.This is a bit overkill for a WYSIWYG-as-the-default-and-enabled option, of course (not sure if that's what's being proposed, but it seems so), since end users will practically never see the actual plain text of the edited area... but I like being able to turn off Wysiwyg editors for certain roles, and typing without tags (in lieu of Markdown) is certainly more fun than having to remember all the boilerplate markup.
Comment #19
webchickTagging.
Comment #20
Bojhan CreditAttribution: Bojhan commentedCould you expand on the UI implications of this? I'd probally also be helpful if this got some reviews, perhaps we can do a g.d.o/core post?
Comment #21
webchickThere are no UI implications of this, yet. This is the filter system foundations that are required to support WYSIWYG editors in core.
I guess we could try g.d.o/core, but IMO there are only a select few people who have the background to review this particular patch. We've reached out to all of them. They are all swamped with other D8 things they are working on. :( We've now asked effulgentsia to dig in.
Comment #22
Wim LeersTracking HEAD.
Comment #23
Bojhan CreditAttribution: Bojhan commentedSeems like this new patching round should get a review from the maintainer of filter.module.
@sun Could you take a look at this?
Comment #24
effulgentsia CreditAttribution: effulgentsia commentedOverall, I think this is great, and makes a lot of sense.
Why make the value of a constant into a string? Why not use numbers like we do elsewhere?
Let's resolve this @todo. I think we should either throw an exception when no type is specified or make one of our types the default.
Ick. Why hard-code the format of the setting? Can we instead change 'allowed tags setting' to 'allowed tags callback' to let each filter that wants to restrict tags decide how to return the array of tags?
Is this relevant to this issue?
I'm not sure about removing these prior to adding aloha.module. I would bundle it into that patch. Otherwise, this patch is a UX regression: standard install gives you non-wysiwyg input without linebreak assistance. Or, maybe a separate patch along the lines of #18, where we replace these with a JS solution for non-wysiwyg.
Comment #25
Wim LeersFixed. Assimilated system.module.
Fair enough :)
Throwing an exception is the simplest solution. Assuming a default makes upgrading easier. But also less precise. The best default we can do is default to FILTER_TYPE_TRANSFORM_TEXT, because it is essentially the "worst case assumption": anything would essentially qualify under this. But it prevents us from reasoning correctly about the filters in a text format: we may default a filter to FILTER_TYPE_TRANSFORM_TEXT, but what if it's actually a HTML-generating filter such as Textile or Markdown? Instant incorrect reasoning. Potentially even worse: imagine somebody has written a special purpose "security" filter. Hence it should never not run. The default classification would prevent it from not running. No matter what default you choose, other things always break down.
The only conclusion possible is that throwing an exception is the only valid solution.
Excellent point. I tried to keep changes a minimum, but you're right that that is a poor design.
Fixed. Assimilated filter.module's process and settings callbacks.
Agreed. Removed from the patch.
Correspondingly, also removed the changed default settings for the
filter_html
filter: the<p>
and<br>
tags that were added to the default list of allowed tags, have been removed.Yes, but not anymore. It is necessary to make the tests pass when the
filter_autop
filter is disabled, the RDFa module's tests assume a<p>
tag will magically appear. When thefilter_autop
filter is no longer part of the default filters, that assumption no longer holds true.By undoing the filter system changes in the standard install profile (the previous point addressed, your last point), however, this is no longer necessary. Removed from the patch.
Comment #26
Wim LeersDarn :( Careful preparation, but then I accidentally include another patch: ignore the changes in common.inc (they're from #1664602-4: Allow attributes to be passed to drupal_add_[css|js] (SRI)).
Reroll to deal with that. I also noticed that I'm calling
$filters_metadata
what's called$filters_info
elsewhere. I wanted to do that in a separate reroll to make the above patch about the *actual* changes, not renaming things.Comment #28
Wim Leers#26: drupal_wysiwyg-in-core-round-one-1782838-26.patch queued for re-testing.
Comment #30
effulgentsia CreditAttribution: effulgentsia commentedThanks. This looks ready to me, except for some minor nits and questions below, and since it's marked as critical priority, I'm tempted to RTBC. However, not doing so yet, to give time for others to review, especially @sun, given the second sentence of #9.
Extraneous newlines.
For this and the other constants, I like this information, but are we okay with coupling this to wysiwyg like this? Personally, I'm okay with that, but pointing this out for others who might have alternate ideas.
I don't think this statement belongs here. It's up to the filter to decide how to balance robustness with code readability with performance.
Is it okay to mention a contrib module like this? Is it okay to use curly quotes in code files? Is it okay to mention fancycar.com, a commercial website? Hoping people more familiar with our docs standards can answer that.
Do we actually need to distinguish FILTER_TYPE_TRANSFORM_DOM and FILTER_TYPE_TRANSFORM_TEXT at all? Would it be simpler to just have FILTER_TYPE_TRANSFORM? Their server-side handling is identical. The only difference is that DOM allows JS to either emulate or not and TEXT doesn't allow that, but why wouldn't there be use cases of JS emulating text transformations (e.g., formatting dates to user's locale)?
I'm not sure if this is needed at all, but if it is, it should go above the constants into a @defgroup. See "@defgroup block_caching" in common.inc for an example.
One newline would suffice.
Comment #31
David_Rothstein CreditAttribution: David_Rothstein commentedHigh level review. I have several questions:
* List item
into<li>List item</li>
) as a generator, but Typogrify (which does things like transformWYSIWYG
into<span class="caps">WYSIWYG</span>
) as a transform filter, and I don't immediately see the difference.The documentation also doesn't explain what "allowed tags" means; is it "allowed on input" (as in, an
<img>
tag entered by the user will ultimately be displayed), or "allowed on output" (as in, the text format is capable of producing<img>
tags in the end, although not necessarily arbitrary ones entered by a user directly)? I assume it's supposed to be the former.A long time ago, Gábor Hojtsy came up with a simple way to allow the WYSIWYG editor to reconfigure itself like this. (I take about 2% credit for the idea since he was inspired to think of it after seeing a slide in a talk I gave, but it was pretty much all him :)
Basically, the idea is for the WYSIWYG configuration code to treat the filter system like a black box, almost as if it's performing a unit test on it. For each WYSIWYG button, we know what kind of output it produces and how that's supposed to be displayed in the end, so we can just test it and see if it works. As a simple example, a "Bold" button could call
check_markup('<strong>test</strong>', $format)
and see if it returns something functionally equivalant to what was put in. If it does, then the button is compatible with the text format and can be added to the WYSIWYG.I'm not sure what happened to that idea or if it ever made it into the issue queue, but it always made sense to me and doesn't require the WYSIWYG configuration code to have deep knowledge of how the filter system works.
Comment #32
Wim Leers#30:
Thanks for *not* setting this to RTBC. I wouldn't be comfortable with this being committed without at least having @sun's blessing. Further, David_Rothstein clearly did a stellar review, raising many good points :)
Fixed.
These should be removed before committing; I only added them to make the patch review easier. Only the first line of the comment block before each
const
definition should remain. I removed the long, detailed comments in the attached reroll.See my reply to #31, point 1.
#31:
You and @effulgentsia hit on a good point: why the hell is such an implementation detail mentioned? For two reasons: 1) I couldn't think of a better name, 2) the "DOM vs. text" thing is explicitly present in #807996: [meta] Input filters and text formats.
First: why are they separate? The assumption is that DOM-based transformation filters are *reversible* ("reliably reversible transformations"), whereas text transformation filters are *irreversible* ("text-based, irreversible transformations").
Second: better names then are FILTER_TYPE_TRANSFORM_BIDIRECTIONAL and FILTER_TYPE_TRANSFORM_UNIDIRECTIONAL (for the _DOM and _TEXT ones, resp.). Alternatively: FILTER_TYPE_TRANSFORM_REVERSIBLE and FILTER_TYPE_TRANSFORM_IRREVERSIBLE.
Either of those alternative naming schemes communicate far more clearly what's happening: it doesn't matter whether you implement it using DOM traversal or a simple
str_replace()
, what matters is the reversibility.To describe it more formally, the following only holds true for FILTER_TYPE_TRANSFORM_REVERSIBLE filters:
x = 'foobar'; t'(t($x)) == $x
. It must not just be "somewhat reversible", it must be "perfectly reversible", in that you get the exact same result back if you reverse a reversible transformation. If that's not the case, it's an irreversible transformation.I'm going to call them _REVERSIBLE and _IRREVERSIBLE in the remainder of my reply.
With the above information regarding the FILTER_TYPE_TRANSFORM_* filter types in mind, your question is still valid, but now it's about FILTER_TYPE_HTML_GENERATOR vs. FILTER_TYPE_TRANSFORM_IRREVERSIBLE. E.g.: it is impossible to reliably translate back e.g. HTML to Markdown, because Markdown allows one to mix HTML and Markdown, hence it is is irreversible. I think it's possible to define a format that can be transformed to HTML and transformed back to the original mark-up. But this is not true for virtually all formats out there (they're usually a simplification of HTML, after all).
You could indeed regard FILTER_TYPE_HTML_GENERATOR as pointless. Markdown, Textile, PHP, etc. could be considered FILTER_TYPE_TRANSFORM_IRREVERSIBLE. If there is a magical mark-up language that can be transformed into HTML and back again, then it would indeed be possible to use a HTML WYSIWYG editor on this.
So, from the theoretical POV, I agree.
From the practical POV, however, we deal with FILTER_TYPE_HTML_GENERATOR and FILTER_TYPE_TRANSFORM_IRREVERSIBLE differently. FILTER_TYPE_TRANSFORM_IRREVERSIBLE filters are assumed to be non-essential and thus skippable. FILTER_TYPE_HTML_GENERATOR filters are assumed to be essential and thus non-skippable.
In other words: for text formats with FILTER_TYPE_HTML_GENERATOR filters, we're not working with HTML at all if we skip those filters, and thus it's something that's not editable in a "true WYSIWYG" editor (which is by definition HTML-based, since the language of the web is HTML itself). However, without FILTER_TYPE_TRANSFORM_IRREVERSIBLE filters, you still have a piece of HTML, albeit possibly without some niceties.
Note: it is assumed that text formats *without* FILTER_TYPE_HTML_GENERATOR filters already contain HTML (which is a reasonable assumption since Drupal is a *web* CMS, meaning that it outputs HTML) and thus *are* editable in a WYSIWYG editor.
So, we could rename FILTER_TYPE_HTML_GENERATOR to FILTER_TYPE_TRANSFORM_IRREVERSIBLE_ESSENTIAL. Then we'd have:
Maybe it's a misnomer. But I think it's sufficiently explicitly stated what such a filter is about: "strip HTML tags that the user MAY NOT use". It may be a misnomer because there are so many security aspects (e.g. CSRF) that are not so extremely simplistic as "disallow certain HTML tags".
FILTER_TYPE_HTML_TAG_RESTRICTOR may be a better name?
Regarding the special treatment for FILTER_TYPE_SECURITY/FILTER_TYPE_HTML_TAG_RESTRICTOR filters: even when doing WYSIWYG editing, 1) we want to prevent that potential security threats (e.g.
<script>
tags) are not dealt with because FILTER_TYPE_SECURITY/FILTER_TYPE_HTML_TAG_RESTRICTOR filters did not run, 2) we want to make sure users are seeing a realistic representation of what their content will eventually (i.e. when "completely" filtered) look like, so the same set of tags should be allowed.Hopefully it's already clear to you by the time you reach this part of my reply.
In 8 words: to prevent crappy HTML from polluting our contents.
In a nutshell: WYSIWYG editing is about editing HTML, but to this day, many (most?) pieces of content in Drupal are not actually HTML, they're some derivative of it, which is then transformed into HTML. We must of course maintain our "on output filtering" approach, to guarantee security etc. Imagine the simple case of e.g. the extlink filter or some "text link ads" filter: without skipping those filters, their output would be saved into the database. The HORROR! Hence we want to be able to skip inessential, irreversible transformations (FILTER_TYPE_TRANSFORM_IRREVERSIBLE_INESSENTIAL).
Most excellent point! To which I don't have a good answer. WYSIWYG editors typically offer you a plethora of buttons that you can then configure. The idea is that we automatically configure the WYSIWYG editor as much as possible: if the
<em>
tag is disallowed by the text format, then we don't show a button for it, etc. But indeed, for e.g. anchors, it makes a lot of sense to configure which attributes are allowed, and thus whether the WYSIWYG editor should provide a UI for it. (And in fact, Aloha Editor supports this, at least to some degree.)IMVHO, it's excellent follow-up issue material.
Good find! I should have documented this better (though it is documented implicitly in hook_filter_FILTER_allowed_tags() in filter.api.php), but: the "allowed tags" callbacks are only necessary for *some* filters of the type FILTER_TYPE_SECURITY. Or, as I've stated above: FILTER_TYPE_SECURITY should probably be renamed to FILTER_TYPE_HTML_TAG_RESTRICTOR. In that context, it makes a lot more sense that only then you may need "allowed tags" callbacks.
But in case you'd have an imaginary
filter_html_escape
filter that *is* of the is of the FILTER_TYPE_HTML_TAG_RESTRICTOR type, then it should be indicating that it allows *no* HTML tags. I.e., it should return the empty array. In that case, the logic in filter_get_allowed_tags_by_format() should work fine (it uses array_intersect() after all).It is indeed "allowed on input". As before, filters can essentially do what they want, so they could indeed insert an image tag even though it is not explicitly allowed.
VERY interesting idea! :)
With that, it would indeed be theoretically possible to get rid of the "allowed tags" callback. Do you see any downsides of going down that path?
In the reroll of the patch:
Comment #33
effulgentsia CreditAttribution: effulgentsia commentedThanks, Wim. I'm still not clear on why we want to identify any difference between FILTER_TYPE_TRANSFORM_REVERSIBLE and FILTER_TYPE_TRANSFORM_IRREVERSIBLE_INESSENTIAL. As far as the filter system is concerned, their behavior is identical: do not apply them in PHP when sending text to wysiwyg. Maybe you're imagining some future code that would need to distinguish between these two? If so, what's that code you're imagining?
Just brainstorming a bit, but if the only meaning of these filter types is in relation to wysiwyg, would it be more direct and clearer to call them as such? E.g, instead of $filter_info['type'], make it $filter_info['wysiwyg'], and make the constants FILTER_WYSIWYG_DISALLOW, FILTER_WYSIWYG_APPLY, and FILTER_WYSIWYG_SKIP?
For check_markup(), what if instead of passing it $filter_types_to_skip, we instead overload $format_id to be either a string id, or a $filters array? We do stuff like this in Field API (e.g., the $display parameter in field_default_prepare_view()). Then, whatever code is integrating a wysiwyg editor can be responsible for calling check_markup() with the set of $filters it wants applied.
Finally, I like David's / Gabor's idea of getting rid of allowed tags callback from this issue. It will narrow the scope of this issue to just one thing, and wysiwyg testing check_markup() directly seems an option worth exploring in the aloha module. I like how that approach can test attributes as well as tags. If it turns out to not work well for some reason, we can add 'allowed tags callback' at that time.
Comment #34
Sylvain Lecoy CreditAttribution: Sylvain Lecoy commentedThe downside with this idea is that it does Not scale at all. Imagine you have to run n times the format function, which you don't know about the complexity at all. So the simple rendering of an editor can possibly take ages because of this phase of init.
Comment #35
effulgentsia CreditAttribution: effulgentsia commentedWhat the editor needs this for is to determine which toolbar buttons to hide/show. It can generate a test string that covers every toolbar button, call check_markup() once per format, and cache the result. It's possible we'll discover a performance limit, but I don't think we have enough evidence of it at this time to make the idea not worth trying.
Comment #36
David_Rothstein CreditAttribution: David_Rothstein commentedRight, the only time this code would need to run is when someone is changing either the WYSIWYG setup or text format setup, which should be pretty rare administrative operations. It would not need to run every time the WYSIWYG is displayed.
I'm not even sure it's conceptually a "cache" as much as it is "configuration" (in the sense that modules like http://drupal.org/project/wysiwyg already provide a user interface for administrators to configure which WYSIWYG buttons to display; this would basically just be additional data which restricts the choice and/or allows them to be configured automatically).
Haven't had time to read through and digest the other comments above yet - just wanted to pop in to help clarify the performance issue.
Comment #37
Wim Leers#33:
FILTER_TYPE_TRANSFORM_REVERSIBLE is essentially "WYSIWYG-compatible": because it is reversible, there can be a JS implementation of these filters that at least can give an indication of what the end result would look like. Upon saving the content, however, the changes made by the filter would be undone by the JS, so that you again end up with your source mark-up, the one that may be saved to the DB.
FILTER_TYPE_TRANSFORM_IRREVERSIBLE (later in #32 suggested to be called FILTER_TYPE_TRANSFORM_IRREVERSIBLE_INESSENTIAL) is by nature "WYSIWYG incompatible": it is irreversible, so if it would be applied in JS, there would be no way to go back to the source mark-up, which we need to be able to save it to the DB without saving any cruft that may be generated by the filter.
This is what I said in the OP, strong emphasized lines indicating the relevant parts:
That's what I specifically wanted to avoid! :)
Why avoid this? Because different WYSIWYG editors may have different needs; a specific filter may come with JS support for WYSIWYG editor A but not B. That's why I tried to come up with a categorization in the first place: so that modules can *reason* about the filters used by text formats.
Thoughts?
Comment #38
effulgentsia CreditAttribution: effulgentsia commentedFirst off, I just want to say thank you to Wim and everyone you worked with (sun, quicksketch, others) in thinking through this problem space. As identified back in quicksketch's DrupalCon London presentation and probably well before then by others as well, Drupal's "apply all filters on output all of the time" approach is fundamentally incompatible with a good wysiwyg experience. To solve this, we need a richer language for describing and working with filters. And a lot of good thinking has gone into identifying the 4 types of filters described in the issue summary and in the patch. Given both my difficulties in understanding the nuances in how each filter maps to the type that it does, and David's observations in #31.1-#31.3, I think we can still improve the terminology or documentation, but I think that's minor relative to the excellent analysis you did getting us here.
Wim and I chatted about #37, and he helped clear up one point of confusion for me. For FILTER_TYPE_TRANSFORM_REVERSIBLE:
WYSIWYG use case: these filters MUST NOT be applied when feeding a piece of text into the WYSIWYG editor. Instead, they MAY be re-implemented in JavaScript for each supported WYSIWYG editor.
And for FILTER_TYPE_TRANSFORM_IRREVERSIBLE:WYSIWYG use case: these filters MUST NOT be applied when feeding a piece of text into the WYSIWYG editor. Furthermore, they MUST NOT be re-implemented in JavaScript.
So my confusion here is, given the MAY in the first part (which implies a "may choose not to"), if I have a filter for which I have not supplied a JavaScript implementation, how do I know whether to set its type to FILTER_TYPE_TRANSFORM_REVERSIBLE or FILTER_TYPE_TRANSFORM_IRREVERSIBLE? Am I stuck in a purely philosophical question (this filter is "in priciple" reversible vs. this filter is "in principle" irreversible), or is there any practical consequence to the choice? Wim's conclusion was that this should not be a philosophical question, but a practical one: that if you have not actually supplied a JavaScript implementation, you should set the type to FILTER_TYPE_TRANSFORM_IRREVERSIBLE, and if some other module ends up supplying a JS implementation on your behalf, then it can alter the type to FILTER_TYPE_TRANSFORM_REVERSIBLE.With that out of the way, I'm more comfortable having a total of 4 types rather than the 3 I was suggesting earlier.
Next, I tried to think through #31.2: what's the difference between a "generator" and an "irreversible transform". #32 answers this by invoking a concept of "essential" or "non-skippable". In other words, if showing Markdown syntax within a wysiwyg editor is deemed a bad experience, then it should be set to FILTER_TYPE_GENERATOR, with the result that wysiwyg cannot be used for text containing Markdown. OTOH, if showing Markdown syntax within a wysiwyg editor is deemed ok, then it should be set to FILTER_TYPE_TRANFORM_IRREVERSIBLE, with the result that wysiwyg can be used, but any Markdown text is shown as Markdown text not as "what you see is what you get". But then, if some module comes along and implements a JS markdown to html (and back) transformer, then it can set the markdown filter to FILTER_TYPE_TRANFORM_REVERSIBLE, allowing true wysiwyg editing that's automagically converted back into Markdown for database storage. This fluidity of how to classify Markdown based on what code is available and what user experience is desired makes me question the approach of trying to assign semantic names to filter types.
So, brainstorming a little, I came up with an idea of filter flags:
- FILTER_FLAG_REQUIRED_ON_OUTPUT = 0x01
- FILTER_FLAG_POSSIBLE_ON_INPUT = 0x10
I'm still tentative on these names (feedback welcome), but the idea being that each combination of these two flags correlates with one of the 4 filter types in #32:
- 0x00 replaces FILTER_TYPE_TRANSFORM_IRREVERSIBLE: wysiwyg possible but not "true"
- 0x01 replaces FILTER_TYPE_GENERATOR: wysiwyg not possible
- 0x10 replaces FILTER_TYPE_TRANSFORM_REVERSIBLE: true wysiwyg possible
- 0x11 replaces FILTER_TYPE_RESTRICTOR: true wysiwyg possible, filter (e.g., tag stripping) is applied both server-side and client-side.
I don't know if this moves us closer to or farther away from clarity, and usefulness to contexts other than wysiwyg. Thoughts?
Comment #39
Wim LeersTo me, it initially feels more harder to grasp. But that might simply be because I'm so familiar with the names I came up with myself.
In general, I think your direction is a good one. Your dual binary filter flags approach goes back to the "filter on output" vs. "filter on input" aspect. I think the aspect that makes it hard to grasp is the inclusion of "required" vs "possible" in these flags. How about this:
- FILTER_FLAG_ESSENTIAL = 0x001
- FILTER_FLAG_SUPPORTS_OUTPUT = 0x010
- FILTER_FLAG_SUPPORTS_INPUT = 0x100
However, the second flag is really *always* true, so may want to remove that one. (Unless we start building client-side/JS/WYSIWYG-only filters, of course… Though I can't really think of a use case for that.) Then we'd end up with:
- FILTER_FLAG_ESSENTIAL = 0x01
- FILTER_FLAG_SUPPORTS_INPUT = 0x10
This is identical to what you have, but just with different names, that make more sense to *me* — but that could really just be me. FILTER_FLAG_SUPPORTS_INPUT would still imply that there's a reversible JS implementation: you send it the "source mark-up", the JS can transform it to the intended end-result, but upon saving, it can transform it back to the "source mark-up". Right?
In summary: your proposal is functionally identical, but *might* indeed improve clarity. What's the best/fastest way to figure out if it indeed improves clarity?
EDIT: I just noticed I said that the *first* flag is always true, but I of course meant the second one (FILTER_FLAG_SUPPORTS_OUTPUT).
Comment #40
effulgentsia CreditAttribution: effulgentsia commentedI like your improved flag names.
For starters, comments from some of the 45 people following this issue :)
Comment #41
chx CreditAttribution: chx commentedA consensus is forming around removing the allowed tags callback from this issue. I agree.
wow I never knew restrictor was a word... good to know.
filter_get_allowed_tags_by_format() should move $filters_info = filter_get_filters(); outside of array_filter and use() it in the closure. One also wonders why dont we store this somewhere. It's not like this can change often. Although perhaps that can be done more easily (no separate update path) after filters have been converted to CMI so let's just leave a @TODO for that.
if ($filter_types_to_skip && in_array($filter_info[$name]['type'], $filter_types_to_skip)) <= just run in_array. It costs next to nothing to run an in_array against an empty array.
By now it's a very simple patch that is close , finally.
Comment #42
Wim Leers@chx: Are you implicitly saying that you think the flags approach outlined in #38 and #39 looks good to you?
Comment #43
chx CreditAttribution: chx commentedThe approach yes, the actual flag names are bikeshed and we can happily bikeshed on december 2 until april 1...
Comment #44
Wim Leers- allowed tags callback removed
- filter_get_allowed_tags_by_format() removed
- if ($filter_types_to_skip && in_array($filter_info[$name]['type'], $filter_types_to_skip)) improved as per @chx's suggestions — this was a relic of when $filter_types_to_skip defaulted to FALSE
- I decided to stick with the non-flag-based approach for now, because that's easier to understand for most people than bitwise flags. The flags would need to change anyway, so I might as well go that route instead.
- I renamed FILTER_TYPE_HTML_GENERATOR to FILTER_TYPE_MARKUP_LANGUAGE, which communicates more clearly for what purpose it should be used. David Rothstein correctly argued that it's in fact similar to an irreversible transformation, but I think this communicates more clearly what I originally intended.
- I also renamed FILTER_TYPE_HTML_TAG_RESTRICTOR to FILTER_TYPE_HTML_RESTRICTOR (removed the "tag") because it has been rightly pointed out that some filters may strip attributes instead.
- FYI: I implemented the "blackbox allowed tag testing" as suggested by @effulgentsia in #31 in Aloha 8.x-2.x: http://drupalcode.org/project/aloha.git/commitdiff/fe6178a (#1815246: Stop relying on filter_get_allowed_tags_by_format(), figure out which tags & attributes are allowed through blackbox testing). I've proven it to work, with the sole exception that I had to switch from blackbox testing to "greybox" testing to be able to detect that in Filtered HTML, the "p" tag is not really allowed. (filter_html removed it, but filter_autop added it again, the solution was to use two p tags without separating them with a newline, so that filter_autop's magic wouldn't interfere.)
Patch attached. All tests pass locally.
Comment #45
Lars Toomre CreditAttribution: Lars Toomre commentedI noticed a few documentation and/or coding style issues in #44. The attached patch includes my corrections and inderdiff shows what I changed.
The biggest changes I made relate to #500866: [META] remove t() from assert message. Hence, I removed the t() from assert messages in the new tests.
Comment #46
chx CreditAttribution: chx commentedThanks everyone, IMO this is ready. I missed the interdiff from #44 so here is the 32-45 interdiff .
Comment #47
Wim Leers@Lars: thanks!
@chx: WOOT!
Updated the issue title to reflect the scope.
Comment #47.0
Wim LeersAdding link to Aloha Editor issue.
Comment #48
effulgentsia CreditAttribution: effulgentsia commentedI updated the issue summary and posted a notification to gdo/core.
Comment #49
Crell CreditAttribution: Crell commentedWhile I appreciate the potential DX of "just use a constant flag!", there are at least two problems with this.
1) Constants ought to be bound to a class or interface so that they can be lazy loaded. It doesn't look like there is such a class available at present, but IMO the entire filter system should get turned into plugins later. We may be best off punting on this point for now.
2) Much ink has been spilt pointing out that multi-value constants of this sort are inherently dangerous. They are by design completely not extensible. The minute a contrib module decides that it wants to define filter type 4, and some other module decides it also wants to define filter type 4, you have a fatal error (if you're lucky; you could also end up with data loss).
Unless we are 120% certain that there is absolutely no use case for contrib to ever add to this list, these should be string values, not global constants. (And knowing contrib, I don't think it's possible to be 120% certain about anything.)
(To be fair, that is the case for menu types, for instance. That's not extensible. However, those are also bitflags, not discrete constants.)
Purdy. :-)
This almost makes me thing the constants should be bitflags, not ints. That would actually be more consistent with the use of such constants elsewhere in Drupal, and I think in general.
This should be two separate tests. Merging two tests into a single test method is a moderately large testing faux pas.
Speaking as a module developer (I maintain a filter module, invisimail), it looks like there's not much here that affects me yet other than specifying the type of filter I have. I'm not sure what type invisimail would be, though, especially as, depending on its configuration, it may or may not generate HTML or even Javascript. I would need better guidance for deciding if I should mark it FILTER_TYPE_MARKUP_LANGUAGE, FILTER_TYPE_HTML_RESTRICTOR (since it does affect HTML, but can also generate it), or FILTER_TYPE_TRANSFORM_IRREVERSIBLE (since it's only reversable in some situations, which varies by configuration).
Comment #50
chx CreditAttribution: chx commentedFiled #1816160: Should FILTER_TYPE_* be bitflags strings or stay ints? and #1816162: Make FilterAPITest a Unit Test and split it for Crell's concerns.
Comment #51
webchickThat sounds like a "needs work."
Comment #52
Wim Leers#49: Thanks for the review! :)
Hm, I don't see what you mean here. There's three test cases, each with a different variety of the fifth parameter to
check_markup()
. Or do you want me to test the "default" case (the one with the default value for the fifth parameter) in a separate test method?Indeed :)
invisimail does not contain a mark-up language; hence FILTER_TYPE_MARKUP_LANGUAGE is out of the question.
FILTER_TYPE_HTML_RESTRICTOR vs. FILTER_TYPE_TRANSFORM_IRREVERSIBLE can potentially be argued about (hence the need for bitflags/better names), but the main purpose of the invisimail module is not to protect the reader from the resulting HTML to do evil things, the purpose is to perform a transformation so that spambots cannot contact the author. Hence: FILTER_TYPE_TRANSFORM_IRREVERSIBLE.
I *think* @chx kept the status at RTBC and created those follow-up issues to get this in ASAP, so … setting back to RTBC. I can address the test improvements here, if you prefer that. We should definitely to the FILTER TYPE constants -> bit flags + naming stuff in a follow-up, because that is very much prone to bikeshedding.
Comment #53
webchickCool, I'm comfortable with those items being follow-ups.
This has been assigned to sun for awhile, and I would still love to hear from him here, but in the meantime David and Alex and chx gave this very thorough reviews. So I'd like to commit this in ~24 hours, barring anymore serious objections.
Comment #54
chx CreditAttribution: chx commentedI *think* @chx kept the status at RTBC and created those follow-up issues to get this in ASAP, so … setting back to RTBC <= you are completely correct. I made it clear that while I do have issues with the flag names but that's not to hold this patch up. Totally addressable in followups.
Comment #55
catchPatch looks fine to me as well, as do the follow-ups.
Comment #56
Crell CreditAttribution: Crell commentedI'm fine with those being follow-ups process-wise, but I think the DX of those flags still needs a fair bit of work if I'm that confused by them. I'll follow up on the test spinoff with what I meant there.
Comment #57
webchickDarn. Went to commit this, but does not apply. :( Can we get a quick re-roll?
Comment #58
Wim LeersReroll from #45.
Comment #59
webchickYay! Bot came back green. Heeeeere we gooooo....!
Committed and pushed to 8.x. Thanks!
Let's continue further refining the terminology/docs/representation of these flags in #1816160: Should FILTER_TYPE_* be bitflags strings or stay ints?.
This will need a change notice, since it affects filter module authors. Writing it might even be able to help fix the terminology. :)
Comment #60
Wim LeersChange record created: http://drupal.org/node/1817474.
Comment #61
Wim Leers.
Comment #62
Tor Arne Thune CreditAttribution: Tor Arne Thune commentedComment #64
Eric_A CreditAttribution: Eric_A commentedHm, I'd say that the change record and issue summary would probably profit a lot from a note on why filter_url and filter_autop are (temporarily?) being left on the Filtered HTML text format, despite these filters being of type FILTER_TYPE_MARKUP_LANGUAGE.
I was reading about filter_url and filter_autop being of type FILTER_TYPE_MARKUP_LANGUAGE and it all just didn't add up... For now I'll just continue enjoying some more comments and patches on this beautiful issue :-)
Comment #65
Wim Leers@Eric_A: well, the Filtered HTML format has always had those filters. For Aloha to work, they would need to be removed (as stated earlier in this issue), but until Aloha is in core over at #1809702: WYSIWYG: Add Aloha Editor to core, there's no need to do that.
And "beautiful issue"? :)
Comment #66.0
(not verified) CreditAttribution: commentedUpdated and shortened issue summary to match the latest patch.