Problem/Motivation
CKEditor 4.1 includes an Advanced Content Filter. Drupal HEAD is currently using it in default mode, which means CKEditor filters all user entered content (including content pasted from other web pages, Microsoft Word, etc.) to conform to what is allowed by the enabled toolbar buttons and plugins for the text format being used. For example, if in configuring the "Basic HTML" format, the administrator hasn't added any toolbar buttons or plugins that provide <table>
functionality, then upon pasting HTML content from Word, CKEditor will automatically clean up the HTML to not include <table>
and related tags. This is mostly a good thing: to the extent that the configured toolbar matches what is allowed by Drupal's text filters, it helps maintain parity between what the content editor sees in the wysiwyg, and what they'll get when they save the node and view it (i.e., WYS = WYG).
However, there are some situations in which there isn't a perfect match between the configured toolbar and the configured filters. Some examples:
- The Basic HTML format in HEAD allows for the
code
tag, but there's no enabled CKEditor button for it. CKEditor has a "source" mode for entering HTML by hand, so you can use that to markup a code snippet withcode
tags, save the node, and upon viewing the node, that markup is there, as expected, since the format allows it. However, when re-editing the node, CKEditor applies its default ACF rules and strips out thecode
tags, since there's no enabled toolbar button for it. - Drupal filters can be more fine-grained than CKEditor buttons. For example, you might want a Drupal filter that allows
class
orstyle
attributes on some tags, but not on others. For WYS to equal WYG, you'd want these rules applied within the editor as well, and relying on CKEditor to figure it out from the enabled buttons is not sufficient. - You may want the CKEditor interface to be smart about which options it shows in various dialogs. For example, if you have a Drupal filter that strips
target
attributes from links, you probably also want CKEditor hiding that option from its link dialog. CKEditor's ACF allows for setting theallowedContent
configuration to what you want, and it will automatically adjust all of its UI options accordingly.
Proposed resolution
Rather than letting CKEditor guess the allowedContent
from its button/plugin configuration, have Drupal explicitly set allowedContent
to match what the format's text filters allow.
To be able to generate the proper value for that setting, we must know which HTML tags and attributes are allowed by the text format, so this issue is introducing filter_get_allowed_html_by_format()
and an "allowed html callback" for filters of the type FILTER_TYPE_HTML_RESTRICTOR
. (This was originally proposed over at #1782838: WYSIWYG in core: round one — filter types, but back then it was not something you could touch and see; now you can.)
Remaining tasks
All done; needs reviews!
User interface changes
None.
API changes
- Filters of the
FILTER_TYPE_HTML_RESTRICTOR
type may now define an "allowed html callback". - New utility function:
filter_get_allowed_html_by_format()
. Allows modules to gain insight about which HTML tags and attributes are allowed by a certain text format.
Comment | File | Size | Author |
---|---|---|---|
#32 | ckeditor_acf-1936392-32.patch | 41.93 KB | Wim Leers |
#32 | interdiff-fixes.txt | 4.22 KB | Wim Leers |
#32 | interdiff-renaming.txt | 16.52 KB | Wim Leers |
#27 | ckeditor_acf-1936392-27.patch | 40.53 KB | Wim Leers |
#27 | interdiff-minor_fixes_26.txt | 7.05 KB | Wim Leers |
Comments
Comment #1
Wim LeersThis patch:
allowedContent
setting automatically. When noFILTER_TYPE_HTML_RESTRICTOR
filters are present in a text format, it is set totrue
(to allow anything); otherwise it is set based on the return value offilter_get_allowed_tags_by_format()
. Tests updated accordingly.FILTER_TYPE_HTML_RESTRICTOR
type filters to have an "allowed tags callback". API documentation,filter_html
& tests updated accordingly.FilterAPITest
fromWebTestBase
toDrupalUnitTestBase
. Speed FTW :)Comment #1.0
Wim LeersClarified the D7 situation for the security problem.
Comment #2
Wim LeersFix notice.
Comment #3
sunGiven what is being specified in the callback, 'allowed_tags' is a misnomer. It's 'allowed_html'.
Aside from that, I need to think some more through this.
Comment #4
Wim LeersI also found the name for the callback bad, but couldn't think of anything better. It's so obvious now :) Thanks! Fixed.
Comment #5
Wim Leers#1872226: Ability to configure CKEditor’s pasting to match the currently active text format merged into this issue.
Comment #6
Wim LeersChasing HEAD.
Comment #7
effulgentsia CreditAttribution: effulgentsia commentedAccording to http://ckeditor.com/blog/CKEditor-4.1-RC-Released and http://docs.ckeditor.com/#!/guide/dev_advanced_content_filter, if
allowedContent
isn't specified, which is currently the case in Drupal HEAD, then CKEditor automatically defaults it to whatever is allowed by the enabled buttons and plugins. Therefore, I'm not immediately able to see how this issue is needed on security grounds. I tried theiframe
andscript
attacks listed in the issue summary for all formats installed by Standard profile, and was not able to get CKEditor to run that XSS code. If someone is able to come up with an attack that works, please post it. Otherwise, I suggest we remove "security" from the tags and title, and downgrade this to a major.In terms of the usability part of this issue (helping wys really be wyg), why do the approach taken here (setting allowedContent based on what the Drupal filter settings are) rather than enforcing consistency between the buttons/plugins and the Drupal filter settings in the first place? Wouldn't the latter be better UI in terms of consistency between how the administrator configures the toolbar and the toolbar that actually gets displayed?
Comment #8
greggles@effulgentsia - were you testing in chrome or firefox? Some browsers will block scripts that appear to come "from the user".
Comment #9
effulgentsia CreditAttribution: effulgentsia commentedI was testing in Firefox, but what I found wasn't just that CKEditor didn't run the code, it also stripped it from the content, so CKEditor was filtering, not the browser.
Comment #10
quicksketchOur syncing between the Editor settings and the Filter settings is currently CKEditor-specific and will likely remain that way in Drupal 8. There is a layer of abstraction in there to allow other modules to update filter settings in the same way CKEditor does, but right now there's no such thing as a generic "Editor plugin" that would work with every WYSIWYG out there. Instead right now we only have CKEditor plugins, and any other module that provides an editor (or multiple editors) would handle its own plugins in its own way. This means that if we wanted to enforce filtering consistency between buttons/plugins and filter settings, each filter would need to integrate with each individual editor because we don't have a generic API for handling plugins across all editors. By using the approach Wim proposes here, each filter doesn't need to directly provide integration with any editor at all. The editor is responsible for utilizing the information provided by the filters and making its own configuration match. So basically, this shifts responsibility of integrating with a particular editor from the filter to the editor, and saves filter authors the need from needing to write an editor-specific integration.
Comment #11
Wim Leers#7:
RE: Security
allowedContent
to the list of allowed tags/attributes/styles/classes. Hence it will allow anything that the buttons indicate that they might generate. This implies that it might allow for too much: it will allow for style attributes (which we almost never want as Drupal users, plus it involves security risks), as well as any class (also questionable) as well as any kind of attribute (also questionable, and also involves security risks, thinkonClick
etc.). So even though the most obvious risks may be mitigated, when users install additional CKEditor plugins, which might want to generate e.g.onClick
attributes, then attackers still have free reign.allowedContent === true
onClick
to work. Users with access to this text format can do anything they want. To convey this "anything goes" nature to CKEditor, we must setallowedContent = true
.I hope this clarifies things sufficiently.
Why allowedContent and not just enforce button/plugin vs.
filter_html
setting consistency?Simply because using the CKEditor toolbar/plugin configuration by itself is insufficiently granular. As I said above: it might (and typically will) allow for too much: style attributes for starters, but much more than that.
Essentially: the CKEditor buttons/plugins configuration is fairly coarse, Drupal text format filter configuration can be done much more granular. As of CKEditor 4.1, via its ACF feature, CKEditor now also has a very granular configuration, so we can achieve a 1:1 mapping. We just have to start using it :)
#10: I hadn't even considered it that way, because that would result in an insane many:many relationship graph. It would indeed be conceptually feasible to have a Drupal filter correspond to one or more WYSIWYG editor buttons. But practically, not so much: each WYSIWYG plugin/button could have alternative/overridden implementations, resulting in many:more-than-many relationships, resulting in a code jungle. And then we're not even talking about supporting multiple WYSIWYG editors…
As I said above, you can view CKEditor's ACF as a way of restricting HTML in CKEditor like we have for many years in Drupal's filter system.
This is a universal concept, that any WYSIWYG editor could implement.
I had a discussion with the CKEditor folks yesterday about a related problem space (improved paste from Word). That will also hugely benefit from setting the allowed HTML tags, to avoid pasting in cruft. However, that's when I realized that the patches so far only define which tags and attributes are allowed, it does not yet allow text formats to indicate which specific styles and classes are allowed. I'll have to fix that (though that could even happen in a follow-up issue, because this implements the bulk of the work). But I first want sign-off on the overall approach.
Comment #11.0
Wim LeersFix typo.
Comment #11.1
effulgentsia CreditAttribution: effulgentsia commentedUpdated issue summary.
Comment #12
effulgentsia CreditAttribution: effulgentsia commentedSince no one identified how this patch improves security relative to letting CKEditor use its default ACF, I'm adjusting the issue title, tags, and priority accordingly. I also updated the summary to reflect my current understanding of the problem space (in particular, that we're now discussing custom ACF vs. default ACF rather than the original summary that was discussing custom ACF vs. no ACF).
@quicksketch: thanks for #10; that was helpful. It helped inform the changes I made to the summary, but I left the consideration of supporting other editors out of the summary for now, for simplicity. If we end up needing those considerations to resolve this issue properly, we can update the summary again at that time.
This is lessening the accuracy relative to CKEditor's default. _filter_xss_attributes() filters out all "on*" attributes, but that's not reflected here. As a result, this patch allows
onclick
in through the editor (go into source mode, type that, then switch back), whereas HEAD doesn't. It seems that CKEditor doesn't run onclick events on its content anyway, so this isn't necessarily a security problem, but it still seems to be going backwards, and I'm not clear how we fix that, since CKEditor's ACF doesn't support regex rules, from what I can tell.How would this work with something like Media module (and transformation filters in general)? A common Media configuration is to have the "Limit allowed HTML tags" filter not include
<img>
. But for the Media filter to run later, and convert the inline JSON codes into<img>
tags. Meanwhile, on the CKEditor side, the plugin inserts<img>
tags, which only on detach() does it turn back into the JSON codes. Wouldn't the above code disallow<img>
tags fromallowedContent
(since it's computing the intersection) and prevent the plugin from working? Or does CKEditor now support a transformation API that could allow the content to always contain the JSON codes only, and therefore, the transformation to<img>
would happen in a different part of its pipeline and be immune fromallowedContent
?Comment #13
Wim LeersI disagree. That's what I tried to do in #11, but apparently not clearly enough :)
I said this:
But, you're absolutely right when you say that
That's why I said this at the bottom of #11:
Maybe I should have called out
filter_html
's specific restrictions (nostyle
attributes and noon*
attributes) that are thus not passed on to CKEditor: since we don't pass any allowed styles to ACF, that's already in accordance (but by omission rather than intentionally).A problem there is that
filter_html
uses both a whitelist (for the allowed tags) and a blacklist (for the disallowed attributes). CKEditor's ACF currently only supports a whitelist. So it's impossible to express our disallowance ofon*
attributes. They're going to address that in one of the next releases (TBD, probably 4.2).That could be a reason for wanting to postpone this issue until after ACF has blacklist support.
You're also right that ACF currently does not support regex rules (or even wildcard rules), but that's coming soon: http://dev.ckeditor.com/ticket/10202.
(I wanted to keep the discussion focused on the high level aspects, that's why I didn't elaborate on these details; but I probably should have. Apologies.)
Your assumption that CKEditor would insert
<img>
tags which upon detaching CKEditor would be turned back into JSON codes assumes that we're using CKEditor Widgets for Media AFAICT. Without CKEditor Widgets, the JSON code would simply be always visible: when editing without or with CKEditor, it would only be rendered as an image when the end user is looking at it.If you did not refer to CKEditor Widgets, then please clarify.
If you did refer to CKEditor Widgets, then:
allowedContent
, then I'd consider that a (critical) bug. We're starting to work on CKEditor Widgets next week, in order to provide the CKEditor developers with feedback to improve CKEditor Widgets, and if this immunity weren't the case, that'd probably be the first bug report we'd provide.Comment #14
effulgentsia CreditAttribution: effulgentsia commentedI'd like to comment more on #13, but for now, setting to "needs work", because I think we need more from the ACF in general. Not postponing though, because we can use this issue to clarify what Drupal needs, and help get that info to the CKEditor devs.
Comment #15
Wim Leers#14: AFAICT it's as simple as also requiring ACF to support 1) wildcarded things (ticket 10202), 2) blacklist support.
I'll await your further comments on #13 before relaying this to the CKEditor developers.
Comment #16
Wim LeersSomething that wasn't mentioned in the issue summary yet: the growing importance also makes structured content more important, for which this issue is of the utmost importance. See http://wimleers.com/article/drupal-8-structured-content-authoring-experi... for more details.
Here's a reroll of the patch in #6, from almost 2.5 months ago. It doesn't address any of the feedback yet, it's just changing things to apply to HEAD again. NR to get testbot to run, don't actually review.
Comment #18
Wim LeersForgot a file :/
Comment #19
Wim LeersWhat's new
All previous patches were incomplete in their coverage of possible HTML restrictions that filters may impose. They only conveyed the following filter restrictions:
Consequently, that was also the only information we were able to convey to CKEditor's
allowedContent
setting.This patch allows for the following filter restrictions:
on-*
))data-*
)) and for it a range of allowed attribute values*
which means it contains attribute restrictions that apply to all allowed tags.As you can see, this is very expressive. The only things missing are highly dynamic things like regular expression support to specify attribute values or even callback functions. Only in the most exotic edge cases, those things are necessary; this already allows for much more expressiveness than Drupal core itself needs!
(That means this is able to express more than CKEditor's ACF can, which doesn't allow you to specify specific attribute values for attributes other than
style
andclass
. Maybe some day ACF will support that too, despite the limited use for that.)Changes in code
PHP
To retrieve this metadata from filters,
FilterInterface
now hasFilterInterface::getHTMLRestrictions()
.FilterBase::getHTMLRestrictions()
returnsFALSE
, so that filters that are not in fact restricting HTML even provide a sane, correct answer.There's a new function called
filter_get_html_restrictions_by_format()
that calculates the end result of applying all those filters in succession (it ignores filters that aren't of theFILTER_TYPE_HTML_RESTRICTOR
type). This is what text editors like CKEditor can use to configure features like CKEditor's ACF.JS/CKEditor
In previous patches, we only had a whitelist of tags and attributes to work with. Now, with
FilterInterface::getHTMLRestrictions()
, we also have styles and classes. Which means we can now configure the full range of CKEditor's ACF. As of this patch, we do that.However, CKEditor does not yet have attribute wildcard support (http://dev.ckeditor.com/ticket/10202) nor blacklist support (http://dev.ckeditor.com/ticket/10276). Yet Drupal core's
filter_html
filter always strips thestyle
attribute and anyon*
attribute, which is a restriction like described in point 5: for all allowed HTML tags, disallow a specific attribute, with a wildcard.Since CKEditor's ACF does not have any blacklist support yet, not will it come very soon, and this patch is so very important, the best thing I was able to do is to provide a temporary hack that adds support for it until CKEditor supports it explicitly.
So that's what I did. See the changes to
core/modules/ckeditor/js/ckeditor.js
.Before vs. after
Using the "Basic HTML" text format, 3 examples:
<code>
tags before this patch would cause them to be stripped. With this patch, they're maintained.style
andon*
attributes are stripped both before and after this patch. The difference is that before, those attributes simply didn't happen to be allowed by any of the CKEditor buttons. After this patch, they're stripped because the text format is configured like that.class
ordata-*
attributes would be stripped before this patch. With this patch, they're maintained. Again: not by pure chance, but because the text format is configured that way.Use this sample HTML to evaluate: select the Basic HTML text format, click the Source button, paste it, click Source button again (to preview, this applies ACF), click Source button again (to look at resulting source):
Note that the closing "code" tag intentionally contains a space ("co de") in all code samples because
filter_html
apparently isn't smart enough; causing problems here on d.o… :)Result before patch:
Result after patch:
Conclusion
Comment #20
quicksketchI've only read through about half the patch so far. Here are two things that stuck out:
- I don't think we need to be so elaborate in describing our tag blacklist as a "hack". I think it's a perfectly valid solution to provide a blacklist of our own by using a "instanceLoaded" event until CKEditor provides one natively. However maybe we should simply make it explicit rather than assuming filtered_html is being added. Couldn't we add a simple check server-side if "filtered_html" is present in the filter list and pass that to the editor's JS settings? I think it's only the assumption that it must be added that's the hacky part.
- filter_get_html_restrictions_by_format() has a return value of TRUE here, but the return value for what happens with TRUE is not documented:
Comment #21
jessebeach CreditAttribution: jessebeach commentedSmall comment fix: 'to blacklisting' to 'to blacklist'.
After reading a couple times through this issue, the approach taken here seems correct. We do not want the filter/format system in Drupal to bend to the needs of editors (like CKEditor). Instead, Drupal should produce a standard, predictable representation of the content output and that editors will need to consume. If there is information loss in the exchange, we need to correct that loss at the editor level, as Wim has been pushing for with the CKEditor folks.
I walked through the JavaScript portion of the patch. It looks good. I added restrictive filters to each of the basic text formats in Drupal and walked through the a content item edit page load for each format. I'm able to add attributes to HTML elements except the attributes style and on*.
I don't feel I can give a sufficient review of the PHP portion of the patch that would allow me to RTBC this issue, but from the perspective the JavaScript, it's ready to go.
Comment #22
Wim Leers#20:
… but it is a hack because it goes into CKEditor's ACF feature's internals:
_ACF_HACK_to_support_blacklisted_attributes()
is not using a public API!I'm working closely with the CKEditor guys to either remove this code altogether (blacklisting support in ACF) or to have an official API to do something like this. Once we get to that point, I'd be fine with removing the "warning! hack! SCARY!" language :)
You're right that we can make it so that it only has effect when the text format actually uses the
filter_html
filter. I preferred to not do that to keep it intentionally hackily broken, to very explicitly convey that this must be temporary. The editor settings are then also not polluted with metadata only used by this hack.But if you feel strongly about this, I'll definitely add it.
You've misunderstood the code :). That's a return statement in a closure for the call to
array_filter()
; it's not a return statement for thefilter_get_html_restrictions_by_format()
function itself!#21: Fixed the typo. :)
Comment #24
Wim LeersDozens of "The test did not complete due to a fatal error." fatals, despite #22 being nigh identical to #19. Testbot failure. Retesting.
Comment #25
Wim Leers#22: ckeditor_acf-1936392-22.patch queued for re-testing.
Comment #26
Gábor HojtsyWhere does $attribtue_values come from here?! What am I missing?
Ie. how does $attributes['style'] used in the function?! It does not take an argument.
In contrast does Drupal allow for more restrictions? Its not clear here at all.
"format_tags"
most => must
These are supposed to be wrapped in @code tags no?
Woot :)
Nothing is allowed vs. nothing is disallowed :)
Wouldn't these be
,
, etc. Not just the names of the tags?
I don't think we use & *anywhere* like this(?) I'd just use good old "and" :)
What? Replaces all content with information(?) it seems to return the text as-is passed in with no replacement, so does not seem like a correct description?
Comment #27
Wim Leers#26: Thanks! All fixed. :) Existing tests slightly improved. The reason that the
p br a strong
case also worked fine is simple: that's in fact also supported by thefilter_html
format, but almost nobody knows that :)Now also with test coverage that would have prevented the first problem you pointed out in #26.
Comment #28
Wim LeersPeople are starting to notice the problems in D8 HEAD and are starting to file bug reports that this issue fixes!
#2020081: H4 element not distinguishable from rest of text (P element)
#2020093: Unable to add formatting in source unless the corresponding wysiwyg menu item is enabled
Comment #29
Gábor HojtsyThanks for the expanded test coverage and the separate interdiffs especially. Looks good to me!
Comment #30
johnheaven CreditAttribution: johnheaven commentedWim asked me to test this here and it solves the problem that I had. Thanks guys!
Comment #31
effulgentsia CreditAttribution: effulgentsia commentedRelatively minor nits, but "needs work" at least on the incorrect checking of a never defined $allowed_values (see below).
Just a stylistic suggestion, but since the need to implode is an artifact of CKEditor's data format, I think it would be better to move it out of the closure and into where you actually set the variable further down in the function. Feel free to disagree though.
Though it makes no functional difference, Drupal's convention for PHP code is to use
elseif
as one word.I assume the reason this is needed is that CKEditor essentially ignores 'attributes' for 'style' and 'class'? In other words, if
'attributes' => TRUE
is set, but 'styles' and 'classes' are left unspecified, then CKEditor removes all styles and classes? If so, then a comment to that effect would be helpful. If not, then a comment as to why this is actually needed would be helpful.s/$allowed_values/$allowed_styles/. We need test coverage for this as well.
s/$allowed_styles/$allowed_classes/.
s/$allowed_values/$allowed_classes/. + test coverage.
And once more. (another hunk from the one 2 above).
Docs needs fixing for the fact that the first level of the array is 'allowedTags' or 'forbiddenTags'. And the 2nd level of the array varies between these two.
AFAIK, in Drupal PHP code, array keys are always lowercase (i.e.,
'allowed_tags'
). Perhaps except when we transfer them directly to JS, but that's not what we're doing here. Unless you can find a counter example, I suggest making that change here to match our conventions.Ditto.
Comment #32
Wim LeersThis already *is* CKEditor-specific code, so that's intentional :)
All points in #31 addressed. Two seperate interdiffs that combined form the whole set of changes: one with fixes for #31, one with the renaming of
allowedTags
toallowed
andforbiddenTags
toforbidden_tags
, along with updated + extended docs onFilterInterface
.(I talked to effulgentsia in chat to discuss the name changes.)
Comment #33
effulgentsia CreditAttribution: effulgentsia commentedThanks.
Comment #34
Dries CreditAttribution: Dries commentedCommitted to 8.x. Thanks!
Comment #35
xjmDo we have a d.o issue to do our part once the CK ticket is resolved? If no, can we file it, and if yes, can we link it here and in the summary? Thanks!
Comment #36
Wim LeersAwesome! This is the last absolutely essential foundational CKEditor integration issue :) Everything else is to provide a delightful experience, but this fixes a fundamental flaw. So glad to have this in :)
#35: Great point — follow-up created: #2023629: Remove our CKEditor ACF work-around once ACF supports blacklisting.
Comment #37.0
(not verified) CreditAttribution: commentedUpdated issue summary.