Problem/Motivation
Over at #1911884: Enable CKEditor in the Standard install profile, the Standard install profile's "Filtered HTML" format is being split into two:
- "Restricted HTML" for anonymous users. The only change wrt "Filtered HTML" is the enabling of setting "rel=nofollow" on all links.
- "Basic HTML" for authenticated users. The changes wrt "Filtered HTML": disabling of
filter_autop
andfilter_url
filters, allowing of<img>
tags, enabling of thefilter_html_image_secure
filter.
And over at #1933914: Enable CKEditor in the Standard install profile's "Full HTML" text format, we're working to enable CKEditor by default for the "Full HTML" text format as well.
At the same time, over at #1933896: Prevent enabling a text editor if a text format has one or more FILTER_TYPE_MARKUP_LANGUAGE filters enabled, we're working on preventing users from enabling text editors in situations where it should be disallowed (e.g. when using Markdown).
However, one of the not-really-expected situations where a text editor is disallowed by the same reasoning, is in Drupal 7's default "Filtered HTML" and "Full HTML" text formats, both of which use both filter_autop
and filter_url
filters. This means that sites upgrading from Drupal 7 to Drupal 8 (meaning that they have content using those text formats, which the upgrade process won't & can't touch) will not be able to just turn CKEditor on and be done with it.
(If it doesn't make sense to you that using filter_autop
and filter_url
precludes you from using CKEditor, then please read #1911884: Enable CKEditor in the Standard install profile and #1933896: Prevent enabling a text editor if a text format has one or more FILTER_TYPE_MARKUP_LANGUAGE filters enabled FIRST.)
Proposed resolution
In order to use CKEditor, Drupal 8 sites upgraded from Drupal 7 would have to modify all existing content to stop using the filter_autop
and filter_url
filters. It's easy enough to write code that does this upgrading: a Queue/Batch API implementation that goes over every piece of content ever created, applies _filter_autop()
and _filter_url()
and saves it back.
However, the problem lies not with "typical" content like nodes etc., but with content that is using text formats that lives somewhere Drupal/we can't know about — after all, contrib modules are allowed to do crazy things.
Finally: the secondary question is who should provide this "upgrade existing content to migrate away from using filter_autop
and filter_url
" functionality: core or a contrib module?
Remaining tasks
- Get feedback on "how to do the upgrading".
- Get feedback on "who should provide the upgrading"?
- Implement.
User interface changes
TBD (likely none).
API changes
TBD (likely none).
Comments
Comment #1
NaheemSays CreditAttribution: NaheemSays commentedNot sure what the best place for this is, but in the cases where an editor needs to be turned off, not having linespaces makes things really dificult to read (in my experience, there is often a need to disable the editor...).
For Drupal 7, there is a module http://drupal.org/project/wysiwyg_linebreaks that helps mostly fix this usability issue (mostly, as it makes pasting more awkward).
Wouldn't that be a better way that disabling autop?
Comment #2
quicksketchThanks @nbz for pointing out that module! I didn't realize we had a solution already made in D7. And thanks @WimLeers for making this issue.
CKEditor right now at least formats your HTML so that it has line breaks, but all paragraphs also have
<p>
tags. So at least from a legibility standpoint, new content shouldn't be difficult to read.@WimLeers and I have discussed another alternative that isn't mentioned and that's to forego any kind of upgrade path entirely and simply make a JS auto-p filter. Exactly like wysiwyg_linebreaks, it would convert any content that already has auto-p enabled into including paragraph tags before the editor is added. Likewise, when the editor is about to be removed from the page, the paragraph tags are removed again. It's the same approach used by WordPress for their WYSIWYG handling.
I *think* Wim's opposition to that approach is that auto-p isn't "pure HTML". It requires the HTML to be manipulated before it can be edited. I don't see that as a particular problem though. If a user prefers to have p-tags in their database, they can remove the auto-p filter from their text format and new content will be created including the p-tags. However if a user doesn't want p tags, they can enable auto-p and the WYSIWYG will both be able to read auto-p'd content and convert it back to stripping the p's on save. It also makes it so that you can freely move between all text formats that have auto-p enabled without needing to reformat their content.
I prefer making auto-p work with WYSIWYGs rather than a massive upgrade path for several reasons:
We can even still do #1933896: Prevent enabling a text editor if a text format has one or more FILTER_TYPE_MARKUP_LANGUAGE filters enabled, which is necessary for formats that are truly irreversible, but for auto-p, we can change its type to FILTER_TYPE_TRANSFORM_REVERSIBLE.
A few downsides to this approach:
The first two problems I see as minor. Striping out existing p-tags that have no effect on the presentation of the content won't cause any problems. As for the content in the database being "true HTML", I don't think that matters at all. Anonymous content is still going to have auto-p'd HTML saving new content into the database. Nearly every Drupal site in existence has auto-p enabled on at least one text format already. This isn't anything new and it's not a problem.
The historical problems with auto-p have mostly been resolved. Though we still have a few that are stuck:
These problems stem from the fact that we used WordPress's auto-p code, but when they updated it upstream, we didn't pull in their fixes. Anyway, I don't think the problems with auto-p are large ones. If you're an individual that doesn't like auto-p, it can be avoided on a per-site (or piece of content) basis.
Comment #3
sunI agree with @quicksketch and consider this issue won't fix.
We essentially need JS-level filters that are executed upon serialization of the editor contents back into the form element. This was a huge PITA in XHTML times, but now with HTML5, it should be doable.
Comment #4
quicksketchThe basic title of this issue still applies: "How do we want to facilitate enabling of CKEditor for sites upgraded from Drupal 7 to Drupal 8?" This problem needs to be fixed one way or another. I think that accommodating auto-p is a better solution than trying to eliminate it, and it fixes the upgrade problem at the same time.
@sun How has HTML5 changed the filtering problems? It looks like the issue you referenced (#510552: Invalid XHTML: missing trailing slashes, absolute urls and uppercase tags) has a working solution, but it's still regexes. My plan was to adopt the current auto-p code that already exists, which has existed for years, so it's not dependent on HTML5 in any way.
Yes if we added this is would definitely need to go before the editor is initialized and after the editor has been removed. That way it will work with all editors regardless and it will prevent us from needing to hack CKEditor's filtering system.
Comment #5
sunThe main problem with parsing and massaging HTML strings was that the resulting, serialized HTML string (the one that you put back into the form's textarea for submission) was not valid XHTML. Various browsers just simply serialize into HTML and ignore that the actual document type is XHTML. jQuery caused some extra fun, since it ignored and failed to serialize certain attributes.
However, since HTML5 essentially goes back to the roots of HTML and is even more permissive with regard to what constitutes broken/invalid HTML, all of these problems should be irrelevant by now. Likewise, the current versions of jQuery are using the native DOM serialization methods of modern browsers, so I assume/hope that this gem should also be obsolete by now.
Comment #6
quicksketchAh gotcha. So even though we may still be down to regexes for updating the content, at least we don't need to worry about maintaining XHTML strict formatting. Sounds good to me.
Comment #7
Wim LeersMore or less. My opposition is that we're pretending that auto-p is not a mark-up language of its own. I understand your and sun's tendency to solve this in a way that doesn't complicate things when upgrading from a Drupal 7 site. I like that we're making things easier both for Drupal and for those site owners by going the "JS auto-p route".
But I think that by doing so (JS auto-p), you're introducing new problems (disadvantages you did not list):
filter_autop
.Or did you think there were other filters out there that could benefit from a similar "apply before attaching a text editor" and "unapply after detaching (serializing) a text editor" flow? I can't think of any.
filter_autop
fromFILTER_TYPE_MARKUP_LANGUAGE
toFILTER_TYPE_TRANSFORM_REVERSIBLE
is going to pose problems, too.filter_autop
is not HTML, and thus logically, it equates to a (very minimal) markup language that is not HTML. Any text format with >=1FILTER_TYPE_MARKUP_LANGUAGE
filters indicates that it's a non-HTML mark-up language-based text format. That's really the case here.— Note that this also indicates where I'm coming from: it's not puristness, it's the practicality of making this work with in-place WYSIWYG editing, which is inherently HTML-only.
FILTER_TYPE_TRANSFORM_(IR)REVERSIBLE
filters means that for users to be able to do in-place editing, they may not edit the HTML as it is rendered on the page, no, they must retrieve an untransformed version for editing! Otherwise, reversible transformations (e.g. an image caption rendered in a certain,theme_caption()
'd way) and irreversible transformations (e.g. typographic enhancements:"true WYSIWYG"
to“true <span class="caps">WYSIWYG</span>”
) would be saved into the DB! We want transformation filters to still do their magic, instead of saving the filtered result.By marking
filter_autop
asFILTER_TYPE_TRANSFORM_REVERSIBLE
, but still doing the reversing through JS, we're telling Drupal thatfilter_autop
should not run, and that we should be doing a roundtrip to the server to retrieve the text withoutfilter_autop
… only to have the auto-p JS add the<p>
tags again…However, I don't want to embark on an endless discussion. So here's my proposal to allow for everything quicksketch said in #2: I don't think it's possible to use one of the current filter types for
filter_autop
if it's going to have a JS auto-p sister. I think a new filter type is in order:FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE
.This solves all the new problems your proposal entailed and which I outlined above. No special casing: new filter type. Reasoning about text formats' markup language is not broken, hence it doesn't break in-place WYSIWYG editing. It does not cause additional, pointless roundtrips.
Sounds reasonable? :)
This assertion in #2 seems to be an oversimplification:
Lots of people have had problems with it. Lots of people were frustrated by it. It has apparently even earned the adjective of "infamous" in WordPress circles. It's easy to list dozens of blog posts on how to disable or work around it. Here's a few (if you read any, read the first two):
Comment #8
quicksketchI'm fine with that if it solves the problem. All the filter constants are still a bit baffling to me because they're all meta-named concepts, as opposed to being named based on what functionality they describe. If they were functional constants (and bitwise) like the menu system, then we could combine different functional properties together based on what integration we've provided. i.e. FILTER_JS_TO_HTML & FILTER_PHP_TO_HTML would mean that a filter could be converted from its pseudo-format to HTML through both JS and PHP. Then instead of having several different constants that mean some strange combination of filtering ability, all the filters are defined by their bitwise properties in any combination.
I think that all might be out of scope at this point. Any solution that makes the concept of auto-p compatible with the filter module defined types is fine with me. There's certainly no real technical reason why we need to do a server-side round-trip to convert HTML into auto-p'd content; it's just our architecture isn't compatible with such an idea yet.
Which you will be able to do with a single checkbox in Drupal. We're not forcing anyone to use it, we're providing it as a convenience of compatibility and for upgrading. If someone truly despises it, it will be trivial to disable.
Comment #9
Wim LeersI realize the filter types as they currently exist are far from ideal. We can improve that. We should not do that here. Introducing something like
FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE
to facilitate the solution in in #2 is the easiest way forward without breaking other things.So: +1 for everything in #8.
(Filter types are only necessary because the whole filter system is a black box that does filtering/processing in different steps, without any ability to programmatically reason about them, which is necessary if you want to make filters work well with text editors.)
Comment #10
sunPlease note that there is technically no reason for undoing filter_autop. The filter can be re-applied to already autop-filtered text.
From my perspective, we're only discussing an undo facility for markup purity reasons. But strictly speaking, a text that went through filter_autop and which is output on a page can be inline/in-place-edited as-is and saved in a WYSIWYG manner, without a server/backend roundtrip to fetch the original/unprocessed text. When this saved text is rendered again, filter_autop will still run, but the filter will effectively do nothing.
Comment #11
quicksketchGood point. We'll be able to run the auto-p filter in all situations when the editor is attached, regardless of whether it's back-end (where the P tags actually need to be added) or the front-end, inline editing (where P tags already exist). This seems to me like as far as technical implementation, FILTER_TYPE_TRANSFORM_REVERSIBLE may work already without the need to add a new filter type. We just let the filtering go both directions every time and don't worry that it's unnecessary when inline editing. It's not the cleanest technical implementation, but it does make things easy.
Comment #12
Wim Leers#10: Are you saying we don't have to to reverse (undo)
filter_autop
upon saving? I know that when doing in-place WYSIWYG editing, we wouldn't have to do a round trip to retrieve the un-autop'd text (because we need those<p>
s!), but are you now saying that it's okay to be saving those<p>
tags anyway? That's dangerous, because then you're assuming the WYSIWYG editor will insert the required whitespace to ensurefilter_autop
keeps working (because<p>line 1</p><p>line 2</p>
). The only way that can work, is by also allowing the<p>
tag wheneverfilter_autop
is used.#11: Marking
filter_autop
asFILTER_TYPE_TRANSFORM_REVERSIBLE
will cause in-place editing to go back to the server and retrieving a version without anyFILTER_TYPE_TRANSFORM_REVERSIBLE
(andFILTER_TYPE_TRANSFORM_IRREVERSIBLE
) filters. That's the only way in-place editing can know which filters it shouldn't apply. Hence the need for a new text format. I already explained all this in full detail in #7.2.3.Comment #13
quicksketchNo, @sun is saying that if you decide to inline-edit a piece of content, the JS-version of the auto-p filter can run on the content to "add p-tags" even though the addition of any p-tags is not really necessary. Adding p-tags to content that already has p-tags has no effect. On save, all p-tags will be stripped out whether you started with an inline editor or started with backend editor. Just from a simplification point of view, every time the editor is attached we can run the auto-p tags and every time the editor is detached we can remove the auto-p tags, regardless of where the editor is used.
Ah, I didn't understand that "TRANSFORM_REVERSIBLE" really meant, "TRANSFORM_REVERSIBLE-only-by-PHP". We definitely need a new constant then, but FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE doesn't make any sense that a markup language could be reversed without a server-side request while a TRANSFORM_REVERSIBLE filter could not.
Comment #14
Wim Leers#13:
RE filter type woes: to be fair, this is the first and only filter that has a JS filter implementation, let alone an "unfilter" implementation… :) Clearly, the whole filter classification stuff needs to be reevaluated, so I'll go with whatever name you see fit.
RE 'I didn't understand that "TRANSFORM_REVERSIBLE" really meant, "TRANSFORM_REVERSIBLE-only-by-PHP"': That's not what it really means, that's what just what it currently means for lack of JS implementations. We just haven't arrived at that point yet, and I think that by this point that's D9 material (with the exception of
filter_autop
, of course).I've said it at least back in October that JS implementations of these should be possible: #1782838-37: WYSIWYG in core: round one — filter types.
RE "FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE doesn't make any sense": I personally still think it's pretty silly to explicitly support
filter_autop
+ WYSIWYG. It's a contradiction in terminis. I'm just going along with this because there are more important things to solve than to get this perfect.Comment #15
quicksketchIf we get to handling captions of any kind, these also will likely be "reversible" on the JS side. It should be trivial to convert something like this:
though JS into:
If we get into caption handling (which I hope we do), I think JS-based reverse filtering is going to come into play there also.
Comment #16
webchickUpgrade path-related things seem fairly major.
Comment #17
Wim LeersI think https://drupal.org/project/wysiwyg_linebreaks may be the answer.
For sites that don't wish to update their content to contain
<p>
and<br />
tags, installing that module should make it instantly compatible with CKEditor.I hope the maintainer of the module can chime in here.
Comment #18
geerlingguy CreditAttribution: geerlingguy commentedI just finished porting Wysiwyg Linebreaks to Drupal 8 yesterday as part of the D8CX sprint at #mwds, and it's working great alongside CKEditor/Editor.
I think just telling people who have existing content with the autop filter enabled to install the Linebreaks module should be adequate. I've been using the module on sites with TinyMCE and CKEditor in Drupal 6 and Drupal 7 for a couple years, and most of the edge cases (where a table is nested in a strong tag with weird newlines all around, etc., and the line breaks get funky) have been fixed.
Also, Wysiwyg Linebreaks lets the user choose whether he'd like to force the conversion all the time (so
<p>
and<br />
tags are never stored with the content in the database—they're only put on the content when a Wysiwyg is enabled for a textarea, then removed when the Wysiwyg detaches), or just convert the linebreaks into tags when the editor loads, and do nothing on the detach.I've thought about also adding some sort of batch functionality or an action that could process existing content and add the tags via Batch API or VBO, but I haven't really cared to do it for myself, since the module handles everything I do adequately :)
Comment #19
Wim LeersAs per #17 and #18, this is a solved problem, so demoting to "normal".
When nearing release, we should make sure to document this in whatever turns out to be the appropriate location, to aid people in upgrading from Drupal 7 to Drupal 8. Marking as postponed until we approach release.
Comment #20
catchMigrate makes this a non-issue.