Problem/Motivation

Over at #1911884: Enable CKEditor in the Standard install profile, the Standard install profile's "Filtered HTML" format is being split into two:

  1. "Restricted HTML" for anonymous users. The only change wrt "Filtered HTML" is the enabling of setting "rel=nofollow" on all links.
  2. "Basic HTML" for authenticated users. The changes wrt "Filtered HTML": disabling of filter_autop and filter_url filters, allowing of <img> tags, enabling of the filter_html_image_secure filter.

And over at #1933914: Enable CKEditor in the Standard install profile's "Full HTML" text format, we're working to enable CKEditor by default for the "Full HTML" text format as well.

At the same time, over at #1933896: Prevent enabling a text editor if a text format has one or more FILTER_TYPE_MARKUP_LANGUAGE filters enabled, we're working on preventing users from enabling text editors in situations where it should be disallowed (e.g. when using Markdown).

However, one of the not-really-expected situations where a text editor is disallowed by the same reasoning, is in Drupal 7's default "Filtered HTML" and "Full HTML" text formats, both of which use both filter_autop and filter_url filters. This means that sites upgrading from Drupal 7 to Drupal 8 (meaning that they have content using those text formats, which the upgrade process won't & can't touch) will not be able to just turn CKEditor on and be done with it.

(If it doesn't make sense to you that using filter_autop and filter_url precludes you from using CKEditor, then please read #1911884: Enable CKEditor in the Standard install profile and #1933896: Prevent enabling a text editor if a text format has one or more FILTER_TYPE_MARKUP_LANGUAGE filters enabled FIRST.)

Proposed resolution

In order to use CKEditor, Drupal 8 sites upgraded from Drupal 7 would have to modify all existing content to stop using the filter_autop and filter_url filters. It's easy enough to write code that does this upgrading: a Queue/Batch API implementation that goes over every piece of content ever created, applies _filter_autop() and _filter_url() and saves it back.

However, the problem lies not with "typical" content like nodes etc., but with content that is using text formats that lives somewhere Drupal/we can't know about — after all, contrib modules are allowed to do crazy things.

Finally: the secondary question is who should provide this "upgrade existing content to migrate away from using filter_autop and filter_url" functionality: core or a contrib module?

Remaining tasks

  1. Get feedback on "how to do the upgrading".
  2. Get feedback on "who should provide the upgrading"?
  3. Implement.

User interface changes

TBD (likely none).

API changes

TBD (likely none).

Comments

NaheemSays’s picture

Not sure what the best place for this is, but in the cases where an editor needs to be turned off, not having linespaces makes things really dificult to read (in my experience, there is often a need to disable the editor...).

For Drupal 7, there is a module http://drupal.org/project/wysiwyg_linebreaks that helps mostly fix this usability issue (mostly, as it makes pasting more awkward).

Wouldn't that be a better way that disabling autop?

quicksketch’s picture

Thanks @nbz for pointing out that module! I didn't realize we had a solution already made in D7. And thanks @WimLeers for making this issue.

not having linespaces makes things really dificult to read (in my experience, there is often a need to disable the editor...).

CKEditor right now at least formats your HTML so that it has line breaks, but all paragraphs also have <p> tags. So at least from a legibility standpoint, new content shouldn't be difficult to read.

@WimLeers and I have discussed another alternative that isn't mentioned and that's to forego any kind of upgrade path entirely and simply make a JS auto-p filter. Exactly like wysiwyg_linebreaks, it would convert any content that already has auto-p enabled into including paragraph tags before the editor is added. Likewise, when the editor is about to be removed from the page, the paragraph tags are removed again. It's the same approach used by WordPress for their WYSIWYG handling.

I *think* Wim's opposition to that approach is that auto-p isn't "pure HTML". It requires the HTML to be manipulated before it can be edited. I don't see that as a particular problem though. If a user prefers to have p-tags in their database, they can remove the auto-p filter from their text format and new content will be created including the p-tags. However if a user doesn't want p tags, they can enable auto-p and the WYSIWYG will both be able to read auto-p'd content and convert it back to stripping the p's on save. It also makes it so that you can freely move between all text formats that have auto-p enabled without needing to reformat their content.

I prefer making auto-p work with WYSIWYGs rather than a massive upgrade path for several reasons:

  • It makes upgrading less dangerous. Existing user content is not mass-modified. Instead, end-user content is unchanged. Only when an individual piece of content is edited is any changing done, but those changes (adding p-tags) are removed again on save.
  • It makes switching between text formats more compatible. At any time you could switch between any text formats that already have auto-p enabled. Without it, switching between an auto-p'd format and one that doesn't have one will cram all the paragraphs together.
  • It gives users the choice. If you want auto-p, you can enable it on your text format. If you don't, you don't need to.
  • It's proven. WordPress has been doing this ever since it added a WYSIWYG. It looks like wysiwyg_linebreaks uses the same code as WordPress (our PHP auto-p code also came from WordPress, so it's not surprising).
  • It's easy. All the code we need is already written, though we have to adjust it for Drupal 8's editor and edit modules.

We can even still do #1933896: Prevent enabling a text editor if a text format has one or more FILTER_TYPE_MARKUP_LANGUAGE filters enabled, which is necessary for formats that are truly irreversible, but for auto-p, we can change its type to FILTER_TYPE_TRANSFORM_REVERSIBLE.

A few downsides to this approach:

  • Any manually specified p tags (that don't have any attributes) will be stripped out when editing a piece of content, since we're not going to know which p-tags are manually-entered ones and which ones were automatic (WordPress also has this behavior).
  • The content in the database isn't "true HTML".
  • The auto-p filter has historically had a few problems such as not supporting HTML5 elements.

The first two problems I see as minor. Striping out existing p-tags that have no effect on the presentation of the content won't cause any problems. As for the content in the database being "true HTML", I don't think that matters at all. Anonymous content is still going to have auto-p'd HTML saving new content into the database. Nearly every Drupal site in existence has auto-p enabled on at least one text format already. This isn't anything new and it's not a problem.

The historical problems with auto-p have mostly been resolved. Though we still have a few that are stuck:

These problems stem from the fact that we used WordPress's auto-p code, but when they updated it upstream, we didn't pull in their fixes. Anyway, I don't think the problems with auto-p are large ones. If you're an individual that doesn't like auto-p, it can be avoided on a per-site (or piece of content) basis.

sun’s picture

I agree with @quicksketch and consider this issue won't fix.

We essentially need JS-level filters that are executed upon serialization of the editor contents back into the form element. This was a huge PITA in XHTML times, but now with HTML5, it should be doable.

quicksketch’s picture

I agree with @quicksketch and consider this issue won't fix.

The basic title of this issue still applies: "How do we want to facilitate enabling of CKEditor for sites upgraded from Drupal 7 to Drupal 8?" This problem needs to be fixed one way or another. I think that accommodating auto-p is a better solution than trying to eliminate it, and it fixes the upgrade problem at the same time.

@sun How has HTML5 changed the filtering problems? It looks like the issue you referenced (#510552: Invalid XHTML: missing trailing slashes, absolute urls and uppercase tags) has a working solution, but it's still regexes. My plan was to adopt the current auto-p code that already exists, which has existed for years, so it's not dependent on HTML5 in any way.

We essentially need JS-level filters that are executed upon serialization of the editor contents back into the form element.

Yes if we added this is would definitely need to go before the editor is initialized and after the editor has been removed. That way it will work with all editors regardless and it will prevent us from needing to hack CKEditor's filtering system.

sun’s picture

The main problem with parsing and massaging HTML strings was that the resulting, serialized HTML string (the one that you put back into the form's textarea for submission) was not valid XHTML. Various browsers just simply serialize into HTML and ignore that the actual document type is XHTML. jQuery caused some extra fun, since it ignored and failed to serialize certain attributes.

However, since HTML5 essentially goes back to the roots of HTML and is even more permissive with regard to what constitutes broken/invalid HTML, all of these problems should be irrelevant by now. Likewise, the current versions of jQuery are using the native DOM serialization methods of modern browsers, so I assume/hope that this gem should also be obsolete by now.

quicksketch’s picture

Ah gotcha. So even though we may still be down to regexes for updating the content, at least we don't need to worry about maintaining XHTML strict formatting. Sounds good to me.

Wim Leers’s picture

I *think* Wim's opposition to that approach is that auto-p isn't "pure HTML". It requires the HTML to be manipulated before it can be edited.

More or less. My opposition is that we're pretending that auto-p is not a mark-up language of its own. I understand your and sun's tendency to solve this in a way that doesn't complicate things when upgrading from a Drupal 7 site. I like that we're making things easier both for Drupal and for those site owners by going the "JS auto-p route".

But I think that by doing so (JS auto-p), you're introducing new problems (disadvantages you did not list):

  1. We're going to introduce special knowledge (edge case-y stuff) into editor.module's JS, just for the sake of making things easier for sites using filter_autop.
    Or did you think there were other filters out there that could benefit from a similar "apply before attaching a text editor" and "unapply after detaching (serializing) a text editor" flow? I can't think of any.
  2. Changing filter_autop from FILTER_TYPE_MARKUP_LANGUAGE to FILTER_TYPE_TRANSFORM_REVERSIBLE is going to pose problems, too.
    1. It breaks reasoning about text formats: it's a fact that filter_autop is not HTML, and thus logically, it equates to a (very minimal) markup language that is not HTML. Any text format with >=1 FILTER_TYPE_MARKUP_LANGUAGE filters indicates that it's a non-HTML mark-up language-based text format. That's really the case here.
    2. Because of the above, it breaks in-place editing: it's only possible to do in-place editing for HTML content, not for Markdown content. So we must be able to reliably tell what the mark-up language for a piece of content is (or simpler: wether it's just HTML or something else).
      — Note that this also indicates where I'm coming from: it's not puristness, it's the practicality of making this work with in-place WYSIWYG editing, which is inherently HTML-only.
    3. It breaks reasoning for in-place WYSIWYG editing: the presence of >=1 FILTER_TYPE_TRANSFORM_(IR)REVERSIBLE filters means that for users to be able to do in-place editing, they may not edit the HTML as it is rendered on the page, no, they must retrieve an untransformed version for editing! Otherwise, reversible transformations (e.g. an image caption rendered in a certain, theme_caption()'d way) and irreversible transformations (e.g. typographic enhancements: "true WYSIWYG" to “true <span class="caps">WYSIWYG</span>”) would be saved into the DB! We want transformation filters to still do their magic, instead of saving the filtered result.
      By marking filter_autop as FILTER_TYPE_TRANSFORM_REVERSIBLE, but still doing the reversing through JS, we're telling Drupal that filter_autop should not run, and that we should be doing a roundtrip to the server to retrieve the text without filter_autop… only to have the auto-p JS add the <p> tags again…

However, I don't want to embark on an endless discussion. So here's my proposal to allow for everything quicksketch said in #2: I don't think it's possible to use one of the current filter types for filter_autop if it's going to have a JS auto-p sister. I think a new filter type is in order: FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE.

This solves all the new problems your proposal entailed and which I outlined above. No special casing: new filter type. Reasoning about text formats' markup language is not broken, hence it doesn't break in-place WYSIWYG editing. It does not cause additional, pointless roundtrips.

Sounds reasonable? :)


This assertion in #2 seems to be an oversimplification:

It's proven. WordPress has been doing this ever since it added a WYSIWYG. It looks like wysiwyg_linebreaks uses the same code as WordPress (our PHP auto-p code also came from WordPress, so it's not surprising).

Lots of people have had problems with it. Lots of people were frustrated by it. It has apparently even earned the adjective of "infamous" in WordPress circles. It's easy to list dozens of blog posts on how to disable or work around it. Here's a few (if you read any, read the first two):

quicksketch’s picture

I think a new filter type is in order: FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE.

I'm fine with that if it solves the problem. All the filter constants are still a bit baffling to me because they're all meta-named concepts, as opposed to being named based on what functionality they describe. If they were functional constants (and bitwise) like the menu system, then we could combine different functional properties together based on what integration we've provided. i.e. FILTER_JS_TO_HTML & FILTER_PHP_TO_HTML would mean that a filter could be converted from its pseudo-format to HTML through both JS and PHP. Then instead of having several different constants that mean some strange combination of filtering ability, all the filters are defined by their bitwise properties in any combination.

I think that all might be out of scope at this point. Any solution that makes the concept of auto-p compatible with the filter module defined types is fine with me. There's certainly no real technical reason why we need to do a server-side round-trip to convert HTML into auto-p'd content; it's just our architecture isn't compatible with such an idea yet.

It's easy to list dozens of blog posts on how to disable or work around it.

Which you will be able to do with a single checkbox in Drupal. We're not forcing anyone to use it, we're providing it as a convenience of compatibility and for upgrading. If someone truly despises it, it will be trivial to disable.

Wim Leers’s picture

I realize the filter types as they currently exist are far from ideal. We can improve that. We should not do that here. Introducing something like FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE to facilitate the solution in in #2 is the easiest way forward without breaking other things.

So: +1 for everything in #8.

(Filter types are only necessary because the whole filter system is a black box that does filtering/processing in different steps, without any ability to programmatically reason about them, which is necessary if you want to make filters work well with text editors.)

sun’s picture

Please note that there is technically no reason for undoing filter_autop. The filter can be re-applied to already autop-filtered text.

From my perspective, we're only discussing an undo facility for markup purity reasons. But strictly speaking, a text that went through filter_autop and which is output on a page can be inline/in-place-edited as-is and saved in a WYSIWYG manner, without a server/backend roundtrip to fetch the original/unprocessed text. When this saved text is rendered again, filter_autop will still run, but the filter will effectively do nothing.

quicksketch’s picture

Please note that there is technically no reason for undoing filter_autop. The filter can be re-applied to already autop-filtered text.

Good point. We'll be able to run the auto-p filter in all situations when the editor is attached, regardless of whether it's back-end (where the P tags actually need to be added) or the front-end, inline editing (where P tags already exist). This seems to me like as far as technical implementation, FILTER_TYPE_TRANSFORM_REVERSIBLE may work already without the need to add a new filter type. We just let the filtering go both directions every time and don't worry that it's unnecessary when inline editing. It's not the cleanest technical implementation, but it does make things easy.

Wim Leers’s picture

#10: Are you saying we don't have to to reverse (undo) filter_autop upon saving? I know that when doing in-place WYSIWYG editing, we wouldn't have to do a round trip to retrieve the un-autop'd text (because we need those <p>s!), but are you now saying that it's okay to be saving those <p> tags anyway? That's dangerous, because then you're assuming the WYSIWYG editor will insert the required whitespace to ensure filter_autop keeps working (because <p>line 1</p><p>line 2</p>). The only way that can work, is by also allowing the <p> tag whenever filter_autop is used.

#11: Marking filter_autop as FILTER_TYPE_TRANSFORM_REVERSIBLE will cause in-place editing to go back to the server and retrieving a version without any FILTER_TYPE_TRANSFORM_REVERSIBLE (and FILTER_TYPE_TRANSFORM_IRREVERSIBLE) filters. That's the only way in-place editing can know which filters it shouldn't apply. Hence the need for a new text format. I already explained all this in full detail in #7.2.3.

quicksketch’s picture

#10: Are you saying we don't have to to reverse (undo) filter_autop upon saving?

No, @sun is saying that if you decide to inline-edit a piece of content, the JS-version of the auto-p filter can run on the content to "add p-tags" even though the addition of any p-tags is not really necessary. Adding p-tags to content that already has p-tags has no effect. On save, all p-tags will be stripped out whether you started with an inline editor or started with backend editor. Just from a simplification point of view, every time the editor is attached we can run the auto-p tags and every time the editor is detached we can remove the auto-p tags, regardless of where the editor is used.

#11: Marking filter_autop as FILTER_TYPE_TRANSFORM_REVERSIBLE will cause in-place editing to go back to the server and retrieving a version without any FILTER_TYPE_TRANSFORM_REVERSIBLE (and FILTER_TYPE_TRANSFORM_IRREVERSIBLE) filters.

Ah, I didn't understand that "TRANSFORM_REVERSIBLE" really meant, "TRANSFORM_REVERSIBLE-only-by-PHP". We definitely need a new constant then, but FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE doesn't make any sense that a markup language could be reversed without a server-side request while a TRANSFORM_REVERSIBLE filter could not.

Wim Leers’s picture

#13:

RE filter type woes: to be fair, this is the first and only filter that has a JS filter implementation, let alone an "unfilter" implementation… :) Clearly, the whole filter classification stuff needs to be reevaluated, so I'll go with whatever name you see fit.

RE 'I didn't understand that "TRANSFORM_REVERSIBLE" really meant, "TRANSFORM_REVERSIBLE-only-by-PHP"': That's not what it really means, that's what just what it currently means for lack of JS implementations. We just haven't arrived at that point yet, and I think that by this point that's D9 material (with the exception of filter_autop, of course).
I've said it at least back in October that JS implementations of these should be possible: #1782838-37: WYSIWYG in core: round one — filter types.

RE "FILTER_TYPE_MARKUP_LANGUAGE_REVERSIBLE doesn't make any sense": I personally still think it's pretty silly to explicitly support filter_autop + WYSIWYG. It's a contradiction in terminis. I'm just going along with this because there are more important things to solve than to get this perfect.

quicksketch’s picture

We just haven't arrived at that point yet, and I think that by this point that's D9 material (with the exception of filter_autop, of course).

If we get to handling captions of any kind, these also will likely be "reversible" on the JS side. It should be trivial to convert something like this:

<div class="caption-wrapper">
  <img data-fid="10" src="example.png />
  <div class="caption">Example caption</div>
</div>

though JS into:

<img data-caption="Example caption" data-fid="10" src="example.png" />

If we get into caption handling (which I hope we do), I think JS-based reverse filtering is going to come into play there also.

webchick’s picture

Priority: Normal » Major

Upgrade path-related things seem fairly major.

Wim Leers’s picture

I think https://drupal.org/project/wysiwyg_linebreaks may be the answer.

For sites that don't wish to update their content to contain <p> and <br /> tags, installing that module should make it instantly compatible with CKEditor.

I hope the maintainer of the module can chime in here.

geerlingguy’s picture

I just finished porting Wysiwyg Linebreaks to Drupal 8 yesterday as part of the D8CX sprint at #mwds, and it's working great alongside CKEditor/Editor.

I think just telling people who have existing content with the autop filter enabled to install the Linebreaks module should be adequate. I've been using the module on sites with TinyMCE and CKEditor in Drupal 6 and Drupal 7 for a couple years, and most of the edge cases (where a table is nested in a strong tag with weird newlines all around, etc., and the line breaks get funky) have been fixed.

Also, Wysiwyg Linebreaks lets the user choose whether he'd like to force the conversion all the time (so <p> and <br /> tags are never stored with the content in the database—they're only put on the content when a Wysiwyg is enabled for a textarea, then removed when the Wysiwyg detaches), or just convert the linebreaks into tags when the editor loads, and do nothing on the detach.

I've thought about also adding some sort of batch functionality or an action that could process existing content and add the tags via Batch API or VBO, but I haven't really cared to do it for myself, since the module handles everything I do adequately :)

Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Issue summary: View changes
Priority: Major » Normal
Status: Active » Postponed
Issue tags: +revisit before beta

As per #17 and #18, this is a solved problem, so demoting to "normal".

When nearing release, we should make sure to document this in whatever turns out to be the appropriate location, to aid people in upgrading from Drupal 7 to Drupal 8. Marking as postponed until we approach release.

catch’s picture

Status: Postponed » Closed (won't fix)
Issue tags: -revisit before beta

Migrate makes this a non-issue.