Introduced in branch: 8.x
Description: 

Modules that provide filters for Drupal's filter system should now define the "type" of each filter. (This is necessary for other modules, such as WYSIWYG modules, to reason about text formats.)

Filter types

There are four filter types (NOTE: these may still change; see #1816160: Should FILTER_TYPE_* be bitflags strings or stay ints?):

  • FilterInterface::TYPE_MARKUP_LANGUAGE: Non-HTML markup language filters that generate HTML. Examples: Markdown, Textile, but also Drupal core's filter_autop and filter_url filters.
  • FilterInterface::TYPE_HTML_RESTRICTOR: HTML tag and attribute restricting filters. Examples: HTML Purifier, Drupal core's filter_html.
  • FilterInterface::TYPE_TRANSFORM_REVERSIBLE: Reversible transformation filters. Example: <img data-caption="Druplicon"> may be (reversibly!) transformed to <figure><img><figcaption>Druplicon</figcaption></figure>.
  • FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE: Irreversible transformation filters. Examples: the Typogrify filter would transform WYSIWYG and I said "foo"! into <span class="caps">WYSIWYG</span> and I said “foo”!, respectively. Text link ad systems would transform fancy car into something like <a href="http://fancycar.example.com">fancy car</a>. Neither of those text-based transformations is reliably reversible (even though it might be possible to write implementations in which they are reversible).

D7:

/**
 * Implements hook_filter_info().
 */
function filter_filter_info() {
  $filters['filter_html'] = array(
    'title' => t('Limit allowed HTML tags'),
    'process callback' => '_filter_html',
    …
  );
  return $filters;
}

D8:

/**
 * Implements hook_filter_info().
 */
function filter_filter_info() {
  $filters['filter_html'] = array(
    'title' => t('Limit allowed HTML tags'),
    'type' => FilterInterface::TYPE_HTML_RESTRICTOR,
    'process callback' => '_filter_html',
    …
  );
  return $filters;
}

Related to this, FilterFormatInterface::getFilterTypes() and FilterFormatInterface::getHtmlRestrictions() have been added. The former returns an array of all unique filter types used in the text format it is called on. The latter returns a structured array conveying the HTML restrictions of a text format, or FALSE if there aren't any HTML restrictions (i.e. if the text format has zero FilterInterface::TYPE_HTML_RESTRICTOR filters).
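
To make "all unique filter types used in the text format" concrete, here is a minimal standalone sketch. It is not the core implementation: the constants are stand-ins for the FilterInterface::TYPE_* constants with placeholder values, example_unique_filter_types() is a hypothetical helper, and the 'status' key and the choice to count only enabled filters are assumptions made for illustration.

```php
<?php
// Stand-ins for FilterInterface::TYPE_*; the values are placeholders only.
const TYPE_MARKUP_LANGUAGE = 0;
const TYPE_HTML_RESTRICTOR = 1;
const TYPE_TRANSFORM_REVERSIBLE = 2;
const TYPE_TRANSFORM_IRREVERSIBLE = 3;

/**
 * Hypothetical helper: collects the unique types of the enabled filters.
 */
function example_unique_filter_types(array $filters) {
  $types = array();
  foreach ($filters as $filter) {
    // Assumption: disabled filters do not contribute a type.
    if (!empty($filter['status'])) {
      $types[] = $filter['type'];
    }
  }
  return array_values(array_unique($types));
}

// A made-up text format; 'example_caption' is a hypothetical filter name.
$filters = array(
  'filter_html' => array('type' => TYPE_HTML_RESTRICTOR, 'status' => TRUE),
  'filter_autop' => array('type' => TYPE_MARKUP_LANGUAGE, 'status' => TRUE),
  'filter_url' => array('type' => TYPE_MARKUP_LANGUAGE, 'status' => TRUE),
  'example_caption' => array('type' => TYPE_TRANSFORM_REVERSIBLE, 'status' => FALSE),
);
$types = example_unique_filter_types($filters);
```

In this sketch the two markup language filters collapse into a single entry, and the disabled filter contributes nothing.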

check_markup() can skip filters of a certain type

There's a new $filter_types_to_skip parameter to check_markup(), which defaults to the empty array. Trying to skip FilterInterface::TYPE_HTML_RESTRICTOR filters is disallowed.

None of the existing check_markup() calls need to change! This is purely new functionality, and only relatively "special" modules will need it.

D7:

check_markup($text, $format_id = NULL, $langcode = '', $cache = FALSE);

D8:

check_markup($text, $format_id = NULL, $langcode = '', $cache = FALSE, $filter_types_to_skip = array());
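
The skipping behaviour can be sketched in a few lines of standalone PHP. This is only an illustration under stated assumptions, not the check_markup() implementation: the constants are placeholder stand-ins for FilterInterface::TYPE_*, example_apply_filters() is a hypothetical function, and throwing an exception is just one possible way to "disallow" skipping HTML restrictors.

```php
<?php
// Stand-ins for FilterInterface::TYPE_*; the values are placeholders only.
const TYPE_MARKUP_LANGUAGE = 0;
const TYPE_HTML_RESTRICTOR = 1;
const TYPE_TRANSFORM_REVERSIBLE = 2;
const TYPE_TRANSFORM_IRREVERSIBLE = 3;

/**
 * Hypothetical helper: runs filters in order, skipping the given types.
 */
function example_apply_filters($text, array $filters, array $filter_types_to_skip = array()) {
  // Assumption: a request to skip HTML restrictors is rejected outright.
  if (in_array(TYPE_HTML_RESTRICTOR, $filter_types_to_skip, TRUE)) {
    throw new InvalidArgumentException('HTML restrictor filters cannot be skipped.');
  }
  foreach ($filters as $filter) {
    if (in_array($filter['type'], $filter_types_to_skip, TRUE)) {
      continue;
    }
    $text = call_user_func($filter['process callback'], $text);
  }
  return $text;
}

// Built-in PHP functions stand in for real filter callbacks.
$filters = array(
  array('type' => TYPE_HTML_RESTRICTOR, 'process callback' => 'strip_tags'),
  array('type' => TYPE_TRANSFORM_IRREVERSIBLE, 'process callback' => 'strtoupper'),
);
$all = example_apply_filters('<b>hi</b>', $filters);
$some = example_apply_filters('<b>hi</b>', $filters, array(TYPE_TRANSFORM_IRREVERSIBLE));
```

With the default empty array, every filter runs, which mirrors why existing calls are unaffected.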

Why?

This all ties back to the goal of having "true WYSIWYG" editing in Drupal core. "True WYSIWYG" editing is based on HTML: you're editing HTML live, and thus the original mark-up should also be HTML. Hence, if a text format contains a FilterInterface::TYPE_MARKUP_LANGUAGE filter, then no "true WYSIWYG" editing is possible. FilterInterface::TYPE_HTML_RESTRICTOR filters don't impede HTML-based editing (unless they forbid even the most basic HTML tags) and are essential for security, so they do not pose a problem.
Then we get to the interesting part: transformation filters. They're classified as either reversible or irreversible. A reversible filter is assumed to provide a JS implementation of the filter, so that these transformations can be applied "live" (thus resulting in "true WYSIWYG"), but also reversed upon saving. On the other hand, irreversible filters would not be applied, because even though we could apply them while editing, we wouldn't be able to reliably reverse them for storing the content in the database.
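
The reasoning above can be condensed into a small standalone sketch. Everything here is illustrative: the constants are placeholder stand-ins for FilterInterface::TYPE_*, and both functions are hypothetical helpers, not part of any core API.

```php
<?php
// Stand-ins for FilterInterface::TYPE_*; the values are placeholders only.
const TYPE_MARKUP_LANGUAGE = 0;
const TYPE_HTML_RESTRICTOR = 1;
const TYPE_TRANSFORM_REVERSIBLE = 2;
const TYPE_TRANSFORM_IRREVERSIBLE = 3;

/**
 * Hypothetical: "true WYSIWYG" is ruled out by any markup language filter.
 */
function example_true_wysiwyg_possible(array $filter_types) {
  return !in_array(TYPE_MARKUP_LANGUAGE, $filter_types, TRUE);
}

/**
 * Hypothetical: only reversible transformations would be applied "live".
 */
function example_filter_types_applied_live(array $filter_types) {
  return array_values(array_intersect($filter_types, array(TYPE_TRANSFORM_REVERSIBLE)));
}

// Filter types of two made-up text formats.
$html_format = array(TYPE_HTML_RESTRICTOR, TYPE_TRANSFORM_REVERSIBLE, TYPE_TRANSFORM_IRREVERSIBLE);
$markdown_format = array(TYPE_MARKUP_LANGUAGE, TYPE_HTML_RESTRICTOR);

$html_ok = example_true_wysiwyg_possible($html_format);
$markdown_ok = example_true_wysiwyg_possible($markdown_format);
$live = example_filter_types_applied_live($html_format);
```

Here the HTML-based format qualifies and only its reversible transformation would run in the editor, while the format containing a markup language filter does not qualify at all.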

Filter type example

For some filters, it might not be obvious which type they should be categorized under. For example, Invisimail is a filter that hides e-mail addresses from spam bots. The tricky part: depending on its configuration, it may or may not generate HTML or even JavaScript. Because it generates HTML, you might think FilterInterface::TYPE_MARKUP_LANGUAGE is appropriate. It's not preventing certain HTML tags or attributes, but it is obfuscating HTML, so FilterInterface::TYPE_HTML_RESTRICTOR might also seem appropriate. But what it really does is take some HTML and transform it into something else, so one of the transformation filter types is also an option. To make it even more complicated: depending on the configuration, it can be reversible, so it could be either FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE or FilterInterface::TYPE_TRANSFORM_REVERSIBLE.

The answer: Invisimail does not contain a mark-up language; hence FilterInterface::TYPE_MARKUP_LANGUAGE is out of the question.
FilterInterface::TYPE_HTML_RESTRICTOR vs. FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE can potentially be argued about (hence the need for bitflags/better names, see #1816160: Should FILTER_TYPE_* be bitflags strings or stay ints?). However, the main purpose of the Invisimail module is not to protect the reader from harmful HTML; it is to perform a transformation so that spambots cannot contact the author. Hence: FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE. If a filter is only sometimes reversible, depending on the configuration, I'd advise splitting it into two different filters: one that's always reversible, and one that's never reversible.

Note: once the "true WYSIWYG" editing lands in core, the reversible vs. irreversible filter types would be used to indicate to the user which filters can work "live", i.e. inside the WYSIWYG editor. So: reversible = good, irreversible = bad.

Impacts: Module developers

Updates Done (doc team, etc.)
Online documentation: Not done
Theming guide: Not done
Module developer documentation: Not done
Examples project: Not done
Coder Review: Not done
Coder Upgrade: Not done
Other: Other updates done