Modules that provide filters for Drupal's filter system should now define the "type" of each filter. (This is necessary for other modules, such as WYSIWYG modules, to reason about text formats.)
Filter types
There are four filter types (NOTE: these are still prone to change in #1816160: Should FILTER_TYPE_* be bitflags strings or stay ints?):
FilterInterface::TYPE_MARKUP_LANGUAGE
: Non-HTML markup language filters that generate HTML. Examples: Markdown, Textile, but also Drupal core's filter_autop and filter_url filters.
FilterInterface::TYPE_HTML_RESTRICTOR
: HTML tag and attribute restricting filters. Examples: HTML Purifier, Drupal core's filter_html.
FilterInterface::TYPE_TRANSFORM_REVERSIBLE
: Reversible transformation filters. Example: <img data-caption="Druplicon"> may be (reversibly!) transformed to <figure><img><figcaption>Druplicon</figcaption></figure>.
FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE
: Irreversible transformation filters. Examples: the Typogrify filter would transform WYSIWYG and I said "foo"! into <span class="caps">WYSIWYG</span> and I said “foo”!, respectively. Text link ad systems would transform fancy car into something like <a href="http://fancycar.example.com">fancy car</a>. Neither of those text-based transformations is reliably reversible (even though it might be possible to write implementations where they are reversible).
D7:
/**
 * Implements hook_filter_info().
 */
function filter_filter_info() {
  $filters['filter_html'] = array(
    'title' => t('Limit allowed HTML tags'),
    'process callback' => '_filter_html',
    …
  );
  return $filters;
}
D8:
/**
 * Implements hook_filter_info().
 */
function filter_filter_info() {
  $filters['filter_html'] = array(
    'title' => t('Limit allowed HTML tags'),
    // New in D8: every filter declares its type.
    'type' => FilterInterface::TYPE_HTML_RESTRICTOR,
    'process callback' => '_filter_html',
    …
  );
  return $filters;
}
Related to this, FilterFormatInterface::getFilterTypes() and FilterFormatInterface::getHtmlRestrictions() have been added. The former returns an array of all unique filter types used in the text format it's called on. The latter returns a structured array conveying the HTML restrictions of a text format, or FALSE if there aren't any HTML restrictions (i.e. if a text format has zero FilterInterface::TYPE_HTML_RESTRICTOR filters).
check_markup() can skip filters of a certain type
There's a new $filter_types_to_skip parameter to check_markup(), which defaults to the empty array. Trying to skip FilterInterface::TYPE_HTML_RESTRICTOR filters is disallowed.
None of the existing check_markup() calls need to change! This is new functionality only, which will only be needed by relatively "special" modules.
D7:
check_markup($text, $format_id = NULL, $langcode = '', $cache = FALSE);
D8:
check_markup($text, $format_id = NULL, $langcode = '', $cache = FALSE, $filter_types_to_skip = array());
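To make the skipping behaviour concrete, here is a minimal standalone sketch that runs without Drupal. It is not how check_markup() is implemented: the function apply_filters_sketch(), the string constants and the two filter callbacks are hypothetical stand-ins. How core reacts to an attempt to skip HTML restrictors is not spelled out here, so throwing an exception is just one possible choice made for this illustration.

```php
<?php
// Hypothetical stand-ins for the FilterInterface::TYPE_* constants; the real
// values live in Drupal core and may differ.
const TYPE_HTML_RESTRICTOR = 'html_restrictor';
const TYPE_TRANSFORM_IRREVERSIBLE = 'transform_irreversible';

/**
 * Applies filters in order, skipping those whose type is listed.
 *
 * Illustrative only; assumes each filter is an array with 'type' and
 * 'process callback' keys, as in the hook_filter_info() example.
 */
function apply_filters_sketch($text, array $filters, array $filter_types_to_skip = array()) {
  // Skipping HTML restrictors would bypass security filtering, so refuse.
  if (in_array(TYPE_HTML_RESTRICTOR, $filter_types_to_skip, TRUE)) {
    throw new InvalidArgumentException('HTML restrictor filters cannot be skipped.');
  }
  foreach ($filters as $filter) {
    if (in_array($filter['type'], $filter_types_to_skip, TRUE)) {
      continue;
    }
    $text = call_user_func($filter['process callback'], $text);
  }
  return $text;
}

$filters = array(
  array(
    'type' => TYPE_HTML_RESTRICTOR,
    // Crude restrictor: only <em> survives.
    'process callback' => function ($text) {
      return strip_tags($text, '<em>');
    },
  ),
  array(
    'type' => TYPE_TRANSFORM_IRREVERSIBLE,
    // Typogrify-like transformation of an all-caps word.
    'process callback' => function ($text) {
      return str_replace('WYSIWYG', '<span class="caps">WYSIWYG</span>', $text);
    },
  ),
);

$input = '<b>Hi</b> <em>WYSIWYG</em>';
$all = apply_filters_sketch($input, $filters);
$skipped = apply_filters_sketch($input, $filters, array(TYPE_TRANSFORM_IRREVERSIBLE));
```

In a real module the equivalent would be a call such as check_markup($text, $format_id, '', FALSE, array(FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE)), using the D8 signature shown above.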
Why?
This all ties back to the goal of having "true WYSIWYG" editing in Drupal core. "True WYSIWYG" editing is based on HTML: you're editing HTML live, and thus the original mark-up should also be HTML. Hence, if a text format contains a FilterInterface::TYPE_MARKUP_LANGUAGE filter, then no "true WYSIWYG" editing is possible. FilterInterface::TYPE_HTML_RESTRICTOR filters don't impede HTML-based editing (unless they forbid even the most basic HTML tags) and are essential for security, so they do not pose a problem.
Then we get to the interesting part: transformation filters. They're classified as either reversible or irreversible. A reversible filter is assumed to provide a JS implementation of the filter, so that these transformations can be applied "live" (thus resulting in "true WYSIWYG"), but also reversed upon saving. On the other hand, irreversible filters would not be applied, because even though we could apply them while editing, we wouldn't be able to reliably reverse them for storing the content in the database.
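The reasoning above can be sketched as a small decision helper. This is a standalone illustration, not core code: wysiwyg_compatibility_sketch(), its result keys and the string constants are hypothetical. The input is assumed to be an array of unique filter types, which is what FilterFormatInterface::getFilterTypes() is described as returning.

```php
<?php
// Hypothetical stand-ins for the FilterInterface::TYPE_* constants.
const TYPE_MARKUP_LANGUAGE = 'markup_language';
const TYPE_HTML_RESTRICTOR = 'html_restrictor';
const TYPE_TRANSFORM_REVERSIBLE = 'transform_reversible';
const TYPE_TRANSFORM_IRREVERSIBLE = 'transform_irreversible';

/**
 * Summarises what a text format's filter types imply for HTML-based editing.
 *
 * @param array $filter_types
 *   The unique filter types used by one text format.
 */
function wysiwyg_compatibility_sketch(array $filter_types) {
  return array(
    // A markup language filter means the stored text is not HTML.
    'true_wysiwyg_possible' => !in_array(TYPE_MARKUP_LANGUAGE, $filter_types, TRUE),
    // Reversible transformations can be applied live and undone on save.
    'has_live_transforms' => in_array(TYPE_TRANSFORM_REVERSIBLE, $filter_types, TRUE),
    // Irreversible transformations would not be applied while editing.
    'has_output_only_transforms' => in_array(TYPE_TRANSFORM_IRREVERSIBLE, $filter_types, TRUE),
  );
}

$html_format = wysiwyg_compatibility_sketch(array(TYPE_HTML_RESTRICTOR, TYPE_TRANSFORM_REVERSIBLE));
$markdown_format = wysiwyg_compatibility_sketch(array(TYPE_MARKUP_LANGUAGE, TYPE_HTML_RESTRICTOR));
```

Note that HTML restrictors do not appear in the result at all, which mirrors the point that they do not stand in the way of HTML-based editing.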
Filter type example
For some filters, it might not be very obvious which type they should be categorized under. For example, Invisimail is a filter to hide e-mail addresses from spam bots. The tricky part: depending on its configuration, it may or may not generate HTML or even JavaScript. Because it generates HTML, you might think FilterInterface::TYPE_MARKUP_LANGUAGE is appropriate. It's not preventing certain HTML tags or attributes, but it's obfuscating HTML, so FilterInterface::TYPE_HTML_RESTRICTOR might also seem appropriate. But what it really does is take some HTML and transform it into something else, so one of the transformation filter types is also an option. To make it even more complicated: depending on the configuration, it can be reversible, so it could be either FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE or FilterInterface::TYPE_TRANSFORM_REVERSIBLE.
The answer: Invisimail does not contain a mark-up language; hence FilterInterface::TYPE_MARKUP_LANGUAGE is out of the question.
FilterInterface::TYPE_HTML_RESTRICTOR vs. FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE can potentially be argued about (hence the need for bitflags/better names, see #1816160: Should FILTER_TYPE_* be bitflags strings or stay ints?), but the main purpose of the Invisimail module is not to protect the reader from malicious HTML; its purpose is to perform a transformation so that spambots cannot contact the author. Hence: FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE. If it is only sometimes reversible, depending on the configuration, I'd advise splitting it into two different filters: one that's always reversible, and one that's never reversible.
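A rough sketch of what such a split could look like, written so that it runs on its own. Everything in it is hypothetical: the function name, the filter names, the callback names and the string constants are made up for illustration and are not taken from the Invisimail module. A real hook_filter_info() implementation would use t() for the titles and the FilterInterface::TYPE_* constants.

```php
<?php
// Hypothetical stand-ins for the FilterInterface::TYPE_* constants.
const TYPE_TRANSFORM_REVERSIBLE = 'transform_reversible';
const TYPE_TRANSFORM_IRREVERSIBLE = 'transform_irreversible';

/**
 * Declares two filters instead of one whose reversibility depends on settings.
 */
function example_split_filter_info_sketch() {
  // Variant whose output can always be turned back into the original text.
  $filters['example_obfuscate_reversible'] = array(
    'title' => 'Obfuscate e-mail addresses (reversible)',
    'type' => TYPE_TRANSFORM_REVERSIBLE,
    'process callback' => '_example_obfuscate_reversible',
  );
  // Variant whose output can never be reliably reversed.
  $filters['example_obfuscate_irreversible'] = array(
    'title' => 'Obfuscate e-mail addresses (irreversible)',
    'type' => TYPE_TRANSFORM_IRREVERSIBLE,
    'process callback' => '_example_obfuscate_irreversible',
  );
  return $filters;
}

$definitions = example_split_filter_info_sketch();
```

With two separate definitions, each filter has exactly one type, so a text format's list of filter types stays unambiguous.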
Note: once the "true WYSIWYG" editing lands in core, the reversible vs. irreversible filter types would be used to indicate to the user which filters can work "live", i.e. inside the WYSIWYG editor. So: reversible = good, irreversible = bad.