Now that tags use templates (which implicitly filter HTML when not told otherwise), we run into some double-escaping problems. The standard BBCode text format has always used filter_html_escape to escape all HTML before rendering BBcode. Given this, the following would be problematic:
[quote="Author's Name" <author@email.example>]
This is a [b]quote[/b] by <author@email.example>.
[/quote]
First, the entire text is escaped, as it should be:
[quote="Author's Name" <author@email.example>]
This is a [b]quote[/b] by <author@email.example>.
[/quote]
After the [b] has been rendered, the quote tag will be instantiated with the following values:
// The \' is part of this literal, not the string.
$tag->option = '"Author\'s Name" <author@email.example>'
$tag->content = Markup::create('This is a <strong>quote</strong> by <author@email.example>');
It might then be inserted into a Twig template like this:
<div class="author">{{ tag.option }}</div>
<div class="blockquote">
{{ tag.content }}
</div>
Now, $tag->content is properly designated markup (because of the already-rendered BBCode inside), but $tag->option is a regular string. Escaping it again will lead to the following:
<div class="author">"Author's Name" &lt;author@email.example&gt;</div>
<div class="blockquote">
This is a <strong>quote</strong> by <author@email.example>
</div>
The author will then be presented like this:
"Author's Name" <author@email.example>
I'm not sure if this means we should designate ALL the values as safe markup, or if this opens up other issues.
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | 0002-Issue-2627826-Parser-Rewrite-4.0.patch | 17.07 KB | cburschka |
| #5 | 0001-Issue-2627826-Parser-Rewrite-4.0.patch | 17.41 KB | cburschka |
Comments
Comment #2
cburschkaMaybe this is a genuine use case for ::prepare().
I could see XBBCode completely replace every tag token it finds with a markup-safe version before the HTML filter can run, and then decoding it later.
Comment #3
cburschka> markup-safe
Eg. base64.
Comment #4
cburschkaThis is also going to require a Regex overhaul. Some things are going to become easier when we can read the tag before HTML escaping, but it's still going to be a big issue.
My current idea for the tag regex (no subgroup parsing; just a match) is
'\[\/[a-z0-9_]+\]|\[[a-z0-9_]+(((?<=\s)\w+=((\'([^\']|([^\\](\\\\)*\\\')*\'|\"([^\"]|([^\\](\\\\)*\\\")*\"|[^\'"\s]([^\s]|(\\\\)*\\\s)*))(?=\s|]))*|=([^\]]|([^\\](\\\\)*\\\])*)\]'1. Closing tags with no attributes.
2. Opening tag containing:
2a. a sequence (each item preceded by space and followed by space or ]) of word-keys followed by an = followed by:
2ai. A single-quoted value that contains a string of
2aia. Not single quotes
2aib. A non-backslash followed by an odd number of backslashes followed by a single-quote.
2aii. A double-quoted value that contains a string of
2aiia. Not double quotes
2aiib. A non-backslash followed by an odd number of backslashes followed by a double-quote.
2aiii. A value beginning with a non-space and containing a string of
2aiiia. Non-spaces
2aiiib. A non-backslash followed by an odd number of backslashes followed by a space.
2b. a = followed by a string of
2bi. Not closing square brackets
2bii. A non-backslash followed by an odd number of backslashes followed by a closing square bracket.
Comment #5
cburschkaComment #7
cburschkaWhoops.
Comment #8
cburschkaThis needs way, way more test cases though.
Comment #10
cburschkaActually, that's for another issue. Those test cases are still needed anyway.