Now that tags use templates (which implicitly filter HTML when not told otherwise), we run into some double-escaping problems. The standard BBCode text format has always used filter_html_escape to escape all HTML before rendering BBcode. Given this, the following would be problematic:

[quote="Author's Name" <author@email.example>]
  This is a [b]quote[/b] by <author@email.example>.
[/quote]

First, the entire text is escaped, as it should be:

[quote="Author's Name" &lt;author@email.example&gt;]
  This is a [b]quote[/b] by &lt;author@email.example&gt;.
[/quote]

After the [b] has been rendered, the quote tag will be instantiated with the following values:

  // The \' is part of this literal, not the string.
  $tag->option = '"Author\'s Name" &lt;author@email.example&gt;'
  $tag->content = Markup::create('This is a <strong>quote</strong> by &lt;author@email.example&gt;');

It might then be inserted into a Twig template like this:

<div class="author">{{ tag.option }}</div>
<div class="blockquote">
  {{ tag.content }}
</div>

Now, $tag->content is properly designated markup (because of the already-rendered BBCode inside), but $tag->option is a regular string. Escaping it again will lead to the following:

<div class="author">&quot;Author&#039;s Name&quot; &amp;lt;author@email.example&amp;gt;</div>
<div class="blockquote">
  This is a <strong>quote</strong> by &lt;author@email.example&gt;
</div>

The author will then be presented like this:

"Author's Name" &lt;author@email.example&gt;

I'm not sure if this means we should designate ALL the values as safe markup, or if this opens up other issues.

Comments

cburschka created an issue. See original summary.

cburschka’s picture

Component: Code » Core
Category: Bug report » Feature request

Maybe this is a genuine use case for ::prepare().

I could see XBBCode completely replace every tag token it finds with a markup-safe version before the HTML filter can run, and then decoding it later.

cburschka’s picture

> markup-safe

Eg. base64.

cburschka’s picture

Category: Feature request » Plan

This is also going to require a Regex overhaul. Some things are going to become easier when we can read the tag before HTML escaping, but it's still going to be a big issue.

My current idea for the tag regex (no subgroup parsing; just a match) is

'\[\/[a-z0-9_]+\]|\[[a-z0-9_]+(((?<=\s)\w+=((\'([^\']|([^\\](\\\\)*\\\')*\'|\"([^\"]|([^\\](\\\\)*\\\")*\"|[^\'"\s]([^\s]|(\\\\)*\\\s)*))(?=\s|]))*|=([^\]]|([^\\](\\\\)*\\\])*)\]'

1. Closing tags with no attributes.
2. Opening tag containing:
2a. a sequence (each item preceded by space and followed by space or ]) of word-keys followed by an = followed by:
2ai. A single-quoted value that contains a string of
2aia. Not single quotes
2aib. A non-backslash followed by an odd number of backslashes followed by a single-quote.
2aii. A double-quoted value that contains a string of
2aiia. Not double quotes
2aiib. A non-backslash followed by an odd number of backslashes followed by a double-quote.
2aiii. A value beginning with a non-space and containing a string of
2aiiia. Non-spaces
2aiiib. A non-backslash followed by an odd number of backslashes followed by a space.
2b. a = followed by a string of
2bi. Not closing square brackets
2bii. A non-backslash followed by an odd number of backslashes followed by a closing square bracket.

cburschka’s picture

Title: Rethink HTML escaping » Rewrite the parser and use prepare().
Status: Active » Needs review
StatusFileSize
new17.41 KB

Status: Needs review » Needs work

The last submitted patch, 5: 0001-Issue-2627826-Parser-Rewrite-4.0.patch, failed testing.

cburschka’s picture

Status: Needs work » Needs review
StatusFileSize
new17.07 KB

Whoops.

cburschka’s picture

Status: Needs review » Needs work

This needs way, way more test cases though.

  • cburschka committed 384aa1e on 8.x-3.x
    Issue #2627826: Parser Rewrite 4.0
    
    This patch separates the parser into...
cburschka’s picture

Status: Needs work » Fixed

Actually, that's for another issue. Those test cases are still needed anyway.

  • cburschka committed 384aa1e on 9.x-1.x
    Issue #2627826: Parser Rewrite 4.0
    
    This patch separates the parser into...

  • cburschka committed 384aa1e on 9.x-2.x
    Issue #2627826: Parser Rewrite 4.0
    
    This patch separates the parser into...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.