Rewrite the parser and use prepare(). [#2627826]

Now that tags use templates (which implicitly filter HTML when not told otherwise), we run into some double-escaping problems. The standard BBCode text format has always used filter_html_escape to escape all HTML before rendering BBcode. Given this, the following would be problematic:

[quote="Author's Name" <author@email.example>]
  This is a [b]quote[/b] by <author@email.example>.
[/quote]

First, the entire text is escaped, as it should be:

[quote="Author's Name" &lt;author@email.example&gt;]
  This is a [b]quote[/b] by &lt;author@email.example&gt;.
[/quote]

After the [b] has been rendered, the quote tag will be instantiated with the following values:

  // The \' is part of this literal, not the string.
  $tag->option = '"Author\'s Name" &lt;author@email.example&gt;'
  $tag->content = Markup::create('This is a <strong>quote</strong> by &lt;author@email.example&gt;');

It might then be inserted into a Twig template like this:

<div class="author">{{ tag.option }}</div>
<div class="blockquote">
  {{ tag.content }}
</div>

Now, $tag->content is properly designated markup (because of the already-rendered BBCode inside), but $tag->option is a regular string. Escaping it again will lead to the following:

<div class="author">&quot;Author&#039;s Name&quot; &amp;lt;author@email.example&amp;gt;</div>
<div class="blockquote">
  This is a <strong>quote</strong> by &lt;author@email.example&gt;
</div>

The author will then be presented like this:

"Author's Name" <author@email.example>

I'm not sure if this means we should designate ALL the values as safe markup, or if this opens up other issues.

Comment	File	Size	Author
#7	0002-Issue-2627826-Parser-Rewrite-4.0.patch	17.07 KB	cburschka
#5	0001-Issue-2627826-Parser-Rewrite-4.0.patch	17.41 KB	cburschka

Comments

Comment #1

3 December 2015 at 04:37

cburschka created an issue. See original summary.

Comment #2

cburschka

they

commented 5 December 2015 at 18:49

Component:	Code	» Core
Category:	Bug report	» Feature request

Maybe this is a genuine use case for ::prepare().

I could see XBBCode completely replace every tag token it finds with a markup-safe version before the HTML filter can run, and then decoding it later.

Comment #3

cburschka

they

commented 5 December 2015 at 18:57

> markup-safe

Eg. base64.

Comment #4

cburschka

they

commented 8 December 2015 at 13:30

Category:

Feature request

» Plan

This is also going to require a Regex overhaul. Some things are going to become easier when we can read the tag before HTML escaping, but it's still going to be a big issue.

My current idea for the tag regex (no subgroup parsing; just a match) is

'\[\/[a-z0-9_]+\]|\[[a-z0-9_]+(((?<=\s)\w+=((\'([^\']|([^\\](\\\\)*\\\')*\'|\"([^\"]|([^\\](\\\\)*\\\")*\"|[^\'"\s]([^\s]|(\\\\)*\\\s)*))(?=\s|]))*|=([^\]]|([^\\](\\\\)*\\\])*)\]'

1. Closing tags with no attributes.
2. Opening tag containing:
2a. a sequence (each item preceded by space and followed by space or ]) of word-keys followed by an = followed by:
2ai. A single-quoted value that contains a string of
2aia. Not single quotes
2aib. A non-backslash followed by an odd number of backslashes followed by a single-quote.
2aii. A double-quoted value that contains a string of
2aiia. Not double quotes
2aiib. A non-backslash followed by an odd number of backslashes followed by a double-quote.
2aiii. A value beginning with a non-space and containing a string of
2aiiia. Non-spaces
2aiiib. A non-backslash followed by an odd number of backslashes followed by a space.
2b. a = followed by a string of
2bi. Not closing square brackets
2bii. A non-backslash followed by an odd number of backslashes followed by a closing square bracket.

Comment #5

cburschka

they

commented 11 December 2015 at 13:52

Title:	Rethink HTML escaping	» Rewrite the parser and use prepare().
Status:	Active	» Needs review

Status	File	Size
new	0001-Issue-2627826-Parser-Rewrite-4.0.patch	17.41 KB

Comment #6

11 December 2015 at 13:59

Status:

Needs review

» Needs work

The last submitted patch, 5: 0001-Issue-2627826-Parser-Rewrite-4.0.patch, failed testing.

Comment #7

cburschka

they

commented 11 December 2015 at 14:00

Status:

Needs work

» Needs review

Status	File	Size
new	0002-Issue-2627826-Parser-Rewrite-4.0.patch	17.07 KB

Whoops.

Comment #8

cburschka

they

commented 11 December 2015 at 14:16

Status:

Needs review

» Needs work

This needs way, way more test cases though.

Comment #9

11 December 2015 at 15:27

cburschka committed 384aa1e on 8.x-3.x

Issue #2627826: Parser Rewrite 4.0

This patch separates the parser into...

Comment #10

cburschka

they

commented 11 December 2015 at 15:28

Status:

Needs work

» Fixed

Actually, that's for another issue. Those test cases are still needed anyway.

Comment #11

12 December 2015 at 12:04

cburschka committed 384aa1e on 9.x-1.x

Issue #2627826: Parser Rewrite 4.0

This patch separates the parser into...

Comment #12

12 December 2015 at 12:53

cburschka committed 384aa1e on 9.x-2.x

Issue #2627826: Parser Rewrite 4.0

This patch separates the parser into...

Comment #13

26 December 2015 at 12:54

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Rewrite the parser and use prepare().

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

News items

Our community

Documentation

Drupal code base

Governance of community