</code> string prematurely stops syntax highlighting

JohnAlbin - January 15, 2008 - 00:05
Project: Code Filter
Version: 5.x-1.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: active
Issue tags: Invalid XHTML
Description

The following text will cause codefilter to prematurely stop syntax highlighting:

<code>
<?php
print "Hello </code>";
?>
</code>

Here's a live example:

<?php
print "Hello
";
?>

As you can (currently) see, that definitely needs to get fixed.

In addition, an HTML snippet that includes <code> will also cause Code Filter to stop syntax highlighting prematurely:

<code><p>This is a snippet of <code>html</code> that includes the code tag</p></code>

Live Example:

<p>This is a snippet of html that includes the <code>code element.

#1

JohnAlbin - January 15, 2008 - 00:09
Title: <code> within <code> prematurely stops syntax highlighting » </code> string prematurely stops syntax highlighting

It's actually the </code> string that causes the error, not <code>.
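
For anyone following along, here is a minimal sketch of why the first </code> wins. It assumes the filter locates blocks with a non-greedy pattern along the lines of the hypothetical $pattern below; that pattern is only an illustration and is not copied from the Code Filter source.

```php
<?php
// Hypothetical pattern, assumed for illustration only (not the module's actual code).
$pattern = '@<code>(.+?)</code>@s';

// The first example from the description, written as a PHP string.
$input = "<code>\n<?php\nprint \"Hello </code>\";\n?>\n</code>";

preg_match($pattern, $input, $matches);

// The non-greedy group stops at the first closing tag it meets,
// which here is the one inside the PHP string literal.
var_dump($matches[1]);
```

The captured block ends right after `print "Hello `, so everything after the inner </code> falls outside the highlighted region.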

#2

zeta ζ - January 15, 2008 - 00:26

Whereas:

<?php
print "<code>Hello </code>";
?>
is OK.

It only fails if there is an unclosed <code> outside the <?php ?>.

#3

NancyDru - January 15, 2008 - 00:38

As I mentioned in the other post, I had this problem too, so I'm tracking it.

#4

zeta ζ - January 15, 2008 - 00:40

Example #2a is not very useful, because any html should be outside the <?php ?>.

If the code tags are outside the <?php ?>, the whole lot needs another pair of embracing code tags, which is when codefilter fails.

Should we not also bear in mind the vast back catalogue of nodes that were written with the codefilter as it works now? We might need

if (nid < 210000) {
  ...
}
for d.o.

#5

NancyDru - January 15, 2008 - 02:34

I doubt that fixing this will break any existing nodes because people haven't been able to get it to work. Also bear in mind that codefilter is not used solely on DO - I have two sites that I use it on and neither one is in any danger of reaching 210,000 nodes in my lifetime.

#6

zeta ζ - January 15, 2008 - 04:22

Nancy: I wasn’t intending to put it in codefilter, just in d.o.

I currently can’t get my test site to put <?php ?> in <code>tags</code>, so I’ll have to work on this.

#7

soxofaan - January 15, 2008 - 13:34

The solution/workaround we offer in GeSHi filter is to support both <code> and [code] as code block delimiters, so if you need <code> in a code block you write [code]...<code>...</code>...[/code]

Supporting all sorts of nesting corner cases will lead to a regular expression nightmare or the need for a parser, which isn't worth it IMHO.
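
The two-delimiter idea can be sketched roughly as follows. This is not the GeSHi filter's code; example_wrap_code() and example_filter() are hypothetical names, and the <pre class="code"> wrapper is just an assumed stand-in for real highlighting.

```php
<?php
// Hypothetical helper: escapes one code block and wraps it for display.
function example_wrap_code(string $code): string {
  return '<pre class="code">' . htmlspecialchars(trim($code), ENT_QUOTES) . '</pre>';
}

// Hypothetical filter: a block opened with [code] can only be closed by [/code],
// so any <code> or </code> inside it is treated as plain text (and vice versa).
function example_filter(string $text): string {
  $pattern = '@\[code\](.+?)\[/code\]|<code>(.+?)</code>@s';
  return preg_replace_callback($pattern, function (array $m): string {
    // Group 1 is non-empty for [code] blocks; otherwise group 2 holds the block.
    $code = $m[1] !== '' ? $m[1] : $m[2];
    return example_wrap_code($code);
  }, $text);
}

echo example_filter('[code]<p>Some <code>html</code> here</p>[/code]');
```

Because each opening delimiter only pairs with its own closing delimiter, a single non-greedy pattern is enough and no nesting logic is needed. A block that contains both </code> and [/code] would still break, which is the kind of corner case the workaround deliberately leaves alone.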

#8

NancyDru - January 15, 2008 - 14:42

I looked at the module and admit I don't understand preg_replace. Can you increment a depth counter every time you encounter the start tag, decrement it when you encounter an ending tag, and only quit when the depth is back to zero?

#9

JohnAlbin - January 15, 2008 - 16:11

Nancy, unfortunately, you can't implement a depth counter because the code can just be a snippet of code and there's no guarantee that it will be well-formed (with matching opening and closing tags). See my first example for what I mean.
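
To make the objection concrete, here is a rough sketch of a depth counter and of how it behaves on the first example. example_find_closing_tag() is a hypothetical helper written for this illustration; it is not part of Code Filter.

```php
<?php
// Hypothetical helper: returns the offset of the </code> that brings the
// nesting depth back to zero, or false if the depth never returns to zero.
function example_find_closing_tag(string $text) {
  $depth = 0;
  preg_match_all('@</?code>@', $text, $matches, PREG_OFFSET_CAPTURE);
  foreach ($matches[0] as [$tag, $offset]) {
    $depth += ($tag === '<code>') ? 1 : -1;
    if ($depth === 0) {
      return $offset;
    }
  }
  return false;
}

// Well-formed nesting: the counter finds the outer closing tag.
$balanced = '<code><p>Some <code>html</code> here</p></code>';
var_dump(example_find_closing_tag($balanced) === strrpos($balanced, '</code>'));

// The first example: the lone </code> inside the string literal has no opening
// partner, so the counter reaches zero there instead of at the real closing tag.
$snippet = "<code>\n<?php\nprint \"Hello </code>\";\n?>\n</code>";
var_dump(example_find_closing_tag($snippet) === strrpos($snippet, '</code>'));
```

The counter works for the balanced case but stops early on the snippet, which is the same failure as the original bug.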

Stefaan's solution is not bad. But I'm testing another solution (which I'm not sure will work at this stage).

This issue is similar to: http://drupal.org/node/38047 ("?>" string prematurely stops syntax highlighting)

#10

zeta ζ - January 15, 2008 - 16:38

Ah yes… I was thinking of a depth counter (using a gargantuan regex) :-(

#11

zeta ζ - January 15, 2008 - 19:53

I think soxofaan #7 is on the right track, although I wouldn’t support both as such. Rather, I would leave <code>...</code> as HTML tags and require a different pair of tags to invoke codefilter to do its work (e.g. for this post, [cf]). There would then be no reason to nest codefilter tags (unless you want to specify what to type to invoke codefilter, in which case we could handle the single exception of [cf][cf].*[/cf][/cf], i.e. no need to handle partial quotes).

Rationale:
The W3C doesn’t seem to have much to say about <code>...</code>: “CODE: Designates a fragment of computer code.” I’m not even sure they were thinking of HTML as computer code. By default, <code>...</code> renders only as monospaced and doesn’t even preserve consecutive spaces. In a browser, partial snippets like <code>...<code>...</code> break the page, and <code>...</code>...</code> fails (both as you might expect).

I’m confident this will make codefilter much simpler and quicker. Although it is supposed to filter input, it does so at output time (and time again), as far as I understand, so it could be the source of a performance hit if we handle too many edge cases.

#12

corsix - January 15, 2008 - 18:41

As #9 mentioned the "?>" premature closing issue, I feel it is pertinent to point out that the patch in that issue handles this edge case as well (see http://img174.imageshack.us/img174/7770/codefilterow7.png for an example).

#13

NancyDru - January 15, 2008 - 19:33

@zeta-zoo: it's interesting that they have so much more to say about <pre>.

@corsix: I'm sure Mr. Albin is aware of that and considering it.

#14

mr.baileys - March 16, 2009 - 09:27

Subscribing. I was trying to show my users how to use the code filter by embedding <code>-tags inside another set of <code>-tags...

As an unfortunate side effect, nesting <code>-tags also seems to render the resulting document invalid XHTML because of an unmatched </code> tag.

 
 
